
Async Python is a delivery decision before it is a performance decision

I keep coming back to how async Python changes what I can ship, not just how fast it runs. The new @app.vibe() decorator in FastAPI makes this concrete. It is not about squeezing out more RPS; it is about making certain delivery paths actually viable when they were not before.

Reading the release notes for 0.135.0, I noticed the addition of Server-Sent Events support alongside the vibe() decorator. That pairing is telling. SSE is a delivery pattern, not a performance hack. It lets you push updates to clients without polling. That changes what you can build—live dashboards, progress bars for long-running jobs, real-time notifications—without reaching for websockets or external message brokers. The decorator makes the code feel linear while the runtime does the concurrent work. That is the seam I care about.

The real problem is that we treat async as a performance switch. Flip it, get speed. The reality is messier. Async changes the shape of your system. It forces boundaries around I/O that sync code often lets blur. That blurring is where complexity hides and sprints slow down. I have seen teams ship a "fast" async service that then took three extra sprints to make debuggable in production. The benchmark looked good. The sprint did not.

Where teams usually get it wrong

The most common mistake is adding async because it is modern and well documented. A benchmark looks good. A demo feels fast. Then you are in a sprint, and every new feature requires reasoning about event loops, context switches, and why your perfectly good sync library now deadlocks when you call it from a coroutine.

I have seen a PR where 'async' was the commit message. The unwritten follow-up was 'I do not know how to debug this anymore.' The cognitive load is real. Debugging async code is harder because the stack traces are broken. A traceback shows you the coroutine frame, not the actual request path that led there. You need asyncio.run() wrappers, you need task factories that capture context, you need to remember that await yields control and your assumptions about thread-local storage are wrong.
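The forget-to-await failure is easy to demonstrate. Here is a minimal sketch (the function names are illustrative): calling an async function without await builds a coroutine object and silently does nothing.

```python
import asyncio
import inspect

async def fetch_user(user_id: int) -> dict:
    # Stand-in for a real I/O call
    await asyncio.sleep(0)
    return {"id": user_id}

def handler():
    # Bug: the missing `await` means this only builds a coroutine
    # object and never runs it. No exception, no result, dead weight.
    result = fetch_user(42)
    return result

obj = handler()
print(inspect.iscoroutine(obj))  # True: nothing actually executed
obj.close()  # avoid the "coroutine was never awaited" RuntimeWarning
```

The code runs without error, which is exactly why the bug survives review unless a linter or strict type checker flags the unused coroutine.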

Tracing a request through concurrent tasks is harder. If you use OpenTelemetry, you have to propagate context through contextvars correctly, and not all libraries do. I once spent half a day tracking down why traces were missing spans: a library was using threading.local() instead of contextvars. That is a silent failure. The code runs, the metrics look fine, but your observability is lying to you.
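The threading.local() failure is easy to reproduce. A minimal sketch (names are illustrative): two concurrent tasks run on one thread, so threading.local leaks state across them while a ContextVar stays per-task.

```python
import asyncio
import contextvars
import threading

trace_id = contextvars.ContextVar("trace_id", default=None)
local = threading.local()
results = {}

async def handle(request_id: str):
    trace_id.set(request_id)   # per-task: each task gets a copied context
    local.value = request_id   # per-thread: shared by every task on this loop
    await asyncio.sleep(0.01)  # yield; the other task runs and overwrites `local`
    results[request_id] = (trace_id.get(), local.value)

async def main():
    await asyncio.gather(handle("a"), handle("b"))

asyncio.run(main())
# results["a"] == ("a", "b"): the ContextVar is right, threading.local is wrong
print(results)
```

Both tasks set `local.value` before either resumes, so the first task sees the second task's value. That is the silent trace corruption described above.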

Onboarding someone new to that flow is slower. They write a sync function, call an async helper, forget to await, and now you have a coroutine object sitting in memory doing nothing. The type checker catches it, but only if you have strict mypy settings, and many teams do not. Or they call a CPU-bound function inside a coroutine, block the event loop for 30 seconds, and suddenly no other requests can be processed. The failure mode is not obvious.

The tradeoff is not free. You pay in developer time and system complexity. If you are not shipping features that specifically need concurrent I/O—like handling many long-lived connections, streaming data, or parallel external calls—that cost is pure overhead. It is a silent slowdown disguised as an optimization.

A better working shape

What surprised me was that async's real value is structural. It makes the system's shape match its actual behavior. In sync code, you often pretend I/O is instant. You call a database, you call an API, you wait. The code looks linear, but the runtime is full of gaps where nothing happens. Your thread is blocked, doing zero work, waiting for a network packet.

Async forces you to make those gaps explicit. You await. You yield control. The code's structure mirrors the actual flow of execution. This clarity is useful. It creates natural edges around I/O boundaries. Those edges are where you can insert timeouts, cancellation, and better error handling. They are also where you can test more effectively, mocking at the boundary instead of patching deep inside.
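Those edges are concrete in code. A minimal sketch (the slow upstream is fake): the await is the one place where a timeout and a fallback attach cleanly, and cancellation propagates into the awaited call for free.

```python
import asyncio

async def slow_upstream():
    # Pretend upstream that never answers in time
    await asyncio.sleep(10)
    return "payload"

async def fetch_with_fallback():
    try:
        # The await is the seam: wrap it with a timeout and
        # cancellation flows into slow_upstream() automatically.
        return await asyncio.wait_for(slow_upstream(), timeout=0.05)
    except asyncio.TimeoutError:
        return "fallback"

result = asyncio.run(fetch_with_fallback())
print(result)  # fallback
```

In sync code the equivalent usually means threading socket timeouts through library-specific options; here the boundary itself carries the policy.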

This is similar to how AI systems need clear edges around model boundaries, which I wrote about before. You do not want your business logic tangled with tokenization. You want a clean seam. Async gives you that seam for I/O. The vibe() decorator is a perfect example. It abstracts the asyncio machinery so you can focus on the delivery logic. You write a generator function that yields updates, and the decorator handles the SSE protocol, the connection management, the ping/pong keep-alives. You get the delivery capability without the incidental complexity.
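I have not dug into vibe()'s actual API, but the shape the release notes suggest is sketchable in plain Python: you write an async generator that yields updates, and the framework's job is reduced to wrapping each yield in the SSE wire format. The names and the framing helper below are illustrative, not FastAPI's.

```python
import asyncio
import json

async def job_progress(job_id: str):
    # The shape described above: yield updates, leave the protocol
    # to the framework. The steps here are fake stand-ins for work.
    for pct in (25, 50, 100):
        await asyncio.sleep(0)
        yield {"job": job_id, "pct": pct}

def to_sse_frame(payload: dict) -> str:
    # Minimal SSE wire format: a "data:" line, then a blank line.
    return f"data: {json.dumps(payload)}\n\n"

async def collect():
    return [to_sse_frame(update) async for update in job_progress("job-1")]

frames = asyncio.run(collect())
print(frames[-1])  # data: {"job": "job-1", "pct": 100}
```

Everything below `job_progress` is incidental complexity; that is the part a decorator like vibe() would own, along with connection management and keep-alives.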

The FastAPI release notes show this direction. Version 0.135.1 fixed a bug with TaskGroup yield, a subtle async context manager issue. That fix is not about performance; it is about making the delivery model reliable. If you are using vibe() for streaming, you need those context managers to work correctly, or you leak resources. The framework is hardening the delivery path, not just adding features.

Testing becomes more explicit. In sync code, you might patch a function at the module level. In async code, you mock an async context manager or an async iterator. That forces you to think about the contract at the boundary. What does this I/O operation return? What errors can it raise? That clarity pays off when you are debugging a production issue at 2 AM.
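A short sketch of boundary mocking with the standard library's AsyncMock (the endpoint and field names are made up): the test states the contract, not the internals.

```python
import asyncio
from unittest.mock import AsyncMock

async def get_profile_name(client, user_id: int) -> str:
    # `client` is the I/O boundary: anything with an async get()
    response = await client.get(f"/users/{user_id}")
    return response["name"]

# Mock at the boundary, not deep inside: the contract is explicit.
client = AsyncMock()
client.get.return_value = {"name": "Ada"}

name = asyncio.run(get_profile_name(client, 7))
print(name)  # Ada
client.get.assert_awaited_once_with("/users/7")
```

The assert_awaited_once_with call is the payoff: it fails if the coroutine was built but never awaited, which a plain Mock would miss.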

What to watch in practice

The part I watch is migration pressure. Introducing async into an existing sync codebase is not a flip. It is a slow infection. One async function means its callers become async. Then their callers. You end up with a sync core and an async shell, or a messy hybrid where you are constantly blocking the loop from coroutines or spinning up event loops just to call async code from sync. Neither feels good.

I would rather start new services that need async as async. Keep the sync services sync. Let the boundaries be the network. That is clean. If you must migrate, migrate whole bounded contexts, not individual functions. Treat it like moving to a new database; it is a big change, not a small tweak.

But sometimes you cannot avoid the hybrid. In those cases, I have used a pattern where the sync core remains untouched and an async adapter layer translates between them. The adapter uses asyncio.to_thread() for blocking I/O and loop.run_in_executor() with a process pool for CPU-bound work. It is not elegant, but it isolates the async complexity. The key is to make the adapter as thin as possible and document its purpose clearly.
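A minimal sketch of that adapter, assuming a blocking sync core (the invoice functions are hypothetical): asyncio.to_thread moves the blocking call off the event loop so concurrent requests overlap instead of queuing.

```python
import asyncio
import time

def fetch_invoice_sync(invoice_id: str) -> dict:
    # Untouched sync core: stands in for a blocking call
    # (sync DB driver, requests, a legacy SDK).
    time.sleep(0.05)
    return {"id": invoice_id, "total": 120}

async def fetch_invoice(invoice_id: str) -> dict:
    # Thin adapter: run the blocking call in the default thread pool
    # so it cannot stall the event loop. (For CPU-bound work, use
    # run_in_executor with a ProcessPoolExecutor instead.)
    return await asyncio.to_thread(fetch_invoice_sync, invoice_id)

async def main():
    # Two blocking calls now overlap instead of serializing.
    return await asyncio.gather(fetch_invoice("A-1"), fetch_invoice("A-2"))

a, b = asyncio.run(main())
print(a["id"], b["id"])  # A-1 A-2
```

Keeping the adapter to one function per sync entry point makes it easy to delete later if the core ever becomes async-native.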

Another thing to watch is library support. Not every library is async-native. Using sync libraries in async code means you are blocking the event loop. You might gain nothing or even lose performance. Check if the libraries you actually use have async versions. Do not assume. I once added async to a service thinking I would get concurrency, only to discover the database driver was still sync and blocking the loop. The CPU usage went down, but latency went up because I had introduced context switching overhead without the benefit.

The useful part is that async makes certain bugs much easier to catch. An un-awaited coroutine triggers a RuntimeWarning, and linters and type checkers can flag it. Blocking calls inside coroutines show up as measurable event loop lag you can alert on. These are real maintainability wins, but only if the team is bought into the model. If you have a mix of developers, some comfortable with async and some not, you will have code reviews where the async expert is constantly explaining why await matters. That is a tax.

Observability is harder but more critical. You need to instrument your async tasks. FastAPI's vibe() decorator likely adds context propagation, but you should verify. I add custom metrics for task count, await time, and event loop lag. If the loop lags, your async system is not keeping up, and you need to scale or optimize. Without those metrics, you are flying blind.
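Loop lag is cheap to measure yourself. A self-contained sketch (the intervals and the misbehaving handler are illustrative): sample how late each scheduled sleep actually wakes up; a blocking call anywhere on the loop shows up as lag.

```python
import asyncio
import time

async def sample_loop_lag(interval: float, samples: int) -> list:
    # Lag = how much later than requested each sleep actually woke up.
    lags = []
    for _ in range(samples):
        start = time.perf_counter()
        await asyncio.sleep(interval)
        lags.append(time.perf_counter() - start - interval)
    return lags

async def misbehaving_handler():
    await asyncio.sleep(0.01)  # let the sampler start first
    time.sleep(0.05)           # a blocking call stalls the whole loop

async def main():
    lags, _ = await asyncio.gather(
        sample_loop_lag(interval=0.01, samples=8),
        misbehaving_handler(),
    )
    return lags

lags = asyncio.run(main())
print(f"max loop lag: {max(lags) * 1000:.0f} ms")  # roughly the 50 ms block
```

In production you would feed the samples into your metrics pipeline instead of a list, but the mechanism is the same: the sampler is just another coroutine, so anything that starves it starves your users too.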

Closing heuristics

Here is what I use now:

Use async when you have high concurrency of I/O-bound work. Many websockets, many parallel external API calls, streaming request/response. If it is a typical CRUD app with a few sequential DB calls, sync is simpler. The threshold I use is around 100 concurrent connections doing meaningful work. Below that, the overhead is not worth it.

Do not use async for CPU-bound work. It does not help. Async does not make your math faster. If you have a CPU-bound task, offload it to a process pool; threads only help if the work releases the GIL. Trying to async-ify it just adds noise.

Start async services as async. Do not half-migrate a monolith. The friction is too high. If you have a monolith, keep it sync until you extract a service that needs async. Then build that service async from day one.

Measure developer time, not just RPS. If onboarding takes longer and bugs are harder to trace, you are paying a tax. Make sure the performance gain is worth it. I once measured this: an async service had 15% better RPS but took 40% longer to debug issues. The net cost was negative.

Use frameworks that get the seam right. FastAPI's vibe() decorator is a good example. It makes the async parts feel natural without forcing everything into the async model. It is a delivery tool, not a performance one.

Check your libraries before you commit. Make a list of the top 10 libraries you use. Check if they are async-native. If more than three are not, you are signing up for adapter code and potential blocking. That is a real cost.

Instrument the event loop. Add metrics for loop lag, task count, and await time. Alert on loop lag. If the loop is blocked for more than 100ms, something is wrong.

Document the async boundaries. In your architecture docs, draw a box around the async parts. Explain why they are async and what the failure modes are. That helps new developers understand the shape of the system.

Async Python is a delivery decision. It changes what you can build and how you maintain it. Choose it for the shape it gives your system, not the speed it promises in a benchmark. The code you ship tomorrow matters more than the demo you ran today.

Resources Worth Reading

  • 0.135.3 is worth opening because it adds support for `@app.
  • 0.135.2 is worth opening because it raises the lower bound to `pydantic >=2.
  • 0.135.1 is worth opening because it fixes a bug: avoid yield from a TaskGroup, only as an async context manager, closed in the request async exit stack.
  • 0.135.0 is worth opening because it adds support for Server-Sent Events.