“What Color is Your Function?” starts off by describing an imaginary language that perversely defines two types of functions: red and blue. The language enforces a set of seemingly arbitrary rules regarding how the two are allowed to interact:
1. Every function has a color.
2. The way you call a function depends on its color.
3. You can only call a red function from within another red function.
4. Red functions are more painful to call.
5. Some core library functions are red.
Without knowing the details, a reasonable person would agree that the described language is not particularly well designed. Of course, readers of this article in 2021 will not find it hard to recognize the analogy with async: red functions are async functions, and blue functions are just ordinary functions. For example, #2 and #4 refer to the fact that calling an async function requires either explicit callback chaining or await, whereas a sync function can just be called. #3 refers to the fact that await works only inside async functions. While await constitutes a massive ergonomic improvement for calling async from async (#4), it does nothing to alleviate the split – you still cannot call async code from non-async code because await requires async.
Function colors in Rust async
How does all this apply to Rust? Many people believe that it applies only in part, or not at all. Several objections have been raised:
- Rust async functions are in essence ordinary functions that happen to return values that implement the Future trait. async fn is just syntactic sugar for defining such a function, but you can make one yourself using an ordinary fn as long as your function returns a type that implements Future. Async functions, the argument goes, are functions that return Future<Output = T> instead of T and as such are not “special” in any way, any more than functions that return a Result<T> instead of T are special – so rule #1 (“every function has a color”) doesn’t apply.
- Rust executors provide a block_on() primitive that invokes an async function from a non-async context and blocks until the result is available – so rule #3 (“you can only call a red function from within another red function”) doesn’t apply.
- Rule #5 (“some core library functions are red”) doesn’t apply because Rust’s stdlib is sync-only.
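The first objection is easy to make concrete. The following is a minimal sketch using only the standard library; answer_sugar(), answer_manual(), NoopWaker and poll_once() are names made up for this illustration. It shows an async fn next to an ordinary fn that returns a type implementing Future, and that both hand back a future rather than a u32.

```rust
use std::future::Future;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Sugar: the compiler rewrites this into a function that returns
// an anonymous type implementing Future<Output = u32>.
async fn answer_sugar() -> u32 {
    42
}

// An ordinary fn with the same shape; std::future::ready() builds a
// future that is already complete.
fn answer_manual() -> impl Future<Output = u32> {
    std::future::ready(42)
}

// Hypothetical helper, not part of any executor: a waker that does nothing.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

// Polls a future exactly once and returns its value if it is already done.
fn poll_once<F: Future>(fut: F) -> Option<F::Output> {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    match fut.as_mut().poll(&mut cx) {
        Poll::Ready(value) => Some(value),
        Poll::Pending => None,
    }
}
```

Both functions are interchangeable from the caller's point of view, and in both cases something has to poll the returned value before the number becomes available.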
If these arguments are correct, the only color rule that remains is rule #4, “red functions are more painful to call”, and that part is almost completely alleviated by await.
Yet the split between the sync and async ecosystems is immediately apparent to anyone who looks at them. The very existence of async_std, a crate with the explicit purpose of providing an “async version of the Rust standard library”, shows that the regular standard library is not usable in an async context. If function colors weren’t present in Rust, the ordinary stdlib would be used in both sync and async code, as is the case in Go, where a distinction between “sync” and “async” is never made to begin with.
Then what of the above objections? Let’s go through them one by one and see how they hold up under scrutiny.
Aren’t Rust async functions just ordinary functions with a wacky return type?
That is true, but it is just as true of other languages with colored async. JavaScript’s async functions are ordinary functions that return a Promise. Python’s async functions are regular callables that immediately return a coroutine object. That doesn’t change the fact that in all those languages the caller must handle the returned Promise (coroutine object in Python, Future in Rust) in ways that differ from handling normal values returned from functions. For example, you cannot pass an async function to Iterator::filter() because Iterator::filter() expects a function that returns an actual bool, not an opaque value that just might produce a bool at some point in the future. No matter what you put in the body of your async function, it will never return bool, and extracting the bool requires executor magic that creates other problems, as we’ll see below. Regardless of whether it’s technically possible to call an async function from a sync context, the inability to retrieve its result is at the core of the function color distinction.
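The Iterator::filter() point can be illustrated with a short sketch. The function names is_even(), is_even_async() and evens() are invented for the example; the commented-out line shows the call that the compiler rejects.

```rust
// A sync predicate: returns a real bool.
fn is_even(n: &u32) -> bool {
    n % 2 == 0
}

// An async predicate: calling it returns a future, not a bool.
#[allow(dead_code)]
async fn is_even_async(n: &u32) -> bool {
    n % 2 == 0
}

fn evens(values: &[u32]) -> Vec<u32> {
    // Compiles: filter() gets the bool it asks for.
    values.iter().copied().filter(is_even).collect()

    // Does not compile if swapped in: the predicate would return a
    // future, and filter() requires bool.
    // values.iter().copied().filter(is_even_async).collect()
}
```

The two predicates have identical bodies, yet only the sync one is usable here, which is exactly the kind of split the color metaphor describes.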
Ok, but doesn’t the same apply to Result? Functions that need a u32 aren’t particularly happy to receive a Result<u32, SomeError>. A generic function that accepts u32, such as Iterator::min(), has no idea what to do with Result<u32, SomeError>. And yet people don’t go around claiming that Result somehow “colors” their functions. I admit that this argument has merit – Result indeed introduces a semantic shift that is not always easy to bridge, including in the example we used above, Iterator::filter(). There is even a proposal to add 21 new iterator methods, such as try_is_partitioned(), in order to support doing IO in your filter function (and key function, etc.). Doing this completely generically might require Haskell-style monads or at least some form of higher-kinded types. All this indicates that supporting both Result and non-Result types in fully generic code is far from a trivial matter. But is that enough to justify the claim that Result and Future are equivalent in how they affect functions that must handle them? I would say it’s not, and here is why.
If the recipient of a Result doesn’t care about the error case, it can locally resolve Result to the actual value by unwrapping it. If it doesn’t want to panic on error, it can choose to convert the error to a fallback value, or skip the processing of the value. While it can use the ? operator to propagate the error to its caller, it is not obliged to do so. The recipient of a Future doesn’t have that option – it can either .await the future, in which case it must become async itself, or it must ask an executor to resolve the future, in which case it must have access to an executor, and license to block. What it cannot do is get to the underlying value without interaction with the async environment.
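The options available to the recipient of a Result can be sketched as follows. The helper names (parse(), with_fallback(), sum_valid(), propagate()) are made up for the illustration; only standard library calls are used.

```rust
use std::num::ParseIntError;

// A fallible function: the caller receives a Result, not a u32.
fn parse(s: &str) -> Result<u32, ParseIntError> {
    s.trim().parse::<u32>()
}

// Option 1: resolve the Result locally with a fallback value.
fn with_fallback(s: &str) -> u32 {
    parse(s).unwrap_or(0)
}

// Option 2: skip the values that failed and process the rest.
fn sum_valid(inputs: &[&str]) -> u32 {
    inputs.iter().filter_map(|s| parse(s).ok()).sum()
}

// Option 3: propagate the error with `?`. This is a choice, not an obligation.
fn propagate(s: &str) -> Result<u32, ParseIntError> {
    Ok(parse(s)? + 1)
}
```

with_fallback() and sum_valid() keep an ordinary signature even though they call a fallible function. There is no comparable way to write a plain fn that gets a value out of a Future without either an executor or becoming async.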
Verdict: Rule #1 mostly applies – async functions are special because they return values that require async context to retrieve the actual payload.
Doesn’t block_on() offer a convenient way to invoke an async function from a non-async context?
Yes, provided you are actually allowed to use it. Libraries are expected to work with the executor provided by the environment and don’t have an executor lying around which they can just call to resolve async code. The standard library, for example, is certainly not allowed to assume any particular executor, and there are currently no traits that abstract over third-party executors.
But even if you had access to an executor, there is a more fundamental problem with block_on(). Consider a sync function fn foo() that, during its run, needs to obtain the value from an async function async fn bar(). To do so, foo() does something like let bar_result = block_on(bar()). But that means that foo() is no longer just a non-async function, it’s now a blocking non-async function. What does that mean? It means that foo() can block for arbitrarily long while waiting for bar() to complete. Async functions are not allowed to call functions like foo() for the same reason they’re not allowed to call TcpStream::connect() – calling a blocking function from async code halts the whole executor thread until the blocking function returns. If that happens in multiple threads, or on a single-threaded executor, it freezes the whole async system. This is not described in the original function color article because the language it describes has neither block_on() nor blocking functions. A function that uses block_on() is no longer blue, but it’s not red either – it’s of a new color, let’s call it purple.
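To see why such a function blocks, it helps to look at what block_on() has to do. The following is a minimal sketch of a block_on() built only on the standard library, not the implementation of any particular executor; ThreadWaker is a hypothetical helper, and foo() and bar() are the functions from the text with placeholder bodies.

```rust
use std::future::Future;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

// Waker that unparks the thread that is blocked in block_on().
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

// Minimal block_on(): poll the future, and park the calling thread
// whenever the future is not ready yet.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return value,
            // The calling thread does nothing else until it is woken.
            Poll::Pending => thread::park(),
        }
    }
}

// Red: an async function (placeholder body).
async fn bar() -> u32 {
    21 * 2
}

// Purple: a sync signature, but the thread is held until bar() completes.
fn foo() -> u32 {
    block_on(bar())
}
```

Nothing in the signature of foo() reveals that it may park its thread for an arbitrary amount of time, which is what makes purple functions hazardous to call from async code.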
If this looks like it’s changing the landscape, that’s because it is. And it gets worse. Consider another async function, xyzzy(), that needs to call foo(). If foo() were a blue/non-async function, xyzzy() would just call it and be done with it, the way it’d call Option::take() without thinking. But foo() is a purple function which blocks on bar(), so xyzzy() is not allowed to call it. The irony is that both xyzzy() and bar() are async and if xyzzy() could just await bar() directly, everything would be fine. The fact that xyzzy() calls bar() through the non-async foo() is what creates the problem – foo’s use of block_on() breaks the chain of suspensions required for bar() to communicate to xyzzy() that it needs to suspend until further notice. The ability to propagate suspension from the bottom-most awaitee all the way to the executor is the actual reason why async must be contagious. By eliminating async from the signature of foo() one also eliminates much of the advantage of bar() being async, along with the possibility of calling foo() from async code.
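The broken chain of suspensions can be observed by counting how often the outermost driver gets to poll. This is a sketch under stated assumptions: YieldOnce is a made-up future that suspends once in place of real IO, and drive() is a busy-polling stand-in for an executor that also reports the number of polls it performed.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical future that suspends once before completing.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            return Poll::Ready(());
        }
        self.0 = true;
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

// Stand-in for an executor: polls in a loop and counts the polls.
fn drive<F: Future>(fut: F) -> (F::Output, u32) {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return (value, polls);
        }
    }
}

async fn bar() -> u32 {
    YieldOnce(false).await;
    42
}

// Purple: resolves bar() internally and hides the suspension.
fn foo() -> u32 {
    drive(bar()).0
}

// Suspension from bar() travels up to whoever drives xyzzy_direct().
async fn xyzzy_direct() -> u32 {
    bar().await
}

// Suspension from bar() is swallowed inside foo().
async fn xyzzy_via_foo() -> u32 {
    foo()
}
```

Driving xyzzy_direct() takes two polls, because the driver sees the suspension and regains control in between. Driving xyzzy_via_foo() takes a single poll: the driver never learns that bar() wanted to suspend, so its thread is held for the whole duration.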
Verdict: rule #3 applies because block_on() changes a blue function into something that is neither red nor callable from red.
Doesn’t spawn_blocking() resolve the issue of awaiting blocking functions in async contexts?
spawn_blocking() is a neat bridge between sync and async code: it takes a sync function that might take a long time to execute, and instead of calling it, submits it to a thread pool for execution. It returns a Future, so you can await spawn_blocking(|| some_blocking_call()) like you’d await a true async function without the issues associated with block_on(). This is because the Future returned by spawn_blocking() is pending until the thread pool reports that it’s done executing the submitted sync function. In our extended color metaphor, spawn_blocking() is an adapter that converts a purple function into a red function. Its main intended use cases are CPU-bound functions that might take a long time to execute, as well as blocking functions that just don’t have a good async alternative. Examples of the latter are functions that work with the file system, which still don’t have a good async alternative, or legacy blocking code behind FFI (think ancient database drivers and the like).
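The mechanism can be sketched with the standard library alone. This is not how any real executor implements spawn_blocking(); in particular, the hypothetical spawn_blocking_sketch() below starts a fresh thread per call instead of using a pool, and BlockingTask, Shared and the small block_on() are invented for the illustration.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

// State shared between the worker thread and the future.
struct Shared<T> {
    result: Option<T>,
    waker: Option<Waker>,
}

// Future that stays pending until the worker thread stores a result.
struct BlockingTask<T> {
    shared: Arc<Mutex<Shared<T>>>,
}

impl<T> Future for BlockingTask<T> {
    type Output = T;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<T> {
        let mut shared = self.shared.lock().unwrap();
        match shared.result.take() {
            Some(value) => Poll::Ready(value),
            None => {
                // Remember whom to notify when the worker finishes.
                shared.waker = Some(cx.waker().clone());
                Poll::Pending
            }
        }
    }
}

// Runs `f` on another thread (assumption: one thread per call, no pool).
fn spawn_blocking_sketch<T, F>(f: F) -> BlockingTask<T>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    let shared = Arc::new(Mutex::new(Shared { result: None, waker: None }));
    let worker = Arc::clone(&shared);
    thread::spawn(move || {
        let value = f();
        let mut state = worker.lock().unwrap();
        state.result = Some(value);
        if let Some(waker) = state.waker.take() {
            waker.wake();
        }
    });
    BlockingTask { shared }
}

// Minimal block_on(), used here only to drive the example.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return value,
            Poll::Pending => thread::park(),
        }
    }
}
```

The awaiting side only ever sees a future that is pending and later ready, so the executor thread stays free while the sync function runs elsewhere.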
Problems arise when code tries to avoid multiple function colors and uses spawn_blocking() to hide the “color” of the implementation. For example, a library might be implemented using async code internally, but use block_on() to expose only a sync API. Someone might then use that library in an async context and wrap the sync calls in spawn_blocking(). What would be the consequences if that was done across the board? Recall that the important advantage of async is the ability to scale the number of concurrent agents (futures) without increasing the number of OS threads. As long as the agents are mostly IO-bound, you can have literally millions of them executing (most of them being suspended at any given time) on a single thread. But if an async function like the above xyzzy() uses spawn_blocking() to await a purple function like foo(), which itself uses block_on() to await an async function like bar(), then we have a problem: the number of xyzzy() instances that can run concurrently and make progress is now limited by the number of threads in the thread pool employed by spawn_blocking(). If you need to spawn a large number of tasks awaiting xyzzy() concurrently, most of them will need to wait for a slot in the thread pool to open up before their foo() functions even begin executing. And all this because foo() blocks on bar(), which is again ironic because bar(), being an async function, is designed to scale independently of the number of threads available to execute it.
The above is not just a matter of performance degradation; in the worst case spawn_blocking(|| block_on(...)) can deadlock. Consider what happens if one async function behind spawn_blocking(|| block_on(...)) needs data from another async function started the same way in order to proceed. It is possible that the other async function cannot make progress because it is waiting for a slot in the thread pool to even begin executing. And the slot won’t free up because it is taken by the first async function, which also runs inside a spawn_blocking() invocation. The slot is never going to change owner, and a deadlock occurs. This can’t happen with async functions that are directly executed as async tasks because those don’t require a slot in a fixed-size pool. They can all be in a suspended state waiting for something to happen to any of them, and resume execution at any moment. In an async system the number of OS threads deployed by the executor doesn’t limit the number of async functions that can work concurrently. (There are executors that use a single thread to drive all futures.)
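The slot starvation behind this deadlock can be reproduced in miniature with plain threads. The sketch below is a simulation and does not use any executor: a hypothetical one-slot pool stands in for the pool behind spawn_blocking(), two queued jobs stand in for the two spawn_blocking(|| block_on(...)) calls, and a timeout replaces the indefinite wait so that the example terminates.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

type Job = Box<dyn FnOnce() + Send>;

// Returns true if the first job gave up waiting for data that the
// second job would have produced.
fn starved_by_pool() -> bool {
    // One-slot "pool": a single worker runs queued jobs one at a time.
    let (job_tx, job_rx) = mpsc::channel::<Job>();
    let worker = thread::spawn(move || {
        for job in job_rx {
            job();
        }
    });

    let (data_tx, data_rx) = mpsc::channel::<u32>();
    let (report_tx, report_rx) = mpsc::channel::<bool>();

    // Job A occupies the only slot while waiting for data from job B.
    // Without the timeout it would wait forever.
    job_tx
        .send(Box::new(move || {
            let timed_out = data_rx
                .recv_timeout(Duration::from_millis(200))
                .is_err();
            report_tx.send(timed_out).unwrap();
        }))
        .unwrap();

    // Job B produces the data, but it is queued behind job A and
    // cannot start until A releases the slot.
    job_tx
        .send(Box::new(move || {
            let _ = data_tx.send(7);
        }))
        .unwrap();

    drop(job_tx); // lets the worker exit once the queue is drained
    let timed_out = report_rx.recv().unwrap();
    worker.join().unwrap();
    timed_out
}
```

Job A never receives its data, because the only thing that could produce it is stuck in the queue behind it. With a larger pool the same situation arises as soon as every slot is held by a job that waits on a queued one.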
spawn_blocking() is fine to use with CPU-bound or true blocking code, but it’s not a good idea to use it with block_on() because the advantages of async are then lost and there is a possibility of deadlock.
But Rust’s stdlib is sync-only.
That’s technically true, but Rust’s stdlib is intentionally minimal. Important parts of functionality associated with Rust are delegated to external crates, with great success. Many of these external crates now require async, or even a specific executor like tokio. So while the standard library is async-free, you cannot ignore async while programming in Rust.
Verdict: technically true but not useful in a language with a minimalistic standard library.
Dealing with a two-colored world
- Accept that sync and async are two separate worlds, and don’t try to hide it. In particular, don’t write “sync” interfaces that use block_on() to hide async ones, or the other way around with spawn_blocking(). If you absolutely must hide the async interfaces behind sync ones, then do so immediately at the entry point, document that you’re doing so, and provide a public interface to the underlying native call.
- Respecting the above, use spawn_blocking() in application-level code on the boundaries between the two worlds.
- In more complex scenarios, create clear and documented boundaries between the two worlds and use channels to communicate between them. This technique is already used for both multi-threaded and async code, so it should come as no surprise to future maintainers. Ideally you’d use channels that provide both a sync and an async interface, but if those are not available, use async channels with block_on() on the sync side.