As a learning exercise I set out to implement a simple lock-free algorithm in Rust. It was inspired by a problem posed at job interviews at a company where a friend works. The problem is simple enough that it can be tackled by a beginner, but tricky enough to require some thought to get right – and Rust presents several new challenges compared to the original Java.
This series of articles presents the evolution of a simple Rust lock-free container, starting from single-threaded, progressing to a multi-threaded variant with locks, and finally settling on a lock-free implementation, discussing the trade-offs at each step. A basic understanding of Rust and of multi-threaded programming is assumed, but the articles might be useful to adventurous beginners at both. Do note, however, that the author is not an expert at lock-free programming, so there might be errors – if you find some, please do leave a comment.
The Exercise
Implement a class providing access to a cached value. In more detail:

Write a `LazyTransform` class that stores a single value, allowing it to be updated as needed. The value is potentially expensive to compute, so the setter method, `set_source`, receives a “source” that will be used to compute the final value using a transformation function received in the `LazyTransform` constructor. Transformation must not be attempted until requested by `get_transformed`. Once generated, the value is cached and returned by further invocations of `get_transformed`, until invalidated by a new call to `set_source`.
A single-threaded version can be summarized with the following Python:
```python
class LazyTransform:
    def __init__(self, transform_fn):
        self.transform_fn = transform_fn
        self.source = None
        self.value = None

    def set_source(self, new_source):
        self.source = new_source

    def get_transformed(self):
        if self.source is not None:
            newval = self.transform_fn(self.source)
            if newval is not None:
                self.value = newval
                self.source = None
        return self.value
```
The class must support being called from multiple threads, with the following semantics:

- `set_source` and `get_transformed` can and will be called on the same `LazyTransform` instance in parallel;
- Once `set_source` completes, future invocations of `get_transformed` must eventually start returning the new value.
- A read-heavy usage pattern is expected, so `get_transformed` must not block regardless of how many times `set_source` or `get_transformed` are called in other threads. The one exception is when a new source is detected – it is allowed for `get_transformed` to block until the transformation finishes before returning the transformed value (and caching it for future calls).
- The code must be lock-free: neither `set_source` nor `get_transformed` should get stuck waiting on each other, even if they are called in quick succession or in parallel by many threads, or both.
Rust API tradeoffs
Before proceeding to parallelization, let’s review how the above interface would map to Rust’s type system. Ideally we’d want to place as few restrictions as possible on the types of the source and the final value; either could be as simple as a single `u32` or as big as a huge heap-allocated object. We know, for example, that both the source and the value type must be `Send`, because they need to be accessed from threads other than those that create them.
Another necessary restriction is that the final value type must be `Clone`. Why? Consider how the concept of “returning the cached value” – the `return self.value` line in the Python above – maps to Rust. In Python the semantics are clear because all of its objects are heap-allocated, and you always get the same shared instance. The same is specified by the original Java exercise, which returns an `Object`. But a correct Rust implementation needs to deal with an actual value stored in the cache, and has three options for returning it:
- move the object out of the container, typically by making the `value` field an `Option` and returning `self.value.take()`;
- return a reference to the object, `return &self.value`;
- clone the object and return the cloned value to the caller.
The first option obviously doesn’t work because it would prevent `get_transformed` from returning the cached value more than once. The second option looks feasible until one considers that the returned reference cannot be allowed to outlive the stored value. Since the stored value can be invalidated by a call to `set_source`, which can happen literally at any time, it is clear that allowing a reference to be returned would be unsound. Indeed, all such attempts are promptly rejected by the borrow checker.
Although cloning at first appears inefficient for arbitrary objects, it actually provides the greatest flexibility for the user. Light values, such as numeric IDs that are `Copy` (and hence also `Clone`), or small strings that are cheap to clone, can be placed in the cache as-is. Heavy values, on the other hand, can be dynamically allocated and accessed as `Arc<ActualData>`, ensuring that their `clone` only increments a reference count, providing the semantics one would expect from equivalent Python or Java. If needed, one can even combine the two and store a tuple of a light object and a heavy one.
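For illustration, here is a minimal sketch of the `Arc` approach – the `Report` type and `make_report` function are hypothetical stand-ins for a heavy value:

```rust
use std::sync::Arc;

// Hypothetical heavy value; cloning the Arc only bumps a reference
// count and never copies the underlying data.
struct Report {
    lines: Vec<String>,
}

fn make_report(raw: &str) -> Arc<Report> {
    Arc::new(Report {
        lines: raw.lines().map(String::from).collect(),
    })
}

fn main() {
    let report = make_report("a\nb\nc");
    let cached = Arc::clone(&report); // cheap: just a refcount increment
    assert_eq!(cached.lines.len(), 3);
}
```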
So, `LazyTransform` needs to be generic on the value type (`T`) and the source type (`S`). But let’s not forget the transformation function received by the constructor. Fixing its type to `fn(S) -> T` would limit it to stateless functions, and we would like the user to be able to provide an arbitrary closure for the transformation. One option would be to accept a generic function object in the constructor and box it in a `Box<Fn(S) -> T>`, but that would impose a dynamic allocation on each `LazyTransform` instance, as well as an indirection when invoking the function. If the transformation function is known at compile time and carries no state, it should incur neither storage nor run-time indirection overhead. This is easily achieved by adding a third type parameter for the type of the transformation function.
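For contrast, here is a sketch of the boxed alternative argued against above; the `BoxedLazyTransform` name is made up for illustration:

```rust
// Every instance pays for a heap allocation of the closure, and every
// call to transform_fn goes through dynamic dispatch.
pub struct BoxedLazyTransform<T, S> {
    transform_fn: Box<dyn Fn(S) -> Option<T>>,
    source: Option<S>,
    value: Option<T>,
}
```

With a generic `FN` parameter the concrete closure type is known at compile time, so the call can be inlined and a capture-less function occupies no space at all.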
As a demonstration of the API, here is a single-threaded implementation of the container:
```rust
pub struct LazyTransform<T, S, FN> {
    transform_fn: FN,
    source: Option<S>,
    value: Option<T>,
}

impl<T: Clone, S, FN: Fn(S) -> Option<T>> LazyTransform<T, S, FN> {
    pub fn new(transform_fn: FN) -> LazyTransform<T, S, FN> {
        LazyTransform {
            transform_fn: transform_fn,
            source: None,
            value: None,
        }
    }

    pub fn set_source(&mut self, source: S) {
        self.source = Some(source);
    }

    pub fn get_transformed(&mut self) -> Option<T> {
        if let Some(source) = self.source.take() {
            let newval = (self.transform_fn)(source);
            if newval.is_some() {
                self.value = newval;
            }
        }
        self.value.clone()
    }
}
```
In spirit this is exactly the same thing as the original Python, except sprinkled with a healthy dose of static typing.
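As a quick usage sketch (the specific numbers are arbitrary), note that a capturing closure is accepted precisely because the function type is a generic parameter rather than a `fn` pointer:

```rust
fn main() {
    let factor = 3u64;
    // A capturing closure: its anonymous type is inferred for FN.
    let mut lt = LazyTransform::new(move |x: u64| Some(x * factor));

    assert_eq!(lt.get_transformed(), None); // no source yet
    lt.set_source(14);
    assert_eq!(lt.get_transformed(), Some(42)); // transformed on demand
    assert_eq!(lt.get_transformed(), Some(42)); // cached thereafter
}
```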
Compile-time thread safety
What happens if we try to share an instance of `LazyTransform` among threads? Rust will prevent that at compile time – invoking a `&mut` method from multiple threads would require creating multiple `&mut` references to the same object, which the borrow checker prevents. For example, the following doesn’t compile:
```rust
fn main() {
    let mut lt = LazyTransform::new(|x: u64| Some(x + 1));
    std::thread::spawn(move || {           // lt moved here
        for i in 0..10_000 {
            lt.set_source(i);
        }
    });
    while lt.get_transformed().is_none() { // lt used after move
    }
    let val = lt.get_transformed().unwrap();
    assert!(val >= 1 && val <= 10_000);
}
```
`lt` gets moved into the closure executed by the new thread, but then it is no longer available for use by the main thread. Sending it by reference wouldn’t work because there can exist only one `&mut` reference to an object, so we wouldn’t be allowed to send the same reference to multiple threads. Allocating `LazyTransform` dynamically and using `Arc` to share it among threads wouldn’t help either, because `Arc` only provides shared access to the data it owns.
In Rust, supporting parallel access to a container requires a change not only in implementation, but also in method signatures. This is an intentional design decision: while single-threaded Python or Java code will happily execute when called from multiple threads, providing incorrect results, the Rust version will refuse to compile when the thread-unsafe `LazyTransform` object is accessed from two threads.
The compiler uses simple rules to decide whether it is safe to share an object between threads:
- Methods invoked from more than one thread must accept `&self` rather than `&mut self`. This rule is enforced by the borrow checker for single-threaded code as well.
- The object must not contain values of types specifically blacklisted for multi-threaded access even through shared references. In Rust terms, its type must “be `Sync`”, meaning it implements the `Sync` marker trait, which most objects do. Examples of non-`Sync` types are `Rc` and `Cell`, both of which have thread-safe equivalents, as sketched below.
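Here is a minimal sketch of that blacklist in action; `Rc` is neither `Send` nor `Sync`, while `Arc` is its thread-safe counterpart:

```rust
use std::rc::Rc;
use std::sync::Arc;
use std::thread;

fn main() {
    let rc = Rc::new(1);
    // Fails to compile: `Rc<i32>` cannot be sent between threads safely
    // (it is neither Send nor Sync):
    // thread::spawn(move || println!("{}", rc));
    drop(rc);

    // Arc, the atomically reference-counted equivalent, is accepted:
    let arc = Arc::new(1);
    thread::spawn(move || println!("{}", arc)).join().unwrap();
}
```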
At a glance, the first rule seems to rule out `LazyTransform` as a multi-threaded type. Both its public methods clearly modify the object, with `set_source` even doing so in a way that is observable from the outside. Changing the signatures to accept `&self` instead of `&mut self` fails to compile because both methods modify the data behind the `&self` shared reference, which is prohibited. Accessing an object inside `&self` will also only produce further shared references that are read-only.
To modify the data, we must find a way to obtain an exclusive mutable reference from the shared reference to `self`. This is not allowed for ordinary objects, because the compiler would have no way to ensure that writes are exclusive, i.e. that while a thread holds a mutable reference to a value, no other thread can read it or write to it. However, if we could statically convince Rust that the reference’s ownership of the data will be exclusive, it would be within the rules to allow the conversion. This is where mutexes come in.
Mutexes
Rust’s `Mutex` type provides read-write access to the value it protects, using the appropriate operating-system primitive to ensure that this can be done by only one thread at a time. Here is an implementation of `LazyTransform` updated to use a mutex:
```rust
use std::sync::Mutex;

struct LazyState<T, S> {
    source: Option<S>,
    value: Option<T>,
}

pub struct LazyTransform<T, S, FN> {
    transform_fn: FN,
    state: Mutex<LazyState<T, S>>,
}

impl<T: Clone, S, FN: Fn(S) -> Option<T>> LazyTransform<T, S, FN> {
    pub fn new(transform_fn: FN) -> LazyTransform<T, S, FN> {
        LazyTransform {
            transform_fn: transform_fn,
            state: Mutex::new(LazyState { source: None, value: None }),
        }
    }

    pub fn set_source(&self, source: S) {
        let mut state = self.state.lock().unwrap();
        state.source = Some(source);
    }

    pub fn get_transformed(&self) -> Option<T> {
        let mut state = self.state.lock().unwrap();
        if let Some(new_source) = state.source.take() {
            let new_value = (self.transform_fn)(new_source);
            if new_value.is_some() {
                state.value = new_value;
            }
        }
        state.value.clone()
    }
}
```
Both methods now operate on `&self`, relying on the mutex to obtain write access to the data in `self.state`. As far as method signatures are concerned, this is the final version of the API – all future versions will differ only in implementation.
The storage is now split into `transform_fn`, which is itself immutable and can be invoked through a shared reference, and `state`, the mutable part of the object’s state, moved into a separate struct and enclosed in a mutex. As can be seen here, Rust’s `Mutex` is a container that holds and owns the data it protects. While that coupling looks strange at first, it is exactly what enables the mutex to safely grant read-write access to the data it owns.
Calling `Mutex::lock()` waits until an exclusive OS lock is acquired, then returns a “guard” object, which both `LazyTransform` methods store in a local variable called `state`. The mutex will not be unlocked until the guard goes out of scope. Therefore, the existence of a live guard is proof that the mutex is locked, and it is the guard that provides read-write access to the underlying data.
In Rust’s twist on mutex semantics, locking a mutex means obtaining temporary exclusive write access to its data through a temporary guard object. Despite `self` being a shared reference, a successful `self.state.lock()` grants access to a `&mut LazyState` that lasts exactly as long as the mutex is locked (i.e. as long as the guard exists) and no longer. This is the crux of how Rust prevents data races through static analysis.
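To see the shared-reference API in action, here is a sketch of the earlier non-compiling example, now built against the mutex-based version and shared through an `Arc`:

```rust
use std::sync::Arc;
use std::thread;

fn main() {
    let lt = Arc::new(LazyTransform::new(|x: u64| Some(x + 1)));
    let writer = Arc::clone(&lt);
    thread::spawn(move || {
        for i in 0..10_000 {
            writer.set_source(i); // &self suffices now
        }
    });
    // The main thread reads in parallel through its own Arc clone.
    while lt.get_transformed().is_none() {}
    let val = lt.get_transformed().unwrap();
    assert!(val >= 1 && val <= 10_000);
}
```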
Other than the curious mutex design, there is nothing particularly interesting about the code itself. Once the mutex is locked, both functions do exactly what their single-threaded counterparts did. While this code is thread-safe in the sense Rust promises, i.e. free from data races, it is still very far from being efficient when invoked in parallel, even ignoring the stringent lock-free requirements. In particular, `get_transformed` is extremely inefficient in a read-heavy scenario because each call blocks all other calls, even when `set_source` isn’t being called at all. And while a transformation is in progress, all other readers are blocked until it finishes.
Fine-grained locking
To minimize the amount of time spent waiting, we can take advantage of the following facts:
- The methods operate on two distinct pieces of data, `source` and `value`. `set_source`, for example, doesn’t access `value` at all. The two fields can be protected by different locks.
- `get_transformed` has two distinct modes of operation: a fast one, in which it only returns the cached value, and a slow one, in which it detects that the source has changed and computes the new value. The vast majority of calls to `get_transformed` can be expected to belong to the fast scenario.
Here is an implementation that uses finer-grained locking to ensure that readers don’t wait for either writers or other readers:
```rust
use std::sync::{Mutex, RwLock};

pub struct LazyTransform<T, S, FN> {
    transform_fn: FN,
    source: Mutex<Option<S>>,
    value: RwLock<Option<T>>,
    transform_lock: Mutex<()>,
}

impl<T: Clone, S, FN: Fn(S) -> Option<T>> LazyTransform<T, S, FN> {
    pub fn new(transform_fn: FN) -> LazyTransform<T, S, FN> {
        LazyTransform {
            transform_fn: transform_fn,
            source: Mutex::new(None),
            value: RwLock::new(None),
            transform_lock: Mutex::new(()),
        }
    }

    pub fn set_source(&self, source: S) {
        let mut locked_source = self.source.lock().unwrap();
        *locked_source = Some(source);
    }

    pub fn get_transformed(&self) -> Option<T> {
        if let Ok(_) = self.transform_lock.try_lock() {
            let mut new_source = None;
            if let Ok(mut locked_source) = self.source.try_lock() {
                new_source = locked_source.take();
            }
            if let Some(new_source) = new_source {
                let new_value = (self.transform_fn)(new_source);
                if new_value.is_some() {
                    *self.value.write().unwrap() = new_value.clone();
                    return new_value;
                }
            }
        }
        self.value.read().unwrap().clone()
    }
}
```
In this implementation there is no longer a single “state” structure protected by a coarse mutex; we are back to individual fields. The `source` field is protected by its own mutex, and the `value` field is protected by a separate `RwLock`, which is like a mutex except that it allows read access by multiple concurrent readers who don’t block each other. Finally, the new `transform_lock` field doesn’t protect any particular piece of data; it serves as something resembling a conventional mutex.
`set_source` locks the source mutex and replaces the source with the new value. It assigns to `*locked_source` because `locked_source` is just the variable holding the guard, and assigning an `Option<S>` to it would be a type error. Since the guard provides automatic access to `&mut Option<S>`, `*locked_source` on the left-hand side of the assignment serves both to coerce the guard to `&mut Option<S>` (through the guard’s implementation of `DerefMut::deref_mut`) and, at the same time, to dereference it, so that the value behind the reference is replaced with the new one.
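In isolation, the guard’s deref behavior looks like this (a standalone sketch, independent of the container):

```rust
use std::sync::Mutex;

fn main() {
    let source: Mutex<Option<u32>> = Mutex::new(None);
    let mut locked_source = source.lock().unwrap();
    // DerefMut lets the guard stand in for &mut Option<u32>:
    *locked_source = Some(1);
    // Method calls auto-deref through the guard as well:
    assert_eq!(locked_source.take(), Some(1));
}
```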
`get_transformed` is more sophisticated. It first ensures that only a single call attempts to interact with the writer at a time. This is done for two reasons: first, to avoid `set_source` being “attacked” by a potentially large number of readers in a read-heavy scenario; second, to prevent more than one transformation from running in parallel, which would require the result of one expensive transformation to be thrown away. The synchronization is implemented using `try_lock`, which returns immediately if the lock cannot be obtained. In case of failure to lock, `get_transformed` gives up and returns the cached value, which meets its requirements. If it acquires `transform_lock`, it proceeds to check whether a new source is available, again with a `try_lock` and a fallback to returning the cached value. This ensures that `get_transformed` stays out of the way of `set_source` as much as possible. If it acquires the `source` lock, it uses `Option::take()` to grab the new source, leaving `None` in its place. If the captured source is not `None`, meaning a new source was published since the last check, `get_transformed` performs the transformation, caches its result, and returns a copy.
`get_transformed` uses an `RwLock` to ensure that readers don’t wait for each other, while the update remains exclusive. `RwLock` maps nicely onto Rust’s ownership system: `RwLock::read` returns a guard that provides a shared reference to the underlying data, and `RwLock::write` returns a guard that provides a mutable reference.
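A standalone sketch of the two guard flavors:

```rust
use std::sync::RwLock;

fn main() {
    let value = RwLock::new(Some(42));
    {
        // Multiple read guards may coexist; each derefs to &Option<i32>.
        let r1 = value.read().unwrap();
        let r2 = value.read().unwrap();
        assert_eq!(*r1, *r2);
    } // read guards dropped here, releasing the lock
    // A write guard is exclusive and derefs to &mut Option<i32>.
    *value.write().unwrap() = Some(43);
    assert_eq!(*value.read().unwrap(), Some(43));
}
```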
This implementation is about as good as it can get with the use of locks. The problem statement, however, requires that `get_transformed` and `set_source` never block each other, regardless of how often they are invoked. The above implementation attempts an exclusive lock of `source` just to check whether a new source has appeared, and while that lock is held, `set_source` is blocked. In a read-heavy scenario, `get_transformed` will be called often and from many different threads, and it is easy to imagine it hogging the lock enough to slow down `set_source`, which must wait to acquire the lock (it cannot use `try_lock`) in order to do its job.
Changing this requires looking outside the constructs offered by safe Rust, as discussed in the next post.
Comments

Isn’t there an error in the Python code? It seems to recompute the value every time instead of doing it only when `self.source` changed.

Can you elaborate? The code only recomputes if there is a new source (`self.source is not None`). Otherwise, the cached value (`self.value`) is returned.

Actually, you can return `&T`. Because the lifetime of the returned value is implicitly tied to the lifetime of the method’s `&mut self` borrow, it will count as holding a borrow of the parent value for as long as the return value lives. Therefore, no other mutating function may be called while the return value is in scope, and the reference is guaranteed to remain valid. The caller can still choose to clone the value to immediately drop the borrow if `T: Clone`, but it’s not required anymore! Also, a small optimization: since you check `val.is_some()`, the value returned is never `None`, so the return value does not need to be an `Option`. That’s a relatively minor nitpick, however. Example: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=501e7ab4f6aeded2b69ec177b9542ff6

If you are referring to `get_transformed()`, yes, a `get_transformed()` that accepts `&mut self` can indeed return `&T`. But such a design wouldn’t allow `set_source()` to be invoked from a different thread, thus defeating the point of the exercise. The idea is to create a sound and lock-free API where both `set_source` and `get_transformed` accept a shared `&self` reference and are callable from different threads.

Is the last piece of Rust code really still equivalent to the Python code? As far as I can tell, the Python code always returned a transformed value if a source was set before, while the optimized Rust code may return `None` if another thread has already acquired the `transform_lock` to transform a new source but hasn’t completed its job yet…