Rust global variables, two years on

In November 2021 I wrote a blog post that examined Rust’s curious relationship with global variables. It aimed to explain why this ubiquitous language feature required external crates, and ended with personal recommendations on the use of globals in new code. Two years have passed, and Rust has changed enough that it’s time to take a fresh look. The rest of this text assumes you’ve read the previous article or are familiar with the subject.

Const Mutex and RwLock constructors

The first change is that Mutex::new() is const as of Rust 1.63, so this example from the previous post now compiles and works as expected:

use std::sync::Mutex;

// didn't compile two years ago, compiles now
static LOG_FILE: Mutex<String> = Mutex::new(String::new());

The groundwork for this improvement was laid in Rust 1.62, which replaced Mutex, RwLock, and Condvar with lightweight, non-allocating implementations on Linux. Rust 1.63 then made those types const-constructible on all platforms. As a result, mutex-protected globals of simple types “just work” without any special setup.
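Here is a minimal sketch of what this enables, using only std. The names `EVENTS`, `VERBOSITY`, `record`, `set_verbosity` and `verbosity` are hypothetical and do not come from the previous post. Both a `Mutex` and an `RwLock` are declared directly as statics and shared between threads, with no wrapper crate:

```rust
use std::sync::{Mutex, RwLock};
use std::thread;

// Hypothetical globals. Mutex::new, RwLock::new and Vec::new are all const fns,
// so these statics need no OnceCell or lazy_static wrapper.
static EVENTS: Mutex<Vec<String>> = Mutex::new(Vec::new());
static VERBOSITY: RwLock<u8> = RwLock::new(1);

fn record(event: &str) {
    // Lock, append, and release the lock when the temporary guard is dropped.
    EVENTS.lock().unwrap().push(event.to_string());
}

fn set_verbosity(level: u8) {
    // Exclusive write access through the RwLock.
    *VERBOSITY.write().unwrap() = level;
}

fn verbosity() -> u8 {
    // Shared read access; many readers may hold this at once.
    *VERBOSITY.read().unwrap()
}

fn main() {
    set_verbosity(3);
    let handles: Vec<_> = (0..4)
        .map(|i| thread::spawn(move || record(&format!("thread {i}"))))
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    println!("{} events, verbosity {}", EVENTS.lock().unwrap().len(), verbosity());
}
```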

Although we no longer have to encase every static Mutex in a OnceCell or equivalent, we still need a cell-like wrapper for scenarios where locked writing is only done on first use to initialize the value. In that case subsequent accesses to the global are read-only and shouldn’t require locking, only an atomic check. This is a very common use of global variables, a good example being a global holding a lazily compiled regex.
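To make that cost concrete, here is a sketch of initialize-on-first-use built from nothing but a static `Mutex<Option<T>>`. The `settings_path` function and its default value are hypothetical. The approach works, but every read takes the lock, even long after initialization. It also cannot hand out a `&'static` reference, because the borrow is tied to the lock guard, so each caller gets a clone:

```rust
use std::sync::Mutex;

// Hypothetical global that is filled in on first use.
static SETTINGS_PATH: Mutex<Option<String>> = Mutex::new(None);

fn settings_path() -> String {
    // Every call acquires the lock, even after the value has been initialized.
    let mut guard = SETTINGS_PATH.lock().unwrap();
    guard
        .get_or_insert_with(|| {
            // Stand-in for an expensive computation that should run only once.
            String::from("/etc/myapp/settings.toml")
        })
        // Clone out, because the reference cannot outlive the guard.
        .clone()
}

fn main() {
    println!("{}", settings_path());
    println!("{}", settings_path());
}
```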

This brings us to the next and more important news.

Once cell is now in std

Since Rust 1.70, once_cell::sync::OnceCell from the once_cell crate has been part of the standard library as std::sync::OnceLock. For the first time in Rust’s existence, you don’t need to write unsafe code, or bring in external crates that encapsulate it, to create a global/static variable initialized on first use. Usage is essentially the same as with once_cell:

use std::sync::OnceLock;
use regex::Regex;

pub fn log_file_regex() -> &'static Regex {
    // The static is reachable only through this function, so every access
    // goes through get_or_init(), which compiles the regex on first use.
    static LOG_FILE_REGEX: OnceLock<Regex> = OnceLock::new();
    LOG_FILE_REGEX.get_or_init(|| Regex::new(r#"^\d+-[[:xdigit:]]{8}$"#).unwrap())
}

// use log_file_regex().is_match(some_name) anywhere in your program
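If you want to see the "initialized once" guarantee without pulling in regex, here is a std-only sketch. The `primes` function and the `INIT_CALLS` counter are hypothetical. Several threads race to call the accessor, and the counter records how many times the initializer actually ran. `OnceLock::get_or_init` guarantees that only one initializer executes; concurrent callers block until the value is ready:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;
use std::thread;

// Counts how many times the initializer closure actually runs.
static INIT_CALLS: AtomicUsize = AtomicUsize::new(0);

fn primes() -> &'static Vec<u32> {
    static PRIMES: OnceLock<Vec<u32>> = OnceLock::new();
    PRIMES.get_or_init(|| {
        INIT_CALLS.fetch_add(1, Ordering::SeqCst);
        // Stand-in for expensive setup: naive trial division for primes below 100.
        (2..100u32)
            .filter(|n| (2..*n).all(|d| n % d != 0))
            .collect()
    })
}

fn main() {
    let handles: Vec<_> = (0..8).map(|_| thread::spawn(|| primes().len())).collect();
    for h in handles {
        // There are 25 primes below 100.
        assert_eq!(h.join().unwrap(), 25);
    }
    println!("initializer ran {} time(s)", INIT_CALLS.load(Ordering::SeqCst));
}
```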

This addition might not seem like a big deal at first, given that once_cell has provided the same functionality for years. However, having it in the standard library benefits the language in several ways. First, initialize-on-first-use globals are very widely used by both applications and libraries, and both can now phase out crates like once_cell and lazy_static from their dependencies. Second, macro-generated code can now create global variables without awkward reexports of once_cell and similar logistical issues. Third, it makes the language easier to teach: teaching materials no longer need to decide whether to cover once_cell or lazy_static, nor explain why external crates are needed for global variables to begin with. This excruciatingly long StackOverflow answer is a good example of the quagmire, as is my previous blog post on this topic. The whole stdlib/unsafe section of the latter is now obsolete, as the same can be achieved safely with OnceLock at no loss of performance.

The work is not yet complete, however. Note how the static variable is placed inside the function that contains the sole call to OnceLock::get_or_init(). This pattern ensures that every access to the static OnceLock goes through one place which also initializes it. once_cell makes this less verbose through once_cell::sync::Lazy, but the equivalent stdlib type is not yet stable, being stuck on some technical issues. The workaround of placing the global into a function isn’t a significant obstacle, but it’s worth mentioning. It’s particularly relevant when comparing the ease of use of OnceLock with that of lazy_static::lazy_static! or once_cell::sync::Lazy, both of which offer the convenience of initializing in a single location without additional effort.
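The next sketch shows why the function wrapper matters. It assumes a hypothetical module-level static named `GREETING`. If the `OnceLock` is visible to the whole module, code that reads it through `get()` before anyone has called the initializing accessor just sees `None`. Nothing stops another call site from passing a different closure to `get_or_init()` either. Keeping the static inside the accessor, as in the regex example, rules out both problems. `Lazy` avoids them by storing the initializer alongside the value:

```rust
use std::sync::OnceLock;

// Hypothetical module-level global: any code in this module can touch it directly.
static GREETING: OnceLock<String> = OnceLock::new();

// The accessor is the single place that knows how to initialize GREETING.
fn greeting() -> &'static str {
    GREETING.get_or_init(|| String::from("hello"))
}

fn main() {
    // Reading the static directly does not trigger initialization;
    // before the accessor has run, there is simply no value yet.
    assert!(GREETING.get().is_none());
    println!("{}", greeting());
    // Once the accessor has run, direct reads see the value.
    assert_eq!(GREETING.get().map(String::as_str), Some("hello"));
}
```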

What to use in 2024

Two years ago the TL;DR of my recommendation was to “use once_cell or lazy_static, depending on which syntax you prefer”. Now it shifts to: use standard-library facilities like OnceLock or atomics in almost all situations, and once_cell when you require convenience not yet covered by std.

In particular:

  • As before, when the type you want to use in a static supports thread-safe interior mutability and has a const constructor, you can declare it as static directly. (The compiler will check all that for you; just see if it compiles.) This used to only include atomics, but now also includes mutexes and rwlocks. So if something like static CURRENT_CONFIG: Mutex<Option<Config>> = Mutex::new(None) or static SHOULD_LOG: AtomicBool = AtomicBool::new(true) works for you, go for it.

  • When this doesn’t work, or you need to initialize on first use, use std::sync::OnceLock, preferably encapsulated in a function as shown above.

  • If you create a large number of globals and want to avoid the boilerplate encapsulating each in a function, use once_cell::sync::Lazy. That type is likely to be stabilized in some form, which makes it preferable over lazy_static. There are no good reasons to use lazy_static in new code.
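The following sketch combines the first two recommendations in one std-only module. `Config`, `unit_multiplier` and the unit table are hypothetical. The lazy case uses a `HashMap` because `HashMap::new()` is not const (its default hasher is seeded at run time), so a plain static of it won't compile and `OnceLock` is the fallback:

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Mutex, OnceLock};

// Hypothetical configuration type standing in for whatever your program uses.
#[derive(Debug, PartialEq)]
struct Config {
    verbose: bool,
}

// Const-constructible, thread-safe types: declare them as statics directly.
static SHOULD_LOG: AtomicBool = AtomicBool::new(true);
static CURRENT_CONFIG: Mutex<Option<Config>> = Mutex::new(None);

// Not const-constructible: use OnceLock, encapsulated in an accessor function.
fn unit_multiplier(unit: &str) -> Option<u64> {
    static UNITS: OnceLock<HashMap<&'static str, u64>> = OnceLock::new();
    let units = UNITS.get_or_init(|| {
        HashMap::from([("k", 1_000), ("M", 1_000_000), ("G", 1_000_000_000)])
    });
    units.get(unit).copied()
}

fn main() {
    SHOULD_LOG.store(false, Ordering::Relaxed);
    *CURRENT_CONFIG.lock().unwrap() = Some(Config { verbose: true });

    println!("should log: {}", SHOULD_LOG.load(Ordering::Relaxed));
    println!("config: {:?}", CURRENT_CONFIG.lock().unwrap());
    println!("M = {:?}", unit_multiplier("M"));
}
```

For many such globals, once_cell::sync::Lazy would remove the per-global accessor function, at the cost of an external dependency.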

Note that existing code that uses once_cell or lazy_static doesn’t require immediate attention. Those crates will remain available indefinitely, and they generate nearly identical assembly to that of the standard library’s OnceLock. The above recommendations are meant to guide your decisions regarding new code, or regarding code you’re refactoring anyway.
