Tag Archives: regex

Contention on multi-threaded regex matching

Let’s say you need to match the same regex across a large number of strings – perhaps you’re applying a grep-like filter to data generated or received by your program. This toy example demonstrates it by matching a regex against half a billion strings:

use regex::Regex;

lazy_static! {
    static ref IS_INTEGER: Regex = Regex::new("^[0-9]+$").unwrap();
}

fn main() {
    let strings: Vec<&str> = ["foo", "bar", "1234", "1234foo", ""]
        .into_iter()
        .cycle()
        .take(100_000_000)
        .collect();

    let start = Instant::now();
    let n_ints = strings.iter().filter(|s| IS_INTEGER.is_match(s)).count();
    let elapsed = start.elapsed().as_secs_f32();
    println!("{} {}s", n_ints, elapsed);
}

It’s not a scientific benchmark of regex performance, but it does show some interesting and unexpected effects also observed in real-world code. For starters, it takes 2.0s to execute the matches on my laptop.

This is good performance, but it’s a shame to do everything in one thread – let’s try to speed it up by using all cores. This is the kind of thing Rayon makes really easy, just change iter() to par_iter():

use rayon::prelude::*;
let n_ints = strings.par_iter().filter(|s| IS_INTEGER.is_match(s)).count();

Surprisingly, this takes 6-8s to execute on the system with 4 physical cores. In other words, instead of being 3-4x faster due to running on 4 cores, it’s 3-4 times slower. A very similar slowdown occurs on the system with 32 physical cores, where the time grows from 2.05s to 8.2s.

This result can’t be chalked up to an inefficiency in Rayon, as the same slowdown is observed when dividing the work among threads in other ways. Is it possible that matching a compiled regex in multiple threads causes contention when accessed from multiple threads?

When this was first suggested in discussions with coworkers, it seemed quite unlikely, as contention would imply that the compiled regex held a lock or other form of synchronization. This runs counter to the idea of a compiled regex, which one would expect to be fully constructed during compilation. Compiled regexes are often seen in lazy_statics, and shared by the whole program. But no matter how unlikely, the possibility of contention is easy to test, simply by switching from lazy_static! to thread_local!:

thread_local! {
    static IS_INTEGER: Regex = Regex::new("^[0-9]+$").unwrap();
}

The match now needs an additional closure to access the thread-local, but is still quite readable:

use rayon::prelude::*;
let n_ints = strings
    .par_iter()
    .filter(|s| IS_INTEGER.with(|is_integer| is_integer.is_match(s)))
    .count();

Continuing the surprise, this takes 0.66s to run, which is 3x faster than the single-threaded version – the kind of speedup one might realistically expect from a 4-core computer. On the 32-core server, it takes 0.086s, a 24x speedup.

So, regex matching does have a contention issue. The compiled regex type, Regex, wraps the internal Exec type, which holds ProgramCache values organized in a Pool that stores them inside a mutex. Accessed from a single thread or from multiple threads at different times, this mutex is cheap to acquire and is held for a very short time. But under strong contention it becomes a bottle neck with attempts to acquire it falling back to OS-level waits, causing performance to degrade.

The file PERFORMANCE.md dedicates a whole section to this issue. The text is nuanced, so instead of quoting its parts, I encourage you to go ahead and read it. It warns of the performance impact and shows how to eliminate it (using a slightly different approach than taken above), but it also says that it’s “supported and encouraged” to define regexes using lazy_static! and use them from multiple threads. It explains that, despite expectations, a compiled regex doesn’t contain everything needed to match it – some of the compiled state is built lazily while executing particular kinds of search, and is later reused for searches of the same kind. The mutex protects the “scratch space” used for those updates. Design options and tradeoffs are discussed in this issue in more detail.

In summary, for most use cases it’s perfectly fine to use the same compiled regex from multiple threads. But if you have code that does heavy regex matching from multiple threads, and does most or all of it on one regex, you’ll almost certainly want to give different instances of the compiled regex to different threads. As always, be sure to measure performance before actually making such a change to the code.