You read about faster hash functions and switch to one. Most of your code gets the expected speed boost, but some parts mysteriously get slower – much slower, especially when dealing with large hashmaps. If this sounds familiar, you might have encountered the stable HashMap trap.
Background
Rust’s HashMap is an implementation of Google’s SwissTable. While the table itself provides excellent performance, the SipHash hash function used by default is designed to protect against a large class of DoS attacks, at the price of some performance. When you care about performance and maliciously crafted keys are not an issue, switching to a more efficient hash function can boost performance by a significant amount. Rust makes it fairly easy to swap out the hash function used by its hash maps, and there are crates that offer high-performance hash functions. The Rust Performance Book recommends, among others, the brutally effective FxHasher from the rustc-hash crate and the higher-quality ahash crate. This recommendation, however, comes at a price that is rarely if ever mentioned – some O(n) hash map operations, including deserialization, degrade to O(n²)!
The issue
Since the issue is easy to stumble upon when deserializing HashMaps, let’s use that as an example. We’ll construct a HashSet for simplicity, but a HashMap would exhibit the same behavior. This code creates a set of 10 million random values, serializes it to a Vec<u8> using bincode, and deserializes it back into a HashSet:
use std::time::Instant;
use rand::Rng;

type TestHashSet<K> = std::collections::HashSet<K>;
//type TestHashSet<K> = rustc_hash::FxHashSet<K>;

fn main() {
    let mut rnd = rand::thread_rng();
    let nums: Vec<_> = (0..10_000_000).map(|_| rnd.gen::<u64>()).collect();

    let t0 = Instant::now();
    let h: TestHashSet<u64> = nums.into_iter().collect();
    let t1 = Instant::now();
    println!("create: {}", (t1 - t0).as_secs_f64());

    let out = bincode::serialize(&h).unwrap();
    let t2 = Instant::now();
    println!("serialize: {}", (t2 - t1).as_secs_f64());

    let h2: TestHashSet<u64> = bincode::deserialize(&out).unwrap();
    let t3 = Instant::now();
    println!("deserialize: {}", (t3 - t2).as_secs_f64());

    println!("{}", h2.len());
}
Output on my machine:
create: 1.440201369
serialize: 0.071765342
deserialize: 1.114031976
Now let’s switch to fxhash to speed it up. We only need to point the TestHashSet alias to rustc_hash::FxHashSet<K> and we’re good to go. Note that FxHashSet is itself an alias for std::collections::HashSet whose hasher is FxHasher, so the change affects only the hash function, not the implementation of the table. With that change, I get this output:
create: 1.391839734
serialize: 0.081116361
deserialize: 5.2052695
Creation is slightly faster, serialization takes about the same time, but deserialization is almost 5x slower than with the stdlib hasher! Measured across different set sizes, the deserialization time is close to quadratic – it takes 5s with 10m elements, 20s with 20m elements, 69s with 40m elements, and 219s with 80m elements, roughly quadrupling every time the size doubles. We’ve found the trap.
When does it happen?
First, let’s make it clear that the problem has nothing to do with serde or bincode. Serde just happens to create the set from values obtained by iterating it. The same slowdown can be easily reproduced without serde:
// just as slow as the serde version when TestHashSet is FxHashSet
let mut h2: TestHashSet<u64> = Default::default();
for &n in &h {
    h2.insert(n);
}
Second, this is not a consequence of fxhash being of lower quality or hitting some bad edge case. The same quadratic behavior can be observed with any other stable hash function, including the default SipHash when its randomization is omitted. To try it, define TestHashSet<K> as std::collections::HashSet<K, BuildHasherDefault<std::collections::hash_map::DefaultHasher>> (quite a mouthful), which uses the default hash function with a non-random initial state – you’ll get a similar slowdown.
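For reference, a minimal sketch of that alias, using only std types:

use std::collections::HashSet;
use std::collections::hash_map::DefaultHasher;
use std::hash::BuildHasherDefault;

// DefaultHasher::default() always starts from the same fixed state, so
// this set hashes deterministically – unlike the stdlib default, which
// seeds SipHash through RandomState.
type TestHashSet<K> = HashSet<K, BuildHasherDefault<DefaultHasher>>;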
The third thing to be aware of is that you can’t reproduce the issue with a simple clone() or collect() – expressions like let h2 = h.clone() and let h2: TestHashSet<u64> = h.iter().copied().collect() remain O(n), and are almost 10x faster with fxhash than with the default hasher. The issue also doesn’t appear when you pre-allocate the set using with_capacity(). To reproduce it, you need to start with a HashSet that must resize at least once to accommodate the data, such as an empty one. Both clone() and collect() preallocate the HashSet, which avoids this performance degradation.
This is why serde hits it – although the serialized data stores the size of every container as a “size hint”, serde caps this hint to a low number during deserialization for security reasons, so the set being built starts small and must repeatedly resize as elements arrive.
In summary, to reproduce the quadratic complexity of construction, all of the following conditions must be met:
- the HashMap/HashSet must resize to accommodate the values we’re inserting
- it must use a stable hash function
- it must be built from keys in the iteration order of the same HashMap/HashSet
Why does it happen?
This bug is thoroughly described in this 2016 article, which I encourage you to read for a full understanding. In short, when a larger table is copied into a smaller one, the keys from the first half of the larger table map to consecutive locations in the smaller one. The keys from the second half of the larger table must then squeeze in between keys that are already densely packed, which forms large collision chains. Although this is a temporary situation, fixed on the next resize, it visibly affects performance. But do refer to the article for details.
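To make the mechanism concrete, here is a sketch of the index arithmetic, assuming buckets are chosen by masking the hash with capacity - 1 (consistent with the power-of-two table sizes mentioned in the next paragraph):

fn main() {
    // A key's bucket is its hash masked by (capacity - 1). Suppose the
    // source table has capacity 16 and the destination starts at 8.
    let hash: u64 = 0b1101_0110;
    let src_bucket = hash & (16 - 1); // bucket 6 in the source table
    let dst_bucket = hash & (8 - 1);  // bucket 6 in the destination table
    assert_eq!(src_bucket, dst_bucket);

    // Every key in source buckets 0..8 keeps its bucket index (its high
    // mask bit is 0), so those keys fill the destination densely. Keys
    // from source buckets 8..16 then map onto the same 0..8 range and
    // must probe past the already-packed entries.
}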
The article mentions Robin Hood hashing, which is no longer used by Rust’s hash tables, but don’t let that throw you off. Although the implementation has since switched to a different probing strategy, it still uses open addressing with power-of-two table sizes.
The quadratic behavior was fixed for the default hasher by introducing randomization, but the issue remains open because the underlying bug is still present – it is visible with non-randomized hashers, as the code above demonstrates.
Workarounds
All fixes for this bug boil down to shuffling the items in order to avoid constructing the table in an unfavorable order. Since the actual bug that leads to the issue lies in the implementation of the hash map, I refer to these as workarounds rather than proper fixes.
I do hope this bug gets more visibility and gets fixed, as it hurts people who try to use fxhash to get a faster HashMap, and it’s not immediately obvious what’s going on. In general, a nice property of hash tables is that, unlike unbalanced binary trees or quicksort, they’re normally not sensitive to key order. It’s a bit of a shame that this doesn’t apply to Rust’s hash tables with deterministic hashers.
Pre-allocate the HashSet/HashMap
This is by far the simplest fix when you don’t control the exact type of the map, but do control how it is created. In the non-serde case, that means initializing the h2 set with its capacity given upfront:
// fast because h2 is of the correct size to begin with
// note: can't just call with_capacity() because it assumes RandomState from stdlib
let mut h2: TestHashSet<u64> = TestHashSet::with_capacity_and_hasher(h.len(), Default::default());
for &n in &h {
    h2.insert(n);
}
This can be made to work with serde, but requires writing a custom deserializer.
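One way to do that, sketched below under the assumption that the format stores sets as plain sequences (bincode does): buffer the elements into a Vec, then build the set pre-allocated to the final size. The helper name and the #[serde(deserialize_with = "...")] wiring are mine, not part of serde:

use std::collections::HashSet;
use std::hash::{BuildHasher, Hash};
use serde::{Deserialize, Deserializer};

// Hypothetical helper for #[serde(deserialize_with = "...")]: collect
// the elements into a Vec first, then insert them into a set that is
// pre-allocated to the final size, so it never resizes mid-build.
fn deserialize_prealloc_set<'de, D, K, S>(deserializer: D) -> Result<HashSet<K, S>, D::Error>
where
    D: Deserializer<'de>,
    K: Deserialize<'de> + Eq + Hash,
    S: BuildHasher + Default,
{
    let elems = Vec::<K>::deserialize(deserializer)?;
    let mut set = HashSet::with_capacity_and_hasher(elems.len(), S::default());
    set.extend(elems);
    Ok(set)
}

This trades a temporary Vec allocation for skipping the resize path entirely.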
Shuffle the items
Shuffling the elements also fixes the issue. For example:
// fast because h2 is built of elements which are not in iteration order of h
use rand::seq::SliceRandom; // for shuffle()
let mut elems: Vec<_> = h.iter().copied().collect();
elems.shuffle(&mut rnd);
let mut h2: TestHashSet<u64> = TestHashSet::default();
for n in elems {
    h2.insert(n);
}
This is a good “quick fix” when you don’t control how the table is constructed, but do control the data that flows into it. It is not that easy to apply to the serde case, where it would require a custom serializer.
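If you do want to go that route, here is a sketch of such a serializer (the helper name is hypothetical; the seq API is standard serde), writing the elements in shuffled order so the receiving side never sees the set’s iteration order:

use std::collections::HashSet;
use std::hash::BuildHasher;
use rand::seq::SliceRandom;
use serde::ser::{Serialize, SerializeSeq, Serializer};

// Hypothetical helper for #[serde(serialize_with = "...")]: emit the
// set as a sequence whose element order has been shuffled.
fn serialize_shuffled_set<K, H, S>(set: &HashSet<K, H>, serializer: S) -> Result<S::Ok, S::Error>
where
    K: Serialize,
    H: BuildHasher,
    S: Serializer,
{
    let mut elems: Vec<&K> = set.iter().collect();
    elems.shuffle(&mut rand::thread_rng());
    let mut seq = serializer.serialize_seq(Some(elems.len()))?;
    for e in elems {
        seq.serialize_element(e)?;
    }
    seq.end()
}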
Randomize the hasher
This is the most general solution, and it is how the issue was worked around in the standard library, which uses RandomState to build the default hasher. If you use another hasher, you just need to make sure you’re using a randomizing hasher builder. (This is what AHashMap does by default, so you don’t need to do anything special there.) If your hasher doesn’t provide a random builder, you can create one yourself using the public API. For example, for FxHasher we can write an FxRandomState that implements BuildHasher:
use std::hash::{BuildHasher, Hasher};
use rustc_hash::FxHasher;

#[derive(Copy, Clone, Debug)]
pub struct FxRandomState(usize);

impl BuildHasher for FxRandomState {
    type Hasher = FxHasher;

    fn build_hasher(&self) -> FxHasher {
        // mix the per-table seed into a freshly created hasher
        let mut hasher = FxHasher::default();
        hasher.write_usize(self.0);
        hasher
    }
}
Ideally build_hasher() would just return FxHasher::with_seed(self.0), but with_seed() isn’t provided, so we get equivalent functionality by creating a default hasher and mixing the seed into it. The remaining part is to implement Default for our hasher builder, which is what obtains the actual seed. To make it efficient, we won’t fetch fresh randomness every time, but only once per thread; within a thread, each FxRandomState (and therefore each hash table) gets a new consecutive seed.
use std::cell::Cell;
use rand::Rng;

impl Default for FxRandomState {
    fn default() -> Self {
        thread_local! {
            // randomness is fetched once per thread; each table built on
            // this thread then gets the next consecutive seed
            static SEED: Cell<usize> = Cell::new(rand::thread_rng().gen());
        }
        let seed = SEED.with(|seed| {
            let n = seed.get();
            seed.set(n.wrapping_add(1));
            n
        });
        FxRandomState(seed)
    }
}
With this in place, we can alias TestHashSet<K> to HashSet<K, FxRandomState>, and the quadratic behavior goes away.
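In code, given the FxRandomState defined above:

type TestHashSet<K> = std::collections::HashSet<K, FxRandomState>;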