Understanding Memory Ordering in Rust
I am reading Mara Bos' Rust Atomics and Locks. On the first pass, I didn't really grok memory ordering (what Mara describes as "the most complicated topic of the book"). So here's my attempt at understanding by explaining.
Why do we care about memory ordering
Atomic operations and memory ordering come into play when your program has multiple threads working on the same data, such as a shared counter or context object.
In single-threaded code, you can read a function's statements from top to bottom and understand what it's going to do. Earlier statements will happen before later statements.
In multi-threaded code, this is not so simple. Another thread could be reading or writing the same variables as a function you're looking at in ways that might break what your program is supposed to do.
Multi-threading is extra tricky to reason about because of two additional factors:
- Compiler optimizations - at compile time, the compiler can re-order and combine instructions to optimize your code. We generally appreciate these performance improvements, but, if we're not careful, reordering certain instructions could break our multi-threaded code.
- Processor optimizations - at runtime, the processor might also reorder instructions. For example, because a variable from a later instruction is already available in the processor cache while another needs to be loaded from memory. Similarly to the compiler optimizations, these reorderings are fine for single-threaded code but might break multi-threaded code.
Compiler and processor optimizations are great, but they don't take multiple threads into account.
Memory ordering is how we tell the compiler and processors which operations can be safely reordered and which cannot. (Memory ordering is also used to implement higher-level multi-threading building blocks like channels, `Mutex`es, and `RwLock`s.)
Three categories of memory ordering
While the `std::sync::atomic::Ordering` enum has 5 variants, there are really 3 categories of memory orderings:
- Relaxed (`Relaxed`)
- Release / Acquire (`Acquire`, `Release`, `AcqRel`)
- Sequential Consistency (`SeqCst`)
Let's dive into each of these categories, starting with the "strongest".
You probably don't want Sequential Consistency
This point comes across very clearly:
> While it might seem like the easiest memory ordering to reason about, SeqCst ordering is almost never necessary in practice...
> Myth: Sequentially consistent memory ordering is a great default and is always correct.
>
> Putting aside performance concerns, sequentially consistent memory ordering is often seen as the perfect type of memory ordering to default to, because of its strong guarantees. It's true that if any other memory ordering is correct, SeqCst is also correct. This might make it sound like SeqCst is always correct. However, it's entirely possible that a concurrent algorithm is simply incorrect, regardless of memory ordering. ... It is advisable to see SeqCst as a warning sign. Seeing it in the wild often means that either something complicated is going on, or simply that the author did not take the time to analyze their memory ordering related assumptions, both of which are reasons for extra scrutiny.
Ruh roh. I'm definitely guilty of having written code with `SeqCst` in the past, just because it seemed like the safest default.
So, what else do we have to work with?
Relax a little
`Relaxed` memory ordering is useful when you want to perform atomic operations on a single variable but you don't care about synchronizing different operations across threads. For example, if you are incrementing a single counter, you might not care about the order in which the increment operations happen, as long as each is individually atomic.
Mara gives the example of this code, assuming that the functions `a` and `b` are run on separate threads:
```rust
use std::sync::atomic::AtomicI32;
use std::sync::atomic::Ordering::Relaxed;

static X: AtomicI32 = AtomicI32::new(0);

fn a() {
    X.fetch_add(5, Relaxed);
    X.fetch_add(10, Relaxed);
}

fn b() {
    let a = X.load(Relaxed);
    let b = X.load(Relaxed);
    let c = X.load(Relaxed);
    let d = X.load(Relaxed);
    println!("{a} {b} {c} {d}");
}
```
In this program:

- We know that `5` will be added to `X` before `10` is.
- We don't know what values will be loaded by the different `load` operations in `b`.
- We do know that the load operations will observe the additions in the correct order (0 → 5 → 15).
- Different runs of the same program might interleave the `fetch_add` and `load` operations in a different way.
Here is an illustration of 4 different ways that the operations could be ordered:
To repeat, `Relaxed` is useful when:
- You have a single variable you want to atomically update.
- You don't care about the order in which multiple threads' updates happen.
- It's okay if other threads don't immediately observe the effects of an update.
- You have multiple variables and you don't care about synchronizing the updates between them.
Into the depths: `Release` and `Acquire` ordering
Now that we've seen that we probably don't want `SeqCst` and that `Relaxed` is for atomic operations without synchronization, let's turn our attention to `Release` and `Acquire`.

One thing to clarify up front: you should not think about `Release` and `Acquire` as two different "modes". `Release` is only relevant for write/store operations. `Acquire` is only relevant for read/load operations.

`Release` marks a moment in the program's overall timeline, declaring that all writes that happened with or before this moment will be visible after an `Acquire` on this same variable.
In simpler terms, let's say you have two threads doing some work and reading and writing data:
| Producer Thread | Consumer Thread |
|---|---|
| 🛠️ work | |
| 🛠️ work | |
| ✍️ write data | |
| ✍️ write more data | |
| 🚩 `Release` | |
| | 🚩 `Acquire` |
| | 👀 read data |
| | 👀 read more data |
| | 🛠️ work |
`Release` and `Acquire` synchronize the threads such that all writes that happened with or before the `Release` moment will be visible after an `Acquire` on the same variable.
The use cases for this type of ordering include:
- Implementing locks like `Mutex`es and `RwLock`s
- Implementing producer / consumer patterns like channels
- Lazy initialization of data
- Signaling the completion of tasks between threads
Signaling example
Let's look at Mara's example of using an `AtomicBool` to signal when some data is ready:
```rust
use std::sync::atomic::{AtomicBool, AtomicU64};
use std::sync::atomic::Ordering::{Acquire, Relaxed, Release};

static DATA: AtomicU64 = AtomicU64::new(0);
static READY: AtomicBool = AtomicBool::new(false);

fn main() {
    std::thread::spawn(|| {
        DATA.store(123, Relaxed);
        READY.store(true, Release); // Everything from before this store ..
    });
    while !READY.load(Acquire) { // .. is visible after this loads `true`.
        std::thread::sleep(std::time::Duration::from_millis(100));
        println!("waiting...");
    }
    println!("{}", DATA.load(Relaxed));
}
```
The main thread waits for `DATA` to be ready, sleeping for 100 milliseconds between each check.

Crucially, the line `READY.store(true, Release);` ensures that all writes that happened with or before this moment will be visible after an `Acquire` on this same variable. Note how `DATA` is written before the `Release` moment and it's only using the `Relaxed` ordering.

When the main thread finally observes `READY.load(Acquire)` to be `true`, we break out of the `while` loop and read the value. Even though `DATA.load(Relaxed)` uses `Relaxed`, it is guaranteed to see `123`: the write happened before the `Release` moment on the `READY` variable, and this `load` happens after the corresponding `Acquire` on `READY`.
This is the only possible sequence for this code:
Locking example
Mara's next example for `Release` / `Acquire` ordering is a very rudimentary lock that uses an `AtomicBool` to protect access to a `String`. Here's a slightly modified version of that example:
```rust
use std::sync::atomic::Ordering::{Acquire, Relaxed, Release};
use std::sync::atomic::AtomicBool;

static mut DATA: String = String::new();
static LOCKED: AtomicBool = AtomicBool::new(false);

fn f() {
    if LOCKED.compare_exchange(false, true, Acquire, Relaxed).is_ok() {
        // Safety: We hold the exclusive lock, so nothing else is accessing DATA.
        unsafe { DATA.push('!') };
        LOCKED.store(false, Release);
    }
}

fn main() {
    std::thread::scope(|s| {
        for _ in 0..100 {
            s.spawn(f);
        }
    });
    println!("{}", unsafe { &DATA });
}
```
The two critical lines are:
```rust
if LOCKED.compare_exchange(false, true, Acquire, Relaxed).is_ok() {
    // ...
    LOCKED.store(false, Release);
```
The first line atomically compares `LOCKED` with `false` and, if it matches, swaps in `true`. The `Acquire` ordering applies when the exchange succeeds; the `Relaxed` ordering only applies to the failed load when another thread already holds the lock.

The second line sets `LOCKED` back to `false` using `Release`. As we've seen before, `Release` means that all writes that happened with or before this moment will be visible after an `Acquire` on this same variable.
Combining these two lines guarantees that the `unsafe` write to `DATA` will be finished before another thread observes `LOCKED` as `false` using the `Acquire` ordering and enters the critical section.
DIY synchronization primitives
The next couple of chapters in Mara's book give excellent walkthroughs of building a spin lock, a channel, and an atomically reference-counted pointer (`Arc`) using the memory ordering principles we've discussed here. I won't go through those here, but I would recommend reading through them on your own.
If you're interested in diving in further, you can also read about atomic fences. These effectively decompose the store/`Release` and load/`Acquire` operations we've been discussing into separate fence and store or load operations, which can be useful for applying a fence to multiple variables at once.
Conclusion
Memory ordering can be tricky to grok, but here are a few takeaways to remember:

- `SeqCst` isn't as good a default as you think, and should be avoided in most cases.
- `Relaxed` is good for when you want to apply atomic operations on individual variables but don't care about synchronizing reads and writes on multiple variables. For example, incrementing simple counters or collecting statistics.
- `Release` / `Acquire` is for when you want to synchronize reads and writes to multiple variables across multiple threads. Storing a value to a given variable using `Release` guarantees that all writes that happened with or before this moment will be visible after the same variable is loaded using an `Acquire`.
- `Release` / `Acquire` are useful for building all manner of synchronization primitives, including locks, channels, and signals. Think of `Release` as releasing some changes (or a lock) to other threads, while `Acquire` acquires those changes (or the lock).
Thanks to Mara Bos for writing a clear and approachable explanation of Rust Atomics and Locks and to Simon Rassmussen for feedback on a draft of this post.
Discuss on r/rust, Lobsters, or Hacker News.