Evan Schwartz

Understanding Memory Ordering in Rust

I am reading Mara Bos' Rust Atomics and Locks. On the first pass, I didn't really grok memory ordering (what Mara describes as "the most complicated topic of the book"). So here's my attempt at understanding by explaining.

Why do we care about memory ordering

Atomic operations and memory ordering come into play when your program has multiple threads working on the same data, such as a shared counter or context object.

In single-threaded code, you can read a function's statements from top to bottom and understand what it's going to do. Earlier statements will happen before later statements.

In multi-threaded code, this is not so simple. Another thread could be reading or writing the same variables as a function you're looking at in ways that might break what your program is supposed to do.

Multi-threading is extra tricky to reason about because of two additional factors:

  1. Compilers reorder and optimize instructions in ways that preserve single-threaded behavior but don't take other threads into account.
  2. Processors also execute instructions out of order, and one core's writes may become visible to other cores later than you'd expect.

Memory ordering is how we tell the compiler and processor which operations can safely be reordered and which cannot. (Memory ordering is also used to implement higher-level multi-threading building blocks like channels, Mutexes, and RwLocks.)

Three categories of memory ordering

While the std::sync::atomic::Ordering enum has 5 variants, there are really 3 categories of memory orderings:

  1. Relaxed
  2. Release / Acquire (Acquire, Release, AcqRel)
  3. Sequential Consistency (SeqCst)

Let's dive into each of these categories, starting with the "strongest".

You probably don't want Sequential Consistency

Mara makes this point very clearly in the book:

While it might seem like the easiest memory ordering to reason about, SeqCst ordering is almost never necessary in practice...

Myth: Sequentially consistent memory ordering is a great default and is always correct.

Putting aside performance concerns, sequentially consistent memory ordering is often seen as the perfect type of memory ordering to default to, because of its strong guarantees. It’s true that if any other memory ordering is correct, SeqCst is also correct. This might make it sound like SeqCst is always correct. However, it’s entirely possible that a concurrent algorithm is simply incorrect, regardless of memory ordering. ... It is advisable to see SeqCst as a warning sign. Seeing it in the wild often means that either something complicated is going on, or simply that the author did not take the time to analyze their memory ordering related assumptions, both of which are reasons for extra scrutiny.

Ruh roh. I'm definitely guilty of having written code with SeqCst in the past, just because it seemed like the safest default.

So, what else do we have to work with?

Relax a little

Relaxed memory ordering is useful when you want to perform atomic operations on a single variable but you don't care about synchronizing different operations across threads. For example, if you are incrementing a single counter, you might not care about the order that the increment operations happen in as long as each is individually atomic.

Mara gives the example of this code, assuming that the functions a and b are run on separate threads:

static X: AtomicI32 = AtomicI32::new(0);

fn a() {
    X.fetch_add(5, Relaxed);
    X.fetch_add(10, Relaxed);
}

fn b() {
    let a = X.load(Relaxed);
    let b = X.load(Relaxed);
    let c = X.load(Relaxed);
    let d = X.load(Relaxed);
    println!("{a} {b} {c} {d}");
}

In this program, the only guarantee is that every thread observes the same total modification order of X: it goes from 0 to 5 to 15 and never holds any other value. The loads in b can observe those values at different points, but never out of order. It might print "0 0 0 0" or "0 0 5 15", but never "0 5 10 15" (10 is never stored) and never "5 0 15 15" (values can't go backwards).

Here is an illustration of 4 different ways that the operations could be ordered:

[Diagram: four possible interleavings of the two fetch_add operations and the four loads]

To repeat, Relaxed is useful when:

  1. You need individual operations on a single atomic variable to be atomic, and
  2. You don't need to synchronize any other reads or writes between threads.

Into the depths: Release and Acquire ordering

Now that we've seen that we probably don't want SeqCst and that Relaxed is for atomic operations without synchronization, let's turn our attention to Release and Acquire.

One thing to clarify up front: you should not think about Release and Acquire as two different "modes". Release is only relevant for write/store operations. Acquire is only relevant for read/load operations.

Release marks a moment in the program's overall timeline, declaring that all writes that happened with or before this moment will be visible after an Acquire on this same variable.

In simpler terms, let's say you have two threads doing some work and reading and writing data:

Producer Thread       Consumer Thread
πŸ› οΈ work
πŸ› οΈ work
✏️ write data
✏️ write more data
🚩 Release
                      🚩 Acquire
                      πŸ“– read data
                      πŸ“– read more data
                      πŸ› οΈ work

Release and Acquire synchronize the threads: every write the producer makes before (or as part of) its Release store is guaranteed to be visible to the consumer after its Acquire load on the same variable.

The use cases for this type of ordering include:

  1. Signaling that some data is ready to be consumed by another thread
  2. Implementing locks that protect access to shared data

Signaling example

Let's look at Mara's example of using an AtomicBool to signal when some data is ready:

use std::sync::atomic::{AtomicBool, AtomicU64};
use std::sync::atomic::Ordering::{Acquire, Relaxed, Release};

static DATA: AtomicU64 = AtomicU64::new(0);
static READY: AtomicBool = AtomicBool::new(false);

fn main() {
    std::thread::spawn(|| {
        DATA.store(123, Relaxed);
        READY.store(true, Release); // Everything from before this store ..
    });
    while !READY.load(Acquire) { // .. is visible after this loads `true`.
        std::thread::sleep(std::time::Duration::from_millis(100));
        println!("waiting...");
    }
    println!("{}", DATA.load(Relaxed));
}

The main thread is waiting for the DATA to be ready and sleeping for 100 milliseconds between each check.

Crucially, the line READY.store(true, Release); ensures that all writes that happened with or before this moment will be visible after an Acquire on this same variable. Note how DATA is written before the Release moment and it's only using the Relaxed ordering.

When the main thread finally observes READY.load(Acquire) to be true, we break out of the while loop and finally read the value. Even though DATA.load(Relaxed) uses Relaxed, it is guaranteed to see the value. The write happened before the Release moment on the READY variable, and this load is happening after the corresponding Acquire on READY.

This is the only possible sequence for this code:

[Diagram: the single possible ordering of the stores and loads in this program]

Locking example

Mara's next example for Release / Acquire ordering is a very rudimentary lock that uses an AtomicBool to protect access to a String. Here's a slightly modified version of that example:

use std::sync::atomic::Ordering::{Acquire, Relaxed, Release};
use std::sync::atomic::AtomicBool;

static mut DATA: String = String::new();
static LOCKED: AtomicBool = AtomicBool::new(false);

fn f() {
    if LOCKED.compare_exchange(false, true, Acquire, Relaxed).is_ok() {
        // Safety: We hold the exclusive lock, so nothing else is accessing DATA.
        unsafe { DATA.push('!') };
        LOCKED.store(false, Release);
    }
}

fn main() {
    std::thread::scope(|s| {
        for _ in 0..100 {
            s.spawn(f);
        }
    });
    println!("{}", unsafe { &DATA });
}

The two critical lines are:

  1. if LOCKED.compare_exchange(false, true, Acquire, Relaxed).is_ok() {
  2. LOCKED.store(false, Release);

The first line atomically compares LOCKED with false and, if it matches, sets it to true. The Acquire ordering applies to this read-modify-write when the exchange succeeds; the Relaxed ordering is used for the load when the comparison fails, which is fine because a failed attempt never touches DATA.

The second line sets LOCKED back to false using Release. As we've seen before, Release means that all writes that happened with or before this moment will be visible after an Acquire on this same variable.

Combining these two lines guarantees that the unsafe write to DATA and the Release write to LOCKED will both be finished before another thread observes LOCKED as false using the Acquire ordering.

DIY synchronization primitives

The next couple chapters in Mara's book give excellent walkthroughs of building a spin lock, a channel, and an atomically reference-counted pointer (Arc) using the memory ordering principles we've discussed here. I won't go through those here, but I would recommend reading through them on your own.

If you're interested in diving in further, you can also read about atomic fences. These effectively decompose the store/Release and load/Acquire operations we've been discussing into separate fence and store or load operations, which can be useful for applying a fence to multiple variables at once.

Conclusion

Memory ordering can be tricky to grok, but here are a few takeaways to remember:

  1. Treat SeqCst as a warning sign rather than a safe default.
  2. Use Relaxed when you only need individual operations on a single atomic variable to be atomic.
  3. Use Release on stores and Acquire on loads when one thread's writes need to be visible to another, such as for signaling or locking.


Thanks to Mara Bos for writing a clear and approachable explanation of Rust Atomics and Locks and to Simon Rassmussen for feedback on a draft of this post.


Discuss on r/rust, Lobsters, or Hacker News.

#rust