Embracing Thread-Per-Core Architecture
What is "thread-per-core"?
The term "thread-per-core" is confusing. It implies that there is one thread for each CPU core.
Aren't most modern multi-threaded async runtimes already thread-per-core? After all, most of them use an M:N threading model: they spawn N OS threads to run M tasks concurrently.
However, the community uses the term "thread-per-core" to mean something closer to a single-threaded runtime: tasks are pinned to a specific thread.
Projects such as monoio, compio, and glommio are advertised as thread-per-core runtimes.
Why "thread-per-core"?
Some algorithms only function correctly, or work best, in a single-threaded context; for example, a task may rely on thread-local storage.
In this model, tasks are pinned to a thread, which is a very powerful trait: data can be mutated through a shared reference, and the CPU executes more efficiently because data stays hot in the cache.
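To see why pinning enables mutation through a shared reference, here is a minimal standard-library sketch (no Nio API involved): a `!Send` counter shared by several closures that all run on one thread, using interior mutability instead of a `Mutex`.

```rust
use std::cell::Cell;
use std::rc::Rc;

// Increment a shared counter from several "tasks" pinned to one thread.
// `Rc<Cell<_>>` is !Send: this is safe only because nothing crosses threads.
fn run() -> i32 {
    let counter = Rc::new(Cell::new(0));
    for _ in 0..3 {
        let c = Rc::clone(&counter);
        // Each closure stands in for a task pinned to this worker;
        // mutation happens through a shared reference, no lock, no atomics.
        c.set(c.get() + 1);
    }
    counter.get()
}

fn main() {
    println!("{}", run()); // prints 3
}
```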
What I really like about this model is that the runtime doesn’t need a work-stealing scheduler or cross-thread synchronization.
A traditional work-stealing runtime such as Tokio requires tasks to be Send, so they can move across threads to balance the workload.
Work stealing occurs only when a thread is idle. In practice, there are typically far more tasks than CPU cores, so threads almost always have some work to execute.
Announcing Nio v0.1.0 🎉
I’m pretty excited to announce the first stable release of Nio. It was rewritten from scratch.
```toml
[dependencies]
nio = { version = "0.1", features = ["tokio-io"] }
```
By default, Nio implements the async traits from futures-io. The optional "tokio-io" feature implements the async traits from tokio::io.
Nio now supports the following APIs.
nio::spawn_local
Spawn a !Send task that is pinned to the current thread.
```rust
// Illustrative example; the spawned future may be !Send.
let counter = Rc::new(Cell::new(0));
nio::spawn_local(async move {
    counter.set(counter.get() + 1);
});
```
nio::spawn_pinned
Spawn a !Send task and pin it to a thread that has the least amount of work.
```rust
// Illustrative example; `spawn_pinned` takes an async closure.
nio::spawn_pinned(async move || {
    // runs pinned to the least-loaded worker
});
```
Since the task is spawned on a different thread, spawn_pinned accepts an async closure: the captured variables have to be Send, but the task itself (the future the closure returns) does not.
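This split can be expressed as a plain function signature. The sketch below is hypothetical (`spawn_pinned_sketch` and the tiny `block_on` executor are mine, not Nio's API), but it shows the bound described above: the closure must be Send, while the future it returns may freely use !Send types such as `Rc`.

```rust
use std::future::Future;
use std::rc::Rc;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Hypothetical signature sketching the constraint: `F` (and its captures)
// must be Send so it can be shipped to another worker, but the future it
// produces may be !Send, because it never leaves that worker afterwards.
fn spawn_pinned_sketch<F, Fut>(f: F) -> Fut
where
    F: FnOnce() -> Fut + Send + 'static,
    Fut: Future,
{
    // A real runtime would call `f` on the target worker; here we just call it.
    f()
}

// Minimal single-threaded executor, just enough to drive a future to completion.
fn noop_raw_waker() -> RawWaker {
    unsafe fn clone(_: *const ()) -> RawWaker { noop_raw_waker() }
    unsafe fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    RawWaker::new(std::ptr::null(), &VTABLE)
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = unsafe { Waker::from_raw(noop_raw_waker()) };
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(v) = fut.as_mut().poll(&mut cx) {
            return v;
        }
    }
}

fn main() {
    let name = String::from("nio"); // captured variable: Send
    let fut = spawn_pinned_sketch(move || async move {
        let rc = Rc::new(name.len());
        // `rc` is held across an await point, so this future is !Send:
        std::future::ready(()).await;
        *rc
    });
    println!("{}", block_on(fut)); // prints 3
}
```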
nio::spawn_pinned_at
Same as nio::spawn_pinned, but spawns the task pinned to a specific worker thread by its index.
```rust
// Illustrative example; the argument order shown here is an assumption.
nio::spawn_pinned_at(0, async move || {
    // runs on the worker thread with index 0
});
```
nio::spawn
Spawn a new task. The task is not pinned to a worker thread and has to be Send. The runtime may move the task to a different thread at every .await point.
This is the same as tokio::spawn. Here is the benchmark, if you are curious how it compares to tokio::spawn.
The amazing part is that Nio supports all of these APIs without any configuration; there is no runtime flavor to choose either. Nio welcomes both the “thread-per-core” folks and the rest of us.
Architecture
Under the hood, Nio uses a concurrent architecture called the Actor model.
Nio uses multiple worker threads to execute tasks. Each worker has a shared queue and a local queue. The shared queue is used to:

- Spawn a task on a different thread (nio::spawn_pinned, nio::spawn).
- Pin a !Send task to a specific worker thread during a cross-thread wakeup.
- Move a task spawned via nio::spawn to a different worker at every .await point.
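A rough sketch of this two-queue layout, in plain Rust (illustrative only, not Nio's actual internals): the local queue needs no lock because only its own thread touches it, while the shared queue sits behind a `Mutex` because other workers push into it.

```rust
use std::cell::RefCell;
use std::collections::VecDeque;
use std::sync::Mutex;

// A task shipped across threads must be Send...
type SharedTask = Box<dyn FnOnce() + Send>;
// ...but a task that never leaves this worker does not.
type LocalTask = Box<dyn FnOnce()>;

// Illustrative worker layout; not Nio's actual internals.
struct Worker {
    // Local queue: touched only by this worker's thread, so no lock.
    local: RefCell<VecDeque<LocalTask>>,
    // Shared queue: other workers push into it, so it needs synchronization.
    shared: Mutex<VecDeque<SharedTask>>,
}

impl Worker {
    fn new() -> Self {
        Worker {
            local: RefCell::new(VecDeque::new()),
            shared: Mutex::new(VecDeque::new()),
        }
    }

    // Drain the shared queue into the local one, then run all local tasks.
    // Returns the number of tasks executed.
    fn tick(&self) -> usize {
        loop {
            // The mutex guard is dropped at the end of this statement.
            let task = self.shared.lock().unwrap().pop_front();
            match task {
                Some(t) => self.local.borrow_mut().push_back(t),
                None => break,
            }
        }
        let mut ran = 0;
        loop {
            // The RefCell borrow ends before the task runs.
            let task = self.local.borrow_mut().pop_front();
            match task {
                Some(t) => { t(); ran += 1; }
                None => break,
            }
        }
        ran
    }
}

fn main() {
    let worker = Worker::new();
    // Another worker pins a task here via the shared queue:
    worker.shared.lock().unwrap().push_back(Box::new(|| {}));
    // This worker enqueues !Send work locally, with no synchronization:
    worker.local.borrow_mut().push_back(Box::new(|| {}));
    println!("{}", worker.tick()); // prints 2
}
```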
The worker thread can access its local queue without any synchronization. However, the async API also needs to make use of the current thread's runtime to avoid cross-thread wakeups. Consider this example:
```rust
// Reconstructed sketch; names and the exact API are illustrative.
let listener = TcpListener::bind(addr).await?; // bound on worker A
loop {
    let stream = listener.accept().await?; // readiness is registered on worker A
    nio::spawn_pinned(async move || {
        // `stream` is now awaited on worker B
        handle(stream).await
    });
}
```
We accept the TCP connection on worker A. Therefore, every TCP readiness event happens on thread A, but the socket is awaited on thread B.
In other words, thread A wakes up a task that is waiting on thread B. This is not good: it requires synchronization through the shared queue, which destroys the benefit of using the so-called “thread-per-core” runtime in the first place.
```rust
// Reconstructed sketch; names and the exact API are illustrative.
let listener = TcpListener::bind(addr).await?; // bound on worker A
loop {
    let incoming = listener.accept().await?; // nothing registered with the reactor yet
    nio::spawn_pinned(async move || {
        // connect() registers the socket with worker B's reactor
        let stream = incoming.connect().await;
        handle(stream).await
    });
}
```
In this example, by using the connect() method, we bind the TCP connection on thread B, ensuring that all TCP readiness events occur on thread B instead of thread A.
To avoid cross-thread wakeups, Nio does not allow I/O resources (such as TCP connections or timers) to be moved between threads. In other words, I/O resources do not implement Send. This enables Nio to provide a highly efficient async API, as I/O resources can be implemented without atomic operations. This design especially benefits the timer implementation: timer algorithms are inherently single-threaded, and Nio's timer doesn't require any mutex.
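For example, a per-worker timer queue can be a plain `BinaryHeap` behind a `RefCell`, with no `Mutex` or atomics. This is an illustrative sketch, not Nio's actual timer implementation:

```rust
use std::cell::RefCell;
use std::cmp::Reverse;
use std::collections::BinaryHeap;
use std::time::{Duration, Instant};

// Per-worker timer queue: single-threaded, so `RefCell` is enough.
// `Reverse` turns the max-heap into a min-heap ordered by deadline.
struct Timers {
    heap: RefCell<BinaryHeap<Reverse<(Instant, u64)>>>,
}

impl Timers {
    fn new() -> Self {
        Timers { heap: RefCell::new(BinaryHeap::new()) }
    }

    // Register a timer; `id` stands in for the task to wake.
    fn insert(&self, deadline: Instant, id: u64) {
        self.heap.borrow_mut().push(Reverse((deadline, id)));
    }

    // Pop every timer whose deadline has passed, earliest first.
    fn expired(&self, now: Instant) -> Vec<u64> {
        let mut out = Vec::new();
        let mut heap = self.heap.borrow_mut();
        loop {
            let next = heap.peek().copied();
            match next {
                Some(Reverse((deadline, id))) if deadline <= now => {
                    heap.pop();
                    out.push(id);
                }
                _ => break,
            }
        }
        out
    }
}

fn main() {
    let timers = Timers::new();
    let now = Instant::now();
    timers.insert(now + Duration::from_millis(10), 2);
    timers.insert(now, 1);
    // Only timer 1 has expired at `now`:
    println!("{:?}", timers.expired(now)); // prints [1]
}
```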
A worker can be seen as an independent async runtime: workers do not share any state, reactor, or timer with each other.
The Nio project is broken into separate, smaller crates so that the community can benefit from them:
- nio-task: Low-level task abstraction.
- nio-threadpool: Thread pool implementation for nio runtime.