Work Contracts - A simple and efficient system for handling async tasks in C++
October 17, 2019
Introduction:
TODO:
Core concepts:
Thread Pool
Work Contract Group
Work Contract
A really simple thread pool:
Work contract groups and work contracts are intentionally thread pool agnostic. For clarity, the thread pool defined here
is deliberately as minimal as possible so that it does not distract from the main concepts of
'work contracts' and 'work contract groups'. This simple thread pool constructs a predetermined number of worker threads
and registers a single lambda function, which each of these threads invokes repeatedly until the thread is terminated. All
custom work is managed within that single lambda function.
Once the concept of work contracts is understood, it should be apparent that this simple, minimalistic thread pool design
is actually sufficient for professional-grade code (perhaps with a few more bells and whistles, which are left out here
for the sake of clarity).
Here is the thread pool class:
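The listing below is a minimal sketch of such a pool, not necessarily the original class. The names thread_pool, stop()
and run_thread_pool_demo() are assumptions made for illustration. It shows the one idea described above: a fixed number
of threads, each invoking the same registered function over and over until the pool is stopped.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical minimal pool: every worker repeatedly invokes one shared function.
class thread_pool
{
public:
    thread_pool(std::size_t thread_count, std::function<void()> worker_function)
    {
        threads_.reserve(thread_count);
        for (std::size_t i = 0; i < thread_count; ++i)
            threads_.emplace_back([this, worker_function] {
                while (!stop_.load(std::memory_order_acquire))
                    worker_function();   // all custom work happens in here
            });
    }

    ~thread_pool() { stop(); }

    thread_pool(thread_pool const &) = delete;
    thread_pool & operator=(thread_pool const &) = delete;

    // Asks the workers to finish their current call and joins them.
    void stop()
    {
        stop_.store(true, std::memory_order_release);
        for (auto & thread : threads_)
            if (thread.joinable())
                thread.join();
    }

private:
    std::atomic<bool> stop_{false};
    std::vector<std::thread> threads_;
};

// Runs a two-thread pool until the shared function has been called 1000 times.
std::size_t run_thread_pool_demo()
{
    std::atomic<std::size_t> calls{0};
    thread_pool pool(2, [&calls] { calls.fetch_add(1, std::memory_order_relaxed); });
    while (calls.load(std::memory_order_relaxed) < 1000)
        std::this_thread::yield();
    pool.stop();
    assert(calls.load() >= 1000);
    return calls.load();
}
```

Note that the workers in this sketch never sleep; whether they spin or block is decided entirely by the function
that is registered with the pool.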
Work Contracts - how work gets done:
In this system all work is done by taking out a 'contract' for work. A work contract is a simple class defining
a function to invoke when the work is to be done and a function to invoke when the contract expires.
A work contract can be exercised as many times as desired. Work contracts are non-copyable and are guaranteed to be processed
by only one worker thread at a time (this guarantee is made by the parent work_contract_group to which the specific
work_contract belongs - more on this later).
Here is the work_contract class:
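What follows is a stand-alone sketch of the idea rather than the original class: a non-copyable object holding a work
function and an expiration function. The member names, invoke(), and the choice to fire the expiration function from
the destructor are assumptions. In the full system, exercising a contract only flags it and a worker thread runs the
work function later; here invoke() runs the work inline so that the example needs nothing else.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical stand-alone contract: one function for the work, one for expiration.
class work_contract
{
public:
    using function_type = std::function<void()>;

    work_contract(function_type work, function_type on_expire)
        : work_(std::move(work)), on_expire_(std::move(on_expire)) {}

    // Assumption: the contract expires when the object is destroyed.
    ~work_contract()
    {
        if (on_expire_)
            on_expire_();
    }

    // Non-copyable, as described in the text.
    work_contract(work_contract const &) = delete;
    work_contract & operator=(work_contract const &) = delete;

    // Runs the work function inline (a stand-in for asynchronous execution).
    void invoke()
    {
        if (work_)
            work_();
    }

private:
    function_type work_;
    function_type on_expire_;
};

// Runs the work three times, then lets the contract expire.
// Returns work_count * 10 + expire_count.
int run_work_contract_demo()
{
    int work_count = 0;
    int expire_count = 0;
    {
        work_contract contract([&] { ++work_count; }, [&] { ++expire_count; });
        contract.invoke();
        contract.invoke();
        contract.invoke();
    }   // contract goes out of scope: the expiration function runs once
    assert(expire_count == 1);
    return work_count * 10 + expire_count;
}
```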
Work Contract Groups - Managing work contracts:
Work contracts are intended to be managed by a parent Work Contract Group. They cannot be constructed
independently and must, instead, be created by a Work Contract Group. A work contract group can manage
a finite number of work contracts and is designed to be lock free and to ensure that any work contract which
it manages will be exercised by only one thread at a time. This is a powerful guarantee which, with clever
design, can be used to significantly simplify code while simultaneously improving performance - more on this later.
The work_contract_group constructor requires a single function, which is invoked whenever one of its work_contracts
is exercised. This function hook is required to completely decouple the work_contract_group from the specific
implementation of the worker thread pool. This decoupling is important for architectural reasons, and it also
allows this system to be used either in low latency systems where 'core burning' is desirable (threads spin rather than
yielding while there is no work to be done) or in more traditional systems where threads are expected to sleep while
there is no work to be done. More on this later.
NOTE: The work_contract_group represents the only significant complexity within this system. This complexity is required to
achieve lockless behavior and will be described in greater detail further down.
Here is the work_contract_group class:
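The following is a reduced sketch of how such a group could be built, not the original class. It assumes a fixed
capacity, identifies contracts by slot index instead of handing out work_contract objects, and tracks each contract
with one atomic flag word (pending, running, expired). All names here (create_contract, exercise, surrender,
service_contracts, npos) are hypothetical. Two further assumptions of the sketch: requests made while a contract is
already pending are coalesced into one execution, and a contract surrendered while pending expires without running
its pending work.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

class work_contract_group
{
public:
    using function_type = std::function<void()>;
    static constexpr std::size_t npos = static_cast<std::size_t>(-1);

    work_contract_group(std::size_t capacity, function_type on_exercise)
        : slots_(capacity), on_exercise_(std::move(on_exercise)) {}

    // Claims a free slot without locking; returns npos when the group is full.
    std::size_t create_contract(function_type work, function_type on_expire)
    {
        for (std::size_t i = 0; i < slots_.size(); ++i)
        {
            bool expected = false;
            if (slots_[i].in_use.compare_exchange_strong(expected, true))
            {
                slots_[i].work = std::move(work);
                slots_[i].on_expire = std::move(on_expire);
                return i;
            }
        }
        return npos;
    }

    void exercise(std::size_t id) { raise(id, pending); }
    void surrender(std::size_t id) { raise(id, pending | expired); }

    // Called by worker threads. Returns true if any contract was serviced.
    bool service_contracts()
    {
        bool serviced = false;
        for (auto & s : slots_)
        {
            unsigned seen = s.state.load(std::memory_order_acquire);
            if (!(seen & pending) || (seen & running))
                continue;   // nothing requested, or another thread owns this contract
            if (!s.state.compare_exchange_strong(seen, (seen & ~pending) | running))
                continue;   // lost the race to another worker
            serviced = true;
            if (seen & expired)
            {
                if (s.on_expire)
                    s.on_expire();
                s.work = nullptr;
                s.on_expire = nullptr;
                s.state.store(0);
                s.in_use.store(false);   // the slot can now be reused
            }
            else
            {
                if (s.work)
                    s.work();
                // Give up ownership; if exercised meanwhile, request another pass.
                if (s.state.fetch_and(~running) & pending)
                    on_exercise_();
            }
        }
        return serviced;
    }

private:
    enum : unsigned { pending = 1, running = 2, expired = 4 };

    struct slot
    {
        std::atomic<bool> in_use{false};
        std::atomic<unsigned> state{0};
        function_type work;
        function_type on_expire;
    };

    // Sets the flags and fires the hook only on the idle-to-pending transition.
    void raise(std::size_t id, unsigned flags)
    {
        if (!(slots_[id].state.fetch_or(flags) & pending))
            on_exercise_();
    }

    std::vector<slot> slots_;
    function_type on_exercise_;
};

// Single-threaded walk through the state machine.
// Returns hook_calls * 100 + work_calls * 10 + expire_calls.
int run_group_demo()
{
    int hook_calls = 0, work_calls = 0, expire_calls = 0;
    work_contract_group group(4, [&] { ++hook_calls; });
    auto id = group.create_contract([&] { ++work_calls; }, [&] { ++expire_calls; });
    assert(id != work_contract_group::npos);
    group.exercise(id);
    group.exercise(id);          // already pending: coalesced, hook not fired again
    group.service_contracts();   // runs the work function once
    group.surrender(id);
    group.service_contracts();   // runs the expiration function
    return hook_calls * 100 + work_calls * 10 + expire_calls;
}
```

The single-thread guarantee in this sketch comes from the compare-and-swap in service_contracts(): only one worker can
move a contract from pending to running, and every other worker skips the contract until the running flag is cleared.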
Putting the pieces together:
In order to actually process the work associated with a work_contract we need to create a thread pool
to do the work and direct those threads to query the work_contract_group to see if any work_contracts require
servicing. The following code demonstrates this:
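As a hedged illustration of that wiring, the sketch below starts a few worker threads that do nothing but poll a group
for work. The toy_group type is a deliberately trivial stand-in for a real work_contract_group (it only counts
outstanding requests), and service_contracts(), exercise() and run_spinning_demo() are assumed names.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Stand-in for a work_contract_group (hypothetical): it only counts
// outstanding requests so that the worker loop has something to service.
struct toy_group
{
    std::atomic<int> pending{0};
    std::atomic<int> serviced{0};

    void exercise() { pending.fetch_add(1); }

    // Returns true if one outstanding request was claimed and serviced.
    bool service_contracts()
    {
        int seen = pending.load();
        while (seen > 0)
            if (pending.compare_exchange_weak(seen, seen - 1))
            {
                serviced.fetch_add(1);   // the "work"
                return true;
            }
        return false;
    }
};

// Starts worker_count threads that poll the group without ever sleeping,
// issues request_count requests, and waits for all of them to be serviced.
int run_spinning_demo(std::size_t worker_count, int request_count)
{
    toy_group group;
    std::atomic<bool> stop{false};
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < worker_count; ++i)
        workers.emplace_back([&] {
            while (!stop.load(std::memory_order_acquire))
                group.service_contracts();   // busy-poll: never yields or sleeps
        });

    for (int i = 0; i < request_count; ++i)
        group.exercise();
    while (group.serviced.load() < request_count)
        std::this_thread::yield();

    stop.store(true, std::memory_order_release);
    for (auto & worker : workers)
        worker.join();
    assert(group.pending.load() == 0);
    return group.serviced.load();
}
```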
We now have a collection of threads which will do the work for any work_contract associated with our
work_contract_group. You will notice, however, that this results in constant 100% CPU usage on the
cores on which the worker threads are running. This might be exactly what you want in situations where
ultra-low latency is required (I'm looking at you, finance). However, for the typical 'good citizen'
application, where playing well with other applications is desirable, we need to use as little CPU
as possible when there is no work to be done. This is the motivation for the single function hook provided
in the work_contract_group constructor. Let's add a little bit of code to our example and make this system
a more pleasant application to be around.
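One way to do this, sketched under assumptions, is to have the hook notify a condition variable on which idle workers
sleep. The toy_group below is again a trivial stand-in for a real group, and its interface (a hook passed to the
constructor, has_pending(), service_contracts()) is hypothetical. The hook briefly takes the mutex before notifying,
which costs a lock on the producer side but ensures that a worker which has just found no work cannot miss the wake-up.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Stand-in for a work_contract_group (hypothetical). Like the group described
// in the text, it takes a hook that is invoked whenever work is requested.
class toy_group
{
public:
    explicit toy_group(std::function<void()> on_exercise)
        : on_exercise_(std::move(on_exercise)) {}

    void exercise() { pending_.fetch_add(1); on_exercise_(); }
    bool has_pending() const { return pending_.load() > 0; }
    int serviced() const { return serviced_.load(); }

    // Returns true if one outstanding request was claimed and serviced.
    bool service_contracts()
    {
        int seen = pending_.load();
        while (seen > 0)
            if (pending_.compare_exchange_weak(seen, seen - 1))
            {
                serviced_.fetch_add(1);
                return true;
            }
        return false;
    }

private:
    std::function<void()> on_exercise_;
    std::atomic<int> pending_{0};
    std::atomic<int> serviced_{0};
};

int run_sleeping_demo(std::size_t worker_count, int request_count)
{
    std::mutex mutex;
    std::condition_variable wake;
    bool stop = false;   // guarded by mutex

    toy_group group([&] {
        { std::lock_guard<std::mutex> lock(mutex); }   // order the notify after any in-progress check
        wake.notify_one();
    });

    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < worker_count; ++i)
        workers.emplace_back([&] {
            for (;;)
            {
                {
                    // Sleep until there is work to do or the demo is shutting down.
                    std::unique_lock<std::mutex> lock(mutex);
                    wake.wait(lock, [&] { return stop || group.has_pending(); });
                    if (stop)
                        return;
                }
                group.service_contracts();
            }
        });

    for (int i = 0; i < request_count; ++i)
        group.exercise();
    while (group.serviced() < request_count)   // the demo's own wait; kept simple
        std::this_thread::yield();

    {
        std::lock_guard<std::mutex> lock(mutex);
        stop = true;
    }
    wake.notify_all();
    for (auto & worker : workers)
        worker.join();
    assert(!group.has_pending());
    return group.serviced();
}
```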
Now we have two simple methods for processing work for a work_contract_group. One is better suited to low
latency applications and the other is intended for normal application usage. Both are trivial to set up.
Now let's actually create a work contract and do something with this system.
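The sketch below is one self-contained way to try this out. It is not the original example: mini_group is a heavily
reduced, hypothetical group (fixed set of contracts, no expiration, no hook), contracts are identified by index, and
the workers simply poll. The work function increments a counter, and the main thread exercises the contract and waits
for the counter to move before exercising it again.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical, heavily reduced group: fixed set of contracts, no expiration.
class mini_group
{
public:
    explicit mini_group(std::size_t capacity) : slots_(capacity) {}

    // Not thread safe in this sketch: create all contracts before starting workers.
    std::size_t create_contract(std::function<void()> work)
    {
        assert(count_ < slots_.size());
        slots_[count_].work = std::move(work);
        return count_++;
    }

    void exercise(std::size_t id) { slots_[id].state.fetch_or(pending); }

    // Called by worker threads. Returns true if any contract was serviced.
    bool service_contracts()
    {
        bool serviced = false;
        for (std::size_t i = 0; i < count_; ++i)
        {
            unsigned expected = pending;
            // The pending -> running transition succeeds for exactly one thread.
            if (slots_[i].state.compare_exchange_strong(expected, running))
            {
                slots_[i].work();
                slots_[i].state.fetch_and(~running);   // keeps 'pending' if exercised meanwhile
                serviced = true;
            }
        }
        return serviced;
    }

private:
    enum : unsigned { pending = 1, running = 2 };
    struct slot
    {
        std::atomic<unsigned> state{0};
        std::function<void()> work;
    };
    std::vector<slot> slots_;
    std::size_t count_ = 0;
};

// Exercises one contract 'rounds' times, waiting after each request until a
// worker has run it. Returns how many times the work function ran.
int run_contract_demo(int rounds)
{
    mini_group group(8);
    std::atomic<int> executed{0};
    auto id = group.create_contract([&] { executed.fetch_add(1); });

    std::atomic<bool> stop{false};
    std::vector<std::thread> workers;
    for (int i = 0; i < 2; ++i)
        workers.emplace_back([&] {
            while (!stop.load())
                if (!group.service_contracts())
                    std::this_thread::yield();
        });

    for (int i = 1; i <= rounds; ++i)
    {
        group.exercise(id);
        while (executed.load() < i)
            std::this_thread::yield();
    }

    stop.store(true);
    for (auto & worker : workers)
        worker.join();
    return executed.load();
}
```

Because each request is issued only after the previous one has been executed, the work function in this sketch runs
exactly once per call to exercise(), no matter which of the two workers picks it up.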
Performance:
To measure the performance of this system I wrote a test which creates a thread_pool with a configurable number of
worker threads (consumer threads) and a configurable number of threads which create work contracts (producer threads).
Each producer thread creates a work contract which simply increments a counter recording the number of times
the contract has been executed. The producer threads then spin in a loop for a configurable amount of time, continually
exercising their work contract and waiting for the counter to increment (indicating that a worker thread has executed the
work contract once). After the test time elapses, the number of executions per second per
thread is calculated. This effectively measures the overhead of the system for queuing and executing an asynchronous task -
our work contract which simply increments a counter.