Hi Mattias, Few comments inline. > -----Original Message----- > From: Mattias Rönnblom <mattias.ronnb...@ericsson.com> > Sent: Friday, April 1, 2022 10:08 AM > To: dev@dpdk.org > Cc: tho...@monjalon.net; David Marchand <david.march...@redhat.com>; > onar.ol...@ericsson.com; Honnappa Nagarahalli > <honnappa.nagaraha...@arm.com>; nd <n...@arm.com>; > konstantin.anan...@intel.com; m...@smartsharesystems.com; > step...@networkplumber.org; Mattias Rönnblom > <mattias.ronnb...@ericsson.com>; Ola Liljedahl <ola.liljed...@arm.com> > Subject: [PATCH v3] eal: add seqlock > > A sequence lock (seqlock) is synchronization primitive which allows for data- > race free, low-overhead, high-frequency reads, especially for data structures > shared across many cores and which are updated with relatively infrequently. > > A seqlock permits multiple parallel readers. The variant of seqlock > implemented > in this patch supports multiple writers as well. A spinlock is used for > writer- > writer serialization. > > To avoid resource reclamation and other issues, the data protected by a > seqlock > is best off being self-contained (i.e., no pointers [except to constant > data]). > > One way to think about seqlocks is that they provide means to perform atomic > operations on data objects larger what the native atomic machine instructions > allow for. > > DPDK seqlocks are not preemption safe on the writer side. A thread preemption > affects performance, not correctness. > > A seqlock contains a sequence number, which can be thought of as the > generation of the data it protects. > > A reader will > 1. Load the sequence number (sn). > 2. Load, in arbitrary order, the seqlock-protected data. > 3. Load the sn again. > 4. Check if the first and second sn are equal, and even numbered. > If they are not, discard the loaded data, and restart from 1. > > The first three steps need to be ordered using suitable memory fences. > > A writer will > 1. Take the spinlock, to serialize writer access. > 2. Load the sn. > 3. Store the original sn + 1 as the new sn. > 4. Perform load and stores to the seqlock-protected data. > 5. Store the original sn + 2 as the new sn. > 6. Release the spinlock. > > Proper memory fencing is required to make sure the first sn store, the data > stores, and the second sn store appear to the reader in the mentioned order. > > The sn loads and stores must be atomic, but the data loads and stores need not > be. > > The original seqlock design and implementation was done by Stephen > Hemminger. This is an independent implementation, using C11 atomics. > > For more information on seqlocks, see > https://en.wikipedia.org/wiki/Seqlock > > PATCH v3: > * Renamed both read and write-side critical section begin/end functions > to better match rwlock naming, per Ola Liljedahl's suggestion. > * Added 'extern "C"' guards for C++ compatibility. > * Refer to the main lcore as the main, and not the master. > > PATCH v2: > * Skip instead of fail unit test in case too few lcores are available. > * Use main lcore for testing, reducing the minimum number of lcores > required to run the unit tests to four. > * Consistently refer to sn field as the "sequence number" in the > documentation. > * Fixed spelling mistakes in documentation. > > Updates since RFC: > * Added API documentation. > * Added link to Wikipedia article in the commit message. > * Changed seqlock sequence number field from uint64_t (which was > overkill) to uint32_t. 
The sn type needs to be sufficiently large > to assure no reader will read a sn, access the data, and then read > the same sn, but the sn has been updated to many times during the > read, so it has wrapped. > * Added RTE_SEQLOCK_INITIALIZER macro for static initialization. > * Removed the rte_seqlock struct + separate rte_seqlock_t typedef > with an anonymous struct typedef:ed to rte_seqlock_t. > > Acked-by: Morten Brørup <m...@smartsharesystems.com> > Reviewed-by: Ola Liljedahl <ola.liljed...@arm.com> > Signed-off-by: Mattias Rönnblom <mattias.ronnb...@ericsson.com> > --- > app/test/meson.build | 2 + > app/test/test_seqlock.c | 202 +++++++++++++++++++++++ > lib/eal/common/meson.build | 1 + > lib/eal/common/rte_seqlock.c | 12 ++ > lib/eal/include/meson.build | 1 + > lib/eal/include/rte_seqlock.h | 302 ++++++++++++++++++++++++++++++++++ > lib/eal/version.map | 3 + > 7 files changed, 523 insertions(+) > create mode 100644 app/test/test_seqlock.c create mode 100644 > lib/eal/common/rte_seqlock.c create mode 100644 > lib/eal/include/rte_seqlock.h > > diff --git a/app/test/meson.build b/app/test/meson.build index > 5fc1dd1b7b..5e418e8766 100644 > --- a/app/test/meson.build > +++ b/app/test/meson.build > @@ -125,6 +125,7 @@ test_sources = files( > 'test_rwlock.c', > 'test_sched.c', > 'test_security.c', > + 'test_seqlock.c', > 'test_service_cores.c', > 'test_spinlock.c', > 'test_stack.c', > @@ -214,6 +215,7 @@ fast_tests = [ > ['rwlock_rde_wro_autotest', true], > ['sched_autotest', true], > ['security_autotest', false], > + ['seqlock_autotest', true], > ['spinlock_autotest', true], > ['stack_autotest', false], > ['stack_lf_autotest', false], > diff --git a/app/test/test_seqlock.c b/app/test/test_seqlock.c new file mode > 100644 index 0000000000..54fadf8025 > --- /dev/null > +++ b/app/test/test_seqlock.c > @@ -0,0 +1,202 @@ > +/* SPDX-License-Identifier: BSD-3-Clause > + * Copyright(c) 2022 Ericsson AB > + */ > + > +#include <rte_seqlock.h> > + > +#include <rte_cycles.h> > +#include <rte_malloc.h> > +#include <rte_random.h> > + > +#include <inttypes.h> > + > +#include "test.h" > + > +struct data { > + rte_seqlock_t lock; > + > + uint64_t a; > + uint64_t b __rte_cache_aligned; > + uint64_t c __rte_cache_aligned; > +} __rte_cache_aligned; > + > +struct reader { > + struct data *data; > + uint8_t stop; > +}; > + > +#define WRITER_RUNTIME (2.0) /* s */ > + > +#define WRITER_MAX_DELAY (100) /* us */ > + > +#define INTERRUPTED_WRITER_FREQUENCY (1000) #define > +WRITER_INTERRUPT_TIME (1) /* us */ > + > +static int > +writer_run(void *arg) > +{ > + struct data *data = arg; > + uint64_t deadline; > + > + deadline = rte_get_timer_cycles() + > + WRITER_RUNTIME * rte_get_timer_hz(); > + > + while (rte_get_timer_cycles() < deadline) { > + bool interrupted; > + uint64_t new_value; > + unsigned int delay; > + > + new_value = rte_rand(); > + > + interrupted = > rte_rand_max(INTERRUPTED_WRITER_FREQUENCY) == 0; > + > + rte_seqlock_write_lock(&data->lock); > + > + data->c = new_value; > + > + /* These compiler barriers (both on the test reader > + * and the test writer side) are here to ensure that > + * loads/stores *usually* happen in test program order > + * (always on a TSO machine). They are arrange in such > + * a way that the writer stores in a different order > + * than the reader loads, to emulate an arbitrary > + * order. A real application using a seqlock does not > + * require any compiler barriers. 
> + */ > + rte_compiler_barrier(); The compiler barriers are not sufficient on all architectures, if the intention is to maintain program order.
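To illustrate the point: if the test did want the loads/stores to happen in program order on weakly ordered CPUs as well (the comment above says that is only guaranteed on TSO machines, and is not required for seqlock correctness), the compiler barriers would have to become memory fences. A rough sketch only, reusing rte_atomic_thread_fence() from the patch and the variables from writer_run()/reader_run():

    /* test writer: keep the c/b/a stores in program order on weakly
     * ordered CPUs as well
     */
    data->c = new_value;
    rte_atomic_thread_fence(__ATOMIC_RELEASE); /* c store cannot pass the b store */
    data->b = new_value;
    rte_atomic_thread_fence(__ATOMIC_RELEASE); /* b store cannot pass the a store */
    data->a = new_value;

    /* test reader: keep the a/c/b loads in program order */
    a = data->a;
    rte_atomic_thread_fence(__ATOMIC_ACQUIRE); /* a load completes before the c load */
    c = data->c;
    rte_atomic_thread_fence(__ATOMIC_ACQUIRE); /* c load completes before the b load */
    b = data->b;

Whether the test actually needs this is a separate question; it is just a sketch of what "maintain program order" would take.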
> + data->b = new_value; > + > + if (interrupted) > + rte_delay_us_block(WRITER_INTERRUPT_TIME); > + > + rte_compiler_barrier(); > + data->a = new_value; > + > + rte_seqlock_write_unlock(&data->lock); > + > + delay = rte_rand_max(WRITER_MAX_DELAY); > + > + rte_delay_us_block(delay); > + } > + > + return 0; > +} > + > +#define INTERRUPTED_READER_FREQUENCY (1000) #define > +READER_INTERRUPT_TIME (1000) /* us */ > + > +static int > +reader_run(void *arg) > +{ > + struct reader *r = arg; > + int rc = 0; > + > + while (__atomic_load_n(&r->stop, __ATOMIC_RELAXED) == 0 && rc == > 0) { > + struct data *data = r->data; > + bool interrupted; > + uint64_t a; > + uint64_t b; > + uint64_t c; > + uint32_t sn; > + > + interrupted = > rte_rand_max(INTERRUPTED_READER_FREQUENCY) == 0; > + > + sn = rte_seqlock_read_lock(&data->lock); > + > + do { > + a = data->a; > + /* See writer_run() for an explanation why > + * these barriers are here. > + */ > + rte_compiler_barrier(); > + > + if (interrupted) > + > rte_delay_us_block(READER_INTERRUPT_TIME); > + > + c = data->c; > + > + rte_compiler_barrier(); > + b = data->b; > + > + } while (!rte_seqlock_read_tryunlock(&data->lock, &sn)); > + > + if (a != b || b != c) { > + printf("Reader observed inconsistent data values " > + "%" PRIu64 " %" PRIu64 " %" PRIu64 "\n", > + a, b, c); > + rc = -1; > + } > + } > + > + return rc; > +} > + > +static void > +reader_stop(struct reader *reader) > +{ > + __atomic_store_n(&reader->stop, 1, __ATOMIC_RELAXED); } > + > +#define NUM_WRITERS (2) /* main lcore + one worker */ #define > +MIN_NUM_READERS (2) #define MAX_READERS (RTE_MAX_LCORE - > NUM_WRITERS - > +1) #define MIN_LCORE_COUNT (NUM_WRITERS + MIN_NUM_READERS) > + > +/* Only a compile-time test */ > +static rte_seqlock_t __rte_unused static_init_lock = > +RTE_SEQLOCK_INITIALIZER; > + > +static int > +test_seqlock(void) > +{ > + struct reader readers[MAX_READERS]; > + unsigned int num_readers; > + unsigned int num_lcores; > + unsigned int i; > + unsigned int lcore_id; > + unsigned int reader_lcore_ids[MAX_READERS]; > + unsigned int worker_writer_lcore_id = 0; > + int rc = 0; > + > + num_lcores = rte_lcore_count(); > + > + if (num_lcores < MIN_LCORE_COUNT) { > + printf("Too few cores to run test. 
Skipping.\n"); > + return 0; > + } > + > + num_readers = num_lcores - NUM_WRITERS; > + > + struct data *data = rte_zmalloc(NULL, sizeof(struct data), 0); > + > + i = 0; > + RTE_LCORE_FOREACH_WORKER(lcore_id) { > + if (i == 0) { > + rte_eal_remote_launch(writer_run, data, lcore_id); > + worker_writer_lcore_id = lcore_id; > + } else { > + unsigned int reader_idx = i - 1; > + struct reader *reader = &readers[reader_idx]; > + > + reader->data = data; > + reader->stop = 0; > + > + rte_eal_remote_launch(reader_run, reader, lcore_id); > + reader_lcore_ids[reader_idx] = lcore_id; > + } > + i++; > + } > + > + if (writer_run(data) != 0 || > + rte_eal_wait_lcore(worker_writer_lcore_id) != 0) > + rc = -1; > + > + for (i = 0; i < num_readers; i++) { > + reader_stop(&readers[i]); > + if (rte_eal_wait_lcore(reader_lcore_ids[i]) != 0) > + rc = -1; > + } > + > + return rc; > +} > + > +REGISTER_TEST_COMMAND(seqlock_autotest, test_seqlock); > diff --git a/lib/eal/common/meson.build b/lib/eal/common/meson.build index > 917758cc65..a41343bfed 100644 > --- a/lib/eal/common/meson.build > +++ b/lib/eal/common/meson.build > @@ -35,6 +35,7 @@ sources += files( > 'rte_malloc.c', > 'rte_random.c', > 'rte_reciprocal.c', > + 'rte_seqlock.c', > 'rte_service.c', > 'rte_version.c', > ) > diff --git a/lib/eal/common/rte_seqlock.c b/lib/eal/common/rte_seqlock.c new > file mode 100644 index 0000000000..d4fe648799 > --- /dev/null > +++ b/lib/eal/common/rte_seqlock.c > @@ -0,0 +1,12 @@ > +/* SPDX-License-Identifier: BSD-3-Clause > + * Copyright(c) 2022 Ericsson AB > + */ > + > +#include <rte_seqlock.h> > + > +void > +rte_seqlock_init(rte_seqlock_t *seqlock) { > + seqlock->sn = 0; > + rte_spinlock_init(&seqlock->lock); > +} > diff --git a/lib/eal/include/meson.build b/lib/eal/include/meson.build index > 9700494816..48df5f1a21 100644 > --- a/lib/eal/include/meson.build > +++ b/lib/eal/include/meson.build > @@ -36,6 +36,7 @@ headers += files( > 'rte_per_lcore.h', > 'rte_random.h', > 'rte_reciprocal.h', > + 'rte_seqlock.h', > 'rte_service.h', > 'rte_service_component.h', > 'rte_string_fns.h', > diff --git a/lib/eal/include/rte_seqlock.h b/lib/eal/include/rte_seqlock.h > new file Other lock implementations are in lib/eal/include/generic. > mode 100644 index 0000000000..44eacd66e8 > --- /dev/null > +++ b/lib/eal/include/rte_seqlock.h > @@ -0,0 +1,302 @@ > +/* SPDX-License-Identifier: BSD-3-Clause > + * Copyright(c) 2022 Ericsson AB > + */ > + > +#ifndef _RTE_SEQLOCK_H_ > +#define _RTE_SEQLOCK_H_ > + > +#ifdef __cplusplus > +extern "C" { > +#endif > + > +/** > + * @file > + * RTE Seqlock > + * > + * A sequence lock (seqlock) is a synchronization primitive allowing > + * multiple, parallel, readers to efficiently and safely (i.e., in a > + * data-race free manner) access the lock-protected data. The RTE > + * seqlock permits multiple writers as well. A spinlock is used to > + * writer-writer synchronization. > + * > + * A reader never blocks a writer. Very high frequency writes may > + * prevent readers from making progress. > + * > + * A seqlock is not preemption-safe on the writer side. If a writer is > + * preempted, it may block readers until the writer thread is again > + * allowed to execute. Heavy computations should be kept out of the > + * writer-side critical section, to avoid delaying readers. > + * > + * Seqlocks are useful for data which are read by many cores, at a > + * high frequency, and relatively infrequently written to. 
> + * > + * One way to think about seqlocks is that they provide means to > + * perform atomic operations on objects larger than what the native > + * machine instructions allow for. > + * > + * To avoid resource reclamation issues, the data protected by a > + * seqlock should typically be kept self-contained (e.g., no pointers > + * to mutable, dynamically allocated data). > + * > + * Example usage: > + * @code{.c} > + * #define MAX_Y_LEN (16) > + * // Application-defined example data structure, protected by a seqlock. > + * struct config { > + * rte_seqlock_t lock; > + * int param_x; > + * char param_y[MAX_Y_LEN]; > + * }; > + * > + * // Accessor function for reading config fields. > + * void > + * config_read(const struct config *config, int *param_x, char > +*param_y) > + * { > + * // Temporary variables, just to improve readability. I think the above comment is not necessary. It is beneficial to copy the protected data to keep the read side critical section small. > + * int tentative_x; > + * char tentative_y[MAX_Y_LEN]; > + * uint32_t sn; > + * > + * sn = rte_seqlock_read_lock(&config->lock); > + * do { > + * // Loads may be atomic or non-atomic, as in this example. > + * tentative_x = config->param_x; > + * strcpy(tentative_y, config->param_y); > + * } while (!rte_seqlock_read_tryunlock(&config->lock, &sn)); > + * // An application could skip retrying, and try again later, if > + * // progress is possible without the data. > + * > + * *param_x = tentative_x; > + * strcpy(param_y, tentative_y); > + * } > + * > + * // Accessor function for writing config fields. > + * void > + * config_update(struct config *config, int param_x, const char > +*param_y) > + * { > + * rte_seqlock_write_lock(&config->lock); > + * // Stores may be atomic or non-atomic, as in this example. > + * config->param_x = param_x; > + * strcpy(config->param_y, param_y); > + * rte_seqlock_write_unlock(&config->lock); > + * } > + * @endcode > + * > + * @see > + * https://en.wikipedia.org/wiki/Seqlock. > + */ > + > +#include <stdbool.h> > +#include <stdint.h> > + > +#include <rte_atomic.h> > +#include <rte_branch_prediction.h> > +#include <rte_spinlock.h> > + > +/** > + * The RTE seqlock type. > + */ > +typedef struct { > + uint32_t sn; /**< A sequence number for the protected data. */ > + rte_spinlock_t lock; /**< Spinlock used to serialize writers. */ } Suggest using ticket lock for the writer side. It should have low overhead when there is a single writer, but provides better functionality when there are multiple writers. > +rte_seqlock_t; > + > +/** > + * A static seqlock initializer. > + */ > +#define RTE_SEQLOCK_INITIALIZER { 0, RTE_SPINLOCK_INITIALIZER } > + > +/** > + * Initialize the seqlock. > + * > + * This function initializes the seqlock, and leaves the writer-side > + * spinlock unlocked. > + * > + * @param seqlock > + * A pointer to the seqlock. > + */ > +__rte_experimental > +void > +rte_seqlock_init(rte_seqlock_t *seqlock); > + > +/** > + * Begin a read-side critical section. > + * > + * A call to this function marks the beginning of a read-side critical > + * section, for @p seqlock. > + * > + * rte_seqlock_read_lock() returns a sequence number, which is later > + * used in rte_seqlock_read_tryunlock() to check if the protected data > + * underwent any modifications during the read transaction. > + * > + * After (in program order) rte_seqlock_read_lock() has been called, > + * the calling thread reads the protected data, for later use. 
The > + * protected data read *must* be copied (either in pristine form, or > + * in the form of some derivative), since the caller may only read the > + * data from within the read-side critical section (i.e., after > + * rte_seqlock_read_lock() and before rte_seqlock_read_tryunlock()), > + * but must not act upon the retrieved data while in the critical > + * section, since it does not yet know if it is consistent. > + * > + * The protected data may be read using atomic and/or non-atomic > + * operations. > + * > + * After (in program order) all required data loads have been > + * performed, rte_seqlock_read_tryunlock() should be called, marking > + * the end of the read-side critical section. > + * > + * If rte_seqlock_read_tryunlock() returns true, the data was read > + * atomically and the copied data is consistent. > + * > + * If rte_seqlock_read_tryunlock() returns false, the just-read data > + * is inconsistent and should be discarded. The caller has the option > + * to either re-read the data and call rte_seqlock_read_tryunlock() > + * again, or to restart the whole procedure (i.e., from > + * rte_seqlock_read_lock()) at some later time. > + * > + * @param seqlock > + * A pointer to the seqlock. > + * @return > + * The seqlock sequence number for this critical section, to > + * later be passed to rte_seqlock_read_tryunlock(). > + * > + * @see rte_seqlock_read_tryunlock() > + */ > +__rte_experimental > +static inline uint32_t > +rte_seqlock_read_lock(const rte_seqlock_t *seqlock) { > + /* __ATOMIC_ACQUIRE to prevent loads after (in program order) > + * from happening before the sn load. Synchronizes-with the > + * store release in rte_seqlock_write_unlock(). > + */ > + return __atomic_load_n(&seqlock->sn, __ATOMIC_ACQUIRE); } > + > +/** > + * End a read-side critical section. > + * > + * A call to this function marks the end of a read-side critical Should we capture that it also begins a new critical-section for the subsequent calls to rte_seqlock_tryunlock()? > + * section, for @p seqlock. The application must supply the sequence > + * number produced by the corresponding rte_seqlock_read_lock() (or, > + * in case of a retry, the rte_seqlock_tryunlock()) call. > + * > + * After this function has been called, the caller should not access > + * the protected data. I understand what you mean here. But, I think this needs clarity. In the documentation for rte_seqlock_read_lock() you have mentioned, if rte_seqlock_read_tryunlock() returns false, one could re-read the data. May be this should be changed to: " After this function returns true, the caller should not access the protected data."? Or may be combine it with the following para. > + * > + * In case this function returns true, the just-read data was > + * consistent and the set of atomic and non-atomic load operations > + * performed between rte_seqlock_read_lock() and > + * rte_seqlock_read_tryunlock() were atomic, as a whole. > + * > + * In case rte_seqlock_read_tryunlock() returns false, the data was > + * modified as it was being read and may be inconsistent, and thus > + * should be discarded. The @p begin_sn is updated with the > + * now-current sequence number. May be " The @p begin_sn is updated with the sequence number for the next critical section." > + * > + * @param seqlock > + * A pointer to the seqlock. > + * @param begin_sn > + * The seqlock sequence number returned by > + * rte_seqlock_read_lock() (potentially updated in subsequent > + * rte_seqlock_read_tryunlock() calls) for this critical section. 
> + * @return > + * true or false, if the just-read seqlock-protected data was consistent > + * or inconsistent, respectively, at the time it was read. true - just read protected data was consistent false - just read protected data was inconsistent > + * > + * @see rte_seqlock_read_lock() > + */ > +__rte_experimental > +static inline bool > +rte_seqlock_read_tryunlock(const rte_seqlock_t *seqlock, uint32_t > +*begin_sn) { > + uint32_t end_sn; > + > + /* make sure the data loads happens before the sn load */ > + rte_atomic_thread_fence(__ATOMIC_ACQUIRE); > + > + end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED); > + > + if (unlikely(end_sn & 1 || *begin_sn != end_sn)) { > + *begin_sn = end_sn; > + return false; > + } > + > + return true; > +} > + > +/** > + * Begin a write-side critical section. > + * > + * A call to this function acquires the write lock associated @p > + * seqlock, and marks the beginning of a write-side critical section. > + * > + * After having called this function, the caller may go on to modify > + * (both read and write) the protected data, in an atomic or > + * non-atomic manner. > + * > + * After the necessary updates have been performed, the application > + * calls rte_seqlock_write_unlock(). > + * > + * This function is not preemption-safe in the sense that preemption > + * of the calling thread may block reader progress until the writer > + * thread is rescheduled. > + * > + * Unlike rte_seqlock_read_lock(), each call made to > + * rte_seqlock_write_lock() must be matched with an unlock call. > + * > + * @param seqlock > + * A pointer to the seqlock. > + * > + * @see rte_seqlock_write_unlock() > + */ > +__rte_experimental > +static inline void > +rte_seqlock_write_lock(rte_seqlock_t *seqlock) { > + uint32_t sn; > + > + /* to synchronize with other writers */ > + rte_spinlock_lock(&seqlock->lock); > + > + sn = seqlock->sn + 1; The load of seqlock->sn could use __atomic_load_n to be consistent. > + > + __atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELAXED); > + > + /* __ATOMIC_RELEASE to prevent stores after (in program order) > + * from happening before the sn store. > + */ > + rte_atomic_thread_fence(__ATOMIC_RELEASE); > +} > + > +/** > + * End a write-side critical section. > + * > + * A call to this function marks the end of the write-side critical > + * section, for @p seqlock. After this call has been made, the > +protected > + * data may no longer be modified. > + * > + * @param seqlock > + * A pointer to the seqlock. > + * > + * @see rte_seqlock_write_lock() > + */ > +__rte_experimental > +static inline void > +rte_seqlock_write_unlock(rte_seqlock_t *seqlock) { > + uint32_t sn; > + > + sn = seqlock->sn + 1; Same here, the load of seqlock->sn could use __atomic_load_n > + > + /* synchronizes-with the load acquire in rte_seqlock_read_lock() */ > + __atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELEASE); > + > + rte_spinlock_unlock(&seqlock->lock); > +} > + > +#ifdef __cplusplus > +} > +#endif > + > +#endif /* _RTE_SEQLOCK_H_ */ > diff --git a/lib/eal/version.map b/lib/eal/version.map index > b53eeb30d7..4a9d0ed899 100644 > --- a/lib/eal/version.map > +++ b/lib/eal/version.map > @@ -420,6 +420,9 @@ EXPERIMENTAL { > rte_intr_instance_free; > rte_intr_type_get; > rte_intr_type_set; > + > + # added in 22.07 > + rte_seqlock_init; > }; > > INTERNAL { > -- > 2.25.1
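To make the two __atomic_load_n comments above concrete: only the sn read would change, e.g. for rte_seqlock_write_lock() (rte_seqlock_write_unlock() would be adjusted the same way):

    static inline void
    rte_seqlock_write_lock(rte_seqlock_t *seqlock)
    {
        uint32_t sn;

        /* to synchronize with other writers */
        rte_spinlock_lock(&seqlock->lock);

        /* atomic load of sn, for consistency with the atomic store below */
        sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED) + 1;

        __atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELAXED);

        /* __ATOMIC_RELEASE to prevent stores after (in program order)
         * from happening before the sn store.
         */
        rte_atomic_thread_fence(__ATOMIC_RELEASE);
    }

Since the spinlock is already held, a relaxed load is sufficient here; the point is only consistency of style.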
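Similarly, to make the earlier ticket lock suggestion concrete, a rough sketch (assuming the existing <rte_ticketlock.h> API and its static initializer):

    #include <rte_ticketlock.h>

    typedef struct {
        uint32_t sn; /**< A sequence number for the protected data. */
        rte_ticketlock_t lock; /**< Ticket lock serializing writers. */
    } rte_seqlock_t;

    #define RTE_SEQLOCK_INITIALIZER { 0, RTE_TICKETLOCK_INITIALIZER }

    void
    rte_seqlock_init(rte_seqlock_t *seqlock)
    {
        seqlock->sn = 0;
        rte_ticketlock_init(&seqlock->lock);
    }

rte_seqlock_write_lock()/rte_seqlock_write_unlock() would then call rte_ticketlock_lock()/rte_ticketlock_unlock() instead, giving waiting writers FIFO ordering rather than having them race for the spinlock.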