On 2024-05-08 09:10, Morten Brørup wrote:
From: Mattias Rönnblom [mailto:hof...@lysator.liu.se]
Sent: Wednesday, 8 May 2024 08.35
On 2024-05-07 21:19, Morten Brørup wrote:
From: Mattias Rönnblom [mailto:hof...@lysator.liu.se]
Sent: Tuesday, 7 May 2024 09.24
On 2024-04-28 17:11, Mattias Rönnblom wrote:
On 2024-04-26 16:38, Ferruh Yigit wrote:
[...]
static uint64_t
counter_value(struct counter *counter)
{
        uint64_t count = __atomic_load_n(&counter->count,
                                         __ATOMIC_RELAXED);
        uint64_t offset = __atomic_load_n(&counter->offset,
                                          __ATOMIC_RELAXED);
Since the count and the offset are written to independently, without any
ordering restrictions, an update and a reset in quick succession may
cause the offset store to be globally visible before the new count.
Good catch.
This may happen when a thread calls stats_add() and then the same thread
calls stats_reset().
In such a scenario, a reader could see an offset > count.
Thus, unless I'm missing something, one should add

        if (unlikely(offset > count))
                return 0;

here, with an appropriate comment explaining why this can happen.
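For concreteness, a sketch of counter_value() with that guard in place (unlikely() as in DPDK's rte_branch_prediction.h; the struct layout is assumed from the snippet above):

static uint64_t
counter_value(struct counter *counter)
{
        uint64_t count = __atomic_load_n(&counter->count,
                                         __ATOMIC_RELAXED);
        uint64_t offset = __atomic_load_n(&counter->offset,
                                          __ATOMIC_RELAXED);

        /*
         * The count and offset stores are not ordered against each
         * other, so a racing reset can become visible before the
         * count value it was derived from. Treat a transient
         * offset > count as "just reset" rather than letting the
         * unsigned subtraction wrap around.
         */
        if (unlikely(offset > count))
                return 0;

        return count - offset;
}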
Another approach would be to think about what memory barriers may be
required to make sure one sees the count update before the offset
update, but, intuitively, that seems like both more complex and more
costly (performance-wise).
I think it can be done without affecting stats_add(), by using "offset" with Release-Consume ordering (sketched below):

- stats_reset() must write "offset" with memory_order_release, so "counter" cannot be visible after it, and
- stats_get() must read "offset" with memory_order_consume, so no reads or writes in the current thread dependent on "offset" can be reordered before this load, and writes to "counter" (a data-dependent variable) in other threads that release "offset" are visible in the current thread.
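A minimal sketch of that scheme, using the same __atomic builtins as the code above. Two caveats: mainstream compilers promote memory_order_consume to acquire, so __ATOMIC_ACQUIRE is what this effectively is, and the reader must load "offset" before "count" for the ordering to help:

static void
counter_reset(struct counter *counter)
{
        uint64_t count = __atomic_load_n(&counter->count,
                                         __ATOMIC_RELAXED);

        /*
         * Release: all earlier writes by this thread (including its
         * own count updates) are visible to a reader that observes
         * this offset store.
         */
        __atomic_store_n(&counter->offset, count, __ATOMIC_RELEASE);
}

static uint64_t
counter_value(struct counter *counter)
{
        /* Acquire: the count load below cannot be hoisted above
         * this load. */
        uint64_t offset = __atomic_load_n(&counter->offset,
                                          __ATOMIC_ACQUIRE);
        uint64_t count = __atomic_load_n(&counter->count,
                                         __ATOMIC_RELAXED);

        return count - offset;
}

With this pairing, a reader that observes a given reset also observes the count it was computed from, so offset <= count holds when the resetting thread is also the updating thread.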
That was the kind of complexity I was thinking about. Those barriers come with a non-zero cost, both in the different instructions used and in the compiler optimizations they prevent.
Yep, you mentioned that there might be a more complex alternative, so I decided
to explore it. :-)
This approach doesn't impose any new requirements on stats_add(), so the data
plane performance is not affected.
OK, I get it now. That's a good point. In my thought experiment, I had a
thread both updating and resetting the counter, which should be allowed,
but you could have barriers sit only in the reset routine.
For per-thread counters, stats_add() can store "counter" using memory_order_relaxed, or, on architectures where aligned 64-bit stores cannot tear, using volatile.

Counters shared by multiple threads must be atomically incremented using rte_atomic_fetch_add_explicit() with memory_order_relaxed (see the sketch after counter_add() below).
        return count - offset;
}
static void
counter_reset(struct counter *counter)
{
        uint64_t count = __atomic_load_n(&counter->count,
                                         __ATOMIC_RELAXED);

        __atomic_store_n(&counter->offset, count, __ATOMIC_RELAXED);
}
static void
counter_add(struct counter *counter, uint64_t operand)
{
        __atomic_store_n(&counter->count, counter->count + operand,
                         __ATOMIC_RELAXED);
}