On Fri, Oct 06, 2023 at 04:17:34PM +0300, Vitaliy Makkoveev wrote:
> On Fri, Oct 06, 2023 at 02:50:21PM +0200, Alexander Bluhm wrote:
> > On Fri, Oct 06, 2023 at 12:45:54PM +0300, Vitaliy Makkoveev wrote:
> > > On Thu, Oct 05, 2023 at 10:42:25PM +1000, David Gwynne wrote:
> > > > > On 5 Oct 2023, at 21:50, Vitaliy Makkoveev <m...@openbsd.org> wrote:
> > > > > 
> > > > > I don't like this unlocked if_flags check we have in ifq_start_task().
> > > > > I guess READ_ONCE() would be a much better way to load the value.
> > > > 
> > > > there are no inconsistent intermediate values you can read from 
> > > > if_flags. are you worried the compiler will load the value again and 
> > > > maybe we'll get a different state the second time?
> > > 
> > > To enforce loading it from memory instead of cache. But maybe I'm
> > > wrong.
> > 
> > READ_ONCE() does not address cache coherency problems.
> > 
> > Just reading an int value that another processor can modify is
> > wrong.  The compiler can optimize it into multiple reads with
> > inconsistent values, written by the other CPU.  This is prevented
> > by READ_ONCE().
> > 
> > When using READ_ONCE() you must be sure that cache coherency is not
> > a problem or that you have some memory barriers in place.  I added
> > READ_ONCE() to some queue macros recently to fix obvious misuse.
> > That does not mean it is sufficient.
> > 
> 
> Thanks for the explanation.
> 
> > Accessing ifp->if_flags is a different story.  As it is marked
> > [N], you need the exclusive netlock to write and the shared netlock
> > to read to be on the safe side.  A lot of your code just reads it
> > without caring about locks.  This works most of the time.
> > 
> 
> Not my code. However, these IFF_UP and IFF_RUNNING reads and
> modifications happen within kernel-locked chunks of the drivers and
> the kernel-locked part of ifioctl(). Mostly this is the
> configuration path. Under normal conditions these flags are
> modified only a few times, often just once. So this works...

During configuration there is usually some kind of lock, configuration
happens rarely, and bugs are hard to trigger, especially on amd64,
which guarantees relatively strong memory ordering.  That's why it
works.

> Some time ago, I privately pointed this out and proposed modifying
> if_flags atomically. Maybe it's time to rethink this.

Atomic operations are not the answer to everything.  They are still
slow and don't guarantee consistency with the logic around them.
If writes to if_flags happen rarely and are protected by exclusive
netlock anyway, we can leave them as they are.
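
To illustrate what I mean by consistency with the surrounding logic,
here is a minimal sketch, not taken from any real driver: even if
the load itself were atomic, the flag can change between the check
and the action.

	/*
	 * Check-then-act race: another CPU can clear IFF_RUNNING
	 * right after the test, e.g. from the ioctl path.  An
	 * atomic load does not close this window; only a lock
	 * around the whole section, or logic that tolerates a
	 * stale value, does.
	 */
	if (ISSET(ifp->if_flags, IFF_RUNNING)) {
		/* IFF_RUNNING may already be gone here */
		ifp->if_qstart(ifq);
	}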

More problematic is reading.  Many people believe that reading an
integer that another CPU can modify is fine.  To make the integer
value consistent for the compiler, you need READ_ONCE() or
atomic_load_int().  And then you have to consider whether reordering
it with the instructions around it is a problem.  Getting that right
with memory barriers is very hard.  It should not be done in regular
code; better to use primitives with stronger semantics.
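
To make the compiler part concrete, a sketch; the local variable is
made up:

	int flags;

	/*
	 * Plain access: the compiler is free to split or repeat
	 * this load, so two uses of "flags" may effectively see
	 * two different values written by another CPU.
	 */
	flags = ifp->if_flags;

	/*
	 * READ_ONCE() forces exactly one load from memory.  It
	 * adds no memory barrier; ordering against the surrounding
	 * loads and stores remains a separate problem.
	 */
	flags = READ_ONCE(ifp->if_flags);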

The easiest way to get MP safety is to identify critical sections
and put a lock around them.  For reading, a shared lock is enough;
it provides all the barriers.  But it is slow and should be avoided
in the hot path.  Taking a lock per network packet causes a clearly
visible throughput reduction in my measurements.
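
In netlock terms the safe read side is simply something like this;
correct, but too expensive to do per packet:

	/*
	 * Shared netlock turns the read into a proper critical
	 * section and comes with all the needed barriers.
	 */
	NET_LOCK_SHARED();
	running = ISSET(ifp->if_flags, IFF_RUNNING);
	NET_UNLOCK_SHARED();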

So we ended up with the current code, which works although it does
not follow these principles.  Usually it does not really matter
where the read actually occurs within the context.

We also have lockless primitives like SMR and SRP, which are hard
to understand and have their own drawbacks.  I never fully understood
how they work, but I can trigger bugs in code that uses them.
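
For reference, the SMR read side looks deceptively simple, which is
part of the problem; all the subtlety is hidden on the writer side,
which has to defer freeing.  somehead and use() are made up:

	/*
	 * Read side: cheap, no lock.  The object is only
	 * guaranteed to stay valid until smr_read_leave();
	 * writers free it via smr_call() or after smr_barrier().
	 */
	smr_read_enter();
	p = SMR_PTR_GET(&somehead);
	if (p != NULL)
		use(p);
	smr_read_leave();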

> No. They all just do a load and a compare, and the results are
> OR'ed together; it makes no difference which condition caused the
> return.

That's why it works with multiple CPUs.
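
For context, the check under discussion has roughly this shape,
simplified from ifq_start_task():

	/*
	 * All conditions are simple loads and tests.  A stale
	 * if_flags value at worst causes one spurious return or
	 * one extra start, both of which the start machinery
	 * tolerates.
	 */
	if (!ISSET(ifp->if_flags, IFF_RUNNING) ||
	    ifq_empty(ifq) || ifq_is_oactive(ifq))
		return;

	ifp->if_qstart(ifq);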

bluhm
