On Sat, May 21, 2022 at 10:50:28PM +0300, Vitaliy Makkoveev wrote:
> This diff looks good, except the re-check after kernel lock. It???s
> supposed `rt??? could became inconsistent, right? But what stops to
> make it inconsistent after first unlocked RTF_LLINFO flag check?
> 
> I think this re-check should gone.

I have copied the re-check from intenal genua code.  I am not sure
if it is really needed.  We know from Hrvoje that the diff with
re-check is stable.  And we know that it crashes without kernel
lock at all.

I have talked with mpi@ about it.  The main problem is that we have
no write lock when we change RTF_LLINFO.  Then rt_llinfo can get
NULL or inconsistent.

Plan is that I put some lock asserts into route add and delete.
This helps to find the parts that modify RTF_LLINFO and rt_llinfo
without exclusive lock.

Maybe we need some kernel lock somewhere else.  Or we want to use
some ARP mutex.  We could also add some comment and commit the diff
that I have.  We know that it is faster and stable.  Pushing the
kernel lock down or replacing it with something clever can always
be done later.

bluhm

> > On 21 May 2022, at 19:45, Hrvoje Popovski <[email protected]> wrote:
> > 
> > On 18.5.2022. 20:52, Hrvoje Popovski wrote:
> >> On 18.5.2022. 17:31, Alexander Bluhm wrote:
> >>> Hi,
> >>> 
> >>> For parallel IP forwarding I had put kernel lock around arpresolve()
> >>> as a quick workaround for crashes.  Moving the kernel lock inside
> >>> the function makes the hot path lock free.  I see slight prerformace
> >>> increase in my test and no lock contention in kstack flamegraph.
> >>> 
> >>> http://bluhm.genua.de/perform/results/latest/2022-05-16T00%3A00%3A00Z/btrace/tcpbench_-S1000000_-t10_-n100_10.3.45.35-btrace-kstack.0.svg
> >>> http://bluhm.genua.de/perform/results/latest/patch-sys-arpresolve-kernel-lock.1/btrace/tcpbench_-S1000000_-t10_-n100_10.3.45.35-btrace-kstack.0.svg
> >>> 
> >>> Search for kernel_lock.  Matched goes from 0.6% to 0.2%
> >>> 
> >>> We are running such a diff in our genua code for a while.  I think
> >>> route flags need more love.  I doubt that all flags and fields are
> >>> consistent when run on multiple CPU.  But this diff does not make
> >>> it worse and increases MP pressure.
> >> 
> >> Hi,
> >> 
> >> I'm seeing increase of forwarding performance from 2Mpps to 2.4Mpps with
> >> this diff and NET_TASKQ=4 on 6 x E5-2643 v2 @ 3.50GHz, 3600.01 MHz
> >> 
> >> Thank you ..
> >> 
> >> 
> > 
> > Hi,
> > 
> > I'm not seeing any fallout's with this diff. I'm running it on
> > production and test firewalls.
> > 
> 

Reply via email to