On 31/01/17 19:45, Nikolay Aleksandrov wrote:
> On 31/01/17 19:21, Stephen Hemminger wrote:
>> On Tue, 31 Jan 2017 19:09:09 +0100
>> Nikolay Aleksandrov <niko...@cumulusnetworks.com> wrote:
>>
>>> On 31/01/17 17:41, Nikolay Aleksandrov wrote:
>>>>>
>>>>> I agree with the first 3 patches, but not the last one.
>>>>> Changing the API just for a performance hack is not necessary. Instead make
>>>>> the algorithm smarter and use per-cpu values.
>>>>>
>>>>
>>>> Thanks for the feedback, I would very much prefer any of the other two approaches
>>>> I tried (per-cpu pool and per-cpu for each fdb), from the two the second one -
>>>> per-cpu for each fdb is much simpler, so would it be acceptable to do per-cpu allocation
>>>> for each fdb ?
>>>>
>>>
>>> Okay, after some more testing the version with per-cpu per-fdb allocations, at 300 000 fdb entries
>>> I got 120 failed per-cpu allocs which seems okay. I'll wait a little more and will repost the series
>>> with per-cpu allocations and without the RFC tag.
>>>
>>> Thanks,
>>> Nik
>>>
>>
>> You could also use a mark/sweep algorithm (rather than recording updated).
>> It turns out that clearing is fast (can be unlocked).
>> The timer workqueue can mark all fdb entries (during scan), then in forward
>> function clear the bit if it is set. This would turn writes into reads.
>
> The wq doesn't have a strict next call, it is floating depending on the soonest
> expire, this can cause issues as we don't know when last we've reset the bit and
> using the scan interval resolution will result in big offsets when purging entries.
>
>>
>> To keep the API for last used, just change the resolution to be scan interval.
>>
>
> With default 300 second resolution ? People will be angry. :-)
> Also this has to happen for both "updated" and "used", they're both causing trouble.
> In fact "used" is much worse than "updated", because it's written to by all who transmit
> to that fdb.
>
> Actually to start we can do something much simpler - just always update "used" at most
> once per 1/10 of ageing_time for example. The default case would give us an update every
> 30 seconds if the fdb is actually used or we can cap it at 10 seconds.
> The "updated" we move to its own cache line and with proper config (bind ports to CPUs)
> it will be fine.
>

Actually this is a no-go: there are already users out there who depend on the
high resolution of the "used" field, so we cannot break them. We're back to
either an option or per-cpu.

> What do you think ?
>
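
To make the per-cpu variant concrete, here is a rough userspace sketch of the
idea, not the actual patch. Each CPU writes only its own "used" slot, so
transmitters on different CPUs never dirty the same cache line, and the slow
path (e.g. an fdb dump) reports the newest slot, which keeps the full
resolution of the field. NR_CPUS, CACHE_LINE, struct used_slot, struct
fdb_entry and both helpers are names made up for this example; in the kernel
the slots would presumably come from alloc_percpu() instead of a padded array.

```c
/* Hypothetical values for the sketch, not taken from the bridge code. */
#define NR_CPUS    4
#define CACHE_LINE 64

/* One slot per CPU, aligned so two CPUs never share a cache line. */
struct used_slot {
	unsigned long used;	/* jiffies-like timestamp of the last transmit */
} __attribute__((aligned(CACHE_LINE)));

struct fdb_entry {
	struct used_slot used[NR_CPUS];
};

/* Fast path: a CPU touches only its own slot and skips redundant writes. */
static void fdb_mark_used(struct fdb_entry *f, int cpu, unsigned long now)
{
	if (f->used[cpu].used != now)
		f->used[cpu].used = now;
}

/* Slow path: report the most recent timestamp seen on any CPU. */
static unsigned long fdb_last_used(const struct fdb_entry *f)
{
	unsigned long last = f->used[0].used;
	int cpu;

	for (cpu = 1; cpu < NR_CPUS; cpu++) {
		/* Signed difference so a wrapped counter still compares correctly. */
		if ((long)(f->used[cpu].used - last) > 0)
			last = f->used[cpu].used;
	}
	return last;
}
```

The cost is memory: every fdb entry carries one slot per CPU, which is why the
failed per-cpu allocations at 300 000 entries are worth watching.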