On 10/11/18 12:05 PM, Jamal Hadi Salim wrote:
> On 2018-10-11 1:04 p.m., David Ahern wrote:
>
>> You can already filter link dumps by kind. How? By passing in the KIND
>> attribute on a dump request. This type of filtering exists for link
>> dumps, neighbor dumps, fdb dumps. Why is there a push to make route
>> dumps different? Why can't they be consistent and use existing semantics?
>
> I think you meant filtering by ifindex in neighbor.

I meant the general API of users passing filter arguments as attributes
on the dump request (or as values in the header) -- KIND, MASTER, device
index, etc. This is an existing API and an existing capability.

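Concretely, something like this is all it takes today to get a
kernel-side filtered link dump (untested sketch; the master ifindex
value is a placeholder):

    /* Untested sketch: a link dump request filtered by master device,
     * using only the existing attribute-based dump API. The master
     * ifindex value (3) is a made-up placeholder.
     */
    #include <linux/rtnetlink.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    static struct {
            struct nlmsghdr nlh;
            struct ifinfomsg ifm;
            char attrbuf[RTA_SPACE(sizeof(__u32))];
    } req;

    static int send_filtered_link_dump(int fd)
    {
            __u32 master_idx = 3;   /* placeholder: ifindex of a bridge */
            struct rtattr *rta;

            memset(&req, 0, sizeof(req));
            req.nlh.nlmsg_len = NLMSG_LENGTH(sizeof(req.ifm));
            req.nlh.nlmsg_type = RTM_GETLINK;
            req.nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
            req.ifm.ifi_family = AF_UNSPEC;

            /* ask the kernel to return only ports enslaved to master_idx */
            rta = (struct rtattr *)((char *)&req +
                                    NLMSG_ALIGN(req.nlh.nlmsg_len));
            rta->rta_type = IFLA_MASTER;
            rta->rta_len = RTA_LENGTH(sizeof(master_idx));
            memcpy(RTA_DATA(rta), &master_idx, sizeof(master_idx));
            req.nlh.nlmsg_len = NLMSG_ALIGN(req.nlh.nlmsg_len) +
                                RTA_LENGTH(sizeof(master_idx));

            return send(fd, &req, req.nlh.nlmsg_len, 0);
    }

    int main(void)
    {
            int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);

            if (fd < 0 || send_filtered_link_dump(fd) < 0)
                    return 1;
            /* ... recv() loop reading the already-filtered dump ... */
            close(fd);
            return 0;
    }

The dump loop skips devices that do not match before it fills a message
for them, which is the whole point.
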
> note: I would argue that there are already "ad hoc" ways of filtering
> in place (mostly use case driven). Otherwise Sowmini wouldn't have to
> craft that bpf filter. There are netlink users who have no filtering or
> some weird filtering involved. There is no argument that your approach
> works for rtm. But the rest of the users missing filters will require
> similar kernel changes. Could this be made generic enough to benefit
> other netlink users?
>
> The problem is there's always one new attribute that would make sense
> for some use case which requires a kernel change ("send me an event only
> if you get link down" or "dump all ports with link down").

I disagree with your overall premise of bpf as the end-all hammer. It is
a tool, but not the only tool.

For starters, you are proposing building the message, running the filter
on it, and potentially backing the message up to drop the recently added
piece because the filter does not want it included. That still wastes a
lot of cpu cycles to build and drop. I am thinking about scaling to 1
million routes -- I do not need the dump loop building a message for 1
million entries only to drop 99% of them. That is crazy.

The way the kernel manages route tables says I should pass in the table
id, as it is a major differentiator for what is returned. From there,
look up the specific table (I need to fix this part per my response to
Andrew) and then walk only it -- see the sketch at the end of this
reply.

The existing semantics and capabilities that exist for other dump
commands are the most efficient for some of these high level, big hammer
filters. What you want gets into the tiniest of details, and yes, the
imagination can go wild with combinations of filter options. So maybe
this scanning of post-built messages is reasonable *after* the high
level sorting is done.

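To make the route table point concrete, the direction I am describing
for ipv4 looks roughly like this (uncompiled sketch against
net/ipv4/fib_frontend.c, not an actual patch -- the function name is
invented, and a real version also has to honor RTA_TABLE since
rtm_table in the header is only 8 bits):

    /* Invented-name sketch, not real kernel code: if the dump request
     * names a table, look up just that table and walk only it. Routes
     * in other tables are never visited, so no messages are built for
     * them in the first place.
     */
    static int inet_dump_fib_filtered(struct sk_buff *skb,
                                      struct netlink_callback *cb)
    {
            struct net *net = sock_net(skb->sk);
            struct rtmsg *rtm = nlmsg_data(cb->nlh);
            struct fib_table *tb;
            int err;

            if (!rtm->rtm_table)                    /* no filter given */
                    return inet_dump_fib(skb, cb);  /* walk all tables */

            rcu_read_lock();
            tb = fib_get_table(net, rtm->rtm_table);
            err = tb ? fib_table_dump(tb, skb, cb) : -ENOENT;
            rcu_read_unlock();

            return err;
    }

With a million routes spread across many tables, everything outside the
requested table is never even visited, let alone serialized and then
filtered back out.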