On Mon, Nov 26, 2018 at 08:33:36PM +0100, Pablo Neira Ayuso wrote:
> Hi Marcelo,
Hello!

>
> On Thu, Nov 22, 2018 at 07:08:32PM -0200, Marcelo Ricardo Leitner wrote:
> > On Thu, Nov 22, 2018 at 02:22:20PM -0200, Marcelo Ricardo Leitner wrote:
> > > On Wed, Nov 21, 2018 at 03:51:20AM +0100, Pablo Neira Ayuso wrote:
> > > > Hi,
> > > >
> > > > This patchset is the third iteration [1] [2] [3] to introduce a kernel
> > > > intermediate (IR) to express ACL hardware offloads.
> > >
> > > On v2 cover letter you had:
> > >
> > > """
> > > However, cost of this layer is very small, adding 1 million rules via
> > > tc -batch, perf shows:
> > >
> > >      0.06%  tc  [kernel.vmlinux]  [k] tc_setup_flow_action
> > > """
> > >
> > > The above doesn't include time spent on children calls and I'm worried
> > > about the new allocation done by flow_rule_alloc(), as it can impact
> > > rule insertion rate. I'll run some tests here and report back.
> >
> > I'm seeing +60ms on 1.75s (~3.4%) to add 40k flower rules on ingress
> > with skip_hw and tc in batch mode, with flows like:
> >
> > filter add dev p6p2 parent ffff: protocol ip prio 1 flower skip_hw
> > src_mac ec:13:db:00:00:00 dst_mac ec:14:c2:00:00:00 src_ip
> > 56.0.0.0 dst_ip 55.0.0.0 action drop
> >
> > Only 20ms out of those 60ms were consumed within fl_change() calls
> > (considering children calls), though.
> >
> > Do you see something similar? I used current net-next (d59da3fbfe3f)
> > and with this patchset applied.
>
> I see lots of send() and recv() in tc -batch via strace, using this
> example rule, repeating it N times:
>
> filter add dev eth0 parent ffff: protocol ip pref 1 flower dst_mac
> f4:52:14:10:df:92 action mirred egress redirect dev eth1
>
> This is taking ~8 seconds for 40k rules from my old laptop [*], this
> is already not too fast (without my patchset).

On an E5-2643 v3 @ 3.40GHz I see a total of 1.17s with an old iproute
(4.11) (more below).
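For anyone wanting to reproduce numbers of this kind, here is a minimal sketch of how such a batch file could be generated. It is not the script used for the measurements above. The helper names (make_batch, write_batch, overhead_pct) and the output file name are hypothetical, and varying src_ip per rule is an assumption made only so that each rule is distinct.

```python
import ipaddress


def make_batch(n, dev="p6p2", base_src="56.0.0.0"):
    """Return n 'filter add' lines suitable for a tc batch file.

    Assumption: each rule differs only in src_ip, counted up from base_src.
    """
    base = ipaddress.IPv4Address(base_src)
    lines = []
    for i in range(n):
        lines.append(
            f"filter add dev {dev} parent ffff: protocol ip prio 1 "
            f"flower skip_hw src_mac ec:13:db:00:00:00 "
            f"dst_mac ec:14:c2:00:00:00 src_ip {base + i} "
            f"dst_ip 55.0.0.0 action drop"
        )
    return lines


def write_batch(path, n, **kwargs):
    """Write the generated rules to path, one command per line."""
    with open(path, "w") as f:
        f.write("\n".join(make_batch(n, **kwargs)) + "\n")


def overhead_pct(delta_s, baseline_s):
    """Relative overhead in percent, e.g. +60ms on a 1.75s baseline."""
    return 100.0 * delta_s / baseline_s
```

After write_batch("flower-40k.batch", 40000), the file can be fed to "tc -batch flower-40k.batch" under time(1), once an ingress qdisc exists on the device. overhead_pct(0.060, 1.75) gives the ~3.4% quoted above.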
>
> I remember we discussed about adding support for real batching for tc
> - probably we can probably do this transparently by assuming that if the
> skbuff length mismatches nlmsghdr->len field, then we enter the batch
> mode from the kernel. This would require to update iproute2 to use
> libmnl batching routines, or code that follows similar approach
> otherwise.

Yes, I believe you're referring to

commit 485d0c6001c4aa134b99c86913d6a7089b7b2ab0
Author: Chris Mi <chr...@mellanox.com>
Date:   Fri Jan 12 14:13:16 2018 +0900

    tc: Add batchsize feature for filter and actions

Which is present in 4.16. It does transparent batching on app side.

With tc from today's tip, I get 1.05s for 40k rules, both runs with this
patchset applied.

>
> [*] 0.5 seconds in nft (similar ruleset), this is using netlink batching.

Nice.

Cheers,
Marcelo
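To make the batching idea discussed above concrete, here is a rough userspace sketch of several netlink messages packed back to back in one buffer, plus the "buffer length does not match the first nlmsghdr length" check. It only illustrates the framing. It is neither the iproute2 nor the libmnl code. The payload bytes are placeholders rather than real tc attributes, the function names are hypothetical, and comparing against the 4-byte aligned length is my own assumption.

```python
import struct

# struct nlmsghdr: nlmsg_len (u32), nlmsg_type (u16), nlmsg_flags (u16),
# nlmsg_seq (u32), nlmsg_pid (u32); 16 bytes, host byte order.
NLMSG_HDR = struct.Struct("=IHHII")
RTM_NEWTFILTER = 44
NLM_F_REQUEST = 0x001
NLM_F_ACK = 0x004


def nlmsg_align(n):
    """Round n up to the 4-byte netlink alignment."""
    return (n + 3) & ~3


def pack_msg(payload, seq):
    """Build one message: header, payload, then padding to the alignment."""
    length = NLMSG_HDR.size + len(payload)
    hdr = NLMSG_HDR.pack(length, RTM_NEWTFILTER,
                         NLM_F_REQUEST | NLM_F_ACK, seq, 0)
    return hdr + payload + b"\0" * (nlmsg_align(length) - length)


def pack_batch(payloads):
    """Concatenate several messages into one buffer for a single send()."""
    return b"".join(pack_msg(p, seq)
                    for seq, p in enumerate(payloads, start=1))


def looks_batched(buf):
    """True if the buffer holds more than the first message."""
    (first_len,) = struct.unpack_from("=I", buf, 0)
    return len(buf) != nlmsg_align(first_len)


def walk(buf):
    """Split a buffer back into (seq, payload) pairs."""
    out, off = [], 0
    while off + NLMSG_HDR.size <= len(buf):
        length, _type, _flags, seq, _pid = NLMSG_HDR.unpack_from(buf, off)
        if length < NLMSG_HDR.size or off + length > len(buf):
            break  # truncated or malformed message, stop parsing
        out.append((seq, buf[off + NLMSG_HDR.size:off + length]))
        off += nlmsg_align(length)
    return out
```

With this framing, N rules go out in one send() instead of N send()/recv() pairs, which is where the difference seen in strace would come from.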