On Tue 22 Oct 2019 at 21:17, Roman Mashak <m...@mojatatu.com> wrote: > Vlad Buslov <vla...@mellanox.com> writes: > >> On Tue 22 Oct 2019 at 17:35, Marcelo Ricardo Leitner <mleit...@redhat.com> >> wrote: >>> On Tue, Oct 22, 2019 at 05:17:51PM +0300, Vlad Buslov wrote: >>>> Currently, significant fraction of CPU time during TC filter allocation >>>> is spent in percpu allocator. Moreover, percpu allocator is protected >>>> with single global mutex which negates any potential to improve its >>>> performance by means of recent developments in TC filter update API that >>>> removed rtnl lock for some Qdiscs and classifiers. In order to >>>> significantly improve filter update rate and reduce memory usage we >>>> would like to allow users to skip percpu counters allocation for >>>> specific action if they don't expect high traffic rate hitting the >>>> action, which is a reasonable expectation for hardware-offloaded setup. >>>> In that case any potential gains to software fast-path performance >>>> gained by usage of percpu-allocated counters compared to regular integer >>>> counters protected by spinlock are not important, but amount of >>>> additional CPU and memory consumed by them is significant. >>> >>> Yes! >>> >>> I wonder how this can play together with conntrack offloading. With >>> it the sw datapath will be more used, as a conntrack entry can only be >>> offloaded after the handshake. That said, the host can have to >>> process quite some handshakes in sw datapath. Seems OvS can then just >>> not set this flag in act_ct (and others for this rule), and such cases >>> will be able to leverage the percpu stats. Right? >> >> The flag is set per each actions instance so client can chose not to use >> the flag in case-by-case basis. Conntrack use case requires further >> investigation since I'm not entirely convinced that handling first few >> packets in sw (before connection reaches established state and is >> offloaded) warrants having percpu counter. > > Hi Vlad, > > Did you consider using TCA_ROOT_FLAGS instead of adding another > per-action 32-bit flag?
Hi Roman, I considered it, but didn't find good way to implement my change with TCA_ROOT_FLAGS. I needed some flags to be per-action for following reasons: 1. Not all actions support the flag (only implemented for hw offloaded actions). 2. TCA_ROOT_FLAGS is act API specific and I need this to work when actions are created when actions are created with filters through cls API. I guess I could have changed tcf_action_init_1() to require having TCA_ROOT_FLAGS before actions attribute and then pass obtained value to act_ops->init() as additional argument, but it sounds more complex and ugly. Regards, Vlad