> -----Original Message-----
> From: Steffen Klassert <steffen.klass...@secunet.com>
> Sent: Friday, May 3, 2019 11:38 AM
> To: Florian Westphal <f...@strlen.de>
> Cc: Vakul Garg <vakul.g...@nxp.com>; netdev@vger.kernel.org
> Subject: Re: [RFC HACK] xfrm: make state refcounting percpu
> 
> On Wed, Apr 24, 2019 at 12:40:23PM +0200, Florian Westphal wrote:
> > I'm not sure this is a good idea to begin with: the refcount is right
> > next to the state spinlock, which is taken for both TX and RX ops, plus
> > this complicates debugging quite a bit.
> 
> 
> Hm, what would be the use case where this could help?
> 
> The only thing that comes to my mind is a TX state with wide selectors. In
> that case you might see traffic for this state on a lot of CPUs. But in that
> case we have a lot of other problems too: the state lock, the replay window,
> etc. It might make more sense to install a full state per CPU, as this would
> solve all the other problems too (I've talked about that idea at the IPsec
> workshop).
> 
> In fact, RFC 7296 allows inserting multiple SAs with the same traffic
> selector, so it is possible to install one state per CPU. We did a PoC for
> this at the IETF meeting the week after the IPsec workshop.
> 

On a 16-core arm64 processor, I am seeing very high CPU usage (~40%) in
refcount atomics. For example, in dst_release() alone, 19% of CPU time is
spent in the refcount API.
Will the PoC help here?

> One problem that is not solved completely is that, from the userland point
> of view, an SA consists of two states (RX/TX) and this has to be symmetric,
> i.e. both ends must have the same number of states. So if both ends have a
> different number of CPUs, it is not clear how many states we should install.
> 
> We are currently discussing extending the IKEv2 standard so that we can
> negotiate the 'optimal' number of (per-CPU) SAs for a connection.
