On Wed, Jan 03, 2018 at 09:46:15AM -0800, John Fastabend wrote: > On 01/03/2018 07:50 AM, Michael S. Tsirkin wrote: > > On Tue, Jan 02, 2018 at 04:25:03PM -0800, John Fastabend wrote: > >>> > >>> More generally, what makes this usage safe? > >>> Is there a way to formalize it at the API level? > >>> > >> > >> Right I think these are good questions. I think the ptr_ring API should > >> allow a peek operation to be used without a lock. The user has to ensure > >> they only use it as a hint and if its dereferenced the user needs to > >> ensure the object is not free'd from some other codepath while it is > >> being dereferenced. The existing API seems to match this. > >> > >> This is how I used it in pfifo_fast expecting the above to be true. The > >> API allows for false negatives which _should_ be OK if the user is > >> expecting > >> this. Alternatively, we could make it false positives if you want and > >> that would also work for me considering this case is hit very rarely. > > > > By now I'm confused by which are positives and which are negatives :) > > OK so the guarantees we want would be: > > > > - empty can return false if ring is empty. > > caller must re-check with consumer lock taken > > Correct. > > > - if multiple threads call it, only one thread > > What is "it" here, __skb_array_empty() presumably. > > > is guaranteed that empty will not return true > > if ring is non empty. > > after detecting that ring is not empty, > > this thread shall .... > > > The guarantee is even weaker. > > - if __skb_array_empty() returns true, either the > ring is empty OR a consumer operation is in > progress.
Well I'm not sure that's guaranteed actually, in that in progress isn't all that well defined on multi-core systems. E.g. write of NULL could bypass index write and empty will return true on another CPU even though both completed on our CPU. I still think our use is ok, I'm just still having trouble formulating the exact rules. > > For pfifo_fast this is OK because some thread is making > progress at this point. > > > Note, even if multiple threads > are calling __skb_array_empty() there is no guarantee > any of them will return false even if the ring has > elements. > > > can you help me fill in the blank please? > > > > Did that help? > > > > > > > > > > >>>>> John, others, could you pls confirm it's not too bad performance-wise? > >>>>> I'll then split it up properly and re-post. > >>>>> > >>>> > >>>> I haven't benchmarked it but in the dequeue case taking a lock for > >>>> every priority even when its empty seems unneeded. > >>> > >>> Well it does seem to make the code more comprehensible and more robust. > >>> > >> > >> Its a trade-off between performance and robustness. > >> > >>> But OK - I was looking at fixing the unlocked empty API to make sure it > >>> actually does what it's supposed to. I posted a draft earlier in this > >>> thread, it needs to be looked at in depth to figure out whether it can > >>> ever give false negatives or positives, and document the results. > >>> > >>> > >> > >> I'll look at it. But I still think keeping a lockless version makes sense > >> for many use cases. > > > > Fine. Just let's try to document what are the guarantees. > > > > Yep. Guarantees should be (on ptr_ring implementation and similar > on skb_array implementation) > > - if __ptr_ring_empty returns true, the ring may be empty > user must use operation with consumer lock to guarantee the > ring is empty. Note, when run with concurrent consumer operations > it is possible to return true when ring has elements. This > occurs when ring consumer head rolls over ring size. Also, > producer operations may put object in the ring concurrently > making empty check incorrect. To avoid this use locked > ptr_ring_empty() API. > > - if __ptr_ring_empty returns false, the ring may have elements. > Possibly, scenario is consumer operation and __ptr_ring_empty > operation run concurrently causing false to be returned when > ring is empty. To avoid this use locked ptr_ring_empty() API. > > >>> > >>>>> --> > >>>>> > >>>>> net: don't misuse ptr_ring_peek > >>>>> > >>>>> ptr_ring_peek only claims to be safe if the result is never > >>>>> dereferenced, which isn't the case for its use in sch_generic. > >>>>> Add locked API variants and use the bh one here. > >>>>> > >>>>> Signed-off-by: Michael S. Tsirkin <m...@redhat.com> > >>>>> > >>>>> --- > >>>>> > >>>> > >>>> [...] > >>>> > >>>>> --- a/net/sched/sch_generic.c > >>>>> +++ b/net/sched/sch_generic.c > >>>>> @@ -659,7 +659,7 @@ static struct sk_buff *pfifo_fast_peek(struct Qdisc > >>>>> *qdisc) > >>>>> for (band = 0; band < PFIFO_FAST_BANDS && !skb; band++) { > >>>>> struct skb_array *q = band2list(priv, band); > >>>>> > >>>>> - skb = __skb_array_peek(q); > >>>>> + skb = skb_array_peek_bh(q); > >>>> > >>>> Ah I should have added a comment here. For now peek() is only used from > >>>> locking qdiscs. So peek and consume/produce operations will never happen > >>>> in parallel. In this case we should never hit the false negative case > >>>> with > >>>> my patch or the out of bounds reference without my patch. > > > > OK so this is the part I missed. Can you add a comment please? > > > > > > Yes, working on a documentation patch set to address misleading and missing > comments/documentation in qdisc layer now. > > >>>> Doing a peek() op without qdisc lock is a bit problematic anyways. With > >>>> current code another cpu could consume the skb and free it. Either we > >>>> can ensure a single consumer runs at a time on an array (not the same as > >>>> qdisc maybe) or just avoid peek operations in this case. My current plan > >>>> was to just avoid peek() ops altogether, they seem unnecessary with the > >>>> types of qdiscs I want to be build. > >>>> > >>>> Thanks, > >>>> John > >>> > >>> For the lockless qdisc, for net-next, do we need the patch above until > >>> you fix that though? > >>> > >> > >> No, I think after this patch (net: ptr_ring: otherwise safe empty > >> checks...) is > >> applied we do not need any additional fixes in net-next. Future work will > >> require the above patch (the one you provided) though so its useful work. > > > > The one that avoids allocating extra memory? > > > > Let me try summarize again. > > We need the extra memory slot patch if we expose the > __ptr_ring_empty/__skb_array_empty API. Otherwise we hit the out > of bounds issues. And for qdisc layer the __skb_array_empty() API > call is useful so we should not remove it. > > In addition to the qdisc layer using the __skb_array_empty() API if > we need a peek() API that works without the qdisc lock then your patch, > the one adding the locked peek API, will be needed. It is not needed > though until we have a qdisc that would use it without the lock. > > In summary we keep the extra allocation to make the __skb_array_empty() API > usable. And hold off exposing the skb_array_peek()/ptr_ring_peek() until we > have a user for it. We can roll your proposed patch into any series we come > up with for a new/updated qdisc. > > > > >> I'll do another review of the false positive case though to be sure the > >> current code is OK wrt handling false positives and any potential stalls. > > > > > > Thanks! > > > >>>>> } > >>>>> > >>>>> return skb; > >>>>>