hi Sam,
Pls. find my replies inline.

On Sep 14, 2014, at 9:12 PM, Samuel Ghinet <sghi...@cloudbasesolutions.com>
 wrote:

> Hello Nithin,
> 
> Thanks a lot again for the review!

Welcome. As you said, the code is intricate. It is fine though. Thanks for 
working on this one.

> -------------------------
>> More than that I was a little concerned about the changes in OvsTunnelTx to 
>> split a TSOed NBL (with multiple NBs) into multiple smaller NBLs. I believe 
>> this would have a performance impact (when we measure it).
> 
>> The reason why we did not call OvsDoFlowLookupOutput() on each NB earlier is 
>> that we know all of the NBs have the same L2, L3 and L4 header. The fields 
>> that are different in the L3 and L4 headers are not part of the flow key, so 
>> there's no reason to do a separate flow lookup and actions execute. It is an 
>> optimization we had, and there was no correctness issue with that 
>> optimization.
> 
>> But, I understand that this does not suit the change you want to make where 
>> you want to update the pipeline to process one NB at a time rather than NBL 
>> at a time. Can I ask what is the advantage of that change? This change can 
>> potentially cause bad performance.
> There was an email I had sent about this, long ago.
> And the answer is that MS, in its doc, makes very few guarantees about the 
> things shared in common by the NBs in a NBL.
> Namely, we do have L2 the same. Yes.
> L3? we don't. We have here only the protocol type the same. If the packet is 
> TCP or UDP, we also have src ip and src destination the same, and src ports 
> and dest ports the same. But that's quite it.
> There are a lot of fields in the packet info / flow keys regarding L3 and L4 
> headers that the MS documentation does not give any guarantee about.
> We could theoretically assume, for instance, that if we have NBs of a tcp 
> connection, we would generally treat them the same. But practically, we can 
> apply a 'set action' if tcp flags = X (which may be for only one of the NBs 
> in the NBL), or if it is the first ipv4 / ipv6 fragment in a series or not, 
> etc. (which may all be NBs in the same NBL).
> 
> An optimized version of the "NBs-> NBLs" would have to do advanced flow match 
> checks, where it knows that say, tcp flags = wildcard, the same with others, 
> and it knows it needs not check flow match for the NBs that follow in the 
> NBL. 

Sam,
One of us is missing something here. Let me clarify what I meant to say.

VM sends a TSO packet which is say 20k, and MSS is 1k. Let's assume that entire 
20k is in one NB ie. the NBL has only 1 NB. Such a packet would be subject to 
tunneling in OvsDoEncapVxlan(). OvsDoEncapVxlan() calls into 
OvsTcpSegmentNBL(), which creates 20 NBs. Each of the NBs is guaranteed to have 
the same L2, L3 and L4, except for the following differences:
1. TCP sequence numbers
2. TCP length
3. TCP checksum

None of these fields that are different between the different NBs are part of 
the same key. So, we don't need to do a separate flow lookup for each of them.

Furthermore, it is NDIS that is creating such an NBL, but it is OVS ie. the 
function OvsTcpSegmentNBL(). So, it is guaranteed that all the NBs fall into 
the same flow key.

> We could have an IRC chat on this, if you think it could use more 
> clarifications.

Sure, we could chat more on IRC about this.

> 
>>> +    atDispatch = NDIS_TEST_SEND_AT_DISPATCH_LEVEL(sendFlags) ?
>>> +                    NDIS_RWL_AT_DISPATCH_LEVEL : 0;
>> 
>> If there's some way to check if the 'switchContext->dispatchLock' is already 
>> taken, we should do it. At a minimum, we can ASSERT if we are already at 
>> DISPATCH level. I'm implying that the DPC level would have been upgrated to 
>> DISPATCH_LEVEL because we have taken the 'switchContext->dispatchLock'.
> Actually, we should be at DPC level, because OvsProcessOneNb is called by the 
> NDIS Send callback, which is always called (from observation) at DPC level.

Right, we can assume the the NDIS Send callback happens at DISPATCH_LEVEL or 
since we hold a spin lock, we can definitely be sure that we are at 
DISPATCH_LEVEL. So, we don't need 'atDispatch'.


>>> +    ASSERT(KeGetCurrentIrql() == DISPATCH_LEVEL);
>> 
>> Cool, we have this ASSERT. Let's assume that we are at DISPATCH_LEVEL then. 
>> We don't need the 'atDispatch' variable.
> Actually, I think I should remove this ASSERT instead. Even though, from 
> observation, the NDIS Send callback is always called at dispatch level: they 
> give us a flag, and I think it is better to use that only.

Yes, that and also we hold a spinlock already.

>>> +    newNbl = OvsSplitNblByNB(origNbl, switchContext, TRUE);
>>> +    if (!newNbl) {
>>> +        RtlInitUnicodeString(&filterReason, L"Cannot allocate new NBL: 
>>> partial"
>>> +                             L"copy NB to multiple NBLs.");
>>> +        status = NDIS_STATUS_RESOURCES;
>>> +        goto Cleanup;
>>> +    }
>> 
>> Jumping to 'Cleanup' here or from code down the line would add 'origNbl' to 
>> the completion list, and we'd end up calling OVSCompleteNBl() on it. But, 
>> the refCount is already 0, so we can have bad behavior. You can fix it 
>> either by not jumping to Cleanup or by preserving the refCount and letting 
>> code code flow through.
> Actually, we have 3 cases:
> 1. if OvsSplitNblByNB returns NULL - fixing the refcount decrement to be done 
> after the partial copy will solve the problem
> 2. if origNbl has one NB: no deref happens.
> 3. if origNbl has N NBs (N > 1). Then origNbl will have refCount = N 
> (normally, N + 1, 1 for the initial value, but the deref happened, resulting 
> in refCount == N). And this allows for the completion of the children NBLs 
> (i.e. cloned) to also complete the parent (original).

OK, sounds good. I missed that OvsSplitNblByNB() does not decrement the 
refCount is it has only one NB.

>> We generally use all lowercase for the the label. I can update the 
>> CodingStyle to make this explicit. What do you think? Having both lower and 
>> upper case makes it a little untidy.
> I would strongly prefer labels starting with uppercase (like Cleanup).

OK, that is fine. We'll update all the labels to be uppercase then or have a 
mix.

>>> +    /* Queue the missed packets. */
>>> +    OvsQueuePackets(OVS_DEFAULT_PACKET_QUEUE, &missedPackets, 
>>> countMissedPackets);
>> 
>> A small optimization is to call OvsQueuePackets() only if countMissedPackets 
>> > 0. Also, the line is longer than 79 characters. You can reduce the length 
>> if possible.
> Hmm, perhaps it would be better to put the check inside OvsQueuePackets: this 
> would make the code clearer if other callers would like this optimizations.

Sounds good. I am ok with this.
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Reply via email to