On Wed, 9 Dec 2020 21:26:05 -0700 David Ahern wrote:
> Yes, TCP is a byte stream, so the packets could very well show up like this:
>
> +--------------+---------+-----------+---------+--------+-----+
> | data - seg 1 | PDU hdr | prev data | TCP hdr | IP hdr | eth |
> +--------------+---------+-----------+---------+--------+-----+
> +------------------------------------+---------+--------+-----+
> |          payload - seg 2           | TCP hdr | IP hdr | eth |
> +------------------------------------+---------+--------+-----+
> +---------+--------------------------+---------+--------+-----+
> | PDU hdr |     payload - seg 3      | TCP hdr | IP hdr | eth |
> +---------+--------------------------+---------+--------+-----+
>
> If your hardware can extract the NVMe payload into a targeted SGL like
> you want in this set, then it has some logic for parsing headers and
> "snapping" an SGL to a new element. ie., it already knows 'prev data'
> goes with the in-progress PDU, sees more data, recognizes a new PDU
> header and a new payload. That means it already has to handle a
> 'snap-to-PDU' style argument where the end of the payload closes out an
> SGL element and the next PDU hdr starts in a new SGL element (ie., 'prev
> data' closes out sgl[i], and the next PDU hdr starts sgl[i+1]). So in
> this case, you want 'snap-to-PDU' but that could just as easily be 'no
> snap at all', just a byte stream and filling an SGL after the protocol
> headers.
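To make the quoted description concrete, here is a rough software model of 'snap-to-PDU' placement. It is only a sketch: it assumes a made-up framing (a 4-byte big-endian payload length followed by the payload), not the real NVMe/TCP PDU format, and the names SnapToPduRx, SgElem and PAGE_SIZE are hypothetical. Each PDU header and each payload gets its own SGL element no matter where the TCP segment boundaries fall, so the 'prev data' at the start of seg 1 finishes the in-progress payload and the next PDU hdr opens a fresh element:

```python
import struct
from dataclasses import dataclass, field

# Hypothetical PDU framing (illustration only, NOT NVMe/TCP):
# a 4-byte big-endian payload length, followed by the payload bytes.
PDU_HDR_LEN = 4
PAGE_SIZE = 4096  # assumed cap so that each payload fits in one page


@dataclass
class SgElem:
    is_payload: bool
    want: int  # number of bytes this element holds once it is closed
    buf: bytearray = field(default_factory=bytearray)


class SnapToPduRx:
    """Place a TCP byte stream into SGL elements, snapping to PDU boundaries."""

    def __init__(self):
        self.sgl = []

    def _current(self):
        # Return the element still being filled, or None if the last one is closed.
        if self.sgl and len(self.sgl[-1].buf) < self.sgl[-1].want:
            return self.sgl[-1]
        return None

    def consume(self, segment: bytes) -> None:
        data = memoryview(segment)
        while data:
            elem = self._current()
            if elem is None:
                # Previous element closed: the next byte starts a new PDU header.
                elem = SgElem(is_payload=False, want=PDU_HDR_LEN)
                self.sgl.append(elem)
            take = min(len(data), elem.want - len(elem.buf))
            elem.buf += data[:take]
            data = data[take:]
            if not elem.is_payload and len(elem.buf) == elem.want:
                # Header complete: read the length, open a separate payload element.
                (plen,) = struct.unpack(">I", elem.buf)
                if plen > PAGE_SIZE:
                    raise ValueError(f"payload of {plen} bytes exceeds one page")
                self.sgl.append(SgElem(is_payload=True, want=plen))


rx = SnapToPduRx()
stream = struct.pack(">I", 6) + b"abcdef" + struct.pack(">I", 3) + b"xyz"
# Segment boundaries deliberately ignore PDU boundaries.
for seg in (stream[:3], stream[3:8], stream[8:15], stream[15:]):
    rx.consume(seg)
print([(e.is_payload, bytes(e.buf)) for e in rx.sgl])
```

Even with the stream cut at 3, 5, 7 and 2 bytes, the result is four elements: header, 6-byte payload, header, 3-byte payload, so each payload sits alone in its own buffer. The catch is that the receiver must parse every header in stream order, i.e. it needs per-connection state.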
This 'snap-to-PDU' requirement is something I don't understand about the current TCP zero copy. In the case of, say, a storage application which wants to send some headers (whatever RPC info, block number, etc.) and then a 4k block of data, how does the RX side get just the 4k block into a page so it can zero copy it out to its storage device? Per-connection state in the NIC and FW parsing headers is one way, but I wonder how this record split problem is best resolved generically. Perhaps by passing hints in the headers somehow? Sorry for the slight off-topic :)