> -----Original Message-----
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Yongseok Koh
> Sent: Tuesday, April 10, 2018 2:59 AM
> To: Olivier Matz <olivier.m...@6wind.com>
> Cc: Lu, Wenzhuo <wenzhuo...@intel.com>; Wu, Jingjing <jingjing...@intel.com>;
> adrien.mazarg...@6wind.com; nelio.laranje...@6wind.com; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2 1/6] mbuf: add buffer offset field for
> flexible indirection
>
> On Mon, Apr 09, 2018 at 06:04:34PM +0200, Olivier Matz wrote:
> > Hi Yongseok,
> >
> > On Tue, Apr 03, 2018 at 05:12:06PM -0700, Yongseok Koh wrote:
> > > On Tue, Apr 03, 2018 at 10:26:15AM +0200, Olivier Matz wrote:
> > > > Hi,
> > > >
> > > > On Mon, Apr 02, 2018 at 11:50:03AM -0700, Yongseok Koh wrote:
> > > > > When attaching a mbuf, an indirect mbuf has to point to the start
> > > > > of the buffer of the direct mbuf. By adding a buf_off field to
> > > > > rte_mbuf, this becomes more flexible. An indirect mbuf can point
> > > > > to any part of the direct mbuf by calling rte_pktmbuf_attach_at().
> > > > >
> > > > > Possible use-cases could be:
> > > > > - If a packet has multiple layers of encapsulation, multiple
> > > > >   indirect buffers can reference different layers of the
> > > > >   encapsulated packet.
> > > > > - A large direct mbuf can even contain multiple packets in series
> > > > >   and each packet can be referenced by multiple mbuf indirections.
> > > > >
> > > > > Signed-off-by: Yongseok Koh <ys...@mellanox.com>
> > > >
> > > > I think the current API is already able to do what you want.
> > > >
> > > > 1/ Here is a mbuf m with its data
> > > >
> > > >      off
> > > >      <-->
> > > >               len
> > > >     +----+   <---------->
> > > >     |    |
> > > >   +-|----v----------------------+
> > > >   | |    -----------------------|
> > > > m | buf  |    XXXXXXXXXXX      ||
> > > >   |      -----------------------|
> > > >   +-----------------------------+
> > > >
> > > > 2/ clone m:
> > > >
> > > >   c = rte_pktmbuf_alloc(pool);
> > > >   rte_pktmbuf_attach(c, m);
> > > >
> > > > Note that c has its own offset and length fields.
> > > >
> > > >      off
> > > >      <-->
> > > >               len
> > > >     +----+   <---------->
> > > >     |    |
> > > >   +-|----v----------------------+
> > > >   | |    -----------------------|
> > > > m | buf  |    XXXXXXXXXXX      ||
> > > >   |      -----------------------|
> > > >   +------^----------------------+
> > > >          |
> > > >     +----+
> > > >     | indirect
> > > >   +-|---------------------------+
> > > >   | |    -----------------------|
> > > > c | buf  |                     ||
> > > >   |      -----------------------|
> > > >   +-----------------------------+
> > > >
> > > >     off        len
> > > >     <--><---------->
> > > >
> > > > 3/ remove some data from c without changing m
> > > >
> > > >   rte_pktmbuf_adj(c, 10)   // at head
> > > >   rte_pktmbuf_trim(c, 10)  // at tail
> > > >
> > > > Please let me know if it fits your needs.
> > >
> > > No, it doesn't.
> > >
> > > Trimming head and tail with the current APIs removes data and makes
> > > the space available. Adjusting the packet head means giving more
> > > headroom, not shifting the buffer itself. If m has two indirect mbufs
> > > (c1 and c2) and those point to different offsets in m,
> > >
> > >   rte_pktmbuf_adj(c1, 10);
> > >   rte_pktmbuf_adj(c2, 20);
> > >
> > > then the owner of c2 regards the first (off+20)B as available
> > > headroom. If it wants to attach an outer header, it will overwrite
> > > the headroom even though the owner of c1 is still accessing it.
> > > Instead, another mbuf (h1) for the outer header should be linked by
> > > h1->next = c2.
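> > >
> > > As a rough sketch of that chaining with the existing API (hdr_pool,
> > > hdr_len and outer_hdr are hypothetical names here, and error checks
> > > are omitted):
> > >
> > >   struct rte_mbuf *h1 = rte_pktmbuf_alloc(hdr_pool);
> > >
> > >   /* Write the outer header into h1's own buffer, not into the
> > >    * headroom shared with the other clone. */
> > >   char *hdr = rte_pktmbuf_append(h1, hdr_len);
> > >   memcpy(hdr, outer_hdr, hdr_len);
> > >
> > >   /* Link the header mbuf in front of the indirect clone and fix up
> > >    * the segment accounting; rte_pktmbuf_chain(h1, c2) would do the
> > >    * same bookkeeping. */
> > >   h1->next = c2;
> > >   h1->nb_segs = c2->nb_segs + 1;
> > >   h1->pkt_len = h1->data_len + c2->pkt_len;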
> >
> > Yes, after these operations c1, c2 and m should become read-only. So,
> > to prepend headers, another mbuf has to be inserted before, as you
> > suggest. It is possible to wrap this in a function
> > rte_pktmbuf_clone_area(m, offset, length) that will:
> > - alloc and attach an indirect mbuf for each segment of m that is in
> >   the range [offset : length+offset].
> > - prepend an empty and writable mbuf for the headers
> >
> > > If c1 and c2 are attached with shifting the buffer address by
> > > adjusting buf_off, which actually shrinks the headroom, this case
> > > can be properly handled.
> >
> > What do you mean by properly handled?
> >
> > Yes, prepending data or adding data in the indirect mbuf won't
> > override the direct mbuf. But prepending data or adding data in the
> > direct mbuf m won't be protected.
> >
> > From an application point of view, indirect mbufs, or direct mbufs
> > that have refcnt != 1, should both be considered read-only because
> > they may share their data. How can an application know if the data is
> > shared or not?
> >
> > Maybe we need a flag to differentiate mbufs that are read-only
> > (something like SHARED_DATA, or simply READONLY). In your case, if my
> > understanding is correct, you want to have indirect mbufs with RW
> > data.
>
> Agree that an indirect mbuf must be treated as read-only. Then the
> current code is enough to handle that use-case.
>
> > > And another use-case (this is my actual use-case) is to make a large
> > > mbuf have multiple packets in series. AFAIK, this will also be
> > > helpful for some FPGA NICs, because they transfer multiple packets
> > > into a single large buffer to reduce PCIe overhead for small-packet
> > > traffic, like the Multi-Packet Rx of mlx5 does. Otherwise, packets
> > > would have to be memcpy'd to regular mbufs one by one instead of
> > > being referenced indirectly.
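> > >
> > > A minimal sketch of that use-case with the proposed API, assuming
> > > rte_pktmbuf_attach_at(mi, m, off) takes the byte offset into the
> > > direct buffer (ind_pool, big, pkt_off and pkt_len are hypothetical
> > > names; the offsets would come from the driver's completion
> > > processing):
> > >
> > >   /* big is one large direct mbuf holding several received packets
> > >    * back to back. */
> > >   struct rte_mbuf *seg = rte_pktmbuf_alloc(ind_pool);
> > >
> > >   /* Attach seg so that its buffer starts inside big's buffer at
> > >    * pkt_off, rather than at big->buf_addr as plain attach() does. */
> > >   rte_pktmbuf_attach_at(seg, big, pkt_off);
> > >   seg->data_len = pkt_len;
> > >   seg->pkt_len = pkt_len;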
But just to make the HW RX multiple packets into one mbuf, data_off inside
the indirect mbuf should be enough, correct?
As I understand it, what you'd like to achieve with this new field is the
ability to manipulate packet boundaries after RX, probably at an upper
layer. As Olivier pointed out above, that doesn't sound like a safe
approach, as you would have multiple indirect mbufs trying to modify the
same direct buffer. Though if you really need to do that, couldn't it be
achieved by updating the buf_len and priv_size fields of the indirect
mbufs straight after attach()?
Konstantin

> > >
> > > Does this make sense?
> >
> > I understand the need.
> >
> > Another option would be to make mbuf->buffer point to an external
> > buffer (not inside the direct mbuf). This would require adding a
> > mbuf->free_cb. See "Mbuf with external data buffer" (page 19) in [1]
> > for a quick overview.
> >
> > [1] https://dpdksummit.com/Archive/pdf/2016Userspace/Day01-Session05-OlivierMatz-Userspace2016.pdf
> >
> > The advantage is that it does not require the large data to be inside
> > a mbuf (requiring a mbuf structure before the buffer, and requiring it
> > to be allocated from a mempool). On the other hand, it is maybe more
> > complex to implement compared to your solution.
>
> I knew that you had presented those slides and, frankly, I had
> considered that option at first. But even with that option, metadata to
> store the refcnt would have to be allocated and managed anyway. The
> kernel likewise maintains skb_shared_info at the end of the data
> segment. Even though it could have a smaller metadata structure, I just
> wanted to make full use of the existing framework because it is less
> complex, as you mentioned. Given that you presented the idea of an
> external data buffer in 2016 and there haven't been many follow-up
> discussions/activities since, I thought the demand isn't that big yet,
> so I wanted to keep this patch simpler. I personally think we can take
> up the idea of an external data segment when more demand comes from
> users in the future, as it would be a huge change and may break the
> current ABI/API. When that day comes, I'll gladly participate in the
> discussions and write code for it if I can be helpful.
>
> Do you think this patch is okay for now?
>
> Thanks for your comments,
> Yongseok
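
For illustration, a minimal sketch of one possible reading of the
field-update approach suggested above: shift the indirect mbuf's buffer
window straight after a regular attach(), instead of adding a new buf_off
field. The helper name is hypothetical and error handling is omitted:

  #include <rte_mbuf.h>

  static inline void
  pktmbuf_attach_at(struct rte_mbuf *mi, struct rte_mbuf *m, uint16_t off)
  {
          rte_pktmbuf_attach(mi, m);  /* mi now shares m's buffer */

          /* Move the start of mi's visible buffer forward by off bytes,
           * keeping the IO address in sync and shrinking the length. */
          mi->buf_addr = (char *)mi->buf_addr + off;
          mi->buf_iova += off;
          mi->buf_len -= off;
          mi->data_off = 0;           /* data begins at the new window */
  }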