Hi Cristian,

I agree with you, the natural way to store application metadata in mbufs
consists of putting it right after the rte_mbuf data structure. This can
simply be implemented as follows:

    struct metadata {
        ...
    };

    struct app_mbuf {
        struct rte_mbuf mbuf;
        struct metadata mtdt;
    };

With such a representation, the application initializes the "buf_addr"
field of each mbuf pointed to by a (struct app_mbuf *)amb pointer as:

    amb->mbuf.buf_addr = (char *)amb + sizeof(struct app_mbuf);

But such a natural programming approach breaks the assumptions of the
macros RTE_MBUF_FROM_BADDR, RTE_MBUF_TO_BADDR, RTE_MBUF_INDIRECT, and
RTE_MBUF_DIRECT.

For instance, in the function __rte_pktmbuf_prefree_seg(m), after the line

    struct rte_mbuf *md = RTE_MBUF_FROM_BADDR(m->buf_addr);

"md" is always different from "m", so "m" is systematically (and most of
the time wrongly) considered as an indirect mbuf.

In the same way, the operations done by the function rte_pktmbuf_detach(m)

    void *buf = RTE_MBUF_TO_BADDR(m);
    m->buf_addr = buf;

do not set buf_addr back to its [default] correct value.

To summarize, adding metadata after the rte_mbuf data structure is
incompatible with the packet cloning feature behind the [wrongly named]
RTE_MBUF_SCATTER_GATHER configuration option.
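
For illustration, here is a minimal standalone sketch of why the
direct/indirect test misfires with such a layout. The struct definitions
and field names are stand-ins invented for the example, not the real DPDK
ones; only the shape of the two macros mirrors rte_mbuf.h:

    #include <stdio.h>
    #include <stdint.h>

    struct rte_mbuf { void *buf_addr; uint8_t pad[56]; };  /* stand-in, not the real mbuf */
    struct metadata { uint32_t flow_id; uint32_t color; }; /* hypothetical app metadata   */
    struct app_mbuf { struct rte_mbuf mbuf; struct metadata mtdt; };

    /* same shape as the macros in lib/librte_mbuf/rte_mbuf.h */
    #define RTE_MBUF_FROM_BADDR(ba) (((struct rte_mbuf *)(ba)) - 1)
    #define RTE_MBUF_DIRECT(mb)     (RTE_MBUF_FROM_BADDR((mb)->buf_addr) == (mb))

    int main(void)
    {
        static union { struct app_mbuf amb; uint8_t raw[2048]; } u;
        struct app_mbuf *amb = &u.amb;
        struct rte_mbuf *m = &amb->mbuf;

        /* data buffer placed right after the application metadata */
        m->buf_addr = (char *)amb + sizeof(struct app_mbuf);

        /* RTE_MBUF_FROM_BADDR(m->buf_addr) lands sizeof(struct metadata)
         * bytes (plus padding) past "m", so it never equals "m" and the
         * mbuf is wrongly classified as indirect: this prints 0. */
        printf("RTE_MBUF_DIRECT(m) = %d\n", RTE_MBUF_DIRECT(m));
        return 0;
    }
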
By the way, you suggest to use the headroom to also store packet metadata.
But, then, how can an application store both types of information in a
given mbuf at the same time, when the actual length of network headers in
a mbuf is variable, as it depends on the protocol path followed by the
packet in the networking stack (tunnels, etc.)?

Regards,
Ivan

On 05/30/2014 12:28 AM, Dumitrescu, Cristian wrote:
> Hi Ivan,
>
> I hate to disagree with you :), but no, we do not break the
> scatter-gather feature. We actually implemented the Packet Framework
> IPv4 fragmentation and reassembly ports to validate exactly this.
>
> Maybe the confusion comes from the RTE_MBUF_TO_BADDR macro. This macro
> only works (and it is only used) for direct mbufs, so probably the
> correct name for this macro should be RTE_MBUF_DIRECT_TO_BADDR, as it
> cannot work (and it is not used) for indirect mbufs.
>
> I am describing the rationale behind the meta-data design below, sorry
> for the long description that looks like a design document ...
>
> Quick recap:
> - mbuf->buf_addr points to the first address where a byte of data for
>   the current segment _can_ be placed
> - direct mbuf: the packet descriptor (mbuf) sits in the same mempool
>   buffer with the packet itself, so mbuf->buf_addr = mbuf + 1;
> - indirect mbuf: the packet descriptor is located in a different mempool
>   buffer than the packet itself, so mbuf->buf_addr != mbuf + 1;
> - Regardless of the mbuf type, it is not necessarily true that the first
>   byte of data is located at mbuf->buf_addr, as we save a headroom
>   (configurable, by default CONFIG_RTE_PKTMBUF_HEADROOM = 128 bytes) at
>   the start of the data buffer (mbuf->buf_addr) for prepending packet
>   headers (... or, why not, meta-data!). The location of the first real
>   data byte is mbuf->pkt.data, which is variable, as opposed to
>   mbuf->buf_addr, which does not change.
>
> Packet meta-data rationale:
> - I think we both agree that we need to be able to store packet
>   meta-data in the packet buffer. The packet buffer regions where
>   meta-data could be stored can only be in the headroom or in the
>   tailroom, which are both managed by the application and both go up and
>   down during the packet lifetime.
>
> - To me, packet meta-data is part of the packet descriptor: mbuf is the
>   mandatory & standard part of the packet descriptor, while meta-data is
>   the optional & non-standard (per application) part of the packet
>   descriptor.
>   Therefore, it makes sense to put the meta-data immediately after mbuf,
>   but this is not mandatory.
>
> - The zero-size field called meta-data is really just an offset: it
>   points to the first buffer location where meta-data _can_ be placed.
>   The reason for having this field is to provide a standard way to place
>   meta-data in the packet buffer rather than hide it in the Packet
>   Framework libraries and potentially conflict with other mbuf changes
>   in the future. It flags that meta-data should be taken into
>   consideration when any mbuf change is done in the future.
>
> - For direct mbufs, the same buffer space (the headroom) is shared
>   between packet data (header prepending) and meta-data, similar to how
>   the stack and the heap manage the same memory. Of course, since it is
>   managed by the application, it is the responsibility of the
>   application to make sure the packet bytes and the meta-data do not
>   step on one another, but this problem is not at all new, nor
>   introduced by the meta-data field: even currently, the application has
>   to make sure the headroom is dimensioned correctly, so that even in
>   the worst case scenario (application-specific), the packet bytes do
>   not run into the mbuf fields, right?
>
> - For indirect mbufs, the packet meta-data is associated with the
>   indirect mbuf rather than the direct mbuf (the one the indirect mbuf
>   attaches to, as the direct mbuf contains a different packet that has
>   different meta-data), so it makes sense that the meta-data of the
>   indirect mbuf is stored in the same buffer as the indirect mbuf
>   (rather than the buffer of the attached direct mbuf). This means that
>   the buffer used to store the indirect mbuf is sized appropriately
>   (mbuf size, currently 64 bytes, plus the max size for additional
>   meta-data). This is illustrated in the code of the IPv4 fragmentation
>   port, where for every child packet (output fragment) we copy the
>   parent meta-data into the child buffer (which stores an indirect mbuf
>   attached to the direct buffer of the input jumbo, plus now the
>   additional meta-data).
>
> - We are also considering having a user_data field in the mbuf itself to
>   point to meta-data in the same buffer or any other buffer. The Packet
>   Framework functionally works with both approaches, but there is a
>   performance problem associated with the mbuf->user_data approach that
>   we are not addressing for this 1.7 release timeframe. The issue is the
>   data dependency that is created, as in order to find out the location
>   of the meta-data, we need to first read the mbuf and then read the
>   meta-data from mbuf->user_data. The cost of the additional memory
>   access is high, due to the data dependency preventing efficient
>   prefetch, therefore (at least for the time being) we need to use a
>   compile-time known meta-data offset. The application can fill the
>   meta-data on packet RX, and then all additional CPU cores processing
>   the packet can live off the meta-data for a long time with no need to
>   access the mbuf or the packet (efficiency). For the time being, if
>   somebody needs yet more meta-data in yet another buffer, they can add
>   (for their application) a user_data pointer as part of their
>   application meta-data (instead of in the standard mbuf).
>
> - As said, mbuf->metadata points to the first location where meta-data
>   _can_ be placed; if somebody needs a different offset, they can add it
>   on top of mbuf->metadata (e.g. by having a reserved field in their
>   struct app_pkt_metadata). We have demonstrated the use of meta-data in
>   the examples/ip_pipeline sample app (see struct app_pkt_metadata in
>   "main.h").
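
For concreteness, here is a minimal application-side sketch of the scheme
described above, assuming the metadata[] zero-size field and the
RTE_MBUF_METADATA_*() accessors from the patch quoted at the bottom of
this thread; struct app_pkt_metadata and its fields are invented for the
example and are not the ones from examples/ip_pipeline:

    #include <stdint.h>
    #include <rte_mbuf.h>

    /* hypothetical per-packet application meta-data */
    struct app_pkt_metadata {
        uint32_t next_hop;
        uint32_t color;
    } __attribute__((__packed__));

    /* Meta-data at offset 0, i.e. right after the mbuf: in the headroom of
     * a direct mbuf, or right after the descriptor of an indirect mbuf.
     * The headroom must be dimensioned so that prepended headers never
     * reach down into this area. */
    static inline struct app_pkt_metadata *
    app_pkt_metadata_get(struct rte_mbuf *m)
    {
        return (struct app_pkt_metadata *)RTE_MBUF_METADATA_UINT8_PTR(m, 0);
    }

    /* The RX core fills the meta-data once, e.g.:
     *     app_pkt_metadata_get(m)->next_hop = route_lookup(m);
     * and later pipeline stages read it without touching the packet:
     *     uint32_t nh = app_pkt_metadata_get(m)->next_hop;
     * (route_lookup() is a placeholder for application code.) */
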
>
> Please let me know if this makes sense?
>
> Regards,
> Cristian
>
> PS: This is where a white board would help a lot ...
>
>
> -----Original Message-----
> From: Ivan Boule [mailto:ivan.boule at 6wind.com]
> Sent: Wednesday, May 28, 2014 1:03 PM
> To: Dumitrescu, Cristian; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 04/29] mbuf: added offset of packet meta-data in the packet buffer just after mbuf
>
> Hi Cristian,
>
> Currently, the DPDK framework does not make any assumption on the actual
> layout of a mbuf.
> More precisely, the DPDK does not impose any constraint on the actual
> location of additional metadata, if any, or on the actual location and
> size of its associated payload data buffer.
> This is coherent with the fact that mbuf pools are not created by the
> DPDK itself, but by network applications that are free to choose
> whatever packet mbuf layout fits their particular needs.
>
> There is one exception to this basic DPDK rule: the mbuf cloning feature
> available through the RTE_MBUF_SCATTER_GATHER configuration option
> assumes that the payload data buffer of the mbuf immediately follows the
> rte_mbuf data structure (see the macros RTE_MBUF_FROM_BADDR,
> RTE_MBUF_TO_BADDR, RTE_MBUF_INDIRECT, and RTE_MBUF_DIRECT in the file
> lib/librte_mbuf/rte_mbuf.h).
>
> The cloning feature prevents building packet mbufs with their metadata
> located immediately after the rte_mbuf data structure, which is exactly
> what your patch introduces.
>
> At least, a comment that clearly warns the user of this incompatibility
> might be worth adding to both the code and your patch log.
>
> Regards,
> Ivan
>
> On 05/27/2014 07:09 PM, Cristian Dumitrescu wrote:
>> Added a zero-size field (offset in the data structure) to specify the
>> beginning of packet meta-data in the packet buffer, just after the mbuf.
>>
>> The size of the packet meta-data is application specific and the packet
>> meta-data is managed by the application.
>>
>> The packet meta-data should always be accessed through the provided
>> macros.
>>
>> This is used by the Packet Framework libraries (port, table, pipeline).
>>
>> There is absolutely no performance impact due to this mbuf field, as it
>> does not take any space in the mbuf structure (zero-size field).
>>
>> Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu at intel.com>
>> ---
>>  lib/librte_mbuf/rte_mbuf.h | 17 +++++++++++++++++
>>  1 files changed, 17 insertions(+), 0 deletions(-)
>>
>> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
>> index 4a9ab41..bf09618 100644
>> --- a/lib/librte_mbuf/rte_mbuf.h
>> +++ b/lib/librte_mbuf/rte_mbuf.h
>> @@ -201,8 +201,25 @@ struct rte_mbuf {
>>  		struct rte_ctrlmbuf ctrl;
>>  		struct rte_pktmbuf pkt;
>>  	};
>> +
>> +	union {
>> +		uint8_t metadata[0];
>> +		uint16_t metadata16[0];
>> +		uint32_t metadata32[0];
>> +		uint64_t metadata64[0];
>> +	};
>>  } __rte_cache_aligned;
>>
>> +#define RTE_MBUF_METADATA_UINT8(mbuf, offset) (mbuf->metadata[offset])
>> +#define RTE_MBUF_METADATA_UINT16(mbuf, offset) (mbuf->metadata16[offset/sizeof(uint16_t)])
>> +#define RTE_MBUF_METADATA_UINT32(mbuf, offset) (mbuf->metadata32[offset/sizeof(uint32_t)])
>> +#define RTE_MBUF_METADATA_UINT64(mbuf, offset) (mbuf->metadata64[offset/sizeof(uint64_t)])
>> +
>> +#define RTE_MBUF_METADATA_UINT8_PTR(mbuf, offset) (&mbuf->metadata[offset])
>> +#define RTE_MBUF_METADATA_UINT16_PTR(mbuf, offset) (&mbuf->metadata16[offset/sizeof(uint16_t)])
>> +#define RTE_MBUF_METADATA_UINT32_PTR(mbuf, offset) (&mbuf->metadata32[offset/sizeof(uint32_t)])
>> +#define RTE_MBUF_METADATA_UINT64_PTR(mbuf, offset) (&mbuf->metadata64[offset/sizeof(uint64_t)])
>> +
>>  /**
>>   * Given the buf_addr returns the pointer to corresponding mbuf.
>>   */
>>
>

--
Ivan Boule
6WIND Development Engineer
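
As a closing illustration of the headroom-sharing point discussed above,
here is a sketch of how an mbuf pool could be dimensioned so that the
headroom has room for both the application meta-data and the worst-case
prepended headers. The sizes, names and pool parameters below are
assumptions invented for the example; the creation call follows the usual
rte_mempool_create() / rte_pktmbuf_pool_init() / rte_pktmbuf_init()
pattern from the DPDK sample applications:

    #include <rte_mbuf.h>
    #include <rte_mempool.h>
    #include <rte_lcore.h>

    #define APP_MAX_METADATA   64    /* worst-case application meta-data           */
    #define APP_MAX_PREPEND    64    /* worst-case prepended headers (tunnels ...)  */
    #define APP_DATA_ROOM      2048  /* room for the packet itself                  */

    /* CONFIG_RTE_PKTMBUF_HEADROOM must be >= APP_MAX_METADATA + APP_MAX_PREPEND
     * for meta-data and prepended headers to safely share the headroom. */
    #define APP_MBUF_SIZE \
            (sizeof(struct rte_mbuf) + RTE_PKTMBUF_HEADROOM + APP_DATA_ROOM)

    static struct rte_mempool *
    app_create_mbuf_pool(void)
    {
            return rte_mempool_create("app_mbuf_pool",
                    8192,                  /* number of mbufs       */
                    APP_MBUF_SIZE,         /* mempool element size  */
                    256,                   /* per-lcore cache size  */
                    sizeof(struct rte_pktmbuf_pool_private),
                    rte_pktmbuf_pool_init, NULL,   /* pool constructor     */
                    rte_pktmbuf_init, NULL,        /* per-mbuf constructor */
                    rte_socket_id(), 0);
    }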