Hi Cristian,

I agree with you: the natural way to store application metadata in
mbufs is to place it right after the rte_mbuf data structure.
This can be implemented simply as follows:

struct metadata {
     ...                        /* application-specific fields */
};

struct app_mbuf {
     struct rte_mbuf mbuf;      /* standard mbuf comes first */
     struct metadata mtdt;      /* application metadata right after it */
};

With such a representation, the application initializes the "buf_addr"
field of each mbuf, pointed to by a pointer "amb" of type
(struct app_mbuf *), as:

     amb->mbuf.buf_addr = (char *)amb + sizeof(struct app_mbuf);

But such a natural programming approach breaks the assumptions of the 
macros RTE_MBUF_FROM_BADDR, RTE_MBUF_TO_BADDR, RTE_MBUF_INDIRECT, and 
RTE_MBUF_DIRECT.

For instance, in the function __rte_pktmbuf_prefree_seg(m), after the line
     struct rte_mbuf *md = RTE_MBUF_FROM_BADDR(m->buf_addr);
"md" always differs from "m", so "m" is systematically (and most of the
time wrongly) treated as an indirect mbuf.
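
For reference, the relevant macros are defined along these lines in
lib/librte_mbuf/rte_mbuf.h (paraphrased here, and ignoring any padding
the compiler may insert between "mbuf" and "mtdt"):

     /* rte_mbuf.h (sketch): map a buffer address back to its mbuf */
     #define RTE_MBUF_FROM_BADDR(ba)  (((struct rte_mbuf *)(ba)) - 1)
     #define RTE_MBUF_TO_BADDR(mb)    (((char *)(mb)) + sizeof(struct rte_mbuf))

     /* With the app_mbuf layout above:
      *   m->buf_addr == (char *)m + sizeof(struct app_mbuf)
      * so:
      *   md == (struct rte_mbuf *)(m->buf_addr) - 1
      *      == (struct rte_mbuf *)((char *)m + sizeof(struct metadata))
      *      != m
      */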

In the same way, the operations done by the function rte_pktmbuf_detach(m)
     void *buf = RTE_MBUF_TO_BADDR(m);
     m->buf_addr = buf;
do not reset buf_addr to its correct default value.
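
Concretely (still ignoring padding), a sketch of the mismatch:

     /* What rte_pktmbuf_detach() stores into m->buf_addr:
      *     (char *)m + sizeof(struct rte_mbuf)
      * What the app_mbuf layout actually requires:
      *     (char *)m + sizeof(struct app_mbuf)
      * The sizeof(struct metadata) bytes in between are then treated as
      * packet buffer space, silently overwriting the metadata.
      */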

To summarize, adding metadata after the rte_mbuf data structure is
incompatible with the packet cloning feature behind the [wrongly named]
RTE_MBUF_SCATTER_GATHER configuration option.

By the way, you suggest using the headroom to also store packet metadata.
But then, how can an application store both types of information
in a given mbuf at the same time, when the actual length of network
headers in an mbuf is variable, as it depends on the protocol path
followed by the packet in a networking stack (tunnels, etc.)?

Regards,
Ivan


On 05/30/2014 12:28 AM, Dumitrescu, Cristian wrote:
> Hi Ivan,
>
> I hate to disagree with you :), but no, we do not break the scatter-gather 
> feature. We actually implemented the Packet Framework IPv4 fragmentation and 
> reassembly ports to validate exactly this.
>
> Maybe the confusion comes from the RTE_MBUF_TO_BADDR macro. This macro only 
> works (and it is only used) for direct mbufs, so probably the correct name 
> for this macro should be RTE_MBUF_DIRECT_TO_BADDR, as it cannot work (and it 
> is not used) for indirect mbufs.
>
> I am describing the rationale behind meta-data design below, sorry for the 
> long description that looks like a design document ...
>
> Quick recap:
> - mbuf->buf_addr points to the first address where a byte of data for the 
> current segment _can_ be placed
> - direct mbuf: the packet descriptor (mbuf) sits in the same mempool buffer 
> with the packet itself, so mbuf->buf_addr = mbuf + 1;
> - indirect mbuf: the packet descriptor is located in a different mempool 
> buffer than the packet itself, so mbuf->buf_addr != mbuf + 1;
> - Regardless of the mbuf type, it is not necessarily true that the first byte 
> of data is located at mbuf->buf_addr, as we save a headroom (configurable, by 
> default initially of CONFIG_RTE_PKTMBUF_HEADROOM = 128 bytes) at the start of 
> the data buffer (mbuf->buf_addr) for prepending packet headers ( ... or, why 
> not, meta-data!).  The location of the first real data byte is 
> mbuf->pkt.data, which is variable, as opposed to mbuf->buf_addr, which does 
> not change.
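>
> For illustration, a rough picture of a direct mbuf buffer (a sketch,
> sizes illustrative):
>
>      +----------+------------+---------------+----------+
>      | rte_mbuf |  headroom  |  packet data  | tailroom |
>      +----------+------------+---------------+----------+
>      ^          ^            ^
>      mbuf       buf_addr     pkt.data (moves as headers are
>                 (= mbuf + 1,  prepended/stripped)
>                  fixed)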
>
> Packet meta-data rationale:
> - I think we both agree that we need to be able to store packet meta-data in 
> the packet buffer. The packet buffer regions where meta-data could be stored 
> can only be the in the headroom or in the tailroom, which are both managed by 
> the application and both go up and down during the packet lifetime.
>
> - To me, packet-metadata is part of the packet descriptor: mbuf is the 
> mandatory & standard part of the packet descriptor, while meta-data is the 
> optional & non-standard (per application) part of the packet descriptor. 
> Therefore, it makes sense to put the meta-data immediately after mbuf, but 
> this is not mandatory.
>
> - The zero-size field called meta-data is really just an offset: it points to 
> the first buffer location where meta-data _can_ be placed. The reason for 
> having this field is to provide a standard way to place meta-data in the 
> packet buffer rather than hide it in the Packet Framework libraries and 
> potentially conflict with other mbuf changes in the future. It flags that 
> meta-data should be taken into consideration when any mbuf change is done in  
> the future.
>
> - For direct mbufs, the same buffer space (the headroom) is shared between 
> packet data (header prepending) and meta-data similar to how the stack and 
> the heap manage the same memory. Of course, since it is managed by the 
> application, it is the responsibility of the application to make sure the 
> packet bytes and the meta-data do not step on one another, but this problem 
> is not at all new, nor introduced by the meta-data field: even currently, the 
> application has to make sure the headroom is dimensioned correctly, so that 
> even in the worst case scenario (application-specific), the packet bytes do 
> not run into the mbuf fields, right?
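>
> A sketch of how the headroom is shared in a direct mbuf:
>
>      buf_addr                                  pkt.data
>      |  meta-data -->      <-- prepended headers  |  payload ...
>
> The application dimensions the headroom so that the two regions,
> growing towards each other, never collide.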
>
> - For indirect mbufs, the packet meta-data is associated with the indirect 
> mbuf rather than the direct mbuf (the one the indirect mbuf attaches to, as 
> the direct mbuf contains a different packet that has different meta-data), so 
> it makes sense that meta-data of the indirect mbuf is stored in the same 
> buffer as the indirect mbuf (rather than the buffer of the attached direct 
> mbuf). So this means that the buffer size used to store the indirect mbuf is 
> sized appropriately (mbuf size, currently 64 bytes, plus the max size for 
> additional meta-data). This is illustrated in the code of the IPv4 
> fragmentation port, where for every child packet (output fragment) we copy 
> the parent meta-data in the child buffer (which stores an indirect buffer 
> attached to the direct buffer of the input jumbo, plus now additional 
> meta-data).
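>
> A sketch of the resulting indirect-mbuf mempool element:
>
>      +------------------+----------------------------------+
>      | rte_mbuf (64 B,  | meta-data copied from the parent |
>      | indirect)        | (sized for the worst case)       |
>      +------------------+----------------------------------+
>
> while buf_addr points into the attached direct mbuf's buffer, which
> holds the actual packet bytes.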
>
> - We are also considering having a user_data field in the mbuf itself to 
> point to meta-data in the same buffer or any other buffer. The Packet 
> Framework functionally works with both approaches, but there is a performance 
> problem associated with the mbuf->user_data approach that we are not 
> addressing for this 1.7 release timeframe. The issue is the data dependency 
> that is created, as in order to find out the location of the meta-data, we 
> need to first read the mbuf  and then read meta-data from mbuf->user_data. 
> The cost of the additional memory access is high, due to data dependency 
> preventing efficient prefetch, therefore (at least for the time being) we 
> need to use a compile-time known meta-data offset. The application can fill 
> the meta-data on packet RX, and then all the other CPU cores processing the 
> packet can work off the meta-data for a long time with no need to access the 
> mbuf or the packet (efficiency). For the time being, if somebody needs yet 
> more meta-data in yet another buffer, they can add (for their application) a 
> user_data pointer as part of their application meta-data (instead of in the 
> standard mbuf).
>
> - As said, the mbuf->metadata points to the first location where meta-data 
> _can_ be placed, if somebody needs a different offset, they can add it on top 
> of the mbuf->metadata (e.g. by having a reserved field in their struct 
> app_pkt_metadata). We have demonstrated the use of meta-data in the 
> examples/ip_pipeline sample app (see struct app_pkt_metadata in "main.h").
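>
> For example, a hypothetical application meta-data structure (field
> names are illustrative only) accessed through the macros from the
> patch below:
>
>      struct app_pkt_metadata {
>              uint32_t flow_id;          /* example field */
>              uint8_t reserved[12];      /* extra offset, if needed */
>      };
>
>      /* Access via the standard accessors, at offset 0: */
>      uint32_t flow_id = RTE_MBUF_METADATA_UINT32(m, 0);
>      struct app_pkt_metadata *pm = (struct app_pkt_metadata *)
>              RTE_MBUF_METADATA_UINT8_PTR(m, 0);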
>
> Please let me know if this makes sense.
>
> Regards,
> Cristian
>
> PS: This is where a white board would help a lot ...
>
>
> -----Original Message-----
> From: Ivan Boule [mailto:ivan.boule at 6wind.com]
> Sent: Wednesday, May 28, 2014 1:03 PM
> To: Dumitrescu, Cristian; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 04/29] mbuf: added offset of packet meta-data 
> in the packet buffer just after mbuf
>
> Hi Cristian,
>
> Currently, the DPDK framework does not make any assumption on the actual
> layout of a mbuf.
> More precisely, the DPDK does not impose any constraint on the actual
> location of additional metadata, if any, or on the actual location and
> size of its associated payload data buffer.
> This is coherent with the fact that mbuf pools are not created by the
> DPDK itself, but by network applications that are free to choose
> whatever packet mbuf layout that fits their particular needs.
>
> There is one exception to this basic DPDK rule: the mbuf cloning feature
> available through the RTE_MBUF_SCATTER_GATHER configuration option
> assumes that the payload data buffer of the mbuf immediately follows the
> rte_mbuf data structure (see the macros RTE_MBUF_FROM_BADDR,
> RTE_MBUF_TO_BADDR, RTE_MBUF_INDIRECT, and RTE_MBUF_DIRECT in the file
> lib/librte_mbuf/rte_mbuf.h).
>
> The cloning feature prevents building packet mbufs with their metadata
> located immediately after the rte_mbuf data structure, which is exactly
> what your patch introduces.
>
> At least, a comment that clearly warns the user of this incompatibility
> might be worth adding into both the code and your patch log.
>
> Regards,
> Ivan
>
> On 05/27/2014 07:09 PM, Cristian Dumitrescu wrote:
>> Added zero-size field (offset in data structure) to specify the beginning of 
>> packet meta-data in the packet buffer just after the mbuf.
>>
>> The size of the packet meta-data is application specific and the packet 
>> meta-data is managed by the application.
>>
>> The packet meta-data should always be accessed through the provided macros.
>>
>> This is used by the Packet Framework libraries (port, table, pipeline).
>>
>> There is absolutely no performance impact due to this mbuf field, as it does 
>> not take any space in the mbuf structure (zero-size field).
>>
>> Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu at intel.com>
>> ---
>>    lib/librte_mbuf/rte_mbuf.h |   17 +++++++++++++++++
>>    1 files changed, 17 insertions(+), 0 deletions(-)
>>
>> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
>> index 4a9ab41..bf09618 100644
>> --- a/lib/librte_mbuf/rte_mbuf.h
>> +++ b/lib/librte_mbuf/rte_mbuf.h
>> @@ -201,8 +201,25 @@ struct rte_mbuf {
>>              struct rte_ctrlmbuf ctrl;
>>              struct rte_pktmbuf pkt;
>>      };
>> +    
>> +    union {
>> +            uint8_t metadata[0];
>> +            uint16_t metadata16[0];
>> +            uint32_t metadata32[0];
>> +            uint64_t metadata64[0];
>> +    };
>>    } __rte_cache_aligned;
>>
>> +#define RTE_MBUF_METADATA_UINT8(mbuf, offset)       (mbuf->metadata[offset])
>> +#define RTE_MBUF_METADATA_UINT16(mbuf, offset)      (mbuf->metadata16[offset/sizeof(uint16_t)])
>> +#define RTE_MBUF_METADATA_UINT32(mbuf, offset)      (mbuf->metadata32[offset/sizeof(uint32_t)])
>> +#define RTE_MBUF_METADATA_UINT64(mbuf, offset)      (mbuf->metadata64[offset/sizeof(uint64_t)])
>> +
>> +#define RTE_MBUF_METADATA_UINT8_PTR(mbuf, offset)   (&mbuf->metadata[offset])
>> +#define RTE_MBUF_METADATA_UINT16_PTR(mbuf, offset)  (&mbuf->metadata16[offset/sizeof(uint16_t)])
>> +#define RTE_MBUF_METADATA_UINT32_PTR(mbuf, offset)  (&mbuf->metadata32[offset/sizeof(uint32_t)])
>> +#define RTE_MBUF_METADATA_UINT64_PTR(mbuf, offset)  (&mbuf->metadata64[offset/sizeof(uint64_t)])
>> +
>>    /**
>>     * Given the buf_addr returns the pointer to corresponding mbuf.
>>     */
>>
>
>


-- 
Ivan Boule
6WIND Development Engineer