Hi Olivier, > -----Original Message----- > From: Olivier MATZ [mailto:olivier.matz at 6wind.com] > Sent: Friday, March 27, 2015 3:17 PM > To: Ananyev, Konstantin; dev at dpdk.org > Subject: Re: [dpdk-dev] [PATCH v2 1/5] mbuf: fix clone support when > application uses private mbuf data > > Hi Konstantin, > > On 03/27/2015 03:25 PM, Ananyev, Konstantin wrote: > > Hi Olivier, > > > >> -----Original Message----- > >> From: Olivier MATZ [mailto:olivier.matz at 6wind.com] > >> Sent: Friday, March 27, 2015 1:56 PM > >> To: Ananyev, Konstantin; dev at dpdk.org > >> Subject: Re: [dpdk-dev] [PATCH v2 1/5] mbuf: fix clone support when > >> application uses private mbuf data > >> > >> Hi Konstantin, > >> > >> On 03/27/2015 10:07 AM, Olivier MATZ wrote: > >>>> I think that to support ability to setup priv_size on a mempool basis, > >>>> and reserve private space between struct rte_mbuf and rte_mbuf. buf_addr, > >>>> we need to: > >>>> > >>>> 1. Store priv_size both inside the mempool and inside the mbuf. > >>>> > >>>> 2. rte_pktmbuf_attach() should change the value of priv_size to the > >>>> priv_size of the direct mbuf we are going to attach to: > >>>> rte_pktmbuf_attach(struct rte_mbuf *mi, struct rte_mbuf *md) {... > >>>> mi->priv_size = md->priv_size; ...} > >>>> > >>>> 3. rte_pktmbuf_detach() should restore original value of mbuf's > >>>> priv_size: > >>>> rte_pktmbuf_detach(struct rte_mbuf *m) > >>>> { > >>>> ... > >>>> m->priv_size = rte_mempool_get_privsize(m->pool); > >>>> m->buf _addr= rte_mbuf_to_baddr(m); > >>>> ... > >>>> } > >>>> > >>>> Also I think we need to provide a way to specify priv_size for all mbufs > >>>> of the mepool at init time: > >>>> - either force people to specify it at rte_mempool_create() time > >>>> (probably use init_arg for that), > >>>> - or provide separate function that could be called straight after > >>>> rte_mempool_create() , that > >>>> would setup priv_size for the pool and for all its mbufs. > >>>> - or some sort of combination of these 2 approaches - introduce a > >>>> wrapper function > >>>> (rte_mbuf_pool_create() or something) that would take priv_size as > >>>> parameter, > >>>> create a new mempool and then setup priv_size. > >> > >> I though a bit more about this solution, and I realized that doing > >> mi->priv_size = md->priv_size in rte_pktmbuf_attach() is probably not > >> a good idea, as there is no garantee that the size of mi is large enough > >> to store the priv of md. > >> > >> Having the same priv_size for mi and md is maybe a good constraint. > >> I can add this in the API comments and assertions in the code to > >> check this condition, what do you think? > > > > Probably we have a different concepts of what is mbuf's private space in > > mind. > > From my point, even indirect buffer should use it's own private space and > > leave contents of direct mbuf it attached to unmodified. > > After attach() operation, only contents of the buffer are shared between > > mbufs, > > but not the mbuf's metadata. > > Sorry if it was not clear in my previous messages, but I agree > with your description. When attaching a mbuf, only data, not > metadata should be shared. > > In the solution you are suggesting (quoted above), you say we need > to set mi->priv_size to md->priv_size in rte_pktmbuf_attach(). I felt > this was not possible, but it depends on the meaning we give to > priv_size: > > 1. If the meaning is "the size of the private data embedded in this > mbuf", which is the most logical meaning, we cannot do this > affectation > > 2. If the meaning is "the size of the private data embedded in the > mbuf the buf_addr is pointing to" (which is harder to get), the > affectation makes sense. > > From what I understand, you feel we should use 2/ as priv_size > definition. Is it correct?
Yes, I meant 2. >From my point priv_size inside mbuf is more like 'buf_ofs'. It's main purpose is for internal use - to help our mbuf manipulation routinies (attach/detach/free) to work correctly. If the user wants to query size of private space for the mbuf, he probably should use the value from mempool. > > In my previous message, the definition of m->priv_size was 1/, > so that's why I felt assigning mi->priv_size to md->priv_size was > not possible. > > I agree 2/ is probably a good choice, as it would allow to attach > to a mbuf with a different priv_size. It may require some additional > comments above the field in the structure to explain that. > > > > Otherwise on detach(), you'll have to copy contents of private space back, > > from direct to indirect mbuf? > > Again how to deal with the case, when 2 or more mbufs will attach to the > > same direct one? > > > > So let say, if we'll have a macro: > > > > #define RTE_MBUF_PRIV_PTR(mb) ((void *)((struct rte_mbuf *)(mb)) + 1)) > > > > No matter is mb a direct or indirect mbuf. > > Do you have something else in mind here? > > I completely agree with this macro. We should consider the private data > as an extension of the mbuf structure. > > > >>> Introducing rte_mbuf_pool_create() seems a good idea to me, it > >>> would hide 'rte_pktmbuf_pool_private' from the user and force > >>> to initialize all the required fields (mbuf_data_room_size only > >>> today, and maybe mbuf_priv_size). > >>> > >>> The API would be: > >>> > >>> struct rte_mempool * > >>> rte_mbuf_pool_create(const char *name, unsigned n, unsigned elt_size, > >>> unsigned cache_size, size_t mbuf_priv_size, > >>> rte_mempool_obj_ctor_t *obj_init, void *obj_init_arg, > >>> int socket_id, unsigned flags) > >>> > >>> I can give it a try and send a patch for this. > >> > >> About this, it is not required anymore for this patch series if we > >> agree with my comment above. > > > > I still think we need some way to setup priv_size on a per-mempool basis. > > Doing that in rte_mbuf_pool_create() seems like a good thing to me. > > Not sure, why you decided to drop it? > > I think we can already do it without changing the API by providing > our own rte_pktmbuf_init and rte_pktmbuf_pool_init. > > rte_pktmbuf_init() has to set: > m->buf_len = mp->elt_size - sizeof(struct mbuf); > m->priv_size = sizeof(struct mbuf) - sizeof(struct rte_mbuf); > > rte_pktmbuf_pool_init() has to set: > /* we can use the default function */ > mbp_priv->mbuf_data_room_size = MBUF_RXDATA_SIZE + > RTE_PKTMBUF_HEADROOM; Yeh, when arg==NULL for rte_pktmbuf_pool_init() we always set up mbuf_data_room_size to the hardcoded value. Which always looked strange to me. I think it should be set to: mp->elt_size - sizeof(struct rte_mbuf) - RTE_PKTMBUF_HEADROOM; for that case. > > In this case, it is possible to calculate the mbuf_priv_size only > from the pool object: > > mbuf_priv_size = pool->elt_size - RTE_PKTMBUF_HEADROOM - > pool_private->mbuf_data_room_size - > sizeof(rte_mbuf) > So if I understand your idea correctly: If second parameter for rte_pktmbuf_pool_init() is NULL, then we setup mbp_priv->mbuf_data_room_size = mp->elt_size - sizeof(struct rte_mbuf) - RTE_PKTMBUF_HEADROOM; Which means that priv_size ==0 for all mbufs in the pool Otherwise we setup: mbp_priv->mbuf_data_room_size = opaque_arg; As we are doing now, and priv_size for all mbufs in the pool will be: pool->elt_size - pool_private->mbuf_data_room_size - sizeof(rte_mbuf); And in that case, all users have to do to specify priv_size for mempool is to pass a proper argument for rte_pktmbuf_pool_init() at mempool_create(). Correct? > > I agree it's not ideal, but I think the mbuf pool initialization > is another problem. That's why I suggested to change this in a > separate series that will add rte_mbuf_pool_create() with the > API described above. Thoughts? > I also put answers to another mail below. Just to keep all discussion in one place. > > Though, I still think that the better approach is to reserve private space > > before struct rte_mbuf, not after. > > In that case, user got his private space, and we keep buf_addr straight > > after rte_mbuf, without any whole. > > So we don't need steps 2 and 3, above, > > plus we don't need rte_mbuf_to_baddr() and rte_mbuf_from_indirect() - > > RTE_MBUF_TO_BADDR/ RTE_MBUF_FROM_BADDR would keep working correctly. > > In fact, with this scheme - we don't even need priv_size for mbuf > > management (attach/detach/free). > > > > Wonder did you try that approach? > > Yes, I though about this approach too. But I think it would require > more changes. When an application or a driver allocate a mbuf with > mempool_get(), it expects that the returned pointer is the rte_mbuf *. Yep, sure it will still get the pointer to the rte_mbuf *. Though later, if upper layer would like to convert from rte_mbuf* to app_specific_mbuf *, it would have to use a macro: #define RTE_MBUF_TO_PRIV(m) ((void *)((uintptr_t)(m) - (m)->priv_size)) > > With this solution, there are 2 options: > - no mempool modification, so each application/driver has to add > priv_size bytes to the object to get the mbuf pointer. This does not > look feasible. > - change the mempool to support private area before each object. I > think mempool API is already quite complex, and I think it's not > the role of the mempool library to provide such features. In fact, I think the changes would be minimal. All we have to do, is to make changes in rte_pktmbuf_init(): void rte_pktmbuf_init(struct rte_mempool *mp, __attribute__((unused)) void *opaque_arg, void *_m, __attribute__((unused)) unsigned i) { //extract priv_size from mempool (discussed above). uint16_t priv_size = rte_mbufpool_get_priv_size(mp); struct rte_mbuf *m = _m + priv_size; uint32_t buf_len = mp->elt_size - sizeof(struct rte_mbuf) - priv_size; .... With that way priv_size inside mbuf would always be the size its own private space, for both direct and indirect mbus., so we don't require to set priv_size at attach/detach. Again RTE_MBUF_TO_BADDR() and RTE_MBUF_FROM_BADDR() would just work without any modifications. Konstantin