> On Apr 25, 2018, at 11:22 AM, Yongseok Koh <ys...@mellanox.com> wrote: > > On Wed, Apr 25, 2018 at 11:02:36AM -0700, Yongseok Koh wrote: >> On Wed, Apr 25, 2018 at 05:23:20PM +0000, Ananyev, Konstantin wrote: >>> >>> >>>> -----Original Message----- >>>> From: Yongseok Koh [mailto:ys...@mellanox.com] >>>> Sent: Wednesday, April 25, 2018 6:07 PM >>>> To: Ananyev, Konstantin <konstantin.anan...@intel.com> >>>> Cc: Lu, Wenzhuo <wenzhuo...@intel.com>; Wu, Jingjing >>>> <jingjing...@intel.com>; olivier.m...@6wind.com; dev@dpdk.org; >>>> arybche...@solarflare.com; step...@networkplumber.org; >>>> tho...@monjalon.net; adrien.mazarg...@6wind.com; >>>> nelio.laranje...@6wind.com >>>> Subject: Re: [PATCH v5 1/2] mbuf: support attaching external buffer to mbuf >>>> >>>> On Wed, Apr 25, 2018 at 01:31:42PM +0000, Ananyev, Konstantin wrote: >>>> [...] >>>>>> /** Mbuf prefetch */ >>>>>> #define RTE_MBUF_PREFETCH_TO_FREE(m) do { \ >>>>>> if ((m) != NULL) \ >>>>>> @@ -1213,11 +1306,127 @@ static inline int rte_pktmbuf_alloc_bulk(struct >>>>>> rte_mempool *pool, >>>>>> } >>>>>> >>>>>> /** >>>>>> + * Attach an external buffer to a mbuf. >>>>>> + * >>>>>> + * User-managed anonymous buffer can be attached to an mbuf. When >>>>>> attaching >>>>>> + * it, corresponding free callback function and its argument should be >>>>>> + * provided. This callback function will be called once all the mbufs >>>>>> are >>>>>> + * detached from the buffer. >>>>>> + * >>>>>> + * The headroom for the attaching mbuf will be set to zero and this can >>>>>> be >>>>>> + * properly adjusted after attachment. For example, >>>>>> ``rte_pktmbuf_adj()`` >>>>>> + * or ``rte_pktmbuf_reset_headroom()`` can be used. >>>>>> + * >>>>>> + * More mbufs can be attached to the same external buffer by >>>>>> + * ``rte_pktmbuf_attach()`` once the external buffer has been attached >>>>>> by >>>>>> + * this API. >>>>>> + * >>>>>> + * Detachment can be done by either ``rte_pktmbuf_detach_extbuf()`` or >>>>>> + * ``rte_pktmbuf_detach()``. >>>>>> + * >>>>>> + * Attaching an external buffer is quite similar to mbuf indirection in >>>>>> + * replacing buffer addresses and length of a mbuf, but a few >>>>>> differences: >>>>>> + * - When an indirect mbuf is attached, refcnt of the direct mbuf would >>>>>> be >>>>>> + * 2 as long as the direct mbuf itself isn't freed after the >>>>>> attachment. >>>>>> + * In such cases, the buffer area of a direct mbuf must be read-only. >>>>>> But >>>>>> + * external buffer has its own refcnt and it starts from 1. Unless >>>>>> + * multiple mbufs are attached to a mbuf having an external buffer, >>>>>> the >>>>>> + * external buffer is writable. >>>>>> + * - There's no need to allocate buffer from a mempool. Any buffer can >>>>>> be >>>>>> + * attached with appropriate free callback and its IO address. >>>>>> + * - Smaller metadata is required to maintain shared data such as >>>>>> refcnt. >>>>>> + * >>>>>> + * @warning >>>>>> + * @b EXPERIMENTAL: This API may change without prior notice. >>>>>> + * Once external buffer is enabled by allowing experimental API, >>>>>> + * ``RTE_MBUF_DIRECT()`` and ``RTE_MBUF_INDIRECT()`` are no longer >>>>>> + * exclusive. A mbuf can be considered direct if it is neither indirect >>>>>> nor >>>>>> + * having external buffer. >>>>>> + * >>>>>> + * @param m >>>>>> + * The pointer to the mbuf. >>>>>> + * @param buf_addr >>>>>> + * The pointer to the external buffer we're attaching to. >>>>>> + * @param buf_iova >>>>>> + * IO address of the external buffer we're attaching to. >>>>>> + * @param buf_len >>>>>> + * The size of the external buffer we're attaching to. If memory for >>>>>> + * shared data is not provided, buf_len must be larger than the size >>>>>> of >>>>>> + * ``struct rte_mbuf_ext_shared_info`` and padding for alignment. If >>>>>> not >>>>>> + * enough, this function will return NULL. >>>>>> + * @param shinfo >>>>>> + * User-provided memory for shared data. If NULL, a few bytes in the >>>>>> + * trailer of the provided buffer will be dedicated for shared data >>>>>> and >>>>>> + * the shared data will be properly initialized. Otherwise, user must >>>>>> + * initialize the content except for free callback and its argument. >>>>>> The >>>>>> + * pointer of shared data will be stored in m->shinfo. >>>>>> + * @param free_cb >>>>>> + * Free callback function to call when the external buffer needs to be >>>>>> + * freed. >>>>>> + * @param fcb_opaque >>>>>> + * Argument for the free callback function. >>>>>> + * >>>>>> + * @return >>>>>> + * A pointer to the new start of the data on success, return NULL >>>>>> + * otherwise. >>>>>> + */ >>>>>> +static inline char * __rte_experimental >>>>>> +rte_pktmbuf_attach_extbuf(struct rte_mbuf *m, void *buf_addr, >>>>>> + rte_iova_t buf_iova, uint16_t buf_len, >>>>>> + struct rte_mbuf_ext_shared_info *shinfo, >>>>>> + rte_mbuf_extbuf_free_callback_t free_cb, void *fcb_opaque) >>>>>> +{ >>>>>> + /* Additional attachment should be done by rte_pktmbuf_attach() >>>>>> */ >>>>>> + RTE_ASSERT(!RTE_MBUF_HAS_EXTBUF(m)); >>>>> >>>>> Shouldn't we have here something like: >>>>> RTE_ASSERT(RTE_MBUF_DIRECT(m) && rte_mbuf_refcnt_read(m) == 1); >>>>> ? >>>> >>>> Right. That's better. Attaching mbuf should be direct and writable. >>>> >>>>>> + >>>>>> + m->buf_addr = buf_addr; >>>>>> + m->buf_iova = buf_iova; >>>>>> + >>>>>> + if (shinfo == NULL) { >>>>> >>>>> Instead of allocating shinfo ourselves - wound's it be better to rely >>>>> on caller always allocating afeeling it for us (he can do that at the >>>>> end/start of buffer, >>>>> or whenever he likes to. >>>> >>>> It is just for convenience. For some users, external attachment could be >>>> occasional and casual, e.g. punt control traffic from kernel/hv. For such >>>> non-serious cases, it is good to provide this small utility. >>> >>> For such users that small utility could be a separate function then: >>> shinfo_inside_buf() or so. >> >> I like this idea! As this is an inline function and can be called in a >> datapath, >> shorter code is better if it isn't expected to be used frequently. >> >> Will take this idea for the new version. Thanks. > > However, if this API is called with shinfo=NULL (builtin constant), this code > block won't get included in compile time because it is an inline function.
Sorry, it was wrong. I said the exact opposite. Not enough sleep theses days. :-( If shinfo is passed, the code block will be included anyway. Please disregard the email. Yongseok > > What is disadvantage to keep this block here? More intuitive? > > Advantage of keeping it here could be simplicity. No need to call the utility > in > advance. > > Or separating this code to another inline function could make the API > prototype > simpler because free_cb and its arg should be passed via shinfo. > > static inline char * __rte_experimental > rte_pktmbuf_attach_extbuf(struct rte_mbuf *m, void *buf_addr, > rte_iova_t buf_iova, uint16_t buf_len, > struct rte_mbuf_ext_shared_info *shinfo) > > I'm still inclined to write the utility function like you suggested. > Thoughts? > > Thanks, > Yongseok > >>>>> Again in that case - caller can provide one shinfo to several mbufs (with >>>>> different buf_addrs) >>>>> and would know for sure that free_cb wouldn't be overwritten by mistake. >>>>> I.E. mbuf code will only update refcnt inside shinfo. >>>> >>>> I think you missed the discussion with other people yesterday. This change >>>> is >>>> exactly for that purpose. Like I documented above, if this API is called >>>> with >>>> shinfo being provided, it will use the user-provided shinfo instead of >>>> sparing a >>>> few byte in the trailer and won't touch the shinfo. >>> >>> As I can see your current code always update free_cb and fcb_opaque. >>> Which is kind of strange these fields shold be the same for all instances >>> of the shinfo.