On Sun, Aug 02, 2020 at 10:29:07PM +0200, Olivier Matz wrote: > Hi, > > On Sun, Aug 02, 2020 at 07:12:36AM +0800, yang_y_yi wrote: > > > > > > At 2020-07-31 23:15:43, "Olivier Matz" <olivier.m...@6wind.com> wrote: > > >Hi, > > > > > >On Thu, Jul 30, 2020 at 08:08:59PM +0800, yang_y...@163.com wrote: > > >> From: Yi Yang <yangy...@inspur.com> > > >> > > >> In GSO case, segmented mbufs are attached to original > > >> mbuf which can't be freed when it is external. The issue > > >> is free_cb doesn't know original mbuf and doesn't free > > >> it when refcnt of shinfo is 0. > > >> > > >> Original mbuf can be freed by rte_pktmbuf_free segmented > > >> mbufs or by rte_pktmbuf_free original mbuf. Two kind of > > >> cases should have different behaviors. free_cb won't > > >> explicitly call rte_pktmbuf_free to free original mbuf > > >> if it is freed by rte_pktmbuf_free original mbuf, but it > > >> has to call rte_pktmbuf_free to free original mbuf if it > > >> is freed by rte_pktmbuf_free segmented mbufs. > > >> > > >> In order to fix this issue, free_cb interface has to been > > >> changed, __rte_pktmbuf_free_extbuf must deliver called > > >> mbuf pointer to free_cb, argument opaque can be defined > > >> as a custom struct by user, it can includes original mbuf > > >> pointer, user-defined free_cb can compare caller mbuf with > > >> mbuf in opaque struct, free_cb should free original mbuf > > >> if they are not same, this corresponds to rte_pktmbuf_free > > >> segmented mbufs case, otherwise, free_cb won't free original > > >> mbuf because the caller explicitly called rte_pktmbuf_free > > >> to free it. > > >> > > >> Here is pseduo code to show two kind of cases. > > >> > > >> case 1. rte_pktmbuf_free segmented mbufs > > >> > > >> nb_tx = rte_gso_segment(original_mbuf, /* original mbuf */ > > >> &gso_ctx, > > >> /* segmented mbuf */ > > >> (struct rte_mbuf **)&gso_mbufs, > > >> MAX_GSO_MBUFS); > > > > > >I'm sorry but it is not very clear to me what operations are done by > > >rte_gso_segment(). > > > > > >In the current code, I only see calls to rte_pktmbuf_attach(), > > >which do not deal with external buffers. Am I missing something? > > > > > >Are you able to show the issue only with mbuf functions? It would > > >be helpful to understand what does not work. > > > > > > > > >Thanks, > > >Olivier > > > > > Oliver, thank you for comment, let me show you why it doesn't work for my > > use case. In OVS DPDK, VM uses vhostuserclient to send large packets whose > > size is about 64K because we enabled TSO & UFO, these large packets use > > rte_mbufs allocated by DPDK virtio_net function > > virtio_dev_pktmbuf_alloc() (in file lib/librte_vhost/virtio_net.c. Please > > refer to [PATCH V1 3/3], I changed free_cb as below, these packets use the > > same allocate function and the same free_cb no matter they are TCP packet > > or UDP packets, in case of VXLAN TSO, most NICs can't support inner UDP > > fragment offload, so OVS DPDK has to do it by software, for UDP case, the > > original rte_mbuf only can be freed by segmented rte_mbufs which are output > > packets of rte_gso_segment, i.e. the original rte_mbuf only can freed by > > free_cb, you can see, it explicitly called rte_pktmbuf_free(arg->mbuf), the > > condition statement "if (caller_m != arg->mbuf)" is true for this case, > > this has no problem, but for TCP case, the original mbuf is delivered to > > rte_eth_tx_burst() but not segmented rte_mbufs output by rte_gso_segment, > > PMD driver will call rte_pktmbuf_free(original_rte_mbuf) but not > > rte_pktmbuf_free(segmented_rte_mbufs), the same free_cb will be called, > > that means original_rte_mbuf will be freed twice, you know what will > > happen, this is just the issue I'm fixing. I bring in caller_m argument, it > > can help work around this because caller_m is arg->mbuf and the condition > > statement "if (caller_m != arg->mbuf)" is false, you can't fix it without > > the change this patch series did. > > I'm sill not sure to get your issue. Please, if you have a simple test > case using only mbufs functions (without virtio, gso, ...), it would be > very helpful because we will be sure that we are talking about the same > thing. In case there is an issue, it can easily become a unit test. > > That said, I looked at vhost mbuf allocation and gso segmentation, and > I found some strange things: > > 1/ In virtio_dev_extbuf_alloc(), and I there are 2 paths to create the > ext mbuf. > > a/ The first one stores the shinfo struct in the mbuf, basically > like this: > > pkt = rte_pktmbuf_alloc(mp); > shinfo = rte_pktmbuf_mtod(pkt, struct rte_mbuf_ext_shared_info *); > buf = rte_malloc(NULL, buf_len, RTE_CACHE_LINE_SIZE); > shinfo->free_cb = virtio_dev_extbuf_free; > shinfo->fcb_opaque = buf; > rte_mbuf_ext_refcnt_set(shinfo, 1); > > I don't think it is a good idea, because there is no guarantee that > the mbuf won't be freed before the buffer. For instance, doing > this will probably fail: > > pkt2 = rte_pktmbuf_alloc(mp); > rte_pktmbuf_attach(pkt2, pkt); > rte_pktmbuf_free(pkt); /* pkt is freed, but it contains shinfo ! */ > > To do this properly, the mbuf refcnt should be increased, and > the mbuf should be freed in the callback. But I don't think it's > worth doing it, given the second path (b/) looks good to me. > > b/ The second path stores the shinfo struct at the end of the > allocated buffer, like this: > > pkt = rte_pktmbuf_alloc(mp); > buf_len += sizeof(*shinfo) + sizeof(uintptr_t); > buf_len = RTE_ALIGN_CEIL(total_len, sizeof(uintptr_t)); > buf = rte_malloc(NULL, buf_len, RTE_CACHE_LINE_SIZE); > shinfo = rte_pktmbuf_ext_shinfo_init_helper(buf, &buf_len, > virtio_dev_extbuf_free, buf); > > I think this is correct, because we have the guarantee that shinfo > exists as long as the buffer exists. > > 2/ in rte_gso_segment(), there is a loop like this: > > while (pkt_seg) { > rte_mbuf_refcnt_update(pkt_seg, -1); > pkt_seg = pkt_seg->next; > } > > You change it to take in account the refcnt for ext mbufs. > > I may have missed something but I wonder why it is not simply: > > rte_pktmbuf_free(pkt_seg);
Here, I mean rte_pktmbuf_free(pkt) (one free() call for the mbuf chain, not one per segment) > > It will decrease the proper refcnt, and free the mbufs if they > are not used. > > Again, sorry if this is not the issue your are referring to, but > in this case I really think that having a simple example code that > shows the issue would help. > > Regards, > Olivier > > > > > > > diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c > > index e663fd4..3b69cbb 100644 > > > > --- a/lib/librte_vhost/virtio_net.c > > > > +++ b/lib/librte_vhost/virtio_net.c > > > > @@ -2136,10 +2136,20 @@ uint16_t rte_vhost_poll_enqueue_completed(int vid, > > uint16_t queue_id, > > > > return NULL; > > > > } > > > > > > > > +struct shinfo_arg { > > > > + void *buf; > > > > + struct rte_mbuf *mbuf; > > > > +}; > > > > + > > > > static void > > > > -virtio_dev_extbuf_free(struct rte_mbuf * caller_m __rte_unused, void > > *opaque) > > > > +virtio_dev_extbuf_free(struct rte_mbuf *caller_m, void *opaque) > > > > { > > > > - rte_free(opaque); > > > > + struct shinfo_arg *arg = (struct shinfo_arg *)opaque; > > > > + > > > > + rte_free(arg->buf); > > > > + if (caller_m != arg->mbuf) > > > > + rte_pktmbuf_free(arg->mbuf); > > > > + rte_free(arg); > > > > } > > > > > > > > static int > > > > @@ -2172,8 +2182,14 @@ uint16_t rte_vhost_poll_enqueue_completed(int vid, > > uint16_t queue_id, > > > > > > > > /* Initialize shinfo */ > > > > if (shinfo) { > > > > + struct shinfo_arg *arg = (struct shinfo_arg *) > > > > + rte_malloc(NULL, > > sizeof(struct shinfo_arg), > > > > + > > RTE_CACHE_LINE_SIZE); > > > > + > > > > + arg->buf = buf; > > > > + arg->mbuf = pkt; > > > > shinfo->free_cb = virtio_dev_extbuf_free; > > > > - shinfo->fcb_opaque = buf; > > > > + shinfo->fcb_opaque = arg; > > > > rte_mbuf_ext_refcnt_set(shinfo, 1); > > > > } else { > > > > shinfo = > > rte_pktmbuf_ext_shinfo_init_helper(buf, &buf_len, > > > > -- > > > > 1.8.3.1 > > > > > > > > /* > > * Allocate a host supported pktmbuf. > > */ > > static __rte_always_inline struct rte_mbuf * > > virtio_dev_pktmbuf_alloc(struct virtio_net *dev, struct rte_mempool *mp, > > uint32_t data_len) > > { > > struct rte_mbuf *pkt = rte_pktmbuf_alloc(mp); > > > > if (unlikely(pkt == NULL)) { > > VHOST_LOG_DATA(ERR, > > "Failed to allocate memory for mbuf.\n"); > > return NULL; > > } > > > > if (rte_pktmbuf_tailroom(pkt) >= data_len) > > return pkt; > > > > /* attach an external buffer if supported */ > > if (dev->extbuf && !virtio_dev_extbuf_alloc(pkt, data_len)) > > return pkt; > > > > /* check if chained buffers are allowed */ > > if (!dev->linearbuf) > > return pkt; > > > > /* Data doesn't fit into the buffer and the host supports > > * only linear buffers > > */ > > rte_pktmbuf_free(pkt); > > > > return NULL; > > } > >