On Sun, Aug 02, 2020 at 10:29:07PM +0200, Olivier Matz wrote:
> Hi,
> 
> On Sun, Aug 02, 2020 at 07:12:36AM +0800, yang_y_yi wrote:
> > 
> > 
> > At 2020-07-31 23:15:43, "Olivier Matz" <olivier.m...@6wind.com> wrote:
> > >Hi,
> > >
> > >On Thu, Jul 30, 2020 at 08:08:59PM +0800, yang_y...@163.com wrote:
> > >> From: Yi Yang <yangy...@inspur.com>
> > >> 
> > >> In GSO case, segmented mbufs are attached to original
> > >> mbuf which can't be freed when it is external. The issue
> > >> is free_cb doesn't know original mbuf and doesn't free
> > >> it when refcnt of shinfo is 0.
> > >> 
> > >> Original mbuf can be freed by rte_pktmbuf_free segmented
> > >> mbufs or by rte_pktmbuf_free original mbuf. Two kind of
> > >> cases should have different behaviors. free_cb won't
> > >> explicitly call rte_pktmbuf_free to free original mbuf
> > >> if it is freed by rte_pktmbuf_free original mbuf, but it
> > >> has to call rte_pktmbuf_free to free original mbuf if it
> > >> is freed by rte_pktmbuf_free segmented mbufs.
> > >> 
> > >> In order to fix this issue, free_cb interface has to been
> > >> changed, __rte_pktmbuf_free_extbuf must deliver called
> > >> mbuf pointer to free_cb, argument opaque can be defined
> > >> as a custom struct by user, it can includes original mbuf
> > >> pointer, user-defined free_cb can compare caller mbuf with
> > >> mbuf in opaque struct, free_cb should free original mbuf
> > >> if they are not same, this corresponds to rte_pktmbuf_free
> > >> segmented mbufs case, otherwise, free_cb won't free original
> > >> mbuf because the caller explicitly called rte_pktmbuf_free
> > >> to free it.
> > >> 
> > >> Here is pseduo code to show two kind of cases.
> > >> 
> > >> case 1. rte_pktmbuf_free segmented mbufs
> > >> 
> > >> nb_tx = rte_gso_segment(original_mbuf, /* original mbuf */
> > >>                       &gso_ctx,
> > >>                       /* segmented mbuf */
> > >>                       (struct rte_mbuf **)&gso_mbufs,
> > >>                       MAX_GSO_MBUFS);
> > >
> > >I'm sorry but it is not very clear to me what operations are done by
> > >rte_gso_segment().
> > >
> > >In the current code, I only see calls to rte_pktmbuf_attach(),
> > >which do not deal with external buffers. Am I missing something?
> > >
> > >Are you able to show the issue only with mbuf functions? It would
> > >be helpful to understand what does not work.
> > >
> > >
> > >Thanks,
> > >Olivier
> > >
> > Oliver, thank you for comment, let me show you why it doesn't work for my 
> > use case.  In OVS DPDK, VM uses vhostuserclient to send large packets whose 
> > size is about 64K because we enabled TSO & UFO, these large packets use 
> > rte_mbufs allocated by DPDK virtio_net function 
> > virtio_dev_pktmbuf_alloc() (in file lib/librte_vhost/virtio_net.c. Please 
> > refer to [PATCH V1 3/3], I changed free_cb as below, these packets use the 
> > same allocate function and the same free_cb no matter they are TCP packet 
> > or UDP packets, in case of VXLAN TSO, most NICs can't support inner UDP 
> > fragment offload, so OVS DPDK has to do it by software, for UDP case, the 
> > original rte_mbuf only can be freed by segmented rte_mbufs which are output 
> > packets of rte_gso_segment, i.e. the original rte_mbuf only can freed by 
> > free_cb, you can see, it explicitly called rte_pktmbuf_free(arg->mbuf), the 
> > condition statement "if (caller_m != arg->mbuf)" is true for this case, 
> > this has no problem, but for TCP case, the original mbuf is delivered to 
> > rte_eth_tx_burst() but not segmented rte_mbufs output by rte_gso_segment, 
> > PMD driver will call rte_pktmbuf_free(original_rte_mbuf) but not 
> > rte_pktmbuf_free(segmented_rte_mbufs), the same free_cb will be called, 
> > that means original_rte_mbuf will be freed twice, you know what will 
> > happen, this is just the issue I'm fixing. I bring in caller_m argument, it 
> > can help work around this because caller_m is arg->mbuf and the condition 
> > statement "if (caller_m != arg->mbuf)" is false, you can't fix it without 
> > the change this patch series did.
> 
> I'm sill not sure to get your issue. Please, if you have a simple test
> case using only mbufs functions (without virtio, gso, ...), it would be
> very helpful because we will be sure that we are talking about the same
> thing. In case there is an issue, it can easily become a unit test.
> 
> That said, I looked at vhost mbuf allocation and gso segmentation, and
> I found some strange things:
> 
> 1/ In virtio_dev_extbuf_alloc(), and I there are 2 paths to create the
>    ext mbuf.
> 
>    a/ The first one stores the shinfo struct in the mbuf, basically
>       like this:
> 
>       pkt = rte_pktmbuf_alloc(mp);
>       shinfo = rte_pktmbuf_mtod(pkt, struct rte_mbuf_ext_shared_info *);
>       buf = rte_malloc(NULL, buf_len, RTE_CACHE_LINE_SIZE);
>       shinfo->free_cb = virtio_dev_extbuf_free;
>       shinfo->fcb_opaque = buf;
>       rte_mbuf_ext_refcnt_set(shinfo, 1);
> 
>       I don't think it is a good idea, because there is no guarantee that
>       the mbuf won't be freed before the buffer. For instance, doing
>       this will probably fail:
> 
>       pkt2 = rte_pktmbuf_alloc(mp);
>       rte_pktmbuf_attach(pkt2, pkt);
>       rte_pktmbuf_free(pkt);  /* pkt is freed, but it contains shinfo ! */
> 
>       To do this properly, the mbuf refcnt should be increased, and
>       the mbuf should be freed in the callback. But I don't think it's
>       worth doing it, given the second path (b/) looks good to me.
> 
>    b/ The second path stores the shinfo struct at the end of the
>       allocated buffer, like this:
> 
>       pkt = rte_pktmbuf_alloc(mp);
>       buf_len += sizeof(*shinfo) + sizeof(uintptr_t);
>       buf_len = RTE_ALIGN_CEIL(total_len, sizeof(uintptr_t));
>       buf = rte_malloc(NULL, buf_len, RTE_CACHE_LINE_SIZE);
>       shinfo = rte_pktmbuf_ext_shinfo_init_helper(buf, &buf_len,
>                                             virtio_dev_extbuf_free, buf);
> 
>       I think this is correct, because we have the guarantee that shinfo
>       exists as long as the buffer exists.
> 
> 2/ in rte_gso_segment(), there is a loop like this:
> 
>       while (pkt_seg) {
>               rte_mbuf_refcnt_update(pkt_seg, -1);
>               pkt_seg = pkt_seg->next;
>       }
> 
>    You change it to take in account the refcnt for ext mbufs.
> 
>    I may have missed something but I wonder why it is not simply:
> 
>       rte_pktmbuf_free(pkt_seg);

Here, I mean rte_pktmbuf_free(pkt)
(one free() call for the mbuf chain, not one per segment)

> 
>    It will decrease the proper refcnt, and free the mbufs if they
>    are not used.
> 
> Again, sorry if this is not the issue your are referring to, but
> in this case I really think that having a simple example code that
> shows the issue would help.
> 
> Regards,
> Olivier
> 
> 
> 
> > 
> > diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c 
> > index e663fd4..3b69cbb 100644
> > 
> > --- a/lib/librte_vhost/virtio_net.c
> > 
> > +++ b/lib/librte_vhost/virtio_net.c
> > 
> > @@ -2136,10 +2136,20 @@ uint16_t rte_vhost_poll_enqueue_completed(int vid, 
> > uint16_t queue_id,
> > 
> >                 return NULL;
> > 
> >  }
> > 
> >  
> > 
> > +struct shinfo_arg {
> > 
> > +             void *buf;
> > 
> > +             struct rte_mbuf *mbuf;
> > 
> > +};
> > 
> > +
> > 
> >  static void
> > 
> > -virtio_dev_extbuf_free(struct rte_mbuf * caller_m __rte_unused, void 
> > *opaque)
> > 
> > +virtio_dev_extbuf_free(struct rte_mbuf *caller_m, void *opaque)
> > 
> >  {
> > 
> > -              rte_free(opaque);
> > 
> > +             struct shinfo_arg *arg = (struct shinfo_arg *)opaque;
> > 
> > +
> > 
> > +             rte_free(arg->buf);
> > 
> > +             if (caller_m != arg->mbuf)
> > 
> > +                             rte_pktmbuf_free(arg->mbuf);
> > 
> > +             rte_free(arg);
> > 
> >  }
> > 
> >  
> > 
> >  static int
> > 
> > @@ -2172,8 +2182,14 @@ uint16_t rte_vhost_poll_enqueue_completed(int vid, 
> > uint16_t queue_id,
> > 
> >  
> > 
> >                 /* Initialize shinfo */
> > 
> >                 if (shinfo) {
> > 
> > +                             struct shinfo_arg *arg = (struct shinfo_arg *)
> > 
> > +                                             rte_malloc(NULL, 
> > sizeof(struct shinfo_arg),
> > 
> > +                                                                
> > RTE_CACHE_LINE_SIZE);
> > 
> > +
> > 
> > +                             arg->buf = buf;
> > 
> > +                             arg->mbuf = pkt;
> > 
> >                                 shinfo->free_cb = virtio_dev_extbuf_free;
> > 
> > -                              shinfo->fcb_opaque = buf;
> > 
> > +                             shinfo->fcb_opaque = arg;
> > 
> >                                 rte_mbuf_ext_refcnt_set(shinfo, 1);
> > 
> >                 } else {
> > 
> >                                 shinfo = 
> > rte_pktmbuf_ext_shinfo_init_helper(buf, &buf_len,
> > 
> > --
> > 
> > 1.8.3.1
> > 
> > 
> > 
> > /*
> >  * Allocate a host supported pktmbuf.
> >  */
> > static __rte_always_inline struct rte_mbuf *
> > virtio_dev_pktmbuf_alloc(struct virtio_net *dev, struct rte_mempool *mp,
> >                          uint32_t data_len)
> > {
> >         struct rte_mbuf *pkt = rte_pktmbuf_alloc(mp);
> > 
> >         if (unlikely(pkt == NULL)) {
> >                 VHOST_LOG_DATA(ERR,
> >                         "Failed to allocate memory for mbuf.\n");
> >                 return NULL;
> >         }
> > 
> >         if (rte_pktmbuf_tailroom(pkt) >= data_len)
> >                 return pkt;
> > 
> >         /* attach an external buffer if supported */
> >         if (dev->extbuf && !virtio_dev_extbuf_alloc(pkt, data_len))
> >                 return pkt;
> > 
> >         /* check if chained buffers are allowed */
> >         if (!dev->linearbuf)
> >                 return pkt;
> > 
> >         /* Data doesn't fit into the buffer and the host supports
> >          * only linear buffers
> >          */
> >         rte_pktmbuf_free(pkt);
> > 
> >         return NULL;
> > }
> >  

Reply via email to