At 2020-08-03 04:45:12, "Olivier Matz" <olivier.m...@6wind.com> wrote:
>On Sun, Aug 02, 2020 at 10:29:07PM +0200, Olivier Matz wrote:
>> Hi,
>> 
>> On Sun, Aug 02, 2020 at 07:12:36AM +0800, yang_y_yi wrote:
>> > 
>> > 
>> > At 2020-07-31 23:15:43, "Olivier Matz" <olivier.m...@6wind.com> wrote:
>> > >Hi,
>> > >
>> > >On Thu, Jul 30, 2020 at 08:08:59PM +0800, yang_y...@163.com wrote:
>> > >> From: Yi Yang <yangy...@inspur.com>
>> > >> 
>> > >> In GSO case, segmented mbufs are attached to original
>> > >> mbuf which can't be freed when it is external. The issue
>> > >> is free_cb doesn't know original mbuf and doesn't free
>> > >> it when refcnt of shinfo is 0.
>> > >> 
>> > >> Original mbuf can be freed by rte_pktmbuf_free segmented
>> > >> mbufs or by rte_pktmbuf_free original mbuf. Two kind of
>> > >> cases should have different behaviors. free_cb won't
>> > >> explicitly call rte_pktmbuf_free to free original mbuf
>> > >> if it is freed by rte_pktmbuf_free original mbuf, but it
>> > >> has to call rte_pktmbuf_free to free original mbuf if it
>> > >> is freed by rte_pktmbuf_free segmented mbufs.
>> > >> 
>> > >> In order to fix this issue, free_cb interface has to been
>> > >> changed, __rte_pktmbuf_free_extbuf must deliver called
>> > >> mbuf pointer to free_cb, argument opaque can be defined
>> > >> as a custom struct by user, it can includes original mbuf
>> > >> pointer, user-defined free_cb can compare caller mbuf with
>> > >> mbuf in opaque struct, free_cb should free original mbuf
>> > >> if they are not same, this corresponds to rte_pktmbuf_free
>> > >> segmented mbufs case, otherwise, free_cb won't free original
>> > >> mbuf because the caller explicitly called rte_pktmbuf_free
>> > >> to free it.
>> > >> 
>> > >> Here is pseduo code to show two kind of cases.
>> > >> 
>> > >> case 1. rte_pktmbuf_free segmented mbufs
>> > >> 
>> > >> nb_tx = rte_gso_segment(original_mbuf, /* original mbuf */
>> > >>                       &gso_ctx,
>> > >>                       /* segmented mbuf */
>> > >>                       (struct rte_mbuf **)&gso_mbufs,
>> > >>                       MAX_GSO_MBUFS);
>> > >
>> > >I'm sorry but it is not very clear to me what operations are done by
>> > >rte_gso_segment().
>> > >
>> > >In the current code, I only see calls to rte_pktmbuf_attach(),
>> > >which do not deal with external buffers. Am I missing something?
>> > >
>> > >Are you able to show the issue only with mbuf functions? It would
>> > >be helpful to understand what does not work.
>> > >
>> > >
>> > >Thanks,
>> > >Olivier
>> > >
>> > Oliver, thank you for comment, let me show you why it doesn't work for my 
>> > use case.  In OVS DPDK, VM uses vhostuserclient to send large packets 
>> > whose size is about 64K because we enabled TSO & UFO, these large packets 
>> > use rte_mbufs allocated by DPDK virtio_net function 
>> > virtio_dev_pktmbuf_alloc() (in file lib/librte_vhost/virtio_net.c. Please 
>> > refer to [PATCH V1 3/3], I changed free_cb as below, these packets use the 
>> > same allocate function and the same free_cb no matter they are TCP packet 
>> > or UDP packets, in case of VXLAN TSO, most NICs can't support inner UDP 
>> > fragment offload, so OVS DPDK has to do it by software, for UDP case, the 
>> > original rte_mbuf only can be freed by segmented rte_mbufs which are 
>> > output packets of rte_gso_segment, i.e. the original rte_mbuf only can 
>> > freed by free_cb, you can see, it explicitly called 
>> > rte_pktmbuf_free(arg->mbuf), the condition statement "if (caller_m != 
>> > arg->mbuf)" is true for this case, this has no problem, but for TCP case, 
>> > the original mbuf is delivered to rte_eth_tx_burst() but not segmented 
>> > rte_mbufs output by rte_gso_segment, PMD driver will call 
>> > rte_pktmbuf_free(original_rte_mbuf) but not 
>> > rte_pktmbuf_free(segmented_rte_mbufs), the same free_cb will be called, 
>> > that means original_rte_mbuf will be freed twice, you know what will 
>> > happen, this is just the issue I'm fixing. I bring in caller_m argument, 
>> > it can help work around this because caller_m is arg->mbuf and the 
>> > condition statement "if (caller_m != arg->mbuf)" is false, you can't fix 
>> > it without the change this patch series did.
>> 
>> I'm sill not sure to get your issue. Please, if you have a simple test
>> case using only mbufs functions (without virtio, gso, ...), it would be
>> very helpful because we will be sure that we are talking about the same
>> thing. In case there is an issue, it can easily become a unit test.
>> 
>> That said, I looked at vhost mbuf allocation and gso segmentation, and
>> I found some strange things:
>> 
>> 1/ In virtio_dev_extbuf_alloc(), and I there are 2 paths to create the
>>    ext mbuf.
>> 
>>    a/ The first one stores the shinfo struct in the mbuf, basically
>>       like this:
>> 
>>      pkt = rte_pktmbuf_alloc(mp);
>>      shinfo = rte_pktmbuf_mtod(pkt, struct rte_mbuf_ext_shared_info *);
>>      buf = rte_malloc(NULL, buf_len, RTE_CACHE_LINE_SIZE);
>>      shinfo->free_cb = virtio_dev_extbuf_free;
>>      shinfo->fcb_opaque = buf;
>>      rte_mbuf_ext_refcnt_set(shinfo, 1);
>> 
>>       I don't think it is a good idea, because there is no guarantee that
>>       the mbuf won't be freed before the buffer. For instance, doing
>>       this will probably fail:
>> 
>>      pkt2 = rte_pktmbuf_alloc(mp);
>>      rte_pktmbuf_attach(pkt2, pkt);
>>      rte_pktmbuf_free(pkt);  /* pkt is freed, but it contains shinfo ! */
>> 
>>       To do this properly, the mbuf refcnt should be increased, and
>>       the mbuf should be freed in the callback. But I don't think it's
>>       worth doing it, given the second path (b/) looks good to me.
>> 
>>    b/ The second path stores the shinfo struct at the end of the
>>       allocated buffer, like this:
>> 
>>      pkt = rte_pktmbuf_alloc(mp);
>>      buf_len += sizeof(*shinfo) + sizeof(uintptr_t);
>>      buf_len = RTE_ALIGN_CEIL(total_len, sizeof(uintptr_t));
>>      buf = rte_malloc(NULL, buf_len, RTE_CACHE_LINE_SIZE);
>>      shinfo = rte_pktmbuf_ext_shinfo_init_helper(buf, &buf_len,
>>                                            virtio_dev_extbuf_free, buf);
>> 
>>       I think this is correct, because we have the guarantee that shinfo
>>       exists as long as the buffer exists.
>> 
>> 2/ in rte_gso_segment(), there is a loop like this:
>> 
>>      while (pkt_seg) {
>>              rte_mbuf_refcnt_update(pkt_seg, -1);
>>              pkt_seg = pkt_seg->next;
>>      }
>> 
>>    You change it to take in account the refcnt for ext mbufs.
>> 
>>    I may have missed something but I wonder why it is not simply:
>> 
>>      rte_pktmbuf_free(pkt_seg);
>
>Here, I mean rte_pktmbuf_free(pkt)
>(one free() call for the mbuf chain, not one per segment)

Oliver, what rte_gso_segment does do here is just to decrease refcnt by one in 
order that
last segmented mbuf free can trigger free_cb call, it is only one goal of the 
code here,
rte_gso_segment just segments a large mbuf to multiple small mbufs, it won't 
handle anything
else.

Reply via email to