On Mon, Aug 03, 2020 at 09:26:40AM +0800, yang_y_yi wrote: > At 2020-08-03 04:29:07, "Olivier Matz" <olivier.m...@6wind.com> wrote: > >Hi, > > > >On Sun, Aug 02, 2020 at 07:12:36AM +0800, yang_y_yi wrote: > >> > >> > >> At 2020-07-31 23:15:43, "Olivier Matz" <olivier.m...@6wind.com> wrote: > >> >Hi, > >> > > >> >On Thu, Jul 30, 2020 at 08:08:59PM +0800, yang_y...@163.com wrote: > >> >> From: Yi Yang <yangy...@inspur.com> > >> >> > >> >> In GSO case, segmented mbufs are attached to original > >> >> mbuf which can't be freed when it is external. The issue > >> >> is free_cb doesn't know original mbuf and doesn't free > >> >> it when refcnt of shinfo is 0. > >> >> > >> >> Original mbuf can be freed by rte_pktmbuf_free segmented > >> >> mbufs or by rte_pktmbuf_free original mbuf. Two kind of > >> >> cases should have different behaviors. free_cb won't > >> >> explicitly call rte_pktmbuf_free to free original mbuf > >> >> if it is freed by rte_pktmbuf_free original mbuf, but it > >> >> has to call rte_pktmbuf_free to free original mbuf if it > >> >> is freed by rte_pktmbuf_free segmented mbufs. > >> >> > >> >> In order to fix this issue, free_cb interface has to been > >> >> changed, __rte_pktmbuf_free_extbuf must deliver called > >> >> mbuf pointer to free_cb, argument opaque can be defined > >> >> as a custom struct by user, it can includes original mbuf > >> >> pointer, user-defined free_cb can compare caller mbuf with > >> >> mbuf in opaque struct, free_cb should free original mbuf > >> >> if they are not same, this corresponds to rte_pktmbuf_free > >> >> segmented mbufs case, otherwise, free_cb won't free original > >> >> mbuf because the caller explicitly called rte_pktmbuf_free > >> >> to free it. > >> >> > >> >> Here is pseduo code to show two kind of cases. > >> >> > >> >> case 1. rte_pktmbuf_free segmented mbufs > >> >> > >> >> nb_tx = rte_gso_segment(original_mbuf, /* original mbuf */ > >> >> &gso_ctx, > >> >> /* segmented mbuf */ > >> >> (struct rte_mbuf **)&gso_mbufs, > >> >> MAX_GSO_MBUFS); > >> > > >> >I'm sorry but it is not very clear to me what operations are done by > >> >rte_gso_segment(). > >> > > >> >In the current code, I only see calls to rte_pktmbuf_attach(), > >> >which do not deal with external buffers. Am I missing something? > >> > > >> >Are you able to show the issue only with mbuf functions? It would > >> >be helpful to understand what does not work. > >> > > >> > > >> >Thanks, > >> >Olivier > >> > > >> Oliver, thank you for comment, let me show you why it doesn't work for my > >> use case. In OVS DPDK, VM uses vhostuserclient to send large packets > >> whose size is about 64K because we enabled TSO & UFO, these large packets > >> use rte_mbufs allocated by DPDK virtio_net function > >> virtio_dev_pktmbuf_alloc() (in file lib/librte_vhost/virtio_net.c. Please > >> refer to [PATCH V1 3/3], I changed free_cb as below, these packets use the > >> same allocate function and the same free_cb no matter they are TCP packet > >> or UDP packets, in case of VXLAN TSO, most NICs can't support inner UDP > >> fragment offload, so OVS DPDK has to do it by software, for UDP case, the > >> original rte_mbuf only can be freed by segmented rte_mbufs which are > >> output packets of rte_gso_segment, i.e. the original rte_mbuf only can > >> freed by free_cb, you can see, it explicitly called > >> rte_pktmbuf_free(arg->mbuf), the condition statement "if (caller_m != > >> arg->mbuf)" is true for this case, this has no problem, but for TCP case, > >> the original mbuf is delivered to rte_eth_tx_burst() but not segmented > >> rte_mbufs output by rte_gso_segment, PMD driver will call > >> rte_pktmbuf_free(original_rte_mbuf) but not > >> rte_pktmbuf_free(segmented_rte_mbufs), the same free_cb will be called, > >> that means original_rte_mbuf will be freed twice, you know what will > >> happen, this is just the issue I'm fixing. I bring in caller_m argument, > >> it can help work around this because caller_m is arg->mbuf and the > >> condition statement "if (caller_m != arg->mbuf)" is false, you can't fix > >> it without the change this patch series did. > > > >I'm sill not sure to get your issue. Please, if you have a simple test > >case using only mbufs functions (without virtio, gso, ...), it would be > >very helpful because we will be sure that we are talking about the same > >thing. In case there is an issue, it can easily become a unit test. > > Oliver, I think you don't get the point, free operation can't be controlled > by the application itself, > it is done by PMD driver and triggered by rte_eth_tx_burst, I have shown > pseudo code, > rte_gso_segment just segments a large mbuf to multiple mbufs, it won't send > them, the application > will call rte_eth_tx_burst to send them finally. > > > > >That said, I looked at vhost mbuf allocation and gso segmentation, and > >I found some strange things: > > > >1/ In virtio_dev_extbuf_alloc(), and I there are 2 paths to create the > > ext mbuf. > > > > a/ The first one stores the shinfo struct in the mbuf, basically > > like this: > > > > pkt = rte_pktmbuf_alloc(mp); > > shinfo = rte_pktmbuf_mtod(pkt, struct rte_mbuf_ext_shared_info *); > > buf = rte_malloc(NULL, buf_len, RTE_CACHE_LINE_SIZE); > > shinfo->free_cb = virtio_dev_extbuf_free; > > shinfo->fcb_opaque = buf; > > rte_mbuf_ext_refcnt_set(shinfo, 1); > > > > I don't think it is a good idea, because there is no guarantee that > > the mbuf won't be freed before the buffer. For instance, doing > > this will probably fail: > > > > pkt2 = rte_pktmbuf_alloc(mp); > > rte_pktmbuf_attach(pkt2, pkt); > > rte_pktmbuf_free(pkt); /* pkt is freed, but it contains shinfo ! */ > > pkt is created by the application I can control, so I can control it where it > will be freed, right?
This example shows that mbufs allocated like this by the vhost driver are not constructed correctly. If an application attach a new packet (pkt2) to it and frees the original one (pkt), it may result in a memory corruption. Of course, to be tested and confirmed. > > > > > To do this properly, the mbuf refcnt should be increased, and > > the mbuf should be freed in the callback. But I don't think it's > > worth doing it, given the second path (b/) looks good to me. > > > > b/ The second path stores the shinfo struct at the end of the > > allocated buffer, like this: > > > > pkt = rte_pktmbuf_alloc(mp); > > buf_len += sizeof(*shinfo) + sizeof(uintptr_t); > > buf_len = RTE_ALIGN_CEIL(total_len, sizeof(uintptr_t)); > > buf = rte_malloc(NULL, buf_len, RTE_CACHE_LINE_SIZE); > > shinfo = rte_pktmbuf_ext_shinfo_init_helper(buf, &buf_len, > > virtio_dev_extbuf_free, buf); > > > > I think this is correct, because we have the guarantee that shinfo > > exists as long as the buffer exists. > > What buffer does the allocated buffer you're saying here? The issue we're > discussing how we can > free original mbuf which owns shinfo buffer. I don't get your question. I'm just saying that this code path looks correct, compared to the previous one. > > > > >2/ in rte_gso_segment(), there is a loop like this: > > > > while (pkt_seg) { > > rte_mbuf_refcnt_update(pkt_seg, -1); > > pkt_seg = pkt_seg->next; > > } > > > > You change it to take in account the refcnt for ext mbufs. > > > > I may have missed something but I wonder why it is not simply: > > > > rte_pktmbuf_free(pkt_seg); > > > > It will decrease the proper refcnt, and free the mbufs if they > > are not used. > > Again, rte_gso_segment just decreases refcnt by one, this will ensure the > last segmented > mbuf free will trigger freeing original mbuf (only free_cb can do this). rte_pktmbuf_free() will also decerase the refcnt, and free the resources when the refcnt reaches 0. It has some advantages compared to decrease the reference counter of all segments: - no need to iterate the segments, there is only one function call - no need to have a special case for ext mbufs like you did in your patch - it may be safer, in case some segments have a refcnt == 1, because resources will be freed. > >Again, sorry if this is not the issue your are referring to, but > >in this case I really think that having a simple example code that > >shows the issue would help. > > Oliver, my statement in the patch I sent out has pseudo code to show this. I > don't think a simple > unit test can show it. I don't see why. The PMDs and the libraries use the mbuf functions, why a unit test couldn't call the same functions? > Let me summarize it here again. For original mbuf, there are two cases freeing > it, case one is PMD driver calls free against segmented mbufs, last segmented > mbuf free will trigger > free_cb call which will free original large & extended mbuf. OK > Case two is PMD driver will call free against > original mbuf, that also will call free_cb to free attached extended buffer. OK And what makes that case 1 or case 2 is executed? > In case one free_cb must call > rte_pktmbuf_free otherwise nobody will free original large & extended mbuf, > in case two free_cb can't > call rte_pktmbuf_free because the caller calling it is just rte_pktmbuf_free > we need. That is to say, you > must use the same free_cb to handle these two cases, this is my issue and the > point you don't get. I think there is no need to change the free_cb API. It should work like this: - virtio creates the original external mbuf (orig_m) - gso will create a new mbuf referencing the external buffer (new_m) At this point, the shinfo has a refcnt of 2. The large buffer will be freed as soon as rte_pktmbuf_free() is called on orig_m and new_m, whatever the order. Regards, Olivier