On 20/10/2015 11:41, "David Evans" <davidjoshuaev...@gmail.com> wrote:
>Thanks Daniele
>
>may_steal sounds like the right thing, but I couldn't see how it's set by
>the group OFPGT11_ALL code path.
>Also - maybe refcnt + cloning would be a more efficient option (assuming
>no packet mods would occur after the group)

The datapath (dpif-netdev, in this case) doesn't deal with OFPGT11_ALL (or
any OpenFlow); it just deals with a list of what we call ODP (or datapath)
actions.  The ofproto layer translates OpenFlow actions into ODP actions.
If a packet needs to go to multiple destinations, the ODP action list will
contain more than one OVS_ACTION_ATTR_OUTPUT.

The code in odp_execute_actions() (called by the datapath) contains the
logic to properly set `may_steal` for each call to dp_execute_cb():
`may_steal` is set to true only if there are no more actions; otherwise it
is false and the packet will eventually be copied.

If you're not familiar with the distinction between the ofproto layer and
the datapath, I would advise taking a look at the paper that Ben suggested.

>My main concern is where there are multiple packets for a flow where it
>may not go through this path for the 2nd and following packets.
>to refcnt and/or clone the packets for each output.

I'm not sure I understand your concern.

>This use case may not really be a part of your general switching
>direction though.  I don't expect that OFPGT11_ALL is a really popular
>use case.
>
>Thanks for replying
>
>Dave.
>
>> On Oct 20, 2015, at 1:33 PM, Daniele Di Proietto
>> <diproiet...@vmware.com> wrote:
>>
>> Hi,
>>
>> Currently every DPDK mbuf in OVS has the `refcnt` set to one.  Output to
>> multiple ports is handled by making a copy of the packet's payload (see
>> `may_steal` in dp_netdev_execute_actions(), and in netdev_send()).
>>
>> You're right, having a `refcnt` != 1 might be necessary to use
>> rte_ipv4_fragment_packet() or to support certain offloading capabilities
>> (currently not implemented in OVS).
>>
>> Does this answer your question?
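[Editor's note: to make the ownership rule above concrete, here is a
minimal, self-contained C sketch of the pattern Daniele describes in
odp_execute_actions(): only the final output action is given
may_steal == true and consumes the original buffer, while every earlier
output works on a clone.  All names and types here are illustrative
stand-ins, not the actual OVS code.]

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static int n_clones;   /* bookkeeping for the sketch */
static int n_frees;

struct packet {
    char data[64];
};

static struct packet *
packet_clone(const struct packet *p)
{
    struct packet *clone = malloc(sizeof *clone);
    memcpy(clone, p, sizeof *clone);
    n_clones++;
    return clone;
}

/* Stand-in for the output case of dp_execute_cb(): the TX path always
 * consumes (frees) the buffer it is handed. */
static void
output_packet(struct packet *p, int port)
{
    printf("tx port %d: %s\n", port, p->data);
    free(p);
    n_frees++;
}

/* Stand-in for odp_execute_actions() over a list of output actions:
 * only the last action may steal; earlier ones get a copy. */
static void
execute_outputs(struct packet *p, const int *ports, int n)
{
    for (int i = 0; i < n; i++) {
        bool may_steal = (i == n - 1);
        output_packet(may_steal ? p : packet_clone(p), ports[i]);
    }
}
```

With three output actions, the first two transmit clones and only the
last consumes the original, which is why every mbuf can keep
`refcnt == 1` in the current design.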
>>
>> Daniele
>>
>>
>> On 15/10/2015 12:41, "Ben Pfaff" <b...@nicira.com> wrote:
>>
>>> I don't understand what you're asking for.
>>>
>>> Daniele or Pravin, I think that you know the DPDK datapath well.  Do
>>> you understand what David wants or why?
>>>
>>> On Thu, Oct 15, 2015 at 01:15:11PM -0500, David Evans wrote:
>>>> Thanks Ben,
>>>>
>>>> If that's the case, then it would be better to add a custom action
>>>> that applies prior to this group action, to update the refcnt.
>>>>
>>>> I expect it just has to happen some time before the first PMD has
>>>> finished processing the packet, so that the packet does not get
>>>> deleted by the tx routine before other PMDs have seen the packet.
>>>>
>>>> Cheers
>>>> Dave.
>>>>
>>>>> On Oct 12, 2015, at 12:23 PM, Ben Pfaff <b...@nicira.com> wrote:
>>>>>
>>>>> Your change isn't going to have much effect because most packets
>>>>> don't go through the translation process.  If you try to force all
>>>>> packets through translation, it will kill performance.
>>>>>
>>>>> I think that you should read this paper that describes the various
>>>>> caching layers in Open vSwitch:
>>>>>
>>>>> http://openvswitch.org/support/papers/nsdi2015.pdf
>>>>>
>>>>> On Mon, Oct 12, 2015 at 11:56:03AM -0500, David Evans wrote:
>>>>>> Hi Ben,
>>>>>>
>>>>>> When I use the OFPGT11_ALL group action, the packets for a flow
>>>>>> will be sent out all buckets in a group (in my case all the buckets
>>>>>> are ports to transmit out).
>>>>>>
>>>>>> I added a group_bucket_count to the context and, in the
>>>>>> xlate_all_group function, the following:
>>>>>>
>>>>>>     group_dpif_get_buckets(group, &buckets);
>>>>>> +   if (ctx->group_bucket_count == 0) {
>>>>>> +       LIST_FOR_EACH (bucket, list_node, buckets) {
>>>>>> +           ctx->group_bucket_count++;
>>>>>> +       }
>>>>>> +   }
>>>>>> +   if (ctx->xin->packet) {
>>>>>> +       if (ctx->xin->packet->source == DPBUF_DPDK) {
>>>>>> +           rte_pktmbuf_refcnt_update(&ctx->xin->packet->mbuf,
>>>>>> +                                     ctx->group_bucket_count);
>>>>>> +       }
>>>>>> +   }
>>>>>>     LIST_FOR_EACH (bucket, list_node, buckets) {
>>>>>>
>>>>>> This stops the transmit PMDs attempting to free the packet until
>>>>>> all the buckets (ports) have transmitted it.
>>>>>> My switch also does reassembly on rx - this refcnt is necessary for
>>>>>> handling multi-segment DPDK buffers too.
>>>>>> I also changed the segment free to rte_pktmbuf_free in
>>>>>> netdev-dpdk.c for this purpose.
>>>>>> I'm expecting it will also be important for TSO or the possibility
>>>>>> of using rte_ipv4_fragment_packet() on an outgoing port.
>>>>>>
>>>>>> I have between 6 and 12 PMDs, depending on the number of DPDK ports
>>>>>> running at any time, and if I use OFPGT11_ALL with many output
>>>>>> buckets (ports), buffers will disappear from under some PMDs and
>>>>>> cause segfaults etc.
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Dave.
>>>>>>
>>>>>>> On Oct 12, 2015, at 11:38 AM, Ben Pfaff <b...@nicira.com> wrote:
>>>>>>>
>>>>>>> On Wed, Oct 07, 2015 at 05:36:18PM -0500, David Evans wrote:
>>>>>>>> While using netdev-dpdk - when I add a rule for which the action
>>>>>>>> is to send to a group (type=all) containing (x) output buckets
>>>>>>>> (ports), how can I increment the dp_packet->pkt_mbuf's refcnt to
>>>>>>>> (x) so that the packet is not deleted before it has been
>>>>>>>> transmitted on all ports (buckets) in the group?
>>>>>>>>
>>>>>>>> Perhaps in the ofproto-dpif-xlate.c function xlate_all_group,
>>>>>>>> find the packet and update ctx->xin->packet->mbuf->refcnt?  Will
>>>>>>>> that work for all packets for a ctx?
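[Editor's note: the reference-counting scheme David's patch relies on can
be sketched in a few lines of self-contained C.  The struct and helpers
below are toy stand-ins for DPDK's rte_mbuf, rte_pktmbuf_refcnt_update()
and rte_pktmbuf_free(); they are illustrative, not the real API.]

```c
static int n_freed;   /* bookkeeping for the sketch */

struct toy_mbuf {
    int refcnt;       /* DPDK mbufs start life with refcnt == 1 */
};

/* ~rte_pktmbuf_refcnt_update(): adjust the reference count. */
static void
toy_refcnt_update(struct toy_mbuf *m, int delta)
{
    m->refcnt += delta;
}

/* ~rte_pktmbuf_free(): drop one reference; the buffer is only really
 * released when the count reaches zero. */
static void
toy_mbuf_free(struct toy_mbuf *m)
{
    if (--m->refcnt == 0) {
        n_freed++;    /* would return the buffer to its mempool */
    }
}

/* Send one packet out every bucket of an OFPGT11_ALL-style group.  The
 * refcnt is raised by n_buckets - 1 here because the initial reference
 * already covers one transmit. */
static void
send_all_buckets(struct toy_mbuf *m, int n_buckets)
{
    toy_refcnt_update(m, n_buckets - 1);
    for (int port = 0; port < n_buckets; port++) {
        /* ... transmit on this bucket's port ... */
        toy_mbuf_free(m);   /* each TX completion drops one reference */
    }
}
```

Whether the increment should be the full bucket count (as in the posted
patch) or one less depends on which code path drops the buffer's initial
reference; the sketch assumes the last TX completion does.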
>>>>>>>
>>>>>>> I don't understand what relationship you expect here.  A group has
>>>>>>> no direct relationship to a packet.  Translation produces a flat
>>>>>>> list of simple actions that don't refer back to the group.
>>>>>>
>>>>
>>
>
_______________________________________________
discuss mailing list
discuss@openvswitch.org
http://openvswitch.org/mailman/listinfo/discuss