> -----Original Message-----
> From: Jerin Jacob [mailto:jerin.ja...@caviumnetworks.com]
> Sent: Wednesday, February 8, 2017 10:23 AM
> To: Van Haaren, Harry <harry.van.haa...@intel.com>
> Cc: dev@dpdk.org; Richardson, Bruce <bruce.richard...@intel.com>;
>     Hunt, David <david.h...@intel.com>; nipun.gu...@nxp.com;
>     hemant.agra...@nxp.com; Eads, Gage <gage.e...@intel.com>
> Subject: Re: [PATCH v2 15/15] app/test: add unit tests for SW eventdev driver

<snip>

> Thanks for the SW driver specific test cases. They gave me good insight
> into the expected application behavior from the SW driver perspective, and in
> turn they raised some challenges for portable applications.
>
> I would like to highlight a main difference between the implementations and
> get a consensus on how to abstract it.

Thanks for taking the time to detail your thoughts - the examples certainly help to get a better picture of the whole.

> Based on the existing header file, we can do event pipelining in two
> different ways
> a) Flow-based event pipelining
> b) queue_id based event pipelining
>
> I will provide an example to showcase the application flow in both modes.
> Based on my understanding of the SW driver source code, it supports only
> queue_id based event pipelining. I guess flow-based event pipelining will
> work semantically with the SW driver, but it will be very slow.
>
> I think the reason for the difference is the capability of the context
> definition.
> In the SW model the context is queue_id
> In the Cavium HW model the context is queue_id + flow_id + sub_event_type +
> event_type
>
> AFAIK, queue_id based event pipelining will work with NXP HW, but I am not
> sure about the flow-based event pipelining model with NXP HW. Appreciate any
> input on this.
>
> In Cavium HW, we support both modes.
>
> As an open question, should we add a capability flag to advertise the
> supported models and let the application choose the model based on the
> implementation capability? The downside is that a small portion of the stage
> advance code will be different, but we can reuse the STAGE specific
> application code (I think it is a fair trade-off)
>
> Bruce, Harry, Gage, Hemant, Nipun
> Thoughts? Or any other proposal?

[HvH] Comments inline.

> I will take a non-trivial real-world NW use case to show the difference.
> A standard IPSec outbound processing will have a minimum of 4 to 5 stages
>
> stage_0:
> --------
> a) Takes the pkts from ethdev and pushes them to eventdev as
> RTE_EVENT_OP_NEW
> b) In some HW implementations, this will be done by HW. In SW
> implementations it is done by service cores
>
> stage_1:(ORDERED)
> ------------------
> a) Receive pkts from stage_0 in ORDERED flow and process them in parallel
> on N cores
> b) Find the SA that the packet belongs to and move to the next stage for SA
> specific outbound operations. Outbound processing starts with updating the
> sequence number in the critical section, followed by packet encryption in
> parallel.
>
> stage_2(ATOMIC) based on SA
> ----------------------------
> a) Update the sequence number and move to ORDERED sched_type for packet
> encryption in parallel
>
> stage_3(ORDERED) based on SA
> ----------------------------
> a) Encrypt the packets in parallel
> b) Do output route look-up and figure out tx port and queue to transmit
> the packet
> c) Move to ATOMIC stage based on tx port and tx queue_id to transmit
> the packet _without_ losing the ingress ordering
>
> stage_4(ATOMIC) based on tx port/tx queue
> -----------------------------------------
> a) enqueue the encrypted packet to ethdev tx port/tx_queue
>
>
> 1) queue_id based event pipelining
> =================================
>
> stage_1_work(assigned to event queue 1)# N ports/N cores establish
> link to queue 1 through rte_event_port_link()
>
> on_each_cores_linked_to_queue1(stage1)

[HvH] All worker cores can be linked to all stages - we do a lookup of what stage the work is based on the event->queue_id.
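
The queue_id lookup described in the comment above could look roughly like the sketch below. It is only an illustration under stated assumptions: struct mock_event, the MOCK_* enums and advance_by_queue_id() are hypothetical stand-ins invented here, not the real struct rte_event or any eventdev API, and the sa / tx_flow arguments stand in for the results of the SA and tx port/queue lookups. The stage numbering (queues 1 to 4) follows the example in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the struct rte_event fields used in this
 * thread; these are NOT the real DPDK definitions. */
enum mock_sched_type { MOCK_SCHED_ORDERED, MOCK_SCHED_ATOMIC };
enum mock_op { MOCK_OP_NEW, MOCK_OP_FORWARD };

struct mock_event {
	uint32_t flow_id;
	uint8_t queue_id;
	uint8_t sched_type;
	uint8_t op;
};

/*
 * Look up the stage from ev->queue_id and prepare the event for the next
 * stage. A worker linked to all queues can call this for any event.
 * Returns 1 if the event must be forwarded, 0 if it reached the tx stage,
 * -1 for an unknown queue.
 */
static int
advance_by_queue_id(struct mock_event *ev, uint32_t sa, uint32_t tx_flow)
{
	assert(ev != NULL);

	switch (ev->queue_id) {
	case 1: /* stage 1 (ORDERED): SA lookup, next stage is ATOMIC on SA */
		ev->flow_id = sa;
		ev->sched_type = MOCK_SCHED_ATOMIC;
		ev->queue_id = 2;
		break;
	case 2: /* stage 2 (ATOMIC): seq number update, flow_id stays the SA */
		ev->sched_type = MOCK_SCHED_ORDERED;
		ev->queue_id = 3;
		break;
	case 3: /* stage 3 (ORDERED): encrypt, next is ATOMIC on tx port/queue */
		ev->flow_id = tx_flow;
		ev->sched_type = MOCK_SCHED_ATOMIC;
		ev->queue_id = 4;
		break;
	case 4: /* stage 4 (ATOMIC): transmit, nothing left to forward */
		return 0;
	default:
		return -1;
	}
	ev->op = MOCK_OP_FORWARD;
	return 1;
}
```

With a helper of this shape, a single worker loop would dequeue an event, call the helper, and enqueue the event again whenever the return value is 1, regardless of which queue the event came from.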

> while(1)
> {
>         /* STAGE 1 processing */
>         nr_events = rte_event_dequeue_burst(ev,..);
>         if (!nr_events)
>                 continue;
>
>         sa = find_sa_from_packet(ev.mbuf);
>
>         /* move to next stage(ATOMIC) */
>         ev.event_type = RTE_EVENT_TYPE_CPU;
>         ev.sub_event_type = 2;
>         ev.sched_type = RTE_SCHED_TYPE_ATOMIC;
>         ev.flow_id = sa;
>         ev.op = RTE_EVENT_OP_FORWARD;
>         ev.queue_id = 2;
>         /* move to stage 2(event queue 2) */
>         rte_event_enqueue_burst(ev,..);
> }
>
> on_each_cores_linked_to_queue2(stage2)
> while(1)
> {
>         /* STAGE 2 processing */
>         nr_events = rte_event_dequeue_burst(ev,..);
>         if (!nr_events)
>                 continue;
>
>         /* seq number update in critical section */
>         sa_specific_atomic_processing(sa /* ev.flow_id */);
>
>         /* move to next stage(ORDERED) */
>         ev.event_type = RTE_EVENT_TYPE_CPU;
>         ev.sub_event_type = 3;
>         ev.sched_type = RTE_SCHED_TYPE_ORDERED;
>         ev.flow_id = sa;
>         ev.op = RTE_EVENT_OP_FORWARD;
>         ev.queue_id = 3;
>         /* move to stage 3(event queue 3) */
>         rte_event_enqueue_burst(ev,..);
> }
>
> on_each_cores_linked_to_queue3(stage3)
> while(1)
> {
>         /* STAGE 3 processing */
>         nr_events = rte_event_dequeue_burst(ev,..);
>         if (!nr_events)
>                 continue;
>
>         /* packets encryption in parallel */
>         sa_specific_ordered_processing(sa /* ev.flow_id */);
>
>         /* move to next stage(ATOMIC) */
>         ev.event_type = RTE_EVENT_TYPE_CPU;
>         ev.sub_event_type = 4;
>         ev.sched_type = RTE_SCHED_TYPE_ATOMIC;
>         output_tx_port_queue = find_output_tx_queue_and_tx_port(ev.mbuf);
>         ev.flow_id = output_tx_port_queue;
>         ev.op = RTE_EVENT_OP_FORWARD;
>         ev.queue_id = 4;
>         /* move to stage 4(event queue 4) */
>         rte_event_enqueue_burst(ev,..);
> }
>
> on_each_cores_linked_to_queue4(stage4)
> while(1)
> {
>         /* STAGE 4 processing */
>         nr_events = rte_event_dequeue_burst(ev,..);
>         if (!nr_events)
>                 continue;
>
>         rte_eth_tx_buffer();
> }
>
> 2) flow-based event pipelining
> =============================
>
> - No need to partition queues for different stages
> - All the cores can operate on all the stages,
>   thus enabling automatic multicore scaling and true dynamic load balancing,

[HvH] The sw case is the same - all cores can map to all stages, the lookup for stage of work is the queue_id.

> - A fairly large number of SAs (on the order of 2^16 to 2^20) can be
>   processed in parallel, something the existing IPSec application has
>   constraints on:
>   http://dpdk.org/doc/guides-16.04/sample_app_ug/ipsec_secgw.html
>
> on_each_worker_cores()
> while(1)
> {
>         nr_events = rte_event_dequeue_burst(ev,..);
>         if (!nr_events)
>                 continue;
>
>         /* STAGE 1 processing */
>         if(ev.event_type == RTE_EVENT_TYPE_ETHDEV) {
>                 sa = find_it_from_packet(ev.mbuf);
>                 /* move to next stage2(ATOMIC) */
>                 ev.event_type = RTE_EVENT_TYPE_CPU;
>                 ev.sub_event_type = 2;
>                 ev.sched_type = RTE_SCHED_TYPE_ATOMIC;
>                 ev.flow_id = sa;
>                 ev.op = RTE_EVENT_OP_FORWARD;
>                 rte_event_enqueue_burst(ev,..);
>
>         } else if(ev.event_type == RTE_EVENT_TYPE_CPU && ev.sub_event_type == 2) { /* stage 2 */

[HvH] In the case of software eventdev ev.queue_id is used instead of ev.sub_event_type - but this is the same lookup operation as mentioned above. I don't see a fundamental difference between these approaches?

>
>                 /* seq number update in critical section */
>                 sa_specific_atomic_processing(sa /* ev.flow_id */);
>                 /* move to next stage(ORDERED) */
>                 ev.event_type = RTE_EVENT_TYPE_CPU;
>                 ev.sub_event_type = 3;
>                 ev.sched_type = RTE_SCHED_TYPE_ORDERED;
>                 ev.flow_id = sa;
>                 ev.op = RTE_EVENT_OP_FORWARD;
>                 rte_event_enqueue_burst(ev,..);
>
>         } else if(ev.event_type == RTE_EVENT_TYPE_CPU && ev.sub_event_type == 3) { /* stage 3 */
>
>                 /* like encrypting packets in parallel */
>                 sa_specific_ordered_processing(sa /* ev.flow_id */);
>                 /* move to next stage(ATOMIC) */
>                 ev.event_type = RTE_EVENT_TYPE_CPU;
>                 ev.sub_event_type = 4;
>                 ev.sched_type = RTE_SCHED_TYPE_ATOMIC;
>                 output_tx_port_queue = find_output_tx_queue_and_tx_port(ev.mbuf);
>                 ev.flow_id = output_tx_port_queue;
>                 ev.op = RTE_EVENT_OP_FORWARD;
>                 rte_event_enqueue_burst(ev,..);
>
>         } else if(ev.event_type == RTE_EVENT_TYPE_CPU && ev.sub_event_type == 4) { /* stage 4 */
>                 rte_eth_tx_buffer();
>         }
> }
>
> /Jerin
> Cavium
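
To make the two points in this thread concrete (the stage lookup is the same operation in both models, and only a small stage-advance portion would differ behind a capability flag), here is a minimal sketch. Everything in it is an assumption made for illustration: MOCK_CAP_FLOW_PIPELINE is a hypothetical capability bit, not an existing eventdev flag, and struct stage_event, stage_of() and set_next_stage() are invented stand-ins rather than real DPDK definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant struct rte_event fields;
 * NOT the real DPDK definition. */
struct stage_event {
	uint8_t event_type;
	uint8_t sub_event_type;
	uint8_t queue_id;
};

enum { STAGE_EV_TYPE_ETHDEV = 0, STAGE_EV_TYPE_CPU = 1 };

/* Hypothetical capability bit: device supports flow-based pipelining. */
#define MOCK_CAP_FLOW_PIPELINE (1u << 0)

/* Return the stage an event belongs to; only the lookup key differs. */
static unsigned int
stage_of(const struct stage_event *ev, uint32_t caps)
{
	assert(ev != NULL);

	if (caps & MOCK_CAP_FLOW_PIPELINE) {
		/* flow-based: events fresh from ethdev are stage 1 */
		if (ev->event_type == STAGE_EV_TYPE_ETHDEV)
			return 1;
		return ev->sub_event_type;
	}
	/* queue_id based: the queue number is the stage */
	return ev->queue_id;
}

/* The small model-specific "stage advance" portion. */
static void
set_next_stage(struct stage_event *ev, uint32_t caps, unsigned int next)
{
	assert(ev != NULL);

	if (caps & MOCK_CAP_FLOW_PIPELINE) {
		ev->event_type = STAGE_EV_TYPE_CPU;
		ev->sub_event_type = (uint8_t)next;
	} else {
		ev->queue_id = (uint8_t)next;
	}
}
```

In this sketch the STAGE specific work (SA lookup, sequence number update, encryption, tx) would be dispatched on the value returned by stage_of() and could stay identical in both models; only these two helpers would depend on the advertised capability.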