>> > >> >> > >> > >> >> > + @Van Haaren, Harry
>> > >> >
>> > >> >Hi All,
>> > >> >
>> > >> >I have been away on vacation for the last week - hence the delay
>> > >> >in reply on this thread.
>> > >> >
>> > >> ><snip discussion>
>> > >> >
>> > >> >> > > [1]
>> > >> >> > > Steps to reproduce:
>> > >> >> > > * Clone http://dpdk.org/git/next/dpdk-next-eventdev
>> > >> >> > > * Apply [v5] app/eventdev: add crypto producer mode
>> > >> >> > >   git-pw --server https://patches.dpdk.org/api/1.2/ --project dpdk patch apply 107645
>> > >> >> > > * Apply [RFC] app/eventdev: add software crypto adapter support
>> > >> >> > >   git-pw --server https://patches.dpdk.org/api/1.2/ --project dpdk patch apply 107029
>> > >> >> > > * meson x86_build_debug -Dc_args='-g -O0' -Ddisable_drivers="*/cnxk"
>> > >> >> > > * ninja -C x86_build_debug
>> > >> >> > > * Command to reproduce crash
>> > >> >> > >   sudo ./x86_build_debug/app/dpdk-test-eventdev -l 0-8 -s 0xf0 --vdev=event_sw0 --vdev="crypto_null" -- --prod_type_cryptodev --crypto_adptr_mode 0 --test=perf_queue --stlist=a --wlcores 1 --plcores 2
>> > >> >
>> > >> >Can confirm that these steps indeed cause a segfault as reported.
>> > >> >
>> > >> >In debugging, it seems like there are *zero* NEW events, and large
>> > >> >numbers of RELEASE events are enqueued... if so, this is not
>> > >> >compliant with the Eventdev API.
>> > >> >Can somebody confirm that?
>> > >> >
>> > >> >The SW PMD is being told there are events to release, but there aren't any.
>> > >> >Eventually, this leads to a mismatch in credit allocations, which
>> > >> >then causes the IQ-chunks data structure to become corrupted.
>> > >> >
>> > >> >All in all, I'm not convinced this is a SW PMD issue yet - initial
>> > >> >testing points to incorrect event OP NEW/FWD/RELEASE usage. Can we
>> > >> >verify that the OPs being sent are correct?
>> > >> >
>> > >>
>> > >> Looks like an issue in the crypto adapter service. The service is
>> > >> starting with OP_FORWARD if RTE_EVENT_DEV_CAP_IMPLICIT_RELEASE_DISABLE is set.
>> > >> Abhinandan can confirm.
>> > >
>> > >The service is started with whatever the application requests from the adapter.
>> > >The app can request either OP_NEW or FWD mode. While creating a new
>> > >instance, the adapter requests the eventdev caps and, based on that, enqueues events
>> > >back to the evdev in FWD or NEW mode. All events are triggered by the
>> > >application and the adapter is transparent here. Could you please explain
>> > >to me how this is creating an issue?
>> > >
>> >
>> > In lib/eventdev/rte_event_crypto_adapter.c:
>> > ...
>> > eca_ops_enqueue_burst(struct event_crypto_adapter *adapter, ...
>> >                 rte_memcpy(ev, &m_data->response_info, sizeof(*ev));
>> >                 ev->event_ptr = ops[i];
>> >                 ev->event_type = RTE_EVENT_TYPE_CRYPTODEV;
>> >                 if (adapter->implicit_release_disabled)
>> >                         ev->op = RTE_EVENT_OP_FORWARD;
>> >                 else
>> >                         ev->op = RTE_EVENT_OP_NEW; ...
>> >
>> > op and event_type are set in the service. Changing FORWARD to NEW will
>> > fix the crash.
>>
>> Yes, I think that is true, but let's ensure we all understand the
>> reason.
>>
>> The crash reported occurs when events with FORWARD are sent into the SW
>> PMD, and later those are RELEASED. Notice, the event was never *NEW*.
>>
>> Eventdev demands that when adding "new" things (e.g. events not
>> previously seen by the PMD) into the Eventdev instance, the type of the
>> event must be NEW. The NEW op type consumes "credits" in the SW PMD,
>> and causes tracking for the NEW events.
>>
>> I think that here the event flow *starts* with FORWARD events (should be NEW),
>> and hence the crash occurs, because the NEW type was never enqueued
>> first.
>>
>> Shijith suggests changing FORWARD to NEW to fix the crash. I believe that
>> may *fix* the crash here, but doing so without consideration for
>> "implicit-release" mode may break things elsewhere.
>>
>> Is the better fix to ensure that any events being enqueued into Eventdev for
>> the first time are of a NEW type, and once circulated, either FORWARD or
>> NEW can be used in a valid way?
>>
>>
>> > I think we should update the spec with all the values that are used in
>> > response info.
>> > I will remove setting the op/event_type fields of response info in the
>> > application.
>> > PMD/service can take care of it.
>>
>> I'm not familiar with how the adapter/pmd/service interact - no input from
>> me.
>
>Harry and Shijith, Thanks for all the observations.
>
>After debugging, I think changes are required in both the adapter and the
>application:
>1. Application/Adapter in FWD mode case: The app is forming FWD events as an
>event originator (it is supposed to form NEW events), which is causing the
>crash!
>App fix:
>In crypto_adapter_enq_op_fwd() -> change as below:
>-       ev.op = RTE_EVENT_OP_FORWARD;
>+       ev.op = RTE_EVENT_OP_NEW;
>        ev.queue_id = p->queue_id;
>        ev.sched_type = RTE_SCHED_TYPE_ATOMIC;
>

Will send v7 with this change + changes to not set op and event_type in the application.

>2. Adapter in NEW mode case: The app calls rte_cryptodev_enqueue_burst()
>and directly enqueues crypto ops. The adapter has no way to tell whether the
>crypto ops were derived from events or were enqueued directly by the application.
>So, below is the fix for that:
>root@dev:/home/intel/abhi/dpdk-next-eventdev# git diff lib/eventdev/rte_event_crypto_adapter.c
>diff --git a/lib/eventdev/rte_event_crypto_adapter.c b/lib/eventdev/rte_event_crypto_adapter.c
>index 0b484f3695..a6328b853d 100644
>--- a/lib/eventdev/rte_event_crypto_adapter.c
>+++ b/lib/eventdev/rte_event_crypto_adapter.c
>@@ -658,7 +658,9 @@ eca_ops_enqueue_burst(struct event_crypto_adapter *adapter,
>                rte_memcpy(ev, &m_data->response_info, sizeof(*ev));
>                ev->event_ptr = ops[i];
>                ev->event_type = RTE_EVENT_TYPE_CRYPTODEV;
>-               if (adapter->implicit_release_disabled)
>+               if (adapter->mode == RTE_EVENT_CRYPTO_ADAPTER_OP_NEW)
>+                       ev->op = RTE_EVENT_OP_NEW;
>+               else if (adapter->implicit_release_disabled)
>                        ev->op = RTE_EVENT_OP_FORWARD;
>                else
>                        ev->op = RTE_EVENT_OP_NEW;
>
>
>With the above fix, I can run the test for both NEW and FWD mode:
>
>root@xdp-dev:/home/intel/abhi/dpdk-next-eventdev/abhi# ./app/dpdk-test-eventdev -l 0-8 -s 0xf0 --vdev=event_sw0 --vdev="crypto_null" -- --prod_type_cryptodev --crypto_adptr_mode 0 --test=perf_queue --stlist=a --wlcores 1 --plcores 2
>EAL: Detected CPU lcores: 96
>EAL: Detected NUMA nodes: 2
>EAL: Detected static linkage of DPDK
>EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
>EAL: Selected IOVA mode 'PA'
>EAL: VFIO support initialized
>CRYPTODEV: Creating cryptodev crypto_null
>
>CRYPTODEV: Initialisation parameters - name: crypto_null,socket id: 0, max queue pairs: 8
>TELEMETRY: No legacy callbacks, legacy socket not created
>        driver : event_sw
>        test : perf_queue
>        dev : 0
>        verbose_level : 1
>        socket_id : -1
>        pool_sz : 16384
>        main lcore : 0
>        nb_pkts : 67108864
>        nb_timers : 100000000
>        available lcores : {0 1 2 3 8}
>        nb_flows : 1024
>        worker deq depth : 16
>        fwd_latency : false
>        nb_prod_lcores : 1
>        producer lcores : {2}
>        nb_worker_lcores : 1
>        worker lcores : {1}
>        nb_stages : 1
>        nb_evdev_ports : 2
>        nb_evdev_queues : 1
>        queue_priority : false
>        sched_type_list : {A}
>        crypto adapter mode : OP_NEW
>        nb_cryptodev : 1
>        prod_type : Event crypto adapter producers
>        prod_enq_burst_sz : 1
>CRYPTODEV: elt_size 0 is expanded to 208
>
>0.000 mpps avg 0.000 mpps
>EventDev todo-fix-name: ports 3, qids 1
>        rx 6080
>        drop 0
>        tx 2040
>        sched calls: 9064463
>        sched cq/qid call: 9069025
>        sched no IQ enq: 9063156
>        sched no CQ enq: 9063759
>        inflight 4096, credits: 0
>  Port 0
>        rx 0 drop 0 tx 2024 inflight 0
>        Max New: 4096 Avg cycles PP: 745 Credits: 40
>        Receive burst distribution:
>                0:100% 1-4:0.00% 5-8:0.00% 9-12:0.00% 13-16:0.00%
>        rx ring used: 0 free: 4096
>        cq ring used: 0 free: 16
>  Port 1
>        rx 0 drop 0 tx 0 inflight 0
>        Max New: 4096 Avg cycles PP: 0 Credits: 0
>        Receive burst distribution:
>                0:-nan%
>        rx ring used: 0 free: 4096
>        cq ring used: 0 free: 16
>  Port 2
>        rx 6080 drop 0 tx 16 inflight 16
>        Max New: 4096 Avg cycles PP: 0 Credits: 0
>        Receive burst distribution:
>                0:-nan%
>        rx ring used: 0 free: 4096
>        cq ring used: 16 free: 0
>  Queue 0 (Atomic)
>        rx 6080 drop 0 tx 2040
>        Per Port Stats:
>          Port 0: Pkts: 2024 Flows: 0
>          Port 1: Pkts: 0 Flows: 0
>          Port 2: Pkts: 16 Flows: 22
>        iq 1: Used 4040
>error: perf_launch_lcores() No schedules for seconds, deadlock
>
>Packet distribution across worker cores :
>Worker 0 packets: 7e8 percentage: 100.00
>Result: Failed
>
>
>root@xdp-dev:/home/intel/abhi/dpdk-next-eventdev/abhi# ./app/dpdk-test-eventdev -l 0-8 -s 0xf0 --vdev=event_sw0 --vdev="crypto_null" -- --prod_type_cryptodev --crypto_adptr_mode 1 --test=perf_queue --stlist=a --wlcores 1 --plcores 2
>EAL: Detected CPU lcores: 96
>EAL: Detected NUMA nodes: 2
>EAL: Detected static linkage of DPDK
>EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
>EAL: Selected IOVA mode 'PA'
>EAL: VFIO support initialized
>CRYPTODEV: Creating cryptodev crypto_null
>
>CRYPTODEV: Initialisation parameters - name: crypto_null,socket id: 0, max queue pairs: 8
>TELEMETRY: No legacy callbacks, legacy socket not created
>        driver : event_sw
>        test : perf_queue
>        dev : 0
>        verbose_level : 1
>        socket_id : -1
>        pool_sz : 16384
>        main lcore : 0
>        nb_pkts : 67108864
>        nb_timers : 100000000
>        available lcores : {0 1 2 3 8}
>        nb_flows : 1024
>        worker deq depth : 16
>        fwd_latency : false
>        nb_prod_lcores : 1
>        producer lcores : {2}
>        nb_worker_lcores : 1
>        worker lcores : {1}
>        nb_stages : 1
>        nb_evdev_ports : 2
>        nb_evdev_queues : 1
>        queue_priority : false
>        sched_type_list : {A}
>        crypto adapter mode : OP_FORWARD
>        nb_cryptodev : 1
>        prod_type : Event crypto adapter producers
>        prod_enq_burst_sz : 1
>CRYPTODEV: elt_size 0 is expanded to 208
>
>0.000 mpps avg 0.000 mpps
>EventDev todo-fix-name: ports 3, qids 1
>        rx 4480
>        drop 0
>        tx 447
>        sched calls: 8438712
>        sched cq/qid call: 8442432
>        sched no IQ enq: 8438434
>        sched no CQ enq: 8438494
>        inflight 4096, credits: 0
>  Port 0
>        rx 0 drop 0 tx 431 inflight 0
>        Max New: 4096 Avg cycles PP: 637 Credits: 47
>        Receive burst distribution:
>                0:100% 1-4:0.00% 5-8:0.00% 13-16:0.00%
>        rx ring used: 0 free: 4096
>        cq ring used: 0 free: 16
>  Port 1
>        rx 4480 drop 0 tx 0 inflight 0
>        Max New: 4096 Avg cycles PP: 0 Credits: 0
>        Receive burst distribution:
>                0:-nan%
>        rx ring used: 0 free: 4096
>        cq ring used: 0 free: 16
>  Port 2
>        rx 0 drop 0 tx 16 inflight 16
>        Max New: 4096 Avg cycles PP: 0 Credits: 0
>        Receive burst distribution:
>                0:-nan%
>        rx ring used: 0 free: 4096
>        cq ring used: 16 free: 0
>  Queue 0 (Atomic)
>        rx 4480 drop 0 tx 447
>        Per Port Stats:
>          Port 0: Pkts: 431 Flows: 0
>          Port 1: Pkts: 0 Flows: 0
>          Port 2: Pkts: 16 Flows: 1
>        iq 0: Used 4033
>error: perf_launch_lcores() No schedules for seconds, deadlock
>
>Packet distribution across worker cores :
>Worker 0 packets: 1af percentage: 100.00
>Result: Failed
>
>@Shijith Thotton, Any idea why the test is failing? I'm not sure what the issue is here.
>In the meantime, I will get the rest of the app code reviewed.
>I think we can get both the RFC and the crypto producer patches in.
>

Better to keep both patches separate. The RFC can be merged after fixing the above issue.
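
For anyone following the NEW/FORWARD/RELEASE discussion quoted above, the credit accounting Harry describes can be modelled with a small standalone sketch. This is only a toy model under stated assumptions, not SW PMD or adapter code: every toy_* name is hypothetical, the rule "NEW takes a credit, RELEASE returns one, FORWARD changes nothing" is a simplification of what is described in this thread, and toy_select_op() merely mirrors the if/else chain from the adapter diff quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; NOT the DPDK definitions. */
enum toy_op { TOY_OP_NEW, TOY_OP_FORWARD, TOY_OP_RELEASE };
enum toy_adapter_mode { TOY_MODE_OP_NEW, TOY_MODE_OP_FORWARD };

struct toy_evdev {
	int32_t credits;  /* credits still available for NEW events */
	int32_t inflight; /* events the device believes it is tracking */
};

/* Toy enqueue: returns false only when a NEW event finds no credit. */
static bool
toy_enqueue(struct toy_evdev *dev, enum toy_op op)
{
	assert(dev != NULL);

	switch (op) {
	case TOY_OP_NEW:
		if (dev->credits == 0)
			return false;
		dev->credits--;   /* NEW consumes a credit ... */
		dev->inflight++;  /* ... and starts tracking the event */
		return true;
	case TOY_OP_FORWARD:
		/* Assumes the event is already tracked: no accounting change. */
		return true;
	case TOY_OP_RELEASE:
		dev->credits++;   /* RELEASE hands the credit back ... */
		dev->inflight--;  /* ... and stops tracking the event */
		return true;
	}
	return false;
}

/* Same decision order as the adapter diff quoted in this thread. */
static enum toy_op
toy_select_op(enum toy_adapter_mode mode, bool implicit_release_disabled)
{
	if (mode == TOY_MODE_OP_NEW)
		return TOY_OP_NEW;
	if (implicit_release_disabled)
		return TOY_OP_FORWARD;
	return TOY_OP_NEW;
}
```

In this model, an event that enters as FORWARD and is later RELEASED drives inflight negative and pushes credits above the initial pool, which is the kind of mismatch reported above; an event that enters as NEW first leaves both counters balanced after the RELEASE.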