Hi Gage,

On Mon, Jan 08, 2018 at 06:36:36PM +0000, Eads, Gage wrote:
> Hi Pavan,
>
> Thanks for the report and the GDB output. We've reproduced this and traced
> it down to how the PMD (mis)handles the re-configuration case. When the SW
> PMD is reconfigured, it reallocates the IQ chunks and reinitializes the
> chunk freelist, but it doesn't delete the stale pointers in sw->qids[*].iq.
> This causes multiple references to the same IQ memory to exist in the
> system, eventually resulting in the segfault.
>

Ah, that explains why it only happens with the Rx adapter: the eventdev is
created first, and then, depending on the ethernet device's capabilities
(RTE_EVENT_ETH_RX_ADAPTER_CAP_INTERNAL_PORT), the event device is stopped
and reconfigured.
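If it helps, below is a rough sketch of what I'd expect the proper fix to
look like: forget each queue's IQ state before the chunk array is
reallocated, so that no stale chunk pointers survive the reconfiguration.
Untested, and the field names (sw->qids[i].iq[j], sw->qid_count, SW_IQS_MAX)
are assumed from your description rather than taken from the PMD source:

/* Hypothetical reconfiguration path in sw_dev_configure(); the field
 * names used here are assumptions, not the actual PMD layout.
 */
if (sw->chunks) {
	unsigned int i, j;

	/* Drop every queue's IQ references before the chunk freelist is
	 * rebuilt, so no queue keeps a pointer into the freed chunk memory.
	 */
	for (i = 0; i < sw->qid_count; i++)
		for (j = 0; j < SW_IQS_MAX; j++)
			memset(&sw->qids[i].iq[j], 0,
			       sizeof(sw->qids[i].iq[j]));

	rte_free(sw->chunks);
}

That would also be consistent with the garbage pointers in the backtraces
below: once two queues reference the same chunk memory, the freelist head
(sw->chunk_list_head) presumably ends up overwritten with event data.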
> I expect a proper fix will take us a day or two, but in the meantime the
> following change should fix the segfault ***for your specific usage only***:
>

I will use this while verifying the example/eventdev_pipeline rework
patchset.

> diff --git a/drivers/event/sw/sw_evdev.c b/drivers/event/sw/sw_evdev.c
> index 1ef6340..01da538 100644
> --- a/drivers/event/sw/sw_evdev.c
> +++ b/drivers/event/sw/sw_evdev.c
> @@ -436,7 +436,7 @@ sw_dev_configure(const struct rte_eventdev *dev)
>
> 	/* If this is a reconfiguration, free the previous IQ allocation */
> 	if (sw->chunks)
> -		rte_free(sw->chunks);
> +		return 0;
>
> 	sw->chunks = rte_malloc_socket(NULL,
> 			sizeof(struct sw_queue_chunk) *
>
> Mulling over the fix raises a question that the documentation is unclear
> on. If the user sends events into an eventdev, then calls
> rte_event_dev_stop() -> rte_event_dev_configure() -> rte_event_dev_start(),
> is the eventdev required to maintain any previously queued events? I would
> expect not. However, if the user calls rte_event_dev_stop() ->
> rte_event_queue_setup() -> rte_event_dev_start() (i.e. it is an additive
> reconfiguration), it seems more reasonable that the other event queues
> would maintain their contents. I'd imagine this is also
> hardware/device-dependent.

I think it's entirely implementation dependent; once rte_event_dev_stop()
is invoked, we need not guarantee the state of previously queued events.
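For reference, the two sequences from your question would look roughly like
this at the application level (a sketch against the public rte_eventdev API;
dev_id/new_queue_id are placeholders, and the population of
dev_conf/queue_conf as well as all error handling are omitted):

/* Case 1: full reconfiguration -- previously queued events need not
 * survive this.
 */
rte_event_dev_stop(dev_id);
rte_event_dev_configure(dev_id, &dev_conf);
rte_event_dev_start(dev_id);

/* Case 2: additive reconfiguration -- arguably the untouched queues
 * could keep their contents, but as above this is implementation
 * dependent.
 */
rte_event_dev_stop(dev_id);
rte_event_queue_setup(dev_id, new_queue_id, &queue_conf);
rte_event_dev_start(dev_id);

The Rx adapter path that triggers this bug is effectively case 1.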
> Thanks,
> Gage

Thanks,
Pavan

> > -----Original Message-----
> > From: Pavan Nikhilesh [mailto:pbhagavat...@caviumnetworks.com]
> > Sent: Monday, January 8, 2018 10:06 AM
> > To: Van Haaren, Harry <harry.van.haa...@intel.com>; Eads, Gage
> > <gage.e...@intel.com>; jerin.ja...@caviumnetworks.com;
> > santosh.shu...@caviumnetworks.com
> > Cc: dev@dpdk.org
> > Subject: Re: [PATCH 2/2] event/sw: use dynamically-sized IQs
> >
> > On Mon, Jan 08, 2018 at 03:50:24PM +0000, Van Haaren, Harry wrote:
> > > > From: Pavan Nikhilesh [mailto:pbhagavat...@caviumnetworks.com]
> > > > Sent: Monday, January 8, 2018 3:32 PM
> > > > To: Eads, Gage <gage.e...@intel.com>; Van Haaren, Harry
> > > > <harry.van.haa...@intel.com>; jerin.ja...@caviumnetworks.com;
> > > > santosh.shu...@caviumnetworks.com
> > > > Cc: dev@dpdk.org
> > > > Subject: Re: [PATCH 2/2] event/sw: use dynamically-sized IQs
> > > >
> > > > On Wed, Nov 29, 2017 at 09:08:34PM -0600, Gage Eads wrote:
> > > > > This commit introduces dynamically-sized IQs, by switching the
> > > > > underlying data structure from a fixed-size ring to a linked list
> > > > > of queue 'chunks.'
> > >
> > > <snip>
> > >
> > > > Sw eventdev crashes when used alongside the Rx adapter. The crash
> > > > happens when pumping traffic at > 1.4 Mpps. This commit seems
> > > > responsible for this.
> > > >
> > > > Apply the following Rx adapter patch:
> > > > http://dpdk.org/dev/patchwork/patch/31977/
> > > > Command used:
> > > > ./build/eventdev_pipeline_sw_pmd -c 0xfffff8 --vdev="event_sw" --
> > > > -r0x800 -t0x100 -w F000 -e 0x10
> > >
> > > Applied the patch to current master, recompiled; cannot reproduce here.
> >
> > master in the sense dpdk-next-eventdev, right?
> >
> > > Is it 100% reproducible and "instant", or can it take some time to
> > > occur there?
> >
> > It is instant.
> >
> > > > Backtrace:
> > > >
> > > > Thread 4 "lcore-slave-4" received signal SIGSEGV, Segmentation fault.
> > > > [Switching to Thread 0xffffb6c8f040 (LWP 25291)]
> > > > 0x0000aaaaaadcc0d4 in iq_dequeue_burst (count=48, ev=0xffffb6c8dd38,
> > > > iq=0xffff9f764720, sw=0xffff9f332600) at
> > > > /root/clean/rebase/dpdk-next-eventdev/drivers/event/sw/iq_chunk.h:142
> > > > 142             ev[total++] = current->events[index++];
> > >
> > > Could we get the output of (gdb) info locals?
> >
> > Thread 4 "lcore-slave-4" received signal SIGSEGV, Segmentation fault.
> > [Switching to Thread 0xffffb6c8f040 (LWP 19751)]
> > 0x0000aaaaaadcc0d4 in iq_dequeue_burst (count=48, ev=0xffffb6c8dd38,
> > iq=0xffff9f764620, sw=0xffff9f332500) at
> > /root/clean/rebase/dpdk-next-eventdev/drivers/event/sw/iq_chunk.h:142
> > 142             ev[total++] = current->events[index++];
> >
> > (gdb) info locals
> > next = 0x7000041400be73b
> > current = 0x7000041400be73b
> > total = 36
> > index = 1
> > (gdb)
> >
> > Noticed another crash:
> >
> > Thread 4 "lcore-slave-4" received signal SIGSEGV, Segmentation fault.
> > [Switching to Thread 0xffffb6c8f040 (LWP 19690)]
> > 0x0000aaaaaadcfb78 in iq_alloc_chunk (sw=0xffff9f332500) at
> > /root/clean/rebase/dpdk-next-eventdev/drivers/event/sw/iq_chunk.h:63
> > 63              sw->chunk_list_head = chunk->next;
> >
> > (gdb) info locals
> > chunk = 0x14340000119
> >
> > (gdb) bt
> > #0  0x0000aaaaaadcfb78 in iq_alloc_chunk (sw=0xffff9f332500) at
> > /root/clean/rebase/dpdk-next-eventdev/drivers/event/sw/iq_chunk.h:63
> > #1  iq_enqueue (ev=0xffff9f3967c0, iq=0xffff9f764620, sw=0xffff9f332500) at
> > /root/clean/rebase/dpdk-next-eventdev/drivers/event/sw/iq_chunk.h:95
> > #2  __pull_port_lb (allow_reorder=0, port_id=5, sw=0xffff9f332500) at
> > /root/clean/rebase/dpdk-next-eventdev/drivers/event/sw/sw_evdev_scheduler.c:463
> > #3  sw_schedule_pull_port_no_reorder (sw=0xffff9f332500, port_id=5) at
> > /root/clean/rebase/dpdk-next-eventdev/drivers/event/sw/sw_evdev_scheduler.c:486
> > #4  0x0000aaaaaadd0608 in sw_event_schedule (dev=0xaaaaaafbd200
> > <rte_event_devices>) at
> > /root/clean/rebase/dpdk-next-eventdev/drivers/event/sw/sw_evdev_scheduler.c:554
> > #5  0x0000aaaaaadca008 in sw_sched_service_func (args=0xaaaaaafbd200
> > <rte_event_devices>) at
> > /root/clean/rebase/dpdk-next-eventdev/drivers/event/sw/sw_evdev.c:767
> > #6  0x0000aaaaaab54740 in rte_service_runner_do_callback (s=0xffff9fffdf80,
> > cs=0xffff9ffef900, service_idx=0) at
> > /root/clean/rebase/dpdk-next-eventdev/lib/librte_eal/common/rte_service.c:349
> > #7  0x0000aaaaaab54868 in service_run (i=0, cs=0xffff9ffef900,
> > service_mask=18446744073709551615) at
> > /root/clean/rebase/dpdk-next-eventdev/lib/librte_eal/common/rte_service.c:376
> > #8  0x0000aaaaaab54954 in rte_service_run_iter_on_app_lcore (id=0,
> > serialize_mt_unsafe=1) at
> > /root/clean/rebase/dpdk-next-eventdev/lib/librte_eal/common/rte_service.c:405
> > #9  0x0000aaaaaaaef04c in schedule_devices (lcore_id=4) at
> > /root/clean/rebase/dpdk-next-eventdev/examples/eventdev_pipeline_sw_pmd/main.c:223
> > #10 0x0000aaaaaaaef234 in worker (arg=0xffff9f331c80) at
> > /root/clean/rebase/dpdk-next-eventdev/examples/eventdev_pipeline_sw_pmd/main.c:274
> > #11 0x0000aaaaaab4382c in eal_thread_loop (arg=0x0) at
> > /root/clean/rebase/dpdk-next-eventdev/lib/librte_eal/linuxapp/eal/eal_thread.c:182
> > #12 0x0000ffffb7e46d64 in start_thread () from /usr/lib/libpthread.so.0
> > #13 0x0000ffffb7da8bbc in thread_start () from /usr/lib/libc.so.6
> >
> > > > (gdb) bt
> > > > #0  0x0000aaaaaadcc0d4 in iq_dequeue_burst (count=48,
> > > > ev=0xffffb6c8dd38, iq=0xffff9f764720, sw=0xffff9f332600) at
> > > > /root/clean/rebase/dpdk-next-eventdev/drivers/event/sw/iq_chunk.h:142
> > > > #1  sw_schedule_atomic_to_cq (sw=0xffff9f332600, qid=0xffff9f764700,
> > > > iq_num=0, count=48) at
> > > > /root/clean/rebase/dpdk-next-eventdev/drivers/event/sw/sw_evdev_scheduler.c:74
> > > > #2  0x0000aaaaaadcdc44 in sw_schedule_qid_to_cq (sw=0xffff9f332600) at
> > > > /root/clean/rebase/dpdk-next-eventdev/drivers/event/sw/sw_evdev_scheduler.c:262
> > > > #3  0x0000aaaaaadd069c in sw_event_schedule (dev=0xaaaaaafbd200
> > > > <rte_event_devices>) at
> > > > /root/clean/rebase/dpdk-next-eventdev/drivers/event/sw/sw_evdev_scheduler.c:564
> > > > #4  0x0000aaaaaadca008 in sw_sched_service_func (args=0xaaaaaafbd200
> > > > <rte_event_devices>) at
> > > > /root/clean/rebase/dpdk-next-eventdev/drivers/event/sw/sw_evdev.c:767
> > > > #5  0x0000aaaaaab54740 in rte_service_runner_do_callback
> > > > (s=0xffff9fffdf80, cs=0xffff9ffef900, service_idx=0) at
> > > > /root/clean/rebase/dpdk-next-eventdev/lib/librte_eal/common/rte_service.c:349
> > > > #6  0x0000aaaaaab54868 in service_run (i=0, cs=0xffff9ffef900,
> > > > service_mask=18446744073709551615) at
> > > > /root/clean/rebase/dpdk-next-eventdev/lib/librte_eal/common/rte_service.c:376
> > > > #7  0x0000aaaaaab54954 in rte_service_run_iter_on_app_lcore (id=0,
> > > > serialize_mt_unsafe=1) at
> > > > /root/clean/rebase/dpdk-next-eventdev/lib/librte_eal/common/rte_service.c:405
> > > > #8  0x0000aaaaaaaef04c in schedule_devices (lcore_id=4) at
> > > > /root/clean/rebase/dpdk-next-eventdev/examples/eventdev_pipeline_sw_pmd/main.c:223
> > > > #9  0x0000aaaaaaaef234 in worker (arg=0xffff9f331d80) at
> > > > /root/clean/rebase/dpdk-next-eventdev/examples/eventdev_pipeline_sw_pmd/main.c:274
> > > > #10 0x0000aaaaaab4382c in eal_thread_loop (arg=0x0) at
> > > > /root/clean/rebase/dpdk-next-eventdev/lib/librte_eal/linuxapp/eal/eal_thread.c:182
> > > > #11 0x0000ffffb7e46d64 in start_thread () from /usr/lib/libpthread.so.0
> > > > #12 0x0000ffffb7da8bbc in thread_start () from /usr/lib/libc.so.6
> > > >
> > > > Segfault seems to happen in sw_event_schedule and only happens under
> > > > high traffic load.
> > >
> > > I've added -n 0 to the command line allowing it to run forever, and
> > > after ~2 mins it's still happily forwarding pkts at ~10G line rate here.
> >
> > On arm64 the crash is instant even without -n0.
> >
> > > > Thanks,
> > > > Pavan
> > >
> > > Thanks for reporting - I'm afraid I'll have to ask a few questions to
> > > identify why I can't reproduce here before I can dig in and identify
> > > a fix.
> > >
> > > Anything special about the system that it is on?
> >
> > Running on arm64 octeontx with 8x10G connected.
> >
> > > What traffic pattern is being sent to the app?
> >
> > Using something similar to trafficgen, IPv4/UDP pkts.
> >
> > 0:00:51 958245      |0xB00 2816|0xB10 2832|0xB20 2848|0xB30 2864|0xC00 * 3072|0xC10 * 3088|0xC20 * 3104|0xC30 * 3120|     Totals
> > Port Status         |  XFI30 Up|  XFI31 Up|  XFI32 Up|  XFI33 Up|  XFI40 Up  |  XFI41 Up  |  XFI42 Up  |  XFI43 Up  |
> > 1:Total TX packets  |7197041566|5194976604|5120240981|4424870160|5860892739  |5191225514  |5126500427  |4429259828  |42545007819
> > 3:Total RX packets  | 358886055| 323055411| 321000948| 277179800| 387486466  | 350278086  | 348080242  | 295460613  | 2661427621
> > 6:TX packet rate    |         0|         0|         0|         0|         0  |         0  |         0  |         0  |          0
> > 7:TX octet rate     |         0|         0|         0|         0|         0  |         0  |         0  |         0  |          0
> > 8:TX bit rate, Mbps |         0|         0|         0|         0|         0  |         0  |         0  |         0  |          0
> > 10:RX packet rate   |         0|         0|         0|         0|         0  |         0  |         0  |         0  |          0
> > 11:RX octet rate    |         0|         0|         0|         0|         0  |         0  |         0  |         0  |          0
> > 12:RX bit rate, Mbps|         0|         0|         0|         0|         0  |         0  |         0  |         0  |          0
> > 36:tx.size          |        60|        60|        60|        60|        60  |        60  |        60  |        60  |
> > 37:tx.type          |  IPv4+UDP|  IPv4+UDP|  IPv4+UDP|  IPv4+UDP|  IPv4+UDP  |  IPv4+UDP  |  IPv4+UDP  |  IPv4+UDP  |
> > 38:tx.payload       |       abc|       abc|       abc|       abc|       abc  |       abc  |       abc  |       abc  |
> > 47:dest.mac         | fb71189c0| fb71189d0| fb71189e0| fb71189bf| fb7118ac0  | fb7118ad0  | fb7118ae0  | fb7118abf  |
> > 51:src.mac          | fb71189bf| fb71189cf| fb71189df| fb71189ef| fb7118abf  | fb7118acf  | fb7118adf  | fb7118aef  |
> > 55:dest.ip          | 11.1.0.99|11.17.0.99|11.33.0.99| 11.0.0.99| 14.1.0.99  |14.17.0.99  |14.33.0.99  | 14.0.0.99  |
> > 59:src.ip           | 11.0.0.99|11.16.0.99|11.32.0.99|11.48.0.99| 14.0.0.99  |14.16.0.99  |14.32.0.99  |14.48.0.99  |
> > 73:bridge           |       off|       off|       off|       off|       off  |       off  |       off  |       off  |
> > 77:validate packets |       off|       off|       off|       off|       off  |       off  |       off  |       off  |
> >
> > Thanks,
> > Pavan.
> >
> > > > Thanks
> > > >
> > > <snip>
> > >