On Mon, Jan 08, 2018 at 03:50:24PM +0000, Van Haaren, Harry wrote:
> > From: Pavan Nikhilesh [mailto:pbhagavat...@caviumnetworks.com]
> > Sent: Monday, January 8, 2018 3:32 PM
> > To: Eads, Gage <gage.e...@intel.com>; Van Haaren, Harry
> > <harry.van.haa...@intel.com>; jerin.ja...@caviumnetworks.com;
> > santosh.shu...@caviumnetworks.com
> > Cc: dev@dpdk.org
> > Subject: Re: [PATCH 2/2] event/sw: use dynamically-sized IQs
> >
> > On Wed, Nov 29, 2017 at 09:08:34PM -0600, Gage Eads wrote:
> > > This commit introduces dynamically-sized IQs, by switching the underlying
> > > data structure from a fixed-size ring to a linked list of queue 'chunks.'
> > <snip>
>
> > Sw eventdev crashes when used alongside Rx adapter. The crash happens when
> > pumping traffic at > 1.4mpps. This commit seems responsible for this.
> >
> >
> > Apply the following Rx adapter patch
> > http://dpdk.org/dev/patchwork/patch/31977/
> > Command used:
> > ./build/eventdev_pipeline_sw_pmd -c 0xfffff8 --vdev="event_sw" -- -r0x800
> > -t0x100 -w F000 -e 0x10
>
> Applied the patch to current master, recompiled; cannot reproduce here..
>
master in the sense dpdk-next-eventdev right?
> Is it 100% reproducible and "instant" or can it take some time to occur there?
>
It is instant
>
> > Backtrace:
> >
> > Thread 4 "lcore-slave-4" received signal SIGSEGV, Segmentation fault.
> > [Switching to Thread 0xffffb6c8f040 (LWP 25291)]
> > 0x0000aaaaaadcc0d4 in iq_dequeue_burst (count=48, ev=0xffffb6c8dd38,
> > iq=0xffff9f764720, sw=0xffff9f332600) at
> > /root/clean/rebase/dpdk-next-eventdev/drivers/event/sw/iq_chunk.h:142
> > 142 ev[total++] = current->events[index++];
>
> Could we get the output of (gdb) info locals?
>
Thread 4 "lcore-slave-4" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xffffb6c8f040 (LWP 19751)]
0x0000aaaaaadcc0d4 in iq_dequeue_burst (count=48, ev=0xffffb6c8dd38,
iq=0xffff9f764620, sw=0xffff9f332500) at
/root/clean/rebase/dpdk-next-eventdev/drivers/event/sw/iq_chunk.h:142
142 ev[total++] = current->events[index++];
(gdb) info locals
next = 0x7000041400be73b
current = 0x7000041400be73b
total = 36
index = 1
(gdb)

Noticed another crash:

Thread 4 "lcore-slave-4" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xffffb6c8f040 (LWP 19690)]
0x0000aaaaaadcfb78 in iq_alloc_chunk (sw=0xffff9f332500) at
/root/clean/rebase/dpdk-next-eventdev/drivers/event/sw/iq_chunk.h:63
63 sw->chunk_list_head = chunk->next;
(gdb) info locals
chunk = 0x14340000119
(gdb) bt
#0 0x0000aaaaaadcfb78 in iq_alloc_chunk (sw=0xffff9f332500) at
/root/clean/rebase/dpdk-next-eventdev/drivers/event/sw/iq_chunk.h:63
#1 iq_enqueue (ev=0xffff9f3967c0, iq=0xffff9f764620, sw=0xffff9f332500) at
/root/clean/rebase/dpdk-next-eventdev/drivers/event/sw/iq_chunk.h:95
#2 __pull_port_lb (allow_reorder=0, port_id=5, sw=0xffff9f332500) at
/root/clean/rebase/dpdk-next-eventdev/drivers/event/sw/sw_evdev_scheduler.c:463
#3 sw_schedule_pull_port_no_reorder (sw=0xffff9f332500, port_id=5) at
/root/clean/rebase/dpdk-next-eventdev/drivers/event/sw/sw_evdev_scheduler.c:486
#4 0x0000aaaaaadd0608 in sw_event_schedule (dev=0xaaaaaafbd200 <rte_event_devices>) at
/root/clean/rebase/dpdk-next-eventdev/drivers/event/sw/sw_evdev_scheduler.c:554
#5 0x0000aaaaaadca008 in sw_sched_service_func (args=0xaaaaaafbd200 <rte_event_devices>) at
/root/clean/rebase/dpdk-next-eventdev/drivers/event/sw/sw_evdev.c:767
#6 0x0000aaaaaab54740 in rte_service_runner_do_callback (s=0xffff9fffdf80, cs=0xffff9ffef900, service_idx=0) at
/root/clean/rebase/dpdk-next-eventdev/lib/librte_eal/common/rte_service.c:349
#7 0x0000aaaaaab54868 in service_run (i=0, cs=0xffff9ffef900, service_mask=18446744073709551615) at
/root/clean/rebase/dpdk-next-eventdev/lib/librte_eal/common/rte_service.c:376
#8 0x0000aaaaaab54954 in rte_service_run_iter_on_app_lcore (id=0, serialize_mt_unsafe=1) at
/root/clean/rebase/dpdk-next-eventdev/lib/librte_eal/common/rte_service.c:405
#9 0x0000aaaaaaaef04c in schedule_devices (lcore_id=4) at
/root/clean/rebase/dpdk-next-eventdev/examples/eventdev_pipeline_sw_pmd/main.c:223
#10 0x0000aaaaaaaef234 in worker (arg=0xffff9f331c80) at
/root/clean/rebase/dpdk-next-eventdev/examples/eventdev_pipeline_sw_pmd/main.c:274
#11 0x0000aaaaaab4382c in eal_thread_loop (arg=0x0) at
/root/clean/rebase/dpdk-next-eventdev/lib/librte_eal/linuxapp/eal/eal_thread.c:182
#12 0x0000ffffb7e46d64 in start_thread () from /usr/lib/libpthread.so.0
#13 0x0000ffffb7da8bbc in thread_start () from /usr/lib/libc.so.6
> >
> > (gdb) bt
> > #0 0x0000aaaaaadcc0d4 in iq_dequeue_burst (count=48, ev=0xffffb6c8dd38,
> > iq=0xffff9f764720, sw=0xffff9f332600) at
> > /root/clean/rebase/dpdk-next-eventdev/drivers/event/sw/iq_chunk.h:142
> > #1 sw_schedule_atomic_to_cq (sw=0xffff9f332600, qid=0xffff9f764700,
> > iq_num=0, count=48) at
> > /root/clean/rebase/dpdk-next-eventdev/drivers/event/sw/sw_evdev_scheduler.c:74
> > #2 0x0000aaaaaadcdc44 in sw_schedule_qid_to_cq (sw=0xffff9f332600) at
> > /root/clean/rebase/dpdk-next-eventdev/drivers/event/sw/sw_evdev_scheduler.c:262
> > #3 0x0000aaaaaadd069c in sw_event_schedule (dev=0xaaaaaafbd200
> > <rte_event_devices>) at
> > /root/clean/rebase/dpdk-next-eventdev/drivers/event/sw/sw_evdev_scheduler.c:564
> > #4 0x0000aaaaaadca008 in sw_sched_service_func (args=0xaaaaaafbd200
> > <rte_event_devices>) at
> > /root/clean/rebase/dpdk-next-eventdev/drivers/event/sw/sw_evdev.c:767
> > #5 0x0000aaaaaab54740 in rte_service_runner_do_callback (s=0xffff9fffdf80,
> > cs=0xffff9ffef900, service_idx=0) at
> > /root/clean/rebase/dpdk-next-eventdev/lib/librte_eal/common/rte_service.c:349
> > #6 0x0000aaaaaab54868 in service_run (i=0,
> > cs=0xffff9ffef900,
> > service_mask=18446744073709551615) at
> > /root/clean/rebase/dpdk-next-eventdev/lib/librte_eal/common/rte_service.c:376
> > #7 0x0000aaaaaab54954 in rte_service_run_iter_on_app_lcore (id=0,
> > serialize_mt_unsafe=1) at
> > /root/clean/rebase/dpdk-next-eventdev/lib/librte_eal/common/rte_service.c:405
> > #8 0x0000aaaaaaaef04c in schedule_devices (lcore_id=4) at
> > /root/clean/rebase/dpdk-next-eventdev/examples/eventdev_pipeline_sw_pmd/main.c:223
> > #9 0x0000aaaaaaaef234 in worker (arg=0xffff9f331d80) at
> > /root/clean/rebase/dpdk-next-eventdev/examples/eventdev_pipeline_sw_pmd/main.c:274
> > #10 0x0000aaaaaab4382c in eal_thread_loop (arg=0x0) at
> > /root/clean/rebase/dpdk-next-eventdev/lib/librte_eal/linuxapp/eal/eal_thread.c:182
> > #11 0x0000ffffb7e46d64 in start_thread () from /usr/lib/libpthread.so.0
> > #12 0x0000ffffb7da8bbc in thread_start () from /usr/lib/libc.so.6
> >
> > Segfault seems to happen in sw_event_schedule and only happens under high
> > traffic load.
>
> I've added -n 0 to the command line allowing it to run forever,
> and after ~2 mins it's still happily forwarding pkts at ~10G line rate here.
>
On arm64 the crash is instant even without -n0.
>
> > Thanks,
> > Pavan
>
> Thanks for reporting - I'm afraid I'll have to ask a few questions to
> identify why I can't reproduce here before I can dig in and identify a fix.
>
> Anything special about the system that it is on?
Running on arm64 octeontx with 8x10G connected.
> What traffic pattern is being sent to the app?
Using something similar to trafficgen, IPv4/UDP pkts.
0:00:51 958245      |  0xB00 2816|  0xB10 2832|  0xB20 2848|  0xB30 2864|0xC00 * 3072|0xC10 * 3088|0xC20 * 3104|0xC30 * 3120| Totals
Port Status         |    XFI30 Up|    XFI31 Up|    XFI32 Up|    XFI33 Up|    XFI40 Up|    XFI41 Up|    XFI42 Up|    XFI43 Up|
1:Total TX packets  |  7197041566|  5194976604|  5120240981|  4424870160|  5860892739|  5191225514|  5126500427|  4429259828|42545007819
3:Total RX packets  |   358886055|   323055411|   321000948|   277179800|   387486466|   350278086|   348080242|   295460613|2661427621
6:TX packet rate    |           0|           0|           0|           0|           0|           0|           0|           0|          0
7:TX octet rate     |           0|           0|           0|           0|           0|           0|           0|           0|          0
8:TX bit rate, Mbps |           0|           0|           0|           0|           0|           0|           0|           0|          0
10:RX packet rate   |           0|           0|           0|           0|           0|           0|           0|           0|          0
11:RX octet rate    |           0|           0|           0|           0|           0|           0|           0|           0|          0
12:RX bit rate, Mbps|           0|           0|           0|           0|           0|           0|           0|           0|          0
36:tx.size          |          60|          60|          60|          60|          60|          60|          60|          60|
37:tx.type          |    IPv4+UDP|    IPv4+UDP|    IPv4+UDP|    IPv4+UDP|    IPv4+UDP|    IPv4+UDP|    IPv4+UDP|    IPv4+UDP|
38:tx.payload       |         abc|         abc|         abc|         abc|         abc|         abc|         abc|         abc|
47:dest.mac         |   fb71189c0|   fb71189d0|   fb71189e0|   fb71189bf|   fb7118ac0|   fb7118ad0|   fb7118ae0|   fb7118abf|
51:src.mac          |   fb71189bf|   fb71189cf|   fb71189df|   fb71189ef|   fb7118abf|   fb7118acf|   fb7118adf|   fb7118aef|
55:dest.ip          |   11.1.0.99|  11.17.0.99|  11.33.0.99|   11.0.0.99|   14.1.0.99|  14.17.0.99|  14.33.0.99|   14.0.0.99|
59:src.ip           |   11.0.0.99|  11.16.0.99|  11.32.0.99|  11.48.0.99|   14.0.0.99|  14.16.0.99|  14.32.0.99|  14.48.0.99|
73:bridge           |         off|         off|         off|         off|         off|         off|         off|         off|
77:validate packets |         off|         off|         off|         off|         off|         off|         off|         off|

Thanks,
Pavan.
>
> Thanks
>
> > <snip>
>