On Mon, Feb 17, 2020 at 3:05 PM Mattias Rönnblom
<mattias.ronnb...@ericsson.com> wrote:
>
> On 2020-02-15 11:21, Jerin Jacob wrote:
> > On Fri, Jan 17, 2020 at 4:24 PM Jerin Jacob <jerinjac...@gmail.com> wrote:
> >> On Fri, Jan 17, 2020 at 4:00 PM Mattias Rönnblom
> >> <mattias.ronnb...@ericsson.com> wrote:
> >>>> LTTng kernel tracing only needs kmod support.
> >>>> For userspace tracing, at minimum, the following libraries are required:
> >>>>
> >>>> a) LTTng-UST
> >>>> b) LTTng-tools
> >>>> c) liburcu
> >>>> d) libpopt-dev
> >>> This "DPDK CTF trace emitter" would make DPDK interoperate with, but
> >>> without any build-time dependencies on, LTTng. Correct?
> >> Yes. A native CTF trace emitter without any LTTng dependency.
> >>
> >>> Do you have any idea of what performance benefits one would receive
> >>> from having something DPDK native, compared to just depending on LTTng
> >>> UST?
> >> I measured the LTTng cost and pushed the test code to GitHub[1].
> >>
> >> I just started working on the DPDK native CTF emitter.
> >> I am sure it will be less than LTTng, as we can leverage hugepages, exploit
> >> the DPDK worker thread usage model to avoid atomics and use per-core
> >> variables, avoid a lot of function pointers in the fast path, etc.
> >> I can share the exact overhead after the PoC.
> > I did an almost feature-complete PoC. The code is shared on GitHub[1].
> > The documentation and code cleanup etc. are still pending.
> >
> > [1] https://github.com/jerinjacobk/dpdk-trace.git
> >
> > Trace overhead data on x86[2]:
> > # 236 cycles with LTTng (>100 ns)
> > # 18 cycles (7 ns) with the native DPDK CTF emitter
> >
> > Trace overhead data on arm64:
> > # 312 to 1100 cycles with LTTng, depending on the class of arm64 CPU
> > # 11 to 13 cycles with the native DPDK CTF emitter, depending on the
> > class of arm64 CPU
> >
> > The difference between 18 cycles (on x86) and 11 cycles (on arm64) is due
> > to the rdtsc() overhead on x86. It seems rdtsc takes around 15 cycles on x86.
> >
> > # The native DPDK CTF trace support does not have any dependency on a
> > third-party library.
> > The generated output file is compatible with LTTng, as both use the
> > CTF trace format.
> >
> > The performance gain comes from:
> > 1) exploiting the DPDK worker thread usage model to avoid atomics and use
> > per-core variables
> > 2) using hugepages
> > 3) avoiding function pointers in the fast path
> > 4) avoiding unaligned stores on arm64, etc.
> > (a minimal sketch illustrating points 1 and 2 follows the feature list below)
> >
> > Features:
> > - APIs and features are similar to the rte_log dynamic framework API
> > (except that rte_log prints to stdout, whereas trace dumps to a trace file)
> > - No specific limit on the number of events. String-based events, like
> > rte_log, for pattern matching
> > - Dynamic enable/disable support
> > - Instrumentation overhead is ~1 cycle, i.e. the cost of adding the code
> > without using the trace feature
> > - Timestamp support for all events using DPDK rte_rdtsc
> > - No dependency on another library. Cleanroom native implementation of CTF.
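To illustrate points 1 and 2 above: below is a minimal, self-contained sketch
(not the actual rte_trace implementation) of why a per-worker trace buffer
needs no atomics. Plain C __thread storage stands in for DPDK's per-lcore
variables, and all names (trace_mem, trace_emit_u32, etc.) are illustrative
only; in DPDK the buffer memory would additionally come from hugepages.

<code>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define TRACE_BUF_SZ (1 << 20)	/* in DPDK this memory would come from hugepages */

struct trace_mem {
	uint8_t buf[TRACE_BUF_SZ];
	size_t offset;	/* private to the owning thread, so no atomics needed */
};

/* One buffer per worker thread; DPDK would use a per-lcore variable instead. */
static __thread struct trace_mem *trace_mem;

static void
trace_mem_init(void)
{
	trace_mem = calloc(1, sizeof(*trace_mem));
}

/* Emit one fixed-size event: a timestamp plus a 32-bit payload. */
static inline void
trace_emit_u32(uint64_t tsc, uint32_t val)
{
	struct trace_mem *mem = trace_mem;
	uint8_t *p = &mem->buf[mem->offset];

	memcpy(p, &tsc, sizeof(tsc));	/* memcpy avoids unaligned stores */
	memcpy(p + sizeof(tsc), &val, sizeof(val));
	mem->offset += sizeof(tsc) + sizeof(val);
	if (mem->offset + sizeof(tsc) + sizeof(val) > TRACE_BUF_SZ)
		mem->offset = 0;	/* wrap around (overwrite mode) */
}

int
main(void)
{
	trace_mem_init();
	for (uint32_t i = 0; i < 10; i++)
		trace_emit_u32(i /* stand-in for rte_rdtsc() */, i * 2);
	printf("wrote %zu bytes of trace data\n", trace_mem->offset);
	free(trace_mem);
	return 0;
}
</code>

Because each worker thread owns its buffer and write offset exclusively, the
fast path is a couple of plain stores and an add, with no locked instructions.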
> >
> > Functional test case:
> > a) echo "trace_autotest" | sudo ./build/app/test/dpdk-test -c 0x3 --trace-level=8
> >
> > The above command emits the following trace events:
> > <code>
> > uint8_t i;
> >
> > rte_trace_lib_eal_generic_void();
> > rte_trace_lib_eal_generic_u64(0x10000000000000);
> > rte_trace_lib_eal_generic_u32(0x10000000);
> > rte_trace_lib_eal_generic_u16(0xffee);
> > rte_trace_lib_eal_generic_u8(0xc);
> > rte_trace_lib_eal_generic_i64(-1234);
> > rte_trace_lib_eal_generic_i32(-1234567);
> > rte_trace_lib_eal_generic_i16(12);
> > rte_trace_lib_eal_generic_i8(-3);
> > rte_trace_lib_eal_generic_string("my string");
> > rte_trace_lib_eal_generic_function(__func__);
> >
> > for (i = 0; i < 128; i++)
> > 	rte_trace_lib_eal_generic_u8(i);
> > </code>
>
> Is it possible to specify custom types for the events? The equivalent of
> the TRACEPOINT_EVENT() macro in LTTng.
Yes. It is possible to specify a custom event using the rte_trace_emit_datatype()
and rte_trace_emit_string() basic blocks. For example, the ethdev configure
event would look like this:

RTE_TRACE_POINT_DECLARE(__rte_trace_lib_ethdev_configure);

static __rte_always_inline void
rte_trace_lib_ethdev_configure(uint16_t port_id, uint16_t nb_rx_q,
			       uint16_t nb_tx_q,
			       const struct rte_eth_conf *dev_conf)
{
	rte_trace_emit_begin(&__rte_trace_lib_ethdev_configure);
	rte_trace_emit_datatype(mem, port_id);
	rte_trace_emit_datatype(mem, nb_rx_q);
	rte_trace_emit_datatype(mem, nb_tx_q);
	rte_trace_emit_datatype(mem, dev_conf->link_speeds);
	..
	..
}

I tried to avoid the use of complex macros, as the DPDK community prefers
non-macro solutions.

>
> > Install the babeltrace package on Linux and point it at the generated
> > trace file. By default, the trace file is created under
> > <user>/dpdk-traces/time_stamp/
> >
> > example:
> > # babeltrace /root/dpdk-traces/rte-2020-02-15-PM-02-56-51 | more
> >
> > [13:27:36.138468807] (+?.?????????) lib.eal.generic.void: { cpu_id = 0, name = "dpdk-test" }, { }
> > [13:27:36.138468851] (+0.000000044) lib.eal.generic.u64: { cpu_id = 0, name = "dpdk-test" }, { in = 4503599627370496 }
> > [13:27:36.138468860] (+0.000000009) lib.eal.generic.u32: { cpu_id = 0, name = "dpdk-test" }, { in = 268435456 }
> > [13:27:36.138468934] (+0.000000074) lib.eal.generic.u16: { cpu_id = 0, name = "dpdk-test" }, { in = 65518 }
> > [13:27:36.138468949] (+0.000000015) lib.eal.generic.u8: { cpu_id = 0, name = "dpdk-test" }, { in = 12 }
> > [13:27:36.138468956] (+0.000000007) lib.eal.generic.i64: { cpu_id = 0, name = "dpdk-test" }, { in = -1234 }
> > [13:27:36.138468963] (+0.000000007) lib.eal.generic.i32: { cpu_id = 0, name = "dpdk-test" }, { in = -1234567 }
> > [13:27:36.138469024] (+0.000000061) lib.eal.generic.i16: { cpu_id = 0, name = "dpdk-test" }, { in = 12 }
> > [13:27:36.138469044] (+0.000000020) lib.eal.generic.i8: { cpu_id = 0, name = "dpdk-test" }, { in = -3 }
> > [13:27:36.138469051] (+0.000000007) lib.eal.generic.string: { cpu_id = 0, name = "dpdk-test" }, { str = "my string" }
> > [13:27:36.138469203] (+0.000000152) lib.eal.generic.func: { cpu_id = 0, name = "dpdk-test" }, { func = "test_trace_points" }
> > [13:27:36.138469239] (+0.000000036) lib.eal.generic.u8: { cpu_id = 0, name = "dpdk-test" }, { in = 0 }
> > [13:27:36.138469246] (+0.000000007) lib.eal.generic.u8: { cpu_id = 0, name = "dpdk-test" }, { in = 1 }
> > [13:27:36.138469252] (+0.000000006) lib.eal.generic.u8: { cpu_id = 0, name = "dpdk-test" }, { in = 2 }
> > [13:27:36.138469262] (+0.000000010) lib.eal.generic.u8: { cpu_id = 0, name = "dpdk-test" }, { in = 3 }
> > [13:27:36.138469269] (+0.000000007) lib.eal.generic.u8: { cpu_id = 0, name = "dpdk-test" }, { in = 4 }
> > [13:27:36.138469276] (+0.000000007) lib.eal.generic.u8: { cpu_id = 0, name = "dpdk-test" }, { in = 5 }
> >
> > # There is a GUI-based trace viewer available on Windows, Linux and Mac,
> > called Trace Compass (https://www.eclipse.org/tracecompass/).
> >
> > An example screenshot and histogram of the above DPDK trace in Trace Compass:
> >
> > https://github.com/jerinjacobk/share/blob/master/dpdk_trace.JPG
> >
> >
> > [2] Added a performance test case to measure the trace overhead.
> > Command to test the trace overhead; it reports the overhead of writing a
> > zero-argument trace:
> >
> > echo "trace_perf" | sudo ./build/app/test/dpdk-test -c 0x3 --trace-level=8
> > EAL: Detected 56 lcore(s)
> > EAL: Detected 2 NUMA nodes
> > EAL: Trace dir: /root/dpdk-traces/rte-2020-02-15-PM-03-37-33
> > RTE>>trace_perf
> > Timer running at 2600.00MHz
> > ZERO_ARG: cycles=17.901031 ns=6.885012
> > Test OK
> >
> > [3] The above test was ported to LTTng to measure the LTTng trace
> > overhead. It is available at:
> > https://github.com/jerinjacobk/lttng-overhead
> > https://github.com/jerinjacobk/lttng-overhead/blob/master/README
> >
> > File walkthrough:
> >
> > lib/librte_eal/common/include/rte_trace.h - public API for the trace provider and trace control
> > lib/librte_eal/common/eal_common_trace.c - main trace implementation
> > lib/librte_eal/common/eal_common_trace_ctf.c - CTF metadata spec implementation
> > lib/librte_eal/common/eal_common_trace_utils.c - command-line utils and filesystem operations
> > lib/librte_eal/common/eal_common_trace_points.c - trace points for the EAL library
> > lib/librte_eal/common/include/rte_trace_eal.h - EAL tracepoint public API
> > lib/librte_eal/common/eal_trace.h - private trace header file
> >
> >
> >> I think, based on the performance, we can decide one or the other?
> > The above performance data shows a significant improvement over LTTng.
> >
> > Let me know if anyone has any comments or suggestions on this.
> > If there are no comments, I will submit the v1, after code cleanup,
> > before the 20.05 proposal deadline.
>
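For reference on how a number like the ZERO_ARG figure in [2] above can be
produced: the sketch below times a tight loop around a dummy trace point and
divides by the iteration count. This is not the actual trace_perf test code;
__rdtsc() from <x86intrin.h> stands in for DPDK's rte_rdtsc(), and the 2.6 GHz
TSC frequency is assumed from the "Timer running at 2600.00MHz" line above.

<code>
#include <stdint.h>
#include <stdio.h>
#include <x86intrin.h>

#define ITERATIONS 1000000
#define TSC_HZ 2600000000.0	/* assumed 2.6 GHz TSC, per the log above */

static volatile uint64_t sink;

/* Placeholder for a real zero-argument trace point. */
static inline void
dummy_trace_point(void)
{
	sink++;
}

int
main(void)
{
	uint64_t start = __rdtsc();

	for (long i = 0; i < ITERATIONS; i++)
		dummy_trace_point();

	uint64_t cycles = __rdtsc() - start;
	double per_call = (double)cycles / ITERATIONS;

	/* cycles / (cycles per second) = seconds; multiply by 1e9 for ns */
	printf("ZERO_ARG: cycles=%f ns=%f\n", per_call, per_call * 1e9 / TSC_HZ);
	return 0;
}
</code>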