On 2020-02-15 11:21, Jerin Jacob wrote:
> On Fri, Jan 17, 2020 at 4:24 PM Jerin Jacob <jerinjac...@gmail.com> wrote:
>> On Fri, Jan 17, 2020 at 4:00 PM Mattias Rönnblom
>> <mattias.ronnb...@ericsson.com> wrote:
>>>> LTTng kernel tracing only needs kmod support.
>>>> For userspace tracing, at minimum the following libraries are required:
>>>>
>>>> a) LTTng-UST
>>>> b) LTTng-tools
>>>> c) liburcu
>>>> d) libpopt-dev
>>> This "DPDK CTF trace emitter" would make DPDK interoperate with, but
>>> without any build-time dependencies to, LTTng. Correct?
>> Yes. Native CTF trace emitter without LTTng dependency.
>>
>>> Do you have any idea of what the performance benefits one would receive
>>> from having something DPDK native, compared to just depending on LTTng UST?
>> I calibrated the LTTng cost and pushed the test code to GitHub[1].
>>
>> I just started working on the DPDK native CTF emitter.
>> I am sure it will be less than LTTng, as we can leverage hugepages, exploit
>> the DPDK worker thread usage model to avoid atomics and use per-core
>> variables, avoid a lot of function pointers in the fast path, etc.
>> I can share the exact overhead after the PoC.
> I did an almost feature-complete PoC. The code is shared on GitHub[1].
> The documentation and code cleanup etc. are still pending.
>
> [1] https://github.com/jerinjacobk/dpdk-trace.git
>
> Trace overhead data on x86:[2]
> # 236 cycles with LTTng (>100 ns)
> # 18 cycles (7 ns) with the native DPDK CTF emitter.
>
> Trace overhead data on arm64:
> # 312 cycles to 1100 cycles with LTTng, depending on the class of arm64 CPU.
> # 11 cycles to 13 cycles with the native DPDK CTF emitter, depending on the
> class of arm64 CPU.
>
> The difference between 18 cycles (on x86) and 11 cycles (on arm64) is due to
> rdtsc() overhead on x86. It seems rdtsc takes around 15 cycles on x86.
>
> # The native DPDK CTF trace support does not have any dependency on a
> third-party library.
> The generated output file is compatible with LTTng, as both use the
> CTF trace format.
>
> The performance gain comes from:
> 1) exploiting the DPDK worker thread usage model to avoid atomics and use
> per-core variables
> 2) using hugepages
> 3) avoiding a lot of function pointers in the fast path
> 4) avoiding unaligned stores on arm64
>
> Features:
> - APIs and features are similar to the rte_log dynamic framework API
> (except that rte_log prints on stdout, while this dumps to a trace file)
> - No specific limit on the number of events. String-based event names, like
> rte_log, for pattern matching
> - Dynamic enable/disable support.
> - Instrumentation overhead is ~1 cycle, i.e. the cost of adding the code
> without using the trace feature.
> - Timestamp support for all events using DPDK rte_rdtsc
> - No dependency on another library. Cleanroom native implementation of CTF.
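
To make sure I read the "~1 cycle instrumentation overhead" and "no atomics"
points correctly: I assume a trace point compiles down to a branch on a
per-trace-point enable flag, and the enabled path appends the timestamp and
payload to a buffer owned by the current lcore. A rough sketch of that pattern
in plain C (everything below is my own hypothetical illustration, not the PoC
code):

<code>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#include <rte_branch_prediction.h> /* likely() */
#include <rte_cycles.h>            /* rte_rdtsc() */

/* Hypothetical per-trace-point state; not the PoC's actual layout. */
struct trace_point {
	bool enabled;      /* flipped at runtime by the trace control API */
	uint16_t event_id; /* CTF event id written to the stream */
};

/* Stand-in for the per-lcore CTF stream buffer. In the PoC this would
 * presumably be hugepage-backed and described by the CTF metadata; a
 * per-thread array is enough to show that the emit path needs no atomics. */
static __thread uint8_t trace_buf[16 * 1024];
static __thread uint32_t trace_off;

static inline void
trace_buf_emit(uint16_t event_id, uint64_t tsc, const void *payload,
	       uint32_t len)
{
	uint8_t *wp = trace_buf + trace_off;

	/* memcpy() keeps the stores alignment-safe (the arm64 point above). */
	memcpy(wp, &event_id, sizeof(event_id));
	memcpy(wp + sizeof(event_id), &tsc, sizeof(tsc));
	memcpy(wp + sizeof(event_id) + sizeof(tsc), payload, len);
	trace_off += sizeof(event_id) + sizeof(tsc) + len; /* wrap handling omitted */
}

static inline void
emit_u32_event(struct trace_point *tp, uint32_t val)
{
	if (likely(!tp->enabled)) /* the only cost with tracing disabled */
		return;
	trace_buf_emit(tp->event_id, rte_rdtsc(), &val, sizeof(val));
}
</code>

If that is roughly the shape of it, the disabled cost is essentially the
branch, which matches the ~1 cycle claim.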
>
> Functional test case:
> a) echo "trace_autotest" | sudo ./build/app/test/dpdk-test -c 0x3 --trace-level=8
>
> The above command emits the following trace events
> <code>
> uint8_t i;
>
> rte_trace_lib_eal_generic_void();
> rte_trace_lib_eal_generic_u64(0x10000000000000);
> rte_trace_lib_eal_generic_u32(0x10000000);
> rte_trace_lib_eal_generic_u16(0xffee);
> rte_trace_lib_eal_generic_u8(0xc);
> rte_trace_lib_eal_generic_i64(-1234);
> rte_trace_lib_eal_generic_i32(-1234567);
> rte_trace_lib_eal_generic_i16(12);
> rte_trace_lib_eal_generic_i8(-3);
> rte_trace_lib_eal_generic_string("my string");
> rte_trace_lib_eal_generic_function(__func__);
>
> for (i = 0; i < 128; i++)
> 	rte_trace_lib_eal_generic_u8(i);
> </code>

Is it possible to specify custom types for the events? The equivalent of the
TRACEPOINT_EVENT() macro in LTTng.
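
For reference, what I have in mind is LTTng-UST's provider/event declaration,
where the payload layout is declared with TRACEPOINT_EVENT(); roughly like
this (provider/event/field names are made up for illustration):

<code>
/* my_app_tp.h - hypothetical LTTng-UST tracepoint provider header */
#undef TRACEPOINT_PROVIDER
#define TRACEPOINT_PROVIDER my_app

#undef TRACEPOINT_INCLUDE
#define TRACEPOINT_INCLUDE "./my_app_tp.h"

#if !defined(_MY_APP_TP_H) || defined(TRACEPOINT_HEADER_MULTI_READ)
#define _MY_APP_TP_H

#include <stdint.h>
#include <lttng/tracepoint.h>

TRACEPOINT_EVENT(
	my_app,                  /* provider name */
	rx_burst,                /* event name */
	TP_ARGS(uint16_t, port_id, uint16_t, nb_pkts),
	TP_FIELDS(
		ctf_integer(uint16_t, port_id, port_id)
		ctf_integer(uint16_t, nb_pkts, nb_pkts)
	)
)

#endif /* _MY_APP_TP_H */

#include <lttng/tracepoint-event.h>
</code>

The call site is then tracepoint(my_app, rx_burst, port_id, nb_rx);, with one
.c file in the application defining TRACEPOINT_DEFINE and
TRACEPOINT_CREATE_PROBES before including the header. Will the native emitter
support declaring event payload types like this, or only the fixed generic
types used above?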
> Install the babeltrace package on Linux and point babeltrace at the
> generated trace file. By default the trace file is created under
> <user>/dpdk-traces/time_stamp/
>
> example:
> # babeltrace /root/dpdk-traces/rte-2020-02-15-PM-02-56-51 | more
>
> [13:27:36.138468807] (+?.?????????) lib.eal.generic.void: { cpu_id = 0, name = "dpdk-test" }, { }
> [13:27:36.138468851] (+0.000000044) lib.eal.generic.u64: { cpu_id = 0, name = "dpdk-test" }, { in = 4503599627370496 }
> [13:27:36.138468860] (+0.000000009) lib.eal.generic.u32: { cpu_id = 0, name = "dpdk-test" }, { in = 268435456 }
> [13:27:36.138468934] (+0.000000074) lib.eal.generic.u16: { cpu_id = 0, name = "dpdk-test" }, { in = 65518 }
> [13:27:36.138468949] (+0.000000015) lib.eal.generic.u8: { cpu_id = 0, name = "dpdk-test" }, { in = 12 }
> [13:27:36.138468956] (+0.000000007) lib.eal.generic.i64: { cpu_id = 0, name = "dpdk-test" }, { in = -1234 }
> [13:27:36.138468963] (+0.000000007) lib.eal.generic.i32: { cpu_id = 0, name = "dpdk-test" }, { in = -1234567 }
> [13:27:36.138469024] (+0.000000061) lib.eal.generic.i16: { cpu_id = 0, name = "dpdk-test" }, { in = 12 }
> [13:27:36.138469044] (+0.000000020) lib.eal.generic.i8: { cpu_id = 0, name = "dpdk-test" }, { in = -3 }
> [13:27:36.138469051] (+0.000000007) lib.eal.generic.string: { cpu_id = 0, name = "dpdk-test" }, { str = "my string" }
> [13:27:36.138469203] (+0.000000152) lib.eal.generic.func: { cpu_id = 0, name = "dpdk-test" }, { func = "test_trace_points" }
> [13:27:36.138469239] (+0.000000036) lib.eal.generic.u8: { cpu_id = 0, name = "dpdk-test" }, { in = 0 }
> [13:27:36.138469246] (+0.000000007) lib.eal.generic.u8: { cpu_id = 0, name = "dpdk-test" }, { in = 1 }
> [13:27:36.138469252] (+0.000000006) lib.eal.generic.u8: { cpu_id = 0, name = "dpdk-test" }, { in = 2 }
> [13:27:36.138469262] (+0.000000010) lib.eal.generic.u8: { cpu_id = 0, name = "dpdk-test" }, { in = 3 }
> [13:27:36.138469269] (+0.000000007) lib.eal.generic.u8: { cpu_id = 0, name = "dpdk-test" }, { in = 4 }
> [13:27:36.138469276] (+0.000000007) lib.eal.generic.u8: { cpu_id = 0, name = "dpdk-test" }, { in = 5 }
>
> # There is a GUI-based trace viewer available for Windows, Linux, and Mac,
> called Trace Compass (https://www.eclipse.org/tracecompass/).
>
> An example screenshot and histogram of the above DPDK trace in Trace Compass:
>
> https://github.com/jerinjacobk/share/blob/master/dpdk_trace.JPG
>
> [2] Added a performance test case to find the trace overhead.
> Command to test the trace overhead. It measures the overhead of writing a
> zero-argument trace.
>
> echo "trace_perf" | sudo ./build/app/test/dpdk-test -c 0x3 --trace-level=8
> EAL: Detected 56 lcore(s)
> EAL: Detected 2 NUMA nodes
> EAL: Trace dir: /root/dpdk-traces/rte-2020-02-15-PM-03-37-33
> RTE>>trace_perf
> Timer running at 2600.00MHz
> ZERO_ARG: cycles=17.901031 ns=6.885012
> Test OK
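
For anyone who wants to reproduce the number outside the test binary, I assume
the zero-argument measurement is essentially an rte_rdtsc() delta around a
tight loop of trace calls, something like the sketch below. The harness itself
is my guess; only rte_rdtsc()/rte_get_timer_hz() and the generic trace point
from the test above are real names, and the exact header providing the generic
trace points is assumed.

<code>
#include <stdio.h>
#include <stdint.h>

#include <rte_cycles.h> /* rte_rdtsc(), rte_get_timer_hz() */
#include <rte_trace.h>  /* assumed PoC header; see the file walkthrough below */

#define TRACE_PERF_CALLS 1000000UL

static void
measure_zero_arg_trace(void)
{
	uint64_t start, end, i;
	double cycles, hz = rte_get_timer_hz();

	start = rte_rdtsc();
	for (i = 0; i < TRACE_PERF_CALLS; i++)
		rte_trace_lib_eal_generic_void(); /* zero-argument trace point */
	end = rte_rdtsc();

	cycles = (double)(end - start) / TRACE_PERF_CALLS;
	printf("ZERO_ARG: cycles=%f ns=%f\n", cycles, cycles * 1E9 / hz);
}
</code>

At 2600 MHz, 17.9 cycles works out to the ~6.9 ns reported above.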
> [3] The above test was ported to LTTng to find the LTTng trace overhead.
> It is available at:
> https://github.com/jerinjacobk/lttng-overhead
> https://github.com/jerinjacobk/lttng-overhead/blob/master/README
>
> File walkthrough:
>
> lib/librte_eal/common/include/rte_trace.h - public API for trace provider and trace control
> lib/librte_eal/common/eal_common_trace.c - main trace implementation
> lib/librte_eal/common/eal_common_trace_ctf.c - CTF metadata spec implementation
> lib/librte_eal/common/eal_common_trace_utils.c - command-line utils and filesystem operations
> lib/librte_eal/common/eal_common_trace_points.c - trace points for the EAL library
> lib/librte_eal/common/include/rte_trace_eal.h - EAL tracepoint public API
> lib/librte_eal/common/eal_trace.h - private trace header file
>
>
>> I think, based on the performance we can decide one or another?
> The above performance data shows a significant improvement over LTTng.
>
> Let me know if anyone has any comments and/or suggestions on this.
> If there are no comments, I will submit the v1 after code cleanup, before
> the 20.05 proposal deadline.