That is funny. Now that I replaced samba 4 and libc-2.13.so with debug symbols, the perf profile seems to be have changed a bit after the same tests !
Events: 54K cycles - 3.06% smbd [kernel.kallsyms] [k] copy_user_generic_unrolled - copy_user_generic_unrolled 52.63% __read_nocancel 36.20% __write_nocancel 2.70% __getdents64 2.44% __libc_readv + 2.00% do_fcntl 0.87% __GI___libc_read + 0.77% __fxstat64 - 2.02% smbd libc-2.13.so [.] _int_malloc + _int_malloc - 1.62% smbd [kernel.kallsyms] [k] kmem_cache_alloc + kmem_cache_alloc - 1.22% smbd libtalloc.so.2.0.7 [.] _talloc_free + _talloc_free - 0.99% smbd libtalloc.so.2.0.7 [.] _talloc_free_children_internal.isra.4 + _talloc_free_children_internal.isra.4 - 0.86% smbd libc-2.13.so [.] __memcpy_ssse3 + __memcpy_ssse3 + 0.81% smbd [kernel.kallsyms] [k] kmem_cache_free + 0.81% smbd libc-2.13.so [.] _int_free + 0.79% smbd [kernel.kallsyms] [k] __kmalloc + 0.66% smbd libtalloc.so.2.0.7 [.] _talloc_zero + 0.63% smbd [kernel.kallsyms] [k] link_path_walk + 0.63% smbd [kernel.kallsyms] [k] ext4_htree_store_dirent + 0.55% smbd libtalloc.so.2.0.7 [.] talloc_alloc_pool + 0.55% smbd libc-2.13.so [.] __memset_sse2 + 0.53% smbd libc-2.13.so [.] malloc + 0.53% smbd [kernel.kallsyms] [k] fcntl_setlk + 0.52% smbd [kernel.kallsyms] [k] get_page_from_freelist + 0.50% smbd libtalloc.so.2.0.7 [.] talloc_get_name + 0.50% smbd [kernel.kallsyms] [k] tg3_start_xmit + 0.48% smbd [kernel.kallsyms] [k] memset + 0.47% smbd libc-2.13.so [.] free + 0.47% smbd [kernel.kallsyms] [k] _raw_spin_lock + 0.45% smbd [kernel.kallsyms] [k] __d_lookup_rcu + 0.45% smbd libc-2.13.so [.] __GI___strcmp_ssse3 + 0.44% smbd libtalloc.so.2.0.7 [.] _talloc_get_type_abort + 0.43% smbd [kernel.kallsyms] [k] system_call_after_swapgs + 0.43% smbd [kernel.kallsyms] [k] ext4_mark_iloc_dirty + 0.42% smbd libtalloc.so.2.0.7 [.] talloc_is_parent + 0.41% smbd [kernel.kallsyms] [k] __alloc_skb + 0.41% smbd [kernel.kallsyms] [k] __posix_lock_file + 0.40% smbd [kernel.kallsyms] [k] __ext4_get_inode_loc + 0.39% smbd libc-2.13.so [.] __strlen_sse2 + 0.39% smbd [kernel.kallsyms] [k] kfree + 0.39% smbd [kernel.kallsyms] [k] tcp_recvmsg + 0.38% smbd libtalloc.so.2.0.7 [.] talloc_named_const + 0.37% smbd libtalloc.so.2.0.7 [.] _talloc_array On Mon, Sep 30, 2013 at 6:19 PM, Thiago Fernandes Crepaldi < togn...@gmail.com> wrote: > Agreed. For some strange reason I though perf would "follow" the new smbd > forked and account their data too =) > > Unfortunately, I don't have the libc symbols (at least for today) to see > what is going on there, but here is what I got in the child smbd process on > the server side. The client side is a Windows 7 Virtual machine running > NASPT > > Could this result mean that most of the time the performance drop I am > experiencing is due to libc ? > I've never worked with perf before, but I will still try to resolve those > crazy addresses > > Events: 45K cycles > - 7.37% smbd libc-2.13.so [.] 0x11e465 > - 0x7ffab9f2043c > 41.73% 0 > 5.32% 0x1b3fbe0 > 5.29% 0x2c4dab0 > 3.60% 0x1b0b130 > 3.37% 0x1b0b2a0 > 2.94% 0x1b5af80 > 2.70% 0x1b0d850 > 2.64% 0x2825fb0 > 1.86% 0x28e06d0 > 1.83% 0x2afcc80 > 1.71% 0x1b2ccb0 > 1.64% 0x2a4deb0 > 1.63% 0x1b56e00 > 1.51% 0x1b6bd00 > 1.16% 0x1b49eb0 > 1.15% 0x1b506e0 > 1.13% 0x1b4da00 > 1.07% 0x1b35100 > 0.93% 0x1af9050 > 0.92% 0x2b03680 > 0.91% 0x2ae21f0 > 0.90% 0x1b21210 > 0.89% 0x1b5de80 > 0.89% 0x1b5aa80 > 0.89% 0x1b2e0e0 > 0.88% 0x1b59be0 > 0.87% 0x1b4c600 > 0.86% 0x1b2aa20 > 0.85% 0x1b4a940 > 0.85% 0x1b45f50 > 0.84% 0x1b4a6d0 > 0.84% 0x1b23940 > 0.82% 0x1b37210 > 0.82% 0x1b2cf30 > 0.82% 0x1b33320 > 0.77% 0x2c96d50 > 0.76% 0x202f380 > 0.75% 0x2bd0bd0 > 0.66% 0x1b5e1d0 > - 0x7ffab9f27e10 > 37.72% 0x2f62696c2f3365 > + 23.78% 0 > + 11.24% 0x7fffc9f76d40 > + 6.25% set_unix_security_ctx > 3.13% 0x645f6e656b6f74 > 2.46% 0x1000900000000 > + 2.17% 0x11b9f22aac > 2.16% 0x1b53000 > + 2.12% 0x2a29850 > 2.08% 0xbe70f000004c4c > 2.01% 0x1b0af00 > 1.94% 0x1b07390 > 1.51% 0x1b49b00 > 1.41% 0x2010 > - 0x7ffab9fc6c10 > + 18.08% 0 > + 13.63% 0x2c5fc20 > + 11.62% 0x2be7b10 > + 7.90% 0x2be8560 > + 6.61% 0x2a29850 > + 6.30% 0x2b3d6c0 > 5.67% 0x4e6f5479706f43 > + 5.64% 0x29d7110 > + 5.54% 0x2467130 > + 5.53% 0x2b3d5e0 > + 5.31% 0x28c81a0 > + 4.20% 0x2c5fa30 > + 3.98% 0x2a98990 > + 0x7ffab9f20438 > + 0x7ffab9f2045c > 0x7ffab9fc8e03 > + 0x7ffab9fc425e > + 0x7ffab9f2a715 > + 0x7ffab9f2a6d0 > 0x7ffab9f1f851 > 0x7ffab9f1f2ac > + 0x7ffab9f27e25 > + 0x7ffab9f2a648 > + 0x7ffab9fc4240 > 0x7ffab9fc8654 > 0x7ffab9f206bf > + 0x7ffab9f20548 > + 0x7ffab9f20bc2 > + 0x7ffab9f1f130 > + 0x7ffab9f26310 > + 0x7ffab9f20422 > 0x7ffab9f1e0db > 0x7ffab9f1f179 > + 0x7ffab9f2a6f2 > + 0x7ffab9f20572 > + 0x7ffab9f2054c > + 0x7ffab9fc42c5 > - 1.72% smbd [kernel.kallsyms] [k] kmem_cache_alloc > + kmem_cache_alloc > - 1.30% smbd libtalloc.so.2.0.7 [.] _talloc_free > + _talloc_free > - 1.10% smbd libtalloc.so.2.0.7 [.] > _talloc_free_children_internal.i > + _talloc_free_children_internal.isra.4 > - 1.07% smbd [kernel.kallsyms] [k] copy_user_generic_unrolled > + copy_user_generic_unrolled > - 0.95% smbd [kernel.kallsyms] [k] __kmalloc > + __kmalloc > - 0.78% smbd [kernel.kallsyms] [k] ext4_htree_store_dirent > + ext4_htree_store_dirent > + 0x7ffab9f4f2f5 > - 0.73% smbd [kernel.kallsyms] [k] kmem_cache_free > + kmem_cache_free > - 0.73% smbd [kernel.kallsyms] [k] link_path_walk > + link_path_walk > - 0.69% smbd libc-2.13.so [.] malloc > + malloc > - 0.69% smbd libtalloc.so.2.0.7 [.] _talloc_zero > + _talloc_zero > - 0.62% smbd [kernel.kallsyms] [k] fcntl_setlk > + fcntl_setlk > + 0x7ffabcf93238 > - 0.59% smbd [kernel.kallsyms] [k] __d_lookup_rcu > + __d_lookup_rcu > - 0.57% smbd libtalloc.so.2.0.7 [.] talloc_alloc_pool > + talloc_alloc_pool > - 0.55% smbd libtalloc.so.2.0.7 [.] talloc_get_name > + talloc_get_name > - 0.55% smbd [kernel.kallsyms] [k] __posix_lock_file > + __posix_lock_file > + 0x7ffabcf93238 > - 0.50% smbd [kernel.kallsyms] [k] _raw_spin_lock > + _raw_spin_lock > + 0.49% smbd [kernel.kallsyms] [k] tg3_start_xmit > + 0.48% smbd [kernel.kallsyms] [k] system_call_after_swapgs > + 0.46% smbd libtalloc.so.2.0.7 [.] talloc_named_const > + 0.46% smbd [kernel.kallsyms] [k] memset > + 0.46% smbd libtalloc.so.2.0.7 [.] _talloc_get_type_abort > + 0.45% smbd [kernel.kallsyms] [k] str2hashbuf_signed > + 0.45% smbd [kernel.kallsyms] [k] kfree > + 0.45% smbd libc-2.13.so [.] free > + 0.44% smbd [kernel.kallsyms] [k] __alloc_skb > + 0.42% smbd libtalloc.so.2.0.7 [.] talloc_is_parent > + 0.41% smbd libtalloc.so.2.0.7 [.] _talloc_array > > > > > > On Mon, Sep 30, 2013 at 5:39 PM, Jeremy Allison <j...@samba.org> wrote: > >> On Mon, Sep 30, 2013 at 05:21:44PM -0300, Thiago Fernandes Crepaldi wrote: >> > Andrew, in my company we are also experiencing a higher CPU usage of >> Samba >> > 4 (smbd) if compared to Samba 3. >> > >> > In fact, it almost reaches 100% of CPU and uses all the memory during >> *dir >> > copies* (individual file copy is as good as samba 3's). I strongly >> believe >> > that this CPU usage is the responsible for a worse samba 4's throughput >> if >> > compared to Samba 3 tests. >> > >> > Giving that, I would like to contribute with this investigation and >> share >> > my data regarding perf profiling on smbd (parent process) >> > >> > Events: 7 cycles >> > - 90.01% smbd [kernel.kallsyms] [k] copy_pte_range >> > copy_pte_range >> > __libc_fork >> > smbd_accept_connection >> > - 9.77% smbd [kernel.kallsyms] [k] handle_edge_irq >> > handle_edge_irq >> > smbd_accept_connection >> > - 0.22% smbd [kernel.kallsyms] [k] perf_pmu_rotate_start.isra.57 >> > perf_pmu_rotate_start.isra.57 >> > __poll >> > - 0.00% smbd [kernel.kallsyms] [k] native_write_msr_safe >> > native_write_msr_safe >> > __poll >> >> It's the client process that should have the interesting >> profile data, the parent is just going to sit there doing >> accept(). >> >> Jeremy. >> > > > > -- > Thiago Crepaldi > -- Thiago Crepaldi -- To unsubscribe from this list go to the following URL and read the instructions: https://lists.samba.org/mailman/options/samba