On Tue, Mar 09, 2021 at 12:40:11PM -0500, Steven Rostedt wrote:
> On Tue,  9 Mar 2021 12:43:50 +0800
> Tony Lu <ton...@linux.alibaba.com> wrote:
> 
> > There are lots of net namespaces on the host runs containers like k8s.
> > It is very common to see the same interface names among different net
> > namespaces, such as eth0. It is not possible to distinguish them without
> > net namespace inode.
> > 
> > This adds net namespace inode for all net_dev events, help us
> > distinguish between different net devices.
> > 
> > Output:
> >   <idle>-0       [006] ..s.   133.306989: net_dev_xmit: net_inum=4026531992 
> > dev=eth0 skbaddr=0000000011a87c68 len=54 rc=0
> > 
> > Signed-off-by: Tony Lu <ton...@linux.alibaba.com>
> > ---
> >  include/trace/events/net.h | 35 +++++++++++++++++++++++++----------
> >  1 file changed, 25 insertions(+), 10 deletions(-)
> > 
> > diff --git a/include/trace/events/net.h b/include/trace/events/net.h
> > index 2399073c3afc..a52f90d83411 100644
> > --- a/include/trace/events/net.h
> > +++ b/include/trace/events/net.h
> > @@ -35,6 +35,7 @@ TRACE_EVENT(net_dev_start_xmit,
> >             __field(        u16,                    gso_size        )
> >             __field(        u16,                    gso_segs        )
> >             __field(        u16,                    gso_type        )
> > +           __field(        unsigned int,           net_inum        )
> >     ),
> 
> This patch made me take a look at the net_dev_start_xmit trace event, and I
> see it's very "holy". That is, it has lots of holes in the structure.
> 
>       TP_STRUCT__entry(
>               __string(       name,                   dev->name       )
>               __field(        u16,                    queue_mapping   )
>               __field(        const void *,           skbaddr         )
>               __field(        bool,                   vlan_tagged     )
>               __field(        u16,                    vlan_proto      )
>               __field(        u16,                    vlan_tci        )
>               __field(        u16,                    protocol        )
>               __field(        u8,                     ip_summed       )
>               __field(        unsigned int,           len             )
>               __field(        unsigned int,           data_len        )
>               __field(        int,                    network_offset  )
>               __field(        bool,                   transport_offset_valid)
>               __field(        int,                    transport_offset)
>               __field(        u8,                     tx_flags        )
>               __field(        u16,                    gso_size        )
>               __field(        u16,                    gso_segs        )
>               __field(        u16,                    gso_type        )
>               __field(        unsigned int,           net_inum        )
>       ),
> 
> If you look at /sys/kernel/tracing/events/net/net_dev_start_xmit/format
> 
> name: net_dev_start_xmit
> ID: 1581
> format:
>       field:unsigned short common_type;       offset:0;       size:2; 
> signed:0;
>       field:unsigned char common_flags;       offset:2;       size:1; 
> signed:0;
>       field:unsigned char common_preempt_count;       offset:3;       size:1; 
> signed:0;
>       field:int common_pid;   offset:4;       size:4; signed:1;
> 
>       field:__data_loc char[] name;   offset:8;       size:4; signed:1;
>       field:u16 queue_mapping;        offset:12;      size:2; signed:0;
>       field:const void * skbaddr;     offset:16;      size:8; signed:0;
> 
> Notice, queue_mapping is 2 bytes at offset 12 (ends at offset 14), but
> skbaddr starts at offset 16. That means there's two bytes wasted.
> 
>       field:bool vlan_tagged; offset:24;      size:1; signed:0;
>       field:u16 vlan_proto;   offset:26;      size:2; signed:0;
> 
> Another byte missing above (24 + 1 != 26)
> 
>       field:u16 vlan_tci;     offset:28;      size:2; signed:0;
>       field:u16 protocol;     offset:30;      size:2; signed:0;
>       field:u8 ip_summed;     offset:32;      size:1; signed:0;
>       field:unsigned int len; offset:36;      size:4; signed:0;
> 
> Again another three bytes missing (32 + 1 != 36)
> 
>       field:unsigned int data_len;    offset:40;      size:4; signed:0;
>       field:int network_offset;       offset:44;      size:4; signed:1;
>       field:bool transport_offset_valid;      offset:48;      size:1; 
> signed:0;
>       field:int transport_offset;     offset:52;      size:4; signed:1;
> 
> Again, another 3 bytes missing (48 + 1 != 52)
> 
>       field:u8 tx_flags;      offset:56;      size:1; signed:0;
>       field:u16 gso_size;     offset:58;      size:2; signed:0;
> 
> Another byte missing (56 + 1 != 58)
> 
>       field:u16 gso_segs;     offset:60;      size:2; signed:0;       
>       field:u16 gso_type;     offset:62;      size:2; signed:0;
>       field:unsigned int net_inum;    offset:64;      size:4; signed:0;
> 
> The above shows 10 bytes wasted for this event.
> 
> The order of the fields is important. Don't worry about breaking API by
> fixing it. The parsing code uses this output to find where the binary data
> is.
> 
> Hmm, I should write a script that reads all the format files and point out
> areas that have holes in it.

I use pahole to read vmlinux.o directly with defconfig and
CONFIG_DEBUG_INFO enabled, the result shows 22 structs prefixed with
trace_event_raw_ that have at least one hole.

Command:
        # structs have at least one hole and can be packed
        pahole vmlinux.o -H 1 -R -P -y trace_event_raw_ > output.txt

        # details (result includes above)
        pahole vmlinux.o -H 1 -y trace_event_raw_ > output_detail.txt

Here is the lists (>= 1 hole and can be packed, #1 size, #2 ):

        trace_event_raw_mm_shrink_slab_start    80      72      8
        trace_event_raw_mm_shrink_slab_end      64      56      8
        trace_event_raw_percpu_alloc_percpu     56      48      8
        trace_event_raw_writeback_inode_template        48      40      8
        trace_event_raw_filelock_lock   88      80      8
        trace_event_raw_iomap_apply     72      64      8
        trace_event_raw_ext4__map_blocks_exit   56      48      8
        trace_event_raw_ext4_ext_rm_leaf        64      56      8
        trace_event_raw_ext4_ext_remove_space_done      64      56      8
        trace_event_raw_nfs_rename_event_done   56      48      8
        trace_event_raw_nfs4_rename     56      48      8
        trace_event_raw_net_dev_start_xmit      72      64      8
        trace_event_raw_net_dev_rx_verbose_template     88      72      16
        trace_event_raw_tcp_probe       120     112     8
        trace_event_raw_qdisc_dequeue   64      56      8
        trace_event_raw_rpc_xdr_alignment       88      80      8
        trace_event_raw_rdev_return_int_mesh_config     108     104     4
        trace_event_raw_rdev_update_mesh_config 128     120     8
        trace_event_raw_rdev_get_ftm_responder_stats    120     112     8
        trace_event_raw_drv_bss_info_changed    184     168     16
        trace_event_raw_drv_ampdu_action        88      80      8
        trace_event_raw_drv_tdls_recv_channel_switch    112     104     8


Here is the detail (net_dev_start_xmit as example):

        struct trace_event_raw_net_dev_start_xmit {
                struct trace_entry         ent;                  /*     0     8 
*/
                u32                        __data_loc_name;      /*     8     4 
*/
                u16                        queue_mapping;        /*    12     2 
*/
        
                /* XXX 2 bytes hole, try to pack */
        
                const void  *              skbaddr;              /*    16     8 
*/
                bool                       vlan_tagged;          /*    24     1 
*/
        
                /* XXX 1 byte hole, try to pack */
        
                u16                        vlan_proto;           /*    26     2 
*/
                u16                        vlan_tci;             /*    28     2 
*/
                u16                        protocol;             /*    30     2 
*/
                u8                         ip_summed;            /*    32     1 
*/
        
                /* XXX 3 bytes hole, try to pack */
        
                unsigned int               len;                  /*    36     4 
*/
                unsigned int               data_len;             /*    40     4 
*/
                int                        network_offset;       /*    44     4 
*/
                bool                       transport_offset_valid; /*    48     
1 */
        
                /* XXX 3 bytes hole, try to pack */
        
                int                        transport_offset;     /*    52     4 
*/
                u8                         tx_flags;             /*    56     1 
*/
        
                /* XXX 1 byte hole, try to pack */
        
                u16                        gso_size;             /*    58     2 
*/
                u16                        gso_segs;             /*    60     2 
*/
                u16                        gso_type;             /*    62     2 
*/
                /* --- cacheline 1 boundary (64 bytes) --- */
                unsigned int               net_inum;             /*    64     4 
*/
                char                       __data[];             /*    68     0 
*/
        
                /* size: 72, cachelines: 2, members: 20 */
                /* sum members: 58, holes: 5, sum holes: 10 */
                /* padding: 4 */
                /* last cacheline: 8 bytes */
        };

pahole shows there are 5 holes with 10 bytes in net_dev_start_xmit event.


Cheers,
Tony Lu

> 
> I haven't looked at the other trace events, but they may all have the same
> issues.
> 
> -- Steve
> 
> 
> 
> >  
> >     TP_fast_assign(
> > @@ -56,10 +57,12 @@ TRACE_EVENT(net_dev_start_xmit,
> >             __entry->gso_size = skb_shinfo(skb)->gso_size;
> >             __entry->gso_segs = skb_shinfo(skb)->gso_segs;
> >             __entry->gso_type = skb_shinfo(skb)->gso_type;
> > +           __entry->net_inum = dev_net(skb->dev)->ns.inum;
> >     ),
> >  
> >

Reply via email to