Hi,

It took a while, but I have finally observed some of the races between 
NAPI poll and .ndo_start_xmit in e1000 that I was talking about, that 
is, e1000_clean vs. e1000_xmit_frame. To reproduce the races, I added 
delays at the appropriate places in e1000 and used hardware breakpoints 
to track accesses to the relevant memory locations [1].

Everything was done on kernel 3.10.10; things should be similar in 
newer versions as well.

So, at least the following races do actually happen. It is up to the 
maintainers of e1000 to decide how harmful these races are.

1.
Race on dql->num_queued:

            CPU0                           CPU1
     e1000_xmit_frame                e1000_clean
     (e1000_main.c:3264)             (e1000_main.c:3812)
             |                               |
     netdev_sent_queue               e1000_clean_tx_irq
     (netdevice.h:1989)              (e1000_main.c:3881)
             |                               |
     netdev_tx_sent_queue            netdev_completed_queue
     (netdevice.h:1967)              (netdevice.h:2019)
             |                               |
     dql_queued                      netdev_tx_completed_queue
     (dynamic_queue_limits.h:76)     (netdevice.h:2008)
             |                               |
             |                       dql_avail
             |                       (dynamic_queue_limits.h:83)
             |                               |
    dql->num_queued += count;   return dql->adj_limit - dql->num_queued;

To reproduce this race as well as the races listed below, I just ran the 
test at speedtest.net in Firefox, downloaded a couple of archives from 
kernel.org and checked for software updates at the same time. Several 
minutes were enough.

2.
Races on buffer_info->next_to_watch.

That value was read in at least two places in the code:

     e1000_clean (e1000_main.c:3812)
     e1000_clean_tx_irq (e1000_main.c:3848):
       eop = tx_ring->buffer_info[i].next_to_watch;

and

     e1000_clean (e1000_main.c:3812)
     e1000_clean_tx_irq (e1000_main.c:3875):
       eop = tx_ring->buffer_info[i].next_to_watch;

Each of these reads was observed, several times, racing with each of 
the following two writes to the same memory location:

     e1000_xmit_frame (e1000_main.c:3251)
     e1000_tx_csum (e1000_main.c:2814):
       buffer_info->next_to_watch = i;

and

     e1000_xmit_frame (e1000_main.c:3260)
     e1000_tx_map (e1000_main.c:2952):
       tx_ring->buffer_info[first].next_to_watch = i;

3.
Race on tx_ring->next_to_clean.

That value was read at the following location in the code:

     e1000_xmit_frame (e1000_main.c:3223)
     e1000_maybe_stop_tx (e1000_main.c:3102)
       if (likely(E1000_DESC_UNUSED(tx_ring) >= size))
       ...

E1000_DESC_UNUSED() is defined as follows in e1000.h:
#define E1000_DESC_UNUSED(R)                        \
     ((((R)->next_to_clean > (R)->next_to_use)           \
       ? 0 : (R)->count) + (R)->next_to_clean - (R)->next_to_use - 1)

'next_to_clean' is actually read there.

This read happened concurrently with the following write:

     e1000_clean (e1000_main.c:3812)
     e1000_clean_tx_irq (e1000_main.c:3879)
       tx_ring->next_to_clean = i;

4.
Race on tx_ring->next_to_use.

That value was read at the following location in the code:

     e1000_clean (e1000_main.c:3812)
     e1000_clean_tx_irq (e1000_main.c:3885)
       E1000_DESC_UNUSED(tx_ring) >= TX_WAKE_THRESHOLD)) {
       ...

Similar to 'next_to_clean', 'next_to_use' is read in 
E1000_DESC_UNUSED(tx_ring) too.

This read happened concurrently with the following write:

     e1000_xmit_frame (e1000_main.c:3251)
     e1000_tx_csum (e1000_main.c:2817)
       tx_ring->next_to_use = i;

5.
Race on tx_desc->upper.data.

That value was read at the following location in the code:

     e1000_clean (e1000_main.c:3812)
     e1000_clean_tx_irq (e1000_main.c:3851)
       while ((eop_desc->upper.data & cpu_to_le32(E1000_TXD_STAT_DD)) &&
       ...

This read happened concurrently with the following writes several times:

     e1000_xmit_frame (e1000_main.c:3251)
     e1000_tx_csum (e1000_main.c:2810)
       context_desc->tcp_seg_setup.data = 0;

and

     e1000_xmit_frame (e1000_main.c:3267)
     e1000_tx_queue (e1000_main.c:3013)
       tx_desc->upper.data = cpu_to_le32(txd_upper);

6.
Race on buffer_info->time_stamp.

That value was read at the following location in the code:

     e1000_clean (e1000_main.c:3812)
     e1000_clean_tx_irq (e1000_main.c:3903)
       if (tx_ring->buffer_info[eop].time_stamp &&
           time_after(jiffies, tx_ring->buffer_info[eop].time_stamp +
       ...

This read happened concurrently with the following write:

     e1000_xmit_frame (e1000_main.c:3260)
     e1000_tx_map (e1000_main.c:2878)
       buffer_info->time_stamp = jiffies;

When trying to reproduce this race (the system under test was running 
inside a VirtualBox VM), I detached the virtual network adapter in 
VirtualBox from the host adapter and re-attached it a few seconds 
later. The guest system was running the speedtest.net test at the same 
time, as usual.


[1] The RaceHound tool was used to automate the tasks of introducing 
the delays and setting the hardware breakpoints: 
https://github.com/winnukem/racehound. It is similar to the 
DataCollider tool for the Microsoft Windows kernel.


Regards,

Eugene

-- 
Eugene Shatokhin, ROSA Laboratory.
www.rosalab.com

_______________________________________________
E1000-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/e1000-devel