Hi,
It took a while, but I have finally observed some of the races between
NAPI poll and .ndo_start_xmit in e1000 that I was talking about, that
is, e1000_clean vs. e1000_xmit_frame. To reproduce the races, I added
delays in the appropriate places of e1000 and used hardware breakpoints
to track accesses to the relevant memory locations [1].
Everything was done on kernel 3.10.10; the situation should be similar
in newer versions as well.
So, at least the following races do actually happen. It is up to the
maintainers of e1000 to decide how harmful these races are.
1.
Race on dql->num_queued:
CPU0                             CPU1
e1000_xmit_frame                 e1000_clean
(e1000_main.c:3264)              (e1000_main.c:3812)
  |                                |
netdev_sent_queue                e1000_clean_tx_irq
(netdevice.h:1989)               (e1000_main.c:3881)
  |                                |
netdev_tx_sent_queue             netdev_completed_queue
(netdevice.h:1967)               (netdevice.h:2019)
  |                                |
dql_queued                       netdev_tx_completed_queue
(dynamic_queue_limits.h:76)      (netdevice.h:2008)
  |                                |
  |                              dql_avail
  |                              (dynamic_queue_limits.h:83)
  |                                |
dql->num_queued += count;        return dql->adj_limit - dql->num_queued;
To reproduce this race, as well as the races listed below, I simply ran
the test at speedtest.net in Firefox, downloaded a couple of archives
from kernel.org, and checked for software updates, all at the same time.
Several minutes were enough.
2.
Races on buffer_info->next_to_watch.
That value was read in at least two places in the code:
e1000_clean (e1000_main.c:3812)
e1000_clean_tx_irq (e1000_main.c:3848):
eop = tx_ring->buffer_info[i].next_to_watch;
and
e1000_clean (e1000_main.c:3812)
e1000_clean_tx_irq (e1000_main.c:3875):
eop = tx_ring->buffer_info[i].next_to_watch;
Each of these reads happened concurrently, several times, with each of
the following two writes to the same memory location:
e1000_xmit_frame (e1000_main.c:3251)
e1000_tx_csum (e1000_main.c:2814):
buffer_info->next_to_watch = i;
and
e1000_xmit_frame (e1000_main.c:3260)
e1000_tx_map (e1000_main.c:2952):
tx_ring->buffer_info[first].next_to_watch = i;
3.
Race on tx_ring->next_to_clean.
That value was read at the following location in the code:
e1000_xmit_frame (e1000_main.c:3223)
e1000_maybe_stop_tx (e1000_main.c:3102)
if (likely(E1000_DESC_UNUSED(tx_ring) >= size))
...
E1000_DESC_UNUSED() is defined as follows in e1000.h:
#define E1000_DESC_UNUSED(R) \
((((R)->next_to_clean > (R)->next_to_use) \
? 0 : (R)->count) + (R)->next_to_clean - (R)->next_to_use - 1)
'next_to_clean' is actually read there.
This read happened concurrently with the following write:
e1000_clean (e1000_main.c:3812)
e1000_clean_tx_irq (e1000_main.c:3879)
tx_ring->next_to_clean = i;
4.
Race on tx_ring->next_to_use.
That value was read at the following location in the code:
e1000_clean (e1000_main.c:3812)
e1000_clean_tx_irq (e1000_main.c:3885)
E1000_DESC_UNUSED(tx_ring) >= TX_WAKE_THRESHOLD)) {
...
Similar to 'next_to_clean', 'next_to_use' is read in
E1000_DESC_UNUSED(tx_ring) too.
This read happened concurrently with the following write:
e1000_xmit_frame (e1000_main.c:3251)
e1000_tx_csum (e1000_main.c:2817)
tx_ring->next_to_use = i;
5.
Race on tx_desc->upper.data.
That value was read at the following location in the code:
e1000_clean (e1000_main.c:3812)
e1000_clean_tx_irq (e1000_main.c:3851)
while ((eop_desc->upper.data & cpu_to_le32(E1000_TXD_STAT_DD)) &&
...
This read happened concurrently with the following writes several times:
e1000_xmit_frame (e1000_main.c:3251)
e1000_tx_csum (e1000_main.c:2810)
context_desc->tcp_seg_setup.data = 0;
and
e1000_xmit_frame (e1000_main.c:3267)
e1000_tx_queue (e1000_main.c:3013)
tx_desc->upper.data = cpu_to_le32(txd_upper);
6.
Race on buffer_info->time_stamp.
That value was read at the following location in the code:
e1000_clean (e1000_main.c:3812)
e1000_clean_tx_irq (e1000_main.c:3903)
if (tx_ring->buffer_info[eop].time_stamp &&
time_after(jiffies, tx_ring->buffer_info[eop].time_stamp +
...
This read happened concurrently with the following write:
e1000_xmit_frame (e1000_main.c:3260)
e1000_tx_map (e1000_main.c:2878)
buffer_info->time_stamp = jiffies;
When trying to reproduce this race (the system under test was running
inside a VirtualBox VM), I detached the virtual network adapter in
VirtualBox from the host adapter and reattached it a few seconds later.
The guest system was running the test via speedtest.net at the same
time, as usual.
[1] The RaceHound tool was used to automate the tasks of introducing the
delays and setting the hardware breakpoints:
https://github.com/winnukem/racehound. It is similar to the DataCollider
tool for the Microsoft Windows kernel.
Regards,
Eugene
--
Eugene Shatokhin, ROSA Laboratory.
www.rosalab.com
_______________________________________________
E1000-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/e1000-devel