Hi,
There seem to be several data races in e1000 driver (checked a couple of
usage scenarios on the vanilla Linux kernel 3.11-rc1). Perhaps, not that
critical but still worth fixing, I suppose.
1.
CPU0 CPU1
e1000_probe()
|
register_netdev()
|
| e1000_open()
| |
| e1000_configure()
| |
| e1000_set_rx_mode()
| |
| rctl = er32(RCTL);
| (e1000_main.c:2237)
| |
| |
e1000_vlan_filter_on_off |
(e1000_main.c:1222) |
| |
ew32(RCTL, rctl); |
(e1000_main.c:4823) |
|
ew32(RCTL, rctl);
(e1000_main.c:2259)
As soon as register_netdev() registers the network device at
e1000_main.c:1218, something (e.g. NetworkManager) may try to access the
device thus triggering e1000_open() and all the path depicted above.
The race window is rather small but still it is possible for the write
of RCTL in e1000_vlan_filter_on_off() interfere with its update in
e1000_set_rx_mode() as depicted above.
Perhaps, moving the call to e1000_vlan_filter_on_off() in e1000_probe()
before register_netdev() will fix this:
------------------------------
+ e1000_vlan_filter_on_off(adapter, false);
+
strcpy(netdev->name, "eth%d");
err = register_netdev(netdev);
if (err)
goto err_register;
- e1000_vlan_filter_on_off(adapter, false);
------------------------------
2.
The accesses to adapter->link_speed, adapter->link_duplex,
adapter->stats.*, hw->tx_packet_delta, hw->collision_delta in
e1000_get_ethtool_stats() are probably not synchronized with those in
e1000_update_stats().
Example:
CPU0 CPU1
e1000_watchdog
(e1000_main.c:2500)
|
e1000_update_stats
|
<lock adapter->stats_lock> e1000_get_ethtool_stats
(e1000_main.c:3620) |
| |
| data[i] = ... *(u32 *)p;
| (e1000_ethtool.c:1850)
adapter->stats.xonrxc += er32(XONRXC);
(e1000_main.c:3651)
|
<unlock adapter->stats_lock>
(e1000_main.c:3749)
e1000_update_stats() as well as some other functions take
'adapter->stats_lock' when they access these data, perhaps
e1000_get_ethtool_stats() should do that too?
3.
Our tools also report a number of possible races between e1000_clean()
(NAPI callback) and e1000_xmit_frame() but we may have missed something.
NAPI poll callbacks and ndo_start_xmit callbacks cannot race with each
other on a single CPU. But in my experiments, they executed on different
CPUs.
Is there anything that ensures they never execute concurrently even on
different CPUs? If so, I'll update our tools accordingly to avoid such
false positives.
If the callbacks may execute concurrently then there are races in e1000
on the accesses to tx_ring->next_to_clean, tx_ring->next_to_use,
tx_ring->buffer_info[i].*, tx_desc->upper.data. Same for the accesses to
dql->num_queued in netdev_tx_sent_queue() and netdev_completed_queue()
called from these callbacks.
Example:
CPU0 CPU1
e1000_clean()
(e1000_main.c:3812)
|
e1000_clean_tx_irq e1000_xmit_frame
(e1000_main.c:3869) (e1000_main.c:3260)
| |
e1000_unmap_and_free_tx_resource e1000_tx_map
(e1000_main.c:1961) (e1000_main.c:2880)
| |
if (buffer_info->dma) { ... buffer_info->dma = ...
------------------------------
So, what do you think?
Regards,
Eugene
--
Eugene Shatokhin, ROSA Laboratory.
www.rosalab.com
------------------------------------------------------------------------------
Get your SQL database under version control now!
Version control is standard for application code, but databases havent
caught up. So what steps can you take to put your SQL databases under
version control? Why should you start doing it? Read more to find out.
http://pubads.g.doubleclick.net/gampad/clk?id=49501711&iu=/4140/ostg.clktrk
_______________________________________________
E1000-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit
http://communities.intel.com/community/wired