> On Feb 2, 2017, at 11:21 PM, Eric Dumazet <eric.duma...@gmail.com> wrote:
> 
> On Thu, 2017-02-02 at 15:52 -0800, Alexander Duyck wrote:
>> On Thu, Feb 2, 2017 at 3:47 PM, Joel Cunningham <joel.cunning...@me.com>
>> wrote:
>>> Hi,
>>> 
>>> I’m studying the synchronization used on different parts of struct
>>> net_device and I’m struggling to understand how structure member
>>> modifications in dev_ioctl are synchronized.  Getters in
>>> dev_ifsioc_locked() are only holding rcu_read_lock() while setters in
>>> dev_ifsioc() are holding rtnl_lock, but not using RCU APIs.  I was
>>> specifically looking at SIOCGIFHWADDR/SIOCSIFHWADDR.  What’s to prevent one
>>> CPU from executing a getter and another CPU from executing a setter
>>> resulting in possibly a torn read/write?  I didn’t see anything in
>>> rtnl_lock() that would wait for any rcu_reader_lock() critical sections (on
>>> other CPUs) to finish before acquiring the mutex.
>>> 
>>> Is there something about dev_ioctl that prevents parallel execution? or
>>> maybe something I still don’t understand about the RCU implementation?
>>> 
>>> Thanks,
>>> 
>>> Joel
>> 
>> My advice would be to spend more time familiarizing yourself with RCU.
>> The advantage of RCU is that it allows for updates while other threads
>> are accessing the data.  The rtnl_lock is just meant to prevent
>> multiple writers from updating the data simultaneously.  So between
>> writers the rtnl_lock is used to keep things synchronized, but between
>> writers and readers the mechanism that is meant to protect the data
>> and keep it sane is RCU.
> 
> Note that sometimes we do not properly handle the case one field can be
> written by a writer holding RTNL (or socket lock or something else)
> 
> We often believe compiler wont do something stupid, but it can
> sometimes.
> 
> We definitely should scrutinize things a bit more, or maybe add __rcu
> like annotations to catch potential problems earlier.

This is my hunch from looking at dev_ioctl().  For some of the other
fields, there is additional support to detect a write during the read,
but not for any of the ioctls handled in dev_ifsioc_locked().  For
example, for SIOCGIFNAME, dev_ifname() calls netdev_get_name() to copy
dev->name, which uses the devnet_rename_seq seqcount to detect whether
another thread called dev_change_name() and updated the name.

I found more examples of accessing net_device fields in net-sysfs.c, and
these instances all acquire dev_base_lock/rtnl_lock before reading the
fields.  Maybe dev_ioctl should be implemented this way.

> 
> We recently found an issue in drivers/net/macvtap.c and
> drivers/net/tun.c using q->vnet_hdr_sz without proper annotation.
> 
> macvtap patch would be :
> 
> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
> index 4026185658381df004a7d641e2be7bcb9a45b509..d11a807565acf371f9bbb4afbfaca1aacd000138 100644
> --- a/drivers/net/macvtap.c
> +++ b/drivers/net/macvtap.c
> @@ -681,7 +681,7 @@ static ssize_t macvtap_get_user(struct macvtap_queue *q, struct msghdr *m,
>  	size_t linear;
>  
>  	if (q->flags & IFF_VNET_HDR) {
> -		vnet_hdr_len = q->vnet_hdr_sz;
> +		vnet_hdr_len = READ_ONCE(q->vnet_hdr_sz);
>  
>  		err = -EINVAL;
>  		if (len < vnet_hdr_len)
> @@ -820,7 +820,7 @@ static ssize_t macvtap_put_user(struct macvtap_queue *q,
>  
>  	if (q->flags & IFF_VNET_HDR) {
>  		struct virtio_net_hdr vnet_hdr;
> -		vnet_hdr_len = q->vnet_hdr_sz;
> +		vnet_hdr_len = READ_ONCE(q->vnet_hdr_sz);
>  		if (iov_iter_count(iter) < vnet_hdr_len)
>  			return -EINVAL;
>  
> @@ -1090,7 +1090,7 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd,
>  		if (s < (int)sizeof(struct virtio_net_hdr))
>  			return -EINVAL;
>  
> -		q->vnet_hdr_sz = s;
> +		WRITE_ONCE(q->vnet_hdr_sz, s);
>  		return 0;
>  
>  	case TUNGETVNETLE:

Joel
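
P.S. To make the "detect a write during the read" idea concrete, here is
a minimal userspace sketch of a sequence-counter retry loop, written with
C11 atomics.  It is only an illustration of the general pattern, not the
kernel's seqcount_t code and not how dev_ioctl works today.  The names
struct fake_dev, fake_dev_set_addr() and fake_dev_get_addr() are made up
for this example, and the writer is assumed to be serialized by some
outer lock playing the role of rtnl_lock.

```c
#include <stdatomic.h>

#define HWADDR_LEN 6

/* Hypothetical stand-in for a device with one multi-byte field. */
struct fake_dev {
	atomic_uint seq;	/* even: stable, odd: write in progress */
	_Atomic unsigned char addr[HWADDR_LEN];
};

/*
 * Writer side.  Assumes callers are already serialized against each
 * other (e.g. by a mutex); the counter only lets readers detect an
 * overlapping update.
 */
static void fake_dev_set_addr(struct fake_dev *dev, const unsigned char *src)
{
	unsigned int s = atomic_load_explicit(&dev->seq, memory_order_relaxed);

	/* Mark the update as in progress (counter becomes odd). */
	atomic_store_explicit(&dev->seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);

	for (int i = 0; i < HWADDR_LEN; i++)
		atomic_store_explicit(&dev->addr[i], src[i],
				      memory_order_relaxed);

	/* Publish the update (counter becomes even again). */
	atomic_store_explicit(&dev->seq, s + 2, memory_order_release);
}

/* Reader side: copy, then retry if a writer was active during the copy. */
static void fake_dev_get_addr(struct fake_dev *dev, unsigned char *dst)
{
	unsigned int before, after;

	do {
		before = atomic_load_explicit(&dev->seq, memory_order_acquire);

		for (int i = 0; i < HWADDR_LEN; i++)
			dst[i] = atomic_load_explicit(&dev->addr[i],
						      memory_order_relaxed);

		atomic_thread_fence(memory_order_acquire);
		after = atomic_load_explicit(&dev->seq, memory_order_relaxed);
	} while ((before & 1) || before != after);
}
```

The reader never blocks the writer; it just throws away a copy that
overlapped an update and tries again, so it cannot return a torn address.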