On 2021/4/16 11:27, zhudi (J) wrote: >> dependencyOn 2021/4/15 11:35, zhudi wrote: >>> From: Di Zhu <zhud...@huawei.com> >>> >>> We encountered a crash: in the packet receiving process, we got an >>> illegal VLAN device address, but the VLAN device address saved in >>> vmcore is correct. After checking the code, we found a possible data >>> competition: >>> CPU 0: CPU 1: >>> (RCU read lock) (RTNL lock) >>> vlan_do_receive() register_vlan_dev() >>> vlan_find_dev() >>> >>> ->__vlan_group_get_device() ->vlan_group_prealloc_vid() >>> >>> In vlan_group_prealloc_vid(), We need to make sure that kzalloc is >>> executed before assigning a value to vlan devices array, otherwise we >> >> As my understanding, there is a dependency between calling kzalloc() and >> assigning the address(returned from kzalloc()) to vg->vlan_devices_arrays, >> CPU and compiler can see the dependency, why can't it handling the >> dependency before adding the smp_wmb()? >> >> See CONTROL DEPENDENCIES section in Documentation/memory- >> barriers.txt: >> >> However, stores are not speculated. This means that ordering -is- provided >> for load-store control dependencies, as in the following example: >> >> q = READ_ONCE(a); >> if (q) { >> WRITE_ONCE(b, 1); >> } >> > > Maybe I didn't make it clear. This memory isolation is to ensure the order > of > memset(object, 0, size) in kzalloc() operations and the subsequent array > assignment statements. > > kzalloc() > ->memset(object, 0, size) > > smp_wmb() > > vg->vlan_devices_arrays[pidx][vidx] = array; > > Because __vlan_group_get_device() function depends on this order
Thanks for clarify, it would be good to mention this in the commit log too. Also, __vlan_group_get_device() is used in the data path, it would be to avoid the barrier op too. Maybe using rcu to avoid the barrier if the __vlan_group_get_device() is already protected by rcu_lock. > >> >> >>> may get a wrong address from the hardware cache on another cpu. >>> >>> So fix it by adding memory barrier instruction to ensure the order of >>> memory operations. >>> >>> Signed-off-by: Di Zhu <zhud...@huawei.com> >>> --- >>> net/8021q/vlan.c | 2 ++ >>> net/8021q/vlan.h | 3 +++ >>> 2 files changed, 5 insertions(+) >>> >>> diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c index >>> 8b644113715e..4f541e05cd3f 100644 >>> --- a/net/8021q/vlan.c >>> +++ b/net/8021q/vlan.c >>> @@ -71,6 +71,8 @@ static int vlan_group_prealloc_vid(struct vlan_group >> *vg, >>> if (array == NULL) >>> return -ENOBUFS; >>> >>> + smp_wmb(); >>> + >>> vg->vlan_devices_arrays[pidx][vidx] = array; >>> return 0; >>> } >>> diff --git a/net/8021q/vlan.h b/net/8021q/vlan.h index >>> 953405362795..7408fda084d3 100644 >>> --- a/net/8021q/vlan.h >>> +++ b/net/8021q/vlan.h >>> @@ -57,6 +57,9 @@ static inline struct net_device >>> *__vlan_group_get_device(struct vlan_group *vg, >>> >>> array = vg->vlan_devices_arrays[pidx] >>> [vlan_id / >> VLAN_GROUP_ARRAY_PART_LEN]; >>> + >>> + smp_rmb(); >>> + >>> return array ? array[vlan_id % VLAN_GROUP_ARRAY_PART_LEN] : >> NULL; } >>> >>> >