On 2021/4/17 20:33, zhudi (J) wrote: >> On 2021/4/16 11:27, zhudi (J) wrote: >>>> dependencyOn 2021/4/15 11:35, zhudi wrote: >>>>> From: Di Zhu <zhud...@huawei.com> >>>>> >>>>> We encountered a crash: in the packet receiving process, we got an >>>>> illegal VLAN device address, but the VLAN device address saved in >>>>> vmcore is correct. After checking the code, we found a possible data >>>>> competition: >>>>> CPU 0: CPU 1: >>>>> (RCU read lock) (RTNL lock) >>>>> vlan_do_receive() register_vlan_dev() >>>>> vlan_find_dev() >>>>> >>>>> ->__vlan_group_get_device() ->vlan_group_prealloc_vid() >>>>> >>>>> In vlan_group_prealloc_vid(), We need to make sure that kzalloc is >>>>> executed before assigning a value to vlan devices array, otherwise we >>>> >>>> As my understanding, there is a dependency between calling kzalloc() and >>>> assigning the address(returned from kzalloc()) to vg->vlan_devices_arrays, >>>> CPU and compiler can see the dependency, why can't it handling the >>>> dependency before adding the smp_wmb()? >>>> >>>> See CONTROL DEPENDENCIES section in Documentation/memory- >>>> barriers.txt: >>>> >>>> However, stores are not speculated. This means that ordering -is- >> provided >>>> for load-store control dependencies, as in the following example: >>>> >>>> q = READ_ONCE(a); >>>> if (q) { >>>> WRITE_ONCE(b, 1); >>>> } >>>> >>> >>> Maybe I didn't make it clear. This memory isolation is to ensure the order >> of >>> memset(object, 0, size) in kzalloc() operations and the subsequent array >> assignment statements. >>> >>> kzalloc() >>> ->memset(object, 0, size) >>> >>> smp_wmb() >>> >>> vg->vlan_devices_arrays[pidx][vidx] = array; >>> >>> Because __vlan_group_get_device() function depends on this order >> > >> Thanks for clarify, it would be good to mention this in the >> commit log too. > > OK, I'll change it. Thank you for your advice. > >> >> Also, __vlan_group_get_device() is used in the data path, it would >> be to avoid the barrier op too. Maybe using rcu to avoid the barrier >> if the __vlan_group_get_device() is already protected by rcu_lock. > > Using the netperf command for testing on x86, there is no difference in > performance:
This may make sense for x86 because x86 has a strong order memory model, which has smp_rmb() as compiler barrier, as my understanding. How about the weak order memory model CPU? such as arm64, which has smp_rmb() as 'dmb' instruction. Also the cpu usage may need to be looked at if data rate is at line speed. > > # netperf -H 112.113.0.12 -l 20 -t TCP_STREAM > MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to > 112.113.0.12 () port 0 AF_INET > Recv Send Send > Socket Socket Message Elapsed > Size Size Size Time Throughput > bytes bytes bytes secs. 10^6bits/sec > > 131072 16384 16384 20.00 9386.03 > > After patch: > > # netperf -H 112.113.0.12 -l 20 -t TCP_STREAM > MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to > 112.113.0.12 () port 0 AF_INET > Recv Send Send > Socket Socket Message Elapsed > Size Size Size Time Throughput > bytes bytes bytes secs. 10^6bits/sec > > 131072 16384 16384 20.00 9386.41 > > The same is true for UDP stream test > >> >>> >>>> >>>> >>>>> may get a wrong address from the hardware cache on another cpu. >>>>> >>>>> So fix it by adding memory barrier instruction to ensure the order of >>>>> memory operations. >>>>> >>>>> Signed-off-by: Di Zhu <zhud...@huawei.com> >>>>> --- >>>>> net/8021q/vlan.c | 2 ++ >>>>> net/8021q/vlan.h | 3 +++ >>>>> 2 files changed, 5 insertions(+) >>>>> >>>>> diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c index >>>>> 8b644113715e..4f541e05cd3f 100644 >>>>> --- a/net/8021q/vlan.c >>>>> +++ b/net/8021q/vlan.c >>>>> @@ -71,6 +71,8 @@ static int vlan_group_prealloc_vid(struct >> vlan_group >>>> *vg, >>>>> if (array == NULL) >>>>> return -ENOBUFS; >>>>> >>>>> + smp_wmb(); >>>>> + >>>>> vg->vlan_devices_arrays[pidx][vidx] = array; >>>>> return 0; >>>>> } >>>>> diff --git a/net/8021q/vlan.h b/net/8021q/vlan.h index >>>>> 953405362795..7408fda084d3 100644 >>>>> --- a/net/8021q/vlan.h >>>>> +++ b/net/8021q/vlan.h >>>>> @@ -57,6 +57,9 @@ static inline struct net_device >>>>> *__vlan_group_get_device(struct vlan_group *vg, >>>>> >>>>> array = vg->vlan_devices_arrays[pidx] >>>>> [vlan_id / >>>> VLAN_GROUP_ARRAY_PART_LEN]; >>>>> + >>>>> + smp_rmb(); >>>>> + >>>>> return array ? array[vlan_id % VLAN_GROUP_ARRAY_PART_LEN] : >>>> NULL; } >>>>> >>>>> >>> >