On 7/9/20 11:51 AM, Cong Wang wrote:
> On Thu, Jul 9, 2020 at 10:10 AM Guenter Roeck <li...@roeck-us.net> wrote:
>>
>> Something seems fishy with the use of skcd->val on big endian systems.
>>
>> Some debug output:
>>
>> [ 22.643703] sock: ##### sk_alloc(sk=000000001be28100): Calling
>> cgroup_sk_alloc(000000001be28550)
>> [ 22.643807] cgroup: ##### cgroup_sk_alloc(skcd=000000001be28550):
>> cgroup_sk_alloc_disabled=0, in_interrupt: 0
>> [ 22.643886] cgroup: #### cgroup_sk_alloc(skcd=000000001be28550):
>> cset->dfl_cgrp=0000000001224040, skcd->val=0x1224040
>> [ 22.643957] cgroup: ###### cgroup_bpf_get(cgrp=0000000001224040)
>> [ 22.646451] sock: ##### sk_prot_free(sk=000000001be28100): Calling
>> cgroup_sk_free(000000001be28550)
>> [ 22.646607] cgroup: #### sock_cgroup_ptr(skcd=000000001be28550) ->
>> 0000000000014040 [v=14040, skcd->val=14040]
>> [ 22.646632] cgroup: ####### cgroup_sk_free(): skcd=000000001be28550,
>> cgrp=0000000000014040
>> [ 22.646739] cgroup: ####### cgroup_sk_free(): skcd->no_refcnt=0
>> [ 22.646814] cgroup: ####### cgroup_sk_free(): Calling
>> cgroup_bpf_put(cgrp=0000000000014040)
>> [ 22.646886] cgroup: ###### cgroup_bpf_put(cgrp=0000000000014040)
>
> Excellent debugging! I thought it was a double put, but it seems to
> be an endian issue. I didn't realize the big endian machine actually
> packs bitfields in a big endian way too...
>
> Does the attached patch address this?
>
Partially. I don't see the crash anymore, but something is still odd - some
of my tests require a retry with this patch applied, which previously never
happened. I don't know if this is another problem with this patch, or a
different problem. Unfortunately, I'll be unable to debug this further
until next Tuesday.

Guenter
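
For anyone following the bitfield remark quoted above, here is a minimal
userspace sketch of the effect. It is not taken from the kernel sources or
from the attached patch: union demo_word, flag_a, flag_b and the helper
functions are hypothetical names, and the layout only loosely mimics a
64-bit value that shares storage with flag bits. Bitfield allocation order
is implementation-defined; the sketch assumes a GCC- or Clang-style ABI,
where it typically follows the byte order of the target.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in: flag bits overlaid on a 64-bit word.
 * This is NOT the kernel's struct sock_cgroup_data. */
union demo_word {
	struct {
		uint8_t flag_a : 1;	/* declared first */
		uint8_t flag_b : 1;	/* declared second */
		uint8_t unused : 6;
		uint8_t pad[7];
	} bits;
	uint64_t val;
};

/* The overlay only makes sense if both views have the same size. */
static_assert(sizeof(union demo_word) == sizeof(uint64_t),
	      "bitfield view and val must overlap exactly");

/* Bit(s) of val that change when only flag_a is set. */
uint64_t flag_a_mask(void)
{
	union demo_word w;

	memset(&w, 0, sizeof(w));
	w.bits.flag_a = 1;
	return w.val;
}

/* Bit(s) of val that change when only flag_b is set. */
uint64_t flag_b_mask(void)
{
	union demo_word w;

	memset(&w, 0, sizeof(w));
	w.bits.flag_b = 1;
	return w.val;
}

/* Nonzero if both flags live in the two low bits of val, i.e. if
 * "val & ~3" would strip exactly the flags and nothing else. */
int flags_in_low_bits(void)
{
	return (flag_a_mask() | flag_b_mask()) == UINT64_C(0x3);
}
```

With this declaration order, a little-endian GCC build should put the flags
in the two low bits of val, while a big-endian build should put them in the
two top bits, so code that masks the low bits to recover a pointer would
need the field order reversed in a big-endian branch.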