Bug#1049873: closing 1049873
On Wed, 2025-02-12 at 08:50 +0100, Salvatore Bonaccorso wrote:
> Yes my understanding from your comments was that 6.12.13-1 does not
> expose the problem.

Okay... let me summarise :-)

- 6.12. doesn't show the original problem (hanging mv) described in
  this bug.
  I briefly (and wrongly) thought that instead the NFS4.1 mountpoint
  would not update the file size after the mv succeeded, but that was
  probably just a mistake on my side.
- The bookworm kernel *does* show the original problem (hanging mv).

> I have reopened the bug, but I believe the only one who actually can
> do something here is either you, and bisect the changes down to what
> broke the behaviour, or someone else using dCache and having the
> possibility to do experiments on a dedicated node.
>
> I would start bisecting first by debian kernel-image packages by
> narrowing down more closely where the behaviour got introduced, then
> from there the respective upstream stable series changes.
>
> I hope this gives you enough guide already on how to proceed.

Hmm, I guess that would rather be quite a "waste" of time.
I cannot really test this on our production system, so I'd need to set
up a test system for bisecting.
And I have anyway adapted my use cases of this already, with a TODO to
revert after upgrading to trixie.

My only idea was that we might just leave it open in case someone else
stumbles over the symptom.

But perhaps it's indeed best to just close it as wontfix.

Sorry for the back and forth :-)

Cheers,
Chris.
Bug#1086028: I've reproduced the bug in QEMU
tag 1086028 + patch
tag 1087809 + patch
tag 1093200 + patch
thanks

Hi!

I've finally managed to reproduce this EFAULT in QEMU (using an
Erlang-based script which is shipped in the wings3d source package):

1) I've installed Debian bookworm for mips64el in a qemu-system-mips64el
   virtual machine (version from unstable), and upgraded it to the
   current unstable (machine is loongson3-virt, cpu is Loongson-3A4000).

2) I have to enable SMP in qemu and use -rtc clock=rt (otherwise the
   virtual machine won't boot; with clock=rt sometimes it boots,
   sometimes it hangs). The full QEMU command line is:

   qemu-system-mips64el -machine loongson3-virt -m 4g -cpu Loongson-3A4000 \
       -smp 2,sockets=2,cores=1,threads=1,maxcpus=2 \
       -kernel vmlinuz-loongson-3 \
       -rtc clock=rt \
       -initrd initrd.img-loongson-3 -drive if=none,file=hda1.bin,id=hd,format=raw \
       -net nic -net tap,ifname=tap0,script=/bin/true \
       -device virtio-blk-pci,drive=hd -append "root=/dev/vda1 console=ttyS0" \
       -nographic

   Here kernel and initrd can be either the stock 6.1.123-1 version or
   6.1.123-1 with the attached patch. Unfortunately, QEMU can't boot for
   me using the newest 6.12.12-1 kernel (it complains that it can't
   uncompress the initrd, I don't know why).

4) I've installed the build dependencies of wings3d (basically, only
   erlang-base is necessary).

5) I've extracted the wings3d source package (from stable:
   https://packages.debian.org/source/stable/wings3d).

6) I've added the following line as the second line to
   wings3d-2.2.9/intl_tools/gen_char_hrl:

   %%! +S 4:4 +SDcpu 4:4 +c false

   (The first two options enable multiple threads; the last one allows
   some workaround for the case when the monotonic clock jumps
   backwards, which appears to be the case for QEMU with SMP enabled.)

7) I've run this gen_char_hrl in a loop until it fails.
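Step 7 could be driven by something like the following. This is only a minimal sketch, not the script actually used: the helper name run_until_failure is hypothetical, and it assumes that a failing run either exits with a non-zero status or prints "efault" on stdout/stderr. The exact arguments gen_char_hrl needs are not given in this message, so the command is passed in by the caller.

```python
import subprocess
import sys


def run_until_failure(cmd, max_runs=None):
    """Run cmd repeatedly until one run fails.

    Returns (number_of_runs, failing_result), where failing_result is the
    subprocess.CompletedProcess of the failed run, or None if max_runs
    runs completed without a failure.

    Assumption: a failure shows up as a non-zero exit status or as the
    string "efault" in the combined output.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        result = subprocess.run(cmd, capture_output=True, text=True)
        runs += 1
        output = (result.stdout + result.stderr).lower()
        if result.returncode != 0 or "efault" in output:
            return runs, result
    return runs, None


# Hypothetical usage (the command line is a placeholder):
#   runs, failed = run_until_failure(["./gen_char_hrl"])
#   print(f"failed after {runs} runs:", failed.stderr, file=sys.stderr)
```

Capturing the output rather than only checking the exit status keeps the error text of the failing run available for the bug report.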
The result is that with the stock 6.1.123-1 kernel, in approximately 1%
of cases the script aborts with the message:

   signal-dispatcher thread got unexpected error: efault (14)

which is exactly the error that prevents Erlang (and many Erlang-based
packages) from building on mips64el.

On the other hand, with the patched kernel the script loop has been
running for more than 24 hours (a few thousand runs) without aborting.
So I'm now fairly confident that the patch fixes the bug. I'm not sure
whether the patch has any adverse effects, so it'd be better to try it
on real hardware as well.

The patch is derived from the thread [1]. It reverses commit [2] with an
additional change, which is necessary because of changes in
expand_stack() introduced in commit [3].

[1] https://lore.kernel.org/all/mvmplxraqmd@suse.de/T/
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4bce37a68ff884e821a02a731897a8119e0c37b7
[3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8d7071af890768438c14db6172cc8f9f4d04e184

Cheers!
-- 
Sergei Golovan

efault0.patch
Description: Binary data
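As a rough sanity check on the "fairly confident" conclusion in this message, the chance of seeing no failure at all can be estimated. This is only a sketch under stated assumptions: runs are treated as independent, the per-run failure rate on the unpatched kernel is taken as the reported 1%, and 2000 is an illustrative stand-in for "a few thousand runs" (the exact count is not given). The function name is hypothetical.

```python
def prob_no_failure(p_fail, runs):
    """Probability that `runs` independent runs all succeed,
    given a per-run failure probability `p_fail`."""
    return (1.0 - p_fail) ** runs


# Assumed values: 1% failure rate, 2000 runs (illustrative only).
p = prob_no_failure(0.01, 2000)
print(f"P(no failure in 2000 runs at 1% failure rate) = {p:.2e}")
```

Under these assumptions, a clean run of that length would be extremely unlikely if the bug were still present, which supports the conclusion but says nothing about possible side effects of the patch.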
Processed: I've reproduced the bug in QEMU
Processing commands for cont...@bugs.debian.org:

> tag 1086028 + patch
Bug #1086028 [src:linux] loupe: FTBFS on mips64el: failed to acquire jobserver token: Bad address (os error 14)
Bug #1087809 [src:linux] cargo: [mipsel64] failed to acquire jobserver token/Bad address (os error 14)
Bug #1093200 [src:linux] Some packages consistently FTBFS with EFAULT (Bad address) on most mips64el buildds
Added tag(s) patch.
Added tag(s) patch.
Added tag(s) patch.
> tag 1087809 + patch
Bug #1087809 [src:linux] cargo: [mipsel64] failed to acquire jobserver token/Bad address (os error 14)
Bug #1086028 [src:linux] loupe: FTBFS on mips64el: failed to acquire jobserver token: Bad address (os error 14)
Bug #1093200 [src:linux] Some packages consistently FTBFS with EFAULT (Bad address) on most mips64el buildds
Ignoring request to alter tags of bug #1087809 to the same tags previously set
Ignoring request to alter tags of bug #1086028 to the same tags previously set
Ignoring request to alter tags of bug #1093200 to the same tags previously set
> tag 1093200 + patch
Bug #1093200 [src:linux] Some packages consistently FTBFS with EFAULT (Bad address) on most mips64el buildds
Bug #1086028 [src:linux] loupe: FTBFS on mips64el: failed to acquire jobserver token: Bad address (os error 14)
Bug #1087809 [src:linux] cargo: [mipsel64] failed to acquire jobserver token/Bad address (os error 14)
Ignoring request to alter tags of bug #1093200 to the same tags previously set
Ignoring request to alter tags of bug #1086028 to the same tags previously set
Ignoring request to alter tags of bug #1087809 to the same tags previously set
> thanks
Stopping processing here.

Please contact me if you need assistance.
-- 
1086028: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1086028
1087809: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1087809
1093200: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1093200
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems
Processed: tagging 1091517
Processing commands for cont...@bugs.debian.org:

> tags 1091517 + upstream
Bug #1091517 [src:linux] linux: xhci regression breaks fastboot usb communication with android bootloader
Added tag(s) upstream.
> thanks
Stopping processing here.

Please contact me if you need assistance.
-- 
1091517: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1091517
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems
Processed: closing 1092591
Processing commands for cont...@bugs.debian.org:

> close 1092591
Bug #1092591 [src:linux] linux-image-6.12.6-amd64: SO_PEERSEC fails with ENOPROTOOPT with AppArmor enabled
Marked Bug as done
> thanks
Stopping processing here.

Please contact me if you need assistance.
-- 
1092591: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1092591
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems
Processed: closing 1049873
Processing commands for cont...@bugs.debian.org:

> close 1049873 6.12.13-1
Bug #1049873 [src:linux] regression: linux-image-6.1.0-10-amd64: NFS4.1/pNFS mv hangs, but finishes after Ctrl-C
Marked as fixed in versions linux/6.12.13-1.
Bug #1049873 [src:linux] regression: linux-image-6.1.0-10-amd64: NFS4.1/pNFS mv hangs, but finishes after Ctrl-C
Marked Bug as done
> thanks
Stopping processing here.

Please contact me if you need assistance.
-- 
1049873: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1049873
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems
Bug#1049873: closing 1049873
On Fri, Feb 14, 2025 at 01:13:52AM +0100, Christoph Anton Mitterer wrote:
> On Wed, 2025-02-12 at 08:50 +0100, Salvatore Bonaccorso wrote:
> > Yes my understanding from your comments was that 6.12.13-1 does not
> > expose the problem.
>
> Okay... let me summarise :-)
>
> - 6.12. doesn't show the original problem (hanging mv) described in
>   this bug.
>   I briefly (and wrongly) thought that instead the NFS4.1 mountpoint
>   would not update the file size after the mv succeeded, but that was
>   probably just a mistake on my side.
> - The bookworm kernel *does* show the original problem (hanging mv).

Then, after all, my marking as fixed in 6.12.13-1 was actually okay,
and the BTS knows that the 6.1.y version was still unfixed.

> > I have reopened the bug, but I believe the only one who actually can
> > do something here is either you, and bisect the changes down to what
> > broke the behaviour, or someone else using dCache and having the
> > possibility to do experiments on a dedicated node.
> >
> > I would start bisecting first by debian kernel-image packages by
> > narrowing down more closely where the behaviour got introduced, then
> > from there the respective upstream stable series changes.
> >
> > I hope this gives you enough guide already on how to proceed.
>
> Hmm, I guess that would rather be quite a "waste" of time.
> I cannot really test this on our production system, so I'd need to set
> up a test system for bisecting.
> And I have anyway adapted my use cases of this already, with a TODO to
> revert after upgrading to trixie.
>
> My only idea was that we might just leave it open in case someone else
> stumbles over the symptom.
>
> But perhaps it's indeed best to just close it as wontfix.
>
> Sorry for the back and forth :-)

No problem, but given that, yes, I will close it with the known version
fixing the problem and then let the bug go :)

Regards,
Salvatore