Ludovic Courtès <l...@gnu.org> writes:

> While building the “guix-system.drv” derivation on AArch64, I got this
> crash (not fully deterministic but quite frequent).  Here the
> finalization thread gets a wrong-type-arg in ‘scm_i_weak_car’ (i.e.,
> accessing a one-element weak vector):

With 3.0.1, I can reproduce the bug on x86_64.  With rr (thanks,
Andy!), I found this (starting from the point where the type cell of the
weak vector is zeroed, and reverse-continuing until it gets its original
value of 0x10f):

--8<---------------cut here---------------start------------->8---
(rr) frame 40
#40 0x00007ffff7f2e66d in scm_i_weak_car (pair=0x7fffe15af690) at ../libguile/pairs.h:190
190	  return SCM_CAR (x);
(rr) down
#39 0x00007ffff7f2f576 in scm_c_weak_vector_ref (wv=<optimized out>, k=k@entry=0) at weak-vector.c:193
193	  SCM_VALIDATE_WEAK_VECTOR (1, wv);
(rr)
#38 0x00007ffff7ea7ba0 in scm_wrong_type_arg_msg (
    subr=subr@entry=0x7ffff7f56f00 <s_scm_weak_vector_ref> "weak-vector-ref", pos=pos@entry=1,
    bad_value=0x7fffec472b90, szMessage=szMessage@entry=0x7ffff7f56e80 "weak vector") at error.c:282
282	  scm_error (scm_arg_type_key,
(rr) p *((void**)0x7fffec472b90)
$1 = (void *) 0x0
(rr) watch *((void**)0x7fffec472b90)
Hardware watchpoint 1: *((void**)0x7fffec472b90)
(rr) reverse-cont
Continuing.

Thread 1 received signal SIGCONT, Continued.
[Switching to Thread 27074.27074]
__lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:101
101	../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S: No such file or directory.
(rr)
Continuing.

Thread 1 hit Hardware watchpoint 1: *((void**)0x7fffec472b90)

Old value = (void *) 0x0
New value = (void *) 0x10f
__memset_avx2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:259
259	../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: No such file or directory.
(rr) bt
#0  __memset_avx2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:259
#1  0x00007ffff7f1d499 in set_vtable_access_fields (vtable=vtable@entry=0x7fffeb48ee80) at struct.c:143
#2  0x00007ffff7f1dd8d in scm_i_struct_inherit_vtable_magic (vtable=vtable@entry=0x7ffff4e32fa0, obj=obj@entry=0x7fffeb48ee80) at struct.c:215
#3  0x00007ffff7f1dfea in scm_c_make_structv (vtable=0x7ffff4e32fa0, n_tail=<optimized out>, n_init=8, init=0x7fffffff50d0) at struct.c:364
#4  0x00007ffff7f1e0b9 in scm_make_struct_no_tail (vtable=0x7ffff4e32fa0, init=0x304) at struct.c:491
--8<---------------cut here---------------end--------------->8---

Bingo!  There’s a mismatch in struct.c: the buffer is allocated with
‘bitmask_size’ bytes, but ‘memset’ clears
‘bitmask_size * sizeof (*unboxed_fields)’ bytes, writing past the end of
the allocation:

--8<---------------cut here---------------start------------->8---
  bitmask_size = (nfields + 31U) / 32U;
  unboxed_fields = scm_gc_malloc_pointerless (bitmask_size, "unboxed fields");
  memset (unboxed_fields, 0, bitmask_size * sizeof(*unboxed_fields));
--8<---------------cut here---------------end--------------->8---

Pushed a fix as 7c17655cd3d859bf0c5a86d9782a7788205fc05a.

Thanks, rr!  You made my day!  :-)

Now testing Guix builds on x86_64, i686, ARMv7, and AArch64 to see if
that addresses seemingly related issues.

Ludo’.
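
For readers following along, here is a minimal sketch of the arithmetic
behind the struct.c mismatch quoted above.  It assumes ‘unboxed_fields’
points to 32-bit words (suggested by the division by 32U, but not shown
in the snippet).  The helper names are hypothetical and appear nowhere in
Guile; this only illustrates the size discrepancy and is not the code
from the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of 32-bit words needed to hold one bit per field,
   mirroring `(nfields + 31U) / 32U` from the quoted snippet.  */
static size_t
bitmask_words (size_t nfields)
{
  return (nfields + 31U) / 32U;
}

/* Bytes requested when the word count itself is passed as the
   allocation size, as in the quoted snippet.  */
static size_t
bytes_allocated_mismatched (size_t nfields)
{
  return bitmask_words (nfields);
}

/* Bytes written by `memset (p, 0, words * sizeof (uint32_t))`.
   Assumption: the bitmask element type is uint32_t.  */
static size_t
bytes_cleared (size_t nfields)
{
  return bitmask_words (nfields) * sizeof (uint32_t);
}

/* Bytes written past the end of the requested allocation.  */
static size_t
overrun_bytes (size_t nfields)
{
  assert (bytes_cleared (nfields) >= bytes_allocated_mismatched (nfields));
  return bytes_cleared (nfields) - bytes_allocated_mismatched (nfields);
}
```

Under these assumptions, a layout with 8 fields requests 1 byte but
clears 4, so ‘overrun_bytes (8)’ is 3.  Passing the same expression,
‘bitmask_size * sizeof (*unboxed_fields)’, to both the allocator and
‘memset’ would make the two sizes agree.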