Hello,

Since libcrypto.so is implicated, Andres asked me off-list if my changes to random number state initialisation might be linked to skink's failures beginning 12 or 15 days ago. It appears not, as it was green for several runs after that commit. Looking at the report:
==2802== VALGRINDERROR-BEGIN
==2802== Invalid read of size 8
==2802==    at 0x4DED5A5: check_free (dlerror.c:188)
==2802==    by 0x4DEDAB1: free_key_mem (dlerror.c:221)
==2802==    by 0x4DEDAB1: __dlerror_main_freeres (dlerror.c:239)
==2802==    by 0x55DCF81: __libc_freeres (in /lib/x86_64-linux-gnu/libc-2.28.so)
==2802==    by 0x482D19E: _vgnU_freeres (vg_preloaded.c:77)
==2802==    by 0x478AD3: bgworker_quickdie (bgworker.c:661)
==2802==    by 0x48626AF: ??? (in /lib/x86_64-linux-gnu/libpthread-2.28.so)
==2802==    by 0x556DB76: epoll_wait (epoll_wait.c:30)
==2802==    by 0x4E25B9: WaitEventSetWaitBlock (latch.c:1078)
==2802==    by 0x4E25B9: WaitEventSetWait (latch.c:1030)
==2802==    by 0x4E28C1: WaitLatchOrSocket (latch.c:407)
==2802==    by 0x4E29A6: WaitLatch (latch.c:347)
==2802==    by 0x49E03E: ApplyLauncherMain (launcher.c:1062)
==2802==    by 0x479831: StartBackgroundWorker (bgworker.c:834)
==2802==  Address 0x7fd7e28 is 12 bytes after a block of size 12 alloc'd
==2802==    at 0x483577F: malloc (vg_replace_malloc.c:299)
==2802==    by 0x4C3BD38: CRYPTO_zalloc (in /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1)
==2802==    by 0x4C37F8D: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1)
==2802==    by 0x4C615B9: RAND_DRBG_get0_public (in /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1)
==2802==    by 0x4C615EF: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1)
==2802==    by 0x675D75: pg_strong_random (pg_strong_random.c:135)
==2802==    by 0x4848EB: RandomCancelKey (postmaster.c:5251)
==2802==    by 0x484909: assign_backendlist_entry (postmaster.c:5822)
==2802==    by 0x4873BA: do_start_bgworker (postmaster.c:5692)
==2802==    by 0x487701: maybe_start_bgworkers (postmaster.c:5955)
==2802==    by 0x4878C2: reaper (postmaster.c:2940)
==2802==    by 0x48626AF: ??? (in /lib/x86_64-linux-gnu/libpthread-2.28.so)
==2802==
==2802== VALGRINDERROR-END

The function __libc_freeres is a special glibc entry point provided for leak checkers to call explicitly if they want glibc to clean up after itself (normally it doesn't bother).
The specific thing being cleaned up here is a piece of thread-local storage that belongs to the dynamic linker support code:

https://github.com/lattera/glibc/blob/master/dlfcn/dlerror.c#L228

Since we don't see strcmp() or free() at the top of the stack (and assuming they aren't inlined), I think the line numbers must line up with current glibc HEAD as of today, and it must be failing on accessing rec->errstring at line 188, meaning that rec (the value stored under the thread-specific key) is a bad pointer. That's quite strange and I don't have an explanation; if libcrypto overran its buffer, for example, that would perhaps trash rec->errstring, but we'd still be able to read the pointer itself.

So I wonder if libcrypto.so is a red herring here. It's Debian unstable, which could be a factor. Bugs in glibc? That's 2.28, out for 3 months now, but then why only in Apply Launcher? Did we trash 'key', or the thread-specific pointer table, or is my assessment above wrong, and somehow it's really errstring that is a bad pointer (which would allow for a more mundane explanation, like someone trashing a bit of heap memory by overrunning a buffer)?

-- 
Thomas Munro
http://www.enterprisedb.com