https://bugs.kde.org/show_bug.cgi?id=434926
nyanpasu64 <nyanpas...@tuta.io> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |nyanpas...@tuta.io

--- Comment #9 from nyanpasu64 <nyanpas...@tuta.io> ---
I've been getting constant baloo crashes myself too, but within the last few weeks they've started happening more often (every time I search in the application launcher or similar).

To debug, I ran baloo_file under rr and traced the resulting crash using Pernosco. (Sorry, I don't feel comfortable sharing the URL since the trace contains filesystem paths.)

Oddly, baloo_file's main thread spawns a worker thread and a child process (which itself spawns a worker thread). Then the parent process's worker thread crashes (taking the main thread with it), while the child process continues running in the background like a daemon (I'm not sure exactly what happens; it may itself die at a later time). I don't see any thread-unsafety related to this crash.

The crash backtrace is:

```
(pernosco) bt
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=7, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1  0x00007f48bc1253d3 in __pthread_kill_internal (signo=7, threadid=<optimized out>) at pthread_kill.c:78
#2  0x00007f48bc0d5838 in __GI_raise (sig=7) at ../sysdeps/posix/raise.c:26
#3  0x00007f48bcbfe384 in KCrash::defaultCrashHandler(int) () from /sysroot/usr/lib/libKF5Crash.so.5
#4  <signal handler called>
#5  0x00007f48bb641884 in mdb_node_search (mc=mc@entry=0x7f48b5ecd380, key=key@entry=0x7f48b5ecd760, exactp=exactp@entry=0x7f48b5ecd37c) at mdb.c:5341
#6  0x00007f48bb64560f in mdb_cursor_set (mc=mc@entry=0x7f48b5ecd380, key=key@entry=0x7f48b5ecd760, data=data@entry=0x7f48b5ecd750, op=op@entry=MDB_SET, exactp=exactp@entry=0x7f48b5ecd37c) at mdb.c:6157
#7  0x00007f48bb645bcf in mdb_get (txn=<optimized out>, dbi=<optimized out>, key=0x7f48b5ecd760, data=0x7f48b5ecd750) at mdb.c:5812
#8  0x00007f48bcaf22fc in Baloo::DocumentTimeDB::get (this=<optimized out>, docId=<optimized out>) at /usr/src/debug/baloo-5.95.0/src/engine/documenttimedb.cpp:76
#9  0x00007f48bcb01aff in Baloo::Transaction::documentTimeInfo (this=<optimized out>, id=id@entry=72147491998400538) at /usr/src/debug/baloo-5.95.0/src/engine/transaction.cpp:133
#10 0x000056133285052c in Baloo::UnIndexedFileIterator::shouldIndex (filePath=..., this=0x7f48b5ecd8f0) at /usr/src/debug/baloo-5.95.0/src/file/unindexedfileiterator.cpp:83
#11 Baloo::UnIndexedFileIterator::next (this=<optimized out>) at /usr/src/debug/baloo-5.95.0/src/file/unindexedfileiterator.cpp:64
#12 Baloo::UnindexedFileIndexer::run (this=0x5613341a59a0) at /usr/src/debug/baloo-5.95.0/src/file/unindexedfileindexer.cpp:36
#13 0x00007f48bc6a9291 in QThreadPoolThread::run (this=0x5613345491e0) at thread/qthreadpool.cpp:100
#14 0x00007f48bc6a538a in QThreadPrivate::start (arg=0x5613345491e0) at thread/qthread_unix.cpp:331
#15 0x00007f48bc12354d in start_thread (arg=<optimized out>) at pthread_create.c:442
#16 0x00007f48bc1a8874 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100
```

The causality of the bug is:

- fd = mdb_fopen("/home/nyanpasu64/.local/share/baloo/index")
- ...env->me_map = mmap(addr, env->me_mapsize, prot, mmap_flags, env->me_fd (=fd), 0);
- After many successful mdb_node_search() calls comes a failed call. mdb_node_search() computes nkeys = NUMKEYS(mp), which expands to ((mp->mp_pb.pb.pb_lower - (PAGEHDRSZ-PAGEBASE)) >> 1). mp->mp_pb.pb.pb_lower is (uint16_t)0. It should not be 0, since (PAGEHDRSZ-PAGEBASE) is subtracted from it. PAGEHDRSZ and PAGEBASE are unsigned (uint32_t), so the result is computed as uint32_t: the subtraction wraps around to a value close to 2^32, and the right shift by 1 leaves a value close to 2^31. This value is invalid and causes LMDB's mdb_node_search() to crash (I haven't traced exactly how).
- According to Pernosco, mp points within the mapping returned by the above mmap() call.
- https://stackoverflow.com/q/2089167 says "SIGBUS can happen in Linux for quite a few reasons other than memory alignment faults - for example, if you attempt to access an mmap region beyond the end of the mapped file." If Pernosco is correct, my guess is that this is a symptom of a corrupt Baloo index holding invalid data, which LMDB memory-maps without properly checking for corruption. My assumption is that the various Baloo crashes are caused by databases corrupted in different ways (though both Bernie Innocenti's crash and mine boil down to mdb_node_search() in the end), all with inadequate error checking.
- The immediate workaround is to delete (or rename or trash) ~/.local/share/baloo/index. I don't know *how* the Baloo index got corrupted in the first place, though.
- Should LMDB perform more thorough error-checking in mdb_node_search() and possibly other functions, and return a "corrupted database" error rather than crashing with SIGBUS?
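
To make the NUMKEYS() wraparound described above concrete, here is a minimal standalone sketch in C. It does not use LMDB itself. The constant DEMO_HDR_BYTES (16) is an assumed stand-in for PAGEHDRSZ-PAGEBASE on a typical 64-bit build, and page_bounds_look_sane() is a hypothetical check of the kind the last question asks about, not an existing LMDB function.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 16 stands in for LMDB's (PAGEHDRSZ - PAGEBASE); the real
 * value is derived from offsetof(MDB_page, mp_ptrs) in mdb.c. */
#define DEMO_HDR_BYTES 16u

/* Mirrors the NUMKEYS() arithmetic: unsigned subtraction, then >> 1.
 * With pb_lower == 0 the subtraction wraps around instead of going negative. */
static uint32_t numkeys_unchecked(uint16_t pb_lower)
{
    return ((uint32_t)pb_lower - DEMO_HDR_BYTES) >> 1;
}

/* Hypothetical sanity check (NOT part of LMDB): the lower bound of free
 * space must lie past the page header, must not exceed the upper bound,
 * and the upper bound must fit inside the page. Returns 1 if plausible. */
static int page_bounds_look_sane(uint16_t pb_lower, uint16_t pb_upper,
                                 uint32_t page_size)
{
    if (pb_lower < DEMO_HDR_BYTES)
        return 0;
    if (pb_lower > pb_upper)
        return 0;
    if (pb_upper > page_size)
        return 0;
    return 1;
}

/* Usage example: call this from main() to exercise both helpers. */
static void numkeys_self_check(void)
{
    /* A healthy page with two 2-byte index slots after the header. */
    assert(numkeys_unchecked(20) == 2u);
    assert(page_bounds_look_sane(20, 4096, 4096));

    /* The corrupted case from the trace: pb_lower == 0.
     * (0 - 16) wraps to 4294967280, and >> 1 gives 2147483640. */
    assert(numkeys_unchecked(0) == 2147483640u);
    assert(!page_bounds_look_sane(0, 4096, 4096));
}
```

A check like this only rejects headers that are obviously inconsistent. It would not detect every kind of corruption, and whether its cost is acceptable on LMDB's read path is a question for the LMDB maintainers.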