https://bugs.kde.org/show_bug.cgi?id=434926

nyanpasu64 <nyanpas...@tuta.io> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |nyanpas...@tuta.io

--- Comment #9 from nyanpasu64 <nyanpas...@tuta.io> ---
I've been getting frequent baloo crashes myself too, and within the last few
weeks they've started happening more often (every time I search in the
application launcher or similar).

To debug, I ran baloo_file under rr, and traced the resulting crash using
Pernosco. (Sorry, I don't feel comfortable sharing the URL since the trace
contains filesystem paths.)

Oddly, baloo_file's main thread spawns a worker thread and a child process
(which itself spawns a worker thread). The parent process's worker thread then
crashes (taking the main thread with it), while the child process keeps
running in the background like a daemon (I'm not sure exactly what happens; it
may itself die later). I don't see any thread-safety problem related to this
crash.

The crash backtrace is:

```
(pernosco) bt 
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=7, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1  0x00007f48bc1253d3 in __pthread_kill_internal (signo=7, threadid=<optimized out>) at pthread_kill.c:78
#2  0x00007f48bc0d5838 in __GI_raise (sig=7) at ../sysdeps/posix/raise.c:26
#3  0x00007f48bcbfe384 in KCrash::defaultCrashHandler(int) () from /sysroot/usr/lib/libKF5Crash.so.5
#4  <signal handler called>
#5  0x00007f48bb641884 in mdb_node_search (mc=mc@entry=0x7f48b5ecd380, key=key@entry=0x7f48b5ecd760, exactp=exactp@entry=0x7f48b5ecd37c) at mdb.c:5341
#6  0x00007f48bb64560f in mdb_cursor_set (mc=mc@entry=0x7f48b5ecd380, key=key@entry=0x7f48b5ecd760, data=data@entry=0x7f48b5ecd750, op=op@entry=MDB_SET, exactp=exactp@entry=0x7f48b5ecd37c) at mdb.c:6157
#7  0x00007f48bb645bcf in mdb_get (txn=<optimized out>, dbi=<optimized out>, key=0x7f48b5ecd760, data=0x7f48b5ecd750) at mdb.c:5812
#8  0x00007f48bcaf22fc in Baloo::DocumentTimeDB::get (this=<optimized out>, docId=<optimized out>) at /usr/src/debug/baloo-5.95.0/src/engine/documenttimedb.cpp:76
#9  0x00007f48bcb01aff in Baloo::Transaction::documentTimeInfo (this=<optimized out>, id=id@entry=72147491998400538) at /usr/src/debug/baloo-5.95.0/src/engine/transaction.cpp:133
#10 0x000056133285052c in Baloo::UnIndexedFileIterator::shouldIndex (filePath=..., this=0x7f48b5ecd8f0) at /usr/src/debug/baloo-5.95.0/src/file/unindexedfileiterator.cpp:83
#11 Baloo::UnIndexedFileIterator::next (this=<optimized out>) at /usr/src/debug/baloo-5.95.0/src/file/unindexedfileiterator.cpp:64
#12 Baloo::UnindexedFileIndexer::run (this=0x5613341a59a0) at /usr/src/debug/baloo-5.95.0/src/file/unindexedfileindexer.cpp:36
#13 0x00007f48bc6a9291 in QThreadPoolThread::run (this=0x5613345491e0) at thread/qthreadpool.cpp:100
#14 0x00007f48bc6a538a in QThreadPrivate::start (arg=0x5613345491e0) at thread/qthread_unix.cpp:331
#15 0x00007f48bc12354d in start_thread (arg=<optimized out>) at pthread_create.c:442
#16 0x00007f48bc1a8874 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100
```

The causality of the bug is:

- fd = mdb_fopen("/home/nyanpasu64/.local/share/baloo/index")
- ...env->me_map = mmap(addr, env->me_mapsize, prot, mmap_flags, env->me_fd (=fd), 0);
- After many successful mdb_node_search() calls comes a failed one.
mdb_node_search() computes nkeys = NUMKEYS(mp), which expands to
((mp->mp_pb.pb.pb_lower - (PAGEHDRSZ-PAGEBASE)) >> 1). Here
mp->mp_pb.pb.pb_lower is (uint16_t)0. It should never be 0, because
(PAGEHDRSZ-PAGEBASE) is subtracted from it. PAGEHDRSZ and PAGEBASE are unsigned
(32-bit), so the subtraction wraps around to a value close to 2^32, and the
right shift by 1 leaves nkeys close to 2^31. This value is invalid and causes
LMDB's mdb_node_search() to crash (I haven't traced exactly how).
    - According to Pernosco, mp points within the above mmap() call.
    - https://stackoverflow.com/q/2089167 says "SIGBUS can happen in Linux for
quite a few reasons other than memory alignment faults - for example, if you
attempt to access an mmap region beyond the end of the mapped file."
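
The wraparound in the NUMKEYS() step above can be reproduced in isolation. This
is only a sketch, not LMDB's code: HDR_BYTES and numkeys_from_lower() are
made-up names, and the value 16 is an assumed stand-in for
(PAGEHDRSZ-PAGEBASE); only its unsignedness matters for the result.

```c
#include <stdint.h>

/* Assumed stand-in for (PAGEHDRSZ - PAGEBASE); the real value depends on
 * the LMDB build. What matters is that it is an unsigned 32-bit quantity. */
#define HDR_BYTES UINT32_C(16)

/* Same shape as NUMKEYS(mp): (pb_lower - header bytes) >> 1.
 * pb_lower is converted to the unsigned 32-bit type before the subtraction,
 * so a pb_lower smaller than HDR_BYTES wraps around instead of going
 * negative. */
static uint32_t numkeys_from_lower(uint16_t pb_lower)
{
    return (pb_lower - HDR_BYTES) >> 1;
}
```

With pb_lower == 0 this returns 0x7FFFFFF8, just under 2^31, whereas a sane
header such as pb_lower == 20 gives 2 keys.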

If Pernosco is correct, my guess is that this is a symptom of a corrupt Baloo
index holding invalid data, which LMDB memory-maps without properly checking
for corruption inside. My assumption is that the various different Baloo
crashes are caused by databases corrupted in different ways (though both
Bernie Innocenti's crash and mine boil down to mdb_node_search() in the end),
all with inadequate error checking.

- The immediate workaround is to delete (or rename or trash)
~/.local/share/baloo/index. I don't know *how* the Baloo index got corrupted in
the first place though.
- Should LMDB perform more thorough error-checking in mdb_node_search() and
possibly other functions, and return a "corrupted database" error rather than
SIGBUS?
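
To make the question concrete, here is a rough sketch of the kind of header
check I have in mind. Everything in it is hypothetical: struct sketch_page,
sketch_check_page() and SKETCH_CORRUPTED are invented for illustration and do
not match LMDB's real MDB_page layout or API (LMDB's own corruption error code
is MDB_CORRUPTED in lmdb.h).

```c
#include <stdint.h>

/* Hypothetical error code for this sketch only. */
#define SKETCH_CORRUPTED (-1)

/* Simplified stand-in for the two page-header bounds discussed above.
 * This is NOT LMDB's MDB_page layout. */
struct sketch_page {
    uint16_t lower;  /* end of the key-pointer array, bytes from page start */
    uint16_t upper;  /* start of node data, bytes from page start */
};

/* Returns 0 if the header bounds are plausible, SKETCH_CORRUPTED otherwise.
 * hdr_bytes and page_size are supplied by the caller (assumed values). */
static int sketch_check_page(const struct sketch_page *p,
                             uint32_t hdr_bytes, uint32_t page_size)
{
    if (p->lower < hdr_bytes)         /* key count would wrap around */
        return SKETCH_CORRUPTED;
    if (p->upper < p->lower)          /* pointer array overlaps node data */
        return SKETCH_CORRUPTED;
    if (p->upper > page_size)         /* node data starts past the page end */
        return SKETCH_CORRUPTED;
    if ((p->lower - hdr_bytes) & 1u)  /* key pointers are 2 bytes each */
        return SKETCH_CORRUPTED;
    return 0;
}
```

A check like this would only catch bad header contents such as pb_lower == 0.
It would not help if the page itself lies past the end of the mapped file,
since reading the header would already fault.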
