https://bugs.kde.org/show_bug.cgi?id=373021
tagwer...@innerjoin.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tagwer...@innerjoin.org

--- Comment #5 from tagwer...@innerjoin.org ---
(In reply to marvin24 from comment #0)

... Some time has passed

> I have a rather large data partition to be indexed by baloo (~100k pdf
> files). I noticed, that the index speed goes down the larger the db becomes.
> At the same time, IO goes up.

To me, that doesn't seem so surprising... I also notice that when "reindexing" files that have been edited or just "touched", the indexing speed drops.

> I recreated the db in a single partition and started indexing. About half of
> these files are index now (db size is 3.5 GB). The disk stats says 195916328
> sectors written, which is about 1 TB (and yes, this is an old 128 MB ssd!).
> ...
> sorry, that was 100 GB (not 1 TB). Still too much for my taste.

If you are indexing content then there'll be an update for each word, and a small change to a "page" means that the whole page is written back to the database. That's going to add up...

> It looks like, everytime baloo updates the db, the whole thing is written
> again - making it very slow... and dangerous regarding the wear leveling of
> flash disks.

There was a change (2019/09) to avoid syncing every write to the database

    https://bugs.kde.org/show_bug.cgi?id=404057#c12

So maybe there has been an improvement. I think there is also awareness of the problem.

Baloo batches up its content indexing; it reads and indexes 40 files in one go. This is done as one transaction, so the changes for the 40 files are sorted in memory and then committed. Pages for "common terms" will be repeatedly updated/rewritten, and I think you can easily expect more to be written to the disc than the static size of the database. However, watching the writes with iotop (which can show accumulated writes for a process), the amount written can be frightening.
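As a rough illustration of how the batch size interacts with page rewrites, here is a toy model in Python. It is not Baloo's code and does not reproduce its database layout; the page size, terms per page, vocabulary size, words per file and the skewed word distribution are all assumptions invented for the sketch. The only idea taken from the description above is that each commit writes every page touched in that batch once, so a page holding common terms is rewritten in almost every commit.

```python
import random

# All of these numbers are assumptions made up for the illustration.
PAGE_SIZE = 4096        # bytes per database page
TERMS_PER_PAGE = 50     # number of terms whose postings share one page
VOCABULARY = 200_000    # number of distinct terms
WORDS_PER_FILE = 300    # words drawn per file


def make_files(count, seed=1):
    """Return `count` files, each a set of term ids skewed toward low ids."""
    rng = random.Random(seed)
    files = []
    for _ in range(count):
        # Cubing a uniform value pushes most draws toward small ids,
        # which play the role of the "common terms".
        terms = {int(VOCABULARY * rng.random() ** 3) for _ in range(WORDS_PER_FILE)}
        files.append(terms)
    return files


def pages_written(files, batch_size):
    """Count page writes when files are committed `batch_size` at a time."""
    total = 0
    for start in range(0, len(files), batch_size):
        dirty = set()  # pages modified within this transaction
        for terms in files[start:start + batch_size]:
            dirty.update(term // TERMS_PER_PAGE for term in terms)
        total += len(dirty)  # each dirty page is written once per commit
    return total


if __name__ == "__main__":
    files = make_files(2000)
    for batch in (1, 40, 160, 640):
        pages = pages_written(files, batch)
        megabytes = pages * PAGE_SIZE / 1e6
        print(f"batch={batch:4d}  pages written={pages:8d}  (~{megabytes:.0f} MB)")
```

In this model a larger batch never writes more pages than a smaller one that divides it evenly, because the pages dirtied by several small commits are merged into one set. The cost is that the dirty set held in memory grows with the batch.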
I'm guessing that increasing the batch size would help: it would use more RAM but reduce the number of commits. It seems to be something of a balance.