https://bugs.kde.org/show_bug.cgi?id=402154
--- Comment #22 from Kai Krakow <k...@kaishome.de> ---
(In reply to tagwerk19 from comment #20)
> (In reply to Kai Krakow from comment #18)
> > ... I suggest to
> > read that entirely to understand the problem ...
> I've done my best :-) Thank you for the info!
>
> In:
>
> https://bugs.kde.org/show_bug.cgi?id=404057#c35
>
> You have the idea of an "Index per Filesystem" but then the idea seems

I didn't... I explained why that would not work.

> to have been put to the side. You mention "storage path" as a problem? Would
> the way "local wastebaskets" are managed on mounted filesystems be a model?
> They have to deal with the same issues as you've listed.

The problem is that you would have to deal with proper synchronization when multiple databases are used. That is not just "find a writable storage location and register this location somewhere". Also, you would need to have all these different DBs open at the same time, and LMDB is a memory-mapped database with random access patterns. So you'd multiply the memory pressure with each location, and that would dominate the filesystem cache.

> https://phabricator.kde.org/T9805

This mentions "store an identifier per tracked device, e.g. the filesystem UUID", which is probably my idea. Instead of using dev_id directly, the database should have a lookup table where filesystem UUIDs are stored as a simple list. The index into this list can then be used as the new dev_id in the other tables.

> Has a mention of "... inside encrypted containers", see this also in Bug
> 390830.

Encrypted containers should never be indexed in a global database, as that would leak information from the encrypted container. The easiest solution would be to simply not index encrypted containers unless the database itself is stored in an encrypted container, but that's also just a band-aid. Maybe encrypted containers should not be indexed at all: putting LMDB inside an encrypted container may have very bad side effects on performance.
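A minimal sketch of the UUID lookup-table idea, assuming hypothetical names (`DeviceTable`, `document_id`, `split_document_id`) that are not part of Baloo's code. UUIDs are only ever appended, so a UUID's list position never changes and can stand in for the unstable dev_id. The 32/32-bit packing of that index with the inode number is also just an illustrative assumption; some filesystems use inode numbers wider than 32 bits, which a real layout would have to accommodate.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class DeviceTable:
    """Hypothetical table mapping filesystem UUIDs to small, stable indexes.

    The position of a UUID in `uuids` is its index. Entries are append-only,
    so an index never changes once assigned, unlike stat()'s st_dev.
    """
    uuids: List[str] = field(default_factory=list)
    _index: Dict[str, int] = field(default_factory=dict, repr=False)

    def index_for(self, fs_uuid: str) -> int:
        """Return the index for fs_uuid, appending it if it is new."""
        idx = self._index.get(fs_uuid)
        if idx is None:
            idx = len(self.uuids)
            self.uuids.append(fs_uuid)
            self._index[fs_uuid] = idx
        return idx

    def uuid_for(self, idx: int) -> str:
        """Reverse lookup: which filesystem does this index refer to?"""
        return self.uuids[idx]


def document_id(dev_index: int, inode: int) -> int:
    """Pack (device index, inode) into one 64-bit id (assumed 32/32 split)."""
    if not (0 <= dev_index < 2**32 and 0 <= inode < 2**32):
        raise ValueError("dev_index and inode must each fit in 32 bits")
    return (dev_index << 32) | inode


def split_document_id(doc_id: int) -> Tuple[int, int]:
    """Inverse of document_id()."""
    return doc_id >> 32, doc_id & 0xFFFFFFFF
```

Because the key stored in the other tables is the list index rather than st_dev, a remount that hands the same filesystem a different dev_id still maps to the same index, as long as the UUID is read correctly.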
> As background thoughts...
>
> Things like "Tags:" folders in Dolphin and incremental searches
> when you type into Krunner depend on baloosearch being lightning fast.

Having multiple databases per filesystem can only make this slower, by definition, because you'd need to query multiple databases. From my personal experience with full-text search engines (Elasticsearch), I can tell you that querying several indexes and properly recombining the results is a huge PITA, and it's going to slow things way down. So the multiple-database idea is probably a dead end.

> It would be a shame to lose the ability to search for phrases as in
> baloosearch Hello_Penguin
> as opposed to
> baloosearch "Hello Penguin"
>
> I'm guessing BTRFS usage is going to grow.

The point is: neither Linux nor POSIX states anywhere that the dev_id returned by stat() is a stable, unique identifier across reboots or remounts. This is even less true for inode numbers on some remote filesystems or on filesystems without real inodes (where inode numbers are virtual and may be allocated from some runtime state). Those are not stable IDs. At least for native Linux filesystems we can expect inode numbers to be stable, as they are stored inside the filesystem itself (the dev_id isn't, but the UUID is).

On a side note: in this context it would make sense to provide Baloo as a system-wide storage and query service shared by multiple users, with an indexer running per user (to index encrypted containers). It's the only way to support these ideas:

- safe access to encrypted containers
- the database can be isolated so that users cannot read it directly (prevents information leakage)
- solves the problem of multiple users indexing the same data multiple times
- has the capabilities to properly read UUIDs from filesystems/subvolumes (some filesystems only allow this for root)
- can guard/filter which results are returned to users (by respecting filesystem ACLs and permission bits)
- shared index locations (e.g. /usr/share/docs) would be indexed just once

On the con side:

- needs some sort of synchronization between multiple indexers (to avoid race conditions where multiple indexers read and index the same files twice). This could be solved by running the indexer within the system-wide service, too, but access to encrypted containers would need to be evaluated.
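A minimal sketch of the result filtering such a system-wide service could apply before returning hits to a user, assuming a hypothetical `Hit` record in which the service has stored each indexed file's mode, owner and group. It checks only the classic owner/group/other read bits; it deliberately ignores ACLs and the search (execute) permission on parent directories, both of which a real implementation would also have to respect.

```python
import stat
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Hit:
    """A search result as a hypothetical shared index might store it."""
    path: str
    st_mode: int
    st_uid: int
    st_gid: int


@dataclass(frozen=True)
class Caller:
    """The user asking the service for results."""
    uid: int
    gids: FrozenSet[int]


def may_read(hit: Hit, caller: Caller) -> bool:
    """Owner/group/other read check on permission bits only (no ACLs)."""
    if caller.uid == 0:
        return True  # root bypasses read permission bits
    if hit.st_uid == caller.uid:
        # POSIX: the owner class applies exclusively, even if group/other
        # bits would grant more.
        return bool(hit.st_mode & stat.S_IRUSR)
    if hit.st_gid in caller.gids:
        return bool(hit.st_mode & stat.S_IRGRP)
    return bool(hit.st_mode & stat.S_IROTH)


def filter_results(hits: Iterable[Hit], caller: Caller) -> List[Hit]:
    """Drop every hit the caller could not read directly."""
    return [h for h in hits if may_read(h, caller)]
```

Filtering inside the service, rather than in the client, is what keeps the shared database from leaking other users' file names and contents.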