I've struggled to determine whether this is already a feature or in development (possibly because the term "dictionary" is overloaded), so I apologise in advance if the following brief is redundant:
Compressors like LZ4, zstd, and even gzip support "dictionary compression": pre-loading the compressor's and decompressor's history window with pre-arranged patterns before the file is processed, so that back-references can be made the first time text appears in the file, rather than having to build up that window from an empty state by encoding everything as literals. This can improve the compression ratio. It's generally only useful for small files, because in a larger file the back-reference window is established early and stays full of reference material for the rest of the file; but it should also benefit block-based compression, which loses its history at every entry point.

So that's what I'm talking about; and my question, simply, is: is this a feature (or a planned feature) of erofs? Something involving storing a set of uncompressed dictionary preload chunks within the filesystem, which are then used as the starting dictionary when compressing and decompressing the small chunks of each file?

In my imagination such a filesystem might provide a palette of uncompressed, page-aligned dictionaries, and each file (or each cluster?) would carry an index to the entry it uses. Typically that choice would be implied by the file type, but files can have different dispositions as you seek through them, or a .txt file may contain English or Chinese or ASCII art, each demanding a different dictionary. Making the right choice is an external optimisation problem.
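To make the mechanism concrete, here is a minimal sketch using Python's stdlib zlib, whose `zdict` parameter implements exactly this preset-dictionary preload (the dictionary and sample bytes are made up for illustration; this is not erofs code, just a demonstration of the general technique):

```python
import zlib

# Pre-arranged patterns shared out-of-band by compressor and decompressor.
dictionary = b"the quick brown fox jumps over the lazy dog"

# A small input whose opening text already appears in the dictionary.
sample = b"the quick brown fox jumps over the lazy dog again and again"

# Without a preset dictionary: the window starts empty, so the first
# occurrence of every phrase must be encoded as literals.
plain = zlib.compress(sample, 9)

# With a preset dictionary: the window is preloaded, so even the first
# occurrence can be a back-reference into the dictionary.
c = zlib.compressobj(level=9, zdict=dictionary)
with_dict = c.compress(sample) + c.flush()

# The decompressor must preload the same dictionary to resolve those
# back-references.
d = zlib.decompressobj(zdict=dictionary)
restored = d.decompress(with_dict)

assert restored == sample
print(len(plain), len(with_dict))
```

For a small input like this, the dictionary-assisted stream comes out noticeably shorter, which is the whole point; on a large file the savings fade once the window fills up on its own, as described above.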