poboiko created this revision.
poboiko added reviewers: Baloo, bruns, ngraham.
Herald added projects: Frameworks, Baloo.
poboiko requested review of this revision.

REVISION SUMMARY
  First of all, not all plain-text-based mimetypes start with `text/`: e.g. `application/sql` for SQL dumps (already handled in FileExcludeFilters), or `application/postscript` for PS images. There are most likely more. An alternative solution would be to use `QMimeType::inherits` instead.

  Secondly, not all extractors are bad with large files: for example, if it is a PS image, then PostScriptDSExtractor might still extract useful information. The issues are mostly caused by PlainTextExtractor, which generates far too many terms.

  This patch aims at tackling both issues: it skips only PlainTextExtractor for large files, utilizing the extractor metadata introduced in D19109: [Extractor] Add metadata to extractors <https://phabricator.kde.org/D19109>.

TEST PLAN
  1. Create a large `.txt` file (>10 MB)
  2. `baloo_file_extractor` still skips it.

REPOSITORY
  R293 Baloo

BRANCH
  improve-large-text-files (branched from master)

REVISION DETAIL
  https://phabricator.kde.org/D23787

AFFECTED FILES
  src/file/extractor/app.cpp

To: poboiko, #baloo, bruns, ngraham
Cc: kde-frameworks-devel, #baloo, lots0logs, LeGast00n, fbampaloukas, GB_2, domson, ashaposhnikov, michaelh, astippich, spoorun, ngraham, bruns, abrahams
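
For readers who do not have the diff open, the idea in the revision summary (skip only the plain-text extractor once a file exceeds a size limit, and let every other extractor run) could be sketched roughly as follows. This is an illustration only, not the code in `src/file/extractor/app.cpp`: `ExtractorInfo`, `kMaxPlainTextSize` and `selectExtractors` are hypothetical names, the 10 MiB limit is an assumption taken from the test plan, and the real patch identifies the extractor through the metadata added in D19109 rather than through a plain string.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for an extractor and its metadata.
// The real Baloo / KFileMetaData types are different.
struct ExtractorInfo {
    std::string name; // e.g. "PlainTextExtractor"
};

// Assumed size limit (10 MiB), based on the ">10 MB" file in the test plan.
constexpr std::uint64_t kMaxPlainTextSize = 10 * 1024 * 1024;

// Returns the extractors that should run for a file of the given size.
// For large files only the plain-text extractor is dropped; the others
// (e.g. a PostScript extractor) are kept because they may still yield
// useful information.
std::vector<ExtractorInfo> selectExtractors(const std::vector<ExtractorInfo>& candidates,
                                            std::uint64_t fileSize)
{
    std::vector<ExtractorInfo> selected;
    for (const auto& extractor : candidates) {
        const bool isPlainText = (extractor.name == "PlainTextExtractor");
        if (isPlainText && fileSize > kMaxPlainTextSize) {
            continue; // too many terms would be generated; skip this one only
        }
        selected.push_back(extractor);
    }
    return selected;
}
```

With this shape, a large `.txt` file ends up with no extractor at all and is effectively skipped, while a large PS image would still be passed to its own extractor.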