> On Jan. 3, 2017, 12:51 a.m., Albert Astals Cid wrote:
> > Without knowing anything about baloo this looks totally wrong
> >
> > QList<KFileMetaData::Extractor*> exList = m_extractorCollection.fetchExtractors(mimetype);
> >
> > why would you not want to iterate over all the extractors that support a
> > given mimetype?
>
> Anthony Fieroni wrote:
>     It's a waste of time. An extractor should store file content in the DB for
>     fast access when a file content search is performed, so if more than one
>     extractor processes a file it will result in high CPU usage and a huge
>     transaction size in the DB, basically file content * number of extractors.
>     At the very least we lose time and disk space for nothing.
>
> Jan Kundrát wrote:
>     Do you have some numbers as a result of profiling? Have you checked that
>     the existing extractors are in fact redundant? Is the order of their
>     presence in the returned list of extractors deterministic, and is the most
>     specific one returned first?
>
>     One small example: there is a generic plain-text extractor which returns
>     the number of lines in any file with the ``text/*`` MIME type. Your patch
>     changes that.
>
> Anthony Fieroni wrote:
>     1. No
>     2. Yes
>     3. No
>     In my opinion it would be better to add some flag, or whatever, to indicate
>     that a parser has done its work and we can safely stop the iteration.
>     At least this patch *tries* to reduce CPU usage, it's not a *panacea*
>
> Stefan Brüns wrote:
>     1. You claim it is useful to reduce CPU usage, but fail to provide any
>     data points.
>     2. Please provide a list of redundant extractors
>     3. An extractor knows whether it has itself extracted any data; it cannot
>     know whether a different extractor may find any data. Extractors may be
>     orthogonal and provide different data.
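To make the point about orthogonal extractors concrete, here is a minimal sketch in plain C++. It does not use the real KFileMetaData API: `Extractor`, `LineCountExtractor`, `TitleExtractor`, `runAll` and `runFirstOnly` are hypothetical names made up for illustration, and the "first line is the title" rule is only an assumption for the example. It contrasts running every extractor for a mimetype with stopping after the first one.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

using Properties = std::map<std::string, std::string>;

// Hypothetical stand-in for an extractor; NOT the KFileMetaData interface.
struct Extractor {
    virtual ~Extractor() = default;
    // Adds whatever properties this extractor can find to `result`.
    virtual void extract(const std::string& content, Properties& result) const = 0;
};

// Generic extractor: counts newline characters, similar in spirit to the
// plain-text extractor mentioned in the thread.
struct LineCountExtractor : Extractor {
    void extract(const std::string& content, Properties& result) const override {
        int lines = 0;
        for (char c : content) {
            if (c == '\n') ++lines;
        }
        result["lineCount"] = std::to_string(lines);
    }
};

// More specific extractor (assumption): treats the first line as a title.
struct TitleExtractor : Extractor {
    void extract(const std::string& content, Properties& result) const override {
        // find() returns npos when there is no newline; substr then keeps everything.
        result["title"] = content.substr(0, content.find('\n'));
    }
};

using ExtractorList = std::vector<const Extractor*>;

// Runs every extractor in the list and merges their results.
Properties runAll(const ExtractorList& list, const std::string& content) {
    Properties result;
    for (const Extractor* ex : list) {
        assert(ex != nullptr);
        ex->extract(content, result);
    }
    return result;
}

// Runs only the first extractor in the list, as a "stop after one" patch would.
Properties runFirstOnly(const ExtractorList& list, const std::string& content) {
    Properties result;
    if (!list.empty()) {
        assert(list.front() != nullptr);
        list.front()->extract(content, result);
    }
    return result;
}
```

With a list holding `TitleExtractor` followed by `LineCountExtractor`, `runAll` yields both a title and a line count, while `runFirstOnly` yields only the title. Which property survives then depends entirely on the order of the list, which is why the question about deterministic ordering matters.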
I claim that no extractor is redundant, but depending on the mimetype (if there are no known extractors for it) there can be more than one, and that is where I see a potential problem. I have no plans to test this feature; I am only giving my point of view. I am also trying to fix the other side of the problem: https://git.reviewboard.kde.org/r/129720/

- Anthony

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://git.reviewboard.kde.org/r/129703/#review101748
-----------------------------------------------------------

On Jan. 3, 2017, 1:43 p.m., Anthony Fieroni wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://git.reviewboard.kde.org/r/129703/
> -----------------------------------------------------------
>
> (Updated Jan. 3, 2017, 1:43 p.m.)
>
>
> Review request for Baloo, Boudhayan Gupta, Pinak Ahuja, and Vishesh Handa.
>
>
> Repository: baloo
>
>
> Description
> -------
>
> Processing large directories (5000+ files) can eat a lot of CPU. A single
> large file can be another issue.
>
>
> Diffs
> -----
>
>   src/file/extractor/app.cpp 97332469
>   src/tools/balooctl/indexer.cpp 45e42c1c
>
> Diff: https://git.reviewboard.kde.org/r/129703/diff/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Anthony Fieroni
>
>