> On Jan. 2, 2017, 10:51 p.m., Albert Astals Cid wrote:
> > Without knowing anything about baloo this looks totally wrong
> > 
> >  QList<KFileMetaData::Extractor*> exList = m_extractorCollection.fetchExtractors(mimetype);
> >  
> > why wouldn't you want to iterate over all the extractors that support a 
> > given mimetype?
> 
> Anthony Fieroni wrote:
>     It's a waste of time. An extractor should store the file content in the 
> DB for fast access when a file content search is performed, so if more than 
> one extractor processes a file, the result is high CPU usage and a huge DB 
> transaction size, basically file content * number of extractors. At the very 
> least we lose time and disk space for nothing.

Do you have some numbers as a result of profiling? Have you checked that the 
existing extractors are in fact redundant? Is the order of their presence in 
the returned list of extractors deterministic and is the most specific one 
returned first?

One small example: there is a generic plaintext extractor which returns the 
number of lines in any file with the ``text/*`` MIME type. Your patch changes 
that.
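
To make the concern concrete, here is a minimal, self-contained sketch. It 
does not use the real KFileMetaData API: ``Extractor``, 
``LineCountExtractor``, ``TitleExtractor``, ``makeTextExtractors``, 
``runAll`` and ``runFirstOnly`` are hypothetical stand-ins, and the order of 
the list is an assumption made only for illustration. It shows how taking 
only the first extractor from the list can silently drop a property that a 
second, more generic extractor would have provided.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

using Properties = std::map<std::string, std::string>;

// Hypothetical stand-in for an extractor; NOT KFileMetaData::Extractor.
struct Extractor {
    virtual ~Extractor() = default;
    virtual void extract(const std::string& content, Properties& result) const = 0;
};

// Generic extractor: reports the number of lines in any text file.
struct LineCountExtractor : Extractor {
    void extract(const std::string& content, Properties& result) const override {
        std::size_t lines = 0;
        for (char c : content) {
            if (c == '\n') ++lines;
        }
        // Count a trailing line that is not terminated by a newline.
        if (!content.empty() && content.back() != '\n') ++lines;
        result["lineCount"] = std::to_string(lines);
    }
};

// More specific extractor (hypothetical): uses the first line as a title.
struct TitleExtractor : Extractor {
    void extract(const std::string& content, Properties& result) const override {
        result["title"] = content.substr(0, content.find('\n'));
    }
};

using ExtractorList = std::vector<std::unique_ptr<Extractor>>;

// Assumed order: the specific extractor happens to come first.
ExtractorList makeTextExtractors() {
    ExtractorList list;
    list.push_back(std::make_unique<TitleExtractor>());
    list.push_back(std::make_unique<LineCountExtractor>());
    return list;
}

// Every extractor that supports the MIME type gets a chance to run.
Properties runAll(const ExtractorList& list, const std::string& content) {
    Properties result;
    for (const auto& ex : list) {
        ex->extract(content, result);
    }
    return result;
}

// Only the first extractor in the list runs; the rest are skipped.
Properties runFirstOnly(const ExtractorList& list, const std::string& content) {
    Properties result;
    if (!list.empty()) {
        list.front()->extract(content, result);
    }
    return result;
}
```

With this list, ``runAll`` yields both ``title`` and ``lineCount``, while 
``runFirstOnly`` yields only ``title``. If the list order were different or 
not deterministic, the set of indexed properties would change with it.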


- Jan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://git.reviewboard.kde.org/r/129703/#review101748
-----------------------------------------------------------


On Jan. 3, 2017, 11:43 a.m., Anthony Fieroni wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://git.reviewboard.kde.org/r/129703/
> -----------------------------------------------------------
> 
> (Updated Jan. 3, 2017, 11:43 a.m.)
> 
> 
> Review request for Baloo, Boudhayan Gupta, Pinak Ahuja, and Vishesh Handa.
> 
> 
> Repository: baloo
> 
> 
> Description
> -------
> 
> Processing large directories (5000+ files) can consume a lot of CPU. A 
> single large file can be another issue.
> 
> 
> Diffs
> -----
> 
>   src/file/extractor/app.cpp 97332469 
>   src/tools/balooctl/indexer.cpp 45e42c1c 
> 
> Diff: https://git.reviewboard.kde.org/r/129703/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Anthony Fieroni
> 
>
