Re: Review Request 129703: [baloo_file_extractor] Limit CPU usage

Anthony Fieroni Tue, 03 Jan 2017 07:12:25 -0800


> On Jan. 3, 2017, 12:51 a.m., Albert Astals Cid wrote:
> > Without knowing anything about baloo this looks totally wrong
> > 
> >  QList<KFileMetaData::Extractor*> exList = 
> > m_extractorCollection.fetchExtractors(mimetype);
> >  
> > why would you not want to iterate over all the extractors that support a 
> > given mimetype?
> 
> Anthony Fieroni wrote:
>     It's a waste of time. An extractor should store the file content in the 
> DB for fast access when a file content search is performed, so if more than 
> one extractor processes a file, the result is high CPU usage and a huge 
> transaction size in the DB, basically file content * number of extractors. 
> At the very least we lose time and disk space for nothing.
> 
> Jan Kundrát wrote:
>     Do you have some numbers as a result of profiling? Have you checked that 
> the existing extractors are in fact redundant? Is the order of their presence 
> in the returned list of extractors deterministic and is the most specific one 
> returned first?
>     
>     One small example: there is a generic plaintext extractor which returns 
> the number of lines in any file with the ``text/*`` MIME type. Your patch 
> changes that.
> 
> Anthony Fieroni wrote:
>     1. No
>     2. Yes
>     3. No
>     In my opinion it's better to add a flag, or something similar, to 
> indicate that a parser has done its work and we can safely stop iterating.
>     At least this patch *tries* to reduce CPU usage; it's not a *panacea*.
> 
> Stefan Brüns wrote:
>     1. You claim it is useful to reduce CPU usage, but fail to provide any 
> data points.
>     2. Please provide a list of redundant extractors
>     3. An extractor knows whether it has extracted any data itself, but it 
> cannot know whether a different extractor may find any data. Extractors may 
> be orthogonal and provide different data.


I claim that no extractor is redundant, but depending on the mimetype (when 
there is no dedicated extractor for it) more than one extractor can run on the 
same file, and that is where I see a potential problem. I have no plans to test 
this feature; I am only stating my point of view. I am also trying to address 
the other side of the problem in https://git.reviewboard.kde.org/r/129720/
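
To make the trade-off discussed above concrete, here is a minimal sketch. It 
is not the baloo or KFileMetaData code: the Extractor struct, the Result type 
and the runAll/runFirst helpers are hypothetical stand-ins, and only the 
standard library is used. It only illustrates that running every extractor 
merges orthogonal properties, while stopping after the first one saves work 
but drops whatever the later extractors would have found.

```cpp
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical property container: property name -> value.
using Result = std::map<std::string, std::string>;

// Hypothetical stand-in for an extractor; NOT the KFileMetaData API.
struct Extractor {
    std::string name;
    // Adds whatever properties this extractor finds to the result.
    std::function<void(const std::string& content, Result& out)> extract;
};

// Runs every extractor registered for the mimetype and merges the results.
// Cost grows with the number of extractors.
Result runAll(const std::vector<Extractor>& exList, const std::string& content)
{
    Result result;
    for (const Extractor& ex : exList) {
        ex.extract(content, result);
    }
    return result;
}

// Runs only the first extractor. Cheaper, but properties that only a later
// extractor would provide are never produced.
Result runFirst(const std::vector<Extractor>& exList, const std::string& content)
{
    Result result;
    if (!exList.empty()) {
        exList.front().extract(content, result);
    }
    return result;
}
```

With two orthogonal extractors in the list (say, one setting a title and a 
generic one setting a line count), runAll returns both properties and 
runFirst returns only the first one.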


- Anthony


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://git.reviewboard.kde.org/r/129703/#review101748
-----------------------------------------------------------


On Jan. 3, 2017, 1:43 p.m., Anthony Fieroni wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://git.reviewboard.kde.org/r/129703/
> -----------------------------------------------------------
> 
> (Updated Jan. 3, 2017, 1:43 p.m.)
> 
> 
> Review request for Baloo, Boudhayan Gupta, Pinak Ahuja, and Vishesh Handa.
> 
> 
> Repository: baloo
> 
> 
> Description
> -------
> 
> Processing large directories (5000+ files) can eat a lot of CPU. A single 
> large file can be an issue by itself.
> 
> 
> Diffs
> -----
> 
>   src/file/extractor/app.cpp 97332469 
>   src/tools/balooctl/indexer.cpp 45e42c1c 
> 
> Diff: https://git.reviewboard.kde.org/r/129703/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Anthony Fieroni
> 
>
