Re: Review Request 114632: Improve pdf title extraction

Luis Silva Mon, 06 Jan 2014 09:48:44 -0800

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://git.reviewboard.kde.org/r/114632/
-----------------------------------------------------------


(Updated Jan. 6, 2014, 5:47 p.m.)


Review request for Baloo and Vishesh Handa.


Repository: kfilemetadata


Description
-------

A good portion of the scientific papers in my collection had a DOI or an index 
number in the title. These are generally short strings, shorter than the 
real title.
This patch improves title extraction from PDFs by setting a minimum length below 
which parsing of the first page is forced.
The cut-off is arbitrarily set to 25 characters (three "big words").
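
As a rough illustration of the rule described above (not the actual diff), the
check could be sketched as follows. The names kMinTitleLength,
needsFirstPageParse and chooseTitle are hypothetical, plain std::string is
used instead of the Qt/Poppler types, and keeping the metadata title when the
first page yields nothing is an assumption of this sketch.

```cpp
#include <cstddef>
#include <string>

// Cut-off from the description: titles shorter than this are distrusted.
// The constant name is hypothetical.
constexpr std::size_t kMinTitleLength = 25;

// True when the metadata title is short enough that it is likely a DOI or
// an index number, so the first page should be parsed instead.
// Note: size() counts bytes, which equals characters only for ASCII text.
bool needsFirstPageParse(const std::string& metadataTitle) {
    return metadataTitle.size() < kMinTitleLength;
}

// Picks the title to report. firstPageTitle stands in for whatever the
// first-page parser returns. Falling back to the metadata title when the
// first page gives an empty result is an assumption of this sketch.
std::string chooseTitle(const std::string& metadataTitle,
                        const std::string& firstPageTitle) {
    if (needsFirstPageParse(metadataTitle) && !firstPageTitle.empty()) {
        return firstPageTitle;
    }
    return metadataTitle;
}
```

With this sketch, a metadata title such as "10.1000/xyz123" falls below the
cut-off and is replaced by the first-page result, while a title of 25
characters or more is kept unchanged.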


Diffs (updated)
-----

  src/extractors/popplerextractor.cpp b056581f51d10b632799586eed3cc15ac539fe80 

Diff: https://git.reviewboard.kde.org/r/114632/diff/


Testing
-------

This noticeably improved title extraction on my collection of scientific 
PDFs.


Thanks,

Luis Silva

