XPDF support for filtering PDFs for text extraction/search.
-----------------------------------------------------------

                 Key: DS-183
                 URL: http://jira.dspace.org/jira/browse/DS-183
             Project: DSpace 1.x
          Issue Type: Improvement
          Components: DSpace API
    Affects Versions: 1.5.1, 1.5.2
         Environment: Unix and Linux
            Reporter: Mark Diggory


See the original description here:

https://sourceforge.net/tracker/?func=detail&aid=2745393&group_id=19984&atid=319984

Here is a pair of media filters that process PDF files with the
XPDF suite (see http://www.foolabs.com/xpdf/ ), replacing the
one based on PDFBox. They invoke an external command, whose path
must be configured. They have been tested on Unix; the concept
ought to work on Windows (and certainly on Mac OS X).

XPDF2Text is a replacement for the existing PDF media filter. It
creates extracted text using the pdftotext program. I've observed it
to be about 3 times as fast as PDFBox, and much more reliable.
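
As a rough illustration of the approach (not the code attached to this
issue), a filter of this kind might shell out to pdftotext roughly as
follows. The class and method names are hypothetical, and the path to
pdftotext is assumed to come from configuration. The "-q" and "-enc"
options and the "-" output argument (write to stdout) are standard
pdftotext usage.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; not the XPDF2Text class from this issue.
class XpdfTextSketch {

    // Build the argument list: quiet mode, UTF-8 output, text written to stdout ("-").
    static List<String> buildCommand(String pdftotextPath, String pdfFile) {
        return Arrays.asList(pdftotextPath, "-q", "-enc", "UTF-8", pdfFile, "-");
    }

    // Run pdftotext and return everything it wrote to stdout as a String.
    static String extractText(String pdftotextPath, String pdfFile)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(buildCommand(pdftotextPath, pdfFile));
        // Pass the child's stderr through so an unread pipe cannot block it.
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process process = pb.start();

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = process.getInputStream()) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }

        int status = process.waitFor();
        if (status != 0) {
            throw new IOException("pdftotext exited with status " + status);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Reading stdout to the end before calling waitFor() keeps the child
process from stalling on a full pipe when the document is large.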

XPDF2Thumbnail creates a thumbnail image for the first page of
the PDF. This is especially effective for 3D PDF renderings of
engineering models, but works fine for any document.
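
For illustration only, one way to render just the first page is to call
pdftoppm from the same suite with "-f 1 -l 1". Whether XPDF2Thumbnail
does exactly this is an assumption; the class name, method names, and
output root below are hypothetical. The name of the image file that
pdftoppm writes depends on the pdftoppm version, so this sketch stops at
running the command.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; not the XPDF2Thumbnail class from this issue.
class XpdfThumbnailSketch {

    // Build the argument list: quiet mode, first page through first page only.
    static List<String> buildCommand(String pdftoppmPath, String pdfFile, String outputRoot) {
        return Arrays.asList(pdftoppmPath, "-q", "-f", "1", "-l", "1", pdfFile, outputRoot);
    }

    // Run pdftoppm; it writes the rendered page to a file whose name starts with outputRoot.
    static void renderFirstPage(String pdftoppmPath, String pdfFile, String outputRoot)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(buildCommand(pdftoppmPath, pdfFile, outputRoot));
        // Let the child share our stdout/stderr so no pipe is left unread.
        pb.inheritIO();
        int status = pb.start().waitFor();
        if (status != 0) {
            throw new IOException("pdftoppm exited with status " + status);
        }
    }
}
```

The rendered page would still need to be read and scaled down to
thumbnail size. Java's standard ImageIO does not read PPM files, which
may be why an extra image library is required.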

See the instructions in xpdf-filters.html to install them.
The thumbnail filter needs an additional image library, but
the text extractor doesn't need anything else.

This code has been tested with DSpace 1.5.1.


        

_______________________________________________
Dspace-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-devel
