Re: Getting the text from a PDF file

Ion Pop Sun, 28 Feb 2010 09:53:34 -0800

From: "Dr.Ruud" <rvtol+use...@isolution.nl>

I would like to extract the whole text from a PDF document. Can you recommend a Perl module that can do this under Windows?
I searched on cpan.org and found very many modules. I tested a few of them, but none of them was able to extract the text, which can be seen well with Acrobat Reader; they extracted only garbage, or nothing, or just gave an error, or they were incompatible with Windows...
Your PDF could just contain a set of pictures instead of textual data.

--
Ruud



The PDF I tried contains textual data and tables.

I mean, I tried CAM::PDF and was able to get the text from 2 PDF files, strangely formatted of course and with broken words, but at least I was able to get some text. From the third PDF file, though, I got only garbage with the same program.

I have also tried pdftotext.exe, and it extracted the text much better, but I would prefer a Perl-based solution.


IP


