From: "Dr.Ruud" <rvtol+use...@isolution.nl>
I would like to extract the whole text from a PDF document. Can you
recommend a perl module that can do this under Windows?
I searched cpan.org and found many modules. I tested a few of them, but
none could extract the text, even though it displays fine in Acrobat
Reader: they extracted only garbage, or nothing, or just gave an error,
or were incompatible with Windows...
Your PDF could just contain a set of pictures, instead of textual data.
--
Ruud
The PDF I tried contains textual data and tables.
I mean, I tried CAM::PDF and was able to get the text from 2 PDF files,
strangely formatted of course and with broken words, but at least I got
some text. From the third PDF file, the same program gave me only
garbage.
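
For reference, a minimal sketch of page-by-page extraction with CAM::PDF,
roughly the kind of program described above. The helper name pdf_text is
made up for illustration, and the original program is not shown in this
thread, so this is only an assumption about its shape. A likely (but
unconfirmed) cause of garbage output is a PDF whose fonts use a custom
encoding that getPageText() does not map back to readable characters.

```perl
use strict;
use warnings;
use CAM::PDF;

# Hypothetical helper (not from the thread): return the text of every
# page in the PDF, joined with newlines.
sub pdf_text {
    my ($file) = @_;

    my $pdf = CAM::PDF->new($file)
        or die "Cannot open $file: $CAM::PDF::errstr\n";

    my @pages;
    for my $n (1 .. $pdf->numPages()) {
        # getPageText() can return undef for a page it cannot decode
        my $text = $pdf->getPageText($n);
        push @pages, defined $text ? $text : '';
    }
    return join "\n", @pages;
}

# Run as: perl extract.pl file.pdf
print pdf_text($ARGV[0]) if @ARGV;
```

Word order and spacing in the result follow the order of the drawing
operations inside the PDF, which would explain the odd formatting and
broken words.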
I have also tried pdftotext.exe and it extracted the text much better,
but I would prefer a Perl-based solution.
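
One possible middle ground, sketched here under assumptions: keep the
program in Perl but let the external tool do the extraction. This assumes
the tool in question is Xpdf's pdftotext and that it is on the PATH; the
sub name pdftotext is made up for illustration. It writes to a temporary
file rather than a pipe, since list-form pipe opens are not available in
older Perls on Windows.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical wrapper (not from the thread): run the external
# pdftotext program and return the extracted text as one string.
sub pdftotext {
    my ($pdf_file) = @_;

    # Reserve a temporary output file; pdftotext writes to it itself.
    my ($tmp_fh, $txt_file) = tempfile(SUFFIX => '.txt', UNLINK => 1);
    close $tmp_fh;

    # List-form system() bypasses the shell, so paths containing
    # spaces need no extra quoting. -layout keeps the page layout.
    system('pdftotext', '-layout', $pdf_file, $txt_file) == 0
        or die "pdftotext failed for $pdf_file (status $?)\n";

    open my $in, '<', $txt_file
        or die "Cannot read $txt_file: $!\n";
    local $/;               # slurp mode: read the whole file at once
    my $text = <$in>;
    close $in;
    return $text;
}

# Run as: perl extract.pl file.pdf
print pdftotext($ARGV[0]) if @ARGV;
```

The -layout option tends to help with tables, since it tries to preserve
the column positions of the original page.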
IP