From: "Dr.Ruud" <rvtol+use...@isolution.nl>

I would like to extract all of the text from a PDF document. Can you recommend a Perl module that can do this under Windows?

I searched cpan.org and found a great many modules. I tested a few of them, but none could extract the text, even though it displays fine in Acrobat Reader: they produced only garbage or nothing at all, gave an error, or were incompatible with Windows.

Your PDF could just contain a set of pictures instead of textual data.

--
Ruud


The PDF I tried contains textual data and tables.

I tried CAM::PDF and was able to get the text from two PDF files. It was strangely formatted, of course, and had broken words, but at least I got some text. From the third PDF file, however, the same program produced only garbage.
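
For reference, per-page extraction with CAM::PDF can be sketched roughly as below. This is a minimal illustration and not the exact script used for the tests above: the sub name pdf_text_by_page and the script name in the usage comment are made up for the example, and it assumes CAM::PDF is installed. Only the module's documented new, numPages and getPageText calls are used.

```perl
use strict;
use warnings;

# Return a list holding the extracted text of each page of $file.
# pdf_text_by_page is a hypothetical helper name for this sketch.
# CAM::PDF is loaded at run time, so the error is raised only when
# the sub is actually called without the module installed.
sub pdf_text_by_page {
    my ($file) = @_;
    require CAM::PDF;
    my $pdf = CAM::PDF->new($file)
        or die "Cannot open $file as a PDF\n";
    my @pages;
    for my $n (1 .. $pdf->numPages()) {
        my $text = $pdf->getPageText($n);
        # A page with no extractable text is stored as an empty string.
        push @pages, defined $text ? $text : '';
    }
    return @pages;
}

# Usage (script name is a placeholder): perl pdf2txt.pl document.pdf
if (@ARGV) {
    my $n = 0;
    for my $text (pdf_text_by_page($ARGV[0])) {
        $n++;
        print "--- page $n ---\n$text\n";
    }
}
```

Printing page by page makes it easier to see whether the garbage affects a whole file or only some pages.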

I have also tried pdftotext.exe, and it extracted the text much better, but I would prefer a Perl-based solution.
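
One possible middle ground is to keep the calling code in Perl and let the external tool do the extraction. The sketch below assumes the tool is the Xpdf pdftotext program and that it can be found on the PATH. The sub names pdftotext_command and pdf_to_text are invented for the example. The -layout option asks pdftotext to preserve the physical layout, which may help with tables.

```perl
use strict;
use warnings;

# Build the argument list: pdftotext [options] PDF-file text-file.
# Both sub names here are hypothetical helpers for this sketch.
sub pdftotext_command {
    my ($exe, $pdf, $txt) = @_;
    return ($exe, '-layout', $pdf, $txt);
}

# Run pdftotext, then read back the text file it wrote.
sub pdf_to_text {
    my ($exe, $pdf, $txt) = @_;
    my @cmd = pdftotext_command($exe, $pdf, $txt);
    # List form of system() bypasses the shell, so paths that
    # contain spaces need no extra quoting.
    system(@cmd) == 0 or die "$cmd[0] failed (status $?)\n";
    open(my $fh, '<', $txt) or die "Cannot read $txt: $!\n";
    local $/;    # slurp mode: read the whole file at once
    my $text = <$fh>;
    close($fh);
    return $text;
}

# Usage (script name is a placeholder): perl viatool.pl in.pdf out.txt
if (@ARGV == 2) {
    print pdf_to_text('pdftotext', @ARGV);
}
```

Writing to a named output file rather than reading from a pipe avoids the list form of pipe open, which older Perl builds on Windows do not support.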

IP

