I have seen a program named pdftotext that can extract the text from .pdf
files, and that program can be called from a Perl program to extract the
text from many .pdf files.
Search with Google for it.
I have seen that it can extract the text even from some .pdf files that
have copy protection set.
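
Here is a rough sketch of how calling it from Perl might look. It assumes
pdftotext is installed and on your PATH; the sub names text_path_for and
extract_text are just made up for this example, and the -layout option
(which tries to keep the page layout) can be dropped if you don't want it.
I have not tried it on every kind of .pdf, so treat it as a starting point.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: map "report.pdf" to "report.txt".
# If the name has no .pdf suffix, just append ".txt".
sub text_path_for {
    my ($pdf) = @_;
    (my $txt = $pdf) =~ s/\.pdf\z/.txt/i;
    $txt .= '.txt' if $txt eq $pdf;
    return $txt;
}

# Hypothetical helper: run pdftotext on one file.
# Returns true if the command exited with status 0.
sub extract_text {
    my ($pdf) = @_;
    my $txt = text_path_for($pdf);
    # The list form of system() bypasses the shell, so file names
    # with spaces or quotes are passed through unchanged.
    my $status = system('pdftotext', '-layout', $pdf, $txt);
    return $status == 0;
}

# Convert every .pdf file named on the command line.
for my $pdf (@ARGV) {
    if (extract_text($pdf)) {
        print "$pdf -> ", text_path_for($pdf), "\n";
    } else {
        warn "pdftotext failed for $pdf\n";
    }
}
```

Save it as, say, pdf2txt.pl and run it as: perl pdf2txt.pl *.pdf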
Stephen York wrote:
> First off, realise that a pdf isn't just a marked up text document.
> It's a wrapper for images and text, movies and many other formats.
>
> If you have a text pdf, then the text is a postscript object
> catalogued somewhere within the pdf.
> I've never done this in perl, but there are many commercial uti