Re: [PHP-WIN] Parsing PDF files

Paul Meagher Fri, 31 Aug 2001 09:45:32 -0700
Thanks Alain,

I was a little worried that it might be in binary format. I will probably
create a script to try to read a PDF file just to see what happens.
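
A minimal sketch of such a test script follows, written in Python purely
for illustration (nothing in this thread specifies an implementation).
The names inspect_pdf_bytes and inspect_pdf are hypothetical. It assumes
the file starts with the usual "%PDF-x.y" header and that any Info
entries (Subject, Creator, Author and so on) are stored uncompressed as
simple literal strings; hex strings, escaped or nested parentheses, and
compressed objects are not handled.

```python
import re

# Hypothetical helper names; this is an illustration, not a PDF parser.
INFO_KEYS = (b"Title", b"Subject", b"Author", b"Creator", b"Producer")


def inspect_pdf_bytes(data: bytes) -> dict:
    """Report the header version, how binary the data looks, and any
    uncompressed Info entries stored as simple literal strings."""
    header = re.match(rb"%PDF-(\d\.\d)", data)
    # Bytes outside printable ASCII and common whitespace count as binary.
    non_text = sum(
        1 for b in data if b > 126 or (b < 32 and b not in (9, 10, 13))
    )
    info = {}
    for key in INFO_KEYS:
        # Matches e.g. "/Author (Jane Doe)"; skips strings that contain
        # parentheses or backslash escapes (an assumption of this sketch).
        m = re.search(rb"/" + key + rb"\s*\(([^()\\]*)\)", data)
        if m:
            info[key.decode("ascii")] = m.group(1).decode("latin-1")
    return {
        "version": header.group(1).decode("ascii") if header else None,
        "binary_ratio": non_text / len(data) if data else 0.0,
        "info": info,
    }


def inspect_pdf(path: str) -> dict:
    # Open in binary mode: compressed streams are not valid text.
    with open(path, "rb") as fh:
        return inspect_pdf_bytes(fh.read())
```

A high binary_ratio would suggest that the page content sits in
compressed streams, in which case a line-by-line scan will not find the
table text without decompressing those streams first.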

Regards,
Paul

> I think that you can extract the header fields pretty easily, like Subject,
> Creator, Author etc. But extracting values in a table may not be that
> easy, as the creation of objects in the file depends on the file
> history, and in addition the PDF file may be in binary form.
> Alain
>
> On Fri, Aug 31, 2001 at 12:16:38PM -0300, Paul Meagher wrote:
> > Wondering if anyone has tried to parse out a table of information from
> > a PDF file?
> >
> > Is it a matter of opening the file, looping through its contents
> > line-by-line looking for tags that demarcate table cell boundaries and
> > extracting the relevant cell values?
> >
> > I figured if Google can index PDF content it must be possible to pull
> > the content out much as one would with an HTML file.
> >
> > Mostly wondering how much work might be involved and if there are any
> > tricks that I should be aware of before I begin...
> >
> > Regards,
> > Paul
> >
> > --
> > PHP Windows Mailing List (http://www.php.net/)
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
> > To contact the list administrators, e-mail: [EMAIL PROTECTED]

