We did a pretty nice article on the PDF document format while I was at
Web Techniques. You might want to look at the article by Nassib Nassar:
http://www.webtechniques.com/archives/1998/10/nassar/
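
To illustrate Alain's point below about the header fields, here is a rough
Python sketch (not from the article). It assumes the document information
dictionary is stored uncompressed and unencrypted, and that its values are
plain literal strings without nested or escaped parentheses. Real files often
break those assumptions. The function name and the sample bytes are made up
for illustration.

```python
import re

# Matches entries such as "/Author (Jane Doe)". Assumption: the value is a
# simple literal string with no nested parentheses and no backslash escapes.
INFO_FIELD = re.compile(
    rb"/(Title|Subject|Author|Creator|Producer)\s*\(([^()\\]*)\)"
)

def extract_info_fields(pdf_bytes):
    """Return simple Info entries found by scanning raw PDF bytes."""
    fields = {}
    for key, value in INFO_FIELD.findall(pdf_bytes):
        # Keep the first match for each key. A real parser would instead
        # follow the trailer's /Info reference to the right object.
        fields.setdefault(key.decode("ascii"), value.decode("latin-1"))
    return fields

# Hypothetical, hand-written fragment standing in for a real file's contents.
sample = (
    b"%PDF-1.3\n1 0 obj\n"
    b"<< /Author (Jane Doe) /Subject (Tables) /Creator (Hypothetical Tool) >>\n"
    b"endobj\n"
)
print(extract_info_fields(sample))
```

For a file on disk you would pass in open(path, "rb").read(). Table cells are
a different matter: they usually live in content streams that may be
compressed, so a byte scan like this will not find them.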
On Fri, 31 Aug 2001, Alain Samoun wrote:
> I think you can extract the header fields pretty easily, like Subject,
> Creator, Author, etc. But extracting values from a table may not be so
> easy, since the way objects are created in the file depends on the file's
> history, and in addition the PDF file may be in binary form.
> Alain
>
> On Fri, Aug 31, 2001 at 12:16:38PM -0300, Paul Meagher wrote:
> > Wondering if anyone has tried to parse out a table of information from a
> > PDF file?
> >
> > Is it a matter of opening the file, looping through its contents
> > line-by-line looking for tags that demarcate table cell boundaries and
> > extracting the relevant cell values?
> >
> > I figured if Google can index PDF content, it must be possible to pull the
> > content out much as one would from an HTML file.
> >
> > Mostly wondering how much work might be involved and if there are any
> > tricks that I should be aware of before I begin...
> >
> > Regards,
> > Paul
> >
> > --
> > PHP Windows Mailing List (http://www.php.net/)
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
> > To contact the list administrators, e-mail: [EMAIL PROTECTED]