We did a pretty nice article on the PDF document format while I was at 
Web Techniques.  You might want to look at the article by Nassib Nassar:
http://www.webtechniques.com/archives/1998/10/nassar/
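
As a rough illustration of Alain's point below that the header fields are
the easy part, here is a minimal Python sketch (not taken from the article)
that pulls them out with a regular expression. It assumes the document
information dictionary is stored uncompressed and that values are plain
literal strings such as /Author (Some Name). The function name extract_info
and the sample bytes are made up for illustration. Values that are
hex-encoded, UTF-16, encrypted, or that contain escaped parentheses would
need a real PDF parser.

```python
import re

# Keys commonly found in a PDF document information dictionary.
INFO_KEYS = ("Title", "Author", "Subject", "Creator", "Producer")


def extract_info(pdf_bytes):
    """Return a dict of Info fields that are stored as plain literal strings.

    Assumption: each value looks like /Author (Some Name), with no nested
    or escaped parentheses, and the dictionary is not compressed or encrypted.
    """
    info = {}
    for key in INFO_KEYS:
        pattern = rb"/" + key.encode("ascii") + rb"\s*\(([^()\\]*)\)"
        match = re.search(pattern, pdf_bytes)
        if match:
            info[key] = match.group(1).decode("latin-1")
    return info


# Hypothetical fragment of a PDF file, for demonstration only.
sample = (
    b"1 0 obj\n"
    b"<< /Author (Paul Meagher) /Subject (Price table) "
    b"/Creator (Hypothetical Writer 1.0) >>\n"
    b"endobj\n"
)

print(extract_info(sample))
```

Table cells are harder: page content usually sits in compressed streams
and records text positions rather than row and column markers, so a
line-by-line scan of the raw file will not find cell boundaries.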



On Fri, 31 Aug 2001, Alain Samoun wrote:

> I think you can extract the header fields (Subject, Creator, Author,
> etc.) pretty easily. But extracting values from a table may not be that
> easy, because how the objects are laid out in the file depends on the
> file's history, and in addition the PDF file may be in binary form.
> Alain
> 
> On Fri, Aug 31, 2001 at 12:16:38PM -0300, Paul Meagher wrote:
> > Wondering if anyone has tried to parse out a table of information from a
> > PDF file?
> > 
> > Is it a matter of opening the file, looping through its contents
> > line-by-line looking for tags that demarcate table cell boundaries and
> > extracting the relevant cell values?
> > 
> > I figured if Google can index PDF content it must be possible to pull the
> > content out much as one would from an HTML file.
> > 
> > Mostly wondering how much work might be involved and if there are any
> > tricks that I should be aware of before I begin...
> > 
> > Regards,
> > Paul
> > 
> > -- 
> > PHP Windows Mailing List (http://www.php.net/)
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
> > To contact the list administrators, e-mail: [EMAIL PROTECTED]


