Re: [PHP-WIN] Parsing PDF files

Alain Samoun Fri, 31 Aug 2001 09:31:28 -0700
I think you can extract the header fields (Subject, Creator, Author,
etc.) fairly easily. Extracting values from a table may not be so easy,
though, because the way objects are created in the file depends on the
file's history, and in addition the PDF file may be in binary form.
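
As a rough illustration of the easy case, here is a minimal sketch that
pulls simple header fields out of raw PDF bytes. It is written in Python
for brevity (the same pattern should port to PHP's preg_match_all), and
it assumes the fields are stored uncompressed as plain literal strings
with no nested parentheses or escapes. The function name extract_info
and the sample bytes are hypothetical, not taken from any real file or
library.

```python
import re

# Matches entries such as "/Author (Jane Doe)" stored as plain literal strings.
# Assumption: the dictionary is uncompressed and the value contains no
# nested parentheses or backslash escapes.
INFO_ENTRY = re.compile(
    rb"/(Title|Subject|Author|Creator|Producer)\s*\(([^()\\]*)\)"
)


def extract_info(pdf_bytes: bytes) -> dict:
    """Return whichever simple header fields appear in the raw PDF bytes."""
    info = {}
    for key, value in INFO_ENTRY.findall(pdf_bytes):
        # Later matches overwrite earlier ones, since incremental updates
        # are appended to the end of the file.
        info[key.decode("ascii")] = value.decode("latin-1")
    return info


# Hypothetical fragment standing in for part of a real file.
sample = (
    b"1 0 obj\n"
    b"<< /Subject (Price list) /Creator (Some Tool) /Author (P. Example) >>\n"
    b"endobj\n"
)
print(extract_info(sample))
```

For a real file you would pass in the contents read in binary mode.
Fields stored inside compressed streams, or as hex or escaped strings,
will not match this pattern, which is one reason table data is much
harder to get at.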
Alain

On Fri, Aug 31, 2001 at 12:16:38PM -0300, Paul Meagher wrote:
> Wondering if anyone has tried to parse out a table of information from a
> PDF file?
> 
> Is it a matter of opening the file, looping through its contents
> line-by-line looking for tags that demarcate table cell boundaries and
> extracting the relevant cell values?
> 
> I figured if Google can index PDF content it must be possible to pull the
> content out, much as one would from an HTML file.
> 
> Mostly wondering how much work might be involved and if there are any
> tricks that I should be aware of before I begin...
> 
> Regards,
> Paul
> 
> -- 
> PHP Windows Mailing List (http://www.php.net/)
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> To contact the list administrators, e-mail: [EMAIL PROTECTED]
