[Tutor] PDF Scraping

Python Beginner Tue, 24 Nov 2015 13:48:58 -0800

Hi,

I am looking for the best way to scrape the following PDFs:


(1) http://minerals.usgs.gov/minerals/pubs/commodity/gold/mcs-2015-gold.pdf
(table on page 1)

(2) http://minerals.usgs.gov/minerals/pubs/commodity/gold/myb1-2013-gold.pdf
(table 1)

I have done a lot of research and have read that pdftables 0.0.4 is an
excellent way to scrape tabular data from PDFs (see
https://blog.scraperwiki.com/2013/07/pdftables-a-python-library-for-getting-tables-out-of-pdf-files/
).

I downloaded pdftables 0.0.4 (see https://pypi.python.org/pypi/pdftables).

I am new to Python and having trouble finding good documentation for how to
use this library.

Has anybody here used pdftables before who could help me get started or point
me to the ideal library for scraping the PDF links above? I have read that
different PDF libraries are used depending on the format of the PDF. What
library would be best for the PDF formats above? Knowing this will help me
get started, then I can write up some code and ask further questions if
needed.
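
As a starting point, here is a minimal sketch of the only step I am fairly
sure about: fetching the two files so that whichever library is recommended
can read them locally. It uses only the Python 3 standard library. The names
local_name and download_all are my own, not part of pdftables or any other
PDF library, and no table extraction happens here.

```python
import os
from urllib.parse import urlparse
from urllib.request import urlretrieve

# The two USGS documents linked above.
PDF_URLS = [
    "http://minerals.usgs.gov/minerals/pubs/commodity/gold/mcs-2015-gold.pdf",
    "http://minerals.usgs.gov/minerals/pubs/commodity/gold/myb1-2013-gold.pdf",
]


def local_name(url):
    """Return the file name at the end of a URL's path (hypothetical helper)."""
    return os.path.basename(urlparse(url).path)


def download_all(urls, dest_dir="."):
    """Download each URL into dest_dir and return the saved paths.

    Hypothetical helper: it needs network access and does no error handling.
    """
    paths = []
    for url in urls:
        path = os.path.join(dest_dir, local_name(url))
        urlretrieve(url, path)  # writes the response body to `path`
        paths.append(path)
    return paths
```

Calling download_all(PDF_URLS) should leave both PDFs in the current
directory, assuming the links are still reachable.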

Thanks in advance for your help!

~Chris
_______________________________________________
Tutor maillist  -  [email protected]
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
