Hi, I am looking for the best way to scrape the following PDFs:
(1) http://minerals.usgs.gov/minerals/pubs/commodity/gold/mcs-2015-gold.pdf (table on page 1)
(2) http://minerals.usgs.gov/minerals/pubs/commodity/gold/myb1-2013-gold.pdf (table 1)

I have done a lot of research and have read that pdftables 0.0.4 is an excellent way to scrape tabular data from PDFs (see https://blog.scraperwiki.com/2013/07/pdftables-a-python-library-for-getting-tables-out-of-pdf-files/). I downloaded pdftables 0.0.4 (see https://pypi.python.org/pypi/pdftables). I am new to Python and am having trouble finding good documentation on how to use this library.

Has anybody used pdftables before who could help me get started, or point me to the ideal library for scraping the PDFs linked above? I have read that different PDF libraries suit different PDF formats. Which library would be best for the formats above? Knowing this will help me get started; then I can write some code and ask further questions if needed.

Thanks in advance for your help!

~Chris

Tutor maillist - Tutor@python.org
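One common fallback while working out pdftables is to convert the PDF to plain text with its column layout preserved (for example with poppler's `pdftotext -layout` command) and then split each line into cells. The sketch below assumes that text has already been extracted; `parse_layout_table` and the sample rows are hypothetical illustrations, not part of pdftables and not figures from the USGS reports.

```python
import re

# Assumption: two or more consecutive whitespace characters mark a column
# boundary. This holds for many layout-preserved text dumps, but it breaks
# if a cell itself contains a double space or a column is left empty.
_COLUMN_GAP = re.compile(r"\s{2,}")


def parse_layout_table(text):
    """Split layout-preserved text into rows of cell strings.

    Blank lines are skipped; every other line becomes one row.
    """
    rows = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        rows.append(_COLUMN_GAP.split(stripped))
    return rows


# Made-up sample text standing in for a table dumped from a PDF
# (placeholder values only, NOT real USGS data).
SAMPLE = """
Country        2013     2014
Country A       100      110
Country B        20       25
"""

if __name__ == "__main__":
    for row in parse_layout_table(SAMPLE):
        print(row)
```

Single-space gaps are kept inside a cell, so a name like "Country A" stays whole. Real tables with footnote markers, merged headers or empty cells will need extra cleanup after this step.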