[Tutor] PDF Scraping

2015-11-24 Thread Python Beginner
Hi,

I am looking for the best way to scrape the following PDFs:

(1) http://minerals.usgs.gov/minerals/pubs/commodity/gold/mcs-2015-gold.pdf
(table on page 1)

(2) http://minerals.usgs.gov/minerals/pubs/commodity/gold/myb1-2013-gold.pdf
(table 1)

I have done a lot of research and have read that pdftables 0.0.4 is an
excellent way to scrape tabular data from PDFs (see
https://blog.scraperwiki.com/2013/07/pdftables-a-python-library-for-getting-tables-out-of-pdf-files/
).

I downloaded pdftables 0.0.4 (see https://pypi.python.org/pypi/pdftables).

I am new to Python and am having trouble finding good documentation on how
to use this library.

Has anybody here used pdftables who could help me get started, or point
me to a better library for scraping the PDFs linked above? I have read that
the right PDF library depends on the format of the PDF. Which
library would be best for the two PDFs above? Knowing this will help me
get started; then I can write some code and ask further questions if
needed.
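
Whichever library turns out to fit, a likely first step is to have both
files on disk. The following is only a minimal sketch using the Python 3
standard library; the helper names (local_name, looks_like_pdf, download)
are made up for illustration and are not part of pdftables or any other PDF
library. It assumes the two URLs above are still reachable, and it checks
for the "%PDF-" signature that PDF files normally begin with.

```python
import os
from urllib.parse import urlparse
from urllib.request import urlopen

# The two USGS documents named in this message.
PDF_URLS = [
    "http://minerals.usgs.gov/minerals/pubs/commodity/gold/mcs-2015-gold.pdf",
    "http://minerals.usgs.gov/minerals/pubs/commodity/gold/myb1-2013-gold.pdf",
]


def local_name(url):
    """Return the file name at the end of the URL path (hypothetical helper)."""
    return os.path.basename(urlparse(url).path)


def looks_like_pdf(data):
    """Return True if the bytes begin with the usual PDF signature."""
    return data.startswith(b"%PDF-")


def download(url, timeout=30):
    """Fetch url, save it in the current directory, and return the file name."""
    with urlopen(url, timeout=timeout) as response:
        data = response.read()
    if not looks_like_pdf(data):
        # An HTML error page or a redirect notice would end up here.
        raise ValueError("not a PDF: " + url)
    name = local_name(url)
    with open(name, "wb") as fh:
        fh.write(data)
    return name
```

Calling download(url) for each entry in PDF_URLS should leave
mcs-2015-gold.pdf and myb1-2013-gold.pdf in the working directory, ready to
be opened by whichever table-extraction library is chosen.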

Thanks in advance for your help!

~Chris
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] PDF Scraping

2015-11-25 Thread Python Beginner
Oh, I forgot to mention that I am using Python 3.4. Thanks again for your
help pointing me in the right direction.

~Chris
