### table_data_extraction.py ###
# Usage: table[id][row][column]
# tables[0]       : 1st table
# tables[1][2]    : 3rd row of 2nd table
# tables[3][4][5] : cell content of 6th column of 5th row of 4th table
# len(table)      : quantity of tables
# len(table[6])   : quantity of rows of 7th table
# len(table[7][8]): quantity of columns of 9th row of 8th table

impor re
import urllib2

#to retrieve the contents of the page
page = urllib2.urlopen("http://example.com/page.html";).read().strip()

#to create the tables list
tables=[[re.findall('<TD>(.*?)</TD>',r,re.S) for r in 
re.findall('<TR>(.*?)</TR>',t,re.S)] for t in 
re.findall('<TABLE>(.*?)</TABLE>',page,re.S)]


Pretty simple. Good luck!

----------------------------------------
> Date: Fri, 24 May 2013 10:32:26 -0700
> Subject: Total Beginner - Extracting Data from a Database Online (Screenshot)
> From: logan.c.gra...@gmail.com
> To: python-list@python.org
>
> Hey guys,
>
> I'm learning Python and I'm experimenting with different projects -- I like 
> learning by doing. I'm wondering if you can help me here:
>
> http://i.imgur.com/KgvSKWk.jpg
>
> What this is is a publicly-accessible webpage that's a simple database of 
> people who have used the website. Ideally what I'd like to end up with is an 
> excel spreadsheet with data from the columns #fb, # vids, fb sent?, # email 
> tm.
>
> I'd like to use Python to do it -- crawl the page and extract the data in a 
> usable way.
>
> I'd love your input! I'm just a learner.
> --
> http://mail.python.org/mailman/listinfo/python-list                           
>           
-- 
http://mail.python.org/mailman/listinfo/python-list

Reply via email to