### table_data_extraction.py ###
# Usage: tables[id][row][column]
# tables[0]          : 1st table
# tables[1][2]       : 3rd row of 2nd table
# tables[3][4][5]    : cell content of 6th column of 5th row of 4th table
# len(tables)        : number of tables
# len(tables[6])     : number of rows in 7th table
# len(tables[7][8])  : number of columns in 9th row of 8th table
import re
import urllib2  # Python 2 only; renamed urllib.request in Python 3

# Retrieve the contents of the page
page = urllib2.urlopen("http://example.com/page.html").read().strip()

# Build the tables list. These patterns match only bare uppercase tags
# (<TABLE>, <TR>, <TD>) with no attributes.
tables = [[re.findall('<TD>(.*?)</TD>', r, re.S)
           for r in re.findall('<TR>(.*?)</TR>', t, re.S)]
          for t in re.findall('<TABLE>(.*?)</TABLE>', page, re.S)]

Pretty simple. Good luck!

----------------------------------------
> Date: Fri, 24 May 2013 10:32:26 -0700
> Subject: Total Beginner - Extracting Data from a Database Online (Screenshot)
> From: logan.c.gra...@gmail.com
> To: python-list@python.org
>
> Hey guys,
>
> I'm learning Python and I'm experimenting with different projects -- I like
> learning by doing. I'm wondering if you can help me here:
>
> http://i.imgur.com/KgvSKWk.jpg
>
> What this is is a publicly-accessible webpage that's a simple database of
> people who have used the website. Ideally what I'd like to end up with is an
> excel spreadsheet with data from the columns #fb, # vids, fb sent?, # email
> tm.
>
> I'd like to use Python to do it -- crawl the page and extract the data in a
> usable way.
>
> I'd love your input! I'm just a learner.
> --
> http://mail.python.org/mailman/listinfo/python-list
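
The question also asks for a spreadsheet, which the snippet in the reply stops short of. Below is a minimal sketch of that last step, written for Python 3 and run against a made-up sample page rather than the real site. The helper names extract_tables and table_to_csv, the sample rows, and the relaxed patterns (case-insensitive, attributes allowed) are all assumptions for illustration, not part of the original reply. CSV is used because Excel opens it directly.

```python
import csv
import io
import re


def extract_tables(page):
    """Return tables[table][row][column], same nested-findall idea as the reply.

    Assumption: tags may be lower case and may carry attributes, so the
    patterns are relaxed compared with the reply's '<TD>(.*?)</TD>'.
    Header cells (<th>) are not captured.
    """
    flags = re.S | re.I
    return [
        [re.findall(r'<td\b[^>]*>(.*?)</td>', row, flags)
         for row in re.findall(r'<tr\b[^>]*>(.*?)</tr>', table, flags)]
        for table in re.findall(r'<table\b[^>]*>(.*?)</table>', page, flags)
    ]


def table_to_csv(table, out):
    """Write one table (a list of rows) to a file-like object as CSV."""
    writer = csv.writer(out)
    for row in table:
        writer.writerow([cell.strip() for cell in row])


# Hypothetical sample page; the column names echo the question, the data is invented.
sample_page = """
<table border="1">
<tr><td>name</td><td># fb</td><td># vids</td></tr>
<tr><td>alice</td><td>3</td><td>1</td></tr>
</table>
"""

tables = extract_tables(sample_page)
buffer = io.StringIO()
table_to_csv(tables[0], buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write a real file instead of an in-memory buffer, pass a file opened with open("output.csv", "w", newline="") to table_to_csv. For pages with nested tables or messy markup, regular expressions tend to break down and a proper HTML parser is the safer choice.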