Paul McNett wrote:
> Tempo wrote:
> > Hello. I am getting an error and it has gotten me stuck. I think the
> > best thing I can do is post my code and the error message and thank
> > everybody in advance for any help that you give this issue. Thank you.
> >
> > #############
> > Here's the code:
> > #############
> >
> > import urllib2
> > import re
> > import xlrd
> > from BeautifulSoup import BeautifulSoup
> >
> > book = xlrd.open_workbook("ige_virtualMoney.xls")
> > sh = book.sheet_by_index(0)
> > rx = 1
> > for rx in range(sh.nrows):
The above 2 lines should probably be:
    for rx in range(1, sh.nrows):
otherwise the likelihood is that a column heading will be treated as
data. Now read on ;-)

> >     u = sh.cell_value(rx, 0)
> >     page = urllib2.urlopen(u)
> >     soup = BeautifulSoup(page)
> >     p = soup.findAll('span', "sale")
> >     p = str(p)
> >     p2 = re.findall('\$\d+\.\d\d', p)
> >     for price in p2:
> >         print price
> >
> > ######################
> > Here are the error messages:
> > ######################
> >
> > Traceback (most recent call last):
> >   File "E:\Python24\scraper.py", line 16, in -toplevel-
> >     page = urllib2.urlopen(u)
> >   File "E:\Python24\lib\urllib2.py", line 130, in urlopen
> >     return _opener.open(url, data)
> >   File "E:\Python24\lib\urllib2.py", line 350, in open
> >     protocol = req.get_type()
> >   File "E:\Python24\lib\urllib2.py", line 233, in get_type
> >     raise ValueError, "unknown url type: %s" % self.__original
> > ValueError: unknown url type: List
>
> You were expecting u to be a url string like "http://google.com", but it
> looks like it is actually a list. I'm not familiar with package xlrd but
> cell_value() must be returning a list and not a cell value. Presumably,
> the list contains the cell value probably in element 0. Put in a print
> statement before your call to urlopen() like:
>
> print u

Sage advice. print repr(u) is in general even better advice.

>
> You'll likely discover your error.
>

Just for the record:

1. The xlrd package's Book.Sheet.cell_value() does *not* return lists.
As its docs say, it returns scalars, of the following types:
unicode, int, float, str

2. The error has nothing to do with Python lists; it's all about
malformed URLs. "unknown url type" means the string does not start with
one of the known schemes: http, ftp, file, data, gopher, ...

|>>> x = urllib2.urlopen('List')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "C:\Python24\lib\urllib2.py", line 130, in urlopen
    return _opener.open(url, data)
  File "C:\Python24\lib\urllib2.py", line 350, in open
    protocol = req.get_type()
  File "C:\Python24\lib\urllib2.py", line 233, in get_type
    raise ValueError, "unknown url type: %s" % self.__original
ValueError: unknown url type: List
|>>> x = urllib2.urlopen('GOTCHA')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "C:\Python24\lib\urllib2.py", line 130, in urlopen
    return _opener.open(url, data)
  File "C:\Python24\lib\urllib2.py", line 350, in open
    protocol = req.get_type()
  File "C:\Python24\lib\urllib2.py", line 233, in get_type
    raise ValueError, "unknown url type: %s" % self.__original
ValueError: unknown url type: GOTCHA
|>>>

HTH,
John
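P.S. A rough sketch of how one might guard against a heading (or any other non-URL cell) reaching urlopen(). This is only an illustration, written for current Python 3 where the parsing function lives in urllib.parse; the helper name looks_like_url, the ALLOWED_SCHEMES tuple, and the sample cells list are all made up here and are not part of xlrd or urllib. The list stands in for the values that sh.cell_value(rx, 0) would hand back.

```python
from urllib.parse import urlparse

# Hypothetical choice of schemes we are willing to fetch; adjust to taste.
ALLOWED_SCHEMES = ("http", "https", "ftp")


def looks_like_url(value):
    """Return True if value is a string with an allowed scheme and a host."""
    if not isinstance(value, str):
        # Numeric cells come back as floats, so they can never be URLs.
        return False
    parts = urlparse(value.strip())
    return parts.scheme in ALLOWED_SCHEMES and bool(parts.netloc)


# Made-up stand-in for column 0 of the sheet: a heading row, then data.
cells = ["List", "http://example.com/page1", 42.0, ""]

for rx, u in enumerate(cells):
    if looks_like_url(u):
        print("row %d: would fetch %r" % (rx, u))
    else:
        # repr-style output shows exactly what the cell contained.
        print("row %d: skipping %r" % (rx, u))
```

Printing the skipped value with %r does the same job as the print repr(u) suggestion above: it shows immediately that row 0 holds the heading 'List' and not a URL.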