On Mar 20, 1:56 am, Tina I <[EMAIL PROTECTED]> wrote: > cjl wrote: > > Hi. > > > I am trying to screen scrape some stock data from yahoo, so I am > > trying to use urllib2 to retrieve the html and beautiful soup for the > > parsing. > > > Maybe (most likely) I am doing something wrong, but when I use > > urllib2.urlopen to fetch a page, and when I view 'page source' of the > > exact same URL in firefox, I am seeing slight differences in the raw > > html. > > > Do I need to set a browser agent so yahoo thinks urllib2 is firefox? > > Is yahoo detecting that urllib2 doesn't process javascript, and > > passing different data? > > > -cjl > > Unless the data you you need depends on the site detecting a specific > browser you will probably receive a 'cleaner' code that's more easily > parsed if you don't set a user agent. Usually the browser optimization > they do is just eye candy, bells and whistles anyway in order to give > you a more 'pleasing experience'. I doubt that your program will care > about that ;) > > Tina
You can do this fairly easily. I found a similar program in the book Core Python Programming. It actually sticks the stocks into an Excel spreadsheet. The code is below. You can easily modify it to send the output elsewhere. # Core Python Chp 23, pg 994 # estock.pyw from Tkinter import Tk from time import sleep, ctime from tkMessageBox import showwarning from urllib import urlopen import win32com.client as win32 warn = lambda app: showwarning(app, 'Exit?') RANGE = range(3, 8) TICKS = ('AMZN', 'AMD', 'EBAY', 'GOOG', 'MSFT', 'YHOO') COLS = ('TICKER', 'PRICE', 'CHG', '%AGE') URL = 'http://quote.yahoo.com/d/quotes.csv?s=%s&f=sl1c1p2' def excel(): app = 'Excel' xl = win32.gencache.EnsureDispatch('%s.Application' % app) ss = xl.Workbooks.Add() sh = ss.ActiveSheet xl.Visible = True sleep(1) sh.Cells(1, 1).Value = 'Python-to-%s Stock Quote Demo' % app sleep(1) sh.Cells(3, 1).Value = 'Prices quoted as of: %s' % ctime() sleep(1) for i in range(4): sh.Cells(5, i+1).Value = COLS[i] sleep(1) sh.Range(sh.Cells(5, 1), sh.Cells(5, 4)).Font.Bold = True sleep(1) row = 6 u = urlopen(URL % ','.join(TICKS)) for data in u: tick, price, chg, per = data.split(',') sh.Cells(row, 1).Value = eval(tick) sh.Cells(row, 2).Value = ('%.2f' % round(float(price), 2)) sh.Cells(row, 3).Value = chg sh.Cells(row, 4).Value = eval(per.rstrip()) row += 1 sleep(1) u.close() warn(app) ss.Close(False) xl.Application.Quit() if __name__ == '__main__': Tk().withdraw() excel() # Have fun - Mike -- http://mail.python.org/mailman/listinfo/python-list