Here's a useful online tool that might help you see what's happening: http://www.sitetruth.com/experimental/viewer.html
We use this to help webmasters see what our web crawler is seeing.
This reads a page, using Python and FancyURLopener, with a
User-Agent string of "SiteTruth.com site rating system."  Then it
parses the page with BeautifulSoup, removes all <SCRIPT>, <EMBED>,
and <OBJECT> tags, makes all the links absolute, then writes the page
back out as UTF-8.  The resulting cleaned-up page is displayed.

If the page you're trying to read looks OK with our viewer, you should
be able to read it from Python with no problems.

				John Nagle

cjl wrote:
> Hi.
>
> I am trying to screen scrape some stock data from yahoo, so I am
> trying to use urllib2 to retrieve the html and beautiful soup for the
> parsing.
>
> Maybe (most likely) I am doing something wrong, but when I use
> urllib2.urlopen to fetch a page, and when I view 'page source' of the
> exact same URL in firefox, I am seeing slight differences in the raw
> html.
>
> Do I need to set a browser agent so yahoo thinks urllib2 is firefox?
> Is yahoo detecting that urllib2 doesn't process javascript, and
> passing different data?
>
> -cjl
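On the browser-agent question, one way to check is to send a browser-like User-Agent yourself and diff the result against what Firefox shows under 'page source'. Below is a minimal sketch, not a definitive recipe. It assumes Python 3, where the urllib2 functionality lives in urllib.request. The names build_request, fetch, html_diff and the BROWSER_UA value are made up for illustration; they are not part of our viewer or of any library.

```python
import difflib
import urllib.request

# Hypothetical browser-like User-Agent string, for illustration only.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"


def build_request(url, user_agent=BROWSER_UA):
    """Return a Request object that sends the given User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent=BROWSER_UA, timeout=30):
    """Fetch url with the given User-Agent and return the body as text."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Fall back to UTF-8 if the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def html_diff(fetched, browser_source):
    """Return unified-diff lines between two HTML strings (empty if equal)."""
    return list(
        difflib.unified_diff(
            fetched.splitlines(),
            browser_source.splitlines(),
            fromfile="urllib",
            tofile="browser",
            lineterm="",
        )
    )
```

To use it, save the Firefox 'page source' to a file, call fetch() on the same URL, and print the lines from html_diff(). If the diff is still non-empty with a browser-like User-Agent, the differences are probably coming from something other than the agent string (cookies, rotating ad markup, and so on).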