Gilles Ganault wrote:
> After scratching my head as to why I failed finding data from a web
> using the "re" module, I discovered that a web page as downloaded by
> urllib doesn't match what is displayed when viewing the source page in
> FireFox.
>
> For instance, when searching Amazon for "Wargames":
>
> URLLIB:
> <a
> href="http://www.amazon.fr/Wargames-Matthew-Broderick/dp/B00004RJ7H"><span
> class="srTitle">Wargames</span></a>
>
> ~ Matthew Broderick, Dabney Coleman, John Wood, et Ally Sheedy
> <span class="bindingBlock">(<span class="binding">Cassette
> vidéo</span> - 2000)</span></td></tr>
>
> FIREFOX:
> <div class="productTitle"><a
> href="http://www.amazon.fr/Wargames-Matthew-Broderick/dp/B00004RJ7H/ref=sr_1_1?ie=UTF8&s=dvd&qid=1224872998&sr=8-1">
> Wargames</a> <span class="binding"> ~ Matthew Broderick, Dabney
> Coleman, John Wood, et Ally Sheedy</span><span class="binding">
> (<span class="format">Cassette vidéo</span> - 2000)</span></div>
>
> Why do they differ?

The browser sends a different client identifier (the User-Agent request
header) than urllib does, and the server returns different page content
depending on which client is asking.

Stefan
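
To illustrate the point about the client identifier, here is a minimal
sketch of sending a browser-like User-Agent header from Python. It
assumes Python 3, where the relevant module is urllib.request. The
helper names (build_request, default_user_agent, fetch) and the
BROWSER_USER_AGENT string are made up for this example and are not taken
from the original post.

```python
import urllib.request

# Illustrative value only (an assumption): any browser-like string works here.
BROWSER_USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"
)


def default_user_agent():
    """Return the User-Agent that urllib sends when none is set explicitly."""
    opener = urllib.request.build_opener()
    # addheaders is a list of (name, value) pairs added to every request.
    return dict(opener.addheaders)["User-agent"]


def build_request(url, user_agent=BROWSER_USER_AGENT):
    """Build a Request that identifies itself with the given User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent=BROWSER_USER_AGENT):
    """Download the page body as bytes, sending the chosen User-Agent."""
    req = build_request(url, user_agent)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read()
```

Calling fetch("http://www.amazon.fr/...") with and without the
user_agent override is one way to check whether the markup changes with
the client identifier. A server may also vary its output on cookies or
other request headers, so the result is not guaranteed to match what the
browser shows.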