Re: Urllib vs. FireFox

Stefan Behnel Fri, 24 Oct 2008 12:05:49 -0700

Gilles Ganault wrote:
> After scratching my head over why I failed to find data in a web page
> using the "re" module, I discovered that a web page as downloaded by
> urllib doesn't match what is displayed when viewing the page source in
> FireFox.
> 
> For instance, when searching Amazon for "Wargames":
> 
> URLLIB:
> <a
> href="http://www.amazon.fr/Wargames-Matthew-Broderick/dp/B00004RJ7H"><span
> class="srTitle">Wargames</span></a>
>   
>    ~ Matthew Broderick, Dabney Coleman, John Wood,  et Ally Sheedy
> <span class="bindingBlock">(<span class="binding">Cassette
> vidéo</span> - 2000)</span></td></tr>
> 
> FIREFOX:
>  <div class="productTitle"><a
> href="http://www.amazon.fr/Wargames-Matthew-Broderick/dp/B00004RJ7H/ref=sr_1_1?ie=UTF8&s=dvd&qid=1224872998&sr=8-1">
> Wargames</a> <span class="binding"> ~ Matthew Broderick, Dabney
> Coleman, John Wood,  et Ally Sheedy</span><span class="binding">
> (<span class="format">Cassette vidéo</span> - 2000)</span></div>
> 
> Why do they differ?


The browser sends a different client identifier (the HTTP User-Agent
header) than urllib does, and the server sends back different page content
depending on which client is asking.
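
To see this for yourself, you can make urllib present a browser-like
identifier. What follows is only a minimal sketch, assuming Python 3's
urllib.request; the BROWSER_UA string and the helper names
(default_user_agent, build_request, fetch) are made up for illustration,
and there is no guarantee the server then returns exactly what FireFox
shows, since it may also look at cookies or other headers.

```python
import urllib.request

# Hypothetical browser-like identifier (an assumption for illustration);
# copy the real string from your own browser if you want a closer match.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"


def default_user_agent():
    """Return the User-Agent string urllib sends when none is set."""
    opener = urllib.request.build_opener()
    # addheaders is a list of (name, value) pairs set by OpenerDirector.
    return dict(opener.addheaders)["User-agent"]


def build_request(url, user_agent=BROWSER_UA):
    """Build a request that identifies itself with the given User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent=BROWSER_UA):
    """Download the page body as bytes; this needs network access."""
    with urllib.request.urlopen(build_request(url, user_agent)) as response:
        return response.read()
```

Calling default_user_agent() shows what urllib announces on its own, and
fetch(url) downloads the same URL while announcing BROWSER_UA instead, so
the two results can be compared.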

Stefan
--
http://mail.python.org/mailman/listinfo/python-list
