On Tue, 3 May 2016 01:56 pm, DFS wrote:

> On 5/2/2016 11:27 PM, jf...@ms4.hinet.net wrote:
>> DFS at 2016/5/3 9:12:24AM wrote:
>>> try
>>>
>>> from urllib.request import urlretrieve
>>>
>>> http://stackoverflow.com/questions/21171718/urllib-urlretrieve-file-python-3-3
>>>
>>>
>>> I'm running python 2.7.11 (32-bit)
>>
>> Alright, it works...someway.
>>
>> I try to get a zip file. It works, the file can be unzipped correctly.
>>
>> >>> from urllib.request import urlretrieve
>> >>> urlretrieve("http://www.caprilion.com.tw/fed.zip",
>> >>> "d:\\temp\\temp.zip")
>> ('d:\\temp\\temp.zip', <http.client.HTTPMessage object at 0x03102C50>)
>> >>>
>>
>> But when I try to get this forum page, it does get a html file but can't
>> be viewed normally.
>>
>> >>> urlretrieve("https://groups.google.com/forum/#!topic/comp.lang.python/jFl3GJbmR7A", "d:\\temp\\temp.html")
>> ('d:\\temp\\temp.html', <http.client.HTTPMessage object at 0x03102A90>)
>> >>>
>>
>> I suppose the html is a much complex situation where more processes need
>> to be done before it can be opened by a web browser:-)
>
>
> Who knows what Google has done... it won't open in Opera. The tab title
> shows up, but after 20-30 seconds the screen just stays blank and the
> cursor quits loading.

Dennis has given the answer to this, but since he has X-No-Archive=Yes, his
useful and well-written answer will be lost forever. So I've taken the
liberty of copying his answer here:


Dennis Lee Bieber says:

There's practically no HTML in that page -- just miles of Javascript.

The one obvious item is:

-=-=-=-=-=-
<script type="text/javascript" language="javascript"
src="/forum/C53652DA8B67255A46256B72F0D65A40.cache.js">
</script>
-=-=-=-=-=-

which is a RELATIVE path. If you copied the file to your machine and then
load it in a browser, it will be looking for

/forum/C53652DA8B67255A46256B72F0D65A40.cache.js

to be on your machine in a subdirectory of where you saved the main file.

You'd have to recreate most of the Google environment and fetch anything
that was referenced through a relative path first, to get the content to
display. Of course, you may find, for example, that the Javascript at some
point is doing a database lookup -- and you'd maybe have to now duplicate
the database...



-- 
Steven

-- 
https://mail.python.org/mailman/listinfo/python-list
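
P.S. To make Dennis's point about the script path concrete, here is a minimal sketch using urllib.parse.urljoin from the Python 3 standard library (in Python 2.7 the same function lives in the urlparse module). It resolves the quoted src against two bases: the page's original location, and a file:// URL for the saved copy. The file:// URL is my assumption about how a browser would address d:\temp\temp.html, so treat it as an illustration rather than a statement of what any particular browser does. Because of the leading slash, the lookup is resolved against the root of wherever the page was loaded from, not against Google's server.

```python
from urllib.parse import urljoin

# The script src quoted from the saved page.
SCRIPT_SRC = "/forum/C53652DA8B67255A46256B72F0D65A40.cache.js"

# Where the page originally lived (the "#!topic/..." fragment plays no
# part in resolving references, so it is left out here).
ONLINE_BASE = "https://groups.google.com/forum/"

# Assumed file:// URL for the copy saved as d:\temp\temp.html.
LOCAL_BASE = "file:///d:/temp/temp.html"

# Resolve the same reference against each base.
online = urljoin(ONLINE_BASE, SCRIPT_SRC)
local = urljoin(LOCAL_BASE, SCRIPT_SRC)

print("served by Google:", online)
print("opened from disk:", local)
```

Online, the reference resolves to a file on groups.google.com. From the saved copy it resolves to a local file: URL, and since urlretrieve only fetched the one HTML file, there is nothing there for the browser to load.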