"dongdong" <[EMAIL PROTECTED]> wrote: > >using web browser can get page's content formally, but when use >urllib2.open("http://tech.163.com/2004w11/12732/2004w11_1100059465339.html").read() > >the result is > ><html><head><META HTTP-EQUIV=REFRESH >CONTENT="0;URL=http://tech.163.com/04/1110/12/14QUR2BR0009159H.html"> ><META http-equiv="Pragma" >content="no-cache"></HEAD><body>?y?ú'ò?aò3??...</body></html> > >,I think the reson is the no-cache, are there person would help me?

No, that's not the reason.  The reason is that this page is a redirect.  As an HTML consumer, you are supposed to parse that content and notice the <meta http-equiv> tags, each of which says "here is something that should have been one of the HTTP headers".  In this case, the page wants you to act as though you saw:

    Refresh: 0;URL=http://tech.163.com/04/1110/12/14QUR2BR0009159H.html
    Pragma: no-cache

The "Refresh" header, with its delay of 0 seconds, means that you are supposed to go fetch the contents of that new page immediately.  Try using urllib2.urlopen on THAT address, and you should get your content.

This is one way to handle a web site reorganization and still allow older URLs to work.
-- 
- Tim Roberts, [EMAIL PROTECTED]
  Providenza & Boekelheide, Inc.
-- 
http://mail.python.org/mailman/listinfo/python-list
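
For anyone who wants to automate the step described above instead of copying the new address by hand, here is a minimal sketch.  It is written for Python 3, where urllib2.urlopen is spelled urllib.request.urlopen.  The names MetaRefreshFinder, find_meta_refresh, fetch_following_refresh and the max_hops limit are my own, made up for illustration, and the parsing only handles the simple "delay;URL=target" form shown in the page above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen  # Python 3 name for urllib2.urlopen


class MetaRefreshFinder(HTMLParser):
    """Record the target of the first <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta" or self.target is not None:
            return
        attrs = dict(attrs)
        if (attrs.get("http-equiv") or "").lower() != "refresh":
            return
        # content looks like "0;URL=http://..." (delay, then the target).
        _delay, _sep, rest = (attrs.get("content") or "").partition(";")
        rest = rest.strip()
        if rest[:4].lower() == "url=":
            self.target = rest[4:].strip().strip("'\"")


def find_meta_refresh(html):
    """Return the refresh URL found in html, or None if there is none."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    return finder.target


def fetch_following_refresh(url, max_hops=5):
    """Fetch url, following meta refreshes at most max_hops times."""
    for _ in range(max_hops):
        body = urlopen(url).read()
        # Decode leniently: only the ASCII markup matters for the search.
        target = find_meta_refresh(body.decode("latin-1"))
        if target is None:
            return body
        # The target may be relative, so resolve it against the current URL.
        url = urljoin(url, target)
    raise RuntimeError("too many meta refreshes")
```

Calling fetch_following_refresh on the old address should then return the body of whatever page the refresh chain ends on, assuming the site still serves those pages.  The hop limit is only there so that two pages refreshing to each other cannot loop forever.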