On Tue, 13 Oct 2009 21:09:32 +0100, Dave Korn <dave.korn.cyg...@googlemail.com> wrote:
>[ We're offtopic here since it's not a cygwin-specific issue anymore, so I've
>set a follow-up to the cygwin-talk list in case you have further questions or
>replies. ]
>
>hongyi.zhao wrote:
>> On Tuesday, October 13, 2009 at 13:44, dave.korn.cygwin wrote:
>>> Hongyi Zhao wrote:
>
>
>> I want to use wget to grab the following web page:
>>
>> http://www.cybersyndrome.net/pla5.html
>
>  Then, you can tell wget to use your local privoxy as an http proxy, which is
>exactly how your browser relates to it.
>
>    export http_proxy=localhost:8118
>    wget http://www.cybersyndrome.net/pla5.html
>
>should do the trick, but check the wget manual page about proxy support for
>full details.  (I'm assuming here you're running the usual kind of Tor setup
>with a supporting co-installation of Privoxy.)
>
>> OTOH, I've also learned that curl supports socks4/5 proxies, and I use
>> the following command under my cygwin console:
>>
>> curl --socks5 127.0.0.1:9050 http://www.cybersyndrome.net/pla5.html
>>
>> But I get the following error:
>>
>> -----------------------------
>> <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
>> <HTML><HEAD>
>> <TITLE>302 Found</TITLE>
>> </HEAD><BODY>
>> <H1>Found</H1>
>> The document has moved <A
>> HREF="http://www8.big.or.jp/~000/CyberSyndrome/error404.html">here</A>.<P>
>> </BODY></HTML>
>> -----------------------------
>
>  That's interesting.  A real 302 redirect would have an actual 302 status
>code and a Location header, not just be a 200 returning an html document with
>the words "302 Found" and a URL in it.
>
>> Nevertheless, I can use firefox with Tor enabled to access this
>> webpage.
>>
>> What's the reason
>
>  It's something the server is doing deliberately, perhaps a malfunctioning or
>misguided anti-bot feature of some sort, based on the request headers sent by
>the user's agent.
>
>> and how can I grab this webpage just by a
>> command-line downloading tool?
>
>  Well, you can use wget!  Or you can tell your curl to pretend it is wget!
>
>> $ curl 'http://www.cybersyndrome.net/pla5.html'
>> <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
>> <HTML><HEAD>
>> <TITLE>302 Found</TITLE>
>> </HEAD><BODY>
>> <H1>Found</H1>
>> The document has moved <A
>> HREF="http://www8.big.or.jp/~000/CyberSyndrome/error404.html">here</A>.<P>
>> </BODY></HTML>
>
>> $ wget 'http://www.cybersyndrome.net/pla5.html'
>> --2009-10-13 21:00:36--  http://www.cybersyndrome.net/pla5.html
>> Resolving www.cybersyndrome.net... 210.153.118.69
>> Connecting to www.cybersyndrome.net|210.153.118.69|:80... connected.
>> HTTP request sent, awaiting response... 200 OK
>> Length: unspecified [text/html]
>> Saving to: `pla5.html'
>>
>> [ <=> ] 18,151 3.11K/s in 5.7s
>>
>> 2009-10-13 21:00:42 (3.11 KB/s) - `pla5.html' saved [18151]
>
>> $ curl 'http://www.cybersyndrome.net/pla5.html' -A 'User-Agent: Wget/1.11.4'
>> <html>
>> <head>
>> <meta http-equiv="content-type" content="text/html; charset=Shift_JIS">
>> <meta name="robots" content="noarchive">
>> <meta name="description" content="???p??\??OV??Proxy?????????J???A?????B">
>> <title>CyberSyndrome : Proxy List / Anonymous</title>
>> <style type="text/css">
> [ ... snip ... ]
>
>    cheers,
>      DaveK
>

Good, thanks a lot, I've got it.

-- 
.: Hongyi Zhao [ hongyi.zhao AT gmail.com ] Free as in Freedom :.
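
For anyone wanting to check which of the two cases discussed above the server is actually producing (a real 3xx status, or a 200 whose body merely says "302 Found"), here is a minimal sketch. It is only an illustration: the function names classify_response and fetch_as are made up for this example, and the body check simply looks for the <TITLE> seen in the output quoted above. Note that curl's -A option takes the User-Agent value itself, so the bare string 'Wget/1.11.4' should be enough.

```shell
#!/bin/sh
# Hypothetical helpers, not part of curl or wget.

# classify_response CODE BODY_FILE
# Prints one word describing the response:
#   real-redirect  the status code is 3xx
#   fake-redirect  status 200, but the body is a "302 Found" page
#   ok             status 200 with any other body
#   other          anything else (curl reports 000 if the request failed)
classify_response() {
    code=$1
    body=$2
    case $code in
        3??) echo "real-redirect" ;;
        200)
            if grep -qi '<title>302 Found</title>' "$body"; then
                echo "fake-redirect"
            else
                echo "ok"
            fi
            ;;
        *) echo "other" ;;
    esac
}

# fetch_as USER_AGENT URL
# Fetches URL with the given User-Agent, saves the body to a temporary
# file, and classifies the result from the status code curl reports.
fetch_as() {
    tmp=$(mktemp) || return 1
    # -s: quiet, -o: write body to file, -w: print the status code,
    # -A: set the User-Agent header value.
    code=$(curl -s -o "$tmp" -w '%{http_code}' -A "$1" "$2")
    classify_response "$code" "$tmp"
    rm -f "$tmp"
}
```

Usage would look something like the following; adding --socks5 127.0.0.1:9050 to the curl line inside fetch_as should route the request through Tor as in the original command:

    fetch_as 'curl/7.19.6' 'http://www.cybersyndrome.net/pla5.html'
    fetch_as 'Wget/1.11.4' 'http://www.cybersyndrome.net/pla5.html'

The first User-Agent string is just an assumed example of what a curl of that era might send.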