[ We're offtopic here since it's not a cygwin-specific issue anymore, so I've set a follow-up to the cygwin-talk list in case you have further questions or replies. ]
hongyi.zhao wrote:
> On Tuesday, October 13, 2009 at 13:44, dave.korn.cygwin wrote:
>> Hongyi Zhao wrote:
> I want to use wget to grab the following web page:
>
> http://www.cybersyndrome.net/pla5.html

Then, you can tell wget to use your local privoxy as an http proxy, which is
exactly how your browser relates to it.

  export http_proxy=localhost:8118
  wget http://www.cybersyndrome.net/pla5.html

should do the trick, but check the wget manual page about proxy support for
full details.  (I'm assuming here you're running the usual kind of Tor setup
with a supporting co-installation of Privoxy.)

> OTOH, I've also learned that curl support socks4/5 proxy, and I use
> the following command under my cygwin console:
>
> curl --socks5 127.0.0.1:9050 http://www.cybersyndrome.net/pla5.html
>
> But I meet the following error:
>
> -----------------------------
> <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
> <HTML><HEAD>
> <TITLE>302 Found</TITLE>
> </HEAD><BODY>
> <H1>Found</H1>
> The document has moved <A
> HREF="http://www8.big.or.jp/~000/CyberSyndrome/error404.html">here</A>.<P>
> </BODY></HTML>
> -----------------------------

  That's interesting.  A real 302 redirect would have an actual 302 status
code and a Location header, not just be a 200 returning an html document with
the words "302 Found" and a URL in it.

> Nevertheless, I can use firefox with Tor enabled to access this
> webpage.
>
> What's the reason

  It's something the server is doing deliberately, perhaps a malfunctioning
or misguided anti-bot feature of some sort, based on the request headers sent
by the user's agent.

> and how can I grab this webpage just by a
> command-line downloading tool?

  Well, you can use wget!  Or you can tell your curl to pretend it is wget!

> $ curl 'http://www.cybersyndrome.net/pla5.html'
> <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
> <HTML><HEAD>
> <TITLE>302 Found</TITLE>
> </HEAD><BODY>
> <H1>Found</H1>
> The document has moved <A
> HREF="http://www8.big.or.jp/~000/CyberSyndrome/error404.html">here</A>.<P>
> </BODY></HTML>
> $ wget 'http://www.cybersyndrome.net/pla5.html'
> --2009-10-13 21:00:36-- http://www.cybersyndrome.net/pla5.html
> Resolving www.cybersyndrome.net... 210.153.118.69
> Connecting to www.cybersyndrome.net|210.153.118.69|:80... connected.
> HTTP request sent, awaiting response... 200 OK
> Length: unspecified [text/html]
> Saving to: `pla5.html'
>
> [ <=> ] 18,151 3.11K/s in 5.7s
>
> 2009-10-13 21:00:42 (3.11 KB/s) - `pla5.html' saved [18151]
> $ curl 'http://www.cybersyndrome.net/pla5.html' -A 'User-Agent: Wget/1.11.4'
> <html>
> <head>
> <meta http-equiv="content-type" content="text/html; charset=Shift_JIS">
> <meta name="robots" content="noarchive">
> <meta name="description" content="▒▒▒p▒?\▒?OV▒▒Proxy▒▒▒▒▒??▒▒J▒▒▒A▒▒?▒▒B">
> <title>CyberSyndrome : Proxy List / Anonymous</title>
> <style type="text/css">
[ ... snip ... ]

    cheers,
      DaveK
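
For anyone who wants to repeat the check described above, here is a rough shell sketch, not a polished tool. It tests whether a header dump describes a genuine redirect (a 3xx status line plus a Location header) and wraps the "pretend to be wget" fetch. The function names is_real_redirect and fetch_as_wget are made up for illustration. Note that curl's -A option takes only the agent string itself, so 'Wget/1.11.4' on its own should be enough; that this particular server keeps accepting it is an assumption.

```shell
#!/bin/sh
# is_real_redirect HEADERS
# Succeeds only if the first line of HEADERS is a 3xx status line AND a
# Location header is present. (Hypothetical helper name, for illustration.)
is_real_redirect() {
    headers=$1
    # Pull the three-digit code out of e.g. "HTTP/1.1 302 Found".
    status=$(printf '%s\n' "$headers" |
        sed -n '1s/^HTTP\/[0-9.]* \([0-9][0-9][0-9]\).*/\1/p')
    case $status in
        3??) ;;            # some kind of redirect status, keep checking
        *)   return 1 ;;   # 200, 404, or no status line at all
    esac
    # A real redirect also has to say where to go.
    printf '%s\n' "$headers" | grep -qi '^location:'
}

# fetch_as_wget URL [extra curl options...]
# Fetch URL with curl while sending a wget-style User-Agent string.
# (Hypothetical helper name; the agent string is the one from the session above.)
fetch_as_wget() {
    curl -s -A 'Wget/1.11.4' "$@"
}
```

To try it against the live site, something like the following should work: -D - dumps the response headers to stdout and -o /dev/null discards the body. Add --socks5 127.0.0.1:9050 to go through Tor.

  is_real_redirect "$(curl -s -o /dev/null -D - 'http://www.cybersyndrome.net/pla5.html')" \
      && echo "genuine redirect" || echo "not a real redirect"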