Steve Holden wrote:
> Steve Holden wrote:
> > Johnny Lee wrote:
> > [...]
> >
> >>I've sent the source, thanks for your help.
> >>
> >
> > [...]
> > Preliminary result, in case this rings bells with people who use urllib2
> > quite a lot. I modified the error case to report the actual message
> > returned with the exception and I'm seeing things like:
> >
> > http://www.holdenweb.com/./Python/webframeworks.html
> > Message: <urlopen error (120, 'Operation already in progress')>
> > Start process
> > http://www.amazon.com/exec/obidos/ASIN/0596001886/steveholden-20
> > Error: IOError while parsing
> > http://www.amazon.com/exec/obidos/ASIN/0596001886/steveholden-20
> > Message: <urlopen error (120, 'Operation already in progress')>
> > .
> > .
> > .
> >
> > So at least we know now what the error is, and it looks like some sort
> > of resource limit (though why only on Cygwin beats me) ... anyone,
> > before I start some serious debugging?
> >
> I realized after this post that WingIDE doesn't run under Cygwin, so I
> modified the code further to raise an error and give us a proper
> traceback. I also tested the program under the standard Windows 2.4.1
> release, where it didn't fail, so I conclude you have unearthed a Cygwin
> socket bug. Here's the traceback:
>
> End process http://www.holdenweb.com/contact.html
> Start process http://freshmeat.net/releases/192449
> Error: IOError while parsing http://freshmeat.net/releases/192449
> Message: <urlopen error (120, 'Operation already in progress')>
> Traceback (most recent call last):
>   File "Spider_bug.py", line 225, in ?
>     spider.run()
>   File "Spider_bug.py", line 143, in run
>     self.grabUrl(tempUrl)
>   File "Spider_bug.py", line 166, in grabUrl
>     webPage = urllib2.urlopen(url).read()
>   File "/usr/lib/python2.4/urllib2.py", line 130, in urlopen
>     return _opener.open(url, data)
>   File "/usr/lib/python2.4/urllib2.py", line 358, in open
>     response = self._open(req, data)
>   File "/usr/lib/python2.4/urllib2.py", line 376, in _open
>     '_open', req)
>   File "/usr/lib/python2.4/urllib2.py", line 337, in _call_chain
>     result = func(*args)
>   File "/usr/lib/python2.4/urllib2.py", line 1021, in http_open
>     return self.do_open(httplib.HTTPConnection, req)
>   File "/usr/lib/python2.4/urllib2.py", line 996, in do_open
>     raise URLError(err)
> urllib2.URLError: <urlopen error (120, 'Operation already in progress')>
>
> Looking at that part of the source of urllib2 we see:
>
>         headers["Connection"] = "close"
>         try:
>             h.request(req.get_method(), req.get_selector(), req.data,
>                       headers)
>             r = h.getresponse()
>         except socket.error, err: # XXX what error?
>             raise URLError(err)
>
> So my conclusion is that there's something in the Cygwin socket module
> that causes problems not seen under other platforms.
>
> I couldn't find any obviously-related error in the Python bug tracker,
> and I have copied this message to the Cygwin list in case someone there
> knows what the problem is.
>
> Before making any kind of bug submission you should really see if you
> can build a program shorter than the existing 220+ lines to demonstrate
> the bug, but it does look to me like your program should work (as indeed
> it does on other platforms).
>
> regards
>  Steve
> --
> Steve Holden       +44 150 684 7255  +1 800 494 3119
> Holden Web LLC                     www.holdenweb.com
> PyCon TX 2006                  www.python.org/pycon/
But if you change urllib2 to urllib, it works under Cygwin. Do they use
different mechanisms to connect to the page?
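
On the suggestion to build a shorter test case: below is a minimal sketch of what one might look like. It is not taken from Spider_bug.py; `fetch_all` and its `opener` parameter are hypothetical names chosen for illustration. It assumes Python 2.6 or later, and falls back to `urllib.request` on Python 3, where urllib2 was renamed. It only fetches each URL in sequence and collects whatever IOError comes back, which may or may not be enough to trigger the Cygwin error.

```python
try:
    from urllib2 import urlopen          # Python 2, as used in this thread
except ImportError:
    from urllib.request import urlopen   # Python 3 name for the same function


def fetch_all(urls, opener=urlopen):
    """Fetch each URL in turn; return a list of (url, error) failures.

    `opener` is a hypothetical hook so urllib.urlopen (or a stub)
    can be swapped in for comparison.
    """
    failures = []
    for url in urls:
        try:
            response = opener(url)
            try:
                response.read()
            finally:
                response.close()
        except IOError as err:
            # URLError is a subclass of IOError (OSError on Python 3),
            # so this catches the error shown in the traceback.
            failures.append((url, err))
    return failures
```

Calling `fetch_all` with a handful of the URLs from the log above, once under Cygwin and once under the standard Windows build, would show whether the failure depends on the rest of the spider. Passing `opener=urllib.urlopen` on Python 2 would give a direct urllib versus urllib2 comparison.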