urllib2 slow for multiple requests
Hello everybody, really new to Python, so bear with me. I am trying to
write a very basic scraping tool. Basically it just grabs a page xy times
and tells me how long it took. When I do this once, it is blazingly fast,
but when I increase the number of repetitions, it slows down considerably
(1 request is like 3 ms, 100 take 6 seconds). I have done implementations
in a couple more languages (PHP, Ruby) and none of them seems to suffer
from a similar problem; they behave linearly. Maybe it is a known issue in
urllib2, or I am simply using it badly. I am using Python 2.4.3, the
machine runs CentOS. The script is below. Thanks in advance.

import urllib2
from datetime import datetime

def application():
    start = datetime.now()
    req = urllib2.Request("http://127.0.0.1/gdc/about", None,
                          {'Accept': 'application/json'})
    for number in range(100):
        response = urllib2.urlopen(req)
    end = datetime.now()
    output = end - start
    print output

application()
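An editorial aside: one way to narrow a problem like this down is to time
each request separately rather than the whole batch, which shows whether
every request is equally slow or whether they degrade as the loop runs.
A minimal sketch along the lines of the script above (same hypothetical
local URL); this is not from the original thread:

import urllib2
from datetime import datetime

req = urllib2.Request("http://127.0.0.1/gdc/about", None,
                      {'Accept': 'application/json'})

for number in range(100):
    start = datetime.now()
    response = urllib2.urlopen(req)
    print number, datetime.now() - start   # elapsed time for this request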
Re: urllib2 slow for multiple requests
On May 13, 4:55 pm, cgoldberg wrote:
> > Basically it just grabs a page xy
> > times and tells me how long it took.
>
> you aren't doing a read(), so technically you are just connecting to
> the web server and sending the request but never reading the content
> back from the socket. So your timing wouldn't be accurate.
>
> try this instead:
> response = urllib2.urlopen(req).read()
>
> But that is not the problem you are describing...

Thanks for this pointer, didn't come to my mind.

> > when I increase the number of repetitions, it is
> > slowing down considerably (1 is like 3 ms, 100 takes 6 seconds).
> > Maybe it is a known issue in urllib2
>
> I ran your code and can not reproduce that behavior. No matter how
> many repetitions, I still get a similar response time per transaction.
>
> any more details or code samples you can provide?

I don't know. I have tried the program on my local Mac OS machine, where
I have several Python runtimes installed, and there is a huge difference
between the results under 2.6 and 2.4. So this might be the problem.
When run under 2.6 the results are comparable to PHP and better than
Ruby, which is what I expect.

The problem is that CentOS is running on the server and only 2.4 is
available. On which version did you run these tests?

Thanks

> -Corey Goldberg
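A small aside: when several runtimes are installed side by side, it can
help to have the script itself report which interpreter it runs under, so
the 2.4-versus-2.6 comparison is not muddied by picking up the wrong one:

import sys
print sys.version   # confirms whether this run is on 2.4.x or 2.6.x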
Re: urllib2 slow for multiple requests
One more thing: since I am stuck with 2.4 (and if this really is a 2.4
issue), is there some substitute for urllib2?

On May 14, 11:00 am, Tomas Svarovsky wrote:
> I don't know. I have tried the program on my local Mac OS machine, where
> I have several Python runtimes installed, and there is a huge difference
> between the results under 2.6 and 2.4. So this might be the problem.
> When run under 2.6 the results are comparable to PHP and better than
> Ruby, which is what I expect.
>
> The problem is that CentOS is running on the server and only 2.4 is
> available. On which version did you run these tests?
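For what it's worth, one candidate already present in the 2.4 standard
library is httplib, the lower-level module urllib2 is built on, which also
allows reusing a single connection across requests instead of reconnecting
each time. A rough sketch adapted to the script above (host, path, and
header taken from the original post; whether this actually avoids the 2.4
slowdown is untested here):

import httplib
from datetime import datetime

start = datetime.now()
conn = httplib.HTTPConnection("127.0.0.1")
for number in range(100):
    conn.request("GET", "/gdc/about",
                 headers={'Accept': 'application/json'})
    response = conn.getresponse()
    response.read()    # body must be fully read before reusing the connection
conn.close()
print datetime.now() - start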
Re: urllib2 slow for multiple requests
On May 14, 11:57 am, "Richard Brodie" wrote:
> "cgoldberg" wrote in message
> news:9ae58862-1cb2-4981-ae6a-0428c7684...@z5g2000vba.googlegroups.com...
>
> > you aren't doing a read(), so technically you are just connecting to
> > the web server and sending the request but never reading the content
> > back from the socket.
>
> > But that is not the problem you are describing...
>
> It might be, if the local server doesn't scale well enough to handle
> 100 concurrent requests.

This is a good point, but then it would manifest regardless of the
language used, AFAIK. And this is not the case; the Ruby and PHP
implementations work quite fine.

Thanks for the reply
Re: urllib2 slow for multiple requests
On May 14, 6:33 pm, "Richard Brodie" wrote:
> "Tomas Svarovsky" wrote in message
> news:747b0d4f-f9fd-4fa6-bb6d-0a4365f32...@b1g2000vbc.googlegroups.com...
>
> > This is a good point, but then it would manifest regardless of the
> > language used, AFAIK. And this is not the case; the Ruby and PHP
> > implementations work quite fine.
>
> What I meant was: not reading the data and leaving the connection
> open is going to force the server to handle all 100 requests
> concurrently. I'm guessing that's not what your other implementations
> do. What happens to the timing if you call response.read(),
> response.close()?

Now I get it. But nevertheless, even when I explicitly read from the
socket and then close it properly, the timing still doesn't change.

Thanks for the advice though
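For reference, the change Richard is suggesting amounts to something like
this inside the original loop (a sketch; req is the Request object from
the original script):

for number in range(100):
    response = urllib2.urlopen(req)
    response.read()    # pull the whole body off the socket
    response.close()   # release the connection so requests don't pile up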