urllib2 slow for multiple requests
Hello everybody, really new to Python, so bear with me. I am trying to
write a very basic scraping tool. Basically it just grabs a page xy times
and tells me how long it took. When I do this once, it is blazingly fast,
but when I increase the number of repetitions, it slows down considerably
(1 request is like 3 ms, 100 take 6 seconds). I have done implementations
in a couple more languages (PHP, Ruby) and none of them seems to suffer
from a similar problem; they behave linearly. Maybe it is a known issue in
urllib2, or I am simply using it badly. I am using Python 2.4.3, the
machine runs CentOS. The script is below. Thanks in advance.

import urllib2
from datetime import datetime

def application():
    start = datetime.now()
    req = urllib2.Request("http://127.0.0.1/gdc/about", None,
                          {'Accept': 'application/json'})
    for number in range(100):
        response = urllib2.urlopen(req)
    end = datetime.now()
    output = end - start
    print output

application()
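An editorial aside: one way to narrow a problem like this down is to time
each request separately rather than the whole batch, which shows whether
every request is equally slow or whether they degrade as the loop runs.
A minimal sketch along the lines of the script above (same hypothetical
local URL); this is not from the original thread:

import urllib2
from datetime import datetime

req = urllib2.Request("http://127.0.0.1/gdc/about", None,
                      {'Accept': 'application/json'})

for number in range(100):
    start = datetime.now()
    response = urllib2.urlopen(req)
    print number, datetime.now() - start   # elapsed time for this request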
Re: urllib2 slow for multiple requests
On May 13, 4:55 pm, cgoldberg wrote:
> > Basically it just grabs a page xy
> > times and tells me how long it took.
>
> you aren't doing a read(), so technically you are just connecting to
> the web server and sending the request but never reading the content
> back from the socket. So your timing wouldn't be accurate.
>
> try this instead:
> response = urllib2.urlopen(req).read()
>
> But that is not the problem you are describing...

Thanks for this pointer, didn't come to my mind.

> > when I increase the number of repetitions, it is
> > slowing down considerably (1 is like 3 ms, 100 takes 6 seconds).
> > Maybe it is a known issue in urllib2
>
> I ran your code and can not reproduce that behavior. No matter how
> many repetitions, I still get a similar response time per transaction.
>
> any more details or code samples you can provide?

I don't know. I have tried the program on my local Mac OS machine, where
I have several Python runtimes installed, and there is a huge difference
between the results under 2.6 and 2.4. So this might be the problem.
When run under 2.6 the results are comparable to PHP and better than
Ruby, which is what I expect.

The problem is that CentOS is running on the server and only 2.4 is
available. On which version did you run these tests?

Thanks

> -Corey Goldberg
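A small aside: when several runtimes are installed side by side, it can
help to have the script itself report which interpreter it runs under, so
the 2.4-versus-2.6 comparison is not muddied by picking up the wrong one:

import sys
print sys.version   # confirms whether this run is on 2.4.x or 2.6.x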
Re: urllib2 slow for multiple requests
One more thing: since I am stuck with 2.4 (and if this really is a 2.4
issue), is there some substitute for urllib2?

On May 14, 11:00 am, Tomas Svarovsky wrote:
> I don't know. I have tried the program on my local Mac OS machine, where
> I have several Python runtimes installed, and there is a huge difference
> between the results under 2.6 and 2.4. So this might be the problem.
> When run under 2.6 the results are comparable to PHP and better than
> Ruby, which is what I expect.
>
> The problem is that CentOS is running on the server and only 2.4 is
> available. On which version did you run these tests?
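For what it's worth, one candidate already present in the 2.4 standard
library is httplib, the lower-level module urllib2 is built on, which also
allows reusing a single connection across requests instead of reconnecting
each time. A rough sketch adapted to the script above (host, path, and
header taken from the original post; whether this actually avoids the 2.4
slowdown is untested here):

import httplib
from datetime import datetime

start = datetime.now()
conn = httplib.HTTPConnection("127.0.0.1")
for number in range(100):
    conn.request("GET", "/gdc/about",
                 headers={'Accept': 'application/json'})
    response = conn.getresponse()
    response.read()    # body must be fully read before reusing the connection
conn.close()
print datetime.now() - start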
Re: urllib2 slow for multiple requests
On May 14, 11:57 am, "Richard Brodie" wrote:
> "cgoldberg" wrote in message
> news:9ae58862-1cb2-4981-ae6a-0428c7684...@z5g2000vba.googlegroups.com...
>
> > you aren't doing a read(), so technically you are just connecting to
> > the web server and sending the request but never reading the content
> > back from the socket.
>
> > But that is not the problem you are describing...
>
> It might be, if the local server doesn't scale well enough to handle
> 100 concurrent requests.

This is a good point, but then it would manifest regardless of the
language used, AFAIK. And this is not the case; the Ruby and PHP
implementations work quite fine.

Thanks for the reply
Re: urllib2 slow for multiple requests
On May 14, 6:33 pm, "Richard Brodie" wrote:
> "Tomas Svarovsky" wrote in message
> news:747b0d4f-f9fd-4fa6-bb6d-0a4365f32...@b1g2000vbc.googlegroups.com...
>
> > This is a good point, but then it would manifest regardless of the
> > language used, AFAIK. And this is not the case; the Ruby and PHP
> > implementations work quite fine.
>
> What I meant was: not reading the data and leaving the connection
> open is going to force the server to handle all 100 requests
> concurrently. I'm guessing that's not what your other implementations
> do. What happens to the timing if you call response.read(),
> response.close()?

Now I get it. But nevertheless, even when I explicitly read from the
socket and then close it properly, the timing still doesn't change.

Thanks for the advice though
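For reference, the change Richard is suggesting amounts to something like
this inside the original loop (a sketch; req is the Request object from
the original script):

for number in range(100):
    response = urllib2.urlopen(req)
    response.read()    # pull the whole body off the socket
    response.close()   # release the connection so requests don't pile up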