Re: urllib2 performance on windows, usb connection

dq Sat, 07 Feb 2009 01:41:26 -0800

MRAB wrote:

dq wrote:
MRAB wrote:
dq wrote:
dq wrote:
MRAB wrote:
dq wrote:
Martin v. Löwis wrote:
So does anyone know what the deal is with this? Why is the same code so much slower on Windows? Hope someone can tell me before a holy war erupts :-)
Only the holy war can give an answer here. It certainly has *nothing* to do with Python; Python calls the operating system functions to read from the network and write to the disk almost directly. So it must be the operating system itself that slows it down.
To investigate further, you might drop the write operation and measure only source.read(). If that alone is still slow, then, for some reason, the network speed is bad on Windows. Maybe you have the network interfaces misconfigured? Maybe you are using wireless on Windows, but cable on Linux? Maybe you have some network filtering software running on Windows? Maybe it's just that Windows sucks?-)
If the network read speed is fine, but writing slows down, I ask the same questions. Perhaps you have some virus scanner installed that filters all write operations? Maybe Windows sucks?

Regards, Martin
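Martin's suggestion of timing the read on its own, then read plus write, can be sketched roughly as below. This is written for Python 3, not the Python 2 / urllib2 of the thread; `io.BytesIO` stands in for the urllib2 response object, and `time_copy` and `make_source` are hypothetical names used only for illustration. On the real setup, `out_path` would point at a file on the iPod instead of a temporary directory.

```python
import io
import os
import tempfile
import time


def time_copy(make_source, target_path=None, blocksize=4096):
    """Read a stream to EOF in blocksize chunks, optionally writing it out.

    Returns (bytes_read, elapsed_seconds). With target_path=None only the
    read side is timed, which isolates the network half of the job.
    """
    source = make_source()
    target = open(target_path, "wb") if target_path else None
    total = 0
    start = time.perf_counter()
    try:
        while True:
            data = source.read(blocksize)
            if not data:  # an empty bytes object signals EOF
                break
            if target is not None:
                target.write(data)
            total += len(data)
    finally:
        if target is not None:
            target.close()
        source.close()
    return total, time.perf_counter() - start


payload = os.urandom(2 * 1024 * 1024)  # 2 MB of dummy data


def make_source():
    # Hypothetical stand-in for urllib2.urlopen(...): an in-memory stream.
    return io.BytesIO(payload)


read_only = time_copy(make_source)
with tempfile.TemporaryDirectory() as d:
    out_path = os.path.join(d, "out.bin")  # on the real setup: a path on the iPod
    read_write = time_copy(make_source, out_path)
    written_size = os.path.getsize(out_path)

print("read only:    %d bytes in %.3f s" % read_only)
print("read + write: %d bytes in %.3f s" % read_write)
```

If the read-only number is fast and the read-plus-write number is slow, the bottleneck is on the write side, which is what the experiments below ended up pointing to.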
Thanks for the ideas, Martin. I ran a couple of experiments to find the culprit, by downloading the same 20 MB file from the same fast server. I compared:

1. DL to HD vs. USB iPod
2. AV on-access protection on vs. off
3. "source.read()" only vs. "file.write( source.read() )"
The culprit is definitely the write speed on the iPod. That is, everything runs plenty fast (~1 MB/s down) as long as I'm not writing directly to the iPod. This is kind of odd, because if I copy the file over from the HD to the iPod using Windows (drag-n-drop), it takes about a second or two, so about 10 MB/s.
So the problem is definitely partially Windows, but it also seems that Python's file.write() function is not without blame. It's the combination of Windows, iPod and Python's data stream that is slowing me down.
I'm not really sure what I can do about this. I'll experiment a little more and see if there's any way around this bottleneck. If anyone has run into a problem like this, I'd love to hear about it...
You could try copying the file to the iPod using the command line, or copying data from disk to iPod in, say, C, anything but Python. This would allow you to identify whether Python itself has anything to do with it.
Well, I think I've partially identified the problem. target.write( source.read() ) runs perfectly fast, copying 20 megs from HD to iPod in about a second. However, if I run the same code in a while loop, using a certain block size, say target.write( source.read(4096) ), it takes forever (or at least I'm still timing it while I write this post).
The mismatch seems to be between urllib2's block size and the write speed of the iPod. I might try to tweak this a little in the code and see if it has any effect.
Oh, there we go: 20 megs in 135.8 seconds. Yeah... I might want to try to improve that...
After some tweaking of the block size, I managed to get the DL speed up to about 900 Mb/s. It's still not quite Ubuntu, but it's a good order of magnitude better. The new DL code is pretty much this:
""" blocksize = 2 ** 16 # plus or minus a power of 2 source =urllib2.urlopen( 'url://string' ) target = open( pathname, 'wb')fullsize = float( source.info()['Content-Length'] ) DLd = 0 whileDLd < fullsize: DLd = DLd + blocksize # optional: write some DLprogress info # somewhere, e.g. stdout target.close() source.close()"""
I'd like to suggest that the block size you add to 'DLd' be the actual size of the returned block, just in case the read() doesn't return all you asked for (it might not be guaranteed, and the chances are that the final block will be shorter, unless 'fullsize' happens to be a multiple of 'blocksize').

If less is returned by read(), then the while-loop might finish before all the data has been downloaded, and if you just add 'blocksize' each time it might end up > 'fullsize', i.e. apparently >100% downloaded!
Interesting. I'll check to see if any of the downloaded files end prematurely :)
btw, I forgot the most important line of the code!

"""
blocksize = 2 ** 16    # plus or minus a power of 2
source = urllib2.urlopen( 'url://string' )
target = open( pathname, 'wb')
fullsize = float( source.info()['Content-Length'] )
DLd = 0
while DLd < fullsize:
    #  +++
    target.write( source.read( blocksize ) )  # +++
    #  +++
    DLd = DLd + blocksize
    # optional:  write some DL progress info
    # somewhere, e.g. stdout
target.close()
source.close()
"""
Using that, I'm not quite sure where I can grab onto the value of how much was actually read from the block. I suppose I could use an intermediate variable, read the data into it, measure the size, and then write it to the file stream, but I'm not sure it would be worth the overhead. Or is there some other magic I should know about?
If I start to get that problem, at least I'll know where to look...
It's just:

    data = source.read(blocksize)
    target.write(data)
    DLd = DLd + len(data)

The overhead is tiny because you're not copying the data.
If 'x' refers to a 1MB bytestring and you do "y = x" or "foo(x)", you're not actually copying that bytestring; you're just making 'y' also refer to it or passing the reference to it into 'foo'. It's a bit like passing pointers around, but without the nasty bits! :-)
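The "no copy" point can be checked directly with the `is` operator, which tests object identity. In this small Python 3 sketch, `foo` is the hypothetical function from the sentence above; both checks print True because every name refers to the same 1 MB object.

```python
x = b"\x00" * (1024 * 1024)  # a 1 MB bytestring


def foo(data):
    # The parameter is just another name bound to the caller's object.
    return data


y = x  # binds a second name to the same object; no bytes are copied
print(y is x)       # True
print(foo(x) is x)  # True
```

This is why `data = source.read(blocksize)` followed by `target.write(data)` costs essentially nothing extra over the one-liner: the intermediate name only holds a reference.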

Yeah, that's about what I was thinking, although not quite as succinctly. Thanks for the help!

--
http://mail.python.org/mailman/listinfo/python-list
