On Mon, 27 Jul 2009 19:40:25 -0300, John Yeung
<gallium.arsen...@gmail.com> wrote:

On Jul 27, 4:38 pm, erikcw <erikwickst...@gmail.com> wrote:
I'm trying to figure out how to download just the first few lines of a
large (50 MB) text file from a server, to save bandwidth.  Can Python do
this?

Something like the Python equivalent of curl http://url.com/file.xml |
head -c 2048

urllib.urlopen gives you a file-like object, which you can then read
line by line or in fixed-size chunks.  For example:

import urllib
chunk = urllib.urlopen('http://url.com/file.xml').read(2048)

At that point, chunk is just bytes, which you can write to a local
file, print, or whatever it is you want.
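If you literally want the first few *lines* rather than a byte count, you can
wrap the file-like object with itertools.islice. A minimal sketch (modern
Python 3 syntax; io.StringIO stands in for the object urlopen() returns, so it
runs without touching the network):

```python
import io
from itertools import islice

def head(fobj, n):
    """Return the first n lines from any file-like object."""
    return list(islice(fobj, n))

# io.StringIO plays the role of urlopen(...)'s file-like response here.
sample = io.StringIO("line1\nline2\nline3\nline4\n")
print(head(sample, 2))  # ['line1\n', 'line2\n']
```

Note that this only limits how much you *read*; depending on buffering, more
data may still come over the wire, which is where Range requests help.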

Since the OP wants to save bandwidth, it's better to ask the server for
exactly the amount of data needed. That is, add a Range header field [1] to
the request, and inspect the response for the corresponding Content-Range
header [2].

py> import urllib2
py> url = "http://www.python.org/"
py> req = urllib2.Request(url)
py> req.add_header('Range', 'bytes=0-10239')  # first 10K
py> f = urllib2.urlopen(req)
py> data = f.read()
py> print repr(data[-30:]), len(data)
'\t    <a href="http://www.zope.' 10240
py> f.headers['Content-Range']
'bytes 0-10239/18196'
py> f.getcode()
206            # 206=Partial Content
py> f.close()
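For anyone on Python 3: urllib2 became urllib.request, but the Range technique
is identical. A minimal sketch of the same request (the urlopen call is left
as a comment so the snippet runs without network access; the URL is the same
one used above):

```python
import urllib.request

url = "http://www.python.org/"
req = urllib.request.Request(url)
req.add_header("Range", "bytes=0-10239")  # first 10K, as above

# With network access you would then do:
#   f = urllib.request.urlopen(req)
#   data = f.read()                   # at most 10240 bytes
#   f.getcode()                       # 206 if the server honored the Range
#   f.headers["Content-Range"]        # e.g. 'bytes 0-10239/18196'
print(req.get_header("Range"))  # bytes=0-10239
```

Servers are free to ignore Range and return 200 with the full body, so it's
worth checking the status code before assuming you got a partial response.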

[1] http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.35

[2] http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.16

--
Gabriel Genellina
