Re: Download excel file from web?

Guilherme Polo Tue, 29 Jul 2008 03:34:29 -0700

On Tue, Jul 29, 2008 at 1:47 AM, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> On Jul 28, 6:05 pm, "Guilherme Polo" <[EMAIL PROTECTED]> wrote:
>> On Mon, Jul 28, 2008 at 9:39 PM, MRAB <[EMAIL PROTECTED]> wrote:
>> > On Jul 29, 12:41 am, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote:
>> >> On Jul 28, 4:20 pm, "Guilherme Polo" <[EMAIL PROTECTED]> wrote:
>>
>> >> > On Mon, Jul 28, 2008 at 8:04 PM, [EMAIL PROTECTED] <[EMAIL PROTECTED]> 
>> >> > wrote:
>> >> > > On Jul 28, 3:52 pm, "Guilherme Polo" <[EMAIL PROTECTED]> wrote:
>> >> > >> On Mon, Jul 28, 2008 at 7:43 PM, [EMAIL PROTECTED] <[EMAIL 
>> >> > >> PROTECTED]> wrote:
>> >> > >> > On Jul 28, 3:33 pm, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote:
>> >> > >> >> On Jul 28, 3:29 pm, "Diez B. Roggisch" <[EMAIL PROTECTED]> wrote:
>>
>> >> > >> >> > [EMAIL PROTECTED] schrieb:
>>
>> >> > >> >> > > On Jul 28, 3:00 pm, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> 
>> >> > >> >> > > wrote:
>> >> > >> >> > >> Hi - experienced programmer but this is my first Python 
>> >> > >> >> > >> program.
>>
>> >> > >> >> > >> This URL will retrieve an excel spreadsheet containing (that 
>> >> > >> >> > >> day's)
>> >> > >> >> > >> msci stock index returns.
>>
>> >> > >> >> > >>http://www.mscibarra.com/webapp/indexperf/excel?priceLevel=0&scope=0&;...
>>
>> >> > >> >> > >> Want to write python to download and save the file.
>>
>> >> > >> >> > >> So far I've arrived at this:
>>
>> >> > >> >> > >> [quote]
>> >> > >> >> > >> # import pdb
>> >> > >> >> > >> import urllib2
>> >> > >> >> > >> from win32com.client import Dispatch
>>
>> >> > >> >> > >> xlApp = Dispatch("Excel.Application")
>>
>> >> > >> >> > >> # test 1
>> >> > >> >> > >> # xlApp.Workbooks.Add()
>> >> > >> >> > >> # xlApp.ActiveSheet.Cells(1,1).Value = 'A'
>> >> > >> >> > >> # xlApp.ActiveWorkbook.ActiveSheet.Cells(2,1).Value = 'B'
>> >> > >> >> > >> # xlBook = xlApp.ActiveWorkbook
>> >> > >> >> > >> # xlBook.SaveAs(Filename='C:\\test.xls')
>>
>> >> > >> >> > >> # pdb.set_trace()
>> >> > >> >> > >> response = 
>> >> > >> >> > >> urllib2.urlopen('http://www.mscibarra.com/webapp/indexperf/
>> >> > >> >> > >> excel?
>> >> > >> >> > >> priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul
>> >> > >> >> > >> +25%2C+2008&export=Excel_IEIPerfRegional')
>> >> > >> >> > >> # test 2 - returns check = False
>> >> > >> >> > >> check_for_data = 
>> >> > >> >> > >> urllib2.Request('http://www.mscibarra.com/webapp/
>> >> > >> >> > >> indexperf/excel?
>> >> > >> >> > >> priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul
>> >> > >> >> > >> +25%2C+2008&export=Excel_IEIPerfRegional').has_data()
>>
>> >> > >> >> > >> xlApp = response.fp
>> >> > >> >> > >> print(response.fp.name)
>> >> > >> >> > >> print(xlApp.name)
>> >> > >> >> > >> xlApp.write
>> >> > >> >> > >> xlApp.Close
>> >> > >> >> > >> [/quote]
>>
>> >> > >> >> > > Woops hit Send when I wanted Preview.  Looks like the html 
>> >> > >> >> > > [quote] tag
>> >> > >> >> > > doesn't work from groups.google.com (nice).
>>
>> >> > >> >> > > Anyway, in test 1 above, I determined how to instantiate an
>> >> > >> >> > > excel object, put some stuff in it, and then save it to disk.
>>
>> >> > >> >> > > So, in theory, I'm retrieving my excel spreadsheet with
>>
>> >> > >> >> > > response = urllib2.urlopen()
>>
>> >> > >> >> > > Except what then do I do with this?
>>
>> >> > >> >> > > Well, for one, I read some of the urllib2 documentation and
>> >> > >> >> > > found the Request class with the method has_data() on it.
>> >> > >> >> > > It returns False. Hmm, that's not encouraging.
>>
>> >> > >> >> > > I suppose the trick is to understand what urllib2.urlopen is
>> >> > >> >> > > returning to me, rummage around in there, and hopefully find
>> >> > >> >> > > my excel file.
>>
>> >> > >> >> > > I use pdb to debug.  This is interesting:
>>
>> >> > >> >> > > (Pdb) dir(response)
>> >> > >> >> > > ['__doc__', '__init__', '__iter__', '__module__', '__repr__', 
>> >> > >> >> > > 'close',
>> >> > >> >> > > 'code', '
>> >> > >> >> > > fileno', 'fp', 'geturl', 'headers', 'info', 'msg', 'next', 
>> >> > >> >> > > 'read',
>> >> > >> >> > > 'readline', '
>> >> > >> >> > > readlines', 'url']
>> >> > >> >> > > (Pdb)
>>
>> >> > >> >> > > I suppose the members with __*__ are methods, and the names
>> >> > >> >> > > without the underbars are attributes (variables) (?).
>>
>> >> > >> >> > No, these are the names of all attributes and methods. read is 
>> >> > >> >> > a method,
>> >> > >> >> > for example.
>>
>> >> > >> >> right - I got it backwards.
>>
>> >> > >> >> > > Or maybe this isn't at all the right direction to take (maybe 
>> >> > >> >> > > there
>> >> > >> >> > > are much better modules to do this stuff).  Would be happy to 
>> >> > >> >> > > learn if
>> >> > >> >> > > that's the case (and if that gets the job done for me).
>>
>> >> > >> >> > The docs (http://docs.python.org/lib/module-urllib2.html) are 
>> >> > >> >> > pretty
>> >> > >> >> > clear on this:
>>
>> >> > >> >> > """
>> >> > >> >> > This function returns a file-like object with two additional 
>> >> > >> >> > methods:
>> >> > >> >> > """
>>
>> >> > >> >> > And then for file-like objects:
>>
>> >> > >> >> >http://docs.python.org/lib/bltin-file-objects.html
>>
>> >> > >> >> > """
>> >> > >> >> > read(   [size])
>> >> > >> >> >      Read at most size bytes from the file (less if the read 
>> >> > >> >> > hits EOF
>> >> > >> >> > before obtaining size bytes). If the size argument is negative 
>> >> > >> >> > or
>> >> > >> >> > omitted, read all data until EOF is reached. The bytes are 
>> >> > >> >> > returned as a
>> >> > >> >> > string object. An empty string is returned when EOF is 
>> >> > >> >> > encountered
>> >> > >> >> > immediately. (For certain files, like ttys, it makes sense to 
>> >> > >> >> > continue
>> >> > >> >> > reading after an EOF is hit.) Note that this method may call the
>> >> > >> >> > underlying C function fread() more than once in an effort to 
>> >> > >> >> > acquire as
>> >> > >> >> > close to size bytes as possible. Also note that when in 
>> >> > >> >> > non-blocking
>> >> > >> >> > mode, less data than what was requested may be returned, even 
>> >> > >> >> > if no size
>> >> > >> >> > parameter was given.
>> >> > >> >> > """
>>
>> >> > >> >> > Diez
>>
>> >> > >> >> Just stumbled upon .read:
>>
>> >> > >> >> response = 
>> >> > >> >> urllib2.urlopen('http://www.mscibarra.com/webapp/indexperf/
>> >> > >> >> excel?
>> >> > >> >> priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul
>> >> > >> >> +25%2C+2008&export=Excel_IEIPerfRegional').read
>>
>> >> > >> >> Now the question is: what to do with this?  I'll look at the
>> >> > >> >> documentation that you point to.
>>
>> >> > >> >> thanx - pat
>>
>> >> > >> > Or rather (next iteration):
>>
>> >> > >> > response = 
>> >> > >> > urllib2.urlopen('http://www.mscibarra.com/webapp/indexperf/
>> >> > >> > excel?
>> >> > >> > priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul
>> >> > >> > +25%2C+2008&export=Excel_IEIPerfRegional').read(1000000)
>>
>> >> > >> > The file is generally something like 26 KB so specifying 1,000,000
>> >> > >> > seems like a good idea (first approximation).
>>
>> >> > >> > And then when I do:
>>
>> >> > >> > print(response)
>>
>> >> > >> > I get a whole lot of garbage (and some non-garbage), so I know I'm
>> >> > >> > onto something.
>>
>> >> > >> > When I read the .read documentation further, it says that read() 
>> >> > >> > has
>> >> > >> > returned the data as a string object.  Now - how do I convince 
>> >> > >> > Python
>> >> > >> > that the string object is in fact an excel file - and save it to 
>> >> > >> > disk?
>>
>> >> > >> You don't need to convince Python, just write it to a file.
>> >> > >> More reading for you:http://docs.python.org/tut/node9.html
>>
>> >> > >> > pat
>> >> > >> > --
>> >> > >> >http://mail.python.org/mailman/listinfo/python-list
>>
>> >> > >> --
>> >> > >> -- Guilherme H. Polo Goncalves
>>
>> >> > > OK:
>>
>> >> > > response = urllib2.urlopen('http://www.mscibarra.com/webapp/indexperf/
>> >> > > excel?
>> >> > > priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul
>> >> > > +25%2C+2008&export=Excel_IEIPerfRegional').read(1000000)
>> >> > > # print(response)
>> >> > > f = open("c:\\msci.xls",'w')
>> >> > > f.write(response)
>>
>> >> > I would initially change that to:
>>
>> >> > response = 
>> >> > urllib2.urlopen('http://www.mscibarra.com/webapp/indexperf/excel?priceLevel=0&scope=0&;...)
>>
>> >> > f = open("c:\\msci.xls", "wb")
>> >> > for line in response:
>> >> >     f.write(line)
>> >> > f.close()
>>
>> >> > and then..
>>
>> >> > > OK this makes the file, and there's a c:\msci.xls in place and it's
>> >> > > about the right size. But whether I make the second param to open 'w'
>> >> > > or 'wb', when I try to open msci.xls from the Windows file explorer,
>> >> > > excel tells me that the file is corrupted.
>>
>> >> > try it.
>>
>> >> > > pat
>> >> > > --
>> >> > >http://mail.python.org/mailman/listinfo/python-list
>>
>> >> > --
>> >> > -- Guilherme H. Polo Goncalves
>>
>> >> A simple f.write(response) does work (click on a single row in Excel
>> >> and you get a single row).
>>
>> >> But I can see that what you recommend Guilherme is probably safer -
>> >> thanx.
>>
>> >> pat
>>
>> > If response contains a string then:
>>
>> Did you notice I removed the read(...) part ?
>>
>> > for line in response:
>> >    f.write(line)
>>
>> > will actually be writing the string one character at a time!
>> > --
>> >http://mail.python.org/mailman/listinfo/python-list
>>
>> --
>> -- Guilherme H. Polo Goncalves
>
> Actually no I didn't Guilherme (although I'll take it out now).
>
> Would leaving the .read() in, as in urllib2.urlopen().read(), imply, as
> MRAB seems to indicate, that the following for loop would act byte-by-byte?
> And if so, how?


.read() returns a string, and iterating over a string yields one
character at a time, so yes.
The point of removing the .read(xxxxx) is that you no longer need to
guess how long the file is in order to read it entirely.
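
To make the difference concrete, here is a minimal sketch, not the exact
code from this thread. It is written for Python 3, where urllib2.urlopen
became urllib.request.urlopen. The helper name save_stream, the chunk size,
and the in-memory stand-in for the response are all assumptions made for
illustration, so the example runs without touching the network.

```python
import io
import os
import tempfile


def save_stream(response, path, chunk_size=64 * 1024):
    """Copy a binary file-like object to path; return the bytes written.

    save_stream is a hypothetical helper, not part of urllib.
    """
    written = 0
    # Binary mode: the payload is raw bytes. Text mode would translate
    # newlines on Windows (Python 2) or reject bytes outright (Python 3).
    with open(path, "wb") as out:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:  # an empty result means EOF
                break
            out.write(chunk)
            written += len(chunk)
    return written


# Iterating over a string yields single characters ...
chars = list("ab\ncd")
# ... while iterating over a file-like object yields whole lines.
lines = list(io.BytesIO(b"ab\ncd"))

# In-memory stand-in for a network response (made-up bytes, not real data).
fake_response = io.BytesIO(b"\xd0\xcf\x11\xe0" + b"\r\n\x00" * 1000)
target = os.path.join(tempfile.mkdtemp(), "msci.xls")
size = save_stream(fake_response, target)
```

With a real download, the object returned by urllib.request.urlopen(url)
would take the place of fake_response. Either way, no guess about the file
size is needed, because the loop simply stops at EOF.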

>
> Even with the .read() in, it was very fast.  But it looks like it
> won't hurt (and very possibly helps) to take it out.
>
> pat
> --
> http://mail.python.org/mailman/listinfo/python-list
>



-- 
-- Guilherme H. Polo Goncalves
--
http://mail.python.org/mailman/listinfo/python-list
