Re: Download excel file from web?

[EMAIL PROTECTED] Mon, 28 Jul 2008 16:22:21 -0700

On Jul 28, 4:04 pm, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote:
> On Jul 28, 3:52 pm, "Guilherme Polo" <[EMAIL PROTECTED]> wrote:
>
>
>
> > On Mon, Jul 28, 2008 at 7:43 PM, [EMAIL PROTECTED] <[EMAIL PROTECTED]> 
> > wrote:
> > > On Jul 28, 3:33 pm, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote:
> > >> On Jul 28, 3:29 pm, "Diez B. Roggisch" <[EMAIL PROTECTED]> wrote:
>
> > >> > [EMAIL PROTECTED] schrieb:
>
> > >> > > On Jul 28, 3:00 pm, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote:
> > >> > >> Hi - experienced programmer but this is my first Python program.
>
> > >> > >> This URL will retrieve an excel spreadsheet containing (that day's)
> > >> > >> msci stock index returns.
>
> > >> > >>http://www.mscibarra.com/webapp/indexperf/excel?priceLevel=0&scope=0&;...
>
> > >> > >> Want to write python to download and save the file.
>
> > >> > >> So far I've arrived at this:
>
> > >> > >> [quote]
> > >> > >> # import pdb
> > >> > >> import urllib2
> > >> > >> from win32com.client import Dispatch
>
> > >> > >> xlApp = Dispatch("Excel.Application")
>
> > >> > >> # test 1
> > >> > >> # xlApp.Workbooks.Add()
> > >> > >> # xlApp.ActiveSheet.Cells(1,1).Value = 'A'
> > >> > >> # xlApp.ActiveWorkbook.ActiveSheet.Cells(2,1).Value = 'B'
> > >> > >> # xlBook = xlApp.ActiveWorkbook
> > >> > >> # xlBook.SaveAs(Filename='C:\\test.xls')
>
> > >> > >> # pdb.set_trace()
> > >> > >> response = 
> > >> > >> urllib2.urlopen('http://www.mscibarra.com/webapp/indexperf/
> > >> > >> excel?
> > >> > >> priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul
> > >> > >> +25%2C+2008&export=Excel_IEIPerfRegional')
> > >> > >> # test 2 - returns check = False
> > >> > >> check_for_data = urllib2.Request('http://www.mscibarra.com/webapp/
> > >> > >> indexperf/excel?
> > >> > >> priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul
> > >> > >> +25%2C+2008&export=Excel_IEIPerfRegional').has_data()
>
> > >> > >> xlApp = response.fp
> > >> > >> print(response.fp.name)
> > >> > >> print(xlApp.name)
> > >> > >> xlApp.write
> > >> > >> xlApp.Close
> > >> > >> [/quote]
>
> > >> > > Whoops, hit Send when I wanted Preview.  Looks like the html [quote]
> > >> > > tag doesn't work from groups.google.com (nice).
>
> > >> > > Anyway, in test 1 above, I determined how to instantiate an excel
> > >> > > object; put some stuff in it; then save to disk.
>
> > >> > > So, in theory, I'm retrieving my excel spreadsheet with
>
> > >> > > response = urllib2.urlopen()
>
> > >> > > Except what then do I do with this?
>
> > >> > > Well for one read some of the urllib2 documentation and found the
> > >> > > Request class with the method has_data() on it.  It returns False.
> > >> > > Hmm that's not encouraging.
>
> > >> > > I suppose the trick is to understand what urllib2.urlopen is
> > >> > > returning to me, rummage around in there, and hopefully find my
> > >> > > excel file.
>
> > >> > > I use pdb to debug.  This is interesting:
>
> > >> > > (Pdb) dir(response)
> > >> > > ['__doc__', '__init__', '__iter__', '__module__', '__repr__',
> > >> > > 'close', 'code', 'fileno', 'fp', 'geturl', 'headers', 'info', 'msg',
> > >> > > 'next', 'read', 'readline', 'readlines', 'url']
> > >> > > (Pdb)
>
> > >> > > I suppose the members with __*__ are methods; and the names without
> > >> > > the underbars are attributes (variables) (?).
>
> > >> > No, these are the names of all attributes and methods. read is a 
> > >> > method,
> > >> > for example.
>
> > >> right - I got it backwards.
>
> > >> > > Or maybe this isn't at all the right direction to take (maybe there
> > >> > > are much better modules to do this stuff).  Would be happy to learn 
> > >> > > if
> > >> > > that's the case (and if that gets the job done for me).
>
> > >> > The docs (http://docs.python.org/lib/module-urllib2.html) are pretty
> > >> > clear on this:
>
> > >> > """
> > >> > This function returns a file-like object with two additional methods:
> > >> > """
>
> > >> > And then for file-like objects:
>
> > >> >http://docs.python.org/lib/bltin-file-objects.html
>
> > >> > """
> > >> > read(   [size])
> > >> >      Read at most size bytes from the file (less if the read hits EOF
> > >> > before obtaining size bytes). If the size argument is negative or
> > >> > omitted, read all data until EOF is reached. The bytes are returned as 
> > >> > a
> > >> > string object. An empty string is returned when EOF is encountered
> > >> > immediately. (For certain files, like ttys, it makes sense to continue
> > >> > reading after an EOF is hit.) Note that this method may call the
> > >> > underlying C function fread() more than once in an effort to acquire as
> > >> > close to size bytes as possible. Also note that when in non-blocking
> > >> > mode, less data than what was requested may be returned, even if no 
> > >> > size
> > >> > parameter was given.
> > >> > """
>
> > >> > Diez
>
> > >> Just stumbled upon .read:
>
> > >> response = urllib2.urlopen('http://www.mscibarra.com/webapp/indexperf/
> > >> excel?
> > >> priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul
> > >> +25%2C+2008&export=Excel_IEIPerfRegional').read
>
> > >> Now the question is: what to do with this?  I'll look at the
> > >> documentation that you point to.
>
> > >> thanx - pat
>
> > > Or rather (next iteration):
>
> > > response = urllib2.urlopen('http://www.mscibarra.com/webapp/indexperf/
> > > excel?
> > > priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul
> > > +25%2C+2008&export=Excel_IEIPerfRegional').read(1000000)
>
> > > The file is generally something like 26 KB so specifying 1,000,000
> > > seems like a good idea (first approximation).
>
> > > And then when I do:
>
> > > print(response)
>
> > > I get a whole lot of garbage (and some non-garbage), so I know I'm
> > > onto something.
>
> > > When I read the .read documentation further, it says that read() has
> > > returned the data as a string object.  Now - how do I convince Python
> > > that the string object is in fact an excel file - and save it to disk?
>
> > You don't need to convince Python, just write it to a file.
> > More reading for you:http://docs.python.org/tut/node9.html
>
> > > pat
> > > --
> > >http://mail.python.org/mailman/listinfo/python-list
>
> > --
> > -- Guilherme H. Polo Goncalves
>
> OK:
>
> response = urllib2.urlopen('http://www.mscibarra.com/webapp/indexperf/
> excel?
> priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul
> +25%2C+2008&export=Excel_IEIPerfRegional').read(1000000)
> # print(response)
> f = open("c:\\msci.xls",'w')
> f.write(response)
>
> OK this makes the file, and there's a c:\msci.xls in place and it's
> about the right size. But whether I make the second param to open 'w'
> or 'wb', when I try to open msci.xls from the Windows file explorer,
> excel tells me that the file is corrupted.
>
> pat


Nope - must have been stumbling over my own feet.

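'wb' _is_ necessary (as I would expect): on Windows, text mode ('w')
rewrites each \n byte as \r\n on the way to disk, which corrupts binary
data such as an .xls file.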
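
A minimal sketch of why the mode matters, written for Python 3 rather
than the Python 2 used in this thread. The payload bytes and file names
are made up for illustration, and newline="\r\n" is used only to imitate
Windows text-mode behaviour on any platform:

```python
import os
import tempfile

# Made-up stand-in for downloaded spreadsheet bytes; not real .xls data.
payload = b"\xd0\xcf\x11\xe0 row1\n row2\n"

tmpdir = tempfile.mkdtemp()
text_path = os.path.join(tmpdir, "text_mode.bin")
binary_path = os.path.join(tmpdir, "binary_mode.bin")

# Text mode with newline="\r\n" imitates Windows text-mode translation:
# every "\n" written becomes "\r\n" on disk.
with open(text_path, "w", encoding="latin-1", newline="\r\n") as f:
    f.write(payload.decode("latin-1"))

# Binary mode writes the bytes exactly as given.
with open(binary_path, "wb") as f:
    f.write(payload)

with open(text_path, "rb") as f:
    text_bytes = f.read()
with open(binary_path, "rb") as f:
    binary_bytes = f.read()

print(binary_bytes == payload)         # True: byte-for-byte copy
print(len(text_bytes) - len(payload))  # 2: one extra "\r" per "\n"
```

Two extra bytes are enough to shift everything after them, which is the
kind of damage that makes Excel report a corrupted file.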

So it works:

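import urllib2

# The URL is one string; adjacent literals are joined by Python, which
# keeps the mail client from wrapping it in the middle.
url = ('http://www.mscibarra.com/webapp/indexperf/excel?'
       'priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897'
       '&asOf=Jul+25%2C+2008&export=Excel_IEIPerfRegional')

# pdb.set_trace()
response = urllib2.urlopen(url).read(1000000)  # byte string, at most 1,000,000 bytes
# print(response)
f = open("c:\\msci.xls", 'wb')  # binary mode: bytes are written unchanged
f.write(response)
f.flush()  # note the parentheses: a bare f.flush only names the method
f.close()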

I know the f.flush() is redundant here - f.close() already flushes the
buffered contents to disk before closing the file.  So I can just take
out the f.flush().
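
For anyone reading this on Python 3, where urllib2 became
urllib.request, the same idea can be sketched as below. This is my own
illustration and not code from the thread: the helper name download()
is hypothetical, and shutil.copyfileobj is swapped in for the single
read(1000000) call so the size does not have to be guessed.

```python
import shutil
import urllib.request


def download(url, path):
    """Stream the body of `url` into `path` without altering any bytes."""
    # Both context managers close (and therefore flush) on exit,
    # so no explicit flush() or close() call is needed.
    with urllib.request.urlopen(url) as response, open(path, "wb") as out:
        shutil.copyfileobj(response, out)
```

It would be called as download(url, "c:\\msci.xls"), with url set to the
MSCI address used above.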

Thanx for the help.

pat
--
http://mail.python.org/mailman/listinfo/python-list
