On Mon, Jul 28, 2008 at 8:04 PM, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> On Jul 28, 3:52 pm, "Guilherme Polo" <[EMAIL PROTECTED]> wrote:
>> On Mon, Jul 28, 2008 at 7:43 PM, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
>> > On Jul 28, 3:33 pm, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote:
>> >> On Jul 28, 3:29 pm, "Diez B. Roggisch" <[EMAIL PROTECTED]> wrote:
>> >>
>> >> > [EMAIL PROTECTED] wrote:
>> >>
>> >> > > On Jul 28, 3:00 pm, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote:
>> >> > >> Hi - experienced programmer but this is my first Python program.
>> >>
>> >> > >> This URL will retrieve an excel spreadsheet containing (that day's)
>> >> > >> msci stock index returns.
>> >>
>> >> > >> http://www.mscibarra.com/webapp/indexperf/excel?priceLevel=0&scope=0&...
>> >>
>> >> > >> Want to write python to download and save the file.
>> >>
>> >> > >> So far I've arrived at this:
>> >>
>> >> > >> [quote]
>> >> > >> # import pdb
>> >> > >> import urllib2
>> >> > >> from win32com.client import Dispatch
>> >>
>> >> > >> xlApp = Dispatch("Excel.Application")
>> >>
>> >> > >> # test 1
>> >> > >> # xlApp.Workbooks.Add()
>> >> > >> # xlApp.ActiveSheet.Cells(1,1).Value = 'A'
>> >> > >> # xlApp.ActiveWorkbook.ActiveSheet.Cells(2,1).Value = 'B'
>> >> > >> # xlBook = xlApp.ActiveWorkbook
>> >> > >> # xlBook.SaveAs(Filename='C:\\test.xls')
>> >>
>> >> > >> # pdb.set_trace()
>> >> > >> response =
>> >> > >> urllib2.urlopen('http://www.mscibarra.com/webapp/indexperf/
>> >> > >> excel?
>> >> > >> priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul
>> >> > >> +25%2C+2008&export=Excel_IEIPerfRegional')
>> >> > >> # test 2 - returns check = False
>> >> > >> check_for_data = urllib2.Request('http://www.mscibarra.com/webapp/
>> >> > >> indexperf/excel?
>> >> > >> priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul
>> >> > >> +25%2C+2008&export=Excel_IEIPerfRegional').has_data()
>> >>
>> >> > >> xlApp = response.fp
>> >> > >> print(response.fp.name)
>> >> > >> print(xlApp.name)
>> >> > >> xlApp.write
>> >> > >> xlApp.Close
>> >> > >> [/quote]
>> >>
>> >> > > Whoops, hit Send when I wanted Preview. Looks like the html [quote] tag
>> >> > > doesn't work from groups.google.com (nice).
>> >>
>> >> > > Anyway, in test 1 above, I determined how to instantiate an excel
>> >> > > object; put some stuff in it; then save to disk.
>> >>
>> >> > > So, in theory, I'm retrieving my excel spreadsheet with
>> >>
>> >> > > response = urllib2.urlopen()
>> >>
>> >> > > Except what then do I do with this?
>> >>
>> >> > > Well, for one, I read some of the urllib2 documentation and found the
>> >> > > Request class with the method has_data() on it. It returns False.
>> >> > > Hmm that's not encouraging.
>> >>
>> >> > > I suppose the trick is to understand what urllib2.urlopen is returning
>> >> > > to me; rummage around in there; and hopefully find my excel file.
>> >>
>> >> > > I use pdb to debug. This is interesting:
>> >>
>> >> > > (Pdb) dir(response)
>> >> > > ['__doc__', '__init__', '__iter__', '__module__', '__repr__', 'close',
>> >> > > 'code', 'fileno', 'fp', 'geturl', 'headers', 'info', 'msg', 'next',
>> >> > > 'read', 'readline', 'readlines', 'url']
>> >> > > (Pdb)
>> >>
>> >> > > I suppose the members with __*__ are methods; and the names without the
>> >> > > underbars are attributes (variables) (?).
>> >>
>> >> > No, these are the names of all attributes and methods. read is a method,
>> >> > for example.
>> >>
>> >> right - I got it backwards.
>> >>
>> >> > > Or maybe this isn't at all the right direction to take (maybe there
>> >> > > are much better modules to do this stuff). Would be happy to learn if
>> >> > > that's the case (and if that gets the job done for me).
>> >>
>> >> > The docs (http://docs.python.org/lib/module-urllib2.html) are pretty
>> >> > clear on this:
>> >>
>> >> > """
>> >> > This function returns a file-like object with two additional methods:
>> >> > """
>> >>
>> >> > And then for file-like objects:
>> >>
>> >> > http://docs.python.org/lib/bltin-file-objects.html
>> >>
>> >> > """
>> >> > read( [size])
>> >> > Read at most size bytes from the file (less if the read hits EOF
>> >> > before obtaining size bytes). If the size argument is negative or
>> >> > omitted, read all data until EOF is reached. The bytes are returned as a
>> >> > string object. An empty string is returned when EOF is encountered
>> >> > immediately. (For certain files, like ttys, it makes sense to continue
>> >> > reading after an EOF is hit.) Note that this method may call the
>> >> > underlying C function fread() more than once in an effort to acquire as
>> >> > close to size bytes as possible. Also note that when in non-blocking
>> >> > mode, less data than what was requested may be returned, even if no size
>> >> > parameter was given.
>> >> > """
>> >>
>> >> > Diez
>> >>
>> >> Just stumbled upon .read:
>> >>
>> >> response = urllib2.urlopen('http://www.mscibarra.com/webapp/indexperf/
>> >> excel?
>> >> priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul
>> >> +25%2C+2008&export=Excel_IEIPerfRegional').read
>> >>
>> >> Now the question is: what to do with this? I'll look at the
>> >> documentation that you point to.
>> >>
>> >> thanx - pat
>>
>> > Or rather (next iteration):
>>
>> > response = urllib2.urlopen('http://www.mscibarra.com/webapp/indexperf/
>> > excel?
>> > priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul
>> > +25%2C+2008&export=Excel_IEIPerfRegional').read(1000000)
>>
>> > The file is generally something like 26 KB so specifying 1,000,000
>> > seems like a good idea (first approximation).
>>
>> > And then when I do:
>>
>> > print(response)
>>
>> > I get a whole lot of garbage (and some non-garbage), so I know I'm
>> > onto something.
>>
>> > When I read the .read documentation further, it says that read() has
>> > returned the data as a string object. Now - how do I convince Python
>> > that the string object is in fact an excel file - and save it to disk?
>>
>> You don't need to convince Python, just write it to a file.
>> More reading for you: http://docs.python.org/tut/node9.html
>>
>> > pat
>> > --
>> > http://mail.python.org/mailman/listinfo/python-list
>>
>> --
>> -- Guilherme H. Polo Goncalves
>
> OK:
>
> response = urllib2.urlopen('http://www.mscibarra.com/webapp/indexperf/
> excel?
> priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul
> +25%2C+2008&export=Excel_IEIPerfRegional').read(1000000)
> # print(response)
> f = open("c:\\msci.xls",'w')
> f.write(response)

I would initially change that to:

response = urllib2.urlopen('http://www.mscibarra.com/webapp/indexperf/excel?priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul+25%2C+2008&export=Excel_IEIPerfRegional')
f = open("c:\\msci.xls", "wb")
for line in response:
    f.write(line)
f.close()

and then..

>
> OK this makes the file, and there's a c:\msci.xls in place and it's
> about the right size. But whether I make the second param to open 'w'
> or 'wb', when I try to open msci.xls from the Windows file explorer,
> excel tells me that the file is corrupted.

try it.

>
> pat
> --
> http://mail.python.org/mailman/listinfo/python-list
>

--
-- Guilherme H. Polo Goncalves
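
For anyone reading this thread on a newer interpreter, here is a minimal sketch of the same idea, assuming Python 3 (where urllib2 was folded into urllib.request). The helper name save_binary and the chunk_size parameter are made up for illustration and are not from the thread. It copies any file-like object to disk in binary mode, relying on the read() behaviour quoted above: read(size) returns at most size bytes, and an empty result means EOF.

```python
import io
import os
import tempfile


def save_binary(response, path, chunk_size=64 * 1024):
    """Copy a file-like object to `path` in binary mode.

    `response` only needs a read(size) method, so this should work with
    the object returned by urllib.request.urlopen() (Python 3 assumed)
    as well as with an in-memory io.BytesIO. Returns the byte count.
    """
    total = 0
    # "wb" writes the bytes exactly as received, with no newline translation.
    with open(path, "wb") as out:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:  # empty result signals EOF
                break
            out.write(chunk)
            total += len(chunk)
    return total


if __name__ == "__main__":
    # In-memory stand-in for a download, so the sketch runs without network.
    fake_download = io.BytesIO(b"\xd0\xcf\x11\xe0" + b"\r\n\x00\n" * 1000)
    target = os.path.join(tempfile.mkdtemp(), "msci.xls")
    written = save_binary(fake_download, target)
    print(written, os.path.getsize(target))
    # Against a real URL (not executed here), usage would look like:
    #   import urllib.request
    #   with urllib.request.urlopen(url) as response:
    #       save_binary(response, "msci.xls")
```

Reading in fixed-size chunks rather than line by line is a design choice for binary data, where "lines" have no meaning. If the saved file still will not open, comparing its first bytes against what the browser downloads is a reasonable next check.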