On Jul 28, 6:05 pm, "Guilherme Polo" <[EMAIL PROTECTED]> wrote:
> On Mon, Jul 28, 2008 at 9:39 PM, MRAB <[EMAIL PROTECTED]> wrote:
> > On Jul 29, 12:41 am, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote:
> >> On Jul 28, 4:20 pm, "Guilherme Polo" <[EMAIL PROTECTED]> wrote:
>
> >> > On Mon, Jul 28, 2008 at 8:04 PM, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> >> > > On Jul 28, 3:52 pm, "Guilherme Polo" <[EMAIL PROTECTED]> wrote:
> >> > >> On Mon, Jul 28, 2008 at 7:43 PM, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> >> > >> > On Jul 28, 3:33 pm, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote:
> >> > >> >> On Jul 28, 3:29 pm, "Diez B. Roggisch" <[EMAIL PROTECTED]> wrote:
>
> >> > >> >> > [EMAIL PROTECTED] wrote:
>
> >> > >> >> > > On Jul 28, 3:00 pm, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote:
> >> > >> >> > >> Hi - experienced programmer but this is my first Python program.
>
> >> > >> >> > >> This URL will retrieve an excel spreadsheet containing (that day's)
> >> > >> >> > >> msci stock index returns.
>
> >> > >> >> > >> http://www.mscibarra.com/webapp/indexperf/excel?priceLevel=0&scope=0&...
>
> >> > >> >> > >> Want to write python to download and save the file.
>
> >> > >> >> > >> So far I've arrived at this:
>
> >> > >> >> > >> [quote]
> >> > >> >> > >> # import pdb
> >> > >> >> > >> import urllib2
> >> > >> >> > >> from win32com.client import Dispatch
>
> >> > >> >> > >> xlApp = Dispatch("Excel.Application")
>
> >> > >> >> > >> # test 1
> >> > >> >> > >> # xlApp.Workbooks.Add()
> >> > >> >> > >> # xlApp.ActiveSheet.Cells(1,1).Value = 'A'
> >> > >> >> > >> # xlApp.ActiveWorkbook.ActiveSheet.Cells(2,1).Value = 'B'
> >> > >> >> > >> # xlBook = xlApp.ActiveWorkbook
> >> > >> >> > >> # xlBook.SaveAs(Filename='C:\\test.xls')
>
> >> > >> >> > >> # pdb.set_trace()
> >> > >> >> > >> response = urllib2.urlopen('http://www.mscibarra.com/webapp/indexperf/excel?priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul+25%2C+2008&export=Excel_IEIPerfRegional')
> >> > >> >> > >> # test 2 - returns check = False
> >> > >> >> > >> check_for_data = urllib2.Request('http://www.mscibarra.com/webapp/indexperf/excel?priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul+25%2C+2008&export=Excel_IEIPerfRegional').has_data()
>
> >> > >> >> > >> xlApp = response.fp
> >> > >> >> > >> print(response.fp.name)
> >> > >> >> > >> print(xlApp.name)
> >> > >> >> > >> xlApp.write
> >> > >> >> > >> xlApp.Close
> >> > >> >> > >> [/quote]
>
> >> > >> >> > > Woops hit Send when I wanted Preview. Looks like the html [quote] tag
> >> > >> >> > > doesn't work from groups.google.com (nice).
>
> >> > >> >> > > Anyway, in test 1 above, I determined how to instantiate an excel
> >> > >> >> > > object; put some stuff in it; then save to disk.
>
> >> > >> >> > > So, in theory, I'm retrieving my excel spreadsheet with
>
> >> > >> >> > > response = urllib2.urlopen()
>
> >> > >> >> > > Except what then do I do with this?
>
> >> > >> >> > > Well for one read some of the urllib2 documentation and found the
> >> > >> >> > > Request class with the method has_data() on it. It returns False.
> >> > >> >> > > Hmm that's not encouraging.
>
> >> > >> >> > > I suppose the trick is to understand what urllib2.urlopen is returning
> >> > >> >> > > to me; rummage around in there; and hopefully find my excel file.
>
> >> > >> >> > > I use pdb to debug. This is interesting:
>
> >> > >> >> > > (Pdb) dir(response)
> >> > >> >> > > ['__doc__', '__init__', '__iter__', '__module__', '__repr__', 'close',
> >> > >> >> > > 'code', 'fileno', 'fp', 'geturl', 'headers', 'info', 'msg', 'next',
> >> > >> >> > > 'read', 'readline', 'readlines', 'url']
> >> > >> >> > > (Pdb)
>
> >> > >> >> > > I suppose the members with __*__ are methods; and the names without the
> >> > >> >> > > underbars are attributes (variables) (?).
>
> >> > >> >> > No, these are the names of all attributes and methods. read is a method,
> >> > >> >> > for example.
>
> >> > >> >> right - I got it backwards.
>
> >> > >> >> > > Or maybe this isn't at all the right direction to take (maybe there
> >> > >> >> > > are much better modules to do this stuff). Would be happy to learn if
> >> > >> >> > > that's the case (and if that gets the job done for me).
>
> >> > >> >> > The docs (http://docs.python.org/lib/module-urllib2.html) are pretty
> >> > >> >> > clear on this:
>
> >> > >> >> > """
> >> > >> >> > This function returns a file-like object with two additional methods:
> >> > >> >> > """
>
> >> > >> >> > And then for file-like objects:
>
> >> > >> >> > http://docs.python.org/lib/bltin-file-objects.html
>
> >> > >> >> > """
> >> > >> >> > read( [size])
> >> > >> >> > Read at most size bytes from the file (less if the read hits EOF
> >> > >> >> > before obtaining size bytes). If the size argument is negative or
> >> > >> >> > omitted, read all data until EOF is reached. The bytes are returned as a
> >> > >> >> > string object. An empty string is returned when EOF is encountered
> >> > >> >> > immediately. (For certain files, like ttys, it makes sense to continue
> >> > >> >> > reading after an EOF is hit.) Note that this method may call the
> >> > >> >> > underlying C function fread() more than once in an effort to acquire as
> >> > >> >> > close to size bytes as possible. Also note that when in non-blocking
> >> > >> >> > mode, less data than what was requested may be returned, even if no size
> >> > >> >> > parameter was given.
> >> > >> >> > """
>
> >> > >> >> > Diez
>
> >> > >> >> Just stumbled upon .read:
>
> >> > >> >> response = urllib2.urlopen('http://www.mscibarra.com/webapp/indexperf/excel?priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul+25%2C+2008&export=Excel_IEIPerfRegional').read
>
> >> > >> >> Now the question is: what to do with this? I'll look at the
> >> > >> >> documentation that you point to.
>
> >> > >> >> thanx - pat
>
> >> > >> > Or rather (next iteration):
>
> >> > >> > response = urllib2.urlopen('http://www.mscibarra.com/webapp/indexperf/excel?priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul+25%2C+2008&export=Excel_IEIPerfRegional').read(1000000)
>
> >> > >> > The file is generally something like 26 KB so specifying 1,000,000
> >> > >> > seems like a good idea (first approximation).
>
> >> > >> > And then when I do:
>
> >> > >> > print(response)
>
> >> > >> > I get a whole lot of garbage (and some non-garbage), so I know I'm
> >> > >> > onto something.
>
> >> > >> > When I read the .read documentation further, it says that read() has
> >> > >> > returned the data as a string object. Now - how do I convince Python
> >> > >> > that the string object is in fact an excel file - and save it to disk?
>
> >> > >> You don't need to convince Python, just write it to a file.
> >> > >> More reading for you: http://docs.python.org/tut/node9.html
>
> >> > >> > pat
> >> > >> > --
> >> > >> > http://mail.python.org/mailman/listinfo/python-list
>
> >> > >> --
> >> > >> -- Guilherme H. Polo Goncalves
>
> >> > > OK:
>
> >> > > response = urllib2.urlopen('http://www.mscibarra.com/webapp/indexperf/excel?priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul+25%2C+2008&export=Excel_IEIPerfRegional').read(1000000)
> >> > > # print(response)
> >> > > f = open("c:\\msci.xls",'w')
> >> > > f.write(response)
>
> >> > I would initially change that to:
>
> >> > response = urllib2.urlopen('http://www.mscibarra.com/webapp/indexperf/excel?priceLevel=0&scope=0&...)
>
> >> > f = open("c:\\msci.xls", "wb")
> >> > for line in response:
> >> >     f.write(line)
> >> > f.close()
>
> >> > and then..
>
> >> > > OK this makes the file, and there's a c:\msci.xls in place and it's
> >> > > about the right size. But whether I make the second param to open 'w'
> >> > > or 'wb', when I try to open msci.xls from the Windows file explorer,
> >> > > excel tells me that the file is corrupted.
>
> >> > try it.
>
> >> > > pat
> >> > > --
> >> > > http://mail.python.org/mailman/listinfo/python-list
>
> >> > --
> >> > -- Guilherme H. Polo Goncalves
>
> >> A simple f.write(response) does work (click on a single row in Excel
> >> and you get a single row).
>
> >> But I can see that what you recommend Guilherme is probably safer -
> >> thanx.
>
> >> pat
>
> > If response contains a string then:
>
> Did you notice I removed the read(...) part ?
>
> > for line in response:
> >     f.write(line)
>
> > will actually be writing the string one character at a time!
> > --
> > http://mail.python.org/mailman/listinfo/python-list
>
> --
> -- Guilherme H. Polo Goncalves

Actually no, I didn't, Guilherme (although I'll take it out now).

Would leaving the .read() in, as in urllib2.urlopen().read(), imply, as MRAB seems to indicate, that the following for loop would act byte-by-byte? And if so, how? Even with the .read() in, it was very fast. But it looks like it won't hurt (and very possibly helps) to take it out.

pat
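
For reference, here is a minimal sketch of the difference MRAB points out, written for Python 3 and run against an in-memory stand-in rather than the real MSCI URL. The names payload, line_chunks, byte_chunks and save_stream, and the msci_demo.xls path, are made up for illustration. Looping over the file-like object yields one chunk per line, whereas looping over what .read() returned walks it one element at a time (1-character strings in Python 2, which the thread uses; the slicing below imitates that). Both routes write the same bytes, and since file writes are buffered by default, even thousands of 1-byte writes on a file of about 26 KB would be expected to finish quickly.

```python
import io
import os
import shutil
import tempfile

# Hypothetical stand-in for the downloaded bytes (not real MSCI data).
payload = b"row1\tA\r\nrow2\tB\r\nrow3\tC\r\n"

# Case 1: iterate the file-like object itself, as in Guilherme's version.
response = io.BytesIO(payload)  # plays the role of urlopen(...)'s return value
line_chunks = list(response)    # each item runs up to and including b"\n"

# Case 2: iterate the result of .read(), a single in-memory object.
data = io.BytesIO(payload).read()
# Slicing mimics Python 2, where iterating a str yields 1-character strings.
# (In Python 3, iterating bytes directly yields ints, which f.write rejects.)
byte_chunks = [data[i:i + 1] for i in range(len(data))]

print(len(line_chunks), "writes when looping over the response")
print(len(byte_chunks), "writes when looping over the .read() result")


def save_stream(source, path):
    """Copy a binary file-like object to path in fixed-size blocks."""
    with open(path, "wb") as out:  # "wb" avoids newline translation on Windows
        shutil.copyfileobj(source, out)


# Save the stand-in data and read it back to confirm nothing was altered.
target = os.path.join(tempfile.mkdtemp(), "msci_demo.xls")  # hypothetical path
save_stream(io.BytesIO(payload), target)
with open(target, "rb") as saved:
    saved_bytes = saved.read()
```

If the goal is only to save the download, passing the object returned by urllib.request.urlopen(url) to save_stream should work the same way in Python 3, though that call is not exercised here.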