practical limits of urlopen()

2009-01-24 Thread webcomm
Hi, Am I going to have problems if I use urlopen() in a loop to get data from 3000+ URLs? There will be about 2KB of data on average at each URL. I will probably run the script about twice per day. Data from each URL will be saved to my database. I'm asking because I've never opened that many
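
One way to loop over a few thousand URLs is sketched below. It is a minimal sketch, not a recommendation from the thread: `fetch_all`, its `opener`, `timeout` and `delay` parameters are hypothetical names, and it uses Python 3's `urllib.request.urlopen` rather than the `urllib` of the time. It closes each response before opening the next and records failures instead of stopping on the first one.

```python
import time
from urllib.request import urlopen


def fetch_all(urls, opener=urlopen, timeout=10, delay=0.0):
    """Fetch each URL in turn; return (results, errors), both keyed by URL.

    fetch_all and its parameters are illustrative, not from the thread.
    """
    results, errors = {}, {}
    for url in urls:
        try:
            # The with-block closes each response before the next request.
            with opener(url, timeout=timeout) as response:
                results[url] = response.read()
        except OSError as exc:
            # URLError and socket timeouts are both OSError subclasses.
            errors[url] = exc
        if delay:
            time.sleep(delay)  # optional pause between requests
    return results, errors
```

Passing a different `opener` makes the loop easy to try without touching the network; the database write would go where `results[url]` is assigned.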

Re: BadZipfile "file is not a zip file"

2009-01-12 Thread webcomm
On Jan 12, 11:53 am, "Chris Mellon" wrote: > On Sat, Jan 10, 2009 at 1:32 PM,webcomm wrote: > > On Jan 9, 7:33 pm, John Machin wrote: > >> It is not impossible for a file with dummy data to have been > >> handcrafted or otherwise produced by a process differ

Re: BadZipfile "file is not a zip file"

2009-01-12 Thread webcomm
If anyone's interested, here are my django views... from django.shortcuts import render_to_response from django.http import HttpResponse from xml.etree.ElementTree import ElementTree import urllib, base64, subprocess def get_data(request): service_url = 'http://www.something.com/webservices/

Re: BadZipfile "file is not a zip file"

2009-01-10 Thread webcomm
On Jan 9, 7:33 pm, John Machin wrote: > It is not impossible for a file with dummy data to have been > handcrafted or otherwise produced by a process different to that used > for a real-data file. I knew it was produced by the same process, or I wouldn't have shared it. : ) But you couldn't have

Re: distinction between unzipping bytes and unzipping a file

2009-01-10 Thread webcomm
On Jan 9, 6:07 pm, John Machin wrote: > Yup, it looks like it's encoded in utf_16_le, i.e. no BOM as > God^H^H^HGates intended: > > >>> buff = open('data', 'rb').read() > >>> buff[:100] > > '<\x00R\x00e\x00g\x00i\x00s\x00t\x00r\x00a\x00t\x00i\x00o\x00n\x00>\x00<\x00B\x00a\x00l\x00a\x00n\x00c
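
The byte pattern quoted above (each ASCII character followed by `\x00`) is what UTF-16 little-endian text without a BOM looks like. A small sketch in Python 3, using a made-up sample string rather than the file from the thread:

```python
# Hypothetical sample; the real file from the thread is not reproduced here.
raw = "<Registration>".encode("utf_16_le")

# Each ASCII character is stored as itself followed by a NUL byte.
print(raw[:8])

# Decoding with the matching codec recovers the original text.
text = raw.decode("utf_16_le")
print(text)
```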

how to remove 'FFFD' character

2009-01-09 Thread webcomm
Does anyone know a way to remove the 'FFFD' character with python? You can see the browser output I'm dealing with here: http://webcomm.webfactional.com/htdocs/fffd.JPG I deleted a big chunk out of the middle of that JPG to protect sensitive data. I don't know what the character encoding of this
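
For reference, 'FFFD' is U+FFFD, the Unicode replacement character, which typically appears when bytes are decoded with a codec that cannot represent them. A minimal Python 3 sketch of stripping it from a string follows; `strip_replacement_chars` is a hypothetical helper, and removing the character hides the symptom rather than fixing whatever mis-decoding produced it.

```python
REPLACEMENT_CHAR = "\ufffd"  # U+FFFD, the Unicode replacement character


def strip_replacement_chars(text):
    """Return text with every U+FFFD character removed (illustrative helper)."""
    return text.replace(REPLACEMENT_CHAR, "")


# A byte that is invalid in UTF-8 becomes U+FFFD when decoded with "replace".
garbled = b"ab\xffcd".decode("utf-8", errors="replace")
cleaned = strip_replacement_chars(garbled)
print(repr(garbled), repr(cleaned))
```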

Re: BadZipfile "file is not a zip file"

2009-01-09 Thread webcomm
On Jan 9, 5:21 pm, John Machin wrote: > Thanks. Would you mind spending a few minutes more on this so that we > can see if it's a problem that can be fixed easily, like the one that > Chris Mellon reported? > Don't mind at all. I'm now working with a zip file with some dummy data I downloaded fr

Re: BadZipfile "file is not a zip file"

2009-01-09 Thread webcomm
On Jan 9, 5:00 pm, webcomm wrote: > If I unzip it like this... > popen("unzip data.zip") > ...then the bad characters are 'FFFD' characters as described and > pictured > here...http://groups.google.com/group/comp.lang.python/browse_thread/thread/... >

Re: BadZipfile "file is not a zip file"

2009-01-09 Thread webcomm
On Jan 8, 8:39 pm, "James Mills" wrote: > Send us a sample of this file in question... Here's a sample with some dummy data from the web service: http://webcomm.webfactional.com/htdocs/data.zip That's the zip created in this line of my code... f = open('data.zip', 'wb') If I open the file it co

Re: distinction between unzipping bytes and unzipping a file

2009-01-09 Thread webcomm
On Jan 9, 4:12 pm, "Chris Mellon" wrote: > It would really help if you could post a sample file somewhere. Here's a sample with some dummy data from the web service: http://webcomm.webfactional.com/htdocs/data.zip That's the zip created in this line of my code... f = open('data.zip', 'wb') If I

Re: distinction between unzipping bytes and unzipping a file

2009-01-09 Thread webcomm
On Jan 9, 3:15 pm, Steve Holden wrote: > webcomm wrote: > > Hi, > > In python, is there a distinction between unzipping bytes and > > unzipping a binary file to which those bytes have been written? > > > The following code is, I think, an example of writing bytes to

Re: BadZipfile "file is not a zip file"

2009-01-09 Thread webcomm
On Jan 9, 1:32 pm, Scott David Daniels wrote: > I'd certainly try to figure out if the archive was mis-handled > somewhere along the way.   Quite possible that I'm mishandling something, or the service provider is mishandling something. Probably the former. Please see this more recent thread...

Re: distinction between unzipping bytes and unzipping a file

2009-01-09 Thread webcomm
On Jan 9, 2:49 pm, webcomm wrote: > decoded = base64.b64decode(datum) > #datum is a base64 encoded string of data downloaded from a web > service > f = open('data.zip', 'wb') > f.write(decoded) > f.close() > x = zipfile.ZipFile('data.zip', '

distinction between unzipping bytes and unzipping a file

2009-01-09 Thread webcomm
Hi, In python, is there a distinction between unzipping bytes and unzipping a binary file to which those bytes have been written? The following code is, I think, an example of writing bytes to a file and then unzipping... decoded = base64.b64decode(datum) #datum is a base64 encoded string of data
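
To illustrate the question, here is a sketch in Python 3 that unzips the same decoded bytes both ways: directly from memory via `io.BytesIO`, and from a binary file the bytes were written to. The archive contents and the name `hello.txt` are made up for the example; the web service payload from the post is not reproduced.

```python
import base64
import io
import os
import tempfile
import zipfile

# Build a small archive in memory and base64-encode it, standing in for
# the web service payload (hypothetical data).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("hello.txt", "hello")
datum = base64.b64encode(buffer.getvalue())

decoded = base64.b64decode(datum)

# Route 1: unzip directly from the bytes, with no file on disk.
with zipfile.ZipFile(io.BytesIO(decoded)) as from_bytes:
    names_from_bytes = from_bytes.namelist()
    content_from_bytes = from_bytes.read("hello.txt")

# Route 2: write the same bytes to a binary file, then unzip that file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.zip")
    with open(path, "wb") as f:
        f.write(decoded)
    with zipfile.ZipFile(path) as from_file:
        names_from_file = from_file.namelist()
        content_from_file = from_file.read("hello.txt")

print(names_from_bytes == names_from_file)
print(content_from_bytes == content_from_file)
```

For a well-formed archive the two routes read the same members; the file must be opened in binary mode ('wb') for that to hold.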

Re: BadZipfile "file is not a zip file"

2009-01-09 Thread webcomm
On Jan 9, 10:14 am, "Chris Mellon" wrote: > This is a ticket about another issue or 2 with invalid zipfiles that > the zipfile module won't load, but that other tools will compensate > for: > > http://bugs.python.org/issue1757072 Looks like I just need to do this to unzip with unix... from os im

Re: BadZipfile "file is not a zip file"

2009-01-09 Thread webcomm
On Jan 9, 10:14 am, "Chris Mellon" wrote: > This is a ticket about another issue or 2 with invalid zipfiles that > the zipfile module won't load, but that other tools will compensate > for: > > http://bugs.python.org/issue1757072 Hmm. That's interesting. Are there other tools I can use in a pyt

Re: BadZipfile "file is not a zip file"

2009-01-09 Thread webcomm
On Jan 9, 5:42 am, John Machin wrote: > And here's a little gadget that might help the diagnostic effort; it > shows the archive size and the position of all the "magic" PKnn > markers. In a "normal" uncommented archive, EndArchive_pos + 22 == > archive_size. I ran the diagnostic gadget... archi
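
The check mentioned in the quote (EndArchive_pos + 22 == archive_size) can be sketched as follows. This is not John Machin's gadget, only a minimal Python 3 illustration of the same idea; `end_archive_check` and the sample archive are hypothetical.

```python
import io
import zipfile

END_ARCHIVE_SIG = b"PK\x05\x06"  # end-of-central-directory signature
END_ARCHIVE_SIZE = 22            # size of that record when there is no comment


def end_archive_check(data):
    """Return (archive_size, end_archive_pos, looks_normal) for zip bytes."""
    pos = data.rfind(END_ARCHIVE_SIG)
    looks_normal = pos != -1 and pos + END_ARCHIVE_SIZE == len(data)
    return len(data), pos, looks_normal


# Build a small uncommented archive in memory (made-up contents).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("hello.txt", "hello")
good = buffer.getvalue()

# Simulate extra bytes after the end of the archive.
padded = good + b"\x00" * 10

print(end_archive_check(good))
print(end_archive_check(padded))
```

When the last element is False for a file that other tools can open, trailing bytes after the end-of-archive record are one thing worth looking for.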

Re: BadZipfile "file is not a zip file"

2009-01-09 Thread webcomm
On Jan 9, 3:46 am, Carl Banks wrote: > The zipfile format is kind of brain dead, you can't tell where the end > of the file is supposed to be by looking at the header.  If the end of > file hasn't yet been reached there could be more data.  To make > matters worse, somehow zip files came to have t

Re: BadZipfile "file is not a zip file"

2009-01-09 Thread webcomm
On Jan 9, 3:16 am, Steven D'Aprano wrote: > The full signature of ZipFile is: > > ZipFile(file, mode="r", compression=ZIP_STORED, allowZip64=True) > > Try passing compression=zipfile.ZIP_DEFLATED and/or allowZip64=False and > see if that makes any difference. Those arguments didn't make a differe

Re: BadZipfile "file is not a zip file"

2009-01-08 Thread webcomm
On Jan 8, 8:54 pm, MRAB wrote: > Have you tried gzip instead? There's no option to download the data in a gzipped format. The files are .zip archives.

Re: BadZipfile "file is not a zip file"

2009-01-08 Thread webcomm
On Jan 8, 8:39 pm, "James Mills" wrote: > Send us a sample of this file in question... It contains data that I can't share publicly. I could ask the providers of the service if they have a dummy file I could use that doesn't contain any real data, but I don't know how responsive they'll be. It'

Re: BadZipfile "file is not a zip file"

2009-01-08 Thread webcomm
On Jan 8, 8:02 pm, MRAB wrote: > You're just creating a file called "data.zip". That doesn't make it a > zip file. A zip file has a specific format. If the file doesn't have > that format then the zipfile module will complain. Hmm. When I open it in Windows or with 7-Zip, it contains a text file
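
The point in the quote, that a .zip name alone does not make a zip file, can be shown with a short Python 3 sketch. The bytes below are made up; note that Python 3 spells the exception `zipfile.BadZipFile` (the older `BadZipfile` spelling from the subject line remains as an alias).

```python
import io
import zipfile

# Arbitrary bytes that merely pretend to be an archive (hypothetical data).
not_a_zip = io.BytesIO(b"plain text saved under a .zip name")

# is_zipfile checks for the zip end-of-archive record, not the file name.
is_zip = zipfile.is_zipfile(not_a_zip)
print(is_zip)

error_message = None
try:
    zipfile.ZipFile(not_a_zip)
except zipfile.BadZipFile as exc:
    # The zipfile module complains when the expected format is missing.
    error_message = str(exc)
print(error_message)
```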

BadZipfile "file is not a zip file"

2009-01-08 Thread webcomm
The error... >>> file = zipfile.ZipFile('data.zip', "r") Traceback (most recent call last): File "", line 1, in file = zipfile.ZipFile('data.zip', "r") File "C:\Python25\lib\zipfile.py", line 346, in __init__ self._GetContents() File "C:\Python25\lib\zipfile.py", line 366, in _GetCo