On Sep 10, 10:52 pm, "Diez B. Roggisch" <[EMAIL PROTECTED]> wrote:
> Sean Davis wrote:
>
> > I have a large file that I would like to transform and then feed to a
> > function (psycopg2 copy_from) that expects a file-like object (needs
> > read and readline methods).
> >
> > I have a class like so:
> >
> > class GeneInfo():
> >     def __init__(self):
> >         #urllib.urlretrieve('ftp://ftp.ncbi.nih.gov/gene/DATA/gene_info.gz',"/tmp/gene_info.gz")
> >         self.fh = gzip.open("/tmp/gene_info.gz")
> >         self.fh.readline() #deal with header line
> >
> >     def _read(self,n=1):
> >         for line in self.fh:
> >             if line=='':
> >                 break
> >             line=line.strip()
> >             line=re.sub("\t-","\t",line)
> >             rowvals = line.split("\t")
> >             yield "\t".join([rowvals[i] for i in [0,1,2,3,6,7,8,9,10,11,12,14]]) + "\n"
> >
> >     def readline(self,n=1):
> >         return self._read().next()
> >
> >     def read(self,n=1):
> >         return self._read().next()
> >
> >     def close(self):
> >         self.fh.close()
> >
> > and I use it like so:
> >
> > a=GeneInfo()
> > cur.copy_from(a,"gene_info")
> > a.close()
> >
> > It works well except that the end of file is not caught by copy_from.
> > I get errors like:
> >
> > psycopg2.extensions.QueryCanceledError: COPY from stdin failed: error during .read() call
> > CONTEXT: COPY gene_info, line 1000: ""
> >
> > for a 1000 line test file. Any ideas what is going on?
>
> I'm a bit lost why the above actually works - as _read() appears to be
> re-created instead of re-used for each invocation, and thus can't work IMHO.

Each generator that's created reads a single line from the file (self.fh), yields the result, and is then discarded; none of the individual generators reads more than one line from the file. It still makes progress because self.fh remembers its own position, so each new generator picks up where the previous one left off.
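A minimal Python 3 sketch of that behaviour (next(gen) in place of gen.next()). PerCallReader is a hypothetical stand-in for GeneInfo, io.StringIO replaces the gzip file, and upper-casing is a placeholder for the real column reshuffling. It shows both why the per-call generator works and where it breaks: at end of file the fresh generator yields nothing, so next() raises StopIteration instead of returning "".

```python
import io


class PerCallReader:
    """Hypothetical stand-in for the quoted class: a new generator per call."""

    def __init__(self, fh):
        self.fh = fh

    def _read(self):
        for line in self.fh:
            # Placeholder transformation; the real class reshapes tab-separated rows.
            yield line.strip().upper() + "\n"

    def readline(self):
        # A fresh generator is created, advanced once, then discarded.
        # Progress is kept because self.fh remembers its own position.
        return next(self._read())


reader = PerCallReader(io.StringIO("a\nb\n"))
print(repr(reader.readline()))  # 'A\n'
print(repr(reader.readline()))  # 'B\n'
try:
    reader.readline()
except StopIteration:
    # At EOF the new generator yields nothing, so next() raises
    # instead of returning "" as the readline protocol requires.
    print("StopIteration raised at end of file")
```

That exception escaping from read() is presumably what copy_from reports as "error during .read() call".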
> Anyway, I think the real problem is that you don't follow the
> readline-protocol. It returns "" if there is no more line to read;
> instead, you raise a StopIteration.
>
> Diez
--
http://mail.python.org/mailman/listinfo/python-list
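One way to follow that protocol, sketched for Python 3 under a few assumptions: TransformedReader is a hypothetical name, the caller skips the header line itself, and the input yields text lines (e.g. gzip.open("/tmp/gene_info.gz", "rt")). It keeps a single generator for the life of the object, uses next(gen, "") so both readline() and read() return "" at end of file, and buffers rows so read(size) honours its size argument. The column list and the "\t-" substitution are copied from the quoted class.

```python
import io
import re

# Column indices kept from each input row, copied from the quoted GeneInfo class.
KEEP = [0, 1, 2, 3, 6, 7, 8, 9, 10, 11, 12, 14]


class TransformedReader:
    """File-like wrapper whose read()/readline() return '' at EOF."""

    def __init__(self, fh):
        self.fh = fh
        self._rows = self._transform()  # a single generator, reused by every call
        self._buf = ""  # leftover text from a partial read(size)

    def _transform(self):
        for line in self.fh:
            line = re.sub("\t-", "\t", line.strip())
            rowvals = line.split("\t")
            yield "\t".join(rowvals[i] for i in KEEP) + "\n"

    def readline(self):
        if self._buf:
            # Leftovers always end in "\n" because whole rows are buffered.
            line, _, self._buf = self._buf.partition("\n")
            return line + "\n"
        # next() with a default returns "" at EOF instead of raising StopIteration.
        return next(self._rows, "")

    def read(self, size=-1):
        # Buffer whole rows until the request can be met or the input runs out.
        while size < 0 or len(self._buf) < size:
            row = next(self._rows, "")
            if not row:
                break
            self._buf += row
        if size < 0:
            size = len(self._buf)
        data, self._buf = self._buf[:size], self._buf[size:]
        return data  # "" once everything has been consumed

    def close(self):
        self.fh.close()


row = "\t".join(str(i) for i in range(15)) + "\n"
reader = TransformedReader(io.StringIO(row * 2))
print(repr(reader.readline()))
print(repr(reader.read(8192)))
print(repr(reader.read(8192)))  # '' signals end of file
```

With psycopg2 the object would then be passed exactly as in the original post, cur.copy_from(reader, "gene_info"). Reusing one generator also avoids building a new generator object for every line.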