On Sep 10, 6:59 pm, Sean Davis <[EMAIL PROTECTED]> wrote:
> I have a large file that I would like to transform and then feed to a
> function (psycopg2 copy_from) that expects a file-like object (needs
> read and readline methods).
>
> I have a class like so:
>
> class GeneInfo():
>     def __init__(self):
>         #urllib.urlretrieve('ftp://ftp.ncbi.nih.gov/gene/DATA/gene_info.gz',"/tmp/gene_info.gz")
>         self.fh = gzip.open("/tmp/gene_info.gz")
>         self.fh.readline() #deal with header line
>
>     def _read(self,n=1):
>         for line in self.fh:
>             if line=='':
>                 break
>             line=line.strip()
>             line=re.sub("\t-","\t",line)
>             rowvals = line.split("\t")
>             yield "\t".join([rowvals[i] for i in [0,1,2,3,6,7,8,9,10,11,12,14]]) + "\n"
>
>     def readline(self,n=1):
>         return self._read().next()
>
>     def read(self,n=1):
>         return self._read().next()
>

Each time readline() or read() calls self._read(), it creates a new generator, takes one value from it, and then throws that generator away. Instead, create the generator once in __init__ and have readline() and read() pull values from that single generator.
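A minimal sketch of that suggestion, written for Python 3. The class name RowReader, the columns argument, and the io.StringIO input standing in for the gzip file are all hypothetical; the per-line transformation mirrors the quoted _read(). End of input is deliberately left unhandled in this sketch.

```python
import io
import re


class RowReader(object):
    """Hypothetical file-like wrapper; the transformation mirrors GeneInfo._read()."""

    def __init__(self, fh, columns):
        self.fh = fh
        self.columns = columns  # assumed: indexes of the fields to keep
        self.fh.readline()  # skip the header line
        # Create the generator once; each readline()/read() call resumes it
        # instead of building (and discarding) a new one.
        self._rows = self._read()

    def _read(self):
        for line in self.fh:
            line = re.sub("\t-", "\t", line.strip())
            rowvals = line.split("\t")
            yield "\t".join(rowvals[i] for i in self.columns) + "\n"

    def readline(self, size=-1):
        # End of input is not handled yet: next() raises StopIteration
        # once every row has been consumed.
        return next(self._rows)

    def read(self, size=-1):
        return next(self._rows)

    def close(self):
        self.fh.close()


if __name__ == "__main__":
    # In-memory stand-in for the gzip file: a header line plus two rows.
    sample = io.StringIO("h1\th2\th3\n1\ta\t-\n2\tb\tc\n")
    reader = RowReader(sample, [0, 2])
    print(repr(reader.readline()))  # '1\t\n'
    print(repr(reader.read()))  # '2\tc\n'
```

Because the generator lives on the instance, successive calls return successive rows rather than starting a fresh generator each time.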
>     def close(self):
>         self.fh.close()
>
> and I use it like so:
>
> a=GeneInfo()
> cur.copy_from(a,"gene_info")
> a.close()
>
> It works well except that the end of file is not caught by copy_from.
> I get errors like:
>
> psycopg2.extensions.QueryCanceledError: COPY from stdin failed: error
> during .read() call
> CONTEXT:  COPY gene_info, line 1000: ""
>
> for a 1000 line test file.  Any ideas what is going on?
>

File objects signal end of file by returning an empty string from read() and readline(). I suspect copy_from expects the same, whereas your read() raises StopIteration once there are no lines left.

--
http://mail.python.org/mailman/listinfo/python-list
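To report end of file the way a real file does, readline() and read() can return an empty string once the iterator is exhausted; the built-in next() accepts a default value for exactly this case. This is a hypothetical sketch for Python 3: GeneratorFile and drain() are illustrative names, and drain() only mimics a typical read-until-empty loop. It is not psycopg2's actual implementation of copy_from.

```python
class GeneratorFile(object):
    """Hypothetical minimal file-like wrapper around an iterator of lines."""

    def __init__(self, lines):
        self._lines = iter(lines)

    def readline(self, size=-1):
        # next() with a default returns "" instead of raising StopIteration,
        # which is how real file objects report end of file.
        return next(self._lines, "")

    def read(self, size=-1):
        # size is ignored in this sketch; each call returns one whole line.
        return next(self._lines, "")


def drain(f):
    """Hypothetical consumer: keep reading until the empty-string EOF marker."""
    chunks = []
    while True:
        chunk = f.read(8192)
        if not chunk:
            break
        chunks.append(chunk)
    return "".join(chunks)


if __name__ == "__main__":
    f = GeneratorFile(["a\n", "b\n"])
    print(repr(drain(f)))  # 'a\nb\n'
```

Like the quoted read(), this one ignores its size argument and returns a whole line per call; a stricter implementation would buffer and return at most size characters, as file objects do.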