On Sep 11, 9:23 am, Sean Davis <[EMAIL PROTECTED]> wrote:
> On Sep 10, 7:54 pm, John Machin <[EMAIL PROTECTED]> wrote:
>
> > On Sep 11, 8:01 am, MRAB <[EMAIL PROTECTED]> wrote:
>
> > > On Sep 10, 6:59 pm, Sean Davis <[EMAIL PROTECTED]> wrote:
>
> > > > I have a large file that I would like to transform and then feed to a
> > > > function (psycopg2 copy_from) that expects a file-like object (needs
> > > > read and readline methods).
>
> > > > I have a class like so:
>
> > > > class GeneInfo():
> > > >     def __init__(self):
> > > >         #urllib.urlretrieve('ftp://ftp.ncbi.nih.gov/gene/DATA/gene_info.gz',"/tmp/gene_info.gz")
> > > >         self.fh = gzip.open("/tmp/gene_info.gz")
> > > >         self.fh.readline() #deal with header line
>
> > > >     def _read(self,n=1):
> > > >         for line in self.fh:
> > > >             if line=='':
> > > >                 break
> > > >             line=line.strip()
> > > >             line=re.sub("\t-","\t",line)
> > > >             rowvals = line.split("\t")
> > > >             yield "\t".join([rowvals[i] for i in [0,1,2,3,6,7,8,9,10,11,12,14]]) + "\n"
>
> > > >     def readline(self,n=1):
> > > >         return self._read().next()
>
> > > >     def read(self,n=1):
> > > >         return self._read().next()
>
> > > Each time readline() and read() call self._read() they are creating a
> > > new generator. They then get one value from the newly-created
> > > generator and then discard that generator. What you should do is
> > > create the generator in __init__ and then use it in readline() and
> > > read().
>
> > > >     def close(self):
> > > >         self.fh.close()
>
> > > > and I use it like so:
>
> > > > a=GeneInfo()
> > > > cur.copy_from(a,"gene_info")
> > > > a.close()
>
> > > > It works well except that the end of file is not caught by copy_from.
> > > > I get errors like:
>
> > > > psycopg2.extensions.QueryCanceledError: COPY from stdin failed: error
> > > > during .read() call
> > > > CONTEXT:  COPY gene_info, line 1000: ""
>
> > > > for a 1000 line test file.  Any ideas what is going on?
>
> > > I wonder whether it's expecting readline() and read() to return an
> > > empty string at the end of the file instead of raising StopIteration.
>
> > Don't wonder; ReadTheFantasticManual:
>
> > read( [size])
>
> > ... An empty string is returned when EOF is encountered
> > immediately. ...
>
> > readline( [size])
>
> >  ... An empty string is returned only when EOF is encountered
> > immediately.
>
> Thanks.  This was indeed my problem--not reading the manual closely
> enough.
>
> And the points about the iterator being re-instantiated were also
> right on point.  Interestingly, in this case, the code was working
> because read() and readline() were still returning the next line each
> time since the file handle was being read one line at a time.
>
After further thought, do you actually need a generator? read() and
readline() could just call _read(), which would read one line from the
file and return either the transformed line or an empty string at EOF.
Or the processing could be done in readline(), and read() could just
call readline().
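
For what it's worth, here is a rough sketch of that second option, with
the processing in readline() and read() simply delegating to it. It is
written in Python 3 syntax, the class name TransformedFile and the
skip_header flag are made up for illustration, and it ignores the size
argument just as the original class ignored n. It also uses
rstrip("\n") rather than strip() so that trailing empty columns are not
lost, which is a change from the original. Treat it as a sketch, not
tested against psycopg2 itself:

```python
import re

# Column indexes to keep, taken from the original post.
KEEP = [0, 1, 2, 3, 6, 7, 8, 9, 10, 11, 12, 14]


class TransformedFile:
    """Hypothetical file-like wrapper that transforms one line per call."""

    def __init__(self, fh, skip_header=True):
        # fh is any text-mode file object, e.g. gzip.open(path, "rt").
        self.fh = fh
        if skip_header:
            self.fh.readline()  # discard the header line

    def readline(self, size=-1):
        # size is accepted for interface compatibility but ignored.
        line = self.fh.readline()
        if not line:
            return ""  # empty string signals EOF, as the file API requires
        line = line.rstrip("\n")
        line = re.sub("\t-", "\t", line)  # same substitution as the original
        rowvals = line.split("\t")
        return "\t".join(rowvals[i] for i in KEEP) + "\n"

    def read(self, size=-1):
        # Hands back one transformed line per call, like the original class.
        return self.readline()

    def close(self):
        self.fh.close()
```

Because the underlying readline() already returns an empty string at
EOF, no StopIteration can leak out to the caller, and there is no
generator to keep alive between calls.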
--
http://mail.python.org/mailman/listinfo/python-list
