On Feb 7, 5:01 pm, Tim Chase <python.l...@tim.thechases.com> wrote:
> > Is there a way to do this, without decompressing each file to a temp
> > dir? Like is there a method using some tarfile interface adapter to
> > read a compressed file? Otherwise I'll just access each file, extract
> > it, grab the 1st and last lines and then delete the temp file.
>
> I think you're looking for the extractfile() method of the
> TarFile object:
>
>     from glob import glob
>     from tarfile import TarFile
>     for fname in glob('*.tgz'):
>         print fname
>         tf = TarFile.gzopen(fname)
>         for ti in tf:
>             print ' %s' % ti.name
>             f = tf.extractfile(ti)
>             if not f: continue
>             fi = iter(f) # f doesn't natively support next()
>             first_line = fi.next()
>             for line in fi: pass
>             f.close()
>             print " First line: %r" % first_line
>             print " Last line: %r" % line
>         tf.close()
>
> If you just want the first & last lines, it's a little more
> complex if you don't want to scan the entire file (like I do with
> the for-loop), but the file-like object returned by extractfile()
> is documented as supporting seek() so you can skip to the end and
> then read backwards until you have sufficient lines. I wrote a
> "get the last line of a large file using seeks from the EOF"
> function which you can find at [1] which should handle the odd
> edge cases of $BUFFER_SIZE containing more or less than a full
> line and then reading backwards in chunks (if needed) until you
> have one full line, handling a one-line file, and other
> odd/annoying edge-cases. Hope it helps.
>
> -tkc
>
> [1] http://mail.python.org/pipermail/python-list/2009-January/1186176.html
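
For anyone reading this on Python 3, here is a rough sketch of the two ideas quoted above: scanning each archive member through extractfile(), and reading the last line by seeking backwards from the end. The names first_and_last_lines and last_line_by_seeking and the chunk_size parameter are made up for illustration. They are not part of tarfile, and this is not the function linked at [1]. Lines come back as bytes. On a gzip-compressed archive a seek still has to decompress data under the hood, so the seek-based helper is likely to pay off mainly for uncompressed tars or ordinary files.

```python
import io
import tarfile


def first_and_last_lines(tar_path):
    """Yield (member_name, first_line, last_line) for each regular file.

    Lines are bytes. Both are None for an empty member; a one-line
    member yields the same line twice.
    """
    # "r:*" asks tarfile to detect the compression (gzip, bz2, xz, or none).
    with tarfile.open(tar_path, "r:*") as tf:
        for member in tf:
            if not member.isfile():  # skip directories, links, devices
                continue
            first = last = None
            with tf.extractfile(member) as f:
                for line in f:
                    if first is None:
                        first = line
                    last = line
            yield member.name, first, last


def last_line_by_seeking(f, chunk_size=4096):
    """Return the last line of a seekable binary file object, or None if empty.

    Hypothetical helper: reads backwards in chunks until a full line
    is buffered, so the whole file is not scanned from the start.
    """
    f.seek(0, io.SEEK_END)
    pos = f.tell()
    data = b""
    while pos > 0:
        step = min(chunk_size, pos)
        pos -= step
        f.seek(pos)
        data = f.read(step) + data
        # Ignore a single trailing newline, then look for the newline
        # that ends the previous line.
        body = data[:-1] if data.endswith(b"\n") else data
        idx = body.rfind(b"\n")
        if idx != -1:
            return data[idx + 1:]
    # Reached the start of the file: it holds at most one line.
    return data or None
```

Typical use would be something like `for name, first, last in first_and_last_lines("logs.tgz"): print(name, first, last)`, where "logs.tgz" is a placeholder file name.
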
Thanks Tim - this was very helpful. Just learning about tarfile.

'mark