Thanks! make-pipe isn't something that I've had to use otherwise, so I missed the optional parameter. That does certainly seem to help.

Here's my first take at with-input-from-gzipped-file:

(define (with-input-from-gzipped-file filename thunk
                                      #:buffer-size [buffer-size #f])
  (call-with-input-file filename
    (lambda (file-from)
      (define-values (pipe-from pipe-to) (make-pipe buffer-size))
      (thread
       (λ ()
         (gunzip-through-ports file-from pipe-to)
         (close-output-port pipe-to)))
      (current-input-port pipe-from)
      (thunk)
      (close-input-port pipe-from))))

The main thing missing is error handling (the pipe should still be closed
when something goes wrong). At the very least, if I try to call this on a
non-gzipped file, it breaks on the gunzip-through-ports line.
Theoretically, some variation of with-handlers should work (error should
raise an exn:fail?, yes?), but it doesn't seem to be helping. Any help with
that?

Alternatively, I've now found this:

http://planet.racket-lang.org/display.ss?package=gzip.plt&owner=soegaard

It seems to do exactly what I need, albeit without the call-with-* forms,
but that's easy enough to wrap. With some very basic testing, it does seem
to be buffering, although it is a bit slower than the above. Not enough to
cause trouble, though.


On Mon, Aug 5, 2013 at 4:51 PM, Ryan Culpepper <ry...@ccs.neu.edu> wrote:

> On 08/05/2013 04:29 PM, JP Verkamp wrote:
>
>> Is there a nice / idiomatic way to work with gzipped data in a streaming
>> manner (to avoid loading the rather large files into memory at once)? So
>> far as I can tell, my code isn't doing that. It hangs for a while on the
>> call to gunzip-through-ports, long enough to uncompress the entire file,
>> then reads are pretty quick afterwards.
>>
>> Here's what I have thus far:
>>
>> #lang racket
>>
>> (require file/gunzip)
>>
>> (define-values (pipe-from pipe-to) (make-pipe))
>> (with-input-from-file "test.rkt.gz"
>>   (lambda ()
>>     (gunzip-through-ports (current-input-port) pipe-to)
>>     (for ([line (in-lines pipe-from)])
>>       (displayln line))))
>>
>
> You should probably 1) limit the size of the pipe (to stop it from
> inflating the whole file at once) and 2) put the gunzip-through-ports call
> in a separate thread. The gunzip thread will block when the pipe is full;
> when your program reads some data out of the pipe, the gunzip thread will
> be able to make some more progress. Something like this:
>
>   (define-values (pipe-from pipe-to) (make-pipe 4000))
>   (with-input-from-file "test.rkt.gz"
>     (lambda ()
>       (thread
>        (lambda ()
>          (gunzip-through-ports (current-input-port) pipe-to)
>          (close-output-port pipe-to)))
>       (for ([line (in-lines pipe-from)])
>         (displayln line))))
>
>> As an additional problem, that code doesn't actually work.
>> in-lines seems to be waiting for an eof-object? that
>> gunzip-through-ports isn't sending. Am I missing something? It ends up
>> just hanging after reading and printing the file.
>>
>
> The docs don't say anything about closing the port, so you'll probably
> have to do that yourself. In the code above, I added a call to
> close-output-port.
>
> Ryan
>
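
On the error-handling question: one likely reason with-handlers appears to do nothing is that gunzip-through-ports runs in the new thread, so an exception raised there never reaches a handler installed in the calling thread. The thread just ends without closing pipe-to, and the reader waits for an eof that never arrives. Below is a minimal sketch of one way around that, assuming it is acceptable to catch the failure inside the gunzip thread and re-raise it in the caller once the thunk returns. The names failure and worker are only illustrative.

```racket
#lang racket
(require file/gunzip)

;; Sketch: the gunzip thread always closes its end of the pipe, and the
;; reading side is cleaned up by dynamic-wind even if thunk raises.
(define (with-input-from-gzipped-file filename thunk
                                      #:buffer-size [buffer-size #f])
  (call-with-input-file filename
    (lambda (file-from)
      (define-values (pipe-from pipe-to) (make-pipe buffer-size))
      (define failure #f) ; set by the gunzip thread if decompression fails
      (define worker
        (thread
         (λ ()
           (with-handlers ([exn:fail? (λ (e) (set! failure e))])
             (gunzip-through-ports file-from pipe-to))
           ;; Runs on success and on failure, so the reader always sees eof.
           (close-output-port pipe-to))))
      (dynamic-wind
       void
       (λ ()
         (begin0
           ;; parameterize restores the previous input port afterwards.
           (parameterize ([current-input-port pipe-from])
             (thunk))
           ;; Re-raise a decompression error in the calling thread.
           (when failure (raise failure))))
       (λ ()
         (kill-thread worker)
         (close-input-port pipe-from))))))
```

With this shape, a with-handlers wrapped around the call in the calling thread should see the exn:fail. One limitation of the sketch: if thunk stops reading before eof, a decompression error that occurs later goes unreported.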
____________________
  Racket Users list:
  http://lists.racket-lang.org/users