On 10/30/2018 11:32 AM, 'Paulo Matos' via Racket Users wrote:
I have quite a few large files that I want to gzip to a single file
(without an intermediate concatenation) and then later gunzip.

I don't think you can do that - at least not without other software. gzip/gunzip are meant to work on a single file. gzip is a compression format, not an archive format: a .gz file may hold several concatenated members, but gunzip (and zcat) simply decompress them into one continuous stream, and there is no index or other metadata recording where one original file ends and the next begins. Typically you would tar the files and then gzip the tar.
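For reference, here is a minimal Racket sketch of the tar-then-gzip route. It assumes `tar-gzip` from `file/tar` (which expects paths relative to the current directory) and `untgz` from `file/untgz`; the helper names `pack` and `unpack` are hypothetical, so check the docs for your Racket version before relying on it.

```racket
#lang racket/base
;; Sketch: put several files into one .tar.gz, then extract them again.
;; `pack` and `unpack` are illustrative helper names, not library API.
(require racket/file
         file/tar
         file/untgz)

;; pack: archive `paths` (relative to base-dir) into the gzipped tar `tgz`
(define (pack base-dir paths tgz)
  (parameterize ([current-directory base-dir])
    (tar-gzip tgz paths)))

;; unpack: extract every entry of `tgz` underneath dest-dir
(define (unpack tgz dest-dir)
  (untgz tgz #:dest dest-dir))
```

Usage would look like `(pack "/data" '("foo1" "foo2" "foo3") "/tmp/foo.tar.gz")` followed later by `(unpack "/tmp/foo.tar.gz" "/tmp/out")` (paths illustrative). Unlike a plain multi-member .gz, the tar layer keeps each file's name and boundaries.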


Interestingly, the gunzipping blocks on a read-line. I wonder if
this is because I cannot use gzip-through-ports the way I am doing it,
or if there's a bug somewhere.

Generate 3 files with:
$ base64 /dev/urandom | head -c 1000000 > foo3
$ base64 /dev/urandom | head -c 1000000 > foo2
$ base64 /dev/urandom | head -c 1000000 > foo1

Now run the code:
```
#lang racket

(require file/gzip
          file/gunzip)

(define paths '("foo1" "foo2" "foo3"))

;; compress
(printf "compressing~n")
(call-with-atomic-output-file "foo.gz"
   (lambda (op p)
     (for ([f (in-list paths)])
       (call-with-input-file f
         (lambda (i) (gzip-through-ports i op #false (current-seconds)))
         #:mode 'binary))))

;; decompress
(printf "decompressing~n")
(define-values (in out) (make-pipe))
(void
  (thread
   (thunk
    (call-with-input-file "foo.gz"
      (lambda (cin)
        (gunzip-through-ports cin out))
      #:mode 'binary))))
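(call-with-atomic-output-file "foo.txt"
   (lambda (op p)
     ;; copy the decompressed text from the pipe into foo.txt, line by line
     (let loop ([l (read-line in)])
       (unless (eof-object? l)
         (write-string l op)   ; `write` would add quotes and escapes
         (newline op)          ; read-line strips the line terminator
         (loop (read-line in))))))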

```

This is going to block in a read-line, and I have a suspicion that it
blocks at the end of a compressed file. Is there a reason for it
blocking? Note that if you `zcat foo.gz | less` you can see the whole
file, so I am suspicious that something might be wrong with
gunzip-through-ports.

Any suggestions to improve this?

Thanks,

I think you can recover by closing the output port when gunzip-through-ports returns, but in that case you will receive only the 1st file; the additional files will be lost. If you want to send multiple objects in the same stream, you need some protocol to delimit them. I'm not sure how (or if) you can reuse the same port across several calls to gunzip-through-ports, but you can read the port yourself and pass each object's data to gunzip.
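A minimal sketch of such a delimiting protocol, under a made-up framing: each file is gzipped on its own and written as an 8-byte big-endian length followed by the gzip bytes, so the reader always knows exactly how many bytes to hand to gunzip-through-ports. The framing and the names `gzip-bytes`, `write-members` and `read-members` are hypothetical, not a standard container format, and each member is buffered in memory.

```racket
#lang racket/base
;; Hypothetical length-prefixed container: [8-byte length][gzip data] ...
(require file/gzip
         file/gunzip)

;; gzip everything readable from `in` into a fresh byte string
(define (gzip-bytes in)
  (define buf (open-output-bytes))
  (gzip-through-ports in buf #f (current-seconds))
  (get-output-bytes buf))

;; write each input port in `ins` to `out` as one length-prefixed member
(define (write-members ins out)
  (for ([in (in-list ins)])
    (let ([gz (gzip-bytes in)])
      ;; unsigned, big-endian, 8-byte length prefix
      (write-bytes (integer->integer-bytes (bytes-length gz) 8 #f #t) out)
      (write-bytes gz out))))

;; read members until end of input; returns a list of decompressed byte strings
(define (read-members in)
  (let loop ([acc '()])
    (define hdr (read-bytes 8 in))
    (if (eof-object? hdr)
        (reverse acc)
        (let* ([len (integer-bytes->integer hdr #f #t)]
               [gz (read-bytes len in)]          ; exactly one gzip member
               [plain (open-output-bytes)])
          (gunzip-through-ports (open-input-bytes gz) plain)
          (loop (cons (get-output-bytes plain) acc))))))
```

The reader stops only on a clean end of input, so a writer feeding a pipe (as with make-pipe in the code above) still has to call close-output-port on its end after the last member; otherwise read-members blocks just as read-line did. A truncated stream makes `integer-bytes->integer` or gunzip raise an error rather than hang.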

George
