Zip archives contain files and directories. Often, some of the files 
contained in zip archives are themselves archives: zip and tar(.gz|.bz2). 
Sometimes these contained archives contain further archives (and so 
on...).

Reading a zip file requires an io.ReaderAt, as in func zip.NewReader(r 
io.ReaderAt, size int64) (*Reader, error). This function returns a 
*zip.Reader, which exposes a slice of *zip.File, each of which can be 
opened with func (f *File) Open() (io.ReadCloser, error).

So reading a zip archive contained in a zip archive is not directly 
supported, as zip.Reader.File[n].Open() returns an io.ReadCloser, not an 
io.ReaderAt.

Others have encountered this issue, with some suggested solutions: 
https://stackoverflow.com/questions/40245442/handling-nested-zip-files-with-archive-zip

Reading tar(.gz|.bz2) archives that are found in a zip archive is not 
problematic, as archive/tar.NewReader, compress/gzip.NewReader and 
compress/bzip2.NewReader all accept an io.Reader and return a value that 
can itself be used as an io.Reader. This is also true for tar archives 
that are contained in tar archives.

tar archives that contain zip files are not supported, as archive/tar does 
not expose an io.ReaderAt. Often (usually?), on-disk tar archives are 
gzipped/bzip2ed, and compress/gzip and compress/bzip2 also do not expose an 
io.ReaderAt that could be passed to archive/tar.

Let's take a step back: usually, a reader will either correspond to an 
ephemeral stream that comes in through a pipe or a socket, or to a static 
on-disk file. An io.ReaderAt for the former is problematic (it might need 
to store the whole stream to allow rewind, seek, etc.). Here I am looking 
to address the use case of on-disk archives, which can support exposing an 
io.ReaderAt. In this case, any depth of zip/tar combinations could be 
supported through a recursive hierarchy of io.ReaderAts, with rewinds 
(probably) bubbling up to the top-level io.ReaderAt, which can implement 
ReadAt because it has the on-disk bytes to work with.

Proposal:

---- Add to archive/zip:

type ReaderAt struct {
    File    []*File
    Comment string
    // contains filtered or unexported fields
}

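// This would be used at the 0th level and opens an on-disk zip archive
func OpenReaderAt(name string) (*zip.ReaderAt, error)

// This would be used at nested levels and wraps an io.ReaderAt returned
// by OpenAt (it is the call used in the examples below)
func NewReaderAt(r io.ReaderAt) (*zip.ReaderAt, error)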

// This would return an error if the *zip.File was not contained in a
// zip.ReaderAt
func (f *File) OpenAt() (io.ReaderAt, error)

---- Add to archive/tar:

type ReaderAt struct {
    // contains filtered or unexported fields
}

func OpenReaderAt(name string) (*tar.ReaderAt, error)
func NewReaderAt(r io.ReaderAt) (*tar.ReaderAt, error)
func (ra *ReaderAt) ReadAt(p []byte, off int64) (n int, err error)

---- Add to compress/gzip:

func OpenReaderAt(f *os.File) (io.ReaderAt, error)

---- Add to compress/bzip2:

func OpenReaderAt(f *os.File) (io.ReaderAt, error)


////////// zip containing zip example

// foo.zip contains: a.txt, bar.zip
//   bar.zip contains: x.txt
//
fooZipReaderAt, err := zip.OpenReaderAt("foo.zip")
if err != nil {
    return err
}
for _, fFoo := range fooZipReaderAt.File {
    if strings.HasSuffix(fFoo.Name, ".zip") {
        // ReaderAt for bar.zip
        barReaderAt, err := fFoo.OpenAt()
        if err != nil {
            return err
        }

        // Open zip for bar.zip
        barZipReaderAt, err := zip.NewReaderAt(barReaderAt)
        if err != nil {
            return err
        }

        for _, fBar := range barZipReaderAt.File {
            xReader, err := fBar.Open()
            if err != nil {
                return err
            }
            // read x.txt using io.ReadCloser
            xReader.Close()
        }
    }
}

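////////// tar.gz containing zip example

// foo.tar.gz contains: bar.zip
//   bar.zip contains: x.txt
//
f, err := os.Open("foo.tar.gz")
if err != nil {
    return err
}
gzReaderAt, err := gzip.OpenReaderAt(f)
if err != nil {
    return err
}
tarReader, err := tar.NewReaderAt(gzReaderAt)
if err != nil {
    return err
}

for {
    header, err := tarReader.Next()
    if err == io.EOF {
        break
    }
    if err != nil {
        return err
    }

    switch header.Typeflag {
    case tar.TypeDir:
        continue
    case tar.TypeReg:
        // tarReader is now positioned at bar.zip
        raBarZip, err := zip.NewReaderAt(tarReader)
        if err != nil {
            return err
        }
        for _, fBar := range raBarZip.File {
            rx, err := fBar.Open()
            if err != nil {
                return err
            }
            // read x.txt using io.ReadCloser
            rx.Close()
        }
    }
}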

This proposal may be slightly or very off. I am not tied to it. But I am 
looking for a solution that supports the use case of being able to 
recursively drill down into zip and tar(.gz|.bz2) on-disk archives that 
contain an unknown number and depth of contained zip and tar(.gz|.bz2) 
archives, without having to read the zip file entirely into memory, like 
the Stack Overflow example above (which I have tried).

Thanks,
Glen  :-)
