> Since we now have multiple archivers that require seeking, I suggest
> we add a SeekableStream class or something along those lines. The
> Commons Imaging project also has the same problem to solve for images,
> and it uses ByteSources, which can be arrays, files, or an InputStream
> wrapper that caches what has been read (so seeking is efficient, while
> it only reads as much from the InputStream as is necessary).
I would also like to advocate for this approach. I was looking into
writing an implementation of a Google Snappy decompressor, but was unable
to wrap it into an InputStream effectively. Having a seekable stream would
make my efforts a better fit for this library.

On Sun, Oct 6, 2013 at 9:25 AM, Stefan Bodewig <bode...@apache.org> wrote:
> On 2013-10-01, Damjan Jovanovic wrote:
>
>> On Tue, Oct 1, 2013 at 6:09 AM, Stefan Bodewig <bode...@apache.org> wrote:
>
>>> Reading may be simpler: here you can store the meta-information from the
>>> start of the file in memory and then read entries as you go; ZipFile
>>> inside the zip package does something like this.
>
>> From what I remember:
>
>> The "meta-information" can be anywhere in the file, as can the
>> compressed files themselves. The 7zip tool seems to write the
>> meta-information at the end of the 7z file when multi-file archives
>> are created.
>
> Oh yes, my understanding has been pretty much wrong, and re-reading your
> implementation has helped me to see more clearly. Right now I think the
> important metadata actually is at the end, but there is a smaller part at
> the front - in particular a pointer to the Header holding the metadata.
>
>> Compressed file codecs, positions, lengths, and solid compression
>> details are only stored in the meta-information, so it's not possible
>> to write a streaming reader without O(n) memory in the worst case.
>
> I agree.
>
>> Writing also requires seeking or O(n) memory, as the initial header at
>> the beginning of the file contains the offset to the next header, and
>> we only know the size/contents/location of the next header once all
>> the files have been written.
>
> Or a temporary file to which the first header could be prepended - but
> if you have that, you could use seeking as well. So yes, I agree again.
>
>> Since we now have multiple archivers that require seeking, I suggest
>> we add a SeekableStream class or something along those lines. The
>> Commons Imaging project also has the same problem to solve for images,
>> and it uses ByteSources, which can be arrays, files, or an InputStream
>> wrapper that caches what has been read (so seeking is efficient, while
>> it only reads as much from the InputStream as is necessary).
>
> Interesting idea.
>
> Right now I'm willing to postpone any streaming API for 7z and rather
> cut a release with a files-only API.
>
> Stefan
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org
> ---------------------------------------------------------------------
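For what it's worth, the caching-wrapper idea from Commons Imaging could be sketched roughly like this (all class and method names here are hypothetical, not an actual Commons API; a real implementation would buffer in blocks rather than serving single bytes from a growing array):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

/**
 * Hypothetical sketch: wraps a plain InputStream and remembers every
 * byte consumed so far, so backward seeks never touch the underlying
 * stream, and the stream is only read as far as the furthest position
 * requested.
 */
public class CachingSeekableStream {
    private final InputStream in;
    private final ByteArrayOutputStream cache = new ByteArrayOutputStream();
    private long position = 0;

    public CachingSeekableStream(InputStream in) {
        this.in = in;
    }

    /** Move the read position; only forward seeks read from the wrapped stream. */
    public void seek(long newPosition) throws IOException {
        while (cache.size() < newPosition) {      // fill the cache up to the target
            if (readFromUnderlying() < 0) {
                throw new IOException("seek past end of stream");
            }
        }
        position = newPosition;
    }

    /** Read one byte: from the cache if already seen, else from the wrapped stream. */
    public int read() throws IOException {
        if (position < cache.size()) {
            int b = cache.toByteArray()[(int) position] & 0xFF;  // served from cache
            position++;
            return b;
        }
        int b = readFromUnderlying();
        if (b >= 0) {
            position++;
        }
        return b;
    }

    private int readFromUnderlying() throws IOException {
        int b = in.read();
        if (b >= 0) {
            cache.write(b);   // remember everything consumed so far
        }
        return b;
    }
}
```

That would let a 7z reader jump back to the start header after finding the metadata at the end, while an archive piped in from a socket still only costs as much memory as the furthest byte actually visited.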