On 28/03/2009, bode...@apache.org <bode...@apache.org> wrote:
> Author: bodewig
> Date: Sat Mar 28 14:46:32 2009
> New Revision: 759472
>
> URL: http://svn.apache.org/viewvc?rev=759472&view=rev
> Log:
> some more in depth documentation
Very useful!

> Added:
> commons/proper/compress/trunk/src/site/xdoc/examples.xml (with props)
> commons/proper/compress/trunk/src/site/xdoc/zip.xml (with props)
> Modified:
> commons/proper/compress/trunk/src/site/site.xml
> commons/proper/compress/trunk/src/site/xdoc/index.xml
>
> Modified: commons/proper/compress/trunk/src/site/site.xml
> URL:
> http://svn.apache.org/viewvc/commons/proper/compress/trunk/src/site/site.xml?rev=759472&r1=759471&r2=759472&view=diff
>
> ==============================================================================
> --- commons/proper/compress/trunk/src/site/site.xml (original)
> +++ commons/proper/compress/trunk/src/site/site.xml Sat Mar 28 14:46:32 2009
> @@ -28,6 +28,7 @@
> <body>
> <menu name="Compress">
> <item name="Overview" href="/index.html"/>
> + <item name="Examples" href="/examples.html"/>
> <item name="Issue Tracking" href="/issue-tracking.html"/>
> <item name="Download" href="/downloads.html"/>
> <item name="Wiki"
> href="http://wiki.apache.org/commons/Compress"/>
>
> Added: commons/proper/compress/trunk/src/site/xdoc/examples.xml
> URL:
> http://svn.apache.org/viewvc/commons/proper/compress/trunk/src/site/xdoc/examples.xml?rev=759472&view=auto
>
> ==============================================================================
> --- commons/proper/compress/trunk/src/site/xdoc/examples.xml (added)
> +++ commons/proper/compress/trunk/src/site/xdoc/examples.xml Sat Mar 28
> 14:46:32 2009
> @@ -0,0 +1,279 @@
> +<?xml version="1.0"?>
> +<!--
> +
> + Licensed to the Apache Software Foundation (ASF) under one or more
> + contributor license agreements. See the NOTICE file distributed with
> + this work for additional information regarding copyright ownership.
> + The ASF licenses this file to You under the Apache License, Version 2.0
> + (the "License"); you may not use this file except in compliance with
> + the License.
> You may obtain a copy of the License at
> +
> + http://www.apache.org/licenses/LICENSE-2.0
> +
> + Unless required by applicable law or agreed to in writing, software
> + distributed under the License is distributed on an "AS IS" BASIS,
> + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> + See the License for the specific language governing permissions and
> + limitations under the License.
> +
> +-->
> +<document>
> + <properties>
> + <title>Commons Compress Examples</title>
> + <author email="dev@commons.apache.org">Commons Documentation
> Team</author>
> + </properties>
> + <body>
> + <section name="Examples">
> +
> + <subsection name="Factories">
> +
> + <p>Compress provides factory methods to create input/output
> + streams based on the names of the compressor or archiver
> + format as well as factory methods that try to guess the
> + format of an input stream.</p>
> +
> + <p>To create a compressor writing to a given output by using
> + the algorithm name:</p>
> + <source><![CDATA[
> +CompressorOutputStream gzippedOut = new CompressorStreamFactory()
> + .createCompressorOutputStream("gz", myOutputStream);
> +]]></source>
> +
> + <p>Make the factory guess the input format for a given stream:</p>
> + <source><![CDATA[
> +ArchiveInputStream input = new ArchiveStreamFactory()
> + .createArchiveInputStream(originalInput);
> +]]></source>
> +
> + </subsection>
> +
> + <subsection name="ar">
> +
> + <p>In addition to the information stored
> + in <code>ArchiveEntry</code> a <code>ArArchiveEntry</code>
> + stores information about the owner user and group as well as
> + Unix permissions.</p>
> +
> + <p>Adding an entry to an ar archive:</p>
> +<source><![CDATA[
> +ArArchiveEntry entry = new ArArchiveEntry(name, size);
> +arOutput.putNextEntry(entry);
> +arOutput.write(contentOfEntry);
> +arOutput.closeArchiveEntry();
> +]]></source>
> +
> + <p>Reading entries from an ar archive:</p>
> +<source><![CDATA[
> +ArArchiveEntry entry =
> (ArArchiveEntry) arInput.getNextEntry();
> +byte[] content = new byte[entry.getSize()];
> +LOOP UNTIL entry.getSize() HAS BEEN READ {

I thought the idea was that the ArchiveInputStreams would not allow
one to read past the end of the entry, so one can just read until
read() returns -1?

> + arInput(read, offset, content.length - offset);
> +}
> +]]></source>
> +
> + </subsection>
> +
> + <subsection name="cpio">
> +
> + <p>In addition to the information stored
> + in <code>ArchiveEntry</code> a <code>CpioArchiveEntry</code>
> + stores various attributes including information about the
> + original owner and permissions.</p>
> +
> + <p>The cpio package supports the "new portable" as well as the
> + "old" format of CPIO archives in their binary, ASCII and
> + "with CRC" variants.</p>
> +
> + <p>Adding an entry to a cpio archive:</p>
> +<source><![CDATA[
> +CpioArchiveEntry entry = new CpioArchiveEntry(name, size);
> +cpioOutput.putNextEntry(entry);
> +cpioOutput.write(contentOfEntry);
> +cpioOutput.closeArchiveEntry();
> +]]></source>
> +
> + <p>Reading entries from an cpio archive:</p>
> +<source><![CDATA[
> +CpioArchiveEntry entry = cpioInput.getNextCPIOEntry();
> +byte[] content = new byte[entry.getSize()];
> +LOOP UNTIL entry.getSize() HAS BEEN READ {

As above.

> + cpioInput(read, offset, content.length - offset);
> +}
> +]]></source>
> +
> + </subsection>
> +
> + <subsection name="tar">
> +
> + <p>In addition to the information stored
> + in <code>ArchiveEntry</code> a <code>TarArchiveEntry</code>
> + stores various attributes including information about the
> + original owner and permissions.</p>
> +
> + <p>There are several different tar formats and the TAR package
> + of Compress 1.0 only provides the common functionality of
> + the existing variants.</p>
> + <p>The original format didn't support file names longer than
> + 100 characters and the tar package will fail if you try to
> + add an entry longer than that.
> + The <code>longFileMode</code> option
> + of <code>TarArchiveOutputStream</code> can be used to make
> + the archive truncate such names or use the GNU tar variant
> + of storing such names. If you choose the GNU tar option,
> + the archive can not be extracted using many other tar
> + implementations like the ones of OpenBSD, Solaris or MacOS
> + X.</p>
> +
> + <p><code>TarArchiveInputStream</code> will recognize the GNU
> + tar extension for long file names and read the longer names
> + accordingly.</p>
> +
> + <p>Adding an entry to a tar archive:</p>
> +<source><![CDATA[
> +TarArchiveEntry entry = new TarArchiveEntry(name);
> +entry.setSize(size);
> +tarOutput.putNextEntry(entry);
> +tarOutput.write(contentOfEntry);
> +tarOutput.closeArchiveEntry();
> +]]></source>
> +
> + <p>Reading entries from an tar archive:</p>
> +<source><![CDATA[
> +TarArchiveEntry entry = tarInput.getNextTarEntry();
> +byte[] content = new byte[entry.getSize()];
> +LOOP UNTIL entry.getSize() HAS BEEN READ {

As above.

> + tarInput(read, offset, content.length - offset);
> +}
> +]]></source>
> + </subsection>
> +
> + <subsection name="zip">
> + <p>The ZIP package has a <a href="zip.html">dedicated
> + documentation page</a>.</p>
> +
> + <p>Adding an entry to a zip archive:</p>
> +<source><![CDATA[
> +ZipArchiveEntry entry = new ZipArchiveEntry(name);
> +entry.setSize(size);
> +zipOutput.putNextEntry(entry);
> +zipOutput.write(contentOfEntry);
> +zipOutput.closeArchiveEntry();
> +]]></source>
> +
> + <p>Reading entries from an zip archive:</p>
> +<source><![CDATA[
> +ZipArchiveEntry entry = zipInput.getNextZipEntry();
> +byte[] content = new byte[entry.getSize()];
> +LOOP UNTIL entry.getSize() HAS BEEN READ {

As above

> + zipInput(read, offset, content.length - offset);
> +}
> +]]></source>
> +
> + <p>Reading entries from an zip archive using the
> + recommended <code>ZipFile</code> class:</p>
> +<source><![CDATA[
> +ZipArchiveEntry entry = zipFile.getEntry(name);
> +InputStream content = zipFile.getInputStream(entry);
> +try {
> + READ UNTIL content IS EXHAUSTED
> +} finally {
> + content.close();
> +}
> +]]></source>
> + </subsection>
> +
> + <subsection name="jar">
> + <p>In general, JAR archives are ZIP files, so the JAR package
> + supports all options provided by the ZIP package.</p>
> +
> + <p>To be interoperable JAR archives should always be created
> + using the UTF-8 encoding for file names (which is the
> + default).</p>
> +
> + <p>Archives created using <code>JarArchiveOutputStream</code>
> + will implicitly add a <code>JarMarker</code> extra field to
> + the very first archive entry of the archive which will make
> + Solaris recognize them as Java archives and allows them to
> + be used as executables.</p>
> +
> + <p>Note that <code>ArchiveStreamFactory</code> doesn't
> + distinguish ZIP archives from JAR archives, so if you use
> + the one-argument <code>createArchiveInputStream</code>
> + method on a JAR archive, it will still return the more
> + generic
> <code>ZipArchiveInputStream</code>.</p>
> +
> + <p>The <code>JarArchiveEntry</code> class contains fields for
> + certificates and attributes that are planned to be supported
> + in the future but are not supported as of Compress 1.0.</p>
> +
> + <p>Adding an entry to a jar archive:</p>
> +<source><![CDATA[
> +JarArchiveEntry entry = new JarArchiveEntry(name, size);
> +entry.setSize(size);
> +jarOutput.putNextEntry(entry);
> +jarOutput.write(contentOfEntry);
> +jarOutput.closeArchiveEntry();
> +]]></source>
> +
> + <p>Reading entries from an jar archive:</p>
> +<source><![CDATA[
> +JarArchiveEntry entry = jarInput.getNextJarEntry();
> +byte[] content = new byte[entry.getSize()];
> +LOOP UNTIL entry.getSize() HAS BEEN READ {

As above

> + jarInput(read, offset, content.length - offset);
> +}
> +]]></source>
> + </subsection>
> +
> + <subsection name="bzip2">
> +
> + <p>Note that <code>BZipCompressorOutputStream</code> keeps
> + hold of some big data structures in memory. While it is
> + true recommended for any stream that you close it as soon as
> + you no longer needed, this is even more important
> + for <code>BZipCompressorOutputStream</code>.</p>
> +
> + <p>Uncompressing a given bzip2 compressed file (you would
> + certainly add exception handling and make sure all streams
> + get closed properly):</p>
> +<source><![CDATA[
> +FileInputStream in = new FileInputStream("archive.tar.bz2");
> +FileOutputStream out = new FileOutputStream("archive.tar");
> +BZip2CompressorInputStream bzIn = new BZip2CompressorInputStream(in);
> +final byte[] buffer = new byte[buffersize];
> +int n = 0;
> +while (-1 != (n = bzIn.read(buffer))) {
> + out.write(buffer, 0, n);
> +}
> +out.close();
> +bzIn.close();
> +]]></source>
> +
> + </subsection>
> +
> + <subsection name="gzip">
> +
> + <p>The implementation of this package is provided by
> + the <code>java.util.zip</code> package of the Java class
> + library.</p>
> +
> + <p>Uncompressing a given bzip2 compressed file (you would
> + certainly add exception handling and make sure all streams
> + get closed properly):</p>
> +<source><![CDATA[
> +FileInputStream in = new FileInputStream("archive.tar.gz");
> +FileOutputStream out = new FileOutputStream("archive.tar");
> +GZipCompressorInputStream bzIn = new GZipCompressorInputStream(in);
> +final byte[] buffer = new byte[buffersize];
> +int n = 0;
> +while (-1 != (n = bzIn.read(buffer))) {
> + out.write(buffer, 0, n);
> +}
> +out.close();
> +bzIn.close();
> +]]></source>
> + </subsection>
> +
> + </section>
> + </body>
> +</document>
>
> Propchange: commons/proper/compress/trunk/src/site/xdoc/examples.xml
>
> ------------------------------------------------------------------------------
> svn:eol-style = native
>
> Modified: commons/proper/compress/trunk/src/site/xdoc/index.xml
> URL:
> http://svn.apache.org/viewvc/commons/proper/compress/trunk/src/site/xdoc/index.xml?rev=759472&r1=759471&r2=759472&view=diff
>
> ==============================================================================
> --- commons/proper/compress/trunk/src/site/xdoc/index.xml (original)
> +++ commons/proper/compress/trunk/src/site/xdoc/index.xml Sat Mar 28
> 14:46:32 2009
> @@ -56,7 +56,34 @@
> </subsection>
> </section>
> <section name="Documentation">
> + <p>The compress component is split into <em>compressors</em> and
> + <em>archivers</em>.
> While <em>compressors</em>
> + (un)compress streams that usually store a single
> + entry, <em>archivers</em> deal with archives that contain
> + structured content represented
> + by <code>ArchiveEntry</code> instances which in turn
> + usually correspond to single files or directories.</p>
> +
> + <p>Currently the bzip2 and gzip formats are supported as
> + compressors where gzip support is provided by
> + the <code>java.util.zip</code> package of the Java class
> + library.</p>
> +
> + <p>The ar, cpio, tar and zip formats are supported as
> + archivers where the <a href="zip.html">zip</a>
> + implementation provides capabilities that go beyond the
> + features found in java.util.zip.</p>
> +
> + <p>The compress component provides abstract base classes for
> + compressors and archivers together with factories that can
> + be used to choose implementations by algorithm name. In
> + the case of input streams the factories can also be used
> + to guess the format and provide the matching
> + implementation.</p>
> +
> <ul>
> + <li>The <a href="examples.html">examples page</a> contains
> + more detailed information and some examples.</li>
> <li>The <a href="apidocs/index.html">Javadoc</a> of the latest
> SVN</li>
> <li>The <a
> href="http://svn.apache.org/viewvc/commons/proper/compress/">SVN
> repository</a> can be browsed.</li>
>
> Added: commons/proper/compress/trunk/src/site/xdoc/zip.xml
> URL:
> http://svn.apache.org/viewvc/commons/proper/compress/trunk/src/site/xdoc/zip.xml?rev=759472&view=auto
>
> ==============================================================================
> --- commons/proper/compress/trunk/src/site/xdoc/zip.xml (added)
> +++ commons/proper/compress/trunk/src/site/xdoc/zip.xml Sat Mar 28 14:46:32
> 2009
> @@ -0,0 +1,226 @@
> +<?xml version="1.0"?>
> +<!--
> +
> + Licensed to the Apache Software Foundation (ASF) under one or more
> + contributor license agreements.
> See the NOTICE file distributed with
> + this work for additional information regarding copyright ownership.
> + The ASF licenses this file to You under the Apache License, Version 2.0
> + (the "License"); you may not use this file except in compliance with
> + the License. You may obtain a copy of the License at
> +
> + http://www.apache.org/licenses/LICENSE-2.0
> +
> + Unless required by applicable law or agreed to in writing, software
> + distributed under the License is distributed on an "AS IS" BASIS,
> + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> + See the License for the specific language governing permissions and
> + limitations under the License.
> +
> +-->
> +<document>
> + <properties>
> + <title>Commons Compress ZIP package</title>
> + <author email="dev@commons.apache.org">Commons Documentation
> Team</author>
> + </properties>
> + <body>
> + <section name="The ZIP package">
> +
> + <p>The ZIP package provides features not found
> + in <code>java.util.zip</code>:</p>
> +
> + <ul>
> + <li>Support for encodings other than UTF-8 for filenames and
> + comments.</li>
> + <li>Access to internal and external attributes (which are used
> + to store Unix permission by some zip implementations).</li>
> + <li>Structured support for extra fields.</li>
> + </ul>
> +
> + <p>In addition to the information stored
> + in <code>ArchiveEntry</code> a <code>ZipArchiveEntry</code>
> + stores internal and external attributes as well as extra
> + fields which may contain information like Unix permissions,
> + information about the platform they've been created on, their
> + last modification time and an optional comment.</p>
> +
> + <subsection name="ZipArchiveInputStream vs ZipFile">
> +
> + <p>ZIP archives store a archive entries in sequence and
> + contain a registry of all entries at the very end of the
> + archive.
> It is acceptable for an archive to contain several
> + entries of the same name and have the registry (called the
> + central directory) decide which entry is actually to be used
> + (if any).</p>
> +
> + <p>In addition the ZIP format stores certain information only
> + inside the central directory but not together with the entry
> + itself, this is:</p>
> +
> + <ul>
> + <li>internal and external attributes</li>
> + <li>different or additional extra fields</li>
> + </ul>
> +
> + <p>This means the ZIP format cannot really be parsed
> + correctly while reading a non-seekable stream, which is what
> + <code>ZipArchiveInputStream</code> is forced to do. As a
> + result <code>ZipArchiveInputStream</code></p>
> + <ul>
> + <li>may return entries that are not part of the central
> + directory at all and shouldn't be considered part of the
> + archive.</li>
> + <li>may return several entries with the same name.</li>
> + <li>will not return internal or external attributes.</li>
> + <li>may return incomplete extra field data.</li>
> + </ul>
> +
> + <p><code>ZipArchiveInputStream</code> shares these limitations
> + with <code>java.util.zip.ZipInputStream</code>.</p>
> +
> + <p><code>ZipFile</code> is able to read the central directory
> + first and provide correct and complete information on any
> + ZIP archive.</p>
> +
> + <p>If possible, you should always prefer <code>ZipFile</code>
> + over <code>ZipArchiveInputStream</code>.</p>
> + </subsection>
> +
> + <subsection name="Extra Fields">
> +
> + <p>Inside a ZIP archive, additional data can be attached to
> + each entry. The <code>java.util.zip.ZipEntry</code> class
> + provides access to this via the <code>get/setExtra</code>
> + methods as arrays of <code>byte</code>s.</p>
> +
> + <p>Actually the extra data is supposed to be more structured
> + than that and Compress' ZIP package provides access to the
> + structured data as <code>ExtraField</code> instances.
> Only
> + a subset of all defined extra field formats is supported by
> + the package, any other extra field will be stored
> + as <code>UnrecognizedExtraField</code>.</p>
> +
> + </subsection>
> +
> + <subsection name="Encoding" id="encoding">
> +
> + <p>Traditionally the ZIP archive format uses CodePage 437 as
> + encoding for file name, which is not sufficient for many
> + international character sets.</p>
> +
> + <p>Over time different archivers have chosen different ways to
> + work around the limitation - the <code>java.util.zip</code>
> + packages simply uses UTF-8 as its encoding for example.</p>
> +
> + <p>Ant has been offering the encoding attribute of the zip and
> + unzip task as a way to explicitly specify the encoding to
> + use (or expect) since Ant 1.4. It defaults to the
> + platform's default encoding for zip and UTF-8 for jar and
> + other jar-like tasks (war, ear, ...) as well as the unzip
> + family of tasks.</p>
> +
> + <p>More recent versions of the ZIP specification introduce
> + something called the "language encoding flag"
> + which can be used to signal that a file name has been
> + encoded using UTF-8. All ZIP-archives written by Compress
> + will set this flag, if the encoding has been set to UTF-8.
> + Our interoperability tests with existing archivers didn't
> + show any ill effects (in fact, most archivers ignore the
> + flag to date), but you can turn off the "language encoding
> + flag" by setting the attribute
> + <code>useLanguageEncodingFlag</code> to <code>false</code> on the
> + <code>ZipArchiveOutputStream</code> if you should encounter
> + problems.</p>
> +
> + <p>The <code>ZipFile</code>
> + and <code>ZipArchiveInputStream</code> classes will
> + recognize the language encoding flag and ignore the encoding
> + set in the constructor if it has been found.</p>
> +
> + <p>The InfoZIP developers have introduced new ZIP extra fields
> + that can be used to add an additional UTF-8 encoded file
> + name to the entry's metadata.
> Most archivers ignore these
> + extra fields. <code>ZipArchiveOutputStream</code> supports
> + an option <code>createUnicodeExtraFields</code> which makes
> + it write these extra fields either for all entries
> + ("always") or only those whose name cannot be encoded using
> + the specified encoding (not-encodeable), it defaults to
> + "never" since the extra fields create bigger archives.</p>
> +
> + <p>The fallbackToUTF8 attribute
> + of <code>ZipArchiveOutputStream</code> can be used to create
> + archives that use the specified encoding in the majority of
> + cases but UTF-8 and the language encoding flag for filenames
> + that cannot be encoded using the specified encoding.</p>
> +
> + <p>The <code>ZipFile</code>
> + and <code>ZipArchiveInputStream</code> classes recognize the
> + Unicode extra fields by default and read the file name
> + information from them, unless you set the constructor parameter
> + <code>scanForUnicodeExtraFields</code> to false.</p>
> +
> + <h4>Recommendations for Interoperability</h4>
> +
> + <p>The optimal setting of flags depends on the archivers you
> + expect as consumers/producers of the ZIP archives. Below
> + are some test results which may be superseded with later
> + versions of each tool.</p>
> +
> + <ul>
> + <li>The java.util.zip package used by the jar executable or
> + to read jars from your CLASSPATH reads and writes UTF-8
> + names, it doesn't set or recognize any flags or Unicode
> + extra fields.</li>
> +
> + <li>7Zip writes CodePage 437 by default but uses UTF-8 and
> + the language encoding flag when writing entries that
> + cannot be encoded as CodePage 437 (similar to the zip task
> + with fallbacktoUTF8 set to true). It recognizes the
> + language encoding flag when reading and ignores the
> + Unicode extra fields.</li>
> +
> + <li>WinZIP writes CodePage 437 and uses Unicode extra fields
> + by default.
> It recognizes the Unicode extra field and the
> + language encoding flag when reading.</li>
> +
> + <li>Windows' "compressed folder" feature doesn't recognize
> + any flag or extra field and creates archives using the
> + platforms default encoding - and expects archives to be in
> + that encoding when reading them.</li>
> +
> + <li>InfoZIP based tools can recognize and write both, it is
> + a compile time option and depends on the platform so your
> + mileage may vary.</li>
> +
> + <li>PKWARE zip tools recognize both and prefer the language
> + encoding flag. They create archives using CodePage 437 if
> + possible and UTF-8 plus the language encoding flag for
> + file names that cannot be encoded as CodePage 437.</li>
> + </ul>
> +
> + <p>So, what to do?</p>
> +
> + <p>If you are creating jars, then java.util.zip is your main
> + consumer. We recommend you set the encoding to UTF-8 and
> + keep the language encoding flag enabled. The flag won't
> + help or hurt java.util.zip but archivers that support it
> + will show the correct file names.</p>
> +
> + <p>For maximum interop it is probably best to set the encoding
> + to UTF-8, enable the language encoding flag and create
> + Unicode extra fields when writing ZIPs. Such archives
> + should be extracted correctly by java.util.zip, 7Zip,
> + WinZIP, PKWARE tools and most likely InfoZIP tools. They
> + will be unusable with Windows' "compressed folders" feature
> + and bigger than archives without the Unicode extra fields,
> + though.</p>
> +
> + <p>If Windows' "compressed folders" is your primary consumer,
> + then your best option is to explicitly set the encoding to
> + the target platform.
> You may want to enable creation of
> + Unicode extra fields so the tools that support them will
> + extract the file names correctly.</p>
> + </subsection>
> +
> + </section>
> + </body>
> +</document>
>
> Propchange: commons/proper/compress/trunk/src/site/xdoc/zip.xml
>
> ------------------------------------------------------------------------------
> svn:eol-style = native
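
A minimal sketch of the loop suggested in the inline comments on the "Reading entries" examples: read until read() returns -1 instead of counting up to entry.getSize(). It assumes the ArchiveInputStream implementations report -1 at the end of the current entry, which is the question raised above and not something this mail confirms. EntryReader and readAll are hypothetical names, not part of Compress, and a ByteArrayInputStream stands in for arInput, tarInput and the others so the snippet runs without the library.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of Commons Compress.
final class EntryReader {

    private EntryReader() {
    }

    // Drains the stream until read() reports end of data with -1.
    // Assumption: when 'in' is an archive input stream positioned on an
    // entry, -1 marks the end of that entry, not of the whole archive.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // A plain in-memory stream stands in for an archive input stream.
        byte[] data = "content of entry".getBytes(StandardCharsets.UTF_8);
        byte[] read = readAll(new ByteArrayInputStream(data));
        System.out.println(read.length == data.length);
    }
}
```

If the streams behave that way, the documented examples could shrink to something like `byte[] content = EntryReader.readAll(arInput);` and would no longer need to size an array from entry.getSize().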