Review Request: 6443578: Continuation lines in JAR manifests do not follow RFC-822

Philipp Kunz Fri, 12 Oct 2018 10:25:10 -0700

Hi,

Attached is a patch with a proposal to fix bug 6443578 about Manifest
inserting line breaks in between bytes of characters encoded in utf-8 with more 
than one byte and a test for it.



The Issue

The current Manifest implementation places line breaks for breaking and
continuing values onto the next line always at 72 bytes from the start
of the line without paying attention to characters encoded in utf-8
with more than one byte and to all bytes of one character together. It
occurs that a character encoded in utf-8 with more than one byte is
partially written at the end of one line and then after a line break
for a continuation line ("\r\n ") continued on the next line resulting
in an invalid utf-8 byte sequence where "\r\n " is inserted in between
the multiple bytes of the byte sequence a character is encoded in utf-8 
with.

That cannot happen with characters encoded with one byte in utf-8 which
are used most often. It only affects characters with more than one byte
encoded in utf-8. It also does not affect header names (the keys)
because header names are limited to {A-Z} | {a-z} | {0-9} | - | _ [1]
which all happen to be characters encoded in utf-8 with one byte.

The Manifest implementation removes these continuation line delimiters
("\r\n ") again in [2] and [3] (mind the --) and [4] (mind the 1 as
second parameter to arraycopy skipping the space) and concatenates the
parts from individual continuation lines together in [5] operating
still on a sequence of bytes before decoding the remaining bytes of the
value back into the string value applying the utf-8 decoding in [6].
The process is very similar also for names of named sections, see [7],
[8], [9], and [10]. Manifests written with the current implementation
are read again correctly.

Not only the specification says that sequences of bytes of the same
utf-8 encoded character cannot be broken apart [1]
> value: SPACE *otherchar newline *continuation
> continuation: SPACE *otherchar newline
> otherchar: any UTF-8 character except NUL, CR and LF
but line break and space can occur only between whole utf-8 encoded
characters and also jar manifests cannot be viewed with utf-8 aware
file viewers correctly. Manifests with values with characters illegally
broken onto a new line look for example simplified and with a shortened
line width limit similar like this one in any of my favorite file viewers:

> Manifest-Version: 1.0
> Key: Gänsefüßchen in Übergr?
> ?ße

where ? and ? should in fact be a German o umlaut ö. A file viewer may
understand and decode utf-8 well but has no way to know that the two
bytes, one before and one after the continuation line break actually
constitute the utf-8 encoded byte sequence of one single character.


Don't Put Continuation Line Breaks in Front of UTF-8 Continuation Bytes

The approach chosen in my patch is to prevent putting continuation line
breaks before a utf-8 continuation byte. UTF-* continuation bytes can
be detected with a bit mask and comparison, see [11]. An alternative
would have been to write one character (unicode codepoint and not Java
16-bit char) after another computing the length in number of bytes in
utf-8 encoded form for each and checking if it still fits on the same
line before writing it with or without an addition continuation line
break. When working with strings in Java, an additional challenge would
have been to work with surrogate pairs. I figure the bitmask comparison
fits the purpose here best, it is also used elsewhere in the jdk, see
[12] and [13] among other places, and presumably involves the least
resource usage and performance penalty, unfortunately none of which
accessible (private).


Tricky Strings of Actual Bytes

The current Manifest implementation does some kind of strange
processing when writing and encoding manifests with respect to using
strings with bytes as already described and mentioned in [14] and [15]
among probably more. The values are converted into strings with the
invocation of the deprecated constructor [17] in [16]:
> value = new String(vb, 0, 0, vb.length);
which converts the utf-8 encoded bytes into a string of chars each of
which is set to a byte value in its lower byte and zero in the higher
byte [18]. Each char of that string actually holds a byte and its
length corresponds to the number of bytes of the utf-8 encoded value.
After inserting line breaks in make72Safe [19] the string is written to
the output stream disregarding the higher bytes of the chars by
applying a lossy conversion in [20] thereby restoring the original utf-
8 byte sequence with line breaks added which also don't use higher
bytes.

I'll be glad to also solve bug 8066619 by replacing these weird strings
with byte arrays or anyhow else but prefer to do that in a separate
effort and patch and for this patch here the strings are encoded as
they always were. Above paragraph explains why my patch's
isUTF8ContinuationByte takes a parameter of type char which looks weird
at first glance but after studying the situation it implies the least
possible number of type casts and each char actually represents a byte
held in each chars lower byte accounting for that methods name.


Compatibility

In addition to the tests which usually test the current implementation
it has to be made sure that manifests written by newer jdk can be read
by an older one and manifests written by an older jdk can be read by a
newer one. Obviously also manifests written by older versions of the
jdk should be read correctly by older versions of it which has been addressed 
in older jdk version already and the test in the patch is supposed to test 
writing and reading of manifests with all kinds of cases related to breaking 
values on continuation lines in conjunction with characters near that break 
with different number of bytes in their utf-8 encoded forms.
Reading the manifests remains unchanged which is why only writing
manifests like it was done before became a part of the new test in the
patch, see writeManifestWithBrokenCharacters and make72Safe in the 
LineBreakCharacter test in the attached patch.


Performance and Resource Usage Implications

I figure that reading and writing manifests are frequent operations
already optimized to a certain extent guessing for instance from the
name of the class Manifest.FastInputStream. Adding more complexity as
in my patch will certainly make operations not faster. I have no facts
here but state some assumptions.

When writing a manifest, performance and resource usage is only
affected if lines are actually broken onto continuation lines, which I
would assume happens statistically to a minority of headers only.

When breaking a value onto a continuation line when writing a manifest,
at least one and at most four bytes will be checked if they are
continuation bytes. In average I assume that characters encoded with
fewer bytes occur much more often.

It may also happen that the same manifest will need more continuation
lines because lines will be filled with less than the complete 72 byte
limit if at the end of the line a multi-byte character would exceed it.
When the next line starts with the whole character instead of the
remaining bytes the next continuation line break could have to placed
earlier and so on. The utf-8 encoded values will not change but there
might be more continuation line breaks ("\r\n ") than before using
storage space in main memory as well as on disk. A line can be at most
three bytes shorter than the limit and therefore at least 24 lines are
necessary to require an additional continuation line break.

When reading manifests, the only difference might make the increased
manifest file sizes or input stream lengths in number of bytes.

Altogether, I guess that these implications are negligible.


The issue has been open for quite some time now and Java still has
worked well without its resolution. Advantages of applying that patch
would be that the manifest files would comply with respect to utf-8
encoding to its own specification and could be correctly viewed and
also edited with any utf-8 capable viewer or editor with the complete
unicode character set and another bug could be closed.

I'm looking forward to any feedback and would be glad if someone would
volunteer to sponsor the change.

Regards,
Philipp







[1] https://docs.oracle.com/javase/7/docs/technotes/guides/jar/jar.html
#Section-Specification
[2] http://hg.openjdk.org/jdk/jdk/file/331fbd2db6b5/src/java.base/share
/classes/java/util/jar/Attributes.java#l384 byte c = lbuf[--len];
[3] http://hg.openjdk.org/jdk/jdk/file/331fbd2db6b5/src/java.base/share
/classes/java/util/jar/Attributes.java#l392 --len;
[4] http://hg.openjdk.org/jdk/jdk/file/331fbd2db6b5/src/java.base/share
/classes/java/util/jar/Attributes.java#l407 System.arraycopy(lbuf, 1,
buf, lastline.length, len - 1);
[5] http://hg.openjdk.org/jdk/jdk/file/331fbd2db6b5/src/java.base/share
/classes/java/util/jar/Attributes.java#l406 System.arraycopy(lastline,
0, buf, 0, lastline.length);
[6] http://hg.openjdk.org/jdk/jdk/file/331fbd2db6b5/src/java.base/share
/classes/java/util/jar/Attributes.java#l412 value = new String(buf, 0,
buf.length, "UTF8");
[7] http://hg.openjdk.org/jdk/jdk/file/331fbd2db6b5/src/java.base/share
/classes/java/util/jar/Manifest.java#l236 byte c = lbuf[--len];
[8] http://hg.openjdk.org/jdk/jdk/file/331fbd2db6b5/src/java.base/share
/classes/java/util/jar/Manifest.java#l244 --len;
[9] http://hg.openjdk.org/jdk/jdk/file/331fbd2db6b5/src/java.base/share
/classes/java/util/jar/Manifest.java#l267 System.arraycopy(lbuf, 1,
buf, lastline.length, len - 1);
[10] http://hg.openjdk.org/jdk/jdk/file/331fbd2db6b5/src/java.base/shar
e/classes/java/util/jar/Manifest.java#l273 name = new String(buf, 0,
buf.length, "UTF8");
[11] https://tools.ietf.org/html/rfc3629#section-3
[12] http://hg.openjdk.org/jdk/jdk/file/331fbd2db6b5/src/java.base/shar
e/classes/java/lang/StringCoding.java#l632 return (b & 0xc0) != 0x80;
[13] http://hg.openjdk.org/jdk/jdk/file/331fbd2db6b5/src/java.base/shar
e/classes/sun/nio/cs/UTF_8.java#l90 return (b & 0xc0) != 0x80;
[14] https://bugs.openjdk.java.net/browse/JDK-8066619?focusedCommentId=
14085301&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-
tabpanel#comment-14085301
[15] http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-May/0529
46.html
[16] http://hg.openjdk.org/jdk/jdk/file/331fbd2db6b5/src/java.base/shar
e/classes/java/util/jar/Manifest.java#l177 value = new String(vb, 0, 0,
vb.length);
[17] http://hg.openjdk.org/jdk/jdk/file/331fbd2db6b5/src/java.base/shar
e/classes/java/lang/String.java#l375 @Deprecated(since="1.1") public
String(byte ascii[], int hibyte, int offset, int count) {
[18] http://hg.openjdk.org/jdk/jdk/file/331fbd2db6b5/src/java.base/shar
e/classes/java/lang/String.java#l389 StringUTF16.putChar(val, i, hibyte
| (ascii[offset++] & 0xff));
[19] http://hg.openjdk.org/jdk/jdk/file/331fbd2db6b5/src/java.base/shar
e/classes/java/util/jar/Manifest.java#l191
[20] http://hg.openjdk.org/jdk/jdk/file/331fbd2db6b5/src/java.base/shar
e/classes/java/io/DataOutputStream.java#l276
out.write((byte)s.charAt(i));
[21] https://bugs.openjdk.java.net/browse/JDK-6695402
[22] https://tools.ietf.org/html/rfc822#section-3.1.1

diff -r 0c1e44da019c src/java.base/share/classes/java/util/jar/Manifest.java
--- a/src/java.base/share/classes/java/util/jar/Manifest.java	Fri Oct 12 16:25:24 2018 +0200
+++ b/src/java.base/share/classes/java/util/jar/Manifest.java	Fri Oct 12 18:12:41 2018 +0200
@@ -186,17 +186,27 @@
     }
 
     /**
-     * Adds line breaks to enforce a maximum 72 bytes per line.
+     * Adds line breaks to enforce a maximum of 72 bytes per line.
      */
     static void make72Safe(StringBuffer line) {
         int length = line.length();
         int index = 72;
         while (index < length) {
+            while (isUTF8ContinuationByte(line.charAt(index))) index--;
             line.insert(index, "\r\n ");
             index += 74; // + line width + line break ("\r\n")
             length += 3; // + line break ("\r\n") and space
         }
-        return;
+    }
+
+    /**
+     * @see <a href="https://tools.ietf.org/html/rfc3629#section-3";>
+     * RFC 3629 - UTF-8, a transformation format of ISO 10646</a>
+     * @see StringCoding#isNotContinuation(int)
+     * @see sun.nio.cs.UTF_8.Decoder#isNotContinuation(int)
+     */
+    private static boolean isUTF8ContinuationByte(char b) {
+        return (b & 0xc0) == 0x80;
     }
 
     static String getErrorPosition(String filename, final int lineNumber) {
diff -r 0c1e44da019c test/jdk/java/util/jar/Manifest/LineBreakCharacter.java
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/test/jdk/java/util/jar/Manifest/LineBreakCharacter.java	Fri Oct 12 18:12:41 2018 +0200
@@ -0,0 +1,431 @@
+/*
+ * Copyright (c) 2018, Oracle and/or its affiliates. All rights reserved.
+ * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
+ *
+ * This code is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 only, as
+ * published by the Free Software Foundation.
+ *
+ * This code is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ * version 2 for more details (a copy is included in the LICENSE file that
+ * accompanied this code).
+ *
+ * You should have received a copy of the GNU General Public License version
+ * 2 along with this work; if not, write to the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
+ *
+ * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA
+ * or visit www.oracle.com if you need additional information or have any
+ * questions.
+ */
+
+import static java.nio.charset.StandardCharsets.UTF_8;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.util.jar.Attributes;
+import java.util.jar.Manifest;
+import java.util.jar.Attributes.Name;
+import java.util.List;
+import java.util.LinkedList;
+
+import org.testng.annotations.Test;
+import org.testng.annotations.DataProvider;
+import static org.testng.Assert.*;
+
+/**
+ * @test
+ * @bug 6443578
+ * @run testng LineBreakCharacter
+ * @summary Tests reading and writing of jar manifest header values broken
+ * across lines in conjunction with unicode characters encoded with a variable
+ * number of bytes in UTF-8 encoded.
+ * <p>
+ * The manifest line length limit (72 bytes) may be reached at a position
+ * between multiple bytes of a UTF-8 encoded character. Although characters
+ * should not be broken across lines according to the specification the
+ * previous Manifest implementation did that.
+ * <p>
+ * This test makes sure that no character is broken apart across a line break
+ * when writing manifests and also that manifests are still read correctly
+ * whether or not characters encoded in UTF-8 with more than one byte are
+ * interrupted with and continued after a line break.
+ */
+public class LineBreakCharacter {
+
+    static final int MANIFEST_LINE_CONTENT_WIDTH_BYTES = 72;
+
+    /**
+     * character string that has one byte size in its UTF-8 encoded form to
+     * yield one byte of position offset.
+     */
+    static final String FILL1BYTE = "x";
+
+    /**
+     * By using header names of four characters length the same values can be
+     * used for testing line break in both headers (in main attributes as well
+     * as named sections) and section names because a named section name is
+     * represented basically like any other header but follows an empty line
+     * and the key is always "Name".
+     * Relative to the start of the value, this way the same offset to the
+     * character to test breaking can be used in all cases.
+     */
+    static final String NAME = "Name";
+
+    /**
+     * Distinguish main attributes headers, section names, and headers in named
+     * sections because an implementation might make a difference.
+     * <p>
+     * Implementation of writing the main section headers in
+     * {@link Attributes#writeMain(java.io.DataOutputStream)} differs from the
+     * one writing named section headers in
+     * {@link Attributes#write(java.io.DataOutputStream)} regarding the special
+     * order of {@link Name#MANIFEST_VERSION} and
+     * {@link Name#SIGNATURE_VERSION} and named section names are written in
+     * {@link Manifest#write(OutputStream)}.
+     */
+    enum PositionInManifest {
+        MAIN_ATTRIBUTES, SECTION_NAME, NAMED_SECTION;
+    }
+
+    static String numByteUnicodeCharacter(int numBytes) {
+        String string;
+        switch (numBytes) {
+        case 1: string = "i"; break;
+        case 2: string = "ï"; break;
+        case 3: string = "\uFB00"; break; // "ﬀ"
+        case 4: string = Character.toString(0x2070E); break; // "𠜎"
+        default: throw new RuntimeException();
+        }
+        assertEquals(string.getBytes(UTF_8).length, numBytes,
+                "self-test failed: wrong UTF-8 encoded character length");
+        return string;
+    }
+
+    /**
+     * Produces test cases with all combinations of circumstances covered in
+     * which a character could possibly be attempted to be broken across a line
+     * break or onto a continuation line:<ul>
+     * <li>different sizes of a UTF-8 encoded character: one, two, three, and
+     * four bytes</li>
+     * <li>all possible relative positions of the 72-byte line length limit
+     * with respect to the character to test breaking including immediately
+     * before that character and immediately after the character and every
+     * position  in between for multi-byte UTF-8 encoded characters</li>
+     * <li>different number of preceding line breaks having occurred encoding
+     * the same value</li>
+     * <li>at the end of the value or followed by another character</li>
+     * <li>in a main attributes header value, section name, or named section
+     * header value (see also {@link #PositionInManifest})</li>
+     * </ul>
+     * <p>
+     * The same set of test parameters is used to write and read manifests
+     * once without breaking characters apart
+     * ({@link #testWriteLineBreaksKeepCharactersTogether(int, int, int, int,
+     * PositionInManifest, String, String)}) and once with doing so
+     * ({@link #readCharactersBrokenAcrossLines(int, int, int, int,
+     * PositionInManifest, String, String)}).
+     * The latter case covers backwards compatibility and involves writing
+     * manifests like they were written before resolution of bug 6443578.
+     */
+    @DataProvider(name = "lineBreakParameters")
+    public static Object[][] lineBreakParameters() {
+        LinkedList<Object[]> params = new LinkedList<>();
+
+        // b: number of line breaks before character under test
+        for (int b = 0; b <= 3; b++) {
+
+           // c: unicode character UTF-8 encoded length in bytes
+           for (int c = 1; c <= 4; c++) {
+
+                // p: potential break position offset in bytes
+                // p == 0 => before character,
+                // p == c => after character, and
+                // 0 < p < c => character potentially broken across line break
+                //              within the character
+                for (int p = c; p >= 0; p--) {
+
+                    // a: no or one character following the one under test
+                    // (a == 0 meaning the character under test is the end of
+                    // the value which is followed by a line break in the
+                    // resulting manifest without continuation line space which
+                    // concludes the value)
+                    for (int a = 0; a <= 1; a++) {
+
+                        // offset: so many characters (actually bytes here,
+                        // filled with one byte characters) are needed to place
+                        // the next character (the character under test) into a
+                        // position relative to the maximum line width that it
+                        // may or may not have to be broken onto the next line
+                        int offset =
+                                // number of lines; - 1 due to continuation " "
+                                b * (MANIFEST_LINE_CONTENT_WIDTH_BYTES - 1)
+                                // line length minus "Name: ".length()
+                                + MANIFEST_LINE_CONTENT_WIDTH_BYTES - 6
+                                // position of maximum line width relative to
+                                // beginning of encoded character
+                                - p;
+                        String value = "";
+                        for (int i = 0; i < offset - 1; i++) {
+                            value += FILL1BYTE;
+                        }
+                        // character before the one to test the break
+                        value += FILL1BYTE;
+                        String character = numByteUnicodeCharacter(c);
+                        value += character;
+                        for (int i = 0; i < a; i++) {
+                            // character after the one to test the break
+                            value += FILL1BYTE;
+                        }
+
+                        for (PositionInManifest i :
+                                PositionInManifest.values()) {
+
+                            params.add(new Object[] {
+                                    b, c, p, a, i, character, value});
+                        }
+                    }
+                }
+            }
+        }
+
+        return params.toArray(new Object[0][0]);
+    }
+
+    /**
+     * Checks that unicode characters work well with line breaks and
+     * continuation lines in jar manifests without breaking a character
+     * encoded in UTF-8 with more than one byte apart across a line break.
+     * <p>
+     * For each of the cases provided by {@link #lineBreakParameters()} the
+     * break position is verified in the written manifest binary form as well
+     * as verified that it restores to the original values when read again.
+     */
+    @Test(dataProvider = "lineBreakParameters")
+    public void testWriteLineBreaksKeepCharactersTogether(int b, int c, int p,
+            int a, PositionInManifest i, String character, String value)
+                    throws IOException {
+        byte[] mfBytes = writeManifest(i, NAME, value);
+
+        // expect the whole character on the next line unless it fits
+        // completely on the current line
+        boolean breakExpected = p < c;
+        String brokenPart = character;
+        if (breakExpected) {
+            brokenPart = "\r\n " + brokenPart;
+        }
+        // expect a line break before the next character if there is a next
+        // character and the previous not already broken on next line
+        if (a == 1) {
+            if (!breakExpected) {
+                brokenPart += "\r\n ";
+            }
+            brokenPart += FILL1BYTE;
+        }
+        brokenPart = FILL1BYTE + brokenPart + "\r\n";
+        try {
+            assertOccurrence(mfBytes, brokenPart.getBytes(UTF_8));
+            readManifestAndAssertValue(mfBytes, i, NAME, value);
+        } catch (AssertionError e) {
+            printManifest(mfBytes);
+            throw e;
+        }
+    }
+
+    static byte[] writeManifest(PositionInManifest i, String name,
+            String value) throws IOException {
+        Manifest mf = new Manifest();
+        mf.getMainAttributes().put(Name.MANIFEST_VERSION, "1.0");
+        Attributes attributes = new Attributes();
+
+        switch (i) {
+        case MAIN_ATTRIBUTES:
+            mf.getMainAttributes().put(new Name(name), value);
+            break;
+        case SECTION_NAME:
+            mf.getEntries().put(value, attributes);
+            break;
+        case NAMED_SECTION:
+            mf.getEntries().put(NAME, attributes);
+            attributes.put(new Name(name), value);
+            break;
+        }
+
+        ByteArrayOutputStream out = new ByteArrayOutputStream();
+        mf.write(out);
+        return out.toByteArray();
+    }
+
+    static void printManifest(byte[] mf) {
+        String SEP_LINE = "------------------------------------------"
+                + "------------------------------";
+        System.out.println(SEP_LINE);
+        System.out.print(new String(mf, UTF_8));
+        System.out.println(SEP_LINE);
+    }
+
+    static void assertOccurrence(byte[] mf, byte[] part) {
+        List<Integer> matchPos = new LinkedList<>();
+        for (int i = 0; i < mf.length; i++) {
+            for (int j = 0; j < part.length && i + j <= mf.length; j++) {
+                if (part[j] == 0) {
+                    if (i + j != mf.length) {
+                        break; // expected eof not found
+                    }
+                } else if (i + j == mf.length) {
+                    break;
+                } else if (mf[i + j] != part[j]) {
+                    break;
+                }
+                if (j == part.length - 1) {
+                    matchPos.add(i);
+                }
+            }
+        }
+        assertEquals(matchPos.size(), 1, "not "
+                + (matchPos.size() < 1 ? "found" : "unique") + ": '"
+                + new String(part, UTF_8) + "'");
+    }
+
+    static void readManifestAndAssertValue(
+            byte[] mfBytes, PositionInManifest i, String name, String value)
+                    throws IOException {
+        Manifest mf = new Manifest(new ByteArrayInputStream(mfBytes));
+
+        switch (i) {
+        case MAIN_ATTRIBUTES:
+            assertEquals(mf.getMainAttributes().getValue(name), value,
+                    "main attributes header value");
+            break;
+        case SECTION_NAME:
+            Attributes attributes = mf.getAttributes(value);
+            assertNotNull(attributes, "named section not found");
+            break;
+        case NAMED_SECTION:
+            attributes = mf.getAttributes(NAME);
+            assertEquals(attributes.getValue(name), value,
+                    "named section attributes header value");
+            break;
+        }
+    }
+
+    @Test(dataProvider = "lineBreakParameters")
+    public void readCharactersBrokenAcrossLines(int b, int c, int p, int a,
+            PositionInManifest i, String character, String value)
+                    throws IOException {
+        byte[] mfBytes = writeManifestWithBrokenCharacters(i, NAME, value);
+
+        ByteArrayOutputStream buf = new ByteArrayOutputStream();
+        buf.write(FILL1BYTE.getBytes(UTF_8));
+        byte[] characterBytes = character.getBytes(UTF_8);
+        // the portion of the character that fits on the current line before
+        // a break at 72 bytes, ranges from nothing (p == 0) to the whole
+        // character (p == c)
+        for (int j = 0; j < p; j++) {
+            buf.write(characterBytes, j, 1);
+        }
+        // expect a line break at exactly 72 bytes from the beginning of the
+        // line unless the whole character fits on that line
+        boolean breakExpected = p < c;
+        if (breakExpected) {
+            buf.write("\r\n ".getBytes(UTF_8));
+        }
+        // the remaining portion of the character, if any
+        for (int j = p; j < c; j++) {
+            buf.write(characterBytes, j, 1);
+        }
+        // expect another line break if the whole character fitted on the same
+        // line and there is another character
+        if (a == 1) {
+            if (c == p) {
+                buf.write("\r\n ".getBytes(UTF_8));
+            }
+            buf.write(FILL1BYTE.getBytes(UTF_8));
+        }
+        buf.write("\r\n".getBytes(UTF_8));
+        byte[] brokenPart = buf.toByteArray();
+        try {
+            assertOccurrence(mfBytes, brokenPart);
+            readManifestAndAssertValue(mfBytes, i, NAME, value);
+        } catch (AssertionError e) {
+            printManifest(mfBytes);
+            throw e;
+        }
+    }
+
+    /**
+     * From previous Manifest implementation reduced to the minimum required
+     * to demonstrate compatibility.
+     */
+    @SuppressWarnings("deprecation")
+    static byte[] writeManifestWithBrokenCharacters(
+            PositionInManifest i, String name, String value)
+                    throws IOException {
+        byte[] vb = value.getBytes(UTF_8);
+        value = new String(vb, 0, 0, vb.length);
+        ByteArrayOutputStream out = new ByteArrayOutputStream();
+        DataOutputStream dos = new DataOutputStream(out);
+        dos.writeBytes(Name.MANIFEST_VERSION + ": 0.1\r\n");
+
+        if (i == PositionInManifest.MAIN_ATTRIBUTES) {
+            StringBuffer buffer = new StringBuffer(name);
+            buffer.append(": ");
+            buffer.append(value);
+            make72Safe(buffer);
+            buffer.append("\r\n");
+            dos.writeBytes(buffer.toString());
+        }
+        dos.writeBytes("\r\n");
+
+        if (i == PositionInManifest.SECTION_NAME ||
+                i == PositionInManifest.NAMED_SECTION) {
+            StringBuffer buffer = new StringBuffer("Name: ");
+            if (i == PositionInManifest.SECTION_NAME) {
+                buffer.append(value);
+            } else {
+                buffer.append(NAME);
+            }
+            make72Safe(buffer);
+            buffer.append("\r\n");
+            dos.writeBytes(buffer.toString());
+
+            if (i == PositionInManifest.NAMED_SECTION) {
+                buffer = new StringBuffer(name);
+                buffer.append(": ");
+                buffer.append(value);
+                make72Safe(buffer);
+                buffer.append("\r\n");
+                dos.writeBytes(buffer.toString());
+            }
+
+            dos.writeBytes("\r\n");
+        }
+
+        dos.flush();
+        return out.toByteArray();
+    }
+
+    /**
+     * Adds line breaks to enforce a maximum 72 bytes per line.
+     * <p>
+     * From previous Manifest implementation without respect for UTF-8 encoded
+     * character boundaries breaking also within multi-byte UTF-8 encoded
+     * characters.
+     *
+     * @see {@link Manifest#make72Safe(StringBuffer)}
+     */
+    static void make72Safe(StringBuffer line) {
+        int length = line.length();
+        int index = 72;
+        while (index < length) {
+            line.insert(index, "\r\n ");
+            index += 74; // + line width + line break ("\r\n")
+            length += 3; // + line break ("\r\n") and space
+        }
+    }
+
+}
diff -r 0c1e44da019c test/jdk/sun/security/tools/jarsigner/LineBrokenMultiByteCharacter.java
--- a/test/jdk/sun/security/tools/jarsigner/LineBrokenMultiByteCharacter.java	Fri Oct 12 16:25:24 2018 +0200
+++ b/test/jdk/sun/security/tools/jarsigner/LineBrokenMultiByteCharacter.java	Fri Oct 12 18:12:41 2018 +0200
@@ -21,27 +21,45 @@
  * questions.
  */
 
-/*
- * @test
- * @bug 6695402
- * @summary verify signatures of jars containing classes with names
- *          with multi-byte unicode characters broken across lines
- * @library /test/lib
- */
-
 import java.io.IOException;
 import java.io.InputStream;
 import java.nio.file.Files;
+import java.nio.file.Path;
 import java.nio.file.Paths;
+import java.util.HashMap;
 import java.util.jar.JarFile;
 import java.util.jar.Attributes.Name;
 import java.util.jar.JarEntry;
+import java.util.zip.ZipEntry;
+import java.util.zip.ZipFile;
 
+import static java.util.jar.JarFile.MANIFEST_NAME;
 import static java.nio.charset.StandardCharsets.UTF_8;
 
 import jdk.test.lib.SecurityTools;
 import jdk.test.lib.util.JarUtils;
 
+import org.testng.annotations.Test;
+import org.testng.annotations.BeforeClass;
+
+/**
+ * @test
+ * @bug 6695402
+ * @library /test/lib
+ * @run testng LineBrokenMultiByteCharacter
+ * @summary verify signatures of jars containing classes with names
+ *          with multi-byte unicode characters broken across lines.
+ *
+ * Before characters were kept in one piece by the resolution of bug 6443578
+ * and were sometimes broken across a line break in manifests onto a
+ * continuation line with the line break and the continuation space in between
+ * two bytes of one single utf-8 encoded character represented with more than
+ * one byte, bug 6695402 was found resulting in a signature considered invalid.
+ * That can still happen after bug 6443578 fixed when signing manifests that
+ * already contain entries with characters broken across a line break,
+ * presumably when an older manifest is present created with an earlier version
+ * of the jdk when signing it or verifying its signature.
+ */
 public class LineBrokenMultiByteCharacter {
 
     /**
@@ -58,54 +76,176 @@
      * @see #verifyClassNameLineBroken(JarFile, String)
      */
     static final String testClassName =
-            "LineBrokenMultiByteCharacterA1234567890B1234567890C123456789D1234\u00E9xyz.class";
+            "LineBrokenMultiByteCharacterA1234567890B1234567890C123456789" +
+            "D1234\u00E9xyz.class";
 
     static final String anotherName =
-            "LineBrokenMultiByteCharacterA1234567890B1234567890C123456789D1234567890.class";
+            "LineBrokenMultiByteCharacterA1234567890B1234567890C123456789" +
+            "D1234567890.class";
 
     static final String alias = "a";
     static final String keystoreFileName = "test.jks";
-    static final String manifestFileName = "MANIFEST.MF";
 
-    public static void main(String[] args) throws Exception {
-        prepare();
+    byte[] manifestNoDigest = 
+            (Name.MANIFEST_VERSION + ": 1.0\r\n\r\n").getBytes(UTF_8);
+    byte[] manifestWithDigest; // with character broken across line break
 
-        testSignJar("test.jar");
-        testSignJarNoManifest("test-no-manifest.jar");
-        testSignJarUpdate("test-update.jar", "test-updated.jar");
+    /**
+     * Simulate a jar manifest as it would have been created by an earlier jdk
+     * by re-arranging the line break at exactly 72 bytes content thereby
+     * breaking the multi-byte character under test as was the case before
+     * resolution of bug 6443578.
+     */
+    static byte[] rebreak(byte[] mf0) {
+        byte[] mf1 = new byte[mf0.length];
+        int c0 = 0, c1 = 0; // bytes since last line start
+        for (int i0 = 0, i1 = 0; i0 < mf0.length; i0++, i1++) {
+            switch (mf0[i0]) {
+            case '\r':
+                if (i0 + 2 < mf0.length &&
+                        mf0[i0 + 1] == '\n' && mf0[i0 + 2] == ' ') {
+                    // skip line break
+                    i0 += 2;
+                    i1 -= 1;
+                } else {
+                    mf1[i1] = mf0[i0];
+                    c0 = c1 = 0;
+                }
+                break;
+            case '\n':
+                if (i0 + 1 < mf0.length && mf0[i0 + 1] == ' ') {
+                    // skip line break
+                    i0 += 1;
+                    i1 -= 1;
+                } else {
+                    mf1[i1] = mf0[i0];
+                    c0 = c1 = 0;
+                }
+                break;
+            case ' ':
+                if (c0 == 0) {
+                    continue;
+                }
+            default:
+                c0++;
+                if (c1 == 72) {
+                    mf1[i1++] = '\r';
+                    mf1[i1++] = '\n';
+                    mf1[i1++] = ' ';
+                    c1 = 1;
+                } else {
+                    c1++;
+                }
+                mf1[i1] = mf0[i0];
+            }
+        }
+        return mf1;
     }
 
-    static void prepare() throws Exception {
+    static Path[] createDummyFile(String... filenames) throws IOException {
+        Path[] pathes = new Path[filenames.length];
+        for (int i = 0; i < filenames.length; i++) {
+            String filename = filenames[i];
+            Path path = Paths.get(filename);
+            Files.write(path, filename.getBytes(UTF_8));
+            pathes[i] = path;
+        }
+        return pathes;
+    }
+
+    @SuppressWarnings("serial")
+    static void createJarWithManifest(String jarFileName, byte[] manifest,
+            String... files) throws IOException {
+        JarUtils.createJarFile(Paths.get("yetwithoutmanifest-" + jarFileName),
+                Paths.get("."), createDummyFile(files));
+        JarUtils.updateJar("yetwithoutmanifest-" + jarFileName, jarFileName,
+                new HashMap<>() {{ put(MANIFEST_NAME, manifest); }});
+    }
+
+    @BeforeClass
+    public void prepare() throws Exception {
         SecurityTools.keytool("-keystore", keystoreFileName, "-genkeypair",
                 "-storepass", "changeit", "-keypass", "changeit", "-storetype",
                 "JKS", "-alias", alias, "-dname", "CN=X", "-validity", "366")
             .shouldHaveExitValue(0);
 
-        Files.write(Paths.get(manifestFileName), (Name.
+        String jarFileName = "reference-manifest.jar";
+        createJarWithManifest(jarFileName, manifestNoDigest,
+                testClassName, anotherName);
+        SecurityTools.jarsigner("-keystore", keystoreFileName, "-storetype",
+                "JKS", "-storepass", "changeit", "-debug", jarFileName, alias)
+            .shouldHaveExitValue(0);
+        try (ZipFile refJar = new ZipFile(jarFileName);) {
+            ZipEntry mfEntry = refJar.getEntry(JarFile.MANIFEST_NAME);
+            manifestWithDigest = refJar.getInputStream(mfEntry).readAllBytes();
+        }
+        System.out.println("manifestWithDigest before re-break = " +
+                new String(manifestWithDigest, UTF_8));
+        manifestWithDigest = rebreak(manifestWithDigest);
+        System.out.println("manifestWithDigest after re-break = " +
+                new String(manifestWithDigest, UTF_8));
+        // now, manifestWithDigest is accepted as unmodified by
+        // jdk.security.jarsigner.JarSigner#updateDigests
+        // (ZipEntry,ZipFile,MessageDigest[],Manifest) on line 985:
+        // "if (!mfDigest.equalsIgnoreCase(base64Digests[i])) {"
+        // and therefore left unchanged when the jar is signed and
+        // signature verification will check it, which is the test case.
+
+        Files.createDirectories(Paths.get(MANIFEST_NAME).getParent());
+        Files.write(Paths.get(MANIFEST_NAME), (Name.
                 MANIFEST_VERSION.toString() + ": 1.0\r\n").getBytes(UTF_8));
     }
 
-    static void testSignJar(String jarFileName) throws Exception {
-        JarUtils.createJar(jarFileName, manifestFileName, testClassName);
-        verifyJarSignature(jarFileName);
-    }
-
-    static void testSignJarNoManifest(String jarFileName) throws Exception {
-        JarUtils.createJar(jarFileName, testClassName);
-        verifyJarSignature(jarFileName);
+    @Test
+    public void testSignJarDefaultManifest() throws Exception {
+        String jarFileName = "test-default-manifest.jar";
+        JarUtils.createJarFile(Paths.get(jarFileName), Paths.get("."),
+                createDummyFile(testClassName));
+        signAndVerifyJarSignature(jarFileName, false);
     }
 
-    static void testSignJarUpdate(
-            String initialFileName, String updatedFileName) throws Exception {
-        JarUtils.createJar(initialFileName, manifestFileName, anotherName);
-        SecurityTools.jarsigner("-keystore", keystoreFileName, "-storetype",
-                "JKS", "-storepass", "changeit", "-debug", initialFileName,
-                alias).shouldHaveExitValue(0);
-        JarUtils.updateJar(initialFileName, updatedFileName, testClassName);
-        verifyJarSignature(updatedFileName);
+    @Test
+    public void testSignJarWithNoExistingClassEntry() throws Exception {
+        String jarFileName = "test-eacuteinonepiece.jar";
+        createJarWithManifest(jarFileName, manifestNoDigest, testClassName);
+        signAndVerifyJarSignature(jarFileName, false);
     }
 
-    static void verifyJarSignature(String jarFileName) throws Exception {
+    @Test
+    public void testSignJarWithBrokenEAcuteClassEntry() throws Exception {
+        String jarFileName = "test-brokeneacute.jar";
+        createJarWithManifest(jarFileName, manifestWithDigest, testClassName);
+        signAndVerifyJarSignature(jarFileName, true);
+    }
+
+    @Test
+    public void testSignJarUpdateWithEAcuteClassEntryInOnePiece()
+            throws Exception {
+        String jarFileName = "test-eacuteinonepiece-update.jar";
+        createJarWithManifest(jarFileName, manifestNoDigest, anotherName);
+        SecurityTools.jarsigner("-keystore", keystoreFileName, "-storetype",
+                "JKS", "-storepass", "changeit", "-debug", jarFileName,
+                alias).shouldHaveExitValue(0);
+        JarUtils.updateJarFile(Paths.get(jarFileName),
+                Paths.get("."), createDummyFile(testClassName));
+        signAndVerifyJarSignature(jarFileName, false);
+    }
+
+    @Test
+    public void testSignJarUpdateWithBrokenEAcuteClassEntry()
+            throws Exception {
+        String jarFileName = "test-brokeneacute-update.jar";
+        createJarWithManifest(jarFileName, manifestWithDigest, anotherName);
+        SecurityTools.jarsigner("-keystore", keystoreFileName, "-storetype",
+                "JKS", "-storepass", "changeit", "-debug", jarFileName,
+                alias).shouldHaveExitValue(0);
+        JarUtils.updateJarFile(Paths.get(jarFileName),
+                Paths.get("."), createDummyFile(testClassName));
+        signAndVerifyJarSignature(jarFileName, true);
+    }
+
+    static void signAndVerifyJarSignature(String jarFileName,
+            boolean expectBrokenEAcute) throws Exception {
         // actually sign the jar
         SecurityTools.jarsigner("-keystore", keystoreFileName, "-storetype",
                 "JKS", "-storepass", "changeit", "-debug", jarFileName, alias)
@@ -114,7 +254,7 @@
         try (
             JarFile jar = new JarFile(jarFileName);
         ) {
-            verifyClassNameLineBroken(jar, testClassName);
+            verifyClassNameLineBroken(jar, testClassName, expectBrokenEAcute);
             verifyCodeSigners(jar, jar.getJarEntry(testClassName));
         }
     }
@@ -131,8 +271,8 @@
      * this relies on {@link java.util.jar.Manifest} breaking lines unaware
      * of bytes that belong to the same multi-byte utf characters.
      */
-    static void verifyClassNameLineBroken(JarFile jar, String className)
-            throws IOException {
+    static void verifyClassNameLineBroken(JarFile jar, String className,
+            boolean expectBrokenEAcute) throws IOException {
         byte[] eAcute = "\u00E9".getBytes(UTF_8);
         byte[] eAcuteBroken =
                 new byte[] {eAcute[0], '\r', '\n', ' ', eAcute[1]};
@@ -142,6 +282,10 @@
         }
 
         JarEntry manifestEntry = jar.getJarEntry(JarFile.MANIFEST_NAME);
+        System.out.println("jar = " + jar.getName());
+        System.out.println("expectBrokenEAcute = " + expectBrokenEAcute);
+        System.out.println("Manifest = \n" + new String(
+                jar.getInputStream(manifestEntry).readAllBytes(), UTF_8));
         try (
             InputStream manifestIs = jar.getInputStream(manifestEntry);
         ) {
@@ -156,7 +300,7 @@
                     bytesMatched = 0;
                 }
             }
-            if (bytesMatched < eAcuteBroken.length) {
+            if (expectBrokenEAcute == (bytesMatched < eAcuteBroken.length)) {
                 throw new AssertionError("self-test failed: multi-byte "
                         + "utf-8 character not broken across lines");
             }

Review Request: 6443578: Continuation lines in JAR manifests do not follow RFC-822

Reply via email to