On Mon, 1 Jan 2024 14:29:50 GMT, Eirik Bjørsnøs <eir...@openjdk.org> wrote:
>> Please review this test-only PR which adds test coverage for >> `ZipFile.getEntry` under certain charset conditions. >> >> When `ZipFile.getEntry` is called for an entry which has the `Language >> encoding flag` general purpose bit flag set, then `ZipCoder.UTF8` is used >> unconditionally, even when a different charset was supplied to the `ZipFile` >> constructor. >> >> It turns out we do not have any testing for this particular case, as can be >> verified by commenting out the following line of code in >> `ZipFile.Source.getEntryPos`: >> >> >> //ZipCoder zc = zipCoderForPos(pos); >> ``` >> >> and then running `make test TEST="test/jdk/java/util/zip"` >> >> The current test verifies that the correct ZipCoder is used by >> `ZipFile.entries()`, but does not exercise `ZipFile.getEntry` the same way. >> >> Seeing that [JDK-7009069](https://bugs.openjdk.org/browse/JDK-7009069) was >> (accidentally?) fixed by >> [JDK-8243469](https://bugs.openjdk.org/browse/JDK-8243469), I think it is >> worthwhile to add explicit testing for this case to avoid regressions. >> >> While visiting `ZipCoding.java`, I took the opportunity to convert it to >> JUnit 5. The conversion and modernization of the code is done in the first >> commit 1384850ed51ec845af06dd6d13616f20f8bbaa6a in this PR, while the second >> commit 1776b258b0fe8383709ae0c095f2631a4e6237f6 actually adds the code >> required to verify the `Language encoding flag` condition for >> `ZipFile.getEntry`. >> >> Testing: Verified that the test indeed fails when >> `ZipFile.Source.getEntryPos` is updated to use the ZipFile's ZipCoder as >> suggested above. > > Eirik Bjørsnøs has updated the pull request incrementally with one additional > commit since the last revision: > > Add @bug tag for 8322802 test/jdk/java/util/zip/ZipCoding.java line 86: > 84: // ZipOutputStream sets the 'Language encoding flag' when > writing using UTF-8 > 85: // UTF-8 should be used for decoding, even when opening > with a different charset > 86: Arguments.of("utf-8", "iso-8859-1", Hello Eirik, there are 3 streams of arguments above which use "utf-8" for writing out the entry. All those other 3 streams of arguments, use "utf-8" for reading too (which is OK). For each of those 3 streams of arguments, could you also add another argument stream which uses non-utf8 charset for reading, like you do here? I think that would give this a bit more coverage, especially for cases like the surrogates in comments and entry names. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17207#discussion_r1439211686