[ https://issues.apache.org/jira/browse/TIKA-4165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17782395#comment-17782395 ]
ASF GitHub Bot commented on TIKA-4165:
--------------------------------------

lxb007981 opened a new pull request, #1436:
URL: https://github.com/apache/tika/pull/1436

### Description

- Type of change :
  - [ ] New feature
  - [ ] Bug fix for existing feature
  - [ ] Code quality improvement
  - [X] Addition or Improvement of tests
  - [ ] Addition or Improvement of documentation

Fix a flaky test, caused by nondeterministic iteration order of HashMap

### Related Test

[org.apache.tika.parser.microsoft.ooxml.xps.XPSParserTest.testVarious](https://github.com/lxb007981/tika/blob/971f0cbd9b46c1d7fb96b0a3732c3fc870920aba/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/test/java/org/apache/tika/parser/microsoft/ooxml/xps/XPSParserTest.java#L59)

### Root Cause

https://github.com/lxb007981/tika/blob/d466492e0a01c8ee28c108bc3022f1a03ff530de/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/xps/XPSExtractorDecorator.java#L119

When recursively extracting metadata from an XPS file, a plain `HashMap` is used and iterated. Because `HashMap` does not guarantee iteration order, the extracted metadata have no guaranteed order in the resulting metadata list. The test `org.apache.tika.parser.microsoft.ooxml.xps.XPSParserTest.testVarious` assumes that the order of metadata in the list matches the `HashMap` insertion order, which makes the test flaky.

### Fix

We sort the metadata list before doing the comparison.

### How to reproduce the test

**Java version**: 11.0.20.1
**Maven version**: 3.6.3

1. Build the module
   `mvn clean install -DskipTests -pl tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module -am`
2. Test without shuffling
   `mvn -pl tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module test -Dtest=org.apache.tika.parser.microsoft.ooxml.xps.XPSParserTest#testVarious`
   This test passed.
3. Test with shuffling using [NonDex](https://github.com/TestingResearchIllinois/NonDex)
   `mvn -pl tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module edu.illinois:nondex-maven-plugin:2.1.1:nondex -Dtest=org.apache.tika.parser.microsoft.ooxml.xps.XPSParserTest#testVarious`
   This test passed with the proposed fix but failed without it.


> Fix a flaky test, caused by nondeterministic iteration order of HashMap
> -----------------------------------------------------------------------
>
>                 Key: TIKA-4165
>                 URL: https://issues.apache.org/jira/browse/TIKA-4165
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Xinbo Lu
>            Priority: Minor
>         Attachments: diff
>
>
> Fix a flaky test, caused by nondeterministic iteration order of HashMap
> h3. Related Test
> [org.apache.tika.parser.microsoft.ooxml.xps.XPSParserTest.testVarious|https://github.com/lxb007981/tika/blob/971f0cbd9b46c1d7fb96b0a3732c3fc870920aba/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/test/java/org/apache/tika/parser/microsoft/ooxml/xps/XPSParserTest.java#L59]
> h3. Root Cause
>
> [tika/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/xps/XPSExtractorDecorator.java|https://github.com/lxb007981/tika/blob/d466492e0a01c8ee28c108bc3022f1a03ff530de/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/xps/XPSExtractorDecorator.java#L119]
> Line 119 in
> [d466492|https://github.com/lxb007981/tika/commit/d466492e0a01c8ee28c108bc3022f1a03ff530de]
> ||for (Map.Entry<String, Metadata> embeddedImage : embeddedImages.entrySet()) {||
>
> When recursively extracting metadata from an XPS file, a plain {{HashMap}}
> is used and iterated. Because {{HashMap}} does not guarantee iteration
> order, the extracted metadata have no guaranteed order in the resulting
> metadata list. The test
> {{org.apache.tika.parser.microsoft.ooxml.xps.XPSParserTest.testVarious}}
> assumes that the order of metadata in the list matches the {{HashMap}}
> insertion order, which makes the test flaky.
> h3. Fix
> We sort the metadata list before doing the comparison.
> h3. How to reproduce the test
> *Java version*: 11.0.20.1
> *Maven version*: 3.6.3
> # Build the module
> {{mvn clean install -DskipTests -pl tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module -am}}
> # Test without shuffling
> {{mvn -pl tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module test -Dtest=org.apache.tika.parser.microsoft.ooxml.xps.XPSParserTest#testVarious}}
> This test passed.
> # Test with shuffling using [NonDex|https://github.com/TestingResearchIllinois/NonDex]
> {{mvn -pl tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module edu.illinois:nondex-maven-plugin:2.1.1:nondex -Dtest=org.apache.tika.parser.microsoft.ooxml.xps.XPSParserTest#testVarious}}
> This test passed with the proposed fix but failed without it.



--
This message was sent by Atlassian Jira (v8.20.10#820010)
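
For readers unfamiliar with this class of flaky test, the root cause and fix described in the pull request can be illustrated with a minimal, self-contained sketch. This is not the Tika code or the actual patch: the class `OrderInsensitiveComparison`, its methods, and the sample keys and values are hypothetical, and plain strings stand in for `Metadata` objects. It only shows the general idea of sorting both lists before comparing them, so the check no longer depends on `HashMap` iteration order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class OrderInsensitiveComparison {

    // Hypothetical stand-in for the extraction loop: copies the map values
    // into a list in whatever order the map happens to iterate.
    static List<String> collectValues(Map<String, String> embedded) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> entry : embedded.entrySet()) {
            result.add(entry.getValue());
        }
        return result;
    }

    // Compares two lists as multisets by sorting copies of both first,
    // so the result does not depend on the order of either input.
    static boolean sameElementsIgnoringOrder(List<String> expected, List<String> actual) {
        List<String> sortedExpected = new ArrayList<>(expected);
        List<String> sortedActual = new ArrayList<>(actual);
        Collections.sort(sortedExpected);
        Collections.sort(sortedActual);
        return sortedExpected.equals(sortedActual);
    }

    public static void main(String[] args) {
        // Sample data is invented for illustration only.
        Map<String, String> embedded = new HashMap<>();
        embedded.put("image-a", "first.jpg");
        embedded.put("image-b", "second.jpg");
        embedded.put("image-c", "third.jpg");

        List<String> expected = Arrays.asList("first.jpg", "second.jpg", "third.jpg");
        List<String> actual = collectValues(embedded);

        // A positional check such as expected.equals(actual) relies on
        // unspecified HashMap iteration order; the sorted check does not.
        System.out.println("order-insensitive match: "
                + sameElementsIgnoringOrder(expected, actual));
    }
}
```

An alternative design, when the production code itself should produce a stable order, is to use an insertion-ordered map such as `LinkedHashMap` at the point of collection; the pull request above chose to adjust the test instead.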