lxb007981 opened a new pull request, #1436:
URL: https://github.com/apache/tika/pull/1436

   ### Description
   
   - Type of change :
     - [ ] New feature
     - [ ] Bug fix for existing feature
     - [ ] Code quality improvement
     - [X] Addition or Improvement of tests
     - [ ] Addition or Improvement of documentation
   
   Fix a flaky test, caused by nondeterministic iteration order of HashMap
   
   ### Related Test
   
[org.apache.tika.parser.microsoft.ooxml.xps.XPSParserTest.testVarious](https://github.com/lxb007981/tika/blob/971f0cbd9b46c1d7fb96b0a3732c3fc870920aba/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/test/java/org/apache/tika/parser/microsoft/ooxml/xps/XPSParserTest.java#L59)
   
   ### Root Cause
   
https://github.com/lxb007981/tika/blob/d466492e0a01c8ee28c108bc3022f1a03ff530de/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/xps/XPSExtractorDecorator.java#L119
   
   When recursively extracting metadata from an XPS file, a simple `HashMap` is 
used and iterated. However, note that `HashMap` does not gurantee the order of 
iteration, the extracted metadata have no guaranteed order in the resulting 
metadata list. And later in the test 
`org.apache.tika.parser.microsoft.ooxml.xps.XPSParserTest.testVarious`, the 
test assumes the order of metadata in the list is the same as the `HashMap` 
insertion order, thus renders the test flaky.
   
   ### Fix
   We sort the metadata list before doing comparison.
   
   ### How to reproduce the test
   **Java version**: 11.0.20.1
   **Maven version**: 3.6.3
   
   1. Build the module
   `mvn clean install -DskipTests -pl 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module
 -am`
   2. Test without shuffling
   `mvn -pl 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module
 test 
-Dtest=org.apache.tika.parser.microsoft.ooxml.xps.XPSParserTest#testVarious`
   This test passed.
   
   3. Test with shuffling using 
[NonDex](https://github.com/TestingResearchIllinois/NonDex)
   `mvn -pl 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module
 edu.illinois:nondex-maven-plugin:2.1.1:nondex 
-Dtest=org.apache.tika.parser.microsoft.ooxml.xps.XPSParserTest#testVarious`
   
   This test passed with the proposed fix but failed without it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@tika.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to