All,

  Again, my apologies for being late, but the results might still be
useful for work towards 4.1.1.

http://162.242.228.174/reports/poi-4.1.0-reports.zip

Some tentative observations:
1) There is a new, non-reproducible set of problems with the XSSFBParser.

2) The emf/wmf regressions are responsible for the decrease in
attachments and common words.

3) It looks like there are spacing/newline problems with the updated
emf/wmf code, but that might be on Tika's side.

4) The large increase in common words in ooxml that were formerly
tika-ooxml is caused by ZipSalvager.  On the Tika side, we're now
creating a valid zip from truncated zips and rerunning the parse.
Before that change, we got the content of those files via the
PkgParser, and that content went into "attachments".
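
For anyone not familiar with the salvage idea in 4), here is a rough
sketch of the general technique using only java.util.zip.  This is not
Tika's ZipSalvager code; the class and method names (ZipSalvageSketch,
salvage) are made up for illustration, and it assumes Java 9+ for
readAllBytes.  It streams the entries of a truncated zip, keeps every
entry that can be read in full, and writes them to a fresh zip with a
valid central directory.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical illustration only; not the Tika implementation.
class ZipSalvageSketch {

    /**
     * Rebuilds a readable zip from possibly truncated zip bytes.
     * Entries that cannot be read completely are dropped.
     */
    static byte[] salvage(byte[] truncated) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(out);
             ZipInputStream zis = new ZipInputStream(
                     new ByteArrayInputStream(truncated))) {
            try {
                ZipEntry entry;
                // Read local headers sequentially; the (missing) central
                // directory at the end of the file is never consulted.
                while ((entry = zis.getNextEntry()) != null) {
                    // Read the whole entry first so that a failure here
                    // does not leave a half-written entry in the output.
                    byte[] body = zis.readAllBytes();
                    zos.putNextEntry(new ZipEntry(entry.getName()));
                    zos.write(body);
                    zos.closeEntry();
                }
            } catch (IOException e) {
                // EOFException or ZipException: we hit the truncation
                // point. Keep the entries copied so far and stop.
            }
        }
        // Closing zos above wrote a fresh central directory.
        return out.toByteArray();
    }
}
```

The rebuilt bytes can then be handed to the normal ooxml parse instead
of falling back to a generic package parser, which is why the
extracted text moves out of "attachments" in the reports.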
