[ https://issues.apache.org/jira/browse/TIKA-4280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17888235#comment-17888235 ]

Tim Allison commented on TIKA-4280:
-----------------------------------

Regression results here: https://corpora.tika.apache.org/base/reports/tika-3.0.0-SNAPSHOT-reports.tgz

Lots of changes with new detection of csv delimiter=colon; some (many?) appear to be false positives.

There are changes in charset detection because we changed the charset detector -- some good, some not.

Some modest changes in html parsing even when the encoding is detected as the same because we changed the html parser to jsoup.

Let me know what you think.

> Tasks for the 3.0.0 release
> ---------------------------
>
> Key: TIKA-4280
> URL: https://issues.apache.org/jira/browse/TIKA-4280
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Priority: Major
>
> I'm too lazy to open separate tickets. Please do so if desired.
> Some items:
> * Before releasing the real 3.0.0 we need to remove any "-M" dependencies
> * Decide about the ffmpeg issue and the hdf5 issue
> * Run the regression tests vs 2.9.x
> * Convert tika-grpc to use the dependency plugin instead of the shade plugin
> * Turn javadocs back on. I got errors during the deploy process because javadoc needed the auto-generated code ("cannot find symbol DeleteFetcherRequest"). We need to enable javadocs for the rest of the project.
> * TIKA-4290 Tilman question
> Other things?
> Thank you [~tilman] for the first two!

--
This message was sent by Atlassian Jira
(v8.20.10#820010)