Tilman Hausherr created TIKA-4274:
-------------------------------------

             Summary: Improve ExtractReaderException
                 Key: TIKA-4274
                 URL: https://issues.apache.org/jira/browse/TIKA-4274
             Project: Tika
          Issue Type: Improvement
          Components: tika-eval
    Affects Versions: 2.9.2
            Reporter: Tilman Hausherr
            Assignee: Tilman Hausherr
             Fix For: 3.0.0, 2.9.3

I saw this stack trace in the eval log, and it is not really helpful: it shows neither a message nor a cause.
{noformat}
org.apache.tika.eval.app.io.ExtractReaderException
    at org.apache.tika.eval.app.io.ExtractReader.loadExtract(ExtractReader.java:125)
    at org.apache.tika.eval.app.ExtractComparer.compareFiles(ExtractComparer.java:198)
    at org.apache.tika.eval.app.ExtractComparer.processFileResource(ExtractComparer.java:180)
    at org.apache.tika.batch.FileResourceConsumer._processFileResource(FileResourceConsumer.java:152)
    at org.apache.tika.batch.FileResourceConsumer.call(FileResourceConsumer.java:87)
    at org.apache.tika.batch.FileResourceConsumer.call(FileResourceConsumer.java:50)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
{noformat}
So I'm adding the type and the cause to the exception, plus some logging for EXTRACT_FILE_TOO_SHORT / EXTRACT_FILE_TOO_LONG, so that we know what the failure is about and can then do something (or not) about it.
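
To illustrate the idea, here is a minimal sketch of an exception that puts its type into the message and keeps the original cause, with a log line before the two length-related failures. It is not the actual patch: the class names (ExtractFailureSketch, TypedExtractException), the Type enum apart from the two constant names quoted above, the checkLength helper and its min/max bounds are all assumptions made up for this example.

```java
import java.io.IOException;
import java.util.logging.Logger;

// Hypothetical sketch; none of these names are taken from the Tika code base.
class ExtractFailureSketch {

    private static final Logger LOG =
            Logger.getLogger(ExtractFailureSketch.class.getName());

    /** Failure categories; only the two length-related names come from the issue text. */
    enum Type { EXTRACT_FILE_TOO_SHORT, EXTRACT_FILE_TOO_LONG, OTHER }

    /** Exception that reports its type in the message and preserves the cause. */
    static class TypedExtractException extends IOException {
        private final Type type;

        TypedExtractException(Type type, String detail) {
            super(type + ": " + detail);
            this.type = type;
        }

        TypedExtractException(Type type, String detail, Throwable cause) {
            // Passing the cause up makes "Caused by: ..." appear in the stack trace.
            super(type + ": " + detail, cause);
            this.type = type;
        }

        Type getType() {
            return type;
        }
    }

    /** Checks a length against assumed bounds and logs before throwing. */
    static void checkLength(String name, long length, long min, long max)
            throws TypedExtractException {
        if (length < min) {
            LOG.warning(name + ": " + length + " bytes is below the minimum of " + min);
            throw new TypedExtractException(Type.EXTRACT_FILE_TOO_SHORT,
                    name + " has " + length + " bytes, minimum is " + min);
        }
        if (length > max) {
            LOG.warning(name + ": " + length + " bytes is above the maximum of " + max);
            throw new TypedExtractException(Type.EXTRACT_FILE_TOO_LONG,
                    name + " has " + length + " bytes, maximum is " + max);
        }
    }

    public static void main(String[] args) {
        try {
            checkLength("example.json", 3, 10, 1000);
        } catch (TypedExtractException e) {
            // The first line of the trace now carries the type and the detail text.
            e.printStackTrace();
        }
    }
}
```

With something along these lines, the first line of the trace would name the failure type and the offending size instead of showing only the bare exception class.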