[ https://issues.apache.org/jira/browse/TIKA-4274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17863552#comment-17863552 ]
Tilman Hausherr commented on TIKA-4274:
---------------------------------------

new output:
{noformat}
INFO [pool-3-thread-4] 11:41:41,973 org.apache.tika.eval.app.io.ExtractReader maxExtractLength 2000000 > IGNORE_LENGTH -1 and length 2587452 > maxExtractLength 2000000
org.apache.tika.eval.app.io.ExtractReaderException: EXTRACT_FILE_TOO_LONG
	at org.apache.tika.eval.app.io.ExtractReader.loadExtract(ExtractReader.java:129)
	at org.apache.tika.eval.app.ExtractComparer.compareFiles(ExtractComparer.java:198)
	at org.apache.tika.eval.app.ExtractComparer.processFileResource(ExtractComparer.java:180)
	at org.apache.tika.batch.FileResourceConsumer._processFileResource(FileResourceConsumer.java:152)
	at org.apache.tika.batch.FileResourceConsumer.call(FileResourceConsumer.java:87)
	at org.apache.tika.batch.FileResourceConsumer.call(FileResourceConsumer.java:50)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
{noformat}

> Improve ExtractReaderException
> ------------------------------
>
>                 Key: TIKA-4274
>                 URL: https://issues.apache.org/jira/browse/TIKA-4274
>             Project: Tika
>          Issue Type: Improvement
>          Components: tika-eval
>    Affects Versions: 2.9.2
>            Reporter: Tilman Hausherr
>            Assignee: Tilman Hausherr
>            Priority: Minor
>             Fix For: 3.0.0, 2.9.3
>
>
> I saw this stack trace in the eval log and it's not really helpful
> {noformat}
> org.apache.tika.eval.app.io.ExtractReaderException
> 	at org.apache.tika.eval.app.io.ExtractReader.loadExtract(ExtractReader.java:125)
> 	at org.apache.tika.eval.app.ExtractComparer.compareFiles(ExtractComparer.java:198)
> 	at org.apache.tika.eval.app.ExtractComparer.processFileResource(ExtractComparer.java:180)
> 	at org.apache.tika.batch.FileResourceConsumer._processFileResource(FileResourceConsumer.java:152)
> 	at org.apache.tika.batch.FileResourceConsumer.call(FileResourceConsumer.java:87)
> 	at org.apache.tika.batch.FileResourceConsumer.call(FileResourceConsumer.java:50)
> 	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> 	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> 	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> 	at java.base/java.lang.Thread.run(Thread.java:829)
> {noformat}
> so I'm adding the type, the cause and also some logging for
> EXTRACT_FILE_TOO_SHORT / EXTRACT_FILE_TOO_LONG so that we can know what this
> is about, and then do something (or not) about it.
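
For readers who want to see the idea in code, here is a rough sketch of what "adding the type, the cause and some logging" could look like. It is not the actual Tika patch: the class `ExtractLengthSketch`, the nested `TypedExtractException`, the method `checkLength`, the `IO_EXCEPTION` constant and the use of `java.util.logging` are all assumptions made for illustration. Only the two type names EXTRACT_FILE_TOO_SHORT / EXTRACT_FILE_TOO_LONG, the `IGNORE_LENGTH -1` value and the shape of the log line come from the output above.

```java
import java.util.Objects;
import java.util.logging.Logger;

/** Illustrative sketch only; names and structure are assumptions, not Tika's code. */
final class ExtractLengthSketch {

    private static final Logger LOG =
            Logger.getLogger(ExtractLengthSketch.class.getName());

    /** Sentinel meaning "limit not enforced"; the log above prints IGNORE_LENGTH as -1. */
    static final long IGNORE_LENGTH = -1L;

    /** Hypothetical stand-in for ExtractReaderException. */
    static final class TypedExtractException extends Exception {

        /** IO_EXCEPTION is a hypothetical constant, used to show a wrapped cause. */
        enum Type { EXTRACT_FILE_TOO_SHORT, EXTRACT_FILE_TOO_LONG, IO_EXCEPTION }

        private final Type type;

        /** The type name becomes the message, so it shows up in the stack trace header. */
        TypedExtractException(Type type) {
            super(Objects.requireNonNull(type, "type").name());
            this.type = type;
        }

        /** Same, but also keeps the underlying cause so it is printed as "Caused by". */
        TypedExtractException(Type type, Throwable cause) {
            super(Objects.requireNonNull(type, "type").name(), cause);
            this.type = type;
        }

        Type getType() {
            return type;
        }
    }

    /** Logs the numbers behind the decision, then throws a typed exception. */
    static void checkLength(long length, long minExtractLength, long maxExtractLength)
            throws TypedExtractException {
        if (minExtractLength > IGNORE_LENGTH && length < minExtractLength) {
            LOG.info(String.format(
                    "minExtractLength %d > IGNORE_LENGTH %d and length %d < minExtractLength %d",
                    minExtractLength, IGNORE_LENGTH, length, minExtractLength));
            throw new TypedExtractException(
                    TypedExtractException.Type.EXTRACT_FILE_TOO_SHORT);
        }
        if (maxExtractLength > IGNORE_LENGTH && length > maxExtractLength) {
            LOG.info(String.format(
                    "maxExtractLength %d > IGNORE_LENGTH %d and length %d > maxExtractLength %d",
                    maxExtractLength, IGNORE_LENGTH, length, maxExtractLength));
            throw new TypedExtractException(
                    TypedExtractException.Type.EXTRACT_FILE_TOO_LONG);
        }
    }
}
```

In this sketch, calling `ExtractLengthSketch.checkLength(2587452L, -1L, 2000000L)` throws an exception whose message is the type name. The first line of the stack trace then says what went wrong instead of showing only the bare class name.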