> I think with the soon-to-be-next-release of Tika, you can turn off throwing
> zero-byte file exceptions via the config. The exceptions should be harmless
> and you can safely ignore them.
Just upgraded to tika 2.9.0.
Testing, as below, same error thrown.
not certain of the correct config here :-/
added this to /etc/tika/tika-server-config-custom.xml
...
+ <parser class="org.apache.tika.parser.AutoDetectParserConfig">
+ <params>
+ <param name="ThrowOnZeroBytes" type="bool">false</param>
+ </params>
+ </parser>
...
Reading
https://downloads.apache.org/tika/2.9.0/CHANGES-2.9.0.txt
* Users may now avoid the ZeroByteFileException via a setting on the
AutoDetectParserConfig (TIKA-3976).
restarted tika, re-testing, still end with
tika[32035]: Aug 28, 2023 7:13:15 PM org.apache.cxf.jaxrs.utils.JAXRSUtils logMessageHandlerProblem
tika[32035]: SEVERE: Problem with writing the data, class org.apache.tika.server.core.resource.TikaResource$$Lambda$388/0x00007f4fb42aa2d0, ContentType: text/plain
so, atm, dovecot's still sending zero bytes, and tika's still unhappy about it
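one alternative that might be worth trying, untested here. it rests on two assumptions that nothing in this thread confirms: that AutoDetectParserConfig is configured through its own top-level <autoDetectParserConfig> element under <properties> rather than through a <parser> entry, and that the TIKA-3976 setting is spelled throwOnZeroBytes (lower camel case). a minimal sketch under those assumptions:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <!-- assumption: top-level element, a sibling of <parsers>, not a <parser> entry -->
  <autoDetectParserConfig>
    <params>
      <!-- assumption: element name for the TIKA-3976 setting; verify against the 2.9.0 docs -->
      <throwOnZeroBytes>false</throwOnZeroBytes>
    </params>
  </autoDetectParserConfig>
</properties>
```

if the assumptions hold, this would take the place of the <parser class="org.apache.tika.parser.AutoDetectParserConfig"> block above, followed by a restart of tika.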
-------- Original Message --------
From: dovecot@dovecot.org
Sent: at Friday, Aug 18, 2023, 16:06 PM EDT
To: talli...@apache.org
Cc: dovecot@dovecot.org
Subject: Re: [bug] dovecot passes zero byte input stream when passing email
with .eml attachment to apache tika parser, causes 'SEVERE' error
> soon-to-be-next-release of Tika,
i saw that was coming
> you can turn off throwing zero-byte file exceptions via the config
can you point to the config toggle, or docs, in https://github.com/apache/tika ?
> The exceptions should be harmless and you can safely ignore them.
including the SEVERE notice?
> For some users, they need to know that there's a zero-byte file, hence the
> default behavior. It can also be useful while doing parser development to find
> files where embedded files are zero-byte files. Sometimes things go wrong in
> the container parser.
iiuc, the exception's thrown WHEN input's a zero-byte file.
in this dovecot <-> tika case, that only occurs when the attachment sent is a
.eml, not with any other attachment type (so far)
is current-release tika known/verified to handle .eml (iirc, there were some
issues a while ago ...) ? and not mistakenly munging the input size to zero?
if it's demonstrated OK, then it's likely Dovecot mistakenly sending no input
in the .eml-attachment case, no?
_______________________________________________
dovecot mailing list -- dovecot@dovecot.org
To unsubscribe send an email to dovecot-le...@dovecot.org