> I think with the soon-to-be-next-release of Tika, you can turn off throwing
> zero-byte file exceptions via the config. The exceptions should be harmless
> and you can safely ignore them.

Just upgraded to tika 2.9.0.
Tested as below; the same error is still thrown.

not certain of the correct config here :-/

added the following to /etc/tika/tika-server-config-custom.xml:

    ...
+   <parser class="org.apache.tika.parser.AutoDetectParserConfig">
+     <params>
+       <param name="ThrowOnZeroBytes" type="bool">false</param>
+     </params>
+   </parser>
    ...

Reading

        https://downloads.apache.org/tika/2.9.0/CHANGES-2.9.0.txt

        * Users may now avoid the ZeroByteFileException via a setting on the
          AutoDetectParserConfig (TIKA-3976).

restarted tika and re-tested; still ending up with

  tika[32035]: Aug 28, 2023 7:13:15 PM org.apache.cxf.jaxrs.utils.JAXRSUtils logMessageHandlerProblem
  tika[32035]: SEVERE: Problem with writing the data, class org.apache.tika.server.core.resource.TikaResource$$Lambda$388/0x00007f4fb42aa2d0, ContentType: text/plain

so, atm, dovecot's still sending zero bytes, and tika's still unhappy about it
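
For comparison, a sketch of what the TIKA-3976 setting might look like. Tika config files put AutoDetectParser options in an <autoDetectParserConfig> element under the <properties> root, not in a <parser> entry (AutoDetectParserConfig is a settings object, not a parser). The element name throwOnZeroBytes is an assumption inferred from the changelog wording; check it against the 2.9.0 AutoDetectParserConfig source before relying on it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <!-- AutoDetectParser settings live here, not under <parsers> -->
  <autoDetectParserConfig>
    <!-- ASSUMPTION: TIKA-3976 is exposed as "throwOnZeroBytes";
         false = don't throw ZeroByteFileException on empty input -->
    <throwOnZeroBytes>false</throwOnZeroBytes>
  </autoDetectParserConfig>
</properties>
```

The file only takes effect if tika-server is actually started with it, e.g. java -jar tika-server-standard-2.9.0.jar -c /etc/tika/tika-server-config-custom.xml. Also note that the SEVERE line above is logged by CXF's JAXRSUtils while writing the response, so it may not disappear from this setting alone; that needs testing rather than assuming.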

-------- Original Message --------
From: dovecot@dovecot.org
Sent: Friday, Aug 18, 2023, 16:06 PM EDT
To: talli...@apache.org
Cc: dovecot@dovecot.org
Subject: Re: [bug] dovecot passes zero byte input stream when passing email with .eml attachment to apache tika parser, causes 'SEVERE' error

> soon-to-be-next-release of Tika,

i saw that was coming

> you can turn off throwing zero-byte file exceptions via the config

can you point to the config toggle, or docs, in https://github.com/apache/tika ?

> The exceptions should be harmless and you can safely ignore them.

including the SEVERE notice?

> Some users need to know that there's a zero-byte file, hence the default
> behavior. It can also be useful during parser development for finding files
> whose embedded files are zero-byte files. Sometimes things go wrong in the
> container parser.

iiuc, the exception's thrown WHEN input's a zero-byte file.

in this dovecot <-> tika case, that only occurs when the attachment sent is a 
.eml, not with any other attachment type (so far)

is current-release tika known/verified to handle .eml (iirc, there were some
issues a while ago ...)? and not mistakenly munging the input size to zero?

if it's demonstrated OK, then it's likely Dovecot mistakenly sending no input 
in the .eml-attachment case, no?
_______________________________________________
dovecot mailing list -- dovecot@dovecot.org
To unsubscribe send an email to dovecot-le...@dovecot.org


