In Java 11, the AiffFileReader swallows the EOFException and throws an
UnsupportedAudioFileException instead.

We have this:

} catch (UnsupportedAudioFileException e) {
    // There is no way to know whether this exception was
    // caused by the document being corrupted or by the format
    // just being unsupported. So we do nothing.
}

In main, I've added a warning message in the metadata using the
TikaCoreProperties.TIKA_META_EXCEPTION_WARNING key.
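
For anyone following along, here's a rough sketch of the pattern using only
the JDK. This is not the AudioParser code. The class name, the WARNING_KEY
string, the truncatedAiff() helper, and the plain Map standing in for Tika's
Metadata are all made up for illustration. It feeds a deliberately truncated
AIFF header to AudioSystem and records a warning whichever exception the JDK
decides to throw.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

class AudioFormatProbe {
    // Hypothetical key; a stand-in for a real metadata property, not Tika's constant.
    static final String WARNING_KEY = "example:exception-warning";

    // Tries to read the audio file format and records a warning instead of failing.
    static Map<String, String> probe(InputStream stream) {
        Map<String, String> metadata = new LinkedHashMap<>();
        try {
            AudioFileFormat format = AudioSystem.getAudioFileFormat(stream);
            metadata.put("example:audio-type", format.getType().toString());
        } catch (UnsupportedAudioFileException e) {
            // Corrupt or merely unsupported: we cannot tell, so note it and move on.
            metadata.put(WARNING_KEY, e.getClass().getName());
        } catch (IOException e) {
            // Also covers EOFException, which some JDK versions let through.
            metadata.put(WARNING_KEY, e.getClass().getName());
        }
        return metadata;
    }

    // "FORM" + length + "AIFF" + "COMM" + chunk size, then the data just stops.
    static byte[] truncatedAiff() {
        return new byte[] {
            'F', 'O', 'R', 'M', 0, 0, 0, 16,
            'A', 'I', 'F', 'F',
            'C', 'O', 'M', 'M', 0, 0, 0, 18
        };
    }

    public static void main(String[] args) {
        // ByteArrayInputStream supports mark/reset, which AudioSystem requires.
        InputStream in = new ByteArrayInputStream(truncatedAiff());
        System.out.println(System.getProperty("java.version") + " -> " + probe(in));
    }
}
```

Running main under different JDKs should show which exception class ends up
in the warning entry on each; either way the caller gets metadata back rather
than a failed parse.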

This is not a Tika problem, 1.27 or otherwise. :D

Onwards!

On Thu, Jul 1, 2021 at 11:30 AM Tim Allison <talli...@apache.org> wrote:
>
> >I'll dig into it, but I'm concerned
> but I'm _not_ concerned...famous last words.  See above about
> day/week/month/year...
>
> This is caused by a difference in Java versions.  This is not a problem at
> the Tika level.  With Java 8, there's an EOF[0].  With Java 11,
> there's no EOF.[1]  Not sure if this is a feature of Java 11 or worthy
> of a bug report.
>
> [0] openjdk version "1.8.0_292" OpenJDK Runtime Environment
> (AdoptOpenJDK)(build 1.8.0_292-b10)
> [1] openjdk version "11.0.11" 2021-04-20 OpenJDK Runtime Environment
> AdoptOpenJDK-11.0.11+9 (build 11.0.11+9)
>
> On Wed, Jun 30, 2021 at 5:01 PM Tim Allison <talli...@apache.org> wrote:
> >
> >
> > Just one of those days, weeks, months, years.... Sorry... and thank you, 
> > Ken!
> >
> > AIFF, we're now getting more eofs than we were.  This might be a Java 
> > issue, but I don't think there's anything to do at the Tika level.  I don't 
> > remember any changes in the AudioParser in 1.27.  I'll dig into it, but I'm 
> > concerned...famous last words...
> >
> > o.a.t.exception.TikaException
> > at o.a.t.parser.CompositeParser.parse(CompositeParser.java:287)
> > at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
> > at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
> > at o.a.t.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
> > at o.a.t.parser.ParserDecorator.parse(ParserDecorator.java:188)
> > at o.a.t.parser.DigestingParser.parse(DigestingParser.java:84)
> > at o.a.t.parser.ParserDecorator.parse(ParserDecorator.java:188)
> > at o.a.t.parser.RecursiveParserWrapper$EmbeddedParserDecorator.parse(RecursiveParserWrapper.java:376)
> > at o.a.t.parser.DelegatingParser.parse(DelegatingParser.java:72)
> > at o.a.t.extractor.ParsingEmbeddedDocumentExtractor.parseEmbedded(ParsingEmbeddedDocumentExtractor.java:104)
> > at o.a.t.parser.pkg.RarParser.parse(RarParser.java:95)
> > at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
> > at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
> > at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
> > at o.a.t.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
> > at o.a.t.parser.ParserDecorator.parse(ParserDecorator.java:188)
> > at o.a.t.parser.DigestingParser.parse(DigestingParser.java:84)
> > at o.a.t.parser.RecursiveParserWrapper.parse(RecursiveParserWrapper.java:239)
> > at o.a.t.batch.FileResourceConsumer.parse(FileResourceConsumer.java:406)
> > at o.a.t.batch.fs.RecursiveParserWrapperFSConsumer.processFileResource(RecursiveParserWrapperFSConsumer.java:105)
> > at o.a.t.batch.FileResourceConsumer._processFileResource(FileResourceConsumer.java:181)
> > at o.a.t.batch.FileResourceConsumer.call(FileResourceConsumer.java:115)
> > at o.a.t.batch.FileResourceConsumer.call(FileResourceConsumer.java:50)
> > at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> > at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> > at java.lang.Thread.run(Thread.java:748)
> > Caused by: java.io.EOFException
> > at java.io.DataInputStream.readInt(DataInputStream.java
> > at com.sun.media.sound.AiffFileReader.getCOMM(AiffFileReader.java:267)
> > at com.sun.media.sound.AiffFileReader.getAudioFileFormat(AiffFileReader.java:76)
> > at javax.sound.sampled.AudioSystem.getAudioFileFormat(AudioSystem.java:1004)
> > at o.a.t.parser.audio.AudioParser.parse(AudioParser.java:73)
> > at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
> > ... 28 more
> >
> >
> > On Wed, Jun 30, 2021 at 4:37 PM Ken Krugler <kkrugler_li...@transpac.com> 
> > wrote:
> >>
> >> Hi Tim,
> >>
> >> Don’t leave us hanging… :)
> >>
> >> — Ken
> >>
> >> On Jun 30, 2021, at 12:47 PM, Tim Allison <talli...@apache.org> wrote:
> >>
> >> There's an apparent change in mime detection: application/msword ->
> >> application/pkcs7-signature and a few other file formats are now
> >> apparently being detected as pkcs7-signature...
> >>
> >> This is an artifact of tika-eval and not a problem.  The issue is that
> >> we used to parse files wrapped in pkcs7 sigs twice, and tika-eval
> >> failed to match up differing numbers of attachments.
> >>
> >> There may be a genuine new issue with
> >>
> >>
> >> On Wed, Jun 30, 2021 at 3:06 PM Tim Allison <talli...@apache.org> wrote:
> >>
> >>
> >> Reports are here:
> >> https://corpora.tika.apache.org/base/reports/tika-1.27-pre-rc1-reports.tgz
> >>
> >> I've since fixed the MP4 issue.
> >>
> >> I'm prepping 1.27-rc1 now.
> >>
> >> On Mon, Jun 28, 2021 at 3:56 PM Tim Allison <talli...@apache.org> wrote:
> >>
> >>
> >> Updated dependencies that I could.  Kicking off regression tests now.
> >> Onwards to 1.27!
> >>
> >> Cheers,
> >>
> >>         Tim
> >>
> >> On Mon, Jun 28, 2021 at 1:11 PM Nicholas DiPiazza
> >> <nicholas.dipia...@gmail.com> wrote:
> >>
> >>
> >> +1 on 1.27 release.
> >>
> >> On Mon, Jun 28, 2021, 10:57 AM Tim Allison <talli...@apache.org> wrote:
> >>
> >>
> >> All,
> >>  The recent release of PDFBox fixed 2 DoS CVEs.  Let's update our
> >> dependencies and go for a 1.27 release soon?  Any blockers?  Any
> >> strong prefs to go for a 2.0.0 or 2.0.0-BETA2 first?
> >>
> >>  Cheers,
> >>
> >>              Tim
> >>
> >>
> >> --------------------------
> >> Ken Krugler
> >> http://www.scaleunlimited.com
> >> Custom big data solutions
> >> Flink, Pinot, Solr, Elasticsearch
> >>
> >>
> >>
