[ 
https://issues.apache.org/jira/browse/TIKA-2891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quentin Laville updated TIKA-2891:
----------------------------------
    Description: 
Because big ".doc" files were causing "OOM: heap space" errors, we decided to
move to a "ForkParser" so that we can catch these errors, log them, and keep
processing the next documents.

Unfortunately, whenever we have an image in a document, we get the following 
error:
{code:java}
Unexpected error in forked server process
org.apache.tika.exception.TikaException: Unexpected error in forked server 
process
... (several lines indicating that the call to "ForkParser.parse" failed)
Cause: java.util.ServiceConfigurationError: 
javax.imageio.spi.ImageOutputStreamSpi: Provider 
com.github.jaiimageio.impl.stream.ChannelImageOutputStreamSpi could not be 
instantiated
 at java.util.ServiceLoader.fail(ServiceLoader.java:232)
 at java.util.ServiceLoader.access$100(ServiceLoader.java:185)
 at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:384)
 at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
 at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
 at 
javax.imageio.spi.IIORegistry.registerApplicationClasspathSpis(IIORegistry.java:210)
 at javax.imageio.spi.IIORegistry.<init>(IIORegistry.java:138)
 at javax.imageio.spi.IIORegistry.getDefaultInstance(IIORegistry.java:159)
 at javax.imageio.ImageIO.<clinit>(ImageIO.java:66)
 at org.apache.pdfbox.tools.imageio.ImageIOUtil.writeImage(ImageIOUtil.java:174)
 ...
 Cause: java.lang.ExceptionInInitializerError:
 at 
com.github.jaiimageio.impl.stream.ChannelImageOutputStreamSpi.<init>(ChannelImageOutputStreamSpi.java:66)
 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
 at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
 at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
 at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
 at java.lang.Class.newInstance(Class.java:442)
 at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380)
 at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
 at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
 at 
javax.imageio.spi.IIORegistry.registerApplicationClasspathSpis(IIORegistry.java:210)
 ...
 Cause: java.lang.NullPointerException:
 at com.github.jaiimageio.impl.common.PackageUtil.<clinit>(PackageUtil.java:91)
 at 
com.github.jaiimageio.impl.stream.ChannelImageOutputStreamSpi.<init>(ChannelImageOutputStreamSpi.java:66)
 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
 at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
 at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
 at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
 at java.lang.Class.newInstance(Class.java:442)
 at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380)
 at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
 at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
 ...
{code}
This kind of error didn't appear before, when we were only using an
"AutoDetectParser". My search for a solution led me to "ForkClient", where you
can see that only the "Main-Class" is defined in "META-INF/MANIFEST.MF",
whereas in
"com.github.jaiimageio.impl.common.PackageUtil.<clinit>(PackageUtil.java:91)"
they check that the "Implementation-Vendor" and "Implementation-Version" are
not null.
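
To make the manifest point concrete, here is a minimal sketch that uses only
the standard "java.util.jar" API. It is not Tika's "ForkClient" code: the class
"ManifestSketch", its helper methods, and all attribute values are hypothetical
placeholders. It only shows how a jar manifest can carry
"Implementation-Vendor" and "Implementation-Version" next to "Main-Class".
Whether adding them to the bootstrap jar is enough to avoid the error above is
an assumption that has not been verified.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarInputStream;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

// Illustrative sketch only; this is NOT the code of Tika's ForkClient.
class ManifestSketch {

    // Builds an in-memory jar whose manifest holds Main-Class plus the
    // Implementation-* attributes. All values are caller-supplied placeholders.
    static byte[] buildJar(String mainClass, String vendor, String version) throws IOException {
        Manifest manifest = new Manifest();
        Attributes attrs = manifest.getMainAttributes();
        // Manifest-Version must be set, otherwise the main attributes are not written.
        attrs.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        attrs.put(Attributes.Name.MAIN_CLASS, mainClass);
        attrs.put(Attributes.Name.IMPLEMENTATION_VENDOR, vendor);
        attrs.put(Attributes.Name.IMPLEMENTATION_VERSION, version);

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // This constructor writes META-INF/MANIFEST.MF as the first jar entry.
        try (JarOutputStream jar = new JarOutputStream(bytes, manifest)) {
            // Class files would be added here; they are omitted in this sketch.
        }
        return bytes.toByteArray();
    }

    // Reads the manifest back from the jar bytes.
    static Manifest readManifest(byte[] jarBytes) throws IOException {
        try (JarInputStream in = new JarInputStream(new ByteArrayInputStream(jarBytes))) {
            return in.getManifest();
        }
    }

    public static void main(String[] args) throws IOException {
        Manifest m = readManifest(buildJar("org.example.Main", "Example Vendor", "0.0.0"));
        Attributes attrs = m.getMainAttributes();
        System.out.println(attrs.getValue(Attributes.Name.IMPLEMENTATION_VENDOR));
        System.out.println(attrs.getValue(Attributes.Name.IMPLEMENTATION_VERSION));
    }
}
```

Running "main" prints the two placeholder values read back from the in-memory
jar, which confirms that the attributes end up in "META-INF/MANIFEST.MF".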

As the name of the package suggests, it happens only with files containing 
image(s).

It's quite easy to reproduce:
 # Download a sample file from [this
link|https://file-examples.com/wp-content/uploads/2017/10/file-sample_100kB.odt].
 # Run this piece of code:

{code:java}
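import java.io.FileInputStream
import org.apache.tika.fork.ForkParser
import org.apache.tika.io.TikaInputStream
import org.apache.tika.metadata.Metadata
import org.apache.tika.parser.{AutoDetectParser, ParseContext}
import org.apache.tika.sax.BodyContentHandler

def test(): Unit = {
  // Run an AutoDetectParser inside a forked JVM
  val forkParser = new ForkParser(ExtractText.getClass.getClassLoader, new AutoDetectParser())
  val output = new BodyContentHandler()
  val stream = TikaInputStream.get(new FileInputStream("/path/to/file-sample_100kB.odt"))
  val ctx = new ParseContext()
  try {
    // This call throws the TikaException shown above
    forkParser.parse(stream, output, new Metadata(), ctx)
  } finally {
    stream.close()
    forkParser.close() // stop the forked process
  }
}
{code}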

  was:
Because big ".doc" files were causing "OOM: heap space" errors, we decided to
move to a "ForkParser" so that we can catch these errors, log them, and keep
processing the next documents.

Unfortunately, whenever we have an image in a document, we get the following 
error:
{code:java}
Unexpected error in forked server process
org.apache.tika.exception.TikaException: Unexpected error in forked server 
process
... (several lines indicating that the call to "ForkParser.parse" failed)
Cause: java.util.ServiceConfigurationError: 
javax.imageio.spi.ImageOutputStreamSpi: Provider 
com.github.jaiimageio.impl.stream.ChannelImageOutputStreamSpi could not be 
instantiated
 at java.util.ServiceLoader.fail(ServiceLoader.java:232)
 at java.util.ServiceLoader.access$100(ServiceLoader.java:185)
 at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:384)
 at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
 at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
 at 
javax.imageio.spi.IIORegistry.registerApplicationClasspathSpis(IIORegistry.java:210)
 at javax.imageio.spi.IIORegistry.<init>(IIORegistry.java:138)
 at javax.imageio.spi.IIORegistry.getDefaultInstance(IIORegistry.java:159)
 at javax.imageio.ImageIO.<clinit>(ImageIO.java:66)
 at org.apache.pdfbox.tools.imageio.ImageIOUtil.writeImage(ImageIOUtil.java:174)
 ...
 Cause: java.lang.ExceptionInInitializerError:
 at 
com.github.jaiimageio.impl.stream.ChannelImageOutputStreamSpi.<init>(ChannelImageOutputStreamSpi.java:66)
 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
 at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
 at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
 at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
 at java.lang.Class.newInstance(Class.java:442)
 at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380)
 at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
 at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
 at 
javax.imageio.spi.IIORegistry.registerApplicationClasspathSpis(IIORegistry.java:210)
 ...
 Cause: java.lang.NullPointerException:
 at com.github.jaiimageio.impl.common.PackageUtil.<clinit>(PackageUtil.java:91)
 at 
com.github.jaiimageio.impl.stream.ChannelImageOutputStreamSpi.<init>(ChannelImageOutputStreamSpi.java:66)
 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
 at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
 at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
 at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
 at java.lang.Class.newInstance(Class.java:442)
 at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380)
 at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
 at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
 ...
{code}
This kind of error didn't appear before, when we were only using an
"AutoDetectParser". My search for a solution led me to "ForkClient", where you
can see that only the "Main-Class" is defined, whereas in
"com.github.jaiimageio.impl.common.PackageUtil.<clinit>(PackageUtil.java:91)"
they check that the "Implementation-Vendor" and "Implementation-Version" are
not null.

As the name of the package suggests, it happens only with files containing 
image(s).

It's quite easy to reproduce:
 # download a simple file example with [this 
link|https://file-examples.com/wp-content/uploads/2017/10/file-sample_100kB.odt]
 # use this piece of code:

{code:java}
def test = {
  val forkParser = new ForkParser(ExtractText.getClass.getClassLoader, new AutoDetectParser())

  val output = new BodyContentHandler()
  val stream = TikaInputStream.get(new FileInputStream("/path/to/file-sample_100kB.odt"))
  val ctx = new ParseContext()

  forkParser.parse(stream, output, new Metadata(), ctx)
}{code}


> ForkClient "fillBootstrapJar()" lacks a few "MANIFEST.MF" properties
> --------------------------------------------------------------------
>
>                 Key: TIKA-2891
>                 URL: https://issues.apache.org/jira/browse/TIKA-2891
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.18
>            Reporter: Quentin Laville
>            Priority: Blocker
>              Labels: bug, forkclient, forkparser, parser



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
