[ 
https://issues.apache.org/jira/browse/TIKA-934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jukka Zitting resolved TIKA-934.
--------------------------------

       Resolution: Fixed
    Fix Version/s: 1.2
         Assignee: Jukka Zitting

Thanks for the report! This was indeed the case: the CLI kept just a 
single global Metadata instance, which caused trouble when running in 
server mode. I fixed that in revision 1355719 by making the CLI use 
per-document Metadata instances.
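
To illustrate the difference between a global and a per-document metadata
object, here is a minimal, self-contained sketch. It is not the code from
revision 1355719 and does not use the Tika API: DocMetadata, parse and both
handle... methods are hypothetical stand-ins invented for this example. It
only shows how state from one document can leak into the next when a single
instance is reused; it does not try to reproduce the NPE from the report.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PerDocumentMetadataSketch {

    /** Hypothetical stand-in for a mutable metadata holder (not Tika's Metadata). */
    static final class DocMetadata {
        private final Map<String, String> values = new LinkedHashMap<>();

        void set(String name, String value) {
            values.put(name, value);
        }

        String get(String name) {
            return values.get(name);
        }
    }

    /** Hypothetical "parser": records what it learns about one document. */
    static void parse(String title, String author, DocMetadata metadata) {
        metadata.set("title", title);
        if (author != null) {
            metadata.set("author", author);
        }
    }

    // Problematic pattern: one instance shared by every request the server handles.
    static final DocMetadata SHARED = new DocMetadata();

    static DocMetadata handleWithSharedState(String title, String author) {
        parse(title, author, SHARED);
        return SHARED;
    }

    // Safer pattern: a fresh instance for each document, so nothing carries over.
    static DocMetadata handleWithPerDocumentState(String title, String author) {
        DocMetadata metadata = new DocMetadata();
        parse(title, author, metadata);
        return metadata;
    }

    public static void main(String[] args) {
        // The second document has no author, but the shared instance
        // still holds the author set by the first document.
        handleWithSharedState("report.doc", "alice");
        DocMetadata leaked = handleWithSharedState("notes.doc", null);
        System.out.println("shared author: " + leaked.get("author"));

        // With per-document instances the second document starts clean.
        handleWithPerDocumentState("report.doc", "alice");
        DocMetadata clean = handleWithPerDocumentState("notes.doc", null);
        System.out.println("per-document author: " + clean.get("author"));
    }
}
```

In the shared variant the second document reports the first document's
author; in the per-document variant it reports null. A shared mutable
instance is also unsafe when several server threads write to it at once,
which a fresh instance per document avoids without any locking.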
                
> Tika in server mode stops responding and reports NPE over and over in logs
> --------------------------------------------------------------------------
>
>                 Key: TIKA-934
>                 URL: https://issues.apache.org/jira/browse/TIKA-934
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.1
>         Environment: CentOS 5.x
>            Reporter: Rob Tulloh
>            Assignee: Jukka Zitting
>            Priority: Critical
>             Fix For: 1.2
>
>
> We run tika in server mode via:
> /usr/java/jdk/bin/java -Dlog4j.app.name=-server 
> -Djavax.xml.soap.MessageFactory=com.sun.xml.messaging.saaj.soap.ver1_1.SOAPMessageFactory1_1Impl
>  -Dfile.encoding=UTF-8 -Djava.net.preferIPv4Stack=true -server -Xms256M 
> -Xmx768M -XX:+HeapDumpOnOutOfMemoryError 
> -XX:HeapDumpPath=/var/log/oom/content-extractor-8983.dump.1 -server -Xms500M 
> -Xmx500M -jar /opt/tika/tika-app-1.1.jar --text --encoding=UTF-8 --server 8983
> Our client talks to this over port 8983. We pass data via the socket and get 
> the responses back. However, sometimes, tika will get into a bad state and 
> stop responding. 
> When this happens, we see this in the logs over and over. 
> 2012-05-24_20:12:33.88573 Caused by: java.lang.NullPointerException
> 2012-05-24_20:12:33.88576       at 
> org.apache.tika.sax.XHTMLContentHandler.lazyEndHead(XHTMLContentHandler.java:157)
> 2012-05-24_20:12:33.88580       at 
> org.apache.tika.sax.XHTMLContentHandler.startElement(XHTMLContentHandler.java:237)
> 2012-05-24_20:12:33.88584       at 
> org.apache.tika.sax.XHTMLContentHandler.startElement(XHTMLContentHandler.java:274)
> 2012-05-24_20:12:33.88589       at 
> org.apache.tika.parser.microsoft.WordExtractor.handleParagraph(WordExtractor.java:186)
> 2012-05-24_20:12:33.88593       at 
> org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:97)
> 2012-05-24_20:12:33.88597       at 
> org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:185)
> 2012-05-24_20:12:33.88602       at 
> org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:160)
> 2012-05-24_20:12:33.88606       at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> 2012-05-24_20:12:33.88611       ... 4 more
> 2012-05-24_20:12:49.28441 org.apache.tika.exception.TikaException: Unexpected 
> RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@6906daba
> 2012-05-24_20:12:49.28458       at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
> 2012-05-24_20:12:49.28466       at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> 2012-05-24_20:12:49.28477       at 
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
> 2012-05-24_20:12:49.28489       at 
> org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:130)
> 2012-05-24_20:12:49.28497       at 
> org.apache.tika.cli.TikaCLI$TikaServer$1.run(TikaCLI.java:735)
> 2012-05-24_20:12:49.28509 Caused by: java.lang.NullPointerException
> 2012-05-24_20:12:49.28516       at 
> org.apache.tika.sax.XHTMLContentHandler.lazyEndHead(XHTMLContentHandler.java:157)
> 2012-05-24_20:12:49.28524       at 
> org.apache.tika.sax.XHTMLContentHandler.startElement(XHTMLContentHandler.java:237)
> 2012-05-24_20:12:49.28532       at 
> org.apache.tika.sax.XHTMLContentHandler.startElement(XHTMLContentHandler.java:274)
> 2012-05-24_20:12:49.28541       at 
> org.apache.tika.parser.microsoft.WordExtractor.handleParagraph(WordExtractor.java:186)
> 2012-05-24_20:12:49.28550       at 
> org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:97)
> 2012-05-24_20:12:49.28558       at 
> org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:185)
> 2012-05-24_20:12:49.28565       at 
> org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:160)
> 2012-05-24_20:12:49.28577       at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> 2012-05-24_20:12:49.28585       ... 4 more
> We have tried to figure out what causes this with no success. We only know 
> that once the server gets into this state, there is no recourse but to 
> restart the tika service.
> Other instances of tika we have running in the test environment continue to 
> work. There is some combination of content or work that causes
> tika to destabilize. Our working theory is that perhaps tika server is not 
> thread safe and that may be causing this behavior.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
