Rob Tulloh created TIKA-934:
-------------------------------
Summary: Tika in server mode stops responding and reports NPE over
and over in logs
Key: TIKA-934
URL: https://issues.apache.org/jira/browse/TIKA-934
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 1.1
Environment: CentOS 5.x
Reporter: Rob Tulloh
Priority: Critical
We run tika in server mode via:
/usr/java/jdk/bin/java -Dlog4j.app.name=-server
-Djavax.xml.soap.MessageFactory=com.sun.xml.messaging.saaj.soap.ver1_1.SOAPMessageFactory1_1Impl
-Dfile.encoding=UTF-8 -Djava.net.preferIPv4Stack=true -server -Xms256M
-Xmx768M -XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/var/log/oom/content-extractor-8983.dump.1 -server -Xms500M
-Xmx500M -jar /opt/tika/tika-app-1.1.jar --text --encoding=UTF-8 --server 8983
Our client talks to this over port 8983. We pass data via the socket and get
the responses back. However, sometimes, tika will get into a bad state and stop
responding.
When this happens, we see this in the logs over and over.
2012-05-24_20:12:33.88573 Caused by: java.lang.NullPointerException
2012-05-24_20:12:33.88576 at
org.apache.tika.sax.XHTMLContentHandler.lazyEndHead(XHTMLContentHandler.java:157)
2012-05-24_20:12:33.88580 at
org.apache.tika.sax.XHTMLContentHandler.startElement(XHTMLContentHandler.java:237)
2012-05-24_20:12:33.88584 at
org.apache.tika.sax.XHTMLContentHandler.startElement(XHTMLContentHandler.java:274)
2012-05-24_20:12:33.88589 at
org.apache.tika.parser.microsoft.WordExtractor.handleParagraph(WordExtractor.java:186)
2012-05-24_20:12:33.88593 at
org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:97)
2012-05-24_20:12:33.88597 at
org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:185)
2012-05-24_20:12:33.88602 at
org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:160)
2012-05-24_20:12:33.88606 at
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
2012-05-24_20:12:33.88611 ... 4 more
2012-05-24_20:12:49.28441 org.apache.tika.exception.TikaException: Unexpected
RuntimeException from org.apache.tika.parser.microsoft.OfficeParse
r@6906daba
2012-05-24_20:12:49.28458 at
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
2012-05-24_20:12:49.28466 at
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
2012-05-24_20:12:49.28477 at
org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
2012-05-24_20:12:49.28489 at
org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:130)
2012-05-24_20:12:49.28497 at
org.apache.tika.cli.TikaCLI$TikaServer$1.run(TikaCLI.java:735)
2012-05-24_20:12:49.28509 Caused by: java.lang.NullPointerException
2012-05-24_20:12:49.28516 at
org.apache.tika.sax.XHTMLContentHandler.lazyEndHead(XHTMLContentHandler.java:157)
2012-05-24_20:12:49.28524 at
org.apache.tika.sax.XHTMLContentHandler.startElement(XHTMLContentHandler.java:237)
2012-05-24_20:12:49.28532 at
org.apache.tika.sax.XHTMLContentHandler.startElement(XHTMLContentHandler.java:274)
2012-05-24_20:12:49.28541 at
org.apache.tika.parser.microsoft.WordExtractor.handleParagraph(WordExtractor.java:186)
2012-05-24_20:12:49.28550 at
org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:97)
2012-05-24_20:12:49.28558 at
org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:185)
2012-05-24_20:12:49.28565 at
org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:160)
2012-05-24_20:12:49.28577 at
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
2012-05-24_20:12:49.28585 ... 4 more
We have tried to figure out what causes this with no success. We only know that
once the server gets into this state, there is no recourse but to restart the
tika service.
Other instances of tika we have running in the test environment continue to
work. There is some combination of content or work that causes
tika to destabilize. Our working theory is that perhaps tika server is not
thread safe and that may be causing this behavior.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira