[ 
https://issues.apache.org/jira/browse/TIKA-934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13285068#comment-13285068
 ] 

Rob Tulloh commented on TIKA-934:
---------------------------------

Additional evidence of re-entrancy issues:

2012-05-22_19:10:39.31249 Caused by: java.util.ConcurrentModificationException
2012-05-22_19:10:39.31253       at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793)
2012-05-22_19:10:39.31257       at java.util.HashMap$KeyIterator.next(HashMap.java:828)
2012-05-22_19:10:39.31262       at java.util.AbstractCollection.toArray(AbstractCollection.java:171)
2012-05-22_19:10:39.31266       at org.apache.tika.metadata.Metadata.names(Metadata.java:171)
2012-05-22_19:10:39.31270       at org.apache.tika.sax.XHTMLContentHandler.lazyEndHead(XHTMLContentHandler.java:156)
2012-05-22_19:10:39.31275       at org.apache.tika.sax.XHTMLContentHandler.startElement(XHTMLContentHandler.java:237)
2012-05-22_19:10:39.31280       at org.apache.tika.sax.XHTMLContentHandler.startElement(XHTMLContentHandler.java:281)
2012-05-22_19:10:39.31285       at org.apache.tika.parser.pdf.PDF2XHTML.startPage(PDF2XHTML.java:128)
2012-05-22_19:10:39.31289       at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:420)
2012-05-22_19:10:39.31293       at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:366)
2012-05-22_19:10:39.31296       at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:322)
2012-05-22_19:10:39.31300       at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:63)
2012-05-22_19:10:39.31304       at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:140)
2012-05-22_19:10:39.31308       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
2012-05-22_19:10:39.31312       ... 4 more
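
The trace above shows a HashMap key iterator failing inside Metadata.names(),
which is the usual symptom of a HashMap being structurally modified while
something else iterates over it. The following is only a minimal sketch of
that failure mode and of one common remedy (per-request state), not Tika code:
the class SharedMapDemo and its methods are hypothetical names, and the
modification is done in a single thread so that the fail-fast check fires
deterministically. With two threads the same check can fire, but only
intermittently.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; it does not use or reproduce any Tika API.
public class SharedMapDemo {

    // Adds a key while iterating over the same HashMap. Returns true if the
    // iterator's fail-fast check raised ConcurrentModificationException.
    static boolean mutateWhileIterating() {
        Map<String, String> shared = new HashMap<>();
        shared.put("Content-Type", "application/pdf");
        shared.put("title", "example");
        try {
            for (String name : shared.keySet()) {
                // Structural change in the middle of the iteration.
                shared.put("added-" + name, "x");
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Runs `requests` worker threads that each build and read their own map.
    // Returns the number of workers that saw an exception or a wrong key
    // count, or -1 if the calling thread was interrupted while waiting.
    static int perRequestFailures(int requests) {
        AtomicInteger failures = new AtomicInteger();
        Thread[] workers = new Thread[requests];
        for (int i = 0; i < requests; i++) {
            final int id = i;
            workers[i] = new Thread(() -> {
                // State is created per request and never shared.
                Map<String, String> own = new HashMap<>();
                for (int k = 0; k < 1000; k++) {
                    own.put("key-" + id + "-" + k, "v");
                }
                try {
                    String[] names = own.keySet().toArray(new String[0]);
                    if (names.length != 1000) {
                        failures.incrementAndGet();
                    }
                } catch (ConcurrentModificationException e) {
                    failures.incrementAndGet();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return -1;
            }
        }
        return failures.get();
    }

    public static void main(String[] args) {
        System.out.println("shared map failed: " + mutateWhileIterating());
        System.out.println("per-request failures: " + perRequestFailures(8));
    }
}
```

Whether the server actually shares one metadata object between request
threads is not established by the trace; the sketch only shows why that kind
of sharing would be consistent with the exception seen here.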

                
> Tika in server mode stops responding and reports NPE over and over in logs
> --------------------------------------------------------------------------
>
>                 Key: TIKA-934
>                 URL: https://issues.apache.org/jira/browse/TIKA-934
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.1
>         Environment: CentOS 5.x
>            Reporter: Rob Tulloh
>            Priority: Critical
>
> We run Tika in server mode via:
>
> /usr/java/jdk/bin/java -Dlog4j.app.name=-server -Djavax.xml.soap.MessageFactory=com.sun.xml.messaging.saaj.soap.ver1_1.SOAPMessageFactory1_1Impl -Dfile.encoding=UTF-8 -Djava.net.preferIPv4Stack=true -server -Xms256M -Xmx768M -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/oom/content-extractor-8983.dump.1 -server -Xms500M -Xmx500M -jar /opt/tika/tika-app-1.1.jar --text --encoding=UTF-8 --server 8983
>
> Our client talks to this over port 8983. We pass data via the socket and get 
> the responses back. However, Tika sometimes gets into a bad state and 
> stops responding. 
> When this happens, we see this in the logs over and over:
> 2012-05-24_20:12:33.88573 Caused by: java.lang.NullPointerException
> 2012-05-24_20:12:33.88576       at org.apache.tika.sax.XHTMLContentHandler.lazyEndHead(XHTMLContentHandler.java:157)
> 2012-05-24_20:12:33.88580       at org.apache.tika.sax.XHTMLContentHandler.startElement(XHTMLContentHandler.java:237)
> 2012-05-24_20:12:33.88584       at org.apache.tika.sax.XHTMLContentHandler.startElement(XHTMLContentHandler.java:274)
> 2012-05-24_20:12:33.88589       at org.apache.tika.parser.microsoft.WordExtractor.handleParagraph(WordExtractor.java:186)
> 2012-05-24_20:12:33.88593       at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:97)
> 2012-05-24_20:12:33.88597       at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:185)
> 2012-05-24_20:12:33.88602       at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:160)
> 2012-05-24_20:12:33.88606       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> 2012-05-24_20:12:33.88611       ... 4 more
> 2012-05-24_20:12:49.28441 org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@6906daba
> 2012-05-24_20:12:49.28458       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
> 2012-05-24_20:12:49.28466       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> 2012-05-24_20:12:49.28477       at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
> 2012-05-24_20:12:49.28489       at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:130)
> 2012-05-24_20:12:49.28497       at org.apache.tika.cli.TikaCLI$TikaServer$1.run(TikaCLI.java:735)
> 2012-05-24_20:12:49.28509 Caused by: java.lang.NullPointerException
> 2012-05-24_20:12:49.28516       at org.apache.tika.sax.XHTMLContentHandler.lazyEndHead(XHTMLContentHandler.java:157)
> 2012-05-24_20:12:49.28524       at org.apache.tika.sax.XHTMLContentHandler.startElement(XHTMLContentHandler.java:237)
> 2012-05-24_20:12:49.28532       at org.apache.tika.sax.XHTMLContentHandler.startElement(XHTMLContentHandler.java:274)
> 2012-05-24_20:12:49.28541       at org.apache.tika.parser.microsoft.WordExtractor.handleParagraph(WordExtractor.java:186)
> 2012-05-24_20:12:49.28550       at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:97)
> 2012-05-24_20:12:49.28558       at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:185)
> 2012-05-24_20:12:49.28565       at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:160)
> 2012-05-24_20:12:49.28577       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> 2012-05-24_20:12:49.28585       ... 4 more
> We have tried to figure out what causes this, with no success. We only know 
> that once the server gets into this state, there is no recourse but to 
> restart the Tika service.
> Other instances of Tika we have running in the test environment continue to 
> work, so some combination of content or workload appears to destabilize 
> Tika. Our working theory is that the Tika server may not be thread safe, 
> which would explain this behavior.

