[ 
https://issues.apache.org/jira/browse/TIKA-954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13414296#comment-13414296
 ] 

Rob Tulloh commented on TIKA-954:
---------------------------------

We increased the JVM heap to 2 GB. The call now returns an empty reply. Here
is what Tika reported in the log file. What I cannot tell is whether this is a
limitation of the server or of curl, though I suspect the server rather than
curl. The document in question appears to be 3,000+ pages of text.

2012-07-14_00:17:40.15182 INFO: tika/12345/Word.docx (autodetecting type)
2012-07-14_01:04:14.43799 Jul 13, 2012 8:04:12 PM 
org.apache.cxf.jaxrs.impl.WebApplicationExceptionMapper toResponse
t South Africa in 2000 on my unhappy first senior England tour."
2012-07-14_01:04:14.75706 Jul 13, 2012 8:04:12 PM 
org.apache.cxf.phase.PhaseInterceptorChain doDefaultLogging
 unwinding now
2012-07-14_01:04:14.75707 org.apache.cxf.interceptor.Fault: Could not send 
Message.
dleMessage(MessageSenderInterceptor.java:64)
2012-07-14_01:04:14.75709       at 
org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:263)
ptor.java:77)
2012-07-14_01:04:14.75710       at 
org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:263)
a:123)
nation.java:323)
n.java:289)
2012-07-14_01:04:14.76707       at 
org.apache.cxf.transport.http_jetty.JettyHTTPHandler.handle(JettyHTTPHandler.java:72)
2012-07-14_01:04:14.76707       at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:943)
2012-07-14_01:04:14.76708       at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:879)
2012-07-14_01:04:14.76708       at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
ion.java:250)
2012-07-14_01:04:14.76709       at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:110)
2012-07-14_01:04:14.76709       at 
org.eclipse.jetty.server.Server.handle(Server.java:345)
2012-07-14_01:04:14.76710       at 
org.eclipse.jetty.server.HttpConnection.handleRequest(HttpConnection.java:441)
ava:919)
2012-07-14_01:04:14.76712       at 
org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:588)
2012-07-14_01:04:14.76712       at 
org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:218)
2012-07-14_01:04:14.76714       at 
org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:51)
2012-07-14_01:04:14.76714       at 
org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:586)
2012-07-14_01:04:14.76715       at 
org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:44)
2012-07-14_01:04:14.76715       at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:598)
2012-07-14_01:04:14.76716       at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:533)
2012-07-14_01:04:14.76716       at java.lang.Thread.run(Thread.java:662)
2012-07-14_01:04:14.76716 Caused by: org.eclipse.jetty.io.EofException
2012-07-14_01:04:14.76717       at 
org.eclipse.jetty.http.HttpGenerator.flushBuffer(HttpGenerator.java:921)
2012-07-14_01:04:14.76717       at 
org.eclipse.jetty.server.HttpConnection.flushResponse(HttpConnection.java:612)
2012-07-14_01:04:14.76718       at 
org.eclipse.jetty.server.HttpConnection$Output.close(HttpConnection.java:995)
2012-07-14_01:04:14.76718       at 
org.apache.cxf.transport.http.AbstractHTTPDestination$WrappedOutputStream.close(AbstractHTTPDestination.java:650)
2012-07-14_01:04:14.76720       at 
org.apache.cxf.transport.AbstractConduit.close(AbstractConduit.java:56)
2012-07-14_01:04:14.76721       at 
org.apache.cxf.transport.http.AbstractHTTPDestination$BackChannelConduit.close(AbstractHTTPDestination.java:593)
2012-07-14_01:04:14.76721       at 
org.apache.cxf.interceptor.MessageSenderInterceptor$MessageSenderEndingInterceptor.handleMessage(MessageSenderInterceptor.java:62)
2012-07-14_01:04:14.76722       ... 23 more
2012-07-14_01:04:14.76722 Caused by: java.nio.channels.ClosedChannelException
2012-07-14_01:04:14.76722       at 
sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
2012-07-14_01:04:14.76724       at 
sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:357)
2012-07-14_01:04:14.76724       at 
java.nio.channels.SocketChannel.write(SocketChannel.java:360)
2012-07-14_01:04:14.76725       at 
org.eclipse.jetty.io.nio.ChannelEndPoint.gatheringFlush(ChannelEndPoint.java:354)
2012-07-14_01:04:14.76725       at 
org.eclipse.jetty.io.nio.ChannelEndPoint.flush(ChannelEndPoint.java:292)
2012-07-14_01:04:14.76725       at 
org.eclipse.jetty.io.nio.SelectChannelEndPoint.flush(SelectChannelEndPoint.java:300)
2012-07-14_01:04:14.76726       at 
org.eclipse.jetty.http.HttpGenerator.flushBuffer(HttpGenerator.java:848)
2012-07-14_01:04:14.76726       ... 29 more
2012-07-14_01:04:14.76727 Jul 13, 2012 8:04:12 PM 
org.apache.cxf.phase.PhaseInterceptorChain doDefaultLogging
2012-07-14_01:04:14.76730 WARNING: Interceptor for 
{http://server.tika.apache.org/}MetadataResource has thrown exception, 
unwinding now
2012-07-14_01:04:14.76731 org.apache.cxf.interceptor.Fault: XML_WRITE_EXC
2012-07-14_01:04:14.76731       at 
org.apache.cxf.binding.xml.interceptor.XMLFaultOutInterceptor.handleMessage(XMLFaultOutInterceptor.java:87)
2012-07-14_01:04:14.76731       at 
org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:263)
2012-07-14_01:04:14.76732       at 
org.apache.cxf.interceptor.AbstractFaultChainInitiatorObserver.onMessage(AbstractFaultChainInitiatorObserver.java:113)
2012-07-14_01:04:14.76732       at 
org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:323)
2012-07-14_01:04:14.76733       at 
org.apache.cxf.interceptor.OutgoingChainInterceptor.handleMessage(OutgoingChainInterceptor.java:77)
2012-07-14_01:04:14.76734       at 
org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:263)
2012-07-14_01:04:14.76735       at 
org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:123)
2012-07-14_01:04:14.76735       at 
org.apache.cxf.transport.http_jetty.JettyHTTPDestination.serviceRequest(JettyHTTPDestination.java:323)
n.java:289)
2012-07-14_01:04:14.76737       at 
org.apache.cxf.transport.http_jetty.JettyHTTPHandler.handle(JettyHTTPHandler.java:72)
2012-07-14_01:04:14.76737       at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:943)
2012-07-14_01:04:14.76738       at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:879)
2012-07-14_01:04:14.76738       at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
2012-07-14_01:04:14.76739       at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
2012-07-14_01:04:14.76740       at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:110)
2012-07-14_01:04:14.76741       at 
org.eclipse.jetty.server.Server.handle(Server.java:345)
2012-07-14_01:04:14.76741       at 
org.eclipse.jetty.server.HttpConnection.handleRequest(HttpConnection.java:441)
2012-07-14_01:04:14.76741       at 
org.eclipse.jetty.server.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:919)
2012-07-14_01:04:14.76742       at 
org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:588)
2012-07-14_01:04:14.76742       at 
org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:218)
2012-07-14_01:04:14.76743       at 
org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:51)
2012-07-14_01:04:14.76744       at 
org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:586)
2012-07-14_01:04:14.76744       at 
org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:44)
2012-07-14_01:04:14.76745       at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:598)
2012-07-14_01:04:14.76745       at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:533)
2012-07-14_01:04:14.76745       at java.lang.Thread.run(Thread.java:662)
2012-07-14_01:04:14.76748 Caused by: com.ctc.wstx.exc.WstxIOException: null
2012-07-14_01:04:14.76748       at 
com.ctc.wstx.sw.BaseStreamWriter.flush(BaseStreamWriter.java:257)
tInterceptor.java:85)
2012-07-14_01:04:14.76749       ... 25 more
2012-07-14_01:04:14.76750 Caused by: org.eclipse.jetty.io.EofException
2012-07-14_01:04:14.76751       at 
org.eclipse.jetty.io.nio.SelectChannelEndPoint.blockWritable(SelectChannelEndPoint.java:403)
2012-07-14_01:04:14.76752       at 
org.eclipse.jetty.http.AbstractGenerator.blockForOutput(AbstractGenerator.java:535)
2012-07-14_01:04:14.76752       at 
org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:159)
2012-07-14_01:04:14.76754       at 
org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:101)
2012-07-14_01:04:14.76755       at 
org.apache.cxf.io.AbstractWrappedOutputStream.write(AbstractWrappedOutputStream.java:46)
2012-07-14_01:04:14.76756       at 
com.ctc.wstx.sw.EncodingXmlWriter.flushBuffer(EncodingXmlWriter.java:697)
2012-07-14_01:04:14.76756       at 
com.ctc.wstx.sw.EncodingXmlWriter.flush(EncodingXmlWriter.java:171)
2012-07-14_01:04:14.76757       at 
com.ctc.wstx.sw.BaseStreamWriter.flush(BaseStreamWriter.java:255)
2012-07-14_01:04:14.76757       ... 26 more
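Two quick checks may help narrow down the server-vs-curl question. The first confirms that the 2 GB setting actually reached the JVM. The second measures how long the request ran before the write failed, using the first INFO line and the first failure line from the log above. This is a hypothetical standalone helper: the class and method names are illustrative and are not part of Tika. The heap check only describes the JVM it runs in, so run it with the same -Xmx flag used for the server.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Tika: two quick checks for the empty-reply case.
public class EmptyReplyDiagnostics {

    // Matches the timestamp prefix in the log above, e.g. "2012-07-14_00:17:40.15182".
    static final DateTimeFormatter LOG_TS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd_HH:mm:ss.SSSSS");

    // Maximum heap this JVM will try to use, in MiB. With -Xmx2g this should be
    // close to (often slightly below) 2048.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    // Wall-clock time between two prefixed log timestamps.
    static Duration elapsed(String start, String end) {
        return Duration.between(LocalDateTime.parse(start, LOG_TS),
                                LocalDateTime.parse(end, LOG_TS));
    }

    public static void main(String[] args) {
        System.out.println("max heap (MiB): " + maxHeapMiB());
        // First INFO line (request received) vs. first failure line (write failed).
        Duration d = elapsed("2012-07-14_00:17:40.15182", "2012-07-14_01:04:14.43799");
        System.out.println("request ran for " + d.toMinutes() + " min "
                + (d.getSeconds() % 60) + " s"); // prints "request ran for 46 min 34 s"
    }
}
```

By these timestamps the request ran for roughly 46.5 minutes before Tika tried to write the response and found the connection already closed. The log alone does not show which side closed it.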

                
> Tika throws OOM and GC limited exceeded on Microsoft docx file
> --------------------------------------------------------------
>
>                 Key: TIKA-954
>                 URL: https://issues.apache.org/jira/browse/TIKA-954
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.2
>         Environment: Linux (CentOS 4.x)
>            Reporter: Rob Tulloh
>         Attachments: Word.docx
>
>
> Stack trace produced with the attached docx file:
> 2012-07-13_04:45:36.86910 java.lang.OutOfMemoryError: GC overhead limit 
> exceeded
> 2012-07-13_04:45:36.86932 Dumping heap to 
> /var/log/oom/content-extractor-9998.dump.1 ...
> 2012-07-13_04:46:47.38774 Heap dump file created [925402960 bytes in 70.518 
> secs]
> 2012-07-13_04:46:57.17658 java.lang.OutOfMemoryError: GC overhead limit 
> exceeded
> 2012-07-13_04:46:57.17718       at 
> java.lang.String.substring(String.java:1939)
> 2012-07-13_04:46:57.17736       at 
> org.apache.xmlbeans.impl.store.Locale$SaxHandler.startElement(Locale.java:3254)
> 2012-07-13_04:46:57.17750       at 
> org.apache.xmlbeans.impl.piccolo.xml.Piccolo.reportStartTag(Piccolo.java:1082)
> 2012-07-13_04:46:57.17763       at 
> org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.parseAttributesNS(PiccoloLexer.java:1822)
> 2012-07-13_04:46:57.17777       at 
> org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.parseOpenTagNS(PiccoloLexer.java:1521)
> 2012-07-13_04:46:57.17793       at 
> org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.parseTagNS(PiccoloLexer.java:1362)
> 2012-07-13_04:46:57.17806       at 
> org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.parseXMLNS(PiccoloLexer.java:1293)
> 2012-07-13_04:46:57.17819       at 
> org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.parseXML(PiccoloLexer.java:1261)
> 2012-07-13_04:46:57.17839       at 
> org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.yylex(PiccoloLexer.java:4808)
> 2012-07-13_04:46:57.17853       at 
> org.apache.xmlbeans.impl.piccolo.xml.Piccolo.yylex(Piccolo.java:1290)
> 2012-07-13_04:46:57.17868       at 
> org.apache.xmlbeans.impl.piccolo.xml.Piccolo.yyparse(Piccolo.java:1400)
> 2012-07-13_04:46:57.17883       at 
> org.apache.xmlbeans.impl.piccolo.xml.Piccolo.parse(Piccolo.java:714)
> 2012-07-13_04:46:57.17897       at 
> org.apache.xmlbeans.impl.store.Locale$SaxLoader.load(Locale.java:3439)
> 2012-07-13_04:46:57.17911       at 
> org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Locale.java:1270)
> 2012-07-13_04:46:57.17929       at 
> org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Locale.java:1257)
> 2012-07-13_04:46:57.17945       at 
> org.apache.xmlbeans.impl.schema.SchemaTypeLoaderBase.parse(SchemaTypeLoaderBase.java:345)
> 2012-07-13_04:46:57.17962       at 
> org.openxmlformats.schemas.wordprocessingml.x2006.main.DocumentDocument$Factory.parse(Unknown
>  Source)
> 2012-07-13_04:46:57.17978       at 
> org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:134)
> 2012-07-13_04:46:57.17991       at 
> org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:159)
> 2012-07-13_04:46:57.18004       at 
> org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:116)
> 2012-07-13_04:46:57.18019       at 
> org.apache.poi.xwpf.extractor.XWPFWordExtractor.<init>(XWPFWordExtractor.java:53)
> 2012-07-13_04:46:57.18035       at 
> org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:180)
> 2012-07-13_04:46:57.18051       at 
> org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:87)
> 2012-07-13_04:46:57.18066       at 
> org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:82)
> 2012-07-13_04:46:57.18078       at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> 2012-07-13_04:46:57.18090       at 
> org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:91)
> 2012-07-13_04:46:57.18103       at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> 2012-07-13_04:46:57.18115       at 
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
> 2012-07-13_04:46:57.18127       at 
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:136)
> 2012-07-13_04:46:57.18146       at 
> org.apache.tika.server.TikaResource$3.write(TikaResource.java:138)
> 2012-07-13_04:46:57.18158       at 
> org.apache.cxf.jaxrs.provider.BinaryDataProvider.writeTo(BinaryDataProvider.java:117)
> 2012-07-13_04:46:57.18169       at 
> org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.serializeMessage(JAXRSOutInterceptor.java:257)
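The quoted trace shows where the memory goes: XWPFWordExtractor creates an XWPFDocument, whose onDocumentRead calls DocumentDocument$Factory.parse. That call loads the whole main document part into an XMLBeans object tree before any text is emitted, so memory grows with the size of the document. For comparison, here is a minimal sketch of a streaming alternative using only the JDK. It reads word/document.xml from the .docx zip with a SAX parser and writes the text of w:t elements straight to a Writer. This illustrates the technique only; it is not Tika's or POI's implementation. It ignores tables, headers/footers, tabs, breaks and embedded objects, and the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical streaming extractor (JDK only); not Tika's or POI's implementation.
public class StreamingDocxText {

    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Finds word/document.xml inside the .docx zip and streams its text to out.
    public static void extractText(InputStream docx, Writer out) throws Exception {
        ZipInputStream zip = new ZipInputStream(docx);
        ZipEntry e;
        while ((e = zip.getNextEntry()) != null) {
            if ("word/document.xml".equals(e.getName())) {
                SAXParserFactory factory = SAXParserFactory.newInstance();
                factory.setNamespaceAware(true);
                // Refuse DOCTYPEs so untrusted input cannot pull in external entities.
                factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
                factory.newSAXParser().parse(zip, new TextHandler(out));
                return;
            }
        }
        throw new IOException("word/document.xml not found");
    }

    // Copies characters inside <w:t> to the writer; ends each <w:p> with a newline.
    static final class TextHandler extends DefaultHandler {
        private final Writer out;
        private boolean inText;

        TextHandler(Writer out) { this.out = out; }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            if (W_NS.equals(uri) && "t".equals(local)) inText = true;
        }

        @Override
        public void endElement(String uri, String local, String qName) throws SAXException {
            if (!W_NS.equals(uri)) return;
            if ("t".equals(local)) inText = false;
            else if ("p".equals(local)) write("\n");
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            if (inText) write(new String(ch, start, length));
        }

        private void write(String s) throws SAXException {
            try { out.write(s); } catch (IOException ex) { throw new SAXException(ex); }
        }
    }

    // Builds a tiny in-memory .docx-like zip with one paragraph per argument (demo only).
    static byte[] tinyDocx(String... paragraphs) throws IOException {
        StringBuilder xml = new StringBuilder("<w:document xmlns:w=\"" + W_NS + "\"><w:body>");
        for (String p : paragraphs) {
            String escaped = p.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
            xml.append("<w:p><w:r><w:t>").append(escaped).append("</w:t></w:r></w:p>");
        }
        xml.append("</w:body></w:document>");
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("word/document.xml"));
            zip.write(xml.toString().getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        StringWriter out = new StringWriter();
        extractText(new ByteArrayInputStream(tinyDocx("Hello", "world")), out);
        System.out.print(out); // prints "Hello" and "world" on separate lines
    }
}
```

Because text goes straight to the Writer, memory use depends on the parser's buffers rather than on the page count; the trade-off is that any structure beyond paragraphs is lost.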

--
This message is automatically generated by JIRA.