[ https://issues.apache.org/jira/browse/TIKA-954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13414297#comment-13414297 ]
Rob Tulloh commented on TIKA-954:
---------------------------------

curl output:

* Connected to localhost (127.0.0.1) port 9998
> PUT /tika/12345/Word.docx HTTP/1.1
> User-Agent: curl/7.15.5 (x86_64-redhat-linux-gnu) libcurl/7.15.5 OpenSSL/0.9.8b zlib/1.2.3 libidn/0.6.5
> Host: localhost:9998
> Accept: */*
> Content-Type: application/octet-stream
> Content-Length: 4543821
> Expect: 100-continue
>
< HTTP/1.1 100 Continue
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 4437k    0     0  100 4437k      0  12612  0:06:00  0:06:00 --:--:--     0
Empty reply from server
100 4437k    0     0  100 4437k      0  12612  0:06:00  0:06:00 --:--:--     0
* Connection #0 to host localhost left intact
curl: (52) Empty reply from server
* Closing connection #0

> Tika throws OOM and GC limited exceeded on Microsoft docx file
> --------------------------------------------------------------
>
> Key: TIKA-954
> URL: https://issues.apache.org/jira/browse/TIKA-954
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.2
> Environment: Linux (CentOS 4.x)
> Reporter: Rob Tulloh
> Attachments: Word.docx
>
>
> Stack trace produced with attached docx file
> 2012-07-13_04:45:36.86910 java.lang.OutOfMemoryError: GC overhead limit exceeded
> 2012-07-13_04:45:36.86932 Dumping heap to /var/log/oom/content-extractor-9998.dump.1 ...
> 2012-07-13_04:46:47.38774 Heap dump file created [925402960 bytes in 70.518 secs]
> 2012-07-13_04:46:57.17658 java.lang.OutOfMemoryError: GC overhead limit exceeded
> 2012-07-13_04:46:57.17718 at java.lang.String.substring(String.java:1939)
> 2012-07-13_04:46:57.17736 at org.apache.xmlbeans.impl.store.Locale$SaxHandler.startElement(Locale.java:3254)
> 2012-07-13_04:46:57.17750 at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.reportStartTag(Piccolo.java:1082)
> 2012-07-13_04:46:57.17763 at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.parseAttributesNS(PiccoloLexer.java:1822)
> 2012-07-13_04:46:57.17777 at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.parseOpenTagNS(PiccoloLexer.java:1521)
> 2012-07-13_04:46:57.17793 at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.parseTagNS(PiccoloLexer.java:1362)
> 2012-07-13_04:46:57.17806 at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.parseXMLNS(PiccoloLexer.java:1293)
> 2012-07-13_04:46:57.17819 at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.parseXML(PiccoloLexer.java:1261)
> 2012-07-13_04:46:57.17839 at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.yylex(PiccoloLexer.java:4808)
> 2012-07-13_04:46:57.17853 at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.yylex(Piccolo.java:1290)
> 2012-07-13_04:46:57.17868 at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.yyparse(Piccolo.java:1400)
> 2012-07-13_04:46:57.17883 at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.parse(Piccolo.java:714)
> 2012-07-13_04:46:57.17897 at org.apache.xmlbeans.impl.store.Locale$SaxLoader.load(Locale.java:3439)
> 2012-07-13_04:46:57.17911 at org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Locale.java:1270)
> 2012-07-13_04:46:57.17929 at org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Locale.java:1257)
> 2012-07-13_04:46:57.17945 at org.apache.xmlbeans.impl.schema.SchemaTypeLoaderBase.parse(SchemaTypeLoaderBase.java:345)
> 2012-07-13_04:46:57.17962 at org.openxmlformats.schemas.wordprocessingml.x2006.main.DocumentDocument$Factory.parse(Unknown Source)
> 2012-07-13_04:46:57.17978 at org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:134)
> 2012-07-13_04:46:57.17991 at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:159)
> 2012-07-13_04:46:57.18004 at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:116)
> 2012-07-13_04:46:57.18019 at org.apache.poi.xwpf.extractor.XWPFWordExtractor.<init>(XWPFWordExtractor.java:53)
> 2012-07-13_04:46:57.18035 at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:180)
> 2012-07-13_04:46:57.18051 at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:87)
> 2012-07-13_04:46:57.18066 at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:82)
> 2012-07-13_04:46:57.18078 at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> 2012-07-13_04:46:57.18090 at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:91)
> 2012-07-13_04:46:57.18103 at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> 2012-07-13_04:46:57.18115 at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
> 2012-07-13_04:46:57.18127 at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:136)
> 2012-07-13_04:46:57.18146 at org.apache.tika.server.TikaResource$3.write(TikaResource.java:138)
> 2012-07-13_04:46:57.18158 at org.apache.cxf.jaxrs.provider.BinaryDataProvider.writeTo(BinaryDataProvider.java:117)
> 2012-07-13_04:46:57.18169 at org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.serializeMessage(JAXRSOutInterceptor.java:257)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira