[ https://issues.apache.org/jira/browse/TIKA-2727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16626408#comment-16626408 ]
Slava G commented on TIKA-2727:
-------------------------------

I tried to reproduce this. After a few hundred XML files had been passed to Tika for parsing, it hung at:

    at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
    at org.apache.xerces.impl.XMLEntityScanner.scanLiteral(Unknown Source)
    at org.apache.xerces.impl.XMLScanner.scanAttributeValue(Unknown Source)
    at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanAttribute(Unknown Source)
    at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)
    at org.apache.xerces.impl.XMLNSDocumentScannerImpl$NSContentDispatcher.scanRootElementHook(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
    at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
    at org.apache.xerces.jaxp.SAXParserImpl.parse(Unknown Source)
    at javax.xml.parsers.SAXParser.parse(SAXParser.java:195)
    at org.apache.tika.utils.XMLReaderUtils.parseSAX(XMLReaderUtils.java:371)
    at org.apache.tika.detect.XmlRootExtractor.extractRootElement(XmlRootExtractor.java:53)
    at org.apache.tika.detect.XmlRootExtractor.extractRootElement(XmlRootExtractor.java:44)
    at org.apache.tika.mime.MimeTypes.getMimeType(MimeTypes.java:212)
    at org.apache.tika.mime.MimeTypes.detect(MimeTypes.java:493)
    at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:84)

> Parsing and detect mime type of XML file stuck in infinite loop
> ---------------------------------------------------------------
>
> Key: TIKA-2727
> URL: https://issues.apache.org/jira/browse/TIKA-2727
> Project: Tika
> Issue Type: Bug
> Components: detector, parser
> Affects Versions: 1.17
> Reporter: Slava G
> Assignee: Tim Allison
> Priority: Major
> Fix For: 1.19, 2.0.0
>
> Attachments: 1_e3e13f0e-7085-4000-a558-5d255ed7a944.xml
>
>
> Hi,
> I'm trying to parse (even mime type detect) some XML file that it's not
> large, but kinda tricky and my process hangs on :
> XMLStringBuffer.append(char[], int, int) line: not available
> XMLStringBuffer.append(XMLString) line: not available
> XMLNSDocumentScannerImpl(XMLScanner).scanAttributeValue(XMLString, XMLString, String, boolean, String) line: not available
> XMLNSDocumentScannerImpl.scanAttribute(XMLAttributesImpl) line: not available
> XMLNSDocumentScannerImpl.scanStartElement() line: not available
> XMLNSDocumentScannerImpl$NSContentDispatcher.scanRootElementHook() line: not available
> XMLNSDocumentScannerImpl$NSContentDispatcher(XMLDocumentFragmentScannerImpl$FragmentContentDispatcher).dispatch(boolean) line: not available
> XMLNSDocumentScannerImpl(XMLDocumentFragmentScannerImpl).scanDocument(boolean) line: not available
> XIncludeAwareParserConfiguration(XML11Configuration).parse(boolean) line: not available
> XIncludeAwareParserConfiguration(XML11Configuration).parse(XMLInputSource) line: not available
> SAXParserImpl$JAXPSAXParser(XMLParser).parse(XMLInputSource) line: not available
> SAXParserImpl$JAXPSAXParser(AbstractSAXParser).parse(InputSource) line: not available
> SAXParserImpl$JAXPSAXParser.parse(InputSource) line: not available
> SAXParserImpl.parse(InputSource, DefaultHandler) line: not available
> SAXParserImpl(SAXParser).parse(InputStream, DefaultHandler) line: 195
> XmlRootExtractor.extractRootElement(InputStream) line: 62
> XmlRootExtractor.extractRootElement(byte[]) line: 42
> MimeTypes.getMimeType(byte[]) line: 212
> MimeTypes.detect(InputStream, Metadata) line: 494
> DefaultDetector(CompositeDetector).detect(InputStream, Metadata) line: 84
>
> Please see attached XML file.
> Please advise.
> Thanks



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
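
For anyone trying to tell a hang apart from a slow parse while reproducing this, here is a minimal sketch that runs a plain JAXP SAX parse on a worker thread and gives up after a timeout. It uses only the JDK; the class name `SaxTimeoutSketch` and the method `parseWithTimeout` are hypothetical names made up for illustration. This is not Tika's code and not the fix that went into 1.19, just one way to observe whether the parser returns at all.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration only; not part of Tika.
final class SaxTimeoutSketch {

    /**
     * Returns true if the SAX parser returned within the timeout
     * (either normally or with a parse error), false if it timed out.
     */
    static boolean parseWithTimeout(byte[] xml, long timeoutMillis) throws InterruptedException {
        // Daemon thread: a parser stuck in a busy loop does not react to
        // interruption, so this at least keeps it from blocking JVM exit.
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "sax-parse");
            t.setDaemon(true);
            return t;
        });
        try {
            Future<Object> future = executor.submit(() -> {
                SAXParserFactory factory = SAXParserFactory.newInstance();
                factory.setNamespaceAware(true);
                SAXParser parser = factory.newSAXParser();
                try (InputStream in = new ByteArrayInputStream(xml)) {
                    parser.parse(in, new DefaultHandler());
                }
                return null;
            });
            try {
                future.get(timeoutMillis, TimeUnit.MILLISECONDS);
                return true;
            } catch (TimeoutException e) {
                // Best effort only: the worker may keep spinning in the background.
                future.cancel(true);
                return false;
            } catch (ExecutionException e) {
                // Malformed XML or similar: the parser failed, but it did return.
                return true;
            }
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = "<root attr=\"value\"/>".getBytes(StandardCharsets.UTF_8);
        System.out.println("returned in time: " + parseWithTimeout(xml, 5000));
    }
}
```

Feeding the attached XML file's bytes to `parseWithTimeout` instead of the inline sample would show whether the parser alone reproduces the hang, independent of Tika's detector chain. Note that the timeout only stops the caller from waiting; it does not stop the stuck thread.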