[ https://issues.apache.org/jira/browse/TIKA-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15753790#comment-15753790 ]
David Pilato edited comment on TIKA-2208 at 12/16/16 8:16 AM:
--------------------------------------------------------------

So I tried this way. Basically I declared `<service-loader loadErrorHandler="IGNORE"/>`.

But when I looked at what is happening, this setting is only used when you build the Tika instance. When you call a method such as `parseToString`, the service loader is not involved.

Here the problem occurs when Tika tries to parse a Word document that contains a Visio schema. It happens at parsing time, not at initialization time.

The cause of the issue is clearly on our side. We removed a needed library (com.github.virtuald:curvesapi:1.04).

{code}
_transitive_org.apache.poi:poi-ooxml:3.15
\--- org.apache.poi:poi-ooxml:3.15
     +--- org.apache.poi:poi:3.15
     |    +--- commons-codec:commons-codec:1.10
     |    \--- org.apache.commons:commons-collections4:4.1
     +--- org.apache.poi:poi-ooxml-schemas:3.15
     |    \--- org.apache.xmlbeans:xmlbeans:2.6.0
     |         \--- stax:stax-api:1.0.1
     \--- com.github.virtuald:curvesapi:1.04
{code}

But would it be possible for Tika to catch some end-user errors and throw a friendlier exception?


was (Author: dadoonet):
So I tried this way. Basically I declared `<service-loader loadErrorHandler="IGNORE"/>`.

But when I looked at what is happening, this setting is only used when you build the Tika instance. When you call a method such as `parseToString`, the service loader is not involved.

Here the problem occurs when Tika tries to parse a Word document that contains a Visio schema. It happens at parsing time, not at initialization time.

The cause of the issue is clearly on our side. We removed a needed library (`com.github.virtuald:curvesapi:1.04`).

```
_transitive_org.apache.poi:poi-ooxml:3.15
\--- org.apache.poi:poi-ooxml:3.15
     +--- org.apache.poi:poi:3.15
     |    +--- commons-codec:commons-codec:1.10
     |    \--- org.apache.commons:commons-collections4:4.1
     +--- org.apache.poi:poi-ooxml-schemas:3.15
     |    \--- org.apache.xmlbeans:xmlbeans:2.6.0
     |         \--- stax:stax-api:1.0.1
     \--- com.github.virtuald:curvesapi:1.04
```

But would it be possible for Tika to catch some end-user errors and throw a friendlier exception?


> Catch missing libraries
> -----------------------
>
>                 Key: TIKA-2208
>                 URL: https://issues.apache.org/jira/browse/TIKA-2208
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>            Reporter: David Pilato
>
> Hi there
> We have decided to remove support for some formats when using Tika to extract
> text and metadata.
> We defined our list of Parsers:
> {code:java}
>     // Only the parsers for the formats we still support
>     private static final Parser[] PARSERS = new Parser[] {
>         // documents
>         new org.apache.tika.parser.html.HtmlParser(),
>         new org.apache.tika.parser.rtf.RTFParser(),
>         new org.apache.tika.parser.pdf.PDFParser(),
>         new org.apache.tika.parser.txt.TXTParser(),
>         new org.apache.tika.parser.microsoft.OfficeParser(),
>         new org.apache.tika.parser.microsoft.OldExcelParser(),
>         new org.apache.tika.parser.microsoft.ooxml.OOXMLParser(),
>         new org.apache.tika.parser.odf.OpenDocumentParser(),
>         new org.apache.tika.parser.iwork.IWorkPackageParser(),
>         new org.apache.tika.parser.xml.DcXMLParser(),
>         new org.apache.tika.parser.epub.EpubParser(),
>     };
>     private static final AutoDetectParser PARSER_INSTANCE = new AutoDetectParser(PARSERS);
>     private static final Tika TIKA_INSTANCE = new Tika(PARSER_INSTANCE.getDetector(), PARSER_INSTANCE);
> {code}
> But when an MS Office Word document embeds another, unsupported document
> (like a Visio schema), a {{NoClassDefFoundError}} is raised.
> Would it be possible to catch such a case and throw a {{TikaException}}
> instead, so that it surfaces as an Exception rather than an Error?

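Until something like this exists in Tika itself, the behaviour requested above can be approximated on the caller's side. The following is only a minimal sketch, not Tika code: `MissingLibraryGuard`, `ExtractionException` and `extract` are hypothetical names made up for illustration, and `ExtractionException` merely stands in for a checked exception such as {{TikaException}} so that the example has no dependency on Tika.

```java
import java.util.concurrent.Callable;

// Hypothetical helper (not part of Tika): runs an extraction and converts a
// missing-class Error into a checked exception the caller can handle normally.
final class MissingLibraryGuard {

    // Hypothetical stand-in for a checked exception such as TikaException.
    public static class ExtractionException extends Exception {
        public ExtractionException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    private MissingLibraryGuard() {
    }

    public static String extract(Callable<String> extraction) throws ExtractionException {
        try {
            return extraction.call();
        } catch (NoClassDefFoundError e) {
            // Raised at parse time when a class from a removed dependency is first needed.
            throw new ExtractionException(
                    "A library required to parse this document is missing from the classpath: "
                            + e.getMessage(), e);
        } catch (ExtractionException e) {
            // Already in the friendly form, pass it through unchanged.
            throw e;
        } catch (Exception e) {
            // Any other checked or runtime failure is wrapped the same way.
            throw new ExtractionException("Extraction failed: " + e.getMessage(), e);
        }
    }
}
```

With the fields from the issue description, a call could look like `MissingLibraryGuard.extract(() -> TIKA_INSTANCE.parseToString(file))`, where `file` is assumed to be a `java.io.File`. Only `NoClassDefFoundError` is intercepted; other Errors such as `OutOfMemoryError` are deliberately left to propagate.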
-- This message was sent by Atlassian JIRA (v6.3.4#6332)