[ https://issues.apache.org/jira/browse/TIKA-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15758897#comment-15758897 ]
David Pilato commented on TIKA-2208:
------------------------------------

Adding the missing libs:

{code}
compile "com.github.virtuald:curvesapi:1.04"
compile "com.bbn.poi.visio:ooxml-visio-schemas:2011.1"
{code}

This now causes a JAR hell issue, because the same class is available in two JARs:

{code}
Caused by: java.lang.IllegalStateException: jar hell!
class: com.microsoft.schemas.office.visio.x2012.main.CellType$Factory
jar1: /Users/dpilato/.gradle/caches/modules-2/files-2.1/org.apache.poi/poi-ooxml-schemas/3.15/de4a50ca39de48a19606b35644ecadb2f733c479/poi-ooxml-schemas-3.15.jar
jar2: /Users/dpilato/.gradle/caches/modules-2/files-2.1/com.bbn.poi.visio/ooxml-visio-schemas/2011.1/5c395aefc5c1a33f517c243843c909c1f4d6b3f0/ooxml-visio-schemas-2011.1.jar
{code}

> Catch missing libraires
> -----------------------
>
>                 Key: TIKA-2208
>                 URL: https://issues.apache.org/jira/browse/TIKA-2208
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>            Reporter: David Pilato
>
> Hi there
> We have decided to remove support for some formats when using Tika to extract
> text and metadata.
> We defined our list of Parsers:
> {code:java}
> private static final Parser PARSERS[] = new Parser[] {
>     // documents
>     new org.apache.tika.parser.html.HtmlParser(),
>     new org.apache.tika.parser.rtf.RTFParser(),
>     new org.apache.tika.parser.pdf.PDFParser(),
>     new org.apache.tika.parser.txt.TXTParser(),
>     new org.apache.tika.parser.microsoft.OfficeParser(),
>     new org.apache.tika.parser.microsoft.OldExcelParser(),
>     new org.apache.tika.parser.microsoft.ooxml.OOXMLParser(),
>     new org.apache.tika.parser.odf.OpenDocumentParser(),
>     new org.apache.tika.parser.iwork.IWorkPackageParser(),
>     new org.apache.tika.parser.xml.DcXMLParser(),
>     new org.apache.tika.parser.epub.EpubParser(),
> };
> private static final AutoDetectParser PARSER_INSTANCE = new AutoDetectParser(PARSERS);
> private static final Tika TIKA_INSTANCE = new Tika(PARSER_INSTANCE.getDetector(), PARSER_INSTANCE);
> {code}
> But when an MS Office Word document embeds another, unsupported document
> (like a Visio schema), a {{NoClassDefFoundError}} is raised.
> Would it be possible to catch such a case and throw a {{TikaException}}
> instead, so that it behaves as an Exception and not as a Throwable?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
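
For illustration only, the conversion asked for in the description could be done on the caller's side roughly as sketched below. This is not Tika's implementation. The names {{MissingLibraryGuard}}, {{guard}} and {{ParseFailedException}} are hypothetical, and {{ParseFailedException}} is only a stand-in for {{TikaException}} so that the sketch compiles without Tika on the classpath.

```java
import java.util.concurrent.Callable;

public class MissingLibraryGuard {

    /** Hypothetical stand-in for TikaException; not part of Tika. */
    public static class ParseFailedException extends Exception {
        public ParseFailedException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /**
     * Runs the given task and converts a NoClassDefFoundError (an Error,
     * which a plain "catch (Exception e)" does not catch) into a checked
     * exception that normal error handling can deal with.
     */
    public static <T> T guard(Callable<T> task) throws Exception {
        try {
            return task.call();
        } catch (NoClassDefFoundError e) {
            throw new ParseFailedException(
                    "Optional parser dependency is missing: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) throws Exception {
        try {
            // Simulates a parser that touches a class missing from the classpath.
            MissingLibraryGuard.<String>guard(() -> {
                throw new NoClassDefFoundError("com/example/MissingType");
            });
        } catch (ParseFailedException e) {
            System.out.println("Handled: " + e.getMessage());
        }
    }
}
```

In real code the task passed to {{guard}} would be the parsing call itself, and the original error stays available through {{getCause()}} for logging.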