Two other points... This is one triggering file: https://github.com/apache/tika/blob/main/tika-parsers/tika-parsers-classic/tika-parsers-classic-modules/tika-parser-microsoft-module/src/test/resources/test-documents/EmbeddedDocument.docx
The problem goes away if I use poi-ooxml-full.

On Mon, Dec 21, 2020 at 11:26 AM Tim Allison <[email protected]> wrote:

> Andi,
> Thank you for all of your work on this! This is probably user error,
> but I'm getting a failed test when I integrate poi trunk with Tika. Is
> this something I can fix at the Tika level?
>
> org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.ooxml.OOXMLParser@785a4557
>
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:293)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
> at org.apache.tika.extractor.ParserContainerExtractor.extract(ParserContainerExtractor.java:82)
> at org.apache.tika.parser.microsoft.AbstractPOIContainerExtractionTest.process(AbstractPOIContainerExtractionTest.java:68)
> at org.apache.tika.parser.microsoft.POIContainerExtractionTest.testEmbeddedOfficeFilesXML(POIContainerExtractionTest.java:335)
> ...
> Caused by: org.apache.xmlbeans.SchemaTypeLoaderException: XML-BEANS compiled schema: Could not locate compiled schema resource org/apache/poi/schemas/ooxml/system/ooxml/oleobjectelement.xsb (org.apache.poi.schemas.ooxml.system.ooxml.oleobjectelement) - code 0
> at org.apache.xmlbeans.impl.schema.SchemaTypeSystemImpl$XsbReader.<init>(SchemaTypeSystemImpl.java:1315)
> at org.apache.xmlbeans.impl.schema.SchemaTypeSystemImpl.resolveHandle(SchemaTypeSystemImpl.java:3138)
> at org.apache.xmlbeans.SchemaComponent$Ref.getComponent(SchemaComponent.java:113)
> at org.apache.xmlbeans.SchemaGlobalElement$Ref.get(SchemaGlobalElement.java:76)
> at org.apache.xmlbeans.impl.schema.SchemaTypeLoaderBase.findElement(SchemaTypeLoaderBase.java:103)
> at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createElementType(SchemaTypeImpl.java:988)
> at org.apache.xmlbeans.impl.values.XmlObjectBase.create_element_user(XmlObjectBase.java:913)
> at org.apache.xmlbeans.impl.store.Xobj.getUser(Xobj.java:1597)
> at org.apache.xmlbeans.impl.store.Cur.getUser(Cur.java:2571)
> at org.apache.xmlbeans.impl.store.Cur.getObject(Cur.java:2565)
> at org.apache.xmlbeans.impl.store.Cursor._getObject(Cursor.java:819)
> at org.apache.xmlbeans.impl.store.Cursor.syncWrapHelper(Cursor.java:2522)
> at org.apache.xmlbeans.impl.store.Cursor.syncWrap(Cursor.java:2453)
> at org.apache.xmlbeans.impl.store.Cursor.getObject(Cursor.java:2080)
> at org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator.extractParagraph(XWPFWordExtractorDecorator.java:236)
> at org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator.extractIBodyText(XWPFWordExtractorDecorator.java:161)
> at org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator.buildXHTML(XWPFWordExtractorDecorator.java:124)
> at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:136)
> at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:213)
> at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:113)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>
> On Sat, Dec 19, 2020 at 8:47 AM Tim Allison <[email protected]> wrote:
>
>> If anyone else on this list has time and an interest POI 5.0.0 is on the
>> way! Please help test!
>>
>> ---------- Forwarded message ---------
>> From: Tim Allison <[email protected]>
>> Date: Sat, Dec 19, 2020 at 8:45 AM
>> Subject: Re: Plea - test the POI 5.0.0 snapshot
>> To: POI Users List <[email protected]>
>>
>>
>> Will integrate w Tika on Monday and test it out. Thank you!!!
>>
>> On Sat, Dec 19, 2020 at 7:52 AM Andreas Beeker <[email protected]>
>> wrote:
>>
>>> Dear POI users,
>>>
>>> we are shortly before releasing POI 5.0.0 and there have been some
>>> breaking changes [1].
>>> Notably the JPMS/JigSaw migration and the upgrade of the ECMA-376
>>> schemas to the 5th edition.
>>>
>>> Please download the snapshot [2] and give it a try - especially with the
>>> new schemas, I'm interested if documents created by POI still can be opened
>>> without errors in various office applications.
>>>
>>> Thank you for your support.
>>>
>>> Andi
>>>
>>>
>>> [1] http://poi.apache.org/changes.html
>>>
>>> [2]
>>> https://ci-builds.apache.org/job/POI/job/POI-DSL-1.8/lastSuccessfulBuild/artifact/build/dist/
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [email protected]
>>> For additional commands, e-mail: [email protected]
>>>
>>>
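
For anyone who wants to reproduce this, here is a rough sketch of a quick check for whether the compiled schema resource named in the SchemaTypeLoaderException above is visible on the classpath. It uses only java.lang.ClassLoader, not any POI or XMLBeans API. The class name SchemaResourceCheck and its helper methods are made up for illustration, and it only inspects the class loader that loaded the check itself, which may differ from the one XMLBeans actually uses inside Tika.

```java
import java.net.URL;

// Hypothetical helper, not part of POI, XMLBeans or Tika.
public class SchemaResourceCheck {

    // Resource path copied from the SchemaTypeLoaderException message in the trace.
    static final String OLE_OBJECT_XSB =
            "org/apache/poi/schemas/ooxml/system/ooxml/oleobjectelement.xsb";

    /** Returns the URL the class loader resolves for the resource, or null if it is absent. */
    static URL locate(String resourcePath) {
        ClassLoader loader = SchemaResourceCheck.class.getClassLoader();
        // getClassLoader() may return null for bootstrap-loaded classes; fall back to the system loader.
        return loader != null
                ? loader.getResource(resourcePath)
                : ClassLoader.getSystemResource(resourcePath);
    }

    static boolean isOnClasspath(String resourcePath) {
        return locate(resourcePath) != null;
    }

    public static void main(String[] args) {
        URL url = locate(OLE_OBJECT_XSB);
        // When found, the URL shows which jar supplied the resource.
        System.out.println(url != null ? "found: " + url : "missing: " + OLE_OBJECT_XSB);
    }
}
```

Running it once with each schema jar on the classpath should show whether the jar in use contains the .xsb at all, which would be consistent with the failure disappearing under poi-ooxml-full.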
