Two other points...

This is one triggering file:
https://github.com/apache/tika/blob/main/tika-parsers/tika-parsers-classic/tika-parsers-classic-modules/tika-parser-microsoft-module/src/test/resources/test-documents/EmbeddedDocument.docx

The problem goes away if I use poi-ooxml-full.
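Because the root cause is a missing classpath resource, one quick way to confirm which schema jar a given test run actually sees is to look that resource up directly. The sketch below is a hypothetical helper (the class name `XsbResourceCheck` is made up for illustration) that uses only the JDK's `ClassLoader.getResource`. It assumes the failing setup simply lacks `oleobjectelement.xsb` and that poi-ooxml-full bundles it. XMLBeans may resolve schemas through a different class loader, so treat a result as a rough sanity check, not proof.

```java
import java.net.URL;

/**
 * Hypothetical diagnostic helper: reports whether a compiled XMLBeans
 * schema resource (.xsb) is visible to the current class loader and,
 * if so, which classpath entry provides it.
 */
class XsbResourceCheck {

    // Resource path copied from the SchemaTypeLoaderException in the trace.
    static final String OLE_OBJECT_XSB =
            "org/apache/poi/schemas/ooxml/system/ooxml/oleobjectelement.xsb";

    /** Returns the resource URL, or null if no loader in the chain can see it. */
    static URL locate(String resourcePath) {
        // Use the thread context class loader if set, else this class's loader.
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        if (cl == null) {
            cl = XsbResourceCheck.class.getClassLoader();
        }
        return cl.getResource(resourcePath);
    }

    public static void main(String[] args) {
        // Check the path given on the command line, or the .xsb from the trace.
        String path = args.length > 0 ? args[0] : OLE_OBJECT_XSB;
        URL url = locate(path);
        if (url == null) {
            System.out.println("NOT FOUND: " + path);
        } else {
            // For a jar entry this is typically jar:file:/path/to/some.jar!/...
            System.out.println("found: " + url);
        }
    }
}
```

Run it with the same classpath as the failing Tika test. It prints either `NOT FOUND: ...` or the URL of the jar entry that supplies the file, which makes it easy to compare the two dependency setups.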


On Mon, Dec 21, 2020 at 11:26 AM Tim Allison <[email protected]> wrote:

> Andi,
>   Thank you for all of your work on this!  This is probably user error,
> but I'm getting a failed test when I integrate poi trunk with Tika.  Is
> this something I can fix at the Tika level?
>
> org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.ooxml.OOXMLParser@785a4557
>
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:293)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
> at org.apache.tika.extractor.ParserContainerExtractor.extract(ParserContainerExtractor.java:82)
> at org.apache.tika.parser.microsoft.AbstractPOIContainerExtractionTest.process(AbstractPOIContainerExtractionTest.java:68)
> at org.apache.tika.parser.microsoft.POIContainerExtractionTest.testEmbeddedOfficeFilesXML(POIContainerExtractionTest.java:335)
> ...
> Caused by: org.apache.xmlbeans.SchemaTypeLoaderException: XML-BEANS compiled schema: Could not locate compiled schema resource org/apache/poi/schemas/ooxml/system/ooxml/oleobjectelement.xsb (org.apache.poi.schemas.ooxml.system.ooxml.oleobjectelement) - code 0
> at org.apache.xmlbeans.impl.schema.SchemaTypeSystemImpl$XsbReader.<init>(SchemaTypeSystemImpl.java:1315)
> at org.apache.xmlbeans.impl.schema.SchemaTypeSystemImpl.resolveHandle(SchemaTypeSystemImpl.java:3138)
> at org.apache.xmlbeans.SchemaComponent$Ref.getComponent(SchemaComponent.java:113)
> at org.apache.xmlbeans.SchemaGlobalElement$Ref.get(SchemaGlobalElement.java:76)
> at org.apache.xmlbeans.impl.schema.SchemaTypeLoaderBase.findElement(SchemaTypeLoaderBase.java:103)
> at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createElementType(SchemaTypeImpl.java:988)
> at org.apache.xmlbeans.impl.values.XmlObjectBase.create_element_user(XmlObjectBase.java:913)
> at org.apache.xmlbeans.impl.store.Xobj.getUser(Xobj.java:1597)
> at org.apache.xmlbeans.impl.store.Cur.getUser(Cur.java:2571)
> at org.apache.xmlbeans.impl.store.Cur.getObject(Cur.java:2565)
> at org.apache.xmlbeans.impl.store.Cursor._getObject(Cursor.java:819)
> at org.apache.xmlbeans.impl.store.Cursor.syncWrapHelper(Cursor.java:2522)
> at org.apache.xmlbeans.impl.store.Cursor.syncWrap(Cursor.java:2453)
> at org.apache.xmlbeans.impl.store.Cursor.getObject(Cursor.java:2080)
> at org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator.extractParagraph(XWPFWordExtractorDecorator.java:236)
> at org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator.extractIBodyText(XWPFWordExtractorDecorator.java:161)
> at org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator.buildXHTML(XWPFWordExtractorDecorator.java:124)
> at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:136)
> at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:213)
> at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:113)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>
> On Sat, Dec 19, 2020 at 8:47 AM Tim Allison <[email protected]> wrote:
>
>> If anyone else on this list has the time and interest: POI 5.0.0 is on the
>> way! Please help test!
>>
>> ---------- Forwarded message ---------
>> From: Tim Allison <[email protected]>
>> Date: Sat, Dec 19, 2020 at 8:45 AM
>> Subject: Re: Plea - test the POI 5.0.0 snapshot
>> To: POI Users List <[email protected]>
>>
>>
>> Will integrate with Tika on Monday and test it out. Thank you!!!
>>
>> On Sat, Dec 19, 2020 at 7:52 AM Andreas Beeker <[email protected]>
>> wrote:
>>
>>> Dear POI users,
>>>
>>> We are about to release POI 5.0.0, and there have been some breaking
>>> changes [1], notably the JPMS/Jigsaw migration and the upgrade of the
>>> ECMA-376 schemas to the 5th edition.
>>>
>>> Please download the snapshot [2] and give it a try. With the new schemas
>>> in particular, I'm interested in whether documents created by POI can
>>> still be opened without errors in various office applications.
>>>
>>> Thank you for your support.
>>>
>>> Andi
>>>
>>>
>>> [1] http://poi.apache.org/changes.html
>>>
>>> [2]
>>> https://ci-builds.apache.org/job/POI/job/POI-DSL-1.8/lastSuccessfulBuild/artifact/build/dist/
>>>
>>>
