Thanks Tim. I see there's a bug that has been open since 2017, so I'll vote for it, but I don't think opening a new one will help.
On Wed, Jul 29, 2020 at 7:35 PM Tim Allison <talli...@apache.org> wrote:
>
> as for files, in my case they are from a customer and I don't want to
> share them.
>
>
> https://corpora.tika.apache.org/datasette/corpora-metadata?sql=select+file_path%2C+orig_stack_trace%0D%0Afrom+containers+c%0D%0Ajoin+profiles+p+on+p.container_id%3Dc.container_id%0D%0Ajoin+PARSE_EXCEPTIONS+e+on+p.id%3De.id%0D%0Awhere+orig_stack_trace+like+%27%250x203%25%27%0D%0Aorder+by+file_path+limit+101
>
> triggering file available here:
> https://corpora.tika.apache.org/base/docs/bug_trackers/poi/POI-47251-4.xls
>
> Victory for our regression corpus!
>
> On Wed, Jul 29, 2020 at 12:14 PM Slava G <slav...@gmail.com> wrote:
>
>> Thanks Tim.
>> Will do. As for files, in my case they are from a customer and I don't want
>> to share them.
>> Thanks
>>
>> On Wed, Jul 29, 2020, 19:06 Tim Allison <talli...@apache.org> wrote:
>>
>>> Looks like I identified that one
>>> <https://bz.apache.org/bugzilla/show_bug.cgi?id=60833> in our regression
>>> corpus here: https://bz.apache.org/bugzilla/show_bug.cgi?id=60833#c10
>>>
>>> Please open an issue on POI's bug tracker. If you need an example file,
>>> we can dig one up.
>>>
>>> On Wed, Jul 29, 2020 at 10:10 AM Slava G <slav...@gmail.com> wrote:
>>>
>>>> Hi,
>>>> I have some Excel files that open fine in Excel or in Numbers on Mac, but
>>>> Tika (in-process and app) throws an exception:
>>>>
>>>>
>>>> org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@408ae0bf
>>>> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:293)
>>>> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>>>> at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>>>> at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:188)
>>>> at org.apache.tika.parser.DigestingParser.parse(DigestingParser.java:84)
>>>> at org.apache.tika.gui.TikaGUI.handleStream(TikaGUI.java:358)
>>>> at org.apache.tika.gui.TikaGUI.openFile(TikaGUI.java:309)
>>>> at org.apache.tika.gui.TikaGUI.actionPerformed(TikaGUI.java:267)
>>>> at javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:2022)
>>>> at javax.swing.AbstractButton$Handler.actionPerformed(AbstractButton.java:2348)
>>>> at javax.swing.DefaultButtonModel.fireActionPerformed(DefaultButtonModel.java:402)
>>>> at javax.swing.DefaultButtonModel.setPressed(DefaultButtonModel.java:259)
>>>> at javax.swing.AbstractButton.doClick(AbstractButton.java:376)
>>>> at javax.swing.plaf.basic.BasicMenuItemUI.doClick(BasicMenuItemUI.java:842)
>>>> at com.apple.laf.AquaMenuItemUI.doClick(AquaMenuItemUI.java:157)
>>>> at javax.swing.plaf.basic.BasicMenuItemUI$Handler.mouseReleased(BasicMenuItemUI.java:886)
>>>> at java.awt.Component.processMouseEvent(Component.java:6539)
>>>> at javax.swing.JComponent.processMouseEvent(JComponent.java:3324)
>>>> at java.awt.Component.processEvent(Component.java:6304)
>>>> at java.awt.Container.processEvent(Container.java:2239)
>>>> at java.awt.Component.dispatchEventImpl(Component.java:4889)
>>>> at java.awt.Container.dispatchEventImpl(Container.java:2297)
>>>> at java.awt.Component.dispatchEvent(Component.java:4711)
>>>> at java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4904)
>>>> at java.awt.LightweightDispatcher.processMouseEvent(Container.java:4535)
>>>> at java.awt.LightweightDispatcher.dispatchEvent(Container.java:4476)
>>>> at java.awt.Container.dispatchEventImpl(Container.java:2283)
>>>> at java.awt.Window.dispatchEventImpl(Window.java:2746)
>>>> at java.awt.Component.dispatchEvent(Component.java:4711)
>>>> at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:760)
>>>> at java.awt.EventQueue.access$500(EventQueue.java:97)
>>>> at java.awt.EventQueue$3.run(EventQueue.java:709)
>>>> at java.awt.EventQueue$3.run(EventQueue.java:703)
>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>> at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:74)
>>>> at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:84)
>>>> at java.awt.EventQueue$4.run(EventQueue.java:733)
>>>> at java.awt.EventQueue$4.run(EventQueue.java:731)
>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>> at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:74)
>>>> at java.awt.EventQueue.dispatchEvent(EventQueue.java:730)
>>>> at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:205)
>>>> at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:116)
>>>> at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:105)
>>>> at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101)
>>>> at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:93)
>>>> at java.awt.EventDispatchThread.run(EventDispatchThread.java:82)
>>>> Caused by: org.apache.poi.hssf.record.RecordInputStream$LeftoverDataException: Initialisation of record 0x203(NumberRecord) left 4 bytes remaining still to be read.
>>>> at org.apache.poi.hssf.record.RecordInputStream.hasNextRecord(RecordInputStream.java:188)
>>>> at org.apache.poi.hssf.record.RecordFactoryInputStream.nextRecord(RecordFactoryInputStream.java:235)
>>>> at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.genericProcessEvents(HSSFEventFactory.java:168)
>>>> at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.processEvents(HSSFEventFactory.java:129)
>>>> at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.processFile(ExcelExtractor.java:343)
>>>> at org.apache.tika.parser.microsoft.ExcelExtractor.parse(ExcelExtractor.java:172)
>>>> at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:183)
>>>> at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:131)
>>>> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>>>> ... 46 more
>>>>
>>>> I'm using Tika 1.24.1, but it also happens in the previous version.
>>>> Thanks
>>>>
>>>>