Here are the results of a regression run on 1258881 documents.

Overall it looks fairly good; I see no big problems with either the
replaced zip classes or XmlBeans 3.0.

There are 1332 suspicious new errors with the following breakdown:

* 664 times java.lang.NoClassDefFoundError:
org/openxmlformats/schemas/presentationml/x2006/main/CTHeaderFooter, which
indicates a missing unit test for this CT-class and should be easy to fix
(actually already in the works in my local repo)
* 192 strange NullPointerExceptions in my code which I cannot fully explain
* a number of errors of the form "Element
styles@http://schemas.openxmlformats.org/wordprocessingml/2006/main is not a valid
styleSheet@http://schemas.openxmlformats.org/spreadsheetml/2006/main
document or a valid substitution.", which I think we saw before as well, so
this is probably just flakiness on files where we had TIMEOUT or OOM before
* 26 java.lang.ArrayIndexOutOfBoundsException in XSLF handling somewhere,
see report for details
* and finally a number of new ZipBombExceptions, which suggests that the
limit for reporting them was lowered by the zip-handling changes?
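
As a rough cross-check of the breakdown above, the explicit counts can be
summed and compared with the total. This is only a sketch: the class and
method names are made up for illustration, and it assumes the five bullets
are exhaustive, so that whatever is left over belongs to the two categories
listed without a count (the styles/styleSheet mismatch and the
ZipBombExceptions).

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RegressionTally {
    // Figures quoted in this mail.
    static final long DOCUMENTS = 1_258_881L;
    static final long NEW_ERRORS = 1_332L;

    /** Categories for which the breakdown gives an explicit count. */
    static Map<String, Long> countedCategories() {
        Map<String, Long> counts = new LinkedHashMap<>();
        counts.put("NoClassDefFoundError (CTHeaderFooter)", 664L);
        counts.put("NullPointerException (handler code)", 192L);
        counts.put("ArrayIndexOutOfBoundsException (XSLF)", 26L);
        return counts;
    }

    /** Sum of all explicitly counted categories. */
    static long countedTotal() {
        return countedCategories().values().stream()
                .mapToLong(Long::longValue)
                .sum();
    }

    /**
     * Errors not covered by an explicit count. Assumption: the list is
     * exhaustive, so this is the styles mismatch plus ZipBombException.
     */
    static long uncountedRemainder() {
        return NEW_ERRORS - countedTotal();
    }

    /** Share of documents with a suspicious new error, in percent. */
    static double errorRatePercent() {
        return 100.0 * NEW_ERRORS / DOCUMENTS;
    }

    public static void main(String[] args) {
        System.out.println("counted:   " + countedTotal());
        System.out.println("remainder: " + uncountedRemainder());
        System.out.printf("rate:      %.3f%%%n", errorRatePercent());
    }
}
```

Under that assumption, 882 of the 1332 errors have an explicit count and 450
are spread over the two uncounted categories; the report linked below has the
exact per-category numbers.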

Full list of errors is at
http://people.apache.org/~centic/poi_regression/reportsAll/index317to400SNAPSHOT.html
Comparison of 3.17 vs. 4.0.0-SNAPSHOT is at
http://people.apache.org/~centic/poi_regression/reports/index317to400SNAPSHOT.html
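
Both the NoClassDefFoundError for CTHeaderFooter and the missing
commons-compress jar in the earlier run are classpath problems, which could
in principle be caught before starting a long run. A minimal sketch of such a
check follows; the class ClasspathPreflight and its method are hypothetical
and not part of POI or the regression harness, and the list of class names to
probe is just an example.

```java
import java.util.ArrayList;
import java.util.List;

public class ClasspathPreflight {
    /**
     * Returns the given class names that cannot be loaded.
     * Classes are looked up without being initialized.
     */
    static List<String> missing(String... classNames) {
        List<String> missing = new ArrayList<>();
        ClassLoader loader = ClasspathPreflight.class.getClassLoader();
        for (String name : classNames) {
            try {
                Class.forName(name, false, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                // Either the class itself or one of its dependencies is absent.
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Example probes: one class per jar that has been missing so far.
        List<String> missing = missing(
                "org.apache.commons.compress.archivers.zip.ZipFile",
                "org.openxmlformats.schemas.presentationml.x2006.main.CTHeaderFooter");
        if (!missing.isEmpty()) {
            System.err.println("Not on classpath: " + missing);
            System.exit(1);
        }
        System.out.println("Classpath looks complete for the probed classes.");
    }
}
```

Running this with the same classpath as the regression run would fail fast
instead of producing tens of thousands of identical failures.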

Thanks... Dominik.

On Tue, Jun 26, 2018 at 9:30 PM, Dominik Stadler <[email protected]>
wrote:

> Hi,
>
> Unfortunately I have to restart the run: commons-compress was missing,
> which caused 70k failures out of 1.2 million documents. Thus XmlBeans was
> not properly tested and I need to re-trigger.
>
>
> However, I already noticed the following failures that look new (these
> are all very rare and not related to XmlBeans at all; they are mostly in
> HSSF/HSLF!):
>
> 4
> ERROR
> java.lang.ClassCastException: o.a.p.hssf.record.StyleRecord cannot be cast
> to o.a.p.hssf.record.ExtendedFormatRecord
>
> java.lang.ClassCastException: o.a.p.hssf.record.StyleRecord cannot be cast to o.a.p.hssf.record.ExtendedFormatRecord
>       at o.a.p.hssf.model.InternalWorkbook.getExFormatAt(InternalWorkbook.java:870)
>       at o.a.p.hssf.usermodel.HSSFCell.getCellStyle(HSSFCell.java:943)
>       at o.a.p.hssf.usermodel.HSSFCell.getCellStyle(HSSFCell.java:71)
>       at o.a.p.stress.HSSFFileHandler.handleFile(HSSFFileHandler.java:63)
>       at org.dstadler.commoncrawl.FileHandlingRunnable.run(FileHandlingRunnable.java:64)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>       at java.lang.Thread.run(Thread.java:748)
>
> Download
> <http://people.apache.org/~centic/poi_regression/reports/download.oldindex/com.adapchain.www_files_MMP_Based_Sequence.xls>
>
>
>
> 1
> ERROR
> java.lang.NullPointerException
>
> java.lang.NullPointerException
>       at o.a.p.hslf.model.textproperties.HSLFTabStopPropCollection.writeTabStops(HSLFTabStopPropCollection.java:83)
>       at o.a.p.hslf.record.TextRulerAtom.writeIf(TextRulerAtom.java:150)
>       at o.a.p.hslf.record.TextRulerAtom.writeOut(TextRulerAtom.java:126)
>       at o.a.p.hslf.record.EscherTextboxWrapper.writeOut(EscherTextboxWrapper.java:90)
>       at o.a.p.hslf.usermodel.HSLFTextParagraph.refreshRecords(HSLFTextParagraph.java:1158)
>       at o.a.p.hslf.usermodel.HSLFTextParagraph.storeText(HSLFTextParagraph.java:969)
>       at o.a.p.hslf.usermodel.HSLFTextShape.storeText(HSLFTextShape.java:851)
>       at o.a.p.hslf.usermodel.HSLFSlideShow.writeDirtyParagraphs(HSLFSlideShow.java:485)
>       at o.a.p.hslf.usermodel.HSLFSlideShow.writeDirtyParagraphs(HSLFSlideShow.java:477)
>       at o.a.p.hslf.usermodel.HSLFSlideShow.write(HSLFSlideShow.java:451)
>       at o.a.p.stress.SlideShowHandler.writeToArray(SlideShowHandler.java:63)
>       at o.a.p.stress.SlideShowHandler.handleSlideShow(SlideShowHandler.java:49)
>       at o.a.p.stress.HSLFFileHandler.handleFile(HSLFFileHandler.java:49)
>       at org.dstadler.commoncrawl.FileHandlingRunnable.run(FileHandlingRunnable.java:64)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>       at java.lang.Thread.run(Thread.java:748)
>
> Download
> <http://people.apache.org/~centic/poi_regression/reports/download.oldindex/ca.casid-acedi.www_sites_default_files_GDS_20save_20the_20date_20Nov_204.ppt>
>
>
>
> 1
> ERROR
> java.lang.ArrayIndexOutOfBoundsException: *
>
> java.lang.ArrayIndexOutOfBoundsException: *
>       at o.a.p.hssf.usermodel.HSSFOptimiser.optimiseCellStyles(HSSFOptimiser.java:229)
>       at o.a.p.stress.HSSFFileHandler.handleFile(HSSFFileHandler.java:59)
>       at org.dstadler.commoncrawl.FileHandlingRunnable.run(FileHandlingRunnable.java:64)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>       at java.lang.Thread.run(Thread.java:748)
>
> Download
> <http://people.apache.org/~centic/poi_regression/reports/download.oldindex/com.9thtee.www_tivoconv3release.xls>
>
> Dominik.
>
> On Mon, Jun 25, 2018 at 7:30 AM, Dominik Stadler <[email protected]>
> wrote:
>
>> A regression run on the more than 1 million commoncrawl documents is underway...
>>
>> Dominik
>>
>> On Sun, Jun 24, 2018, 22:50 Andreas Beeker <[email protected]> wrote:
>>
>>> I'm +1 too ... we can refactor the code in the next release.
>>>
>>> A govdocs run would be nice, to see if we have further OOM problems.
>>>
>>> Andi
>>>
>>>
>>>
>
