Re: [VOTE] Release Apache POI 5.4.1 (RC1)

2025-04-04 Thread Tim Allison
+1 Confirmed digests. Built locally with Gradle 8.13 and Java 17. Integrated and built successfully with Tika 3.x -- did not run full regression tests. Thank you, PJ and team! On Tue, Apr 1, 2025 at 5:25 PM PJ Fanning wrote: > Hello POI Community, > > This is a call for a vote to release Apache P

Re: convergence issues

2025-01-08 Thread Tim Allison
> > poms published to Maven Central. > > > > > https://maven.apache.org/enforcer/enforcer-rules/dependencyConvergence.html > > > > On Wed, 8 Jan 2025 at 21:16, Dave Fisher wrote: > >> > >> > >> > >>> On Jan 8, 2025, at 11:13 AM, Tim Allison wrot

Re: convergence issues

2025-01-08 Thread Tim Allison
21:16, Dave Fisher wrote: > > > > > > > > > On Jan 8, 2025, at 11:13 AM, Tim Allison wrote: > > > > > > Thank you, all. I'm sorry for the noise. > > > > > > As you all point out, these are not a POI or even XMLBeans issue, and >

convergence issues

2025-01-08 Thread Tim Allison
-api:jar:2.24.3:compile > > > Not sure if you’d like to address this before release, but this would > > make our build with the dependencyConvergence rule enabled in the Maven > > enforcer plugin unhappy. For now I have fixed it by excluding the > log4j-api > > d

Convergence issues...

2025-01-07 Thread Tim Allison
Sorry. I'm looking at these more closely, and the problem is with the Maven dependencies brought in by xmlbeans...not something that we should fix on POI or xmlbeans. WDYT? P.S. I did notice some convergence issues. I don't think these are a > showstopper...not clear if we should fix these in XM

Re: [VOTE] Release Apache POI 5.4.0 (RC2)

2025-01-07 Thread Tim Allison
+1 Apologies for my delay. Looks good. Confirmed src.tgz digest Built locally and ran tests Integrated with Tika's main branch. Thank you PJ, Dominik and team! P.S. I did notice some convergence issues. I don't think these are a showstopper...not clear if we should fix these in XMLBeans or let

Re: [VOTE] Apache XmlBeans 5.3.0 release (RC2)

2024-12-13 Thread Tim Allison
Sorry for my delay. +1 * built from source * confirmed sha512 of source * built Tika successfully with expected modifications Thank you, PJ, Dominik and team! On Thu, Dec 12, 2024 at 6:43 PM PJ Fanning wrote: > > Would any of the POI PMC members have time to review this RC? We just need > one
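For readers unfamiliar with the "confirmed sha512 of source" step in the vote above, a minimal sketch of such a check follows. It is an illustration only, not the procedure anyone in the thread used: the class name `Sha512Check`, its methods, and the command-line arguments are hypothetical, and it assumes the expected digest is supplied as a bare hex string (a published `.sha512` file may also contain a file name).

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: computes a SHA-512 digest and compares it to an expected hex value.
class Sha512Check {

    // Streams the input through SHA-512 and returns the digest as lowercase hex.
    static String sha512Hex(InputStream in) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-512");
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest()) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-512 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // True when the computed digest equals the expected one, ignoring case and surrounding whitespace.
    static boolean matches(InputStream in, String expectedHex) {
        return sha512Hex(in).equalsIgnoreCase(expectedHex.trim());
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the downloaded archive; args[1]: expected hex digest (both assumptions).
        if (args.length != 2) {
            System.err.println("usage: Sha512Check <file> <expected-hex>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println(matches(in, args[1]) ? "digest OK" : "digest MISMATCH");
        }
    }
}
```

A digest match shows only that the download is intact; verifying the release signature is a separate step.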

Re: [DISCUSS] XMLBeans 5.3.0 release and after that, a POI 5.4.0 release (RC2)

2024-11-25 Thread Tim Allison
Sounds great to me. Thank you, PJ! On Mon, Nov 25, 2024 at 8:56 AM PJ Fanning wrote: > With the log4j 2.24.2 release, I think we should have an XMLBeans 5.3.0 > release. > The main changes are to upgrade the log4j version away from log4j 2.24.1 > which caused problems. > I've also added a wrappe

Re: [DISCUSS] working around log4j issues

2024-11-19 Thread Tim Allison
lf4j-api is not immune to making breaking changes. > I still argue that we could wrap the logger init so that we can avoid having > logger init issues fail POI startup. > > > > > > > On Saturday 16 November 2024 at 15:12:14 GMT+1, Tim Allison > wrote: > >

Re: [DISCUSS] working around log4j issues

2024-11-16 Thread Tim Allison
Thank you, PJ, for leading this effort. I completely agree that we can't let log4j cause problems for us, and I like your proposal to wrap log4j. Is going back to slf4j off the table? On Thu, Nov 14, 2024 at 5:06 PM PJ Fanning wrote: > > We've already migrated from our own POILogger that was disa

Re: [VOTE] Apache XmlBeans 5.2.2 release (RC1)

2024-11-04 Thread Tim Allison
Checked source checksum and built with Tika's main branch* and POI's main branch. I had some convergence issues with the main branch of POI (5.4.0-SNAPSHOT)'s versions of plexus-utils and plexus-classworlds, but those are likely user error and trivial to fix? I also noticed that rat didn't like t

Re: xmlbeans and poi releases

2024-10-28 Thread Tim Allison
Sounds great. Thank you, PJ! On Sat, Oct 26, 2024 at 11:36 AM PJ Fanning wrote: > I'm wondering whether we should do new releases. > The XMLBeans changes don't affect POI much but it would be nice for POI to > depend on the latest XMLBeans release. > > > https://issues.apache.org/jira/browse/XML

Re: [VOTE] Release Apache POI 5.2.5 (RC1)

2023-11-20 Thread Tim Allison
+1 Confirmed digests, built locally and integrated into a local build of Apache Tika's main branch. Ran regression tests earlier and found improvements on items identified in 5.2.4. Thank you, PJ, Dominik and team! On Sun, Nov 19, 2023 at 3:30 PM Dominik Stadler wrote: > Hi, > > Verified conte

Re: [DISCUSS] POI 5.2.5 release

2023-11-17 Thread Tim Allison
om> wrote: > > > > > > The build is not stable at the moment. Looks like there are some build > fixes needed before we can get an RC ready. > > > > > > > On Thursday 16 November 2023 at 22:41:28 GMT+1, Tim Allison < > talli...@apache.org> wrote:

Re: [DISCUSS] POI 5.2.5 release

2023-11-16 Thread Tim Allison
update POI to use the new XMLBeans version. > > I think we can then create an RC1 for POI 5.2.5. I can do this. Maybe > tomorrow. > > According to Tim Allison, Apache Tika are waiting for this release [1]. > > The changes are listed here [2]. > > [1] https://lists.apache.or

Re: [DISCUSS] XMLBeans 5.2.0 release

2023-11-16 Thread Tim Allison
Thank you, PJ, for running the XMLBeans 5.2.0 release! We are holding the Tika 3.0.0-BETA release for POI 5.2.5. I agree there's not a major rush, but it would be great to get that out. Let me know if/when I should run our regression tests with 5.2.5. Thank you, again! Best, Tim On Tue,

Re: [VOTE] Apache XmlBeans 5.2.0 release (RC1)

2023-11-15 Thread Tim Allison
+1 Thank you, PJ! I verified the checksums. I did get two rat failures that don't concern me (user error?) when I ran `gradle build test`: ...xmlbeans-5.2.0/javadocs/package-list ...xmlbeans-5.2.0/javadocs/script.js On Wed, Nov 8, 2023 at 4:48 PM Dominik Stadler wrote: > Hi, > > did a check

Re: POI 5.2.5 release

2023-10-16 Thread Tim Allison
This just bit us on Tika: https://bz.apache.org/bugzilla/show_bug.cgi?id=67767 The fix is easy. I can patch it today. It would be great to get it into 5.2.5. I'm sorry that I didn't catch it during the earlier regression tests...my fault. On Sun, Oct 15, 2023 at 4:34 PM Dominik Stadler wrote:

Re: [VOTE] Release Apache POI 5.2.4 (RC1)

2023-09-22 Thread Tim Allison
+1 Reports are here: There's surprisingly little difference: https://corpora.tika.apache.org/base/reports/poi-reports.tgz I only had time to glance briefly. Thank you PJ and team! On Fri, Sep 22, 2023 at 4:09 AM PJ Fanning wrote: > Thanks Alex. The pdfbox issue is tracked at > https://bz.apa

DirectoryNode's getEntry() and IllegalArgumentException

2023-09-22 Thread Tim Allison
All, First, I'm not proposing any changes for 5.2.4 (many thanks PJ for running the release!). In looking at DirectoryNode's getEntry, I see this: @Override public Entry getEntry(final String name) throws FileNotFoundException { Entry rval = null; if (name != null) { rval = _

Re: poi 5.2.4 release

2023-09-21 Thread Tim Allison
Sounds great. I’ll try to make a run against our corpus as well. Thank you! On Thu, Sep 21, 2023 at 2:58 AM Dominik Stadler wrote: > Hi, > > Yes, I agree, a release soon would be good to get the many many > improvements out to users. > > P.J., could you run the process once more and maybe updat

Fwd: [jira] [Created] (TIKA-4015) Extract symbols as symbols from .docx

2023-04-12 Thread Tim Allison
?)? And of course, in the email below the characters have been modified back to the underlying text, but they should be "alpha" "beta" "chi", etc... see the screenshot on the issue Thank you! Best, Tim -- Forwarded message - From: Tim All

Re: POI PMC roll call

2023-03-03 Thread Tim Allison
Similar to Nick and others. I have time to pay attention, but not as much as I'd like to contribute. Always hopeful... So, y, I'm still interested. Thank you for calling roll and all of your work on POI and beyond! On Fri, Mar 3, 2023 at 12:06 PM Nick Burch wrote: > > On Fri, 3 Mar 2023, PJ Fa

docx attachment names only appear in EMF?!

2023-02-06 Thread Tim Allison
Fellow Devs, I recently came across this issue: https://issues.apache.org/jira/browse/TIKA-3968. Has anyone else seen this? Am I missing an easy way to associate embedded file names with the actual embedded file? I'm sure there's a reason to do this, but it feels to me like docx is giving PD

Re: [VOTE] Apache POI 5.2.3 release (RC1)

2022-09-15 Thread Tim Allison
+1 There's one new pptx exception, and a small number of fixed emf/wmf exceptions. Reports are here: https://corpora.tika.apache.org/base/reports/tika-2.5.0-poi-reports.tgz Let me know if you have any questions! Cheers, Tim On Fri, Sep 9, 2022 at 4:51 PM PJ Fanning wrote: > > Hi e

Re: poi 5.2.3 release

2022-09-08 Thread Tim Allison
+1 I'll have time next week to run against our regression corpus, too. If there's interest. On Wed, Sep 7, 2022 at 4:35 PM PJ Fanning wrote: > Hi everyone, > > Is it time for new POI release? It's about 6 months since the last one and > the change list is fairly big - https://poi.apache.org/cha

Re: [VOTE] Apache POI 5.2.1 release (RC1)

2022-03-01 Thread Tim Allison
+1 I didn't have time to run any regression tests, but Tika builds with these artifacts. Thank you, PJ and team! On Sat, Feb 26, 2022 at 5:15 PM Andreas Beeker wrote: > > Hi, > > thank you for preparing the release, PJ! > > I've done some rudimentary checks - here is my +1. > > Andi > > On 26.0

Re: POI 5.1.0 RC2?

2021-10-19 Thread Tim Allison
Apologies for being absent... The xsb issue is why we haven't upgraded to 5.x on Tika yet. I _think_ we'd like to avoid the ooxml-full jar, but if that's the most robust option, we'll have to go with that. I'm also happy to grab new files, or run against our corpus if that'd be of any use. Many

Re: Tika, POI and PDFBOX used in Pandora Papers

2021-10-12 Thread Tim Allison
Autocorrect!!! Tika On Tue, Oct 12, 2021 at 4:42 PM Tim Allison wrote: > > https://www.wired.co.uk/article/pandora-papers-leak > > Repo: > https://github.com/ICIJ/datashare/ - To unsubscribe, e-mai

Tim’s, POI and PDFBOX used in Pandora Papers

2021-10-12 Thread Tim Allison
https://www.wired.co.uk/article/pandora-papers-leak Repo: https://github.com/ICIJ/datashare/

Re: Building with Java 11?

2021-05-11 Thread Tim Allison
difference in build-setup is when they are created > differently. > > Thanks... Dominik. > > On Fri, May 7, 2021 at 6:13 PM Tim Allison wrote: >> >> Hi All, >>I recently tried to build with Java 11 because of [1], I found that >> the build was modifying module

Building with Java 11?

2021-05-07 Thread Tim Allison
Hi All, I recently tried to build with Java 11 because of [1], I found that the build was modifying module-info.java and module-info.class. Is this expected? Is the combination of the Java issue and this item a sign I should put down the keyboard for the weekend a bit early? Cheers,

Re: missing oleobjectelement.xsb in ooxml-lite?

2021-03-23 Thread Tim Allison
All seems to work if I uncomment this line in build.xml: Any objections? On Tue, Mar 23, 2021 at 10:24 AM Tim Allison wrote: > > Going back to Andi's point [1]...trying this now. > > [1] > https://lists.apache.org/x/thread.html/ra9ff58e6af046a51ba459915fe536a2ea1fe

Re: missing oleobjectelement.xsb in ooxml-lite?

2021-03-23 Thread Tim Allison
Going back to Andi's point [1]...trying this now. [1] https://lists.apache.org/x/thread.html/ra9ff58e6af046a51ba459915fe536a2ea1fe71e85329abc4e513711e@%3Cuser.poi.apache.org%3E On Tue, Mar 23, 2021 at 10:17 AM Tim Allison wrote: > > All, > Over on Tika [1], I'm gettin

missing oleobjectelement.xsb in ooxml-lite?

2021-03-23 Thread Tim Allison
All, Over on Tika [1], I'm getting an exception that oleobjectelement.xsb can't be found. When I look in the ooxml-lite.jar, I see there's an oleobjelement.xsb, but no oleobjectelement.xsb. I tried adding the triggering document (EmbeddedDocument.docx) to a poi unit test[2] and rebuilding 5.0.

Re: build?

2021-02-23 Thread Tim Allison
org/apache/poi/xddf/usermodel/XDDFSolidFillProperties.java:38: error: recursive constructor invocation public XDDFSolidFillProperties(XDDFColor color) { ^ On Tue, Feb 23, 2021 at 7:53 AM Tim Allison wrote: > > ant test seems to be working (waiting for completion, but it at leas

Re: build?

2021-02-23 Thread Tim Allison
23, 2021 at 7:43 AM Tim Allison wrote: > > All, > Many apologies...it has been too long since I've worked with our > codebase. I recently did a fresh pull and can't get a clean > build...ant compile works, but I get a failure with ant test. See link > below for sys

build?

2021-02-23 Thread Tim Allison
All, Many apologies...it has been too long since I've worked with our codebase. I recently did a fresh pull and can't get a clean build...ant compile works, but I get a failure with ant test. See link below for system, versions and stacktrace [1]. User error? Thank you! Ch

Fwd: [OT] Looking for Apache POI help

2020-10-20 Thread Tim Allison
-- Forwarded message - From: Sergey Beryozkin Date: Tue, Oct 20, 2020 at 7:54 AM Subject: [OT] Looking for Apache POI help To: Hi All, sorry for this off-topic post, it is a little bit relevant to Tika dev, but only a little bit :-), We are having some good interest in making

Re: XLSX wrapped in an OLE2 CompObj/Package - should WorkbookFactory handle it?

2020-10-13 Thread Tim Allison
Does this meet the needs? https://github.com/apache/tika/blob/main/tika-parser-modules/tika-parser-microsoft-module/src/test/resources/test-documents/testPPT_oleWorkbook.ppt On Sun, Oct 11, 2020 at 5:09 PM Andreas Beeker wrote: > Hi Nick, > > > Should we have WorkbookFactory spot this case, gr

Re: dependency on ooxml-schemas?

2020-08-14 Thread Tim Allison
o modules? > > Best wishes, > Andi > > > [1] > https://builds.apache.org/view/P/view/POI/job/POI-XMLBeans-DSL-1.8/lastSuccessfulBuild/artifact/build/ > > On 13.08.20 20:06, Tim Allison wrote: > > All, > > > > I've been away from POI for a bit, and And

dependency on ooxml-schemas?

2020-08-13 Thread Tim Allison
All, I've been away from POI for a bit, and Andi has done some amazing work. THANK YOU! The build works as it should on the commandline, but what's the recommendation for adding ooxml-schemas as a dependency in the IDE? Should I run a full build and then create my own lib/poi-ooxml-schemas

Re: Next version? - was Re: Missing commons-compress jar in dist

2020-06-23 Thread Tim Allison
fix this too. > I guess this will take another few weeks to be completed. > > Best wishes, > Andi > > > On 22.06.20 22:28, Tim Allison wrote: > > All, > >From a Tika perspective, I'm happy with 5.0.0 as well...any idea when > > the next release will

Re: Next version? - was Re: Missing commons-compress jar in dist

2020-06-22 Thread Tim Allison
All, From a Tika perspective, I'm happy with 5.0.0 as well...any idea when the next release will be? Last release was in February. Now that we have the regression testing vm back up and running, I can kick off tests whenever... Thank you! Cheers, Tim On

new mailing list for corpora vm

2020-06-05 Thread Tim Allison
All, If you have an interest in guiding the ongoing development of the regression corpus vm, please join the new mailing list: corpora-...@tika.apache.org via the usual means: corpora-dev-subscr...@tika.apache.org Unless there are objections, we can continue to use the regular Tika JIRA to tr

Fwd: New mailing list queued for creation: corpora-...@tika.apache.org

2020-06-04 Thread Tim Allison
Should have cc'd you all...this should be up and running in the next 24 hours. Please subscribe if you'd like to discuss/collaborate on the vm and regression corpora. -- Forwarded message ----- From: Tim Allison Date: Thu, Jun 4, 2020 at 8:56 AM Subject: Fwd: New ma

Vm slack channel

2020-02-29 Thread Tim Allison
All, I started #tika-vm on the ASF’s Slack for informal discussion/coordination of the regression corpus and vm. Cheers, Tim

Re: [COMPRESS and Tika/PDFBox/POI] files from bug trackers

2020-02-27 Thread Tim Allison
> > > On Fri, Feb 14, 2020 at 10:48 PM Tim Allison wrote: > >> All, >> >> I recently downloaded attachments from the following bug trackers: >> COMPRESS, TIKA, PDFBox, POI, Open Office, Libre Office and ghostscript: >> http://162.242.228.174/docs/bugtrack

[COMPRESS and Tika/PDFBox/POI] files from bug trackers

2020-02-14 Thread Tim Allison
All, I recently downloaded attachments from the following bug trackers: COMPRESS, TIKA, PDFBox, POI, Open Office, Libre Office and ghostscript: http://162.242.228.174/docs/bugtrackers/ I then unpackaged/uncompressed all of the package/compressed files so: COMPRESS-115-1.zip is the second fil

Re: [VOTE] Apache POI 4.1.2 release (RC3)

2020-02-11 Thread Tim Allison
+1 Thank you, Andi (and team)! http://162.242.228.174/reports/reports_poi_4.1.2-rc3.tgz On Mon, Feb 10, 2020 at 3:38 PM Andreas Beeker wrote: > Hi *, > > I've prepared artifacts for the release of Apache POI 4.1.2 (RC3). > > The most notable changes in this release are: > > - XDDF - some work

Re: [VOTE] Apache POI 4.1.2 release (RC2)

2020-02-10 Thread Tim Allison
Sorry for the late reply. See Bug 64130 for a regression in parsing old excel spreadsheets that have worksheets without names. There were about 550 new exceptions caused by this in our regression corpus. On Sat, Feb 8, 2020 at 5:30 PM Tim Allison wrote: > I’m afk, but it looked like th

Re: [VOTE] Apache POI 4.1.2 release (RC2)

2020-02-08 Thread Tim Allison
afk. On Sat, Feb 8, 2020 at 1:21 PM Andreas Beeker wrote: > Hi *, > > just to be sure ... I'm waiting for Tims second +1 or should I release the > artifacts? > I.e. as far as I understand the reports we only have marginal differences. > > Andi > > On 07.02.20 13:0

Re: [DISCUSS] Re: [VOTE] Apache POI 4.1.2 release (RC2)

2020-02-07 Thread Tim Allison
nsion PixelAspectRatio": "1.0", "Dimension VerticalPhysicalPixelSpacing": "0.26462027", "X-Parsed-By": [ "org.apache.tika.parser.CompositeParser", "org.apache.tika.parser.DefaultParser", "org.apach

Re: [DISCUSS] Re: [VOTE] Apache POI 4.1.2 release (RC2)

2020-02-07 Thread Tim Allison
wildly even with the same versions on different runs. The key for me is the rollup by parse time suggests _overall_ for ppt, the time is nearly identical. > On 07.02.20 13:05, Tim Allison wrote: > > Hi All,, > > I haven't had the chance to look, but will

Re: [DISCUSS] Re: [VOTE] Apache POI 4.1.2 release (RC2)

2020-02-07 Thread Tim Allison
to have ASF infrastructure > provision > > a VM to be managed by POI PMC. > > > > Regards, > > Dave > > > > Sent from my iPhone > > > > > On Feb 5, 2020, at 3:38 PM, Andreas Beeker > wrote: > > > > > > Hi Tim, > >

Re: [VOTE] Apache POI 4.1.2 release (RC2)

2020-02-07 Thread Tim Allison
Hi All, I haven't had the chance to look, but will do so later today: http://162.242.228.174/reports/poi_4.1.2_reports.tgz On Wed, Feb 5, 2020 at 7:47 PM Tim Allison wrote: > Might be faster than I thought...results tomorrow...perhaps. > > On Wed, Feb 5, 2020 at 5:51 PM Tim

Re: [VOTE] Apache POI 4.1.2 release (RC2)

2020-02-05 Thread Tim Allison
Might be faster than I thought...results tomorrow...perhaps. On Wed, Feb 5, 2020 at 5:51 PM Tim Allison wrote: > I did not. I can kick it off now, but with travel and other stuff, > wouldn't have results until Monday. Happy to do so if desired. > > On Wed, Feb 5, 2020 at

Re: [VOTE] Apache POI 4.1.2 release (RC2)

2020-02-05 Thread Tim Allison
nt is unavailable. > > Andi > > On 05.02.20 01:05, Tim Allison wrote: > > +1 > > > > built without surprises, digests check out and Tika builds. Thank you, > > Andi and team! > > > > On Tue, Feb 4, 2020 at 2:20 PM Andreas Beeker > wrote: >

Re: [VOTE] Apache POI 4.1.2 release (RC2)

2020-02-04 Thread Tim Allison
+1 built without surprises, digests check out and Tika builds. Thank you, Andi and team! On Tue, Feb 4, 2020 at 2:20 PM Andreas Beeker wrote: > +1 ... the NOTICE file was still on 2019, but I don't think this matters. > Apart of it, my sample application works. > > On 03.02.20 22:55, PJ Fannin

Re: next release?

2020-01-23 Thread Tim Allison
up till then ... > > Andi > > > On 23.01.20 15:41, Tim Allison wrote: > > Hi All, > > We're getting pinged over on Tika for when the next release of POI will > > be available. Any plans? > > > > https://issues.apache.org/jira/browse/TIKA-3017 > > > > Thank you! > > > > >

next release?

2020-01-23 Thread Tim Allison
Hi All, We're getting pinged over on Tika for when the next release of POI will be available. Any plans? https://issues.apache.org/jira/browse/TIKA-3017 Thank you!

Re: [ANNOUNCE] Apache POI 4.1.1 released

2019-10-21 Thread Tim Allison
All, Thank you for this release! I'm sorry that I was mostly AWOL. Andi, Thank you for running this release! Cheers, Tim On Sun, Oct 20, 2019 at 3:52 PM Andreas Beeker wrote: > The Apache POI project is pleased to announce the release of POI 4.1.1. > Featured are a

Re: POI 4.1.1

2019-10-07 Thread Tim Allison
heers, Tim On Sat, Oct 5, 2019 at 8:38 AM Tim Allison wrote: > > Andi, > I’m sorry for my delay. I’ve booked a chunk of time on Monday to look at > this...data is prepped...just need to run latest code and compare. I don’t > want to hold up the release tho...please move fo

Re: POI 4.1.1

2019-10-05 Thread Tim Allison
wrote: > Hi Tim, > > On 20.09.19 13:55, Tim Allison wrote: > > I think I remember a regression in emf/wmf...could be spurious or my > fault > > at the Tika level. > > I've just checked my mails for the original emf/wmf issue, which you've > (partly) fixed v

Next release?

2019-07-24 Thread Tim Allison
Hi All, Do we have any sense of when the next release will be? IIRC I have a bit of work to do w emf[1], what else do we want to include? Thank you! Cheers, Tim [1] I have a vague memory of slight regressions in text extraction, but I have to test w latest.

[COMPRESS] zip-based entry names/metadata data set available

2019-04-22 Thread Tim Allison
All, For some recent work on Apache Tika, I used commons-compress to extract entry names and metadata via a streaming read from roughly 500k zip-based files we have in Tika's regression corpus. I was happy to see we have some POI-generated files in there. :) I noticed some areas for improveme
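The message above describes pulling entry names from zip-based files via a streaming read with commons-compress. As a rough sketch of the same idea, the example below uses the JDK's `java.util.zip.ZipInputStream` instead, so it is a stand-in and not the code used for the data set; the class `ZipEntryNames`, its methods, and the sample entry names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch: list entry names from a zip-based file with a forward-only read.
class ZipEntryNames {

    // Walks the local file headers in stream order; no random access to the central directory.
    static List<String> entryNames(InputStream in) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zis = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    // Builds a small in-memory zip so the example runs without any test file.
    static byte[] sampleZip(String... entryNames) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bos)) {
            for (String name : entryNames) {
                zos.putNextEntry(new ZipEntry(name));
                zos.write("<x/>".getBytes(StandardCharsets.UTF_8));
                zos.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        byte[] zip = sampleZip("[Content_Types].xml", "word/document.xml");
        for (String name : entryNames(new ByteArrayInputStream(zip))) {
            System.out.println(name);
        }
    }
}
```

A streaming read reports entries in the order they were written, which may differ from the order in the central directory that a random-access reader would show.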

regression results

2019-04-10 Thread Tim Allison
All, Again, my apologies for being late, but the results might still be useful for work towards 4.1.1. http://162.242.228.174/reports/poi-4.1.0-reports.zip Some tentative observations: 1) there was the new and non-replicable set of problems with the XSSFBParser. 2) The emf/wmf regressions are

Re: [VOTE] Apache POI 4.1.0 release (RC3)

2019-04-08 Thread Tim Allison
Hi Andi, Y, to be clear, I really like what you’ve done and it is all a bunch cleaner than my earlier stuff; I wasn’t at all questioning the design. The question was more to back compat. There was quite a bit of red when I made the upgrade and before I modernized our code on Tika. As long as we’r

Re: [VOTE] Apache POI 4.1.0 release (RC3)

2019-04-08 Thread Tim Allison
On Mon, Apr 8, 2019 at 4:55 PM Andreas Beeker wrote: > Hi Tim, > > I've made that changes on purpose, as I wanted to make the EMF API similar > to the WMF one. > > > oap.hemf.extractor.HemfExtractor -> oap.hemf.usermodel.HemfPicture > All (?) our user models are called by their content and being

Re: [VOTE] Apache POI 4.1.0 release (RC3)

2019-04-08 Thread Tim Allison
itial 4.0.2 to 4.1.0, but that's not an area of code > I'm familiar with. > > On Mon, Apr 8, 2019 at 6:07 AM Tim Allison wrote: > > > Are we ok with the backward incompatibilities in EMF...These are just > > a few. I realize these class

Re: [VOTE] Apache POI 4.1.0 release (RC3)

2019-04-08 Thread Tim Allison
Are we ok with the backward incompatibilities in EMF...These are just a few. I realize these classes are @Internal, and the updates look great. HwmfRecord.getRecordType() -> getWmfRecordType() oap.hemf.record.AbstractHemfComment -> oap.hemf.record.hemf.Comment oap.hemf.record.HemfRecord -> oap.h

Re: [VOTE] Apache POI 4.1.0 release (RC3)

2019-04-06 Thread Tim Allison
Sorry for being late to the game. I won’t have time to run regression tests until Monday or so... thank you Dominik and Greg! On Sat, Apr 6, 2019 at 4:27 AM Dominik Stadler wrote: > Hi Greg, > > thanks for running the release and removing all the obstacles on the way, > always good if as many pe

Re: Event Based APIs for parsing docx,doc,pptx,ppt files

2019-02-15 Thread Tim Allison
I've added SAX parsers for pptx and docx over on Apache Tika. These rely on POI for OPCPackage, a bunch of other classes and overall design. I've thought about moving that code into POI, but I haven't found the time or need, and the code is my typical kludgy-mess...and I don't want to pollute POI

Re: [VOTE] Apache POI 4.0.1 release (RC2)

2018-11-27 Thread Tim Allison
+1 Reports are available here: http://162.242.228.174/reports/reports_poi_4_0_1-rc2.tgz Thank you, Andi! On Mon, Nov 26, 2018 at 6:01 PM Andreas Beeker wrote: > > Hi, > > I've prepared artifacts for the release of Apache POI 4.0.1 (RC2). > > The most notable changes in this release are: > > - de

Re: [VOTE] Apache POI 4.0.1 release (RC1)

2018-11-23 Thread Tim Allison
Sorry, now that I've figured out what the problem was, I'm -1. Y, let's respin. On Thu, Nov 22, 2018 at 4:34 PM Andreas Beeker wrote: > > Hi Tim, > > On 21.11.18 19:26, Tim Allison wrote: > > This looks like a regression. > > > Please make your mind up

Re: [VOTE] Apache POI 4.0.1 release (RC1)

2018-11-21 Thread Tim Allison
ike a regression. On Wed, Nov 21, 2018 at 12:56 PM Tim Allison wrote: > > >These were in the header...I have to step away from the keyboard for > now...any ideas? > > I confirmed this by flipping btwn 4.0.0 and 4.0.1 in our dependencies > and using our Tika's SNAPSHOT f

Re: [VOTE] Apache POI 4.0.1 release (RC1)

2018-11-21 Thread Tim Allison
>These were in the header...I have to step away from the keyboard for now...any ideas? I confirmed this by flipping btwn 4.0.0 and 4.0.1 in our dependencies and using our Tika's SNAPSHOT for both. This is not caused by a different version of Tika. On Wed, Nov 21, 2018 at 12:53 PM Tim

Re: [VOTE] Apache POI 4.0.1 release (RC1)

2018-11-21 Thread Tim Allison
: de: 2 | la: 2 | 03: 1 | 06: 1 | 1: 1 | 16: 1 | 2009: 1 | 3: 1 | conciencia: 1 | despertar: 1 These were in the header...I have to step away from the keyboard for now...any ideas? On Wed, Nov 21, 2018 at 12:37 PM Tim Allison wrote: > > Reports are available here: > http://162.242.228.174/r

Re: [VOTE] Apache POI 4.0.1 release (RC1)

2018-11-21 Thread Tim Allison
Reports are available here: http://162.242.228.174/reports/reports_poi_4_0_1-rc1.tgz We have a bunch less content in ppt, but I _think_ this is because at the Tika level we used to duplicate notes content, and we've fixed that bug. So, I think this is an improvement, but I need to check. On Wed,

Re: [VOTE] Apache POI 4.0.1 release (RC1)

2018-11-20 Thread Tim Allison
the release process was too smooth. > Only my local version of the commons-openpgp needed to be used. [1] > > Andi > > [1] https://issues.apache.org/jira/browse/SANDBOX-508 > > > On 20.11.18 22:33, Tim Allison wrote: > > Andi, > >Thank you! I've built thi

Re: [VOTE] Apache POI 4.0.1 release (RC1)

2018-11-20 Thread Tim Allison
Andi, Thank you! I've built this locally and integrated it into Tika, and I've kicked off the regression tests. The one small glitch I noticed so far is that poi-ooxml-schemas jar has an extra ".jar" in it: build/dist/maven/poi-ooxml-schemas/poi-ooxml-schemas-4.0.1.jar.jar I'll let you all k

Dave's POI talk at COSCON

2018-11-06 Thread Tim Allison
W00t!!! Here's Dave's talk on POI at COSCON in Shenzhen, China on October 20, 2018: https://www.youtube.com/watch?v=N7_Y3zNb_-w

Re: Build failed in Jenkins: POI-DSL-OpenJDK #545

2018-11-02 Thread Tim Allison
Autoboxing?! On Fri, Nov 2, 2018 at 7:27 AM Tim Allison wrote: > > Colleagues, any idea what might be going on? How can -1 != -1?! > > Error: Test with 2/3: Should not find 3 but found it at -1 in 0 1 2 > at org.apache.poi.hwpf.usermodel.TestBug47563.test(TestBug47563.java:80)
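The excerpt does not say what the cause turned out to be. As a general illustration of how two values that both print as -1 can still fail an equality check once boxing is involved, here is a small sketch. It is not a diagnosis of TestBug47563; the class `BoxedCompare` and its methods are hypothetical.

```java
// Hypothetical illustration: boxed values of different types never compare equal via equals().
class BoxedCompare {

    // Object-level comparison: Integer.equals(Long) is false even when the numeric values agree.
    static boolean equalAsObjects(Object a, Object b) {
        return a.equals(b);
    }

    // Numeric comparison after unboxing to a common primitive type.
    static boolean equalAsNumbers(Number a, Number b) {
        return a.longValue() == b.longValue();
    }

    public static void main(String[] args) {
        Object asInt = Integer.valueOf(-1);
        Object asLong = Long.valueOf(-1L);

        // Both values render identically in a failure message.
        System.out.println(asInt + " vs " + asLong);

        System.out.println("equalAsObjects: " + equalAsObjects(asInt, asLong));
        System.out.println("equalAsNumbers: " + equalAsNumbers((Number) asInt, (Number) asLong));
    }
}
```

When an assertion message shows identical values on both sides, checking the runtime types of the operands is usually a quick first step.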

Re: Build failed in Jenkins: POI-DSL-OpenJDK #545

2018-11-02 Thread Tim Allison
Colleagues, any idea what might be going on? How can -1 != -1?! Error: Test with 2/3: Should not find 3 but found it at -1 in 0 1 2 at org.apache.poi.hwpf.usermodel.TestBug47563.test(TestBug47563.java:80) assertTrue("Test with " + rows + "/" + columns + ": Should not find " + i + " but found it a

Re: POI 4.0.1 release

2018-10-30 Thread Tim Allison
+1 "end of this week" that'll work well for my issues, too. I want to confirm I didn't break anything in my recent commits via large scale regression testing. On Tue, Oct 30, 2018 at 8:31 AM Yegor Kozlov wrote: > > +1 > > Bug 62836 is pending. I'm going to check in the code anyway, just waiting >

Re: Apache POI

2018-10-10 Thread Tim Allison
Dejan, Thank you for letting us know about this problem. I was able to reproduce it, and I've opened a ticket: https://bz.apache.org/bugzilla/show_bug.cgi?id=62815 On Wed, Sep 12, 2018 at 5:58 AM dejan ikodinovic wrote: > > Hi guys, > > I m working on parsing Excel xlsb files using Apache POI 3

Re: EMF corpus

2018-10-09 Thread Tim Allison
Turns out that's a subset. It looks like there should be ~200k emfs. I'll try to dig up the extraction code and re-run. On Tue, Oct 9, 2018 at 8:55 AM Tim Allison wrote: > > Y. Turns out I extracted a bunch a while ago. See the 'emfs' > directory in this tar.bz2 f

Re: EMF corpus

2018-10-09 Thread Tim Allison
Y. Turns out I extracted a bunch a while ago. See the 'emfs' directory in this tar.bz2 file: http://162.242.228.174/embedded_files/xmfs.tar.bz2 Let me know if you have any questions and/or if I can make that any more useful for you. Cheers, Tim On Mon, Oct 8, 2018 at 7

Re: EMF corpus

2018-10-08 Thread Tim Allison
At some point I extracted all emfs from our corpus. I’ll see if that data is still around and/or re-extract...prob have time tomorrow/Wednesday. On Sun, Oct 7, 2018 at 5:01 PM Dominik Stadler wrote: > Hi Andi > > It is easy to change CommonCrawlDocumentDownload to fetch other mime-types, > see >

updating data on the regression corpus

2018-10-05 Thread Tim Allison
All, I opened https://issues.apache.org/jira/browse/TIKA-2750 to track updating data on the regression corpus. Please track/join the conversation there if you'd like to participate. Cheers, Tim

Re: Welcome to the regression vm!

2018-10-05 Thread Tim Allison
Tobias, I just gave you access to the vm and sent login stuff to you personally. I have to update some groups and permissions, but I'll let you know when that is ready. Let me know if you have problems getting on. Best, Tim > 1. Is it OK that 100% CPU is used wh

Welcome to the regression vm!

2018-09-28 Thread Tim Allison
Tobias, I'm sorry for my delay. We welcome you to use our regression vm hosted by Rackspace for fuzzing work to identify vulnerabilities. Our one request: we ask that you pause/stop your processes when we need to run regression tests before a release. Email me privately with your desired user

Re: Worth doing a 4.0.1 release soon?

2018-09-24 Thread Tim Allison
All, I broke our mp3 parser w changes in Tika 1.19. We're about to roll a 1.19.1. Is there anything catastrophic in 4.0.0 that would lead us to wait for 4.0.1? I noticed the 62692 (wildfly xml parser)...is there anything else? Thank you! Cheers, Tim On Wed, Sep 19, 2018 at 5

Re: Speaking on POI at China Open Source Conference in October

2018-09-22 Thread Tim Allison
Let me know if these are of any use... https://github.com/centic9/CommonCrawlDocumentDownload http://openpreservation.org/blog/2016/10/04/apache-tikas-regression-corpus-tika-1302/ https://events.static.linuxfound.org/sites/events/files/slides/ApacheConMiami2017_tallison_v2.pdf https://wiki.apac

Re: Apache POI

2018-09-15 Thread Tim Allison
Can you open an issue on our bugzilla and post a test file w a unit test? Thank you for sharing this w us! On Wed, Sep 12, 2018 at 5:58 AM dejan ikodinovic wrote: > Hi guys, > > I m working on parsing Excel xlsb files using Apache POI 3.17 version and > have problem for some numbers. > The probl

Re: Speaking on POI at China Open Source Conference in October

2018-09-15 Thread Tim Allison
Looks great! If at all possible, I’d appreciate a bullet or two on Dominik’s and my large scale regression tests... More input on test files for the corpus would be useful. Completely understand if this is off topic. Thank you! On Fri, Sep 14, 2018 at 5:27 PM Dave Fisher wrote: > Hi Team, > > I’ve

Re: [VOTE] Apache POI 4.0.0 release (RC1)

2018-09-05 Thread Tim Allison
+1 Reports are here: http://162.242.228.174/reports/poi-4.0.0-reports-e.tgz These reports compare 3.17 with 4.0.0-RC1. There are numerous fixed exceptions. The new exceptions appear to be caused by better exception reporting for truncated files. Two small issues that I'm ok with for now: 1) We

Re: [VOTE] Apache POI 4.0.0 release (RC1)

2018-09-04 Thread Tim Allison
Sorry for my delay. I'm kicking off our regression tests now. On Sat, Sep 1, 2018 at 11:46 AM Dominik Stadler wrote: > > Hi, > > Content of release-archives look good compared to 3.17. > > Only found a very minor glitch: osgi/build.xml and sonar/**/pom.xml still > contain "4.0.0-SNAPSHOT", but I

Re: Remove OPOIFSFileSystem for 4.0.0?

2018-08-27 Thread Tim Allison
+1. Thank you, Andi! On Mon, Aug 27, 2018 at 5:52 AM Alain FAGOT BÉAREZ wrote: > > +1 for full refactoring to POIFS* > > Sent with BlueMail > > > Original Message > From: Andreas Beeker > Sent: Sun Aug 26 19:06:02 GMT-03:00 2018 > To: dev@poi.apache.org > Subject: Re:

Re: Prepare POI 4.0.0 RC 1

2018-08-17 Thread Tim Allison
Despite that gaffe -- thank you, again, Andi -- I compared the output after some recent modifications, and there are no differences: http://162.242.228.174/reports/poi-4.0.0-reports-d.tgz On Fri, Aug 17, 2018 at 11:22 AM Tim Allison wrote: > > Ugh, and thank you! > On Fri, Aug 17, 201
