3.2.2 release?

2025-08-04 Thread Tim Allison
All, We've had a few important bug fixes and dependency upgrades since our last release. What do you think of working towards releasing 3.2.2 soon? Thank you. Best, Tim

[jira] [Resolved] (TIKA-4459) protected ODF encryption detection fail

2025-07-31 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4459. --- Resolution: Fixed > protected ODF encryption detection f

[jira] [Commented] (TIKA-4459) protected ODF encryption detection fail

2025-07-30 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18010999#comment-18010999 ] Tim Allison commented on TIKA-4459: --- Thank you, both, for all of your work on

[jira] [Commented] (TIKA-4457) Typo in cad parser module pom for Automatic-Module

2025-07-14 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18005255#comment-18005255 ] Tim Allison commented on TIKA-4457: --- PR isn't the problem. commons-lang has

[jira] [Resolved] (TIKA-4457) Typo in cad parser module pom for Automatic-Module

2025-07-14 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4457. --- Fix Version/s: 4.0.0 3.2.2 Resolution: Fixed > Typo in cad parser mod

[jira] [Commented] (TIKA-4194) tika fails to detect certain pkcs12 keystores types p12 pfx

2025-07-14 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18005233#comment-18005233 ] Tim Allison commented on TIKA-4194: --- For those interested, please review: [h

[jira] [Commented] (TIKA-4454) Media-type application/pkcs7-mime

2025-07-14 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18005232#comment-18005232 ] Tim Allison commented on TIKA-4454: --- [https://github.com/apache/tika/pull/

[jira] [Commented] (TIKA-3784) Detector returns "application/x-x509-key" when scanning a .p12 file

2025-07-14 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18005231#comment-18005231 ] Tim Allison commented on TIKA-3784: --- This may fix a number of pkcs detection is

[jira] [Commented] (TIKA-4454) Media-type application/pkcs7-mime

2025-07-14 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18005120#comment-18005120 ] Tim Allison commented on TIKA-4454: --- Super helpful. This is where I'm heading

[jira] [Commented] (TIKA-3784) Detector returns "application/x-x509-key" when scanning a .p12 file

2025-07-11 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18004720#comment-18004720 ] Tim Allison commented on TIKA-3784: --- I'm toying with a simplified ASN1Dump t

[jira] [Commented] (TIKA-4454) Media-type application/pkcs7-mime

2025-07-11 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18004698#comment-18004698 ] Tim Allison commented on TIKA-4454: --- https://github.com/gabriel-vasile/mimetype/is

[jira] [Commented] (TIKA-4454) Media-type application/pkcs7-mime

2025-07-11 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18004688#comment-18004688 ] Tim Allison commented on TIKA-4454: --- The first 20 bytes of the attached file

[jira] [Commented] (TIKA-4454) Media-type application/pkcs7-mime

2025-07-11 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18004686#comment-18004686 ] Tim Allison commented on TIKA-4454: --- When I parse our {{{}testDetached.p7s{}}}, I

[jira] [Commented] (TIKA-4454) Media-type application/pkcs7-mime

2025-07-11 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18004683#comment-18004683 ] Tim Allison commented on TIKA-4454: --- This looks like it goes all the way back: [h

[jira] [Comment Edited] (TIKA-4453) ForkParser fails on documents with more than 100 embedded documents

2025-07-10 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18004547#comment-18004547 ] Tim Allison edited comment on TIKA-4453 at 7/10/25 9:1

[jira] [Commented] (TIKA-4453) ForkParser fails on documents with more than 100 embedded documents

2025-07-10 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18004547#comment-18004547 ] Tim Allison commented on TIKA-4453: --- [~steveaitch] , let me know what you thin

[jira] [Resolved] (TIKA-4453) ForkParser fails on documents with more than 100 embedded documents

2025-07-10 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4453. --- Fix Version/s: 4.0.0 3.2.2 Resolution: Fixed > ForkParser fails

[jira] [Commented] (TIKA-4453) ForkParser fails on documents with more than 100 embedded documents

2025-07-10 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18004419#comment-18004419 ] Tim Allison commented on TIKA-4453: --- Thank you, [~steveaitch] ! > ForkParser f

[ANNOUNCE] Apache Tika 3.2.1 released

2025-07-09 Thread Tim Allison
Apache Tika, visit the project home page: https://tika.apache.org/ NOTE: This release requires Java 11. As of April 2025, we are no longer supporting the 2.x branch (which requires Java 8). See: https://cwiki.apache.org/confluence/display/TIKA/Tika+Roadmap+--+2.x%2C+3.x+and+Beyond -- Tim Allison, on

[RESULT][VOTE] Release Apache Tika 3.2.1 Candidate #2

2025-07-09 Thread Tim Allison
The vote has passed with 3 PMC +1s, 2 community +1s and no -1s. PMC +1s Oleg Tikhonov Tilman Hausherr Tim Allison Community +1s (non-binding) Alvaro Nogueira Craig Muchinsky I'll release the artifacts and update the website in the next few hours. Thank you, all! Best, Tim O

Re: [HELP NEEDED][VOTE] Release Apache Tika 3.2.1 Candidate #2

2025-07-09 Thread Tim Allison
Thank you so much, Oleg! On Wed, Jul 9, 2025 at 12:10 PM Oleg Tikhonov wrote: > > Sorry for late reply, > +1 > Thanks, > Oleg > > On Wed, 9 Jul 2025 at 18:29 Tim Allison wrote: > > > All, > > > > We need one more committer/pmc to vote on this rele

[HELP NEEDED][VOTE] Release Apache Tika 3.2.1 Candidate #2

2025-07-09 Thread Tim Allison
All, We need one more committer/pmc to vote on this release. If you're a committer/pmc and you happen to have the time, please vote. Thank you. Best, Tim On Thu, Jun 26, 2025 at 2:10 PM Tim Allison wrote: > > A candidate for the Tika 3.2.1 release is a

[jira] [Resolved] (TIKA-4333) Remove tika-batch from 4.x/main

2025-07-09 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4333. --- Fix Version/s: 4.0.0 Resolution: Fixed > Remove tika-batch from 4.x/m

[jira] [Resolved] (TIKA-4452) Remove FileProfiler from tika-eval in 4.x

2025-07-08 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4452. --- Fix Version/s: 4.0.0 Resolution: Fixed We can easily add this back if anyone needs it. Please

[jira] [Commented] (TIKA-4451) Remove XMLErrorLogUpdater from tika-eval in 4.x

2025-07-08 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18003911#comment-18003911 ] Tim Allison commented on TIKA-4451: --- To be clear, this was a workaround to ha

[jira] [Resolved] (TIKA-4451) Remove XMLErrorLogUpdater from tika-eval in 4.x

2025-07-08 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4451. --- Fix Version/s: 4.0.0 Resolution: Fixed > Remove XMLErrorLogUpdater from tika-eval in

[jira] [Resolved] (TIKA-4342) Remove tika-batch from tika-eval's FileProfiler

2025-07-08 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4342. --- Fix Version/s: 4.0.0 Resolution: Fixed > Remove tika-batch from tika-eval's File

[jira] [Resolved] (TIKA-4450) Remove tika-batch from ExtractComparer

2025-07-08 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4450. --- Fix Version/s: 4.0.0 Resolution: Fixed > Remove tika-batch from ExtractCompa

[jira] [Created] (TIKA-4452) Remove FileProfiler from tika-eval in 4.x

2025-07-08 Thread Tim Allison (Jira)
Tim Allison created TIKA-4452: - Summary: Remove FileProfiler from tika-eval in 4.x Key: TIKA-4452 URL: https://issues.apache.org/jira/browse/TIKA-4452 Project: Tika Issue Type: Improvement

[jira] [Created] (TIKA-4451) Remove XMLErrorLogUpdater from tika-eval in 4.x

2025-07-08 Thread Tim Allison (Jira)
Tim Allison created TIKA-4451: - Summary: Remove XMLErrorLogUpdater from tika-eval in 4.x Key: TIKA-4451 URL: https://issues.apache.org/jira/browse/TIKA-4451 Project: Tika Issue Type: Improvement

[jira] [Created] (TIKA-4450) Remove tika-batch from ExtractComparer

2025-07-08 Thread Tim Allison (Jira)
Tim Allison created TIKA-4450: - Summary: Remove tika-batch from ExtractComparer Key: TIKA-4450 URL: https://issues.apache.org/jira/browse/TIKA-4450 Project: Tika Issue Type: Sub-task

[jira] [Updated] (TIKA-4420) Simplify tika-eval -- remove tags

2025-07-08 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-4420: -- Description: In {{{}main{}}}, I'd like to streamline tika-eval. I'm not sure the tag counts

[jira] [Resolved] (TIKA-4446) Fix embedded file metadata matching in tika-eval reporting

2025-07-08 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4446. --- Fix Version/s: 4.0.0 3.2.2 Resolution: Fixed > Fix embedded file metad

[jira] [Commented] (TIKA-4446) Fix embedded file metadata matching in tika-eval reporting

2025-07-08 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18003804#comment-18003804 ] Tim Allison commented on TIKA-4446: --- Fixed with: [https://github.com/apache/tika/

[jira] [Commented] (TIKA-4446) Fix embedded file metadata matching in tika-eval reporting

2025-07-08 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18003803#comment-18003803 ] Tim Allison commented on TIKA-4446: --- K. It turns out that comment was actually outd

[jira] [Resolved] (TIKA-4449) Improve xmp metadata key precision for PDFs

2025-07-08 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4449. --- Fix Version/s: 4.0.0 3.2.2 Resolution: Fixed > Improve xmp metadata

[jira] [Commented] (TIKA-4449) Improve xmp metadata key precision for PDFs

2025-07-08 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18003745#comment-18003745 ] Tim Allison commented on TIKA-4449: --- [~peterhoogendijk] sounds good. I just merged

[jira] [Commented] (TIKA-4446) Fix embedded file metadata matching in tika-eval reporting

2025-07-03 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17993143#comment-17993143 ] Tim Allison commented on TIKA-4446: --- LOL...   !image-2025-07-03-14-27-42-989

[jira] [Updated] (TIKA-4446) Fix embedded file metadata matching in tika-eval reporting

2025-07-03 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-4446: -- Attachment: image-2025-07-03-14-27-42-989.png > Fix embedded file metadata matching in tika-e

[jira] [Commented] (TIKA-4447) eml attachement duplicate filename on extract

2025-07-02 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17987950#comment-17987950 ] Tim Allison commented on TIKA-4447: --- https://issues.apache.org/jira/browse/MIME4J

[jira] [Updated] (TIKA-4447) eml attachement duplicate filename on extract

2025-07-02 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-4447: -- Attachment: screenshot-1.png > eml attachement duplicate filename on extr

[jira] [Commented] (TIKA-4447) eml attachement duplicate filename on extract

2025-07-02 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17987942#comment-17987942 ] Tim Allison commented on TIKA-4447: --- That looks like a bug/area for improvement in

[jira] [Commented] (TIKA-4448) Downgrade junit5 to 5.13.2

2025-07-02 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17987932#comment-17987932 ] Tim Allison commented on TIKA-4448: --- Is this user error? I invalidated the cache

[jira] [Resolved] (TIKA-4437) Extract more info from doc/docx

2025-07-02 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4437. --- Fix Version/s: 3.2.2 Resolution: Fixed > Extract more info from doc/d

[jira] [Commented] (TIKA-4449) Improve xmp metadata key precision for PDFs

2025-07-02 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17987925#comment-17987925 ] Tim Allison commented on TIKA-4449: --- [~tilman], wdyt of https://github.com/apache/

[jira] [Commented] (TIKA-4444) PDFParser shows wrong data in xmp "dc:subject" tag

2025-07-02 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17987924#comment-17987924 ] Tim Allison commented on TIKA-4444: --- Let's move over to TIKA-4449. > PDF

[jira] [Created] (TIKA-4449) Improve xmp metadata key precision for PDFs

2025-07-02 Thread Tim Allison (Jira)
Tim Allison created TIKA-4449: - Summary: Improve xmp metadata key precision for PDFs Key: TIKA-4449 URL: https://issues.apache.org/jira/browse/TIKA-4449 Project: Tika Issue Type: Task

[jira] [Commented] (TIKA-4444) PDFParser shows wrong data in xmp "dc:subject" tag

2025-07-02 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17987921#comment-17987921 ] Tim Allison commented on TIKA-4444: --- Doh. Sorry. Y. I'll open a new ticket a

[jira] [Commented] (TIKA-4444) PDFParser shows wrong data in xmp "dc:subject" tag

2025-07-02 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17987917#comment-17987917 ] Tim Allison commented on TIKA-4444: --- Or, more clearly with string values for

[jira] [Commented] (TIKA-4444) PDFParser shows wrong data in xmp "dc:subject" tag

2025-07-02 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17987915#comment-17987915 ] Tim Allison commented on TIKA-4444: --- [~peterhoogendijk], please take a look at the

[jira] [Commented] (TIKA-4444) PDFParser shows wrong data in xmp "dc:subject" tag

2025-07-02 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17987886#comment-17987886 ] Tim Allison commented on TIKA-4444: --- Agree. I'm working on a PR now f

[jira] [Created] (TIKA-4448) Downgrade junit5 to 5.13.2

2025-07-02 Thread Tim Allison (Jira)
Tim Allison created TIKA-4448: - Summary: Downgrade junit5 to 5.13.2 Key: TIKA-4448 URL: https://issues.apache.org/jira/browse/TIKA-4448 Project: Tika Issue Type: Task Reporter: Tim

[jira] [Commented] (TIKA-4444) PDFParser shows wrong data in xmp "dc:subject" tag

2025-07-01 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17987271#comment-17987271 ] Tim Allison commented on TIKA-4444: --- I'll try to open a PR tomorrow (Op

[jira] [Commented] (TIKA-4444) PDFParser shows wrong data in xmp "dc:subject" tag

2025-07-01 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17987250#comment-17987250 ] Tim Allison commented on TIKA-4444: --- I'm in favor of letting users have acce

[jira] [Comment Edited] (TIKA-4444) PDFParser shows wrong data in xmp "dc:subject" tag

2025-07-01 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17987250#comment-17987250 ] Tim Allison edited comment on TIKA-4444 at 7/1/25 3:28 PM: ---

[jira] [Comment Edited] (TIKA-4444) PDFParser shows wrong data in xmp "dc:subject" tag

2025-07-01 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17987250#comment-17987250 ] Tim Allison edited comment on TIKA-4444 at 7/1/25 3:50 PM: ---

[jira] [Comment Edited] (TIKA-4444) PDFParser shows wrong data in xmp "dc:subject" tag

2025-07-01 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17987250#comment-17987250 ] Tim Allison edited comment on TIKA-4444 at 7/1/25 3:31 PM: ---

Re: [VOTE] Release Apache Tika 3.2.1 Candidate #2

2025-06-30 Thread Tim Allison
Repo is back up (at least for me). Thank you, Tilman, for the info! David, This link works for me (now): https://repository.apache.org/content/repositories/orgapachetika-1116/org/apache/tika/tika-parent/3.2.1/tika-parent-3.2.1.pom On Mon, Jun 30, 2025 at 12:03 PM Tim Allison wrote: > > I

Re: [VOTE] Release Apache Tika 3.2.1 Candidate #2

2025-06-30 Thread Tim Allison
eems to be available. > > Am I doing something wrong? > > > David Pilato > da...@pilato.fr > 06 13 03 08 41 > > Le 30 juin 2025 à 15:50 +0200, Tim Allison , a écrit : > > Fellow devs, if we can get one more binding +1, we can move forth on > > this release. Tha

Re: [VOTE] Release Apache Tika 3.2.1 Candidate #2

2025-06-30 Thread Tim Allison
Fellow devs, if we can get one more binding +1, we can move forth on this release. Thank you! On Thu, Jun 26, 2025 at 2:10 PM Tim Allison wrote: > > A candidate for the Tika 3.2.1 release is available at: > https://dist.apache.org/repos/dist/dev/tika/3.2.1 > > The release can

[VOTE] Release Apache Tika 3.2.1 Candidate #2

2025-06-26 Thread Tim Allison
A candidate for the Tika 3.2.1 release is available at: https://dist.apache.org/repos/dist/dev/tika/3.2.1 The release candidate is a zip archive of the sources in: https://github.com/apache/tika/tree/3.2.1-rc2/ The SHA-512 checksum of the archive is e752a50654900dc551bda8449d7a14a4b4b0d908f76
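
For anyone checking a release candidate before voting, the published SHA-512 value can be compared against a locally computed digest. The following is only a minimal sketch using the JDK's `MessageDigest`; the class name `Sha512Check` and its methods are hypothetical helpers, not part of Tika, and the archive path and expected checksum must be supplied by the caller (the checksum in the message above is truncated in this listing, so it is not reproduced here).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration; not part of Apache Tika.
final class Sha512Check {

    // Streams the input through SHA-512 and returns the digest as lowercase hex.
    static String hexDigest(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-512");
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            md.update(buf, 0, n);
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest()) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    // Compares a local file's digest with the published value,
    // ignoring letter case and surrounding whitespace.
    static boolean matches(Path archive, String published)
            throws IOException, NoSuchAlgorithmException {
        try (InputStream in = Files.newInputStream(archive)) {
            return hexDigest(in).equalsIgnoreCase(published.trim());
        }
    }
}
```

A caller would pass the path of the downloaded source zip and the full checksum string from the vote email to `matches`; a `false` result means the download should not be trusted.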

[jira] [Commented] (TIKA-4441) InputStream is consumed by Tika.detect for certain files

2025-06-26 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17986372#comment-17986372 ] Tim Allison commented on TIKA-4441: --- You can grab the jars or more simply add

[jira] [Updated] (TIKA-4446) Fix embedded file metadata matching in tika-eval reporting

2025-06-26 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-4446: -- Priority: Minor (was: Major) > Fix embedded file metadata matching in tika-eval report

[jira] [Commented] (TIKA-4438) Prepare for 3.2.1 release

2025-06-26 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17986365#comment-17986365 ] Tim Allison commented on TIKA-4438: --- Yikes. y. https://issues.apache.org/jira/br

[jira] [Commented] (TIKA-4438) Prepare for 3.2.1 release

2025-06-26 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17986368#comment-17986368 ] Tim Allison commented on TIKA-4438: --- Agreed on jsoup. Good to go for rc2? >

[jira] [Updated] (TIKA-4446) Fix embedded file metadata matching in tika-eval reporting

2025-06-26 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-4446: -- Description: As [~tilman] noticed on  > Fix embedded file metadata matching in tika-eval report

[jira] [Updated] (TIKA-4446) Fix embedded file metadata matching in tika-eval reporting

2025-06-26 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-4446: -- Description: As [~tilman] noticed on https://issues.apache.org/jira/browse/TIKA-4438?focusedCommentId

[jira] [Created] (TIKA-4446) Fix embedded file metadata matching in tika-eval reporting

2025-06-26 Thread Tim Allison (Jira)
Tim Allison created TIKA-4446: - Summary: Fix embedded file metadata matching in tika-eval reporting Key: TIKA-4446 URL: https://issues.apache.org/jira/browse/TIKA-4446 Project: Tika Issue Type

[jira] [Comment Edited] (TIKA-4438) Prepare for 3.2.1 release

2025-06-25 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17986260#comment-17986260 ] Tim Allison edited comment on TIKA-4438 at 6/26/25 12:4

[jira] [Commented] (TIKA-4438) Prepare for 3.2.1 release

2025-06-25 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17986260#comment-17986260 ] Tim Allison commented on TIKA-4438: --- Attached reports comparing 3.2.1-rc1 and branc

[jira] [Updated] (TIKA-4438) Prepare for 3.2.1 release

2025-06-25 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-4438: -- Attachment: tika-3.2.1bVsc.tgz > Prepare for 3.2.1 rele

[jira] [Commented] (TIKA-4438) Prepare for 3.2.1 release

2025-06-25 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17986223#comment-17986223 ] Tim Allison commented on TIKA-4438: --- And, they're running. :D > Prepa

[jira] [Commented] (TIKA-4441) InputStream is consumed by Tika.detect for certain files

2025-06-25 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17986195#comment-17986195 ] Tim Allison commented on TIKA-4441: --- If you set the marklimit to, say, 120, you
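
The mark limit mentioned in this comment refers to the standard `InputStream.mark(int)` / `reset()` contract that detection relies on to avoid consuming the caller's stream. The sketch below illustrates that contract with plain `java.io` only; `MarkResetSketch.peek` is a hypothetical helper, not Tika's detector code, and it assumes Java 11 or later for `readNBytes(int)`.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper for illustration; not Tika's implementation.
final class MarkResetSketch {

    // Reads up to `limit` bytes for sniffing, then rewinds so that the
    // caller still sees the stream from its original position.
    static byte[] peek(InputStream in, int limit) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(limit);          // remember the current position
        try {
            return in.readNBytes(limit);
        } finally {
            in.reset();          // rewind; valid because at most `limit` bytes were read
        }
    }
}
```

Reading more than the mark limit before calling `reset()` is not guaranteed to work for every stream type, which is why a too-small limit can leave a stream partly consumed.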

[RESULT][VOTE] Release Apache Tika 3.2.1 Candidate #1

2025-06-25 Thread Tim Allison
rc1 is canceled. rc2 is on its way. On Wed, Jun 25, 2025 at 1:17 PM Tim Allison wrote: > > All, given https://issues.apache.org/jira/browse/TIKA-4441, I'm > changing my vote to -1, and I'll respin an rc2 shortly. > > My apologies for this regression and for the time was

[jira] [Resolved] (TIKA-4441) InputStream is consumed by Tika.detect for certain files

2025-06-25 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4441. --- Fix Version/s: 3.2.1 Resolution: Fixed > InputStream is consumed by Tika.detect for cert

[jira] [Commented] (TIKA-4438) Prepare for 3.2.1 release

2025-06-25 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17986219#comment-17986219 ] Tim Allison commented on TIKA-4438: --- Should I re-run the regression tests after fi

Re: [VOTE] Release Apache Tika 3.2.1 Candidate #1

2025-06-25 Thread Tim Allison
; +1, oracle jdk 11.0.21, Windows > > Tilman > > On 6/21/2025 12:06 AM, Tim Allison wrote: > > A candidate for the Tika 3.2.1 release is available at: > > https://dist.apache.org/repos/dist/dev/tika/3.2.1 > > > > The release candidate is a zip archive of the sources i

[jira] [Commented] (TIKA-4441) InputStream is consumed by Tika.detect for certain files

2025-06-25 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17986196#comment-17986196 ] Tim Allison commented on TIKA-4441: --- {noformat} @Test public void testDetector() th

[jira] [Commented] (TIKA-4441) InputStream is consumed by Tika.detect for certain files

2025-06-25 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17986194#comment-17986194 ] Tim Allison commented on TIKA-4441: --- This is tricky. It was a breaking change, and

[jira] [Commented] (TIKA-4441) InputStream is consumed by Tika.detect for certain files

2025-06-25 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17986112#comment-17986112 ] Tim Allison commented on TIKA-4441: --- I'll take a look today. If this is wort

[VOTE] Release Apache Tika 3.2.1 Candidate #1

2025-06-20 Thread Tim Allison
A candidate for the Tika 3.2.1 release is available at: https://dist.apache.org/repos/dist/dev/tika/3.2.1 The release candidate is a zip archive of the sources in: https://github.com/apache/tika/tree/3.2.1-rc1/ The SHA-512 checksum of the archive is 8b030e1baac0c5866bf182915adbb8853bcdea76dff577e

[jira] [Commented] (TIKA-4438) Prepare for 3.2.1 release

2025-06-20 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17984984#comment-17984984 ] Tim Allison commented on TIKA-4438: --- Apache infra looks to be having a challenging

[jira] [Updated] (TIKA-4438) Prepare for 3.2.1 release

2025-06-20 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-4438: -- Attachment: image-2025-06-20-11-34-49-574.png > Prepare for 3.2.1 rele

[jira] [Commented] (TIKA-4438) Prepare for 3.2.1 release

2025-06-20 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17984991#comment-17984991 ] Tim Allison commented on TIKA-4438: --- Nope, that's me. I was blocked. :facepalm

[jira] [Commented] (TIKA-4438) Prepare for 3.2.1 release

2025-06-19 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17983698#comment-17983698 ] Tim Allison commented on TIKA-4438: --- And away we go. Thank you! > Prepare fo

[jira] [Comment Edited] (TIKA-4438) Prepare for 3.2.1 release

2025-06-19 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17983692#comment-17983692 ] Tim Allison edited comment on TIKA-4438 at 6/19/25 1:54 PM: ---

[jira] [Commented] (TIKA-4438) Prepare for 3.2.1 release

2025-06-19 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17983695#comment-17983695 ] Tim Allison commented on TIKA-4438: --- I started a TIKA-4439 branch with the chang

[jira] [Created] (TIKA-4439) Improve text extraction from EMF, round 2

2025-06-19 Thread Tim Allison (Jira)
Tim Allison created TIKA-4439: - Summary: Improve text extraction from EMF, round 2 Key: TIKA-4439 URL: https://issues.apache.org/jira/browse/TIKA-4439 Project: Tika Issue Type: Task

[jira] [Commented] (TIKA-4438) Prepare for 3.2.1 release

2025-06-19 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17983692#comment-17983692 ] Tim Allison commented on TIKA-4438: --- I think this is good enough progress for a 3

[jira] [Commented] (TIKA-4438) Prepare for 3.2.1 release

2025-06-19 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17983691#comment-17983691 ] Tim Allison commented on TIKA-4438: --- The local changes I made to the emf parsing

[jira] [Updated] (TIKA-4438) Prepare for 3.2.1 release

2025-06-19 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-4438: -- Attachment: tika-3.2.1b.tgz > Prepare for 3.2.1 rele

[jira] [Comment Edited] (TIKA-4438) Prepare for 3.2.1 release

2025-06-18 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17982701#comment-17982701 ] Tim Allison edited comment on TIKA-4438 at 6/18/25 1:0

[jira] [Commented] (TIKA-4438) Prepare for 3.2.1 release

2025-06-18 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17982701#comment-17982701 ] Tim Allison commented on TIKA-4438: --- Not surprisingly, lots of text differences in

[jira] [Updated] (TIKA-4438) Prepare for 3.2.1 release

2025-06-18 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-4438: -- Attachment: tika-3.2.1-reports.tgz > Prepare for 3.2.1 rele

[jira] [Created] (TIKA-4438) Prepare for 3.2.1 release

2025-06-18 Thread Tim Allison (Jira)
Tim Allison created TIKA-4438: - Summary: Prepare for 3.2.1 release Key: TIKA-4438 URL: https://issues.apache.org/jira/browse/TIKA-4438 Project: Tika Issue Type: Task Reporter: Tim

Re: Bug fix release?

2025-06-17 Thread Tim Allison
I've kicked the regression tests off just now after merging TIKA-4432. Apologies for the delay. On Tue, Jun 10, 2025 at 10:45 AM Tilman Hausherr wrote: > > > > > > Maybe start regression tests at the end of this week and then aim to start > > the vote early next week? > > Yes > >

[jira] [Resolved] (TIKA-4432) Issue with EMF Parser Merging Header, Page Number, and First Content Word During Extraction

2025-06-17 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4432. --- Fix Version/s: 4.0.0 3.2.1 Resolution: Fixed I added backoff to use the

[jira] [Created] (TIKA-4437) Extract more info from doc/docx

2025-06-10 Thread Tim Allison (Jira)
Tim Allison created TIKA-4437: - Summary: Extract more info from doc/docx Key: TIKA-4437 URL: https://issues.apache.org/jira/browse/TIKA-4437 Project: Tika Issue Type: Task Reporter

Bug fix release?

2025-06-10 Thread Tim Allison
Now that TIKA-4424 is fixed, I think we should go for a 3.2.1 bug fix release soonish. I’d like to add more unit tests around that issue. Anything else we need? Maybe start regression tests at the end of this week and then aim to start the vote early next week? WDYT? Best, Tim

[jira] [Commented] (TIKA-4424) Regression in zip-based detection with an InputStream in 3.2.0

2025-06-09 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17957007#comment-17957007 ] Tim Allison commented on TIKA-4424: --- I merged this into {{main}} and cherrypicke
