[jira] [Created] (TIKA-4377) Update helm for recent 3.1.0 release

2025-01-31 Thread Tim Allison (Jira)
Tim Allison created TIKA-4377: - Summary: Update helm for recent 3.1.0 release Key: TIKA-4377 URL: https://issues.apache.org/jira/browse/TIKA-4377 Project: Tika Issue Type: Task Report

[ANNOUNCE] Apache Tika 3.1.0 released

2025-01-31 Thread Tim Allison
The Apache Tika project is pleased to announce the release of Apache Tika 3.1.0. The release contents have been pushed out to the main Apache release site and to the Maven Central sync. Apache Tika is a toolkit for detecting and extracting metadata and structured text content from various document

[RESULT][VOTE] Release Apache Tika 3.1.0 Candidate #1

2025-01-31 Thread Tim Allison
The vote has passed with 3 +1s and no -1s: Nicholas Dipiazzo, Tilman Hausherr, Tim Allison. I'll push the artifacts and update the website shortly. Thank you, all! Best, Tim On Tue, Jan 28, 2025 at 3:11 PM Tim Allison wrote: > A candidate for the Tika 3.1.0 release is available at: >

[jira] [Commented] (TIKA-4278) TextAndCSVParser doesn't detect semicolon separated file

2025-01-31 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17922783#comment-17922783 ] Hudson commented on TIKA-4278: -- SUCCESS: Integrated in Jenkins build Tika » tika-branch_2x-jd

[jira] [Commented] (TIKA-4375) Regression tests for 2.9.3 release

2025-01-31 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17922768#comment-17922768 ] Tim Allison commented on TIKA-4375: --- TIKA-4376 That link does not spark joy! :lol:

[jira] [Created] (TIKA-4376) tika-eval should tokenize on non-breaking/narrow/other space variants

2025-01-31 Thread Tim Allison (Jira)
Tim Allison created TIKA-4376: - Summary: tika-eval should tokenize on non-breaking/narrow/other space variants Key: TIKA-4376 URL: https://issues.apache.org/jira/browse/TIKA-4376 Project: Tika I

[jira] [Commented] (TIKA-4375) Regression tests for 2.9.3 release

2025-01-31 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17922766#comment-17922766 ] Tilman Hausherr commented on TIKA-4375: --- Wow there are a lot of these: https://www.u

[jira] [Commented] (TIKA-4375) Regression tests for 2.9.3 release

2025-01-31 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17922765#comment-17922765 ] Tilman Hausherr commented on TIKA-4375: --- IMHO yes if you're tokenizing from a "norma

[jira] [Commented] (TIKA-4375) Regression tests for 2.9.3 release

2025-01-31 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17922764#comment-17922764 ] Tim Allison commented on TIKA-4375: --- We can add non-breaking spaces here: https://githu

[jira] [Commented] (TIKA-4375) Regression tests for 2.9.3 release

2025-01-31 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17922763#comment-17922763 ] Tim Allison commented on TIKA-4375: --- Maybe a tika-eval issue? We should be tokenizing on

[jira] [Commented] (TIKA-4375) Regression tests for 2.9.3 release

2025-01-31 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17922760#comment-17922760 ] Tim Allison commented on TIKA-4375: --- Is that something we need to fix before the 2.9.3 r

[jira] [Comment Edited] (TIKA-4375) Regression tests for 2.9.3 release

2025-01-31 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17922759#comment-17922759 ] Tilman Hausherr edited comment on TIKA-4375 at 1/31/25 4:13 PM:

[jira] [Commented] (TIKA-4375) Regression tests for 2.9.3 release

2025-01-31 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17922759#comment-17922759 ] Tilman Hausherr commented on TIKA-4375: --- Although I wrote elsewhere not to bother wi

[jira] [Updated] (TIKA-4375) Regression tests for 2.9.3 release

2025-01-31 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4375: -- Attachment: LTWA2JGVJGJ5RVKHTUX6SDS4NTL5UJVQ-p139.pdf > Regression tests for 2.9.3 release > ---

[jira] [Commented] (TIKA-4375) Regression tests for 2.9.3 release

2025-01-31 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17922751#comment-17922751 ] Tim Allison commented on TIKA-4375: --- Y, agreed. > Regression tests for 2.9.3 release >

[jira] [Commented] (TIKA-4374) Add attachment "file name" to mime_diffs_A_to_B_details.xlsx in tika-eval

2025-01-31 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17922743#comment-17922743 ] Tim Allison commented on TIKA-4374: --- and maybe fix file name in content_diffs and others

[jira] [Commented] (TIKA-4375) Regression tests for 2.9.3 release

2025-01-31 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17922750#comment-17922750 ] Tilman Hausherr commented on TIKA-4375: --- Re stl, I looked at one of these files and

[jira] [Commented] (TIKA-4375) Regression tests for 2.9.3 release

2025-01-31 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17922749#comment-17922749 ] Tilman Hausherr commented on TIKA-4375: --- Or I because I did the merge but not th

[jira] [Commented] (TIKA-4375) Regression tests for 2.9.3 release

2025-01-31 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17922748#comment-17922748 ] Tim Allison commented on TIKA-4375: --- Cherry-picked TIKA-4278 now. > Regression tests fo

[jira] [Commented] (TIKA-4375) Regression tests for 2.9.3 release

2025-01-31 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17922745#comment-17922745 ] Tim Allison commented on TIKA-4375: --- 5) a number of files are now being identified as "col

[jira] [Comment Edited] (TIKA-4375) Regression tests for 2.9.3 release

2025-01-31 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17922738#comment-17922738 ] Tim Allison edited comment on TIKA-4375 at 1/31/25 3:22 PM: A

[jira] [Commented] (TIKA-4375) Regression tests for 2.9.3 release

2025-01-31 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17922738#comment-17922738 ] Tim Allison commented on TIKA-4375: --- A few observations: 1) fewer exceptions. The small

[jira] [Updated] (TIKA-4375) Regression tests for 2.9.3 release

2025-01-31 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-4375: -- Attachment: tika-2.9.2-v-tika-2.9.3-reports.tgz > Regression tests for 2.9.3 release > -

[jira] [Commented] (TIKA-4375) Regression tests for 2.9.3 release

2025-01-31 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17922731#comment-17922731 ] Tim Allison commented on TIKA-4375: --- Results attached. I haven't had a chance to review

[jira] [Commented] (TIKA-4239) Update to 2.9.3

2025-01-31 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17922717#comment-17922717 ] Hudson commented on TIKA-4239: -- SUCCESS: Integrated in Jenkins build Tika » tika-branch_2x-jd

[jira] [Commented] (TIKA-4327) General updates for 4.0.0

2025-01-31 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17922712#comment-17922712 ] Hudson commented on TIKA-4327: -- SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk17 #