Tim Allison created TIKA-4377:
-
Summary: Update helm for recent 3.1.0 release
Key: TIKA-4377
URL: https://issues.apache.org/jira/browse/TIKA-4377
Project: Tika
Issue Type: Task
Report
The Apache Tika project is pleased to announce the release of Apache
Tika 3.1.0. The release contents have been pushed out to the main
Apache release site and to the Maven Central sync.
Apache Tika is a toolkit for detecting and extracting metadata and
structured text content from various document
The vote has passed with 3 +1s and no -1s.
Nicholas Dipiazzo
Tilman Hausherr
Tim Allison
I'll push the artifacts and update the website shortly. Thank you, all!
Best,
Tim
On Tue, Jan 28, 2025 at 3:11 PM Tim Allison wrote:
> A candidate for the Tika 3.1.0 release is available at:
>
[
https://issues.apache.org/jira/browse/TIKA-4278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17922783#comment-17922783
]
Hudson commented on TIKA-4278:
--
SUCCESS: Integrated in Jenkins build Tika » tika-branch_2x-jd
[
https://issues.apache.org/jira/browse/TIKA-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17922768#comment-17922768
]
Tim Allison commented on TIKA-4375:
---
TIKA-4376
That link does not spark joy! :lol:
Tim Allison created TIKA-4376:
-
Summary: tika-eval should tokenize on non-breaking/narrow/other
space variants
Key: TIKA-4376
URL: https://issues.apache.org/jira/browse/TIKA-4376
Project: Tika
I
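For readers unfamiliar with the issue summarized above, here is a minimal sketch of what "tokenizing on space variants" can mean. It is not tika-eval's actual tokenizer: the class name `SpaceVariantTokenizer` and the method `tokenize` are hypothetical, and the approach (a plain `java.util.regex` split on the Unicode separator category `\p{Z}` plus ASCII whitespace) is an assumption chosen for illustration only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration; not the tika-eval implementation.
public class SpaceVariantTokenizer {

    // \p{Z} covers Unicode separators such as U+00A0 (no-break space),
    // U+202F (narrow no-break space) and U+2009 (thin space).
    // \s adds ASCII whitespace (tab, newline, etc.).
    private static final Pattern SEPARATORS = Pattern.compile("[\\p{Z}\\s]+");

    public static List<String> tokenize(String text) {
        // Collapse every run of separators into one ASCII space, then trim.
        String normalized = SEPARATORS.matcher(text).replaceAll(" ").trim();
        if (normalized.isEmpty()) {
            return List.of();
        }
        return Arrays.asList(normalized.split(" "));
    }

    public static void main(String[] args) {
        // Words separated by no-break, narrow no-break and thin spaces.
        System.out.println(tokenize("alpha\u00A0beta\u202Fgamma\u2009delta"));
        // prints [alpha, beta, gamma, delta]
    }
}
```

Note that characters such as U+200B (zero width space) are in the format category rather than a separator category, so this pattern would not split on them; whether those should count as token boundaries is a separate decision.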
[
https://issues.apache.org/jira/browse/TIKA-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17922766#comment-17922766
]
Tilman Hausherr commented on TIKA-4375:
---
Wow there are a lot of these:
https://www.u
[
https://issues.apache.org/jira/browse/TIKA-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17922765#comment-17922765
]
Tilman Hausherr commented on TIKA-4375:
---
IMHO yes if you're tokenizing from a "norma
[
https://issues.apache.org/jira/browse/TIKA-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17922764#comment-17922764
]
Tim Allison commented on TIKA-4375:
---
We can add non-breaking spaces here:
https://githu
[
https://issues.apache.org/jira/browse/TIKA-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17922763#comment-17922763
]
Tim Allison commented on TIKA-4375:
---
Maybe a tika-eval issue? We should be tokenizing on
[
https://issues.apache.org/jira/browse/TIKA-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17922760#comment-17922760
]
Tim Allison commented on TIKA-4375:
---
Is that something we need to fix before the 2.9.3 r
[
https://issues.apache.org/jira/browse/TIKA-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17922759#comment-17922759
]
Tilman Hausherr edited comment on TIKA-4375 at 1/31/25 4:13 PM:
[
https://issues.apache.org/jira/browse/TIKA-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17922759#comment-17922759
]
Tilman Hausherr commented on TIKA-4375:
---
Although I wrote elsewhere not to bother wi
[
https://issues.apache.org/jira/browse/TIKA-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr updated TIKA-4375:
--
Attachment: LTWA2JGVJGJ5RVKHTUX6SDS4NTL5UJVQ-p139.pdf
> Regression tests for 2.9.3 release
> ---
[
https://issues.apache.org/jira/browse/TIKA-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17922751#comment-17922751
]
Tim Allison commented on TIKA-4375:
---
Y, agreed.
> Regression tests for 2.9.3 release
>
[
https://issues.apache.org/jira/browse/TIKA-4374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17922743#comment-17922743
]
Tim Allison commented on TIKA-4374:
---
and maybe fix file name in content_diffs and others
[
https://issues.apache.org/jira/browse/TIKA-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17922750#comment-17922750
]
Tilman Hausherr commented on TIKA-4375:
---
Re stl, I looked at one of these files and
[
https://issues.apache.org/jira/browse/TIKA-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17922749#comment-17922749
]
Tilman Hausherr commented on TIKA-4375:
---
Or I because I did the merge but not th
[
https://issues.apache.org/jira/browse/TIKA-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17922748#comment-17922748
]
Tim Allison commented on TIKA-4375:
---
Cherry-picked TIKA-4278 now.
> Regression tests fo
[
https://issues.apache.org/jira/browse/TIKA-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17922745#comment-17922745
]
Tim Allison commented on TIKA-4375:
---
5) a number of files are now being identified as "col
[
https://issues.apache.org/jira/browse/TIKA-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17922738#comment-17922738
]
Tim Allison edited comment on TIKA-4375 at 1/31/25 3:22 PM:
A
[
https://issues.apache.org/jira/browse/TIKA-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17922738#comment-17922738
]
Tim Allison commented on TIKA-4375:
---
A few observations:
1) fewer exceptions. The small
[
https://issues.apache.org/jira/browse/TIKA-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated TIKA-4375:
--
Attachment: tika-2.9.2-v-tika-2.9.3-reports.tgz
> Regression tests for 2.9.3 release
> -
[
https://issues.apache.org/jira/browse/TIKA-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17922731#comment-17922731
]
Tim Allison commented on TIKA-4375:
---
Results attached. I haven't had a chance to review
[
https://issues.apache.org/jira/browse/TIKA-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17922717#comment-17922717
]
Hudson commented on TIKA-4239:
--
SUCCESS: Integrated in Jenkins build Tika » tika-branch_2x-jd
[
https://issues.apache.org/jira/browse/TIKA-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17922712#comment-17922712
]
Hudson commented on TIKA-4327:
--
SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk17 #