Re: [PR] tik [tika]

2025-01-28 Thread via GitHub
tballison commented on PR #2110: URL: https://github.com/apache/tika/pull/2110#issuecomment-2620181864 Stop. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe,

Re: [PR] tik [tika]

2025-01-28 Thread via GitHub
tballison closed pull request #2110: tik URL: https://github.com/apache/tika/pull/2110

[PR] tik [tika]

2025-01-28 Thread via GitHub
Illidian4368 opened a new pull request, #2110: URL: https://github.com/apache/tika/pull/2110 Thanks for your contribution to [Apache Tika](https://tika.apache.org/)! Your help is appreciated! Before opening the pull request, please verify that * there is an open issue on th

[VOTE] Release Apache Tika 3.1.0 Candidate #1

2025-01-28 Thread Tim Allison
A candidate for the Tika 3.1.0 release is available at: https://dist.apache.org/repos/dist/dev/tika/3.1.0 The release candidate is a zip archive of the sources in: https://github.com/apache/tika/tree/tika-3.1.0-rc1 The SHA-512 checksum of the archive is f8e52b320a05ece867815d587284c695dae87b26ea7

[jira] [Commented] (TIKA-4337) Improvements to recent xps mods

2025-01-28 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17921842#comment-17921842 ] Hudson commented on TIKA-4337: -- SUCCESS: Integrated in Jenkins build Tika » tika-branch_2x-jd

[jira] [Commented] (TIKA-4371) Exclude provided dependencies in xmlbeans that we don't need

2025-01-28 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17921841#comment-17921841 ] Hudson commented on TIKA-4371: -- SUCCESS: Integrated in Jenkins build Tika » tika-branch_2x-jd

[jira] [Commented] (TIKA-4361) Rare RTF bug handling styles within an href in a malformed file

2025-01-28 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17921840#comment-17921840 ] Hudson commented on TIKA-4361: -- SUCCESS: Integrated in Jenkins build Tika » tika-branch_2x-jd

[jira] [Commented] (TIKA-4361) Rare RTF bug handling styles within an href in a malformed file

2025-01-28 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17921830#comment-17921830 ] Hudson commented on TIKA-4361: -- SUCCESS: Integrated in Jenkins build Tika » tika-branch_3x-jd

[jira] [Commented] (TIKA-4337) Improvements to recent xps mods

2025-01-28 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17921831#comment-17921831 ] Hudson commented on TIKA-4337: -- SUCCESS: Integrated in Jenkins build Tika » tika-branch_3x-jd

[jira] [Commented] (TIKA-4361) Rare RTF bug handling styles within an href in a malformed file

2025-01-28 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17921819#comment-17921819 ] Hudson commented on TIKA-4361: -- SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk17 #

[jira] [Commented] (TIKA-4337) Improvements to recent xps mods

2025-01-28 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17921820#comment-17921820 ] Hudson commented on TIKA-4337: -- SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk17 #

[jira] [Resolved] (TIKA-4337) Improvements to recent xps mods

2025-01-28 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4337. --- Resolution: Fixed > Improvements to recent xps mods > --- > >

[jira] [Resolved] (TIKA-4361) Rare RTF bug handling styles within an href in a malformed file

2025-01-28 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4361. --- Resolution: Fixed > Rare RTF bug handling styles within an href in a malformed file >

[jira] [Comment Edited] (TIKA-4373) Regression tests for 3.1.0 release

2025-01-28 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17921789#comment-17921789 ] Tim Allison edited comment on TIKA-4373 at 1/28/25 4:20 PM: I

Re: [PR] TIKA-4337 -- check for empty string in array index 1 [tika]

2025-01-28 Thread via GitHub
tballison merged PR #2109: URL: https://github.com/apache/tika/pull/2109

[jira] [Commented] (TIKA-4370) SJIS Encoded Files Can't be Detected

2025-01-28 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17921798#comment-17921798 ] Tim Allison commented on TIKA-4370: --- Hi [~subbudvk], this is a long-standing problem. Figu

[jira] [Comment Edited] (TIKA-4370) SJIS Encoded Files Can't be Detected

2025-01-28 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17921798#comment-17921798 ] Tim Allison edited comment on TIKA-4370 at 1/28/25 4:41 PM: Hi

Re: [PR] TIKA-4361 -- follow on fix [tika]

2025-01-28 Thread via GitHub
tballison merged PR #2108: URL: https://github.com/apache/tika/pull/2108

[jira] [Commented] (TIKA-4373) Regression tests for 3.1.0 release

2025-01-28 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17921797#comment-17921797 ] Tim Allison commented on TIKA-4373: --- I reopened TIKA-4337 for a trivial xps improvement.

[PR] TIKA-4337 -- check for empty string in array index 1 [tika]

2025-01-28 Thread via GitHub
tballison opened a new pull request, #2109: URL: https://github.com/apache/tika/pull/2109

[jira] [Reopened] (TIKA-4337) Improvements to recent xps mods

2025-01-28 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison reopened TIKA-4337: --- Assignee: Tim Allison Found a trivial area for improvement based on regression runs on TIKA-4373 >

[jira] [Commented] (TIKA-4373) Regression tests for 3.1.0 release

2025-01-28 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17921788#comment-17921788 ] Tim Allison commented on TIKA-4373: --- Y, I need to update tika-eval to include the attach

[jira] [Commented] (TIKA-4373) Regression tests for 3.1.0 release

2025-01-28 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17921789#comment-17921789 ] Tim Allison commented on TIKA-4373: --- I reopened TIKA-4361. > Regression tests for 3.1.0

Re: [PR] TIKA-4361 -- follow on fix [tika]

2025-01-28 Thread via GitHub
tballison commented on PR #2108: URL: https://github.com/apache/tika/pull/2108#issuecomment-2619467677 The corpus file: {{commoncrawl3/5A/5AODDDRT5BKTJEDN7TENBGKKDO5I3ODD}} shows why we had to make this fix to the original TIKA-4361 fix. I regret that I can't share the file that triggered t

[PR] TIKA-4361 -- follow on fix [tika]

2025-01-28 Thread via GitHub
tballison opened a new pull request, #2108: URL: https://github.com/apache/tika/pull/2108

[jira] [Reopened] (TIKA-4361) Rare RTF bug handling styles within an href in a malformed file

2025-01-28 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison reopened TIKA-4361: --- Assignee: Tim Allison The regression tests on TIKA-4373 showed that the proposed fix here adds a new

[jira] [Commented] (TIKA-4370) SJIS Encoded Files Can't be Detected

2025-01-28 Thread Subbu (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17921780#comment-17921780 ] Subbu commented on TIKA-4370: - [~tallison] , Please review when you have time. Do you see any

[jira] [Comment Edited] (TIKA-4373) Regression tests for 3.1.0 release

2025-01-28 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17921763#comment-17921763 ] Tilman Hausherr edited comment on TIKA-4373 at 1/28/25 3:01 PM:

[jira] [Commented] (TIKA-4373) Regression tests for 3.1.0 release

2025-01-28 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17921763#comment-17921763 ] Tilman Hausherr commented on TIKA-4373: --- I found only one json file, which is 2PSMEF

[jira] [Updated] (TIKA-4373) Regression tests for 3.1.0 release

2025-01-28 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4373: -- Attachment: filter_md5_suc_url.json > Regression tests for 3.1.0 release > -

[jira] [Commented] (TIKA-4373) Regression tests for 3.1.0 release

2025-01-28 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17921761#comment-17921761 ] Tim Allison commented on TIKA-4373: --- Handful of text->json files I reviewed looks like a

[jira] [Commented] (TIKA-4373) Regression tests for 3.1.0 release

2025-01-28 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17921759#comment-17921759 ] Tim Allison commented on TIKA-4373: --- W00t! And that's why I wanted to look at a few of t

[jira] [Commented] (TIKA-4373) Regression tests for 3.1.0 release

2025-01-28 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17921756#comment-17921756 ] Tilman Hausherr commented on TIKA-4373: --- [^S53SZFZ2FBOZIVTX3HVP4D4XKHKPEMQQ.csv] is

[jira] [Updated] (TIKA-4373) Regression tests for 3.1.0 release

2025-01-28 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4373: -- Attachment: S53SZFZ2FBOZIVTX3HVP4D4XKHKPEMQQ.csv > Regression tests for 3.1.0 release >

[jira] [Commented] (TIKA-4372) Import-Package osgi metadata are missing for commons-io

2025-01-28 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17921748#comment-17921748 ] Hudson commented on TIKA-4372: -- SUCCESS: Integrated in Jenkins build Tika » tika-branch_3x-jd

[jira] [Comment Edited] (TIKA-4373) Regression tests for 3.1.0 release

2025-01-28 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17921740#comment-17921740 ] Tim Allison edited comment on TIKA-4373 at 1/28/25 1:54 PM: Co

[jira] [Commented] (TIKA-4372) Import-Package osgi metadata are missing for commons-io

2025-01-28 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17921739#comment-17921739 ] Hudson commented on TIKA-4372: -- UNSTABLE: Integrated in Jenkins build Tika » tika-main-jdk17

[jira] [Updated] (TIKA-4373) Regression tests for 3.1.0 release

2025-01-28 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-4373: -- Attachment: reports_tika-3.0-vs-3.1.tgz > Regression tests for 3.1.0 release > -

[jira] [Created] (TIKA-4373) Regression tests for 3.1.0 release

2025-01-28 Thread Tim Allison (Jira)
Tim Allison created TIKA-4373: - Summary: Regression tests for 3.1.0 release Key: TIKA-4373 URL: https://issues.apache.org/jira/browse/TIKA-4373 Project: Tika Issue Type: Task Reporter

[jira] [Commented] (TIKA-4239) Update to 2.9.3

2025-01-28 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17921704#comment-17921704 ] Hudson commented on TIKA-4239: -- SUCCESS: Integrated in Jenkins build Tika » tika-branch_2x-jd

[jira] [Resolved] (TIKA-4372) Import-Package osgi metadata are missing for commons-io

2025-01-28 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4372. --- Fix Version/s: 4.0.0 3.1.0 Resolution: Fixed > Import-Package osgi metadata

Re: [PR] TIKA-4372: fix org.apache.commons.io osgi metadata [tika]

2025-01-28 Thread via GitHub
tballison merged PR #2107: URL: https://github.com/apache/tika/pull/2107

[jira] [Commented] (TIKA-4326) General updates for 3.1.0

2025-01-28 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17921696#comment-17921696 ] Hudson commented on TIKA-4326: -- SUCCESS: Integrated in Jenkins build Tika » tika-branch_3x-jd

[jira] [Commented] (TIKA-4327) General updates for 4.0.0

2025-01-28 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17921686#comment-17921686 ] Hudson commented on TIKA-4327: -- SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk17 #

[PR] TIKA-4372: fix org.apache.commons.io osgi metadata [tika]

2025-01-28 Thread via GitHub
pleeplop opened a new pull request, #2107: URL: https://github.com/apache/tika/pull/2107

[jira] [Created] (TIKA-4372) Import-Package osgi metadata are missing for commons-io

2025-01-28 Thread Pleeplop (Jira)
Pleeplop created TIKA-4372: -- Summary: Import-Package osgi metadata are missing for commons-io Key: TIKA-4372 URL: https://issues.apache.org/jira/browse/TIKA-4372 Project: Tika Issue Type: Bug