[jira] [Updated] (TIKA-4375) Regression tests for 2.9.3 release

2025-02-11 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4375: -- Fix Version/s: 2.9.3 > Regression tests for 2.9.3 rele

[jira] [Created] (TIKA-4383) General updates for 2.9.4

2025-02-09 Thread Tilman Hausherr (Jira)
Tilman Hausherr created TIKA-4383: - Summary: General updates for 2.9.4 Key: TIKA-4383 URL: https://issues.apache.org/jira/browse/TIKA-4383 Project: Tika Issue Type: Task Components

[jira] [Resolved] (TIKA-4375) Regression tests for 2.9.3 release

2025-02-09 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr resolved TIKA-4375. --- Resolution: Fixed > Regression tests for 2.9.3 rele

[jira] [Commented] (TIKA-1533) PDF parse failing to capture right order of text (2 columns)

2025-02-07 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17925154#comment-17925154 ] Tilman Hausherr commented on TIKA-1533: --- The links no longer work but it'

[jira] [Closed] (TIKA-4378) Tika app throws java.awt.HeadlessException

2025-02-06 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr closed TIKA-4378. - Resolution: Not A Bug Closing, you can still comment or reopen if you have additional arguments

[jira] [Commented] (TIKA-4375) Regression tests for 2.9.3 release

2025-02-03 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17923416#comment-17923416 ] Tilman Hausherr commented on TIKA-4375: --- Re csv, for me that's a "wor

[jira] [Comment Edited] (TIKA-4375) Regression tests for 2.9.3 release

2025-02-03 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17923315#comment-17923315 ] Tilman Hausherr edited comment on TIKA-4375 at 2/3/25 2:5

[jira] [Comment Edited] (TIKA-4375) Regression tests for 2.9.3 release

2025-02-03 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17923312#comment-17923312 ] Tilman Hausherr edited comment on TIKA-4375 at 2/3/25 2:4

[jira] [Commented] (TIKA-4378) Tika app throws java.awt.HeadlessException

2025-02-03 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17923320#comment-17923320 ] Tilman Hausherr commented on TIKA-4378: --- Regardless of my previous comment,

[jira] [Commented] (TIKA-4375) Regression tests for 2.9.3 release

2025-02-03 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17923315#comment-17923315 ] Tilman Hausherr commented on TIKA-4375: --- RYT4H6OCPKZPFG3YK5PGLETS6Q3SBUD

[jira] [Commented] (TIKA-4375) Regression tests for 2.9.3 release

2025-02-03 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17923312#comment-17923312 ] Tilman Hausherr commented on TIKA-4375: --- Lots of BMP-related except

Re: [VOTE] Release Apache Tika 2.9.3 Candidate #1

2025-02-03 Thread Tilman Hausherr
+1 builds on Windows 10, Oracle JDK 8 Tilman On 03.02.2025 13:43, Tim Allison wrote: A candidate for the Tika 2.9.3 release is available at: https://dist.apache.org/repos/dist/dev/tika/2.9.3 The release candidate is a zip archive of the sources in: https://github.com/apache/tika/tree/2.9.3-rc1

[jira] [Updated] (TIKA-4379) General updates for 3.1.1

2025-02-03 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4379: -- Affects Version/s: 3.1.0 > General updates for 3.

[jira] [Updated] (TIKA-4379) General updates for 3.1.1

2025-02-03 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4379: -- Component/s: build > General updates for 3.

[jira] [Updated] (TIKA-4326) General updates for 3.1.0

2025-02-03 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4326: -- Component/s: build > General updates for 3.

[jira] [Resolved] (TIKA-4239) Update to 2.9.3

2025-02-03 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr resolved TIKA-4239. --- Fix Version/s: 2.9.3 Resolution: Fixed > Update to 2.

[jira] [Created] (TIKA-4379) General updates for 3.1.1

2025-02-03 Thread Tilman Hausherr (Jira)
Tilman Hausherr created TIKA-4379: - Summary: General updates for 3.1.1 Key: TIKA-4379 URL: https://issues.apache.org/jira/browse/TIKA-4379 Project: Tika Issue Type: Task Reporter

[jira] [Resolved] (TIKA-4326) General updates for 3.1.0

2025-02-03 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr resolved TIKA-4326. --- Fix Version/s: 3.1.0 Resolution: Fixed > General updates for 3.

[jira] [Commented] (TIKA-4378) Tika app throws java.awt.HeadlessException

2025-02-03 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17923208#comment-17923208 ] Tilman Hausherr commented on TIKA-4378: --- This is weird, the main class in

[jira] [Commented] (TIKA-4375) Regression tests for 2.9.3 release

2025-01-31 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17922766#comment-17922766 ] Tilman Hausherr commented on TIKA-4375: --- Wow there are a lot of these: h

[jira] [Commented] (TIKA-4375) Regression tests for 2.9.3 release

2025-01-31 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17922765#comment-17922765 ] Tilman Hausherr commented on TIKA-4375: --- IMHO yes if you're tokenizi

[jira] [Comment Edited] (TIKA-4375) Regression tests for 2.9.3 release

2025-01-31 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17922759#comment-17922759 ] Tilman Hausherr edited comment on TIKA-4375 at 1/31/25 4:1

[jira] [Commented] (TIKA-4375) Regression tests for 2.9.3 release

2025-01-31 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17922759#comment-17922759 ] Tilman Hausherr commented on TIKA-4375: --- Although I wrote elsewhere not to bo

[jira] [Updated] (TIKA-4375) Regression tests for 2.9.3 release

2025-01-31 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4375: -- Attachment: LTWA2JGVJGJ5RVKHTUX6SDS4NTL5UJVQ-p139.pdf > Regression tests for 2.9.3 rele

[jira] [Commented] (TIKA-4375) Regression tests for 2.9.3 release

2025-01-31 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17922750#comment-17922750 ] Tilman Hausherr commented on TIKA-4375: --- Re stl, I looked at one of these files

[jira] [Commented] (TIKA-4375) Regression tests for 2.9.3 release

2025-01-31 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17922749#comment-17922749 ] Tilman Hausherr commented on TIKA-4375: --- Or I because I did the merge but

Re: [VOTE] Release Apache Tika 3.1.0 Candidate #1

2025-01-30 Thread Tilman Hausherr
+1 Successful build on Oracle JDK 11.0.21 on Windows Tilman On 28.01.2025 21:11, Tim Allison wrote: A candidate for the Tika 3.1.0 release is available at: https://dist.apache.org/repos/dist/dev/tika/3.1.0 The release candidate is a zip archive of the sources in: https://github.com/apache/tik

[jira] [Commented] (TIKA-4373) Regression tests for 3.1.0 release

2025-01-30 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17922306#comment-17922306 ] Tilman Hausherr commented on TIKA-4373: --- I found some huge differences with

[jira] [Comment Edited] (TIKA-4373) Regression tests for 3.1.0 release

2025-01-28 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17921763#comment-17921763 ] Tilman Hausherr edited comment on TIKA-4373 at 1/28/25 3:0

[jira] [Commented] (TIKA-4373) Regression tests for 3.1.0 release

2025-01-28 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17921763#comment-17921763 ] Tilman Hausherr commented on TIKA-4373: --- I found only one json file, whic

[jira] [Updated] (TIKA-4373) Regression tests for 3.1.0 release

2025-01-28 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4373: -- Attachment: filter_md5_suc_url.json > Regression tests for 3.1.0 rele

[jira] [Commented] (TIKA-4373) Regression tests for 3.1.0 release

2025-01-28 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17921756#comment-17921756 ] Tilman Hausherr commented on TIKA-4373: --- [^S53SZFZ2FBOZIVTX3HVP4D4XKHKPEMQQ

[jira] [Updated] (TIKA-4373) Regression tests for 3.1.0 release

2025-01-28 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4373: -- Attachment: S53SZFZ2FBOZIVTX3HVP4D4XKHKPEMQQ.csv > Regression tests for 3.1.0 rele

Re: Bodycontenthandler output issues

2025-01-24 Thread Tilman Hausherr
Hello, You're on the wrong mailing list, please post this on the users mailing list (don't forget to subscribe). Also upload the file to a sharehoster. And read this: https://cwiki.apache.org/confluence/display/tika/Troubleshooting%20Tika#TroubleshootingTika-PDFTextProblems Tilman On 24.01.202

Re: Release schedule for 2.x and 3.x?

2025-01-23 Thread Tilman Hausherr
Hi, No opinion re release schedule but a comment on the PDFBox update: tl;dr: ignore the PDF differences this time. The new version includes the /ActualText support: https://issues.apache.org/jira/browse/PDFBOX-5868 It is always enabled. In most cases the extraction is better. But sometimes c

[jira] [Resolved] (TIKA-2342) Broken words

2025-01-23 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr resolved TIKA-2342. --- Resolution: Fixed > Broken words > > > Ke

[jira] [Commented] (TIKA-4239) Update to 2.9.3

2025-01-23 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17916583#comment-17916583 ] Tilman Hausherr commented on TIKA-4239: --- Maybe that website is only for ap

[jira] [Closed] (TIKA-4369) Pages extracted twice

2025-01-18 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr closed TIKA-4369. - Resolution: Not A Problem Yes > Pages extracted tw

[jira] [Commented] (TIKA-4369) Pages extracted twice

2025-01-17 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17914187#comment-17914187 ] Tilman Hausherr commented on TIKA-4369: --- Oh, I should have found this. It rem

[jira] [Created] (TIKA-4369) Pages extracted twice

2025-01-16 Thread Tilman Hausherr (Jira)
Tilman Hausherr created TIKA-4369: - Summary: Pages extracted twice Key: TIKA-4369 URL: https://issues.apache.org/jira/browse/TIKA-4369 Project: Tika Issue Type: Bug Components

[jira] [Commented] (TIKA-4366) Upgrade to POI 5.4.0

2025-01-16 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17913653#comment-17913653 ] Tilman Hausherr commented on TIKA-4366: --- Although I'm not involved in

[jira] [Commented] (TIKA-4366) Upgrade to POI 5.4.0

2025-01-16 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17913616#comment-17913616 ] Tilman Hausherr commented on TIKA-4366: --- What would need to be done? > Upg

[jira] [Comment Edited] (TIKA-4363) Duplicate text when OCR and extractMarkedContent (PDFParserConfig) enabled

2025-01-15 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17913168#comment-17913168 ] Tilman Hausherr edited comment on TIKA-4363 at 1/15/25 11:1

[jira] [Comment Edited] (TIKA-4363) Duplicate text when OCR and extractMarkedContent (PDFParserConfig) enabled

2025-01-15 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17913168#comment-17913168 ] Tilman Hausherr edited comment on TIKA-4363 at 1/15/25 11:0

[jira] [Comment Edited] (TIKA-4363) Duplicate text when OCR and extractMarkedContent (PDFParserConfig) enabled

2025-01-14 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17913168#comment-17913168 ] Tilman Hausherr edited comment on TIKA-4363 at 1/15/25 5:3

[jira] [Commented] (TIKA-4363) Duplicate text when OCR and extractMarkedContent (PDFParserConfig) enabled

2025-01-14 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17913168#comment-17913168 ] Tilman Hausherr commented on TIKA-4363: --- Maybe I misunderstood the ques

[jira] [Commented] (TIKA-3932) New repeatable test failures on Solr integration tests for Solr 6 on macosx aarch

2025-01-14 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17913164#comment-17913164 ] Tilman Hausherr commented on TIKA-3932: --- [~tallison] does this still happen a

[jira] [Commented] (TIKA-4327) General updates for 4.0.0

2025-01-14 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17913120#comment-17913120 ] Tilman Hausherr commented on TIKA-4327: --- ERROR [main] 01:52:17,448 tc.confluen

[jira] [Commented] (TIKA-4327) General updates for 4.0.0

2025-01-14 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17913069#comment-17913069 ] Tilman Hausherr commented on TIKA-4327: --- I got this: Failed to verify that i

[jira] [Comment Edited] (TIKA-4303) Unable to extract Chinese content in onenote

2025-01-13 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17912556#comment-17912556 ] Tilman Hausherr edited comment on TIKA-4303 at 1/13/25 4:3

[jira] [Commented] (TIKA-4239) Update to 2.9.3

2025-01-06 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17910301#comment-17910301 ] Tilman Hausherr commented on TIKA-4239: --- Good news: the logback folks did relea

[jira] [Comment Edited] (TIKA-2342) Broken words

2025-01-02 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15992739#comment-15992739 ] Tilman Hausherr edited comment on TIKA-2342 at 1/2/25 8:0

[jira] [Resolved] (TIKA-3142) Update Jenkins for main branch, maybe turn on more modern jdks

2024-12-29 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr resolved TIKA-3142. --- Resolution: Fixed Likely done long ago, and we have builds for several jdks > Update Jenk

[jira] [Resolved] (TIKA-3181) Upgrade to PDFBox 2.0.21

2024-12-28 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr resolved TIKA-3181. --- Fix Version/s: 1.25 Resolution: Fixed > Upgrade to PDFBox 2.0

[jira] [Resolved] (TIKA-3119) General upgrades for 1.25

2024-12-28 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr resolved TIKA-3119. --- Fix Version/s: 1.25 Resolution: Fixed > General upgrades for 1

[jira] [Resolved] (TIKA-3059) New NPE in ImageGraphicsEngine

2024-12-28 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr resolved TIKA-3059. --- Fix Version/s: 1.24 2.0.0 Resolution: Fixed Fixed long ago >

[jira] [Commented] (TIKA-4239) Update to 2.9.3

2024-12-21 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17907656#comment-17907656 ] Tilman Hausherr commented on TIKA-4239: --- The next build will likely fail becaus

[jira] [Commented] (TIKA-2342) Broken words

2024-12-17 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17906492#comment-17906492 ] Tilman Hausherr commented on TIKA-2342: --- I haven't made the changes to 2.

[jira] [Updated] (TIKA-2342) Broken words

2024-12-17 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-2342: -- Fix Version/s: (was: 2.9.3) > Broken words > > >

[jira] [Updated] (TIKA-2342) Broken words

2024-12-17 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-2342: -- Fix Version/s: 2.9.3 3.0.1 4.0.0 > Broken wo

[jira] [Commented] (TIKA-2342) Broken words

2024-12-17 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17906335#comment-17906335 ] Tilman Hausherr commented on TIKA-2342: --- Reopened to add the new option >

[jira] [Updated] (TIKA-2342) Broken words

2024-12-17 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-2342: -- Component/s: parser > Broken words > > > Ke

[jira] [Reopened] (TIKA-2342) Broken words

2024-12-17 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr reopened TIKA-2342: --- Assignee: Tilman Hausherr > Broken words > > > Ke

[jira] [Commented] (TIKA-4327) General updates for 4.0.0

2024-12-14 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17905747#comment-17905747 ] Tilman Hausherr commented on TIKA-4327: --- It works now that all has been reve

[jira] [Commented] (TIKA-4363) Duplicate text when OCR and extractMarkedContent (PDFParserConfig) enabled

2024-12-13 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17905507#comment-17905507 ] Tilman Hausherr commented on TIKA-4363: --- getCurrentPageNo() > Duplicate te

[jira] [Commented] (TIKA-4327) General updates for 4.0.0

2024-12-08 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17903921#comment-17903921 ] Tilman Hausherr commented on TIKA-4327: --- It didn't work. I've n

[jira] [Commented] (TIKA-4327) General updates for 4.0.0

2024-12-07 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17903832#comment-17903832 ] Tilman Hausherr commented on TIKA-4327: --- I've added a mail notification

[jira] [Commented] (TIKA-4239) Update to 2.9.3

2024-11-26 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17901247#comment-17901247 ] Tilman Hausherr commented on TIKA-4239: --- Another possible cause: 2.0 uses

[jira] [Commented] (TIKA-4239) Update to 2.9.3

2024-11-25 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17901084#comment-17901084 ] Tilman Hausherr commented on TIKA-4239: --- I did both and it didn't help.

[jira] [Commented] (TIKA-4239) Update to 2.9.3

2024-11-25 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17900995#comment-17900995 ] Tilman Hausherr commented on TIKA-4239: --- Same here. It never fails on my win

[jira] [Commented] (TIKA-4348) Downgrade log4j2 2.24.1 for now

2024-11-25 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17900900#comment-17900900 ] Tilman Hausherr commented on TIKA-4348: --- Can we update to 2.24.2? The two gi

[jira] [Comment Edited] (TIKA-4337) Improvements to recent xps mods

2024-10-31 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17894660#comment-17894660 ] Tilman Hausherr edited comment on TIKA-4337 at 10/31/24 5:0

[jira] [Commented] (TIKA-4337) Improvements to recent xps mods

2024-10-31 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17894660#comment-17894660 ] Tilman Hausherr commented on TIKA-4337: --- During the build we download files if

[jira] [Commented] (TIKA-4322) Create branch_3x and update main to 4.0.0-SNAPSHOT and Java 17

2024-10-22 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17891795#comment-17891795 ] Tilman Hausherr commented on TIKA-4322: --- I can confirm that it works with j

[jira] [Commented] (TIKA-4322) Create branch_3x and update main to 4.0.0-SNAPSHOT and Java 17

2024-10-21 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17891577#comment-17891577 ] Tilman Hausherr commented on TIKA-4322: --- I tried updating jetty and solr to cur

[jira] [Commented] (TIKA-4326) General updates for 3.0.1

2024-10-21 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17891491#comment-17891491 ] Tilman Hausherr commented on TIKA-4326: --- I've modified the jdk22 build t

[jira] [Commented] (TIKA-4322) Create branch_3x and update main to 4.0.0-SNAPSHOT and Java 17

2024-10-21 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17891483#comment-17891483 ] Tilman Hausherr commented on TIKA-4322: --- I've modified the jdk22 build t

[jira] [Created] (TIKA-4326) General updates for 3.0.1

2024-10-20 Thread Tilman Hausherr (Jira)
Tilman Hausherr created TIKA-4326: - Summary: General updates for 3.0.1 Key: TIKA-4326 URL: https://issues.apache.org/jira/browse/TIKA-4326 Project: Tika Issue Type: Task Reporter

[jira] [Resolved] (TIKA-4118) General updates for 2.9.1

2024-10-20 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr resolved TIKA-4118. --- Fix Version/s: 2.9.1 Resolution: Fixed > General updates for 2.

[jira] [Resolved] (TIKA-4123) Update to 2.9.1

2024-10-20 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr resolved TIKA-4123. --- Fix Version/s: 2.9.1 Resolution: Fixed > Update to 2.

[jira] [Created] (TIKA-4327) General updates for 4.0.0

2024-10-20 Thread Tilman Hausherr (Jira)
Tilman Hausherr created TIKA-4327: - Summary: General updates for 4.0.0 Key: TIKA-4327 URL: https://issues.apache.org/jira/browse/TIKA-4327 Project: Tika Issue Type: Task Reporter

[jira] [Closed] (TIKA-1907) Big Pdf parsing to text - Out of memory

2024-10-20 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr closed TIKA-1907. - > Big Pdf parsing to text - Out of mem

[jira] [Resolved] (TIKA-1907) Big Pdf parsing to text - Out of memory

2024-10-20 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr resolved TIKA-1907. --- Fix Version/s: 3.0.0 (was: 3.0.1) Assignee: Tilman Hausherr

[jira] [Closed] (TIKA-4298) Failed to detect charset for zip entry with short non-Unicode file name

2024-10-19 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr closed TIKA-4298. - > Failed to detect charset for zip entry with short non-Unicode file n

[jira] [Updated] (TIKA-4298) Failed to detect charset for zip entry with short non-Unicode file name

2024-10-19 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4298: -- Affects Version/s: 2.9.2 > Failed to detect charset for zip entry with short non-Unicode f

[jira] [Resolved] (TIKA-4298) Failed to detect charset for zip entry with short non-Unicode file name

2024-10-19 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr resolved TIKA-4298. --- Fix Version/s: 3.0.0 (was: 3.0.1) Assignee: Tilman Hausherr

[jira] [Comment Edited] (TIKA-4239) Update to 2.9.3

2024-10-19 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17891154#comment-17891154 ] Tilman Hausherr edited comment on TIKA-4239 at 10/19/24 1:4

[jira] [Commented] (TIKA-4239) Update to 2.9.3

2024-10-19 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17891154#comment-17891154 ] Tilman Hausherr commented on TIKA-4239: --- This is because of jetty 9.4.56.v2024

[jira] [Resolved] (TIKA-4166) dependency updates for Tika 3.0

2024-10-16 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr resolved TIKA-4166. --- Fix Version/s: 3.0.0 (was: 3.0.0-BETA) Resolution: Fixed

[jira] [Resolved] (TIKA-4278) TextAndCSVParser doesn't detect semicolon separated file

2024-10-16 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr resolved TIKA-4278. --- Resolution: Fixed > TextAndCSVParser doesn't detect semicolon separa

Re: [VOTE] Release Apache Tika 3.0.0 Candidate #1

2024-10-16 Thread Tilman Hausherr
+1 Tilman On 16.10.2024 13:24, Tim Allison wrote: A candidate for the Tika 3.0.0 release is available at: https://dist.apache.org/repos/dist/dev/tika/3.0.0 The release candidate is a zip archive of the sources in: https://github.com/apache/tika/tree/3.0.0-rc1/ The SHA-512 checksum of the arch

Re: 3.0.0 release?

2024-10-15 Thread Tilman Hausherr
On 08.10.2024 16:05, Tim Allison wrote: I realize that even dependency maintenance on three concurrent branches will be burdensome. Perhaps we fall back to "update dependencies before a release and before the regression tests" at least on the 2.x and 3.x branches? It wasn't burdensome for me, I'

[jira] [Assigned] (TIKA-4317) Abusive content on https://corpora.tika.apache.org/

2024-10-14 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr reassigned TIKA-4317: - Assignee: Tim Allison > Abusive content on https://corpora.tika.apache.

[jira] [Commented] (TIKA-4278) TextAndCSVParser doesn't detect semicolon separated file

2024-10-11 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17888662#comment-17888662 ] Tilman Hausherr commented on TIKA-4278: --- new test result with the latest cha

[jira] [Updated] (TIKA-4278) TextAndCSVParser doesn't detect semicolon separated file

2024-10-11 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4278: -- Attachment: reports_csv_3.0.0_vs_3.0.0_new_withcolon.tar.xz > TextAndCSVParser doesn'

[jira] [Comment Edited] (TIKA-4278) TextAndCSVParser doesn't detect semicolon separated file

2024-10-11 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17888651#comment-17888651 ] Tilman Hausherr edited comment on TIKA-4278 at 10/11/24 1:1

[jira] [Comment Edited] (TIKA-4278) TextAndCSVParser doesn't detect semicolon separated file

2024-10-11 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17888651#comment-17888651 ] Tilman Hausherr edited comment on TIKA-4278 at 10/11/24 1:1

[jira] [Commented] (TIKA-4278) TextAndCSVParser doesn't detect semicolon separated file

2024-10-11 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17888651#comment-17888651 ] Tilman Hausherr commented on TIKA-4278: --- Here's the te

[jira] [Updated] (TIKA-4278) TextAndCSVParser doesn't detect semicolon separated file

2024-10-11 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4278: -- Attachment: reports_csv_3.0.0_vs_3.0.0_nocolon.tar.xz > TextAndCSVParser doesn'

[jira] [Comment Edited] (TIKA-4278) TextAndCSVParser doesn't detect semicolon separated file

2024-10-10 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17888326#comment-17888326 ] Tilman Hausherr edited comment on TIKA-4278 at 10/10/24 3:4
