Re: [PR] [TIKA-4303] Handle OneNotePropertyEnum.CachedTitleString as RichEditTextUnicode [tika]

2025-01-24 Thread via GitHub
sunluman commented on PR #2098: URL: https://github.com/apache/tika/pull/2098#issuecomment-2613720426 @tballison @nddipiazza The unit test is ready. This test only performs a simple check for garbled characters in the title. -- This is an automated message from the Apache Git Service. To

[jira] [Commented] (TIKA-4303) Unable to extract Chinese content in onenote

2025-01-24 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17916881#comment-17916881 ] ASF GitHub Bot commented on TIKA-4303: -- sunluman commented on PR #2098: URL: https://

[jira] [Commented] (TIKA-4303) Unable to extract Chinese content in onenote

2025-01-24 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17916879#comment-17916879 ] ASF GitHub Bot commented on TIKA-4303: -- sunluman commented on PR #2098: URL: https://

Re: [PR] [TIKA-4303] Handle OneNotePropertyEnum.CachedTitleString as RichEditTextUnicode [tika]

2025-01-24 Thread via GitHub
sunluman commented on PR #2098: URL: https://github.com/apache/tika/pull/2098#issuecomment-2613712215 Sorry, I overlooked this unit test. I will add it right away.

Re: Bodycontenthandler output issues

2025-01-24 Thread Tilman Hausherr
Hello, You're on the wrong mailing list, please post this on the users mailing list (don't forget to subscribe). Also upload the file to a sharehoster. And read this: https://cwiki.apache.org/confluence/display/tika/Troubleshooting%20Tika#TroubleshootingTika-PDFTextProblems Tilman On 24.01.202

[jira] [Commented] (TIKA-4303) Unable to extract Chinese content in onenote

2025-01-24 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17916735#comment-17916735 ] ASF GitHub Bot commented on TIKA-4303: -- nddipiazza commented on PR #2098: URL: https:

Re: [PR] [TIKA-4303] Handle OneNotePropertyEnum.CachedTitleString as RichEditTextUnicode [tika]

2025-01-24 Thread via GitHub
nddipiazza commented on PR #2098: URL: https://github.com/apache/tika/pull/2098#issuecomment-2612666198 @sunluman can you produce an example or unit test?

[jira] [Commented] (TIKA-4365) Support Android Bundle aab detection

2025-01-24 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17916733#comment-17916733 ] Hudson commented on TIKA-4365: -- SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk17 #

[jira] [Created] (TIKA-4371) Exclude provided dependencies in xmlbeans that we don't need

2025-01-24 Thread Tim Allison (Jira)
Tim Allison created TIKA-4371: - Summary: Exclude provided dependencies in xmlbeans that we don't need Key: TIKA-4371 URL: https://issues.apache.org/jira/browse/TIKA-4371 Project: Tika Issue Type

[jira] [Commented] (TIKA-4365) Support Android Bundle aab detection

2025-01-24 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17916721#comment-17916721 ] Tim Allison commented on TIKA-4365: --- Sorry. I just pushed my fixes now. Y. I got that.

Bodycontenthandler output issues

2025-01-24 Thread cs m
Dear Sir/Madam, The BodyContentHandler returns blank output for scanned .pdf and .jpg files when used from Java. Could you please look into it and update me? Regards, SrinivasaMurthy Chintalapati

Re: Release of 2.9.3

2025-01-24 Thread TvT
This would be really helpful! Thank you so much. On Fri, Jan 24, 2025 at 12:52 PM Tim Allison wrote: > Thank you. This is helpful. I haven't heard any guidance to the contrary. > We'll plan for a 3.x release process starting late today or next week and > then 2.x. > > On Fri, Jan 24, 2025 at

[jira] [Created] (TIKA-4370) SJIS Encoded Files Can't be Detected

2025-01-24 Thread Subbu (Jira)
Subbu created TIKA-4370: --- Summary: SJIS Encoded Files Can't be Detected Key: TIKA-4370 URL: https://issues.apache.org/jira/browse/TIKA-4370 Project: Tika Issue Type: Bug Reporter: Subbu W

[jira] [Commented] (TIKA-4365) Support Android Bundle aab detection

2025-01-24 Thread Subbu (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17916715#comment-17916715 ] Subbu commented on TIKA-4365: - [~tallison]: I am sorry, yes, the null check in L#80 and

[jira] [Commented] (TIKA-4365) Support Android Bundle aab detection

2025-01-24 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17916710#comment-17916710 ] Tim Allison commented on TIKA-4365: --- In the most recent commit, I fixed checkstyle and m

[jira] [Commented] (TIKA-4365) Support Android Bundle aab detection

2025-01-24 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17916705#comment-17916705 ] ASF GitHub Bot commented on TIKA-4365: -- tballison merged PR #2085: URL: https://githu

Re: [PR] TIKA-4365 : feat : Support Detection for AAB Bundle file [tika]

2025-01-24 Thread via GitHub
tballison merged PR #2085: URL: https://github.com/apache/tika/pull/2085

[jira] [Commented] (TIKA-4365) Support Android Bundle aab detection

2025-01-24 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17916704#comment-17916704 ] ASF GitHub Bot commented on TIKA-4365: -- tballison commented on PR #2085: URL: https:/

Re: [PR] TIKA-4365 : feat : Support Detection for AAB Bundle file [tika]

2025-01-24 Thread via GitHub
tballison commented on PR #2085: URL: https://github.com/apache/tika/pull/2085#issuecomment-2612396962 I'll merge this as soon as the checks pass. On a follow-on PR, if you have a chance, would you be able to create/find an ASL 2.0-compatible test file for a unit test?

[jira] [Commented] (TIKA-4365) Support Android Bundle aab detection

2025-01-24 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17916698#comment-17916698 ] ASF GitHub Bot commented on TIKA-4365: -- tballison commented on PR #2085: URL: https:/

[jira] [Commented] (TIKA-4365) Support Android Bundle aab detection

2025-01-24 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17916700#comment-17916700 ] ASF GitHub Bot commented on TIKA-4365: -- subbudvk commented on PR #2085: URL: https://

Re: [PR] TIKA-4365 : feat : Support Detection for AAB Bundle file [tika]

2025-01-24 Thread via GitHub
subbudvk commented on PR #2085: URL: https://github.com/apache/tika/pull/2085#issuecomment-2612380253 @tballison you are right. The Jar detector decides content based on the presence of MANIFEST.MF. The android package bundle will also have MANIFEST.MF, so to guarantee the order and JarDet

[jira] [Commented] (TIKA-4303) Unable to extract Chinese content in onenote

2025-01-24 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17916699#comment-17916699 ] ASF GitHub Bot commented on TIKA-4303: -- tballison commented on PR #2098: URL: https:/

Re: [PR] [TIKA-4303] Handle OneNotePropertyEnum.CachedTitleString as RichEditTextUnicode [tika]

2025-01-24 Thread via GitHub
tballison commented on PR #2098: URL: https://github.com/apache/tika/pull/2098#issuecomment-2612370122 @sunluman is this ready to go?

Re: [PR] TIKA-4365 : feat : Support Detection for AAB Bundle file [tika]

2025-01-24 Thread via GitHub
tballison commented on PR #2085: URL: https://github.com/apache/tika/pull/2085#issuecomment-2612368867 I'm sorry for my delay on this. Is there a reason to put the android logic in the jar detector instead of a separate detector? The good thing about the current PR is that it guarantees the

Re: Release schedule for 2.x and 3.x?

2025-01-24 Thread Tim Allison
This is very helpful. Thank you, Tilman! On Fri, Jan 24, 2025 at 2:25 AM Tilman Hausherr wrote: > Hi, > > No opinion re release schedule but a comment on the PDFBox update: > > tl;dr: ignore the PDF differences this time. > > The new version includes the /ActualText support: > https://issues.apa

Re: Release of 2.9.3

2025-01-24 Thread Tim Allison
Thank you. This is helpful. I haven't heard any guidance to the contrary. We'll plan for a 3.x release process starting late today or next week and then 2.x. On Fri, Jan 24, 2025 at 5:14 AM TvT wrote: > Since I can't post here ( > https://lists.apache.org/thread/99px6tjcrd1lsvcpv2jyxqqmq7xxgs5f)

Release of 2.9.3

2025-01-24 Thread TvT
Since I can't post here ( https://lists.apache.org/thread/99px6tjcrd1lsvcpv2jyxqqmq7xxgs5f) I was told to write via mailing list. It would be highly appreciated if it would be possible to create a 2.9.3 release since I am (eagerly) waiting for a fix I submitted (which is part of the version). I hav