[jira] [Updated] (PDFBOX-5868) PDFBox not extracting text of non-latin languages(tamil, bengali) properly but adobe reader's save as text does

2024-08-16 Thread Manish S N (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manish S N updated PDFBOX-5868: --- Attachment: poppler_out.txt > PDFBox not extracting text of non-latin languages(tamil, bengali) prop

[jira] [Commented] (PDFBOX-5868) PDFBox not extracting text of non-latin languages(tamil, bengali) properly but adobe reader's save as text does

2024-08-16 Thread Manish S N (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17874136#comment-17874136 ] Manish S N commented on PDFBOX-5868: did some additional digging and found that popp

[jira] [Commented] (PDFBOX-5868) PDFBox not extracting text of non-latin languages(tamil, bengali) properly but adobe reader's save as text does

2024-08-16 Thread Jira
[ https://issues.apache.org/jira/browse/PDFBOX-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17874166#comment-17874166 ] Andreas Lehmkühler commented on PDFBOX-5868: {quote}if poppler can do it, it

[jira] [Commented] (PDFBOX-5868) PDFBox not extracting text of non-latin languages(tamil, bengali) properly but adobe reader's save as text does

2024-08-16 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17874189#comment-17874189 ] Tilman Hausherr commented on PDFBOX-5868: - [~manish003] So you're appealing to o

[jira] [Commented] (PDFBOX-5868) PDFBox not extracting text of non-latin languages(tamil, bengali) properly but adobe reader's save as text does

2024-08-16 Thread Manish S N (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17874188#comment-17874188 ] Manish S N commented on PDFBOX-5868: ok. if i am willing to invest some time and eff

[jira] [Comment Edited] (PDFBOX-5868) PDFBox not extracting text of non-latin languages(tamil, bengali) properly but adobe reader's save as text does

2024-08-16 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17874189#comment-17874189 ] Tilman Hausherr edited comment on PDFBOX-5868 at 8/16/24 10:59 AM: ---

[jira] [Commented] (PDFBOX-5868) PDFBox not extracting text of non-latin languages(tamil, bengali) properly but adobe reader's save as text does

2024-08-16 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17874201#comment-17874201 ] Tilman Hausherr commented on PDFBOX-5868: - Ideally, the changes I just posted so

[jira] [Commented] (PDFBOX-5868) PDFBox not extracting text of non-latin languages(tamil, bengali) properly but adobe reader's save as text does

2024-08-16 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17874222#comment-17874222 ] Tilman Hausherr commented on PDFBOX-5868: - The code has one flaw, that it doesn'

[jira] [Comment Edited] (PDFBOX-5868) PDFBox not extracting text of non-latin languages(tamil, bengali) properly but adobe reader's save as text does

2024-08-16 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17874189#comment-17874189 ] Tilman Hausherr edited comment on PDFBOX-5868 at 8/16/24 12:31 PM: ---

[jira] [Updated] (PDFBOX-5868) PDFBox not extracting text of non-latin languages(tamil, bengali) properly but adobe reader's save as text does

2024-08-16 Thread Manish S N (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manish S N updated PDFBOX-5868: --- Attachment: Main.java > PDFBox not extracting text of non-latin languages(tamil, bengali) properly

[jira] [Updated] (PDFBOX-5868) PDFBox not extracting text of non-latin languages(tamil, bengali) properly but adobe reader's save as text does

2024-08-16 Thread Manish S N (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manish S N updated PDFBOX-5868: --- Attachment: Tilman's_solution_out.txt > PDFBox not extracting text of non-latin languages(tamil, ben

[jira] [Commented] (PDFBOX-5868) PDFBox not extracting text of non-latin languages(tamil, bengali) properly but adobe reader's save as text does

2024-08-16 Thread Manish S N (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17874261#comment-17874261 ] Manish S N commented on PDFBOX-5868: Thank you so much Mr. Tilman using _Main.java_

[jira] [Commented] (PDFBOX-5868) PDFBox not extracting text of non-latin languages(tamil, bengali) properly but adobe reader's save as text does

2024-08-16 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17874268#comment-17874268 ] Tilman Hausherr commented on PDFBOX-5868: - First I'll need to make the changes I

[jira] [Updated] (PDFBOX-5868) PDFBox not extracting text of non-latin languages(tamil, bengali) properly but adobe reader's save as text does

2024-08-16 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-5868: Attachment: screenshot-2.png > PDFBox not extracting text of non-latin languages(tamil, be

[jira] [Commented] (PDFBOX-5868) PDFBox not extracting text of non-latin languages(tamil, bengali) properly but adobe reader's save as text does

2024-08-16 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17874369#comment-17874369 ] Tilman Hausherr commented on PDFBOX-5868: - There is a problem that I didn't noti

[jira] [Comment Edited] (PDFBOX-5868) PDFBox not extracting text of non-latin languages(tamil, bengali) properly but adobe reader's save as text does

2024-08-16 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17874369#comment-17874369 ] Tilman Hausherr edited comment on PDFBOX-5868 at 8/16/24 7:06 PM:

[jira] [Comment Edited] (PDFBOX-5868) PDFBox not extracting text of non-latin languages(tamil, bengali) properly but adobe reader's save as text does

2024-08-16 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17874369#comment-17874369 ] Tilman Hausherr edited comment on PDFBOX-5868 at 8/17/24 4:12 AM:

[jira] [Comment Edited] (PDFBOX-5868) PDFBox not extracting text of non-latin languages(tamil, bengali) properly but adobe reader's save as text does

2024-08-16 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17874369#comment-17874369 ] Tilman Hausherr edited comment on PDFBOX-5868 at 8/17/24 6:30 AM:

[jira] [Commented] (PDFBOX-5868) PDFBox not extracting text of non-latin languages(tamil, bengali) properly but adobe reader's save as text does

2024-08-16 Thread Manish S N (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17874436#comment-17874436 ] Manish S N commented on PDFBOX-5868: in most of these languages most of th

[jira] [Comment Edited] (PDFBOX-5868) PDFBox not extracting text of non-latin languages(tamil, bengali) properly but adobe reader's save as text does

2024-08-16 Thread Manish S N (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17874436#comment-17874436 ] Manish S N edited comment on PDFBOX-5868 at 8/17/24 6:48 AM: -

[jira] [Comment Edited] (PDFBOX-5868) PDFBox not extracting text of non-latin languages(tamil, bengali) properly but adobe reader's save as text does

2024-08-16 Thread Manish S N (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17874436#comment-17874436 ] Manish S N edited comment on PDFBOX-5868 at 8/17/24 6:49 AM: -

[jira] [Comment Edited] (PDFBOX-5868) PDFBox not extracting text of non-latin languages(tamil, bengali) properly but adobe reader's save as text does

2024-08-16 Thread Manish S N (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17874436#comment-17874436 ] Manish S N edited comment on PDFBOX-5868 at 8/17/24 6:51 AM: -