[
https://issues.apache.org/jira/browse/TIKA-1300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045119#comment-14045119
]
Tilman Hausherr commented on TIKA-1300:
---
My impression was that the NSP had better re
[
https://issues.apache.org/jira/browse/TIKA-1300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045119#comment-14045119
]
Tilman Hausherr edited comment on TIKA-1300 at 6/26/14 9:08 PM:
-
[
https://issues.apache.org/jira/browse/TIKA-1300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045119#comment-14045119
]
Tilman Hausherr edited comment on TIKA-1300 at 6/27/14 6:18 AM:
-
[
https://issues.apache.org/jira/browse/TIKA-1300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046288#comment-14046288
]
Tilman Hausherr commented on TIKA-1300:
---
I'm not doing much with text extraction, but
[
https://issues.apache.org/jira/browse/TIKA-1300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046728#comment-14046728
]
Tilman Hausherr commented on TIKA-1300:
---
I had a look at most of the files. This resu
[
https://issues.apache.org/jira/browse/TIKA-1300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046896#comment-14046896
]
Tilman Hausherr commented on TIKA-1300:
---
[~talli...@mitre.org] are there any "rules"
[
https://issues.apache.org/jira/browse/TIKA-1300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047095#comment-14047095
]
Tilman Hausherr commented on TIKA-1300:
---
{quote}
Make sure to delete handful of infec
Tilman Hausherr created TIKA-1372:
-
Summary: PDCheckbox NPE
Key: TIKA-1372
URL: https://issues.apache.org/jira/browse/TIKA-1372
Project: Tika
Issue Type: Bug
Reporter: Tilman Haus
[
https://issues.apache.org/jira/browse/TIKA-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070847#comment-14070847
]
Tilman Hausherr commented on TIKA-1372:
---
IMHO the cause is TIKA not doing some null c
[
https://issues.apache.org/jira/browse/TIKA-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14145061#comment-14145061
]
Tilman Hausherr commented on TIKA-1419:
---
Thanks for making these tests. Would it be p
[
https://issues.apache.org/jira/browse/TIKA-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14145399#comment-14145399
]
Tilman Hausherr commented on TIKA-1419:
---
Maybe you could create a project for GSoC201
[
https://issues.apache.org/jira/browse/TIKA-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr updated TIKA-1419:
--
Attachment: compare_Tika-trunk-1.7_w_PDFBox1.8.6Vs.1.8.7.xlsx
Here's an excel file, on the new co
[
https://issues.apache.org/jira/browse/TIKA-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152855#comment-14152855
]
Tilman Hausherr commented on TIKA-1419:
---
Compare PDFBox's trunk against 1.8.x periodi
[
https://issues.apache.org/jira/browse/TIKA-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14165652#comment-14165652
]
Tilman Hausherr commented on TIKA-1427:
---
The first image ("Im1") is painted with "q 4
[
https://issues.apache.org/jira/browse/TIKA-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr updated TIKA-1419:
--
Attachment: pdfbox_1_8_6V1_8_8-SNAPSHOT.xlsx
Thank you [~talli...@apache.org], here's the result
[
https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14167194#comment-14167194
]
Tilman Hausherr commented on TIKA-1442:
---
Do you want the junk list in some format? Ju
[
https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14172978#comment-14172978
]
Tilman Hausherr commented on TIKA-1442:
---
files that have only junk as text with AR:
[
https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr updated TIKA-1442:
--
Attachment: pdfbox_1_8_6V1_8_8-SNAPSHOT.xlsx
> Upgrade to PDFBox 1.8.8
> ---
[
https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr updated TIKA-1442:
--
Attachment: (was: pdfbox_1_8_6V1_8_8-SNAPSHOT.xlsx)
> Upgrade to PDFBox 1.8.8
> -
[
https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173983#comment-14173983
]
Tilman Hausherr commented on TIKA-1442:
---
After some more research, I was able to deco
[
https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr updated TIKA-1442:
--
Attachment: pdfbox_1_8_6V1_8_8-SNAPSHOT.xlsx
> Upgrade to PDFBox 1.8.8
> ---
[
https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180302#comment-14180302
]
Tilman Hausherr commented on TIKA-1442:
---
{quote}
and recommend other statistics that
[
https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180440#comment-14180440
]
Tilman Hausherr commented on TIKA-1442:
---
What's also missing this time is the token co
[
https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180302#comment-14180302
]
Tilman Hausherr edited comment on TIKA-1442 at 10/22/14 8:06 PM:
[
https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180446#comment-14180446
]
Tilman Hausherr commented on TIKA-1442:
---
Sorry, ignore my text re: 1st line only. It'
[
https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180469#comment-14180469
]
Tilman Hausherr commented on TIKA-1442:
---
{quote}
Should I add token count?
{quote}
Y
[
https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180636#comment-14180636
]
Tilman Hausherr commented on TIKA-1442:
---
Which are the top10words? I ask because 554/
[
https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180687#comment-14180687
]
Tilman Hausherr commented on TIKA-1442:
---
Or does the top10words mean how many stop wo
[
https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14181779#comment-14181779
]
Tilman Hausherr commented on TIKA-1442:
---
Thanks!
I'm slowly starting, and here's the
[
https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14181779#comment-14181779
]
Tilman Hausherr edited comment on TIKA-1442 at 10/23/14 7:31 PM:
[
https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14181813#comment-14181813
]
Tilman Hausherr commented on TIKA-1442:
---
The directory structure isn't a problem for
[
https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr updated TIKA-1442:
--
Attachment: pdfbox_1_8_6V1_8_8-SNAPSHOTc.zip
I'm done now; the result is two new issues, PDFBOX-2
[
https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14182047#comment-14182047
]
Tilman Hausherr commented on TIKA-1442:
---
A few files have less metadata than before:
[
https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173983#comment-14173983
]
Tilman Hausherr edited comment on TIKA-1442 at 10/24/14 11:02 AM:
---
[
https://issues.apache.org/jira/browse/TIKA-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202456#comment-14202456
]
Tilman Hausherr commented on TIKA-1467:
---
The old and the new parser have different ap
[
https://issues.apache.org/jira/browse/TIKA-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202456#comment-14202456
]
Tilman Hausherr edited comment on TIKA-1467 at 11/7/14 10:22 PM:
[
https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225008#comment-14225008
]
Tilman Hausherr commented on TIKA-1442:
---
Thanks Tim!
892848.pdf and 892859.pdf shoul
[
https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225008#comment-14225008
]
Tilman Hausherr edited comment on TIKA-1442 at 11/25/14 8:38 PM:
[
https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr updated TIKA-1442:
--
Attachment: PDFBox_1_8_6VPDFBox_1_8_8-b145.zip
> Upgrade to PDFBox 1.8.8
> --
[
https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225283#comment-14225283
]
Tilman Hausherr commented on TIKA-1442:
---
[~talli...@apache.org] I'm really wondering
[
https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225008#comment-14225008
]
Tilman Hausherr edited comment on TIKA-1442 at 11/25/14 10:08 PM:
---
[
https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225008#comment-14225008
]
Tilman Hausherr edited comment on TIKA-1442 at 11/25/14 11:08 PM:
---
[
https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225867#comment-14225867
]
Tilman Hausherr commented on TIKA-1442:
---
Ok, will do.
About the seq vs. nonSeq test:
Tilman Hausherr created TIKA-1489:
-
Summary: PDF Text extraction without permission
Key: TIKA-1489
URL: https://issues.apache.org/jira/browse/TIKA-1489
Project: Tika
Issue Type: Bug
Affec
[
https://issues.apache.org/jira/browse/TIKA-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226500#comment-14226500
]
Tilman Hausherr commented on TIKA-1489:
---
No, permissions are connected to encryption.
[
https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr updated TIKA-1442:
--
Attachment: PDFBox_1_8_8-ClassicVPDFBox_1_8_8-NonSeq.xlsx
Here's my evaluation of the test. I was
[
https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr updated TIKA-1442:
--
Attachment: PDFBox_1_8_8-ClassicVPDFBox_1_8_8-NonSeq.xlsx
> Upgrade to PDFBox 1.8.8
> ---
[
https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr updated TIKA-1442:
--
Attachment: (was: PDFBox_1_8_8-ClassicVPDFBox_1_8_8-NonSeq.xlsx)
> Upgrade to PDFBox 1.8.8
>
[
https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14228968#comment-14228968
]
Tilman Hausherr edited comment on TIKA-1442 at 11/30/14 10:49 PM:
---
[
https://issues.apache.org/jira/browse/TIKA-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230193#comment-14230193
]
Tilman Hausherr commented on TIKA-1489:
---
[~talli...@mitre.org] I can't tell you what
[
https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230589#comment-14230589
]
Tilman Hausherr commented on TIKA-1442:
---
Weird thing in the 1.8.6 vs 1.8.8 test: acco
[
https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230589#comment-14230589
]
Tilman Hausherr edited comment on TIKA-1442 at 12/1/14 10:44 PM:
[
https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230589#comment-14230589
]
Tilman Hausherr edited comment on TIKA-1442 at 12/1/14 10:49 PM:
[
https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr updated TIKA-1442:
--
Attachment: PDFBox_1_8_8-CLASSICVPDFBox_1_8_8-NONSEQ-b162.xlsx
Thanks... one problem in both exce
[
https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr updated TIKA-1442:
--
Attachment: PDFBox_1_8_6VPDFBox_1_8_8-CLASSIC-b162.xlsx
I've now looked at the 1.8.6 vs 1.8.8 fil
[
https://issues.apache.org/jira/browse/TIKA-1548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14316723#comment-14316723
]
Tilman Hausherr commented on TIKA-1548:
---
Sorry, no. We're not setting that one. It is
[
https://issues.apache.org/jira/browse/TIKA-1038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347377#comment-14347377
]
Tilman Hausherr commented on TIKA-1038:
---
[~talli...@mitre.org] are you watching this o
[
https://issues.apache.org/jira/browse/TIKA-1038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347377#comment-14347377
]
Tilman Hausherr edited comment on TIKA-1038 at 3/4/15 6:59 PM:
--
[
https://issues.apache.org/jira/browse/TIKA-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14362365#comment-14362365
]
Tilman Hausherr commented on TIKA-1575:
---
{code}
b) might be actual modest regressions
[
https://issues.apache.org/jira/browse/TIKA-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14362406#comment-14362406
]
Tilman Hausherr commented on TIKA-1575:
---
[~talli...@apache.org] please repeat the who
[
https://issues.apache.org/jira/browse/TIKA-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14362552#comment-14362552
]
Tilman Hausherr commented on TIKA-1174:
---
Can't comment, I'm not that good with font i
[
https://issues.apache.org/jira/browse/TIKA-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14363061#comment-14363061
]
Tilman Hausherr commented on TIKA-1575:
---
Yes!
> Upgrade to PDFBox 1.8.9 when availab
[
https://issues.apache.org/jira/browse/TIKA-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14364710#comment-14364710
]
Tilman Hausherr commented on TIKA-1575:
---
Could you attach the TIKA output you get wit
[
https://issues.apache.org/jira/browse/TIKA-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14365524#comment-14365524
]
Tilman Hausherr commented on TIKA-1575:
---
I can't understand how you get the extracted
[
https://issues.apache.org/jira/browse/TIKA-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14365807#comment-14365807
]
Tilman Hausherr commented on TIKA-1575:
---
Can't tell, I don't know much about the stru
[
https://issues.apache.org/jira/browse/TIKA-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14365829#comment-14365829
]
Tilman Hausherr commented on TIKA-1575:
---
Thanks. Re: OCR, you should know that there
[
https://issues.apache.org/jira/browse/TIKA-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14368686#comment-14368686
]
Tilman Hausherr commented on TIKA-1575:
---
With the pure ExtractText, all is identical.
[
https://issues.apache.org/jira/browse/TIKA-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14368687#comment-14368687
]
Tilman Hausherr commented on TIKA-1575:
---
With the pure ExtractText, all is identical.
[
https://issues.apache.org/jira/browse/TIKA-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr updated TIKA-1575:
--
Comment: was deleted
(was: With the pure ExtractText, all is identical. Could you attach the file
[
https://issues.apache.org/jira/browse/TIKA-1588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628890#comment-14628890
]
Tilman Hausherr commented on TIKA-1588:
---
The weird thing is that I can't find any dif
[
https://issues.apache.org/jira/browse/TIKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632429#comment-14632429
]
Tilman Hausherr commented on TIKA-1678:
---
I think this is two bytes. I.e. a 0x0 and a
[
https://issues.apache.org/jira/browse/TIKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632432#comment-14632432
]
Tilman Hausherr commented on TIKA-1678:
---
I get correct output for the non-XMP stuff w
[
https://issues.apache.org/jira/browse/TIKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632429#comment-14632429
]
Tilman Hausherr edited comment on TIKA-1678 at 7/19/15 11:21 AM:
[
https://issues.apache.org/jira/browse/TIKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632429#comment-14632429
]
Tilman Hausherr edited comment on TIKA-1678 at 7/19/15 11:22 AM:
[
https://issues.apache.org/jira/browse/TIKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633687#comment-14633687
]
Tilman Hausherr commented on TIKA-1678:
---
sure:
{code}
public class Tika1678 extends B
[
https://issues.apache.org/jira/browse/TIKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633722#comment-14633722
]
Tilman Hausherr commented on TIKA-1678:
---
Yes, such a string check would be useful. Or
[
https://issues.apache.org/jira/browse/TIKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634045#comment-14634045
]
Tilman Hausherr commented on TIKA-1678:
---
Likely a bug. I tried calling getTitle afte
[
https://issues.apache.org/jira/browse/TIKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634045#comment-14634045
]
Tilman Hausherr edited comment on TIKA-1678 at 7/20/15 8:41 PM:
-
[
https://issues.apache.org/jira/browse/TIKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634065#comment-14634065
]
Tilman Hausherr commented on TIKA-1678:
---
Yes please do and attach the file. It's late
[
https://issues.apache.org/jira/browse/TIKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14637232#comment-14637232
]
Tilman Hausherr commented on TIKA-1678:
---
API has changed again. This code works:
{cod
[
https://issues.apache.org/jira/browse/TIKA-4254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845566#comment-17845566
]
Tilman Hausherr commented on TIKA-4254:
---
Why would we ever run the test twice in the
[
https://issues.apache.org/jira/browse/TIKA-4254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845590#comment-17845590
]
Tilman Hausherr edited comment on TIKA-4254 at 5/12/24 9:40 AM:
[
https://issues.apache.org/jira/browse/TIKA-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr updated TIKA-1907:
--
Fix Version/s: 3.0.0
> Big Pdf parsing to text - Out of memory
> ---
[
https://issues.apache.org/jira/browse/TIKA-4267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17851598#comment-17851598
]
Tilman Hausherr commented on TIKA-4267:
---
The current version is 2.9.2, please retry
[
https://issues.apache.org/jira/browse/TIKA-4267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17851598#comment-17851598
]
Tilman Hausherr edited comment on TIKA-4267 at 6/3/24 12:07 PM:
[
https://issues.apache.org/jira/browse/TIKA-4267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17851598#comment-17851598
]
Tilman Hausherr edited comment on TIKA-4267 at 6/3/24 12:06 PM:
[
https://issues.apache.org/jira/browse/TIKA-4267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr updated TIKA-4267:
--
Affects Version/s: 1.28.4
> Not getting correct mimet type for few file extensions. example :csv
[
https://issues.apache.org/jira/browse/TIKA-4267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr updated TIKA-4267:
--
Summary: Not getting correct mime type for a few file extensions. example:
csv (was: Not gettin
[
https://issues.apache.org/jira/browse/TIKA-4267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr closed TIKA-4267.
-
Resolution: Invalid
Closing for now, please comment and/or reopen if needed.
> Not getting correc
[
https://issues.apache.org/jira/browse/TIKA-4270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr updated TIKA-4270:
--
Description:
We use tika to extract text from different sources, including images with text
tha
[
https://issues.apache.org/jira/browse/TIKA-4251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17859718#comment-17859718
]
Tilman Hausherr commented on TIKA-4251:
---
I'm wondering if this means lots of changes
[
https://issues.apache.org/jira/browse/TIKA-4181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17861075#comment-17861075
]
Tilman Hausherr commented on TIKA-4181:
---
Is this
{code:xml}
3.24.0
3.24.0
{c
[
https://issues.apache.org/jira/browse/TIKA-4181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17861075#comment-17861075
]
Tilman Hausherr edited comment on TIKA-4181 at 7/1/24 7:02 AM:
-
[
https://issues.apache.org/jira/browse/TIKA-4181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17861363#comment-17861363
]
Tilman Hausherr commented on TIKA-4181:
---
As a first step I've updated protobuf to cu
[
https://issues.apache.org/jira/browse/TIKA-4181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17861555#comment-17861555
]
Tilman Hausherr commented on TIKA-4181:
---
PR 1849 has now succeeded.
> Tika Grpc Ser
Tilman Hausherr created TIKA-4274:
-
Summary: Improve ExtractReaderException
Key: TIKA-4274
URL: https://issues.apache.org/jira/browse/TIKA-4274
Project: Tika
Issue Type: Improvement
[
https://issues.apache.org/jira/browse/TIKA-4274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17863552#comment-17863552
]
Tilman Hausherr commented on TIKA-4274:
---
new output:
{noformat}
INFO [pool-3-thread
[
https://issues.apache.org/jira/browse/TIKA-4274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr resolved TIKA-4274.
---
Resolution: Fixed
> Improve ExtractReaderException
> --
>
>
[
https://issues.apache.org/jira/browse/TIKA-4276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17864670#comment-17864670
]
Tilman Hausherr commented on TIKA-4276:
---
Your file starts with "1 0 obj" instead of
[
https://issues.apache.org/jira/browse/TIKA-4276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr updated TIKA-4276:
--
Description:
We use Tika to check file type and extension. However, with some damaged pdf
files