[jira] [Commented] (TIKA-3494) Allow legacy combined doc extract in pipes module

2021-07-22 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385791#comment-17385791 ] Hudson commented on TIKA-3494: -- SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk8 #2

[jira] [Commented] (TIKA-3489) Robots.txt files frequently identified as message/rfc822

2021-07-22 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385761#comment-17385761 ] Tim Allison commented on TIKA-3489: --- So, I'll leave this as is {{text/x-robots}} and bac

[jira] [Resolved] (TIKA-3494) Allow legacy combined doc extract in pipes module

2021-07-22 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-3494. --- Fix Version/s: 2.0.1 Assignee: Tim Allison Resolution: Fixed [~ndipiazza_gmail] please

[jira] [Updated] (TIKA-3494) Allow legacy combined doc extract in pipes module

2021-07-22 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-3494: -- Priority: Major (was: Minor) > Allow legacy combined doc extract in pipes module >

[jira] [Resolved] (TIKA-3490) Fix serialization in opensearch emitter for embedded documents

2021-07-22 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-3490. --- Fix Version/s: 2.0.1 Assignee: Tim Allison Resolution: Fixed > Fix serialization in op

[jira] [Comment Edited] (TIKA-3490) Fix serialization in opensearch emitter for embedded documents

2021-07-22 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385707#comment-17385707 ] Tim Allison edited comment on TIKA-3490 at 7/22/21, 6:38 PM: -

[jira] [Commented] (TIKA-3490) Fix serialization in opensearch emitter for embedded documents

2021-07-22 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385707#comment-17385707 ] Tim Allison commented on TIKA-3490: --- I found and fixed several other bugs while working

RE: JIRA...sorry

2021-07-22 Thread Uwe Schindler
Your other reply went through after I added the fix to my spam filter rules. No need to send a mail here! X-Spam-Status: No, score=-1.41 X-Spamd-Result: default: False [-1.41 / 15.00]; HAS_REPLYTO(0.00)[dev@tika.apache.org]; RCVD_VIA_SMTP_AUTH(0.00)[]; R_SPF_ALLOW(-0.20)[

Re: JIRA...sorry

2021-07-22 Thread Tim Allison
Testing. Spam-or-not? On Thu, Jul 22, 2021 at 12:09 PM Uwe Schindler wrote: > > Hi, Tim, > > Your "sorry" mail and the one previously actually caused the rspamd spam > checker to trigger with a very bad score of 7 (caused by a duplicate > reply-to header), which is actually a bug in the ezmlm

Re: TesseractOCRConfig.setTesseractPath moved to TesseractOCRParser

2021-07-22 Thread Tim Allison
Hi David, It has been a while (I think) since I made that change. The notion was to improve security and lock down the location of the tesseract executable to the initialization phase of the parser -- as you know, you can set it via the tika-config.xml. What I absolutely wanted to avoid was enab

TesseractOCRConfig.setTesseractPath moved to TesseractOCRParser

2021-07-22 Thread David Pilato
Hey team I'm wondering what the reasoning was behind moving the setTesseractPath(String) method from TesseractOCRConfig to TesseractOCRParser. Why is setting the Tesseract binary path no longer considered configuration? Sorry if this has been discussed previously. FWIW, the javadoc o

RE: JIRA...sorry

2021-07-22 Thread Uwe Schindler
Hi, Tim, Your "sorry" mail and the one previously actually caused the rspamd spam checker to trigger with a very bad score of 7 (caused by a duplicate reply-to header), which is actually a bug in the ezmlm mailing list software. I opened a bug report: https://github.com/rspamd/rspamd/issues/38

[jira] [Commented] (TIKA-3490) Fix serialization in opensearch emitter for embedded documents

2021-07-22 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385632#comment-17385632 ] Hudson commented on TIKA-3490: -- SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk8 #2

[jira] [Commented] (TIKA-3489) Robots.txt files frequently identified as message/rfc822

2021-07-22 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385631#comment-17385631 ] Sebastian Nagel commented on TIKA-3489: --- [~nick]: agreed, sounds plausible. > Robot

[jira] [Updated] (TIKA-3494) Allow legacy combined doc extract in pipes module

2021-07-22 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-3494: -- Description: The pipes module is built around the RecursiveParserWrapper, and I'm happy with that being

[jira] [Commented] (TIKA-3489) Robots.txt files frequently identified as message/rfc822

2021-07-22 Thread Nick Burch (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385514#comment-17385514 ] Nick Burch commented on TIKA-3489: -- I'm not keen on us throwing away information we can e

[jira] [Commented] (TIKA-3493) dcterms:created date depends on the current TimeZone in RTF documents

2021-07-22 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385505#comment-17385505 ] David Pilato commented on TIKA-3493: {quote}It doesn't look like the RTF specifies a t

[jira] [Commented] (TIKA-3493) dcterms:created date depends on the current TimeZone in RTF documents

2021-07-22 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385472#comment-17385472 ] Tim Allison commented on TIKA-3493: --- It doesn't look like the RTF specifies a timezone:

[jira] [Commented] (TIKA-3489) Robots.txt files frequently identified as message/rfc822

2021-07-22 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385470#comment-17385470 ] Tim Allison commented on TIKA-3489: --- Will change to {{text/plain}} today unless [~nick]

[jira] [Assigned] (TIKA-3493) dcterms:created date depends on the current TimeZone in RTF documents

2021-07-22 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison reassigned TIKA-3493: - Assignee: Tim Allison > dcterms:created date depends on the current TimeZone in RTF documents > -

[jira] [Updated] (TIKA-3494) Allow legacy combined doc extract in pipes module

2021-07-22 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-3494: -- Component/s: tika-pipes > Allow legacy combined doc extract in pipes module > --

[jira] [Created] (TIKA-3494) Allow legacy combined doc extract in pipes module

2021-07-22 Thread Tim Allison (Jira)
Tim Allison created TIKA-3494: - Summary: Allow legacy combined doc extract in pipes module Key: TIKA-3494 URL: https://issues.apache.org/jira/browse/TIKA-3494 Project: Tika Issue Type: New Featur

Re: Access to Tika Wiki

2021-07-22 Thread Tim Allison
Yes, please!!! Done! On Thu, Jul 22, 2021 at 5:49 AM David Pilato wrote: > Hey > > > As I'm moving my project to Tika 2.0.0, I would like to edit the Migrating > to Tika 2.0.0 page ( > https://cwiki.apache.org/confluence/display/TIKA/Migrating+to+Tika+2.0.0). > > My username on the wiki site is

JIRA...sorry

2021-07-22 Thread Tim Allison
All, I'm sorry for all of the JIRA spam when I migrated issues to 2.0.1. Onwards. Best, Tim

[jira] [Comment Edited] (TIKA-3493) dcterms:created date depends on the current TimeZone in RTF documents

2021-07-22 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385445#comment-17385445 ] David Pilato edited comment on TIKA-3493 at 7/22/21, 11:59 AM: -

[jira] [Commented] (TIKA-3493) dcterms:created date depends on the current TimeZone in RTF documents

2021-07-22 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385445#comment-17385445 ] David Pilato commented on TIKA-3493: I attached a patch which adds a unit test.  It i

[jira] [Updated] (TIKA-3493) dcterms:created date depends on the current TimeZone in RTF documents

2021-07-22 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Pilato updated TIKA-3493: --- Attachment: Test_case_to_demo_the_change_with_Tika_1_x1.patch > dcterms:created date depends on the cu

[jira] [Updated] (TIKA-3493) dcterms:created date depends on the current TimeZone in RTF documents

2021-07-22 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Pilato updated TIKA-3493: --- Attachment: (was: Test_case_to_demo_the_change_with_Tika_1_x.patch) > dcterms:created date depends

[jira] [Updated] (TIKA-3493) dcterms:created date depends on the current TimeZone in RTF documents

2021-07-22 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Pilato updated TIKA-3493: --- Attachment: Test_case_to_demo_the_change_with_Tika_1_x.patch > dcterms:created date depends on the cur

[jira] [Commented] (TIKA-3348) Improve the workflow for extracting and returning images from PDFs and other containers using Tika Server..

2021-07-22 Thread Simon Lucy (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385442#comment-17385442 ] Simon Lucy commented on TIKA-3348: --- In relation to inline images sometimes being sliced

[jira] [Created] (TIKA-3493) dcterms:created date depends on the current TimeZone in RTF documents

2021-07-22 Thread David Pilato (Jira)
David Pilato created TIKA-3493: -- Summary: dcterms:created date depends on the current TimeZone in RTF documents Key: TIKA-3493 URL: https://issues.apache.org/jira/browse/TIKA-3493 Project: Tika

Access to Tika Wiki

2021-07-22 Thread David Pilato
Hey As I'm moving my project to Tika 2.0.0, I would like to edit the Migrating to Tika 2.0.0 page (https://cwiki.apache.org/confluence/display/TIKA/Migrating+to+Tika+2.0.0). My username on the wiki site is dadoonet. Could you give me write access? David -- David Pilato, elastic.co Develop

[jira] [Created] (TIKA-3492) Upgrade version for TPS: rome to 1.16.0 in tika-bundle

2021-07-22 Thread Shubhangi Raut (Jira)
Shubhangi Raut created TIKA-3492: Summary: Upgrade version for TPS: rome to 1.16.0 in tika-bundle Key: TIKA-3492 URL: https://issues.apache.org/jira/browse/TIKA-3492 Project: Tika Issue Type:

[jira] [Created] (TIKA-3491) Upgrade version for TPS: commons-compress to 1.21 in tika-bundle

2021-07-22 Thread Shubhangi Raut (Jira)
Shubhangi Raut created TIKA-3491: Summary: Upgrade version for TPS: commons-compress to 1.21 in tika-bundle Key: TIKA-3491 URL: https://issues.apache.org/jira/browse/TIKA-3491 Project: Tika