[jira] [Commented] (TIKA-3242) Allow users to send arbitrary metadata to tika-server per document

2020-12-16 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17250686#comment-17250686 ] Hudson commented on TIKA-3242: -- UNSTABLE: Integrated in Jenkins build Tika » tika-main-jdk8 #

[jira] [Commented] (TIKA-3180) Tika 2.0.0 -- Modularize tika-server

2020-12-16 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17250687#comment-17250687 ] Hudson commented on TIKA-3180: -- UNSTABLE: Integrated in Jenkins build Tika » tika-main-jdk8 #

[jira] [Commented] (TIKA-3251) Add fetchers

2020-12-16 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17250678#comment-17250678 ] Tim Allison commented on TIKA-3251: --- I started a new branch: TIKA-3251 to work on this:

[jira] [Created] (TIKA-3251) Add fetchers

2020-12-16 Thread Tim Allison (Jira)
Tim Allison created TIKA-3251: - Summary: Add fetchers Key: TIKA-3251 URL: https://issues.apache.org/jira/browse/TIKA-3251 Project: Tika Issue Type: Sub-task Reporter: Tim Allison

[jira] [Resolved] (TIKA-3242) Allow users to send arbitrary metadata to tika-server per document

2020-12-16 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-3242. --- Fix Version/s: 2.0.0 Resolution: Fixed Prefix headers with {{meta_}}, and this will be passed i
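A minimal sketch of what the resolution note above describes, assuming that user metadata is sent as HTTP request headers whose names start with `meta_`. The snippet is truncated, so what the server does with those headers is not shown here. The helper names, the `my-id` key, and the `http://localhost:9998/rmeta` URL are illustrative assumptions, not taken from the issue. The request is only built, never sent.

```python
from urllib.request import Request

# Prefix named in the TIKA-3242 resolution note.
META_PREFIX = "meta_"


def meta_headers(user_metadata):
    """Return HTTP headers with every user metadata key prefixed by META_PREFIX."""
    return {META_PREFIX + key: value for key, value in user_metadata.items()}


def build_request(url, body, user_metadata):
    """Build (but do not send) a PUT request carrying the prefixed headers.

    The URL and the choice of PUT are assumptions made for illustration.
    """
    return Request(url, data=body, headers=meta_headers(user_metadata), method="PUT")


# Hypothetical usage: attach a caller-side document id to one upload.
request = build_request("http://localhost:9998/rmeta", b"hello", {"my-id": "doc-42"})
print(meta_headers({"my-id": "doc-42"}))
```

Keeping the prefixing in one small helper makes it easy to change if the server's expected prefix turns out to differ.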

Tika - problem with Polish - dropbox links

2020-12-16 Thread Mariusz G
I'm sending this once again, with Dropbox links instead of enclosed files. Mariusz Grubba https://www.dropbox.com/s/d9zjxj4vx40pd75/tika%20problem.JPG?dl=0 https://www.dropbox.com/s/ayjy2s262uvgp2t/sczytanie_pdf.ipynb?dl=0 https://www.dropbox.com/s/toiqsawqw7jxv79/KGHM_2019.pdf?dl=0

Re: FW: [EXTERNAL] Tika - problem with Polish encoding

2020-12-16 Thread Tilman Hausherr
Please upload your file to a sharehoster, and please detail what you expected and what you got instead, maybe about one specific line that you think is botched. Compare it with the extraction of Adobe Reader. Tilman On 16.12.2020 at 18:21, Chris Mattmann wrote: Copying the Tika dev list wher

[jira] [Commented] (TIKA-3180) Tika 2.0.0 -- Modularize tika-server

2020-12-16 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17250487#comment-17250487 ] Hudson commented on TIKA-3180: -- UNSTABLE: Integrated in Jenkins build Tika » tika-main-jdk8 #

FW: [EXTERNAL] Tika - problem with Polish encoding

2020-12-16 Thread Chris Mattmann
Copying the Tika dev list where I think you will find the help you are looking for 😊 From: Mariusz G Date: Wednesday, December 16, 2020 at 7:04 AM To: "Mattmann, Chris A (US 1740)" Subject: [EXTERNAL] Tika - problem with Polish encoding Hello Sir, I'm writing to you because I tri

[jira] [Created] (TIKA-3250) Not getting rtf_meta:thumbnail property for BMP type of files during RTF parsing

2020-12-16 Thread Hardik (Jira)
Hardik created TIKA-3250: - Summary: Not getting rtf_meta:thumbnail property for BMP type of files during RTF parsing Key: TIKA-3250 URL: https://issues.apache.org/jira/browse/TIKA-3250 Project: Tika

[jira] [Updated] (TIKA-3249) Excel type of files are generated hidden during RTF file parsing

2020-12-16 Thread Hardik (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hardik updated TIKA-3249: -- Description: We are using the EmbeddedDocumentExtractor class to parse RTF files. Consider a case when Exce

[jira] [Updated] (TIKA-3249) Excel type of files are generated hidden during RTF file parsing

2020-12-16 Thread Hardik (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hardik updated TIKA-3249: -- Summary: Excel type of files are generated hidden during RTF file parsing (was: Excel type of files are generat

[jira] [Commented] (TIKA-3180) Tika 2.0.0 -- Modularize tika-server

2020-12-16 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17250444#comment-17250444 ] ASF GitHub Bot commented on TIKA-3180: -- tballison merged pull request #394: URL: http

[jira] [Created] (TIKA-3249) Excel type of files are generated with default hidden

2020-12-16 Thread Hardik (Jira)
Hardik created TIKA-3249: - Summary: Excel type of files are generated with default hidden Key: TIKA-3249 URL: https://issues.apache.org/jira/browse/TIKA-3249 Project: Tika Issue Type: Bug

[jira] [Commented] (TIKA-3180) Tika 2.0.0 -- Modularize tika-server

2020-12-16 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17250443#comment-17250443 ] ASF GitHub Bot commented on TIKA-3180: -- tballison opened a new pull request #394: URL

[GitHub] [tika] tballison merged pull request #394: TIKA-3180 modularize tika-server

2020-12-16 Thread GitBox
tballison merged pull request #394: URL: https://github.com/apache/tika/pull/394 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [tika] tballison opened a new pull request #394: TIKA-3180 modularize tika-server

2020-12-16 Thread GitBox
tballison opened a new pull request #394: URL: https://github.com/apache/tika/pull/394 Modularize tika-server This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and us

[GitHub] [tika] tballison merged pull request #393: Fix license head

2020-12-16 Thread GitBox
tballison merged pull request #393: URL: https://github.com/apache/tika/pull/393 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[jira] [Commented] (TIKA-3180) Tika 2.0.0 -- Modularize tika-server

2020-12-16 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17250275#comment-17250275 ] Tim Allison commented on TIKA-3180: --- I got the go ahead from [~lewismc]. If all goes as