[jira] [Resolved] (TIKA-2586) PDFParser documentation has incorrect DPI default

2018-02-21 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2586. Resolution: Fixed. Thank you! > PDFParser documentation has incorrect DPI default

[jira] [Commented] (TIKA-2570) Tika 1.17 uses vulnerable Jackson version 2.9.2

2018-02-21 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16372152#comment-16372152 ] Hudson commented on TIKA-2570: SUCCESS: Integrated in Jenkins build Tika-trunk #1437 (See [h

[jira] [Created] (TIKA-2586) PDFParser documentation has incorrect DPI default

2018-02-21 Thread Ewan Mellor (JIRA)
Ewan Mellor created TIKA-2586: Summary: PDFParser documentation has incorrect DPI default Key: TIKA-2586 URL: https://issues.apache.org/jira/browse/TIKA-2586 Project: Tika Issue Type: Improvemen

[jira] [Commented] (TIKA-2584) Tika should have a way to pass arbitrary Tesseract options

2018-02-21 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16372078#comment-16372078 ] ASF GitHub Bot commented on TIKA-2584: ewanmellor opened a new pull request #224: Fix

[jira] [Commented] (TIKA-2563) Extract embedded objects in HTML and javascript

2018-02-21 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16372077#comment-16372077 ] Hudson commented on TIKA-2563: SUCCESS: Integrated in Jenkins build Tika-trunk #1436 (See [h

[jira] [Commented] (TIKA-2581) testOCROutputsHOCR fails with Tesseract 4.0

2018-02-21 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16372075#comment-16372075 ] ASF GitHub Bot commented on TIKA-2581: ewanmellor commented on issue #221: Fix for TI

[jira] [Commented] (TIKA-2583) Tika readme should mention builds.apache.org

2018-02-21 Thread Ewan Mellor (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16372074#comment-16372074 ] Ewan Mellor commented on TIKA-2583: I wasn't trying to tell users where to find builds,

[jira] [Resolved] (TIKA-2570) Tika 1.17 uses vulnerable Jackson version 2.9.2

2018-02-21 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2570. Resolution: Fixed; Fix Version/s: 2.0.0, 1.18 > Tika 1.17 uses vulnerable Jacks

[jira] [Commented] (TIKA-2570) Tika 1.17 uses vulnerable Jackson version 2.9.2

2018-02-21 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16372064#comment-16372064 ] ASF GitHub Bot commented on TIKA-2570: tballison closed pull request #219: Fix for TI

[jira] [Commented] (TIKA-2585) TikaInputStream support for resetting via a factory of InputStreams

2018-02-21 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16372063#comment-16372063 ] Nick Burch commented on TIKA-2585: I can't immediately see a common / well known class/in

[jira] [Updated] (TIKA-2585) TikaInputStream support for resetting via a factory of InputStreams

2018-02-21 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch updated TIKA-2585: Description: As raised in the 2.0 breaking changes thread, currently the only way that Tika has of handlin

[jira] [Created] (TIKA-2585) TikaInputStream support for resetting via a factory of InputStreams

2018-02-21 Thread Nick Burch (JIRA)
Nick Burch created TIKA-2585: Summary: TikaInputStream support for resetting via a factory of InputStreams Key: TIKA-2585 URL: https://issues.apache.org/jira/browse/TIKA-2585 Project: Tika Issue

[jira] [Resolved] (TIKA-2563) Extract embedded objects in HTML and javascript

2018-02-21 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2563. Resolution: Fixed; Assignee: Tim Allison; Fix Version/s: 2.0.0, 1.18 >

[jira] [Commented] (TIKA-2583) Tika readme should mention builds.apache.org

2018-02-21 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16372052#comment-16372052 ] Nick Burch commented on TIKA-2583: ASF policy is that "users" should only be directed to

[jira] [Commented] (TIKA-2581) testOCROutputsHOCR fails with Tesseract 4.0

2018-02-21 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16372048#comment-16372048 ] ASF GitHub Bot commented on TIKA-2581: Gagravarr commented on issue #221: Fix for TIK

[jira] [Created] (TIKA-2584) Tika should have a way to pass arbitrary Tesseract options

2018-02-21 Thread Ewan Mellor (JIRA)
Ewan Mellor created TIKA-2584: Summary: Tika should have a way to pass arbitrary Tesseract options Key: TIKA-2584 URL: https://issues.apache.org/jira/browse/TIKA-2584 Project: Tika Issue Type: I

[jira] [Commented] (TIKA-2583) Tika readme should mention builds.apache.org

2018-02-21 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16372031#comment-16372031 ] ASF GitHub Bot commented on TIKA-2583: ewanmellor opened a new pull request #223: Fix

[jira] [Commented] (TIKA-2582) Tesseract 4.0 includes a FF character by default, breaking parsers

2018-02-21 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16372029#comment-16372029 ] ASF GitHub Bot commented on TIKA-2582: ewanmellor opened a new pull request #222: Fix

[jira] [Created] (TIKA-2583) Tika readme should mention builds.apache.org

2018-02-21 Thread Ewan Mellor (JIRA)
Ewan Mellor created TIKA-2583: Summary: Tika readme should mention builds.apache.org Key: TIKA-2583 URL: https://issues.apache.org/jira/browse/TIKA-2583 Project: Tika Issue Type: Bug C

[jira] [Created] (TIKA-2580) SafeContentHandler documentation is incorrect about replacement character

2018-02-21 Thread Ewan Mellor (JIRA)
Ewan Mellor created TIKA-2580: Summary: SafeContentHandler documentation is incorrect about replacement character Key: TIKA-2580 URL: https://issues.apache.org/jira/browse/TIKA-2580 Project: Tika

[jira] [Updated] (TIKA-2582) Tesseract 4.0 includes a FF character by default, breaking parsers

2018-02-21 Thread Ewan Mellor (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ewan Mellor updated TIKA-2582: Description: Tesseract 4.0 includes a change to use form feed characters to separate pages by default in

[jira] [Created] (TIKA-2582) Tesseract 4.0 includes a FF character by default, breaking parsers

2018-02-21 Thread Ewan Mellor (JIRA)
Ewan Mellor created TIKA-2582: Summary: Tesseract 4.0 includes a FF character by default, breaking parsers Key: TIKA-2582 URL: https://issues.apache.org/jira/browse/TIKA-2582 Project: Tika Issu

[jira] [Commented] (TIKA-2581) testOCROutputsHOCR fails with Tesseract 4.0

2018-02-21 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16371999#comment-16371999 ] ASF GitHub Bot commented on TIKA-2581: ewanmellor opened a new pull request #221: Fix

[jira] [Commented] (TIKA-2570) Tika 1.17 uses vulnerable Jackson version 2.9.2

2018-02-21 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16371946#comment-16371946 ] ASF GitHub Bot commented on TIKA-2570: ewanmellor opened a new pull request #219: Fix

[jira] [Updated] (TIKA-2581) testOCROutputsHOCR fails with Tesseract 4.0

2018-02-21 Thread Ewan Mellor (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ewan Mellor updated TIKA-2581: Description: TesseractOCRParserTest.testOCROutputsHOCR fails with Tesseract 4.0. With 3.x, the output is

[jira] [Created] (TIKA-2581) testOCROutputsHOCR fails with Tesseract 4.0

2018-02-21 Thread Ewan Mellor (JIRA)
Ewan Mellor created TIKA-2581: Summary: testOCROutputsHOCR fails with Tesseract 4.0 Key: TIKA-2581 URL: https://issues.apache.org/jira/browse/TIKA-2581 Project: Tika Issue Type: Bug Co

[jira] [Commented] (TIKA-2580) SafeContentHandler documentation is incorrect about replacement character

2018-02-21 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16371955#comment-16371955 ] ASF GitHub Bot commented on TIKA-2580: ewanmellor opened a new pull request #220: Fix

FINAL REMINDER: CFP for Apache EU Roadshow Closes 25th February

2018-02-21 Thread Sharan F
Hello Apache Supporters and Enthusiasts. This is your FINAL reminder that the Call for Papers (CFP) for the Apache EU Roadshow is closing soon. Our Apache EU Roadshow will focus on Cloud, IoT, Apache Tomcat, and Apache HTTP, and will run from 13-14 June 2018 in Berlin. Note that the CFP deadline has

[jira] [Assigned] (TIKA-2579) Update to PDFBox 2.0.9 when available

2018-02-21 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison reassigned TIKA-2579: Assignee: Tim Allison > Update to PDFBox 2.0.9 when available

[jira] [Commented] (TIKA-2579) Update to PDFBox 2.0.9 when available

2018-02-21 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16371527#comment-16371527 ] Tim Allison commented on TIKA-2579: +1 Thank you for opening this. I've been away from

[jira] [Updated] (TIKA-2579) Update to PDFBox 2.0.9 when available

2018-02-21 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-2579: Summary: Update to PDFBox 2.0.9 when available (was: Update to PDFBox 2.0.9) > Update to PDFBox 2.0.9 w

[jira] [Created] (TIKA-2579) Update to PDFBox 2.0.9

2018-02-21 Thread David Pilato (JIRA)
David Pilato created TIKA-2579: Summary: Update to PDFBox 2.0.9 Key: TIKA-2579 URL: https://issues.apache.org/jira/browse/TIKA-2579 Project: Tika Issue Type: Improvement Components: p