[jira] [Commented] (TIKA-3260) Update rotation.py to work with python3 and a more modern matplotlib

2021-01-06 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17260092#comment-17260092 ] Tim Allison commented on TIKA-3260: Depending on what fellow devs think, I'd like to st

[jira] [Commented] (TIKA-3260) Update rotation.py to work with python3 and a more modern matplotlib

2021-01-06 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17260051#comment-17260051 ] Peter Kronenberg commented on TIKA-3260: Still trying to figure out the best way t

[jira] [Commented] (TIKA-3265) Tika 2.0.0 -- improvements to image preprocessing in TesseractOCRParser

2021-01-06 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17260032#comment-17260032 ] Hudson commented on TIKA-3265: UNSTABLE: Integrated in Jenkins build Tika » tika-main-jdk8 #

[jira] [Commented] (TIKA-3258) Run OCR on PDFs with 'auto' mode as default in Tika 2.0.0

2021-01-06 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17260026#comment-17260026 ] Lewis John McGibbney commented on TIKA-3258: # Excellent # 10K sounds adventur

[jira] [Created] (TIKA-3266) Generalize OCRParser so that users can service load custom ocr parsers

2021-01-06 Thread Tim Allison (Jira)
Tim Allison created TIKA-3266: Summary: Generalize OCRParser so that users can service load custom ocr parsers Key: TIKA-3266 URL: https://issues.apache.org/jira/browse/TIKA-3266 Project: Tika

[jira] [Commented] (TIKA-3258) Run OCR on PDFs with 'auto' mode as default in Tika 2.0.0

2021-01-06 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17260002#comment-17260002 ] Tim Allison commented on TIKA-3258: Hi [~lewismc], 1. Currently every parser EXCEPT

[jira] [Commented] (TIKA-3258) Run OCR on PDFs with 'auto' mode as default in Tika 2.0.0

2021-01-06 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17259975#comment-17259975 ] Lewis John McGibbney commented on TIKA-3258: [~tallison] # please describe th

[jira] [Commented] (TIKA-3260) Update rotation.py to work with python3 and a more modern matplotlib

2021-01-06 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17259877#comment-17259877 ] Tim Allison commented on TIKA-3260: Yes, with the exception that we've moved the scient

[jira] [Commented] (TIKA-3260) Update rotation.py to work with python3 and a more modern matplotlib

2021-01-06 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17259876#comment-17259876 ] Peter Kronenberg commented on TIKA-3260: So is this 'parsers classic' jar the equi

[jira] [Created] (TIKA-3265) Tika 2.0.0 -- improvements to image preprocessing in TesseractOCRParser

2021-01-06 Thread Tim Allison (Jira)
Tim Allison created TIKA-3265: Summary: Tika 2.0.0 -- improvements to image preprocessing in TesseractOCRParser Key: TIKA-3265 URL: https://issues.apache.org/jira/browse/TIKA-3265 Project: Tika

[jira] [Commented] (TIKA-3260) Update rotation.py to work with python3 and a more modern matplotlib

2021-01-06 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17259861#comment-17259861 ] Tim Allison commented on TIKA-3260: Unfortunately, we don't publish snapshots to a repo

[jira] [Commented] (TIKA-3258) Run OCR on PDFs with 'auto' mode as default in Tika 2.0.0

2021-01-06 Thread Annie Didier (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17259841#comment-17259841 ] Annie Didier commented on TIKA-3258: Yes, I'm ok with the choice default, assuming tha

[jira] [Comment Edited] (TIKA-3258) Run OCR on PDFs with 'auto' mode as default in Tika 2.0.0

2021-01-06 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17259814#comment-17259814 ] Tim Allison edited comment on TIKA-3258 at 1/6/21, 3:44 PM: [~

[jira] [Created] (TIKA-3264) Improve the per page OCR heuristics for AUTO mode

2021-01-06 Thread Tim Allison (Jira)
Tim Allison created TIKA-3264: Summary: Improve the per page OCR heuristics for AUTO mode Key: TIKA-3264 URL: https://issues.apache.org/jira/browse/TIKA-3264 Project: Tika Issue Type: Improvemen

[jira] [Commented] (TIKA-3258) Run OCR on PDFs with 'auto' mode as default in Tika 2.0.0

2021-01-06 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17259814#comment-17259814 ] Tim Allison commented on TIKA-3258: [~adidier], y, that's exactly the type of refinemen

[jira] [Commented] (TIKA-3258) Run OCR on PDFs with 'auto' mode as default in Tika 2.0.0

2021-01-06 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17259813#comment-17259813 ] David Pilato commented on TIKA-3258: I really like having {{auto}} as the default mode

[jira] [Commented] (TIKA-3258) Run OCR on PDFs with 'auto' mode as default in Tika 2.0.0

2021-01-06 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17259810#comment-17259810 ] Tim Allison commented on TIKA-3258: [~ndipiazza_gmail], yup! Exactly! For folks in co

[jira] [Comment Edited] (TIKA-3258) Run OCR on PDFs with 'auto' mode as default in Tika 2.0.0

2021-01-06 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17259808#comment-17259808 ] Nicholas DiPiazza edited comment on TIKA-3258 at 1/6/21, 3:33 PM:

[jira] [Comment Edited] (TIKA-3258) Run OCR on PDFs with 'auto' mode as default in Tika 2.0.0

2021-01-06 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17259808#comment-17259808 ] Nicholas DiPiazza edited comment on TIKA-3258 at 1/6/21, 3:32 PM:

[jira] [Commented] (TIKA-3258) Run OCR on PDFs with 'auto' mode as default in Tika 2.0.0

2021-01-06 Thread David Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17259809#comment-17259809 ] David Eric Pugh commented on TIKA-3258: I'm thinking that this is a pointer towards

[jira] [Commented] (TIKA-3258) Run OCR on PDFs with 'auto' mode as default in Tika 2.0.0

2021-01-06 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17259808#comment-17259808 ] Nicholas DiPiazza commented on TIKA-3258: [~tallison] so if I do not have tessera

[jira] [Commented] (TIKA-3258) Run OCR on PDFs with 'auto' mode as default in Tika 2.0.0

2021-01-06 Thread Annie Didier (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17259807#comment-17259807 ] Annie Didier commented on TIKA-3258: Other than resource consumption, I don't see majo

[jira] [Comment Edited] (TIKA-3258) Run OCR on PDFs with 'auto' mode as default in Tika 2.0.0

2021-01-06 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17259646#comment-17259646 ] Tim Allison edited comment on TIKA-3258 at 1/6/21, 12:47 PM:

[jira] [Commented] (TIKA-3258) Run OCR on PDFs with 'auto' mode as default in Tika 2.0.0

2021-01-06 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17259646#comment-17259646 ] Tim Allison commented on TIKA-3258: [~dadoonet] and [~ndipiazza] wdyt? > Run OCR on PD