[
https://issues.apache.org/jira/browse/TIKA-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Konstantin Gribov closed TIKA-3149.
---
> Tikka 1.18 not working with tess4j 3.4.8 on linux
>
[
https://issues.apache.org/jira/browse/TIKA-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Konstantin Gribov updated TIKA-3149:
Description:
I am using Tika version 1.18 to parse the document content. It is working
ind
[
https://issues.apache.org/jira/browse/TIKA-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Konstantin Gribov resolved TIKA-3149.
-
Assignee: Konstantin Gribov
Resolution: Not A Bug
> Tikka 1.18 not working with tess
[
https://issues.apache.org/jira/browse/TIKA-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331122#comment-17331122
]
Konstantin Gribov commented on TIKA-3149:
-
You have both slf4j-jdk14 (logger imple
[
https://issues.apache.org/jira/browse/TIKA-3369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Konstantin Gribov updated TIKA-3369:
Description:
Current main@08793d360a838db04a3d23b902c34d9e6e7362e4 fails with
{noformat}
[E
Konstantin Gribov created TIKA-3369:
---
Summary: Flaky Tesseract OCR confirmMultiPageTiffHandling test
Key: TIKA-3369
URL: https://issues.apache.org/jira/browse/TIKA-3369
Project: Tika
Issue
Hi, folks.
I'm hoping for comments and a kind of lazy consensus. If there are no
objections, I'll merge it to main and branch_1x.
I created tika-bom modules with a bill of materials (in Apache Maven
terminology) / platform (for Gradle users). It will allow easy alignment of
Tika module versions and to w
[
https://issues.apache.org/jira/browse/TIKA-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331107#comment-17331107
]
ASF GitHub Bot commented on TIKA-3368:
--
grossws opened a new pull request #432:
URL: https://github.com/apache/tika/pull/432
Fixes #TIKA-3368
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about thi
[
https://issues.apache.org/jira/browse/TIKA-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331102#comment-17331102
]
ASF GitHub Bot commented on TIKA-3367:
--
grossws opened a new pull request #431:
URL: https://github.com/apache/tika/pull/431
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, pleas
Konstantin Gribov created TIKA-3368:
---
Summary: Add Bill of Materials (BOM) artifact (Tika 1.x)
Key: TIKA-3368
URL: https://issues.apache.org/jira/browse/TIKA-3368
Project: Tika
Issue Type:
[
https://issues.apache.org/jira/browse/TIKA-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Konstantin Gribov updated TIKA-3367:
Fix Version/s: (was: 1.27)
> Add Bill of Materials (BOM) artifact
>
Konstantin Gribov created TIKA-3367:
---
Summary: Add Bill of Materials (BOM) artifact
Key: TIKA-3367
URL: https://issues.apache.org/jira/browse/TIKA-3367
Project: Tika
Issue Type: Improvement
[
https://issues.apache.org/jira/browse/TIKA-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lewis John McGibbney resolved TIKA-3363.
Fix Version/s: (was: 1.27)
Resolution: Won't Fix
> Have tika-docker artif
[
https://issues.apache.org/jira/browse/TIKA-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lewis John McGibbney closed TIKA-3363.
--
> Have tika-docker artifacts start in spawn mode (configurable)
> --
Lewis John McGibbney created TIKA-3366:
--
Summary: Retrospective release of tika-docker 2.0.0-ALPHA
Key: TIKA-3366
URL: https://issues.apache.org/jira/browse/TIKA-3366
Project: Tika
Issue
Hi Folks,
If you are interested in participating in a mini meetup based around Apache
Tika container orchestration then please indicate your preferred
availability at the Doodle Poll below.
This community meetup focuses on Tika container orchestration (Docker,
Docker Compose, Helm, Kubernetes, etc.
[
https://issues.apache.org/jira/browse/TIKA-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330882#comment-17330882
]
David Pilato edited comment on TIKA-3364 at 4/23/21, 4:05 PM:
--
[
https://issues.apache.org/jira/browse/TIKA-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330882#comment-17330882
]
David Pilato edited comment on TIKA-3364 at 4/23/21, 4:04 PM:
--
[
https://issues.apache.org/jira/browse/TIKA-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330882#comment-17330882
]
David Pilato edited comment on TIKA-3364 at 4/23/21, 4:03 PM:
--
[
https://issues.apache.org/jira/browse/TIKA-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330882#comment-17330882
]
David Pilato edited comment on TIKA-3364 at 4/23/21, 4:03 PM:
--
[
https://issues.apache.org/jira/browse/TIKA-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330882#comment-17330882
]
David Pilato commented on TIKA-3364:
Oh my god! I'm feeling stupid.
Anyway, I was not
Gordon Allen created TIKA-3365:
--
Summary: RTFParser to XMLContentHandler incorrectly interprets en dash.
Key: TIKA-3365
URL: https://issues.apache.org/jira/browse/TIKA-3365
Project: Tika
Issue
[
https://issues.apache.org/jira/browse/TIKA-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330851#comment-17330851
]
Tim Allison commented on TIKA-3364:
---
try {{pdfParser.setExtractBookmarksText(false);}}
[You are receiving this because you're subscribed to one or more dev@
mailing lists for an Apache project, or the ApacheCon Announce list.]
Time is running out to submit your talk for ApacheCon 2021.
The Call for Presentations for ApacheCon @Home 2021, focused on Europe
and North America time zo
[
https://issues.apache.org/jira/browse/TIKA-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330827#comment-17330827
]
Nick Burch commented on TIKA-3364:
--
I'm not sure if we already have outlines/bookmarks el
[
https://issues.apache.org/jira/browse/TIKA-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330824#comment-17330824
]
David Pilato edited comment on TIKA-3364 at 4/23/21, 2:39 PM:
--
[
https://issues.apache.org/jira/browse/TIKA-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330824#comment-17330824
]
David Pilato commented on TIKA-3364:
So I tried this:
{code:java}
PDFPars
[
https://issues.apache.org/jira/browse/TIKA-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330824#comment-17330824
]
David Pilato edited comment on TIKA-3364 at 4/23/21, 2:38 PM:
--
[
https://issues.apache.org/jira/browse/TIKA-3324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330823#comment-17330823
]
Hudson commented on TIKA-3324:
--
FAILURE: Integrated in Jenkins build Tika » tika-main-jdk8 #2
[
https://issues.apache.org/jira/browse/TIKA-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330810#comment-17330810
]
Tim Allison commented on TIKA-3364:
---
We should probably add extra markup in the xhtml to
[
https://issues.apache.org/jira/browse/TIKA-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330809#comment-17330809
]
Tim Allison commented on TIKA-3364:
---
You can see the text under the {{Outlines}} node.
[
https://issues.apache.org/jira/browse/TIKA-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated TIKA-3364:
--
Attachment: Screenshot from 2021-04-23 10-15-22.png
> PDF Content is extracted twice
> -
[
https://issues.apache.org/jira/browse/TIKA-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330805#comment-17330805
]
Tim Allison edited comment on TIKA-3364 at 4/23/21, 2:13 PM:
-
[
https://issues.apache.org/jira/browse/TIKA-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330805#comment-17330805
]
Tim Allison commented on TIKA-3364:
---
{noformat}
Dummy PDF file
{noformat}
> PDF C
[
https://issues.apache.org/jira/browse/TIKA-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated TIKA-3364:
--
Attachment: tika-bookmarks-config.xml
> PDF Content is extracted twice
> --
[
https://issues.apache.org/jira/browse/TIKA-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330799#comment-17330799
]
Tim Allison commented on TIKA-3364:
---
The PDF contains bookmark text, which is what is tr
[
https://issues.apache.org/jira/browse/TIKA-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330799#comment-17330799
]
Tim Allison edited comment on TIKA-3364 at 4/23/21, 2:08 PM:
-
[
https://issues.apache.org/jira/browse/TIKA-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330799#comment-17330799
]
Tim Allison edited comment on TIKA-3364 at 4/23/21, 2:08 PM:
-
David Pilato created TIKA-3364:
--
Summary: PDF Content is extracted twice
Key: TIKA-3364
URL: https://issues.apache.org/jira/browse/TIKA-3364
Project: Tika
Issue Type: Bug
Components: p