[ https://issues.apache.org/jira/browse/TIKA-2473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16636013#comment-16636013 ]
Luis Filipe Nassif commented on TIKA-2473:
------------------------------------------

Hi [~mcaruanagalizia],

I think JBIG2 is handled internally in PDFBox, and no image conversion is done by Tika. If we want Tika to convert PCX/DCX images for OCR, I think the conversion could be called from TesseractOCRParser.

> PCX and DCX image support
> -------------------------
>
>                 Key: TIKA-2473
>                 URL: https://issues.apache.org/jira/browse/TIKA-2473
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 1.16
>            Reporter: Matthew Caruana Galizia
>            Priority: Major
>
> In theory it's straightforward to implement support for PCX and DCX. Both
> formats are supported in Commons Imaging, as well as in ImageIO via TwelveMonkeys.
> In practice, however, I'm not really sure how to implement support. We obviously
> want to OCR the images, but Tesseract doesn't support these formats. So where
> do we do the conversion to a BufferedImage? I looked for how JBIG2 files are
> handled, but I couldn't find that anywhere.
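One possible shape for the conversion step is to decode the image with whatever ImageIO readers are installed and re-encode each page as PNG, a format Tesseract can read. This is a minimal sketch, not Tika's code: it assumes the TwelveMonkeys imageio-pcx plugin is on the classpath so that ImageIO can decode PCX and DCX, and the class `OcrImageConverter` and its method `toPngPages` are hypothetical names. It iterates over every image index instead of calling `ImageIO.read`, which only returns the first image, so that multi-page DCX files would yield one PNG per page.

```java
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/**
 * Hypothetical helper (not part of Tika): converts any image that an installed
 * ImageIO reader can decode into PNG pages for Tesseract. PCX/DCX decoding
 * assumes the TwelveMonkeys imageio-pcx plugin is on the classpath.
 */
final class OcrImageConverter {

    private OcrImageConverter() {}

    /** Returns one PNG byte array per image (page) in the input stream. */
    public static List<byte[]> toPngPages(InputStream in) throws IOException {
        try (ImageInputStream iis = ImageIO.createImageInputStream(in)) {
            if (iis == null) {
                throw new IOException("Could not create an ImageInputStream");
            }
            // Ask each registered reader whether it recognizes the stream header.
            Iterator<ImageReader> readers = ImageIO.getImageReaders(iis);
            if (!readers.hasNext()) {
                throw new IOException("No ImageIO reader registered for this format");
            }
            ImageReader reader = readers.next();
            try {
                // seekForwardOnly=false so getNumImages(true) may scan the whole stream.
                reader.setInput(iis, false);
                int pages = reader.getNumImages(true);
                List<byte[]> result = new ArrayList<>(pages);
                for (int i = 0; i < pages; i++) {
                    BufferedImage page = reader.read(i);
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    if (!ImageIO.write(page, "png", out)) {
                        throw new IOException("No PNG writer available");
                    }
                    result.add(out.toByteArray());
                }
                return result;
            } finally {
                reader.dispose();
            }
        }
    }
}
```

Where the helper would be invoked (for example from TesseractOCRParser before Tesseract is run, as suggested above) is left open. Without a PCX/DCX-capable reader installed, the method simply throws an `IOException`, so an unsupported format fails loudly instead of producing an empty OCR result.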