Re: tika's beta dependency

2011-12-01 Thread Jukka Zitting
Hi, On Thu, Dec 1, 2011 at 2:38 PM, ankush chadha wrote: > Are there any plans to move to RC version of metadata-extractor > jar file? Currently it is pointing to a beta jar. You can start such a plan by filing an improvement request [1] for the dependency update. Note that for Tika to use a ne

[jira] [Commented] (TIKA-623) Add support for Outlook PST

2011-12-01 Thread Jukka Zitting (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160919#comment-13160919 ] Jukka Zitting commented on TIKA-623: bq. Is there some way to proceed here without requi

tika's beta dependency

2011-12-01 Thread ankush chadha
Hi Are there any plans to move to RC version of metadata-extractor jar file? Currently it is pointing to a beta jar. Thanks Ankush

[jira] [Commented] (TIKA-623) Add support for Outlook PST

2011-12-01 Thread Michael McCandless (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160907#comment-13160907 ] Michael McCandless commented on TIKA-623: - Is there some way to proceed here without

[jira] [Commented] (TIKA-796) Tika breaks words of rotated text in PDF documents

2011-12-01 Thread Michael McCandless (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160841#comment-13160841 ] Michael McCandless commented on TIKA-796: - This looks like a dup of TIKA-723? Note

[jira] [Commented] (TIKA-623) Add support for Outlook PST

2011-12-01 Thread Andrzej Bialecki (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160838#comment-13160838 ] Andrzej Bialecki commented on TIKA-623: Mark, visiting the github link to the proje

[jira] [Created] (TIKA-796) Tika breaks words of rotated text in PDF documents

2011-12-01 Thread Franz Canaval (Created) (JIRA)
Tika breaks words of rotated text in PDF documents -- Key: TIKA-796 URL: https://issues.apache.org/jira/browse/TIKA-796 Project: Tika Issue Type: Bug Components: parser Affects Ve

[jira] [Commented] (TIKA-793) Invalid ASCII character (65533) when retriving MP3 metadata

2011-12-01 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160810#comment-13160810 ] Nick Burch commented on TIKA-793: - I've managed to reproduce this on one of my Amazon MP3s,

Re: Tesseract OCR engine

2011-12-01 Thread Alex Ott
You can also look into Cuneiform OCR... I think the easiest way to integrate them into Tika is to allow the user to specify an external script that Tika calls and that returns the recognized text. On Wed, Nov 30, 2011 at 10:48 PM, Albert Law (Logik) wrote: > Hi Chris, > > I agree with Oleg.
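Below is a minimal Java sketch of the external-script approach suggested above. It assumes that the script takes the image path as its last argument, prints the recognized text to stdout, and exits with status 0 on success. The class name `ExternalOcrRunner` and this calling convention are hypothetical illustrations, not Tika's API.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical helper, not part of Tika: runs a user-supplied OCR script on
 * an image file and returns the text the script writes to stdout.
 */
final class ExternalOcrRunner {
    private final List<String> command;   // script plus any fixed arguments
    private final long timeoutSeconds;    // upper bound on one OCR run

    ExternalOcrRunner(List<String> command, long timeoutSeconds) {
        this.command = List.copyOf(command);
        this.timeoutSeconds = timeoutSeconds;
    }

    /** Runs the script with the image path appended as its last argument. */
    String recognize(Path image) throws IOException, InterruptedException {
        List<String> cmd = new ArrayList<>(command);
        cmd.add(image.toAbsolutePath().toString());

        // Send stdout to a temp file so the child never blocks on a full pipe
        // while we wait on it with a timeout; stderr is discarded in this sketch.
        Path out = Files.createTempFile("ocr-", ".txt");
        try {
            Process p = new ProcessBuilder(cmd)
                    .redirectOutput(out.toFile())
                    .redirectError(ProcessBuilder.Redirect.DISCARD)
                    .start();
            if (!p.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
                p.destroyForcibly();
                throw new IOException("OCR script timed out after " + timeoutSeconds + " s");
            }
            if (p.exitValue() != 0) {
                throw new IOException("OCR script exited with status " + p.exitValue());
            }
            return Files.readString(out, StandardCharsets.UTF_8).trim();
        } finally {
            Files.deleteIfExists(out);
        }
    }

    // Usage: java ExternalOcrRunner /path/to/ocr-script.sh scan.png
    public static void main(String[] args) throws Exception {
        ExternalOcrRunner ocr = new ExternalOcrRunner(List.of(args[0]), 60);
        System.out.println(ocr.recognize(Path.of(args[1])));
    }
}
```

Delegating to a script keeps the choice of engine (Tesseract, Cuneiform, or anything else) outside Tika, which is the point of the suggestion. The timeout guards against a hung OCR process.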