[ https://issues.apache.org/jira/browse/TIKA-245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12966721#action_12966721 ]
Oleg Tikhonov edited comment on TIKA-245 at 12/6/10 1:52 AM:
-------------------------------------------------------------

A couple of weeks ago I received this answer from SourceForge.net:

"My apologies for not passing this message on sooner, however the project admin has responded that he is not willing to give up this project at this time. As such, we are not fulfilling this takeover request."

The library as it is today contains critical bugs. Because the project is abandoned, I cannot fix them, so I would exclude it as an option.

The other option is 7-Zip-JBinding (http://sourceforge.net/projects/sevenzipjbind/develop/). I've implemented a CHM parser using this library and it works pretty well; the throughput of HTML extraction is about 5 MB/sec. However, it is licensed under the LGPL. I've asked Boris Brodski (the developer of that library) whether he could re-license it for us. Here is a link to the discussion between him and Igor Pavlov (the author of 7-Zip):
http://sourceforge.net/projects/sevenzip/forums/forum/45797/topic/3983892

What do you think?

BR,
Oleg

> Support of CHM Format
> ---------------------
>
>                 Key: TIKA-245
>                 URL: https://issues.apache.org/jira/browse/TIKA-245
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>         Environment: All
>            Reporter: Karl Heinz Marbaise
>            Priority: Minor
>         Attachments: TIKA-245.tikhonov.20103107.patch.txt
>
>
> It might be a good idea to support the CHM file format of Windows. Some
> information is available at
> http://en.wikipedia.org/wiki/Microsoft_Compiled_HTML_Help#Extracting_to_HTML.
> The CHM format contains HTML files which can be parsed by Tika. So the "only"
> problem is to extract the data from the CHM file.