[jira] Issue Comment Edited: (TIKA-245) Support of CHM Format

Oleg Tikhonov (JIRA) Sun, 05 Dec 2010 22:54:44 -0800

    [ 
https://issues.apache.org/jira/browse/TIKA-245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12966721#action_12966721
 ]


Oleg Tikhonov edited comment on TIKA-245 at 12/6/10 1:52 AM:
-------------------------------------------------------------

A couple of weeks ago I received an answer from SourceForge.net:
"My apologies for not passing this message on sooner, however the project  
admin has responded that he is not willing to give up this project at this 
time. As such, we are not fulfilling this takeover request."

The library as it stands today contains critical bugs, and because the 
project is abandoned I cannot fix them. I would therefore exclude it as an 
option.

The other option is 7-Zip-JBinding 
(http://sourceforge.net/projects/sevenzipjbind/develop/). I've implemented a 
CHM parser using this library and it works pretty well: the throughput of HTML 
extraction is about 5 MB/sec. However, it's licensed under the LGPL. I've asked 
Boris Brodski (the developer of that library) whether he could re-license it 
for us. Here is a link to the discussion between him and Igor Pavlov (the 
author of 7-Zip):
http://sourceforge.net/projects/sevenzip/forums/forum/45797/topic/3983892

What do you think?

BR,
Oleg 

  
> Support of CHM Format
> ---------------------
>
>                 Key: TIKA-245
>                 URL: https://issues.apache.org/jira/browse/TIKA-245
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>         Environment: All
>            Reporter: Karl Heinz Marbaise
>            Priority: Minor
>         Attachments: TIKA-245.tikhonov.20103107.patch.txt
>
>
> It might be a good idea to support the Windows CHM file format. Some 
> information about extracting it is available at 
> http://en.wikipedia.org/wiki/Microsoft_Compiled_HTML_Help#Extracting_to_HTML. 
> The CHM format contains HTML files which can be parsed by Tika. So the "only" 
> problem is to extract the data from the CHM file.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
