Hi Oleg,

On Jun 7, 2011, at 6:28 AM, Oleg Tikhonov wrote:

> Hi Chris,
>
> I've applied the patch to the
> tika-parsers/src/main/java/org/apache/tika/parser/chm, also added 3 chm
> files to the tika-parsers\src\test\resources\test-documents and the tests.

Thanks, and sorry: I think I confused you with my comments on the JIRA issue. Please uncommit the patch from Tika. By:

>> the patch should be applied to the Tika source tree format (e.g.,
>> tika-parsers/src/main/java/org/apache/tika/parsers/chm)

I didn't mean literally to "commit the patch to SVN" :-) I meant that, if you look inside your patch, it doesn't put the Java class files in the appropriate Tika source code area (e.g., tika-parsers/src/main/java/org/apache/tika/parsers/chm).

Please revert r1132997, and then just modify your patch to make sure that your Java classes and files fit into the appropriate Tika source code area. Then please attach a new patch real quick so I (or some other committer) can verify it, and then you're good to go.

Cheers,
Chris

>
> BR,
> Oleg
>
> On Sun, Jun 5, 2011 at 1:32 AM, Chris A. Mattmann (JIRA)
> <j...@apache.org> wrote:
>
>>
>> [
>> https://issues.apache.org/jira/browse/TIKA-245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044403#comment-13044403]
>>
>> Chris A. Mattmann commented on TIKA-245:
>> ----------------------------------------
>>
>> Hi Oleg,
>>
>> Looking over this patch, I have a few recommendations:
>>
>> # the patch should be applied to the Tika source tree format (e.g.,
>> tika-parsers/src/main/java/org/apache/tika/parsers/chm)
>> # Many of the class-top-level comments can probably be removed and thrown
>> up on the Tika Wiki
>> # it would be nice to include at least a unit test or 2 to know this is
>> working.
>> It's a huge patch, and I don't have a lot of CHM files to test it
>> out on (being a Mac guy :-) )
>>
>> Cheers,
>> Chris
>>
>>
>>
>>> Support of CHM Format
>>> ---------------------
>>>
>>> Key: TIKA-245
>>> URL: https://issues.apache.org/jira/browse/TIKA-245
>>> Project: Tika
>>> Issue Type: New Feature
>>> Components: parser
>>> Environment: All
>>> Reporter: Karl Heinz Marbaise
>>> Priority: Minor
>>> Attachments: TIKA-245.tikhonov.04082011.patch.txt,
>> TIKA-245.tikhonov.20103107.patch.txt, TIKA-245.tikhonov.20112603.txt,
>> TIKA-245.tikhonov.20112703.txt
>>>
>>>
>>> It might be a good idea to support the CHM File format of Windows. Some
>> information about
>> http://en.wikipedia.org/wiki/Microsoft_Compiled_HTML_Help#Extracting_to_HTML.
>> The CHM format contains HTML files which can be parsed by Tika. So the
>> "only" problem is to extract the data from the CHM file.
>>
>> --
>> This message is automatically generated by JIRA.
>> For more information on JIRA, see: http://www.atlassian.com/software/jira
>>

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++