> I know I missed the Friday deadline but... 
> 
> 
> 
> Does anyone have any recommendations for parsing HTML? I use Lucene, and its 
> example has its own HTML parser, but I was wondering whether anyone has used an 
> existing project, or whether there is some built-in functionality in an 
> Apache lib to convert 
> 

Swing has a pretty sweet event-driven, SAX-like parser that you can run headless.  
You implement a callback by extending HTMLEditorKit.ParserCallback, and you have 
to fake it out a bit, but it's very robust.  I once used it to write a utility 
that converts HTML documents to Struts JSP.  I could send you an example if 
you're interested.

http://java.sun.com/j2se/1.4.2/docs/api/javax/swing/text/html/HTMLEditorKit.ParserCallback.html
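
In case it helps, here is a minimal sketch of the callback approach, assuming 
all you want is the plain text of a page (say, to hand to Lucene for indexing).  
The class name HtmlTextExtractor and the extract() helper are made up for 
illustration; only ParserDelegator and HTMLEditorKit.ParserCallback come from 
the JDK.  It is not the utility mentioned above, just the general shape:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: collects the text nodes the Swing parser reports.
public class HtmlTextExtractor extends HTMLEditorKit.ParserCallback {
    private final StringBuilder text = new StringBuilder();

    // Called by the parser for each run of character data between tags.
    @Override
    public void handleText(char[] data, int pos) {
        if (text.length() > 0) {
            text.append(' ');
        }
        text.append(data);
    }

    // Parses the HTML from the reader and returns the collected text.
    public static String extract(Reader html) throws IOException {
        HtmlTextExtractor callback = new HtmlTextExtractor();
        // The third argument tells the parser to ignore any charset
        // declared in the document, since we already have a Reader.
        new ParserDelegator().parse(html, callback, true);
        return callback.text.toString();
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><body><h1>Title</h1><p>Hello <b>world</b></p></body></html>";
        System.out.println(extract(new StringReader(html)));
    }
}
```

Overriding handleStartTag, handleEndTag and handleSimpleTag in the same class 
gives you the tag events as well, if you need more than the text.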

We are *working* on a DOM-like parser for the Shale Clay plugin.  Its purpose 
is to use HTML templates to define a JSF-rendered page.  Here's the test case:
http://svn.apache.org/viewcvs.cgi/struts/shale/trunk/clay-plugin/src/test/org/apache/shale/clay/parser/ParserTestCase.java?rev=267474&view=log

If you use the Clay parser and break it, please let us know :)

Gary

> 
> 
> 
> Your thoughts are appreciated. 
> 
