> I know I missed the Friday deadline but...
>
> Has anyone any recommendations for parsing HTML? I use Lucene, and the
> example has its own HTML parser, but I was wondering if anyone has used an
> existing project or whether there is some built-in functionality in an
> Apache lib to convert

Swing has a pretty sweet SAX-like event parser that you can run headless. You implement a callback interface, and you have to fake it out a bit, but it's very robust. I used it once to create a utility that converts HTML documents to Struts JSPs. I can send you an example if you're interested.

http://java.sun.com/j2se/1.4.2/docs/api/javax/swing/text/html/HTMLEditorKit.ParserCallback.html

We are *working* on a DOM-like parser for the Shale Clay plugin. Its purpose is to use HTML templates to define a JSF-rendered page. Here's the test case:

http://svn.apache.org/viewcvs.cgi/struts/shale/trunk/clay-plugin/src/test/org/apache/shale/clay/parser/ParserTestCase.java?rev=267474&view=log

If you use the Clay parser and break it, please let us know :)

Gary

> Your thoughts are appreciated.
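For anyone who wants to try the Swing callback approach for Lucene indexing, here is a minimal sketch. It is not Gary's HTML-to-JSP utility. It assumes the goal is pulling plain text out of a page, and the class and method names (`HtmlTextExtractor`, `TextCollector`, `extractText`) are hypothetical. It uses `javax.swing.text.html.parser.ParserDelegator`, which is public and can drive an `HTMLEditorKit.ParserCallback` directly, so you don't need to subclass `HTMLEditorKit` just to reach its protected `getParser()`. That workaround is one common form of the "faking out" mentioned above. Keep in mind that the Swing parser works from an HTML 3.2 DTD, so newer markup may be handled loosely.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class HtmlTextExtractor {

    // Hypothetical callback: collects character data and ignores any text
    // reported between SCRIPT/STYLE start and end tags. Comments are never
    // collected because handleComment is not overridden.
    static class TextCollector extends HTMLEditorKit.ParserCallback {
        private final StringBuilder text = new StringBuilder();
        private int skipDepth = 0;

        private static boolean isSkipped(HTML.Tag tag) {
            return tag == HTML.Tag.SCRIPT || tag == HTML.Tag.STYLE;
        }

        @Override
        public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
            if (isSkipped(tag)) {
                skipDepth++;
            }
        }

        @Override
        public void handleEndTag(HTML.Tag tag, int pos) {
            if (isSkipped(tag) && skipDepth > 0) {
                skipDepth--;
            }
        }

        @Override
        public void handleText(char[] data, int pos) {
            if (skipDepth == 0) {
                // Separate text runs with a space so adjacent words don't merge.
                if (text.length() > 0) {
                    text.append(' ');
                }
                text.append(data);
            }
        }

        String getText() {
            return text.toString();
        }
    }

    // Parses the HTML headlessly and returns the collected text.
    public static String extractText(Reader html) throws IOException {
        TextCollector collector = new TextCollector();
        // ignoreCharSet = true: don't abort if the page declares a charset.
        new ParserDelegator().parse(html, collector, true);
        return collector.getText();
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><head><title>Hello</title></head>"
                + "<body><p>Lucene <b>indexing</b></p>"
                + "<script>var x=1;</script></body></html>";
        System.out.println(extractText(new StringReader(html)));
    }
}
```

The returned string could then go into a Lucene document field. Overriding `handleStartTag` as well would let you capture things like the `<title>` separately, which is roughly how a callback-based converter tracks where it is in the page.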