I am a brand new newbie with respect to Lucene, and I am just figuring out how to include it into an application I am building. (personal blog)
In essence I have a set of articles that reside in a database. Each one will have an Integer ID identifying it, Textual Title, some key parameters (such as date published, and categories) and body which is composed of text using the "textile" format (see http://textism.com/tools/textile/). I have two sets of questions which I don't quite understand 1) The Analyser Since the body has some special syntax, I assume I have to extend the analyser to skip the special symbols etc. Has anyone done this already? Is there a standard place to look? If not, do I have to start again from scratch, or can I just "configure" an existing one? (In particular, I have a routine which will take a textile input string and produce an html output string - so could I use the HTMLParser in the demo - alternatively JavaCC - is that something I could use? - just came across it whilst writing this mail) I ultimately want to put a summary of the text on the front portion of my web site. In order to calculate where the split is, and therefore how many articles to place it would be useful as I am analysing it to get some statistics like where is the end of the first paragraph. Is there a "hook" that I can plug into to get that information out (I scanned the javadocs, but I can't find anything obvious). 2) Use of different field types. I am stuggling to understand what field types I need for my different fields. For instance, I will want to index all the body of the article, so that the words it contains show up in searches, and I will also want to output the snippet around where the text is on a search page. However I can easily retrieve the article from the database given its ID. Would I therefore make the ID of the article a keyword, and the body of it unstored? and would I build a special space separated string of the (undetermined number of) categories and make them normal. TIA -- Alan Chandler http://www.chandlerfamily.org.uk Open Source. It's the difference between trust and antitrust. --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]