Seid Mohammed: > I need this exactly solution. > Can you please tell me how could I DO IT? > I am badly in nead of it
> > On Thu, May 14, 2009 at 5:58 AM, Ridzwan Aminuddin > > <r...@world-check.com>wrote: > > > >> Hi all, > >> > >> Is Lucene able to index phrases instead if individual terms? If it is, can > >> we also feed it a 'thesaurus or dictionary' of phrases that it should look > >> out for when indexing. Thanks in advance, > >> > >> Ridzwan Hi Seid. I constructed something like this in my master degree. What I bascially did was to write a custom analyzer. However I identified the top 100 most searched phrases in my collection, and then filtered for those. If a document contained a identified phrase, then the analyzer would construct a Token for those terms. Another approach is to decide that only two-word phrases should be search for. And, for instance, find verbs and use the "word right after". Example string: "The boat was very fast". Tokens: "The boat", "was", "very fast". If your analyzer contain a range of phrases to search for, then it should be straightforward to identify those phrases and index them as Tokens. Just remember to set the start and end offset of the Token to correct values. You can have a look at this thesis here: http://asbjorn.fellinghaug.com/filer/master/Master_thesis.pdf And the java source code here: http://daim.idi.ntnu.no/show.php?type=vedlegg&id=3429 Hope this helps. -- Asbjørn A. Fellinghaug asbj...@fellinghaug.com --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org