I suspect what you are looking for is "Latent Semantics" - it can algorithmically infer that "iPod~iPhone" or "Apple~Steve Jobs". Google for "Latent Semantic Indexing" or "Latent Semantic Analysis" - you can apply some of those approaches using the TermVectors in Lucene index. Ontologies such as WordNet are very generic, hence if you have a domain specific corpus, you would need to generate some kind of Latent Semantic Index to extract the relations therein.
On Tue, Jun 23, 2009 at 5:27 AM, Cool The Breezer <techcool.ku...@yahoo.com>wrote: > > Of the late I started using Lucene as main search library for all documents > in our intranet. It works extremely well. I am trying to use similarity > kinda functionality to find similarity between two sentences/documents and > trying to use Wordnet in our searching solution. I have used wordnet contrib > package and it really works well to expand queries with synonyms and get > results. But I can get handicap when searching for documents with query like > "Steve Jobs" and documents containing "apple" should be returned. In the > same way "pirated" and "willfull downloading copyrighted material". This > comes finding meaning of a word wrt its context. Has anybody done any kind > of such context based indexing that means while tokenization based on > context of each word/token and searching the same after expanding the query > using synonyms. I have come across some sf projects like > http://wn-similarity.sourceforge.net/ to semantically relating words > using wordnet but I am > still kinda confused on how to move ahead with such kind of context based > search. Appreciate your help. I understand that this might not be directly > related to Lucene but somehow this falls in the same domain search solution. > > - RB > > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > >