I suspect what you are looking for is latent semantics: techniques that
algorithmically infer relations such as "iPod ~ iPhone" or "Apple ~ Steve Jobs"
from how terms co-occur across a corpus. Search for "Latent Semantic Indexing"
or "Latent Semantic Analysis"; you can apply some of those approaches using
the term vectors stored in a Lucene index.

Ontologies such as WordNet are very generic, so if you have a
domain-specific corpus you would need to build some kind of latent semantic
index from that corpus to extract the relations it contains.
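
To make the idea concrete, here is a minimal sketch of Latent Semantic
Analysis in plain Python. It is not Lucene code and not a tuned
implementation: the toy corpus DOCS, the function names, and the choice of
k=2 are all assumptions made for illustration. In a real setup the term
counts would come from the term vectors in your index, and you would use a
proper SVD library instead of the simple power iteration shown here.

```python
import math
import random

# Hypothetical toy corpus; real counts would come from the index's term vectors.
DOCS = [
    "ipod apple music",
    "iphone apple phone",
    "steve jobs apple",
    "steve jobs iphone",
    "lucene search index",
    "lucene index query",
    "search query index",
]

def term_document_matrix(docs):
    """Return (vocab, counts) where counts[i][j] is how often term i occurs in doc j."""
    vocab = sorted({tok for doc in docs for tok in doc.lower().split()})
    row = {tok: i for i, tok in enumerate(vocab)}
    counts = [[0.0] * len(docs) for _ in vocab]
    for j, doc in enumerate(docs):
        for tok in doc.lower().split():
            counts[row[tok]][j] += 1.0
    return vocab, counts

def top_eigenpairs(sym, k, iters=300, seed=0):
    """Top-k eigenpairs of a symmetric PSD matrix: power iteration with deflation."""
    rng = random.Random(seed)
    n = len(sym)
    work = [r[:] for r in sym]
    pairs = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(n)]
        for _ in range(iters):
            w = [sum(work[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(work[i][j] * v[j] for j in range(n)) for i in range(n))
        pairs.append((lam, v))
        # Deflate: remove the component just found so the next pass finds the next one.
        for i in range(n):
            for j in range(n):
                work[i][j] -= lam * v[i] * v[j]
    return pairs

def lsa_term_vectors(docs, k=2):
    """Map each term to a k-dimensional vector (rows of U_k * Sigma_k from the SVD)."""
    vocab, a = term_document_matrix(docs)
    n, m = len(a), len(docs)
    # Eigenvectors of A * A^T are the left singular vectors of A;
    # the eigenvalues are the squared singular values.
    gram = [[sum(a[i][d] * a[j][d] for d in range(m)) for j in range(n)]
            for i in range(n)]
    pairs = top_eigenpairs(gram, k)
    return {tok: [math.sqrt(max(lam, 0.0)) * vec[i] for lam, vec in pairs]
            for i, tok in enumerate(vocab)}

def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return dot / (nu * nv) if nu and nv else 0.0

if __name__ == "__main__":
    vecs = lsa_term_vectors(DOCS, k=2)
    for a, b in [("ipod", "iphone"), ("steve", "apple"), ("ipod", "lucene")]:
        print(a, b, round(cosine(vecs[a], vecs[b]), 3))
```

Note that "ipod" and "iphone" never appear in the same toy document; they
end up close in the reduced space only because both co-occur with "apple".
That indirect association is the kind of relation a synonym dictionary
cannot give you for a domain-specific corpus.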





On Tue, Jun 23, 2009 at 5:27 AM, Cool The Breezer
<techcool.ku...@yahoo.com> wrote:

>
> Of late I have started using Lucene as the main search library for all
> documents in our intranet. It works extremely well. I am trying to use
> similarity functionality to measure how similar two sentences/documents
> are, and I am trying to use WordNet in our search solution. I have used the
> WordNet contrib package, and it works really well for expanding queries with
> synonyms. But I hit a limitation with a query like "Steve Jobs", where
> documents containing "apple" should be returned, and in the same way with
> "pirated" and "willful downloading of copyrighted material". This comes down
> to finding the meaning of a word with respect to its context. Has anybody
> done this kind of context-based indexing, that is, tokenizing based on the
> context of each word/token and then searching after expanding the query
> with synonyms? I have come across some SourceForge projects like
> http://wn-similarity.sourceforge.net/ for semantically relating words using
> WordNet, but I am still rather confused about how to move ahead with this
> kind of context-based search. I would appreciate your help. I understand
> that this might not be directly related to Lucene, but it falls in the same
> search-solution domain.
>
> - RB

Re: Similarity
