Re: Weighted cosine similarity calculation using Lucene

2012-04-20 Thread Kasun Perera
Hi Erick, On Fri, Apr 20, 2012 at 5:14 PM, Erick Erickson wrote: > Maybe I'm missing something here, but why not just boost the > terms in the fields at query time? > Yes, I can boost the fields at query time. But I'm using the termFreqVector to get term frequencies and then calculate the TFIDF v

RE: Highlighter and Shingles...

2012-04-20 Thread Steven A Rowe
Hi Dawn, Can you give an example of a "partial match"? Steve -----Original Message----- From: Dawn Zoë Raison [mailto:d...@digitorial.co.uk] Sent: Friday, April 20, 2012 7:59 AM To: java-user@lucene.apache.org Subject: Highlighter and Shingles... Hi, Are there any notes on making the highligh

Highlighter and Shingles...

2012-04-20 Thread Dawn Zoë Raison
Hi, Are there any notes on making the highlighter work consistently with a shingle generated index? I have a situation where complete matches highlight OK, but partial matches do not - leading to a number of blank previews... Our analyser looks like: TokenStream result =

Re: Weighted cosine similarity calculation using Lucene

2012-04-20 Thread Erick Erickson
Maybe I'm missing something here, but why not just boost the terms in the fields at query time? Best Erick On Fri, Apr 20, 2012 at 4:20 AM, Kasun Perera wrote: > I have documents that are marked up with Taxonomy and Ontology terms > separately. > When I calculate the document similarity, I want

Re: DisjunctionMaxQuery and scoring

2012-04-20 Thread Benson Margulies
Uwe and Robert, Thanks. David and I are two peas in one pod here at Basis. --benson On Fri, Apr 20, 2012 at 2:33 AM, Uwe Schindler wrote: > Hi, > > Ah sorry, I misunderstood, you wanted to score the duplicate match lower! To > achieve this, you have to change the coord function in your > simila

Re: Field value vs TokenStream

2012-04-20 Thread Carsten Schnober
On 18.04.2012 20:06, Uwe Schindler wrote: Hi, > You should inform yourself about the difference between "stored" and > "indexed" fields: The tokens in the ".tis" file are in fact the analyzed > tokens retrieved from the TokenStream. This is controlled by the Field > parameter Field.Index. The F

Weighted cosine similarity calculation using Lucene

2012-04-20 Thread Kasun Perera
I have documents that are marked up with Taxonomy and Ontology terms separately. When I calculate the document similarity, I want to give higher weights to those Taxonomy terms and Ontology terms. When I index the document, I have defined the Document content, Taxonomy and Ontology terms as Field
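
For readers following this thread, here is a minimal sketch of one way the weighting described above could be computed once per-field TF-IDF values are in hand. It is not taken from any message in the thread and does not use the Lucene API: plain Java maps stand in for term vectors read from the index, and the class name, method names, field names, and weight values are all hypothetical. It assumes Java 8 or later.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: none of these names come from Lucene or from the thread.
class WeightedCosine {

    // Scales each field's TF-IDF vector by that field's weight and merges the
    // results into one vector keyed by "field:term". Fields without an
    // explicit weight default to 1.0.
    static Map<String, Double> combine(Map<String, Map<String, Double>> fieldVectors,
                                       Map<String, Double> fieldWeights) {
        Map<String, Double> combined = new HashMap<>();
        for (Map.Entry<String, Map<String, Double>> field : fieldVectors.entrySet()) {
            double weight = fieldWeights.getOrDefault(field.getKey(), 1.0);
            for (Map.Entry<String, Double> term : field.getValue().entrySet()) {
                combined.put(field.getKey() + ":" + term.getKey(), weight * term.getValue());
            }
        }
        return combined;
    }

    // Standard cosine similarity between two sparse vectors.
    // Returns 0.0 if either vector has zero length.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        for (double v : b.values()) {
            normB += v * v;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Assumed weights: taxonomy and ontology terms count more than body text.
        Map<String, Double> weights = new HashMap<>();
        weights.put("content", 1.0);
        weights.put("taxonomy", 2.0);
        weights.put("ontology", 2.0);

        // Made-up TF-IDF values for two documents.
        Map<String, Map<String, Double>> doc1 = new HashMap<>();
        doc1.put("content", new HashMap<>());
        doc1.get("content").put("lucene", 0.5);
        doc1.put("taxonomy", new HashMap<>());
        doc1.get("taxonomy").put("search", 0.8);

        Map<String, Map<String, Double>> doc2 = new HashMap<>();
        doc2.put("content", new HashMap<>());
        doc2.get("content").put("lucene", 0.4);
        doc2.put("taxonomy", new HashMap<>());
        doc2.get("taxonomy").put("index", 0.7);

        double score = cosine(combine(doc1, weights), combine(doc2, weights));
        System.out.println("weighted cosine similarity = " + score);
    }
}
```

Keying the merged vector by "field:term" keeps a term that appears in both the content and the taxonomy field as two separate dimensions, so the taxonomy weight applies only to the taxonomy occurrence. Whether that separation is what the original poster wanted is not stated in the preview.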