Scoring, cosine measure

2005-04-20 Thread Barbara Krausz
Hi, I'm currently writing my bachelor's thesis about Lucene. I searched for theoretical information, for example about the IR model Lucene uses, but I couldn't find anything, so I had to figure it out on my own. I think Lucene uses the vector space model with a variation of the cosine measure (cosine

Difference between minMergeDocs and mergeFactor

2005-05-08 Thread Barbara Krausz
Hi, can anybody tell me the difference between minMergeDocs and mergeFactor (perhaps an example?)? Thanks Barbara

Finding docs which contain at least x of the queryterms

2005-05-25 Thread Barbara Krausz
Hi, consider a query with, e.g., 4 terms (t1, t2, t3, t4). I want to retrieve all documents that contain at least, e.g., 3 of the query terms. How can I implement this? My first idea is to use BooleanQueries such as (t1 and t2 and t3 and t4) or (t1 and t2 and t3) or (t1 and t2 and t4) or (t1 and
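The enumeration idea in this message can be sketched in plain Java. This is only an illustration under stated assumptions, not Lucene code: the class name `AtLeastKOfN` and its methods are hypothetical, and the result is a plain boolean expression string rather than a Lucene query object. It relies on the observation that listing every subset of exactly k terms is enough, because a document containing more than k of the terms also matches at least one k-term clause.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Lucene: builds a boolean expression
// that matches documents containing at least k of the given terms.
public class AtLeastKOfN {

    // Returns all subsets of exactly k terms, in lexicographic index order.
    static List<List<String>> combinations(List<String> terms, int k) {
        List<List<String>> out = new ArrayList<>();
        collect(terms, k, 0, new ArrayList<>(), out);
        return out;
    }

    private static void collect(List<String> terms, int k, int start,
                                List<String> current, List<List<String>> out) {
        if (current.size() == k) {
            out.add(new ArrayList<>(current)); // copy, since current is reused
            return;
        }
        for (int i = start; i < terms.size(); i++) {
            current.add(terms.get(i));
            collect(terms, k, i + 1, current, out);
            current.remove(current.size() - 1); // backtrack
        }
    }

    // Joins each k-subset with AND and the resulting clauses with OR.
    static String build(List<String> terms, int k) {
        if (k < 1 || k > terms.size()) {
            throw new IllegalArgumentException("k must be between 1 and the number of terms");
        }
        List<String> clauses = new ArrayList<>();
        for (List<String> combo : combinations(terms, k)) {
            clauses.add("(" + String.join(" AND ", combo) + ")");
        }
        return String.join(" OR ", clauses);
    }

    public static void main(String[] args) {
        System.out.println(build(Arrays.asList("t1", "t2", "t3", "t4"), 3));
    }
}
```

For 4 terms and k = 3 this yields four clauses. The number of clauses grows as the binomial coefficient "n choose k", so the approach is only practical for short queries.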

Determining the IDF while searching for documents

2005-06-13 Thread Barbara Krausz
Hi all, is it possible to determine the IDF (which depends on the number of documents in which a term appears) while searching for documents? I implemented an index based on trigrams, i.e. the index terms are now strings of 3 characters, so that my search engine finds documents with OCR errors. When I'm searching for the
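To make the trigram idea concrete, here is a minimal sketch in plain Java. It is an assumption-laden illustration, not the poster's implementation and not Lucene's API: the class `TrigramIdf` and its methods are hypothetical, the documents are held in memory instead of a Lucene index, and the IDF uses the textbook form log(N / df), which may differ from the formula Lucene applies.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration: character trigrams as index terms,
// with document frequency and a textbook IDF computed in memory.
public class TrigramIdf {

    // Distinct character trigrams of the text, in order of first appearance.
    static Set<String> trigrams(String text) {
        Set<String> grams = new LinkedHashSet<>();
        for (int i = 0; i + 3 <= text.length(); i++) {
            grams.add(text.substring(i, i + 3));
        }
        return grams;
    }

    // Number of documents in which each trigram occurs at least once.
    static Map<String, Integer> documentFrequencies(List<String> docs) {
        Map<String, Integer> df = new HashMap<>();
        for (String doc : docs) {
            for (String gram : trigrams(doc)) {
                df.merge(gram, 1, Integer::sum);
            }
        }
        return df;
    }

    // Textbook IDF (assumed here): log(N / df), for df >= 1.
    static double idf(int numDocs, int docFreq) {
        return Math.log((double) numDocs / docFreq);
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("lucene", "lucent");
        Map<String, Integer> df = documentFrequencies(docs);
        for (String gram : trigrams("lucene")) {
            System.out.println(gram + " df=" + df.get(gram)
                    + " idf=" + idf(docs.size(), df.get(gram)));
        }
    }
}
```

A trigram shared by both sample strings, such as "luc", gets an IDF of 0 under this formula, while one that occurs in a single document, such as "ene", gets a positive value.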

Re: Determining the IDF while searching for documents

2005-06-14 Thread Barbara Krausz
java-user@lucene.apache.org wrote on 14.06.05 08:49:11: I'm not 100% sure I understand your question, but... : order to compute the TF I count the occurrences of terms which are : similar to the term. But I have trouble computing the IDF, because I : must know the number of documents in wh