Re: BM25 Scoring Patch

2010-02-16 Thread JOAQUIN PEREZ IGLESIAS
> best if we can support other models also!
>
> Finally I think there is something to be said for Lucene's default retrieval model, which in my (non-English) findings across the board isn't terrible at all... then again I am working with languages

Re: BM25 Scoring Patch

2010-02-16 Thread JOAQUIN PEREZ IGLESIAS
> Note: I have no bias against BM-25, but it's definitely a myth to say there is a single retrieval formula that is the 'best' across the board.
>
> On Tue, Feb 16, 2010 at 1:53 PM, JOAQUIN PEREZ IGLESIAS <joaquin.pe...@lsi.uned.es> wrote:
>
>> By the w

Re: BM25 Scoring Patch

2010-02-16 Thread JOAQUIN PEREZ IGLESIAS
>> --- On Tue, 2/16/10, Robert Muir wrote:
>>
>>> From: Robert Muir
>>> Subject: Re: BM25 Scoring Patch
>>> To: java-user@lucene.apache.org
>>> Date: Tuesday, February 16, 2010, 11:36 AM
>>> yes Ivan, if possible p

Re: BM25 Scoring Patch

2010-02-16 Thread JOAQUIN PEREZ IGLESIAS
>> Date: Tuesday, February 16, 2010, 11:36 AM
>> yes Ivan, if possible please report back any findings you can on the experiments you are doing!
>>
>> On Tue, Feb 16, 2010 at 11:22 AM, Joaquin Perez Iglesias <joaquin.pe...@lsi.uned.es

Re: BM25 Scoring Patch

2010-02-16 Thread Joaquin Perez Iglesias
Hi Ivan,

You shouldn't set the BM25Similarity for indexing or searching. Please try removing the lines:

    writer.setSimilarity(new BM25Similarity());
    searcher.setSimilarity(sim);

Please let us/me know if you improve your results with these changes.

Robert Muir wrote:

Hi Ivan, I've seen

Re: Query Expansion Module for Lucene based on BM25 ranking function

2008-10-23 Thread Joaquin Perez Iglesias
-BM25/ Best Regards.

José Ramón Perez Aguera wrote:

Hi Grant, Our query expansion approach is quite simple: we apply pseudo-relevance feedback techniques, where a number of top retrieved documents are used to extract the candidate terms to expand the original query. We have used

Re: Query Expansion Module for Lucene based on BM25 ranking function

2008-10-22 Thread José Ramón Perez Aguera
necessary for query expansion. On the other hand, to implement BM25, we have used the implementation proposed by Joaquin Perez, where the average length is computed at indexing time and used as a constant at query time. We know that this is not the best way to do that, but we don't
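To illustrate where the average length enters the score, here is a minimal, self-contained sketch of the per-term BM25 formula with the average document length passed in as a precomputed constant. It does not use Lucene and is not the patch discussed in this thread; the class name, method names, and the non-negative idf variant are assumptions made for illustration.

```java
// Hypothetical illustration only: not the BM25 patch from this thread.
class Bm25Sketch {

    // Assumed idf variant: log(1 + (N - n + 0.5) / (n + 0.5)), which stays non-negative.
    static double idf(long numDocs, long docFreq) {
        return Math.log(1.0 + (numDocs - docFreq + 0.5) / (docFreq + 0.5));
    }

    // Per-term BM25 contribution. avgDocLen is supplied by the caller,
    // e.g. computed once at indexing time and treated as a constant at query time.
    static double termScore(double tf, double docLen, double avgDocLen,
                            long numDocs, long docFreq, double k1, double b) {
        // Length normalization: documents longer than average are penalized.
        double norm = k1 * (1.0 - b + b * docLen / avgDocLen);
        return idf(numDocs, docFreq) * tf * (k1 + 1.0) / (tf + norm);
    }
}
```

Because avgDocLen is fixed when the index is built, it goes stale as documents are added or removed, which is presumably why the message calls this approach less than ideal.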

Anyone have an XMLAnalyzer?

2007-01-25 Thread Arturo Perez
Is there an analyzer that can work with XML? Any suggestions for such? -arturo

What type of query best for OR with high score?

2007-01-25 Thread Arturo Perez
Hi all, Which type of query should I use for the following type of thing? I have multiple words/phrases. I want to run a search for them all OR'd together. But I want the documents with the most distinct matches to have the highest score. An example: I want to search for "TOM OR DICK OR HARRY

Re: Position of a word in a document?

2006-05-15 Thread Arturo Perez
Daniel Naber (danielnaber.de) writes:

> On Monday 15 May 2006 14:54, Franz Coriand wrote:
> > is it possible not only to get the document which contains the words of a query, but also get the position in the text of the query word?
>
> Yes, by using the term vectors with positions that were ad

Re: Exact date search doesn't work with 1.9.1?

2006-04-09 Thread Perez
yzer is eating numbers? tia, arturo

> On Apr 7, 2006, at 10:45 PM, Perez wrote:
>
> > Hi all,
> >
> > I have a document with a date in it and I put it into a field like so:
> > DateTools.dateToString(theDate, Resolution.DAY),
> > Field.Index.UN_TOKEN

Exact date search doesn't work with 1.9.1?

2006-04-08 Thread Perez
Hi all, I have a document with a date in it and I put it into a field like so: DateTools.dateToString(theDate, Resolution.DAY), Field.Index.UN_TOKENIZED. What I find is that a range query works: [20060131 TO 20060601], and a wildcard works, e.g. 2006*, but exact matches do not work, e.g. 20060130. Any
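For reference, a minimal sketch of the term string that ends up in the index, assuming DateTools.dateToString with Resolution.DAY yields a yyyyMMdd string in GMT (consistent with the 20060130 values in the message). It uses only java.time rather than Lucene, and the class and method names are hypothetical.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Date;

// Hypothetical helper that mimics day-resolution date strings; not Lucene's DateTools.
class DayResolutionSketch {

    // Assumption: day resolution means yyyyMMdd, evaluated in GMT/UTC.
    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyyMMdd").withZone(ZoneOffset.UTC);

    static String toDayString(Date date) {
        return DAY.format(date.toInstant());
    }
}
```

An untokenized field stores this string as a single term, so an exact-match query has to produce the identical string to hit it.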