Re: Getting irrelevant results using fuzzy query

2008-06-28 Thread László Monda
with FuzzyQuery. Using FuzzyLikeThisQuery made the relevance much better, so I'm really happy with the results. Thank you very much!
>
> Cheers
> Mark
>
> - Original Message
> From: László Monda <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.or

Re: Getting irrelevant results using fuzzy query

2008-06-23 Thread Daniel Naber
On Monday, 23 June 2008, László Monda wrote:
> According to the current Lucene documentation at
> http://lucene.apache.org/java/2_3_2/api/index.html it seems to me that
> the Query class doesn't have any explain() methods.

It's in the IndexSearcher and it takes a query and a document number as i

Re: Getting irrelevant results using fuzzy query

2008-06-23 Thread mark harwood
ark
- Original Message
From: László Monda <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Cc: [EMAIL PROTECTED]
Sent: Monday, 23 June, 2008 1:11:50 PM
Subject: Re: Getting irrelevant results using fuzzy query

Thanks for your reply, Mark. This was my original code for constru

Re: Getting irrelevant results using fuzzy query

2008-06-23 Thread László Monda
"Phul", "Saul", and "Paulo" before ANY "Paul" records due to IDF issues.
> Using FuzzyLikeThisQuery puts all the "Paul" records ahead of the variants.
>
> - Original Message
> From: László Monda <[EMAIL PROTECTED]>
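The IDF effect Mark describes can be reproduced with back-of-the-envelope arithmetic. This is a simplified model, not Lucene code: it assumes Lucene 2.x's DefaultSimilarity idf formula, 1 + ln(numDocs / (docFreq + 1)), and that each fuzzy variant becomes a term clause scoring roughly boost × idf², ignoring tf, length norms and the queryNorm shared by all clauses. The document frequencies and the 0.5 boost for one-edit variants are made-up numbers for an index of 1.5 million documents.

```java
/**
 * Simplified model (not Lucene code) of why rare fuzzy variants can outrank
 * the exact term. Assumes idf = 1 + ln(numDocs / (docFreq + 1)) and a
 * per-clause score of roughly boost * idf^2; tf, norms and queryNorm are ignored.
 */
public class FuzzyIdfDemo {
    static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    /** Approximate score of a document matching only this term. */
    static double score(double boost, long docFreq, long numDocs) {
        double idf = idf(docFreq, numDocs);
        return boost * idf * idf; // idf enters via both the query weight and the field weight
    }

    public static void main(String[] args) {
        final long numDocs = 1_500_000L;
        // Hypothetical document frequencies and boosts (exact match = 1.0).
        String[] terms = {"Paul", "Phul", "Saul", "Paulo"};
        long[] docFreqs = {20_000L, 3L, 400L, 150L};
        double[] boosts = {1.0, 0.5, 0.5, 0.5};
        for (int i = 0; i < terms.length; i++) {
            System.out.printf("%-6s idf=%6.3f score=%7.2f%n", terms[i],
                    idf(docFreqs[i], numDocs), score(boosts[i], docFreqs[i], numDocs));
        }
    }
}
```

With these numbers every variant outscores the exact "Paul" clause: a rare misspelling's larger idf, squared, more than offsets its lower boost.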

Re: Getting irrelevant results using fuzzy query

2008-06-23 Thread mark harwood
the variants.
- Original Message
From: László Monda <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Cc: [EMAIL PROTECTED]
Sent: Monday, 23 June, 2008 12:10:05 PM
Subject: Re: Getting irrelevant results using fuzzy query

On Wed, 2008-06-18 at 21:10 +0200, Daniel Naber w

Re: Getting irrelevant results using fuzzy query

2008-06-23 Thread László Monda
Hi Daniel,

On Wed, 2008-06-18 at 20:37 +0200, Daniel Naber wrote:
> On Wednesday, 18 June 2008, László Monda wrote:
>
> > Since fuzzy searching is based on the Levenshtein distance, the distance
> > between "coldplay" and "coldplay" is 0 and the distance between
> > "coldplay" and "downplay" is 3

Re: Getting irrelevant results using fuzzy query

2008-06-23 Thread László Monda
Hi Mark,

On Wed, 2008-06-18 at 21:09 +0100, markharw00d wrote:
> This looks like it is related to an issue I first raised here:
> http://markmail.org/message/37ywsemfudpos6uh
>
> At the time I identified two issues with FuzzyQuery: the usual
> "coord" and "idf" scoring factors shouldn't

Re: Getting irrelevant results using fuzzy query

2008-06-23 Thread László Monda
On Wed, 2008-06-18 at 21:10 +0200, Daniel Naber wrote:
> On Wednesday, 18 June 2008, László Monda wrote:
>
> > Additional info: Lucene seems to do the right thing when only a few
> > documents are present, but goes crazy when there are about 1.5 million
> > documents in the index.
>
> Lucene works w

Re: Getting irrelevant results using fuzzy query

2008-06-18 Thread markharw00d
This looks like it is related to an issue I first raised here:
http://markmail.org/message/37ywsemfudpos6uh

At the time I identified two issues with FuzzyQuery: the usual "coord" and "idf" scoring factors shouldn't be applied to fuzzy queries. The coord factor got fixed but idf remains a
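The two factors Mark mentions can be sketched in a few lines. coord here is the usual BooleanQuery coordination factor, overlap / maxOverlap, which would scale down a document that matches only one of the many variant terms a fuzzy query expands to. The "shared idf" computation is a hypothetical illustration of what dropping per-variant idf means (every variant reuses the idf of the term the user typed); it is not necessarily how FuzzyLikeThisQuery or any Lucene release implements it. As before, this is a simplified model assuming idf = 1 + ln(numDocs / (docFreq + 1)), a clause score of roughly boost × idf², and made-up document frequencies.

```java
/**
 * Sketch of the two scoring factors discussed for fuzzy queries (not Lucene code).
 * Assumes coord = overlap / maxOverlap, idf = 1 + ln(numDocs / (docFreq + 1)),
 * and a per-clause score of roughly boost * idf^2.
 */
public class FuzzyScoringFactors {
    static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    /** BooleanQuery-style coordination factor: fraction of clauses that matched. */
    static double coord(int overlap, int maxOverlap) {
        return overlap / (double) maxOverlap;
    }

    static double clauseScore(double boost, double idf) {
        return boost * idf * idf;
    }

    public static void main(String[] args) {
        final long numDocs = 1_500_000L;
        // Hypothetical: a fuzzy query for "Paul" expanded into 4 variant clauses.
        System.out.println("coord, 1 of 4 variants matched: " + coord(1, 4));

        double exactIdf = idf(20_000L, numDocs); // hypothetical docFreq of "Paul"
        double rareIdf = idf(3L, numDocs);       // hypothetical docFreq of "Phul"
        // Per-variant idf: the rare misspelling wins despite its lower boost.
        System.out.printf("per-variant idf: Paul=%.2f Phul=%.2f%n",
                clauseScore(1.0, exactIdf), clauseScore(0.5, rareIdf));
        // Hypothetical fix: every variant reuses the idf of the term the user typed.
        System.out.printf("shared idf:      Paul=%.2f Phul=%.2f%n",
                clauseScore(1.0, exactIdf), clauseScore(0.5, exactIdf));
    }
}
```

With a shared idf the only difference between clauses is the similarity boost, so the exact term ranks first again.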

Re: Getting irrelevant results using fuzzy query

2008-06-18 Thread Daniel Naber
On Wednesday, 18 June 2008, László Monda wrote:
> Additional info: Lucene seems to do the right thing when only a few
> documents are present, but goes crazy when there are about 1.5 million
> documents in the index.

Lucene works well with more documents (currently using it with 9 million). But the

Re: Getting irrelevant results using fuzzy query

2008-06-18 Thread Daniel Naber
On Wednesday, 18 June 2008, László Monda wrote:
> Since fuzzy searching is based on the Levenshtein distance, the distance
> between "coldplay" and "coldplay" is 0 and the distance between
> "coldplay" and "downplay" is 3, so how on earth is it possible that when
> searching for "coldplay", Lucene ret
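The distances László quotes can be checked with a standard dynamic-programming Levenshtein implementation. The similarity method converts a distance into the score FuzzyQuery filters on, modeled on the Lucene 2.x FuzzyTermEnum formula with a prefix length of 0, similarity = 1 - distance / min(len(a), len(b)), compared against FuzzyQuery's default minimum similarity of 0.5. Treat the formula and the strict comparison as assumptions and check the source of your Lucene version.

```java
/**
 * Levenshtein distance plus a similarity score modeled on (assumed) Lucene 2.x
 * FuzzyTermEnum behaviour with prefixLength = 0:
 *   similarity = 1 - distance / min(len(a), len(b))
 */
public class FuzzyDistanceDemo {
    static final double DEFAULT_MIN_SIMILARITY = 0.5; // FuzzyQuery's default threshold

    /** Classic two-row dynamic-programming edit distance (insert, delete, substitute). */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp; // last computed row is now in prev
        }
        return prev[b.length()];
    }

    /** Assumed similarity formula; empty strings are handled simply here. */
    static double similarity(String query, String term) {
        int minLen = Math.min(query.length(), term.length());
        if (minLen == 0) return 0.0;
        return 1.0 - levenshtein(query, term) / (double) minLen;
    }

    public static void main(String[] args) {
        for (String term : new String[] {"coldplay", "downplay"}) {
            double sim = similarity("coldplay", term);
            System.out.printf("%s: distance=%d similarity=%.3f matches=%b%n",
                    term, levenshtein("coldplay", term), sim, sim > DEFAULT_MIN_SIMILARITY);
        }
    }
}
```

Under these assumptions "downplay" has a similarity of 0.625, above the threshold, so it is a legitimate fuzzy match; which document comes first is then decided by the scoring factors (boost, idf, norms), not by the edit distance alone.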