On Mon, 2008-06-23 at 12:52 +0000, mark harwood wrote:
> >>Could you tell me what's wrong here, please?
> 
> There are potentially a number of factors at play here.
> 
> Your use of FuzzyLikeThis is fine - just tried the code on my single-term 
> "Paul" query and as I outlined before it is doing a much better job of 
> matching (Paul~= results Paul,Paul,Paul....Phul rather than FuzzyQuery's 
> Paul~= results Phul, Saul, Paulo , Paul, Paul.....)
> 
> Try the query on just the term artist:Coldplay and see the results. What 
> artists does FuzzyLikeThis return vs FuzzyQuery?
> 
> If you aren't getting Coldplay as the first result from FuzzyLikeThis double 
> check the content is indexed using the same analyzer that you pass to 
> FuzzyLikeThisQuery (your code below uses SimpleAnalyzer). If you indexed with 
> WhitespaceAnalyzer, for example, or as UN_TOKENIZED, the index and the query 
> differ, so "Coldplay" != "coldplay".
> 
> I notice the song title in your original code is treated as a single term in 
> your query - is that how it is indexed? I can see that artist might possibly 
> make sense as a single term which gets fuzzy matched but song titles are 
> generally longer which means it may work better as a tokenized field.

You were right, tokenization was the issue.  Using TOKENIZED instead of
UN_TOKENIZED immediately provided relevant results, even when using it
with FuzzyQuery.
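
For the archives, here is a minimal sketch of why the UN_TOKENIZED field
did not match.  It does not use Lucene at all: simpleAnalyze() is only a
rough stand-in for what SimpleAnalyzer does (split at non-letters,
lowercase), untokenized() mimics storing the whole value as one verbatim
term, and the song title is a made-up example value.

```java
import java.util.ArrayList;
import java.util.List;

class TokenizationSketch {
    // Rough stand-in for SimpleAnalyzer (an assumption, not Lucene code):
    // split the text at non-letters and lowercase every token.
    static List<String> simpleAnalyze(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Stand-in for an UN_TOKENIZED field: the whole value is one term,
    // kept exactly as it was given (case and spaces included).
    static List<String> untokenized(String text) {
        List<String> terms = new ArrayList<>();
        terms.add(text);
        return terms;
    }

    public static void main(String[] args) {
        String title = "Viva la Vida"; // hypothetical example value
        System.out.println("tokenized terms:   " + simpleAnalyze(title));
        System.out.println("untokenized terms: " + untokenized(title));
        // The analyzed query term is lowercase, so it cannot equal the
        // verbatim untokenized term.
        System.out.println("untokenized has 'coldplay': "
                + untokenized("Coldplay").contains("coldplay"));
        System.out.println("tokenized has 'coldplay':   "
                + simpleAnalyze("Coldplay").contains("coldplay"));
    }
}
```

With the untokenized field, a fuzzy match has to bridge the distance to
the entire title as one string, which is why short query terms found
mostly unrelated entries.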

Using FuzzyLikeThisQuery made the relevance much better, so I'm really
happy with the results.
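
As far as I understand the IDF issue Mark describes in the quoted
messages, the effect can be sketched with plain arithmetic.  The idf()
formula below is the classic 1 + ln(numDocs / (docFreq + 1)) form; the
document frequencies, the similarity values and the simplified
"similarity times IDF" weight are my own assumptions for illustration,
not Lucene's full scoring.

```java
class IdfSketch {
    // Classic IDF form: 1 + ln(numDocs / (docFreq + 1)).
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    // Simplified weight (an assumption): fuzzy similarity times the IDF
    // computed from whichever document frequency is passed in.
    static double weight(double similarity, int docFreqForIdf, int numDocs) {
        return similarity * idf(docFreqForIdf, numDocs);
    }

    public static void main(String[] args) {
        int numDocs = 1_500_000; // index size mentioned in this thread
        int dfPaul = 5_000;      // hypothetical: common, correctly spelled term
        int dfPhul = 3;          // hypothetical: rare variant or misspelling
        double simPaul = 1.0;    // exact match
        double simPhul = 0.75;   // one edit out of four characters

        // Each variant weighted by its own IDF: the rare variant wins.
        System.out.println("own IDF:    paul=" + weight(simPaul, dfPaul, numDocs)
                + " phul=" + weight(simPhul, dfPhul, numDocs));

        // Every variant weighted by the source term's IDF: only the
        // similarity separates them, so the exact match wins.
        System.out.println("shared IDF: paul=" + weight(simPaul, dfPaul, numDocs)
                + " phul=" + weight(simPhul, dfPaul, numDocs));
    }
}
```

The second line of output corresponds to the idea of giving all variants
the source term's IDF, which is how I read the FuzzyLikeThisQuery
documentation.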

Thank you very much!

> 
> Cheers
> Mark
> 
> 
> ----- Original Message ----
> From: László Monda <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org
> Cc: [EMAIL PROTECTED]
> Sent: Monday, 23 June, 2008 1:11:50 PM
> Subject: Re: Getting irrelevant results using fuzzy query
> 
> Thanks for your reply, Mark.
> 
> 
> 
> This was my original code for constructing my query using FuzzyQuery:
> 
> BooleanQuery query = new BooleanQuery();
> if (artist.length() > 0) {
>     FuzzyQuery artist_query = new FuzzyQuery(new Term("artist", artist));
>     query.add(artist_query, BooleanClause.Occur.MUST);
> }
> if (song.length() > 0) {
>     FuzzyQuery song_query = new FuzzyQuery(new Term("song", song));
>     query.add(song_query, BooleanClause.Occur.MUST);
> }
> 
> 
> 
> This is my first attempt to use FuzzyLikeThisQuery (with no success):
> 
> FuzzyLikeThisQuery query = new FuzzyLikeThisQuery(2, new SimpleAnalyzer());
> if (artist.length() > 0) {
>     query.addTerms(artist, "artist", 0.5f, 0);
> }
> if (song.length() > 0) {
>     query.addTerms(song, "song", 0.5f, 0);
> }
> 
> 
> 
> This is my second attempt to use FuzzyLikeThisQuery (with no success):
> 
> BooleanQuery query = new BooleanQuery();
> if (artist.length() > 0) {
>     FuzzyLikeThisQuery artist_query = new FuzzyLikeThisQuery(1, new SimpleAnalyzer());
>     artist_query.addTerms(artist, "artist", 0.5f, 0);
>     query.add(artist_query, BooleanClause.Occur.MUST);
> }
> if (song.length() > 0) {
>     FuzzyLikeThisQuery song_query = new FuzzyLikeThisQuery(1, new SimpleAnalyzer());
>     song_query.addTerms(song, "song", 0.5f, 0);
>     query.add(song_query, BooleanClause.Occur.MUST);
> }
> 
> 
> 
> I think it's my lack of understanding of how to use FuzzyLikeThisQuery
> that is giving me irrelevant results.
> 
> Could you tell me what's wrong here, please?
> 
> Thank you.
> 
> On Mon, 2008-06-23 at 11:28 +0000, mark harwood wrote:
> > >>I do have serious problems with the relevance of the results with fuzzy 
> > >>queries.
> > 
> > Please take the time to read my response here:
> > 
> >      http://www.gossamer-threads.com/lists/lucene/java-user/62050#62050
> > 
> > I had a work colleague come up with exactly the same problem this week and 
> > the solution is the same.
> > 
> > Just tested my index with a standard Lucene FuzzyQuery for "Paul~" - this 
> > gives "Phul", "Saul", and "Paulo" before ANY "Paul" records due to IDF 
> > issues.
> > Using FuzzyLikeThisQuery puts all the "Paul" records ahead of the variants.
> > 
> > 
> > 
> > ----- Original Message ----
> > From: László Monda <[EMAIL PROTECTED]>
> > To: java-user@lucene.apache.org
> > Cc: [EMAIL PROTECTED]
> > Sent: Monday, 23 June, 2008 12:10:05 PM
> > Subject: Re: Getting irrelevant results using fuzzy query
> > 
> > On Wed, 2008-06-18 at 21:10 +0200, Daniel Naber wrote:
> > > On Mittwoch, 18. Juni 2008, László Monda wrote:
> > > 
> > > > Additional info: Lucene seems to do the right thing when only few
> > > > documents are present, but goes crazy when there is about 1.5 million
> > > > documents in the index.
> > > 
> > > Lucene works well with more documents (currently using it with 9 
> > > million), but the fuzzy query requires iteration over all terms, which 
> > > makes this query slow. This can be avoided by setting the prefixLength 
> > > parameter of the FuzzyQuery constructor to 1 or 2. Or maybe you should 
> > > use an n-gram index; see the spellchecker in the contrib area.
> > 
> > Thanks for the suggestion. I don't have any performance problems
> > yet, but I do have serious problems with the relevance of the results
> > with fuzzy queries.
> > 
-- 
Laci  <http://monda.hu>
