On Mon, 2008-06-23 at 12:52 +0000, mark harwood wrote:
> >>Could you tell me what's wrong here, please?
>
> There are potentially a number of factors at play here.
>
> Your use of FuzzyLikeThis is fine - just tried the code on my single-term
> "Paul" query and as I outlined before it is doing a much better job of
> matching (Paul~= results Paul,Paul,Paul....Phul rather than FuzzyQuery's
> Paul~= results Phul, Saul, Paulo , Paul, Paul.....)
>
> Try the query on just the term artist:Coldplay and see the results. What
> artists does FuzzyLikeThis return vs FuzzyQuery?
>
> If you aren't getting Coldplay as the first result from FuzzyLikeThis, double
> check the content is indexed using the same analyzer that you pass to
> FuzzyLikeThisQuery (your code below uses SimpleAnalyzer). If you indexed with
> WhitespaceAnalyzer, for example, or as UN_TOKENIZED, the index and the query
> differ, so "Coldplay" != "coldplay".
>
> I notice the song title in your original code is treated as a single term in
> your query - is that how it is indexed? I can see that artist might possibly
> make sense as a single term which gets fuzzy matched, but song titles are
> generally longer, which means it may work better as a tokenized field.

You were right, tokenization was the issue. Using TOKENIZED instead of
UN_TOKENIZED immediately provided relevant results, even when using it
with FuzzyQuery. Using FuzzyLikeThisQuery made the relevance much
better, so I'm really happy with the results.

Thank you very much!

>
> Cheers
> Mark
>
>
> ----- Original Message ----
> From: László Monda <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org
> Cc: [EMAIL PROTECTED]
> Sent: Monday, 23 June, 2008 1:11:50 PM
> Subject: Re: Getting irrelevant results using fuzzy query
>
> Thanks for your reply, Mark.
>
>
>
> This was my original code for constructing my query using FuzzyQuery:
>
> BooleanQuery query = new BooleanQuery();
> if (artist.length() > 0) {
>     FuzzyQuery artist_query = new FuzzyQuery(new Term("artist", artist));
>     query.add(artist_query, BooleanClause.Occur.MUST);
> }
> if (song.length() > 0) {
>     FuzzyQuery song_query = new FuzzyQuery(new Term("song", song));
>     query.add(song_query, BooleanClause.Occur.MUST);
> }
>
>
>
> This is my first attempt to use FuzzyLikeThisQuery (with no success):
>
> FuzzyLikeThisQuery query = new FuzzyLikeThisQuery(2, new SimpleAnalyzer());
> if (artist.length() > 0) {
>     query.addTerms(artist, "artist", 0.5f, 0);
> }
> if (song.length() > 0) {
>     query.addTerms(song, "song", 0.5f, 0);
> }
>
>
>
> This is my second attempt to use FuzzyLikeThisQuery (with no success):
>
> BooleanQuery query = new BooleanQuery();
> if (artist.length() > 0) {
>     FuzzyLikeThisQuery artist_query =
>         new FuzzyLikeThisQuery(1, new SimpleAnalyzer());
>     artist_query.addTerms(artist, "artist", 0.5f, 0);
>     query.add(artist_query, BooleanClause.Occur.MUST);
> }
> if (song.length() > 0) {
>     FuzzyLikeThisQuery song_query =
>         new FuzzyLikeThisQuery(1, new SimpleAnalyzer());
>     song_query.addTerms(song, "song", 0.5f, 0);
>     query.add(song_query, BooleanClause.Occur.MUST);
> }
>
>
>
> I think it's my lack of understanding of the usage of FuzzyLikeThisQuery
> that makes me get irrelevant results.
>
> Could you tell me what's wrong here, please?
>
> Thank you.
>
> On Mon, 2008-06-23 at 11:28 +0000, mark harwood wrote:
> > >>I do have serious problems with the relevance of the results with fuzzy
> > >>queries.
> >
> > Please take the time to read my response here:
> >
> > http://www.gossamer-threads.com/lists/lucene/java-user/62050#62050
> >
> > I had a work colleague come up with exactly the same problem this week and
> > the solution is the same.
> >
> > Just tested my index with a standard Lucene FuzzyQuery for "Paul~" - this
> > gives "Phul", "Saul", and "Paulo" before ANY "Paul" records due to IDF
> > issues.
> > Using FuzzyLikeThisQuery puts all the "Paul" records ahead of the variants.
> >
> >
> >
> > ----- Original Message ----
> > From: László Monda <[EMAIL PROTECTED]>
> > To: java-user@lucene.apache.org
> > Cc: [EMAIL PROTECTED]
> > Sent: Monday, 23 June, 2008 12:10:05 PM
> > Subject: Re: Getting irrelevant results using fuzzy query
> >
> > On Wed, 2008-06-18 at 21:10 +0200, Daniel Naber wrote:
> > > On Wednesday, 18 June 2008, László Monda wrote:
> > >
> > > > Additional info: Lucene seems to do the right thing when only a few
> > > > documents are present, but goes crazy when there are about 1.5 million
> > > > documents in the index.
> > >
> > > Lucene works well with more documents (currently using it with 9
> > > million), but the fuzzy query requires iteration over all terms, which
> > > makes this query slow. This can be avoided by setting the prefixLength
> > > parameter of the FuzzyQuery constructor to 1 or 2. Or maybe you should
> > > use an n-gram index, see the spellchecker in the contrib area.
> >
> > Thanks for the suggestion, but I don't have any performance problems
> > yet, but I do have serious problems with the relevance of the results
> > with fuzzy queries.
> >
-- 
Laci <http://monda.hu>
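
For the archive, here is a minimal sketch of why an UN_TOKENIZED field and an analyzed query can fail to line up, as described in this thread. It is plain Java and does not call Lucene: simpleAnalyze is a hypothetical stand-in that only approximates SimpleAnalyzer (split on non-letters, lowercase), the class and helper names are made up for illustration, and the song title is just an example value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class TokenizationSketch {

    // Hypothetical approximation of SimpleAnalyzer (NOT Lucene code):
    // split the text on runs of non-letters and lowercase each token.
    static List<String> simpleAnalyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // An untokenized field stores the whole value as one unmodified term.
    static List<String> untokenized(String text) {
        List<String> terms = new ArrayList<>();
        terms.add(text);
        return terms;
    }

    // True if at least one query term equals one of the indexed terms.
    static boolean anyExactMatch(List<String> indexed, List<String> query) {
        for (String q : query) {
            if (indexed.contains(q)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String song = "Viva la Vida"; // example value, not from the thread
        List<String> queryTerms = simpleAnalyze(song);

        // Indexed untokenized: one term, original case and spaces kept.
        System.out.println(untokenized(song));
        // Indexed tokenized with the same analysis as the query.
        System.out.println(simpleAnalyze(song));

        // The analyzed query terms never equal the single untokenized term.
        System.out.println(anyExactMatch(untokenized(song), queryTerms));
        // With matching analysis on both sides, the terms line up.
        System.out.println(anyExactMatch(simpleAnalyze(song), queryTerms));
    }
}
```

In this sketch the untokenized side holds the single term "Viva la Vida" while the query side holds "viva", "la" and "vida", so there is no exact term in common and a fuzzy match has to bridge a large edit distance. Once both sides go through the same analysis, the terms are identical.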