I've been doing some analysis with Luke (BTW, it doesn't work with StandardAnalyzer since the Version field was introduced) and have discovered a problem with field length boosting.

I have a document that represents a recording artist (e.g. Madonna, The Beatles, etc.). It contains an artist field and an alias field. The alias field holds other names that the artist may be known by, so there can be multiple aliases for one artist.

Pseudocode:
(
// one ARTIST value per document
doc.addField(ArtistIndexField.ARTIST, rs.getString("name"));
// one ALIAS value per alias, all added under the same field name
for (String alias : aliases.get(artistId)) {
    doc.addField(ArtistIndexField.ALIAS, alias);
}
)

I'm finding that when I search for an artist by the alias field, and the value matches an alias in two different documents, the document with the fewest aliases gets the best score, because the boost of the alias field is split between all the aliases on the other document. If I index the field as ANALYZED_NO_NORMS, both documents return the same score.

The trouble is that I don't want to disable norms, because I want a match on a single field containing fewer terms to score better than a match on one containing more terms.
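To make the trade-off concrete, here is a minimal sketch of the arithmetic. It assumes the default length norm of 1/sqrt(number of terms), computed over all the terms added under one field name in a document, and it does not call Lucene at all. The class name, helper names and alias counts are hypothetical, and the real stored norm is also rounded when it is encoded into the index, so actual scores will differ.

```java
public class LengthNormSketch {

    // Assumed formula: 1 / sqrt(total number of terms indexed under the field name).
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Repeated values added under one field name are counted together,
    // so the total is the sum of the terms in every value.
    static int totalTerms(int[] termsPerValue) {
        int total = 0;
        for (int n : termsPerValue) {
            total += n;
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical documents: one single-term alias vs nine single-term aliases.
        int[] oneAlias = {1};
        int[] nineAliases = {1, 1, 1, 1, 1, 1, 1, 1, 1};

        System.out.println("norm, 1 alias:   " + lengthNorm(totalTerms(oneAlias)));
        System.out.println("norm, 9 aliases: " + lengthNorm(totalTerms(nineAliases)));

        // The effect I want to keep: a two-term name beats a four-term name.
        System.out.println("norm, 2 terms:   " + lengthNorm(2));
        System.out.println("norm, 4 terms:   " + lengthNorm(4));
    }
}
```

Under that assumption the document with nine aliases gets a norm of about one third of the single-alias document's norm, even though the matching alias is an exact single-term match in both. The same formula is what makes a short name outrank a longer name containing it.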

Full example:

http://musicbrainz.org/search/textsearch.html?query=minihamuzu&type=artist&limit=25&adv=on&handlearguments=1
returns two results. The second result only has a score of 8 because it has more aliases than the first result, even though the alias it matched on was an exact single-term match:
http://musicbrainz.org/show/artist/aliases.html?artistid=174327

But if I remove norms, then the following query (which currently works)

http://musicbrainz.org/search/textsearch.html?query=%22the+beatles%22&type=artist&limit=25&adv=on&handlearguments=1

would stop working, in that searching for 'The Beatles' would no longer score the artist 'The Beatles' higher than 'The Beatles Revival Band'.

So is there any way to recognise that repeated calls to addField() are not creating a single field with many terms, but many fields with few terms?

Thanks, Paul



