Not much help so far (I wonder if it's because I posted three questions in
one day), but I've made some progress in my understanding.
I understand there is only one norm per field, and I think Lucene does not
differentiate between adding the same field a number of times and
adding multiple pieces of text to the same field. But I've discovered
getPositionIncrementGap(), which separates my multiple adds of the same field
within a doc, and I was wondering if there was a way I could use the
position gap to get DefaultSimilarity.lengthNorm() to be called with
only the number of tokens within one field value passed to it, rather than
the total number of tokens in the field as a whole.
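To make the question concrete, here is a rough plain-Java sketch of the two calculations. It is not Lucene code: the class and method names are made up for illustration, whitespace splitting stands in for the real analyzer, and the only thing taken from Lucene is the 1/sqrt(numTokens) formula that DefaultSimilarity.lengthNorm() uses. As far as I can tell Lucene only ever does the first calculation; the second is what I would like to happen for the alias that matched.

```java
// Illustration of the desired behaviour only; not how Lucene invokes lengthNorm().
final class PerValueNormSketch {

    // Same formula as DefaultSimilarity.lengthNorm(): 1 / sqrt(numTokens).
    static float lengthNorm(int numTokens) {
        return (float) (1.0 / Math.sqrt(numTokens));
    }

    // Assumption: whitespace tokenization stands in for the real analyzer.
    static int tokenCount(String value) {
        return value.trim().split("\\s+").length;
    }

    // One norm for the field: every value added under the name counts together.
    static float wholeFieldNorm(String[] values) {
        int total = 0;
        for (String value : values) {
            total += tokenCount(value);
        }
        return lengthNorm(total);
    }

    // Hypothetical per-value norm: only the tokens of the matched value count.
    static float singleValueNorm(String[] values, int matchedIndex) {
        return lengthNorm(tokenCount(values[matchedIndex]));
    }
}
```

With three aliases of three, one and two tokens, wholeFieldNorm() is computed from six tokens, while singleValueNorm() for the one-token alias is computed from a single token.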
Paul
Paul Taylor wrote:
Thanks Felipe, but you are missing the point. Artist really doesn't
come into it; my problem is confined to the alias field. Forget about
artist, it is only mentioned to give the complete scenario.
Paul
Felipe wrote:
You could change the boost of the field artist to be bigger than the
field alias.
field.setBoost(artistBoost);
2010/1/12 Paul Taylor <paul_t...@fastmail.fm>
I've been doing some analysis with Luke (BTW, it doesn't work with
StandardAnalyzer since the Version field was introduced) and discovered a
problem with field length boosting.
I have a document that represents a recording artist (e.g. Madonna,
The Beatles, etc.). It contains an artist field and an alias field; the
alias field contains other names that the artist may be known
by, so there can be multiple aliases for an artist.
PseudoCode:
(
doc.addField(ArtistIndexField.ARTIST, rs.getString("name"));
for (String alias : aliases.get(artistId)) {
doc.addField(ArtistIndexField.ALIAS, alias);
}
)
I'm finding that when I search for the artist by the alias field,
if the value matches an alias in two different documents, the
document with the fewest aliases gets the best score,
because the length norm of the alias field is shared between all the
aliases on the other doc. If I index with Field.Index.ANALYZED_NO_NORMS
then both documents return the same score.
The trouble is I don't want to disable norms, because I want a
match on a single field containing fewer terms to score better than
one with more terms.
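The arithmetic behind this can be sketched in plain Java. This is a simplified model rather than Lucene code: it assumes whitespace tokenization, the alias lists are made-up examples, and it ignores the other scoring factors and the fact that Lucene stores each norm in a single byte with some loss of precision. The formula itself, 1/sqrt(numTokens), is the one DefaultSimilarity.lengthNorm() uses.

```java
// Simplified model of the default length normalization; not Lucene code.
final class LengthNormSketch {

    // DefaultSimilarity.lengthNorm(field, numTokens) returns 1 / sqrt(numTokens).
    static float lengthNorm(int numTokens) {
        return (float) (1.0 / Math.sqrt(numTokens));
    }

    // All values added under one field name share a single norm, so the
    // token count is the sum over every alias (whitespace tokenization assumed).
    static int totalTokens(String[] aliases) {
        int total = 0;
        for (String alias : aliases) {
            total += alias.trim().split("\\s+").length;
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical alias lists: both documents contain the matched alias.
        String[] oneAlias = {"minihamuzu"};
        String[] manyAliases = {"minihamuzu", "mini hams", "mini moni", "minimoni"};

        // The document with more aliases ends up with the smaller norm,
        // even though the matched alias is identical in both.
        System.out.println("one alias:    " + lengthNorm(totalTokens(oneAlias)));
        System.out.println("many aliases: " + lengthNorm(totalTokens(manyAliases)));
    }
}
```

In this model the single-alias document is normalized over one token and the other over six, so the same exact match is scaled down on the second document purely because of its unrelated aliases.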
Full example:
http://musicbrainz.org/search/textsearch.html?query=minihamuzu&type=artist&limit=25&adv=on&handlearguments=1
returns two results; the second result only has a score of 8 because
it has more aliases than the first result, even though the alias it
matched on was an exact single-term match.
http://musicbrainz.org/show/artist/aliases.html?artistid=174327
but if I remove norms then the following query (which is currently
working)
http://musicbrainz.org/search/textsearch.html?query=%22the+beatles%22&type=artist&limit=25&adv=on&handlearguments=1
would stop working, in that searching for 'The Beatles' would no
longer score artist 'The Beatles' better than 'The Beatles
Revival Band'.
So is there any way to recognise that repeated calls to
addField() are not creating a single field with many terms, but many
fields with few terms?
thanks Paul
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
--
Felipe Lobo
www.jusbrasil.com.br