Re: Is there any difference in a document between one added field with a number of terms and a field added a number of times ?

Paul Taylor Wed, 13 Jan 2010 05:06:15 -0800

So not much help here (I wonder if it's because I posted 3 questions in one day), but I've made some progress in my understanding.

I understand there is only one norm per field, and I think Lucene does not differentiate between adding the same field a number of times and adding multiple pieces of text to the same field. But I've discovered getPositionIncrementGap() to separate my multiple adds of the same field within a doc, and I was wondering if there was a way I could use the position gap to get DefaultSimilarity.lengthNorm() to be called with only the number of tokens within one of the added fields passed to it, rather than the complete terms within the field as a whole.
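A minimal sketch of why the gap on its own may not change the norm, assuming (as far as I understand the indexing chain) that the gap only shifts token positions while the count handed to lengthNorm() is still the total number of tokens across all adds. This is a simplified plain-Java model, not Lucene code: the class PositionGapDemo, its index() helper, the whitespace tokenizing and the sample values are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class PositionGapDemo {
    // Simplified model of adding several values under one field name.
    // Returns {tokenCount, lastPosition}. Tokens are split on whitespace
    // (an assumption; a real analyzer would do more).
    static int[] index(List<String> values, int positionIncrementGap) {
        int tokenCount = 0;
        int position = -1; // position of the last token seen so far
        for (String value : values) {
            // The gap is applied between values, once something is indexed.
            if (tokenCount > 0) {
                position += positionIncrementGap;
            }
            for (String token : value.trim().split("\\s+")) {
                if (token.isEmpty()) {
                    continue; // skip the empty string produced by a blank value
                }
                position++;
                tokenCount++;
            }
        }
        return new int[] {tokenCount, position};
    }

    // Same formula as DefaultSimilarity.lengthNorm(): 1 / sqrt(numTerms).
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    public static void main(String[] args) {
        // Hypothetical alias values: three adds of the same field.
        List<String> aliases = Arrays.asList("a b", "c", "d e");
        for (int gap : new int[] {0, 100}) {
            int[] stats = index(aliases, gap);
            System.out.println("gap=" + gap
                    + " tokens=" + stats[0]
                    + " lastPosition=" + stats[1]
                    + " norm=" + lengthNorm(stats[0]));
        }
    }
}
```

In this model the last position moves far out when the gap is raised, but the token count, and therefore the norm, is the same for both gap values.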


Paul

Paul Taylor wrote:

Thanks Felipe, but you are missing the point. Artist really doesn't come into it; my problem is confined to the alias field. Forget about artist, it's just detailed to give the complete scenario.


Paul

Felipe wrote:

You could change the boost of the field artist to be bigger than the field alias.

    field.setBoost(artistBoost);

2010/1/12 Paul Taylor <paul_t...@fastmail.fm>


    Been doing some analysis with Luke (BTW it doesn't work with
    StandardAnalyzer since the Version field was introduced) and
    discovered a problem with field length boosting for me.

    I have a document that represents a recording artist (e.g. Madonna,
    The Beatles, etc.). It contains an artist and an alias field; the
    alias field contains other names that the artist may be known
    as, and so there can be multiple aliases for an artist.

    PseudoCode:
    (
    doc.addField(ArtistIndexField.ARTIST, rs.getString("name"));
    for (String alias : aliases.get(artistId)) {
        doc.addField(ArtistIndexField.ALIAS, alias);
    }
    )

    I'm finding that when I search for the artist by the alias field,
    if the value matches an alias in two different documents, the
    document with the fewest aliases gets the best score
    because the boost of the alias is split between the aliases on the
    other doc. If I use ANALYZED_NO_NORMS then both documents return
    the same score.

    The trouble is I don't want to disable norms because I want a
    match on a single field containing fewer terms to score better than
    one with more terms.
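A small sketch of the arithmetic behind both effects, assuming the default length norm of 1 / sqrt(numTerms) and ignoring field boosts and the lossy one-byte encoding Lucene applies to stored norms. The class LengthNormDemo and the token counts are hypothetical, chosen only for illustration.

```java
class LengthNormDemo {
    // Same formula as DefaultSimilarity.lengthNorm(): 1 / sqrt(numTerms).
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    public static void main(String[] args) {
        // Unwanted effect: every alias is a single token, but the norm is
        // computed from the total token count of the alias field.
        int oneAlias = 1;       // hypothetical artist with 1 alias
        int sixteenAliases = 16; // hypothetical artist with 16 aliases
        System.out.println("1 alias    -> " + lengthNorm(oneAlias));
        System.out.println("16 aliases -> " + lengthNorm(sixteenAliases));

        // Wanted effect: a shorter single value beats a longer one,
        // e.g. a 2-token name against a 4-token name.
        System.out.println("2 tokens   -> " + lengthNorm(2));
        System.out.println("4 tokens   -> " + lengthNorm(4));
    }
}
```

The same formula produces both the ranking that should be kept (the shorter name wins) and the one that should go away (the artist with fewer aliases wins), which is why switching norms off removes both.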

    Full example:

http://musicbrainz.org/search/textsearch.html?query=minihamuzu&type=artist&limit=25&adv=on&handlearguments=1

    returns two results. The second result only has a score of 8 because
    it has more aliases than the first result, even though the alias it
    matched on was an exact single-term match.
    http://musicbrainz.org/show/artist/aliases.html?artistid=174327

    but if I remove norms then the following query (which is currently
    working)

http://musicbrainz.org/search/textsearch.html?query=%22the+beatles%22&type=artist&limit=25&adv=on&handlearguments=1


    would stop working, in that searching for 'The beatles' would no
    longer rate artist 'The Beatles' better than 'The Beatles
    revival Band'.

    So isn't there any way to recognise that repeated calls to
    addField() are not creating a single field with many terms, but
    many fields with few terms?

    thanks Paul

---------------------------------------------------------------------

    To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
    For additional commands, e-mail: java-user-h...@lucene.apache.org




--
Felipe Lobo
www.jusbrasil.com.br


