Hi, I'm trying to override the Similarity lengthNorm() and tf() methods, but only for particular index fields. lengthNorm() is fine, but tf() doesn't take the field name as a parameter, so I'm a bit stuck. Is there any way round this?

Here is my code, which doesn't compile because fieldName doesn't exist in the tf() method:

package org.musicbrainz.search.analysis;

import org.apache.lucene.search.DefaultSimilarity;

public class MusicbrainzSimilarity extends DefaultSimilarity {

    @Override
    public float lengthNorm(String fieldName, int numTerms) {
        // This matches both artist and label aliases and applies to both. I didn't use the
        // constant ArtistIndexField.ALIAS because that would be confusing.
        if (fieldName.equals("alias")) {
            // Same result as the normal calculation for a field with two terms, the most common scenario
            return 0.71f;
        } else {
            return super.lengthNorm(fieldName, numTerms);
        }
    }

    @Override
    public float tf(float freq) {
        // DOES NOT COMPILE: fieldName is not a parameter of tf()
        if (fieldName.equals("alias") && freq > 1.0f) {
            return 1.0f; // Same result as if the term matched once
        }
        return (float) Math.sqrt(freq);
    }
}

FYI:
Each document represents a recording artist (e.g. Madonna, U2).

An artist has one artist name and may have many artist aliases. With the DefaultSimilarity implementation I hit two problems.

1. lengthNorm() sees all the aliases of one artist as one field, so an artist with many aliases, only one of which matches, scores much lower for that alias match than an artist with few aliases. I wanted to remove this bias, so I override lengthNorm() to treat every alias field as if it had two terms. (Originally I just disabled norms for the alias field, but the default value of 1.0f gave aliases an advantage over the artist field.)

2. tf(): When searching for an artist by name or alias (e.g. artist:bach OR alias:bach), an artist with many aliases that match the search term gets a large tf() value, easily beating another artist that matches exactly on artist name but doesn't happen to have any aliases. I want to remove this bias by returning a tf() of 1.0f for any matching alias, so that having multiple aliases isn't an advantage.
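
To put rough numbers on the two biases, here is a small standalone sketch. It does not use Lucene at all; it only reproduces the two formulas I believe DefaultSimilarity applies (1/sqrt(numTerms) for lengthNorm() and sqrt(freq) for tf()). The class name AliasScoringDemo, the helper names and the sample term counts are made up for illustration.

```java
public class AliasScoringDemo {

    // Assumed DefaultSimilarity.lengthNorm() formula: 1 / sqrt(numTerms).
    static float defaultLengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // DefaultSimilarity.tf() formula: sqrt(freq).
    static float defaultTf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // Hypothetical capped variant for the alias field:
    // more than one match scores the same as exactly one match.
    static float cappedTf(float freq) {
        return freq > 1.0f ? 1.0f : (float) Math.sqrt(freq);
    }

    public static void main(String[] args) {
        // Problem 1: the norm shrinks as the total number of alias terms grows,
        // whereas the override pins it at the two-term value (0.71).
        int[] aliasTermCounts = {2, 8, 32};
        for (int n : aliasTermCounts) {
            System.out.printf("alias terms=%d  default lengthNorm=%.3f  fixed=%.2f%n",
                    n, defaultLengthNorm(n), 0.71f);
        }

        // Problem 2: tf grows with the number of matching aliases,
        // whereas the capped variant stays at 1.0.
        float[] matchCounts = {1f, 4f, 9f};
        for (float f : matchCounts) {
            System.out.printf("matching aliases=%.0f  default tf=%.2f  capped tf=%.2f%n",
                    f, defaultTf(f), cappedTf(f));
        }
    }
}
```

The capped variant is what I would like tf() to return for the alias field only, while every other field keeps the default.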

Thanks,
Paul
