Re: Changing field lengthnorm to store length

Nalini Kartha Thu, 19 Jun 2014 15:04:22 -0700

Sorry, I meant the encodeNormValue and decodeNormValue methods on the
TFIDFSimilarity class:


public byte encodeNormValue(float f)
public float decodeNormValue(byte b)
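
For illustration, here is a minimal sketch of the idea discussed in this thread: keep the raw field length as a long at index time and apply the length-norm function only at query time, so different functions can be tried without reindexing. It is plain Java with no Lucene dependency. The class name, method names, and the pivoted variant with its avgLength and slope parameters are all hypothetical, chosen for the example, and are not part of the Lucene API.

```java
// Hypothetical sketch: not Lucene code, only an illustration of
// "store the raw length, compute the norm at query time".
final class RawLengthNormSketch {

    // Index time: keep the raw term count as a long instead of a
    // precomputed, lossily compressed norm.
    static long encodeLength(int numTerms) {
        return numTerms;
    }

    // Query time, variant 1: a 1/sqrt(length) norm computed from the
    // stored raw length. Empty fields get a norm of 0.
    static float inverseSqrtNorm(long storedLength) {
        if (storedLength <= 0) {
            return 0f;
        }
        return (float) (1.0 / Math.sqrt((double) storedLength));
    }

    // Query time, variant 2: a pivoted length norm (assumed here as an
    // example of an alternative function). avgLength and slope are
    // illustrative parameters supplied by the caller.
    static float pivotedNorm(long storedLength, double avgLength, double slope) {
        double relativeLength = storedLength / avgLength;
        return (float) (1.0 / ((1.0 - slope) + slope * relativeLength));
    }

    public static void main(String[] args) {
        long stored = encodeLength(100);
        // Both norms are derived from the same stored value, so swapping
        // the function does not require touching the index.
        System.out.println("1/sqrt norm: " + inverseSqrtNorm(stored));
        System.out.println("pivoted norm: " + pivotedNorm(stored, 50.0, 0.25));
    }
}
```

Since the stored value is the exact length rather than a compressed float, any norm function can be derived from it later, at the cost of more than one byte per document and field.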


On Thu, Jun 19, 2014 at 12:08 PM, Robert Muir <[email protected]> wrote:

> No they do not. The method is:
>
>   public abstract long computeNorm(FieldInvertState state);
>
>
>
> On Thu, Jun 19, 2014 at 1:54 PM, Nalini Kartha <[email protected]>
> wrote:
> > Thanks for the info!
> >
> > We're more interested in changing the lengthnorm function vs using
> > additional stats for scoring so option 2 seems like the right way.
> >
> > It looks like the encode and decode methods deal with bytes right now -
> > would changing those APIs to deal with longs instead be a good idea? It
> > looks like the byte returned from encode is always being cast to long and
> > the byte passed into decode is always a long to begin with. If we make
> > this change, would it be useful to submit a patch for it?
> >
> > Thanks,
> > Nalini
> >
> >
> > On Thu, Jun 19, 2014 at 10:28 AM, Uwe Schindler <[email protected]> wrote:
> >
> >> Hi,
> >>
> >> You may not need to change the length-norm at all: If you want to
> >> support *additional* statistics, add a docvalues field to your index
> >> where you can store that information in addition to the Lucene-Default
> >> statistics. Based on a function query you can then use it for scoring.
> >> In fact, you can then also use a different data type for the statistics
> >> value. The norms in Lucene are already internally handled as docvalues
> >> fields, too.
> >>
> >> On the other hand, if you want to modify the lengthNorm and you use a
> >> non-float value, you have to also modify the encodeNorm/decodeNorm
> >> methods of the similarity. The default uses a very lossy float->1byte
> >> transformation.
> >>
> >> Uwe
> >>
> >> -----
> >> Uwe Schindler
> >> H.-H.-Meier-Allee 63, D-28213 Bremen
> >> http://www.thetaphi.de
> >> eMail: [email protected]
> >>
> >>
> >> > -----Original Message-----
> >> > From: Nalini Kartha [mailto:[email protected]]
> >> > Sent: Thursday, June 19, 2014 7:14 PM
> >> > To: [email protected]
> >> > Subject: Changing field lengthnorm to store length
> >> >
> >> > Hi,
> >> >
> >> > We're interested in having access to the number of terms in the fields
> >> > for a document vs the pre-calculated lengthnorm at scoring time - we
> >> > want to experiment with different lengthnorm functions so it seems like
> >> > storing the raw length and then doing the norm calculation at query
> >> > time would work.
> >> >
> >> > Is changing the lengthnorm method on the Similarity class to return the
> >> > raw number of terms the right way to go for this? We realize this will
> >> > result in taking up more than a byte to store the value but we're OK
> >> > with this. Will this break anything else under the hood?
> >> >
> >> > Thanks,
> >> > Nalini
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: [email protected]
> >> For additional commands, e-mail: [email protected]
> >>
> >>
