Re: Changing field lengthnorm to store length

Robert Muir Thu, 19 Jun 2014 15:07:11 -0700

Don't extend that: extend Similarity.

Some of those implementations actually rely on, and optimize for, the
fact that it's a byte: they build lookup tables and so on.
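
To illustrate the lookup-table point, here is a minimal sketch in plain
Java with no Lucene dependency. Everything in it is an assumption made
for illustration: the class NormSketch, its method names, and the linear
quantization are hypothetical and are not Lucene's actual byte encoding.
It only shows why a one-byte norm makes a 256-entry decode table possible,
and what changes if the raw length is kept as a long and the norm is
computed at query time.

```java
public final class NormSketch {

    // Only 256 byte values exist, so every decoded norm can be precomputed.
    private static final float[] NORM_TABLE = new float[256];

    static {
        for (int i = 0; i < 256; i++) {
            NORM_TABLE[i] = i / 255f;
        }
    }

    // Hypothetical lossy encoding: clamp the norm to [0, 1] and quantize
    // it to one of 256 levels. This is NOT Lucene's encoding.
    public static byte encodeNorm(float norm) {
        float clamped = Math.max(0f, Math.min(1f, norm));
        return (byte) Math.round(clamped * 255f);
    }

    // Decoding is a single array lookup; the mask turns the signed byte
    // into an index in the range 0..255.
    public static float decodeNorm(byte b) {
        return NORM_TABLE[b & 0xFF];
    }

    // Alternative: keep the raw term count as a long and apply whatever
    // length normalization is wanted at query time. 1/sqrt(length) is
    // used here only as an example function.
    public static float lengthNormAtQueryTime(long rawLength) {
        return rawLength <= 0 ? 0f : (float) (1.0 / Math.sqrt(rawLength));
    }

    public static void main(String[] args) {
        long[] lengths = {1000, 1001};
        for (long length : lengths) {
            float exact = lengthNormAtQueryTime(length);
            byte encoded = encodeNorm(exact);
            System.out.println("length=" + length
                + " exact=" + exact
                + " encodedByte=" + (encoded & 0xFF)
                + " decoded=" + decodeNorm(encoded));
        }
    }
}
```

In this sketch, nearby field lengths can collapse to the same byte, which
is the lossiness mentioned earlier in the thread. A long-valued norm avoids
that, but the table trick no longer applies, so the normalization has to be
computed per document at scoring time.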


On Thu, Jun 19, 2014 at 6:03 PM, Nalini Kartha <[email protected]> wrote:
> Sorry, I meant the encodeNormValue and decodeNormValue methods on the
> TFIDFSimilarity class -
>
> public byte encodeNormValue(float f)
> public float decodeNormValue(byte b)
>
>
> On Thu, Jun 19, 2014 at 12:08 PM, Robert Muir <[email protected]> wrote:
>
>> No they do not. The method is:
>>
>>   public abstract long computeNorm(FieldInvertState state);
>>
>>
>>
>> On Thu, Jun 19, 2014 at 1:54 PM, Nalini Kartha <[email protected]>
>> wrote:
>> > Thanks for the info!
>> >
>> > We're more interested in changing the lengthnorm function vs using
>> > additional stats for scoring so option 2 seems like the right way.
>> >
>> > It looks like the encode and decode methods deal with bytes right now -
>> > would changing those APIs to deal with longs instead be a good idea? It
>> > looks like the byte returned from encode is always being cast to long and
>> > the byte passed into decode is always a long to begin with. If we make
>> > this change, would it be useful to submit a patch for it?
>> >
>> > Thanks,
>> > Nalini
>> >
>> >
>> > On Thu, Jun 19, 2014 at 10:28 AM, Uwe Schindler <[email protected]> wrote:
>> >
>> >> Hi,
>> >>
>> >> You may not need to change the length-norm at all: If you want to
>> >> support *additional* statistics, add a docvalues field to your index
>> >> where you can store that information in addition to the Lucene-Default
>> >> statistics. Based on a function query you can then use it for scoring.
>> >> In fact, you can then also use a different data type for the statistics
>> >> value. The norms in Lucene are already internally handled as docvalues
>> >> fields, too.
>> >>
>> >> On the other hand, if you want to modify the lengthNorm and you use a
>> >> non-float value, you have to also modify the encodeNorm/decodeNorm
>> >> methods of the similarity. The default uses a very lossy float->1byte
>> >> transformation.
>> >>
>> >> Uwe
>> >>
>> >> -----
>> >> Uwe Schindler
>> >> H.-H.-Meier-Allee 63, D-28213 Bremen
>> >> http://www.thetaphi.de
>> >> eMail: [email protected]
>> >>
>> >>
>> >> > -----Original Message-----
>> >> > From: Nalini Kartha [mailto:[email protected]]
>> >> > Sent: Thursday, June 19, 2014 7:14 PM
>> >> > To: [email protected]
>> >> > Subject: Changing field lengthnorm to store length
>> >> >
>> >> > Hi,
>> >> >
>> >> > We're interested in having access to the number of terms in the fields
>> >> > for a document vs the pre-calculated lengthnorm at scoring time - we
>> >> > want to experiment with different lengthnorm functions so it seems like
>> >> > storing the raw length and then doing the norm calculation at query
>> >> > time would work.
>> >> >
>> >> > Is changing the lengthnorm method on the Similarity class to return the
>> >> > raw number of terms the right way to go for this? We realize this will
>> >> > result in taking up more than a byte to store the value but we're OK
>> >> > with this. Will this break anything else under the hood?
>> >> >
>> >> > Thanks,
>> >> > Nalini
>> >>
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: [email protected]
>> >> For additional commands, e-mail: [email protected]
>> >>
>> >>
