Don't extend that: extend Similarity. Some of those implementations actually rely on, and optimize for, the fact that it's a byte: they build lookup tables and so on.
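To illustrate the lookup-table point, here is a minimal sketch in plain Java. It is not Lucene code: the class name ByteNormTable, the methods encodeLength and decodeNorm, and the 1/sqrt(length) decoding are all hypothetical and chosen only for illustration. It shows why a byte-sized norm is convenient: every possible value can be decoded once into a 256-entry table, which would not be possible if the encoded value were an arbitrary long.

```java
/** Hypothetical illustration only; not part of Lucene. */
final class ByteNormTable {
    // One pre-computed float per possible byte value (0..255), built once.
    private static final float[] DECODE = new float[256];

    static {
        for (int i = 0; i < 256; i++) {
            // Assumed decoding for this sketch: byte value i stands for a
            // field with i terms, and the norm is 1/sqrt(i). Zero terms
            // decodes to 0.
            DECODE[i] = (i == 0) ? 0f : (float) (1.0 / Math.sqrt(i));
        }
    }

    /** Clamps a raw term count into 0..255 so that it fits in one byte. */
    static byte encodeLength(int numTerms) {
        int clamped = Math.max(0, Math.min(255, numTerms));
        return (byte) clamped;
    }

    /** The stored norm arrives as a long; only its low 8 bits index the table. */
    static float decodeNorm(long norm) {
        return DECODE[(int) (norm & 0xFF)];
    }

    public static void main(String[] args) {
        for (int len : new int[] {1, 4, 100, 1000}) {
            byte b = encodeLength(len);
            System.out.println(len + " terms -> byte " + b + " -> norm " + decodeNorm(b));
        }
    }
}
```

Note the trade-off in this sketch: lengths above 255 all collapse to the same byte, so the encoding is lossy, but decoding at scoring time is a single array read.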

On Thu, Jun 19, 2014 at 6:03 PM, Nalini Kartha <nalinikar...@gmail.com> wrote:
> Sorry, I meant the encodeNormValue and decodeNormValue methods on the
> TFIDFSimilarity class -
>
> public byte encodeNormValue(float f)
> public float decodeNormValue(byte b)
>
>
> On Thu, Jun 19, 2014 at 12:08 PM, Robert Muir <rcm...@gmail.com> wrote:
>
>> No they do not. The method is:
>>
>> public abstract long computeNorm(FieldInvertState state);
>>
>>
>> On Thu, Jun 19, 2014 at 1:54 PM, Nalini Kartha <nalinikar...@gmail.com>
>> wrote:
>> > Thanks for the info!
>> >
>> > We're more interested in changing the lengthnorm function vs using
>> > additional stats for scoring so option 2 seems like the right way.
>> >
>> > It looks like the encode and decode methods deal with bytes right now -
>> > would changing those APIs to deal with longs instead be a good idea? It
>> > looks like the byte returned from encode is always being cast to long and
>> > the byte passed into decode is always a long to begin with. If we make
>> > this change, would it be useful to submit a patch for it?
>> >
>> > Thanks,
>> > Nalini
>> >
>> >
>> > On Thu, Jun 19, 2014 at 10:28 AM, Uwe Schindler <u...@thetaphi.de> wrote:
>> >
>> >> Hi,
>> >>
>> >> You may not need to change the length-norm at all: If you want to
>> >> support *additional* statistics, add a docvalues field to your index
>> >> where you can store that information in addition to the Lucene-Default
>> >> statistics. Based on a function query you can then use it for scoring.
>> >> In fact, you can then also use a different data type for the statistics
>> >> value. The norms in Lucene are already internally handled as docvalues
>> >> fields, too.
>> >>
>> >> On the other hand, if you want to modify the lengthNorm and you use a
>> >> non-float value, you have to also modify the encodeNorm/decodeNorm
>> >> methods of the similarity. The default uses a very lossy float->1byte
>> >> transformation.
>> >>
>> >> Uwe
>> >>
>> >> -----
>> >> Uwe Schindler
>> >> H.-H.-Meier-Allee 63, D-28213 Bremen
>> >> http://www.thetaphi.de
>> >> eMail: u...@thetaphi.de
>> >>
>> >>
>> >> > -----Original Message-----
>> >> > From: Nalini Kartha [mailto:nalinikar...@gmail.com]
>> >> > Sent: Thursday, June 19, 2014 7:14 PM
>> >> > To: java-user@lucene.apache.org
>> >> > Subject: Changing field lengthnorm to store length
>> >> >
>> >> > Hi,
>> >> >
>> >> > We're interested in having access to the number of terms in the
>> >> > fields for a document vs the pre-calculated lengthnorm at scoring
>> >> > time - we want experiment with different lengthnorm functions so it
>> >> > seems like storing the raw length and then doing the norm calculation
>> >> > at query time would work.
>> >> >
>> >> > Is changing the lengthnorm method on Similarity class to return the
>> >> > raw number of terms the right way to go to for this? We realize this
>> >> > will result in taking up more than a byte to store the value but
>> >> > we're OK with this. Will this break anything else under the hood?
>> >> >
>> >> > Thanks,
>> >> > Nalini
>> >>
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> >> For additional commands, e-mail: java-user-h...@lucene.apache.org