Re: CheckIndex complaining about -1 for norms value

2020-06-14 Thread Trejkaz
The answer here might be "terrifyingly old" actually. We've been using IndexUpgrader quite heavily. My best guess is Lucene 2.x. What I can verify at least, by manual inspection, is that the docs where the value is showing -1 are also docs where there is no value in the field, where I presume it w

Re: CheckIndex complaining about -1 for norms value

2020-06-11 Thread Adrien Grand
+1
On Thu, Jun 11, 2020 at 3:27 PM Michael McCandless <luc...@mikemccandless.com> wrote:
> Maybe we should fix CheckIndex to print norms as unsigned integers?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Thu, Jun 11, 2020 at 3:00 AM Adrien Grand wrote:
> > To my knowledge,

Re: CheckIndex complaining about -1 for norms value

2020-06-11 Thread Michael McCandless
Maybe we should fix CheckIndex to print norms as unsigned integers?
Mike McCandless
http://blog.mikemccandless.com
On Thu, Jun 11, 2020 at 3:00 AM Adrien Grand wrote:
> To my knowledge, -1 always represented the maximum supported length, both before and after 7.0 (when we changed the norms
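
As an illustration of the suggestion to print norms as unsigned integers, here is a minimal sketch of why a norm can show up as -1. It assumes the norm fits in a single byte, so that the bit pattern 0xFF reads as -1 when treated as signed and as 255 when treated as unsigned. The class `NormDisplay` and the helper `asUnsigned` are hypothetical names for this example and are not part of CheckIndex or any Lucene API.

```java
public class NormDisplay {
    // Hypothetical helper: reinterpret a one-byte norm as an unsigned value (0..255).
    // Assumes the norm was encoded into a single byte; this is an assumption of the sketch.
    static int asUnsigned(byte norm) {
        return Byte.toUnsignedInt(norm);
    }

    public static void main(String[] args) {
        byte norm = (byte) -1;                 // bit pattern 0xFF
        System.out.println(norm);              // signed view: -1
        System.out.println(asUnsigned(norm));  // unsigned view: 255
    }
}
```

Under that assumption, a printed -1 and a printed 255 describe the same stored byte, which matches the reading of -1 as the largest encodable value rather than as a "no value" marker.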

Re: CheckIndex complaining about -1 for norms value

2020-06-11 Thread Adrien Grand
To my knowledge, -1 always represented the maximum supported length, both before and after 7.0 (when we changed the norms encoding). One thing that changed when we introduced sparse norms is that documents with no value moved from having 0 as a norm to not having a norm at all, but I don't see how

Re: CheckIndex complaining about -1 for norms value

2020-06-10 Thread Trejkaz
Well, we're using the default Lucene similarity. But as far as I know, we've always disabled norms as well. So I'm surprised I'm even seeing norms mentioned in the context of our own index, which is why I wondered whether -1 might have been an older placeholder for "no value" which later became 0

Re: CheckIndex complaining about -1 for norms value

2020-06-10 Thread Adrien Grand
Hi Trejkaz, Negative norm values are legal. The problem here is that Lucene expects that documents that have no terms must either have no norm value (typically because the document doesn't have a value for the field), or have a norm value equal to 0 (typically because the token stream over the field