Norms is how Lucene records what the apriori boost is for each docXfield. This boost is the product of per-field boost, per-doc boost (both of which your app would set when it creates the doc), as well as the "length normalization" Lucene's default similarity applies (shorter docs have higher boost).
It's a quantized float, encoded as 8 bits. If your fields tend not to vary in length much, and you don't use boosting yourself, and you are worried about RAM, you should omit the norms. (Field has a method to omit norms). Not sure when nested docs will be available but I hope soon... even so, once it's committed, it won't be released until the next release (though you can use it on the trunk/3.x tip as well, once it's committed). Mike http://blog.mikemccandless.com On Sun, May 22, 2011 at 7:49 AM, zhoucheng2008 <zhoucheng2...@gmail.com> wrote: > Mike, thanks for reply. > > Can you please elaborate a little bit more on " If you don't need norms > (don't boost, lengths don't vary much or you > don't care to have field length impact scoring) you can omit norms"? > > When do you expect the handling of nested document will be applicable? > > Cheng > > > -----Original Message----- > From: Michael McCandless [mailto:luc...@mikemccandless.com] > Sent: Sunday, May 22, 2011 6:58 PM > To: java-user@lucene.apache.org > Subject: Re: How to create document objects in our case > > 30 fields is fine, but if they are all indexed you should watch out > for memory usage. Ie, norms require 1 byte per doc per indexed field. > If you don't need norms (don't boost, lengths don't vary much or you > don't care to have field length impact scoring) you can omit norms. > > The relationship b/w Group and Subgroup is not something Lucene can do > today -- all docs are "independent" in a Lucene index. > > So.. you can either "denormalize", meaning duplicate all Group fields > onto each subgroup, or you can check out > https://issues.apache.org/jira/browse/LUCENE-2454 which I think adds > exactly what you need. It's not yet committed, but we are finally > starting to make progress towards that, eg > https://issues.apache.org/jira/browse/LUCENE-3112 > > Mike > > http://blog.mikemccandless.com > > On Fri, May 20, 2011 at 8:27 PM, Cheng Zhou <zhoucheng2...@gmail.com> wrote: >> Hi, >> >> I have a large number of XML files to be indexed by Lucene. All the files >> share similar structure as below: >> >> <Group id="abc" member="cde" blah blah ....> >> <Subgroup id="abc1" member ="fgh" blah blah ...> >> <Subgroup id="abc2" member ="fgh" blah blah ...> >> <Subgroup id="abc3" member ="fgh" blah blah ...> >> ...... >> </Group> >> >> Things to be noted are: >> >> The root element of Group has 30 or so attributes, and it usually has over >> 2000 Subgroup elements, which in turn also have more than 20 attributes. >> >> I want to create one Document object which holds the contents of the Group >> element, and one Document object which holds all the Subgroup elements. >> >> Here are my challenges however: >> >> 1. How many fields are advised for a Document to be indexed by Lucene? > Will >> over 30 fields (for the Group element) be too many? >> >> 2. How to create a Document object and fields for holding all the Subgroup >> elements? Is this a good way to think of? >> >> 3. How can I link the Document object of the Group element to the Document >> object of all the Subgroup elements? >> >> Please note that I intend to use such two Document objects to achieve the >> group while I don't know whether it is a good solution or not. I am open > to >> using more than two Documents to do the job, but I don't know how to > connect >> all the objects in Lucene. >> >> Many thanks! >> > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > > --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org