Fields with the same name?? - Was Re: Payloads and tokenizers

2008-08-17 Thread Antony Bowesman
I assume you already know this but just to make sure what I meant was clear - no tokenization but still indexing just means that the entire field's text becomes a single unchanged token. I believe this is exactly what SingleTokenTokenStream can buy you - a single token for which you can pre-set a
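(For reference, a minimal sketch of that approach, assuming the contrib SingleTokenTokenStream from org.apache.lucene.analysis.miscellaneous and the 2.x Token/Payload API; the field name, value and payload bytes are purely illustrative:)

import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.miscellaneous.SingleTokenTokenStream;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.Payload;

// One token spanning the whole field value, with a payload attached,
// indexed through a TokenStream-valued Field (indexed, not stored).
String value = "ACME-12345";                         // illustrative value
Token token = new Token(value, 0, value.length());
token.setPayload(new Payload(new byte[] { 0x01 }));  // whatever bytes you need
SingleTokenTokenStream stream = new SingleTokenTokenStream(token);

Document doc = new Document();
doc.add(new Field("id", stream));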

Re: Index of Lucene

2008-08-17 Thread blazingwolf7
Thanks for the info. But do you know where this is actually performed in Lucene? I mean the method involved that calculates the value before storing it in the index. I traced it to a method known as lengthNorm() in DefaultSimilarity.java, but the value is different from what is stored in the
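(If it helps, one likely source of the difference is the one-byte norm encoding: the float returned by lengthNorm() goes through Similarity.encodeNorm() before being written, and decoding that byte loses precision. A small sketch, assuming DefaultSimilarity and a hypothetical 10-term field:)

import org.apache.lucene.search.DefaultSimilarity;
import org.apache.lucene.search.Similarity;

DefaultSimilarity sim = new DefaultSimilarity();

float norm = sim.lengthNorm("contents", 10);     // 1/sqrt(10), about 0.316
byte encoded = Similarity.encodeNorm(norm);      // the single byte actually stored
float decoded = Similarity.decodeNorm(encoded);  // close to, but not exactly, 0.316

System.out.println(norm + " is stored as " + decoded);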

Re: Index of Lucene

2008-08-17 Thread Doron Cohen
Norms information comes mainly from document lengths - allowing search-time scoring to take into account the effect of document length (actually field length within a document). In practice, norms stored within the index may include other information, such as index-time boosts - for a docu
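(To illustrate the boost part, a sketch assuming the 2.x Field.setBoost() API; conceptually, what ends up in the norms file is encodeNorm(lengthNorm(numTerms) * fieldBoost * docBoost), with the field name and boost below chosen purely for illustration:)

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.search.DefaultSimilarity;
import org.apache.lucene.search.Similarity;

Document doc = new Document();
Field body = new Field("body", "some text here",
    Field.Store.NO, Field.Index.TOKENIZED);
body.setBoost(2.0f);                             // index-time field boost
doc.add(body);

// Rough equivalent of the value written to the norms file for "body":
DefaultSimilarity sim = new DefaultSimilarity();
float norm = sim.lengthNorm("body", 3) * body.getBoost() * doc.getBoost();
byte stored = Similarity.encodeNorm(norm);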

Index of Lucene

2008-08-17 Thread blazingwolf7
Hi, I am currently using Lucene for indexing. After I index a file, I use LUKE to open it and check the index. There is one part that I am curious about: in Luke, under the Document tab, I randomly select a document and display it. At the bottom there are 4 columns, Field, ITSVopLBC, Norm an

Re: search for special condition.

2008-08-17 Thread 장용석
Hi. Yes, that method is in Lucene. I'm sorry that I misunderstood your words. I hope you find the way to do what you want. Bye. :) 2008/8/16, Mr Shore <[EMAIL PROTECTED]>: > > thanks, Jang > but I didn't find the method isTokenChar > maybe it's in lucene, right? > but I'm using nutch this t
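(For what it's worth, isTokenChar() is a protected method on CharTokenizer in org.apache.lucene.analysis; a minimal sketch of overriding it, with the class name and character test chosen purely for illustration:)

import java.io.Reader;
import org.apache.lucene.analysis.CharTokenizer;

// A tokenizer that keeps letters and digits and splits on everything else.
public class AlnumTokenizer extends CharTokenizer {
  public AlnumTokenizer(Reader in) {
    super(in);
  }

  protected boolean isTokenChar(char c) {
    return Character.isLetterOrDigit(c);
  }
}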

Re: Payloads and tokenizers

2008-08-17 Thread Doron Cohen
> > Implementing payloads via Tokens explicitly prevents the use of payloads > for untokenized fields, as they only support field.stringValue(). There > seems no way to override this. I assume you already know this but just to make sure what I meant was clear - no tokenization but still indexing
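(On the retrieval side, a sketch of reading such a payload back, assuming the 2.x TermPositions API; the index path, field name and term are illustrative:)

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermPositions;

IndexReader reader = IndexReader.open("/path/to/index");
TermPositions tp = reader.termPositions(new Term("id", "ACME-12345"));
if (tp.next()) {
  tp.nextPosition();                             // advance to a position first
  if (tp.isPayloadAvailable()) {
    byte[] payload = tp.getPayload(new byte[tp.getPayloadLength()], 0);
    // payload now holds the bytes that were set on the token at index time
  }
}
tp.close();
reader.close();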