See below.

On 4/4/07, Sengly Heng <[EMAIL PROTECTED]> wrote:
Dear all,

My problem is a little bit strange. Instead of passing the whole content of the document to the indexer, I am adding it one term at a time. Here is a piece of my code:

    Document doc = new Document();
    doc.add(Field.Text("Features", "blue"));
    doc.add(Field.Text("Features", "beautiful"));
    doc.add(Field.Text("Features", "black"));
    doc.add(Field.Text("Features", "white"));
    doc.add(Field.Text("Features", "blue"));
    doc.add(Field.Text("Features", "blue"));

I'd like to know whether the internal representation of this is the same as when we add the whole content of the document at once as one long string.
There's no internal difference unless you implement an Analyzer that returns a value other than the default of 0 from getPositionIncrementGap(). Why would you do this, you ask? Occasionally it is useful to index data in the same field but NOT allow proximity queries to span separate additions. Imagine indexing a book where, for some reason, you didn't want span queries to cross, say, chapters. You could do something like

    doc.add("page", <contents of page 1>);
    doc.add("page", <contents of page 2>);

then in your Analyzer, have something in getPositionIncrementGap() like

    if (page is beginning of chapter) {
        return 1000;
    } else {
        return 0;
    }

Now, no Span query with a slop of less than 1,000 would match from the last page in one chapter to the first page of the next.

FWIW
Erick

I'd like also to count the number of occurrences of "blue" in the document. How can I do this?

See the other reply <G>.

Thank you very much in advance for your suggestions.
Regards, Sengly
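
To make the position-gap point in Erick's reply concrete, here is a minimal sketch in plain Java. It does not call Lucene at all; it only imitates, under simplifying assumptions, how token positions could be assigned when the same field is added several times. The class name PositionGapSketch and the method positions are hypothetical names chosen for illustration, and whitespace splitting stands in for a real Analyzer.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: mimics position assignment for a multi-valued field.
// This is NOT Lucene code; the names here are hypothetical.
class PositionGapSketch {

    /**
     * Returns the position given to each token across all values.
     * Within a value, each token advances the position by 1.
     * Before every value after the first, the position is bumped by `gap`,
     * which plays the role of the value returned by getPositionIncrementGap().
     */
    static List<Integer> positions(List<String> values, int gap) {
        List<Integer> result = new ArrayList<>();
        int position = -1;
        boolean firstValue = true;
        for (String value : values) {
            if (!firstValue) {
                position += gap; // extra distance between separate additions
            }
            firstValue = false;
            for (String token : value.trim().split("\\s+")) {
                if (token.isEmpty()) {
                    continue; // skip empty values
                }
                position += 1;
                result.add(position);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> oneByOne = List.of("blue", "beautiful", "black");
        List<String> atOnce = List.of("blue beautiful black");

        // With a gap of 0, separate additions and one long string agree.
        System.out.println(positions(oneByOne, 0)); // [0, 1, 2]
        System.out.println(positions(atOnce, 0));   // [0, 1, 2]

        // With a gap of 1000, tokens from separate additions sit far apart,
        // so a proximity query with a small slop cannot bridge them.
        System.out.println(positions(oneByOne, 1000)); // [0, 1001, 2002]
    }
}
```

With a gap of 0 the two ways of adding content yield identical positions, which is why there is no internal difference by default. With a large gap, neighbouring additions end up more than 1,000 positions apart, which is what keeps span queries from crossing the boundary.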