Re: Indexing multiple instances of the same field and counting their frequency afterward

Erick Erickson Wed, 04 Apr 2007 06:46:55 -0700

See below

On 4/4/07, Sengly Heng <[EMAIL PROTECTED]> wrote:


Dear all,

My problem is a little bit strange. Instead of parsing the content of the
document to the indexer. I am adding one by one. Here is a piece of my
code
:

Document doc = new Document();
doc.add(Field.Text("Features", "blue");
doc.add(Field.Text("Features","beautiful");
doc.add(Field.Text("Features", "black");
doc.add(Field.Text("Features","white");
doc.add(Field.Text("Features", "blue");
doc.add(Field.Text("Features","blue");

I'd like to know whether the internal representation of this is like when
we
add at once the whole content of the document as a long string?



There's no internal difference unless you implement an Analyzer that
returns a value other than 1 from getPositionIncrementGap(). Why would
you do this you ask? Occasionally, it is useful to index data in the
same field but NOT allow proximity queries to span separate
additions. Imagine indexing a book and, for some reason, you
didn't want span queries to cross, say, chapters. You could
do something like
doc.add("page", <contents of page 1>);
doc.add("page", <contents of page 2>);



then in your Analyzer, have something in getPositionIncrementGap
like
if (page is beginning of chapter)  {
   return 1000;
} else {
return 1;
}

Now, no Span query with a slop of less than 1,000 would
match from the last page in one chapter to the first page of another.

FWIW
Erick

I'd like

also to count the number of "blue" occurence from the document. How to do
this?



See the other reply <G>.

Thank you very much for your suggestion in advance.


Regards,

Sengly

Re: Indexing multiple instances of the same field and counting their frequency afterward

Reply via email to