just those positions around the position of
interest. Once you are outside your window, you can then
short-circuit out of the TermVM (I think).
HTH,
Grant
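A rough sketch of the windowing idea described above, in plain Java rather than against the Lucene API. It assumes the term vector of one document has already been collected into a map from term to positions (for example by a TermVectorMapper); the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Lucene: finds the terms near a target
// term, given a per-document map of term -> positions.
class WindowNeighbours {

    // Returns the terms that occur within `window` positions of any
    // occurrence of `target`, alphabetically, excluding `target` itself.
    static List<String> neighbours(Map<String, int[]> positionsByTerm,
                                   String target, int window) {
        List<String> result = new ArrayList<>();
        int[] targetPositions = positionsByTerm.get(target);
        if (targetPositions == null) {
            return result; // target does not occur in this document
        }
        // Copy into a TreeMap for a deterministic (alphabetical) order.
        Map<String, int[]> sorted = new TreeMap<>(positionsByTerm);
        for (Map.Entry<String, int[]> e : sorted.entrySet()) {
            if (e.getKey().equals(target)) {
                continue;
            }
            if (anyWithin(e.getValue(), targetPositions, window)) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    // True if some candidate position lies within `window` of some target position.
    private static boolean anyWithin(int[] candidate, int[] targetPositions, int window) {
        for (int p : candidate) {
            for (int t : targetPositions) {
                if (Math.abs(p - t) <= window) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Positions for "the quick brown fox jumps over the lazy dog".
        Map<String, int[]> doc = new TreeMap<>();
        doc.put("the", new int[] {0, 6});
        doc.put("quick", new int[] {1});
        doc.put("brown", new int[] {2});
        doc.put("fox", new int[] {3});
        doc.put("jumps", new int[] {4});
        doc.put("over", new int[] {5});
        doc.put("lazy", new int[] {7});
        doc.put("dog", new int[] {8});
        System.out.println(neighbours(doc, "fox", 2)); // prints [brown, jumps, over, quick]
    }
}
```

This version scans every term in the document. The short-circuit mentioned above would instead stop collecting as soon as a position falls outside the window.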
On May 3, 2009, at 2:39 PM, Adrian Dimulescu wrote:
Hello,
I am post-processing a positional index -- with a field like the following:
doc.add(new Field(Constants.FIELD_TEXT, txt, Store.NO, Index.ANALYZED,
TermVector.WITH_POSITIONS));
At post-processing, I want to retrieve the neighbours of a given term
within a given range. That is, if docum
Michael McCandless wrote:
> Is this a one-time computation? If so, couldn't you wait a long time
> for the machine to simply finish it?
The final "production" computation is one-time; still, I have to
keep coming back to correct some errors and then retry...
With the simple approach (doing 100
Ian Lea wrote:
> Adrian - have you looked any further into why your original two term
> query was too slow? My experience is that simple queries are usually
> extremely fast.
Let me first point out that it is not "too slow" in absolute terms; it
is only too slow for my particular needs of attempting the num
Michael McCandless wrote:
> I don't understand how this would address the "docFreq does
> not reflect deletions".
Bad mail-quoting, sorry. I am not interested in document deletion; I
just index Wikipedia once and want to get a co-occurrence-based
similarity distance between words called NGD (norm
Thank you.
I suppose the solution for this is not to create an index but to store
co-occurrence frequencies at the Analyzer level.
Adrian.
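A minimal sketch of what storing co-occurrence frequencies outside the index could look like, in plain Java. It assumes that NGD here means the Normalized Google Distance in its usual form, (max(log f(x), log f(y)) - log f(x,y)) / (log N - min(log f(x), log f(y))), where f counts documents and N is the total number of documents. The class is hypothetical and is not tied to Lucene's Analyzer API; it simply takes already-tokenized documents.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch, not Lucene API: accumulates document-level term and
// pair frequencies from tokenized documents, then derives NGD from them.
class CooccurrenceCounter {
    private final Map<String, Long> termDocs = new HashMap<>();
    private final Map<String, Long> pairDocs = new HashMap<>();
    private long numDocs = 0;

    // Counts each distinct term, and each unordered pair of distinct terms,
    // once per document.
    void addDocument(Iterable<String> tokens) {
        Set<String> unique = new TreeSet<>();
        for (String t : tokens) {
            unique.add(t);
        }
        numDocs++;
        String[] terms = unique.toArray(new String[0]);
        for (int i = 0; i < terms.length; i++) {
            termDocs.merge(terms[i], 1L, Long::sum);
            for (int j = i + 1; j < terms.length; j++) {
                pairDocs.merge(pairKey(terms[i], terms[j]), 1L, Long::sum);
            }
        }
    }

    // Order-independent key; assumes tokens never contain a tab character.
    private static String pairKey(String a, String b) {
        return a.compareTo(b) <= 0 ? a + "\t" + b : b + "\t" + a;
    }

    // NGD from document frequencies; infinite if the terms never co-occur.
    double ngd(String x, String y) {
        long fx = termDocs.getOrDefault(x, 0L);
        long fy = termDocs.getOrDefault(y, 0L);
        long fxy = x.equals(y) ? fx : pairDocs.getOrDefault(pairKey(x, y), 0L);
        if (fxy == 0) {
            return Double.POSITIVE_INFINITY;
        }
        double lx = Math.log(fx);
        double ly = Math.log(fy);
        double denominator = Math.log(numDocs) - Math.min(lx, ly);
        if (denominator == 0.0) {
            return 0.0; // both terms occur in every document; treated as distance 0
        }
        return (Math.max(lx, ly) - Math.log(fxy)) / denominator;
    }

    public static void main(String[] args) {
        CooccurrenceCounter c = new CooccurrenceCounter();
        c.addDocument(Arrays.asList("a", "b"));
        c.addDocument(Arrays.asList("a", "b", "c"));
        c.addDocument(Arrays.asList("a"));
        c.addDocument(Arrays.asList("c"));
        System.out.println(c.ngd("a", "b"));
    }
}
```

The pair table grows quadratically with the number of distinct terms per document, so for something the size of Wikipedia it would probably have to be restricted to a vocabulary of interest.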
On Mon, Mar 16, 2009 at 11:37 AM, Michael McCandless <
luc...@mikemccandless.com> wrote:
>
> Be careful: docFreq does not take deletions into account.
>
Hello,
I need the number of pages that contain two terms. I only need the number
of hits; I don't care about retrieving the pages. Right now I am using the
following code to get it:
Term first, second;
TermQuery q1 = new TermQuery(first);
TermQuery q2 = new TermQuery(second);
BooleanQuer