Re: get term neighbours

2009-05-07 Thread Adrian Dimulescu
just those positions around the position of interest. Once you are outside of your window, you can then short-circuit out of the TermVM (I think). HTH, Grant On May 3, 2009, at 2:39 PM, Adrian Dimulescu wrote: Hello, I am post-processing a positional index -- with a field like the

get term neighbours

2009-05-03 Thread Adrian Dimulescu
Hello, I am post-processing a positional index -- with a field like the following: doc.add(new Field(Constants.FIELD_TEXT, txt, Store.NO, Index.ANALYZED, TermVector.WITH_POSITIONS)); At post-processing, I want to retrieve the neighbours of a given term within a given range. That is, if docum
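
The neighbour lookup described in this message could be sketched roughly as follows. This is a minimal illustration in plain Java, not Lucene API code: it assumes the positions stored in the term vector have already been read into a hypothetical `Map<String, int[]>` (term to positions within one document), and the names `NeighbourSketch`, `neighbours` and `window` are invented for the example.

```java
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

public final class NeighbourSketch {

    /**
     * Returns every term that has at least one position within
     * `window` positions of some occurrence of `target`.
     * `positions` is an assumed in-memory copy of one document's
     * term vector (term -> token positions), not a Lucene type.
     */
    static SortedSet<String> neighbours(Map<String, int[]> positions,
                                        String target, int window) {
        SortedSet<String> result = new TreeSet<>();
        int[] targetPositions = positions.get(target);
        if (targetPositions == null) {
            return result; // target term does not occur in this document
        }
        for (Map.Entry<String, int[]> entry : positions.entrySet()) {
            if (entry.getKey().equals(target)) {
                continue; // the target is not reported as its own neighbour
            }
            if (isNear(entry.getValue(), targetPositions, window)) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    /** True if any position in `a` lies within `window` of any position in `b`. */
    private static boolean isNear(int[] a, int[] b, int window) {
        for (int p : a) {
            for (int t : b) {
                if (Math.abs(p - t) <= window) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

For a document tokenized as "the quick brown fox jumps" (positions 0 to 4), calling `neighbours(positions, "brown", 1)` yields the set containing "fox" and "quick". The nested loop is quadratic in the number of positions; if the position arrays are sorted, a merge-style scan would be a reasonable refinement.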

Re: number of hits of pages containing two terms

2009-03-17 Thread Adrian Dimulescu
Michael McCandless wrote: Is this a one-time computation? If so, couldn't you wait a long time for the machine to simply finish it? The final "production" computation is one-time; still, I repeatedly have to come back, correct some errors, and retry... With the simple approach (doing 100

Re: number of hits of pages containing two terms

2009-03-17 Thread Adrian Dimulescu
Ian Lea wrote: Adrian - have you looked any further into why your original two term query was too slow? My experience is that simple queries are usually extremely fast. Let me first point out that it is not "too slow" in absolute terms; it is only for my particular needs of attempting the num

Re: number of hits of pages containing two terms

2009-03-17 Thread Adrian Dimulescu
Michael McCandless wrote: I don't understand how this would address the "docFreq does not reflect deletions". Bad mail-quoting, sorry. I am not interested in document deletion; I just index Wikipedia once and want to get a co-occurrence-based similarity distance between words called NGD (norm
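
For readers unfamiliar with the distance mentioned here, the following sketch assumes that NGD refers to the Normalized Google Distance, computed from the document frequency of each term, the number of documents containing both, and the total number of documents. The class and parameter names (`NgdSketch`, `fx`, `fy`, `fxy`, `n`) are illustrative and do not come from the thread.

```java
public final class NgdSketch {

    /**
     * Normalized Google Distance under the assumed definition:
     *   (max(log fx, log fy) - log fxy) / (log n - min(log fx, log fy))
     *
     * fx, fy : number of documents containing each term
     * fxy    : number of documents containing both terms
     * n      : total number of documents; assumed larger than fx and fy,
     *          otherwise the denominator is zero
     */
    static double ngd(long fx, long fy, long fxy, long n) {
        if (fx <= 0 || fy <= 0 || fxy <= 0) {
            // Terms that never occur, or never co-occur, are treated
            // as infinitely distant in this sketch.
            return Double.POSITIVE_INFINITY;
        }
        double logX = Math.log(fx);
        double logY = Math.log(fy);
        double logXY = Math.log(fxy);
        return (Math.max(logX, logY) - logXY)
                / (Math.log(n) - Math.min(logX, logY));
    }
}
```

Two terms that always appear together (`fx == fy == fxy`) get a distance of 0, and the distance grows as the co-occurrence count shrinks relative to the individual frequencies.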

Re: number of hits of pages containing two terms

2009-03-17 Thread Adrian Dimulescu
Thank you. I suppose the solution for this is not to create an index but to store co-occurrence frequencies at the Analyzer level. Adrian. On Mon, Mar 16, 2009 at 11:37 AM, Michael McCandless < luc...@mikemccandless.com> wrote: > > Be careful: docFreq does not take deletions into account. >

number of hits of pages containing two terms

2009-03-16 Thread Adrian Dimulescu
Hello, I need the number of pages that contain two terms. Only the number of hits, I don't care about retrieving the pages. Right now I am using the following code in order to get it: Term first, second; TermQuery q1 = new TermQuery(first); TermQuery q2 = new TermQuery(second); BooleanQuer
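
Since only the count of pages containing both terms is needed, one possible alternative to running a scored query is to intersect the two terms' document lists directly. The sketch below is plain Java and makes no claim about the Lucene API: it assumes each term's matching document ids are already available as a sorted `int[]` without duplicates, and `CooccurrenceSketch` and `countBoth` are hypothetical names.

```java
public final class CooccurrenceSketch {

    /**
     * Counts the ids present in both arrays using a linear merge.
     * Both inputs are assumed to be sorted ascending and free of
     * duplicates, as a postings list of document ids would be.
     */
    static int countBoth(int[] docsA, int[] docsB) {
        int i = 0;
        int j = 0;
        int count = 0;
        while (i < docsA.length && j < docsB.length) {
            if (docsA[i] == docsB[j]) {
                count++; // document contains both terms
                i++;
                j++;
            } else if (docsA[i] < docsB[j]) {
                i++;     // advance the list that is behind
            } else {
                j++;
            }
        }
        return count;
    }
}
```

With `docsA = {1, 3, 5, 7}` and `docsB = {3, 4, 5, 8}`, `countBoth` returns 2. The merge touches each id at most once, so the cost is proportional to the combined length of the two lists and no documents are scored or retrieved.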