Re: get term neighbours

Adrian Dimulescu Thu, 07 May 2009 09:12:04 -0700

Thank you for these clarifications. As I had to do something fast, I coded the thing as illustrated by the following pseudocode:



IndexReader index;

// iterate over each doc where this term appears
TermPositions iterator = this.index.termPositions(t);

while (iterator.next()) {
    int docNr = iterator.doc();
    int freq = iterator.freq();

    // positions of the current term in the current doc
    int[] apparitionPositions = new int[freq];

    for (int i = 0; i < freq; i++) {
        apparitionPositions[i] = iterator.nextPosition();
    }
    ...

    TermPositionVector tpv = (TermPositionVector) this.index.getTermFreqVector(docNr, "text");

    ...

    // for every term of the doc, see if it is close to one of the elements in apparitionPositions
    for (int i = 0; i < terms.length; i++) {
        int[] pos = tpv.getTermPositions(i);

        ... // for each element in pos, check its distance to the current term
    }
}

My understanding is that this is a less object-oriented way of doing the same thing as your suggestion, but please correct me if I'm wrong.

I finally managed to retrieve what I wanted with this code. The problem is that it is not really parallelizable. If several threads call getTermFreqVector at the same time, they have to wait for one another. My multithreaded scenario involves a single IndexReader on which all threads ask for term vectors. I wonder if it is possible to avoid this problem (perhaps by having a pool of IndexReaders; is this good practice, or would there be memory problems?). I welcome any ideas on this subject.


Thank you,
Adrian.
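
The matching step left open in the pseudocode above ("for each element in pos, check close distance") can also be done by inverting the term vector once per document and then slicing around each position of the central term. What follows is only a sketch in plain Java with no Lucene dependency: the class TermNeighbours, its methods, and the Map standing in for a TermPositionVector are all hypothetical names chosen for illustration, and positions are assumed to be plain token offsets starting at 0.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene: works on a plain term -> positions map.
class TermNeighbours {

    /** Builds a term -> positions map from tokens; stands in for a positional term vector. */
    static Map<String, List<Integer>> toTermVector(String[] tokens) {
        Map<String, List<Integer>> vector = new LinkedHashMap<>();
        for (int pos = 0; pos < tokens.length; pos++) {
            vector.computeIfAbsent(tokens[pos], k -> new ArrayList<>()).add(pos);
        }
        return vector;
    }

    /** Inverts term -> positions into an array indexed by position. */
    static String[] invert(Map<String, List<Integer>> vector) {
        int max = -1;
        for (List<Integer> positions : vector.values()) {
            for (int p : positions) {
                max = Math.max(max, p);
            }
        }
        String[] byPosition = new String[max + 1];
        for (Map.Entry<String, List<Integer>> e : vector.entrySet()) {
            for (int p : e.getValue()) {
                byPosition[p] = e.getKey();
            }
        }
        return byPosition;
    }

    /** Returns the terms at positions center-range .. center+range, clipped to the document. */
    static List<String> window(String[] byPosition, int center, int range) {
        List<String> out = new ArrayList<>();
        int from = Math.max(0, center - range);
        int to = Math.min(byPosition.length - 1, center + range);
        for (int p = from; p <= to; p++) {
            if (byPosition[p] != null) { // position gaps (e.g. removed stop words) are skipped
                out.add(byPosition[p]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[] tokens = "alabama experienced significant recovery as the economy of the state transitioned".split(" ");
        Map<String, List<Integer>> vector = toTermVector(tokens);
        String[] byPosition = invert(vector);
        for (int center : vector.get("economy")) {
            // prints "recovery as the economy of the state"
            System.out.println(String.join(" ", window(byPosition, center, 3)));
        }
    }
}
```

The inversion costs one pass over all positions of the document, after which each occurrence of the central term costs only a slice of 2 * range + 1 entries, instead of comparing every position of every term against every occurrence.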

Grant Ingersoll wrote:

There isn't a very clean way to do this just yet, but it is doable. Index with positions (you might find offsets useful too) and then use the TermVectorMapper and TermVector API call on the IndexReader (not the termPositions). Then, you will need to implement a TermVectorMapper that takes in your position and then reads in the term vector and gets just those positions around the position of interest. Once you are outside of your window, you can then short circuit out of the TermVM (I think).
HTH,
Grant
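
The callback style suggested above could look roughly like the following. This is a sketch in plain Java with no Lucene dependency: WindowCollector and its map(String, int[]) method are hypothetical and only mimic the idea of being called once per term with that term's positions; Lucene's own TermVectorMapper has a different method signature, which is not reproduced here. The sketch visits every term and does not attempt the short circuit mentioned above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical collector, not a Lucene class: keeps only terms near the given positions.
class WindowCollector {
    private final int[] centers; // positions of the central term in the current document
    private final int range;     // how many positions to keep on each side
    private final TreeMap<Integer, String> hits = new TreeMap<>(); // position -> term, sorted

    WindowCollector(int[] centers, int range) {
        this.centers = centers.clone();
        this.range = range;
    }

    /** Called once per term of the document with all positions of that term. */
    void map(String term, int[] positions) {
        for (int p : positions) {
            for (int c : centers) {
                if (Math.abs(p - c) <= range) {
                    hits.put(p, term);
                    break; // one matching center is enough for this position
                }
            }
        }
    }

    /** Terms inside any window, in document order. */
    List<String> neighbours() {
        return new ArrayList<>(hits.values());
    }

    public static void main(String[] args) {
        WindowCollector collector = new WindowCollector(new int[] {6}, 3);
        // Terms arrive in arbitrary order, each with its full position list.
        collector.map("as", new int[] {4});
        collector.map("alabama", new int[] {0});
        collector.map("economy", new int[] {6});
        collector.map("of", new int[] {7});
        collector.map("recovery", new int[] {3});
        collector.map("state", new int[] {9});
        collector.map("the", new int[] {5, 8});
        collector.map("transitioned", new int[] {10});
        // prints "recovery as the economy of the state"
        System.out.println(String.join(" ", collector.neighbours()));
    }
}
```

Compared with materializing the whole term vector, a collector like this only retains the entries that fall inside a window, which keeps memory per document proportional to the window size rather than to the document length.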

On May 3, 2009, at 2:39 PM, Adrian Dimulescu wrote:
Hello,
I am post-processing a positional index -- with a field like the following:
doc.add(new Field(Constants.FIELD_TEXT, txt, Store.NO, Index.ANALYZED, TermVector.WITH_POSITIONS));
At post-processing, I want to retrieve the neighbours of a given term within a given range. That is, if document x contains the sequence:
"Alabama experienced significant /recovery as the economy of the state/ transitioned from agriculture to diversified interests in heavy manufacturing"
for range = 3 and term = "economy", I want to retrieve "recovery as the *economy* of the state".
I see there is an API call:

IndexReader.termPositions(term)
which retrieves the actual positions of the given term. Is there a quick way to retrieve its neighbours too, instead of browsing all terms for all documents to see if their position is close to the position of the central term?



