offsets of a term in a document

Ziqi Zhang Mon, 21 Sep 2015 08:10:00 -0700

Hi

Given a document in a Lucene index, I would like to get a list of terms in that document and their offsets. I suppose starting with IndexReader.getTermVector can get me going with this. I have some code as below (Lucene 5.3) about which I have some questions:


----------------------------------------
IndexReader reader = ....
Terms termVector = reader.getTermVector(docId, "content");
//now iterate through the terms
TermsEnum ti = termVector.iterator();
BytesRef luceneTerm = ti.next();
while(luceneTerm!=null){
        String tString = luceneTerm.utf8ToString();

        // each term can have >1 occurrence, so I need to get each occurrence:
        PostingsEnum postingsEnum = ti.postings(???, PostingsEnum.OFFSETS);
        int totalOccurrence = postingsEnum.freq();

        // api says calling "nextPosition" more than "freq()" times is undefined, so...
        for (int i = 0; i < totalOccurrence; i++) {
                postingsEnum.nextPosition(); // move cursor to next position/occurrence

                int start = postingsEnum.startOffset(); // get the start offset
                int end = postingsEnum.endOffset();     // get the end offset
        }

        luceneTerm=ti.next();
}
------------------------------------------

The first question is whether the code makes sense.

The second question is what I should put in place of "???". The API says "pass a prior PostingsEnum for possible reuse", but I don't get how to create an instance of it.
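
The "possible reuse" wording can be illustrated with a toy model in plain Java. The sketch below uses made-up classes and methods (ReuseSketch, OffsetCursor, open, collect) that are not part of Lucene. It only assumes the common convention that null means "nothing to reuse yet" and that the caller keeps whatever instance the call returns. Whether Lucene's postings() treats null exactly this way is an assumption here, not something this message confirms.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in classes for illustration; none of these are Lucene types.
final class ReuseSketch {

    /** A mutable cursor over the {start, end} offsets of one term. */
    static final class OffsetCursor {
        private int[][] offsets = new int[0][];
        private int next = 0;

        // Re-point this cursor at another term's offsets and rewind it.
        OffsetCursor reset(int[][] newOffsets) {
            this.offsets = newOffsets;
            this.next = 0;
            return this;
        }

        // Number of occurrences available through nextOffset().
        int freq() {
            return offsets.length;
        }

        // Returns the next {start, end} pair; call at most freq() times.
        int[] nextOffset() {
            return offsets[next++];
        }
    }

    // Counts how many cursor objects were actually created.
    static int allocations = 0;

    /** Models a "pass a prior instance for possible reuse" method. */
    static OffsetCursor open(OffsetCursor reuse, int[][] offsets) {
        OffsetCursor cursor = reuse;
        if (cursor == null) {          // nothing to reuse: allocate a fresh cursor
            cursor = new OffsetCursor();
            allocations++;
        }
        return cursor.reset(offsets);
    }

    /** Walks every term and records each occurrence as "term@start-end". */
    static List<String> collect(String[] terms, int[][][] offsetsPerTerm) {
        List<String> out = new ArrayList<>();
        OffsetCursor cursor = null;    // first call has no prior instance
        for (int t = 0; t < terms.length; t++) {
            // Always keep the returned value; it may or may not be the same object.
            cursor = open(cursor, offsetsPerTerm[t]);
            int freq = cursor.freq();
            for (int i = 0; i < freq; i++) {
                int[] span = cursor.nextOffset();
                out.add(terms[t] + "@" + span[0] + "-" + span[1]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[] terms = {"lucene", "term"};
        int[][][] offsets = {{{0, 6}, {20, 26}}, {{7, 11}}};
        System.out.println(collect(terms, offsets));
        System.out.println("cursors allocated: " + allocations);
    }
}
```

In this toy, the caller never constructs a cursor itself: it starts with null and hands back whatever the previous call returned, so one cursor object serves both terms.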


Many thanks!

