Unable to retrieve TermVectorOffsets using Lucene 6
Hello,

I am trying to index a txt file and then retrieve the offset positions of its terms (every occurrence, if a term occurred more than once while indexing). Here are the most important parts of the code:

1) A StandardAnalyzer is used.

2) The FieldType used while indexing:

    FieldType fieldType = new FieldType();
    fieldType.setTokenized(true);
    fieldType.setStoreTermVectors(true);
    fieldType.setStoreTermVectorPositions(true);
    fieldType.setStoreTermVectorOffsets(true);
    fieldType.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);

3) The field is added to the document:

    doc.add(new Field("fieldname", reader, fieldType));

4) After successfully creating the index, I use an IndexReader to read the terms and iterate through all of them, but I have no idea how to collect the offset vector.

In earlier versions I would cast the TermVector to the vector type I needed and get the offset list for a concrete term value. Now I am stuck on this part of the code:

    Terms terms = indexReader.getTermVector(0, "text");
    TermsEnum iterator = terms.iterator();
    BytesRef byteRef = null;
    while ((byteRef = iterator.next()) != null) {
        String term = byteRef.utf8ToString();
        // Here I don't know how to get the offset vector for the given term
    }

I would be grateful for any help!
Unable to retrieve OffsetTermVector for given term using Apache Lucene 6
Hello,

I am trying to index a txt file and then retrieve the offset positions of its terms. Unfortunately, I can get only one offset per term, not all of them (if the term occurred more than once while indexing). Here are the most important parts of the code.

The FieldType used while indexing:

    private FieldType getFieldType() {
        FieldType fieldType = new FieldType();
        fieldType.setTokenized(true);
        fieldType.setStoreTermVectors(true);
        fieldType.setStoreTermVectorPositions(true);
        fieldType.setStoreTermVectorOffsets(true);
        fieldType.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
        return fieldType;
    }

After successfully creating the index, I use an IndexReader to read the terms and iterate through all of them, but I have no idea how to collect their offsets.

In earlier versions I would cast the TermVector to the vector type I needed and get the offset list for a concrete term value. Now I am stuck on this part of the code:

    Terms terms = indexReader.getTermVector(0, "text");
    TermsEnum iterator = terms.iterator();
    BytesRef byteRef = null;
    while ((byteRef = iterator.next()) != null) {
        String term = byteRef.utf8ToString();
        if (p.matcher(term).matches())
            searchResult.put(1, term);
        System.out.println("[S]:" + term);
    }

I would be grateful for any help!
Re: Unable to retrieve OffsetTermVector for given term using Apache Lucene 6
I made a mistake in the last part of the code. It should be:

    while ((byteRef = iterator.next()) != null) {
        String term = byteRef.utf8ToString();
        // Here I would like to retrieve all offset positions for the given term variable
    }

2016-12-02 10:08 GMT+01:00 Szymon Sutek:

> Hello, I am trying to index a txt file and then retrieve the offset
> positions of its terms. Unfortunately, I can get only one offset per
> term, not all of them (if the term occurred more than once while
> indexing). Here are the most important parts of the code.
>
> The FieldType used while indexing:
>
>     private FieldType getFieldType() {
>         FieldType fieldType = new FieldType();
>         fieldType.setTokenized(true);
>         fieldType.setStoreTermVectors(true);
>         fieldType.setStoreTermVectorPositions(true);
>         fieldType.setStoreTermVectorOffsets(true);
>         fieldType.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
>         return fieldType;
>     }
>
> After successfully creating the index, I use an IndexReader to read the
> terms and iterate through all of them, but I have no idea how to collect
> their offsets.
>
> In earlier versions I would cast the TermVector to the vector type I
> needed and get the offset list for a concrete term value. Now I am stuck
> on this part of the code:
>
>     Terms terms = indexReader.getTermVector(0, "text");
>     TermsEnum iterator = terms.iterator();
>     BytesRef byteRef = null;
>     while ((byteRef = iterator.next()) != null) {
>         String term = byteRef.utf8ToString();
>         if (p.matcher(term).matches())
>             searchResult.put(1, term);
>         System.out.println("[S]:" + term);
>     }
>
> I would be grateful for any help!