Unable to retrieve TermVectorOffsets using Lucene 6

2016-12-02 Thread Szymon Sutek

Hello, I am trying to index a txt file and then retrieve the offset
positions of its terms (all of them, if a term occurred more than once
during indexing). Here are the most important parts of the code:

1) StandardAnalyzer is used.
2) FieldType used while indexing:

FieldType fieldType = new FieldType();

fieldType.setTokenized(true);
fieldType.setStoreTermVectors(true);
fieldType.setStoreTermVectorPositions(true);
fieldType.setStoreTermVectorOffsets(true);

fieldType.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);

3) doc.add(new Field("fieldname", reader, fieldType));


4) After successfully creating the index, I use indexReader to read the
terms and iterate through all of them, but I have no idea how to collect
the offsets for each term.
In earlier versions I would cast the term vector to the type I needed and
get the list of offsets for a concrete term value. Now I am stuck on this
part of the code:

Terms terms = indexReader.getTermVector(0, "text");
TermsEnum iterator = terms.iterator();

BytesRef byteRef = null;

while ((byteRef = iterator.next()) != null) {
    String term = byteRef.utf8ToString();
    // Here I don't know how to get the offset vector for the given term
}

I would be grateful for any help!

Unable to retrieve OffsetTermVector for given term using Apache Lucene 6

2016-12-02 Thread Szymon Sutek

Hello, I am trying to index a txt file and then retrieve the offset
positions of its terms. Unfortunately, I can get only one offset per
term, not all of them (if the term occurred more than once during
indexing). Here are the most important parts of the code:

FieldType used while indexing:

private FieldType getFieldType() {
    FieldType fieldType = new FieldType();

    fieldType.setTokenized(true);
    fieldType.setStoreTermVectors(true);
    fieldType.setStoreTermVectorPositions(true);
    fieldType.setStoreTermVectorOffsets(true);

    fieldType.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);

    return fieldType;
}

After successfully creating the index, I use indexReader to read the
terms and iterate through all of them, but I have no idea how to collect
their offsets.

In earlier versions I would cast the term vector to the type I needed and
get the list of offsets for a concrete term value. Now I am stuck on this
part of the code:


Terms terms = indexReader.getTermVector(0, "text");
TermsEnum iterator = terms.iterator();

BytesRef byteRef = null;

while ((byteRef = iterator.next()) != null) {
    String term = byteRef.utf8ToString();
    if (p.matcher(term).matches())
        searchResult.put(1, term);

    System.out.println("[S]:" + term);
}

I would be grateful for any help!

Re: Unable to retrieve OffsetTermVector for given term using Apache Lucene 6

2016-12-02 Thread Szymon Sutek

I made a mistake in the last part of the code. It should be:

while ((byteRef = iterator.next()) != null) {
    String term = byteRef.utf8ToString();
    // Here I would like to retrieve all offset positions for the given term variable

}
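
To make the goal of the loop above concrete, here is a minimal sketch of the result being asked for: every (start, end) character offset for each term, including repeated occurrences. It uses only the JDK and is not Lucene code; the class TermOffsets, the method collect, and the regex tokenizer are hypothetical stand-ins (the tokenizer only loosely imitates an analyzer and does no stop-word removal). In Lucene 5 and later, the corresponding per-term data is, to the best of my knowledge, read by calling TermsEnum.postings(null, PostingsEnum.OFFSETS), advancing with nextDoc(), and then calling nextPosition(), startOffset() and endOffset() freq() times.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, NOT part of Lucene: it only illustrates the desired
// result shape (term -> every [startOffset, endOffset) pair in the text).
final class TermOffsets {

    // Simplified stand-in for an analyzer: runs of letters or digits.
    private static final Pattern TOKEN = Pattern.compile("[\\p{L}\\p{N}]+");

    // Collects all offsets per lowercased term, in order of first occurrence.
    static Map<String, List<int[]>> collect(String text) {
        Map<String, List<int[]>> offsets = new LinkedHashMap<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            String term = m.group().toLowerCase(Locale.ROOT);
            offsets.computeIfAbsent(term, k -> new ArrayList<>())
                   .add(new int[] { m.start(), m.end() });
        }
        return offsets;
    }

    public static void main(String[] args) {
        Map<String, List<int[]>> offsets = collect("lucene index lucene search");
        for (Map.Entry<String, List<int[]>> e : offsets.entrySet()) {
            StringBuilder sb = new StringBuilder(e.getKey()).append(':');
            for (int[] span : e.getValue()) {
                sb.append(" [").append(span[0]).append(',').append(span[1]).append(')');
            }
            System.out.println(sb);
        }
        // prints:
        // lucene: [0,6) [13,19)
        // index: [7,12)
        // search: [20,26)
    }
}
```

The term "lucene" occurs twice, so it carries two offset pairs; that is the per-term list the loop over TermsEnum is meant to produce.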


