Mark,
This is exactly what I want and It worked perfectly. Thanks!
I'll post my highlighter to JIRA in a few days (hopegully).
It uses term offsets with positions (WITH_POSITIONS_OFFSETS)
to support PhraseQuery.
Thanks again,
Koji
Mark Miller wrote:
Okay, Koji, hopefully I'll be more luckily suggesting this this time.
Have you tried http://issues.apache.org/jira/browse/LUCENE-1448 yet? I am
not sure if its in an applyable state, but I hope that covers your issue.
On Fri, Jan 16, 2009 at 7:15 PM, Koji Sekiguchi <k...@r.email.ne.jp> wrote:
Hello,
I'm writing a highlighter by using term offsets info (yes, I borrowed
the idea
of LUCENE-644). In my highlighter, I'm seeing unexpected term offsets info
when getting multi-valued field.
For example, if I indexed [" aaaa"," bbb "] (multi-valued), I got term info
bbb(7,10). This is expected result. But if I indexed [" aaa "," bbb "]
(note that using " aaa " instead of " aaaa"), I got term info bbb(6,9)
which
is unexpected. I would like to get same offset info for bbb because they
are same length of field values.
Please use the following program to see the problem I'm seeing. I'm
using trunk:
public static void main(String[] args) throws Exception {
// create an index
Directory dir = new RAMDirectory();
Analyzer analyzer = new WhitespaceAnalyzer();
IndexWriter writer = new IndexWriter( dir, analyzer, true,
MaxFieldLength.LIMITED );
Document doc = new Document();
doc.add( new Field( "f", " aaa ", Store.YES, Index.ANALYZED,
TermVector.WITH_OFFSETS ) );
//doc.add( new Field( "f", " aaaa", Store.YES, Index.ANALYZED,
TermVector.WITH_OFFSETS ) );
doc.add( new Field( "f", " bbb ", Store.YES, Index.ANALYZED,
TermVector.WITH_OFFSETS ) );
writer.addDocument( doc );
writer.close();
// print the offsets
IndexReader reader = IndexReader.open( dir );
TermPositionVector tpv = (TermPositionVector)reader.getTermFreqVector(
0, "f" );
for( int i = 0; i < tpv.getTerms().length; i++ ){
System.out.print( "term = \"" + tpv.getTerms()[i] + "\"" );
TermVectorOffsetInfo[] tvois = tpv.getOffsets( i );
for( TermVectorOffsetInfo tvoi : tvois ){
System.out.println( "(" + tvoi.getStartOffset() + "," +
tvoi.getEndOffset() + ")" );
}
}
reader.close();
}
regards,
Koji
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org