Re: Highlighter that works with phrase and span queries

Mark Miller Wed, 27 Jun 2007 06:22:20 -0700

Depending on what these guys are doing, here is another possibility if TermOffsets and Ronnie's highlighter are not an option.

If you are highlighting whole documents (NullFragmenter) or are not very concerned about the fragments you get back, you can change the line in the Highlighter at about line 255:

    tokenGroup.addToken(token, fragmentScorer.getTokenScore(token));

TO:

    float score = fragmentScorer.getTokenScore(token);
    if (score > 0) {
        tokenGroup.addToken(token, score);
    }

This is not a full solution yet, but more of a hack. Because tokens with a zero score are skipped, Fragmenters won't be given the opportunity to start a new Fragment at every token position... no problem if you are highlighting the whole document.

Essentially, instead of the document being rebuilt from the source text using each individual token, it is rebuilt from the highlighted tokens and the differences in offsets between them. Not so fragment-happy without some Fragmenter handling changes.

On a collection of 5,000 documents of 300-900 tokens each (weighted toward 300), this gave a speed improvement of 37-40%. I imagine the gains grow as the document grows.

I am looking into making this a more general solution, but it's a great quick hack for speed. It will also work with my SpanScorer that correctly highlights Spans and PhraseQueries.
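
To make the idea concrete, here is a minimal standalone sketch of rebuilding the text from only the scoring tokens plus the offset gaps between them. It does not use the Lucene classes at all: ScoredToken, highlight() and the <B> markup are hypothetical names made up for illustration, not the contrib Highlighter's actual internals.

```java
import java.util.Arrays;
import java.util.List;

class SparseHighlightSketch {

    /** Hypothetical stand-in for a Lucene token: offsets plus a score. */
    static final class ScoredToken {
        final int start;   // start offset in the source text (inclusive)
        final int end;     // end offset in the source text (exclusive)
        final float score; // > 0 means the token matched the query

        ScoredToken(int start, int end, float score) {
            this.start = start;
            this.end = end;
            this.score = score;
        }
    }

    /**
     * Rebuilds the text by handling only tokens with score > 0 and copying
     * the untouched text between them straight from the source.
     * Tokens are assumed to be sorted by offset and non-overlapping.
     */
    static String highlight(String text, List<ScoredToken> tokens) {
        StringBuilder out = new StringBuilder();
        int lastEnd = 0;
        for (ScoredToken t : tokens) {
            if (t.score <= 0) {
                continue; // the hack: zero-score tokens are never processed
            }
            out.append(text, lastEnd, t.start); // gap since the previous hit
            out.append("<B>").append(text, t.start, t.end).append("</B>");
            lastEnd = t.end;
        }
        out.append(text.substring(lastEnd)); // tail after the final hit
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "the quick brown fox";
        List<ScoredToken> tokens = Arrays.asList(
                new ScoredToken(0, 3, 0f),    // the
                new ScoredToken(4, 9, 1f),    // quick
                new ScoredToken(10, 15, 0f),  // brown
                new ScoredToken(16, 19, 1f)); // fox
        // prints: the <B>quick</B> brown <B>fox</B>
        System.out.println(highlight(text, tokens));
    }
}
```

The work per document then scales with the number of hits rather than the number of tokens, which is also why a Fragmenter never sees the skipped positions.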


- Mark

Otis Gospodnetic wrote:

Hi Mark,

I know one large user (meaning: high query/highlight rates) of the current 
Highlighter and this user wasn't too happy with its performance.  I don't know 
the details, other than it was inefficient.  So now I'm wondering if you've 
benchmarked your Highlighter against that/current Highlighter to see not only 
which one is more accurate, but also which one is faster, and by how much?

Thanks,
Otis
 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com/  -  Tag  -  Search  -  Share

----- Original Message ----
From: Mark Miller <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Wednesday, June 20, 2007 12:39:27 AM
Subject: Highlighter that works with phrase and span queries
I have been working on extending the Highlighter with a new Scorer that correctly scores phrase and span queries. The highlighter is working great for me, but could really use some more banging on.

If you have a need or an interest in a more accurate Highlighter, please give it a whirl and let me know how it went. Unlike most of the other alternate Lucene Highlighters, this one builds off the original contrib Highlighter so as to retain all of its goodness.
http://myhardshadow.com/qsolreleases/lucene-highlighter-2.2.jar
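
As a rough illustration of what "correctly scores phrase queries" means, the toy sketch below contrasts term-based matching (every occurrence of any phrase term is a hit) with position-aware matching (only terms inside a full, in-order phrase match are hits). It is a standalone example under simplifying assumptions: termPositions() and phrasePositions() are hypothetical helpers, and this is not how the SpanScorer is implemented.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class PhraseAwareSketch {

    /** Term-based: every position whose token equals any phrase term. */
    static Set<Integer> termPositions(List<String> tokens, List<String> phrase) {
        Set<Integer> hits = new TreeSet<>();
        for (int i = 0; i < tokens.size(); i++) {
            if (phrase.contains(tokens.get(i))) {
                hits.add(i);
            }
        }
        return hits;
    }

    /** Phrase-aware: only positions that belong to a full, in-order match. */
    static Set<Integer> phrasePositions(List<String> tokens, List<String> phrase) {
        Set<Integer> hits = new TreeSet<>();
        for (int i = 0; i + phrase.size() <= tokens.size(); i++) {
            if (tokens.subList(i, i + phrase.size()).equals(phrase)) {
                for (int j = 0; j < phrase.size(); j++) {
                    hits.add(i + j);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("new", "car", "in", "new", "york");
        List<String> phrase = Arrays.asList("new", "york");
        System.out.println(termPositions(tokens, phrase));   // prints [0, 3, 4]
        System.out.println(phrasePositions(tokens, phrase)); // prints [3, 4]
    }
}
```

With term-based scoring the stray "new" at position 0 would be highlighted even though it is not part of the phrase; the position-aware version leaves it alone.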

Example Usage

    IndexSearcher searcher = new IndexSearcher(ramDir);
    Query query = QueryParser.parse("Kenne*", FIELD_NAME, analyzer);
    query = query.rewrite(reader); //required to expand search terms
    Hits hits = searcher.search(query);

    for (int i = 0; i < hits.length(); i++)
    {
        String text = hits.doc(i).get(FIELD_NAME);
        CachingTokenFilter tokenStream = new CachingTokenFilter(
                analyzer.tokenStream(FIELD_NAME, new StringReader(text)));
        Highlighter highlighter = new Highlighter(
                new SpanScorer(query, FIELD_NAME, tokenStream));
        tokenStream.reset();

        // Get 3 best fragments and separate with a "..."
        String result = highlighter.getBestFragments(tokenStream, text, 3, "...");
        System.out.println(result);
    }

If you make a call to any of the getBestFragments() methods more than once, you must call reset() on the SpanScorer between each call.

Pass null as the FIELD_NAME to ignore fields.

If you want to Highlight the whole document, use a NullFragmenter.

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




