Re: highlighting with WildcardQuery

Doron Cohen Sat, 14 Oct 2006 00:35:37 -0700

The IndexReader is needed to find all terms that match the wildcard (by
consulting the index lexicon). It seems you do not want to expand the
wildcard query against the index lexicon, but rather against that of the
text being highlighted (which may not be indexed at all). I think you have
at least two ways to do that:


(1) create a (highlight) QueryScorer with:
   new QueryScorer(WeightedTerm weightedTerms[])
which means that you supply all the "lexicon" knowledge usually taken from
the index (reader), i.e. which words are valid for the wildcard
expression.

(2) extend QueryScorer, implementing
   float getTokenScore(Token token)
such that tokens matching the wildcard expression get a nonzero score.
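
Both ideas can be sketched in plain Python without any Lucene classes.
This is only an illustration under some assumptions: tokenize() is a
hypothetical stand-in for a real analyzer (it just lowercases and splits
on word characters), and fnmatch is used for the wildcard matching, which
handles * and ? but also accepts [..] character classes that Lucene
wildcards do not have. The function names are made up for this example.

```python
import re
from fnmatch import fnmatchcase


def tokenize(text):
    """Hypothetical stand-in for an analyzer: lowercase and split into words."""
    return re.findall(r"\w+", text.lower())


def expand_wildcards(text, patterns):
    """Approach (1): build the 'lexicon' from the text itself.

    Returns the distinct tokens of `text` that match any wildcard pattern,
    i.e. the terms one would hand to the scorer as weighted terms.
    """
    lowered = [p.lower() for p in patterns]
    matches = {
        tok for tok in tokenize(text)
        if any(fnmatchcase(tok, p) for p in lowered)
    }
    return sorted(matches)


def token_score(token, patterns):
    """Approach (2): score one token directly against the wildcard patterns.

    Returns 1.0 for a match and 0.0 otherwise; a real scorer could weight
    individual patterns differently.
    """
    tok = token.lower()
    return 1.0 if any(fnmatchcase(tok, p.lower()) for p in patterns) else 0.0


if __name__ == "__main__":
    text = "Highlighting works on plain text, no index required."
    patterns = ["*light*", "*text*"]
    print(expand_wildcards(text, patterns))
    print(token_score("Text", patterns), token_score("index", patterns))
```

With (1) the expansion is done once up front and the result is a plain
list of terms; with (2) the pattern is tested against every token as it
streams by, so no term list is needed at all.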

- Doron

"James O'Rourke" <[EMAIL PROTECTED]> wrote on 13/10/2006 11:39:31:

> Is there any way to do highlighting when using a WildcardQuery when
> there is no IndexReader available? I simply want to do it with a
> chunk of text, but it fails because the WildcardQuery needs to call
> rewrite - but doesn't know about the IndexReader.
>
> Code (using PyLucene-2.0.0 - can translate to Java if you like)
>
> def gethighlightedfragments(text, searchString,
>      fragmentLength = 50, numFragments = 3,
>      opening = '<span class=\"highlight\">', closing = '</span>'):
>      """ Returns a list of text fragments with returns included for 80 char max width """
>      """ Defaults to OR operator which is good for formatting """
>      analyzer = StandardAnalyzer()
>      #print text
>      strs = searchString.split()
>      bq = BooleanQuery()
>      for s in strs:
>          print s
>          q = WildcardQuery(Term('f', '*' + s +  '*'))
>          #print q.toString()
>          bq.add(q,  BooleanClause.Occur.SHOULD)
>      #print bq.toString()
>      scorer = QueryScorer(bq)
>      formatter = SimpleHTMLFormatter(opening, closing)
>      highlighter = Highlighter(formatter, scorer)
>      fragmenter = SimpleFragmenter(fragmentLength)
>      highlighter.setTextFragmenter(fragmenter)
>
>      tokenStream = analyzer.tokenStream('f', StringReader(text))
>      return highlighter.getBestFragments(tokenStream, text, numFragments)
>
>
> Basically, I want to show partial word matches also.
>
> James
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>

