Dear list,

I try to use the Term Highlighter in my webapp but I have a problem. I want to highlight the terms in a text without extracting the most relevant sections.
The highlighting works but the last characters are trimmed !

Here is a portion of my code :


 Analyzer analyzer = new StandardAnalyzer();
 Query query = null;
 try {
  query = QueryParser.parse(queryStr, "scientificName", analyzer);
  query = query.rewrite(IndexReader.open("E:/specimenset-index"));
 } catch (ParseException e) {
  // TODO Auto-generated catch block
  e.printStackTrace();
 } catch (IOException e) {
  // TODO Auto-generated catch block
  e.printStackTrace();
 }

 QueryScorer scorer = new QueryScorer(query);
 SimpleHTMLFormatter formatter = new SimpleHTMLFormatter(
   "<span class=\"highlight\">", "</span>");

 Highlighter highlighter = new Highlighter(formatter, scorer);

 TokenStream tokenStream = analyzer.tokenStream("scientificName",
   new StringReader(text));

 String highlightedText = null;

 try {
  highlightedText = highlighter.getBestFragment(
   tokenStream, text);
 } catch (IOException e1) {
  // TODO Auto-generated catch block
  e1.printStackTrace();
 }
 return highlightedText ;



A value for text variable is for instance :
<a href='taxoninfo.html?id=112'><span class='genus-species'>Capparimyia savastani</span> (Martelli)</a>

The corresponding value for highlightedText variable is :
<a href='taxoninfo.html?id=112'><span class='genus-species'><span class="highlight">Capparimyia</span> savastani</span> (Martelli

The ")</a>" are trimmed for some mysterious reason !! I try to play with Encoder and Fragmenter classes but without success !

Any help would be appreciate.

Best regards,

Johan

Johan Duflost
Belgian Biodiversity Information Facility (BeBIF)
Universite Libre de Bruxelles


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to