Hi Harini,

I updated QueryTermExtractor in Subversion last night to support your requirement.
The JUnit test is also updated with a field-specific example.

Cheers,
Mark

--- Harini Raghavan <[EMAIL PROTECTED]> wrote:

> Hi Chris,
>
> Can we pass a different query object for searching and a different one
> to the highlighter? I am not sure of that.
> In any case, based on Mark's suggestion I modified the
> QueryTermExtractor class and filtered the query terms by the fieldName.
> Attached is the modified file.
>
> Thanks,
> Harini
>
> Chris Hostetter wrote:
>
> >I don't know what your application is, and I have no experience with the
> >Highlighter code, so forgive me if this is a silly suggestion:
> >
> >It looks like you are building a query up programmatically, which
> >contains some words to search on, and some other stuff that's mainly
> >being used to "filter" the results (I'll avoid my usual rant about
> >people underutilizing Filters). So why not pass the Highlighter just
> >the portion of the Query that you actually want to contribute to the
> >highlighting? In this query...
> >
> >: >> +DocumentType:news
> >: >> +(CompanyId:10 CompanyId:20 CompanyId:30 CompanyId:40)
> >: >> +FilingDate:[20041201 TO 20051201]
> >: >> +(Content:"cost saving" Content:"cost savings" Content:outsource
> >: >>  Content:outsources Content:downsize Content:downsizes
> >: >>  Content:restructuring Content:restructure)
> >
> >...just give the highlighter...
> >
> >  (Content:"cost saving" Content:"cost savings" Content:outsource
> >   Content:outsources Content:downsize Content:downsizes
> >   Content:restructuring Content:restructure)
> >
> >: Date: Thu, 01 Dec 2005 10:38:41 +0530
> >: From: Harini Raghavan <[EMAIL PROTECTED]>
> >: Reply-To: java-user@lucene.apache.org
> >: To: java-user@lucene.apache.org
> >: Subject: Re: how to control terms to be highlighted?
> >:
> >: Hi Mark,
> >:
> >: It would be great if you can make this change and send the
> >: QueryTermExtractor class.
> >: I am invoking the QueryScorer(Query)
> >: constructor. Should I use QueryScorer(Query query, IndexReader reader,
> >: String fieldName) instead for this to work?
> >:
> >: Thanks,
> >: Harini
> >:
> >: mark harwood wrote:
> >:
> >: >>Is there any way to restrict the highlighter to
> >: >>highlight only the values
> >: >>mentioned for the field 'Content'?
> >: >
> >: >The problem lies in the QueryTermExtractor class,
> >: >which is typically used to provide the Highlighter
> >: >with the list of strings to identify in the text. It
> >: >currently has no filter for fieldname - you could add
> >: >this without too much effort.
> >: >
> >: >I could make this modification, but it may change the
> >: >behaviour of existing applications: currently the
> >: >QueryTermExtractor method that takes a fieldname only
> >: >uses that fieldname to derive IDF weightings; the
> >: >proposed change would also have the effect of
> >: >filtering out any query terms that weren't for this
> >: >field.
> >: >Would this change be a problem for anyone?
> >: >
> >: >Cheers,
> >: >Mark
> >: >
> >: >--- Harini Raghavan <[EMAIL PROTECTED]> wrote:
> >: >
> >: >>Hi,
> >: >>
> >: >>I have a requirement to highlight search keywords in
> >: >>the results and display the matching fragment of the
> >: >>text with the results. I am using the Hits
> >: >>highlighting mentioned in Lucene in Action.
> >: >>
> >: >>Here is the search query (BooleanQuery) I am passing
> >: >>to the IndexSearcher and QueryScorer:
> >: >> +DocumentType:news
> >: >> +(CompanyId:10 CompanyId:20 CompanyId:30 CompanyId:40)
> >: >> +FilingDate:[20041201 TO 20051201]
> >: >> +(Content:"cost saving" Content:"cost savings" Content:outsource
> >: >>  Content:outsources Content:downsize Content:downsizes
> >: >>  Content:restructuring Content:restructure)
> >: >>
> >: >>My requirement is to highlight only the keywords for
> >: >>the 'Content' field, but the highlighter API is also
> >: >>highlighting words like 'news', '10', '40', etc.
> >: >>Is there any way to restrict the highlighter to
> >: >>highlight only the values
> >: >>mentioned for the field 'Content'?
> >: >>
> >: >>Thanks,
> >: >>Harini
> >: >>
> >: >>---------------------------------------------------------------------
> >: >>To unsubscribe, e-mail: [EMAIL PROTECTED]
> >: >>For additional commands, e-mail: [EMAIL PROTECTED]
>
> === message truncated ===
>
> package org.apache.lucene.search.highlight;
> /**
>  * Copyright 2002-2004 The Apache Software Foundation
>  *
>  * Licensed under the Apache License, Version 2.0 (the "License");
>  * you may not use this file except in compliance with the License.
>  * You may obtain a copy of the License at
>  *
>  *     http://www.apache.org/licenses/LICENSE-2.0
>  *
>  * Unless required by applicable law or agreed to in writing, software
>  * distributed under the License is distributed on an "AS IS" BASIS,
>  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>  * See the License for the specific language governing permissions and
>  * limitations under the License.
>  */
>
> import java.io.IOException;
> import java.util.Collection;
> import java.util.HashSet;
> import java.util.Iterator;
>
> import org.apache.lucene.index.IndexReader;
> import org.apache.lucene.index.Term;
> import org.apache.lucene.search.BooleanClause;
> import org.apache.lucene.search.BooleanQuery;
> import org.apache.lucene.search.PhraseQuery;
> import org.apache.lucene.search.Query;
> import org.apache.lucene.search.TermQuery;
> import org.apache.lucene.search.spans.SpanNearQuery;
>
> /**
>  * Utility class used to extract the terms used in a query, plus any weights.
>  * This class will not find terms for MultiTermQuery, RangeQuery and PrefixQuery classes
>  * so the caller must pass a rewritten query (see Query.rewrite) to obtain a list of
>  * expanded terms.
>  *
>  */
> public final class QueryTermExtractor
> {
>
>   /**
>    * Extracts all terms texts of a given Query into an array of WeightedTerms
>    *
>    * @param query      Query to extract term texts from
>    * @return an array of the terms used in a query, plus their weights.
>    */
>   public static final WeightedTerm[] getTerms(Query query)
>   {
>     return getTerms(query,false,"");
>   }
>
>   /**
>    * Extracts all terms texts of a given Query into an array of WeightedTerms
>    *
>    * @param query      Query to extract term texts from
>    * @param reader used to compute IDF which can be used to a) score selected fragments better
>    * b) use graded highlights, e.g. changing intensity of font color
>    * @param fieldName the field on which Inverse Document Frequency (IDF) calculations are based
>    * @return an array of the terms used in a query, plus their weights.
>    */
>   public static final WeightedTerm[] getIdfWeightedTerms(Query query, IndexReader reader, String fieldName)
>   {
>     WeightedTerm[] terms=getTerms(query,false,fieldName);
>     int totalNumDocs=reader.numDocs();
>     for (int i = 0; i < terms.length; i++)
>     {
>       try
>       {
>         int docFreq=reader.docFreq(new Term(fieldName,terms[i].term));
>         //IDF algorithm taken from DefaultSimilarity class
>         float idf=(float)(Math.log((float)totalNumDocs/(double)(docFreq+1)) + 1.0);
>         terms[i].weight*=idf;
>       }
>       catch (IOException e)
>       {
>         //ignore
>       }
>     }
>     return terms;
>   }
>
>   /**
>    * Extracts all terms texts of a given Query into an array of WeightedTerms
>    *
>    * @param query      Query to extract term texts from
>    * @param prohibited <code>true</code> to extract "prohibited" terms, too
>    * @param fieldName  if not empty, only terms for this field are extracted
>    * @return an array of the terms used in a query, plus their weights.
>    */
>   public static final WeightedTerm[] getTerms(Query query, boolean prohibited, String fieldName)
>   {
>     HashSet terms=new HashSet();
>     getTerms(query,terms,prohibited,fieldName);
>     return (WeightedTerm[]) terms.toArray(new WeightedTerm[0]);
>   }
>
>   private static final void getTerms(Query query, HashSet terms,boolean prohibited, String fieldName)
>   {
>     if (query instanceof BooleanQuery)
>       getTermsFromBooleanQuery((BooleanQuery) query, terms, prohibited, fieldName);
>     else
>       if (query instanceof PhraseQuery)
>         getTermsFromPhraseQuery((PhraseQuery) query, terms, fieldName);
>       else
>         if (query instanceof TermQuery)
>           getTermsFromTermQuery((TermQuery) query, terms, fieldName);
>         else
>           if(query instanceof SpanNearQuery)
>             getTermsFromSpanNearQuery((SpanNearQuery) query, terms, fieldName);
>   }
>
>   private static final void getTermsFromBooleanQuery(BooleanQuery query, HashSet terms, boolean prohibited, String fieldName)
>   {
>     BooleanClause[] queryClauses = query.getClauses();
>     int i;
>
>     for (i = 0; i < queryClauses.length; i++)
>     {
>       if (prohibited || !queryClauses[i].prohibited)
>         getTerms(queryClauses[i].query, terms, prohibited, fieldName);
>     }
>   }
>
>   private static final void getTermsFromPhraseQuery(PhraseQuery query, HashSet terms, String fieldName)
>   {
>     Term[] queryTerms = query.getTerms();
>     int i;
>     String field;
>
>     for (i = 0; i < queryTerms.length; i++)
>     {
>       if(fieldName.equals(""))
>         terms.add(new WeightedTerm(query.getBoost(),queryTerms[i].text()));
>       else {
>         field = queryTerms[i].field();
>         if(field.equals(fieldName))
>           terms.add(new WeightedTerm(query.getBoost(),queryTerms[i].text()));
>       }
>     }
>   }
>
>   private static final void getTermsFromTermQuery(TermQuery query, HashSet terms, String fieldName)
>   {
>     String field = query.getTerm().field();
>     if(fieldName.equals(""))
>       terms.add(new WeightedTerm(query.getBoost(),query.getTerm().text()));
>     else if(field.equals(fieldName)) {
>       terms.add(new WeightedTerm(query.getBoost(),query.getTerm().text()));
>     }
>   }
>
> === message truncated ===
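
For anyone following the thread without the Lucene sources to hand, the field filter discussed above can be sketched in plain Java. This is only an illustration, not the code committed to Subversion: FieldTerm and FieldFilterSketch are hypothetical names standing in for Lucene's Term and the extractor, and the query is assumed to be already flattened into a list of (field, text) pairs. The empty-string convention for "no filter" and the IDF expression are copied from the quoted attachment.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a Lucene Term: just a (field, text) pair.
final class FieldTerm {
    final String field;
    final String text;

    FieldTerm(String field, String text) {
        this.field = field;
        this.text = text;
    }
}

// Hypothetical sketch of the field filter; not part of Lucene.
final class FieldFilterSketch {

    // Returns the texts of the terms whose field equals fieldName.
    // An empty fieldName means "no filter", as in getTerms(query, false, "").
    static Set<String> termsForField(List<FieldTerm> queryTerms, String fieldName) {
        Set<String> texts = new LinkedHashSet<String>();
        for (FieldTerm term : queryTerms) {
            if (fieldName.isEmpty() || term.field.equals(fieldName)) {
                texts.add(term.text);
            }
        }
        return texts;
    }

    // Same expression as the quoted getIdfWeightedTerms:
    // log(totalNumDocs / (docFreq + 1)) + 1
    static float idf(int totalNumDocs, int docFreq) {
        return (float) (Math.log(totalNumDocs / (double) (docFreq + 1)) + 1.0);
    }

    public static void main(String[] args) {
        List<FieldTerm> terms = Arrays.asList(
                new FieldTerm("DocumentType", "news"),
                new FieldTerm("CompanyId", "10"),
                new FieldTerm("Content", "outsource"),
                new FieldTerm("Content", "downsize"));

        // Only the Content terms survive the filter.
        System.out.println(termsForField(terms, "Content")); // [outsource, downsize]
        // With no field name, every term is kept.
        System.out.println(termsForField(terms, ""));
    }
}
```

With the filter in place, words such as 'news' and '10' never reach the highlighter, which is the behaviour Harini asked for. Callers that pass an empty field name still get every term, so the older behaviour stays available.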