RE: Enabling indexing of hyphenated terms sans the hyphen

Steven A Rowe Mon, 19 Sep 2011 14:06:18 -0700

Hi sbs,

Solr's WordDelimiterFilterFactory does what you want.  You can see a 
description of its function here: 
<http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory>.
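
To make the target concrete, here is a minimal plain-Java sketch of the output 
you describe for a single whitespace-delimited token. It is only an 
illustration under my own assumptions: it is not the WordDelimiterFilter 
implementation, it uses no Lucene or Solr classes, and the names HyphenTerms 
and expand are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java illustration only; not Lucene's or Solr's implementation. */
class HyphenTerms {

    /**
     * Returns the terms to index for one token: each hyphen-separated part,
     * followed by the parts joined without hyphens. A token with no hyphen
     * comes back as a single-element list.
     */
    static List<String> expand(String token) {
        List<String> terms = new ArrayList<String>();
        StringBuilder joined = new StringBuilder();
        for (String part : token.split("-")) {
            if (part.length() > 0) {  // skip empty parts, e.g. from "--"
                terms.add(part);
                joined.append(part);
            }
        }
        // Emit the catenated form only when there was more than one part.
        if (terms.size() > 1) {
            terms.add(joined.toString());
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(expand("self-induced")); // [self, induced, selfinduced]
    }
}
```

A real token filter would also have to set offsets and position increments 
for the extra terms, which this sketch ignores.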


WordDelimiterFilter, the filter class implementing the above factory's 
functionality, is package private in Solr 3.X, so you can't just depend on the 
v3.2 solr-core jar, where it resides, unless you want to circumvent this 
access restriction (e.g. with reflection or with a façade class in the same 
package as the Solr filter class). In trunk (4.0, not yet released), 
WordDelimiterFilter has been moved to the analysis-common module and made 
public.

Alternatively, you can copy WordDelimiterFilter.java into your project and use 
it without any additional dependencies beyond lucene-core.  Here's the source 
for the Lucene/Solr 3.2 version: 
<http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_3_2/solr/src/java/org/apache/solr/analysis/WordDelimiterFilter.java>.

Good luck,
Steve

> -----Original Message-----
> From: SBS [mailto:jturn...@uow.edu.au]
> Sent: Monday, September 19, 2011 4:27 PM
> To: java-user@lucene.apache.org
> Subject: Enabling indexing of hyphenated terms sans the hyphen
> 
> We use StandardTokenizer and this works well but we also need to include
> terms in our index which consist of hyphenated terms with the hyphen
> removed.  So, for example, if the text being indexed contains
> "self-induced" we need the terms "self", "induced" and "selfinduced" to be
> indexed.
> 
> How would I go about implementing this?  We use Lucene Java 3.2.
> 
> Thanks,
> 
> -sbs
> 
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Enabling-indexing-of-hyphenated-terms-sans-the-hyphen-tp3350008p3350008.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
