Hi sbs, Solr's WordDelimiterFilterFactory does what you want. You can see a description of its function here: <http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory>.
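For reference, the output you describe (each part of the hyphenated term, plus the parts joined without the hyphen) can be sketched in plain Java. This is only an illustration of the expected terms, not Solr's filter and not a Lucene TokenFilter; the class name HyphenTerms and the method expand are hypothetical names made up for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: plain Java, NOT Solr's WordDelimiterFilter.
// HyphenTerms and expand() are hypothetical names used for this sketch.
final class HyphenTerms {

    /**
     * Returns the non-empty parts of a hyphenated token, followed by the
     * parts joined without hyphens. A token with fewer than two parts is
     * returned as-is, with no extra joined term.
     */
    static List<String> expand(String token) {
        List<String> terms = new ArrayList<>();
        StringBuilder joined = new StringBuilder();
        for (String part : token.split("-")) {
            if (part.isEmpty()) {
                continue; // skip empty parts from leading or doubled hyphens
            }
            terms.add(part);
            joined.append(part);
        }
        if (terms.size() > 1) {
            terms.add(joined.toString()); // e.g. "selfinduced"
        }
        return terms;
    }
}
```

HyphenTerms.expand("self-induced") returns "self", "induced" and "selfinduced". As far as I understand it, this is the behavior you get from the factory with its generateWordParts and catenateWords options enabled, applied to the token stream rather than to a plain string.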
WordDelimiterFilter, the filter class that implements WordDelimiterFilterFactory's functionality, is package-private in Solr 3.x, so you can't simply depend on the v3.2 solr-core jar where it resides, unless you want to circumvent the access restriction (e.g. with reflection, or with a façade class in the same package as the Solr filter class). In trunk (4.0, not yet released), WordDelimiterFilter has been moved to the analysis-common module and made public.

Alternatively, you can copy/paste WordDelimiterFilter.java into your project and use it without any additional dependencies beyond lucene-core. Here's the source for the Lucene/Solr 3.2 version: <http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_3_2/solr/src/java/org/apache/solr/analysis/WordDelimiterFilter.java>.

Good luck,
Steve

> -----Original Message-----
> From: SBS [mailto:jturn...@uow.edu.au]
> Sent: Monday, September 19, 2011 4:27 PM
> To: java-user@lucene.apache.org
> Subject: Enabling indexing of hyphenated terms sans the hyphen
>
> We use StandardTokenizer and this works well but we also need to include
> terms in our index which consist of hyphenated terms with the hyphen
> removed. So, for example, if the text being indexed contains "self-induced"
> we need the terms "self", "induced" and "selfinduced" to be indexed.
>
> How would I go about implementing this? We use Lucene Java 3.2.
>
> Thanks,
>
> -sbs