Hi,

Be sure to use the same Solr version as your Lucene version (if >= 3.1). Here is example code from the test case:

WordDelimiterFilterFactory fact = new WordDelimiterFilterFactory();
// we don't need this if we don't load external exclusion files:
// ResourceLoader loader = new SolrResourceLoader(null, null);
Map<String,String> args = new HashMap<String,String>();
args.put("generateWordParts", "1");
args.put("generateNumberParts", "1");
args.put("catenateWords", "1");
args.put("catenateNumbers", "1");
args.put("catenateAll", "0");
args.put("splitOnCaseChange", "1");
fact.init(args);
// fact.inform(loader);
TokenStream ts = fact.create(new LowerCaseTokenizer(reader));

For all args params look here:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: stephen.warner.tho...@gmail.com
> [mailto:stephen.warner.tho...@gmail.com] On Behalf Of Stephen Thomas
> Sent: Tuesday, November 29, 2011 7:39 PM
> To: java-user@lucene.apache.org
> Subject: Re: Custom Filter for Splitting CamelCase?
>
> How do you use the WordDelimiterFilterFactory()? I tried the following code:
>
> TokenStream out = new LowerCaseTokenizer(reader);
> WordDelimiterFilterFactory wdf = new WordDelimiterFilterFactory();
> out = wdf.create(out);
> ...
>
> But I am getting a runtime error:
>
> Exception in thread "main" java.lang.AbstractMethodError:
> org.apache.lucene.analysis.TokenStream.incrementToken()Z
>     at org.apache.lucene.analysis.StopFilter.incrementToken(StopFilter.java:141)
>     at org.apache.lucene.analysis.PorterStemFilter.incrementToken(PorterStemFilter.java:54)
>     ...
>
> I can't create a class of type WordDelimiterFilter directly, because it is
> protected.
>
> Any ideas?
>
> Thanks,
> Steve
>
> On Tue, Nov 29, 2011 at 12:44 PM, Uwe Schindler <u...@thetaphi.de> wrote:
> > Hi,
> >
> > There is WordDelimiterFilter in Solr that was also ported to Lucene
> > Analysis module in Lucene trunk (4.0).
> > In 3.x you can still add solr.jar to your classpath and use
> > WordDelimiterFilterFactory to produce one (WordDelimiterFilter itself is
> > package-private).
> >
> > -----
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.de
> > eMail: u...@thetaphi.de
> >
> >> -----Original Message-----
> >> From: stephen.warner.tho...@gmail.com
> >> [mailto:stephen.warner.tho...@gmail.com] On Behalf Of Stephen Thomas
> >> Sent: Tuesday, November 29, 2011 5:20 PM
> >> To: java-user@lucene.apache.org
> >> Subject: Custom Filter for Splitting CamelCase?
> >>
> >> List,
> >>
> >> I have written my own CustomAnalyzer, as follows:
> >>
> >> public TokenStream tokenStream(String fieldName, Reader reader) {
> >>
> >>     // TODO: add calls to RemovePuncation, and SplitIdentifiers here
> >>
> >>     // First, convert to lower case
> >>     TokenStream out = new LowerCaseTokenizer(reader);
> >>
> >>     if (this.doStopping){
> >>         out = new StopFilter(true, out, customStopSet);
> >>     }
> >>
> >>     if (this.doStemming){
> >>         out = new PorterStemFilter(out);
> >>     }
> >>
> >>     return out;
> >> }
> >>
> >> What I need to do is write two custom filters that do the following:
> >>
> >> - RemovePuncation() removes all characters except [a-zA-Z], preserving case.
> >>   E.g.,
> >>
> >>     "foo=bar*45;" ==> "foo bar 45"
> >>     "fooBar" ==> "fooBar"
> >>     "\"stho...@cs.queensu.ca\"" ==> "sthomas cs queensu ca"
> >>
> >> - SplitIdentifers() breaks up words based on camelCase notation:
> >>
> >>     "fooBar" ==> "foo Bar"
> >>     "ABCCompany" ==> "ABC Company"
> >>
> >>   (I have the regex for this.)
> >>
> >>   Note this step must be performed before LowerCaseTokenizer, because
> >>   we need case information to do the splitting.
> >>
> >> How can I write custom filters, and how do I call them before
> >> LowerCaseTokenizer()?
> >>
> >> Thanks in advance,
> >> Steve
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> >> For additional commands, e-mail: java-user-h...@lucene.apache.org
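
For reference, the two transformations described in the original question can be sketched in plain Java, without any Lucene or Solr classes. This is only an illustration of the expected input/output, not the WordDelimiterFilter implementation: the class name IdentifierSplitter, both method names, and the regular expressions are hypothetical, and digits are kept because the "foo=bar*45;" example keeps them.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of Lucene or Solr.
final class IdentifierSplitter {

    // Zero-width boundary either between a lower-case letter or digit and an
    // upper-case letter ("foo|Bar"), or between a run of upper-case letters
    // and a following capitalized word ("ABC|Company").
    private static final String CAMEL_BOUNDARY =
        "(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])";

    // Replaces every run of non-alphanumeric characters with one space,
    // preserving case (assumption: digits are kept, as in the example above).
    static String removePunctuation(String text) {
        return text.replaceAll("[^A-Za-z0-9]+", " ").trim();
    }

    // Splits a single identifier at camelCase boundaries, preserving case.
    static List<String> splitCamelCase(String word) {
        return Arrays.asList(word.split(CAMEL_BOUNDARY));
    }

    public static void main(String[] args) {
        System.out.println(removePunctuation("foo=bar*45;"));              // foo bar 45
        System.out.println(String.join(" ", splitCamelCase("fooBar")));     // foo Bar
        System.out.println(String.join(" ", splitCamelCase("ABCCompany"))); // ABC Company
    }
}
```

Inside an analyzer, logic like this would have to run while the tokens still carry their original case, so a case-preserving tokenizer would need to come first and lower-casing would move to a later filter in the chain.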