How do you use the WordDelimiterFilterFactory()? I tried the following code:

    TokenStream out = new LowerCaseTokenizer(reader);
    WordDelimiterFilterFactory wdf = new WordDelimiterFilterFactory();
    out = wdf.create(out);
    ...

But I am getting a runtime error:

    Exception in thread "main" java.lang.AbstractMethodError: org.apache.lucene.analysis.TokenStream.incrementToken()Z
        at org.apache.lucene.analysis.StopFilter.incrementToken(StopFilter.java:141)
        at org.apache.lucene.analysis.PorterStemFilter.incrementToken(PorterStemFilter.java:54)
        ...

I can't create an instance of WordDelimiterFilter directly, because it is protected.

Any ideas?

Thanks,
Steve

On Tue, Nov 29, 2011 at 12:44 PM, Uwe Schindler <u...@thetaphi.de> wrote:
> Hi,
>
> There is WordDelimiterFilter in Solr that was also ported to the Lucene
> Analysis module in Lucene trunk (4.0). In 3.x you can still add solr.jar
> to your classpath and use WordDelimiterFilterFactory to produce one
> (WordDelimiterFilter itself is package-private).
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>
>> -----Original Message-----
>> From: stephen.warner.tho...@gmail.com
>> [mailto:stephen.warner.tho...@gmail.com] On Behalf Of Stephen Thomas
>> Sent: Tuesday, November 29, 2011 5:20 PM
>> To: java-user@lucene.apache.org
>> Subject: Custom Filter for Splitting CamelCase?
>>
>> List,
>>
>> I have written my own CustomAnalyzer, as follows:
>>
>> public TokenStream tokenStream(String fieldName, Reader reader) {
>>
>>     // TODO: add calls to RemovePunctuation and SplitIdentifiers here
>>
>>     // First, convert to lower case
>>     TokenStream out = new LowerCaseTokenizer(reader);
>>
>>     if (this.doStopping) {
>>         out = new StopFilter(true, out, customStopSet);
>>     }
>>
>>     if (this.doStemming) {
>>         out = new PorterStemFilter(out);
>>     }
>>
>>     return out;
>> }
>>
>> What I need to do is write two custom filters that do the following:
>>
>> - RemovePunctuation() removes all characters except [a-zA-Z],
>>   preserving case. E.g.,
>>
>>     "foo=bar*45;" ==> "foo bar 45"
>>     "fooBar" ==> "fooBar"
>>     "\"stho...@cs.queensu.ca\"" ==> "sthomas cs queensu ca"
>>
>> - SplitIdentifiers() breaks up words based on camelCase notation:
>>
>>     "fooBar" ==> "foo Bar"
>>     "ABCCompany" ==> "ABC Company"
>>
>>   (I have the regex for this.)
>>
>> Note this step must be performed before LowerCaseTokenizer, because we
>> need case information to do the splitting.
>>
>> How can I write custom filters, and how do I call them before
>> LowerCaseTokenizer()?
>>
>> Thanks in advance,
>> Steve
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
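
For reference, the two transformations described in the original message can be sketched in plain Java, independent of Lucene. This is only an illustration under stated assumptions: the class and method names (IdentifierNormalizer, removePunctuation, splitIdentifiers, normalize) are hypothetical, the regex is one possible camelCase pattern and not necessarily the one Steve has, and digits are kept because the "foo=bar*45;" example keeps them even though the description says [a-zA-Z]. It does not show how to wire the steps into a TokenStream; it only shows the ordering constraint (split first, lowercase last).

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene or Solr.
final class IdentifierNormalizer {

    // Assumption: anything that is not an ASCII letter or digit is punctuation.
    private static final Pattern NON_ALNUM = Pattern.compile("[^A-Za-z0-9]+");

    // Zero-width camelCase boundaries:
    //   lower -> Upper             ("foo|Bar")
    //   acronym -> Capitalized word ("ABC|Company")
    private static final Pattern CAMEL_BOUNDARY =
            Pattern.compile("(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])");

    // Replace each run of punctuation with one space; case is preserved.
    static String removePunctuation(String text) {
        return NON_ALNUM.matcher(text).replaceAll(" ").trim();
    }

    // Insert a space at every camelCase boundary; case is preserved.
    static String splitIdentifiers(String text) {
        return CAMEL_BOUNDARY.matcher(text).replaceAll(" ");
    }

    // Lowercasing comes last, because splitting needs the case information.
    static String normalize(String text) {
        return splitIdentifiers(removePunctuation(text)).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(removePunctuation("foo=bar*45;"));   // foo bar 45
        System.out.println(splitIdentifiers("fooBar"));         // foo Bar
        System.out.println(splitIdentifiers("ABCCompany"));     // ABC Company
        System.out.println(normalize("fooBar=ABCCompany;"));    // foo bar abc company
    }
}
```

One way to use something like this would be to apply normalize-style preprocessing to the text before it reaches LowerCaseTokenizer, which sidesteps the ordering problem; whether that fits the analyzer above is a judgment call.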