I think you're absolutely right, Erick. Thanks for the insight; that's the direction I'll be heading.
Cheers,
-D

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Friday, July 13, 2012 8:53 AM
To: java-user@lucene.apache.org
Subject: Re: Pattern Analyzer

Sure, you can do it that way. But first I'd look over the zillion
tokenizers and filters that are available and string together the ones
that best suit your need. For instance, WhitespaceTokenizer and
PatternReplaceFilter might make your regex much easier, since the
PatternReplaceFilter gets just the whitespace-delimited tokens to
operate on. You can hook arbitrary numbers of Filters into your chain,
so you could add LowerCaseFilter and....

But unless your case is pretty unusual, I'd claim just using the
pre-built Tokenizers and Filters will probably work for you, or at
least I'd check that out first.

Best
Erick

On Thu, Jul 12, 2012 at 2:20 PM, Dave Seltzer <dselt...@tveyes.com> wrote:
> Hello,
>
> I have a search project which uses the Lucene PatternAnalyzer for its
> text/query analysis.
>
> At the moment it's configured like so:
> analyzer = new PatternAnalyzer(Version.LUCENE_35,
> Pattern.compile("\\s+"), true, null);
>
> My goal here was to split words based on spaces and make things case
> insensitive.
>
> In thinking about this however I probably want to be a little bit more
> sophisticated. I'd like to ignore punctuation which occurs at the end
> or beginning of a word.
>
> Is this simply a matter of writing a regex which treats those cases
> the same as a space?
>
> Would I use something like this:
> analyzer = new PatternAnalyzer(Version.LUCENE_35,
> Pattern.compile("\\s+|\\p{Punct}+\\w|\\w\\p{Punct}"), true, null);
>
> Thanks so much!
>
> Dave
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
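
A minimal sketch of the two approaches discussed in this thread, using only java.util.regex so that it runs without Lucene on the classpath. It is not Lucene API: the class TokenChainSketch and its methods are hypothetical names chosen for illustration. The analyze method imitates the chain Erick suggests (split on whitespace, strip punctuation from the edges of each token, lowercase). The splitWithProposedPattern method applies the pattern from the original question as a split delimiter, on the assumption that PatternAnalyzer splits the way String.split does. Under that assumption the proposed pattern also consumes the word character next to the punctuation, which is probably not what was intended.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical illustration class; none of these names come from Lucene.
public class TokenChainSketch {

    // Stage 1 stand-in for a whitespace tokenizer.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Stage 2 stand-in for a per-token pattern replace:
    // runs of ASCII punctuation at the start or end of one token.
    private static final Pattern EDGE_PUNCT =
            Pattern.compile("^\\p{Punct}+|\\p{Punct}+$");

    // The split pattern proposed in the original question.
    private static final Pattern PROPOSED =
            Pattern.compile("\\s+|\\p{Punct}+\\w|\\w\\p{Punct}");

    // Whitespace split, then edge-punctuation stripping, then lowercasing.
    public static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : WHITESPACE.split(text.trim())) {
            String token = EDGE_PUNCT.matcher(raw).replaceAll("")
                    .toLowerCase(Locale.ROOT);
            // Tokens made only of punctuation become empty; drop them.
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Uses the proposed pattern as a delimiter, as String.split would.
    public static List<String> splitWithProposedPattern(String text) {
        return Arrays.asList(PROPOSED.split(text.toLowerCase(Locale.ROOT)));
    }

    public static void main(String[] args) {
        String input = "Hello, world!";
        System.out.println(analyze(input));                  // [hello, world]
        System.out.println(splitWithProposedPattern(input)); // [hell, , worl]
    }
}
```

Because the punctuation pattern only ever sees one whitespace-delimited token at a time, it can be anchored with ^ and $, so punctuation inside a word (the apostrophe in "don't", for example) is left alone.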