Hello,

I have a search project which uses the Lucene PatternAnalyzer for its
text/query analysis.

At the moment it's configured like so:
analyzer = new PatternAnalyzer(Version.LUCENE_35, Pattern.compile("\\s+"),
true, null);

My goal here was to split text into words on whitespace and make matching
case-insensitive.

Thinking about this further, however, I probably want something a little
more sophisticated: I'd like to ignore punctuation that occurs at the
beginning or end of a word.

Is this simply a matter of writing a regex which treats those cases the
same as a space?

Would I use something like this?
analyzer = new PatternAnalyzer(Version.LUCENE_35,
Pattern.compile("\\s+|\\p{Punct}+\\w|\\w\\p{Punct}"), true, null);
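
One way to check a candidate pattern without Lucene is to run it through
java.util.regex directly. The sketch below assumes that PatternAnalyzer
tokenizes the way Pattern.split does and then drops empty tokens, which is
how I read its Javadoc but have not verified. The class name SplitCheck,
the helper tokens(), and the second pattern EDGE_PUNCT are hypothetical,
made up for this test only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

final class SplitCheck {
    // The separator pattern proposed in this message.
    static final Pattern PROPOSED =
        Pattern.compile("\\s+|\\p{Punct}+\\w|\\w\\p{Punct}");

    // Hypothetical alternative (an assumption, not a Lucene recommendation):
    // whitespace together with any punctuation touching it, plus punctuation
    // at the very start or very end of the input. No word characters are
    // part of the separator, so none are consumed.
    static final Pattern EDGE_PUNCT =
        Pattern.compile("\\p{Punct}*\\s+\\p{Punct}*|^\\p{Punct}+|\\p{Punct}+$");

    // Splits on the separator, lowercases, and skips empty pieces.
    // This only approximates what an analyzer would emit.
    static List<String> tokens(Pattern separator, String text) {
        List<String> out = new ArrayList<String>();
        for (String piece : separator.split(text)) {
            if (!piece.isEmpty()) {
                out.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "Hello, world! (foo) it's done.";
        System.out.println("proposed:   " + tokens(PROPOSED, text));
        System.out.println("edge punct: " + tokens(EDGE_PUNCT, text));
    }
}
```

With plain java.util.regex, the proposed pattern turns "Hello, world!" into
the pieces "hell" and "worl", because the \w next to each punctuation mark
is matched as part of the separator and is therefore discarded. The
alternative keeps "hello" and "world" and leaves word-internal punctuation
such as the apostrophe in "it's" alone.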

Thanks so much!

Dave
