Tucked away in the contrib section of Lucene (I'm using 2.0) there is
org.apache.lucene.index.memory.PatternAnalyzer, which takes a regular expression and tokenizes with it. Would that help?

Word of warning: as I remember, the regex determines what is NOT a token, not what IS a token, which threw me for a bit.

I don't know if this is really useful, but it might get you there without as much work...

Best
[EMAIL PROTECTED]'mNowBeyondMyCompetence.WhyDoTheyStillEmployMeHere?

On 8/29/06, Bill Taylor <[EMAIL PROTECTED]> wrote:
On Aug 29, 2006, at 2:47 PM, Chris Hostetter wrote:

> : Have a look at PerFieldAnalyzerWrapper:
> :
> : http://lucene.apache.org/java/docs/api/org/apache/lucene/analysis/PerFieldAnalyzerWrapper.html
>
> ...which can be specified in the constructors for IndexWriter and
> QueryParser.

As I understand it, this allows me to specify a different analyzer for each field name.

My problem is that the standard analyzer will not work for my content field and I need to define a new one. I need to modify the StandardTokenizer so that a part number does not need to have a digit in every other segment. For example, the StandardTokenizer breaks aa-bb-2 on the - between aa and bb because it demands that every other string between - characters contain a digit.

I need to modify the .jj file for the StandardTokenizer and generate a new tokenizer from it, but I am confused by the JavaCC documentation and do not know how to run it to get what I need.

Thanks for the help.
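
To make the "the regex says what is NOT a token" point from the top of this message concrete, here is a minimal sketch. It uses plain java.util.regex, not PatternAnalyzer itself, so it only illustrates the delimiter idea and makes no claim about how PatternAnalyzer is implemented. The class name DelimiterTokenizerSketch and the method tokenize are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: shows delimiter-style tokenizing.
class DelimiterTokenizerSketch {

    // The regex describes the gaps BETWEEN tokens, not the tokens themselves.
    static List<String> tokenize(String text, String delimiterRegex) {
        List<String> tokens = new ArrayList<>();
        for (String piece : Pattern.compile(delimiterRegex).split(text)) {
            // split() can yield an empty leading piece if the text starts with a delimiter
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Delimiter = whitespace only: the part number survives as one token.
        System.out.println(tokenize("order aa-bb-2 today", "\\s+"));
        // prints [order, aa-bb-2, today]

        // Delimiter = any run of non-word characters: the hyphens split it up.
        System.out.println(tokenize("order aa-bb-2 today", "\\W+"));
        // prints [order, aa, bb, 2, today]
    }
}
```

If PatternAnalyzer does treat its pattern as the separator, a whitespace-only pattern along these lines might keep part numbers like aa-bb-2 intact without touching the .jj grammar, but that is worth checking against the contrib javadoc first.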