It is my understanding that the StandardAnalyzer will remove underscores, so "some_word" will be indexed as "some" and "word".

I want to keep the underscores, so I was thinking of switching to an Analyzer that uses the WhitespaceTokenizer, LowerCaseFilter, and StopFilter.
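
Here is a rough sketch of how such a chain would treat the text. It is written in plain Java rather than with the Lucene classes, so it only illustrates the split-on-whitespace, lowercase, drop-stop-words behaviour. The class name, method name, and stop list are made up for illustration and are not Lucene's.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class UnderscoreKeepingChain {
    // Hypothetical stop list, for illustration only (not Lucene's default list).
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "the", "of", "to"));

    // Mimics: split on whitespace -> lowercase -> drop stop words.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            if (raw.isEmpty()) {
                continue; // empty input yields one empty piece; skip it
            }
            String lower = raw.toLowerCase(Locale.ROOT);
            if (!STOP_WORDS.contains(lower)) {
                tokens.add(lower);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Underscores survive, but so does punctuation attached to a word.
        System.out.println(analyze("The Some_Word and another_word, please."));
        // prints [some_word, another_word,, please.]
    }
}
```

Note that the underscore is kept, but the trailing comma and period stay attached to their tokens, since splitting on whitespace alone does nothing about punctuation.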

What other tokenizing magic will I lose by changing away from the StandardAnalyzer?

Thanks,

Dan

--
****************************
Daniel Armbrust
Biomedical Informatics
Mayo Clinic Rochester
daniel.armbrust(at)mayo.edu
http://informatics.mayo.edu/

