Re: Help with Custom Analyzer

2006-10-16 Thread Doron Cohen
Otis Gospodnetic <[EMAIL PROTECTED]> wrote on 16/10/2006 14:32:13:
> Hi Ryan,
>
> StandardAnalyzer should already be smart about keeping email
> addresses as a single token:
>
> // email addresses
> | <EMAIL: <ALPHANUM> (("."|"-"|"_") <ALPHANUM>)* "@" <ALPHANUM> (("."|"-") <ALPHANUM>)+ >
>
> (this is from StandardAnalyzer.jj)
>
> As for cha

Re: Help with Custom Analyzer

2006-10-16 Thread Ryan O'Hara
Sorry, I wasn't really concerned with email addresses - I was just using that as an example. How would I tell the StandardAnalyzer that I want a certain phrase to be kept as a single token? Surround it with quotes, or ...? Also, how would you recommend manipulating the Reader object? You said s

Re: Help with Custom Analyzer

2006-10-16 Thread Bill Taylor
It is not THAT hard to write a custom analyzer; that is what I did. I found that there is a bug in the setup, however, in that there are two incompatible definitions of Token. The generated file Tokenizer.java refers to the wrong definition of Token, so I have to patch it before it will compil

Re: Help with Custom Analyzer

2006-10-16 Thread Otis Gospodnetic
Hi Ryan,
StandardAnalyzer should already be smart about keeping email addresses as a single token:
// email addresses
| <EMAIL: <ALPHANUM> (("."|"-"|"_") <ALPHANUM>)* "@" <ALPHANUM> (("."|"-") <ALPHANUM>)+ >
(this is from StandardAnalyzer.jj)
As for changing the text you feed to Lucene, that's all up to you. Changing the String seems l
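
For readers who want to see what the email rule quoted above accepts, here is a rough sketch in plain Java. It reads the rule as: an alphanumeric run, optionally extended by runs joined with ".", "-" or "_", then "@", then an alphanumeric run followed by one or more runs joined with "." or "-". The class name EmailRuleSketch and the method findEmails are made up for illustration, and approximating the grammar's alphanumeric token as ASCII letters and digits is an assumption; this is not Lucene code and does not reproduce the tokenizer's actual behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: a regex approximation of the
// email rule quoted in the message above.
public class EmailRuleSketch {
    // Assumption: the grammar's alphanumeric token is approximated here as
    // one or more ASCII letters or digits.
    private static final String ALPHANUM = "[A-Za-z0-9]+";

    // run (sep run)* "@" run (sep run)+, with "._-" allowed as separators
    // before the "@" and only ".-" after it, as in the quoted rule.
    private static final Pattern EMAIL = Pattern.compile(
        ALPHANUM + "(?:[._-]" + ALPHANUM + ")*@"
        + ALPHANUM + "(?:[.-]" + ALPHANUM + ")+");

    // Returns every substring of text that matches the approximated rule.
    public static List<String> findEmails(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = EMAIL.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(
            findEmails("Contact ryan.ohara@example.com or bill@example.org today."));
    }
}
```

Note that a string such as "user@localhost" would not match, because the rule requires at least one "." or "-" separated part after the "@".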