Otis Gospodnetic <[EMAIL PROTECTED]> wrote on 16/10/2006 14:32:13:
> Hi Ryan,
>
> StandardAnalyzer should already be smart about keeping email
> addresses as a single token:
>
> // email addresses
> | <EMAIL: <ALPHANUM> (("."|"-"|"_") <ALPHANUM>)* "@" <ALPHANUM>
> (("."|"-") <ALPHANUM>)+ >
>
> (this is from StandardAnalyzer.jj)
>
> As for cha
Sorry, I wasn't really concerned with email addresses - I was just
using that as an example. How would I tell the StandardAnalyzer that
I want a certain phrase to be tokenized as a single token? Surround it with
quotes or ..? Also, how would you recommend manipulating the Reader
object? You said s
It is not THAT hard to write a custom analyzer; that is what I did. I
found that there is a bug in the setup, however, in that there are two
incompatible definitions of Token. The generated file
Tokenizer.java refers to the wrong definition of Token so I have to
patch it before it will compil
Hi Ryan,
StandardAnalyzer should already be smart about keeping email addresses as a
single token:
// email addresses
| <EMAIL: <ALPHANUM> (("."|"-"|"_") <ALPHANUM>)* "@" <ALPHANUM> (("."|"-")
<ALPHANUM>)+ >
(this is from StandardAnalyzer.jj)
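To get a feel for what that rule accepts without running Lucene, here is a rough plain-Java sketch that mirrors its shape with a regular expression. It is not Lucene code: the class name EmailRuleSketch and the method findEmails are made up for illustration, and it assumes each piece between separators is a run of letters or digits (an approximation of the grammar's ALPHANUM token).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: approximates the email rule with a regex.
public class EmailRuleSketch {
    // Assumption: ALPHANUM is approximated as one or more Unicode letters or digits.
    private static final String ALPHANUM = "[\\p{L}\\p{N}]+";

    // Shape: ALPHANUM (("."|"-"|"_") ALPHANUM)* "@" ALPHANUM (("."|"-") ALPHANUM)+
    private static final Pattern EMAIL = Pattern.compile(
        ALPHANUM + "(?:[._-]" + ALPHANUM + ")*@"
        + ALPHANUM + "(?:[.-]" + ALPHANUM + ")+");

    // Returns every substring of text that matches the approximated rule.
    public static List<String> findEmails(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = EMAIL.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findEmails("Mail john.doe@example.com or call."));
    }
}
```

Note that the part after the "@" needs at least one "." or "-" separated piece, so something like user@localhost would not be kept together by this sketch.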
As for changing the text you feed to Lucene, that's all up to you. Changing
the String seems l