Hi Dino,

The Lucene KeywordTokenizer is about as simple as tokenizers get. It just outputs its entire input as a single token:
<http://svn.apache.org/viewvc/lucene/java/trunk/src/java/org/apache/lucene/analysis/KeywordTokenizer.java?revision=687357&view=markup>

Check out the source code of the other Tokenizer descendants in Lucene for more hints. Warning: a few of them are generated by scanner generator tools (JavaCC and JFlex), so the code is a bit impenetrable in places.

To set the position for a Token, call its setPositionIncrement() method. From the javadocs:

    Set the position increment. This determines the position of this token relative to the previous Token in a TokenStream, used in phrase searching.

(Read the rest of the javadoc for that method. Go on, you know you want to.)

Good luck,
Steve

On 08/20/2008 at 12:58 PM, Dino Korah wrote:
> Hi guys,
>
> If I am to tokenize an email address like "John Smith" <[EMAIL PROTECTED]> into
>
> [[EMAIL PROTECTED]] [John] [Smith] [J.Smith] [london.gb.world.net] [gb.world.net]
> [world.net] [world] [net]
>
> Is it possible to have a different Position increment for each of these
> tokens? If it is, could you please help me with the same sample, with
> numbers against each token.
>
> Also could you please point me to a skeleton code for a custom Tokenizer.
>
> Many Thanks
> Dino
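Since the question asks for numbers against each token, here is a rough sketch of one possible assignment of position increments. It is written in plain Java with no Lucene dependency so it runs on its own. Everything in it is hypothetical and made up for illustration, not something Lucene prescribes: the Tok class, the tokenize() method, the sample address "Jane Doe" <jane.doe@mail.example.com>, and the increment scheme itself. Under that scheme, the full address, each display-name word, the local part and the host each get increment 1. The parent domains and the last two host labels get increment 0, which stacks them on the host's position. In a real Tokenizer subclass you would emit each of these as a Lucene Token and call setPositionIncrement() on it, rather than filling in Tok.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Plain-Java sketch (no Lucene dependency) of splitting an address into tokens
 * with position increments. Tok is a hypothetical stand-in for Lucene's Token;
 * in a real Tokenizer you would call Token.setPositionIncrement() instead.
 */
public class EmailTokenSketch {

    /** Hypothetical stand-in for a Lucene Token: term text plus position increment. */
    static final class Tok {
        final String text;
        final int posInc;

        Tok(String text, int posInc) {
            this.text = text;
            this.posInc = posInc;
        }

        @Override
        public String toString() {
            return "[" + text + "] +" + posInc;
        }
    }

    /** Assumes input shaped like: "Display Name" <local@host> */
    static List<Tok> tokenize(String input) {
        int lt = input.indexOf('<');
        int gt = input.indexOf('>', lt + 1);
        if (lt < 0 || gt < 0) {
            throw new IllegalArgumentException("expected \"Name\" <local@host>: " + input);
        }
        String address = input.substring(lt + 1, gt).trim();
        int at = address.indexOf('@');
        if (at < 0) {
            throw new IllegalArgumentException("no '@' in address: " + address);
        }
        String name = input.substring(0, lt).replace("\"", "").trim();
        String local = address.substring(0, at);
        String host = address.substring(at + 1);

        List<Tok> out = new ArrayList<>();
        out.add(new Tok(address, 1));           // whole address
        for (String word : name.split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new Tok(word, 1));      // display-name words at adjacent positions
            }
        }
        out.add(new Tok(local, 1));             // local part, e.g. jane.doe
        out.add(new Tok(host, 1));              // full host name

        // Parent domains with at least two labels; increment 0 stacks them on the host's position.
        String[] labels = host.split("\\.");
        for (int i = 1; i <= labels.length - 2; i++) {
            out.add(new Tok(String.join(".", Arrays.copyOfRange(labels, i, labels.length)), 0));
        }
        // The last two labels on their own, also stacked on the host's position.
        if (labels.length >= 2) {
            out.add(new Tok(labels[labels.length - 2], 0));
            out.add(new Tok(labels[labels.length - 1], 0));
        }
        return out;
    }

    public static void main(String[] args) {
        // Start at -1 so the first token (increment 1) lands at position 0.
        int position = -1;
        for (Tok t : tokenize("\"Jane Doe\" <jane.doe@mail.example.com>")) {
            position += t.posInc;
            System.out.println(position + "  " + t);
        }
    }
}
```

With that hypothetical input, the full address sits at position 0. Jane and Doe sit at 1 and 2, so a phrase query for "Jane Doe" can still match. The local part is at 3, and the host and all of its suffix tokens are stacked at 4. A host shaped like london.gb.world.net would yield gb.world.net, world.net, world and net at the host's position, mirroring the token list in the question. Whether stacking (increment 0) or spreading (increment 1) is the right choice depends on which phrase queries should match, so treat the numbers as a starting point.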