On 02/11/2011 20:48, Paul Taylor wrote:
On 02/11/2011 17:15, Uwe Schindler wrote:
Hi Paul,
There is WordDelimiterFilter which does exactly what you want. In 3.x it's
unfortunately only shipped in the Solr JAR file, but in 4.0 it's in the
analyzers-common module.
Okay so I found it and it looks ve

    @Override
    public boolean incrementToken() throws IOException {
        if (!input.incrementToken()) return false;
        char[] buffer = termAtt.buffer();
        int upto = 0;
        // copy only letters and digits, dropping separator characters
        for (int i = 0; i < termAtt.length(); i++) {
            char c = buffer[i];
            if (Character.isLetterOrDigit(c)) {
                buffer[upto++] = c;
            }
        }
        termAtt.setLength(upto);
        return true;
    }
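For comparison, the collapsing behaviour that filter implements can be reproduced outside Lucene with a one-line regex. This is a hedged illustration of the behaviour described in the thread ('this-stuff' becomes 'thisstuff'), not code from the thread itself:

```java
public class CollapseDemo {
    public static void main(String[] args) {
        // Delete every run of non-alphanumeric characters, so 'this-stuff'
        // collapses to 'thisstuff' -- the behaviour the filter currently has,
        // not the splitting behaviour the original post asks for.
        String collapsed = "this-stuff".replaceAll("[^A-Za-z0-9]+", "");
        System.out.println(collapsed);
    }
}
```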
-----Original Message-----
From: Paul Taylor [mailto:paul_t...@fastmail.fm]
Sent: Wednesday, November 02, 2011 5:12 PM
To: java-user@lucene.apache.org
Subject: Creating additional tokens from input in a token filter
On 02/11/2011 17:15, Uwe Schindler wrote:
Hi Paul,
There is WordDelimiterFilter which does exactly what you want. In 3.x it's
unfortunately only shipped in the Solr JAR file, but in 4.0 it's in the
analyzers-common module.
Uwe
Ah great, erm, I'm being a bit dense but where is Lucene 4.0? I've looked
> -----Original Message-----
> From: Paul Taylor [mailto:paul_t...@fastmail.fm]
> Sent: Wednesday, November 02, 2011 5:12 PM
> To: java-user@lucene.apache.org
> Subject: Creating additional tokens from input in a token filter
>
> I have a tokenizer filter that takes tokens and then drops any
> non-alphanumeric characters,
> i.e. 'this-stuff' becomes 'thisstuff',
> but what I actually want it to do is split the one token into multiple
> tokens using the non-alphanumeric characters as word boundaries,
> i.e. 'this-stuff' becomes 'this' 'stuff'
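One common way to emit several tokens from one input token in a filter is to keep a queue of pending pieces and drain it before pulling the next token from upstream. Below is a minimal, Lucene-free sketch of that pattern; the class and method names are hypothetical, chosen only to mirror the shape of TokenFilter.incrementToken:

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;

// Sketch of the "pending queue" pattern: split each upstream token on
// non-alphanumeric characters and hand the pieces out one at a time.
class SplittingFilter {
    private final Iterator<String> input;             // stands in for the upstream TokenStream
    private final Deque<String> pending = new ArrayDeque<>();

    SplittingFilter(Iterator<String> input) {
        this.input = input;
    }

    /** Returns the next token, or null when the stream is exhausted. */
    String nextToken() {
        while (pending.isEmpty()) {
            if (!input.hasNext()) return null;        // upstream exhausted
            for (String part : input.next().split("[^A-Za-z0-9]+")) {
                if (!part.isEmpty()) pending.add(part);  // skip empty leading pieces
            }
        }
        return pending.poll();
    }

    public static void main(String[] args) {
        SplittingFilter f =
            new SplittingFilter(Arrays.asList("this-stuff", "ok").iterator());
        String t;
        while ((t = f.nextToken()) != null) System.out.println(t);
    }
}
```

With input 'this-stuff' this yields 'this' then 'stuff'. In a real Lucene TokenFilter the same idea applies, but the extra tokens would also need their attribute state (term, offsets, position increment) captured and restored.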