Re: Creating additional tokens from input in a token filter

2011-11-03 Thread Paul Taylor
On 02/11/2011 20:48, Paul Taylor wrote: On 02/11/2011 17:15, Uwe Schindler wrote: Hi Paul, There is WordDelimiterFilter which does exactly what you want. In 3.x it's unfortunately only shipped in the Solr JAR file, but in 4.0 it's in the analyzers-common module. Okay so I found it and it looks ve…

Re: Creating additional tokens from input in a token filter

2011-11-02 Thread Paul Taylor
…o++] = c; } } termAtt.setLength(upto); } return true; } -Original Message- From: Paul Taylor [mailto:paul_t...@fastmail.fm] Sent: Wednesday, November 02, 2011 5:12 PM To: java-user@lucene.apache.org Subject: Creating additional tokens from input in a toke…
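The quoted fragment shows the in-place "strip" approach: alphanumeric characters are copied forward in the term's char buffer and the term is then truncated with `termAtt.setLength(upto)`. A minimal plain-Java sketch of that same compaction logic, outside Lucene (the class and method names here are illustrative, not from the thread):

```java
// Sketch of the stripping logic the fragment implies: compact alphanumeric
// chars to the front of the buffer, then truncate to `upto` - the plain-Java
// analogue of writing into termAtt.buffer() and calling termAtt.setLength(upto).
public class StripNonAlnum {
    static String strip(String term) {
        char[] buf = term.toCharArray();
        int upto = 0;
        for (char c : buf) {
            if (Character.isLetterOrDigit(c)) {
                buf[upto++] = c; // keep alphanumerics, dropping everything else
            }
        }
        return new String(buf, 0, upto); // analogous to termAtt.setLength(upto)
    }

    public static void main(String[] args) {
        System.out.println(strip("this-stuff")); // prints "thisstuff"
    }
}
```

This reproduces the behaviour the original poster describes ('this-stuff' becomes 'thisstuff'), which is exactly what he wants to move away from.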

Re: Creating additional tokens from input in a token filter

2011-11-02 Thread Paul Taylor
On 02/11/2011 17:15, Uwe Schindler wrote: Hi Paul, There is WordDelimiterFilter which does exactly what you want. In 3.x it's unfortunately only shipped in the Solr JAR file, but in 4.0 it's in the analyzers-common module. Uwe Ah great, erm I'm being a bit dense, but where is Lucene 4.0? I've looked…

RE: Creating additional tokens from input in a token filter

2011-11-02 Thread Uwe Schindler
…hi.de > -Original Message- > From: Paul Taylor [mailto:paul_t...@fastmail.fm] > Sent: Wednesday, November 02, 2011 5:12 PM > To: java-user@lucene.apache.org > Subject: Creating additional tokens from input in a token filter > > I have a tokenizer filter that takes tokens and then…

Creating additional tokens from input in a token filter

2011-11-02 Thread Paul Taylor
I have a tokenizer filter that takes tokens and then drops any non-alphanumeric characters, i.e. 'this-stuff' becomes 'thisstuff', but what I actually want it to do is split the one token into multiple tokens using the non-alphanumeric characters as word boundaries, i.e. 'this-stuff' becomes 'thi…