Re: Best way to create own version of StandardTokenizer ?

Robert Muir Fri, 04 Sep 2009 09:03:16 -0700

On Fri, Sep 4, 2009 at 11:18 AM, Paul Taylor<paul_t...@fastmail.fm> wrote:
> I submitted this https://issues.apache.org/jira/browse/LUCENE-1787 patch to
> StandardTokenizerImpl. Understandably it hasn't been incorporated into
> Lucene (yet), but I need it for the project I'm working on. So would you
> recommend keeping the same class name and just putting it on the classpath
> before the lucene.jar, or creating a new Tokenizer, Impl and JFlex file in my
> own project's package space?


I would recommend creating one in your own package space.

> Also, the StandardTokenizerImpl.jflex file states it should be compiled with
> Java 1.4, not a later JDK. Is this just for backwards compatibility? Because
> the indexes will be built afresh for this project, would I actually get
> better results if I used a later JVM? The project has to index text that
> can be in any language, and I'm hoping the latest JVM may solve some mapping
> problems with Japanese, Hebrew and Korean that I don't really understand.

I do not think you will really get better results, but it depends on what
your issue is (can you elaborate?).
Upgrading from Java 1.4 to 1.6 will bump your Unicode version from 3.0 to 4.0.
You can see a list of the changes here:
http://www.unicode.org/versions/Unicode4.0.0/
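
If it helps to narrow down the mapping problems, a small probe like the one
below could be run under each JVM to compare how it classifies a few sample
characters. This is only a sketch: the class name UnicodeProbe and the three
sample characters (one Hebrew, one Japanese, one Korean) are my own choices for
illustration, not anything from Lucene. It sticks to the char-based
java.lang.Character methods so that it should also compile on 1.4.

```java
public class UnicodeProbe {

    // Summarize how the running JVM classifies a single BMP character.
    static String describe(char c) {
        String hex = Integer.toHexString(c).toUpperCase();
        while (hex.length() < 4) {
            hex = "0" + hex; // pad to the usual U+XXXX form
        }
        return "U+" + hex
                + " block=" + Character.UnicodeBlock.of(c)
                + " defined=" + Character.isDefined(c)
                + " letter=" + Character.isLetter(c);
    }

    public static void main(String[] args) {
        System.out.println("java.version=" + System.getProperty("java.version"));
        // Hypothetical samples: Hebrew alef, Hiragana a, Hangul syllable ga.
        char[] samples = { '\u05D0', '\u3042', '\uAC00' };
        for (int i = 0; i < samples.length; i++) {
            System.out.println(describe(samples[i]));
        }
    }
}
```

If the output for your problem characters is identical on both JVMs, the
Unicode version is probably not the cause, and the grammar in the .jflex file
is the next place to look.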


-- 
Robert Muir
rcm...@gmail.com

