It's probably about 100,000 entries per "thing that it would care about
at once".

-----Original Message-----
From: Karl Wettin [mailto:[EMAIL PROTECTED]]
Sent: Thursday, April 17, 2008 3:17 PM
To: java-user@lucene.apache.org
Subject: Re: Word split problems

Max Metral wrote:
>
> Lululemon Athletica
> 
> I'd like any of these search terms to work for this:
> 
> Lulu lemon
> Lu Lu Lemon
> Lululemon
> 
> What strategy would be optimal for this kind of thing (of course keeping

How large is your corpus? I suggest you look at NGramTokenizer.
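
To make that concrete, here is a minimal sketch of the kind of analyzer
NGramTokenizer enables, written against the contrib NGramTokenizer API in
the Lucene 2.x line current at the time (constructor signatures moved in
later releases). The class name, the 2..4 gram range, and the use of
LowerCaseFilter are illustrative choices, not anything prescribed in this
thread:

    import java.io.Reader;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.LowerCaseFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.ngram.NGramTokenizer;

    // Sketch: break the name field into lowercased character n-grams so that
    // "Lululemon", "Lulu lemon" and "Lu Lu Lemon" share most of their grams.
    public class NGramAnalyzer extends Analyzer {
      public TokenStream tokenStream(String fieldName, Reader reader) {
        // 2..4-gram range is a guess; tune min/max against real queries.
        return new LowerCaseFilter(new NGramTokenizer(reader, 2, 4));
      }
    }

The same analyzer would be used at both index time and query time (e.g.
passed to IndexWriter and QueryParser); stripping whitespace before
tokenizing would bring "Lu Lu Lemon" and "Lululemon" even closer together.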


    karl

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

