It's probably about 100,000 entries per "thing that it would care about at once".
-----Original Message-----
From: Karl Wettin [mailto:[EMAIL PROTECTED]
Sent: Thursday, April 17, 2008 3:17 PM
To: java-user@lucene.apache.org
Subject: Re: Word split problems

Max Metral wrote:
>
> Lululemon Athletica
>
> I'd like any of these search terms to work for this:
>
> Lulu lemon
> Lu Lu Lemon
> Lululemon
>
> What strategy would be optimal for this kind of thing (of course keeping

How large is your corpus? I suggest you look at NGramTokenizer.

karl

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
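For anyone reading the archive: the idea behind Karl's NGramTokenizer suggestion is that indexing character n-grams makes matching insensitive to where word boundaries fall. The sketch below is not Lucene code (in Lucene you would wire org.apache.lucene.analysis.ngram.NGramTokenizer into an Analyzer); it is a plain-Java illustration of the principle, with a hypothetical `ngrams` helper that lowercases, strips spaces, and emits all grams of the chosen lengths.

```java
import java.util.HashSet;
import java.util.Set;

public class NGramDemo {
    // Emit every character n-gram with length in [minGram, maxGram],
    // after lowercasing and stripping spaces, so "Lu Lu Lemon" and
    // "Lululemon" normalize to the same character stream.
    static Set<String> ngrams(String text, int minGram, int maxGram) {
        String s = text.toLowerCase().replace(" ", "");
        Set<String> grams = new HashSet<>();
        for (int n = minGram; n <= maxGram; n++) {
            for (int i = 0; i + n <= s.length(); i++) {
                grams.add(s.substring(i, i + n));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        Set<String> indexed = ngrams("Lululemon Athletica", 3, 4);
        Set<String> query = ngrams("Lu Lu Lemon", 3, 4);
        // Every gram of the query appears among the indexed grams,
        // so all three spellings in the original question would match.
        System.out.println(indexed.containsAll(query)); // prints "true"
    }
}
```

Gram lengths of 3–4 are just an example; with ~100,000 entries the index-size cost of small grams should still be modest, but that is a tuning question, not something the thread settles.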