RE: Where to find non-English dictionaries, thesaurus, synonyms

2011-01-06 Thread Hong-Thai Nguyen
Hi, I'm not sure these non-English spellcheckers, analyzers and related resources are a good idea in real usage. English grammar is quite simple and can be captured in Porter's rules, but other languages are very different. For example, Porter's rules cannot work well for French grammar, nor for Asian langua

Where to find non-English dictionaries, thesaurus, synonyms

2011-01-06 Thread Pulkit Singhal
Hello, What's a good source for dictionaries (for spelling corrections) and/or thesauri (for synonyms) that can be used with Lucene for non-English languages such as French, Chinese, Korean etc.? For example, the wordnet contrib module is based on the data set provided by the Princeton based wordne

Spell Checker for Non English languages

2011-01-06 Thread Pulkit Singhal
Hello, I was wondering if anyone on this mailing list has ever compiled a list of algorithms for various non-English languages that work well with the lucene-spellchecker contrib module? For example, with English, using a spellchecker index built using n-grams and then searched using LevensteinDi
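
For readers unfamiliar with the approach mentioned above (an n-gram index searched with an edit distance), here is a minimal Python sketch of the general idea. It is not the lucene-spellchecker implementation: the function names, the `^`/`$` padding, and the tiny French word list are all assumptions made for illustration only.

```python
def char_ngrams(word, n=2):
    """Return the set of character n-grams of a word, padded with ^ and $."""
    padded = f"^{word}$"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def suggest(word, dictionary, n=2, max_suggestions=3):
    """Keep words sharing at least one n-gram, then rank by edit distance."""
    target = char_ngrams(word, n)
    candidates = [w for w in dictionary if char_ngrams(w, n) & target]
    candidates.sort(key=lambda w: (levenshtein(word, w), w))
    return candidates[:max_suggestions]


# Hypothetical word list, for illustration only.
words = ["bonjour", "bonsoir", "merci"]
print(suggest("bonjuor", words))
```

Nothing in this sketch is specific to English, which is part of why the question of which distance measure suits which language is worth asking: a character-level distance may behave quite differently on languages with very short words or very large alphabets.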

Re: Use of PrefixQuery to create multi-word queries

2011-01-06 Thread L Duperval
Cameron,

Cameron Leach gmail.com> writes:
> I think what you want is for something like this:
>
> "the brown dog" ->
> the brown dog
> brown dog
> dog
>
> If you write your custom analyzer accordingly, to trim terms from the
> beginning and then use the NGramTokenFilter, you should get your rea
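
The expansion quoted above ("the brown dog" into its trailing phrases) can be sketched in a few lines of Python. This only illustrates the token trimming, not a Lucene analyzer; the function names are hypothetical, and the prefix check assumes each trailing phrase would be indexed as a single term.

```python
def trailing_phrases(text):
    """Drop leading tokens one at a time: 'a b c' -> ['a b c', 'b c', 'c']."""
    tokens = text.split()
    return [" ".join(tokens[i:]) for i in range(len(tokens))]


def matches_prefix(text, prefix):
    """True if the prefix starts any trailing phrase of the text.

    Assumption: each trailing phrase is stored as one term, so a
    multi-word prefix can match from any word boundary.
    """
    return any(phrase.startswith(prefix) for phrase in trailing_phrases(text))


print(trailing_phrases("the brown dog"))
print(matches_prefix("the brown dog", "brown d"))
```

With this expansion, a prefix such as "brown d" matches because "brown dog" is one of the stored phrases, while a prefix that starts in the middle of a word, such as "own d", does not.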