On Tue, Dec 28, 2010 at 6:03 AM, Katie T <ka...@coderstack.co.uk> wrote:
> On Mon, Dec 27, 2010 at 7:10 PM, Shashwat Anand
> <anand.shash...@gmail.com> wrote:
> > Can anyone suggest a language detection library in python which works on a
> > phrase of say 2-5 words.
>
> Generally such libraries work by bi/trigram frequency analysis, which
> means you're going to have a fairly high error rate with such small
> phrases. If you're only dealing with a handful of languages it may
> make more sense to combine an existing library with a simple
> dictionary lookup model to improve accuracy.
>
> Katie
>

In fact, I'm dealing with very few languages: German, French, Italian, Portuguese and Russian. I have read papers describing bigram/trigram frequency analysis but was unable to find a library that implements it. 'guess-language' doesn't perform well at all. The cld (Compact Language Detection) module of Google Chrome performs well, but it is not a standalone library (I hope someone ports it). As for the dictionary lookup + n-gram approach, I didn't quite understand what you meant.
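One possible reading of the dictionary lookup + n-gram suggestion, sketched under loose assumptions: score each candidate language with a character-trigram model, then add a bonus for every word of the phrase found in a small per-language word list. The sample sentences, the word lists and the `weight` parameter below are hypothetical placeholders; a usable model would be trained on a large corpus per language. This is not the API of guess-language or cld.

```python
import math
from collections import Counter

# Hypothetical, deliberately tiny training text per language.
# A real model would be built from a large corpus for each language.
SAMPLES = {
    "de": "der hund und die katze sind nicht in dem haus mit einem kind",
    "fr": "le chien et le chat ne sont pas dans la maison avec un enfant",
    "it": "il cane e il gatto non sono nella casa con un bambino",
    "pt": "o cão e o gato não estão na casa com uma criança",
    "ru": "собака и кошка не в доме с ребёнком",
}

# Hypothetical mini-dictionaries of very common words for each language.
LEXICON = {
    "de": {"der", "die", "das", "und", "nicht", "ist", "ich", "mit"},
    "fr": {"le", "la", "les", "et", "ne", "pas", "est", "avec"},
    "it": {"il", "la", "e", "non", "sono", "con", "che", "di"},
    "pt": {"o", "a", "e", "não", "com", "uma", "que", "de"},
    "ru": {"и", "не", "в", "с", "на", "что", "это", "он"},
}


def trigrams(text):
    """Character trigrams of each word, padded with a space on both sides."""
    grams = []
    for word in text.lower().split():
        padded = f" {word} "
        grams.extend(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def build_profile(text):
    """Trigram counts plus their total for one language."""
    counts = Counter(trigrams(text))
    return counts, sum(counts.values())


PROFILES = {lang: build_profile(text) for lang, text in SAMPLES.items()}
# Number of distinct trigrams seen in any language (used for smoothing).
VOCAB = len({g for counts, _ in PROFILES.values() for g in counts})


def ngram_score(phrase, lang):
    """Log-probability of the phrase's trigrams with add-one smoothing."""
    counts, total = PROFILES[lang]
    return sum(
        math.log((counts[g] + 1) / (total + VOCAB)) for g in trigrams(phrase)
    )


def dictionary_hits(phrase, lang):
    """How many words of the phrase appear in the language's word list."""
    return sum(word in LEXICON[lang] for word in phrase.lower().split())


def detect(phrase, weight=2.0):
    """Pick the language with the best combined trigram + dictionary score."""
    def combined(lang):
        return ngram_score(phrase, lang) + weight * dictionary_hits(phrase, lang)
    return max(SAMPLES, key=combined)
```

With these toy profiles, `detect("Der Hund und die Katze")` returns `'de'`. The dictionary bonus matters most for 2-5 word phrases, where there are too few trigrams for the frequency statistics alone to be reliable; how much to trust each signal (here the assumed `weight=2.0`) would have to be tuned on real data.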
-- http://mail.python.org/mailman/listinfo/python-list