On Fri, Apr 17, 2009 at 3:19 PM, S.Selvam <s.selvams...@gmail.com> wrote:
> Hi all,
>
> I am trying to do language detection in Python. I just need to check whether
> the input text is English or not.
>
> 1) I tried nltk's stopwords and compared them with the input text, but with
> only little success.
>
> 2) I used oice.langdet for language detection, which uses a bi-gram approach.
> It is also inefficient.
>
> I need the best way to detect English text.
>
> I welcome your suggestions ...
> --
> Yours,
> S.Selvam
>
> --
> http://mail.python.org/mailman/listinfo/python-list
>

I don't know anything about language detection, but my first attempt would be something like:

1. Grab the first N words (space-separated) from whatever file you're trying to check.
2. Find out what percentage of them, if any, are in some dictionary file, say /usr/share/dict/american-english on Ubuntu Linux.

If a high percentage of the words are found, the text is more than likely English.

Or perhaps check for some commonly used words that appear only in English. I can't think of any examples off the top of my head, as I only know one language, but I'm sure there are some common English words that are mostly unique to the language.
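For what it's worth, here is a rough, untested-on-real-data sketch of the dictionary idea. The function names (load_words, english_ratio, looks_english), the default N of 100, and the 0.6 threshold are all my own guesses rather than anything established, and the dictionary path is just the Ubuntu one mentioned above, so treat it as a starting point only:

```python
import re


def load_words(path="/usr/share/dict/american-english"):
    """Load a one-word-per-line dictionary file into a lowercase set.

    The default path is an assumption; it exists on many Ubuntu systems
    but not everywhere.
    """
    with open(path, encoding="utf-8", errors="ignore") as f:
        return {line.strip().lower() for line in f if line.strip()}


def english_ratio(text, words, n=100):
    """Return the fraction of the first n space-separated tokens in `words`."""
    tokens = text.split()[:n]
    # Strip leading/trailing punctuation so "mat." still matches "mat".
    cleaned = [re.sub(r"^\W+|\W+$", "", t).lower() for t in tokens]
    cleaned = [t for t in cleaned if t]
    if not cleaned:
        return 0.0
    hits = sum(1 for t in cleaned if t in words)
    return hits / len(cleaned)


def looks_english(text, words, n=100, threshold=0.6):
    """Guess whether text is English; the threshold is arbitrary and needs tuning."""
    return english_ratio(text, words, n) >= threshold
```

You would call it as looks_english(open("somefile.txt").read(), load_words()). The same english_ratio function could probably be reused for the second idea by passing in a small set of common English words instead of the full dictionary, though the threshold would then need to be much lower.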