Does anyone know of a good language detection library that can detect what
language a document (text) is in?
Language detection is easy. It's just a simple
text classification problem.
One way to do this is with Lucene
itself. Create a so-called pseudo-document
for each language consi
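To make the text-classification idea a bit more concrete, here is a minimal sketch in plain Java rather than Lucene. It is only one way to do it, under some assumptions: each language gets a single character-trigram profile (standing in for the per-language pseudo-document), and the input is assigned to the profile with the highest cosine similarity. The class name NGramLanguageGuesser, its methods, and the tiny training sentences are all made up for illustration; real profiles would need far more text.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch: guess a language by comparing character-trigram counts. */
class NGramLanguageGuesser {

    // One trigram-count profile per language (assumption: this plays the
    // role of the per-language "pseudo-document").
    private final Map<String, Map<String, Integer>> profiles = new LinkedHashMap<>();

    /** Counts overlapping character trigrams in lower-cased, space-normalized text. */
    static Map<String, Integer> trigrams(String text) {
        String s = " " + text.toLowerCase(Locale.ROOT).replaceAll("\\s+", " ").trim() + " ";
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + 3 <= s.length(); i++) {
            counts.merge(s.substring(i, i + 3), 1, Integer::sum);
        }
        return counts;
    }

    /** Adds the trigrams of a training sample to the given language's profile. */
    void train(String language, String sampleText) {
        Map<String, Integer> profile = profiles.computeIfAbsent(language, k -> new HashMap<>());
        trigrams(sampleText).forEach((gram, n) -> profile.merge(gram, n, Integer::sum));
    }

    /** Cosine similarity between two sparse count vectors; 0 if either is empty. */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        for (int v : b.values()) {
            normB += v * v;
        }
        return (normA == 0 || normB == 0) ? 0 : dot / Math.sqrt(normA * normB);
    }

    /** Returns the best-scoring language, or null if no profile shares a trigram with the text. */
    String guess(String text) {
        Map<String, Integer> query = trigrams(text);
        String best = null;
        double bestScore = 0.0;
        for (Map.Entry<String, Map<String, Integer>> e : profiles.entrySet()) {
            double score = cosine(query, e.getValue());
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        NGramLanguageGuesser guesser = new NGramLanguageGuesser();
        // Toy training data, invented for this example only.
        guesser.train("en", "the quick brown fox jumps over the lazy dog and this is a simple "
                + "test of the english language with some common words that are used in the text");
        guesser.train("de", "der schnelle braune fuchs springt auf den faulen hund und dies ist ein "
                + "einfacher test der deutschen sprache mit einigen worten die im text verwendet werden");
        System.out.println(guesser.guess("this is the language that they are using"));
        System.out.println(guesser.guess("der hund ist im text und die sprache ist deutsch"));
    }
}
```

Very short inputs give few trigrams to compare, so a guess on a handful of characters should not be trusted much.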
Thank you, I got the Nutch plugin, and it is working great.
-----Original Message-----
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]]
Sent: Thursday, May 03, 2007 4:17 PM
To: java-user@lucene.apache.org
Subject: Re: Language detection library
LingPipe - commercial unless your data/product
I posted this some time back:
https://issues.apache.org/jira/browse/LUCENE-826
It's a bit proof-of-concept-ish, but it does the job well if you ask
me. It uses Weka (GPL) and requires at least 150 char
On 5/3/07, karl wettin <[EMAIL PROTECTED]> wrote:
> On 3 May 2007, at 22:06, Mordo, Aviran (EXP N-NANNATEK) wrote:
> > Does anyone know of a good language detectio
Jason Pump wrote:
> http://software.wise-guys.nl/libtextcat/

... which is what Nutch implements in its language-identifier plugin.
--
Best regards,
Andrzej Bialecki <><
----- Original Message -----
From: "Mordo, Aviran (EXP N-NANNATEK)" <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Thursday, May 3, 2007 4:06:04 PM
Subject: Language detection library

Does anyone know of a good language detection library that can detect what
language a document (text) is in?