Does anyone know of a good language detection library that can detect what
language a document (text) is in?
Language detection is easy. It's just a simple
text classification problem.
One way you can do this is using Lucene
itself. Create a so-called pseudo-document
for each language consisting
Thank you, I got the Nutch plugin, and it is working great.
-----Original Message-----
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
Sent: Thursday, May 03, 2007 4:17 PM
To: java-user@lucene.apache.org
Subject: Re: Language detection library
LingPipe - commercial unless your data/product
On 4 May 2007 at 02:20, Chris Lu wrote:
I suppose if a document is indexed as English or French,
when users search the document,
we need to parse the query as English or French also?
If you do some language-specific token analysis such as stemming, yes.
Detecting the language on such small t
On 3 May 2007 at 22:06, Mordo, Aviran (EXP N-NANNATEK) wrote:
Does anyone know of a good language detection library that can detect what
language a document (text) is in?
I posted this some time back:
https://issues.apache.org/jira/browse/LUCENE-826
It is a bit proof-of-concept-ish, but it does the job.
Jason Pump wrote:
http://software.wise-guys.nl/libtextcat/
... which is what Nutch implements in its language-identifier plugin.
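As background on the technique: libtextcat is documented as an implementation of n-gram-based text categorization, where each language is represented by a ranked list of its most frequent character n-grams and a document is assigned to the language whose ranking it deviates from least. A minimal Java sketch of that general idea follows. It is not the libtextcat or Nutch code; the class name NGramLanguageGuesser, the 1- to 3-character n-gram range, and the profile size of 300 are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Illustrative sketch only: names and constants are hypothetical,
// not taken from libtextcat or the Nutch language-identifier plugin.
class NGramLanguageGuesser {
    private static final int MAX_N = 3;          // assumed: use 1- to 3-character n-grams
    private static final int PROFILE_SIZE = 300; // assumed: keep the 300 most frequent n-grams

    // language name -> (n-gram -> rank), in insertion order
    private final Map<String, Map<String, Integer>> profiles = new LinkedHashMap<>();

    /** Builds a profile mapping each kept n-gram to its rank (0 = most frequent). */
    static Map<String, Integer> profile(String text) {
        // Lower-case and collapse every run of non-letters into a single '_' boundary marker.
        String s = "_" + text.toLowerCase(Locale.ROOT).replaceAll("[^\\p{L}]+", "_") + "_";
        Map<String, Integer> counts = new HashMap<>();
        for (int n = 1; n <= MAX_N; n++) {
            for (int i = 0; i + n <= s.length(); i++) {
                counts.merge(s.substring(i, i + n), 1, Integer::sum);
            }
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // Most frequent first; ties broken alphabetically so the ranking is deterministic.
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        Map<String, Integer> ranks = new HashMap<>();
        for (int i = 0; i < entries.size() && i < PROFILE_SIZE; i++) {
            ranks.put(entries.get(i).getKey(), i);
        }
        return ranks;
    }

    /** Sums rank differences; n-grams missing from the language profile get the maximum penalty. */
    static int distance(Map<String, Integer> doc, Map<String, Integer> lang) {
        int total = 0;
        for (Map.Entry<String, Integer> e : doc.entrySet()) {
            Integer langRank = lang.get(e.getKey());
            total += (langRank == null) ? PROFILE_SIZE : Math.abs(langRank - e.getValue());
        }
        return total;
    }

    /** Registers a language profile built from a sample text in that language. */
    void train(String language, String sampleText) {
        profiles.put(language, profile(sampleText));
    }

    /** Returns the trained language with the smallest distance, or null if none are trained. */
    String guess(String text) {
        Map<String, Integer> doc = profile(text);
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (Map.Entry<String, Map<String, Integer>> e : profiles.entrySet()) {
            int d = distance(doc, e.getValue());
            if (d < bestDistance) {
                bestDistance = d;
                best = e.getKey();
            }
        }
        return best;
    }
}
```

To use it, call train once per language with a reasonably large sample text, then call guess on the document text. Short inputs yield few n-grams, so the ranking comparison becomes less reliable as the text shrinks.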
--
Best regards,
Andrzej Bialecki <><
LingPipe - commercial unless your data/product/service is free.
Nutch language id plugin.
Otis
----- Original Message -----
From: "Mordo, Aviran (EXP N-NANNATEK)" <[EMAIL PROTEC