Re: Language detection library

2007-05-07 Thread Bob Carpenter
> Does anyone know of a good language detection library that can detect what language a document (text) is in?

Language detection is easy. It's just a simple text classification problem. One way to do this is to use Lucene itself. Create a so-called pseudo-document for each language consisting
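Bob's message is cut off before he says what the pseudo-documents consist of, so this is not his method and not Lucene code. It is a minimal sketch under an assumption: each language's pseudo-document is represented by the character trigram counts of some sample text. An unknown text is then assigned to the language whose pseudo-document is most similar by cosine similarity. The class name `NGramLanguageGuesser` and the training snippets are hypothetical. Real profiles need far more text per language, and Lucene's own scoring would differ from the plain cosine used here.

```java
import java.util.HashMap;
import java.util.Map;

/** Minimal character-trigram language guesser (illustrative sketch, not Lucene code). */
public class NGramLanguageGuesser {
    private static final int N = 3;
    // Assumption: one "pseudo-document" per language, stored as trigram -> count.
    private final Map<String, Map<String, Integer>> profiles = new HashMap<>();

    /** Counts overlapping character trigrams; space padding lets word boundaries count. */
    static Map<String, Integer> ngrams(String text) {
        String s = " " + text.toLowerCase().replaceAll("[^\\p{L}]+", " ").trim() + " ";
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + N <= s.length(); i++) {
            counts.merge(s.substring(i, i + N), 1, Integer::sum);
        }
        return counts;
    }

    /** Adds sample text to the pseudo-document of the given language. */
    public void train(String language, String sampleText) {
        Map<String, Integer> profile = profiles.computeIfAbsent(language, k -> new HashMap<>());
        ngrams(sampleText).forEach((gram, count) -> profile.merge(gram, count, Integer::sum));
    }

    /** Cosine similarity between two sparse count vectors. */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) dot += (double) e.getValue() * other;
            normA += (double) e.getValue() * e.getValue();
        }
        for (int v : b.values()) normB += (double) v * v;
        return (normA == 0 || normB == 0) ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns the language whose pseudo-document is most similar to the text, or null if untrained. */
    public String guess(String text) {
        Map<String, Integer> query = ngrams(text);
        String best = null;
        double bestScore = -1;
        for (Map.Entry<String, Map<String, Integer>> e : profiles.entrySet()) {
            double score = cosine(query, e.getValue());
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        NGramLanguageGuesser guesser = new NGramLanguageGuesser();
        // Hypothetical toy training snippets.
        guesser.train("en", "the quick brown fox jumps over the lazy dog and then the dog sleeps");
        guesser.train("fr", "le renard brun rapide saute par-dessus le chien paresseux et le chien dort");
        System.out.println(guesser.guess("the dog and the fox"));
    }
}
```

Character n-grams are used instead of whole words because they still carry signal for short texts and for words that never appeared in the training samples.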

RE: Language detection library

2007-05-04 Thread Mordo, Aviran (EXP N-NANNATEK)
Thank you, I got the Nutch plugin, and it is working great.

-----Original Message-----
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]]
Sent: Thursday, May 03, 2007 4:17 PM
To: java-user@lucene.apache.org
Subject: Re: Language detection library

LingPipe - commercial unless your data/product

Re: Language detection library

2007-05-03 Thread karl wettin
On 4 May 2007, at 02:20, Chris Lu wrote:
> I suppose if a document is indexed as English or French, then when users search for the document, we need to parse the query as English or French also?

If you do some language-specific token analysis, such as stemming, yes. Detecting the language on such small t
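Karl's point is that a query must go through the same language-specific analysis as the document it should match, otherwise stemmed index terms will not line up with unstemmed query terms. A minimal sketch, assuming each document's language was recorded at index time. The suffix-stripping "stemmers" and the class name `PerLanguageAnalysis` are hypothetical toys, not Lucene analyzers or real stemming algorithms.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.UnaryOperator;

/** Sketch: analyze the query with the same per-language analysis used for the document. */
public class PerLanguageAnalysis {
    // Hypothetical toy "stemmers" that strip a single suffix; real systems use proper stemmers.
    static final Map<String, UnaryOperator<String>> STEMMERS = new HashMap<>();
    static {
        STEMMERS.put("en", w -> (w.length() > 5 && w.endsWith("ing")) ? w.substring(0, w.length() - 3) : w);
        STEMMERS.put("fr", w -> (w.length() > 5 && w.endsWith("ent")) ? w.substring(0, w.length() - 3) : w);
    }

    /** Lowercases, splits on non-letters, and applies the language's stemmer (identity if unknown). */
    static List<String> analyze(String text, String lang) {
        UnaryOperator<String> stem = STEMMERS.getOrDefault(lang, UnaryOperator.identity());
        List<String> tokens = new ArrayList<>();
        for (String w : text.toLowerCase().split("[^\\p{L}]+")) {
            if (!w.isEmpty()) tokens.add(stem.apply(w));
        }
        return tokens;
    }

    /** True if every analyzed query term appears among the document's analyzed terms. */
    static boolean matches(String docText, String docLang, String query, String queryLang) {
        Set<String> indexed = new HashSet<>(analyze(docText, docLang));
        return indexed.containsAll(analyze(query, queryLang));
    }

    public static void main(String[] args) {
        String doc = "They were walking home";
        System.out.println(matches(doc, "en", "walking", "en"));
        System.out.println(matches(doc, "en", "walking", "fr"));
    }
}
```

Here the first call matches because both sides reduce "walking" to "walk"; the second fails because analyzing the query as "fr" leaves "walking" unstemmed. This is also why detecting the language of a short query matters when the index uses stemming.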

Re: Language detection library

2007-05-03 Thread Chris Lu
I suppose if a document is indexed as English or French, then when users search for the document, we need to parse the query as English or French also?

Re: Language detection library

2007-05-03 Thread karl wettin
On 3 May 2007, at 22:06, Mordo, Aviran (EXP N-NANNATEK) wrote:
> Does anyone know of a good language detection library that can detect what language a document (text) is in?

I posted this some time back: https://issues.apache.org/jira/browse/LUCENE-826
It's a bit proof-of-concept-ish, but it does the job

Re: Language detection library

2007-05-03 Thread Andrzej Bialecki
Jason Pump wrote:
> http://software.wise-guys.nl/libtextcat/

... which is what Nutch implements in its language-identifier plugin.

Best regards,
Andrzej Bialecki
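libtextcat implements the n-gram text categorization technique described by Cavnar and Trenkle: rank the most frequent character n-grams of each language's training text, rank those of the input the same way, and pick the language whose ranking is closest under the "out-of-place" distance. The following is a minimal Java sketch of that idea, not libtextcat's or Nutch's actual code. The n-gram lengths (1 to 5), the profile size (300), the sample strings and the class name `OutOfPlaceClassifier` are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch of ranked n-gram profiles compared with the out-of-place distance. */
public class OutOfPlaceClassifier {
    static final int MAX_N = 5;          // n-gram lengths 1..5 (illustrative choice)
    static final int PROFILE_SIZE = 300; // number of top-ranked n-grams kept (illustrative choice)

    // Hypothetical toy training samples; real profiles come from much larger corpora.
    static final String EN_SAMPLE = "the quick brown fox jumps over the lazy dog and then the dog sleeps";
    static final String FR_SAMPLE = "le renard brun rapide saute par-dessus le chien paresseux et le chien dort";

    /** Ranks the most frequent n-grams of padded words: rank 0 is the most frequent. */
    static Map<String, Integer> profile(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.toLowerCase().split("[^\\p{L}]+")) {
            if (word.isEmpty()) continue;
            String padded = "_" + word + "_"; // underscores mark word boundaries
            for (int n = 1; n <= MAX_N; n++) {
                for (int i = 0; i + n <= padded.length(); i++) {
                    counts.merge(padded.substring(i, i + n), 1, Integer::sum);
                }
            }
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // Descending frequency; ties broken alphabetically so ranks are deterministic.
        entries.sort((a, b) -> a.getValue().equals(b.getValue())
                ? a.getKey().compareTo(b.getKey())
                : Integer.compare(b.getValue(), a.getValue()));
        Map<String, Integer> ranks = new HashMap<>();
        for (int r = 0; r < Math.min(PROFILE_SIZE, entries.size()); r++) {
            ranks.put(entries.get(r).getKey(), r);
        }
        return ranks;
    }

    /** Sums rank differences; n-grams absent from the language profile get the maximum penalty. */
    static long outOfPlace(Map<String, Integer> doc, Map<String, Integer> lang) {
        long distance = 0;
        for (Map.Entry<String, Integer> e : doc.entrySet()) {
            Integer langRank = lang.get(e.getKey());
            distance += (langRank == null) ? PROFILE_SIZE : Math.abs(langRank - e.getValue());
        }
        return distance;
    }

    /** Returns the language whose profile is closest to the text's profile. */
    static String classify(String text, Map<String, Map<String, Integer>> languages) {
        Map<String, Integer> doc = profile(text);
        String best = null;
        long bestDistance = Long.MAX_VALUE;
        for (Map.Entry<String, Map<String, Integer>> e : languages.entrySet()) {
            long d = outOfPlace(doc, e.getValue());
            if (d < bestDistance) {
                bestDistance = d;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Integer>> languages = new HashMap<>();
        languages.put("en", profile(EN_SAMPLE));
        languages.put("fr", profile(FR_SAMPLE));
        System.out.println(classify("the lazy dog sleeps", languages));
    }
}
```

Comparing rank positions rather than raw counts keeps the measure from depending on how long the input is, and the fixed penalty for unseen n-grams pushes the input away from languages that lack its typical character sequences.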

Re: Language detection library

2007-05-03 Thread Jason Pump
http://software.wise-guys.nl/libtextcat/

Otis Gospodnetic wrote:
> LingPipe - commercial unless your data/product/service is free.
> Nutch language id plugin.

Re: Language detection library

2007-05-03 Thread Otis Gospodnetic
LingPipe - commercial unless your data/product/service is free.
Nutch language id plugin.

Otis

----- Original Message -----
From: "Mordo, Aviran (EXP N-NANNATEK)" <[EMAIL PROTEC