Oops... sorry, I just realized this was on the Lucene-user list. My response
was for Solr-ONLY!
-- Jack Krupansky
-Original Message-
From: Jack Krupansky
Sent: Thursday, June 27, 2013 1:11 PM
To: java-user@lucene.apache.org
Subject: Re: Language detection
You can use the ... an explicit check before
sending the document to Solr. Tika also has language detection, so you
could call Tika from an external process before sending the document to
Solr.
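Tika's and Solr's detectors are the real options here. Purely to show what an explicit pre-indexing check could look like with no library at all, here is a crude heuristic sketch (not Tika's algorithm): count what fraction of the tokens are common English function words and index the document only above a threshold. The class name, the stopword list and the 0.2 threshold are all hypothetical and would need tuning on real data.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class EnglishFilter {
    // Hypothetical list of very common English function words (not from Tika or Solr).
    private static final Set<String> ENGLISH_STOPWORDS = new HashSet<>(Arrays.asList(
        "the", "a", "an", "and", "or", "of", "to", "in", "is", "are", "was",
        "it", "that", "for", "on", "with", "as", "this", "be", "by", "not", "at"));

    // Hypothetical threshold: fraction of tokens that must be English stopwords.
    private static final double MIN_STOPWORD_RATIO = 0.2;

    /** Crude heuristic: true if enough of the tokens are common English words. */
    static boolean looksEnglish(String text) {
        int total = 0;
        int hits = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}']+")) {
            if (token.isEmpty()) {
                continue; // split() yields an empty first token if text starts with a separator
            }
            total++;
            if (ENGLISH_STOPWORDS.contains(token)) {
                hits++;
            }
        }
        return total > 0 && (double) hits / total >= MIN_STOPWORD_RATIO;
    }

    public static void main(String[] args) {
        String[] docs = {"The cat is on the mat and it is happy", "Der Hund ist im Garten und spielt"};
        for (String doc : docs) {
            // Only documents that pass the check would be sent on to Solr.
            System.out.println((looksEnglish(doc) ? "index: " : "skip:  ") + doc);
        }
    }
}
```

Stopword ratios break down on very short texts (titles, queries), which is one reason a real n-gram based detector such as Tika's is preferable.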
-- Jack Krupansky
-Original Message-
From: Hang Mang
Sent: Thursday, June 27, 2013 11:45 AM
To: java-user@
Hello,
is there some kind of filter or component that I could use to filter out
non-English text? I have a preprocessing step in which I only want to index
English documents.
Best,
Gucko
This character lies in the CJK_UNIFIED_IDEOGRAPHS_EXTENSION_A block.
I added detection for the extension blocks; I assume (without really knowing) that none of these
characters are phonetic either.
import java.lang.Character.UnicodeBlock;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
...
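Going by the imports above, the fix was presumably to test membership in a set of blocks rather than comparing against CJK_UNIFIED_IDEOGRAPHS alone. A sketch under that assumption; the class and method names and the exact choice of blocks are mine, not the original code:

```java
import java.lang.Character.UnicodeBlock;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class CjkDetector {
    // Blocks treated as CJK ideographs. The base block alone misses
    // Extension A (U+3400..U+4DBF) and the supplementary-plane Extension B.
    private static final Set<UnicodeBlock> CJK_BLOCKS = new HashSet<>(Arrays.asList(
        UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS,
        UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS_EXTENSION_A,
        UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS_EXTENSION_B,
        UnicodeBlock.CJK_COMPATIBILITY_IDEOGRAPHS,
        UnicodeBlock.CJK_COMPATIBILITY_IDEOGRAPHS_SUPPLEMENT));

    static boolean isCjk(int codePoint) {
        return CJK_BLOCKS.contains(UnicodeBlock.of(codePoint));
    }

    // Iterates by code point, so surrogate pairs (Extension B) are handled correctly.
    static boolean containsCjk(String text) {
        return text.codePoints().anyMatch(CjkDetector::isCjk);
    }

    public static void main(String[] args) {
        System.out.println(isCjk('a'));     // false
        System.out.println(isCjk(0x4E2D));  // true, base block
        System.out.println(isCjk(0x3400));  // true, first code point of Extension A
        System.out.println(isCjk(0x20000)); // true, Extension B
    }
}
```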
On Sun, Mar 10, 2013 at 8:19 PM, Gili Nachum wrote:
> Answering myself for next generations' sake.
> Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS does the job.
How about 㒨?
TX
-
To unsubscribe, e-mail: java-user-unsubscr...@lu
Answering myself for next generations' sake.
Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS does the job.
Example:
import junit.framework.Assert;
import org.junit.Test;
public class DetectCJK {
@Test
public void test1() {
Assert.assertEquals(Character.UnicodeBlock.BASIC_LATIN,
Ch
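On Java 7 and later, an alternative to maintaining a list of blocks is java.lang.Character.UnicodeScript, which classifies code points by script: every Han ideograph reports HAN whichever block or extension it sits in. A small JUnit-free sketch (class and method names are hypothetical). Script and block membership do not line up exactly, so results can differ from a hand-picked block list at the edges.

```java
public class HanScriptCheck {
    // Script-based test: true for Han characters in any CJK block or extension.
    static boolean isHan(int codePoint) {
        return Character.UnicodeScript.of(codePoint) == Character.UnicodeScript.HAN;
    }

    public static void main(String[] args) {
        // Block-based view: U+3400 is NOT in the base CJK_UNIFIED_IDEOGRAPHS block...
        System.out.println(Character.UnicodeBlock.of(0x3400)
            == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS); // false
        // ...but the script-based view still reports it as Han.
        System.out.println(isHan(0x3400));                     // true
        System.out.println(Character.UnicodeBlock.of('a'));    // BASIC_LATIN
        System.out.println(isHan('a'));                        // false
    }
}
```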
Original Message-
> From: Bradford Stephens [mailto:bradfordsteph...@gmail.com]
> Sent: Thursday, August 06, 2009 12:46 PM
> To: solr-u...@lucene.apache.org; java-user@lucene.apache.org
> Subject: Language Detection for Analysis?
>
> Hey there,
>
> We're trying to
Google Translate just released (last week) its language API, with translation
and LANGUAGE DETECTION.
:)
It's very simple to use: you can query it with some text to determine which
language it is in.
Here is a simple example using Groovy, but all you need is the URL to
query:
There are several free Language Detection libraries out there, as well
as a few commercial ones. I think Karl Wettin has even written one as
a plugin for Lucene. Nutch also has one, AIUI. I would just Google
"language detection".
Also see http://www.lucidimagination.com
- Original Message
> From: Bradford Stephens
> To: solr-u...@lucene.apache.org; java-user@lucene.apache.org
> Sent: Thursday, August 6, 2009 3:46:21 PM
> Subject: Language Detection for Analysis?
>
> Hey there,
>
> We're trying to add foreign language support into our new search
> engine -- languages like Arabic, Farsi, and Urdu (that don't work with
> standard analyzers). But our data source doesn't tell us which
> languages we're actually collecting -- we just get blocks of text.
... Has anyone here worked on language detection so we can figure out what
analyzers to use? Are there commercial solutions?
Much appreciated!
--
http://www.roadtofailure.com -- The Fringes of Scalability, Social
Media, and Computer Science
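Arabic, Farsi and Urdu all use the Arabic script, so the script alone cannot tell them apart, but it is a cheap first step for picking an analyzer when the source gives no language tag. A rough sketch: find the dominant Unicode script of the letters, then, for Arabic-script text, look for a few letters that standard Arabic does not use. The class name, method names and returned codes are hypothetical, and the letter test is only a weak hint (other languages written in Arabic script share some of these letters); a trained n-gram detector is the better tool.

```java
import java.util.EnumMap;
import java.util.Map;

public class ScriptRouter {
    // ٹ ڈ ڑ ں ے : letters used in Urdu but not in Persian or standard Arabic.
    private static final String URDU_ONLY = "\u0679\u0688\u0691\u06BA\u06D2";
    // پ چ ژ گ : Persian letters absent from standard Arabic (Urdu uses them too).
    private static final String PERSIAN_EXTRA = "\u067E\u0686\u0698\u06AF";

    /** Most frequent Unicode script among the letters of the text (UNKNOWN if none). */
    static Character.UnicodeScript dominantScript(String text) {
        Map<Character.UnicodeScript, Integer> counts = new EnumMap<>(Character.UnicodeScript.class);
        text.codePoints()
            .filter(Character::isLetter)
            .forEach(cp -> counts.merge(Character.UnicodeScript.of(cp), 1, Integer::sum));
        Character.UnicodeScript best = Character.UnicodeScript.UNKNOWN;
        int bestCount = 0;
        for (Map.Entry<Character.UnicodeScript, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                bestCount = e.getValue();
                best = e.getKey();
            }
        }
        return best;
    }

    /** Crude guess for Arabic-script text: "ur", "fa" or "ar". Urdu is checked first. */
    static String arabicScriptHint(String text) {
        if (text.chars().anyMatch(c -> URDU_ONLY.indexOf(c) >= 0)) {
            return "ur";
        }
        if (text.chars().anyMatch(c -> PERSIAN_EXTRA.indexOf(c) >= 0)) {
            return "fa";
        }
        return "ar";
    }
}
```

The dominant script decides the analyzer family; the hint only matters when that script is ARABIC.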
You can also look at https://issues.apache.org/jira/browse/LUCENE-1039
that I've successfully used for language detection of user queries.
karl
On 27 Mar 2009, at 18:35, Boris Aleksandrovsky wrote:
Lisheng,
You might want to look at the Nutch LanguageID plugin
(http://wiki.apach
to:jochen.sc...@gmail.com]on Behalf Of
> Jochen Frey
> Sent: Friday, March 27, 2009 10:04 AM
> To: java-user@lucene.apache.org
> Subject: Re: Free software for language detection
>
>
> Lisheng,
>
> Here's a package you could take a look at. I have used it in the past and i
Thanks very much!
-Original Message-
From: jochen.sc...@gmail.com [mailto:jochen.sc...@gmail.com]on Behalf Of
Jochen Frey
Sent: Friday, March 27, 2009 10:04 AM
To: java-user@lucene.apache.org
Subject: Re: Free software for language detection
Lisheng,
Here's a package you could t
sheng.zh...@broadvision.com> wrote:
Hi,
Are you aware of any free software for language detection (given some
text, determine whether it is French or Japanese, for example)? I saw Bob Carpenter's previous
mail, which explained the principle nicely, but I could not locate any free tools.
Thanks very much for your help, Li
Does anyone know of a good language detection library that can detect what
language a document (text) is in?
Language detection is easy. It's just a simple
text classification problem.
One way you can do this is using Lucene
itself. Create a so-called pseudo-document
for each language consi
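The pseudo-document idea can be sketched without Lucene: concatenate the training text for each language into one profile (its "pseudo-document"), then score an unknown text against every profile and pick the best match. This sketch swaps Lucene's scoring for plain cosine similarity over character-trigram counts; the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class PseudoDocClassifier {
    // One "pseudo-document" per language: summed trigram counts of its training text.
    private final Map<String, Map<String, Integer>> profiles = new HashMap<>();

    /** Character trigram counts; runs of non-letters collapse to a single space. */
    static Map<String, Integer> trigrams(String text) {
        String s = " " + text.toLowerCase(Locale.ROOT).replaceAll("[^\\p{L}]+", " ").trim() + " ";
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + 3 <= s.length(); i++) {
            counts.merge(s.substring(i, i + 3), 1, Integer::sum);
        }
        return counts;
    }

    void addTrainingText(String lang, String text) {
        Map<String, Integer> profile = profiles.computeIfAbsent(lang, k -> new HashMap<>());
        trigrams(text).forEach((gram, n) -> profile.merge(gram, n, Integer::sum));
    }

    /** Cosine similarity between two sparse count vectors. */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
            normA += (double) e.getValue() * e.getValue();
        }
        for (int v : b.values()) {
            normB += (double) v * v;
        }
        return (normA == 0 || normB == 0) ? 0 : dot / Math.sqrt(normA * normB);
    }

    /** Language whose pseudo-document is most similar to the text (null if untrained). */
    String classify(String text) {
        Map<String, Integer> query = trigrams(text);
        String best = null;
        double bestScore = -1;
        for (Map.Entry<String, Map<String, Integer>> e : profiles.entrySet()) {
            double score = cosine(query, e.getValue());
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }
}
```

Real use needs far more training text per language than a sentence or two.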
Thank you, I got the Nutch plugin, and it is working great.
-Original Message-
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
Sent: Thursday, May 03, 2007 4:17 PM
To: java-user@lucene.apache.org
Subject: Re: Language detection library
LingPipe - commercial unless your data/product
of a good language detection library that can
detect what
> language a document (text) is ?
I posted this some time back:
https://issues.apache.org/jira/browse/LUCENE-826
It's a bit proof-of-concept-ish, but it does the job well if you ask
me. It uses Weka (GPL) and requires at least 150 char
On 5/3/07, karl wettin <[EMAIL PROTECTED]> wrote:
On 3 May 2007, at 22:06, Mordo, Aviran (EXP N-NANNATEK) wrote:
> Anyone knows of a good language detectio
On 3 May 2007, at 22:06, Mordo, Aviran (EXP N-NANNATEK) wrote:
Does anyone know of a good language detection library that can detect what
language a document (text) is in?
I posted this some time back:
https://issues.apache.org/jira/browse/LUCENE-826
It's a bit proof-of-concept-ish, but it does the
Jason Pump wrote:
http://software.wise-guys.nl/libtextcat/
... which is what Nutch implements in its language-identifier plugin.
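TextCat-style identifiers compare ranked n-gram frequency profiles using the "out-of-place" measure from Cavnar and Trenkle's n-gram text categorization approach. A simplified sketch, assuming n-grams of length 1 to 3 only and a hypothetical profile cutoff of 300 entries; it is not libtextcat's or Nutch's code, and all names are mine.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

public class OutOfPlace {
    static final int PROFILE_SIZE = 300; // hypothetical cutoff: keep the top-N n-grams

    /** Ranked list (most frequent first) of 1..3-grams; '_' marks word boundaries. */
    static List<String> profile(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (word.isEmpty()) {
                continue;
            }
            String w = "_" + word + "_";
            for (int n = 1; n <= 3; n++) {
                for (int i = 0; i + n <= w.length(); i++) {
                    counts.merge(w.substring(i, i + n), 1, Integer::sum);
                }
            }
        }
        return counts.entrySet().stream()
            .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey())) // ties: alphabetical
            .limit(PROFILE_SIZE)
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
    }

    /** Sum of rank differences; n-grams missing from the language profile cost PROFILE_SIZE. */
    static int distance(List<String> doc, List<String> lang) {
        Map<String, Integer> rank = new HashMap<>();
        for (int i = 0; i < lang.size(); i++) {
            rank.put(lang.get(i), i);
        }
        int total = 0;
        for (int i = 0; i < doc.size(); i++) {
            Integer r = rank.get(doc.get(i));
            total += (r == null) ? PROFILE_SIZE : Math.abs(r - i);
        }
        return total;
    }

    /** Language whose profile has the smallest out-of-place distance (null if none). */
    static String classify(String text, Map<String, List<String>> langProfiles) {
        List<String> doc = profile(text);
        return langProfiles.entrySet().stream()
            .min(Comparator.comparingInt(e -> distance(doc, e.getValue())))
            .map(Map.Entry::getKey)
            .orElse(null);
    }
}
```

Unlike the cosine approach, only ranks matter here, so very frequent n-grams cannot dominate the score.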
--
Best regards,
Andrzej Bialecki <><
- Original Message
From: "Mordo, Aviran (EXP N-NANNATEK)" <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Thursday, May 3, 2007 4:06:04 PM
Subject: Language detection library
Does anyone know of a good language detection library that can detect what
language a document (text) is in?
Antony Bowesman wrote:
Hello,
I'm new to Lucene and wanted some advice on analyzers, stemmers and
language analysis. I've got LIA, so I have read its chapters.
I am writing a framework that needs to be able to index documents from a
range of languages where just the character set of the docu