: I don't know if the developers of Lucene would agree, but from what
: I've been browsing in the ML archives, these multiple-language issues
: seem to arise quite often on the mailing list, and maybe some articles
: like "best practices", "do's and don'ts" or "Lucene Architecture in
: multiple
On Sep 6, 2005, at 7:15 AM, Hacking Bear wrote:
On 9/6/05, Olivier Jaquemet <[EMAIL PROTECTED]> wrote:
As far as your usage is concerned, it seems to be the right approach,
and I think the StandardAnalyzer does the job pretty right when it has
to deal with whatever language you want.
Olivier Jaquemet wrote:
> Gusenbauer Stefan wrote:
>
>> I think Nutch uses ngramj for language classification, but I don't know
>> how they store the language information. In our application,
>> for example, I save the language in an extra field in the document,
>> because Lucene is supporti
not do so at the same time?
-----Original Message-----
From: Olivier Jaquemet [mailto:[EMAIL PROTECTED]
Sent: 06 September 2005 13:21
To: java-user@lucene.apache.org
Subject: Re: Multiple Language Indexing and Searching
Gusenbauer Stefan wrote:
>I think Nutch uses ngramj for language class
Gusenbauer Stefan wrote:
I think Nutch uses ngramj for language classification, but I don't know
how they store the language information. In our application,
for example, I save the language in an extra field in the document,
because Lucene supports multiple fields with the same name
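
To make the "extra field" idea concrete, here is a minimal plain-Java sketch. It deliberately does not use the Lucene API: a document is modelled as a map from field name to a list of values, which only mimics the fact that a field name may be added more than once. The class name, the field names "language" and "content", and the helper methods are all hypothetical and chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java stand-in for an index document: field name -> list of values.
// This is NOT the Lucene API; it only mirrors the idea of repeatable fields.
class LanguageTaggedDocs {

    // Hypothetical field names chosen for this sketch.
    static final String LANGUAGE = "language";
    static final String CONTENT = "content";

    // Build a document with one content value and zero or more language tags.
    static Map<String, List<String>> newDoc(String content, String... languages) {
        Map<String, List<String>> doc = new LinkedHashMap<>();
        add(doc, CONTENT, content);
        for (String lang : languages) {
            add(doc, LANGUAGE, lang); // same field name added repeatedly
        }
        return doc;
    }

    // Append a value to a field, creating the field on first use.
    static void add(Map<String, List<String>> doc, String field, String value) {
        doc.computeIfAbsent(field, k -> new ArrayList<>()).add(value);
    }

    // Keep only the documents tagged with the requested language.
    static List<Map<String, List<String>>> filterByLanguage(
            List<Map<String, List<String>>> docs, String lang) {
        List<Map<String, List<String>>> hits = new ArrayList<>();
        for (Map<String, List<String>> doc : docs) {
            List<String> langs = doc.getOrDefault(LANGUAGE, Collections.emptyList());
            if (langs.contains(lang)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Map<String, List<String>>> docs = new ArrayList<>();
        docs.add(newDoc("Bonjour and hello", "fr", "en")); // mixed-language text
        docs.add(newDoc("Guten Tag", "de"));
        docs.add(newDoc("No language detected"));          // untagged document
        System.out.println(filterByLanguage(docs, "en").size());
    }
}
```

In a real index the language value would presumably come from a classifier such as ngramj and be stored as an untokenized field, so that a search can be restricted to one language or left open across all of them.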
James Adams wrote:
>Does anyone know what approach Nutch uses?
>
>
>-----Original Message-----
>From: Hacking Bear [mailto:[EMAIL PROTECTED]
>Sent: 06 September 2005 12:15
>To: java-user@lucene.apache.org
>Subject: Re: Multiple Language Indexing and Searching
>
Does anyone know what approach Nutch uses?
-----Original Message-----
From: Hacking Bear [mailto:[EMAIL PROTECTED]
Sent: 06 September 2005 12:15
To: java-user@lucene.apache.org
Subject: Re: Multiple Language Indexing and Searching
On 9/6/05, Olivier Jaquemet <[EMAIL PROTECTED]>
On 9/6/05, Olivier Jaquemet <[EMAIL PROTECTED]> wrote:
>
> As far as your usage is concerned, it seems to be the right approach,
> and I think the StandardAnalyzer does the job pretty right when it has
> to deal with whatever language you want.
I should look into exactly what it does. Does this
As far as your usage is concerned, it seems to be the right approach,
and I think the StandardAnalyzer does the job pretty right when it has
to deal with whatever language you want.
Note, though, that it handles only English stop words, not those of
other languages, unless specified at index tim
Hi,
I have a similar problem to deal with. In fact, a lot of the time, the
documents do not carry any language information, or they may contain text in
multiple languages. Further, the user would not want to always supply this
information. Also, the user may very well be interested in documents in
m