Re: Multiple Language Indexing and Searching

2005-09-06 Thread Chris Hostetter
: I don't know if the developers of Lucene would agree, but from what I've been browsing in the ML archives, these multiple-language issues seem to arise quite often on the mailing list, and maybe some articles like "best practices", "do's and don'ts" or "Lucene Architecture in multiple

Re: Multiple Language Indexing and Searching

2005-09-06 Thread Erik Hatcher
On Sep 6, 2005, at 7:15 AM, Hacking Bear wrote: On 9/6/05, Olivier Jaquemet <[EMAIL PROTECTED]> wrote: As far as your usage is concerned, it seems to be the right approach, and I think the StandardAnalyzer does the job pretty well when it has to deal with whatever language you want.

Re: Multiple Language Indexing and Searching

2005-09-06 Thread Gusenbauer Stefan
Olivier Jaquemet wrote: > Gusenbauer Stefan wrote: >> I think Nutch uses ngramj for language classification, but I don't know how they store the language information. In our application, for example, I save the language in an extra field in the document because Lucene is supporti
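The thread mentions that Nutch uses ngramj for language classification but does not show how it works. As a rough illustration of the general idea only (this is not ngramj's API, and the class name and tiny training samples are made up for the example), a guesser can build a character-trigram frequency profile per language and pick the language whose profile overlaps most with the input text:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical character-trigram language guesser; not the ngramj API. */
final class NgramLanguageGuesser {
    // Language code -> trigram counts collected from training samples.
    private final Map<String, Map<String, Integer>> profiles = new HashMap<>();

    /** Counts the character trigrams of the lower-cased, space-padded text. */
    static Map<String, Integer> trigrams(String text) {
        String padded = " " + text.toLowerCase() + " ";
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + 3 <= padded.length(); i++) {
            counts.merge(padded.substring(i, i + 3), 1, Integer::sum);
        }
        return counts;
    }

    /** Adds a sample text to the trigram profile of a language code. */
    void train(String lang, String sample) {
        Map<String, Integer> profile = profiles.computeIfAbsent(lang, k -> new HashMap<>());
        trigrams(sample).forEach((gram, count) -> profile.merge(gram, count, Integer::sum));
    }

    /** Returns the language whose profile shares the most trigram occurrences with the text. */
    String guess(String text) {
        Map<String, Integer> input = trigrams(text);
        String best = null;
        long bestScore = -1;
        for (Map.Entry<String, Map<String, Integer>> lang : profiles.entrySet()) {
            long score = 0;
            for (Map.Entry<String, Integer> gram : input.entrySet()) {
                // Overlap: the smaller of the two counts for each trigram.
                score += Math.min(gram.getValue(), lang.getValue().getOrDefault(gram.getKey(), 0));
            }
            if (score > bestScore) {
                bestScore = score;
                best = lang.getKey();
            }
        }
        return best;
    }
}
```

With one English and one French sample sentence as training data, `guess("the dog and the cat")` should pick the English profile. Real classifiers train on far larger corpora and use ranked profiles rather than raw overlap counts.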

RE: Multiple Language Indexing and Searching

2005-09-06 Thread James Adams
not do so at the same time? -----Original Message----- From: Olivier Jaquemet [mailto:[EMAIL PROTECTED]] Sent: 06 September 2005 13:21 To: java-user@lucene.apache.org Subject: Re: Multiple Language Indexing and Searching Gusenbauer Stefan wrote: > I think Nutch uses ngramj for language class

Re: Multiple Language Indexing and Searching

2005-09-06 Thread Olivier Jaquemet
Gusenbauer Stefan wrote: I think Nutch uses ngramj for language classification, but I don't know how they store the language information. In our application, for example, I save the language in an extra field in the document, because Lucene supports multiple fields with the same name
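The "extra language field" approach can be sketched without Lucene itself. In the minimal model below, the class and the `lang` / `content` field names are hypothetical stand-ins rather than Lucene's `Document` API; it shows how a repeatable field lets a mixed-language document carry several language codes, and how a search can then be restricted to one language:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical stand-in for an indexed document; not Lucene's Document class. */
final class LangTaggedDoc {
    // Field name -> every value added under that name (repeated fields are allowed).
    private final Map<String, List<String>> fields = new LinkedHashMap<>();

    /** Adds one value under a field name; calling it twice with "lang" tags two languages. */
    LangTaggedDoc add(String name, String value) {
        fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        return this;
    }

    /** Returns all values stored under a field name, in insertion order. */
    List<String> values(String name) {
        return fields.getOrDefault(name, List.of());
    }

    /** Keeps only documents whose "lang" field contains the given language code. */
    static List<LangTaggedDoc> withLanguage(List<LangTaggedDoc> docs, String lang) {
        return docs.stream()
                .filter(d -> d.values("lang").contains(lang))
                .collect(Collectors.toList());
    }
}
```

A document tagged with both `fr` and `en` then matches a restriction to either language, which is the behavior repeated same-name fields make possible.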

Re: Multiple Language Indexing and Searching

2005-09-06 Thread Gusenbauer Stefan
James Adams wrote: > Does anyone know what approach Nutch uses? > -----Original Message----- > From: Hacking Bear [mailto:[EMAIL PROTECTED]] > Sent: 06 September 2005 12:15 > To: java-user@lucene.apache.org > Subject: Re: Multiple Language Indexing and Searching

RE: Multiple Language Indexing and Searching

2005-09-06 Thread James Adams
Does anyone know what approach Nutch uses? -----Original Message----- From: Hacking Bear [mailto:[EMAIL PROTECTED]] Sent: 06 September 2005 12:15 To: java-user@lucene.apache.org Subject: Re: Multiple Language Indexing and Searching On 9/6/05, Olivier Jaquemet <[EMAIL PROTECTED]>

Re: Multiple Language Indexing and Searching

2005-09-06 Thread Hacking Bear
On 9/6/05, Olivier Jaquemet <[EMAIL PROTECTED]> wrote: > As far as your usage is concerned, it seems to be the right approach, and I think the StandardAnalyzer does the job pretty well when it has to deal with whatever language you want. I should look into exactly what it does. Does this

Re: Multiple Language Indexing and Searching

2005-09-06 Thread Olivier Jaquemet
As far as your usage is concerned, it seems to be the right approach, and I think the StandardAnalyzer does the job pretty well when it has to deal with whatever language you want. Note, though, that it only removes English stop words, not those of other languages, unless other stop words are specified at index tim
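To make the stop-word caveat concrete, here is a minimal sketch of choosing a stop list by language code at analysis time. It is not Lucene's StandardAnalyzer; the class name and the very small English and French stop lists are hypothetical examples. Tokens are lower-cased, split on runs of non-letter characters, and filtered against the stop set for the given language:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical per-language stop-word filter; the stop lists are tiny illustrative samples. */
final class PerLanguageStopFilter {
    private static final Map<String, Set<String>> STOP_WORDS = Map.of(
            "en", Set.of("the", "a", "and", "of"),
            "fr", Set.of("le", "la", "les", "et", "de"));

    /** Lower-cases, splits on runs of non-letters, and drops stop words of the given language. */
    static List<String> tokenize(String text, String lang) {
        Set<String> stops = STOP_WORDS.getOrDefault(lang, Set.of());
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!t.isEmpty() && !stops.contains(t)) {
                tokens.add(t);
            }
        }
        return tokens;
    }
}
```

Tokenizing the French sentence "Le chat et la souris" with the English list keeps "le", "et" and "la", while the French list removes them, which is the gap the message describes when only English stop words are configured.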

Multi-lang analyzer? Re: Multiple Language Indexing and Searching

2005-09-05 Thread Hacking Bear
Hi, I have a similar problem to deal with. In fact, documents often carry no language information at all, or they may contain text in multiple languages. Further, users would rather not have to supply this information every time. Also, the user may very well be interested in documents in m
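One common way to let users search without naming a language is to index each language into its own field and expand the user's term across all of them. The sketch below only builds a query string in Lucene's `field:term` query syntax joined with `OR`; the `content_<lang>` field naming scheme and the helper class are hypothetical assumptions, and the term is assumed to need no escaping:

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper: searches one term across every per-language field. */
final class CrossLanguageQuery {
    /** Builds "content_en:term OR content_fr:term ..." for the given language codes. */
    static String expand(String term, List<String> langs) {
        // Assumes one field per language named content_<lang>; the term is not escaped here.
        return langs.stream()
                .map(lang -> "content_" + lang + ":" + term)
                .collect(Collectors.joining(" OR "));
    }
}
```

For example, `expand("lucene", List.of("en", "fr"))` yields `content_en:lucene OR content_fr:lucene`. In a real index, each field would still need to be analyzed with its own language's analyzer so that the query terms match what was indexed.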