RE: Lucene and multi-lingual Unicode - advice needed

OBender Hotmail Mon, 15 Jun 2009 14:40:00 -0700

I've looked over SolR quickly, it is a bit too heavy for my project.
So what is required (at a minimum) to build an analyzer, sandbox has a few of 
them varying in complexity.


-----Original Message-----
From: Robert Muir [mailto:rcm...@gmail.com] 
Sent: Monday, June 15, 2009 4:51 PM
To: java-user@lucene.apache.org
Subject: Re: Lucene and multi-lingual Unicode - advice needed

Well just reply back if SolR is inappropriate for your needs.

In that case, you will need to build a custom analyzer (its not too
bad), so that you can use compass.

On Mon, Jun 15, 2009 at 4:19 PM, OBender Hotmail<osya_ben...@hotmail.com> wrote:
> Hi,
>
> My goal is to find a framework that encapsulates as much low level 
> indexing/search technology as possible and have it integrate nicely with 
> Spring.
> It looked like Compass was/is a good encapsulation of the functionality. I'll 
> take a look at SolR though, thanks for the pointer.
>
> -----Original Message-----
> From: Robert Muir [mailto:rcm...@gmail.com]
> Sent: Monday, June 15, 2009 1:14 PM
> To: java-user@lucene.apache.org
> Subject: Re: Lucene and multi-lingual Unicode - advice needed
>
> Hi,
>
> (Since this is an issue you brought up on the Compass forums)
>
> I wonder what stage you are in the development process?
> Have you considered SolR, or does compass provide some other
> functionality that you need?
>
> The reason I say this, is because the easiest solution might be to use
> a nightly SolR for your application.
>
> I'm not personally biased one way or the other for any particular
> framework, but recently there has been some improvements added to SolR
> so that the default type 'text' is pretty good for multilingual
> processing.
>
> In fact I hope in the future it will be improved in lucene so that
> your decision is really based upon other application needs...
>
> On Mon, Jun 15, 2009 at 1:10 PM, OBender Hotmail<osya_ben...@hotmail.com> 
> wrote:
>> Hi All!
>>
>>
>>
>> I'm new to Lucene so forgive me if this question was asked before.
>>
>> I have a database with records in the same table in many different languages
>> (up to 70) it includes all W-European, Arabic, Eastern, CJK, Cyrillic, etc.
>> you name it.
>> I've looked at what people say about Lucene and it looks like for the most
>> part standard analyzers should do fine with most Unicode languages but there
>> are quite a few exceptions.
>> Here is some recently updated Lucene Jira thread:
>> https://issues.apache.org/jira/browse/LUCENE-1488
>>
>> My question is, what would be the safest bet for me in terms of
>> analyzers/tokenizers?
>> Do I really have to write my own ones for the bunch of languages that are
>> not supported?
>> Did anyone already solve the problem similar to mine? I'm sure someone
>> already did :)
>>
>> And yes, I looked at the Lucene sandbox analyzers. It just adds more
>> confusion. For example why there analyzers for DE and FR? Wouldn't the
>> standard analyzer (which is Unicode complaint as I understood) deal with EU
>> languages just fine?
>>
>> Thanks in advance for advices :)
>>
>>
>
>
>
> --
> Robert Muir
> rcm...@gmail.com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>



-- 
Robert Muir
rcm...@gmail.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

RE: Lucene and multi-lingual Unicode - advice needed

Reply via email to