But for this, you need a skillfully designed:
- set of fields
- multiplexing analyzer
- query expansion
In one of my projects, we do not split languages into separate fields and 
it's a pain... I keep running into issues in one direction or the other.
- the "die" example that Otis mentioned is a good one: a stop word in German, 
an essential verb in English
- I recently had issues with contributions containing the word Fourier (as in 
Fourier series): after stemming it stays fourier in English, but in French it 
becomes fouri. So: if the resource is contributed in French, the indexed value 
is fouri and English seekers won't find it; if the resource is contributed in 
English, French seekers won't find it.
So my latest lesson: always keep an unstemmed field (whitespace-tokenized and 
lowercased) at hand as well, and prefer it over the others in your query 
expansion.

A wiki page should probably be made.

paul


On 19 Jan 2011, at 07:53, Vinaya Kumar Thimmappa wrote:
> I think we should be using Lucene with the Snowball jars, which means one 
> index for all languages (of course, the size of the index is always a concern).
> 
> Hope this helps.
> -vinaya
> 
> On Tuesday 18 January 2011 11:23 PM, Clemens Wyss wrote:
>> What is the "best practice" to support multiple languages, i.e. 
>> Lucene-Documents that have multiple language content/fields?
>> Should
>> a) each language be indexed in a separate index/directory or should
>> b) the Documents (in a single directory) hold the diverse localized fields?
>> 
>> We most often will be searching "language dependent" which (at least 
>> performance wise) mandates one-directory-per-language...
>> 
>> Any (lucene specific) white papers on this topic?
>> 
>> Thx in advance
>> Clemens
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>> 
>> 
> 

