Re: languages & files

Otis Gospodnetic Thu, 19 Jan 2006 14:43:33 -0800

Hello,
For the first problem (indexing different types of documents), you can use the 
mini-framework that does just that.  It is part of the source code that comes 
with Lucene in Action; download it and experiment: http://www.lucenebook.com/
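
To give a rough idea of what such a mini-framework does, here is a minimal
sketch that picks a text extractor by file extension.  It uses only the
standard library, and every name in it (HandlerRegistry, TextExtractor,
extensionOf) is hypothetical; it is not the API from the book's source code.
A real extractor for PDF or Word files would call a parser library of your
choice inside extract().

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical names throughout; this is NOT the Lucene in Action API.
public class HandlerRegistry {

    /** Turns raw file bytes into plain text that can then be indexed. */
    public interface TextExtractor {
        String extract(byte[] content);
    }

    private final Map<String, TextExtractor> byExtension = new HashMap<>();

    /** Registers an extractor for an extension such as "pdf" or "doc". */
    public void register(String extension, TextExtractor extractor) {
        byExtension.put(extension.toLowerCase(Locale.ROOT), extractor);
    }

    /** Returns the text after the last dot, lower-cased, or "" if no dot. */
    public static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    /** Extracts text with the extractor registered for the file's extension. */
    public String extract(String fileName, byte[] content) {
        TextExtractor extractor = byExtension.get(extensionOf(fileName));
        if (extractor == null) {
            throw new IllegalArgumentException("No extractor registered for: " + fileName);
        }
        return extractor.extract(content);
    }
}
```

The extracted text is what you would then hand to Lucene as a field value.
Keeping the lookup in one place means adding a new file type is one
register() call rather than a change to the indexing loop.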


For the Analyzers, look at what Snowball provides (do a search at 
lucenebook.com or google.com).  Of the languages you listed, I imagine only 
Dutch is supported.  You may be able to find a Bulgarian Analyzer somewhere, 
but the remaining languages may be harder ...
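
On the accent problem from the original message: one common approach is to
fold accented characters to their base letters, applying the same folding to
both the indexed text and the query.  Below is a minimal sketch of that idea
using only the standard library (java.text.Normalizer).  The class name
AccentFolder is hypothetical, and this is not a Lucene Analyzer; you would
call it before handing text to one, or wrap the same logic in a filter.

```java
import java.text.Normalizer;

// Hypothetical helper, independent of Lucene.
public class AccentFolder {

    /**
     * Decomposes each character into base letter + combining marks (NFD),
     * then removes the combining marks.
     */
    public static String fold(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }
}
```

Note that this only handles characters that decompose under Unicode
normalization.  Letters such as the Turkish dotless i are left unchanged, and
whether folding is desirable at all depends on the language, so treat it as a
starting point rather than a complete solution.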

Otis

----- Original Message ----
From: [EMAIL PROTECTED]
To: java-user@lucene.apache.org
Sent: Thu 19 Jan 2006 11:01:07 PM EST
Subject: languages &  files 

Hi,

I am just starting to work with Lucene and need a few explanations to do what
I want. Thanks for your helpful answers.

I have to add Lucene to a Java application, and I have two goals:

- To enable search through different types of files, such as MS Word, PDF or
Excel files. 
I read that each type of document must be indexed with the appropriate
indexer. So how can I do it easily? I found an API called Lius which
seems to index different types of documents directly. Does anyone know this
product? Are there other ones?

- Secondly, the search system must work with different languages, such as
Dutch, Turkish, Bulgarian and, later on, Thai or Chinese. 
The Lucene documentation talks about using different Analyzers for non-ISO
languages. Lucene's sandbox is quite empty, and I do not understand
what kind of processing must be done to read the data correctly. I think I
still have problems searching indexed documents that contain accents. So how
should I handle the particularities of each language?

Thanks

A


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]





