Hello,

For the first problem (indexing different types of documents), you can use the mini-framework from Lucene in Action, which does just that. Get the source code that comes with the book and play with it: http://www.lucenebook.com/
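To give a rough idea of what such a mini-framework can look like, here is a minimal sketch that picks a text extractor based on the file extension. This is not the book's code: the names `HandlerRegistry` and `TextExtractor` are made up for illustration, and only a plain-text extractor is registered. Extractors for PDF, Word or Excel would have to wrap a parser library of your choice (PDFBox or POI, for example), which is not shown here.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical registry that maps file extensions to text extractors. */
public class HandlerRegistry {

    /** Hypothetical interface: turns raw file content into plain text to index. */
    public interface TextExtractor {
        String extract(byte[] content);
    }

    private final Map<String, TextExtractor> byExtension = new HashMap<>();

    /** Registers an extractor for an extension such as "txt" (case-insensitive). */
    public void register(String extension, TextExtractor extractor) {
        byExtension.put(extension.toLowerCase(Locale.ROOT), extractor);
    }

    /** Returns the extractor for the file name, or empty if the type is unknown. */
    public Optional<TextExtractor> lookup(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return Optional.empty(); // no extension, so no handler can be chosen
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return Optional.ofNullable(byExtension.get(ext));
    }

    public static void main(String[] args) {
        HandlerRegistry registry = new HandlerRegistry();
        // Plain text needs no parser; other formats would plug in here.
        registry.register("txt", bytes -> new String(bytes, StandardCharsets.UTF_8));

        byte[] content = "hello lucene".getBytes(StandardCharsets.UTF_8);
        registry.lookup("notes.txt")
                .map(extractor -> extractor.extract(content))
                .ifPresent(System.out::println);
    }
}
```

The extracted text is what you would then put into a Lucene Document field. Keeping the lookup separate from the indexing code means a new file type only needs one more `register` call.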
For the Analyzers, look at what Snowball provides (search lucenebook.com or google.com). Of the languages you listed, I imagine only Dutch is supported. You may be able to find a Bulgarian Analyzer somewhere, but the other two languages may be harder ...

Otis

----- Original Message ----
From: [EMAIL PROTECTED]
To: java-user@lucene.apache.org
Sent: Thu 19 Jan 2006 11:01:07 PM EST
Subject: languages & files

Hi,

I am beginning to work with Lucene and need a few explanations to do what I want. Thanks for your helpful answers.

I have to add Lucene to a Java application and I have two goals:

- To enable search through different types of files, like MS Word, PDF or Excel files. I read that each type of document must be indexed with the appropriate indexer. So how can I do it easily? I found an API called Lius which seems to index different types of documents directly. Does anyone know this product? Other ones?
- Secondly, the search system must work with different languages like Dutch, Turkish, Bulgarian and, tomorrow, Thai or Chinese. The Lucene documentation talks about using different Analyzers for non-ISO languages. Lucene's sandbox is quite empty, and I do not understand what kind of treatment must be done to read the data correctly. I think I still have a problem searching indexed documents with accents. So how should I handle the language particularities?

Thanks
A

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]