Dear Rasha,

Sorry for the delay. I've indexed Arabic and English seamlessly on Lucene; the only thing you have to watch out for is stemming. As for indexing PDFs, I have not used that part of the API, but from experience this comes down to using, or in some cases forcing, the correct encoding. Debug this by reducing your setup to the lowest common denominator: for example, if you're doing this from a web service, try it first from the command prompt, so that you only have to contend with the OS encoding (UTF-8 is highly recommended) and not the browser and server encodings as well.
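
To make the command-prompt check concrete, here is a minimal sketch in plain Java (no Lucene or PDF library involved). The class name EncodingCheck and its two helper methods are made up for illustration, and the Persian test word is hard-coded rather than extracted from a PDF. It decodes the same bytes once as UTF-8 and once as ISO-8859-1, and prints code points rather than glyphs so the result is readable even if the console cannot display Persian.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of Lucene.
public class EncodingCheck {

    // Decode raw bytes with an explicit charset instead of the platform default.
    public static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    // Render each code point as U+XXXX, separated by spaces.
    public static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // The word "Farsi" in Persian script, written with escapes so the
        // source file's own encoding cannot interfere with the test.
        String word = "\u0641\u0627\u0631\u0633\u06CC";
        byte[] raw = word.getBytes(StandardCharsets.UTF_8);

        // The platform default is what you get whenever no charset is specified.
        System.out.println("default charset: " + Charset.defaultCharset());

        // Correct decoding: five code points in the Arabic block.
        System.out.println("as UTF-8:        "
                + codePoints(decode(raw, StandardCharsets.UTF_8)));

        // Wrong decoding: each byte becomes its own Latin-1 character.
        System.out.println("as ISO-8859-1:   "
                + codePoints(decode(raw, StandardCharsets.ISO_8859_1)));
    }
}
```

Compile and run it from the prompt with javac EncodingCheck.java followed by java EncodingCheck. If the text your indexer produces resembles the ISO-8859-1 line (twice as many characters, none of them in the Arabic block), the bytes are probably being decoded with the wrong charset somewhere between extraction and indexing. Starting the JVM with -Dfile.encoding=UTF-8 is one common way to force the default, though how reliably that works depends on your Java version.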

A more detailed example of the problem you're facing would help me understand it better.

Nader

Rasha wrote:

Dear Nader,

I have a big problem indexing PDFs that contain Persian words.

lucenePDFIndexer cannot index them, and the words indexed from the PDF are unusable.

Is there a way to index them properly?


regards,
rasha malek







--

Nader S. Henein
Senior Applications Architect

Bayt.com




