Liaqat,
Out of curiosity - what are you using to analyze and index Urdu? AraMorph or
something else?
Thanks,
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
----- Original Message -----
From: Liaqat Ali <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Tuesday, Decembe
The example from Grant's earlier reply uses UTF-8:
http://wiki.apache.org/lucene-java/IndexingOtherLanguages
I tried out the Urdu text from your email. Once I had converted it to UTF-8,
Lucene seemed to index and search it fine, and SAX parsed it without problems.
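For what it's worth, the conversion step can be sketched in plain Java as
below. This is only a minimal illustration, not what I actually ran: the
class and method names (Utf8Transcoder, toUtf8) are made up for the example,
and you have to know which charset your source files were written in.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Lucene or any other library.
class Utf8Transcoder {
    // Decode the bytes using the charset they were written in,
    // then re-encode the resulting text as UTF-8.
    static byte[] toUtf8(byte[] input, Charset sourceCharset) {
        String text = new String(input, sourceCharset);
        return text.getBytes(StandardCharsets.UTF_8);
    }
}
```

For example, Utf8Transcoder.toUtf8(bytes, StandardCharsets.UTF_16) would
re-encode UTF-16 input. If the wrong source charset is passed in, the text
will be silently garbled, so it is worth checking a sample by eye.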
-----Original Message-----
From: Liaqat Ali [ma
You are on the right path. Just extract your content using SAX, and
then you can add Fields to a Lucene Document for each document you
parse. As long as the values are strings, it should be the same as any
other indexing task. The key, of course, will be using an Analyzer that
understands how to tokenize/stem Urdu.
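
A rough sketch of the SAX extraction step, using only the JDK, might look
like the following. The element names (doc, title, body) and the class name
SaxFieldExtractor are assumptions for illustration; your XML will differ.
The sketch only collects name/value string pairs per document. In a real
indexer, each pair would become a Field on a Lucene Document, which is then
handed to an IndexWriter configured with your Analyzer.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: assumes each record is a <doc> element whose
// children are flat text elements such as <title> and <body>.
class SaxFieldExtractor extends DefaultHandler {
    // One map per <doc>: child element name -> text content.
    final List<Map<String, String>> docs = new ArrayList<>();
    private Map<String, String> current;
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals("doc")) {
            current = new LinkedHashMap<>();
        }
        text.setLength(0); // start collecting text for this element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times per element, so append.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("doc")) {
            // This is where a Lucene Document would be built and indexed.
            docs.add(current);
            current = null;
        } else if (current != null) {
            current.put(qName, text.toString().trim());
        }
    }

    // Parses a UTF-8 XML string and returns the extracted fields per document.
    static List<Map<String, String>> parse(String xml) {
        SaxFieldExtractor handler = new SaxFieldExtractor();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
        return handler.docs;
    }
}
```

Since Java strings are Unicode internally, the Urdu values come out of the
parser as ordinary Strings and need no special handling before indexing.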