Re: Indexing XML document

2007-12-11 Thread Otis Gospodnetic
Liaqat, Out of curiosity - what are you using to analyze and index Urdu? AraMorph or something else? Thanks, Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Liaqat Ali <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Tuesday, Decembe

RE: Indexing XML document

2007-12-05 Thread Seneviratne_Yasoja
The example from Grant's earlier reply uses UTF-8: http://wiki.apache.org/lucene-java/IndexingOtherLanguages I tried out the Urdu text from your email. After I converted it to UTF-8, Lucene seemed to index and search it OK, and SAX parsed it without problems as well. -Original Message- From: Liaqat Ali [ma
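
For illustration, the "convert it to UTF-8 first" step could look roughly like the sketch below. The thread does not say which encoding the original Urdu text was in, so UTF-16 is only an assumption here, and the class name ToUtf8 and the method transcode are hypothetical names. The idea is to decode the bytes with the encoding they were actually written in, then re-encode the text as UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ToUtf8 {

    // Decode the bytes with their real source encoding, then re-encode as UTF-8.
    // Passing the wrong source charset here silently corrupts the text.
    public static byte[] transcode(byte[] input, Charset sourceCharset) {
        String text = new String(input, sourceCharset);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The word "Urdu" in Urdu script, written with escapes so that the
        // source file's own encoding does not matter.
        String urdu = "\u0627\u0631\u062F\u0648";

        // ASSUMPTION: the input arrives as UTF-16; substitute the real encoding.
        byte[] original = urdu.getBytes(StandardCharsets.UTF_16);

        byte[] utf8 = transcode(original, StandardCharsets.UTF_16);
        String roundTripped = new String(utf8, StandardCharsets.UTF_8);
        System.out.println("Round trip preserved the text: " + roundTripped.equals(urdu));
    }
}
```

Once the bytes are UTF-8, a SAX parser can read them as long as the XML declaration (or its absence) agrees with that encoding.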

Re: Indexing XML document

2007-12-04 Thread Grant Ingersoll
You are on the right path: extract your content using SAX, then add Fields to a Lucene Document for each document in your XML. As long as the values are strings, it should be the same as any other indexing task. The key, of course, will be using an Analyzer that understands how to tokenize and stem Urdu.
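
A minimal sketch of the SAX extraction step, using only the JDK, might look like this. The element names DOC, TITLE and TEXT, the flat one-level layout, and the class name SaxFieldExtractor are all assumptions made for illustration; the thread does not show the actual XML. Each record is collected as a map of field name to string value. With Lucene on the classpath, each map entry would become a Field added to a Document, which is then passed to IndexWriter.addDocument.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SaxFieldExtractor extends DefaultHandler {

    // Hypothetical name of the element that wraps one record.
    private static final String RECORD = "DOC";

    private final List<Map<String, String>> docs = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private Map<String, String> current;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (RECORD.equals(qName)) {
            current = new LinkedHashMap<>();
        }
        text.setLength(0); // discard whitespace seen between elements
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times per element, so accumulate.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (RECORD.equals(qName)) {
            docs.add(current); // record complete: this is where it would be indexed
            current = null;
        } else if (current != null) {
            current.put(qName, text.toString().trim());
        }
        text.setLength(0);
    }

    // Parses the XML and returns one field-name-to-value map per record.
    public static List<Map<String, String>> parse(String xml) {
        SaxFieldExtractor handler = new SaxFieldExtractor();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return handler.docs;
    }

    public static void main(String[] args) {
        String xml = "<DOCS>"
                + "<DOC><TITLE>\u0627\u0631\u062F\u0648</TITLE><TEXT>first body</TEXT></DOC>"
                + "<DOC><TITLE>second</TITLE><TEXT>second body</TEXT></DOC>"
                + "</DOCS>";
        for (Map<String, String> doc : parse(xml)) {
            System.out.println(doc.keySet() + " -> " + doc.size() + " fields");
        }
    }
}
```

Keeping extraction separate from indexing means the parsing can be checked on its own before an Analyzer for Urdu is chosen.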