Re: Indexing XML document

2007-12-11 Thread Otis Gospodnetic
uesday, December 4, 2007 1:04:45 PM Subject: Indexing XML document Hi all, I want to index an XML file, containing 200 Urdu language (variant of Arabic and Persian) documents. This corpus is in CES format, consisting of information about author and many more, I just want to extract textual data o

RE: Indexing XML document

2007-12-05 Thread Seneviratne_Yasoja
t Ali [mailto:[EMAIL PROTECTED] Sent: Tuesday, December 04, 2007 10:05 AM To: java-user@lucene.apache.org Subject: Indexing XML document Hi all, I want to index an XML file, containing 200 Urdu language (variant of Arabic and Persian) documents. This corpus is in CES format, consisting of inform

Re: Indexing XML document

2007-12-04 Thread Grant Ingersoll
You are on the right path: just extract your content using SAX, and then you can add Fields to a Lucene Document for each document in the corpus. As long as the values are strings, it should be the same as any other indexing task. The key, of course, will be using an Analyzer that understands how to tokenize/stem Urdu.
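
The SAX extraction step described above might look roughly like the sketch below. It uses only the JDK's SAX parser. The element names DOC, DOCNO, TITLE and TEXT are hypothetical placeholders and are not taken from the CES corpus in question; they would need to be replaced with whatever elements the real file uses. The class and method names (CorpusExtractor, DocHandler, extract) are likewise made up for illustration. The Lucene side is left as a comment, since the exact Field constructor depends on the Lucene version in use.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class CorpusExtractor {

    /** Collects one map of field name to text for every DOC element (hypothetical names). */
    static class DocHandler extends DefaultHandler {
        final List<Map<String, String>> docs = new ArrayList<>();
        private Map<String, String> current;
        private final StringBuilder buffer = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            if (qName.equals("DOC")) {
                current = new LinkedHashMap<>();
            }
            // Start collecting text afresh for each element.
            // Assumes the wanted elements contain plain text only, no nested markup.
            buffer.setLength(0);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // SAX may deliver one text node in several chunks, so append rather than assign.
            buffer.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (current == null) {
                return; // outside any DOC element
            }
            switch (qName) {
                case "DOCNO":
                case "TITLE":
                case "TEXT":
                    current.put(qName, buffer.toString().trim());
                    break;
                case "DOC":
                    // At this point each entry of 'current' could be added as a
                    // Field on a new Lucene Document and passed to an IndexWriter.
                    docs.add(current);
                    current = null;
                    break;
                default:
                    break; // ignore author information and other metadata
            }
        }
    }

    /** Parses the XML string and returns the extracted fields, one map per document. */
    static List<Map<String, String>> extract(String xml) throws Exception {
        DocHandler handler = new DocHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.docs;
    }

    public static void main(String[] args) throws Exception {
        // Tiny made-up sample; the text is the word "Urdu" written with Unicode escapes.
        String sample = "<CORPUS><DOC><DOCNO>1</DOCNO><TITLE>Sample</TITLE>"
                + "<TEXT>\u0627\u0631\u062F\u0648</TEXT></DOC></CORPUS>";
        for (Map<String, String> doc : extract(sample)) {
            System.out.println(doc);
        }
    }
}
```

Keeping the extraction separate from the indexing makes it easy to check that the Urdu text comes out of the parser intact before any Analyzer is involved. For a real corpus file, a FileInputStream can be wrapped in the InputSource instead of the StringReader, so that the parser picks up the encoding declared in the XML prolog.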

Indexing XML document

2007-12-04 Thread Liaqat Ali
Hi all, I want to index an XML file, containing 200 Urdu language (variant of Arabic and Persian) documents. This corpus is in CES format, consisting of information about author and many more, I just want to extract textual data of each document and relative Doc number and title in each documen