Tuesday, December 4, 2007 1:04:45 PM
Subject: Indexing XML document
Hi all,
I want to index an XML file containing 200 Urdu language (variant of
Arabic and Persian) documents. The corpus is in CES format and includes
information about the author and much more. I just want to extract the
textual data of each document, along with the corresponding Doc number
and title.
t Ali [mailto:[EMAIL PROTECTED]
Sent: Tuesday, December 04, 2007 10:05 AM
To: java-user@lucene.apache.org
Subject: Indexing XML document
You are on the right path: just extract your content using SAX, and
then you can add Fields to Lucene for each document. As long as the
values are strings, it should be the same as any indexing task. The
key, of course, will be using an Analyzer that understands how to
tokenize/stem Urdu.
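
To make the SAX step concrete, here is a minimal sketch using only the
JDK's built-in SAX parser. The element and attribute names ("doc", "id",
"title", "p") and the class name CesSaxExtractor are assumptions made for
illustration, not taken from the CES corpus in question, so adjust them
to the real markup. The Lucene step is left as a comment because the
Field constructors differ between Lucene versions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class CesSaxExtractor {

    /** One extracted document: its number, title and body text. */
    static final class Record {
        final String docNumber;
        final String title;
        final String text;

        Record(String docNumber, String title, String text) {
            this.docNumber = docNumber;
            this.title = title;
            this.text = text;
        }
    }

    // Hypothetical element/attribute names; replace with the corpus's own.
    private static final String DOC = "doc";
    private static final String DOC_ID_ATTR = "id";
    private static final String TITLE = "title";
    private static final String PARA = "p";

    private static final class Handler extends DefaultHandler {
        final List<Record> records = new ArrayList<>();
        private final StringBuilder chars = new StringBuilder();
        private StringBuilder body; // non-null only while inside a <doc>
        private String docNumber;
        private String title;

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            // Simplification: assumes no inline markup inside <title>/<p>.
            chars.setLength(0);
            if (DOC.equals(qName)) {
                docNumber = atts.getValue(DOC_ID_ATTR);
                title = "";
                body = new StringBuilder();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // May be called several times per text node, so accumulate.
            chars.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (body == null) {
                return; // outside any <doc>
            }
            if (TITLE.equals(qName)) {
                title = chars.toString().trim();
            } else if (PARA.equals(qName)) {
                if (body.length() > 0) {
                    body.append('\n');
                }
                body.append(chars.toString().trim());
            } else if (DOC.equals(qName)) {
                records.add(new Record(docNumber, title, body.toString()));
                body = null;
            }
            // Other elements (author etc.) are ignored.
            chars.setLength(0);
        }
    }

    /** Parses the XML and returns one Record per <doc> element. */
    static List<Record> parse(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        Handler handler = new Handler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.records;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<corpus><doc id=\"1\"><author>A</author>"
                + "<title>Sample</title><p>\u0627\u0631\u062F\u0648</p>"
                + "</doc></corpus>";
        for (Record r : parse(xml)) {
            // Here each value would become a field of one Lucene document,
            // indexed with an Analyzer suitable for Urdu.
            System.out.println(r.docNumber + " | " + r.title + " | " + r.text);
        }
    }
}
```

Since SAX hands the text over as ordinary Java Strings, the Urdu content
needs no special handling at this stage; the language-specific work is
all in the Analyzer used at indexing time.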
Hi all,
I want to index an XML file containing 200 Urdu language (variant of
Arabic and Persian) documents. The corpus is in CES format and includes
information about the author and much more. I just want to extract the
textual data of each document, along with the corresponding Doc number
and title.