On Oct 24, 2007, at 3:07 AM, Liaqat Ali wrote:

Hi All,

I'm developing a search engine for the Urdu language, and I want to use Lucene for that purpose. The situation is as follows:

---I have a corpus of 2000 Urdu (variant of Persian and Arabic) documents in XML form. How do I build an index of them using Lucene?

You will have to use some sort of XML parser (SAX or a pull parser) to extract the content you want and create Lucene Documents from it. Have a look at the tutorial on the Lucene home page for examples.
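As a rough illustration of the extraction step, here is a minimal sketch that uses only the JDK's SAX parser to pull the text of each leaf element into a map keyed by element name. The class name UrduXmlExtractor and the element names title and body are assumptions, since the schema of the corpus is not given here. Each map entry would then become one field of a Lucene Document; the Lucene calls are left out because they differ between versions.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: not part of Lucene.
class UrduXmlExtractor {

    // Returns the trimmed text of every leaf element, keyed by element name.
    // Repeated elements are joined with a single space.
    static Map<String, String> extractFields(String xml) {
        final Map<String, String> fields = new LinkedHashMap<>();
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                text.setLength(0); // start collecting text for this element
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // SAX may deliver one text node in several chunks
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                String value = text.toString().trim();
                if (!value.isEmpty()) {
                    fields.merge(qName, value, (a, b) -> a + " " + b);
                }
                text.setLength(0);
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return fields;
    }

    public static void main(String[] args) {
        // Assumed document layout, for illustration only
        String xml = "<doc><title>Sample title</title>"
                   + "<body>Sample body text</body></doc>";
        System.out.println(extractFields(xml));
    }
}
```

Passing the XML through a Reader, as above, or through an InputStream with the correct encoding declaration keeps the Urdu text intact as Unicode. For files on disk, the same handler can be handed to parser.parse(File, DefaultHandler).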

---Some stemming technique will be needed while indexing, but there is no stemmer available for the Urdu language.

More than likely, you will have to write your own. There are some Arabic analyzers out there; perhaps you could use them as a starting point.
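To give an idea of what a first cut at your own stemmer might look like, here is a minimal sketch of a light suffix stripper in plain Java. It is not a Lucene analyzer and not an established Urdu stemming algorithm: the class name SuffixStripper is hypothetical, the suffix list is supplied by the caller, and the single suffix in main is only an illustration. Inside Lucene, this kind of logic would normally sit in a custom TokenFilter within your Analyzer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: a light, rule-based suffix stripper.
class SuffixStripper {
    private final List<String> suffixes;
    private final int minStemLength;

    SuffixStripper(List<String> suffixes, int minStemLength) {
        List<String> sorted = new ArrayList<>();
        for (String s : suffixes) {
            if (!s.isEmpty()) {
                sorted.add(s); // ignore empty suffixes
            }
        }
        // Try longer suffixes first so the longest match wins
        sorted.sort((a, b) -> b.length() - a.length());
        this.suffixes = sorted;
        this.minStemLength = minStemLength;
    }

    // Drops the Arabic-script marks U+064B..U+0652 (tanween, short vowels,
    // shadda, sukun), which are optional in writing.
    static String stripDiacritics(String token) {
        StringBuilder sb = new StringBuilder(token.length());
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            if (c < '\u064B' || c > '\u0652') {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Removes the longest matching suffix, provided the remaining stem
    // still has at least minStemLength characters.
    String stem(String token) {
        String word = stripDiacritics(token);
        for (String suffix : suffixes) {
            int stemLength = word.length() - suffix.length();
            if (stemLength >= minStemLength && word.endsWith(suffix)) {
                return word.substring(0, stemLength);
            }
        }
        return word;
    }

    public static void main(String[] args) {
        // Illustrative only: one plural ending (yeh + noon ghunna).
        // A usable suffix list needs proper linguistic work.
        SuffixStripper stemmer =
            new SuffixStripper(List.of("\u06CC\u06BA"), 2);
        // "kitaben" (books) written with Unicode escapes
        String plural = "\u06A9\u062A\u0627\u0628\u06CC\u06BA";
        System.out.println(stemmer.stem(plural));
    }
}
```

The minimum stem length is there to keep short words from being stripped down to nothing. Whatever rules you end up with, the same analysis has to be applied at index time and at query time, otherwise stemmed index terms will not match unstemmed query terms.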


---I have developed a GUI using HTML and have Java servlets for searching. How do I integrate Lucene with my own servlets?

This really is up to you, but essentially you need to set up an IndexSearcher and create queries to do searches. Again, have a look at the tutorial as a way of getting started.



--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com

Lucene Boot Camp Training:
ApacheCon Atlanta, Nov. 12, 2007. Sign up now! http://www.apachecon.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ


