Re: Indexing Non-English text

Grant Ingersoll Tue, 04 Dec 2007 03:53:04 -0800

FileReader is dependent on your local locale. http://wiki.apache.org/lucene-java/IndexingOtherLanguageshas some useful tips. Essentially, you need to make sure youcontrol the encodings at all input points of your application. Lucenewill do the appropriate thing internally.


On Dec 4, 2007, at 5:53 AM, Liaqat Ali wrote:

Hi,
I m facing a problem while indexing a small .txt file with Lucene.The file which i want to index with lucene is in Urdu language(varient of Arabic and Persian). But the Index i get is in Unicodeform, not in the real form (original Urdu text). This program worksgood for a file in English language. This is the code i use forindexing..
      FileReader file = new FileReader ("urdoc.txt");
      BufferedReader buff = new BufferedReader(file);
      String line = buff.readLine();
      boolean eof = false;
      buff.close();
      String indexDir = "D:\\index";
             Analyzer analyzer = new StandardAnalyzer();
          boolean createFlag = true;
      IndexWriter writer =
                  new IndexWriter(indexDir, analyzer, createFlag);
          Document document  = new Document();
      document.add(new Field("fieldname",line, Field.Store.YES,
      Field.Index.TOKENIZED));
          writer.addDocument(document);
          writer.close();
Kindly guide me, what I should do, would i have to change this codeor whatever else do you suggest?
Liaqat

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ




---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: Indexing Non-English text

Reply via email to