Luke (the Lucene index browser) is your friend. Use it to see which terms actually ended up in your index.
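
A complementary check that needs only the JDK is to confirm that the strings you index and search really contain Khmer code points once they are decoded. The sketch below is only an illustration under that assumption: CodePointDump, describe and countInBlock are hypothetical names, not part of Lucene or Luke. It uses the standard Character.UnicodeBlock class to label each code point.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Lucene): shows which Unicode blocks a string's characters fall in. */
final class CodePointDump {

    /** Returns one "U+XXXX BLOCK" entry per code point, skipping whitespace. */
    static List<String> describe(String s) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.isWhitespace(cp)) {
                continue;
            }
            out.add(String.format("U+%04X %s", cp, Character.UnicodeBlock.of(cp)));
        }
        return out;
    }

    /** Counts the code points of s that belong to the given Unicode block. */
    static int countInBlock(String s, Character.UnicodeBlock block) {
        int n = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);
            if (block.equals(Character.UnicodeBlock.of(cp))) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        // Two Khmer letters written as escapes so the source file encoding does not matter.
        String sample = "abc \u1780\u1781";
        for (String entry : describe(sample)) {
            System.out.println(entry);
        }
        System.out.println("Khmer code points: "
                + countInBlock(sample, Character.UnicodeBlock.KHMER));
    }
}
```

If the text read from the file reports no Khmer code points at all, the problem is in how the file is decoded rather than in Lucene.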

On Jan 24, 2007, at 5:29 AM, Fournaux Nicolas wrote:

Good morning all (or good afternoon),

I have used Lucene many times before to search text in French or English, and it all worked fine :-)

Now I have a new challenge: I need to use Lucene with Khmer (the language of Cambodia; its script looks similar to Thai or Indian scripts).

It doesn’t work. My code runs without errors but finds no results. The code is below.

I thought UTF-8 was 100% handled by Java and that we had “nothing to do”.

My code works fine when I use English words.

Thanks in advance for your help :-)

Here is my source code:



/****************************************************************************/

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStreamReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class TestKhmer
{
    public static void main(String[] args) throws Exception
    {
        Analyzer analyzer = new StandardAnalyzer();
        Directory directory = FSDirectory.getDirectory("C:\\Folder\\indexLucene", true);
        IndexWriter iwriter = new IndexWriter(directory, analyzer, true);

        iwriter.setMaxFieldLength(25000);

        Document doc = new Document();
        // This file was saved as UTF-8 by UltraEdit; when I open it I see my Khmer characters.
        String text = getContents("C:\\Folder\\file.txt");

        Field field = new Field("text", text, Field.Store.YES, Field.Index.TOKENIZED);
        Field field2 = new Field("filename", "file.txt", Field.Store.YES, Field.Index.TOKENIZED);
        doc.add(field);
        doc.add(field2);
        iwriter.addDocument(doc);

        iwriter.close();

        // Now search the index:
        IndexSearcher isearcher = new IndexSearcher(directory);
        // My search string is located in a text file, also saved as UTF-8 by UltraEdit;
        // when I open it I see my Khmer characters.
        String stringToSearch = getContents("C:\\Folder\\dataToSearch.txt");
        String stringQuery = "text:" + stringToSearch;
        QueryParser queryParser = new QueryParser("text", analyzer);
        Query query = queryParser.parse(stringQuery);

        Hits hits = isearcher.search(query);

        // Iterate through the results:
        for (int i = 0; i < hits.length(); i++)
        {
            Document hitDoc = hits.doc(i);
            System.out.println("Result : " + hitDoc.get("filename"));
        }

        isearcher.close();
        directory.close();
    }

    private static String getContents(String path) throws Exception
    {
        String line = null;
        StringBuffer sb = new StringBuffer();
        BufferedReader br;

        try
        {
            // As I told you, my file is in UTF-8 format.
            br = new BufferedReader(new InputStreamReader(new FileInputStream(path), "UTF-8"));

            while ((line = br.readLine()) != null)
            {
                sb.append(line + "\n");
            }
            br.close();
        } catch (Exception e)
        {
            e.printStackTrace();
        }

        return sb.toString();
    }
}

/****************************************************************************/
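
One thing worth ruling out with files saved as UTF-8 by an editor: some editors write a byte order mark (BOM) at the start of the file, and Java's UTF-8 decoder keeps it as U+FEFF at the start of the string instead of dropping it. Whether that happened here is an assumption; nothing in the message confirms it. A minimal sketch of a reader that removes such a mark follows. The class Utf8Text and its methods are hypothetical, and it uses java.nio APIs from Java 7 and later, so it is newer than the code above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical helper: decodes UTF-8 and drops a leading byte order mark if present. */
final class Utf8Text {

    /** Decodes UTF-8 bytes and removes one leading U+FEFF, which Java's decoder keeps. */
    static String decode(byte[] bytes) {
        String s = new String(bytes, StandardCharsets.UTF_8);
        if (!s.isEmpty() && s.charAt(0) == '\uFEFF') {
            return s.substring(1);
        }
        return s;
    }

    /** Reads a whole file as UTF-8 text, without a leading BOM. */
    static String read(String path) throws IOException {
        return decode(Files.readAllBytes(Paths.get(path)));
    }

    public static void main(String[] args) throws IOException {
        // Prints each file named on the command line after decoding it.
        for (String path : args) {
            System.out.println(read(path));
        }
    }
}
```

For example, Utf8Text.read("C:\\Folder\\dataToSearch.txt") could stand in for getContents when building the query string, so that a stray U+FEFF never reaches the query parser.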



Best regards,

Nicolas




------------------------------------------------------
Grant Ingersoll
http://www.grantingersoll.com/
http://www.paperoftheweek.com/


