Re: Lucene with Khmer? (Language in Cambodia)

Grant Ingersoll Wed, 24 Jan 2007 05:04:48 -0800

Luke (the Lucene index browser) is your friend. Use it to see which terms actually ended up in your index.


On Jan 24, 2007, at 5:29 AM, Fournaux Nicolas wrote:

Good morning all (or good afternoon)

I have used Lucene many times before to search text in French or English, and everything worked fine :-)

But now I have a new challenge: I need to use Lucene with Khmer (Khmer is the language of Cambodia; its script looks like Thai or Indian scripts).

But it doesn’t work: my code runs without errors, but the search finds no results. My code is below.

I thought UTF-8 was 100% handled by Java and that there was “nothing to do”.




My code is working fine when I use English words.
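A quick way to test the assumption that Java "just handles" UTF-8 is to dump the Unicode code points of the strings being indexed and searched: correctly decoded Khmer letters fall in the Khmer block, U+1780 to U+17FF, while decoding with the wrong charset produces unrelated characters. The sketch below uses only the JDK; `CodePointDump` and `codePoints` are hypothetical names, and the sample bytes are the UTF-8 encoding of the Khmer letter KA (U+1780).

```java
import java.nio.charset.Charset;

public class CodePointDump {

    // Renders every Unicode code point in s as "U+XXXX", separated by single spaces
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp); // step over surrogate pairs correctly
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // UTF-8 encoding of the Khmer letter KA (U+1780)
        byte[] bytes = {(byte) 0xE1, (byte) 0x9E, (byte) 0x80};

        // Decoded with the right charset: one Khmer code point
        System.out.println(codePoints(new String(bytes, Charset.forName("UTF-8"))));      // U+1780
        // Decoded with the wrong charset: three unrelated Latin-1 code points
        System.out.println(codePoints(new String(bytes, Charset.forName("ISO-8859-1")))); // U+00E1 U+009E U+0080
    }
}
```

Printing `codePoints(text)` and `codePoints(stringToSearch)` in TestKhmer would show whether both strings really contain the Khmer characters you expect, and nothing else.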



Thanks in advance for your help :-)



Here is my source code:

/****************************************************************************************************************/

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStreamReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class TestKhmer
{
    public static void main(String[] args) throws Exception
    {
        // The same analyzer is used for indexing and for parsing the query
        Analyzer analyzer = new StandardAnalyzer();
        Directory directory = FSDirectory.getDirectory("C:\\Folder\\indexLucene", true);
        IndexWriter iwriter = new IndexWriter(directory, analyzer, true);
        iwriter.setMaxFieldLength(25000);

        Document doc = new Document();
        // this file was saved as UTF-8 by UltraEdit; when I open it I see my Khmer characters
        String text = getContents("C:\\Folder\\file.txt");

        Field field = new Field("text", text, Field.Store.YES, Field.Index.TOKENIZED);
        Field field2 = new Field("filename", "file.txt", Field.Store.YES, Field.Index.TOKENIZED);
        doc.add(field);
        doc.add(field2);
        iwriter.addDocument(doc);
        iwriter.close();

        // Now search the index:
        IndexSearcher isearcher = new IndexSearcher(directory);
        // my search string is in a text file, also saved as UTF-8 by UltraEdit;
        // when I open it I see my Khmer characters
        String stringToSearch = getContents("C:\\Folder\\dataToSearch.txt");
        String stringQuery = "text:" + stringToSearch;
        QueryParser queryParser = new QueryParser("text", analyzer);
        Query query = queryParser.parse(stringQuery);
        Hits hits = isearcher.search(query);

        // Iterate through the results:
        for (int i = 0; i < hits.length(); i++)
        {
            Document hitDoc = hits.doc(i);
            System.out.println("Result : " + hitDoc.get("filename"));
        }

        isearcher.close();
        directory.close();
    }



    // Reads a whole file, decoding it as UTF-8 (as I told you, my files are in UTF-8 format)
    private static String getContents(String path) throws Exception
    {
        StringBuffer sb = new StringBuffer();
        BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream(path), "UTF-8"));
        try
        {
            String line;
            while ((line = br.readLine()) != null)
            {
                sb.append(line).append("\n");
            }
        }
        finally
        {
            // always release the file; read errors propagate instead of silently yielding ""
            br.close();
        }
        return sb.toString();
    }
}

/****************************************************************************************************************/
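One more thing worth ruling out: Windows editors can save "UTF-8" files with a byte-order mark (the bytes EF BB BF), and Java's UTF-8 decoder does not remove it, so the decoded text starts with an invisible U+FEFF character that is not part of the Khmer text. Whether that character changes tokenization depends on the analyzer, but it is cheap to strip. A minimal sketch, assuming the files may carry such a mark; `BomAwareReader`, `readUtf8` and `stripBom` are hypothetical names, not part of Lucene or the JDK.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

public class BomAwareReader {

    // Reads all text from in as UTF-8 (one '\n' per line) and drops a leading BOM, if any
    static String readUtf8(InputStream in) throws IOException {
        BufferedReader br = new BufferedReader(new InputStreamReader(in, Charset.forName("UTF-8")));
        try {
            StringBuilder sb = new StringBuilder();
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line).append('\n');
            }
            return stripBom(sb.toString());
        } finally {
            br.close();
        }
    }

    // Removes a single leading U+FEFF character; other strings are returned unchanged
    static String stripBom(String s) {
        return (s.length() > 0 && s.charAt(0) == '\uFEFF') ? s.substring(1) : s;
    }

    public static void main(String[] args) throws IOException {
        // EF BB BF is the UTF-8 BOM; E1 9E 80 is the Khmer letter KA (U+1780)
        byte[] bytes = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, (byte) 0xE1, (byte) 0x9E, (byte) 0x80};

        String raw = new String(bytes, Charset.forName("UTF-8"));
        System.out.println(raw.length());   // 2: U+FEFF followed by U+1780
        String clean = readUtf8(new ByteArrayInputStream(bytes));
        System.out.println(clean.length()); // 2: U+1780 followed by the appended newline
    }
}
```

In TestKhmer, getContents(path) could then simply return readUtf8(new FileInputStream(path)).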



Best regards



Nicolas




------------------------------------------------------
Grant Ingersoll
http://www.grantingersoll.com/
http://www.paperoftheweek.com/


