Luke is your friend. Use it to see what you have in your index.
On Jan 24, 2007, at 5:29 AM, Fournaux Nicolas wrote:
Good morning all (or good afternoon)
I used Lucene many times before, to search text in French Or
English. All
worked fine :-)
But now I have a new challenge, I need to use Lucene with Khmer
(Khmer is
the Cambodia’s language, it looks like Thai or Indian)
But it doesn’t work, my code is well executed but it found no
results, I
give you my code below
I thought UTF-8 is 100% handled by Java and that we have “nothing
to do”
My code is working fine when I use English words.
Thanks in advance for your help :-)
Here is my source code :
/
**********************************************************************
*****
*************************************/
public class TestKhmer
{
public static void main(String[] args) throws Exception
{
Analyzer analyzer = new StandardAnalyzer();
Directory directory = FSDirectory.getDirectory("C:\\Folder\
\indexLucene",
true);
IndexWriter iwriter = new IndexWriter(directory, analyzer, true);
iwriter.setMaxFieldLength(25000);
Document doc = new Document();
String text = getContents("C:\\Folder\
\file.txt");
// this file was saved as UTF-8 format by UltraEdit , when I open
it I see
my Khmer charactere
Field field = new Field("text", text,
Field.Store.YES, Field.Index.TOKENIZED);
Field field2 = new Field("filename",
"file.txt" ,
Field.Store.YES, Field.Index.TOKENIZED);
doc.add(field);
doc.add(field2);
iwriter.addDocument(doc);
iwriter.close();
// Now search the index:
IndexSearcher isearcher = new
IndexSearcher(directory);
String stringToSearch =
getContents("C:\\Folder\\dataToSearch.txt"); // my search string is
located
in a text file, this file was saved as UTF-8 format by UltraEdit,
when I
open it I see my Khmer charactere
String stringQuery = "text:" +
stringToSearch ;
QueryParser queryParser = new QueryParser
("text" ,
analyzer);
Query query = queryParser.parse(stringQuery);
Hits hits = isearcher.search(query);
// Iterate through the results:
for (int i = 0; i < hits.length(); i++)
{
Document hitDoc = hits.doc(i);
System.out.println("Result : " +
hitDoc.get("filename"));
}
isearcher.close();
directory.close();
}
private static String getContents(String path) throws Exception
{
String line = null;
StringBuffer sb = new StringBuffer();
BufferedReader br;
try
{
br = new BufferedReader( new
InputStreamReader( new FileInputStream(path), "UTF-8")); // as I
told you,
my file is in UTF-8 format
while((line = br.readLine()) != null)
{
sb.append(line + "\n");
}
} catch (Exception e)
{
e.printStackTrace();
}
return sb.toString();
}
}
/
**********************************************************************
*****
*************************************/
Best regards
Nicolas
--
No virus found in this outgoing message.
Checked by AVG Free Edition.
Version: 7.1.410 / Virus Database: 268.17.8/649 - Release Date:
1/23/2007
------------------------------------------------------
Grant Ingersoll
http://www.grantingersoll.com/
http://www.paperoftheweek.com/
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]