Thanks for your answer. No, those words are not part of the stop word file (I'm using the one that comes with the Japanese analyzer in lucene-kuromoji-3.6.1.jar).
My Japanese contact told me that the first sentence means "I am Japanese" and the second one is a unit of length.

Jerome

From: Swapnil Patil <ping.swap...@gmail.com>
To: java-user@lucene.apache.org
Date: 01/18/2013 02:33 PM
Subject: Re: Japanese analyzer

Hi,

I just translated these words using Google Translate; they look like Japanese I [

Can you check whether these words are in your stop word file? If they exist in your stop word file, then you will not get them in the token stream.

-Swapnil

On Fri, Jan 18, 2013 at 6:58 PM, Jerome Lanneluc <jerome_lanne...@fr.ibm.com> wrote:
> [私 日本人