Hello,

Your example (Hindi) is probably suffering from a number of search issues.

I don't recommend StandardAnalyzer for this example: it will break words around the dependent vowels, the nukta dot, etc. WhitespaceAnalyzer might be a good start.

Also, is it possible to apply Unicode normalization to your text before indexing it? Normalization will standardize things in Indian languages. In your example, the pha + nukta dot you queried on (U+092B U+093C) is the normalized form, but I wonder if in your text it is encoded as fa (U+095E). If you apply normalization form NFC, it will standardize to pha + nukta dot.

On Thu, May 21, 2009 at 9:26 AM, KK <dioxide.softw...@gmail.com> wrote:

> Hi All,
> I've indexed some docs[non-english] in unicoded utf=8 format. For both
> indexing as well as searching/querying I'm using simpleanalyzer. For english
> texts when I tried with single words its working then I thought of trying
> for non-english texts. So I wrote those words[multiple words] in babelmap[a
> unicode converter] and got the unicode for the text string and tried that as
> query but it din't work. Earlier I've used the same method to query solr
> index which use lucene at the backend. I tried say this query,
> \u0938\u0941\u0939\u093E\u0928\u093E\u0020\u0938\u092B\u093C\u0930
> which is unicoded for some non-english text, but this give me zero search
> result in lucene. I want to know whats going wrong. As I know at the end of
> the day lucene writes my non-english texts in unicodes, so if I'm reading
> say the index it'll have this kind of characters on the disk, right? So when
> I query using the same thing it should work. This used to work perfectly
> well with Solr where I was indexing all docs in unicode utf-8 encoding and
> the query was also unicoded as show above. Can someone point me what is
> going wrong here?
> May be I've to have a look over the analyzer solr was using in the default
> setting[i used the default setting only, and pretty sure it was using lot
> many analyzers/filter factory]. Thanks for all your time and appreciation.
>
> Thanks,
> KK.

-- 
Robert Muir
rcm...@gmail.com
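
P.S. In case it helps, here is a minimal sketch of the normalization step mentioned above. It assumes Java 6 or later, where java.text.Normalizer is available; the class name NfcSketch and the helper nfc() are only illustrative names, not part of Lucene. It shows that the precomposed fa (U+095E) and the pha + nukta sequence (U+092B U+093C) are different strings until both are run through NFC.

```java
import java.text.Normalizer;

// Illustrative sketch only: NfcSketch and nfc() are hypothetical names.
class NfcSketch {

    // Normalize a string to Unicode normalization form NFC.
    // U+095E (DEVANAGARI LETTER FA) is excluded from composition,
    // so NFC leaves it as pha (U+092B) followed by nukta (U+093C).
    static String nfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        // "safar" written with the precomposed fa, as it might be in the indexed text
        String indexed = "\u0938\u095E\u0930";
        // "safar" written with pha + nukta, as in the query from the original mail
        String query = "\u0938\u092B\u093C\u0930";

        // Raw comparison: the two encodings are different code point sequences
        System.out.println("raw equal: " + indexed.equals(query));

        // After NFC on both sides they compare equal
        System.out.println("NFC equal: " + nfc(indexed).equals(nfc(query)));
    }
}
```

The idea would be to apply the same nfc() call to the document text before it is handed to the analyzer at index time, and to the query string at search time, so both sides agree on one encoding.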