Hi, We are having an issue while indexing Chinese Documents in Lucene.
Some background first: Since CJK languages doesn't have space between words, we first have to determine the words from sentences. e.g. a sentence containing characters ABC, it may be segmented into AB, C or A, BC. the problem is sometimes there can be ambiguities in how the sentence should be segmented. It is possible that both AB, C and A, BC are valid segmentations. In this cases we would like to index both segmentation into the index: AB offset (0,1) position 0 C offset (2,2) position 1 A offset (0,0) position 0 BC offset (1,2) position 1 Now the problem is, when someone search using a PhraseQuery (AC) it will find this line ABC because it match A (position 0) and C (position 1). Are there any ways to search for exact match using the offset information instead of the position information ? Best Regards, Cedric --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]