Hello, I'm currently working out some problems when searching for Tibetan
characters, more specifically \u0f10-\u0f19. We are using the
StandardAnalyzer (3.4), and I've narrowed the problem down to
StandardTokenizerImpl throwing away these characters, i.e. in
getNextToken() they fall through case 1:
Thanks Robert. That makes sense. Do you have a link handy where I can
find this information, i.e. the word boundary/punctuation properties for
any Unicode character?
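
In the meantime, one rough way to inspect this locally is to ask the JDK
for the general category of each code point in the range. This is only a
sketch: the class and method names (CodePointCategories, categoryName) are
made up for illustration, the results depend on the Unicode version your JDK
ships, and the general category is not the same thing as the word-break
property the tokenizer actually uses. It should still show whether a code
point is a letter/digit or punctuation/symbol/mark.

```java
public class CodePointCategories {

    // Hypothetical helper: maps the JDK's general-category constant to the
    // usual two-letter Unicode abbreviation for a few common categories.
    static String categoryName(int codePoint) {
        switch (Character.getType(codePoint)) {
            case Character.NON_SPACING_MARK:     return "Mn";
            case Character.OTHER_PUNCTUATION:    return "Po";
            case Character.OTHER_SYMBOL:         return "So";
            case Character.OTHER_LETTER:         return "Lo";
            case Character.DECIMAL_DIGIT_NUMBER: return "Nd";
            default: return "other(" + Character.getType(codePoint) + ")";
        }
    }

    public static void main(String[] args) {
        // Print the category of each code point in the range from the
        // original question, plus whether the JDK treats it as a letter
        // or digit.
        for (int cp = 0x0F10; cp <= 0x0F19; cp++) {
            System.out.printf("U+%04X %s letterOrDigit=%b%n",
                    cp, categoryName(cp), Character.isLetterOrDigit(cp));
        }
    }
}
```

If none of the code points in the range are reported as letters or digits,
that would be consistent with the tokenizer not emitting them as tokens.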
On Fri, Mar 30, 2012 at 12:57 PM, Robert Muir wrote:
> On Fri, Mar 30, 2012 at 12:46 PM, Denis Brodeur
> wrote:
> > Hello, I