Re: StandardAnalyzer Problem with Apostrophes

2006-11-14 Thread Karel Tejnora
The problem is in StandardTokenizer, so use an Analyzer with this method:

    public TokenStream tokenStream(String fieldName, Reader reader) {
        TokenStream result = new LowerCaseTokenizer(reader);
        result = new StopFilter(result, stopSet);
        return result;
    }

if you need everything standard analyzer does Fr
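To illustrate why a letter-based tokenizer splits l'observatoire where StandardTokenizer does not, here is a minimal plain-Java sketch of the same idea: break at every non-letter, lowercase, then drop stop words. It does not use Lucene at all; the class name ApostropheSplitSketch, the tokenize helper, and the stop-word list are hypothetical and only approximate what LowerCaseTokenizer followed by StopFilter would produce.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ApostropheSplitSketch {

    // Hypothetical helper, not a Lucene API: emits maximal runs of letters,
    // lowercased, and skips any run found in stopWords.
    static List<String> tokenize(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        // Iterate one position past the end so the last token is flushed.
        for (int i = 0; i <= text.length(); i++) {
            boolean isLetter = i < text.length() && Character.isLetter(text.charAt(i));
            if (isLetter) {
                current.append(Character.toLowerCase(text.charAt(i)));
            } else if (current.length() > 0) {
                // Any non-letter, including an apostrophe, ends the token.
                String token = current.toString();
                if (!stopWords.contains(token)) {
                    tokens.add(token);
                }
                current.setLength(0);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Illustrative stop list only; a real index would use its own.
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "a", "of"));
        System.out.println(tokenize("L'observatoire of Paris", stopWords));
        // prints [l, observatoire, paris]
    }
}
```

Note that with a French stop list the elided article "l" would probably be filtered out as well, leaving only observatoire in the index.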

Re: StandardAnalyzer Problem with Apostrophes

2006-11-14 Thread Sarah Hunter
That was my first thought as well, but it looks like APOSTROPHE is already the one that I want. As you can see, from StandardAnalyzer.jj ---

    TOKEN : { // token patterns
      // basic word: a sequence of digits & letters
      <ALPHANUM: (<LETTER>|<DIGIT>|<KOREAN>)+ >
      // internal ap

Re: StandardAnalyzer Problem with Apostrophes

2006-11-14 Thread Karel Tejnora
The apostrophe is recognized as part of a word; StandardAnalyzer is mostly English-oriented. One way around this is to swap the apostrophes: "normal" with unusual. StandardAnalyzer.java line 40-44 APOSTROPHE: token = jj_consume_token(APOSTROPHE); -

StandardAnalyzer Problem with Apostrophes

2006-11-13 Thread Sarah Hunter
Hi there, Any ideas you have about the following would be greatly appreciated. I'd like apostrophes to break up a word into two for indexing, i.e., the French l'observatoire would be indexed as two separate tokens, l observatoire. My understanding from reading documentation and list archives is tha