I used Luke to check the result and found that only c exists as a term; cplusplus is not in the index.
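
To look at the mapping step separately from the rest of the chain, here is a minimal sketch that does not use Lucene at all. It is only an illustration under stated assumptions: CharMapSketch, mapChars, tokenize and analyze are hypothetical names, and the toy tokenizer merely imitates a tokenizer that drops '+' and '$'. It is not StandardTokenizer or MappingCharFilter, so it shows the behaviour I expect, not what the real classes do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Lucene-free illustration of "rewrite characters first, tokenize second".
// All names here are hypothetical; nothing below is Lucene API.
class CharMapSketch {

    // Stand-in for the lowercasing + mapping char filters:
    // lowercase the text, then rewrite "c++" to "cplusplus$" (same mapping as in the thread).
    static String mapChars(String text) {
        return text.toLowerCase(Locale.ROOT).replace("c++", "cplusplus$");
    }

    // Stand-in for a tokenizer that treats every non-alphanumeric character
    // (including '+' and '$') as a separator. This is an assumption of the sketch.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // Mapping runs before tokenizing, so "c++" is already rewritten when it is split.
    static List<String> analyze(String text) {
        return tokenize(mapChars(text));
    }

    public static void main(String[] args) {
        System.out.println(tokenize("c++c++")); // prints [c, c]
        System.out.println(analyze("c++c++"));  // prints [cplusplus, cplusplus]
    }
}
```

If the real chain behaved like analyze(), Luke should show cplusplus as a term. Since it shows only c, the mapping is presumably not being applied before the tokenizer sees the text.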

On Sun, Dec 13, 2009 at 10:34 AM, Weiwei Wang <ww.wang...@gmail.com> wrote:

> Thanks, Koji. I followed your advice and changed my analyzer as shown below:
>
> NormalizeCharMap RECOVERY_MAP = new NormalizeCharMap();
> RECOVERY_MAP.add("c++","cplusplus$");
> CharFilter filter = new LowercaseCharFilter(reader);
> filter = new MappingCharFilter(RECOVERY_MAP,filter);
> StandardTokenizer tokenStream = new StandardTokenizer(Version.LUCENE_30, filter);
> tokenStream.setMaxTokenLength(maxTokenLength);
> TokenStream result = new StandardFilter(tokenStream);
> result = new LowerCaseFilter(result);
> result = new StopFilter(enableStopPositionIncrements, result, stopSet);
> result = new SnowballFilter(result, STEMMER);
>
> I use the same analyzer on the search side. Since this analyzer tokenizes
> c++ as cplusplus, I expected a search for c++ to work, because the query
> is tokenized as cplusplus as well.
>
> I tested it on the string c++c++. However, when I search for c++ in the
> built index, nothing is returned.
>
> I do not know what is wrong with my code. Waiting for your reply.
>
> On Fri, Dec 11, 2009 at 9:43 PM, Weiwei Wang <ww.wang...@gmail.com> wrote:
>
>> Thanks, Koji
>>
>> On Fri, Dec 11, 2009 at 7:59 PM, Koji Sekiguchi <k...@r.email.ne.jp> wrote:
>>
>>> MappingCharFilter can be used to convert c++ to cplusplus.
>>>
>>> Koji
>>>
>>> --
>>> http://www.rondhuit.com/en/
>>>
>>> Anshum wrote:
>>>
>>>> How about getting the original token stream and then converting c++ to
>>>> cplusplus, or any other such transform? Or perhaps you might look at
>>>> using/extending (in the non-Java sense) some other tokenizer!
>>>>
>>>> --
>>>> Anshum Gupta
>>>> Naukri Labs!
>>>> http://ai-cafe.blogspot.com
>>>>
>>>> The facts expressed here belong to everybody, the opinions to me. The
>>>> distinction is yours to draw............
>>>>
>>>> On Fri, Dec 11, 2009 at 11:00 AM, Weiwei Wang <ww.wang...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi, all,
>>>>> I designed an FTP search engine based on Lucene. I made a few
>>>>> modifications to the StandardTokenizer.
>>>>> My problem is:
>>>>> C++ is tokenized as c by StandardTokenizer, and I want to recover it
>>>>> from the TokenStream produced by StandardTokenizer.
>>>>>
>>>>> What should I do?
>>>>>
>>>>> --
>>>>> Weiwei Wang
>>>>> Alex Wang
>>>>> 王巍巍
>>>>> Room 403, Mengmin Wei Building
>>>>> Computer Science Department
>>>>> Gulou Campus of Nanjing University
>>>>> Nanjing, P.R.China, 210093
>>>>>
>>>>> Homepage: http://cs.nju.edu.cn/rl/weiweiwang
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: java-user-h...@lucene.apache.org

--
Weiwei Wang
Alex Wang
王巍巍
Room 403, Mengmin Wei Building
Computer Science Department
Gulou Campus of Nanjing University
Nanjing, P.R.China, 210093

Homepage: http://cs.nju.edu.cn/rl/weiweiwang