Re: Recover special terms from StandardTokenizer

2009-12-13 Thread Weiwei Wang
>> >> - >> Uwe Schindler >> H.-H.-Meier-Allee 63, D-28213 Bremen >> http://www.thetaphi.de >> eMail: u...@thetaphi.de >> >> > -Original Message- >> > From: Weiwei Wang [mailto:ww.wang...@gmail.com] >> > Sent: Sunday, Dec

Re: Recover special terms from StandardTokenizer

2009-12-13 Thread Weiwei Wang
ng [mailto:ww.wang...@gmail.com] > > Sent: Sunday, December 13, 2009 12:51 PM > > To: java-user@lucene.apache.org > > Subject: Re: Recover special terms from StandardTokenizer > > > > LowercaseCharFilter is necessary, as in the MappingCharFilter we need to > > provid

RE: Recover special terms from StandardTokenizer

2009-12-13 Thread Uwe Schindler
il: u...@thetaphi.de > -Original Message- > From: Weiwei Wang [mailto:ww.wang...@gmail.com] > Sent: Sunday, December 13, 2009 12:51 PM > To: java-user@lucene.apache.org > Subject: Re: Recover special terms from StandardTokenizer > > LowercaseCharFilter is necessary, as in the

Re: Recover special terms from StandardTokenizer

2009-12-13 Thread Weiwei Wang
> -Original Message- > > From: Weiwei Wang [mailto:ww.wang...@gmail.com] > > Sent: Sunday, December 13, 2009 12:23 PM > > To: java-user@lucene.apache.org > > Subject: Re: Recover special terms from StandardTokenizer > > > > thanks, Uwe. >

RE: Recover special terms from StandardTokenizer

2009-12-13 Thread Uwe Schindler
age- > From: Weiwei Wang [mailto:ww.wang...@gmail.com] > Sent: Sunday, December 13, 2009 12:23 PM > To: java-user@lucene.apache.org > Subject: Re: Recover special terms from StandardTokenizer > > Thanks, Uwe. > Maybe I was not very clear. My situation is like this: > Analyzer: >

Re: Recover special terms from StandardTokenizer

2009-12-13 Thread Weiwei Wang
> Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > > -Original Message- > > From: Weiwei Wang [mailto:ww.wang...@gmail.com] > > Sent: Sunday, December 13, 2009 11:43 AM > > To: java-user

RE: Recover special terms from StandardTokenizer

2009-12-13 Thread Uwe Schindler
.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Weiwei Wang [mailto:ww.wang...@gmail.com] > Sent: Sunday, December 13, 2009 11:43 AM > To: java-user@lucene.apache.org > Subject: Re: Recover special terms from St

Re: Recover special terms from StandardTokenizer

2009-12-13 Thread Weiwei Wang
Problem solved. Now another problem has come up. I want to use Highlighter in my system, but the token offsets are incorrect once the MappingCharFilter is used. Koji, do you know how to fix the offset problem? On Sun, Dec 13, 2009 at 11:12 AM, Weiwei Wang wrote: > I use Luke to check the result and
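
The offset problem described above can be illustrated without Lucene. The following is a minimal plain-Java sketch, not Lucene's MappingCharFilter: the class OffsetSketch and its methods are hypothetical names made up for this illustration. It replaces one string with another and records, for each replacement, how far later offsets in the filtered text have drifted from the original text, so that a token's start and end can be mapped back for highlighting. Lucene's char filter API has a correctOffset method for the same purpose; whether the tokenizer in a given Lucene version actually calls it should be checked against that version.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not part of Lucene. */
public class OffsetSketch {
    private final StringBuilder filtered = new StringBuilder();
    // Each entry is {offsetInFilteredText, cumulativeDiff}: from that
    // filtered offset onward, original = filtered + cumulativeDiff.
    private final List<int[]> corrections = new ArrayList<>();

    public OffsetSketch(String original, String from, String to) {
        int i = 0;
        int diff = 0;
        while (i < original.length()) {
            if (original.startsWith(from, i)) {
                filtered.append(to);
                i += from.length();
                diff += from.length() - to.length();
                corrections.add(new int[] {filtered.length(), diff});
            } else {
                filtered.append(original.charAt(i));
                i++;
            }
        }
    }

    public String filteredText() {
        return filtered.toString();
    }

    /** Maps an offset in the filtered text back to the original text. */
    public int correctOffset(int filteredOffset) {
        int diff = 0;
        for (int[] c : corrections) {
            if (c[0] <= filteredOffset) {
                diff = c[1];
            } else {
                break;
            }
        }
        return filteredOffset + diff;
    }

    public static void main(String[] args) {
        OffsetSketch s = new OffsetSketch("i like c++ a lot", "c++", "cplusplus");
        System.out.println(s.filteredText()); // i like cplusplus a lot
        // The token "cplusplus" spans 7..16 in the filtered text.
        System.out.println(s.correctOffset(7) + ".." + s.correctOffset(16)); // 7..10
    }
}
```

Offsets that fall inside a replacement are not meaningful in this sketch; only token boundaries at or after the end of a replacement are corrected.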

Re: Recover special terms from StandardTokenizer

2009-12-12 Thread Weiwei Wang
I used Luke to check the result and found that only c exists as a term; cplusplus is not in the index. On Sun, Dec 13, 2009 at 10:34 AM, Weiwei Wang wrote: > Thanks, Koji, I followed your advice and change my analyzer as shown below: > NormalizeCharMap RECOVERY_MAP = new NormalizeCharMap(); > RECOVERY

Re: Recover special terms from StandardTokenizer

2009-12-12 Thread Weiwei Wang
Thanks, Koji. I followed your advice and changed my analyzer as shown below:
NormalizeCharMap RECOVERY_MAP = new NormalizeCharMap();
RECOVERY_MAP.add("c++","cplusplus$");
CharFilter filter = new LowercaseCharFilter(reader);
filter = new MappingCharFilter(RECOVERY_MAP,filter);
StandardTokenizer token

Re: Recover special terms from StandardTokenizer

2009-12-11 Thread Weiwei Wang
Thanks, Koji On Fri, Dec 11, 2009 at 7:59 PM, Koji Sekiguchi wrote: > MappingCharFilter can be used to convert c++ to cplusplus. > > Koji > > -- > http://www.rondhuit.com/en/ > > > > Anshum wrote: > >> How about getting the original token stream and then converting c++ to >> cplusplus or anyothe

Re: Recover special terms from StandardTokenizer

2009-12-11 Thread Koji Sekiguchi
MappingCharFilter can be used to convert c++ to cplusplus. Koji -- http://www.rondhuit.com/en/ Anshum wrote: How about getting the original token stream and then converting c++ to cplusplus or any other such transform. Or perhaps you might look at using/extending (in the non-Java sense) some ot
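
Why mapping before tokenization helps can be shown with a small plain-Java sketch. This is not Lucene code: MappingSketch, applyMappings, and tokenize are hypothetical names, and the tokenizer here is a stand-in that simply splits on anything that is not a letter or digit, which is only a rough approximation of how StandardTokenizer drops the "++". The point is that once "c++" has been rewritten to "cplusplus" in the character stream, the term survives tokenization.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical illustration only; not part of Lucene. */
public class MappingSketch {
    // Assumed mapping table, analogous in spirit to a NormalizeCharMap.
    static final Map<String, String> RECOVERY_MAP = new LinkedHashMap<>();
    static {
        RECOVERY_MAP.put("c++", "cplusplus");
    }

    /** Lowercases the text, then applies every mapping as a plain replacement. */
    public static String applyMappings(String text) {
        String out = text.toLowerCase(Locale.ROOT);
        for (Map.Entry<String, String> e : RECOVERY_MAP.entrySet()) {
            out = out.replace(e.getKey(), e.getValue());
        }
        return out;
    }

    /** Stand-in tokenizer: splits on runs of non-letter, non-digit characters. */
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "Learn C++ fast";
        // Without mapping, the "++" is lost.
        System.out.println(tokenize(input.toLowerCase(Locale.ROOT))); // [learn, c, fast]
        // With mapping applied first, the term is preserved.
        System.out.println(tokenize(applyMappings(input))); // [learn, cplusplus, fast]
    }
}
```

The replacement here is deliberately naive: it would also rewrite "c++" inside a longer string such as "abc++", so a real mapping table needs to account for context.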

Re: Recover special terms from StandardTokenizer

2009-12-11 Thread Anshum
How about getting the original token stream and then converting c++ to cplusplus or any other such transform. Or perhaps you might look at using/extending (in the non-Java sense) some other tokenizer! -- Anshum Gupta Naukri Labs! http://ai-cafe.blogspot.com The facts expressed here belong to everyb

Recover special terms from StandardTokenizer

2009-12-10 Thread Weiwei Wang
Hi, all, I designed an FTP search engine based on Lucene. I made a few modifications to the StandardTokenizer. My problem is: C++ is tokenized as c by StandardTokenizer, and I want to recover it from the TokenStream that StandardTokenizer produces. What should I do? -- Weiwei Wang Alex Wang 王巍巍 Room