Which token filter can combine 2 terms into 1?

2012-12-20 Thread Xi Shen
Hi, I am looking for a token filter that can combine 2 terms into 1. E.g. the input has been tokenized by white space: t1 t2 t2a t3. I want a filter that outputs: t1 t2t2a t3. I know it is a very special case, and I am thinking about developing a filter of my own. But I cannot figure out which API
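For illustration only, the requested behaviour can be sketched in plain Java, outside Lucene. The thread does not say which rule decides that t2 and t2a belong together, so the prefix rule below is purely an assumption, and the names TermMerger, merge and PREFIX_RULE are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;

public class TermMerger {
    // ASSUMPTION (not stated in the thread): merge two adjacent tokens
    // when the second one starts with the first one, e.g. "t2" and "t2a".
    public static final BiPredicate<String, String> PREFIX_RULE =
            (current, next) -> next.startsWith(current);

    // Walks the token list with one token of lookahead and joins a pair
    // into a single term whenever the rule accepts it.
    public static List<String> merge(List<String> tokens,
                                     BiPredicate<String, String> shouldMerge) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            String current = tokens.get(i);
            boolean hasNext = i + 1 < tokens.size();
            if (hasNext && shouldMerge.test(current, tokens.get(i + 1))) {
                out.add(current + tokens.get(i + 1)); // emit the combined term
                i += 2;                               // both tokens are consumed
            } else {
                out.add(current);                     // pass the token through unchanged
                i += 1;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("t1", "t2", "t2a", "t3");
        System.out.println(merge(input, PREFIX_RULE)); // prints [t1, t2t2a, t3]
    }
}
```

A real token filter would have to do the same one-token lookahead on a stream instead of a list, which is the part that needs the Lucene attribute API asked about here.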

Re: Which token filter can combine 2 terms into 1?

2012-12-21 Thread Xi Shen
> On Fri, Dec 21, 2012 at 9:50 AM, Xi Shen wrote:
>
> > Hi,
> >
> > I am looking for a token filter that can combine 2 terms into 1? E.g.
> >
> > the input has been tokenized by white space:
> >
> > t1 t2 t2a t3
> >
> > I want a filter tha

Re: Which token filter can combine 2 terms into 1?

2012-12-21 Thread Xi Shen
< alan.woodw...@romseysoftware.co.uk> wrote:
> Have a look at ShingleFilter:
> http://lucene.apache.org/core/3_6_0/api/all/org/apache/lucene/analysis/shingle/ShingleFilter.html
>
> On 21 Dec 2012, at 08:42, Xi Shen wrote:
>
> > I have to use the white space and word del
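As a rough idea of what shingling means, here is a plain-Java sketch of two-token shingles (token bigrams). It does not use the Lucene API; the class ShingleSketch and its bigrams method are hypothetical names, and the empty separator is an assumption chosen to match the t2t2a example from the original question:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ShingleSketch {
    // Builds every pair of adjacent tokens, joined by the given separator.
    // This only illustrates the shingle idea; it is not ShingleFilter itself.
    public static List<String> bigrams(List<String> tokens, String separator) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < tokens.size(); i++) {
            out.add(tokens.get(i) + separator + tokens.get(i + 1));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("t1", "t2", "t2a", "t3");
        System.out.println(bigrams(input, "")); // prints [t1t2, t2t2a, t2at3]
    }
}
```

Shingling pairs up all adjacent tokens, not only t2 and t2a, so getting exactly t1 t2t2a t3 would presumably still need a further step that keeps the wanted combination and drops the rest.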

Re: how to implement a TokenFilter?

2012-12-22 Thread Xi Shen
Thanks, I read this already. It is useful, but it is too 'small'... E.g. for this.charTermAttr = addAttribute(CharTermAttribute.class); I want to know what other attributes I need in order to implement my function. Where can I find a reference to these attributes? I tried on lucene & solr

Re: how to implement a TokenFilter?

2012-12-23 Thread Xi Shen
thanks a lot :)
On Mon, Dec 24, 2012 at 10:22 AM, feng lu wrote:
> hi Shen
>
> Maybe you can look at some source code in the org.apache.lucene.analysis package,
> such as LowerCaseFilter.java, StopFilter.java and so on.
>
> and some common attributes include:
>
> offsetAtt = addAttribute(OffsetAttribute.c

Re: how to implement a TokenFilter?

2012-12-24 Thread Xi Shen
> Also, after the full Analyzer stack is called, the caller saves the output
> (I guess to codecs?). You can look at which Attributes it saves.
>
> On 12/23/2012 06:30 PM, Xi Shen wrote:
>
>> thanks a lot :)
>>
>> On Mon, Dec 24, 2012 at 10:22 AM,