Hi,
I am looking for a token filter that can combine two terms into one. E.g.,
the input has been tokenized by whitespace:
t1 t2 t2a t3
I want a filter that outputs:
t1 t2t2a t3
I know it is a very special case, and I am thinking about developing a filter
of my own. But I cannot figure out which API
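
A minimal sketch of the merging behaviour described above, written in plain Java rather than against the Lucene API. The class name TermPairCombiner, the combine method, and the join rule (join when the next term starts with the current one) are all hypothetical and only illustrate the one-token lookahead such a filter would need.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.BiPredicate;

/** Plain-Java model of a "combine two adjacent terms" filter (not the Lucene API). */
public class TermPairCombiner {

    /**
     * Pulls tokens one at a time, the way a token filter pulls from its input,
     * and joins a token with its successor whenever shouldJoin accepts the pair.
     */
    public static List<String> combine(List<String> tokens,
                                       BiPredicate<String, String> shouldJoin) {
        List<String> out = new ArrayList<>();
        Iterator<String> input = tokens.iterator();
        // One-token lookahead buffer: the token we have read but not yet emitted.
        String pending = input.hasNext() ? input.next() : null;
        while (pending != null) {
            String next = input.hasNext() ? input.next() : null;
            if (next != null && shouldJoin.test(pending, next)) {
                out.add(pending + next);                          // emit the merged term
                pending = input.hasNext() ? input.next() : null;  // both tokens consumed
            } else {
                out.add(pending);                                 // emit unchanged
                pending = next;                                   // lookahead becomes current
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical rule: join when the next term starts with the current one.
        List<String> result = combine(
                Arrays.asList("t1", "t2", "t2a", "t3"),
                (cur, nxt) -> nxt.startsWith(cur));
        System.out.println(result); // prints [t1, t2t2a, t3]
    }
}
```

In a real Lucene TokenFilter the same lookahead would presumably live in incrementToken(), with the term text carried by a CharTermAttribute; that wiring is not shown here.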
> On Fri, Dec 21, 2012 at 9:50 AM, Xi Shen wrote:
>
> > Hi,
> >
> > I am looking for a token filter that can combine 2 terms into 1? E.g.
> >
> > the input has been tokenized by white space:
> >
> > t1 t2 t2a t3
> >
> > I want a filter tha
<alan.woodw...@romseysoftware.co.uk> wrote:
> Have a look at ShingleFilter:
> http://lucene.apache.org/core/3_6_0/api/all/org/apache/lucene/analysis/shingle/ShingleFilter.html
>
> On 21 Dec 2012, at 08:42, Xi Shen wrote:
>
> > I have to use the white space and word del
Thanks, I have read this already. It is useful, but it is too 'small'...
E.g., for this.charTermAttr = addAttribute(CharTermAttribute.class);
I want to know what other attributes I need in order to implement
my function. Where can I find a reference for these attributes? I tried on
lucene & solr
thanks a lot :)
On Mon, Dec 24, 2012 at 10:22 AM, feng lu wrote:
> hi Shen
>
> May be you can see some source code in org.apache.lucene.analysis package,
> such LowerCaseFilter.java,StopFilter.java and so on.
>
> and some common attribute includes:
>
> offsetAtt = addAttribute(OffsetAttribute.c
> Also, after the full Analyzer stack is called, the caller saves the output
> (I guess to codecs?). You can look at which Attributes it saves.
>
>
> On 12/23/2012 06:30 PM, Xi Shen wrote:
>
>> thanks a lot :)
>>
>>
>> On Mon, Dec 24, 2012 at 10:22 AM,