Hello,

For some time I have been trying to apply ShingleFilter. I have a string:
"The users get program in the User RPC API in Apache Rave"

and I would like to get:

[the users get] [users get program] [get program in] [program in the]
[in the user] [the user rpc] [user rpc api] [rpc api in] [api in apache]
[in apache rave] [apache rave 0.11]

However, I'm getting:

[the users get] [users] [users get program] [get] [get program in]
[program] [program in the] [in the user] [the user rpc] [user]
[user rpc api] [rpc] [rpc api in] [api] [api in apache] [in apache rave]
[apache] [apache rave 0.11] [rave]

Part of my code:

protected TokenStreamComponents createComponents(String fieldName, Reader reader) {

    StandardTokenizer source = new StandardTokenizer(Version.LUCENE_43, reader);

    TokenStream tokenStream = new StandardFilter(Version.LUCENE_43, source);

    tokenStream = new LowerCaseFilter(Version.LUCENE_43, tokenStream);

    tokenStream = new ShingleFilter(tokenStream, 3, 3);

    tokenStream = new StopFilter(Version.LUCENE_43, tokenStream, StopAnalyzer.ENGLISH_STOP_WORDS_SET);

    return new TokenStreamComponents(source, tokenStream);
}

Could somebody please explain why I'm getting single-token shingles
when I set the minimum shingle size to 3?
Thanks,
--
gosia
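
A note on the likely cause, as a minimal sketch rather than a definitive
answer: ShingleFilter outputs the single tokens (unigrams) alongside the
shingles unless setOutputUnigrams(false) is called on it, and a StopFilter
placed after it only drops terms that are exactly a stop word. That would
explain why [users] and [get] appear while [the] and [in] do not. The
plain-Java model below does not call Lucene; the class ShingleSketch, its
methods, and the two-word STOP_WORDS set are hypothetical names introduced
only to illustrate the behaviour on the sample string.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java model of the filter chain; it does not use Lucene classes.
final class ShingleSketch {

    // Hypothetical stand-in for a stop set; only the words needed here.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("the", "in"));

    // For each position, emits the single token (if outputUnigrams is set)
    // and then the shingle of exactly `size` tokens starting there, if one fits.
    static List<String> shingles(List<String> tokens, int size, boolean outputUnigrams) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            if (outputUnigrams) {
                out.add(tokens.get(i));
            }
            if (i + size <= tokens.size()) {
                out.add(String.join(" ", tokens.subList(i, i + size)));
            }
        }
        return out;
    }

    // Models a stop filter applied after shingling: it removes only terms
    // that are exactly a stop word, so "in the user" is kept.
    static List<String> removeStopWords(List<String> terms) {
        List<String> out = new ArrayList<>();
        for (String term : terms) {
            if (!STOP_WORDS.contains(term)) {
                out.add(term);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(
            "the users get program in the user rpc api in apache rave".split(" "));
        // Unigrams on: single tokens are mixed in with the 3-word shingles.
        System.out.println(removeStopWords(shingles(tokens, 3, true)));
        // Unigrams off: only the 3-word shingles remain.
        System.out.println(removeStopWords(shingles(tokens, 3, false)));
    }
}
```

In the Lucene code above, the corresponding change would presumably be to
keep the ShingleFilter in a local variable and call
setOutputUnigrams(false) on it before wrapping it in the StopFilter.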
