You need to call setOutputUnigrams(false) on the ShingleFilter (by default it also emits the single tokens), with something like:

      StandardTokenizer source = new StandardTokenizer(Version.LUCENE_43, reader);
      TokenStream tokenStream = new StandardFilter(Version.LUCENE_43, source);
      tokenStream = new LowerCaseFilter(Version.LUCENE_43, tokenStream);

      // Emit only 3-word shingles and suppress the single-token (unigram) output.
      TokenFilter sf = new ShingleFilter(tokenStream, 3, 3);
      ((ShingleFilter) sf).setOutputUnigrams(false);

      sf = new StopFilter(Version.LUCENE_43, sf, StopAnalyzer.ENGLISH_STOP_WORDS_SET);

      return new Analyzer.TokenStreamComponents(source, sf);


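I'm not sure the StopFilter will do you any good if you're extracting only trigrams: it compares each whole token against the stop set, so a three-word shingle will never match a single stop word.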
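
To see why the single words show up, and why the stop filter has nothing left to remove once they are gone, here is a rough plain-Java sketch. It does not use Lucene at all: ShingleSketch, shingles and dropStopWords are hypothetical names, the two-word stop list is an assumed stand-in for StopAnalyzer.ENGLISH_STOP_WORDS_SET, and the loop only imitates how I understand ShingleFilter and StopFilter to behave for a fixed shingle size.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the ShingleFilter + StopFilter chain; not Lucene code.
class ShingleSketch {

    // Builds fixed-size word shingles; optionally also emits each single token.
    static List<String> shingles(List<String> tokens, int size, boolean outputUnigrams) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            if (outputUnigrams) {
                out.add(tokens.get(i));
            }
            if (i + size <= tokens.size()) {
                out.add(String.join(" ", tokens.subList(i, i + size)));
            }
        }
        return out;
    }

    // Drops a term only when the whole term is in the stop set.
    static List<String> dropStopWords(List<String> terms, Set<String> stopWords) {
        List<String> out = new ArrayList<>();
        for (String term : terms) {
            if (!stopWords.contains(term)) {
                out.add(term);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(
            "the users get program in the user rpc api in apache rave".split(" "));
        // Assumed two-word subset of an English stop list, for illustration only.
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "in"));

        // Unigrams on: trigrams mixed with single words, minus the stop words.
        System.out.println(dropStopWords(shingles(tokens, 3, true), stopWords));
        // Unigrams off: trigrams only, and the stop step removes nothing.
        System.out.println(dropStopWords(shingles(tokens, 3, false), stopWords));
    }
}
```

With unigrams on, the stop step removes only the lone "the" and "in", which is why the output in the original message contains [users] and [get] but no [the]. With unigrams off, every remaining term is a three-word shingle, so nothing matches the stop set.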
-----Original Message-----
From: murba...@rams.colostate.edu [mailto:murba...@rams.colostate.edu] On 
Behalf Of Malgorzata Urbanska
Sent: Thursday, July 18, 2013 6:02 PM
To: java-user@lucene.apache.org
Subject: ShingleFilter

Hello,

For some time I have been trying to apply ShingleFilter. I have a string:
"The users get program in the User RPC API in Apache Rave"

and I would like to get:

[the users get]  [users get program]  [get program in] [program in
the] [in the user] [the user rpc] [user rpc api] [rpc api in] [api in
apache] [in apache rave] [apache rave 0.11]

However, I'm getting:

[the users get] [users] [users get program] [get] [get program in]
[program] [program in the] [in the user] [the user rpc] [user] [user
rpc api] [rpc] [rpc api in] [api] [api in apache] [in apache rave]
[apache] [apache rave 0.11] [rave]

part of my code:

protected TokenStreamComponents createComponents(String fieldName, Reader reader) {

        StandardTokenizer source = new StandardTokenizer(Version.LUCENE_43, reader);

        TokenStream tokenStream = new StandardFilter(Version.LUCENE_43, source);

        tokenStream = new LowerCaseFilter(Version.LUCENE_43, tokenStream);

        tokenStream = new ShingleFilter(tokenStream, 3, 3);

        tokenStream = new StopFilter(Version.LUCENE_43, tokenStream, StopAnalyzer.ENGLISH_STOP_WORDS_SET);

        return new TokenStreamComponents(source, tokenStream);
}

Could somebody please explain why I'm getting single-word shingles
when I set the min size to 3?
Thanks,
--
gosia

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

