Re: Using a Lucene ShingleFilter to extract frequencies of bigrams in Lucene

2012-09-04 Thread Robert Muir
On Tue, Sep 4, 2012 at 12:37 PM, Martin O'Shea wrote:
> Does anyone know if this can be used in conjunction with other analyzers to
> return the frequencies of the bigrams or trigrams found, e.g.:
>
> "please divide this please divide sentence into shingles"
>
> Would return 2 for "p

Using a Lucene ShingleFilter to extract frequencies of bigrams in Lucene

2012-09-04 Thread Martin O'Shea
If a Lucene ShingleFilter can be used to tokenize a string into shingles, or ngrams, of different sizes, e.g.:

"please divide this sentence into shingles"

Becomes:

shingles "please divide", "divide this", "this sentence", "sentence into", and "into shingles"

Does anyone know
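
For readers following the thread, here is a minimal sketch of the counting step being asked about. It is not the Lucene API and not necessarily what was suggested in the replies: it uses only the Java standard library, splits on whitespace instead of running an analyzer, and the class and method names (ShingleCounts, shingles, count) are hypothetical. In a Lucene-based version the shingle strings would presumably come from a TokenStream wrapped in a ShingleFilter rather than from shingles() below.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, NOT part of Lucene: builds word n-grams ("shingles")
// from whitespace-separated text and counts how often each one occurs.
final class ShingleCounts {

    // Returns every run of `size` consecutive words, joined by a single space.
    static List<String> shingles(String text, int size) {
        List<String> out = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty() || size < 1) {
            return out;
        }
        // Assumption: plain whitespace tokenization stands in for an analyzer.
        String[] words = trimmed.split("\\s+");
        for (int i = 0; i + size <= words.length; i++) {
            out.add(String.join(" ", Arrays.copyOfRange(words, i, i + size)));
        }
        return out;
    }

    // Maps each shingle to its number of occurrences, in first-seen order.
    static Map<String, Integer> count(String text, int size) {
        Map<String, Integer> freq = new LinkedHashMap<>();
        for (String shingle : shingles(text, size)) {
            freq.merge(shingle, 1, Integer::sum);
        }
        return freq;
    }
}
```

Calling ShingleCounts.count("please divide this please divide sentence into shingles", 2) on the sentence quoted in the reply gives a count of 2 for "please divide" and 1 for every other bigram. Passing 3 as the size counts trigrams the same way.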