Hi,

I'm trying to migrate to Lucene 4.

In Lucene 3.5 I extended org.apache.lucene.analysis.FilteringTokenFilter and overrode accept() to remove undesired shingles. In Lucene 4, org.apache.lucene.analysis.FilteringTokenFilter no longer seems to exist. Has it been moved or replaced?

I'm trying to achieve two things:

1) Remove shingles that have an empty item.

2) Remove shingles that span a comma in the original phrase. For example:

    for the phrase:    "delicious red apples, green pears, and oranges"

I want the following shingles (with a shingle size of 2):

"delicious red", "red apples", "green pears", "and oranges"
(no "apples green" because there's a comma)
(no "pears and" because there's a comma)
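
To make the desired output concrete, here is a minimal plain-Java sketch of the behaviour I'm after. It does not use any Lucene classes; the class name ShingleSketch and the method shingles() are made up for illustration, and it assumes words are separated by whitespace and that a comma is the only boundary that matters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: not a Lucene TokenFilter.
class ShingleSketch {

    // Builds word shingles of the given size, never crossing a comma.
    static List<String> shingles(String phrase, int size) {
        List<String> result = new ArrayList<>();
        // Each comma-separated segment is shingled on its own,
        // so no shingle can span a comma.
        for (String segment : phrase.split(",")) {
            String trimmed = segment.trim();
            if (trimmed.isEmpty()) {
                continue; // skip empty segments such as "a,,b"
            }
            String[] words = trimmed.split("\\s+");
            // Slide a window of `size` words across the segment.
            for (int i = 0; i + size <= words.length; i++) {
                result.add(String.join(" ", Arrays.copyOfRange(words, i, i + size)));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(
            shingles("delicious red apples, green pears, and oranges", 2));
    }
}
```

What I'm looking for is the Lucene 4 way to get the same result inside an analysis chain, i.e. as a filter applied to the shingle output.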

Any ideas?

TIA
