Hi,
I've created an Analyzer that performs several filtering tasks, including
creating shingles and replacing terms.
I use that Analyzer with IndexWriter and it works as expected, but when I use the same
Analyzer with QueryParser (org.apache.lucene.queryparser.classic.QueryParser) it behaves
differently. Specifically, it does not create shingles. Below is the output for a
simple phrase of 4 terms: "word1 word2 word3 word4"
From IndexWriter (shingle terms created as expected -- total of 7 terms):
term: word1 term=word1,bytes=[77 6f 72 64 31],startOffset=0,endOffset=5,punct=0,positionIncrement=1,position=1,type=word,keyword=false
term: word2 term=word2,bytes=[77 6f 72 64 32],startOffset=6,endOffset=11,punct=0,positionIncrement=1,position=2,type=word,keyword=false
term: word1 word2 term=word1 word2,bytes=[77 6f 72 64 31 20 77 6f 72 64 32],startOffset=6,endOffset=11,punct=0,positionIncrement=0,position=2,type=SHINGLE,keyword=false
term: word3 term=word3,bytes=[77 6f 72 64 33],startOffset=12,endOffset=17,punct=0,positionIncrement=1,position=3,type=word,keyword=false
term: word2 word3 term=word2 word3,bytes=[77 6f 72 64 32 20 77 6f 72 64 33],startOffset=12,endOffset=17,punct=0,positionIncrement=0,position=3,type=SHINGLE,keyword=false
term: word4 term=word4,bytes=[77 6f 72 64 34],startOffset=18,endOffset=23,punct=0,positionIncrement=1,position=4,type=word,keyword=false
term: word3 word4 term=word3 word4,bytes=[77 6f 72 64 33 20 77 6f 72 64 34],startOffset=18,endOffset=23,punct=0,positionIncrement=0,position=4,type=SHINGLE,keyword=false
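For reference, the seven terms above follow a simple pattern: every word is emitted, and each two-word shingle is emitted right after its second word. A minimal plain-Java sketch of that expected order is below. It is not Lucene code and does not use the Analyzer; the class and method names are made up for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only; hypothetical names, not part of Lucene.
class ShingleSketch {

    // Returns each word followed (from the second word on) by the
    // two-word shingle that ends at that word.
    static List<String> unigramsAndBigrams(String text) {
        String[] words = text.trim().split("\\s+");
        List<String> terms = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            terms.add(words[i]);
            if (i > 0) {
                terms.add(words[i - 1] + " " + words[i]);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // Prints one "term: ..." line per expected term.
        for (String term : unigramsAndBigrams("word1 word2 word3 word4")) {
            System.out.println("term: " + term);
        }
    }
}
```

For the 4-term phrase this yields 4 single words plus 3 shingles, which is the 7-term sequence I expect from both IndexWriter and QueryParser.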
From QueryParser (shingle terms not created -- only 4 terms, and every term reports startOffset=0, endOffset=5 and position=1):
term: word1 term=word1,bytes=[77 6f 72 64 31],startOffset=0,endOffset=5,punct=0,positionIncrement=1,position=1,type=word,keyword=false
term: word2 term=word2,bytes=[77 6f 72 64 32],startOffset=0,endOffset=5,punct=0,positionIncrement=1,position=1,type=word,keyword=false
term: word3 term=word3,bytes=[77 6f 72 64 33],startOffset=0,endOffset=5,punct=0,positionIncrement=1,position=1,type=word,keyword=false
term: word4 term=word4,bytes=[77 6f 72 64 34],startOffset=0,endOffset=5,punct=0,positionIncrement=1,position=1,type=word,keyword=false
Can anyone tell me what I'm doing wrong?
Thank you,
Igal