Hi,
I need to use stop-word bigrams, like the Nutch analyzer, as described
in LIA 4.8 (Nutch Analysis). What I don't understand is, why does it
keep the original stop word intact? I can see great advantage to being
able to search for a combination of stop word + real word, but I don't
see the point of keeping the stop word as a token on its own. Searches
with just that word would be as pointless as ever.
Is the idea to allow searching on all stop words, even on their own, and
the bigrams are just an optimization that will improve things 90% of the
time? Or is it just a side effect of the bigram analyzer that it
produces a token from the stop word, and therefore it could just be
filtered out by a stop word filter afterwards, leaving only the bigram
and the original (non-stop) word?
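To make the two options concrete, here is a rough sketch in plain Python
of what I understand the token stream to look like. This is not Nutch or
Lucene code: the function name, the stop list, and the "-" separator are
all made up for illustration, and it only pairs a stop word with the word
that follows it.

```python
# Hypothetical stop list, for illustration only.
STOP_WORDS = {"the", "a", "of", "to"}


def stop_word_bigrams(text, keep_stop_words=True):
    """Emit each word, plus a bigram for every stop word + following word.

    keep_stop_words=True  -> the bare stop word stays in the stream
    keep_stop_words=False -> the bare stop word is dropped, as if a stop
                             filter ran after the bigram step
    """
    words = text.lower().split()
    tokens = []
    for i, word in enumerate(words):
        is_stop = word in STOP_WORDS
        # Keep ordinary words always; keep bare stop words only if asked.
        if not is_stop or keep_stop_words:
            tokens.append(word)
        # A stop word followed by another word also yields a bigram token.
        if is_stop and i + 1 < len(words):
            tokens.append(word + "-" + words[i + 1])
    return tokens


print(stop_word_bigrams("the quick fox"))
# ['the', 'the-quick', 'quick', 'fox']
print(stop_word_bigrams("the quick fox", keep_stop_words=False))
# ['the-quick', 'quick', 'fox']
```

The first call is what I think the analyzer produces today; the second is
the "filter the stop word out afterwards" variant I am asking about.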
I'm sure either way would work for me - just wondering what is normally
done, and if I'm missing something important here...
Thanks!
-John