Hi,

I need to use stop-word bigrams, liike the Nutch analyzer, as described in LIA 4.8 (Nutch Analysis). What I don't understand is, why does it keep the original stop word intact? I can see great advantage to being able to search for a combination of stop word + real word, but I don't see the point of keeping the stop word as a token on it's own. Searches with just that word would be as pointless as ever.

Is the idea to allow searching on all stop words, even on their own, and the bigrams are just an optimization that will improve things 90% of the time? Or is it just a side effect of the bigram analyzer that it produces a token from the stop word, and therefore it could just be filtered out by a stop word filter afterwards, leaving only the bigram and the original (non-stop) word?

I'm sure either way would work fr me - just wondering what is normally done, and if I'm missing something important here...

Thanks!
-John

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to