Re: bigram analysis

2008-03-03 Thread John Byrne
Yes, this makes sense to me. I think I'll just keep all words, including stop words, and if performance ever becomes an issue, I'll look at bigrams again. But I think there's a good chance that I'll never see significant impact either way. Thanks guys! Grant Ingersoll wrote: Yep, still good r

Re: bigram analysis

2008-03-03 Thread Grant Ingersoll
Yep, still good reasons like I said, but becoming less important as the hardware, etc. gets faster and cheaper, IMO, especially in the context of more advanced search capabilities. On Mar 3, 2008, at 10:49 AM, Mathieu Lecarme wrote: Not sure, you might want to ask on Nutch. From a strict

Re: bigram analysis

2008-03-03 Thread Mathieu Lecarme
Not sure, you might want to ask on Nutch. From a strict language standpoint, the notion of a stopword in my mind is a bit dubious. If the word really has no meaning, then why does the language have it to begin with? In a search context, it has been treated as of minimal use in the early da

Re: bigram analysis

2008-03-03 Thread Grant Ingersoll
On Mar 3, 2008, at 5:40 AM, John Byrne wrote: Hi, I need to use stop-word bigrams, like the Nutch analyzer, as described in LIA 4.8 (Nutch Analysis). What I don't understand is, why does it keep the original stop word intact? I can see great advantage to being able to search for a combi

bigram analysis

2008-03-03 Thread John Byrne
Hi, I need to use stop-word bigrams, like the Nutch analyzer, as described in LIA 4.8 (Nutch Analysis). What I don't understand is, why does it keep the original stop word intact? I can see great advantage to being able to search for a combination of stop word + real word, but I don't see th
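
For readers unfamiliar with the technique being asked about, here is a rough sketch of stop-word bigrams in plain Python. It is not the Nutch or Lucene code. The function name `stopword_bigrams`, the `_` separator, and the (term, position increment) output format are all assumptions made for illustration. It keeps every original token, stop words included, and additionally emits a stop word + next word bigram at the same position, which is the behaviour the question describes.

```python
def stopword_bigrams(tokens, stop_words, sep="_"):
    """Return (term, position_increment) pairs for a token list.

    Every original token is kept with increment 1. A stop word that is
    followed by another token also yields a bigram with increment 0,
    meaning it is stacked on the same position as the stop word.
    The separator is an illustrative assumption, not a library default.
    """
    out = []
    for i, tok in enumerate(tokens):
        out.append((tok, 1))
        if tok in stop_words and i + 1 < len(tokens):
            out.append((tok + sep + tokens[i + 1], 0))
    return out


terms = stopword_bigrams(["the", "quick", "fox"], {"the", "a", "of"})
print(terms)
# [('the', 1), ('the_quick', 0), ('quick', 1), ('fox', 1)]
```

In this sketch a query on the phrase "the quick" can match the single, much rarer term `the_quick`, while the plain `the` term remains in the index for queries that use it on its own.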