Aleksandr Parfenov <a.parfe...@postgrespro.ru> writes:
> As I wrote a few weeks ago, there is an issue with stopword processing in
> the proposed syntax for full-text configurations. I want to split word
> normalization and stopword detection into two separate dictionaries. The
> problem is how to configure the stopword detection dictionary.

> The cause of the problem is that stopwords are counted, but no
> lexemes are produced for them. However, do we have to count stopwords when
> counting words, or can we ignore them like unknown words? The problem I see is
> backward compatibility, since we would have to regenerate all queries and
> vectors. But is that a real problem, or can we change the behavior in this
> way?

I think there should be a pretty high bar for forcing people to regenerate
all that data when they haven't made any change of their own choice.

Also, I'm not very clear on exactly what you're proposing here, but it
sounds like it'd have the effect of changing whether stopwords count in
phrase distances ('a <N> b').  I think that's right out --- whether or not
you feel the current distance behavior is ideal, asking people to *both*
rebuild all their derived data *and* change their applications will cause
a revolt.  It's not sufficiently obviously broken that we can change it.
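
To make the distance concern concrete, here is a minimal sketch in Python.
It is a toy model and not PostgreSQL's implementation: the stopword list,
the whitespace tokenizer, and the function names are all assumptions made
for illustration. It only shows why dropping stopwords from the position
count would change the N in 'a <N> b'.

```python
# Toy model of word positions; NOT PostgreSQL's actual parser or dictionaries.
STOPWORDS = {"the", "a", "on", "of"}  # hypothetical stopword list


def positions(text, count_stopwords=True):
    """Map each non-stopword token to the position of its first occurrence."""
    result = {}
    pos = 0
    for token in text.lower().split():
        is_stop = token in STOPWORDS
        if is_stop and not count_stopwords:
            continue  # stopword is neither stored nor counted
        pos += 1  # the position advances for every counted token
        if not is_stop:
            result.setdefault(token, pos)
    return result


def distance(text, a, b, count_stopwords=True):
    """Distance between two lexemes, as in 'a <N> b'."""
    p = positions(text, count_stopwords)
    return p[b] - p[a]


text = "cat on the mat"
print(distance(text, "cat", "mat", count_stopwords=True))   # stopwords counted
print(distance(text, "cat", "mat", count_stopwords=False))  # stopwords ignored
```

In this model the two calls give different distances for the same text (3
when stopwords are counted, 1 when they are ignored), so a stored vector
and a phrase query built under different rules would no longer agree.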

                        regards, tom lane
