On Mon, Sep 24, 2018 at 05:36:39PM -0400, Tom Lane wrote:
> I reviewed and pushed this.

Great! Thank you.

> As a cross-check on the patch, I cloned the Snowball github repo
> and built the derived files in it. I noticed that they'd incorporated
> several new stemmers since 2007 --- not only your Nepali one, but
> half a dozen more besides. Since the point here is (IMO) mostly to
> follow their lead on what's interesting, I went ahead and added those
> as well.

Agreed. It is a good decision. It may attract more users.

> Although I added nepali.stop from the other patch, I've not done
> anything about updating our other stopword lists. Presumably those
> are a bit obsolete by now as well. I wonder if we can prevail on
> the Snowball people to make those available in some less painful way
> than scraping them off assorted web pages. Ideally they'd stick them
> into their git repo ...

They have a repository, snowball-website [1]. It is the source repository
of the snowballstem.org web site. It also stores stopwords for various
languages (for example, for english [2]). I checked a couple of languages.
Their russian and danish stopword lists look like PostgreSQL's stopword
lists, but their english stopword list is different.

Stopword lists are missing for the following languages:
- arabic
- irish
- lithuanian
- nepali - I can create a pull request to add it to snowball-website
- tamil

There is also another project, called Stopwords ISO [3], but I'm not sure
about it. It collects stopword lists from various sources. It also has
stopwords for the languages mentioned above, except for nepali and tamil.

I think I could write a script that generates stopword files from the
snowball-website repository. It could be run periodically. Would that be
suitable?

Also it would be good to move the missing stopwords from Stopwords ISO to
snowball-website...
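
A minimal sketch of the conversion step such a script would need, assuming
the Snowball stop.txt layout (comments start with `|`, each stop word is at
the start of a line) and a PostgreSQL .stop file with one word per line. The
function names are hypothetical, and fetching the files from the repository
is left out:

```python
def parse_snowball_stopwords(text):
    """Extract stop words from the text of a Snowball-style stop.txt.

    Assumption: everything after '|' on a line is a comment, and the
    stop word is the first whitespace-separated token before it.
    """
    words = []
    seen = set()
    for line in text.splitlines():
        # Drop the trailing comment, if any, and surrounding whitespace.
        entry = line.split("|", 1)[0].strip()
        if not entry:
            continue  # blank line or comment-only line
        word = entry.split()[0]
        if word not in seen:  # keep first occurrence, preserve order
            seen.add(word)
            words.append(word)
    return words


def to_postgres_stop(words):
    """Render the words as .stop file content: one word per line."""
    return "\n".join(words) + "\n"
```

The output of to_postgres_stop() could then be written to a file such as
english.stop and diffed against the current one to see what has changed.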

1 - https://github.com/snowballstem/snowball-website/tree/master/algorithms
2 - https://github.com/snowballstem/snowball-website/blob/master/algorithms/english/stop.txt
3 - https://github.com/stopwords-iso

--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company