Hi,

I maintain a project (diofanti.org <http://diofanti.org/>) that tracks public 
spending in Greece.
It's a PostgreSQL instance hosting 55M+ JSON documents, with search functionality
on top of them.
It relies heavily on to_tsvector('greek', ..), since users search for company
names, invoice descriptions, etc.

The results are fairly good, but while experimenting with adding some more
domain-specific stop words, I realised there's no greek.stop under
$(pg_config --sharedir)/tsearch_data.
And indeed, it looks like stop words are not removed by to_tsvector('greek', ..):

select to_tsvector('greek', 'ΚΑΛΗΜΕΡΑ ΚΑΙ ΣΕ ΕΣΑΣ');
--> 'εσ':4 'κα':2 'καλημερ':1 'σε':3

select to_tsvector('english', 'AND GOOD MORNING TO YOU TOO');
--> 'good':2 'morn':3
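In the meantime, a custom Snowball dictionary pointing at a hand-maintained stop
word file seems like a possible workaround. A rough sketch, assuming a file named
greek_custom.stop (a hypothetical name) placed in $(pg_config --sharedir)/tsearch_data,
UTF-8 encoded, one lowercase word per line; the dictionary and configuration names
are made up as well:

```sql
-- Hypothetical names: greek_custom_stem, greek_custom, greek_custom.stop.
-- Snowball dictionary using the built-in Greek stemmer plus a custom stop list
-- read from $SHAREDIR/tsearch_data/greek_custom.stop.
CREATE TEXT SEARCH DICTIONARY greek_custom_stem (
    TEMPLATE = snowball,
    Language = greek,
    StopWords = greek_custom
);

-- Start from a copy of the built-in 'greek' configuration...
CREATE TEXT SEARCH CONFIGURATION greek_custom (COPY = pg_catalog.greek);

-- ...and route word tokens through the new dictionary instead of greek_stem.
ALTER TEXT SEARCH CONFIGURATION greek_custom
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                      word, hword, hword_part
    WITH greek_custom_stem;

-- Words listed in greek_custom.stop should no longer appear in the output.
SELECT to_tsvector('greek_custom', 'ΚΑΛΗΜΕΡΑ ΚΑΙ ΣΕ ΕΣΑΣ');
```

Existing tsvector columns and indexes built with the 'greek' configuration would
need to be rebuilt against the new one for the stop list to take effect.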

I found an older discussion on pgsql-hackers [0], but I'm not sure where it
ended up, or whether work on it ever started.

Am I missing something?
Is there another thread or patch I could pick up myself?

[0] https://www.postgresql.org/message-id/flat/e1c79330-48a5-abef-c309-8d4499e3180b%402ndquadrant.com#7431fdb9ae24b694155aef3f040b7b60