Jimmy Huang <jimmy_hu...@live.com> writes:
> I tried pg_trgm and my own customized token parser
> https://github.com/huangjimmy/pg_cjk_parser

pg_trgm is going to be fairly useless for indexing text that's mostly
multibyte characters, since its unit of indexable data is just 3 bytes
(not characters). I don't know of any comparable issue in the core
tsvector logic, though.

The numbers you're quoting do sound quite awful, but I share Cory's
suspicion that it's something about your setup rather than an inherent
Postgres issue.

regards, tom lane
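
To illustrate the bytes-versus-characters distinction raised above, here is a minimal sketch in Python. It only compares character counts with UTF-8 byte counts; it does not reproduce pg_trgm's actual trigram extraction, and the helper name `utf8_sizes` and the sample strings are hypothetical, chosen for illustration.

```python
# Illustration only: compare character counts with UTF-8 byte counts.
# This is not pg_trgm's extraction logic; utf8_sizes is a hypothetical helper.

def utf8_sizes(text):
    """Return (number of characters, number of UTF-8 bytes) for text."""
    return len(text), len(text.encode("utf-8"))


ascii_sample = "cat"      # three ASCII characters, one byte each in UTF-8
cjk_sample = "数据库"      # three CJK characters, three bytes each in UTF-8

for sample in (ascii_sample, cjk_sample):
    chars, nbytes = utf8_sizes(sample)
    print(f"{sample!r}: {chars} characters, {nbytes} bytes")
```

Under UTF-8, a 3-byte unit spans three ASCII characters but only a single common CJK character, which is the size mismatch the reply is pointing at.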