I recently migrated from SQL Server to Postgres (via Amazon Aurora) for
cost reasons.

I have been working with the full-text search functions for quite some
time, but I have run into a few major roadblocks in trying to use Postgres
full-text search to its fullest. I am using it as a blob text-search tool,
where an entire document (which can run to 100 pages) is indexed and
searched:

1. Limited table and document sizes. In SQL Server I can create a 1.5 TB
table (filingid, data blobs) with no problem. In Postgres this has not
worked for me: in practice I am limited to roughly 100-200 GB per table and
1 MB per tsvector. This is an annoyance, but I can work around it by
chunking documents across rows (see the sketch after this list).

2. More seriously, I cannot seem to get around the performance problem of
phrase searching. I do quite a bit of phrase searching, and although
"phraseto_tsquery" performs great on phrases of uncommon words, it slows
to a screeching halt on phrases of common words such as "law firm" or, for
example, "bank of america". This is a huge problem, because
"plainto_tsquery" handles these just fine; as I understand it, phrase
searching first finds each word (as "plainto" does) and then scans the
matches to check positions?

Back to the phrase problem: the positions are already stored in each
tsvector, and the "plainto" function is quite fast; is there a way to make
the "phraseto" query use those positions to check the distance between the
found words efficiently?
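
One workaround I have seen suggested is the rum index type from the rum
extension, which stores lexeme positions in the index itself, so a phrase
query does not have to recheck every row that merely contains the common
words. A sketch, assuming the extension is available in your environment
(I have not confirmed that it is on Aurora), reusing the table above:

CREATE EXTENSION rum;

-- rum keeps positions in the index, so the phrase check can happen
-- during the index scan instead of in a per-row recheck
CREATE INDEX filing_chunks_tsv_rum_idx ON filing_chunks
    USING rum (tsv rum_tsvector_ops);

SELECT DISTINCT filingid
FROM filing_chunks
WHERE tsv @@ phraseto_tsquery('english', 'bank of america');

The trade-off is that rum indexes are larger and slower to update than
GIN, so this is not a free win.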

Modifying the function this way seems fairly trivial, but I did not see it
on the development roadmap. It is a pretty critical piece: I have seen
that other people have had issues with it too, and tests comparing this
function against similar functions elsewhere show that the performance of
the "phraseto" function is quite poor.

Thanks,
Sagiv

sagivs...@gmail.com
