tsvector not giving expected results on one host
Under PostgreSQL 12, I have a table using tsvector to search the column pkgmessage. It looks like this (not all columns are shown). Table "public.ports" Column | Type | Collation | Nullable | Default pkgmessage | text | | | pkgmessage_textsearchable | tsvector | | | generated always as (to_tsvector('english'::regconfig, pkgmessage)) stored On several servers, it works fine, like this: freshports.devgit=# SELECT id as port_id, element_pathname(element_id) FROM ports WHERE pkgmessage_textsearchable2 @@ websearch_to_tsquery('example'); port_id | element_pathname -+-- 100421 | /ports/branches/2022Q1/dns/dnsmasq 100428 | /ports/branches/2022Q1/emulators/xsystem35 14686 | /ports/head/sysutils/lmon ... etc On the problem server, production, we get nothing. Nada. freshports.org=> SELECT id as port_id, element_pathname(element_id) FROM ports WHERE pkgmessage_textsearchable2 @@ websearch_to_tsquery('example'); port_id | element_pathname -+-- (0 rows) freshports.org=> However, ilike on the same database does find the matches: freshports.org=> SELECT id as port_id, element_pathname(element_id) FROM ports WHERE pkgmessage ilike '%example%'; port_id | element_pathname -+-- 34126 | /ports/head/security/pond 74559 | /ports/branches/2015Q3/emulators/linux_base-c6 60310 | /ports/branches/2020Q4/www/gitlab-ce 38345 | /ports/head/www/gitlab ... etc Both database are SQL_ASCII with the same Collate and Ctype settings. Before composing this email, I ran "reindex table concurrently ports;" on the problem database. It did not change the results. prod, has problem List of databases Name | Owner | Encoding | Collate |Ctype| Access privileges +--+---+-+-+--- freshports.org | postgres | SQL_ASCII | C | C | dev, no issues: [pg02 dan ~] % psql -l List of databases Name |Owner | Encoding | Collate | Ctype | Access privileges -+--+---+-+---+--- freshports.devgit | postgres | SQL_ASCII | C | C | Any ideas as to what to search please? Oh, one difference. All the working-as-expected databases are self-hosted on FreeBSD. The problem database is on AWS RDS. -- Dan Langille d...@langille.org
Re: tsvector not giving expected results on one host
On Sat, Dec 17, 2022, at 1:53 PM, Dan Langille wrote: > Under PostgreSQL 12, I have a table using tsvector to search the column > pkgmessage. > > It looks like this (not all columns are shown). > > > Table "public.ports" >Column | Type | Collation | > Nullable | Default > > pkgmessage | text | | > | > pkgmessage_textsearchable | tsvector | | > | generated always as (to_tsvector('english'::regconfig, > pkgmessage)) stored pkgmessage_textsearchable2 | tsvector | | | generated always as (to_tsvector('english'::regconfig, translate(pkgmessage, '/'::text, ' '::text))) stored I see the above should have been included as well. > > On several servers, it works fine, like this: > > freshports.devgit=# SELECT id as port_id, element_pathname(element_id) > FROM ports > WHERE pkgmessage_textsearchable2 @@ websearch_to_tsquery('example'); > port_id | element_pathname > > -+-- > 100421 | /ports/branches/2022Q1/dns/dnsmasq > 100428 | /ports/branches/2022Q1/emulators/xsystem35 >14686 | /ports/head/sysutils/lmon > ... etc > > On the problem server, production, we get nothing. Nada. > > freshports.org=> SELECT id as port_id, element_pathname(element_id) > FROM ports > WHERE pkgmessage_textsearchable2 @@ websearch_to_tsquery('example'); > port_id | element_pathname > -+-- > (0 rows) > > freshports.org=> > > However, ilike on the same database does find the matches: > > freshports.org=> SELECT id as port_id, element_pathname(element_id) > FROM ports > WHERE pkgmessage ilike '%example%'; > port_id | element_pathname > > -+-- >34126 | /ports/head/security/pond >74559 | /ports/branches/2015Q3/emulators/linux_base-c6 >60310 | /ports/branches/2020Q4/www/gitlab-ce >38345 | /ports/head/www/gitlab > ... etc Let's look at that first entry on the problem database: freshports.org=> select pkgmessage_textsearchable from ports where id = 34126; pkgmessage_textsearchable - '/tmp':32 '1':10 '2':21 '3':33 '5':17 'client':6,36 'directori':25 'exampl':27 'follow':13,38 'instruct':40 'line':14 'load':19 'loader.conf':16 'mount':22,28 'place':11 'pond':3,35,41 'requir':8 'run':34 'tmpfs':9,18,24,30,31 'use':2 'yes':20 (1 row) freshports.org=> select pkgmessage_textsearchable2 from ports where id = 34126; pkgmessage_textsearchable2 '1':10 '2':21 '3':33 '5':17 'client':6,36 'directori':25 'exampl':27 'follow':13,38 'instruct':40 'line':14 'load':19 'loader.conf':16 'mount':22,28 'place':11 'pond':3,35,41 'requir':8 'run':34 'tmp':32 'tmpfs':9,18,24,30,31 'use':2 'yes':20 (1 row) freshports.org=> >From a database which runs this query with expected results: freshports.devgit=# select pkgmessage_textsearchable2 from ports where id = 34126;
Re: tsvector not giving expected results on one host
On Sat, Dec 17, 2022, at 2:55 PM, Tom Lane wrote: > "Dan Langille" writes: >> pkgmessage_textsearchable2 | tsvector | | >> | generated always as (to_tsvector('english'::regconfig, >> translate(pkgmessage, '/'::text, ' '::text))) stored > > That is not likely to play well with this: > >> freshports.org=> show default_text_search_config ; >> default_text_search_config >> >> pg_catalog.simple > > because "english" and "simple" will stem words differently. > > regression=# select websearch_to_tsquery('english', 'example'); > websearch_to_tsquery > -- > 'exampl' > (1 row) > > regression=# select websearch_to_tsquery('simple', 'example'); > websearch_to_tsquery > -- > 'example' > (1 row) > > If what is in your tsvector is 'exampl', then only the first of > these will match. So IMO the question is not "why is it failing > on prod?", it's "how the heck did it work on the other machine?". > You won't get nice results if websearch_to_tsquery is using a > different TS configuration than to_tsvector did. I think this shows why we are getting the results we see. Credit to ch on IRC for asking this question. The problem host: freshports.org=> select websearch_to_tsquery('example'); websearch_to_tsquery -- 'example' (1 row) The hosts on which this search works freshports.devgit=# select websearch_to_tsquery('example'); websearch_to_tsquery -- 'exampl' (1 row) On that same host: [pg02 dan ~] % sudo grep -i default_text_search_config /var/db/postgres/data12/postgresql.conf default_text_search_config = 'pg_catalog.english' -- Dan Langille d...@langille.org
Re: tsvector not giving expected results on one host
On Sat, Dec 17, 2022, at 3:14 PM, Dan Langille wrote: > On Sat, Dec 17, 2022, at 2:55 PM, Tom Lane wrote: >> "Dan Langille" writes: >>> pkgmessage_textsearchable2 | tsvector | | >>> | generated always as (to_tsvector('english'::regconfig, >>> translate(pkgmessage, '/'::text, ' '::text))) stored >> >> That is not likely to play well with this: >> >>> freshports.org=> show default_text_search_config ; >>> default_text_search_config >>> >>> pg_catalog.simple >> >> because "english" and "simple" will stem words differently. >> >> regression=# select websearch_to_tsquery('english', 'example'); >> websearch_to_tsquery >> -- >> 'exampl' >> (1 row) >> >> regression=# select websearch_to_tsquery('simple', 'example'); >> websearch_to_tsquery >> -- >> 'example' >> (1 row) >> >> If what is in your tsvector is 'exampl', then only the first of >> these will match. So IMO the question is not "why is it failing >> on prod?", it's "how the heck did it work on the other machine?". >> You won't get nice results if websearch_to_tsquery is using a >> different TS configuration than to_tsvector did. > > I think this shows why we are getting the results we see. Credit to ch > on IRC for asking this question. > > The problem host: > > freshports.org=> select websearch_to_tsquery('example'); > websearch_to_tsquery > -- > 'example' > (1 row) Ahh, this explains the differences and as to why it works where it shouldn't? freshports.org=> select setting, source from pg_settings where name = 'default_text_search_config'; setting | source ---+- pg_catalog.simple | default (1 row) > > The hosts on which this search works > > freshports.devgit=# select websearch_to_tsquery('example'); > websearch_to_tsquery > -- > 'exampl' > (1 row) freshports.devgit=# select setting, source from pg_settings where name = 'default_text_search_config'; setting | source + pg_catalog.english | configuration file (1 row) At least now I know what I can play with to get all hosts in sync. -- Dan Langille d...@langille.org
Re: tsvector not giving expected results on one host
On Sat, Dec 17, 2022, at 3:22 PM, Dan Langille wrote: > On Sat, Dec 17, 2022, at 3:14 PM, Dan Langille wrote: >> On Sat, Dec 17, 2022, at 2:55 PM, Tom Lane wrote: >>> "Dan Langille" writes: >>>> pkgmessage_textsearchable2 | tsvector | | >>>>| generated always as (to_tsvector('english'::regconfig, >>>> translate(pkgmessage, '/'::text, ' '::text))) stored >>> >>> That is not likely to play well with this: >>> >>>> freshports.org=> show default_text_search_config ; >>>> default_text_search_config >>>> >>>> pg_catalog.simple >>> >>> because "english" and "simple" will stem words differently. >>> >>> regression=# select websearch_to_tsquery('english', 'example'); >>> websearch_to_tsquery >>> -- >>> 'exampl' >>> (1 row) >>> >>> regression=# select websearch_to_tsquery('simple', 'example'); >>> websearch_to_tsquery >>> -- >>> 'example' >>> (1 row) >>> >>> If what is in your tsvector is 'exampl', then only the first of >>> these will match. So IMO the question is not "why is it failing >>> on prod?", it's "how the heck did it work on the other machine?". >>> You won't get nice results if websearch_to_tsquery is using a >>> different TS configuration than to_tsvector did. >> >> I think this shows why we are getting the results we see. Credit to ch >> on IRC for asking this question. >> >> The problem host: >> >> freshports.org=> select websearch_to_tsquery('example'); >> websearch_to_tsquery >> -- >> 'example' >> (1 row) > > Ahh, this explains the differences and as to why it works where it shouldn't? > > freshports.org=> select setting, source from pg_settings where name = > 'default_text_search_config'; > setting | source > ---+- > pg_catalog.simple | default > (1 row) > > >> >> The hosts on which this search works >> >> freshports.devgit=# select websearch_to_tsquery('example'); >> websearch_to_tsquery >> -- >> 'exampl' >> (1 row) > > > freshports.devgit=# select setting, source from pg_settings where name > = 'default_text_search_config'; > setting | source > + > pg_catalog.english | configuration file > (1 row) > > > At least now I know what I can play with to get all hosts in sync. Here we go, on the problem database, create a new field, based on simple, not english. ALTER TABLE public.ports ADD COLUMN pkgmessage_textsearchable3 tsvector generated always as (to_tsvector('simple'::regconfig, translate(pkgmessage, '/'::text, ' '::text))) stored; Index it: CREATE INDEX ports_pkgmessage_textsearchable3_idx ON public.ports USING gin (pkgmessage_textsearchable3) TABLESPACE pg_default; CREATE INDEX query it: freshports.org=> SELECT id as port_id, element_pathname(element_id) FROM ports WHERE pkgmessage_textsearchable3 @@ websearch_to_tsquery('example'); port_id | element_pathname -+-- 34126 | /ports/head/security/pond 74559 | /ports/branches/2015Q3/emulators/linux_base-c6 60310 | /ports/branches/2020Q4/www/gitlab-ce 38345 | /ports/head/www/gitlab 46842 | /ports/branches/2018Q1/mail/postfix-sasl 51532 | /ports/branches/2019Q1/graphics/drm-legacy-kmod Success. Thank you Mr Lane. -- Dan Langille d...@langille.org
Re: tsvector not giving expected results on one host
Oleg Bartunov wrote on 12/21/22 12:31 PM: I On Wed, Dec 21, 2022 at 1:12 PM Oleg Bartunov wrote: Dan, it's always good to specify configuration name in a query to avoid recheck, since websearch_to_tsquery(regconfig, text) is immutable, while websearch_to_tsquery(text) is stable. I am glad to hear this because one of my decisions was: * change the query to include configuration name * change the default configuration name I think I'm going to start using websearch_to_tsquery(regconfig, text) immutable function calculates once in planning time, but stable function calculates during running time, so the difference may be very big depending on how many tuples found. Looking at my data, the largest tsvector collection has 453 elements. About 40 have more than 170. Total number of rows is about 2400. Another issue discussed on IRC: why store my tsvector values? Why not just index them instead? At present: I have this column: pkgmessage_textsearchable | tsvector | | | generated always as (to_tsvector('english'::regconfig, pkgmessage)) stored with this index: ports_pkgmessage_textsearchable_idx" gin (pkgmessage_textsearchable) Instead, I could replace that column and index with this index: "testing" gin (to_tsvector('english'::regconfig, pkgmessage)) Simple testing showed it was comparable if not slightly faster. The plan now: implement the index on to_tsvector, not a column, and start specifying the configuration. That's in the near future. I've written up this journey at https://news.freshports.org/2022/12/18/when-tsvector-was-working-as-expected-on-most-hosts-but-not-one/ thank you See the difference: [local]:5433 oleg@oleg=# explain (analyze,costs off) select title from apod where websearch_to_tsquery('simple','galaxies') @@ fts; QUERY PLAN --- Bitmap Heap Scan on apod (actual time=0.008..0.008 rows=0 loops=1) Recheck Cond: ('''galaxies'''::tsquery @@ fts) -> Bitmap Index Scan on gin_apod_fts_idx (actual time=0.007..0.007 rows=0 loops=1) Index Cond: (fts @@ '''galaxies'''::tsquery) Planning Time: 0.134 ms Execution Time: 0.022 ms (6 rows) Time: 0.369 ms [local]:5433 oleg@oleg=# explain (analyze,costs off) select title from apod where websearch_to_tsquery('galaxies') @@ fts; QUERY PLAN - Bitmap Heap Scan on apod (actual time=0.107..1.463 rows=493 loops=1) Filter: (websearch_to_tsquery('galaxies'::text) @@ fts) Heap Blocks: exact=276 -> Bitmap Index Scan on gin_apod_fts_idx (actual time=0.059..0.059 rows=493 loops=1) Index Cond: (fts @@ websearch_to_tsquery('galaxies'::text)) Planning Time: 0.125 ms Execution Time: 1.518 ms (7 rows) -- Dan Langille - d...@langille.org https://langille.org/
Re: Trigger when user logs in
Pushover.net comes to mind. The email sending should not delay login. If you want notifications, look at mqtt. https://dan.langille.org/2018/04/15/using-mtqq-to-create-a-notification-network-mosquitto-mqttwarn-hare-and-hared/ Lightweight, non-pausing solution. HTH -- Dan Langille d...@langille.org On Sat, Apr 13, 2019, at 9:16 PM, Schneider, Jeremy wrote: > > > On Apr 13, 2019, at 17:28, Schneider, Jeremy wrote: > > > >> On Apr 11, 2019, at 19:52, Tom Lane wrote: > >> > >> Ron writes: > >>> I bet requests like this will start to make it onto the beaten path. > >> > >> Meh. I'm not that excited about inventing our own versions of wheels > >> that already exist, especially when there's nothing very Postgres-specific > >> about the requirements. Notice that the example I pointed you at is for > >> sshd not Postgres. IMO the fact that you can use the same tool to solve > >> both cases is a good thing. > > > > This might work for sending an email, but not very useful if I want to do > > something in the database. > > > > For example, one very common use of logon triggers in other databases is to > > look at various connection parameters (like username or source IP) and > > enable sql logging or debugging for only certain cases (not always doing > > the same thing for a particular user). Another common use case is to do > > something like running plpgsql or manipulating data in db tables - but > > again looking at some combination of things at a database level to make a > > decision about what to do; for example the application itself might enable > > or disable certain behaviors by setting values in a configuration table. > > Probably worth mentioning that I’m all for solving this in the application - > just that I’ve experienced many cases in the past where it wasn’t feasible or > even possible to get the sorts of changes I’d need into applications using > the databases that I was responsible for. > > > I’m still trying to work out the best approach for solving these sorts of > > use cases in current versions of PostgreSQL... I’m curious how others are > > solving this? >