tsvector not giving expected results on one host

2022-12-17 Thread Dan Langille
Under PostgreSQL 12, I have a table using tsvector to search the column 
pkgmessage.

It looks like this (not all columns are shown).


Table "public.ports"
   Column   |   Type   | Collation | Nullable | 
  Default   

 pkgmessage | text |   |  | 
 pkgmessage_textsearchable  | tsvector |   |  | 
generated always as (to_tsvector('english'::regconfig, pkgmessage)) stored

On several servers, it works fine, like this:

freshports.devgit=# SELECT id as port_id, element_pathname(element_id)
  FROM ports
 WHERE pkgmessage_textsearchable2  @@ websearch_to_tsquery('example');
 port_id |   element_pathname   
-+--
  100421 | /ports/branches/2022Q1/dns/dnsmasq
  100428 | /ports/branches/2022Q1/emulators/xsystem35
   14686 | /ports/head/sysutils/lmon
... etc

On the problem server, production, we get nothing.  Nada.

freshports.org=> SELECT id as port_id, element_pathname(element_id)
  FROM ports
 WHERE pkgmessage_textsearchable2  @@ websearch_to_tsquery('example');
 port_id | element_pathname 
-+--
(0 rows)

freshports.org=> 

However, ilike on the same database does find the matches:

freshports.org=> SELECT id as port_id, element_pathname(element_id)
  FROM ports
 WHERE pkgmessage ilike '%example%';
 port_id |   element_pathname   
-+--
   34126 | /ports/head/security/pond
   74559 | /ports/branches/2015Q3/emulators/linux_base-c6
   60310 | /ports/branches/2020Q4/www/gitlab-ce
   38345 | /ports/head/www/gitlab
... etc

Both database are SQL_ASCII with the same Collate and Ctype settings.

Before composing this email, I ran "reindex table concurrently ports;" on the 
problem database. It did not change the results.

prod, has problem

 List of databases
  Name  |  Owner   | Encoding  |   Collate   |Ctype|   Access 
privileges   
+--+---+-+-+---
 freshports.org | postgres | SQL_ASCII | C   | C   | 


dev, no issues:

[pg02 dan ~] % psql -l
List of databases
Name |Owner | Encoding  | Collate | Ctype |   
Access privileges   
-+--+---+-+---+---
 freshports.devgit   | postgres | SQL_ASCII | C   | C | 

Any ideas as to what to search please?

Oh, one difference. All the working-as-expected databases are self-hosted on 
FreeBSD. The problem database is on AWS RDS.
-- 
  Dan Langille
  d...@langille.org




Re: tsvector not giving expected results on one host

2022-12-17 Thread Dan Langille
On Sat, Dec 17, 2022, at 1:53 PM, Dan Langille wrote:
> Under PostgreSQL 12, I have a table using tsvector to search the column 
> pkgmessage.
>
> It looks like this (not all columns are shown).
>
> 
> Table "public.ports"
>Column   |   Type   | Collation | 
> Nullable |   Default
>
>  pkgmessage | text |   |
>   | 
>  pkgmessage_textsearchable  | tsvector |   |
>   | generated always as (to_tsvector('english'::regconfig, 
> pkgmessage)) stored

 pkgmessage_textsearchable2 | tsvector |   |  | 
generated always as (to_tsvector('english'::regconfig, translate(pkgmessage, 
'/'::text, ' '::text))) stored

I see the above should have been included as well.

>
> On several servers, it works fine, like this:
>
> freshports.devgit=# SELECT id as port_id, element_pathname(element_id)
>   FROM ports
>  WHERE pkgmessage_textsearchable2  @@ websearch_to_tsquery('example');
>  port_id |   element_pathname 
>   
> -+--
>   100421 | /ports/branches/2022Q1/dns/dnsmasq
>   100428 | /ports/branches/2022Q1/emulators/xsystem35
>14686 | /ports/head/sysutils/lmon
> ... etc
>
> On the problem server, production, we get nothing.  Nada.
>
> freshports.org=> SELECT id as port_id, element_pathname(element_id)
>   FROM ports
>  WHERE pkgmessage_textsearchable2  @@ websearch_to_tsquery('example');
>  port_id | element_pathname 
> -+--
> (0 rows)
>
> freshports.org=> 
>
> However, ilike on the same database does find the matches:
>
> freshports.org=> SELECT id as port_id, element_pathname(element_id)
>   FROM ports
>  WHERE pkgmessage ilike '%example%';
>  port_id |   element_pathname 
>   
> -+--
>34126 | /ports/head/security/pond
>74559 | /ports/branches/2015Q3/emulators/linux_base-c6
>60310 | /ports/branches/2020Q4/www/gitlab-ce
>38345 | /ports/head/www/gitlab
> ... etc

Let's look at that first entry on the problem database:

freshports.org=> select pkgmessage_textsearchable from ports where id = 34126;

  pkgmessage_textsearchable 

 
-
 '/tmp':32 '1':10 '2':21 '3':33 '5':17 'client':6,36 'directori':25 'exampl':27 
'follow':13,38 'instruct':40 'line':14 'load':19 'loader.conf':16 'mount':22,28 
'place':11 'pond':3,35,41 'requir':8 'run':34 'tmpfs':9,18,24,30,31 'use':2 
'yes':20
(1 row)

freshports.org=> select pkgmessage_textsearchable2 from ports where id = 34126;

 pkgmessage_textsearchable2 



 '1':10 '2':21 '3':33 '5':17 'client':6,36 'directori':25 'exampl':27 
'follow':13,38 'instruct':40 'line':14 'load':19 'loader.conf':16 'mount':22,28 
'place':11 'pond':3,35,41 'requir':8 'run':34 'tmp':32 'tmpfs':9,18,24,30,31 
'use':2 'yes':20
(1 row)

freshports.org=> 

>From a database which runs this query with expected results:

freshports.devgit=# select pkgmessage_textsearchable2 from ports where id = 
34126;
 

Re: tsvector not giving expected results on one host

2022-12-17 Thread Dan Langille
On Sat, Dec 17, 2022, at 2:55 PM, Tom Lane wrote:
> "Dan Langille"  writes:
>>  pkgmessage_textsearchable2 | tsvector |   | 
>>  | generated always as (to_tsvector('english'::regconfig, 
>> translate(pkgmessage, '/'::text, ' '::text))) stored
>
> That is not likely to play well with this:
>
>> freshports.org=> show  default_text_search_config ;
>>  default_text_search_config 
>> 
>>  pg_catalog.simple
>
> because "english" and "simple" will stem words differently.
>
> regression=# select websearch_to_tsquery('english', 'example');
>  websearch_to_tsquery 
> --
>  'exampl'
> (1 row)
>
> regression=# select websearch_to_tsquery('simple', 'example');
>  websearch_to_tsquery 
> --
>  'example'
> (1 row)
>
> If what is in your tsvector is 'exampl', then only the first of
> these will match.  So IMO the question is not "why is it failing
> on prod?", it's "how the heck did it work on the other machine?".
> You won't get nice results if websearch_to_tsquery is using a
> different TS configuration than to_tsvector did.

I think this shows why we are getting the results we see.  Credit to ch on IRC 
for asking this question.

The problem host:

freshports.org=> select websearch_to_tsquery('example');
 websearch_to_tsquery 
--
 'example'
(1 row)


The hosts on which this search works

freshports.devgit=# select websearch_to_tsquery('example');
 websearch_to_tsquery 
--
 'exampl'
(1 row)

On that same host:

[pg02 dan ~] % sudo grep -i default_text_search_config 
/var/db/postgres/data12/postgresql.conf
default_text_search_config = 'pg_catalog.english'


-- 
  Dan Langille
  d...@langille.org




Re: tsvector not giving expected results on one host

2022-12-17 Thread Dan Langille
On Sat, Dec 17, 2022, at 3:14 PM, Dan Langille wrote:
> On Sat, Dec 17, 2022, at 2:55 PM, Tom Lane wrote:
>> "Dan Langille"  writes:
>>>  pkgmessage_textsearchable2 | tsvector |   |
>>>   | generated always as (to_tsvector('english'::regconfig, 
>>> translate(pkgmessage, '/'::text, ' '::text))) stored
>>
>> That is not likely to play well with this:
>>
>>> freshports.org=> show  default_text_search_config ;
>>>  default_text_search_config 
>>> 
>>>  pg_catalog.simple
>>
>> because "english" and "simple" will stem words differently.
>>
>> regression=# select websearch_to_tsquery('english', 'example');
>>  websearch_to_tsquery 
>> --
>>  'exampl'
>> (1 row)
>>
>> regression=# select websearch_to_tsquery('simple', 'example');
>>  websearch_to_tsquery 
>> --
>>  'example'
>> (1 row)
>>
>> If what is in your tsvector is 'exampl', then only the first of
>> these will match.  So IMO the question is not "why is it failing
>> on prod?", it's "how the heck did it work on the other machine?".
>> You won't get nice results if websearch_to_tsquery is using a
>> different TS configuration than to_tsvector did.
>
> I think this shows why we are getting the results we see.  Credit to ch 
> on IRC for asking this question.
>
> The problem host:
>
> freshports.org=> select websearch_to_tsquery('example');
>  websearch_to_tsquery 
> --
>  'example'
> (1 row)

Ahh, this explains the differences and as to why it works where it shouldn't?

freshports.org=> select setting, source from pg_settings where name = 
'default_text_search_config';
  setting  | source  
---+-
 pg_catalog.simple | default
(1 row)


>
> The hosts on which this search works
>
> freshports.devgit=# select websearch_to_tsquery('example');
>  websearch_to_tsquery 
> --
>  'exampl'
> (1 row)


freshports.devgit=# select setting, source from pg_settings where name = 
'default_text_search_config';
  setting   |   source   
+
 pg_catalog.english | configuration file
(1 row)


At least now I know what I can play with to get all hosts in sync.

-- 
  Dan Langille
  d...@langille.org




Re: tsvector not giving expected results on one host

2022-12-17 Thread Dan Langille
On Sat, Dec 17, 2022, at 3:22 PM, Dan Langille wrote:
> On Sat, Dec 17, 2022, at 3:14 PM, Dan Langille wrote:
>> On Sat, Dec 17, 2022, at 2:55 PM, Tom Lane wrote:
>>> "Dan Langille"  writes:
>>>>  pkgmessage_textsearchable2 | tsvector |   |   
>>>>| generated always as (to_tsvector('english'::regconfig, 
>>>> translate(pkgmessage, '/'::text, ' '::text))) stored
>>>
>>> That is not likely to play well with this:
>>>
>>>> freshports.org=> show  default_text_search_config ;
>>>>  default_text_search_config 
>>>> 
>>>>  pg_catalog.simple
>>>
>>> because "english" and "simple" will stem words differently.
>>>
>>> regression=# select websearch_to_tsquery('english', 'example');
>>>  websearch_to_tsquery 
>>> --
>>>  'exampl'
>>> (1 row)
>>>
>>> regression=# select websearch_to_tsquery('simple', 'example');
>>>  websearch_to_tsquery 
>>> --
>>>  'example'
>>> (1 row)
>>>
>>> If what is in your tsvector is 'exampl', then only the first of
>>> these will match.  So IMO the question is not "why is it failing
>>> on prod?", it's "how the heck did it work on the other machine?".
>>> You won't get nice results if websearch_to_tsquery is using a
>>> different TS configuration than to_tsvector did.
>>
>> I think this shows why we are getting the results we see.  Credit to ch 
>> on IRC for asking this question.
>>
>> The problem host:
>>
>> freshports.org=> select websearch_to_tsquery('example');
>>  websearch_to_tsquery 
>> --
>>  'example'
>> (1 row)
>
> Ahh, this explains the differences and as to why it works where it shouldn't?
>
> freshports.org=> select setting, source from pg_settings where name = 
> 'default_text_search_config';
>   setting  | source  
> ---+-
>  pg_catalog.simple | default
> (1 row)
>
>
>>
>> The hosts on which this search works
>>
>> freshports.devgit=# select websearch_to_tsquery('example');
>>  websearch_to_tsquery 
>> --
>>  'exampl'
>> (1 row)
>
>
> freshports.devgit=# select setting, source from pg_settings where name 
> = 'default_text_search_config';
>   setting   |   source   
> +
>  pg_catalog.english | configuration file
> (1 row)
>
>
> At least now I know what I can play with to get all hosts in sync.

Here we go, on the problem database, create a new field, based on simple, not 
english.

ALTER TABLE public.ports
ADD COLUMN pkgmessage_textsearchable3 tsvector generated always as 
(to_tsvector('simple'::regconfig, translate(pkgmessage, '/'::text, ' '::text))) 
stored;

Index it:

CREATE INDEX ports_pkgmessage_textsearchable3_idx
    ON public.ports USING gin
(pkgmessage_textsearchable3)
TABLESPACE pg_default;
CREATE INDEX

query it:

freshports.org=> SELECT id as port_id, element_pathname(element_id)
  FROM ports
 WHERE pkgmessage_textsearchable3  @@ websearch_to_tsquery('example');
 port_id |   element_pathname   
-+--
   34126 | /ports/head/security/pond
   74559 | /ports/branches/2015Q3/emulators/linux_base-c6
   60310 | /ports/branches/2020Q4/www/gitlab-ce
   38345 | /ports/head/www/gitlab
   46842 | /ports/branches/2018Q1/mail/postfix-sasl
   51532 | /ports/branches/2019Q1/graphics/drm-legacy-kmod

Success. Thank you Mr Lane.

-- 
  Dan Langille
  d...@langille.org




Re: tsvector not giving expected results on one host

2022-12-21 Thread Dan Langille

Oleg Bartunov wrote on 12/21/22 12:31 PM:

I

On Wed, Dec 21, 2022 at 1:12 PM Oleg Bartunov  wrote:

Dan,

it's always good to specify configuration name in a query to avoid
recheck,  since
websearch_to_tsquery(regconfig, text) is immutable, while
websearch_to_tsquery(text) is stable.


I am glad to hear this because one of my decisions was:

* change the query to include configuration name
* change the default configuration name

I think I'm going to start using websearch_to_tsquery(regconfig, text)

immutable function calculates once in planning time, but stable
function calculates during running time,
so the difference may be very big depending on how many tuples found.


Looking at my data, the largest tsvector collection has 453 elements. 
About 40 have more

than 170.  Total number of rows is about 2400.

Another issue discussed on IRC: why store my tsvector values? Why not 
just index them instead?


At present: I have this column:

 pkgmessage_textsearchable  | tsvector   |   |  | generated always as 
(to_tsvector('english'::regconfig, pkgmessage)) stored


with this index: ports_pkgmessage_textsearchable_idx" gin 
(pkgmessage_textsearchable)


Instead, I could replace that column and index with this index:

"testing" gin (to_tsvector('english'::regconfig, pkgmessage))

Simple testing showed it was comparable if not slightly faster.

The plan now: implement the index on to_tsvector, not a column, and 
start specifying the configuration. That's in the near future.


I've written up this journey at 
https://news.freshports.org/2022/12/18/when-tsvector-was-working-as-expected-on-most-hosts-but-not-one/


thank you

See the difference:

[local]:5433 oleg@oleg=# explain (analyze,costs off)  select title
from apod where  websearch_to_tsquery('simple','galaxies') @@ fts;
   QUERY PLAN
---
  Bitmap Heap Scan on apod (actual time=0.008..0.008 rows=0 loops=1)
Recheck Cond: ('''galaxies'''::tsquery @@ fts)
->  Bitmap Index Scan on gin_apod_fts_idx (actual time=0.007..0.007
rows=0 loops=1)
  Index Cond: (fts @@ '''galaxies'''::tsquery)
  Planning Time: 0.134 ms
  Execution Time: 0.022 ms
(6 rows)

Time: 0.369 ms
[local]:5433 oleg@oleg=# explain (analyze,costs off)  select title
from apod where  websearch_to_tsquery('galaxies') @@ fts;
QUERY PLAN
-
  Bitmap Heap Scan on apod (actual time=0.107..1.463 rows=493 loops=1)
Filter: (websearch_to_tsquery('galaxies'::text) @@ fts)
Heap Blocks: exact=276
->  Bitmap Index Scan on gin_apod_fts_idx (actual time=0.059..0.059
rows=493 loops=1)
  Index Cond: (fts @@ websearch_to_tsquery('galaxies'::text))
  Planning Time: 0.125 ms
  Execution Time: 1.518 ms
(7 rows)

--
Dan Langille - d...@langille.org
https://langille.org/


Re: Trigger when user logs in

2019-04-13 Thread Dan Langille
Pushover.net comes to mind. 

The email sending should not delay login. 

If you want notifications, look at mqtt. 

https://dan.langille.org/2018/04/15/using-mtqq-to-create-a-notification-network-mosquitto-mqttwarn-hare-and-hared/

Lightweight, non-pausing solution. 

HTH

-- 
 Dan Langille
 d...@langille.org



On Sat, Apr 13, 2019, at 9:16 PM, Schneider, Jeremy wrote:
> 
> > On Apr 13, 2019, at 17:28, Schneider, Jeremy  wrote:
> > 
> >> On Apr 11, 2019, at 19:52, Tom Lane  wrote:
> >> 
> >> Ron  writes:
> >>> I bet requests like this will start to make it onto the beaten path.
> >> 
> >> Meh. I'm not that excited about inventing our own versions of wheels
> >> that already exist, especially when there's nothing very Postgres-specific
> >> about the requirements. Notice that the example I pointed you at is for
> >> sshd not Postgres. IMO the fact that you can use the same tool to solve
> >> both cases is a good thing.
> > 
> > This might work for sending an email, but not very useful if I want to do 
> > something in the database.
> > 
> > For example, one very common use of logon triggers in other databases is to 
> > look at various connection parameters (like username or source IP) and 
> > enable sql logging or debugging for only certain cases (not always doing 
> > the same thing for a particular user). Another common use case is to do 
> > something like running plpgsql or manipulating data in db tables - but 
> > again looking at some combination of things at a database level to make a 
> > decision about what to do; for example the application itself might enable 
> > or disable certain behaviors by setting values in a configuration table.
> 
> Probably worth mentioning that I’m all for solving this in the application - 
> just that I’ve experienced many cases in the past where it wasn’t feasible or 
> even possible to get the sorts of changes I’d need into applications using 
> the databases that I was responsible for.
> 
> > I’m still trying to work out the best approach for solving these sorts of 
> > use cases in current versions of PostgreSQL... I’m curious how others are 
> > solving this?
>