tsvector not giving expected results on one host

2022-12-17 Thread Dan Langille
Under PostgreSQL 12, I have a table using tsvector to search the column 
pkgmessage.

It looks like this (not all columns are shown).


Table "public.ports"
   Column   |   Type   | Collation | Nullable | 
  Default   

 pkgmessage | text |   |  | 
 pkgmessage_textsearchable  | tsvector |   |  | 
generated always as (to_tsvector('english'::regconfig, pkgmessage)) stored

On several servers, it works fine, like this:

freshports.devgit=# SELECT id as port_id, element_pathname(element_id)
  FROM ports
 WHERE pkgmessage_textsearchable2  @@ websearch_to_tsquery('example');
 port_id |   element_pathname   
-+--
  100421 | /ports/branches/2022Q1/dns/dnsmasq
  100428 | /ports/branches/2022Q1/emulators/xsystem35
   14686 | /ports/head/sysutils/lmon
... etc

On the problem server, production, we get nothing.  Nada.

freshports.org=> SELECT id as port_id, element_pathname(element_id)
  FROM ports
 WHERE pkgmessage_textsearchable2  @@ websearch_to_tsquery('example');
 port_id | element_pathname 
-+--
(0 rows)

freshports.org=> 

However, ilike on the same database does find the matches:

freshports.org=> SELECT id as port_id, element_pathname(element_id)
  FROM ports
 WHERE pkgmessage ilike '%example%';
 port_id |   element_pathname   
-+--
   34126 | /ports/head/security/pond
   74559 | /ports/branches/2015Q3/emulators/linux_base-c6
   60310 | /ports/branches/2020Q4/www/gitlab-ce
   38345 | /ports/head/www/gitlab
... etc

Both database are SQL_ASCII with the same Collate and Ctype settings.

Before composing this email, I ran "reindex table concurrently ports;" on the 
problem database. It did not change the results.

prod, has problem

 List of databases
  Name  |  Owner   | Encoding  |   Collate   |Ctype|   Access 
privileges   
+--+---+-+-+---
 freshports.org | postgres | SQL_ASCII | C   | C   | 


dev, no issues:

[pg02 dan ~] % psql -l
List of databases
Name |Owner | Encoding  | Collate | Ctype |   
Access privileges   
-+--+---+-+---+---
 freshports.devgit   | postgres | SQL_ASCII | C   | C | 

Any ideas as to what to search please?

Oh, one difference. All the working-as-expected databases are self-hosted on 
FreeBSD. The problem database is on AWS RDS.
-- 
  Dan Langille
  d...@langille.org




Re: tsvector not giving expected results on one host

2022-12-17 Thread Dan Langille
On Sat, Dec 17, 2022, at 1:53 PM, Dan Langille wrote:
> Under PostgreSQL 12, I have a table using tsvector to search the column 
> pkgmessage.
>
> It looks like this (not all columns are shown).
>
> 
> Table "public.ports"
>Column   |   Type   | Collation | 
> Nullable |   Default
>
>  pkgmessage | text |   |
>   | 
>  pkgmessage_textsearchable  | tsvector |   |
>   | generated always as (to_tsvector('english'::regconfig, 
> pkgmessage)) stored

 pkgmessage_textsearchable2 | tsvector |   |  | 
generated always as (to_tsvector('english'::regconfig, translate(pkgmessage, 
'/'::text, ' '::text))) stored

I see the above should have been included as well.

>
> On several servers, it works fine, like this:
>
> freshports.devgit=# SELECT id as port_id, element_pathname(element_id)
>   FROM ports
>  WHERE pkgmessage_textsearchable2  @@ websearch_to_tsquery('example');
>  port_id |   element_pathname 
>   
> -+--
>   100421 | /ports/branches/2022Q1/dns/dnsmasq
>   100428 | /ports/branches/2022Q1/emulators/xsystem35
>14686 | /ports/head/sysutils/lmon
> ... etc
>
> On the problem server, production, we get nothing.  Nada.
>
> freshports.org=> SELECT id as port_id, element_pathname(element_id)
>   FROM ports
>  WHERE pkgmessage_textsearchable2  @@ websearch_to_tsquery('example');
>  port_id | element_pathname 
> -+--
> (0 rows)
>
> freshports.org=> 
>
> However, ilike on the same database does find the matches:
>
> freshports.org=> SELECT id as port_id, element_pathname(element_id)
>   FROM ports
>  WHERE pkgmessage ilike '%example%';
>  port_id |   element_pathname 
>   
> -+--
>34126 | /ports/head/security/pond
>74559 | /ports/branches/2015Q3/emulators/linux_base-c6
>60310 | /ports/branches/2020Q4/www/gitlab-ce
>38345 | /ports/head/www/gitlab
> ... etc

Let's look at that first entry on the problem database:

freshports.org=> select pkgmessage_textsearchable from ports where id = 34126;

  pkgmessage_textsearchable 

 
-
 '/tmp':32 '1':10 '2':21 '3':33 '5':17 'client':6,36 'directori':25 'exampl':27 
'follow':13,38 'instruct':40 'line':14 'load':19 'loader.conf':16 'mount':22,28 
'place':11 'pond':3,35,41 'requir':8 'run':34 'tmpfs':9,18,24,30,31 'use':2 
'yes':20
(1 row)

freshports.org=> select pkgmessage_textsearchable2 from ports where id = 34126;

 pkgmessage_textsearchable2 



 '1':10 '2':21 '3':33 '5':17 'client':6,36 'directori':25 'exampl':27 
'follow':13,38 'instruct':40 'line':14 'load':19 'loader.conf':16 'mount':22,28 
'place':11 'pond':3,35,41 'requir':8 'run':34 'tmp':32 'tmpfs':9,18,24,30,31 
'use':2 'yes':20
(1 row)

freshports.org=> 

>From a database which runs this query with expected results:

freshports.devgit=# select pkgmessage_textsearchable2 from ports where id = 
34126;

 pkgmessage_textsearchable2 



 '1':10 '2':21 '3':33 '5':17 'client':6,36 'directori':25 'exampl':27 
'follow':13,38 'instruct':40 'line':14 'load':19 'loader.conf':16 'mount':22,28 
'place':11 'pond':3,35,41 'requir':8 'run':34 'tmp':32 'tmpfs':9,18,

Re: tsvector not giving expected results on one host

2022-12-17 Thread Tom Lane
"Dan Langille"  writes:
>  pkgmessage_textsearchable2 | tsvector |   |  
> | generated always as (to_tsvector('english'::regconfig, 
> translate(pkgmessage, '/'::text, ' '::text))) stored

That is not likely to play well with this:

> freshports.org=> show  default_text_search_config ;
>  default_text_search_config 
> 
>  pg_catalog.simple

because "english" and "simple" will stem words differently.

regression=# select websearch_to_tsquery('english', 'example');
 websearch_to_tsquery 
--
 'exampl'
(1 row)

regression=# select websearch_to_tsquery('simple', 'example');
 websearch_to_tsquery 
--
 'example'
(1 row)

If what is in your tsvector is 'exampl', then only the first of
these will match.  So IMO the question is not "why is it failing
on prod?", it's "how the heck did it work on the other machine?".
You won't get nice results if websearch_to_tsquery is using a
different TS configuration than to_tsvector did.

regards, tom lane




Re: tsvector not giving expected results on one host

2022-12-17 Thread Dan Langille
On Sat, Dec 17, 2022, at 2:55 PM, Tom Lane wrote:
> "Dan Langille"  writes:
>>  pkgmessage_textsearchable2 | tsvector |   | 
>>  | generated always as (to_tsvector('english'::regconfig, 
>> translate(pkgmessage, '/'::text, ' '::text))) stored
>
> That is not likely to play well with this:
>
>> freshports.org=> show  default_text_search_config ;
>>  default_text_search_config 
>> 
>>  pg_catalog.simple
>
> because "english" and "simple" will stem words differently.
>
> regression=# select websearch_to_tsquery('english', 'example');
>  websearch_to_tsquery 
> --
>  'exampl'
> (1 row)
>
> regression=# select websearch_to_tsquery('simple', 'example');
>  websearch_to_tsquery 
> --
>  'example'
> (1 row)
>
> If what is in your tsvector is 'exampl', then only the first of
> these will match.  So IMO the question is not "why is it failing
> on prod?", it's "how the heck did it work on the other machine?".
> You won't get nice results if websearch_to_tsquery is using a
> different TS configuration than to_tsvector did.

I think this shows why we are getting the results we see.  Credit to ch on IRC 
for asking this question.

The problem host:

freshports.org=> select websearch_to_tsquery('example');
 websearch_to_tsquery 
--
 'example'
(1 row)


The hosts on which this search works

freshports.devgit=# select websearch_to_tsquery('example');
 websearch_to_tsquery 
--
 'exampl'
(1 row)

On that same host:

[pg02 dan ~] % sudo grep -i default_text_search_config 
/var/db/postgres/data12/postgresql.conf
default_text_search_config = 'pg_catalog.english'


-- 
  Dan Langille
  d...@langille.org




Re: tsvector not giving expected results on one host

2022-12-17 Thread Dan Langille
On Sat, Dec 17, 2022, at 3:14 PM, Dan Langille wrote:
> On Sat, Dec 17, 2022, at 2:55 PM, Tom Lane wrote:
>> "Dan Langille"  writes:
>>>  pkgmessage_textsearchable2 | tsvector |   |
>>>   | generated always as (to_tsvector('english'::regconfig, 
>>> translate(pkgmessage, '/'::text, ' '::text))) stored
>>
>> That is not likely to play well with this:
>>
>>> freshports.org=> show  default_text_search_config ;
>>>  default_text_search_config 
>>> 
>>>  pg_catalog.simple
>>
>> because "english" and "simple" will stem words differently.
>>
>> regression=# select websearch_to_tsquery('english', 'example');
>>  websearch_to_tsquery 
>> --
>>  'exampl'
>> (1 row)
>>
>> regression=# select websearch_to_tsquery('simple', 'example');
>>  websearch_to_tsquery 
>> --
>>  'example'
>> (1 row)
>>
>> If what is in your tsvector is 'exampl', then only the first of
>> these will match.  So IMO the question is not "why is it failing
>> on prod?", it's "how the heck did it work on the other machine?".
>> You won't get nice results if websearch_to_tsquery is using a
>> different TS configuration than to_tsvector did.
>
> I think this shows why we are getting the results we see.  Credit to ch 
> on IRC for asking this question.
>
> The problem host:
>
> freshports.org=> select websearch_to_tsquery('example');
>  websearch_to_tsquery 
> --
>  'example'
> (1 row)

Ahh, this explains the differences and as to why it works where it shouldn't?

freshports.org=> select setting, source from pg_settings where name = 
'default_text_search_config';
  setting  | source  
---+-
 pg_catalog.simple | default
(1 row)


>
> The hosts on which this search works
>
> freshports.devgit=# select websearch_to_tsquery('example');
>  websearch_to_tsquery 
> --
>  'exampl'
> (1 row)


freshports.devgit=# select setting, source from pg_settings where name = 
'default_text_search_config';
  setting   |   source   
+
 pg_catalog.english | configuration file
(1 row)


At least now I know what I can play with to get all hosts in sync.

-- 
  Dan Langille
  d...@langille.org




Re: tsvector not giving expected results on one host

2022-12-17 Thread Dan Langille
On Sat, Dec 17, 2022, at 3:22 PM, Dan Langille wrote:
> On Sat, Dec 17, 2022, at 3:14 PM, Dan Langille wrote:
>> On Sat, Dec 17, 2022, at 2:55 PM, Tom Lane wrote:
>>> "Dan Langille"  writes:
  pkgmessage_textsearchable2 | tsvector |   |   
| generated always as (to_tsvector('english'::regconfig, 
 translate(pkgmessage, '/'::text, ' '::text))) stored
>>>
>>> That is not likely to play well with this:
>>>
 freshports.org=> show  default_text_search_config ;
  default_text_search_config 
 
  pg_catalog.simple
>>>
>>> because "english" and "simple" will stem words differently.
>>>
>>> regression=# select websearch_to_tsquery('english', 'example');
>>>  websearch_to_tsquery 
>>> --
>>>  'exampl'
>>> (1 row)
>>>
>>> regression=# select websearch_to_tsquery('simple', 'example');
>>>  websearch_to_tsquery 
>>> --
>>>  'example'
>>> (1 row)
>>>
>>> If what is in your tsvector is 'exampl', then only the first of
>>> these will match.  So IMO the question is not "why is it failing
>>> on prod?", it's "how the heck did it work on the other machine?".
>>> You won't get nice results if websearch_to_tsquery is using a
>>> different TS configuration than to_tsvector did.
>>
>> I think this shows why we are getting the results we see.  Credit to ch 
>> on IRC for asking this question.
>>
>> The problem host:
>>
>> freshports.org=> select websearch_to_tsquery('example');
>>  websearch_to_tsquery 
>> --
>>  'example'
>> (1 row)
>
> Ahh, this explains the differences and as to why it works where it shouldn't?
>
> freshports.org=> select setting, source from pg_settings where name = 
> 'default_text_search_config';
>   setting  | source  
> ---+-
>  pg_catalog.simple | default
> (1 row)
>
>
>>
>> The hosts on which this search works
>>
>> freshports.devgit=# select websearch_to_tsquery('example');
>>  websearch_to_tsquery 
>> --
>>  'exampl'
>> (1 row)
>
>
> freshports.devgit=# select setting, source from pg_settings where name 
> = 'default_text_search_config';
>   setting   |   source   
> +
>  pg_catalog.english | configuration file
> (1 row)
>
>
> At least now I know what I can play with to get all hosts in sync.

Here we go, on the problem database, create a new field, based on simple, not 
english.

ALTER TABLE public.ports
ADD COLUMN pkgmessage_textsearchable3 tsvector generated always as 
(to_tsvector('simple'::regconfig, translate(pkgmessage, '/'::text, ' '::text))) 
stored;

Index it:

CREATE INDEX ports_pkgmessage_textsearchable3_idx
ON public.ports USING gin
(pkgmessage_textsearchable3)
TABLESPACE pg_default;
CREATE INDEX

query it:

freshports.org=> SELECT id as port_id, element_pathname(element_id)
  FROM ports
 WHERE pkgmessage_textsearchable3  @@ websearch_to_tsquery('example');
 port_id |   element_pathname   
-+--
   34126 | /ports/head/security/pond
   74559 | /ports/branches/2015Q3/emulators/linux_base-c6
   60310 | /ports/branches/2020Q4/www/gitlab-ce
   38345 | /ports/head/www/gitlab
   46842 | /ports/branches/2018Q1/mail/postfix-sasl
   51532 | /ports/branches/2019Q1/graphics/drm-legacy-kmod

Success. Thank you Mr Lane.

-- 
  Dan Langille
  d...@langille.org




Postgres Date Type Value

2022-12-17 Thread Michael Arnold
Want to retrieve a date type from a postgres table using libpq
PQexecParams() in binary mode (please humor me).
https://www.postgresql.org/docs/14/datatype-datetime.html says that a date
is 4 bytes (4713 BC to 5874897 AD). src/include/utils/date.h defines:

  typedef int32 DateADT;

Something like this:

  int32_t date_val = be32toh(*((uint32_t *) PQgetvalue(res, 0, 17)));

Gives date_val=1,466,004,328 for 2022-10-25.

Since 4713BC is the start of the Julian calendar thought it might be a
simple day count but the value is too large (2,459,878 days from 1 Jan
4713BC to 2022-10-25).  So clearly not a day count.  My simple maths gives
me 585.966... ticks per day.

How do I interpret the 4 bytes of postgresql 'date' value?


Re: Postgres Date Type Value

2022-12-17 Thread Tom Lane
Michael Arnold  writes:
> Something like this:

>   int32_t date_val = be32toh(*((uint32_t *) PQgetvalue(res, 0, 17)));

> Gives date_val=1,466,004,328 for 2022-10-25.

The origin is 2000-01-01 (I'm pretty sure that's documented somewhere),
so the correct integer value for that date is 8333 by my math.

Looking at the bit-pattern for 1,466,004,328: 0x57617368, it seems
totally unrelated, more like ASCII text ("Wash") than anything else.
You sure you're reading the right column of the result?

regards, tom lane




Re: Postgres Date Type Value

2022-12-17 Thread David G. Johnston
On Sat, Dec 17, 2022 at 4:55 PM Michael Arnold  wrote:

> How do I interpret the 4 bytes of postgresql 'date' value?
>
See date2j and j2date in datetime.c

timestamp.h has the relevant constant Tom refers to (POSTGRES_EPOCH_JDATE)


David J.


Re: Postgres Date Type Value

2022-12-17 Thread Michael Arnold
My bad - was referencing the wrong column with PQgetvalue().  With that
corrected, get 8332 for 2022-10-25.  So date is an integer day count
from POSTGRES_EPOCH_JDATE
(2000-01-01).

Thanks for the help!
---

On Sun, Dec 18, 2022 at 8:24 AM David G. Johnston <
david.g.johns...@gmail.com> wrote:

> On Sat, Dec 17, 2022 at 4:55 PM Michael Arnold  wrote:
>
>> How do I interpret the 4 bytes of postgresql 'date' value?
>>
> See date2j and j2date in datetime.c
>
> timestamp.h has the relevant constant Tom refers to (POSTGRES_EPOCH_JDATE)
>
>
> David J.
>
>


integer square root function proposed

2022-12-17 Thread Martin L. Buchanan
Dear PostgreSQL colleagues:

I have just joined this, my first PG mailing list.

Reading the documentation I found no built-in function for integer square
root, requiring a sequence of:

floor(sqrt(foo))::integer

to go from an integer to the integer square root as an integer.

If PG leaders smile on this suggestion, could we have *isqrt*(foo) where
foo has a numeric type and the result type is the same numeric type, with
the result type being the integer square root for foo >= 0 and producing an
error if negative? (And with infinities and NaNs processed as described in
8.1 Numeric Types).

I did not see an explicit enhancement request list in the PG mailing lists.
Recommendations for a more specific list to post this to are welcome.


Sincerely,

Martin L. Buchanan
software developer
Laramie, WY, USA


Re: integer square root function proposed

2022-12-17 Thread Rob Sargent

On 12/17/22 19:39, Martin L. Buchanan wrote:

Dear PostgreSQL colleagues:

I have just joined this, my first PG mailing list.

Reading the documentation I found no built-in function for integer 
square root, requiring a sequence of:


floor(sqrt(foo))::integer

to go from an integer to the integer square root as an integer.

If PG leaders smile on this suggestion, could we have *isqrt*(foo) 
where foo has a numeric type and the result type is the same numeric 
type, with the result type being the integer square root for foo >= 0 
and producing an error if negative? (And with infinities and NaNs 
processed as described in 8.1 Numeric Types).


I did not see an explicit enhancement request list in the PG mailing 
lists. Recommendations for a more specific list to post this to are 
welcome.





I suspect the majority are comfortable getting a non-integer result for 
the square root of a number.  What environment have you which needs 
(sqrt(a) * sqrt(a)) < a-epsilon?


Re: integer square root function proposed

2022-12-17 Thread Martin L. Buchanan
Dear Rob and all readers:

Generating prime numbers is one example where you use integer square root
in the inner loop, going from integer to integer.

Calculating an integer square root from an integer input may have a more
efficient algorithm than doing so in floating-point, with the caveat that
an underlying processor architecture may provide floating-point square root
instructions but not integer square root instructions. In that particular
case an implementation could use the floating-point instructions internally.

Some but not all programming languages provide isqrt directly, math.isqrt
in Python or isqrt in Common Lisp for example.

It would be a useful and convenient function and would not, I believe,
impair the other features of PostgreSQL in any way.

That said, as a PG novice (2+ years now), I completely defer to the greater
wisdom of those much more involved in PostgreSQL. So something for you all
to think about.

Best wishes, Happy Channukah, and Merry Christmas,

Martin L. Buchanan
software developer and writer since 1976
Laramie, WY

On Sat, Dec 17, 2022 at 8:20 PM Rob Sargent  wrote:

> On 12/17/22 19:39, Martin L. Buchanan wrote:
>
> Dear PostgreSQL colleagues:
>
> I have just joined this, my first PG mailing list.
>
> Reading the documentation I found no built-in function for integer square
> root, requiring a sequence of:
>
> floor(sqrt(foo))::integer
>
> to go from an integer to the integer square root as an integer.
>
> If PG leaders smile on this suggestion, could we have *isqrt*(foo) where
> foo has a numeric type and the result type is the same numeric type, with
> the result type being the integer square root for foo >= 0 and producing an
> error if negative? (And with infinities and NaNs processed as described in
> 8.1 Numeric Types).
>
> I did not see an explicit enhancement request list in the PG mailing
> lists. Recommendations for a more specific list to post this to are welcome.
>
>
>
> I suspect the majority are comfortable getting a non-integer result for
> the square root of a number.  What environment have you which needs
> (sqrt(a) * sqrt(a)) < a-epsilon?
>
>


Re: integer square root function proposed

2022-12-17 Thread Rob Sargent

On 12/17/22 20:40, Martin L. Buchanan wrote:

Dear Rob and all readers:

Generating prime numbers is one example where you use integer square 
root in the inner loop, going from integer to integer.


Calculating an integer square root from an integer input may have a 
more efficient algorithm than doing so in floating-point, with the 
caveat that an underlying processor architecture may provide 
floating-point square root instructions but not integer square root 
instructions. In that particular case an implementation could use the 
floating-point instructions internally.


Some but not all programming languages provide isqrt 
directly, math.isqrt in Python or isqrt in Common Lisp for example.


It would be a useful and convenient function and would not, I believe, 
impair the other features of PostgreSQL in any way.


That said, as a PG novice (2+ years now), I completely defer to the 
greater wisdom of those much more involved in PostgreSQL. So something 
for you all to think about.




Seasons greetings to you and welcome to the list.  Here we generally 
bottom post.


I have zero say in whether or not isqrt might get added to postgres but 
in the meantime you might craft your own function (sql or plpgsql) which 
does the necessary casting.  Or are you specifically after a faster 
function?  And would a plpythonu function using their isqrt be fast 
enough.  (Note the "u" in plpythonu means "untrusted")





Re: integer square root function proposed

2022-12-17 Thread Tom Lane
"Martin L. Buchanan"  writes:
> Reading the documentation I found no built-in function for integer square
> root, requiring a sequence of:
> floor(sqrt(foo))::integer
> to go from an integer to the integer square root as an integer.

I'm a little down on this idea, mainly because I doubt there is any
principled argument as to whether the rounding ought to be down, up,
or to nearest.  Specific applications probably want a specific one
of those, but how likely is it that a built-in choice would satisfy
many users?

I'm not buying the performance angle too much either.  Even granting
that your platform has an integer square root instruction that's faster
than its floating-point one, and that we can get at that from C, the
difference would likely be swamped by the overhead of the SQL function
call.

> (And with infinities and NaNs processed as described in
> 8.1 Numeric Types).

Our integers do not have infinities, nor NaNs.

Anyway, I don't mean to slam the door on this idea completely,
but I think you need to make a much better-supported argument
for it.

regards, tom lane