Re: The performance issues caused by upgrading PostgreSQL to version 16.3.

2025-02-17 Thread Laurenz Albe
On Mon, 2025-02-17 at 07:55 +, 馬 騰飛 wrote:
> I am reaching out to seek your technical assistance regarding a performance
> issue we encountered after upgrading our PostgreSQL version from 12.19 to 
> 16.3.
> We have noticed a significant performance problem with a specific SQL query on
> one of our application screens. 
> Interestingly, when we isolate the problematic SQL statement and replace its
> parameters with actual values, it executes in just a few seconds in pgAdmin. 
> However, when we run the same SQL query through our application using Npgsql,
> it takes over ten minutes to complete.

I recommend using the auto_explain module to capture the plan of the statement
when the application executes it.
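
For example, a minimal sketch for a single session (the threshold here is
arbitrary; to capture plans from application sessions, you can instead set
session_preload_libraries = 'auto_explain' in postgresql.conf):

LOAD 'auto_explain';
SET auto_explain.log_min_duration = '10s'; -- log plans of statements slower than this
SET auto_explain.log_analyze = on;         -- include actual row counts and timings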

It would be best to post both the fast and the slow plan to the list, in plain text.

Yours,
Laurenz Albe





Clarification on Role Access Rights to Table Indexes

2025-02-17 Thread Ayush Vatsa
Hi PostgreSQL Community,
I am currently exploring the behavior of pg_prewarm and encountered an
issue related to role
access rights that I was hoping you could help clarify.

Here is the scenario I observed:

postgres=# CREATE ROLE alpha;
CREATE ROLE
postgres=# GRANT SELECT ON pg_class TO alpha;
GRANT
postgres=# SET ROLE alpha;
SET
postgres=> SELECT pg_prewarm('pg_class');
 pg_prewarm
------------
         14
(1 row)

postgres=> SELECT pg_prewarm('pg_class_oid_index');
ERROR:  permission denied for index pg_class_oid_index
postgres=> RESET ROLE;
RESET

postgres=# GRANT SELECT ON pg_class_oid_index TO alpha;
ERROR:  "pg_class_oid_index" is an index

Based on this, I have a few questions:
1. Can a role have access rights to a table without having access to its
index?
2. If yes, how can we explicitly grant access to the index?
3. If no, and the role inherently gets access to the index when granted
access to the table, why
does the pg_prewarm call fail [1] in the above scenario?

[1]
https://github.com/postgres/postgres/blob/master/contrib/pg_prewarm/pg_prewarm.c#L108-L110


Regards,
Ayush Vatsa
SDE AWS


Clarification on Role Access Rights to Table Indexes

2025-02-17 Thread David G. Johnston
On Monday, February 17, 2025, Ayush Vatsa  wrote:

> postgres=# CREATE ROLE alpha;
>
> CREATE ROLE
> postgres=# GRANT SELECT ON pg_class TO alpha;
>
This is pointless; everyone (i.e. the PUBLIC pseudo-role) can already read
pg_class.

1. Can a role have access rights to a table without having access to its
> index?
>
Roles don’t directly interact with indexes in PostgreSQL, so this doesn’t
even make sense.  But if you need a yes/no answer, then yes.

>
> 3. If no, and the role inherently gets access to the index when granted
> access to the table, why
> does the pg_prewarm call fail [1] in the above scenario?
>
> [1] https://github.com/postgres/postgres/blob/master/contrib/pg_prewarm/pg_prewarm.c#L108-L110
>
It fails because AFAICS there is no way for it to work on an index, only
tables.

David J.


Re: Calling set-returning functions in a non-set-expecting context

2025-02-17 Thread Tom Lane
Jan Behrens  writes:
> I wonder whether, when calling a set-returning function in a
> non-set-expecting context, the row used is guaranteed to be the
> first row returned.

In general, I'd expect either we'd use the first row or throw an
error.  We're not 100% consistent about which rule applies, but
I can't think of a reason for anything to do something else.
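
For instance (a quick demonstration distilled from the examples quoted in
this thread; the SQL-standard function body needs PostgreSQL 14 or later):

CREATE FUNCTION foo() RETURNS integer RETURN generate_series(1, 10);
SELECT foo();                            -- returns 1: the first row is used
SELECT (SELECT generate_series(1, 10)); -- ERROR: more than one row returned
                                        -- by a subquery used as an expression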

> I.e. if I have the following function definition
> CREATE FUNCTION foo() RETURNS INTEGER RETURN generate_series(1, 10);
> is it then guaranteed that foo() always returns 1? And if so, is that
> documented somewhere? I didn't find it.

I'd say that using the first row (and not throwing an error) is
guaranteed for the specific case of SQL-language functions by the same
text you quote:

> "SQL functions execute an arbitrary list of SQL statements, returning
> the result of the last query in the list. In the simple (non-set) case,
> the first row of the last query's result will be returned. (Bear in
> mind that 'the first row' of a multirow result is not well-defined
> unless you use ORDER BY.) If the last query happens to return no rows
> at all, the null value will be returned."

The reason for the parenthetical weasel-wording is that a query might
require joins, aggregation, etc, and some of our implementations of
those things don't preserve row order.  If we're talking about a
simple invocation of another set-returning function, it's just
going to take whatever that function returns first.

regards, tom lane




Re: The performance issues caused by upgrading PostgreSQL to version 16.3.

2025-02-17 Thread Ron Johnson
On Mon, Feb 17, 2025 at 2:55 AM 馬 騰飛  wrote:

> Dear PostgreSQL Community,
>
> I hope this message finds you well. I am reaching out to seek your
> technical assistance regarding a performance issue we encountered after
> upgrading our PostgreSQL version from 12.19 to 16.3.
> We have noticed a significant performance problem with a specific SQL
> query on one of our application screens.
> Interestingly, when we isolate the problematic SQL statement and replace
> its parameters with actual values, it executes in just a few seconds in
> pgAdmin.
> However, when we run the same SQL query through our application using
> Npgsql, it takes over ten minutes to complete.

> We are using NpgsqlCommand.ExecuteReader to execute the SQL query, and the
> parameters are set using NpgsqlCommand.Parameters.Add.
> The main table involved in this query contains approximately 800,000
> records.
> We believe that the SQL statement itself does not have performance issues,
> but there may be problems related to how the SQL is executed in the
> application or how the parameters are set.
> However, we are unable to pinpoint the exact cause of the performance
> degradation.
>

Your situation sounds like something we encountered a few years ago in
PG12.  The solution was to add:
set plan_cache_mode = force_custom_plan

This applies only when the first five or six executions of a prepared
statement run fast and performance drops after that.
Test the query using PREPARE (
https://www.postgresql.org/docs/16/sql-prepare.html) and ten different
parameter sets, with and without "set plan_cache_mode = force_custom_plan".
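
A rough sketch of such a test (the table and parameter values here are
hypothetical):

PREPARE q(int) AS SELECT count(*) FROM orders WHERE customer_id = $1;
EXECUTE q(1);   -- repeat with ~10 different values; by default the planner
EXECUTE q(2);   -- may switch to a cached generic plan after the fifth
EXECUTE q(3);   -- execution, which is where the slowdown would appear
DEALLOCATE q;

SET plan_cache_mode = force_custom_plan;  -- re-plan with actual values every time
PREPARE q(int) AS SELECT count(*) FROM orders WHERE customer_id = $1;
EXECUTE q(1);   -- if this stays fast past the fifth execution, that's the culprit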



Re: The performance issues caused by upgrading PostgreSQL to version 16.3.

2025-02-17 Thread Doron Tsur
After installing, did you run ANALYZE VERBOSE?

-Doron


On Mon, Feb 17, 2025 at 4:23 PM Greg Sabino Mullane 
wrote:

> On Mon, Feb 17, 2025 at 2:55 AM 馬 騰飛  wrote:
>
>> Interestingly, when we isolate the problematic SQL statement and replace
>> its parameters with actual values, it executes in just a few seconds in
>> pgAdmin.
>> However, when we run the same SQL query through our application using
>> Npgsql, it takes over ten minutes to complete.
>>
>
> Another reason could be poor type casting by your driver/middleware. Can
> you share the problematic query? (Also, since you mentioned a version
> upgrade, make sure you run ANALYZE post-upgrade.)
>
> Cheers,
> Greg
>
> --
> Crunchy Data - https://www.crunchydata.com
> Enterprise Postgres Software Products & Tech Support
>
>



Re: The performance issues caused by upgrading PostgreSQL to version 16.3.

2025-02-17 Thread Adrian Klaver

On 2/16/25 23:55, 馬 騰飛 wrote:

Dear PostgreSQL Community,

I hope this message finds you well. I am reaching out to seek your technical 
assistance regarding a performance issue we encountered after upgrading our 
PostgreSQL version from 12.19 to 16.3.
We have noticed a significant performance problem with a specific SQL query on 
one of our application screens.
Interestingly, when we isolate the problematic SQL statement and replace its 
parameters with actual values, it executes in just a few seconds in pgAdmin.
However, when we run the same SQL query through our application using Npgsql, 
it takes over ten minutes to complete.
We are using NpgsqlCommand.ExecuteReader to execute the SQL query, and the 
parameters are set using NpgsqlCommand.Parameters.Add.
The main table involved in this query contains approximately 800,000 records.
We believe that the SQL statement itself does not have performance issues, but 
there may be problems related to how the SQL is executed in the application or 
how the parameters are set.
However, we are unable to pinpoint the exact cause of the performance 
degradation.
Could you please provide us with some possible reasons or suggestions for 
improvement? Your insights would be greatly appreciated.


This is going to need more information:

1) Was the upgrade done on the same machine or across machines?
If across machines, how are they different?

2) The table definition.

3) The query text.

4) The Npgsql code.



Thank you for your assistance, and I look forward to your response.

Best regards,
Ma




--
Adrian Klaver
adrian.kla...@aklaver.com





Calling set-returning functions in a non-set-expecting context

2025-02-17 Thread Jan Behrens
Hi,

I wonder whether, when calling a set-returning function in a
non-set-expecting context, the row used is guaranteed to be the
first row returned.

I.e. if I have the following function definition

CREATE FUNCTION foo() RETURNS INTEGER RETURN generate_series(1, 10);

is it then guaranteed that foo() always returns 1? And if so, is that
documented somewhere? I didn't find it.

I know that generate_series creates an ordered result, so that's not my
concern, but I'm not sure whether the first row will be picked. There
is something written here:

https://www.postgresql.org/docs/17/xfunc-sql.html#XFUNC-SQL-FUNCTIONS-RETURNING-SET

"SQL functions execute an arbitrary list of SQL statements, returning
the result of the last query in the list. In the simple (non-set) case,
the first row of the last query's result will be returned. (Bear in
mind that 'the first row' of a multirow result is not well-defined
unless you use ORDER BY.) If the last query happens to return no rows
at all, the null value will be returned."

But this part explicitly mentions queries. Using the "RETURN"
statement, I don't give a query but an expression. So does the "first
row gets used" rule also apply in my context, and why so?

For example, the following command results in an error:

SELECT (SELECT generate_series(1, 10));
ERROR:  more than one row returned by a subquery used as an expression

Kind regards and thanks for your advice,
Jan Behrens




Re: The performance issues caused by upgrading PostgreSQL to version 16.3.

2025-02-17 Thread Greg Sabino Mullane
On Mon, Feb 17, 2025 at 2:55 AM 馬 騰飛  wrote:

> Interestingly, when we isolate the problematic SQL statement and replace
> its parameters with actual values, it executes in just a few seconds in
> pgAdmin.
> However, when we run the same SQL query through our application using
> Npgsql, it takes over ten minutes to complete.
>

Another reason could be poor type casting by your driver/middleware. Can
you share the problematic query? (Also, since you mentioned a version
upgrade, make sure you run ANALYZE post-upgrade.)

Cheers,
Greg

--
Crunchy Data - https://www.crunchydata.com
Enterprise Postgres Software Products & Tech Support


Re: Clarification on Role Access Rights to Table Indexes

2025-02-17 Thread Ayush Vatsa
> This is pointless; everyone (i.e. the PUBLIC pseudo-role) can already
read pg_class.
True, just checked that.

> It fails because AFAICS there is no way for it to work on an index, only
tables.
The pg_prewarm extension works on indexes if we have the right (SELECT) privileges:
postgres=# CREATE TABLE x(id INT);
CREATE TABLE
postgres=# CREATE INDEX idx ON x(id);
CREATE INDEX

postgres=# INSERT INTO x SELECT * FROM generate_series(1,10000);
INSERT 0 10000
postgres=# SELECT pg_prewarm('x');
 pg_prewarm
------------
         45
(1 row)

postgres=# SELECT pg_prewarm('idx');
 pg_prewarm
------------
         30
(1 row)

> It seems like ownership of the table would be more appropriate, or maybe
> access to one of the built-in roles like pg_maintain.
True, adding Robert Haas (author) to this thread for his opinion.

Regards,
Ayush Vatsa
SDE AWS


Re: Clarification on Role Access Rights to Table Indexes

2025-02-17 Thread David G. Johnston
On Monday, February 17, 2025, Tom Lane  wrote:

> Ayush Vatsa  writes:
> > postgres=> SELECT pg_prewarm('pg_class_oid_index');
> > ERROR:  permission denied for index pg_class_oid_index
>
> You'd really have to take that up with the author of pg_prewarm.


This is our contrib module, so this seems like the expected place to ask
such a question.  It’s neither a bug nor a topic for -hackers.  FTR, Robert
Haas is the author from 2013.  Not sure he monitors -general though.

David J.


Re: Clarification on Role Access Rights to Table Indexes

2025-02-17 Thread Tom Lane
Ayush Vatsa  writes:
> postgres=> SELECT pg_prewarm('pg_class_oid_index');
> ERROR:  permission denied for index pg_class_oid_index

You'd really have to take that up with the author of pg_prewarm.
It's not apparent to me why checking SQL access permissions is
the right mechanism for limiting use of pg_prewarm.  It seems
like ownership of the table would be more appropriate, or maybe
access to one of the built-in roles like pg_maintain.
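
Were pg_prewarm gated on such a role instead, the grant would be a
one-liner (pg_maintain is a predefined role since PostgreSQL 16):

GRANT pg_maintain TO alpha;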

> 1. Can a role have access rights to a table without having access to its
> index?

Indexes do not have access rights of their own, which is why
access rights are a poor gating mechanism for something that
needs to be applicable to indexes.  Ownership could work,
because we make indexes inherit their table's ownership.
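
For illustration, one way to see that, for a hypothetical table x with
index idx:

SELECT relname, relowner::regrole
FROM pg_class
WHERE relname IN ('x', 'idx');  -- both rows show the same owner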

regards, tom lane




Re: pg_rewind - enable wal_log_hints or data-checksums

2025-02-17 Thread Bowen Shi
Hi Michael,

I first ran initdb with wal_log_hints=off, data_checksums=off, and
full_page_writes=on, then started Postgres and ran it for a while.

Then a switchover happened, and I used the following commands:
1. On the old master, set wal_log_hints=on in postgresql.conf, then started
and stopped Postgres.
2. Ran pg_rewind --target-pgdata=OldMaster --source-server=NewMaster (no
error; the old master's wal_log_hints had been set to on)

I wonder if this could lead to data corruption.
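
For reference, the settings pg_rewind cares about can be checked on the
target server like this:

SHOW data_checksums;    -- read-only; fixed when the cluster was initdb'd
SHOW wal_log_hints;     -- must be on if data checksums are disabled
SHOW full_page_writes;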



On Tue, Feb 18, 2025 at 2:27 PM Michael Paquier 
wrote:

> On Mon, Jun 5, 2017 at 9:37 AM, Dylan Luong 
> wrote:
> > pg_rewind requires that the target server either has the wal_log_hints
> > option enabled in postgresql.conf or data checksums enabled when the
> cluster
> > was initialized with initdb.
>
> Yes, this is to make sure that you don't finish with a corrupted
> target server if a hint bit is set on a page after a checkpoint. Any
> of those options make sure that a full-page write is generated in this
> case.
>
> > What is the difference between the two options?
>
> Data checksums calculate 2 bytes of checksum data and write it to each
> page that is evicted from shared buffers. Each page read from disk has
> its checksum checked. In some workloads, like a heavy read load where
> a lot of page evictions happen, this can induce a couple of percent
> of performance loss. In my own experience, that's 1~2%.
>
> > What are the advantages and disadvantages between the two?
> > Which one is the preferred option?
>
> If you care more about performance or if you use a file system that
> has its own block-level checksum, wal_log_hints would be preferred.
> Data checksums offer more guarantees in terms of integrity though when
> looking for corrupted data. Things get found more quickly.
> --
> Michael
>
>


Loading the latest N rows into the cache seems way too fast.

2025-02-17 Thread Ron Johnson
PG 9.6.24 and PG 14.15, if it matters.
(Yes, 9.6 is really EOL.  I don't control that.)

(I could use pg_prewarm, but the table is much bigger than RAM, and the
last_block value only holds the newest records if data has never been
deleted.  The oldest records regularly get deleted, and then the table is
vacuumed; thus, new records can be anywhere in the table.)

Thus, I rolled my own cache-loading statement.

The bigint "id" column in "mytbl" is populated from a sequence, and so is
monotonically increasing: the newest records will have the biggest id
values.
The table also has a bytea column that averages about 100KB.

Loading 200K rows is more than 200MB.  I expected this "prewarm" statement
to take much longer than 1/2 second.  Am I still in the dark ages of
computer speed, or is this statement not doing what I hope it's doing?

$ time psql -h foo bar -Xc "DO \$\$ BEGIN PERFORM * FROM mytbl ORDER BY id
DESC LIMIT 200000; END \$\$;"
DO

real0m0.457s
user0m0.005s
sys 0m0.004s



Re: Loading the latest N rows into the cache seems way too fast.

2025-02-17 Thread Christoph Moench-Tegeder
## Ron Johnson (ronljohnso...@gmail.com):

> Loading 200K rows is more than 200MB.  I expected this "prewarm" statement
> to take much longer than 1/2 second.  Am I still in the dark ages of
> computer speed, or is this statement not doing what I hope it's doing?
> 
> $ time psql -h foo bar -Xc "DO \$\$ BEGIN PERFORM * FROM mytbl ORDER BY id
> DESC LIMIT 200000; END \$\$;"

You can check what that statement does - e.g. in pg_stat_statements,
or (on an idle database, so the effects aren't lost in the noise) in
pg_stat_database or pg_statio_user_tables.
Between what the storage components of the last decade (e.g. those
SATA SSDs already being replaced in the market by NVMe) can deliver
(>400MB/s, often marketed as ">500 MB/s", though on SATA that's
optimistic) and the fact that there are most likely some blocks already
in the database's buffers and/or the OS cache, the observed throughput
is not necessarily unrealistic. With modern "server" hardware,
throughput in the "gigabytes per second" range is considered normal and
expected.
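
For example, a sketch of the buffer-level view (table name taken from the
original post):

SELECT heap_blks_read, heap_blks_hit, toast_blks_read, toast_blks_hit
FROM pg_statio_user_tables
WHERE relname = 'mytbl';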

Regards,
Christoph

-- 
Spare Space




Re: Loading the latest N rows into the cache seems way too fast.

2025-02-17 Thread Tom Lane
Ron Johnson  writes:
> The bigint "id" column in "mytbl" is populated from a sequence, and so is
> monotonically increasing: the newest records will have the biggest id
> values.
> The table also has a bytea column that averages about 100KB.

> Loading 200K rows is more than 200MB.  I expected this "prewarm" statement
> to take much longer than 1/2 second.  Am I still in the dark ages of
> computer speed, or is this statement not doing what I hope it's doing?

It's not pulling in the TOAST storage where the bytea column lives.
(pg_prewarm wouldn't have either, without special pushups.)

regards, tom lane




Re: Loading the latest N rows into the cache seems way too fast.

2025-02-17 Thread Ron Johnson
On Mon, Feb 17, 2025 at 4:36 PM Tom Lane  wrote:

> Ron Johnson  writes:
> > The bigint "id" column in "mytbl" is populated from a sequence, and so is
> > monotonically increasing: the newest records will have the biggest id
> > values.
> > The table also has a bytea column that averages about 100KB.
>
> > Loading 200K rows is more than 200MB.  I expected this "prewarm"
> statement
> > to take much longer than 1/2 second.  Am I still in the dark ages of
> > computer speed, or is this statement not doing what I hope it's doing?
>
> It's not pulling in the TOAST storage where the bytea column lives.
> (pg_prewarm wouldn't have either, without special pushups.)
>

Puzzling, since I ran "PERFORM *".  What if I explicitly mentioned the
bytea column's name?



Re: Wasteful nested loop join when there is `limit` in the query

2025-02-17 Thread WU Yan
Thank you for your help, Tom.

You are right. I added an index on employee.name (by making it unique), and
then Postgres can visit the employee table in presorted order and exit
early without joining more rows.


Just sharing the tweak I did to the example, if anyone else is interested
in a quick test. I also populated 1 million rows so the example is no
longer a toy demo.

```sql
drop table if exists department;
drop table if exists employee;

create table department(
id int primary key,
name text);
create table employee(
id int primary key,
name text unique,
department_id int);

INSERT INTO department (id, name)
SELECT i+1, 'department' || i+1
FROM generate_series(0, 9) AS i;

INSERT INTO employee (id, name, department_id)
SELECT i+1, 'name' || i+1, i % 10 +1
FROM generate_series(0, 999999) AS i;

analyze department;
analyze employee;

explain analyze
select *
from employee left outer join department
on employee.department_id = department.id
order by employee.name limit 10;
```

And here is the plan:
```
                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.57..1.36 rows=10 width=34) (actual time=0.017..0.030 rows=10 loops=1)
   ->  Nested Loop Left Join  (cost=0.57..78630.06 rows=1000000 width=34) (actual time=0.016..0.028 rows=10 loops=1)
         ->  Index Scan using employee_name_key on employee  (cost=0.42..54855.68 rows=1000000 width=18) (actual time=0.008..0.015 rows=10 loops=1)
         ->  Memoize  (cost=0.15..0.16 rows=1 width=16) (actual time=0.001..0.001 rows=1 loops=10)
               Cache Key: employee.department_id
               Cache Mode: logical
               Hits: 6  Misses: 4  Evictions: 0  Overflows: 0  Memory Usage: 1kB
               ->  Index Scan using department_pkey on department  (cost=0.14..0.15 rows=1 width=16) (actual time=0.001..0.001 rows=1 loops=4)
                     Index Cond: (id = employee.department_id)
 Planning Time: 0.189 ms
 Execution Time: 0.045 ms
(11 rows)
```

Personally, I still hope that someday Postgres can push a `limit` node down
together with a `sort` node when certain conditions are met, so that there's
no need to add an index :D

Thank you again for your help!

On Mon, 17 Feb 2025 at 18:01, Tom Lane  wrote:

> WU Yan <4wu...@gmail.com> writes:
> > Hello everyone, I am still learning postgres planner and performance
> > optimization, so please kindly point out if I missed something obvious.
>
> An index on employee.name would likely help here.  Even if we had
> an optimization for pushing LIMIT down through a join (which you
> are right, we don't) it could not push the LIMIT through a sort step.
> So you need presorted output from the scan of "employee".  I think
> this example would behave better with that.  You may also need to
> test with non-toy amounts of data to get the plan you think is
> better: an example with only half a dozen rows is going to be
> swamped by startup costs.
>
> regards, tom lane
>


Re: Clarification on Role Access Rights to Table Indexes

2025-02-17 Thread Laurenz Albe
On Mon, 2025-02-17 at 23:31 +0530, Ayush Vatsa wrote:
> postgres=> SELECT pg_prewarm('pg_class_oid_index');
> ERROR:  permission denied for index pg_class_oid_index
> postgres=> RESET ROLE;
> RESET
> 
> postgres=# GRANT SELECT ON pg_class_oid_index TO alpha;
> ERROR:  "pg_class_oid_index" is an index
> Based on this, I have a few questions:
> 1. Can a role have access rights to a table without having access to its 
> index?
> 2. If yes, how can we explicitly grant access to the index?
> 3. If no, and the role inherently gets access to the index when granted 
> access to the table, why
> does the pg_prewarm call fail [1] in the above scenario?

I have seen a complaint about this bug before:
https://dba.stackexchange.com/a/344603/176905

Yours,
Laurenz Albe





Re: Loading the latest N rows into the cache seems way too fast.

2025-02-17 Thread Ron Johnson
On Mon, Feb 17, 2025 at 4:51 PM Tom Lane  wrote:

> Ron Johnson  writes:
> > On Mon, Feb 17, 2025 at 4:36 PM Tom Lane  wrote:
> >> It's not pulling in the TOAST storage where the bytea column lives.
> >> (pg_prewarm wouldn't have either, without special pushups.)
>
> > Puzzling, since I ran "PERFORM *".  What if I explicitly mentioned the
> > bytea column's name?
>
> You'd have to do something that actually used the column's value,
> perhaps "md5(byteacol)" or such.  (The obvious candidate would be
> length(), but I think that is optimized to not fetch or decompress
> the whole value.)
>

That's definitely taking a LOT longer...



Re: Loading the latest N rows into the cache seems way too fast.

2025-02-17 Thread Tom Lane
Ron Johnson  writes:
> On Mon, Feb 17, 2025 at 4:36 PM Tom Lane  wrote:
>> It's not pulling in the TOAST storage where the bytea column lives.
>> (pg_prewarm wouldn't have either, without special pushups.)

> Puzzling, since I ran "PERFORM *".  What if I explicitly mentioned the
> bytea column's name?

You'd have to do something that actually used the column's value,
perhaps "md5(byteacol)" or such.  (The obvious candidate would be
length(), but I think that is optimized to not fetch or decompress
the whole value.)
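
For example, a sketch along those lines ("byteacol" standing in for the
real column name):

DO $$
BEGIN
  -- md5() has to detoast the value, so the TOAST pages actually get read
  PERFORM md5(byteacol) FROM mytbl ORDER BY id DESC LIMIT 200000;
END
$$;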

regards, tom lane




Re: Loading the latest N rows into the cache seems way too fast.

2025-02-17 Thread David G. Johnston
On Mon, Feb 17, 2025 at 2:41 PM Ron Johnson  wrote:

> On Mon, Feb 17, 2025 at 4:36 PM Tom Lane  wrote:
>
>> Ron Johnson  writes:
>> > The bigint "id" column in "mytbl" is populated from a sequence, and so
>> is
>> > monotonically increasing: the newest records will have the biggest id
>> > values.
>> > The table also has a bytea column that averages about 100KB.
>>
>> > Loading 200K rows is more than 200MB.  I expected this "prewarm"
>> statement
>> > to take much longer than 1/2 second.  Am I still in the dark ages of
>> > computer speed, or is this statement not doing what I hope it's doing?
>>
>> It's not pulling in the TOAST storage where the bytea column lives.
>> (pg_prewarm wouldn't have either, without special pushups.)
>>
>
> Puzzling, since I ran "PERFORM *".  What if I explicitly mentioned the
> bytea column's name?
>
>
It's more about the system optimizing away data retrieval because you've
indicated you don't care about the contents due to using PERFORM.  All it
needs is a pointer to represent the future data, not the data itself.  And
PERFORM will never resolve that pointer by itself; as Tom said, your
query would need to force pointer resolution by computing on the data.

David J.


Re: Clarification on Role Access Rights to Table Indexes

2025-02-17 Thread Tom Lane
"David G. Johnston"  writes:
> On Monday, February 17, 2025, Tom Lane  wrote:
>> You'd really have to take that up with the author of pg_prewarm.

> This is our contrib module so this seems like the expected place to ask
> such a question.  It’s neither a bug nor a topic for -hackers.  FTR, Robert
> Haas is the author from 2013.  Not sure he monitors -general though.

Ah, you are right, I was thinking it was a third-party extension.

If we're talking about changing the behavior of a contrib module,
I think -hackers would be the appropriate location for that.
And it does seem like this deserves a fresh look.  As it stands,
a superuser can prewarm an index (because she bypasses all
privilege checks including this one), but nobody else can.
Seems weird.

regards, tom lane