pgsql 10.23 , different systems, same table , same plan, different Buffers: shared hit

2023-09-15 Thread Achilleas Mantzios - cloud

Dear All

I have a weird problem, I am trying to improve performance on this query :

SELECT text('f...@bar.com') from mail_vessel_addressbook where 
text('f...@bar.com') ~* address_regex limit 1;


The first system (linux) is a linux hosted in a cloud, kernel 
3.16.0-4-amd64, 32GB mem, SSD, 4 x Intel(R) Xeon(R) CPU E7-4860 v2 @ 
2.60GHz ,


The second (freebsd) system, used as test, is my local FreeBSD 
13.1-RELEASE workstation, 32GB mem, ZFS/magnetic disks ,16 x AMD Ryzen 7 
5800X 3800.16-MHz .


Overall my workstation is faster, but my issue is not plain speed. The 
problem is as follows :


*FreeBSD*

postgres@[local]/dynacom=# explain (analyze,buffers) SELECT 
text('f...@bar.com') from mail_vessel_addressbook where 
text('f...@bar.com') ~* address_regex limit 1;


 QUERY PLAN
-- 

Limit  (cost=0.42..5.11 rows=1 width=32) (actual time=96.705..96.706 
rows=1 loops=1)

  Buffers: shared hit=71
  ->  Index Only Scan using mail_vessel_addressbook_address_regex_idx 
on mail_vessel_addressbook  (cost=0.42..2912.06 rows=620 width=32) 
(actual time=96.704..96.705 rows=1 loops=1)

Filter: ('f...@bar.com'::text ~* address_regex)
Rows Removed by Filter: 14738
Heap Fetches: 0
Buffers: shared hit=71
Planning time: 0.082 ms
Execution time: 96.725 ms
(9 rows)

Time: 97.038 ms
postgres@[local]/dynacom=#

*Linux*

dynacom=# explain (analyze,buffers) SELECT text('f...@bar.com') from 
mail_vessel_addressbook where text('f...@bar.com') ~* address_regex limit 1;


   QUERY PLAN
-- 

Limit  (cost=0.42..5.12 rows=1 width=32) (actual time=1768.725..1768.727 
rows=1 loops=1)

  Buffers: shared hit=530
  ->  Index Only Scan using mail_vessel_addressbook_address_regex_idx 
on mail_vessel_addressbook  (cost=0.42..2913.04 rows=620 width=32) 
(actual time=1768.724..1768.725 rows=1 loops=1)

Filter: ('f...@bar.com'::text ~* address_regex)
Rows Removed by Filter: 97781
Heap Fetches: 0
Buffers: shared hit=530
Planning time: 1.269 ms
Execution time: 1768.998 ms
(9 rows)


The file in FreeBSD came by pg_dump from the linux system, I am puzzled 
why this huge difference in Buffers: shared hit. All table/index sizes 
are identical on both systems, I did vacuum full on the linux one, and 
also did vacuum freeze on both. I analyzed both, reindexed both (several 
times). Still the FreeBSD seems to access about 7 times less number of 
blocks from shared_buffers than linux : 71 vs 530 . There is no bloat , 
I tested with newly fresh table in both systems as well.


Thank you for any help.


Re: Inconsistent query performance based on relation hit frequency

2024-06-27 Thread Achilleas Mantzios - cloud


On 6/27/24 03:50, Laura Hausmann wrote:
Heya, I hope the title is somewhat descriptive. I'm working on a 
decentralized social media platform and have encountered the following 
performance issue/quirk, and would like to ask for input, since I'm 
not sure I missed anything.


I'm running PostgreSQL 16.2 on x86_64-pc-linux-gnu, compiled by gcc 
(GCC) 13.2.1 20230801, 64-bit, running on an Arch Linux box with 128GB 
of RAM & an 8c16t Ryzen 3700X CPU. Disk is a NVME RAID0.


Postgres configuration: https://paste.depesz.com/s/iTv

I'm using autovacuum defaults & am running a manual VACUUM ANALYZE on 
the entire database nightly.


The relevant database parts consist of a table with posts (note), a 
table with users (user), and a table with follow relationships 
(following). The query in question takes the most recent n (e.g. 50) 
posts, filtered by the users follow relations.


The note table on my main production instance grows by about 200k 
entries per week.


Schema & tuple counts: https://paste.depesz.com/s/cfI

Here's the shortest query I can reproduce the issue with: 
https://paste.depesz.com/s/RoC
Specifically, it works well for users that follow a relatively large 
amount of users (https://explain.depesz.com/s/tJnB), and is very slow 
for users that follow a low amount of users / users that post 
infrequently (https://explain.depesz.com/s/Mtyr).


From what I can tell, this is because this query causes postgres to 
scan the note table from the bottom (most recent posts first), 
discarding anything by users that are not followed.


Curiously, rewriting the query like this 
(https://paste.depesz.com/s/8rN) causes the opposite problem, this 
query is fast for users with a low following count 
(https://explain.depesz.com/s/yHAz#query), and slow for users with a 
high following count (https://explain.depesz.com/s/1v6L, 
https://explain.depesz.com/s/yg3N).


These numbers are even further apart (to the point of 10-30s query 
timeouts) in the most extreme outlier cases I've observed, and on 
lower-end hardware.


I've sidestepped the issue by running either of these queries based on 
a heuristic that checks whether there are more than 250 matching posts 
in the past 7 days, recomputed once per day for every user, but it 
feels more like a hack than a proper solution.


I'm able to make the planner make a sensible decision in both cases by 
setting enable_sort = off, but that tanks performance for the rest of 
my application, is even more of a hack, and doesn't seem to work in 
all cases.


I've been able to reproduce this issue with mock data 
(https://paste.depesz.com/s/CnY), though it's not generating quite the 
same query plans and is behaving a bit differently.


Before deep dive into everybody's favorite topic you may simplify your 
query :


select o.* from objects o where o."userId" = :userid UNION select o.* 
from objects o where o."userId" IN


(SELECT r."followeeId" FROM relationships r WHERE r."followerId"= :userid)

postgres@[local]/laura=# explain (analyze, buffers) select o.* from 
objects o where o."userId" = 1 UNION select o.* from objects o where 
o."userId" IN (SELECT r."followeeId" FROM relati

onships r WHERE r."followerId"=1) ORDER BY id DESC ;

QUERY PLAN

-
---
Sort  (cost=8622.04..8767.98 rows=58376 width=40) (actual 
time=1.041..1.053 rows=314 loops=1)

  Sort Key: o.id DESC
  Sort Method: quicksort  Memory: 39kB
  Buffers: shared hit=1265
  ->  HashAggregate  (cost=3416.92..4000.68 rows=58376 width=40) 
(actual time=0.900..1.006 rows=314 loops=1)

Group Key: o.id, o."userId", o.data
Batches: 1  Memory Usage: 1585kB
Buffers: shared hit=1265
->  Append  (cost=0.42..2979.10 rows=58376 width=40) (actual 
time=0.024..0.816 rows=314 loops=1)

  Buffers: shared hit=1265
  ->  Index Scan using "objects_userId_idx" on objects o 
 (cost=0.42..3.10 rows=17 width=21) (actual time=0.003..0.003 rows=0 
loops=1)

Index Cond: ("userId" = 1)
Buffers: shared hit=3
  ->  Nested Loop  (cost=0.70..2684.12 rows=58359 width=21) 
(actual time=0.020..0.794 rows=314 loops=1)

Buffers: shared hit=1262
->  Index Only Scan using 
"relationships_followerId_followeeId_idx" on relationships r 
 (cost=0.28..7.99 rows=315 width=4) (actual time=0.011..0.030 rows=315 
loops=

1)
  Index Cond: ("followerId" = 1)
  Heap Fetches: 0
  Buffers: shared hit=3
->  Index Scan using "objects_userId_idx" on 
objects o_1  (cost=0.42..6.65 rows=185 width=21) (actual 
time=0.002..0.002 rows=1 loops=315)

   

Re: tds_fdw : Severe performance degradation from postgresql 10.23 to 16.4

2024-11-12 Thread Achilleas Mantzios - cloud



On 11/12/24 21:37, Tom Lane wrote:

Achilleas Mantzios  writes:

Our sysadm created the system debian Debian GNU/Linux 12 (bookworm)with
postgres as user 1000. Now at some point we realized that whenever we
run a pgsql cluster with another user (I found that after spending two
good days testing), the above query runs in about 1 second. With user
postgres 1000 in 30 seconds. As you saw the perf output are completely
different.

Don't recall details offhand, but in some situations where the calling
SQL user doesn't have permissions to read particular columns, the
planner will not consult statistics for those columns.  That can lead
to a different, less optimal plan being used.  Maybe something like
that is happening here?


Thank you Tom,

In all runs of this query the user is postgres (superuser). Besides, 
this is just a simple query run on a FOREIGN TABLE living in a MS SQL 
Server.




regards, tom lane





Re: tds_fdw : Severe performance degradation from postgresql 10.23 to 16.4

2024-11-13 Thread Achilleas Mantzios - cloud

Dear All

false alert, I run strace and it was obvious the slow one was producing 
huge debug output, while the fast one did not. It was not even a tds_fdw 
issue. It was freetds. Turned out we have forgotten enabled debugging 
inside the freetds configuration. You will ask me we did I get this 
effect of debugging only when run with user "postgres" ? Because 
/tmp/freetds.log belonged to postgres!


I changed owner to nobody:nogroup and dont even need to restart postgres. !!

I am sorry for all the noise.

Thank you so much for your prompts !!

On 11/12/24 23:46, Tomas Vondra wrote:


On 11/12/24 20:37, Tom Lane wrote:

Achilleas Mantzios  writes:

Our sysadm created the system debian Debian GNU/Linux 12 (bookworm)with
postgres as user 1000. Now at some point we realized that whenever we
run a pgsql cluster with another user (I found that after spending two
good days testing), the above query runs in about 1 second. With user
postgres 1000 in 30 seconds. As you saw the perf output are completely
different.

Don't recall details offhand, but in some situations where the calling
SQL user doesn't have permissions to read particular columns, the
planner will not consult statistics for those columns.  That can lead
to a different, less optimal plan being used.  Maybe something like
that is happening here?


I don't know, the query is pretty trivial, and the estimates seemed
exactly the same in both cases. And it shouldn't affect how the query
gets planned on the MSSQL side.

But this seems really strange:

   Planning Time: 14029.724 ms
   ...
   Execution Time: 15102.803 ms

It's not about the execution, it's about the planning. I have no idea
why should the planning take this long, except maybe for waiting for a
lock, or something like that. But that's not really consistent with the
profile ... it's weird.

I'm not familiar with tds_fdw, but I see there are a bunch of table
options [1] that might affect this, namely:

* use_remote_estimate
* local_tuple_estimate
* row_estimate_method (defaults to 'execute')

Are you sure these are set to the same value on both machines?

Wild random guesses:

1) Could you try running the query with jit=off?

2) Did you run ANALYZE on the foreign table? Could matter when not using
remote estimates (use_remote_estimate=false).

3) Could it be some sort of memory pressure/swapping? But that would
look different in the profile, AFAIK.


regards


[1] https://github.com/tds-fdw/tds_fdw/blob/master/ForeignTableCreation.md






Re: Slow performance of collate "en_US.utf8"

2025-02-28 Thread Achilleas Mantzios - cloud

Hi Alexey

On 2/27/25 15:54, Alexey Borschev wrote:


Hi everyone!

I see poor performance of text sorting of collate "en_US.utf8" in PG 17.4.

Test query:

explain (analyze, costs, buffers, verbose)

SELECT

('БвЁжЫйяЙSёYz&$' || gen.id) collate "en_US.utf8"

FROM generate_series(1, 1) AS gen(id)

order by 1 desc;

I've got execution time like:

Execution Time: 73.068 ms

Same poor result with ru_RU.UTF8.

With other collations time is much better:

explain (analyze, costs, buffers, verbose)

select

('БвЁжЫйяЙSёYz&$' || gen.id) collate "C"

from generate_series(1, 1) AS gen(id)

order by 1 desc;

Execution Time: 4.792 ms

explain (analyze, costs, buffers, verbose)

SELECT

('БвЁжЫйяЙSёYz&$' || gen.id) collate "C.utf8"

FROM generate_series(1, 1) AS gen(id)

order by 1 desc;

Execution Time: 7.473 ms

explain (analyze, costs, buffers, verbose)

select

('БвЁжЫйяЙSёYz&$' || gen.id) collate "und-x-icu"

from generate_series(1, 1) AS gen(id)

order by 1 desc;

Execution Time: 13.282 ms

Yes, collate C is fastest, ICU collations is ~ 2 slower then C, but 
"en_US.utf8" is ~ 10x slower!


I suspect it is some performance issue over there.

Can someone of PG hackers reproduce this please?

1) This PG17 instance was installed with default options, and initdb 
got en_US.utf8 as system default collation and created PG cluster with it.


It seems like most PG databases are created this way with en_US.utf8 
by default.


It seems you initialized the cluster with libc as the locale provider. 
Have you tried with icu ?


2) Typical text\varchar columns are created with DB default en_US.utf8 
and performs poor.


explain (analyze, costs, buffers, verbose)

select

('БвЁжЫйяЙSёYz&$' || gen.id)

from generate_series(1, 1) AS gen(id)

order by 1 desc;

Execution Time: 73.600 ms

3) The index search operations are also slower with en_US.utf8, but 
the difference is not as high.


Please see attached file with test table and indexes, with tests on 
index performance.


System Details:

postgres@borschev-pg-copydb1:~$ uname -a Linux borschev-pg-copydb1 
6.1.0-7-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.20-1 (2023-03-19) 
x86_64 GNU/Linux


postgres@borschev-pg-copydb1:~$ cat /etc/issue Debian GNU/Linux 
trixie/sid \n \l


select version();

PostgreSQL 17.4 (Debian 17.4-1.pgdg110+2) on x86_64-pc-linux-gnu, 
compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit


SELECT * FROM pg_config();

|name |setting |

|-|--|

|BINDIR |/usr/lib/postgresql/17/bin |

|DOCDIR |/usr/share/doc/postgresql-doc-17 |

|HTMLDIR |/usr/share/doc/postgresql-doc-17 |

|INCLUDEDIR |/usr/include/postgresql |

|PKGINCLUDEDIR |/usr/include/postgresql |

|INCLUDEDIR-SERVER|/usr/include/postgresql/17/server |

|LIBDIR |/usr/lib/x86_64-linux-gnu |

|PKGLIBDIR |/usr/lib/postgresql/17/lib |

|LOCALEDIR |/usr/share/locale |

|MANDIR |/usr/share/postgresql/17/man |

|SHAREDIR |/usr/share/postgresql/17 |

|SYSCONFDIR |/etc/postgresql-common |

|PGXS |/usr/lib/postgresql/17/lib/pgxs/src/makefiles/pgxs.mk |

|CONFIGURE | '--build=x86_64-linux-gnu' '--prefix=/usr' 
'--includedir=${prefix}/include' '--mandir=${prefix}/share/man' 
'--infodir=${prefix}/share/info' '--sysconfdir=/etc' 
'--localstatedir=/var' '--disable-option-checking' 
'--disable-silent-rules' '--libdir=${prefix}/lib/x86_64-linux-gnu' 
'--run