Vacuum full connection exhaustion

2024-08-08 Thread Costa Alexoglou
Hey folks,

I noticed something weird, and I am not sure whether this is the expected
behaviour in PostgreSQL or not.

So I am running Benchbase (a benchmark framework) with 50 terminals (50
concurrent connections).
There are 2-3 additional connections, for example one for a
postgres-exporter container.

So far so good: with `max_connections` at 100 there is no problem. What
happens is that if I manually execute `VACUUM FULL`, the connections get
exhausted.

I also tried this with `max_connections` at 150 to see if it would just
“double” the current connections, but as it turned out, connections still
piled up until they reached `max_connections`.

I cross-checked this: the postgres-exporter could not connect, and I was not
allowed to connect manually with `psql` either.

Is this expected or is this a bug?


postgres-exporter logs:
```
sql: error: connection to server on socket
"/var/run/postgresql/.s.PGSQL.5432" failed: FATAL:  sorry, too many clients
already
```
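
A quick way to see what the piled-up backends are waiting on, from a spare or
reserved connection, is to group `pg_stat_activity` by wait state (a sketch,
not part of the original report):
```
SELECT state, wait_event_type, wait_event, count(*)
FROM pg_stat_activity
WHERE backend_type = 'client backend'
GROUP BY state, wait_event_type, wait_event
ORDER BY count(*) DESC;
```
While the `VACUUM FULL` runs, the blocked sessions should show up with
`wait_event_type = 'Lock'` and `wait_event = 'relation'`.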


Re: Vacuum full connection exhaustion

2024-08-09 Thread Costa Alexoglou
On Fri, Aug 9, 2024 at 1:02 AM David Rowley  wrote:

> On Fri, 9 Aug 2024 at 02:12, Christophe Pettus  wrote:
> > VACUUM FULL takes an exclusive lock on the table that it is operating
> > on.  It's possible that a connection becomes blocked on that exclusive
> > lock waiting for the VACUUM FULL to finish, the application sees the
> > connection stopped and fires up another one (this is common in
> > container-based applications), that one blocks... until all of the
> > connections are full of queries waiting on that VACUUM FULL.
>
> I also imagine this is the cause. One way to test would be to do:
> BEGIN; LOCK TABLE ; and see if the connections pile up
> in a similar way to when the VACUUM FULL command is used.
>
> David

Thanks folks. David, that is a really straightforward way to test. I
validated this: when I lock the two tables involved in the benchmark, the
connections keep growing until they reach `max_connections`.
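
For reference, a minimal sketch of David's test (the table name below is a
placeholder, since the original suggestion elided it):
```
BEGIN;
-- take the same ACCESS EXCLUSIVE lock that VACUUM FULL takes;
-- "accounts" stands in for one of the benchmark tables
LOCK TABLE accounts IN ACCESS EXCLUSIVE MODE;

-- leave the transaction open, watch the connections pile up, then release:
ROLLBACK;
```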


insufficient privilege with pg_read_all_stats granted

2024-08-20 Thread Costa Alexoglou
Hey folks,

I run PostgreSQL v15.8 (official Docker image), and there is an issue when
reading the pg_stat_statements view: most of the time the query column has
the value `<insufficient privilege>`.

I created the user that I use to fetch the data in the following way:
```
CREATE USER abcd WITH NOSUPERUSER NOCREATEROLE NOINHERIT LOGIN;

GRANT pg_read_all_stats, pg_stat_scan_tables, pg_read_all_settings to abcd;

GRANT pg_monitor to abcd;
```

I explicitly granted `pg_read_all_stats` and also `pg_monitor` just to be on
the safe side, but I still get the insufficient privilege error.

```
SELECT
   r.rolname AS member,
   m.rolname AS role
FROM
   pg_auth_members am
JOIN
   pg_roles r ON r.oid = am.member
JOIN
   pg_roles m ON m.oid = am.roleid
WHERE
   m.rolname = 'pg_read_all_stats'
   AND r.rolname = 'abcd';

 member |       role
--------+-------------------
 abcd   | pg_read_all_stats
(1 row)
```

I also tried PostgreSQL v14.13, where this was not the case: it worked fine
as expected. Then I tried v16.4 and v17beta3 and faced the same issue, so I
guess something changed from v15 onwards?


Re: insufficient privilege with pg_read_all_stats granted

2024-08-21 Thread Costa Alexoglou
> I think the problem is in the NOINHERIT attribute for the abcd role.

Indeed, that was the issue. Thanks for helping figure this out.
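
For anyone landing here, a minimal sketch of the two possible fixes (role
names as in the original message; `NOINHERIT` only blocks automatic
inheritance of the privileges of granted roles, so either option works):
```
-- confirm the inherit flag is off
SELECT rolname, rolinherit FROM pg_roles WHERE rolname = 'abcd';

-- option 1: let abcd inherit privileges from its granted roles
ALTER ROLE abcd INHERIT;

-- option 2: keep NOINHERIT and switch into the granted role per session
SET ROLE pg_read_all_stats;
```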



Re: Used memory calculation in containers - docker stats and file cache

2024-11-04 Thread Costa Alexoglou
> I don't know if Docker does anything strange here.

I am not sure whether this is Docker-specific or whether cgroups come into
play. The measurement is implemented in the Docker CLI, but I would assume
that the eviction happens within the cgroup scope.

> A large file (or many smaller files) which is cached is deleted

The growth pattern is incremental until the huge eviction, and that is my
question: couldn't the eviction also happen incrementally, rather than 15GB
of file cache being evicted in an instant?

> So I'd double check the logs if there are
any errors.

No errors in the logs, unfortunately (or fortunately).


Seems like this issue , or the parent one  that everyone keeps linking to.


Empty query_id in pg_stat_activity

2024-12-06 Thread Costa Alexoglou
Hey folks, I am running Benchbase and pgbench at the same time just for
debugging purposes, and I notice that sometimes query_id is missing
from pg_stat_activity. Any clue why this is happening?

```
benchbase=# SELECT query_id, now() - query_start as duration, query
            FROM pg_stat_activity
            WHERE state = 'active' AND backend_type = 'client backend'
            ORDER BY duration DESC LIMIT 5;

      query_id       |     duration     |            query
---------------------+------------------+---------------------------------
                     |  00:00:00.223544 | SELECT                          +
                     |                  |     s_suppkey,                  +
                     |                  |     s_name,                     +
                     |                  |     s_address,                  +
                     |                  |     s_phone,                    +
                     |                  |     total_revenue               +
                     |                  | FROM                            +
                     |                  |     supplier,                   +
                     |                  |     revenue0                    +
                     |                  | WHERE                           +
                     |                  |     s_suppkey = supplier_no     +
                     |                  |     AND total_revenue = (       +
                     |                  |         SELECT                  +
                     |                  |             MAX(total_revenue)  +
                     |                  |         FROM                    +
                     |                  |             revenue0            +
                     |                  |     )                           +
                     |                  | ORDER BY                        +
                     |                  |     s_suppkey
 3080582906387216276 |      00:00:00.32 | UPDATE pgbench_branches SET bbalance = bbalance + -4897 WHERE bid = 8;
 3080582906387216276 | -00:00:00.000321 | UPDATE pgbench_branches SET bbalance = bbalance + -377 WHERE bid = 6;
 2064869707185898531 | -00:00:00.000501 | END;
 2064869707185898531 | -00:00:00.000502 | END;
(5 rows)
```

Cheers,
Costa
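
One thing worth double-checking (an assumption on my side, not something
established in the thread): `pg_stat_activity.query_id` is only populated
when query IDs are computed at all, i.e. when `compute_query_id` is `on`, or
`auto` with a module such as pg_stat_statements loaded:
```
SHOW compute_query_id;
```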


Re: Performance regression when adding LIMIT 1 to a query

2025-04-07 Thread Costa Alexoglou
On Mon, Apr 7, 2025 at 4:06 PM Artur Zakirov  wrote:

> On Mon, 7 Apr 2025 at 14:45, Costa Alexoglou  wrote:
> > ...
> > with a plan:
> > ```
> > Gather Merge  (cost=115584.47..118515.35 rows=25120 width=824) (actual
> time=46.004..74.267 rows=29653 loops=1)
> >   Workers Planned: 2
> >   Workers Launched: 2
> >   ->  Sort  (cost=114584.45..114615.85 rows=12560 width=824) (actual
> time=41.200..47.322 rows=9884 loops=3)
> > Sort Key: id
> > Sort Method: external merge  Disk: 16360kB
> > Worker 0:  Sort Method: external merge  Disk: 15552kB
> > Worker 1:  Sort Method: external merge  Disk: 14536kB
> > ->  Parallel Bitmap Heap Scan on databases_metrics
> (cost=990.77..109175.83 rows=12560 width=824) (actual time=3.326..14.295
> rows=9884 loops=3)
> >   Recheck Cond: ((db_instance_id =
> 'c4c97a60-b88e-4cd3-a2f1-random-uuid'::uuid) AND (created_at >= '2023-03-15
> 10:00:00+00'::timestamp with time zone) AND (created_at <= '2025-04-03
> 10:00:00+00'::timestamp with time zone))
> >   Filter: ((metrics -> 'perf_average_query_runtime'::text)
> IS NOT NULL)
> >   Rows Removed by Filter: 68
> >   Heap Blocks: exact=4272
> >   ->  Bitmap Index Scan on
> idx_databases_metrics_instance_date_custom_created_debugging
> (cost=0.00..983.24 rows=30294 width=0) (actual time=3.786.786 rows=29856
> loops=1)
> > Index Cond: ((db_instance_id =
> 'c4c97a60-b88e-4cd3-a2f1-random-uuid'::uuid) AND (created_at >= '2023-03-15
> 10:00:00+00'::timestamp with time zone) AND (created_at <= '2025-04-03
> 10:00:00+00'::timestamp with time zone))
> > ...
> > With a plan:
> > ```
> > Limit  (cost=0.43..229.66 rows=1 width=824) (actual
> time=7538.004..7538.005 rows=1 loops=1)
> >   ->  Index Scan using databases_metrics_pkey on databases_metrics
> (cost=0.43..6909156.38 rows=30142 width=824) (actual
> time=7538.002..7538.003 rows=1 loops=1)
> > Filter: ((created_at >= '2023-03-15 10:00:00+00'::timestamp with
> time zone) AND (created_at <= '2025-04-03 10:00:00+00'::timestamp with time
> zone) AND ((metrics -> 'perf_average_query_runtime'::text) IS NOT NULL) AND
> (db_instance_id = 'c4c97a60-b88e-4cd3-a2f1-random-uuid'::uuid))
> > Rows Removed by Filter: 10244795
> > Planning Time: 0.128 ms
> > Execution Time: 7538.032 ms
> > ```
>
> On your second query Postgres uses the index "databases_metrics_pkey".
> I assume that it is built on the "id" column. It can be very fast for
> an "ORDER BY ... LIMIT" statement, but due to the additional filter
> Postgres first has to skip 10mln rows that don't satisfy the filter,
> only to reach one single row.
>
> On the first query Postgres has to read and sort only 29k rows using
> the index "idx_databases_metrics_instance_date_custom_created_debugging",
> which is better suited for the used filter if it includes the columns
> used in the filter.
>
> I'm not sure why Postgres chooses the index "databases_metrics_pkey".
> Maybe you have outdated statistics. Did you try to run VACUUM ANALYZE
> on the table?
>
> --
> Kind regards,
> Artur


> Maybe you have outdated statistics. Did you try to run VACUUM ANALYZE
> on the table?

Yes, I just tried this, and nothing changed.

I found this post:
https://bohanzhang.me/assets/blogs/order_by_limit/order_by_limit.html

And when I applied its suggestion to add `+0` to the ORDER BY column, the
query ran fast again (the `+0` turns the sort key into an expression, so the
planner can no longer use the primary-key index to provide the ordering):
```
SELECT "databases_metrics"."metrics"
FROM
  "databases_metrics"
WHERE
  (
"databases_metrics"."created_at" >= '2023-03-15
10:00:00+00:00'::timestamptz
AND "databases_metrics"."db_instance_id" =
'c4c97a60-b88e-4cd3-a2f1-random-uuid'::UUID
AND "databases_metrics"."created_at" <= '2025-04-03
10:00:00+00:00'::timestamptz
AND (
  metrics -> 'perf_average_query_runtime' IS NOT NULL
)
  )
ORDER BY
  "databases_metrics"."id"+0 ASC LIMIT 1;
```

the plan:
```
Limit  (cost=97195.71..97195.71 rows=1 width=821) (actual
time=38.084..38.086 rows=1 loops=1)
  ->  Sort  (cost=97195.71..97261.91 rows=26482 width=821) (actual
time=38.083..38.084 rows=1 loops=1)
Sort Key: ((id + 0))
Sort Method: top-N heapsort  Memory: 27kB
->  Bitmap Heap Scan on databases_metrics  (cost=867.87..97063.30
rows=264
```

Performance regression when adding LIMIT 1 to a query

2025-04-07 Thread Costa Alexoglou
Hey folks,

I faced an interesting regression and would love to have some help
understanding why this happened.

The Postgres version, if it's of any use, is PostgreSQL 16.3.

The following query:
```
EXPLAIN ANALYZE
SELECT "databases_metrics"."metrics"
FROM
  "databases_metrics"
WHERE
  (
"databases_metrics"."created_at" >= '2023-03-15
10:00:00+00:00'::timestamptz
AND "databases_metrics"."db_instance_id" =
'c4c97a60-b88e-4cd3-a2f1-random-uuid'::UUID
AND "databases_metrics"."created_at" <= '2025-04-03
10:00:00+00:00'::timestamptz
AND (
  metrics -> 'perf_average_query_runtime' IS NOT NULL
)
  )
ORDER BY
  "databases_metrics"."id" ASC;
```

with a plan:
```
Gather Merge  (cost=115584.47..118515.35 rows=25120 width=824) (actual
time=46.004..74.267 rows=29653 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  ->  Sort  (cost=114584.45..114615.85 rows=12560 width=824) (actual
time=41.200..47.322 rows=9884 loops=3)
Sort Key: id
Sort Method: external merge  Disk: 16360kB
Worker 0:  Sort Method: external merge  Disk: 15552kB
Worker 1:  Sort Method: external merge  Disk: 14536kB
->  Parallel Bitmap Heap Scan on databases_metrics
 (cost=990.77..109175.83 rows=12560 width=824) (actual time=3.326..14.295
rows=9884 loops=3)
  Recheck Cond: ((db_instance_id =
'c4c97a60-b88e-4cd3-a2f1-random-uuid'::uuid) AND (created_at >= '2023-03-15
10:00:00+00'::timestamp with time zone) AND (created_at <= '2025-04-03
10:00:00+00'::timestamp with time zone))
  Filter: ((metrics -> 'perf_average_query_runtime'::text) IS
NOT NULL)
  Rows Removed by Filter: 68
  Heap Blocks: exact=4272
  ->  Bitmap Index Scan on
idx_databases_metrics_instance_date_custom_created_debugging
 (cost=0.00..983.24 rows=30294 width=0) (actual time=3.786.786 rows=29856
loops=1)
Index Cond: ((db_instance_id =
'c4c97a60-b88e-4cd3-a2f1-random-uuid'::uuid) AND (created_at >= '2023-03-15
10:00:00+00'::timestamp with time zone) AND (created_at <= '2025-04-03
10:00:00+00'::timestamp with time zone))
Planning Time: 0.126 ms
Execution Time: 79.334 ms
```

This results in an execution time of `79.334ms`, which is OK.


Then if I only add a `LIMIT 1` at the end of the query, I get dramatically
slower execution:

```
EXPLAIN ANALYZE
SELECT "databases_metrics"."metrics"
FROM
  "databases_metrics"
WHERE
  (
"databases_metrics"."created_at" >= '2023-03-15
10:00:00+00:00'::timestamptz
AND "databases_metrics"."db_instance_id" =
'c4c97a60-b88e-4cd3-a2f1-random-uuid'::UUID
AND "databases_metrics"."created_at" <= '2025-04-03
10:00:00+00:00'::timestamptz
AND (
  metrics -> 'perf_average_query_runtime' IS NOT NULL
)
  )
ORDER BY
  "databases_metrics"."id" ASC LIMIT 1;
```

With a plan:
```
Limit  (cost=0.43..229.66 rows=1 width=824) (actual time=7538.004..7538.005
rows=1 loops=1)
  ->  Index Scan using databases_metrics_pkey on databases_metrics
 (cost=0.43..6909156.38 rows=30142 width=824) (actual
time=7538.002..7538.003 rows=1 loops=1)
Filter: ((created_at >= '2023-03-15 10:00:00+00'::timestamp with
time zone) AND (created_at <= '2025-04-03 10:00:00+00'::timestamp with time
zone) AND ((metrics -> 'perf_average_query_runtime'::text) IS NOT NULL) AND
(db_instance_id = 'c4c97a60-b88e-4cd3-a2f1-random-uuid'::uuid))
Rows Removed by Filter: 10244795
Planning Time: 0.128 ms
Execution Time: 7538.032 ms
```

Any clue why this may be happening?