md5 issues Postgres14 on OL7

2021-12-20 Thread Michael Mühlbeyer
Hi all,

we're currently facing a strange behavior with Postgres 14.1 on Oracle Linux 7.9
when using md5().

a basic statement leads to an out-of-memory error:

postgres=# select md5('just a test');
ERROR:  out of memory

Anyone else facing the same issue?
It may be related to OS hardening, though disabling SELinux didn't solve the issue.

Thanks in advance,
Michael




Re: Best Strategy for Large Number of Images

2021-12-20 Thread Estevan Rech
What does this folder structure look like, something like 10,000 folders? And
how long does a backup of it take?


Re: Best Strategy for Large Number of Images

2021-12-20 Thread Andreas Joseph Krogh

On Monday, 20 December 2021 at 11:29:57, Estevan Rech <softr...@gmail.com> wrote:

How is this folder structure like 10,000 folders? and the backup of it, how 
long does it take? 

I recommend using SeaweedFS as the blob store and storing the metadata (folder,
size, filenames etc.) in the DB. It has excellent HA and backup mechanisms.
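For example, the metadata side could be as simple as this (a rough sketch; all
names here are made up, not taken from any existing schema):

CREATE TABLE image_metadata (
    id         bigserial PRIMARY KEY,
    folder     text   NOT NULL,
    filename   text   NOT NULL,
    size_bytes bigint NOT NULL,
    blob_key   text   NOT NULL  -- e.g. the SeaweedFS file id returned on upload
);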



-- 
Andreas Joseph Krogh 
CTO / Partner - Visena AS 
Mobile: +47 909 56 963 
andr...@visena.com  
www.visena.com  
  


Re: Best Strategy for Large Number of Images

2021-12-20 Thread Imre Samu
> ... I have about 2 million images ...
> folder structure

The "Who's On First" gazetteer with ~ 26M geojson records - using 3-number
chunks subfolder structure.

"Given a Who's On First ID its (relative) URI can be derived by splitting
the ID in to 3-number chunks representing nested subdirectories, followed
by filename consisting of the ID followed by .geojson. For example the
ID for Montréal is 101736545 which becomes: 101/736/545/101736545.geojson"
https://whosonfirst.org/docs/uris/
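For what it's worth, such a chunked path is easy to derive in SQL too (a sketch,
not the project's actual code):

-- split the id into 3-digit groups from the left and join with '/'
SELECT string_agg(m[1], '/' ORDER BY n) || '/101736545.geojson' AS relative_uri
FROM regexp_matches('101736545', '.{1,3}', 'g') WITH ORDINALITY AS t(m, n);
-- => 101/736/545/101736545.geojson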

It is working, but this is also not optimal:

"As of this writing it remains clear that this approach (lots of tiny files
parented by lots of nested directories) can be problematic. We may be
forced to choose another approach, like fewer subdirectories but nothing
has been decided and anything we do will be backwards compatible." ( from
https://whosonfirst.org/data/principles/ )

The structure has since been migrated to per-country repositories
(https://whosonfirst.org/blog/2019/05/09/changes/),
so the US structure is:
https://github.com/whosonfirst-data/whosonfirst-data-admin-us/tree/master/data
 or
https://github.com/whosonfirst-data/whosonfirst-data-admin-us/blob/master/data/907/132/693/907132693.geojson

Maybe you can adopt some of these ideas. IMHO, with 3-number chunks
representing nested subdirectories you have more file system / hosting
options to choose from.

regards,
 Imre

Estevan Rech  wrote (on Mon, 20 Dec 2021 at 11:30):

> How is this folder structure like 10,000 folders? and the backup of it,
> how long does it take?
>


How to reduce query planning time (10s)

2021-12-20 Thread iulian dragos
Hi,

I was analyzing the query performance in a certain code path and noticed
that practically all of the query time is spent planning (11s planning,
200ms execution time). Here is the output of EXPLAIN ANALYZE. I tried using
a prepared statement with three parameters (run_id, and the two text
comparisons) and the performance is similar, even when run multiple times
with the same parameters. Could someone give me some hints on how to speed
this query up?

 explain analyze SELECT x13.run_id, x19.text, x13.id, x17.status
 FROM module_result x13
 INNER JOIN result_name x14 ON x13.name_id = x14.id
 INNER JOIN test_result x17 ON x13.id = x17.module_result_id
 INNER JOIN result_name x19 ON x17.name_id = x19.id
 WHERE x19.text IN ('devtools/devbox/devbox_test_scalastyle')
   AND x13.run_id IN (3769263) AND x14.text = 'Lint-Pr'

                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=993.29..160107.01 rows=1 width=557) (actual time=0.147..0.148 rows=0 loops=1)
   Join Filter: (x13.id = x17.module_result_id)
   ->  Nested Loop  (cost=2.27..18.32 rows=1 width=12) (actual time=0.147..0.147 rows=0 loops=1)
         ->  Index Scan using result_name_text_key on result_name x14  (cost=1.69..9.71 rows=1 width=8) (actual time=0.146..0.147 rows=0 loops=1)
               Index Cond: (text = 'Lint-Pr'::text)
         ->  Index Scan using module_result_run_id_name_id_idx on module_result x13  (cost=0.57..8.59 rows=1 width=20) (never executed)
               Index Cond: ((run_id = 3769263) AND (name_id = x14.id))
   ->  Nested Loop  (cost=991.02..160087.73 rows=77 width=553) (never executed)
         ->  Index Scan using result_name_text_key on result_name x19  (cost=1.69..9.71 rows=1 width=549) (never executed)
               Index Cond: (text = 'devtools/devbox/devbox_test_scalastyle'::text)
         ->  Bitmap Heap Scan on test_result x17  (cost=989.33..159674.48 rows=40354 width=20) (never executed)
               Recheck Cond: (name_id = x19.id)
               ->  Bitmap Index Scan on test_result_name_id_idx  (cost=0.00..979.24 rows=40354 width=0) (never executed)
                     Index Cond: (name_id = x19.id)
 Planning Time: 11257.713 ms
 Execution Time: 0.204 ms

EXPLAIN

Time: 11.505s (11 seconds), executed in: 11.496s (11 seconds)



explain analyze execute test_history_prep2(3769263, 'Lint-Pr',
'devtools/devbox/devbox_test_scalastyle')

                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=993.30..160114.89 rows=1 width=557) (actual time=0.173..0.173 rows=0 loops=1)
   Join Filter: (x13.id = x17.module_result_id)
   ->  Nested Loop  (cost=2.27..18.32 rows=1 width=12) (actual time=0.172..0.173 rows=0 loops=1)
         ->  Index Scan using result_name_text_key on result_name x14  (cost=1.69..9.71 rows=1 width=8) (actual time=0.172..0.172 rows=0 loops=1)
               Index Cond: (text = 'Lint-Pr'::text)
         ->  Index Scan using module_result_run_id_name_id_idx on module_result x13  (cost=0.57..8.59 rows=1 width=20) (never executed)
               Index Cond: ((run_id = 3769263) AND (name_id = x14.id))
   ->  Nested Loop  (cost=991.03..160095.61 rows=77 width=553) (never executed)
         ->  Index Scan using result_n

Re: How to reduce query planning time (10s)

2021-12-20 Thread Pavel Stehule
Hi

On Mon, 20 Dec 2021 at 13:31, iulian dragos <iulian.dra...@databricks.com> wrote:

> Hi,
>
> I was analyzing the query performance in a certain code path and noticed
> that practically all of the query time is spent planning (11s planning,
> 200ms execution time). Here is the output of EXPLAIN ANALYZE. I tried using
> a prepared statement with three parameters (run_id, and the two text
> comparisons) and the performance is similar, even when run multiple times
> with the same parameters. Could someone give me some hints on how to speed
> this query up?
>
>  explain analyze SELECT x13.run_id, x19.text, x13.id, x17.status
>
>  FROM module_result x13
>
>  INNER JOIN result_name x14 ON x13.name_id = x14.id
>
>  INNER JOIN test_result x17 ON x13.id = x17.module_result_id
>
>  INNER JOIN result_name x19 ON x17.name_id = x19.id
>
>  WHERE x19.text IN ('devtools/devbox/devbox_test_scalastyle') AND x13.run_id
> IN (3769263) AND x14.text = 'Lint-Pr'
>
>
> +--+
>
> | QUERY PLAN
>
> |
>
>
> |--|
>
> | Nested Loop  (cost=993.29..160107.01 rows=1 width=557) (actual
> time=0.147..0.148 rows=0 loops=1)
> |
>
> |   Join Filter: (x13.id = x17.module_result_id)
>
> |
>
> |   ->  Nested Loop  (cost=2.27..18.32 rows=1 width=12) (actual
> time=0.147..0.147 rows=0 loops=1)
>   |
>
> | ->  Index Scan using result_name_text_key on result_name x14  
> (cost=1.69..9.71
> rows=1 width=8) (actual time=0.146..0.147 rows=0 loops=1) |
>
> |   Index Cond: (text = 'Lint-Pr'::text)
>
> |
>
> | ->  Index Scan using module_result_run_id_name_id_idx on
> module_result x13  (cost=0.57..8.59 rows=1 width=20) (never executed)
>   |
>
> |   Index Cond: ((run_id = 3769263) AND (name_id = x14.id))
>
> |
>
> |   ->  Nested Loop  (cost=991.02..160087.73 rows=77 width=553) (never
> executed)
>   |
>
> | ->  Index Scan using result_name_text_key on result_name x19  
> (cost=1.69..9.71
> rows=1 width=549) (never executed)|
>
> |   Index Cond: (text =
> 'devtools/devbox/devbox_test_scalastyle'::text)
>   |
>
> | ->  Bitmap Heap Scan on test_result x17  (cost=989.33..159674.48
> rows=40354 width=20) (never executed)   |
>
> |   Recheck Cond: (name_id = x19.id)
>
> |
>
> |   ->  Bitmap Index Scan on test_result_name_id_idx  
> (cost=0.00..979.24
> rows=40354 width=0) (never executed)  |
>
> | Index Cond: (name_id = x19.id)
>
> |
>
> | Planning Time: 11257.713 ms
>
> |
>
> | Execution Time: 0.204 ms
>
> |
>
>
> +--+
>
> EXPLAIN
>
> Time: 11.505s (11 seconds), executed in: 11.496s (11 seconds)
>
>
>
> explain analyze execute test_history_prep2(3769263, 'Lint-Pr',
> 'devtools/devbox/devbox_test_scalastyle')
>
>
> +--+
>
> | QUERY PLAN
>
> |
>
>
> |--|
>
> | Nested Loop  (cost=993.30..160114.89 rows=1 width=557) (actual
> time=0.173..0.173 rows=0 loops=1)
> |
>
> |   Join Filter: (x13.id = x17.module_result_id)
>
> |
>
> |   ->  Nested Loop  (cost=2.27..18.32 rows=1 width=12) (actual
> time=0.172..0.173 rows=0 loops=1)
>   |
>
> | ->  Index Scan using result_name_text_key on result_name x14  
> (cost=1.69..9.71
> rows=1 width=8) (actual time=0.172..0.172 rows=0 loops=1) |
>
> |   Index Cond: (text = 'Lint-Pr'::text)
>
> |
>
> | ->  Index Scan using module_result_run_id_name_id_idx on
> module_result x13  (cost=0.57..8.59 rows=1 width=20) (never executed)
>   |
>
> |   Index Cond: ((run_id = 3769263) AND (name_id = x14.id))
>
> |
>
> |   ->  Nested Loop  (cost=991.03..160095.61 rows=77 width=553) (never
> executed)
>   |
>
> | ->  Index Scan using result_name_text_key on result_name x19  
> (cost=1.69..9.71
> rows=1 width=549) (never executed)|
>
> |   Index Cond: (text =
> 'devtools/devbox/devbox_test_scalastyle'::text)
>   |
>
> | ->  Bitmap Heap Scan on test_result x17  (cost=989.34..159682.34
> rows=40356 width=20) (never executed)   |
>
> |   Recheck Cond: (name_id = x19.id)
>
> |
>
> |   ->  Bitm

Reg. static SQL code analysis tool for PostgreSQL

2021-12-20 Thread holistic.dev
Hi everyone!

DB-related workflows can rapidly eat up time as the complexity of a
project grows. Tracking details and structures gets harder, and the
risk of missing an issue that could affect architecture or data
structure dependencies increases. The result is undermined
performance, hours wasted on debugging, or catastrophic failures like
data leaks and security flaws.

With that in mind, we developed Holistic (holistic.dev) – an online
static SQL code analysis tool for PostgreSQL. Holistic is automatic
(less than 10 minutes to set up), safe (no direct DB connection or
access to user data by design) and comprehensive (proprietary corpus
of recommendation rules).
More than 200 DB architects and administrators, data engineers,
analysts and product managers currently use it to detect database
performance issues, reduce infrastructure costs, eliminate errors,
prevent data leakage and security issues.

See the demo (holistic.dev/playground) and sign up to start working
with Holistic now – it’s in a FREE (no hidden costs, upsells or quotas)
public beta. Onboarding is easy:

1. Sign up (app.holistic.dev) and get the holistic.dev API key in the
client area. See more: app.holistic.dev/settings.
2. Create a new project in the client area.
3. Import your database schema (DDL only, no data) in SQL format and
upload it to your project. See more:
docs.holistic.dev/#extract-ddl-from-database
4. Activate the internal DB tools (docs.holistic.dev/#pg_stat_statements)
for query monitoring (pg_stat_statements or the slow query log) on the
server with the database you want to monitor (cloud/on-premise).
5. Create a cron job (docs.holistic.dev/#pg_stat_statements) that will
regularly upload queries to our holistic.dev API.
6. You'll get a list of query issues in the client area
(app.holistic.dev) as soon as they're processed.

Done!

We would really appreciate your comments, questions or feedback at
i...@holistic.dev.
Thanks,
Holistic Team




Update concurrency

2021-12-20 Thread Sylvain Déve
Hello,

On a server, I have an API which is called by one or multiple clients. The
server can receive, almost at the same time, multiple update instructions for
the same PostgreSQL attribute. These requests do not arrive at the exact same
time, and I would just like to execute them one after another. Because of the
(Flask) API, I cannot guarantee that a previous call has finished, so a new
update can be requested before the previous one is completed. I use psycopg2 to
interface with PostgreSQL, and I would expect psycopg2/PostgreSQL to be able to
queue the updates and process them quietly one after another (maybe not even in
order), but it does not seem to work that way.

Below is a simplified version of the function called on the server when an
update is required. The attribute to be updated is a JSONB object, if that
matters; hence the 'path' argument to determine which part of the JSONB object
to update. I'm including the Python code, not only the SQL, because it might be
part of the problem. So the API call looks like this:

import psycopg2
from psycopg2.extras import Json

def pg_update(data, path):
    conn = psycopg2.connect(...)  # always the same database.
    cur = conn.cursor()

    # JSONB update for a single selected row and column.
    # 'column' and 'select_row' are determined from the parameters 'data' and 'path'.
    command = (
        f"""UPDATE MY_TABLE SET """
        f"""{column} = jsonb_merge({column}, %s) """
        f"""WHERE {select_row};"""
    )

    cur.execute(command, [Json(data)])
    conn.commit()

When called twice in a row, this call leads to the error (at "cur.execute"):

psycopg2.errors.InternalError_: tuple concurrently updated

Since the server keeps running all the time, I can also define conn outside the 
API call "pg_update" so that all calls are processed with the same psycopg2 
connection. But then, when two updates are requested for the same attribute, I 
get instead:

psycopg2.ProgrammingError: execute cannot be used while an asynchronous 
query is underway

I forced async_ = False in psycopg2.connect, just in case. No change.

I also tried to lock like this:
command = (
    f"""BEGIN; """
    f"""SELECT * FROM MY_TABLE WHERE {select_row} FOR UPDATE; """
    f"""UPDATE MY_TABLE SET """
    f"""{column} = jsonb_merge({column}, %s) """
    f"""WHERE {select_row};"""
    f"""END;"""
)

To summarize, how can I allow for multiple updates of the same attributes, one 
after another, even when these updates are requested almost at the same time 
from independent, uncontrollable client requests? Maybe there is a need for a 
better locking mechanism? A queuing mechanism?

Thanks for your help!




Re: Update concurrency

2021-12-20 Thread David G. Johnston
On Monday, December 20, 2021, Sylvain Déve 
wrote:

>
> To summarize, how can I allow for multiple updates of the same attributes,
> one after another, even when these updates are requested almost at the same
> time from independent, uncontrollable client requests?
>

Most people use an application connection pool.  You replace “connect” with a
call that gets a connection from the pool.  When you are done you release the
connection back to the pool.  Concurrency is then achieved by having a pool
size greater than one.  If you try doing more concurrent work than the pool
size allows, the pool call should block and/or return nothing, upon which you
loop and try again.

David J.


Re: md5 issues Postgres14 on OL7

2021-12-20 Thread Christoph Moench-Tegeder
Hi!

## Michael Mühlbeyer (michael.muehlbe...@trivadis.com):

> postgres=# select md5('just a test');
> ERROR:  out of memory

Active FIPS mode (/proc/sys/crypto/fips_enabled => 1) on the server does
produce this behaviour.
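If that is the cause here too, one possible workaround (a sketch, assuming
pgcrypto is available and md5 itself is not a hard requirement) is to use a
FIPS-approved digest instead:

-- sha256 is a FIPS-approved algorithm, unlike md5
CREATE EXTENSION IF NOT EXISTS pgcrypto;
SELECT encode(digest('just a test', 'sha256'), 'hex');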

Regards,
Christoph

-- 
Spare Space




Re: Update concurrency

2021-12-20 Thread Tom Lane
Sylvain Déve  writes:
> # JSONB update for a single selected row and column.
> # 'column' and 'select_row' are determined with the parameters 'data' and 
> 'path'.
> command = (
> f"""UPDATE MY_TABLE SET """
> f"""{column} = jsonb_merge({column}, %s) """
> f"""WHERE {select_row};"""
> )

> When called twice in a row, this call leads to the error (at "cur.execute"):
> psycopg2.errors.InternalError_: tuple concurrently updated

That error should be pretty hard to reach from user code, and certainly
two successive UPDATEs ought not have anything to do with it.  I think
there is something you're not telling us about the context.  Does this
table have any triggers, weird constraints, or the like?  What is the
actual WHERE clause (ie, I wonder if you have simplified away some
relevant query detail)?  What PG version is this exactly?

regards, tom lane




Re: Update concurrency

2021-12-20 Thread Tom Lane
[ please keep the list cc'd ]

Sylvain Déve  writes:
> Indeed I removed the important part here... I was including a function
> definition ("create or replace function ...") in the call too. This was
> temporary and dirty. After moving the function definition to the database
> initialization, everything was solved... Defining the same function multiple
> times, and I presume more or less at the same time, led to the problems. The
> table update now completes without any problem...

Hah, now I can reproduce it:

regression=# create or replace function foo(int) returns int as 'select 1' 
language sql;
CREATE FUNCTION
regression=# begin;
BEGIN
regression=*# create or replace function foo(int) returns int as 'select 1' 
language sql;
CREATE FUNCTION

... in another session:

regression=# create or replace function foo(int) returns int as 'select 1' 
language sql;
<>

... in first session:

regression=*# commit;
COMMIT

and now the second session fails with

ERROR:  tuple concurrently updated

because both transactions are trying to update the same pre-existing
row of pg_proc.  (If the function didn't exist to start with, then
you get "duplicate key value violates unique constraint" instead.)

That's basically because internal catalog manipulations don't go
to the same lengths as user queries do to handle concurrent-update
scenarios nicely.  I'm not sure what would be involved in making
that better, but I am sure it'd be a lot of work :-(

regards, tom lane




Re: How to reduce query planning time (10s)

2021-12-20 Thread iulian dragos
Thanks for the suggestion!

Reading through the docs it looks like this might involve some downtime
(locking writes to the tables whose indexes are being rebuilt) and this is a
running system. I may need some time to set up a database copy and try to
reproduce the issue (the dev deployment doesn't exhibit the performance
problem, so it must be related to table contents).

Is there any way to validate the bloated index hypothesis before I invest
too much in this direction?

iulian


On Mon, Dec 20, 2021 at 1:42 PM Pavel Stehule 
wrote:

> Hi
>
> On Mon, 20 Dec 2021 at 13:31, iulian dragos <iulian.dra...@databricks.com> wrote:
>
>> Hi,
>>
>> I was analyzing the query performance in a certain code path and noticed
>> that practically all of the query time is spent planning (11s planning,
>> 200ms execution time). Here is the output of EXPLAIN ANALYZE. I tried using
>> a prepared statement with three parameters (run_id, and the two text
>> comparisons) and the performance is similar, even when run multiple times
>> with the same parameters. Could someone give me some hints on how to speed
>> this query up?
>>
>>  explain analyze SELECT x13.run_id, x19.text, x13.id, x17.status
>>
>>  FROM module_result x13
>>
>>  INNER JOIN result_name x14 ON x13.name_id = x14.id
>>
>>  INNER JOIN test_result x17 ON x13.id = x17.module_result_id
>>
>>  INNER JOIN result_name x19 ON x17.name_id = x19.id
>>
>>  WHERE x19.text IN ('devtools/devbox/devbox_test_scalastyle') AND x13.run_id
>> IN (3769263) AND x14.text = 'Lint-Pr'
>>
>>
>> +--+
>>
>> | QUERY PLAN
>>
>> |
>>
>>
>> |--|
>>
>> | Nested Loop  (cost=993.29..160107.01 rows=1 width=557) (actual
>> time=0.147..0.148 rows=0 loops=1)
>>   |
>>
>> |   Join Filter: (x13.id = x17.module_result_id)
>>
>> |
>>
>> |   ->  Nested Loop  (cost=2.27..18.32 rows=1 width=12) (actual
>> time=0.147..0.147 rows=0 loops=1)
>>   |
>>
>> | ->  Index Scan using result_name_text_key on result_name x14  
>> (cost=1.69..9.71
>> rows=1 width=8) (actual time=0.146..0.147 rows=0 loops=1) |
>>
>> |   Index Cond: (text = 'Lint-Pr'::text)
>>
>> |
>>
>> | ->  Index Scan using module_result_run_id_name_id_idx on
>> module_result x13  (cost=0.57..8.59 rows=1 width=20) (never executed)
>> |
>>
>> |   Index Cond: ((run_id = 3769263) AND (name_id = x14.id))
>>
>> |
>>
>> |   ->  Nested Loop  (cost=991.02..160087.73 rows=77 width=553) (never
>> executed)
>> |
>>
>> | ->  Index Scan using result_name_text_key on result_name x19  
>> (cost=1.69..9.71
>> rows=1 width=549) (never executed)|
>>
>> |   Index Cond: (text =
>> 'devtools/devbox/devbox_test_scalastyle'::text)
>>   |
>>
>> | ->  Bitmap Heap Scan on test_result x17  (cost=989.33..159674.48
>> rows=40354 width=20) (never executed)   |
>>
>> |   Recheck Cond: (name_id = x19.id)
>>
>> |
>>
>> |   ->  Bitmap Index Scan on test_result_name_id_idx  
>> (cost=0.00..979.24
>> rows=40354 width=0) (never executed)  |
>>
>> | Index Cond: (name_id = x19.id)
>>
>> |
>>
>> | Planning Time: 11257.713 ms
>>
>> |
>>
>> | Execution Time: 0.204 ms
>>
>> |
>>
>>
>> +--+
>>
>> EXPLAIN
>>
>> Time: 11.505s (11 seconds), executed in: 11.496s (11 seconds)
>>
>>
>>
>> explain analyze execute test_history_prep2(3769263, 'Lint-Pr',
>> 'devtools/devbox/devbox_test_scalastyle')
>>
>>
>> +--+
>>
>> | QUERY PLAN
>>
>> |
>>
>>
>> |--|
>>
>> | Nested Loop  (cost=993.30..160114.89 rows=1 width=557) (actual
>> time=0.173..0.173 rows=0 loops=1)
>>   |
>>
>> |   Join Filter: (x13.id = x17.module_result_id)
>>
>> |
>>
>> |   ->  Nested Loop  (cost=2.27..18.32 rows=1 width=12) (actual
>> time=0.172..0.173 rows=0 loops=1)
>>   |
>>
>> | ->  Index Scan using result_name_text_key on result_name x14  
>> (cost=1.69..9.71
>> rows=1 width=8) (actual time=0.172..0.172 rows=0 loops=1) |
>>
>> |   Index Cond: (text = 'Lint-Pr'::text)
>>
>> |
>>
>> | ->  Index Scan using module_result_run_id_name_id_idx on
>> module_result x13  (cost=0.57..8.59 rows=1 width=20) (never executed)
>> |
>>
>> | 

Re: How to reduce query planning time (10s)

2021-12-20 Thread Pavel Stehule
On Mon, 20 Dec 2021 at 17:50, iulian dragos <iulian.dra...@databricks.com> wrote:

> Thanks for the suggestion!
>
> Reading through the docs it looks like this might involve some downtime
> (locking writes to the tables whos indexes are being rebuilt) and this is a
> running system. I may need some time to setup a database copy and try to
> reproduce the issue (dev deployment doesn't exhibit the performance
> problem, so it must be related to table contents).
>
> Is there any way to validate the bloated index hypothesis before I invest
> too much in this direction?
>

https://wiki.postgresql.org/wiki/Show_database_bloat
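A rough first look at the sizes of the indexes involved is also cheap (a
sketch; the table names are taken from the plan upthread):

SELECT relname, indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
       pg_size_pretty(pg_relation_size(relid))      AS table_size
FROM pg_stat_user_indexes
WHERE relname IN ('module_result', 'test_result', 'result_name')
ORDER BY pg_relation_size(indexrelid) DESC;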

Pavel

>
> iulian
>
>
> On Mon, Dec 20, 2021 at 1:42 PM Pavel Stehule 
> wrote:
>
>> Hi
>>
>> On Mon, 20 Dec 2021 at 13:31, iulian dragos <iulian.dra...@databricks.com> wrote:
>>
>>> Hi,
>>>
>>> I was analyzing the query performance in a certain code path and noticed
>>> that practically all of the query time is spent planning (11s planning,
>>> 200ms execution time). Here is the output of EXPLAIN ANALYZE. I tried using
>>> a prepared statement with three parameters (run_id, and the two text
>>> comparisons) and the performance is similar, even when run multiple times
>>> with the same parameters. Could someone give me some hints on how to speed
>>> this query up?
>>>
>>>  explain analyze SELECT x13.run_id, x19.text, x13.id, x17.status
>>>
>>>  FROM module_result x13
>>>
>>>  INNER JOIN result_name x14 ON x13.name_id = x14.id
>>>
>>>  INNER JOIN test_result x17 ON x13.id = x17.module_result_id
>>>
>>>  INNER JOIN result_name x19 ON x17.name_id = x19.id
>>>
>>>  WHERE x19.text IN ('devtools/devbox/devbox_test_scalastyle') AND x13.run_id
>>> IN (3769263) AND x14.text = 'Lint-Pr'
>>>
>>>
>>> +--+
>>>
>>> | QUERY PLAN
>>>
>>> |
>>>
>>>
>>> |--|
>>>
>>> | Nested Loop  (cost=993.29..160107.01 rows=1 width=557) (actual
>>> time=0.147..0.148 rows=0 loops=1)
>>>   |
>>>
>>> |   Join Filter: (x13.id = x17.module_result_id)
>>>
>>> |
>>>
>>> |   ->  Nested Loop  (cost=2.27..18.32 rows=1 width=12) (actual
>>> time=0.147..0.147 rows=0 loops=1)
>>> |
>>>
>>> | ->  Index Scan using result_name_text_key on result_name x14  
>>> (cost=1.69..9.71
>>> rows=1 width=8) (actual time=0.146..0.147 rows=0 loops=1) |
>>>
>>> |   Index Cond: (text = 'Lint-Pr'::text)
>>>
>>> |
>>>
>>> | ->  Index Scan using module_result_run_id_name_id_idx on
>>> module_result x13  (cost=0.57..8.59 rows=1 width=20) (never executed)
>>> |
>>>
>>> |   Index Cond: ((run_id = 3769263) AND (name_id = x14.id))
>>>
>>> |
>>>
>>> |   ->  Nested Loop  (cost=991.02..160087.73 rows=77 width=553) (never
>>> executed)
>>> |
>>>
>>> | ->  Index Scan using result_name_text_key on result_name x19  
>>> (cost=1.69..9.71
>>> rows=1 width=549) (never executed)|
>>>
>>> |   Index Cond: (text =
>>> 'devtools/devbox/devbox_test_scalastyle'::text)
>>> |
>>>
>>> | ->  Bitmap Heap Scan on test_result x17  (cost=989.33..159674.48
>>> rows=40354 width=20) (never executed)
>>> |
>>>
>>> |   Recheck Cond: (name_id = x19.id)
>>>
>>> |
>>>
>>> |   ->  Bitmap Index Scan on test_result_name_id_idx  
>>> (cost=0.00..979.24
>>> rows=40354 width=0) (never executed)  |
>>>
>>> | Index Cond: (name_id = x19.id)
>>>
>>> |
>>>
>>> | Planning Time: 11257.713 ms
>>>
>>> |
>>>
>>> | Execution Time: 0.204 ms
>>>
>>> |
>>>
>>>
>>> +--+
>>>
>>> EXPLAIN
>>>
>>> Time: 11.505s (11 seconds), executed in: 11.496s (11 seconds)
>>>
>>>
>>>
>>> explain analyze execute test_history_prep2(3769263, 'Lint-Pr',
>>> 'devtools/devbox/devbox_test_scalastyle')
>>>
>>>
>>> +--+
>>>
>>> | QUERY PLAN
>>>
>>> |
>>>
>>>
>>> |--|
>>>
>>> | Nested Loop  (cost=993.30..160114.89 rows=1 width=557) (actual
>>> time=0.173..0.173 rows=0 loops=1)
>>>   |
>>>
>>> |   Join Filter: (x13.id = x17.module_result_id)
>>>
>>> |
>>>
>>> |   ->  Nested Loop  (cost=2.27..18.32 rows=1 width=12) (actual
>>> time=0.172..0.173 rows=0 loops=1)
>>> |
>>>
>>> | ->  Index Scan using result_name_text_key on result_name x14  
>>> (cost=1.69..9.71
>>> ro

Re: How to reduce query planning time (10s)

2021-12-20 Thread Tom Lane
iulian dragos  writes:
> Is there any way to validate the bloated index hypothesis before I invest
> too much in this direction?

A plain old VACUUM ought to take care of most of the problem.
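For instance (a sketch, using the table names from the plan upthread):

VACUUM (VERBOSE) module_result, test_result, result_name;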

regards, tom lane




Re: Packages, inner subprograms, and parameterizable anonymous blocks for PL/pgSQL

2021-12-20 Thread Bryn Llewellyn
b...@yugabyte.com wrote:
> 
>> p...@bowt.ie  wrote:
>> 
>>> b...@yugabyte.com  wrote:
>>> 
>>> Modular design recommends exposing functionality through a purpose oriented 
>>> interface and hiding all implementation details from the API’s user. A 
>>> package achieves this with declarative syntax via the spec/body separation. 
>>> The body encapsulates as many (top-level) subprograms as you want. Each of 
>>> these is visible to all of its peers. But none is visible outside of the 
>>> package unless the spec declares that it should be. This is a simple opt-in 
>>> scheme.
>> 
>> I still don’t get it. It sounds like you’re mostly talking about 
>> encapsulation, or Information hiding, for stored procedures. I can certainly 
>> see how plpgsql doesn’t do those things very well, but it still seems like 
>> there might be a lot of nuance that isn’t getting across. The list of 
>> specific features that seem to be missing are not unreasonable, 
>> individually, and yet it feels like I cannot see some bigger picture that's 
>> apparent to you.
>> 
>> Maybe you should explain your position by way of a motivating example, 
>> involving a real world use case. Something that makes the issues concrete. 
>> Are these items compelling because of how they allow an organization to 
>> deploy a program in a production environment, complete with version control? 
>> Does it have something to do with decoupling the mutable business data 
>> stored in tables from the programs contained/run in the same database?
> 
> I can certainly make up an example. I’ll do this over the weekend. However, I 
> fear that it will be time wasted because at least some of the addressees here 
> who’ve expressed strong opposition to the notion of PL/pgSQL packages must 
> understand very well what they’re objecting to. For example, 
> pavel.steh...@gmail.com  with his “schema 
> variables, LET command” work.
> 
> Anyway… I’ll give it my best shot. I’ll try to address your specific 
> questions in my follow-up reply. Hang on for a couple of days, please.

I made a start on this. But I want to think carefully about the example use 
case(s). So I won’t promise a delivery date. Like I said, I don’t expect to 
change anybody’s mind. But I do hope that I might get some useful suggestions 
on how, using PG Version 14, I can best meet the requirements that I’ll aim to 
explain.

I’m still hoping that I might get some pointers to whitepapers or blog posts 
that expand on those bullets that I quoted from the PG doc: «Instead of 
packages, use schemas to organize your functions into groups.» and «Since there 
are no packages, there are no package-level variables either. This is somewhat 
annoying. You can keep per-session state in temporary tables instead.»



Re: Packages, inner subprograms, and parameterizable anonymous blocks for PL/pgSQL

2021-12-20 Thread Pavel Stehule
Hi


> I’m still hoping that I might get some pointers to whitepapers or blog
> posts that expand on those bullets that I quoted from the PG doc: «Instead
> of packages, use schemas to organize your functions into groups.» and
> «Since there are no packages, there are no package-level variables either.
> This is somewhat annoying. You can keep per-session state in temporary
> tables instead.»
>

I fixed a patch https://commitfest.postgresql.org/36/1608/ so you can check
it.

Using temporary tables instead of session variables is not very practical.
There are alternative ways: a) an extension, b) global variables from PL/Perl,
c) custom (global) configuration variables.

Option c) is the most common today:

http://okbob.blogspot.com/2021/06/calculating-derivation-from.html
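For option c), a minimal sketch (the setting name here is arbitrary):

SET myapp.current_user_name = 'alice';                       -- session scope
SELECT current_setting('myapp.current_user_name');           -- read it back
SELECT set_config('myapp.current_user_name', 'bob', true);   -- transaction scope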


Regards

Pavel


Freeing transient memory in aggregate functions

2021-12-20 Thread Matt Magoffin
I have a question about trying to keep memory from growing too much in a C 
aggregate function with pass-by-reference types. I am trying to keep track of a 
last-seen value in my aggregate state, so I have code roughly doing this:

Datum current;
MemoryContext aggContext;
AggCheckCallContext(fcinfo, &aggContext);
old = MemoryContextSwitchTo(aggContext);

if (!PG_ARGISNULL(0)) {
  current = PG_GETARG_DATUM(0);
  state->last = datumCopy(current, typbyval, typlen);
}
MemoryContextSwitchTo(old);

I’m essentially doing a datumCopy() on every non-null input value. I was 
wondering if there is a way to free the previously copied datum, since I don’t 
really need it anymore? Something like

if (!PG_ARGISNULL(0)) {
  current = PG_GETARG_DATUM(0);
  if (state->last != NULL) {
    pfree(state->last);
  }
  state->last = datumCopy(current, typbyval, typlen);
}

I wasn’t sure it was allowed to call pfree() like this. My actual function is 
dealing with array input values, and for large sets of inputs I didn’t want to 
grow memory use as large as the entire data set being aggregated.

Kind regards,
Matt

Re: PGBouncer logs explanation required

2021-12-20 Thread Peter Eisentraut

On 19.12.21 12:50, Shubham Mittal wrote:
2021-11-25 14:45:45.244 IST [18307] LOG C-0x6ae270: sit/postgres@abcd login attempt: db=sit user=postgres tls=no
2021-11-25 14:45:45.299 IST [18307] LOG S-0x6b4f48: sit/postgres@abcd new connection to server (from abcd)
2021-11-25 14:46:17.843 IST [18307] LOG S-0x6b4d10: sit/postgres@abcd closing because: query timeout (age=185s)
2021-11-25 14:46:17.843 IST [18307] LOG C-0x6ae038: sit/postgres@abcd closing because: query timeout (age=185s)
2021-11-25 14:46:17.843 IST [18307] WARNING C-0x6ae038: sit/postgres@abcd pooler error: query timeout
2021-11-25 14:46:25.763 IST [18307] LOG stats: 0 xacts/s, 0 queries/s, in 30 B/s, out 141 B/s, xact 3660 us, query 4362 us, wait 152 us


Does this log mean that connection is closed and returned to the pool 
and can be reused again?? Or something else?? Please help.


A closed connection cannot be reused, since it's closed.  But after the 
connection is closed, there is a free slot in the pool to possibly open 
a new connection if needed.





Re: Freeing transient memory in aggregate functions

2021-12-20 Thread Tom Lane
Matt Magoffin  writes:
> I’m essentially doing a datumCopy() on every non-null input value.

If you're doing that in the aggContext, you definitely need to free
the prior one(s).

> I was wondering if there is a way to free the previously copied datum, since 
> I don’t really need it anymore? Something like

>   if (state->last != NULL) {
> pfree(state->last);
>   }

Not quite like that.  Look into nodeAgg.c, which solves a similar problem
for the transvalues themselves with code like

/* forget the old value, if any */
if (!oldIsNull && !pertrans->inputtypeByVal)
pfree(DatumGetPointer(oldVal));

regards, tom lane




How best to turn select result into options like 'a|b|c''

2021-12-20 Thread Shaozhong SHI
Is there a way to turn a select result into something like 'a|b|c'?
Regards,
David


Re: How best to turn select result into options like 'a|b|c''

2021-12-20 Thread Ray O'Donnell

On 20 December 2021 22:10:19 Shaozhong SHI  wrote:

Is there a way to turn select result into something like 'a|b|c' .
Regards,
David


On my phone so hard to reply properly, but if a, b and c are in separate 
rows then you need the array_agg() and array_to_string() functions.
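Something like this, for example (the table and column names are made up):

SELECT array_to_string(array_agg(opt ORDER BY opt), '|') FROM opts;  -- 'a|b|c'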


Ray.



Re: How best to turn select result into options like 'a|b|c''

2021-12-20 Thread David G. Johnston
On Mon, Dec 20, 2021 at 3:24 PM Ray O'Donnell  wrote:

> On 20 December 2021 22:10:19 Shaozhong SHI  wrote:
>
>> Is there a way to turn select result into something like 'a|b|c' .
>> Regards,
>> David
>>
>
> On my phone so hard to reply properly, but if a, b and c are in separate
> rows then you need the array_agg() and array_to_string() functions.
>
>
Or just: string_agg()
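For example, with the same made-up table as above:

SELECT string_agg(opt, '|' ORDER BY opt) FROM opts;  -- 'a|b|c'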

David J.


Re: Max connections reached without max connections reached

2021-12-20 Thread James Sewell
>
>
> I'm guessing this won't be back-patched? Is it possible to somehow read
> this information from a C function?
>
>

OK, it is possible. I've got a C extension up and running which hooks
ExecutorStart and then, once per transaction (I monitor the nesting depth as
pg_stat_statements does, and only attach at the top level), registers a
subtransaction callback that tracks subtransaction-start events, increments a
counter, and keeps track of the worst offenders in a hash map.

This seems to work very well - but I've got a question. How many
subtransactions would you expect the following anonymous block and function to
create, respectively? The first seems to report 9, and the second 10 - is this
expected? It feels like it might be - I just want to make sure.

do $$
begin
  for counter in 1..10 loop
    begin
      INSERT INTO a VALUES (counter);
    exception
      when no_data_found then raise exception 'gosh';
    end;
  end loop;
end;
$$


CREATE OR REPLACE FUNCTION public.create_subtransactions()
 RETURNS void
 LANGUAGE plpgsql
AS $function$
begin
  for counter in 1..10 loop
    begin
      INSERT INTO a VALUES (counter);
    exception
      when no_data_found then raise exception 'gosh';
    end;
  end loop;
end;
$function$

- James



Re: Max connections reached without max connections reached

2021-12-20 Thread James Sewell
> Ok it is possible, I've got a C extension up and running which hooks
> ExecutorStart, then once for each TX ( I monitor the nesting depth like in
> pg_stat_statements, and only attach at the top level) attaches a sub
> transaction callback, tracking start subtransaction events and incrementing
> a counter / keeping track of the worst offenders in a hashmap.
>

Sorry that should be ExecutorRun  - although I'm a little confused about
which is the correct choice.

-James



Sorting difference in version 10 vs 13

2021-12-20 Thread Akheel Ahmed

Hi


I have performed logical replication from a 10.18 instance to a 13.4 instance.

In 13.4, ORDER BY ... ASC sorts underscores higher than other characters,
although all the LC parameters are identical across both instances.


Am I missing anything else?


Help appreciated.

Thanks





Re: Sorting difference in version 10 vs 13

2021-12-20 Thread Tom Lane
Akheel Ahmed  writes:
> in 13.4, order by asc gives underscores higher than other characters. 
> all the LC parameters are identical across both instances.
> Am I missing anything else?

OS-level differences, perhaps?

https://wiki.postgresql.org/wiki/Locale_data_changes
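One quick sanity check (a sketch) is to run the same ORDER BY on both servers,
with and without the C collation; if the first differs between 10 and 13 but
the second matches, the locale data is what changed:

SELECT v FROM (VALUES ('a_b'), ('ab'), ('a-b')) AS t(v) ORDER BY v;
SELECT v FROM (VALUES ('a_b'), ('ab'), ('a-b')) AS t(v) ORDER BY v COLLATE "C";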

regards, tom lane




Re: Sorting difference in version 10 vs 13

2021-12-20 Thread Akheel Ahmed
Both are Amazon AWS Aurora instances. 10.x is on Intel/AMD64; 13.x is on
their ARM-based Graviton.



select version();
                                                   version
--------------------------------------------------------------------------------------------------------------
 PostgreSQL 13.4 on aarch64-unknown-linux-gnu, compiled by aarch64-unknown-linux-gnu-gcc (GCC) 7.4.0, 64-bit
(1 row)


select version();
                                                 version
----------------------------------------------------------------------------------------------------
 PostgreSQL 10.18 on x86_64-pc-linux-gnu, compiled by x86_64-pc-linux-gnu-gcc (GCC) 7.4.0, 64-bit
(1 row)


On 21/12/2021 3:12 pm, Tom Lane wrote:

Akheel Ahmed  writes:

in 13.4, order by asc gives underscores higher than other characters.
all the LC parameters are identical across both instances.
Am I missing anything else?

OS-level differences, perhaps?

https://wiki.postgresql.org/wiki/Locale_data_changes

regards, tom lane

Re: Sorting difference in version 10 vs 13

2021-12-20 Thread Ron
Aurora is sufficiently different from vanilla PostgreSQL that you need to
ask Amazon.


On 12/20/21 11:31 PM, Akheel Ahmed wrote:


Both are Amazon AWS Aurora instances. 10.x is on intel/amd64. 13.x is on 
their ARM based graviton.



select version();
   version
-
 PostgreSQL 13.4 on aarch64-unknown-linux-gnu, compiled by 
aarch64-unknown-linux-gnu-gcc (GCC) 7.4.0, 64-bit

(1 row)


select version();
 version
--
 PostgreSQL 10.18 on x86_64-pc-linux-gnu, compiled by 
x86_64-pc-linux-gnu-gcc (GCC) 7.4.0, 64-bit

(1 row)


On 21/12/2021 3:12 pm, Tom Lane wrote:

Akheel Ahmed  writes:

in 13.4, order by asc gives underscores higher than other characters.
all the LC parameters are identical across both instances.
Am I missing anything else?

OS-level differences, perhaps?

https://wiki.postgresql.org/wiki/Locale_data_changes

regards, tom lane


--
Angular momentum makes the world go 'round.