Re: pg_dump and search_path

2019-07-09 Thread Laurenz Albe
On Mon, 2019-07-08 at 23:54 -0700, Igal @ Lucee.org wrote:
> > I have a custom search_path:
> > 
> > # show search_path;
> >search_path
> > --
> >  "staging, transient, pg_catalog"
> > (1 row)
> > 
> > I ran `pg_dump --schema-only` and the only reference in the output to 
> > search_path is:
> > 
> >   SELECT pg_catalog.set_config('search_path', '', false);
> > 
> > Then one of my functions which does not reference the full name of a 
> > table with its schema fails with "relation [rel-name] does not exist".
> > 
> > Is that a bug?  I have seen some old posts about this issue but am not 
> > sure if there is a ticket or why it still is an issue.
> > 
> Looks like this might be by design.  I will follow the links at 
> https://www.postgresql.org/message-id/flat/MWHPR14MB160079A6D9DC64A2F60E9004C0D00%40MWHPR14MB1600.namprd14.prod.outlook.com
>  
> and ask more questions if I have them.
> 
> I might need to add the schema name to the table in my function.

Right.

Allowing object lookup along the search_path during pg_restore opens
the door to abuse, because it can make a superuser inadvertently execute
code crafted by an attacker.
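
For illustration, a rough sketch of the usual workarounds (the staging.my_table
and staging.my_func names are made up, not taken from your schema): either
schema-qualify the relation inside the function, or pin a search_path on the
function itself.

-- Option 1: schema-qualify the relation in the function body
CREATE OR REPLACE FUNCTION staging.my_func() RETURNS integer
LANGUAGE plpgsql AS $$
BEGIN
    PERFORM * FROM staging.my_table;  -- explicit schema, no search_path needed
    RETURN 1;
END;
$$;

-- Option 2: attach a fixed search_path to the function definition
ALTER FUNCTION staging.my_func()
    SET search_path = staging, transient, pg_catalog;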

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com





Re: Error: rows returned by function are not all of the same row type

2019-07-09 Thread Andrey Sychev


> Please do not top post on the Postgres lists.

Thank you for the advice. I will keep this rule in mind in the future.

> See contrib/tablefunc/tablefunc.c crosstab_hash() as an example.

Thank you. I hope this will be a good starting point for me.

-- 
Best regards,

Andrey Sychev

 mailto:andrey.syc...@cifrasoft.com





Re: Active connections are terminated because of small wal_sender_timeout

2019-07-09 Thread AYahorau
Hello Everyone!

> I do not think anybody thinks this is a bug.  Setting wal_sender_timeout
> too small is a configuration mistake.

I don't understand why it is a mistake. 1 second is an acceptable value for
wal_sender_timeout.
Moreover, the behaviour contradicts the official description of
wal_sender_timeout: "Terminate replication connections that are inactive
longer than the specified number of milliseconds."

First of all, the connection between the master and the standby was good in
my example. The problem was in keepalive message processing (because of a
busy standby server).
So two explanations are possible:
1) Inappropriate keepalive message processing, which contradicts the
wal_sender_timeout description.
2) An incorrect description of the wal_sender_timeout parameter in the
documentation.

> Yeah. I don't see any bug here. Please note that it can also be a problem
> to set too high a value in some configuration setups. The lack of
> flexibility in this area is why wal_sender_timeout has been switched to be
> user-settable in v12. In short you can configure it in the connection
> string to enforce a custom value per standby.

Thanks for this announcement. This enhancement looks very useful.

Best regards, 
Andrei Yahorau

PGSQL 11.4: shared_buffers and /dev/shm size

2019-07-09 Thread Konstantin Malanchev
Hello,

I'm running PostgreSQL 11.4 on Linux 4.12.14 and I see the following issue
while executing a single query:

ERROR:  could not resize shared memory segment "/PostgreSQL.1596105766" to
536870912 bytes: No space left on device

In my postgresql.conf I set shared_buffers=256MB, and I see that it is applied:
SHOW shared_buffers;
 shared_buffers
----------------
 256MB

At the same time, during the query execution I see a lot of files in /dev/shm
with a total size of more than 256MB:

ls -lh /dev/shm
total 1.4G
-rw--- 1 postgres postgres 193K Jul  8 08:39 PostgreSQL.1026343462
-rw--- 1 postgres postgres 4.0M Jul  8 10:23 PostgreSQL.1066463809
-rw--- 1 postgres postgres 8.0M Jul  8 10:23 PostgreSQL.1154587693
-rw--- 1 postgres postgres 4.0M Jul  8 08:47 PostgreSQL.1292794952
-rw--- 1 postgres postgres 128M Jul  8 10:23 PostgreSQL.130026740
-rw--- 1 postgres postgres 8.0M Jul  8 10:23 PostgreSQL.1377271816
-rw--- 1 postgres postgres  16M Jul  8 10:23 PostgreSQL.1453162423
-rw--- 1 postgres postgres 128M Jul  8 10:23 PostgreSQL.1496397787
-rw--- 1 postgres postgres 8.0M Jul  8 10:23 PostgreSQL.1541518903
-rw--- 1 postgres postgres  64M Jul  8 10:23 PostgreSQL.1554139410
-rw--- 1 postgres postgres 2.0M Jul  8 10:23 PostgreSQL.1563273542
-rw--- 1 postgres postgres 256M Jul  8 10:23 PostgreSQL.1604524364
-rw--- 1 postgres postgres  64M Jul  8 10:23 PostgreSQL.1624127960
-rw--- 1 postgres postgres 4.0M Jul  8 10:23 PostgreSQL.1674892421
-rw--- 1 postgres postgres 128M Jul  8 10:32 PostgreSQL.179085785
-rw--- 1 postgres postgres  32M Jul  8 10:23 PostgreSQL.1921401343
-rw--- 1 postgres postgres  32M Jul  8 10:23 PostgreSQL.1931571650
-rw--- 1 postgres postgres 2.0M Jul  8 10:23 PostgreSQL.2002232858
-rw--- 1 postgres postgres 2.0M Jul  8 10:23 PostgreSQL.2057901523
-rw--- 1 postgres postgres 2.0M Jul  8 10:23 PostgreSQL.2110310896
-rw--- 1 postgres postgres  65M Jul  8 08:39 PostgreSQL.2136390898
-rw--- 1 postgres postgres 182K Jul  8 08:39 PostgreSQL.261370809
-rw--- 1 postgres postgres  16M Jul  8 10:23 PostgreSQL.397419407
-rw--- 1 postgres postgres 4.0M Jul  8 10:23 PostgreSQL.431734656
-rw--- 1 postgres postgres 128M Jul  8 10:23 PostgreSQL.478359814
-rw--- 1 postgres postgres 8.0M Jul  8 10:23 PostgreSQL.489042863
-rw--- 1 postgres postgres  64M Jul  8 10:23 PostgreSQL.590987495
-rw--- 1 postgres postgres 3.4M Jul  8 08:39 PostgreSQL.62466476
-rw--- 1 postgres postgres  16M Jul  8 10:23 PostgreSQL.629400316
-rw--- 1 postgres postgres 6.8K Jul  8 08:39 PostgreSQL.741718396
-rw--- 1 postgres postgres  16M Jul  8 10:23 PostgreSQL.770970033
-rw--- 1 postgres postgres 1.0M Jul  8 10:23 PostgreSQL.776045115
-rw--- 1 postgres postgres  64M Jul  8 10:23 PostgreSQL.795742467
-rw--- 1 postgres postgres  32M Jul  8 10:23 PostgreSQL.835134775
-rw--- 1 postgres postgres  32M Jul  8 10:23 PostgreSQL.953710812

How can I configure a limit for the total shared memory size?


Best regards,
Konstantin



Re: PGSQL 11.4: shared_buffers and /dev/shm size

2019-07-09 Thread Jean Louis
* Konstantin Malanchev  [2019-07-09 11:51]:
> Hello,
> 
> I'm running PostgreSQL 11.4 on Linux 4.12.14 and I see the following issue 
> while executing single one query:
> ERROR:  could not resize shared
> memory segment "/PostgreSQL.1596105766" to 536870912 bytes: No space left on 
> device
> 
> In my postgresql.conf I set sharred_buffers=256MB, I see that it is applied:
> SHOW shared_buffers;
>  shared_buffers
> 
>  256MB
> 
> At the same time during the query execution, I see a lot of files in /dev/shm 
> with the total size more than 256MB
> 
> ls -lh /dev/shm
> 
> How can I configure limit for total shared memory size?

The limit is mostly set by the available memory, as /dev/shm
is like virtual memory or a RAM disk.

Increase the RAM.

Jean




Re: PGSQL 11.4: shared_buffers and /dev/shm size

2019-07-09 Thread Konstantin Malanchev
Hello Jean,

I have 8 GB of RAM and the /dev/shm size is 4GB, and there is no significant
memory usage by other system processes. I am surprised that Postgres uses more
space in /dev/shm than the shared_buffers parameter allows; probably I don't
understand what this parameter means.

I have no opportunity to enlarge the total RAM, and probably this query
requires too much RAM to execute. Should Postgres just use the HDD as
temporary storage in this case?


Konstantin

> On 9 Jul 2019, at 12:53, Jean Louis  wrote:
> 
> * Konstantin Malanchev  [2019-07-09 11:51]:
>> Hello,
>> 
>> I'm running PostgreSQL 11.4 on Linux 4.12.14 and I see the following issue 
>> while executing single one query:
>> ERROR:  could not resize shared
>> memory segment "/PostgreSQL.1596105766" to 536870912 bytes: No space left on 
>> device
>> 
>> In my postgresql.conf I set sharred_buffers=256MB, I see that it is applied:
>> SHOW shared_buffers;
>> shared_buffers
>> 
>> 256MB
>> 
>> At the same time during the query execution, I see a lot of files in 
>> /dev/shm with the total size more than 256MB
>> 
>> ls -lh /dev/shm
>> 
>> How can I configure limit for total shared memory size?
> 
> The limit is mostly set by the memory, as /dev/shm
> is like virtual memory or RAM disk.
> 
> Increase the RAM.
> 
> Jean



Re: PGSQL 11.4: shared_buffers and /dev/shm size

2019-07-09 Thread Jean Louis
* Konstantin Malanchev  [2019-07-09 12:10]:
> Hello Jean,
> 
> I have 8 GB RAM and /dev/shm size is 4GB, and there is no significant memory 
> usage by other system processes. I surprised that Postgres uses more space in 
> /dev/shm than sharred_buffers parameter allows, probably I don't understand 
> what this parameter means.
> 
> I have no opportunity to enlarge total RAM and probably this query requires 
> too much RAM to execute. Should Postgres just use HDD as temporary storage in 
> this case?

That I cannot know. I know that /dev/shm could
grow as much as available free RAM.

Jean




Re: PGSQL 11.4: shared_buffers and /dev/shm size

2019-07-09 Thread Thomas Munro
On Tue, Jul 9, 2019 at 10:15 PM Jean Louis  wrote:
> * Konstantin Malanchev  [2019-07-09 12:10]:
> > I have 8 GB RAM and /dev/shm size is 4GB, and there is no significant 
> > memory usage by other system processes. I surprised that Postgres uses more 
> > space in /dev/shm than sharred_buffers parameter allows, probably I don't 
> > understand what this parameter means.
> >
> > I have no opportunity to enlarge total RAM and probably this query requires 
> > too much RAM to execute. Should Postgres just use HDD as temporary storage 
> > in this case?
>
> That I cannot know. I know that /dev/shm could
> grow as much as available free RAM.

Hi,

PostgreSQL creates segments in /dev/shm for parallel queries (via
shm_open()), not for shared buffers.  The amount used is controlled by
work_mem.  Queries can use up to work_mem for each node you see in the
EXPLAIN plan, and for each process, so it can be quite a lot if you
have lots of parallel worker processes and/or lots of
tables/partitions being sorted or hashed in your query.
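
As a rough illustration (nothing here is taken from the reported plan; the
relation name is a placeholder), the upper bound is roughly work_mem times the
number of participating processes for each hash or sort node:

-- the two settings that bound per-operation parallel memory
SHOW work_mem;
SHOW max_parallel_workers_per_gather;

-- each Parallel Hash (or sort) node may use up to work_mem in every
-- participating process (leader + workers), and each such node in the
-- plan gets its own budget, so the total can be several times work_mem
EXPLAIN SELECT count(*) FROM some_big_table;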

-- 
Thomas Munro
https://enterprisedb.com




Re: PGSQL 11.4: shared_buffers and /dev/shm size

2019-07-09 Thread Konstantin Malanchev
Hello Thomas,

Thank you for the explanation. work_mem = 512MB and
max_parallel_workers_per_gather = 2, and I run only one Postgres instance and
only one query. EXPLAIN shows "Workers Planned: 2" for this query. Why can it
use more than 1GB of /dev/shm?


Konstantin

> On 9 Jul 2019, at 13:51, Thomas Munro  wrote:
> 
> On Tue, Jul 9, 2019 at 10:15 PM Jean Louis  wrote:
>> * Konstantin Malanchev  [2019-07-09 12:10]:
>>> I have 8 GB RAM and /dev/shm size is 4GB, and there is no significant 
>>> memory usage by other system processes. I surprised that Postgres uses more 
>>> space in /dev/shm than sharred_buffers parameter allows, probably I don't 
>>> understand what this parameter means.
>>> 
>>> I have no opportunity to enlarge total RAM and probably this query requires 
>>> too much RAM to execute. Should Postgres just use HDD as temporary storage 
>>> in this case?
>> 
>> That I cannot know. I know that /dev/shm could
>> grow as much as available free RAM.
> 
> Hi,
> 
> PostgreSQL creates segments in /dev/shm for parallel queries (via
> shm_open()), not for shared buffers.  The amount used is controlled by
> work_mem.  Queries can use up to work_mem for each node you see in the
> EXPLAIN plan, and for each process, so it can be quite a lot if you
> have lots of parallel worker processes and/or lots of
> tables/partitions being sorted or hashed in your query.
> 
> -- 
> Thomas Munro
> https://enterprisedb.com





Re: PGSQL 11.4: shared_buffers and /dev/shm size

2019-07-09 Thread Thomas Munro
On Tue, Jul 9, 2019 at 11:11 PM Konstantin Malanchev  wrote:
> Thank you for explanation. work_mem = 512MB and 
> max_parallel_workers_per_gather = 2 and I run only one Postgres instance and 
> only one query. EXPLAIN shows "Workers Planned: 2" for this query. Why it can 
> use more than 1GB of /dev/shm?

For example, if you have one Parallel Hash Join in your plan, it could
allocate up to 512MB * 3 of shared memory (3 = leader process + 2
workers).  It sounds like you'll need to set work_mem smaller.  If you
run EXPLAIN ANALYZE you'll see how much memory is used by individual
operations.  Usually it's regular private anonymous memory, but for
Parallel Hash it's /dev/shm memory.
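
A minimal sketch of that workflow (the 64MB value and the query are just
placeholders, not recommendations):

SET work_mem = '64MB';              -- per-session, no restart needed
EXPLAIN (ANALYZE, BUFFERS)
SELECT ... ;                        -- the failing query goes here
-- on the Parallel Hash node, "Batches" greater than 1 generally means the
-- hash table was split up and partly spilled to temporary files instead of
-- being kept entirely in shared memory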

-- 
Thomas Munro
https://enterprisedb.com




Re: PGSQL 11.4: shared_buffers and /dev/shm size

2019-07-09 Thread Konstantin Malanchev
Thank you!

> For example, if you have one Parallel Hash Join in your plan, it could
> allocate up to 512MB * 3 of shared memory (3 = leader process + 2
> workers).

I'm executing the query with a smaller work_mem; it will take some time. But I
am still confused why it used all of /dev/shm (4GB) and failed with the "no
space left" error while work_mem = 512MB.

> If you
> run EXPLAIN ANALYZE you'll see how much memory is used by individual
> operations.

I cannot run EXPLAIN ANALYSE, because the query fails. This is the EXPLAIN
output for the query:

EXPLAIN
CREATE MATERIALIZED VIEW IF NOT EXISTS new_mat_view
AS
SELECT * FROM my_view
INNER JOIN another_mat_view USING (oid)
ORDER BY oid, field_name;

 Gather Merge  (cost=5696039356565.87..10040767101103.24 rows=37237923518438 width=31)
   Workers Planned: 2
   ->  Sort  (cost=5696039355565.85..5742586759963.90 rows=18618961759219 width=31)
         Sort Key: my_table.oid, my_table.field_name
         ->  Parallel Hash Join  (cost=11030236131.39..255829470118.27 rows=18618961759219 width=31)
               Hash Cond: (another_mat_view.oid = my_table.oid)
               ->  Parallel Seq Scan on another_mat_view  (cost=0.00..652514.56 rows=31645556 width=8)
               ->  Parallel Hash  (cost=636676233.38..636676233.38 rows=20353804801 width=31)
                     ->  Parallel Seq Scan on my_table  (cost=0.00..636676233.38 rows=20353804801 width=31)
                           Filter: (flag = ''::bit(16))

Konstantin



SuperUser permission denied when creating table.

2019-07-09 Thread Dave Hughes
I recently noticed that my default superuser "postgres" cannot create
tables in new schemas I create, but can in some existing ones.

So as the postgres user I ran:
afleads=# create schema mytest;
CREATE SCHEMA
afleads=# create table mytest.test (col1 int);
ERROR:  permission denied for schema mytest
LINE 1: create table mytest.test (col1 int);

So I tried to give usage & connect permission to another superuser I have,
but I get the same error:
afleads=# GRANT USAGE, CREATE ON SCHEMA mytest TO dhughes;
GRANT
afleads=# set role dhughes;
SET
afleads=# create table mytest.test (col1 int);
ERROR:  permission denied for schema mytest
LINE 1: create table mytest.test (col1 int);

So then I gave the same usage & create permission to a non-superuser.  It's
a group role we have:
afleads=# GRANT USAGE, CREATE ON SCHEMA mytest TO creator_role;
GRANT
afleads=# set role creator_role;
SET
afleads=> create table mytest.test (col1 int);
CREATE TABLE

And then it works.  This has me baffled as to why the two superusers I have
do not have the permissions to create a table, when non-superusers can
(once granted permissions).

However I still seem to be able to create tables in older schemas I have:
afleads=# select current_role;
 current_role
--
 postgres
(1 row)

afleads=# create table timeline.test (col1 int);
CREATE TABLE

Has anyone run across something like this before?


Re: PGSQL 11.4: shared_buffers and /dev/shm size

2019-07-09 Thread Thomas Munro
On Wed, Jul 10, 2019 at 12:27 AM Konstantin Malanchev  wrote:
> I'm executing the query with smaller work_mem, it will take some time. But I 
> still confused why it used all /dev/shm (4GB) and fails with "no space left" 
> error while work_mem = 512MB.

I think it could fail that way for two reasons: the /dev/shm size limit
(mount option, which I think you are saying you have set to 4GB?), or
your system ran out of RAM + swap.  The directory listing in your first
message only shows 1.4GB, not 4GB, so perhaps it's the second problem.

>  ->  Parallel Hash Join  (cost=11030236131.39..255829470118.27 
> rows=18618961759219 width=31)
>Hash Cond: (another_mat_view.oid = my_table.oid)
>->  Parallel Seq Scan on another_mat_view  
> (cost=0.00..652514.56 rows=31645556 width=8)
>->  Parallel Hash  (cost=636676233.38..636676233.38 
> rows=20353804801 width=31)
>  ->  Parallel Seq Scan on my_table  
> (cost=0.00..636676233.38 rows=20353804801 width=31)
>Filter: (flag = ''::bit(16))

It's strange that it's hashing the ~20B row table instead of the ~30M row table.

-- 
Thomas Munro
https://enterprisedb.com




Logical Replication for Very Large Databases

2019-07-09 Thread Taylor Sarrafian
Hi friends,

We have a relatively large database (~5TB) receiving very frequent 
INSERTs/UPDATEs, and we’d like to create a logical replica of it. To put things 
in perspective, we often generate up to 10 WAL-segments (16MB each) per second 
(wal_level=replica).

We successfully set up logical replication in our smaller staging DB, but the 
scale of production is making things difficult.

---

Here are some approaches we considered:

1. `CREATE SUBSCRIPTION … WITH ( copy_data=true )` 

This would be ideal, except we’re afraid of accumulating too much WAL on the 
publishing server while waiting for ~5TB to copy over. We could theoretically 
increase the size of the drive to something like 10TB, but that’s not currently 
an option for us.

There’s also the issue of performance. It’s unclear what impact 
`copy_data=true` would have on our DB read/write throughput, and we’re hesitant 
to try it.

2. Start from a base-backup and catch-up with `restore_command`:

`SELECT pg_replication_slot_advance( pg_current_wal_lsn() )` could work nicely
with a matching `recovery_target_lsn` in `recovery.conf` (a rough sketch of this
call appears after this list).

We already use `archive_command` with `wal-g` to store WAL and base-backups to 
GCS. Unfortunately, by the time our base-backup is downloaded onto the replica, 
it’s already at least 6 hours old. And restoring with `restore_command` runs at 
a similar speed to `archive_command`, so it remains ~5-8 hours behind as long 
as it’s in standby.

Our postgres process isn’t limited by CPU, memory, network, or disk-speed on 
the replica. It doesn’t seem to be working at its full capacity, which suggests 
a configuration issue.

Please look at our configurations below and let us know if there are any 
options we could tweak in `postgresql.conf` to speed up the WAL ingestion for 
`restore_command`.

3. Start from a base-backup and catch-up with the `pg_wal` directory:

We could write a script to grab WAL-segments from GCS and place them into the 
`pg_wal` directory. This is what `restore_command` is technically doing, but we 
might be able to squeeze some efficiency out of a custom script. Maybe.
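
Referring back to approach 2, a rough sketch of the calls involved (assuming
PostgreSQL 11 or later; the slot name and the LSN literal are placeholders, and
note that pg_replication_slot_advance takes the slot name as its first
argument):

-- create the logical slot before taking the base backup
SELECT pg_create_logical_replication_slot('our_sub_slot', 'pgoutput');

-- after the base backup has been restored and replayed up to the chosen
-- recovery_target_lsn, advance the slot to that same position
SELECT pg_replication_slot_advance('our_sub_slot', '1A2B/3C4D5E6F'::pg_lsn);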

—

Our server has 32 cores and 120GB of memory with the following selected configs:
```
shared_buffers = 30GB
huge_pages = try
maintenance_work_mem = 1GB
max_files_per_process = 2000
effective_io_concurrency = 100
max_worker_processes = 32
max_parallel_maintenance_workers = 2
max_parallel_workers_per_gather = 0
parallel_leader_participation = on
max_parallel_workers = 32
wal_level = archive
wal_buffers = 16MB
checkpoint_timeout = 30min
max_wal_size = 100GB
min_wal_size = 1GB
checkpoint_completion_target = 0.9
max_wal_senders = 20
wal_keep_segments = 10
effective_cache_size = 90GB
```

Thanks for all the help!

ts

Re: SuperUser permission denied when creating table.

2019-07-09 Thread Achilleas Mantzios

On 9/7/19 3:31 μ.μ., Dave Hughes wrote:


Has anyone ran across something like this before?


Can you show the output of pg_dumpall -x --schema-only?

--
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt





Re: pg_dump and search_path

2019-07-09 Thread Ron

On 7/9/19 2:22 AM, Laurenz Albe wrote:

On Mon, 2019-07-08 at 23:54 -0700, Igal @ Lucee.org wrote:

I have a custom search_path:

# show search_path;
search_path
--
  "staging, transient, pg_catalog"
(1 row)

I ran `pg_dump --schema-only` and the only reference in the output to
search_path is:

   SELECT pg_catalog.set_config('search_path', '', false);

Then one of my functions which does not reference the full name of a
table with its schema fails with "relation [rel-name] does not exist".

Is that a bug?  I have seen some old posts about this issue but am not
sure if there is a ticket or why it still is an issue.


Looks like this might be by design.  I will follow the links at
https://www.postgresql.org/message-id/flat/MWHPR14MB160079A6D9DC64A2F60E9004C0D00%40MWHPR14MB1600.namprd14.prod.outlook.com
and ask more questions if I have them.

I might need to add the schema name to the table in my function.

Right.

Allowing object lookup along the search_path during pg_restore opens
doors to abuse, because it can make a superuser inadvertedly execute
code crafted by an attacker.


Then should search_path be set at the end of pg_restore?

--
Angular momentum makes the world go 'round.




Re: PGSQL 11.4: shared_buffers and /dev/shm size

2019-07-09 Thread Konstantin Malanchev
> I think it could fail that way for two reasons: /dev/shm size limit
> (mount option, which I think you are saying you have set to 4GB?), or
> your system ran out of RAM +swap.

df /dev/shm
Filesystem 1K-blocks   Used Available Use% Mounted on
shm  4194304 351176   3843128   9% /dev/shm

mount | grep /dev/shm
shm on /dev/shm type tmpfs (rw,nosuid,nodev,noexec,relatime,size=4194304k)

I have no swap and 8GB of RAM; when there are no active queries, only ~800MB
of RAM is used. So I don't believe that it is an out-of-memory problem.


>  The directly listing in your first
> message only shows 1.4GB, not 4GB, so perhaps it's the second problem.

I cannot catch the right moment with ls, but I've run a bash for-loop that
logs "df /dev/shm" every minute, and the last entry before the failure shows
that 89% of /dev/shm is used:

Filesystem 1K-blocksUsed Available Use% Mounted on
shm  4194304 3732368461936  89% /dev/shm

There are no other processes that use /dev/shm.


> It's strange that it's hashing the ~20B row table instead of the ~30M row 
> table.

It could be a question for another mail thread =)


Konstantin





Re: pg_dump and search_path

2019-07-09 Thread Adrian Klaver

On 7/8/19 11:48 PM, Igal @ Lucee.org wrote:

I have a custom search_path:

# show search_path;
    search_path
--
  "staging, transient, pg_catalog"
(1 row)

I ran `pg_dump --schema-only` and the only reference in the output to 
search_path is:


   SELECT pg_catalog.set_config('search_path', '', false);

Then one of my functions which does not reference the full name of a 
table with its schema fails with "relation [rel-name] does not exist".


Where is this failing?

Do you have the search_path set in the config for the server you are 
dumping to?




Is that a bug?  I have seen some old posts about this issue but am not 
sure if there is a ticket or why it still is an issue.


Thanks,

Igal Sapir
Lucee Core Developer
Lucee.org 




--
Adrian Klaver
adrian.kla...@aklaver.com




Re: SuperUser permission denied when creating table.

2019-07-09 Thread Achilleas Mantzios

On 9/7/19 4:50 μ.μ., Dave Hughes wrote:

Sure thing.  I attached it as a txt file.  Thanks for the help!

PostgreSQL version?


On Tue, Jul 9, 2019 at 9:29 AM Achilleas Mantzios  wrote:

On 9/7/19 3:31 μ.μ., Dave Hughes wrote:
>
> Has anyone ran across something like this before?

can you show your pg_dumpall -x --schema-only
?

-- 
Achilleas Mantzios

IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt






--
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt



Re: pg_dump and search_path

2019-07-09 Thread Adrian Klaver

On 7/9/19 6:28 AM, Ron wrote:



Then should search_path be set at the end of pg_restore?



1) That would be too late for anything happening in the restore.

2) The:

SELECT pg_catalog.set_config('search_path', '', false);

only applies to the restore session. After the restore the search_path 
will return to whatever has been configured.
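
A quick way to see that for yourself (a sketch; any database will do):

-- in one session:
SELECT pg_catalog.set_config('search_path', '', false);  -- session-local
SHOW search_path;   -- now empty

-- reconnect, then:
SHOW search_path;   -- back to the configured value (postgresql.conf,
                    -- ALTER DATABASE ... SET, etc.)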




--
Adrian Klaver
adrian.kla...@aklaver.com




Re: SuperUser permission denied when creating table.

2019-07-09 Thread Tom Lane
Dave Hughes  writes:
> I recently noticed that my default superuser "postgres" cannot create
> tables in new schemas I create, but can in some existing ones.

> So as the postgres user I ran:
> afleads=# create schema mytest;
> CREATE SCHEMA
> afleads=# create table mytest.test (col1 int);
> ERROR:  permission denied for schema mytest
> LINE 1: create table mytest.test (col1 int);

I think you're shooting yourself in the foot with those event triggers,
ie trg_create_table is switching the role to something that lacks
permissions to do the CREATE.
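
For reference, the event triggers and the functions behind them can be listed
with something like this (a sketch; the function name is the one mentioned
above):

SELECT evtname, evtevent, evtenabled, evtfoid::regproc AS trigger_function
FROM pg_event_trigger;

SELECT prosrc
FROM pg_proc
WHERE proname = 'trg_create_table';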

regards, tom lane




Re: pg_dump and search_path

2019-07-09 Thread Igal @ Lucee.org

On 7/9/2019 7:02 AM, Adrian Klaver wrote:

On 7/8/19 11:48 PM, Igal @ Lucee.org wrote:

I have a custom search_path:

# show search_path;
    search_path
--
  "staging, transient, pg_catalog"
(1 row)

I ran `pg_dump --schema-only` and the only reference in the output to 
search_path is:


   SELECT pg_catalog.set_config('search_path', '', false);

Then one of my functions which does not reference the full name of a 
table with its schema fails with "relation [rel-name] does not exist".


Where is this failing?

Do you have the search_path set in the config for the server you are 
dumping to?


It is failing during the restore operation.  I can provide more information
once I understand what exactly you mean by "Where".


search_path is not set in the config, but rather with ALTER DATABASE SET
search_path TO ..., and I have executed that prior to the RESTORE on the
target database.  Would it make a difference if I set it in the config?


Thanks,

Igal







Re: pg_dump and search_path

2019-07-09 Thread David G. Johnston
On Tue, Jul 9, 2019 at 7:41 AM Igal @ Lucee.org  wrote:

> search_path is not set int he config, but rather with ALTER DATABASE SET
> search_path TO ... but I have executed that prior to the RESTORE on the
> target database.  Would it make a difference if I set it in the config?
>

What is your restore command then?  Because if you are dropping and
recreating the same named database, the ALTER DATABASE SET command is going
to be lost with the drop, since it is associated with an OID and not just the
name.  By placing the search_path into postgresql.conf you avoid that issue
altogether.

But, yes, objects saved to the database should usually have schema
qualifications (which gets a bit messy with custom operators).  search_path
reliance should probably be reserved for interactive use or, at worst,
client-supplied queries.

David J.


Re: pg_dump and search_path

2019-07-09 Thread Igal @ Lucee.org

David,

On 7/9/2019 7:49 AM, David G. Johnston wrote:

On Tue, Jul 9, 2019 at 7:41 AM Igal @ Lucee.org > wrote:


search_path is not set int he config, but rather with ALTER
DATABASE SET
search_path TO ... but I have executed that prior to the RESTORE
on the
target database.  Would it make a difference if I set it in the
config?


What is your restore command then?  Because if you are dropping and 
recreating the same named database the ALTER DATABASE SET command is 
going to be lost with the drop since it is associated to an OID and 
not just the name.  By placing the search_path into postgres.conf you 
avoid that issue altogether.


The restore command is:

pg_restore.exe --verbose --single-transaction -h <host> -p <port> -d <dbname> -U postgres <dump-file>


But how will I avoid the issue if the command  `SELECT 
pg_catalog.set_config('search_path', '', false);` is part of the pgdump 
file?  Wouldn't that override the config file setting during the restore 
process?


But, yes, objects saved to the database should usually have schema 
qualifications (which gets a bit messy with custom operators).  
search_path reliance should probably be reserved to interactive use or 
at worse client supplied queries.


In my case I use a separate Postgres cluster for each database, and the
roles, absent any successful hacking, are all limited to trusted users, so
the risk mentioned in the CVE is non-existent, and it would be great if there
were an option to turn off that "feature".


Thanks,

Igal



Re: pg_dump and search_path

2019-07-09 Thread David G. Johnston
On Tue, Jul 9, 2019 at 8:48 AM Igal @ Lucee.org  wrote:

> David,
>
> On 7/9/2019 7:49 AM, David G. Johnston wrote:
>
> On Tue, Jul 9, 2019 at 7:41 AM Igal @ Lucee.org  wrote:
>
>> search_path is not set int he config, but rather with ALTER DATABASE SET
>> search_path TO ... but I have executed that prior to the RESTORE on the
>> target database.  Would it make a difference if I set it in the config?
>>
>
> What is your restore command then?  Because if you are dropping and
> recreating the same named database the ALTER DATABASE SET command is going
> to be lost with the drop since it is associated to an OID and not just the
> name.  By placing the search_path into postgres.conf you avoid that issue
> altogether.
>
> The restore command is:
>
> pg_restore.exe --verbose --single-transaction -h  -p  -d
>  -U postgres 
>
> But how will I avoid the issue if the command  `SELECT
> pg_catalog.set_config('search_path', '', false);` is part of the pgdump
> file?  Wouldn't that override the config file setting during the restore
> process?
>
Yes, you are correct.  I should have waited to respond to that point until
I was more woke.

> But, yes, objects saved to the database should usually have schema
> qualifications (which gets a bit messy with custom operators).  search_path
> reliance should probably be reserved to interactive use or at worse client
> supplied queries.
>
> In my case I use a separate Postgres cluster for each database and the
> roles, absent of any successful hacking, are all limited to trusted users,
> so the risk mentioned in the CVE is non-existent and it would be great if
> there was an option to turn off that "feature".
>
This has been discussed a number of times shortly after the fix was
released.  I'm generally of the same mind, but no one with both the mind and
the capability has come forth and proposed a patch.  IIRC there wasn't a
conclusive "we don't want this", so an implementation to discuss would at
least not be a foregone waste of time.

David J.


Re: REINDEX : new parameter to preserve current average leaf density as new implicit FILLFACTOR

2019-07-09 Thread John Lumby

> From: Peter Geoghegan 
> Sent: July 8, 2019 1:39 PM
> Subject: Re: REINDEX : new parameter to preserve current average leaf density 
> as new implicit FILLFACTOR
> 
> On Mon, Jul 8, 2019 at 9:23 AM John Lumby  wrote:
> 
> > Although a welcome improvement,  I think it is not enough to justify 
> > stopping use of setting
> > a lower explicit FILLFACTOR.   Which then brings me back to  thinking there 
> > is a case
> > for the subject of this thread,  an automatic way to preserve density.
> 
> I don't think that such an option would make much sense. The "waves of
> misery" paper is about smoothing out the frequency of page splits
> following bulk loading and a CREATE INDEX. It is not about making
> splits occur less often. It's well understood that a certain amount of
> free space is the overhead of B-Tree indexes, albeit an overhead that
> can be avoided in certain specific instances.
> 
Yes, I see that. But surely "making splits occur less often" is a desirable
objective in itself, is it not? And I believe that a parameter to preserve
the "steady-state" density in high-traffic indexes would help achieve that
goal, wouldn't you agree?

Cheers,  John Lumby





Re: pg_dump and search_path

2019-07-09 Thread Adrian Klaver

On 7/9/19 7:41 AM, Igal @ Lucee.org wrote:

On 7/9/2019 7:02 AM, Adrian Klaver wrote:

On 7/8/19 11:48 PM, Igal @ Lucee.org wrote:

I have a custom search_path:

# show search_path;
    search_path
--
  "staging, transient, pg_catalog"
(1 row)

I ran `pg_dump --schema-only` and the only reference in the output to 
search_path is:


   SELECT pg_catalog.set_config('search_path', '', false);

Then one of my functions which does not reference the full name of a 
table with its schema fails with "relation [rel-name] does not exist".


Where is this failing?

Do you have the search_path set in the config for the server you are 
dumping to?


It is failing during the Restore operation.  I can provide more 
information if I'll understand what you mean exactly by "Where".


Yes, because I cannot replicate with just a function:

CREATE OR REPLACE FUNCTION public.search_path_test(integer)
 RETURNS integer
 LANGUAGE plpgsql
AS $function$
BEGIN
perform * from test_tbl;
RETURN 1;
END;
$function$

test_(postgres)# \d test_tbl
        Table "test_schema.test_tbl"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 id     | integer |           |          |

pg_dump -d test -U postgres -x -p 5412 -Fc -f dump_search_path.out
pg_restore --single-transaction -d test -c -U postgres -p 5412 
dump_search_path.out


SELECT pg_catalog.set_config('search_path', '', false);

postgres-2019-07-09 10:37:32.488 PDT-604LOG:  statement: CREATE FUNCTION 
public.search_path_test(integer) RETURNS integer

LANGUAGE plpgsql
AS $$
BEGIN
perform * from test_tbl;
RETURN 1;
END;
$$;



postgres-2019-07-09 10:37:32.489 PDT-604LOG:  statement: ALTER FUNCTION 
public.search_path_test(integer) OWNER TO aklaver;



My guess is the function is being used somewhere.



search_path is not set int he config, but rather with ALTER DATABASE SET 
search_path TO ... but I have executed that prior to the RESTORE on the 
target database.  Would it make a difference if I set it in the config?


Thanks,

Igal






--
Adrian Klaver
adrian.kla...@aklaver.com




Re: REINDEX : new parameter to preserve current average leaf density as new implicit FILLFACTOR

2019-07-09 Thread Peter Geoghegan
On Tue, Jul 9, 2019 at 10:31 AM John Lumby  wrote:
> Yes,   I see that. But surely "making splits occur less often" is a 
> desirable
> objective in itself, is it not? And I believe that a parameter to 
> preserve the "steady-state"
> density in high-traffic indexes would help achieve that goal,   wouldn't you 
> agree?

Anything that reliably reduces page splits without hurting space
utilization is well worthwhile. I can't see how what you describe
could have that effect, though. If you expect the leaf density to be
the same after a REINDEX, then why bother at all? There is no reason
to think that that will be more effective than simple vacuuming.

-- 
Peter Geoghegan




Restoring a database restores to unexpected tablespace

2019-07-09 Thread Alex Williams
Hi,

Can someone point me in the right direction for this issue we are having? Our
goal is to dump a database that is currently on a tablespace named data2 and
restore it on the same server but on the tablespace pg_default. We tried other
ways like:
ALTER DATABASE "[database_name]" SET default_tablespace = [new_tablespace];
alter table all in tablespace data2 set tablespace pg_default;

But we want to try it with a pgdump/psql.

To reproduce on our end (Server 9.5):
1. create new database for the restore with the tablespace as pg_default
2. Dump the source database (currently on data2 tablespace) with the following 
command:
sudo -u postgres pg_dump mydatabase --no-owner --no-tablespaces | gzip  > 
mydatabase.gz
3. Restore the database with this command:
zcat /var/backup/db/mydatabase.gz | sudo -H -u postgres psql 
--quiet -e -c 'SET default_tablespace = pg_default;' -f - mydatabase_test > 
/tmp/mydatabase_test.log

What happens during the restore is that all tables are created on data2, not 
pg_default.
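
To verify where the tables ended up we run something like this (a sketch;
tables using the database default show a NULL tablespace in pg_tables):

SELECT schemaname, tablename, tablespace
FROM pg_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY tablespace NULLS FIRST, schemaname, tablename;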

Any help would be greatly appreciated.

Thanks,

Alex


execute_values

2019-07-09 Thread Weatherby,Gerard
Can I use psycopg2.extras.execute_values with a prepared statement?


--
Gerard Weatherby| Application Architect
NMRbox | Department of Molecular Biology and Biophysics | UConn Health
263 Farmington Avenue, Farmington, CT 06030-6406
Phone: 860 679 8484
uchc.edu


Re: REINDEX : new parameter to preserve current average leaf density as new implicit FILLFACTOR

2019-07-09 Thread John Lumby

> From: Peter Geoghegan 
> Sent: July 9, 2019 1:47 PM
> Subject: Re: REINDEX : new parameter to preserve current average leaf density 
> as new implicit FILLFACTOR
>  
> On Tue, Jul 9, 2019 at 10:31 AM John Lumby  wrote:
> > Yes,   I see that. But surely "making splits occur less often" is a 
> > desirable
> > objective in itself, is it not? And I believe that a parameter to 
> > preserve the "steady-state"
> > density in high-traffic indexes would help achieve that goal,   wouldn't 
> > you agree?
> 
> Anything that reliably reduces page splits without hurting space
> utilization is well worthwhile. I can't see how what you describe
> could have that effect, though. If you expect the leaf density to be
> the same after a REINDEX, then why bother at all? There is no reason
> to think that that will be more effective than simple vacuuming.
> 
Ah, I did not explain the idea well enough.
The scenario (simplified) is this:

Time 0    FILLFACTORs all set to the default 90%,
          because we do not yet know what the steady-state density
          will turn out to be.
   {   workload runs for a few weeks  }
Time N    gather table and index stats, discover growth and learn the density.
          Growth is more than autovacuum could control, so:
   {   ALTER INDEX ??? SET (fillfactor = AUTO); }
   {   REINDEX, desiring to preserve the current density, whatever that is }
   {   workload runs for a few more weeks  }
Time 2*N  gather table and index stats, discover little or no growth since
          time N. We have achieved a steady state in the total number of pages.

Would this not work?

Cheers,   John



Re: REINDEX : new parameter to preserve current average leaf density as new implicit FILLFACTOR

2019-07-09 Thread John Lumby
And the point of the REINDEX at that point (below) is to remove dead tuple
key-TIDs and reorganize those split pages back into physical order without
losing the free space.

> From: Peter Geoghegan 
> Sent: July 9, 2019 1:47 PM
> Subject: Re: REINDEX : new parameter to preserve current average leaf density 
> as new implicit FILLFACTOR
>  
> On Tue, Jul 9, 2019 at 10:31 AM John Lumby  wrote:
> > Yes,   I see that. But surely "making splits occur less often" is a 
> > desirable
> > objective in itself, is it not? And I believe that a parameter to 
> > preserve the "steady-state"
> > density in high-traffic indexes would help achieve that goal,   wouldn't 
> > you agree?
> 
> Anything that reliably reduces page splits without hurting space
> utilization is well worthwhile. I can't see how what you describe
> could have that effect, though. If you expect the leaf density to be
> the same after a REINDEX, then why bother at all? There is no reason
> to think that that will be more effective than simple vacuuming.
> 
Cheers,  John



Advice on Best BCP Plan

2019-07-09 Thread Naresh g
Dear Team,

We have a production database with two hot standbys at two different
locations.

Let's say A (production) has two standbys (B and C).

Now we are planning to do a BCP activity.

What would be the best plan for this activity?

As I have two standbys configured, would shutting down one standby site
for the duration of the activity create any problem? Or is there a better way?

1. A (standby), B (production), and C shut down.
2. A (standby), B (production), and C (standby for B).

Or any other plan?

Please let me know.

Thank you.
Naresh.


Re: REINDEX : new parameter to preserve current average leaf density as new implicit FILLFACTOR

2019-07-09 Thread Peter Geoghegan
On Tue, Jul 9, 2019 at 11:27 AM John Lumby  wrote:
> And the point of the REINDEX at that point (below) is to remove dead tuple 
> keys-tids
> and  reorganize those split pages back into physical order without losing the 
> freespace.

VACUUM already removes the tuples, accounting for all overhead.

You are right that it would be possible for us to "defragment" the
pages, so that they'd be in sequential order on disk from the point of
view of a whole index scan -- this is what the "leaf_fragmentation"
statistic from pgstatindex() reports on. We could in principle come up
with a way of moving pages around, which would have some modest
benefit for certain types of queries (it wouldn't improve the
heap/index correlation, though, which is far more important). That
would either necessitate that the command acquire a disruptive lock on
the index (i.e. no writes, just like regular REINDEX), or that we
drastically rearchitect the B-Tree code to make it support this.
Neither of which seem particularly appealing.
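
For concreteness, that statistic can be read with the pgstattuple extension
(the index name below is a placeholder):

CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT avg_leaf_density, leaf_fragmentation
FROM pgstatindex('some_btree_index');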

I believe that this is a lot more important in systems that generally
use clustered indexes, such as MS SQL Server. This kind of
"fragmentation" isn't usually much of a problem when using Postgres.

-- 
Peter Geoghegan




Re: REINDEX : new parameter to preserve current average leaf density as new implicit FILLFACTOR

2019-07-09 Thread John Lumby
> From: Peter Geoghegan 
> Sent: July 9, 2019 3:01 PM
> Subject: Re: REINDEX : new parameter to preserve current average leaf density 
> as new implicit FILLFACTOR
>
> On Tue, Jul 9, 2019 at 11:27 AM John Lumby  wrote:
> > And the point of the REINDEX at that point (below) is to remove dead tuple 
> > keys-tids
> > and  reorganize those split pages back into physical order without losing 
> > the freespace.
>
> VACUUM already removes the tuples, accounting for all overhead.
>
> We could in principle come up with a way of moving pages around,
>  [ ... ]
> That would either necessitate that the command acquire a disruptive lock
>  [ ... ]
> Neither of which seem particularly appealing.

I was not thinking of a new command, just an extension of the existing REINDEX
which would apply a fillfactor equal to the current average page density,
by adding a preliminary step to sample that first.
Of course, the user can do that for themselves by a series of steps --
ANALYZE, get the page density from pgstattuple, ALTER INDEX, REINDEX --
so this new parameter would be a convenience, assuming that this sequence
actually is beneficial, which I believe it is; see my next point.
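
Spelled out, that manual sequence looks roughly like this (the index name and
the 68 figure are made up for illustration):

-- measure the current average leaf density
SELECT avg_leaf_density FROM pgstatindex('my_index');  -- say it reports ~68

-- freeze that density in as the fillfactor and rebuild
ALTER INDEX my_index SET (fillfactor = 68);
REINDEX INDEX my_index;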

>
> I believe that this is a lot more important in systems that generally
> use clustered indexes, such as MS SQL Server. This kind of
> "fragmentation" isn't usually much of a problem when using Postgres.
>
We have found that, for an index which has both experienced a large number of
page splits and whose table has a large number of dead tuples (despite
autovacuum), REINDEX with FILLFACTOR set to the current page_density does
produce a performance improvement, and also does reduce future growth in the
number of pages.  I don't have numbers to hand, and in fact I am not sure
whether any catalog view or pgstattuple tells me about the proportion of dead
key-TIDs in the index itself (do you know of any source?) as opposed to the
table, but based on that recollection, yes, REINDEX can reduce fragmentation.

However we did not run a VACUUM command first.  Maybe if we had run VACUUM
instead of the REINDEX commands, we might have obtained the same degree of
improvement; I don't know.  I think this was Tom's point earlier on in this
thread.

Correct me if I'm wrong, but I believe whether an index is "clustered" or not
is not relevant for this discussion, because the clustering in that context
refers to the ordering of the table pages, not the index pages.  I believe it
is quite possible to have a perfectly "clustered" table whose clustering index
is itself badly disorganized.

Cheers, John



how to return rows of data via function written by language C strict

2019-07-09 Thread Christoph Bilz
Hello,

i want write functions like this:

CREATE FUNCTION foo(text) returns real as '/foo.dll', 'foo'  LANGUAGE C STRICT;
CREATE FUNCTION foo2(text) returns table(c1 text, c2 int) as '/foo2.dll',
'foo'  LANGUAGE C STRICT;
So far, so clear. I don't want to return one scalar value or SETOF something;
I want to call the function like this:
select * from foo; … and the rows will be returned.
I read the chapter
https://www.postgresql.org/docs/11/spi.html 

and especially 
https://www.postgresql.org/docs/11/xfunc-c.html#id-1.8.3.13.11 


again and again but I found no solution. So, the functions should return rows
of data, not just "one" result or composite types.
The documentation regarding usable examples is very sparse, so I found and
tried this piece of code:

ArrayType* pg_array = DatumGetArrayTypeP(_row_val);
c_array = (float *)ARR_DATA_PTR(pg_array);
pg_array_size = ARR_DIMS(pg_array)[0];
—> so it's clear how I get my data via the SPI* functionality; the result set
is within the pg_array or the cast c_array.

I found this within funcapi.h:
 /* Type categories for get_call_result_type and siblings */
typedef enum TypeFuncClass
{
TYPEFUNC_SCALAR,/* scalar result type */
TYPEFUNC_COMPOSITE, /* determinable rowtype result */
TYPEFUNC_COMPOSITE_DOMAIN,  /* domain over determinable rowtype result */
TYPEFUNC_RECORD,/* indeterminate rowtype result */
TYPEFUNC_OTHER  /* bogus type, eg pseudotype */
} TypeFuncClass;

and tried:

TupleDesc   tupdesc;
HeapTuple   tuple;
Datum   rs[100];
int tuplen;
bool        nulls;

if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_RECORD)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("function returning record called in context "
"that cannot accept type record")));

BlessTupleDesc(tupdesc);

for (int j = 0; j < 100; j++) {
rs[j] = Float8GetDatum(c_array[j]);
}

tuplen = tupdesc->natts;
nulls = palloc(tuplen * sizeof(bool));

tuple = heap_form_tuple(tupdesc, rs, nulls);

pfree(nulls);

SPI_finish();

PG_RETURN_DATUM(HeapTupleGetDatum(tuple));


But it doesn't work. Either get_call_result_type fails because the function
definition doesn't match, or the client process crashes because something
happens and I don't know how this stuff should work.
So, due to the lack of examples in general and the sparse documentation about
it, any help will be appreciated.
Thanks.





Re: execute_values

2019-07-09 Thread Adrian Klaver

On 7/9/19 11:04 AM, Weatherby,Gerard wrote:

​Can I use pyscopy2.extras.execute_values with a prepared statement?


I don't think so as you cannot pass in VALUES():

cur.execute('prepare pt(integer, varchar, boolean) AS INSERT INTO 
prepare_test values ($1, $2, $3)')


execute_values(cur, 'EXECUTE pt(%s)', [(2, 'test', 'f'),])

SyntaxError: wrong number of parameters for prepared statement "pt"
DETAIL:  Expected 3 parameters but got 1.


execute_batch works though:

execute_batch(cur, 'EXECUTE pt(%s, %s, %s)', [(2, 'test', 'f'), (3, 
'dog', 't')])






--
*Gerard Weatherby*| Application Architect
NMRbox | Department of Molecular Biology and Biophysics | UConn Health
263 Farmington Avenue, Farmington, CT 06030-6406
Phone: 860 679 8484
uchc.edu



--
Adrian Klaver
adrian.kla...@aklaver.com




Re: REINDEX : new parameter to preserve current average leaf density as new implicit FILLFACTOR

2019-07-09 Thread Peter Geoghegan
On Tue, Jul 9, 2019 at 12:29 PM John Lumby  wrote:
> I was not thinking of a new command,  just an extension of the existing 
> REINDEX
> which would apply a fillfactor equal to current average page density,
> by adding a preliminary step to sample that first.

That would be a very different thing to REINDEX no matter how you
spelt it, though. REINDEX creates a new index, from scratch, whereas
you're talking about restructuring what's already there.

> > I believe that this is a lot more important in systems that generally
> > use clustered indexes, such as MS SQL Server. This kind of
> > "fragmentation" isn't usually much of a problem when using Postgres.
> >
> We have found that, for an index which has both experienced large number of 
> page splits
> and whose table has a large number of dead tuples (despite autovacuum),
> REINDEX with FILLFACTOR set to current page_density does produce a 
> performance improvement,
> and also does reduce future growth in number of pages.I don't have 
> numbers to
> hand,  and in fact not sure if any catalog view or pgstattuple tells me about 
> the proportion
> of dead key-tids in the index itself (do you know of any source?) as opposed 
> to the table,
> but based on that recollection,  yes,   REINDEX can reduce fragmentation.

This could help the old "getting tired" behavior with many duplicates,
by making the free space available in earlier leaf pages (those
further to the left) that are full of duplicates -- the original
average space utilization may reflect a very uneven distribution of
free space overall. Or, it could be that range scan performance
benefitted from reduced fragmentation, because your workload happened
to be bottlenecked on large range scans. Though that seems unlikely.

I believe that the effect that you identified is real, but at a
minimum it's not clear why a REINDEX with a fillfactor to match the
original leaf space utilization helped. It would be fairly difficult
to figure it out for sure. If it was a problem with
duplicates/"getting tired", then I'd expect the new v12 code will help
a lot.

> However we did not run a VACUUM command first. Maybe if we had run VACUUM 
> instead of
> the REINDEX commands,   we might have obtained the same degree of 
> improvement,  I don't know.
> I think this was Tom's point earlier on in this thread.

It was. Tom's intuition about that matches my own, though I
acknowledge that the old behavior with duplicates muddies the waters.

> Correct me if I'm wrong but I believe whether an index is "clustered" or not 
> is not relevant for
> this discussion because the clustering in that context is referring to 
> ordering of the
> table pages,  not the index pages.

Right.

> I believe it is quite possible to have a perfectly
> "clustered" table whose clustering index is itself badly disorganized.

Technically the two things are separate metrics, so that is
theoretically possible, but it doesn't seem all that likely. It could
happen with lots of non-HOT updates, where all new index tuples relate
to the same logical row as some existing index tuple, causing many
page splits despite there being no real change in the logical contents
of the index. Even then, the table will itself lose much of its
original order, so the index will become "unclustered" as it becomes
fragmented.

-- 
Peter Geoghegan




Re: REINDEX : new parameter to preserve current average leaf density as new implicit FILLFACTOR

2019-07-09 Thread John Lumby
> From: Peter Geoghegan 
> Sent: July 9, 2019 5:15 PM
> Subject: Re: REINDEX : new parameter to preserve current average leaf density 
> as new implicit FILLFACTOR
>  
> On Tue, Jul 9, 2019 at 12:29 PM John Lumby  wrote:
> > I was not thinking of a new command,  just an extension of the existing 
> > REINDEX
> > which would apply a fillfactor equal to current average page density,
> > by adding a preliminary step to sample that first.
> 
> That would be a very different thing to REINDEX no matter how you
> spelt it, though. REINDEX creates a new index, from scratch, whereas
> you're talking about restructuring what's already there.

No, no, I really am talking about an extension to the *existing* REINDEX,
and yes, yes, my extended REINDEX *would* still create a new index from
scratch.
The *only* difference is that, instead of taking the FILLFACTOR it uses from
whatever is set in the index attributes, it would take it from first
calculating the current average page density, and then build the new index
with that fillfactor.

> 
> Or, it could be that range scan performance benefitted from reduced 
> fragmentation,
> 

Yes,  I think so.

Cheers,  John



number of concurrent writes that are allowed for database

2019-07-09 Thread Julie Nishimura
Hello guys,
How do I determine the number of concurrent writes that are allowed for some
databases? Is it the max_connections parameter? Is there a way to determine
how many we are currently using on average?
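
For example, would something like the following be the right way to check the
current usage (just a guess on my part)?

SHOW max_connections;
SELECT count(*) FROM pg_stat_activity WHERE state = 'active';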


Thank you,

julie



Re: pg_dump and search_path

2019-07-09 Thread Igal @ Lucee.org

On 7/9/2019 10:45 AM, Adrian Klaver wrote:

On 7/9/19 7:41 AM, Igal @ Lucee.org wrote:

On 7/9/2019 7:02 AM, Adrian Klaver wrote:

On 7/8/19 11:48 PM, Igal @ Lucee.org wrote:

I have a custom search_path:

# show search_path;
    search_path
--
  "staging, transient, pg_catalog"
(1 row)

I ran `pg_dump --schema-only` and the only reference in the output 
to search_path is:


   SELECT pg_catalog.set_config('search_path', '', false);

Then one of my functions which does not reference the full name of 
a table with its schema fails with "relation [rel-name] does not 
exist".


Where is this failing?

Do you have the search_path set in the config for the server you are 
dumping to?


It is failing during the Restore operation.  I can provide more 
information if I'll understand what you mean exactly by "Where".


Yes, because I cannot replicate with just a function:

CREATE OR REPLACE FUNCTION public.search_path_test(integer)
 RETURNS integer
 LANGUAGE plpgsql
AS $function$
BEGIN
    perform * from test_tbl;
    RETURN 1;
END;
$function$

test_(postgres)# \d test_tbl
   Table "test_schema.test_tbl"
 Column |  Type   | Collation | Nullable | Default
+-+---+--+-
 id | integer |

pg_dump -d test -U postgres -x -p 5412 -Fc -f dump_search_path.out
pg_restore --single-transaction -d test -c -U postgres -p 5412 
dump_search_path.out


SELECT pg_catalog.set_config('search_path', '', false);

postgres-2019-07-09 10:37:32.488 PDT-604LOG:  statement: CREATE 
FUNCTION public.search_path_test(integer) RETURNS integer

    LANGUAGE plpgsql
    AS $$
    BEGIN
    perform * from test_tbl;
    RETURN 1;
    END;
    $$;



postgres-2019-07-09 10:37:32.489 PDT-604LOG:  statement: ALTER 
FUNCTION public.search_path_test(integer) OWNER TO aklaver;



My guess is the function is being used somewhere.


I see.  Yes, the function is used by an INDEX.  So somewhere down the 
line in the pgdump file I have:


  CREATE INDEX ix_items_tags ON staging.items USING gin 
(staging.some_func_returning_array(col1));


Igal




Re: pg_dump and search_path

2019-07-09 Thread Adrian Klaver

On 7/9/19 4:24 PM, Igal @ Lucee.org wrote:

On 7/9/2019 10:45 AM, Adrian Klaver wrote:

On 7/9/19 7:41 AM, Igal @ Lucee.org wrote:

On 7/9/2019 7:02 AM, Adrian Klaver wrote:

On 7/8/19 11:48 PM, Igal @ Lucee.org wrote:

I have a custom search_path:

# show search_path;
    search_path
--
  "staging, transient, pg_catalog"
(1 row)

I ran `pg_dump --schema-only` and the only reference in the output 
to search_path is:


   SELECT pg_catalog.set_config('search_path', '', false);

Then one of my functions which does not reference the full name of 
a table with its schema fails with "relation [rel-name] does not 
exist".


Where is this failing?

Do you have the search_path set in the config for the server you are 
dumping to?


It is failing during the Restore operation.  I can provide more 
information if I'll understand what you mean exactly by "Where".


Yes, because I cannot replicate with just a function:

CREATE OR REPLACE FUNCTION public.search_path_test(integer)
 RETURNS integer
 LANGUAGE plpgsql
AS $function$
BEGIN
    perform * from test_tbl;
    RETURN 1;
END;
$function$

test_(postgres)# \d test_tbl
   Table "test_schema.test_tbl"
 Column |  Type   | Collation | Nullable | Default
+-+---+--+-
 id | integer |

pg_dump -d test -U postgres -x -p 5412 -Fc -f dump_search_path.out
pg_restore --single-transaction -d test -c -U postgres -p 5412 
dump_search_path.out


SELECT pg_catalog.set_config('search_path', '', false);

postgres-2019-07-09 10:37:32.488 PDT-604LOG:  statement: CREATE 
FUNCTION public.search_path_test(integer) RETURNS integer

    LANGUAGE plpgsql
    AS $$
    BEGIN
    perform * from test_tbl;
    RETURN 1;
    END;
    $$;



postgres-2019-07-09 10:37:32.489 PDT-604LOG:  statement: ALTER 
FUNCTION public.search_path_test(integer) OWNER TO aklaver;



My guess is the function is being used somewhere.


I see.  Yes, the function is used by an INDEX.  So somewhere down the 
line in the pgdump file I have:


   CREATE INDEX ix_items_tags ON staging.items USING gin 
(staging.some_func_returning_array(col1));


Well, you are part of the way there, the function is schema qualified :)

I will leave it to others more knowledgeable on the subject as to whether a
function as a GIN expression is a good idea or not.



Igal




--
Adrian Klaver
adrian.kla...@aklaver.com
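
For anyone hitting the same restore failure, here is a minimal sketch of the
two usual fixes. The object names and the argument type are placeholders
loosely modelled on this thread, not the poster's actual definitions, and a
function used in an index expression must also be marked IMMUTABLE, which is
part of why having it read from a table is questionable in the first place.

-- Fix 1: schema-qualify every relation the function touches, so an empty
-- search_path during restore no longer matters.
CREATE OR REPLACE FUNCTION public.search_path_test(integer)
 RETURNS integer
 LANGUAGE plpgsql
AS $function$
BEGIN
    PERFORM * FROM test_schema.test_tbl;   -- qualified instead of bare test_tbl
    RETURN 1;
END;
$function$;

-- Fix 2 (a possible alternative, not verified against this setup): attach a
-- search_path to the function itself so it carries its own lookup path.
ALTER FUNCTION staging.some_func_returning_array(text)
    SET search_path = staging, transient, pg_catalog;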




Re: how to return rows of data via function written by language C strict

2019-07-09 Thread Ian Barwick

On 7/10/19 5:00 AM, Christoph Bilz wrote:

Hello,

i want write functions like this:

CREATE FUNCTION foo(text) returns real as '/foo.dll', 'foo' LANGUAGE C STRICT;"
CREATE FUNCTION foo2(text) returns table(c1 text, c2 int) as '/foo2.dll', 'foo' LANGUAGE C STRICT;

So far, so clear. I don't want to return one scalar value or SETOF smth, I want 
to start the function like this:

select * from foo; … and the rows will be returned.

I read the chapter

https://www.postgresql.org/docs/11/spi.html

and especially

https://www.postgresql.org/docs/11/xfunc-c.html#id-1.8.3.13.11

again and again, but I found no solution. So the functions should return rows 
of data, not just „one“ result or composite types.
The documentation is very sparse on usable examples, so I found and 
tried this piece of code:

ArrayType *pg_array = DatumGetArrayTypeP(_row_val);
c_array = (float *) ARR_DATA_PTR(pg_array);
pg_array_size = ARR_DIMS(pg_array)[0];

—> so it's clear how I get my data via the SPI* functionality; the result set is 
in pg_array or the casted c_array.

I found this within funcapi.h:

/* Type categories for get_call_result_type and siblings */
typedef enum TypeFuncClass
{
    TYPEFUNC_SCALAR,            /* scalar result type */
    TYPEFUNC_COMPOSITE,         /* determinable rowtype result */
    TYPEFUNC_COMPOSITE_DOMAIN,  /* domain over determinable rowtype result */
    TYPEFUNC_RECORD,            /* indeterminate rowtype result */
    TYPEFUNC_OTHER              /* bogus type, eg pseudotype */
} TypeFuncClass;


and tried:

TupleDesc tupdesc;
HeapTuple tuple;
Datum rs[100];
int tuplen;
bool *nulls;

if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_RECORD)
    ereport(ERROR,
            (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
             errmsg("function returning record called in context "
                    "that cannot accept type record")));
BlessTupleDesc(tupdesc);
for (int j = 0; j < 100; j++)
{
    rs[j] = Float8GetDatum(c_array[j]);
}
tuplen = tupdesc->natts;
nulls = palloc(tuplen * sizeof(bool));
tuple = heap_form_tuple(tupdesc, rs, nulls);
pfree(nulls);
SPI_finish();
PG_RETURN_DATUM(HeapTupleGetDatum(tuple));



But it doesn't work. Either the get_call_result_type fails because the function 
definition doesn't match, or the client process crashes because smth. 
happens and I don't know how this stuff should work.
So, due to the lack of examples in general and the sparse documentation about 
it, any help will be appreciated.


A good place to find working examples is the source code of the various contrib 
modules.


Regards

Ian Barwick

--
 Ian Barwick   https://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: how to return rows of data via function written by language C strict

2019-07-09 Thread David G. Johnston
On Tue, Jul 9, 2019 at 1:00 PM Christoph Bilz 
wrote:

> Hello,
>
> i want write functions like this:
>
> CREATE FUNCTION foo(text) returns real as '/foo.dll', 'foo'  LANGUAGE C 
> STRICT;"
> CREATE FUNCTION foo2(text) returns table(c1 text, c2 int) as 
> '/foo2.dll', 'foo'  LANGUAGE C STRICT;
>
> So far, so clear. I don't want to return one scalar value or SETOF smth, I 
> want to start the function like this:
>
> select * from foo; … and the rows will be returned.
>
Maybe SQL or pl/pgSQL would be a better choice then?

Also, if you plan to return more than one row you are, by definition,
creating a SETOF (TABLE is just shorthand) function.

Also, for what it's worth, I have no idea what "smth" means here - but I also
don't program C.

> if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_RECORD)
> ereport(ERROR,
> (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> errmsg("function returning record called in context "
> "that cannot accept type record")));
>
The inequality check above seems wrong - you want to error if you are
presented with a record, not when you aren't.

> But it doesn't work. Either the get_call_result_type fails because the
> function definition doesn't match, or the client process crashes because
> smth. happens and I don't know how this stuff should work.
> So, due to the lack of examples in general and the sparse documentation
> about it, any help will be appreciated.
>

As Ian noted, contrib is usually the recommended source for up-to-date
coding examples.

David J.
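
For what it's worth, here is a minimal, self-contained sketch of the
value-per-call set-returning pattern the contrib modules use (the SRF_*
macros from funcapi.h). The function name, the fixed row count and the
column values are illustrative, not taken from the original code. Note that a
function declared RETURNS TABLE(...) resolves to TYPEFUNC_COMPOSITE, not
TYPEFUNC_RECORD, which is why the result-type check differs from the snippet
quoted above.

#include "postgres.h"

#include "access/htup_details.h"
#include "fmgr.h"
#include "funcapi.h"
#include "utils/builtins.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(foo2);

/*
 * Illustrative only: returns three rows of (c1 text, c2 int), matching
 *   CREATE FUNCTION foo2(text) RETURNS TABLE(c1 text, c2 int)
 *     AS 'MODULE_PATHNAME', 'foo2' LANGUAGE C STRICT;
 */
Datum
foo2(PG_FUNCTION_ARGS)
{
    FuncCallContext *funcctx;

    if (SRF_IS_FIRSTCALL())
    {
        MemoryContext oldcontext;
        TupleDesc   tupdesc;

        funcctx = SRF_FIRSTCALL_INIT();
        oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);

        /* RETURNS TABLE(...) gives a determinable (composite) result type */
        if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
            ereport(ERROR,
                    (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                     errmsg("function returning record called in context "
                            "that cannot accept type record")));

        funcctx->tuple_desc = BlessTupleDesc(tupdesc);
        funcctx->max_calls = 3;     /* number of rows to return */

        MemoryContextSwitchTo(oldcontext);
    }

    funcctx = SRF_PERCALL_SETUP();

    if (funcctx->call_cntr < funcctx->max_calls)
    {
        Datum       values[2];
        bool        nulls[2] = {false, false};
        HeapTuple   tuple;

        values[0] = CStringGetTextDatum("some text");
        values[1] = Int32GetDatum((int32) funcctx->call_cntr);

        tuple = heap_form_tuple(funcctx->tuple_desc, values, nulls);
        SRF_RETURN_NEXT(funcctx, HeapTupleGetDatum(tuple));
    }

    SRF_RETURN_DONE(funcctx);
}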


Re: number of concurrent writes that are allowed for database

2019-07-09 Thread Julie Nishimura
Any advice is appreciated, thank you, guys!!!

Sent from my iPhone

On Jul 9, 2019, at 3:56 PM, Julie Nishimura <juliez...@hotmail.com> wrote:

Hello guys,
How do I determine the number of concurrent writes that are allowed for some 
databases? Is it the max_connections parameter?
Is there a way to determine how many we are currently using on average?


Thank you,

julie
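
As far as the core server is concerned there is no separate limit on
concurrent writes: max_connections (plus any per-database CONNECTION LIMIT)
caps the number of sessions, and writes are simply whatever those sessions
happen to run. A rough sketch for comparing the limit with current usage,
using only the standard statistics views:

SELECT current_setting('max_connections')::int   AS max_connections,
       count(*)                                  AS current_connections,
       count(*) FILTER (WHERE state = 'active')  AS currently_active
FROM pg_stat_activity;

For an average over time you would have to sample this periodically (for
example from cron into a logging table) or use an external monitoring tool.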



Re: Restoring a database restores to unexpected tablespace

2019-07-09 Thread Ian Barwick

On 7/10/19 2:56 AM, Alex Williams wrote:

Hi,

Can someone point me in the right direction for this issue we are having -- our 
goal is to dump a database that is currently on a tablespace named data2 that 
we want to restore on the same server but on tablespace pg_default -- we tried 
other ways like:
ALTER DATABASE "[database_name]" SET default_tablespace = [new_tablespace];
alter table all in tablespace data2 set tablespace pg_default;

But we want to try it with a pgdump/psql.

To reproduce on our end (Server 9.5):

1. create new database for the restore with the tablespace as pg_default

2. Dump the source database (currently on data2 tablespace) with the following 
command:
sudo -u postgres pg_dump mydatabase --no-owner --no-tablespaces | gzip  > 
mydatabase.gz

3. Restore the database with this command:
zcat /var/backup/db/mydatabase.gz | sudo -H -u postgres psql 
--quiet -e -c 'SET default_tablespace = pg_default;' -f - mydatabase_test > 
/tmp/mydatabase_test.log

What happens during the restore is that all tables are created on data2, not 
pg_default.

Any help would be greatly appreciated.


This should work.

Double-check each step to make sure nothing has been missed out somewhere, e.g.
in step 2 you create mydatabase.gz in the current working directory but in step 
3 restore it
from an absolute filepath, which is a common cause of errors.

Also maybe try dumping an individual table definition (pg_dump --schema-only 
--table=sometablename ...) and check
exactly what's being dumped and how it gets restored.

Regards


Ian Barwick

--
 Ian Barwick   https://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
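
One way to check exactly where the restored tables landed is to query the
catalogs in the target database; reltablespace = 0 means the table lives in
the database's default tablespace. A sketch, to be run in mydatabase_test:

-- Which tablespace each ordinary table is in:
SELECT c.relname,
       COALESCE(t.spcname, '(database default)') AS tablespace
FROM pg_class c
     LEFT JOIN pg_tablespace t ON t.oid = c.reltablespace
WHERE c.relkind = 'r'
ORDER BY c.relname;

-- And what the database's default tablespace actually is:
SELECT d.datname, t.spcname AS default_tablespace
FROM pg_database d
     JOIN pg_tablespace t ON t.oid = d.dattablespace
WHERE d.datname = current_database();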




Re: REINDEX : new parameter to preserve current average leaf density as new implicit FILLFACTOR

2019-07-09 Thread Peter Geoghegan
On Tue, Jul 9, 2019 at 3:18 PM John Lumby  wrote:
> > Or, it could be that range scan performance benefitted from reduced 
> > fragmentation,
> >
>
> Yes,  I think so.

ISTM that the simplest explanation here is that index fragmentation
(and even index size) is a red herring, and the real issue is that
you're suffering from problems similar to those that are described in
these old threads:

https://www.postgresql.org/message-id/flat/20160524173914.GA11880%40telsasoft.com
https://www.postgresql.org/message-id/flat/520D6610.8040907%40emulex.com

There have been numerous reports from users with problems involving
low cardinality indexes that gradually became less correlated with the
underlying table over time. At least a couple of these users found
that a periodic REINDEX temporarily fixed the problem -- see the first
thread for an example. Postgres 12 maintains the heap/table sort order
among duplicates by treating heap TID as a tiebreaker column, which
may make REINDEXing totally unnecessary for you. It's harder to model
this issue because the problem with heap TID order will only be seen
when there is at least a moderate amount of churn.

-- 
Peter Geoghegan
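
A quick way to put numbers on both explanations, leaf density and
fragmentation on one side and heap correlation on the other, is the
pgstattuple contrib module together with pg_stats; the object names below are
placeholders:

CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- Current average leaf density / fragmentation of the index:
SELECT avg_leaf_density, leaf_fragmentation
FROM pgstatindex('some_schema.some_low_cardinality_idx');

-- Rough proxy for how well the indexed column still tracks heap order:
SELECT tablename, attname, correlation
FROM pg_stats
WHERE schemaname = 'some_schema'
  AND tablename = 'some_table'
  AND attname = 'some_column';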




Re: REINDEX : new parameter to preserve current average leaf density as new implicit FILLFACTOR

2019-07-09 Thread Peter Geoghegan
On Tue, Jul 9, 2019 at 9:04 PM Peter Geoghegan  wrote:
> ISTM that the simplest explanation here is that index fragmentation
> (and even index size) is a red herring, and the real issue is that
> you're suffering from problems similar to those that are described in
> these old threads:
>
> https://www.postgresql.org/message-id/flat/20160524173914.GA11880%40telsasoft.com
> https://www.postgresql.org/message-id/flat/520D6610.8040907%40emulex.com

I can imagine why you found you needed to reduce fillfactor to get
much of any benefit from a REINDEX. Page splits are inherently
expensive, for one thing. Also, in this specific scenario a succession
of page splits might hasten the index returning to having little
correlation with the underlying table within each large group of
duplicates. Splits on fillfactor 90 pages would make new space
available for future insertions on earlier duplicate pages, mixing old
and new rows together before long.

-- 
Peter Geoghegan