Re: pgbench - allow to create partitioned tables

2019-10-03 Thread Ashutosh Sharma
Hi Fabien, Amit,

I noticed that when an invalid number of partitions is specified, pgbench
sometimes fails with the error "invalid number of partitions: ..." but
often it does not; instead it creates a number of partitions different from
what the user specified.

As partitions is an integer variable, the maximum value it can hold is
2147483647. But if I specify partitions as "3147483647", atoi returns a
value less than zero and pgbench terminates with an error. However, if the
number of partitions specified is something like "5147483647", atoi returns
a non-negative number and pgbench creates as many partitions as atoi
returned. (On typical 64-bit platforms the parsed value is simply truncated
to 32 bits, so 5147483647 becomes 5147483647 - 2^32 = 852516351, which
matches the output below, while 3147483647 wraps around to a negative
number and is rejected.)

Have a look at the below examples,

[ashu@localhost bin]$ ./pgbench -i -s 10 --partitions=2147483647 postgres
dropping old tables...
creating tables...
creating 2147483647 partitions...
^C
[ashu@localhost bin]$ ./pgbench -i -s 10 --partitions=3147483647 postgres
invalid number of partitions: "3147483647"

[ashu@localhost bin]$ ./pgbench -i -s 10 --partitions=5147483647 postgres
dropping old tables...
creating tables...
creating 852516351 partitions...
^C

This seems like a problem with the atoi function, doesn't it?

The atoi function is used in several places in pgbench and I can see
similar behaviour for all of them. For example, it is used for the scale
factor, and the above observation is true there as well. So, is this a bug,
or do you feel it isn't and can be ignored? Please let me know your
thoughts on this. Thank you.

--
With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com

On Thu, Oct 3, 2019 at 10:30 AM Fabien COELHO  wrote:
>
>
> >> Thanks, attached is a patch with minor modifications which I am
> >> planning to push after one more round of review on Thursday morning
> >> IST unless there are more comments by anyone else.
> >
> > Pushed.
>
> Thanks!
>
> --
> Fabien.
>
>




Re: pgbench - allow to create partitioned tables

2019-10-03 Thread Fabien COELHO



Hello,


As partitions is an integer variable, the maximum value it can hold is
2147483647. But if I specify partitions as "3147483647", atoi returns a
value less than zero and pgbench terminates with an error. However, if the
number of partitions specified is something like "5147483647", atoi returns
a non-negative number and pgbench creates as many partitions as atoi
returned.

This seems like a problem with the atoi function, doesn't it?


Yes.


The atoi function is used in several places in pgbench and I can see
similar behaviour for all of them. For example, it is used for the scale
factor, and the above observation is true there as well. So, is this a bug,
or do you feel it isn't and can be ignored? Please let me know your
thoughts on this. Thank you.


I think that it is a known bug (as you noted, atoi is used more or less 
everywhere in pgbench and other commands) which should be addressed 
separately: all integer user inputs should be validated for syntax and 
overflow, everywhere, really. This is not currently the case, so I simply 
replicated the current bad practice when developing this feature.


There is/was a current patch/discussion to improve integer parsing, which 
could address this.


--
Fabien.




Re: PATCH: standby crashed when replay block which truncated in standby but failed to truncate in master node

2019-10-03 Thread Fujii Masao
On Thu, Oct 3, 2019 at 1:57 PM Michael Paquier  wrote:
>
> On Thu, Oct 03, 2019 at 01:49:34PM +0900, Fujii Masao wrote:
> > But this can cause subsequent recovery to always fail with an invalid-pages
> > error and the server not to start up. This is bad. So, to alleviate the
> > situation, I'm thinking it would be worth adding something like an
> > ignore_invalid_pages developer parameter. When this parameter is set to
> > true, the startup process always ignores invalid-pages errors. Thoughts?
>
> That could be helpful.

So the attached patch adds a new developer GUC "ignore_invalid_pages".
Setting ignore_invalid_pages to true causes the system to ignore the
failure (but still report a warning) and continue recovery.
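
For reference, a rough sketch of what such a GUC and its use could look
like (this is not the attached patch; the description text and exact
placement are assumptions), following the existing PANIC at the end of
XLogCheckInvalidPages() in xlogutils.c:

/* guc.c, in the ConfigureNamesBool array -- sketch only */
	{
		{"ignore_invalid_pages", PGC_POSTMASTER, DEVELOPER_OPTIONS,
			gettext_noop("Continues recovery after finding references to invalid pages."),
			NULL,
			GUC_NOT_IN_SAMPLE
		},
		&ignore_invalid_pages,
		false,
		NULL, NULL, NULL
	},

/* xlogutils.c, end of XLogCheckInvalidPages() -- sketch only */
	if (foundone)
		elog(ignore_invalid_pages ? WARNING : PANIC,
			 "WAL contains references to invalid pages");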

I will add this to the next CommitFest.

Regards,

-- 
Fujii Masao


ignore_invalid_pages_v1.patch
Description: Binary data


Re: pgbench - allow to create partitioned tables

2019-10-03 Thread Ashutosh Sharma
On Thu, Oct 3, 2019 at 1:53 PM Fabien COELHO  wrote:
>
>
> Hello,
>
> > As partitions is an integer type variable, the maximum value it can
> > hold is "2147483647". But if I specify partitions as "3147483647",
> > atoi function returns a value lesser than zero and pgbench terminates
> > with an error. However, if the value for number of partitions
> > specified is something like "5147483647", atoi returns a non-negative
> > number and pgbench creates as many number of partitions as the value
> > returned by atoi function.
> >
> > This seems like a problem with atoi function, isn't it?
>
> Yes.
>
> > atoi functions has been used at several places in pgbench script and I
> > can see similar behaviour for all. For e.g. it has been used with
> > scale factor and above observation is true for that as well. So, is
> > this a bug or you guys feel that it isn't and can be ignored? Please
> > let me know your thoughts on this. Thank you.
>
> I think that it is a known bug (as you noted atoi is used more or less
> everywhere in pgbench and other commands) which should be addressed
> separately: all integer user inputs should be validated for syntax and
> overflow, everywhere, really. This is not currently the case, so I simply
> replicated the current bad practice when developing this feature.
>

Okay, I think we should probably replace the atoi calls with strtol for
better error handling. It handles erroneous inputs better than atoi.
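
For illustration, here is a small standalone sketch (not a pgbench patch;
the helper name parse_partitions is made up) of the kind of strtol-based
validation being suggested, rejecting trailing garbage and out-of-range
values instead of silently truncating:

#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* returns 0 on success, -1 on syntax error or overflow */
static int
parse_partitions(const char *arg, int *result)
{
	char	   *end;
	long		val;

	errno = 0;
	val = strtol(arg, &end, 10);
	if (errno == ERANGE || end == arg || *end != '\0' || val < 1 || val > INT_MAX)
		return -1;
	*result = (int) val;
	return 0;
}

int
main(void)
{
	const char *inputs[] = {"10", "2147483647", "3147483647", "5147483647", "12abc"};

	for (int i = 0; i < 5; i++)
	{
		int			n;

		if (parse_partitions(inputs[i], &n) == 0)
			printf("%s -> %d partitions\n", inputs[i], n);
		else
			printf("%s -> invalid number of partitions\n", inputs[i]);
	}
	return 0;
}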

> There is/was a current patch/discussion to improve integer parsing, which
> could address this.
>

It seems like you are referring to the following discussion on hackers:

https://www.postgresql.org/message-id/flat/20190724040237.GB64205%40begriffs.com#5677c361d3863518b0db5d5baae72bbe

--
With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com




Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2019-10-03 Thread Amit Kapila
On Thu, Oct 3, 2019 at 4:03 AM Tomas Vondra 
wrote:

> On Wed, Oct 02, 2019 at 04:27:30AM +0530, Amit Kapila wrote:
> >On Tue, Oct 1, 2019 at 7:21 PM Tomas Vondra  >
> >wrote:
> >
> >> On Tue, Oct 01, 2019 at 06:55:52PM +0530, Amit Kapila wrote:
> >> >
> >> >On further testing, I found that the patch seems to have problems with
> >> >toast.  Consider below scenario:
> >> >Session-1
> >> >Create table large_text(t1 text);
> >> >INSERT INTO large_text
> >> >SELECT (SELECT string_agg('x', ',')
> >> >FROM generate_series(1, 100)) FROM generate_series(1, 1000);
> >> >
> >> >Session-2
> >> >SELECT * FROM pg_create_logical_replication_slot('regression_slot',
> >> >'test_decoding');
> >> >SELECT * FROM pg_logical_slot_get_changes('regression_slot', NULL,
> NULL);
> >> >*--kaboom*
> >> >
> >> >The second statement in Session-2 leads to a crash.
> >> >
> >>
> >> OK, thanks for the report - will investigate.
> >>
> >
> >It was an assertion failure in ReorderBufferCleanupTXN at below line:
> >+ /* Check we're not mixing changes from different transactions. */
> >+ Assert(change->txn == txn);
> >
>
> Can you still reproduce this issue with the patch I sent on 28/9? I have
> been unable to trigger the failure, and it seems pretty similar to the
> failure you reported (and I fixed) on 28/9.
>

Yes, it seems we need a similar change in ReorderBufferAddNewTupleCids.  I
think in session-2 you need to create the replication slot before creating
the table in session-1 to see this problem.

--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -2196,6 +2196,7 @@ ReorderBufferAddNewTupleCids(ReorderBuffer *rb,
TransactionId xid,
change->data.tuplecid.cmax = cmax;
change->data.tuplecid.combocid = combocid;
change->lsn = lsn;
+   change->txn = txn;
change->action = REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID;
dlist_push_tail(&txn->tuplecids, &change->node);

A few more comments:
---
1.
+static bool
+check_logical_decoding_work_mem(int *newval, void **extra, GucSource
source)
+{
+ /*
+ * -1 indicates fallback.
+ *
+ * If we haven't yet changed the boot_val default of -1, just let it be.
+ * logical decoding will look to maintenance_work_mem instead.
+ */
+ if (*newval == -1)
+ return true;
+
+ /*
+ * We clamp manually-set values to at least 64kB. The maintenance_work_mem
+ * uses a higher minimum value (1MB), so this is OK.
+ */
+ if (*newval < 64)
+ *newval = 64;

I think this needs to be changed as we no longer rely on
maintenance_work_mem.  Another thing related to this is that the default
value for logical_decoding_work_mem still seems to be -1.  We need to make
it 64MB.  I noticed this while debugging the memory accounting changes.  I
think this is the reason why I was not seeing toast-related changes being
serialized: in that test, I hadn't changed the default value of
logical_decoding_work_mem.

2.
+ /*
+ * We're going modify the size of the change, so to make sure the
+ * accounting is correct we'll make it look like we're removing the
+ * change now (with the old size), and then re-add it at the end.
+ */


/going modify/going to modify/

3.
+ *
+ * While updating the existing change with detoasted tuple data, we need to
+ * update the memory accounting info, because the change size will differ.
+ * Otherwise the accounting may get out of sync, triggering serialization
+ * at unexpected times.
+ *
+ * We simply subtract size of the change before rejiggering the tuple, and
+ * then adding the new size. This makes it look like the change was removed
+ * and then added back, except it only tweaks the accounting info.
+ *
+ * In particular it can't trigger serialization, which would be pointless
+ * anyway as it happens during commit processing right before handing
+ * the change to the output plugin.
  */
 static void
 ReorderBufferToastReplace(ReorderBuffer *rb, ReorderBufferTXN *txn,
@@ -3023,6 +3281,13 @@ ReorderBufferToastReplace(ReorderBuffer *rb,
ReorderBufferTXN *txn,
  if (txn->toast_hash == NULL)
  return;

+ /*
+ * We're going modify the size of the change, so to make sure the
+ * accounting is correct we'll make it look like we're removing the
+ * change now (with the old size), and then re-add it at the end.
+ */
+ ReorderBufferChangeMemoryUpdate(rb, change, false);

It is not very clear why this change is required.  Basically, this is done
at commit time, after which we shouldn't attempt to spill these changes.
That is mentioned in the comments as well, but if that is the case, it is
not clear how and when the accounting can create a problem.  If possible,
can you explain it with an example?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: Two pg_rewind patches (auto generate recovery conf and ensure clean shutdown)

2019-10-03 Thread Alexey Kondratov

On 03.10.2019 6:07, Michael Paquier wrote:

On Wed, Oct 02, 2019 at 08:28:09PM +0300, Alexey Kondratov wrote:

I've followed your guess directly and tried to extend the pg_rewind test
cases and... it seems I've caught a few bugs:

1) --dry-run actually wasn't completely 'dry'. It did update target
controlfile, which could cause repetitive pg_rewind calls to fail after
dry-run ones.

I have just paid attention to this thread, but this is a bug which
goes down to 12 actually so let's treat it independently of the rest.
The control file was not written thanks to the safeguards in
write_target_range() in past versions, but the recent refactoring
around control file handling broke that promise.  Another thing which
is not completely exact is the progress reporting which should be
reported even if the dry-run mode runs.  That's less critical, but
let's make things consistent.


I also thought about v12, though didn't check whether it's affected.


Patch 0001 also forgot that recovery.conf should not be written either
when no rewind is needed.


Yes, definitely, I forgot this code path, thanks.


I have reworked your first patch as per the attached.  What do you
think about it?  The part with the control file needs to go down to
v12, and I would likely split that into two commits on HEAD: one for
the control file and a second for the recovery.conf portion with the
fix for --no-ensure-shutdown to keep a cleaner history.


It looks fine to me except for the progress reporting part. It now adds 
PG_CONTROL_FILE_SIZE to fetch_done. However, I cannot find the control 
file being included in the filemap and fetch_size, or counted during 
calculate_totals(). Maybe I've missed something, but it now looks like 
we report something that wasn't planned for in the progress reporting, doesn't it?



+   # Check that incompatible options error out.
+   command_fails(
+   [
+   'pg_rewind', "--debug",
+   "--source-pgdata=$standby_pgdata",
+   "--target-pgdata=$master_pgdata", "-R",
+   "--no-ensure-shutdown"
+   ],
+   'pg_rewind local with -R');
Incompatible options had better be checked within a separate perl
script?  We generally do that for the other binaries.


Yes, it makes sense. I've reworked the patch with tests and added a 
couple of extra cases.



--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

>From 9e828e311dc7c216e5bfb1936022be4f7fd3805f Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Thu, 3 Oct 2019 12:37:26 +0300
Subject: [PATCH v2 2/2] Increase pg_rewind code coverage

Branch: pg-rewind-fixes
---
 src/bin/pg_rewind/t/001_basic.pl   |  2 +-
 src/bin/pg_rewind/t/002_databases.pl   |  2 +-
 src/bin/pg_rewind/t/003_extrafiles.pl  |  2 +-
 src/bin/pg_rewind/t/004_pg_xlog_symlink.pl |  2 +-
 src/bin/pg_rewind/t/005_same_timeline.pl   | 32 +---
 src/bin/pg_rewind/t/006_actions.pl | 61 ++
 src/bin/pg_rewind/t/RewindTest.pm  | 20 ++-
 7 files changed, 107 insertions(+), 14 deletions(-)
 create mode 100644 src/bin/pg_rewind/t/006_actions.pl

diff --git a/src/bin/pg_rewind/t/001_basic.pl b/src/bin/pg_rewind/t/001_basic.pl
index c3293e93df..1ba1648af6 100644
--- a/src/bin/pg_rewind/t/001_basic.pl
+++ b/src/bin/pg_rewind/t/001_basic.pl
@@ -1,7 +1,7 @@
 use strict;
 use warnings;
 use TestLib;
-use Test::More tests => 11;
+use Test::More tests => 13;
 
 use FindBin;
 use lib $FindBin::RealBin;
diff --git a/src/bin/pg_rewind/t/002_databases.pl b/src/bin/pg_rewind/t/002_databases.pl
index 1db534c0dc..57674ff4b3 100644
--- a/src/bin/pg_rewind/t/002_databases.pl
+++ b/src/bin/pg_rewind/t/002_databases.pl
@@ -1,7 +1,7 @@
 use strict;
 use warnings;
 use TestLib;
-use Test::More tests => 7;
+use Test::More tests => 9;
 
 use FindBin;
 use lib $FindBin::RealBin;
diff --git a/src/bin/pg_rewind/t/003_extrafiles.pl b/src/bin/pg_rewind/t/003_extrafiles.pl
index f4710440fc..16c92cb2d6 100644
--- a/src/bin/pg_rewind/t/003_extrafiles.pl
+++ b/src/bin/pg_rewind/t/003_extrafiles.pl
@@ -3,7 +3,7 @@
 use strict;
 use warnings;
 use TestLib;
-use Test::More tests => 5;
+use Test::More tests => 7;
 
 use File::Find;
 
diff --git a/src/bin/pg_rewind/t/004_pg_xlog_symlink.pl b/src/bin/pg_rewind/t/004_pg_xlog_symlink.pl
index 639eeb9c91..6dabd11db6 100644
--- a/src/bin/pg_rewind/t/004_pg_xlog_symlink.pl
+++ b/src/bin/pg_rewind/t/004_pg_xlog_symlink.pl
@@ -14,7 +14,7 @@ if ($windows_os)
 }
 else
 {
-	plan tests => 5;
+	plan tests => 7;
 }
 
 use FindBin;
diff --git a/src/bin/pg_rewind/t/005_same_timeline.pl b/src/bin/pg_rewind/t/005_same_timeline.pl
index 40dbc44caa..089466a06b 100644
--- a/src/bin/pg_rewind/t/005_same_timeline.pl
+++ b/src/bin/pg_rewind/t/005_same_timeline.pl
@@ -1,19 +1,35 @@
+#
+# Test that running pg_rewind if the two cl

Re: dropping column prevented due to inherited index

2019-10-03 Thread Amit Langote
On Wed, Sep 4, 2019 at 2:36 PM Amit Langote  wrote:
>
> Hi,
>
> Maybe I'm forgetting some dependency management discussion that I was
> part of recently, but is the following behavior unintentional?
>
> create table p (a int, b int, c int) partition by list (a);
> create table p1 partition of p for values in (1) partition by list (b);
> create table p11 partition of p1 for values in (1);
> create index on p (c);
> alter table p drop column c;
> ERROR:  cannot drop index p11_c_idx because index p1_c_idx requires it
> HINT:  You can drop index p1_c_idx instead.
>
> Dropping c should've automatically dropped the index p_c_idx and its
> children and grandchildren, so the above complaint seems redundant.

Interestingly, this behavior only occurs with v12 and HEAD.  With v11:

create table p (a int, b int, c int) partition by list (a);
create table p1 partition of p for values in (1) partition by list (b);
create table p11 partition of p1 for values in (1);
create index on p (c);
alter table p drop column c; -- ok

Note that the multi-level partitioning is not really needed to
reproduce the error.

I think the error in my first email has to do with the following
commit made to v12 and HEAD branches earlier this year:

commit 1d92a0c9f7dd547ad14cd8a3974289c5ec7f04ce
Author: Tom Lane 
Date:   Mon Feb 11 14:41:13 2019 -0500

Redesign the partition dependency mechanism.

There may not really be any problem with the commit itself, but I
suspect that the new types of dependencies (or the way
findDependentObjects() analyzes them) don't play well with the inheritance
recursion of ATExecDropColumn().  Currently, child columns (and their
dependencies) are dropped before the parent column (and its
dependencies).  With the attached patch, which reverses that order,
the error goes away, but I'm not sure that that's the correct
solution.

Tom, thoughts?

Thanks,
Amit


ATExecDropColumn-inh-recursion-fix.patch
Description: Binary data


Re: logical decoding : exceeded maxAllocatedDescs for .spill files

2019-10-03 Thread Amit Khandekar
On Wed, 18 Sep 2019 at 12:24, Amit Khandekar  wrote:
> Probably, for now at least, what everyone seems to agree is to take my
> earlier attached patch forward.
>
> I am going to see if I can add a TAP test for the patch, and will add
> the patch into the commitfest soon.

Attached is an updated patch v2. It has a new test scenario added to the
contrib/test_decoding spill test, and some minor code cleanup.

Going to add this into Nov commitfest.





--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
diff --git a/contrib/test_decoding/expected/spill.out b/contrib/test_decoding/expected/spill.out
index 10734bd..e1d150b6 100644
--- a/contrib/test_decoding/expected/spill.out
+++ b/contrib/test_decoding/expected/spill.out
@@ -247,6 +247,25 @@ GROUP BY 1 ORDER BY 1;
  'serialize-nested-subbig-subbigabort-subbig-3 |  5000 | table public.spill_test: INSERT: data[text]:'serialize-nested-subbig-subbigabort-subbig-3:5001' | table public.spill_test: INSERT: data[text]:'serialize-nested-subbig-subbigabort-subbig-3:1'
 (2 rows)
 
+-- large number of spilling subxacts. Should not exceed maxAllocatedDescs
+BEGIN;
+DO $$
+BEGIN
+FOR i IN 1..600 LOOP
+BEGIN
+INSERT INTO spill_test select generate_series(1, 5000) ;
+EXCEPTION
+when division_by_zero then perform 'dummy';
+END;
+END LOOP;
+END $$;
+COMMIT;
+SELECT 1 from pg_logical_slot_get_changes('regression_slot', NULL,NULL) LIMIT 1;
+ ?column? 
+--
+1
+(1 row)
+
 DROP TABLE spill_test;
 SELECT pg_drop_replication_slot('regression_slot');
  pg_drop_replication_slot 
diff --git a/contrib/test_decoding/sql/spill.sql b/contrib/test_decoding/sql/spill.sql
index e638cac..715d1f9 100644
--- a/contrib/test_decoding/sql/spill.sql
+++ b/contrib/test_decoding/sql/spill.sql
@@ -174,6 +174,21 @@ SELECT (regexp_split_to_array(data, ':'))[4] COLLATE "C", COUNT(*), (array_agg(d
 FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL) WHERE data ~ 'INSERT'
 GROUP BY 1 ORDER BY 1;
 
+-- large number of spilling subxacts. Should not exceed maxAllocatedDescs
+BEGIN;
+DO $$
+BEGIN
+FOR i IN 1..600 LOOP
+BEGIN
+INSERT INTO spill_test select generate_series(1, 5000) ;
+EXCEPTION
+when division_by_zero then perform 'dummy';
+END;
+END LOOP;
+END $$;
+COMMIT;
+SELECT 1 from pg_logical_slot_get_changes('regression_slot', NULL,NULL) LIMIT 1;
+
 DROP TABLE spill_test;
 
 SELECT pg_drop_replication_slot('regression_slot');
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 8ce28ad..1e5a725 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -103,13 +103,21 @@ typedef struct ReorderBufferTupleCidEnt
 	CommandId	combocid;		/* just for debugging */
 } ReorderBufferTupleCidEnt;
 
+/* Virtual file descriptor with file offset tracking */
+typedef struct TXNEntryFile
+{
+	File		vfd;			/* -1 when the file is closed */
+	off_t		curOffset;		/* offset for next write or read. Reset to 0
+ * when vfd is opened. */
+} TXNEntryFile;
+
 /* k-way in-order change iteration support structures */
 typedef struct ReorderBufferIterTXNEntry
 {
 	XLogRecPtr	lsn;
 	ReorderBufferChange *change;
 	ReorderBufferTXN *txn;
-	int			fd;
+	TXNEntryFile file;
 	XLogSegNo	segno;
 } ReorderBufferIterTXNEntry;
 
@@ -194,7 +202,7 @@ static void ReorderBufferSerializeTXN(ReorderBuffer *rb, ReorderBufferTXN *txn);
 static void ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 		 int fd, ReorderBufferChange *change);
 static Size ReorderBufferRestoreChanges(ReorderBuffer *rb, ReorderBufferTXN *txn,
-		int *fd, XLogSegNo *segno);
+		TXNEntryFile *file, XLogSegNo *segno);
 static void ReorderBufferRestoreChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 	   char *change);
 static void ReorderBufferRestoreCleanup(ReorderBuffer *rb, ReorderBufferTXN *txn);
@@ -988,7 +996,7 @@ ReorderBufferIterTXNInit(ReorderBuffer *rb, ReorderBufferTXN *txn)
 
 	for (off = 0; off < state->nr_txns; off++)
 	{
-		state->entries[off].fd = -1;
+		state->entries[off].file.vfd = -1;
 		state->entries[off].segno = 0;
 	}
 
@@ -1013,7 +1021,7 @@ ReorderBufferIterTXNInit(ReorderBuffer *rb, ReorderBufferTXN *txn)
 		{
 			/* serialize remaining changes */
 			ReorderBufferSerializeTXN(rb, txn);
-			ReorderBufferRestoreChanges(rb, txn, &state->entries[off].fd,
+			ReorderBufferRestoreChanges(rb, txn, &state->entries[off].file,
 		&state->entries[off].segno);
 		}
 
@@ -1043,7 +1051,7 @@ ReorderBufferIterTXNInit(ReorderBuffer *rb, ReorderBufferTXN *txn)
 /* serialize remaining changes */
 ReorderBufferSerializeTXN(rb, cur_txn);
 ReorderBufferRestoreChanges(rb, cur_txn,
-			&state->entries[off].fd,
+			&state->entries[off].file,
 			&state->entries[off].segno);
 			}
 			cur_ch

Re: WIP/PoC for parallel backup

2019-10-03 Thread Jeevan Chalke
Hi  Asif,

I was looking at the patch and tried compiling it. However, I got a few
errors and warnings.

I fixed those in the attached patch.

On Fri, Sep 27, 2019 at 9:30 PM Asif Rehman  wrote:

> Hi Robert,
>
> Thanks for the feedback. Please see the comments below:
>
> On Tue, Sep 24, 2019 at 10:53 PM Robert Haas 
> wrote:
>
>> On Wed, Aug 21, 2019 at 9:53 AM Asif Rehman 
>> wrote:
>> > - BASE_BACKUP [PARALLEL] - returns a list of files in PGDATA
>> > If the parallel option is there, then it will only do pg_start_backup,
>> scans PGDATA and sends a list of file names.
>>
>> So IIUC, this would mean that BASE_BACKUP without PARALLEL returns
>> tarfiles, and BASE_BACKUP with PARALLEL returns a result set with a
>> list of file names. I don't think that's a good approach. It's too
>> confusing to have one replication command that returns totally
>> different things depending on whether some option is given.
>>
>
> Sure. I will add a separate command (START_BACKUP)  for parallel.
>
>
>> > - SEND_FILES_CONTENTS (file1, file2,...) - returns the files in given
>> list.
>> > pg_basebackup will then send back a list of filenames in this command.
>> This commands will be send by each worker and that worker will be getting
>> the said files.
>>
>> Seems reasonable, but I think you should just pass one file name and
>> use the command multiple times, once per file.
>>
>
> I considered this approach initially,  however, I adopted the current
> strategy to avoid multiple round trips between the server and clients and
> save on query processing time by issuing a single command rather than
> multiple ones. Further fetching multiple files at once will also aid in
> supporting the tar format by utilising the existing ReceiveTarFile()
> function and will be able to create a tarball for per tablespace per worker.
>
>
>>
>> > - STOP_BACKUP
>> > when all workers finish then, pg_basebackup will send STOP_BACKUP
>> command.
>>
>> This also seems reasonable, but surely the matching command should
>> then be called START_BACKUP, not BASEBACKUP PARALLEL.
>>
>> > I have done a basic proof of concenpt (POC), which is also attached. I
>> would appreciate some input on this. So far, I am simply dividing the list
>> equally and assigning them to worker processes. I intend to fine tune this
>> by taking into consideration file sizes. Further to add tar format support,
>> I am considering that each worker process, processes all files belonging to
>> a tablespace in its list (i.e. creates and copies tar file), before it
>> processes the next tablespace. As a result, this will create tar files that
>> are disjointed with respect tablespace data. For example:
>>
>> Instead of doing this, I suggest that you should just maintain a list
>> of all the files that need to be fetched and have each worker pull a
>> file from the head of the list and fetch it when it finishes receiving
>> the previous file.  That way, if some connections go faster or slower
>> than others, the distribution of work ends up fairly even.  If you
>> instead pre-distribute the work, you're guessing what's going to
>> happen in the future instead of just waiting to see what actually does
>> happen. Guessing isn't intrinsically bad, but guessing when you could
>> be sure of doing the right thing *is* bad.
>>
>> If you want to be really fancy, you could start by sorting the files
>> in descending order of size, so that big files are fetched before
>> small ones.  Since the largest possible file is 1GB and any database
>> where this feature is important is probably hundreds or thousands of
>> GB, this may not be very important. I suggest not worrying about it
>> for v1.
>>
>
> Ideally, I would like to support the tar format as well, which would be
> much easier to implement when fetching multiple files at once since that
> would enable using the existent functionality to be used without much
> change.
>
> Your idea of sorting the files in descending order of size seems very
> appealing. I think we can do this and have the file divided among the
> workers one by one i.e. the first file in the list goes to worker 1, the
> second to process 2, and so on and so forth.
>
>
>>
>> > Say, tablespace t1 has 20 files and we have 5 worker processes and
>> tablespace t2 has 10. Ignoring all other factors for the sake of this
>> example, each worker process will get a group of 4 files of t1 and 2 files
>> of t2. Each process will create 2 tar files, one for t1 containing 4 files
>> and another for t2 containing 2 files.
>>
>> This is one of several possible approaches. If we're doing a
>> plain-format backup in parallel, we can just write each file where it
>> needs to go and call it good. But, with a tar-format backup, what
>> should we do? I can see three options:
>>
>> 1. Error! Tar format parallel backups are not supported.
>>
>> 2. Write multiple tar files. The user might reasonably expect that
>> they're going to end up with the same files at the end of the backup
>> regardless of wheth

Re: [HACKERS] Block level parallel vacuum

2019-10-03 Thread Dilip Kumar
On Wed, Oct 2, 2019 at 7:29 PM Masahiko Sawada  wrote:
>
I have started reviewing this patch and I have some cosmetic comments.
I will continue the review tomorrow.

+This change adds PARALLEL option to VACUUM command that enable us to
+perform index vacuuming and index cleanup with background
+workers. Indivisual

/s/Indivisual/Individual/

+ * parallel worker processes. Individual indexes is processed by one vacuum
+ * process. At beginning of lazy vacuum (at lazy_scan_heap) we prepare the

/s/Individual indexes is processed/Individual indexes are processed/
/s/At beginning/ At the beginning

+ * parallel workers. In parallel lazy vacuum, we enter parallel mode and
+ * create the parallel context and the DSM segment before starting heap
+ * scan.

Can we extend the comment to explain why we do that before starting
the heap scan?

+ else
+ {
+ if (for_cleanup)
+ {
+ if (lps->nworkers_requested > 0)
+ appendStringInfo(&buf,
+ ngettext("launched %d parallel vacuum worker for index cleanup
(planned: %d, requested %d)",
+   "launched %d parallel vacuum workers for index cleanup (planned:
%d, requsted %d)",
+   lps->pcxt->nworkers_launched),
+ lps->pcxt->nworkers_launched,
+ lps->pcxt->nworkers,
+ lps->nworkers_requested);
+ else
+ appendStringInfo(&buf,
+ ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d)",
+   "launched %d parallel vacuum workers for index cleanup (planned: %d)",
+   lps->pcxt->nworkers_launched),
+ lps->pcxt->nworkers_launched,
+ lps->pcxt->nworkers);
+ }
+ else
+ {
+ if (lps->nworkers_requested > 0)
+ appendStringInfo(&buf,
+ ngettext("launched %d parallel vacuum worker for index vacuuming
(planned: %d, requested %d)",
+   "launched %d parallel vacuum workers for index vacuuming (planned:
%d, requested %d)",
+   lps->pcxt->nworkers_launched),
+ lps->pcxt->nworkers_launched,
+ lps->pcxt->nworkers,
+ lps->nworkers_requested);
+ else
+ appendStringInfo(&buf,
+ ngettext("launched %d parallel vacuum worker for index vacuuming
(planned: %d)",
+   "launched %d parallel vacuum workers for index vacuuming (planned: %d)",
+   lps->pcxt->nworkers_launched),
+ lps->pcxt->nworkers_launched,
+ lps->pcxt->nworkers);
+ }

In multiple places I see a lot of duplicate code depending on whether
for_cleanup is true or false.  The only difference is in the message
(whether we say index cleanup or index vacuuming); otherwise the code is
the same for both cases.  Can't we create a string based on the value of
for_cleanup and append it to the message? That way we can avoid duplicating
this in many places.
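
For illustration, a rough sketch of that refactoring (variable and field
names follow the quoted patch, so they are assumptions about the final
code; also note that translators generally prefer whole sentences, so this
trades some translatability for brevity):

	const char *op = for_cleanup ? _("index cleanup") : _("index vacuuming");

	if (lps->nworkers_requested > 0)
		appendStringInfo(&buf,
						 ngettext("launched %d parallel vacuum worker for %s (planned: %d, requested %d)",
								  "launched %d parallel vacuum workers for %s (planned: %d, requested %d)",
								  lps->pcxt->nworkers_launched),
						 lps->pcxt->nworkers_launched, op,
						 lps->pcxt->nworkers, lps->nworkers_requested);
	else
		appendStringInfo(&buf,
						 ngettext("launched %d parallel vacuum worker for %s (planned: %d)",
								  "launched %d parallel vacuum workers for %s (planned: %d)",
								  lps->pcxt->nworkers_launched),
						 lps->pcxt->nworkers_launched, op,
						 lps->pcxt->nworkers);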

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: dropping column prevented due to inherited index

2019-10-03 Thread Alvaro Herrera
On 2019-Oct-03, Amit Langote wrote:

> There may not really be any problem with the commit itself, but I
> suspect that the new types of dependencies (or the way
> findDependentObjects() analyzes them) don't play well with the inheritance
> recursion of ATExecDropColumn().  Currently, child columns (and their
> dependencies) are dropped before the parent column (and its
> dependencies).  With the attached patch, which reverses that order,
> the error goes away, but I'm not sure that that's the correct
> solution.

Hmm.  I wonder if we shouldn't adopt the coding pattern we've used
elsewhere of collecting all columns to be dropped first into an
ObjectAddresses array, then use performMultipleDeletions.
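
For reference, a minimal sketch of that pattern (not the actual
ATExecDropColumn() code; the loop and the ColumnToDrop struct are
hypothetical, while the dependency.c calls are the existing ones):

	ObjectAddresses *addrs = new_object_addresses();
	ObjectAddress obj;
	ListCell   *lc;

	/* hypothetical loop over the parent column and all child columns */
	foreach(lc, columns_to_drop)
	{
		ColumnToDrop *col = (ColumnToDrop *) lfirst(lc);	/* hypothetical struct */

		ObjectAddressSubSet(obj, RelationRelationId, col->relid, col->attnum);
		add_exact_object_address(&obj, addrs);
	}

	/* let the dependency machinery order and perform all the drops at once */
	performMultipleDeletions(addrs, behavior, 0);
	free_object_addresses(addrs);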

-- 
Álvaro Herrera            https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Collation versioning

2019-10-03 Thread Thomas Munro
On Thu, Oct 3, 2019 at 7:53 AM Peter Eisentraut
 wrote:
> On 2018-09-05 23:18, Thomas Munro wrote:
> > On Wed, Sep 5, 2018 at 12:10 PM Christoph Berg  wrote:
> >>> So, it's not ideal but perhaps worth considering on the grounds that
> >>> it's better than nothing?
> >>
> >> Ack.
> >
> > Ok, here's a little patch like that.
> >
> > postgres=# select collname, collcollate, collversion from pg_collation
> > where collname = 'en_NZ';
> >  collname | collcollate | collversion
> > --+-+-
> >  en_NZ| en_NZ.utf8  | 2.24
> > (1 row)
>
> After, um, briefly sleeping on this, I would like to go ahead with this.
>
> There is ongoing work to make ICU available globally, and as part of
> that I've also proposed a way to make the collation version tracking
> work on a database level.
>
> This would be a useful piece of the overall picture.  Independent
> of what becomes of the ICU effort, we could have glibc collation version
> tracking completely working for PG13.

+1

Also, better ideas about which objects to attach versions to can come
along independently of this.

> The only open question on this patch was whether it's a good version to
> use.  I think based on subsequent discussions, there was the realization
> that this is the best we can do and better than nothing.
>
> In the patch, I would skip the configure test and just do
>
> #ifdef __GLIBC__
>
> directly.

Ok.  Here's one like that.  Also, a WIP patch for FreeBSD.

Thanks,
Thomas
From 79ec43cd69e82867f479e4df7e87789f89525826 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Fri, 4 Oct 2019 00:03:04 +1300
Subject: [PATCH 1/2] Use libc version as a collation version on glibc systems.

Using glibc's version number to detect potential collation definition
changes is not 100% reliable, but it's better than nothing.

Author: Thomas Munro
Reviewed-by: Peter Eisentraut
Discussion: https://postgr.es/m/4b76c6d4-ae5e-0dc6-7d0d-b5c796a07e34%402ndquadrant.com
---
 src/backend/utils/adt/pg_locale.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index b2f08ead45..694ff7626e 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -70,6 +70,10 @@
 #include 
 #endif
 
+#ifdef __GLIBC__
+#include 
+#endif
+
 #ifdef WIN32
 /*
  * This Windows file defines StrNCpy. We don't need it here, so we undefine
@@ -1518,7 +1522,7 @@ pg_newlocale_from_collation(Oid collid)
 char *
 get_collation_actual_version(char collprovider, const char *collcollate)
 {
-	char	   *collversion;
+	char	   *collversion = NULL;
 
 #ifdef USE_ICU
 	if (collprovider == COLLPROVIDER_ICU)
@@ -1542,7 +1546,13 @@ get_collation_actual_version(char collprovider, const char *collcollate)
 	}
 	else
 #endif
-		collversion = NULL;
+	if (collprovider == COLLPROVIDER_LIBC)
+	{
+#if defined(__GLIBC__)
+		/* Use the glibc version because we don't have anything better. */
+		collversion = pstrdup(gnu_get_libc_version());
+#endif
+	}
 
 	return collversion;
 }
-- 
2.20.1

From 81f5cf0283541874f6c9d2c2339e9326ff1ebee4 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Fri, 4 Oct 2019 00:19:49 +1300
Subject: [PATCH 2/2] WIP: Use querylocale() for collation versions on FreeBSD.

Requires a WIP FreeBSD patch: https://reviews.freebsd.org/D17166
---
 src/backend/utils/adt/pg_locale.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 694ff7626e..a66f1c1fbe 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1551,6 +1551,18 @@ get_collation_actual_version(char collprovider, const char *collcollate)
 #if defined(__GLIBC__)
 		/* Use the glibc version because we don't have anything better. */
 		collversion = pstrdup(gnu_get_libc_version());
+#elif defined(LC_VERSION_MASK)
+		/* FreeBSD exposes the CLDR version. */
+		locale_t loc = newlocale(LC_COLLATE, collcollate, NULL);
+		if (loc)
+		{
+			collversion =
+pstrdup(querylocale(LC_COLLATE_MASK | LC_VERSION_MASK, loc));
+			freelocale(loc);
+		}
+		else
+			ereport(ERROR,
+	(errmsg("could not load locale \"%s\"", collcollate)));
 #endif
 	}
 
-- 
2.20.1



Re: Close stdout and stderr in syslogger

2019-10-03 Thread Святослав Ермилин
Hi,

> In any case, the proposed patch almost certainly introduces new
> problems, in that you dropped the fcloses's into code that
> executes repeatedly.

I guess it's better to place fclose() right after a successful syslogger
start. In that case we close the descriptors just once, which is enough to
solve the problem. Developers who debug the syslogger should generally see
problems before we close the descriptors; otherwise they can edit the
syslogger code.

There is another way to solve this problem:
We can give users the opportunity to leave or close descriptors.
I.e. config switch for this. But I think that it's too complicated.


One can reproduce the problem by following the steps in the previous messages.

_

Regards, Sviatoslav Ermilin
From 809121ee5494d9189a757cbc924c0c933fd64e14 Mon Sep 17 00:00:00 2001
From: Ermilin Sviatoslav 
Date: Wed, 2 Oct 2019 11:59:00 +0500
Subject: [PATCH] Close stderr and stdout in syslogger.

The Debian tools from postgresql-common start pg_ctl piped to a logfile.
The descriptor stays piped to the logfile and blocks it from being deleted.
That is why we can't delete logfile.1 after logrotate, and after a second
logrotate logfile.1 is in "deleted" status but in fact cannot be removed.
---
 src/backend/postmaster/syslogger.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/backend/postmaster/syslogger.c b/src/backend/postmaster/syslogger.c
index 14d72d38d7..aa063cd08d 100644
--- a/src/backend/postmaster/syslogger.c
+++ b/src/backend/postmaster/syslogger.c
@@ -296,6 +296,8 @@ SysLoggerMain(int argc, char *argv[])
 	 */
 	whereToSendOutput = DestNone;
 
+	fclose(stderr);
+	fclose(stdout);
 	/* main worker loop */
 	for (;;)
 	{
-- 
2.20.1 (Apple Git-117)



RE: Shared Memory: How to use SYSV rather than MMAP ?

2019-10-03 Thread REIX, Tony
Hi Thomas,

I've noticed that your patch adds:

+#if defined(SHM_LGPAGE)
+   /* AIX */
+   shmget_extra_flags = SHM_LGPAGE | SHM_PIN;
+#endif

However, my original patch contained:
| SHM_LGPAGE | SHM_PIN | S_IRUSR | S_IWUSR);

It looks like these 2 additional flags are required on AIX; as far as I
remember, and based on the AIX shmget() man page:
  S_IRUSR
   Permits the process that owns the data structure to read it.
  S_IWUSR
   Permits the process that owns the data structure to modify it.
So, I'm adding them to your patch.
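
As a small standalone illustration (not the PostgreSQL patch itself) of the
flag combination being discussed -- owner read/write permission bits plus
the AIX-only large-page/pinning flags when they are defined:

#include <stdio.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/stat.h>

int
main(void)
{
	int		flags = IPC_CREAT | S_IRUSR | S_IWUSR;	/* owner may read and write */
	int		id;

#if defined(SHM_LGPAGE) && defined(SHM_PIN)
	flags |= SHM_LGPAGE | SHM_PIN;		/* AIX: request large, pinned pages */
#endif

	id = shmget(IPC_PRIVATE, 1024 * 1024, flags);
	if (id < 0)
		perror("shmget");
	else
		printf("created shared memory segment %d\n", id);
	return 0;
}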


I'm now trying to run the tests (another weird issue during the build
previously broke the whole process).

So, I think that I have to make this change in order to request System V
shared memory on AIX:
--- ./src/bin/pg_upgrade/tmp_check/data/postgresql.conf.ORIGIN
+++ ./src/bin/pg_upgrade/tmp_check/data/postgresql.conf
-#shared_memory_type = mmap # the default is the first option
+shared_memory_type = sysv  # the default is the first option

However, I'm not sure about the details of which .conf file is used when testing.

Now trying to rebuild, hoping the tests will be launched this time.

Regards,

Tony

De : Thomas Munro 
Envoyé : mardi 10 septembre 2019 00:57
À : Alvaro Herrera 
Cc : REIX, Tony ; Andres Freund ; 
Robert Haas ; Pg Hackers ; 
EMPEREUR-MOT, SYLVIE 
Objet : Re: Shared Memory: How to use SYSV rather than MMAP ?

On Wed, Sep 4, 2019 at 10:30 AM Alvaro Herrera  wrote:
> On 2019-Feb-03, Thomas Munro wrote:
> > On Sat, Feb 2, 2019 at 12:49 AM Thomas Munro
> >  wrote:
> > > I am planning to commit the 0001 patch shortly, unless there are
> > > objections.  I attach a new version, which improves the documentation
> > > a bit (cross-referencing the new GUC and the section on sysctl
> > > settings).  That will give us shared_memory_type = sysv.
> >
> > Committed 0001.
>
> So can you please rebase what remains?

Here's a quick rebase.  It needs testing, review and (probably)
adjustment from AIX users.  I'm not going to be able to do anything
with it on my own due to lack of access, though I'm happy to help get
this committed eventually.  If we don't get any traction in this CF,
I'll withdraw this submission for now.  For consistency, I think we
should eventually also do the same thing for Linux when using sysv
(it's pretty similar, it just uses different flag names; it may also
be necessary to query the page size and round up the requested size,
on one or both of those OSes; I'm not sure).

--
Thomas Munro
https://enterprisedb.com



Re: WIP/PoC for parallel backup

2019-10-03 Thread Robert Haas
On Fri, Sep 27, 2019 at 12:00 PM Asif Rehman  wrote:
>> > - SEND_FILES_CONTENTS (file1, file2,...) - returns the files in given list.
>> > pg_basebackup will then send back a list of filenames in this command. 
>> > This commands will be send by each worker and that worker will be getting 
>> > the said files.
>>
>> Seems reasonable, but I think you should just pass one file name and
>> use the command multiple times, once per file.
>
> I considered this approach initially,  however, I adopted the current 
> strategy to avoid multiple round trips between the server and clients and 
> save on query processing time by issuing a single command rather than 
> multiple ones. Further fetching multiple files at once will also aid in 
> supporting the tar format by utilising the existing ReceiveTarFile() function 
> and will be able to create a tarball for per tablespace per worker.

I think that sending multiple filenames on a line could save some time
when there are lots of very small files, because then the round-trip
overhead could be significant.

However, if you've got mostly big files, I think this is going to be a
loser. It'll be fine if you're able to divide the work exactly evenly,
but that's pretty hard to do, because some workers may succeed in
copying the data faster than others for a variety of reasons: some
data is in memory, some data has to be read from disk, different data
may need to be read from different disks that run at different speeds,
not all the network connections may run at the same speed. Remember
that the backup's not done until the last worker finishes, and so
there may well be a significant advantage in terms of overall speed in
putting some energy into making sure that they finish as close to each
other in time as possible.

To put that another way, the first time all the workers except one get
done while the last one still has 10GB of data to copy, somebody's
going to be unhappy.
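
To make the scheduling idea concrete, here is a small standalone sketch
(plain C with pthreads, not pg_basebackup code; the file names and sizes
are made up): files are sorted by descending size and each worker pulls the
next entry from a shared cursor as it finishes the previous one, so faster
workers naturally take on more files and everyone finishes at roughly the
same time.

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct
{
	const char *name;
	long		size;
} BackupFile;

static BackupFile files[] = {
	{"base/16384/16397", 1073741824L},
	{"base/16384/16385", 524288000L},
	{"base/16384/2619", 8192000L},
	{"pg_wal/000000010000000000000001", 16777216L},
};
static const int nfiles = sizeof(files) / sizeof(files[0]);

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int	next_file = 0;

static int
by_size_desc(const void *a, const void *b)
{
	long		sa = ((const BackupFile *) a)->size;
	long		sb = ((const BackupFile *) b)->size;

	return (sa < sb) - (sa > sb);
}

static void *
worker(void *arg)
{
	int			id = *(int *) arg;

	for (;;)
	{
		int			idx;

		pthread_mutex_lock(&lock);
		idx = (next_file < nfiles) ? next_file++ : -1;
		pthread_mutex_unlock(&lock);

		if (idx < 0)
			break;
		/* a real worker would issue a fetch command for files[idx] here */
		printf("worker %d fetching %s (%ld bytes)\n",
			   id, files[idx].name, files[idx].size);
	}
	return NULL;
}

int
main(void)
{
	pthread_t	threads[3];
	int			ids[3] = {0, 1, 2};

	/* big files first, so the stragglers at the end are small ones */
	qsort(files, nfiles, sizeof(BackupFile), by_size_desc);

	for (int i = 0; i < 3; i++)
		pthread_create(&threads[i], NULL, worker, &ids[i]);
	for (int i = 0; i < 3; i++)
		pthread_join(threads[i], NULL);
	return 0;
}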

> Ideally, I would like to support the tar format as well, which would be much 
> easier to implement when fetching multiple files at once since that would 
> enable using the existent functionality to be used without much change.

I think we should just have the client generate the tarfile. It'll
require duplicating some code, but it's not actually that much code or
that complicated from what I can see.


-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: pgsql: Implement jsonpath .datetime() method

2019-10-03 Thread Robert Haas
On Tue, Oct 1, 2019 at 1:41 PM Alexander Korotkov
 wrote:
> So, basically standard requires us to suppress any error happening in
> filter expression.

Sounds like the standard is dumb, then. :-)

> But as I wrote before suppression of errors in
> datetime comparison may lead to surprising results.  That happens in
> rare corner cases, but still.  This makes uneasy choice between
> consistent behavior and standard behavior.

Yeah.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Include RELKIND_TOASTVALUE in get_relkind_objtype

2019-10-03 Thread Tom Lane
Michael Paquier  writes:
> On Tue, Oct 01, 2019 at 12:10:50AM +, Hsu, John wrote:
>> get_relkind_objtype(...) was introduced as part of 8b9e9644dc, and it
>> doesn't include RELKIND_TOASTVALUE. As a result, when a user who has
>> usage rights on schema pg_toast attempts to reindex a table it is not
>> the owner of, it fails with the wrong error message.

> (Adding Peter E. in CC)

> Sure.  However this implies that the user doing the reindex not only
> has ownership of the relation worked on, but is also able to work
> directly on the schema pg_toast.  Should we really encourage people to
> do that with non-superusers?

FWIW, I really dislike this patch, mainly because it is based on the 
assumption (as John said) that get_relkind_objtype is used only
in aclcheck_error calls.  However it's not obvious why that should
be true, and there certainly is no documentation suggesting that
it needs to be true.  That's mainly because get_relkind_objtype has no
documentation period, which if you ask me is flat out unacceptable
for a globally-exposed function.  (Same comment about its wrapper
get_object_type.)

The patch also falsifies the comment just a few lines away that

/*
 * other relkinds are not supported here because they don't map to
 * OBJECT_* values
 */

without doing anything about that.

I'm inclined to think that we should redefine the charter of
get_relkind_objtype/get_object_type to be that they'll produce
some OBJECT_* value for any relkind whatever, on the grounds
that throwing an error here isn't a particularly useful behavior;
we'd rather come out with a possibly-slightly-inaccurate generic
message about a "table".  And they need to be documented that way.

Alternatively, instead of mapping other relkinds to OBJECT_TABLE,
we could invent a new enum entry OBJECT_RELATION.  There's precedent
for that in OBJECT_ROUTINE ... but I don't know that we want to
build out all the other infrastructure for a new ObjectType right now.

regards, tom lane




Re: Regarding extension

2019-10-03 Thread Euler Taveira
Em qui, 3 de out de 2019 às 02:24, Natarajan R  escreveu:
>
> I am creating sample extension in postgres, For that "During PG_INIT, i want 
> to get the list of database Id's in which my extension is installed". Is 
> there a way to get this?
>
I'm not sure what you mean by "Id". However, if you want to know whether an
extension is installed in a specific database, you should be logged into
it. That's because the extension catalog is not global.

> How to have trigger for create extension?
>
Event triggers.


-- 
   Euler Taveira   Timbira -
http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento




identity column behavior in WHEN condition for BEFORE EACH ROW trigger

2019-10-03 Thread Suraj Kharage
Hi,

It has been observed that when we reference generated columns in the WHEN
condition of a BEFORE EACH ROW trigger, the server throws an error from
CreateTrigger().

e.g:
create table bar(a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 2)
STORED);

CREATE OR REPLACE FUNCTION test() RETURNS trigger AS $$
BEGIN
NEW.b  = 10;
raise notice 'Before row trigger';
RETURN NEW;
END;
$$ LANGUAGE plpgsql;

postgres@78049=#CREATE TRIGGER bar_trigger
BEFORE INSERT ON bar
FOR EACH ROW
WHEN (NEW.b < 8)
EXECUTE FUNCTION test();
2019-10-03 19:25:29.945 IST [78049] ERROR:  BEFORE trigger's WHEN
condition cannot reference NEW generated columns at character 68
2019-10-03 19:25:29.945 IST [78049] DETAIL:  Column "b" is a generated
column.
2019-10-03 19:25:29.945 IST [78049] STATEMENT:  CREATE TRIGGER bar_trigger
BEFORE INSERT ON bar
FOR EACH ROW
WHEN (NEW.b < 8)
EXECUTE FUNCTION test();
ERROR:  BEFORE trigger's WHEN condition cannot reference NEW generated
columns
LINE 4: WHEN (NEW.b < 8)
  ^
DETAIL:  Column "b" is a generated column.


Whereas for identity columns, the server allows us to create such a trigger,
and the trigger gets invoked as defined. Is this behavior expected, or do we
need to restrict identity columns in this scenario, since anyone can
override the identity column value in the trigger?

e.g:

create table foo(no int, id int  generated always as identity);

CREATE OR REPLACE FUNCTION test() RETURNS trigger AS $$
BEGIN
NEW.id  = 10;
raise notice 'Before row trigger';
RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER foo_trigger
BEFORE INSERT ON foo
FOR EACH ROW
WHEN (NEW.id < 8)
EXECUTE FUNCTION test();


postgres@78049=#insert into foo values(1);
NOTICE:  Before row trigger
INSERT 0 1
postgres@78049=#select * from foo;
 no | id
+
  1 | 10
(1 row)


Thoughts?

-- 
--

Thanks & Regards,
Suraj kharage,
EnterpriseDB Corporation,
The Postgres Database Company.


Re: WIP: Generic functions for Node types using generated metadata

2019-10-03 Thread Robert Haas
On Wed, Oct 2, 2019 at 4:46 PM Andres Freund  wrote:
> > The existing expectation is that we make our build tools in Perl.
> > I'm sure Andres doesn't want to write a C parser in Perl, but
> > poking around suggests that there are multiple options already
> > available in CPAN.  I'd much rather tell people "oh, there's YA
> > module you need to get from CPAN" than "figure out how to install
> > version XXX of LLVM".
>
> As far as I can tell they're all at least one of
> 1) written in C, so also have build requirements (obviously a shorter
>build time)
> 2) not very good (including plenty unsupported C, not to speak of
>various optional extensions we use, not having preprocessor support,
>...)
> 3) unmaintained for many years.

I find this argument to be compelling.

To me, it seems like having to install some random CPAN module is
probably more likely to be a problem than having to install LLVM. For
one thing, it's pretty likely that any given developer already has
LLVM, either because they're using clang as their C compiler in
general, or because their system has it installed anyway, or because
they've built with JIT support at least once. And, if they haven't got
it, their favorite packaging system will certainly have a build of
LLVM, but it may not have a build of Parse::C::Erratically.

Really, this has been a problem even just for TAP tests, where we've
had periodic debates over whether it's OK for a test to depend on some
new module that everyone may not have installed, possibly because
they're running some ancient Perl distribution (and our definition of
"ancient" might make a historian giggle) where it wasn't bundled with
core Perl. Can we rewrite the test so it doesn't need the module? Do
we just skip the test, possibly masking a failure for some users? Do
we make it fail and force everybody to find a way to install that
module?

Just to be clear, I'm not in love with using LLVM. It's a big hairy
piece of technology that I don't understand. However, it's also a very
widely-supported and very capable piece of technology that other
people do understand, and we're already using it for other things. I
don't think we're going to do better by using something different and
probably less capable and less well-supported for this thing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Hooks for session start and end, take two

2019-10-03 Thread Fabrízio de Royes Mello
On Thu, Oct 3, 2019 at 1:35 AM Andres Freund  wrote:
>
> [...]
>
> In this state I think this patch should be flat out rejected.
>

Ok.

> I'm seriously baffled at how this stuff is being pursued aggressively,
> quite apparently without any serious design considerations. I mean this
> stuff had to be backed out twice, yet it's being reproposed without much
> consideration for concerns?
>

And what if (again) for the first version of the session start hook we do it
just for:
- client backends
- background workers

For sure we'll need a new design, but for now I can't imagine a use case
for session types other than those listed above.

Regards,

--
   Fabrízio de Royes Mello Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento


fairywren failures

2019-10-03 Thread Andrew Dunstan


My new msys2 animal fairywren has had 3 recent failures when checking
pg_upgrade. The failures have been while running the regression tests,
specifically the interval test, and they all look like this:


2019-10-03 05:36:00.373 UTC [24272:43] LOG:  server process (PID 23756) was 
terminated by exception 0xC028
2019-10-03 05:36:00.373 UTC [24272:44] DETAIL:  Failed process was running: 
INSERT INTO INTERVAL_TBL (f1) VALUES ('badly formatted interval');


That error is "bad stack"

The failures have been on REL_12_STABLE (twice) and master (once).
However, they are not consistent (REL_12_STABLE is currently green).


The interval test itself hasn't changed for more than 2 years, and I
haven't found any obvious recent change that might cause the problem. I
guess it could be a compiler bug ... this is gcc 9.2.0, which is the
current release.


cheers


andrew


-- 
Andrew Dunstan            https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: Regarding extension

2019-10-03 Thread Natarajan R
Thanks for your response Euler.

1)
By "id" I meant the database id.

To make my question simple: "during _PG_init I want to get the database ids
in which my extension is installed..."
1. by using the pg_database and pg_extension catalogs
2. if there is any other way, kindly suggest it.


2)
I have one SQL file which is loaded during CREATE EXTENSION, and in that
file I have the code for an event trigger on the ddl_command_end event for
CREATE EXTENSION.
My question is: since the SQL file is loaded only when CREATE EXTENSION is
run, will this event trigger be invoked for that same CREATE EXTENSION or
not?

Thanks...


On Thu, 3 Oct 2019 at 19:31, Euler Taveira  wrote:

> Em qui, 3 de out de 2019 às 02:24, Natarajan R 
> escreveu:
> >
> > I am creating sample extension in postgres, For that "During PG_INIT, i
> want to get the list of database Id's in which my extension is installed".
> Is there a way to get this?
> >
> I'm not sure what you mean by "Id". However, if you want to know whether an
> extension is installed in a specific database, you should be logged into
> it. That's because the extension catalog is not global.
>
> > How to have trigger for create extension?
> >
> Event triggers.
>
>
> --
>Euler Taveira   Timbira -
> http://www.timbira.com.br/
>PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
>


Re: Value of Transparent Data Encryption (TDE)

2019-10-03 Thread Robert Haas
On Tue, Oct 1, 2019 at 12:19 PM Bruce Momjian  wrote:
> Just to give more detail.  Initially, there was a desire to store keys
> in only one place, either in the file system or in database tables.
> However, it became clear that the needs of booting the server and crash
> recovery required file system keys, and per-user/db keys were best done
> at the SQL level, so that indexing can be used, and logical dumps
> contain the locked keys.  SQL-level storage allows databases to be
> completely independent of other databases in terms of key storage and
> usage.

Wait, we're going to store the encryption keys with the database? It
seems like you're debating whether to store your front door keys under
the doormat or in a fake rock by the side of the path, when what you
really ought to be doing is keeping them physically separated from the
house, like in your pocket or your purse.

It seems to me that the right design is that there's a configurable
mechanism for PostgreSQL to request keys from someplace outside the
database, and that other place is responsible for storing the keys
securely and not losing them. Probably, it's a key-server of some kind
running on another machine, but if you really want you can do
something insecure instead, like getting them from the local
filesystem.

I admit I haven't been following the threads on this topic, but this
just seems like a really strange idea.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Transparent Data Encryption (TDE) and encrypted files

2019-10-03 Thread Robert Haas
On Mon, Sep 30, 2019 at 5:26 PM Bruce Momjian  wrote:
> For full-cluster Transparent Data Encryption (TDE), the current plan is
> to encrypt all heap and index files, WAL, and all pgsql_tmp (work_mem
> overflow).  The plan is:
>
> 
> https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#TODO_for_Full-Cluster_Encryption
>
> We don't see much value to encrypting vm, fsm, pg_xact, pg_multixact, or
> other files.  Is that correct?  Do any other PGDATA files contain user
> data?

As others have said, that sounds wrong to me.  I think you need to
encrypt everything.

I'm not sold on the comments that have been made about encrypting the
server log. I agree that could leak data, but that seems like somebody
else's problem: the log files aren't really under PostgreSQL's
management in the same way as pg_clog is. If you want to secure your
logs, send them to syslog and configure it to do whatever you need.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Transparent Data Encryption (TDE) and encrypted files

2019-10-03 Thread Stephen Frost
Greetings,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Mon, Sep 30, 2019 at 5:26 PM Bruce Momjian  wrote:
> > For full-cluster Transparent Data Encryption (TDE), the current plan is
> > to encrypt all heap and index files, WAL, and all pgsql_tmp (work_mem
> > overflow).  The plan is:
> >
> > 
> > https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#TODO_for_Full-Cluster_Encryption
> >
> > We don't see much value to encrypting vm, fsm, pg_xact, pg_multixact, or
> > other files.  Is that correct?  Do any other PGDATA files contain user
> > data?
> 
> As others have said, that sounds wrong to me.  I think you need to
> encrypt everything.

That isn't what other database systems do though and isn't what people
actually asking for this feature are expecting to have or deal with.

People who are looking for 'encrypt all the things' should and will be
looking at filesystem-level encryption options.  That's not what this
feature is about.

> I'm not sold on the comments that have been made about encrypting the
> server log. I agree that could leak data, but that seems like somebody
> else's problem: the log files aren't really under PostgreSQL's
> management in the same way as pg_clog is. If you want to secure your
> logs, send them to syslog and configure it to do whatever you need.

I agree with this.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Value of Transparent Data Encryption (TDE)

2019-10-03 Thread Stephen Frost
Greetings,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Tue, Oct 1, 2019 at 12:19 PM Bruce Momjian  wrote:
> > Just to give more detail.  Initially, there was a desire to store keys
> > in only one place, either in the file system or in database tables.
> > However, it became clear that the needs of booting the server and crash
> > recovery required file system keys, and per-user/db keys were best done
> > at the SQL level, so that indexing can be used, and logical dumps
> > contain the locked keys.  SQL-level storage allows databases to be
> > completely independent of other databases in terms of key storage and
> > usage.
> 
> Wait, we're going to store the encryption keys with the database? It
> seems like you're debating whether to store your front door keys under
> the doormat or in a fake rock by the side of the path, when what you
> really ought to be doing is keeping them physically separated from the
> house, like in your pocket or your purse.

This isn't news and shouldn't be shocking- databases which support TDE
all have a vaulting system for managing the keys and, yes, that's stored
with the database.

> It seems to me that the right design is that there's a configurable
> mechanism for PostgreSQL to request keys from someplace outside the
> database, and that other place is responsible for storing the keys
> securely and not losing them. Probably, it's a key-server of some kind
> running on another machine, but if you really want you can do
> something insecure instead, like getting them from the local
> filesystem.

I support the option to have an external vault that's used, but I don't
believe that should be a requirement and I don't think that removes the
need to have a vaulting system of our own, so we can have a stand-alone
TDE solution.

> I admit I haven't been following the threads on this topic, but this
> just seems like a really strange idea.

It's not new and it's how TDE works in all of the other database systems
which support it.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Value of Transparent Data Encryption (TDE)

2019-10-03 Thread Tomas Vondra

On Thu, Oct 03, 2019 at 10:43:21AM -0400, Stephen Frost wrote:

Greetings,

* Robert Haas (robertmh...@gmail.com) wrote:

On Tue, Oct 1, 2019 at 12:19 PM Bruce Momjian  wrote:
> Just to give more detail.  Initially, there was a desire to store keys
> in only one place, either in the file system or in database tables.
> However, it became clear that the needs of booting the server and crash
> recovery required file system keys, and per-user/db keys were best done
> at the SQL level, so that indexing can be used, and logical dumps
> contain the locked keys.  SQL-level storage allows databases to be
> completely independent of other databases in terms of key storage and
> usage.

Wait, we're going to store the encryption keys with the database? It
seems like you're debating whether to store your front door keys under
the doormat or in a fake rock by the side of the path, when what you
really ought to be doing is keeping them physically separated from the
house, like in your pocket or your purse.


This isn't news and shouldn't be shocking- databases which support TDE
all have a vaulting system for managing the keys and, yes, that's stored
with the database.



Right. The important bit here is that the vault is encrypted, and has to
be unlocked using a passphrase (or something like that) when starting
the database. So it's not really as silly as a key under the doormat.


It seems to me that the right design is that there's a configurable
mechanism for PostgreSQL to request keys from someplace outside the
database, and that other place is responsible for storing the keys
securely and not losing them. Probably, it's a key-server of some kind
running on another machine, but if you really want you can do
something insecure instead, like getting them from the local
filesystem.


I support the option to have an external vault that's used, but I don't
believe that should be a requirement and I don't think that removes the
need to have a vaulting system of our own, so we can have a stand-alone
TDE solution.



Right. If anything, we need a local vault that we could use for testing.
In other cases it might be a simple wrapper for a vault/keyring provided
by the operating system (if it's good enough for gpg keys ...).


I admit I haven't been following the threads on this topic, but this
just seems like a really strange idea.


It's not new and it's how TDE works in all of the other database systems
which support it.



Yep.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Hooks for session start and end, take two

2019-10-03 Thread Andres Freund
Hi,

On 2019-10-03 11:19:58 -0300, Fabrízio de Royes Mello wrote:
> On Thu, Oct 3, 2019 at 1:35 AM Andres Freund  wrote:
> >
> > [...]
> >
> > In this state I think this patch should be flat out rejected.
> >
> 
> Ok.
> 
> > I'm seriously baffled at how this stuff is being pursued aggressively,
> > quite apparently without any serious design considerations. I mean this
> > stuff had to be backed out twice, yet it's being reproposed without much
> > consideration for concerns?
> >
> 
> And what if (again) for the first version of session start hook we do it
> just for:
> - client backends
> - background workers
> 
> For sure we'll need a new design, but for now I can't imagine a use case
> for sessions other than those listed above.

I think what would need to be designed is something more like
RegisterXactCallback():

/*
 *  start- and end-of-transaction callbacks for dynamically loaded modules
 */
typedef enum
{
XACT_EVENT_COMMIT,
XACT_EVENT_PARALLEL_COMMIT,
XACT_EVENT_ABORT,
XACT_EVENT_PARALLEL_ABORT,
XACT_EVENT_PREPARE,
XACT_EVENT_PRE_COMMIT,
XACT_EVENT_PARALLEL_PRE_COMMIT,
XACT_EVENT_PRE_PREPARE
} XactEvent;

typedef void (*XactCallback) (XactEvent event, void *arg);

extern void RegisterXactCallback(XactCallback callback, void *arg);
extern void UnregisterXactCallback(XactCallback callback, void *arg);


which would be called at various parts of a process's lifetime. Maybe
something like:

1) shortly after process creation
2) shortly after shared memory attach
3) after database connection has been established
4) after database writes are possible (this excludes database-less walsenders)
5a) before closing database connection due to fatal error
(with a note saying that no writes are ever allowed)
5b) before closing database connection due to normal exit
(with a note saying that errors better be handled, and that no
transaction may escape)
6) before shared memory detach
7) before final exit
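
A rough declaration-level sketch of what such an API could look like,
mirroring the RegisterXactCallback() pattern above (all of these names are
hypothetical, nothing like this exists in core today):

typedef enum
{
    PROC_EVENT_FORKED,              /* 1) shortly after process creation */
    PROC_EVENT_SHMEM_ATTACHED,      /* 2) shortly after shared memory attach */
    PROC_EVENT_DB_CONNECTED,        /* 3) database connection established */
    PROC_EVENT_WRITES_ALLOWED,      /* 4) database writes are possible */
    PROC_EVENT_DB_DISCONNECT_FATAL, /* 5a) closing connection after fatal error */
    PROC_EVENT_DB_DISCONNECT_NORMAL,/* 5b) closing connection on normal exit */
    PROC_EVENT_SHMEM_DETACH,        /* 6) before shared memory detach */
    PROC_EVENT_FINAL_EXIT           /* 7) before final exit */
} ProcLifecycleEvent;

typedef void (*ProcLifecycleCallback) (ProcLifecycleEvent event, void *arg);

extern void RegisterProcLifecycleCallback(ProcLifecycleCallback callback, void *arg);
extern void UnregisterProcLifecycleCallback(ProcLifecycleCallback callback, void *arg);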

Greetings,

Andres Freund




Re: fairywren failures

2019-10-03 Thread Andres Freund
Hi,

On 2019-10-03 10:21:13 -0400, Andrew Dunstan wrote:
> My new msys2 animal fairywren has had 3 recent failures when checking
> pg_upgrade. The failures have been while running the regression tests,
> specifically the interval test, and they all look like this:
> 
> 
> 2019-10-03 05:36:00.373 UTC [24272:43] LOG:  server process (PID 23756) was 
> terminated by exception 0xC0000028
> 2019-10-03 05:36:00.373 UTC [24272:44] DETAIL:  Failed process was running: 
> INSERT INTO INTERVAL_TBL (f1) VALUES ('badly formatted interval');
> 
> 
> That error is "bad stack"

> The failures have been on REL_12_STABLE (twice) and master (once).
> However, they are not consistent (REL_12_STABLE is currently green).
> 
> 
> The interval test itself hasn't changed for more than 2 years, and I
> haven't found any obvious recent change that might cause the problem. I
> guess it could be a compiler bug ... this is gcc 9.2.0, which is the
> current release.

This is around where an error is thrown:
 -- badly formatted interval
 INSERT INTO INTERVAL_TBL (f1) VALUES ('badly formatted interval');
-ERROR:  invalid input syntax for type interval: "badly formatted interval"
-LINE 1: INSERT INTO INTERVAL_TBL (f1) VALUES ('badly formatted inter...
-  ^

and the error is stack related. So I suspect that setjmp/longjmp might
be to blame here, and somehow don't save/restore the stack into a proper
state. I don't know enough about mingw/msys/windows to know whether that
uses a self-written setjmp or relies on the MS implementation.
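
For context, the mechanism in question is the backend's top-level error
recovery: ereport(ERROR) ends up doing a siglongjmp() back to a sigsetjmp()
point in the outer command loop. A tiny standalone sketch of that pattern
(illustration only, not the actual server code):

#include <setjmp.h>
#include <stdio.h>

static sigjmp_buf recovery_point;

static void
report_error(const char *msg)
{
    /* stands in for ereport(ERROR, ...), which longjmps back to the loop */
    printf("ERROR:  %s\n", msg);
    siglongjmp(recovery_point, 1);
}

int
main(void)
{
    if (sigsetjmp(recovery_point, 1) != 0)
        printf("recovered, back at the top of the command loop\n");
    else
        report_error("invalid input syntax for type interval");
    return 0;
}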

If you could gather a backtrace it might help us. It's possible that the
stack is "just" misaligned or something, we had problems with that
before (IIRC valgrind didn't always align stacks correctly for processes
that forked from within a signal handler, which then crashed when using
instructions with alignment requirements, but only sometimes, because
the stack could be aligned).

Greetings,

Andres Freund




Re: Transparent Data Encryption (TDE) and encrypted files

2019-10-03 Thread Tomas Vondra

On Thu, Oct 03, 2019 at 10:40:40AM -0400, Stephen Frost wrote:

Greetings,

* Robert Haas (robertmh...@gmail.com) wrote:

On Mon, Sep 30, 2019 at 5:26 PM Bruce Momjian  wrote:
> For full-cluster Transparent Data Encryption (TDE), the current plan is
> to encrypt all heap and index files, WAL, and all pgsql_tmp (work_mem
> overflow).  The plan is:
>
> 
https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#TODO_for_Full-Cluster_Encryption
>
> We don't see much value to encrypting vm, fsm, pg_xact, pg_multixact, or
> other files.  Is that correct?  Do any other PGDATA files contain user
> data?

As others have said, that sounds wrong to me.  I think you need to
encrypt everything.


That isn't what other database systems do though and isn't what people
actually asking for this feature are expecting to have or deal with.

People who are looking for 'encrypt all the things' should and will be
looking at filesystem-level encryption options.  That's not what this
feature is about.



That's almost certainly not true, at least not universally.

It may be true for some people, but a lot of the people asking for
in-database encryption essentially want to do filesystem encryption but
can't use it for various reasons. E.g. because they're running in
environments that make filesystem encryption impossible to use (OS not
supporting it directly, no access to the block device, lack of admin
privileges, ...). Or maybe they worry about people with fs access.

If you look at the two threads discussing the FDE design, both of them
pretty much started as "let's do FDE in the database".


I'm not sold on the comments that have been made about encrypting the
server log. I agree that could leak data, but that seems like somebody
else's problem: the log files aren't really under PostgreSQL's
management in the same way as pg_clog is. If you want to secure your
logs, send them to syslog and configure it to do whatever you need.


I agree with this.



I don't. I know it's not an easy problem to solve, but it may contain
user data (which is what we manage). We may allow disabling that, at
which point it becomes someone else's problem.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: fairywren failures

2019-10-03 Thread Andres Freund
Hi,

On 2019-10-03 08:18:42 -0700, Andres Freund wrote:
> On 2019-10-03 10:21:13 -0400, Andrew Dunstan wrote:
> > My new msys2 animal fairywren has had 3 recent failures when checking
> > pg_upgrade. The failures have been while running the regression tests,
> > specifically the interval test, and they all look like this:
> > 
> > 
> > 2019-10-03 05:36:00.373 UTC [24272:43] LOG:  server process (PID 23756) was 
> > terminated by exception 0xC0000028
> > 2019-10-03 05:36:00.373 UTC [24272:44] DETAIL:  Failed process was running: 
> > INSERT INTO INTERVAL_TBL (f1) VALUES ('badly formatted interval');
> > 
> > 
> > That error is "bad stack"
> 
> > The failures have been on REL_12_STABLE (twice) and master (once).
> > However, they are not consistent (REL_12_STABLE is currently green).
> > 
> > 
> > The interval test itself hasn't changed for more than 2 years, and I
> > haven't found any obvious recent change that might cause the problem. I
> > guess it could be a compiler bug ... this is gcc 9.2.0, which is the
> > current release.
> 
> This is around where an error is thrown:
>  -- badly formatted interval
>  INSERT INTO INTERVAL_TBL (f1) VALUES ('badly formatted interval');
> -ERROR:  invalid input syntax for type interval: "badly formatted interval"
> -LINE 1: INSERT INTO INTERVAL_TBL (f1) VALUES ('badly formatted inter...
> -  ^
> 
> and the error is stack related. So I suspect that setjmp/longjmp might
> be to blame here, and somehow don't save/restore the stack into a proper
> state. I don't know enough about mingw/msys/windows to know whether that
> uses a self-written setjmp or relies on the MS implementation.
> 
> If you could gather a backtrace it might help us. It's possible that the
> stack is "just" misaligned or something, we had problems with that
> before (IIRC valgrind didn't always align stacks correctly for processes
> that forked from within a signal handler, which then crashed when using
> instructions with alignment requirements, but only sometimes, because
> the stack could be aligned).

It seems we're not the only ones hitting this:
https://rt.perl.org/Public/Bug/Display.html?id=133603

Doesn't look like they've really narrowed it down that much yet.

- Andres




Re: Regarding extension

2019-10-03 Thread Tomas Vondra

On Thu, Oct 03, 2019 at 07:51:04PM +0530, Natarajan R wrote:

Thanks for your response Euler.

1)
"id" i meant by database id

I make my question simple, " during pg_init i want to get databaseid's in
which my extension is installed... "
1. by using pg_database and pg_extension catalogs
2. if there any other way, kindly suggest me.



Well, there's also MyDatabaseId variable, which tells you the OID of the
current database. So you can use that, from the C code. In SQL, you can
simply run "SELECT current_database()" or something like that.



2)
I have one sql file which will be loaded during create extension, in that
file only i have code for event trigger for create extension on
ddl_command_end event
My question is "When giving create extension, sql file will be loaded at
that time only, if that is the case, this event trigger will be invoked or
not? "



I'm not sure I understand the question. Are you asking if the event
trigger will be invoked to notify you about creation of the extension
containing it? I'm pretty sure that won't happen - it will be executed
only for future CREATE EXTENSION commands.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Hooks for session start and end, take two

2019-10-03 Thread Mike Palmiotto
On Thu, Oct 3, 2019 at 11:09 AM Andres Freund  wrote:
>
> 
> I think what would need to be designed is something more like
> RegisterXactCallback():
>
> /*
>  *  start- and end-of-transaction callbacks for dynamically loaded modules
>  */
> typedef enum
> {
> XACT_EVENT_COMMIT,
> XACT_EVENT_PARALLEL_COMMIT,
> XACT_EVENT_ABORT,
> XACT_EVENT_PARALLEL_ABORT,
> XACT_EVENT_PREPARE,
> XACT_EVENT_PRE_COMMIT,
> XACT_EVENT_PARALLEL_PRE_COMMIT,
> XACT_EVENT_PRE_PREPARE
> } XactEvent;
>
> typedef void (*XactCallback) (XactEvent event, void *arg);
>
> extern void RegisterXactCallback(XactCallback callback, void *arg);
> extern void UnregisterXactCallback(XactCallback callback, void *arg);
>
>
> which would be called at various parts of a process's lifetime. Maybe
> something like:
>
> 1) shortly after process creation
> 2) shortly after shared memory attach
> 3) after database connection has been established
> 4) after database writes are possible (this excludes database-less walsenders)
> 5a) before closing database connection due to fatal error
> (with a note saying that no writes are ever allowed)
> 5b) before closing database connection due to normal exit
> (with a note saying that errors better be handled, and that no
> transaction may escape)
> 6) before shared memory detach
> 7) before final exit

This suggestion really resonates with me, as I had considered
something similar in the process centralization patchset (CF entry
linked upthread and discussion here:
https://www.postgresql.org/message-id/flat/CAMN686FE0OdZKp9YPO=htC6LnA6aW4r-+jq=3q5raofqgw8...@mail.gmail.com).
It would be possible to build such a mechanism into that existing
architecture rather easily. Has anyone had a chance to review that
work?

Thanks,
-- 
Mike Palmiotto
https://crunchydata.com




Re: Transparent Data Encryption (TDE) and encrypted files

2019-10-03 Thread Stephen Frost
Greetings,

* Tomas Vondra (tomas.von...@2ndquadrant.com) wrote:
> On Thu, Oct 03, 2019 at 10:40:40AM -0400, Stephen Frost wrote:
> >People who are looking for 'encrypt all the things' should and will be
> >looking at filesystem-level encryption options.  That's not what this
> >feature is about.
> 
> That's almost certainly not true, at least not universally.
> 
> >>It may be true for some people, but a lot of the people asking for
> in-database encryption essentially want to do filesystem encryption but
> can't use it for various reasons. E.g. because they're running in
> environments that make filesystem encryption impossible to use (OS not
> supporting it directly, no access to the block device, lack of admin
> privileges, ...). Or maybe they worry about people with fs access.

Anyone coming from other database systems isn't asking for that though
and it wouldn't be a comparable offering to other systems.

> If you look at how the two threads discussing the FDE design, both of
> them pretty much started as "let's do FDE in the database".

And that's how some folks continue to see it- let's just encrypt all the
things, until they actually look at it and start thinking about what
that means and how to implement it.

Yeah, it'd be great to just encrypt everything, with a bunch of
different keys, all of which are stored somewhere else, and can be
updated and changed by the user when they need to do a rekeying, but
then you have to start asking about which keys need to be available, and
when, for doing crash recovery, how do you handle a crash in the middle of a
rekeying, how do you handle updating keys from the user, etc.

Sure, we could offer a dead simple "here, use this one key at database
start to just encrypt everything" and that would be enough for some set
of users (a very small set, imv, but that's subjective, obviously), but
I don't think we could dare promote that as having TDE because it
wouldn't be at all comparable to what other databases have, and it
wouldn't materially move us in the direction of having real TDE.

> >>I'm not sold on the comments that have been made about encrypting the
> >>server log. I agree that could leak data, but that seems like somebody
> >>else's problem: the log files aren't really under PostgreSQL's
> >>management in the same way as pg_clog is. If you want to secure your
> >>logs, send them to syslog and configure it to do whatever you need.
> >
> >I agree with this.
> 
> I don't. I know it's not an easy problem to solve, but it may contain
> user data (which is what we manage). We may allow disabling that, at
> which point it becomes someone else's problem.

We also send user data to clients, but I don't imagine we're suggesting
that we need to control what some downstream application does with that
data or how it gets stored.  There's definitely a lot of room for
improvement in our logging (in an ideal world, we'd have a way to
actually store the logs in the database, at which point it could be
encrypted or not that way...), but I'm not seeing the need for us to
have a way to encrypt the log files.  If we did encrypt them, we'd have
to make sure to do it in a way that users could still access them
without the database being up and running, which might be tricky if the
key is in the vault...

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Transparent Data Encryption (TDE) and encrypted files

2019-10-03 Thread Peter Eisentraut
On 2019-10-03 16:40, Stephen Frost wrote:
>> As others have said, that sounds wrong to me.  I think you need to
>> encrypt everything.
> That isn't what other database systems do though and isn't what people
> actually asking for this feature are expecting to have or deal with.

It is what some other database systems do.  Perhaps some others don't.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Transparent Data Encryption (TDE) and encrypted files

2019-10-03 Thread Stephen Frost
Greetings,

* Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote:
> On 2019-10-03 16:40, Stephen Frost wrote:
> >> As others have said, that sounds wrong to me.  I think you need to
> >> encrypt everything.
> > That isn't what other database systems do though and isn't what people
> > actually asking for this feature are expecting to have or deal with.
> 
> It is what some other database systems do.  Perhaps some others don't.

I looked at the contemporary databases and provided details about all of
them earlier in the thread.  Please feel free to review that and let me
know if your research shows differently.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Improving on MAX_CONVERSION_GROWTH

2019-10-03 Thread Tom Lane
I wrote:
> In the meantime, I still think we should commit what I proposed in the
> other thread (<974.1569356...@sss.pgh.pa.us>), or something close to it.
> Andres' proposal would perhaps be an improvement on that, but I don't
> think it'll be ready anytime soon; and for sure we wouldn't risk
> back-patching it, while I think we could back-patch what I suggested.
> In any case, that patch is small enough that dropping it would be no big
> loss if a better solution comes along.

Not having heard any objections, I'll proceed with that.  Andres is
welcome to work on replacing it with his more-complicated idea...

> Also, as far as the immediate subject of this thread is concerned,
> I'm inclined to get rid of MAX_CONVERSION_GROWTH in favor of using
> the target encoding's max char length.

I realized after re-reading the comment for MAX_CONVERSION_GROWTH that
this thread is based on a false premise, namely that encoding conversions
always produce one "character" out per "character" in.  In the presence of
combining characters and suchlike, that premise fails, and it becomes
quite unclear just what the max growth ratio actually is.  So I'm going
to leave that alone for now.  Maybe this point is an argument for pushing
forward with Andres' approach, but I'm still dubious about the overall
cost/benefit ratio of that concept.

regards, tom lane




Re: fairywren failures

2019-10-03 Thread Andres Freund
Hi,

On 2019-10-03 08:23:49 -0700, Andres Freund wrote:
> On 2019-10-03 08:18:42 -0700, Andres Freund wrote:
> > This is around where an error is thrown:
> >  -- badly formatted interval
> >  INSERT INTO INTERVAL_TBL (f1) VALUES ('badly formatted interval');
> > -ERROR:  invalid input syntax for type interval: "badly formatted interval"
> > -LINE 1: INSERT INTO INTERVAL_TBL (f1) VALUES ('badly formatted inter...
> > -  ^
> >
> > and the error is stack related. So I suspect that setjmp/longjmp might
> > be to blame here, and somehow don't save/restore the stack into a proper
> > state. I don't know enough about mingw/msys/windows to know whether that
> > uses a self-written setjmp or relies on the MS implementation.
> >
> > If you could gather a backtrace it might help us. It's possible that the
> > stack is "just" misaligned or something, we had problems with that
> > before (IIRC valgrind didn't always align stacks correctly for processes
> > that forked from within a signal handler, which then crashed when using
> > instructions with alignment requirements, but only sometimes, because
> > the stack coiuld be aligned).
>
> It seems we're not the only ones hitting this:
> https://rt.perl.org/Public/Bug/Display.html?id=133603
>
> Doesn't look like they've really narrowed it down that much yet.

A few notes:

* As an experiment, it could be worthwhile to try to redefine
  sigsetjmp/longjmp/sigjmp_buf with what
  https://gcc.gnu.org/onlinedocs/gcc/Nonlocal-Gotos.html
  provides; it's apparently a separate implementation from the MS CRT one
  (rough sketch after this list).

* Arguably
  "Do not use longjmp to transfer control from a callback routine
  invoked directly or indirectly by Windows code."
  and
  "Do not use longjmp to transfer control out of an interrupt-handling
  routine unless the interrupt is caused by a floating-point
  exception. In this case, a program may return from an interrupt
  handler via longjmp if it first reinitializes the floating-point math
  package by calling _fpreset."

  from 
https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/longjmp?view=vs-2019

  might be violated by our signal emulation on Windows. But I've
  not looked into that in detail.

* Any chance you could get the pre-processed source for postgres.c or
  such? I'm kinda wondering if the definition of setjmp() that we get
  includes the returns_twice attribute that gcc wants to see, and
  whether we're picking up the mingw version of longjmp, or the windows
  one.

  
https://sourceforge.net/p/mingw-w64/mingw-w64/ci/844cb490ab2cc32ac3df5914700564b2e40739d8/tree/mingw-w64-headers/crt/setjmp.h#l31

* It's certainly curious that the failures so far have only happened as
  part of pg_upgradeCheck, rather than the plain regression tests.
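
As a rough illustration of the first point, the experiment could look
something like this (entirely untested sketch; the pg_expt_* names are made
up, only the __builtin_* functions are real GCC features, and
__builtin_longjmp() requires the value argument to be 1):

#include <stdio.h>

typedef void *pg_expt_jmp_buf[5];   /* 5-word buffer format the builtins expect */

#define pg_expt_setjmp(buf)     __builtin_setjmp(buf)
#define pg_expt_longjmp(buf)    __builtin_longjmp((buf), 1)

static pg_expt_jmp_buf buf;

static void
fail(void)
{
    pg_expt_longjmp(buf);       /* must be called from a different function */
}

int
main(void)
{
    if (pg_expt_setjmp(buf) == 0)
        fail();
    else
        printf("came back via the builtin longjmp\n");
    return 0;
}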

Greetings,

Andres Freund




Re: Improving on MAX_CONVERSION_GROWTH

2019-10-03 Thread Andres Freund
Hi,

On 2019-10-03 12:12:40 -0400, Tom Lane wrote:
> I wrote:
> > In the meantime, I still think we should commit what I proposed in the
> > other thread (<974.1569356...@sss.pgh.pa.us>), or something close to it.
> > Andres' proposal would perhaps be an improvement on that, but I don't
> > think it'll be ready anytime soon; and for sure we wouldn't risk
> > back-patching it, while I think we could back-patch what I suggested.
> > In any case, that patch is small enough that dropping it would be no big
> > loss if a better solution comes along.
> 
> Not having heard any objections, I'll proceed with that.  Andres is
> welcome to work on replacing it with his more-complicated idea...

Yea, what I'm proposing is clearly not backpatchable. So +1


> Maybe this point is an argument for pushing forward with Andres'
> approach, but I'm still dubious about the overall cost/benefit ratio
> of that concept.

I think if it were just for MAX_CONVERSION_GROWTH, I'd be inclined to
agree. But I think it has other advantages, so I'm mildly positive that
it'll be an overall win...

Greetings,

Andres Freund




Re: Transparent Data Encryption (TDE) and encrypted files

2019-10-03 Thread Tomas Vondra

On Thu, Oct 03, 2019 at 11:51:41AM -0400, Stephen Frost wrote:

Greetings,

* Tomas Vondra (tomas.von...@2ndquadrant.com) wrote:

On Thu, Oct 03, 2019 at 10:40:40AM -0400, Stephen Frost wrote:
>People who are looking for 'encrypt all the things' should and will be
>looking at filesystem-level encryption options.  That's not what this
>feature is about.

That's almost certainly not true, at least not universally.

It may be true for some people, but a lot of the people asking for
in-database encryption essentially want to do filesystem encryption but
can't use it for various reasons. E.g. because they're running in
environments that make filesystem encryption impossible to use (OS not
supporting it directly, no access to the block device, lack of admin
privileges, ...). Or maybe they worry about people with fs access.


Anyone coming from other database systems isn't asking for that though
and it wouldn't be a comparable offering to other systems.



I don't think that's quite accurate. In the previous message you claimed
(1) this isn't what other database systems do and (2) people who want to
encrypt everything should just use fs encryption, because that's not
what TDE is about.

Regarding (1), I'm pretty sure Oracle TDE does pretty much exactly this,
at least in the mode with tablespace-level encryption. It's true there
is also column-level mode, but from my experience it's far less used
because it has a number of annoying limitations.

So I'm somewhat puzzled by your claim that people coming from other
systems are asking for the column-level mode. At least I'm assuming
that's what they're asking for, because I don't see other options.


If you look at how the two threads discussing the FDE design, both of
them pretty much started as "let's do FDE in the database".


And that's how some folks continue to see it- let's just encrypt all the
things, until they actually look at it and start thinking about what
that means and how to implement it.



This argument also works the other way, though. On Oracle, people often
start with the column-level encryption because it seems naturally
superior (hey, I can encrypt just the columns I want, ...) and then they
start running into the various limitations and eventually just switch to
the tablespace-level encryption.

Now, maybe we'll be able to solve those limitations - but I think it's
pretty unlikely, because those limitations seem quite inherent to how
encryption affects indexes etc.


Yeah, it'd be great to just encrypt everything, with a bunch of
different keys, all of which are stored somewhere else, and can be
updated and changed by the user when they need to do a rekeying, but
then you start have to asking about what keys need to be available when
for doing crash recovery, how do you handle a crash in the middle of a
rekeying, how do you handle updating keys from the user, etc..

Sure, we could offer a dead simple "here, use this one key at database
start to just encrypt everything" and that would be enough for some set
of users (a very small set, imv, but that's subjective, obviously), but
I don't think we could dare promote that as having TDE because it
wouldn't be at all comparable to what other databases have, and it
wouldn't materially move us in the direction of having real TDE.



I think that very much depends on the definition of "real TDE".  I don't
know exactly what that means at this point. And as I said before, I think
such a simple mode *is* comparable to (at least some) solutions available
in other databases (as explained above).

As for the users, I don't have any objective data about this, but I think
the number of people wanting such a simple solution is non-trivial.
That does not mean we can't extend it to support more advanced features.


>>I'm not sold on the comments that have been made about encrypting the
>>server log. I agree that could leak data, but that seems like somebody
>>else's problem: the log files aren't really under PostgreSQL's
>>management in the same way as pg_clog is. If you want to secure your
>>logs, send them to syslog and configure it to do whatever you need.
>
>I agree with this.

I don't. I know it's not an easy problem to solve, but it may contain
user data (which is what we manage). We may allow disabling that, at
which point it becomes someone else's problem.


We also send user data to clients, but I don't imagine we're suggesting
that we need to control what some downstream application does with that
data or how it gets stored.  There's definitely a lot of room for
improvement in our logging (in an ideal world, we'd have a way to
actually store the logs in the database, at which point it could be
encrypted or not that way...), but I'm not seeing the need for us to
have a way to encrypt the log files.  If we did encrypt them, we'd have
to make sure to do it in a way that users could still access them
without the database being up and running, which might be tricky if the
key is in the vault...



That's a 

Re: Transparent Data Encryption (TDE) and encrypted files

2019-10-03 Thread Tomas Vondra

On Thu, Oct 03, 2019 at 11:58:55AM -0400, Stephen Frost wrote:

Greetings,

* Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote:

On 2019-10-03 16:40, Stephen Frost wrote:
>> As others have said, that sounds wrong to me.  I think you need to
>> encrypt everything.
> That isn't what other database systems do though and isn't what people
> actually asking for this feature are expecting to have or deal with.

It is what some other database systems do.  Perhaps some others don't.


I looked at the contemporary databases and provided details about all of
them earlier in the thread.  Please feel free to review that and let me
know if your research shows differently.



I assume you mean this (in one of the other threads):

https://www.postgresql.org/message-id/20190817175217.GE16436%40tamriel.snowman.net

FWIW I don't see anything contradicting the idea of just encrypting
everything (including vm, fsm etc.). The only case that seems to be an
exception is the column-level encryption in Oracle, all the other
options (especially the database-level ones) seem to be consistent with
this principle.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Transparent Data Encryption (TDE) and encrypted files

2019-10-03 Thread Stephen Frost
Greetings,

* Tomas Vondra (tomas.von...@2ndquadrant.com) wrote:
> On Thu, Oct 03, 2019 at 11:51:41AM -0400, Stephen Frost wrote:
> >* Tomas Vondra (tomas.von...@2ndquadrant.com) wrote:
> >>On Thu, Oct 03, 2019 at 10:40:40AM -0400, Stephen Frost wrote:
> >>>People who are looking for 'encrypt all the things' should and will be
> >>>looking at filesystem-level encryption options.  That's not what this
> >>>feature is about.
> >>
> >>That's almost certainly not true, at least not universally.
> >>
> >>It may be true for some people, but a lot of the people asking for
> >>in-database encryption essentially want to do filesystem encryption but
> >>can't use it for various reasons. E.g. because they're running in
> >>environments that make filesystem encryption impossible to use (OS not
> >>supporting it directly, no access to the block device, lack of admin
> >>privileges, ...). Or maybe they worry about people with fs access.
> >
> >Anyone coming from other database systems isn't asking for that though
> >and it wouldn't be a comparable offering to other systems.
> 
> I don't think that's quite accurate. In the previous message you claimed
> (1) this isn't what other database systems do and (2) people who want to
> encrypt everything should just use fs encryption, because that's not
> what TDE is about.
> 
> Regarding (1), I'm pretty sure Oracle TDE does pretty much exactly this,
> at least in the mode with tablespace-level encryption. It's true there
> is also column-level mode, but from my experience it's far less used
> because it has a number of annoying limitations.

We're probably being too general and that's ending up with us talking
past each other.  Yes, Oracle provides tablespace and column level
encryption, but neither case results in *everything* being encrypted.

> So I'm somewhat puzzled by your claim that people coming from other
> systems are asking for the column-level mode. At least I'm assuming
> that's what they're asking for, because I don't see other options.

I've seen asks for tablespace, table, and column-level, but it's always
been about the actual data.  Something like clog is an entirely internal
structure that doesn't include the actual data.  Yes, it's possible it
could somehow be used for a side-channel attack, as could other things,
such as WAL, and as such I'm not sure that forcing a policy of "encrypt
everything" is actually a sensible approach and it definitely adds
complexity and makes it a lot more difficult to come up with a sensible
solution.

> >>If you look at how the two threads discussing the FDE design, both of
> >>them pretty much started as "let's do FDE in the database".
> >
> >And that's how some folks continue to see it- let's just encrypt all the
> >things, until they actually look at it and start thinking about what
> >that means and how to implement it.
> 
> This argument also works the other way, though. On Oracle, people often
> start with the column-level encryption because it seems naturally
> superior (hey, I can encrypt just the columns I want, ...) and then they
> start running into the various limitations and eventually just switch to
> the tablespace-level encryption.
> 
> Now, maybe we'll be able to solve those limitations - but I think it's
> pretty unlikely, because those limitations seem quite inherent to how
> encryption affects indexes etc.

It would probably be useful to discuss the specific limitations that
you've seen causes people to move away from column-level encryption.

I definitely agree that figuring out how to make things work with
indexes is a non-trivial challenge, though I'm hopeful that we can come
up with something sensible.

> >Yeah, it'd be great to just encrypt everything, with a bunch of
> >different keys, all of which are stored somewhere else, and can be
> >updated and changed by the user when they need to do a rekeying, but
> >then you start have to asking about what keys need to be available when
> >for doing crash recovery, how do you handle a crash in the middle of a
> >rekeying, how do you handle updating keys from the user, etc..
> >
> >Sure, we could offer a dead simple "here, use this one key at database
> >start to just encrypt everything" and that would be enough for some set
> >of users (a very small set, imv, but that's subjective, obviously), but
> >I don't think we could dare promote that as having TDE because it
> >wouldn't be at all comparable to what other databases have, and it
> >wouldn't materially move us in the direction of having real TDE.
> 
> I think that very much depends on the definition of what "real TDE".  I
> don't know what exactly that means at this point. And as I said before,
> I think such simple mode *is* comparable to (at least some) solutions
> available in other databases (as explained above).

When I was researching this, I couldn't find any example of a database
that wouldn't start without the one magic key that encrypts everything.
I'm happy to be told that I was wrong in my und

Fix for Bug #16032

2019-10-03 Thread Rob

Hello,

I stumbled on a windows-only bug in pg_basebackup which I've reported as 
#16032 
(https://www.postgresql.org/message-id/16032-4ba56823a2b2805f%40postgresql.org).


I'm pretty sure I've fixed it in the attached patch.

Many Thanks,
Rob

From 148d525c0d30af35abba1b8c5bbe07e4e72ecfec Mon Sep 17 00:00:00 2001
From: Rob Emery 
Date: Tue, 1 Oct 2019 23:51:52 +0100
Subject: [PATCH] Fix bug16032; under windows when the backup is aborted or
 fails, pg_basebackup is unable to cleanup the backup as the filehandles
 haven't been released

---
 src/bin/pg_basebackup/pg_basebackup.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index aa68f59965..b6d68e0256 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -1098,6 +1098,17 @@ ReceiveTarFile(PGconn *conn, PGresult *res, int rownum)
 	{
 		fprintf(stderr, _("%s: could not get COPY data stream: %s"),
 progname, PQerrorMessage(conn));
+		if (strcmp(basedir, "-") != 0)
+		{
+			/*
+			 * File handles could already be closed so we don't care about the result
+			 */
+#ifdef HAVE_LIBZ
+			gzclose(ztarfile);
+#else
+			fclose(tarfile);
+#endif
+		}
 		disconnect_and_exit(1);
 	}
 
@@ -1179,6 +1190,17 @@ ReceiveTarFile(PGconn *conn, PGresult *res, int rownum)
 		{
 			fprintf(stderr, _("%s: could not read COPY data: %s"),
 	progname, PQerrorMessage(conn));
+			if (strcmp(basedir, "-") != 0)
+			{
+/*
+ * File handles could already be closed so we don't care about the result
+ */
+#ifdef HAVE_LIBZ
+gzclose(ztarfile);
+#else
+fclose(tarfile);
+#endif
+			}
 			disconnect_and_exit(1);
 		}
 
-- 
2.11.0



Re: Transparent Data Encryption (TDE) and encrypted files

2019-10-03 Thread Stephen Frost
Greetings,

* Tomas Vondra (tomas.von...@2ndquadrant.com) wrote:
> On Thu, Oct 03, 2019 at 11:58:55AM -0400, Stephen Frost wrote:
> >* Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote:
> >>On 2019-10-03 16:40, Stephen Frost wrote:
>  As others have said, that sounds wrong to me.  I think you need to
>  encrypt everything.
> >>> That isn't what other database systems do though and isn't what people
> >>> actually asking for this feature are expecting to have or deal with.
> >>
> >>It is what some other database systems do.  Perhaps some others don't.
> >
> >I looked at the contemporary databases and provided details about all of
> >them earlier in the thread.  Please feel free to review that and let me
> >know if your research shows differently.
> 
> I assume you mean this (in one of the other threads):
> 
> https://www.postgresql.org/message-id/20190817175217.GE16436%40tamriel.snowman.net
> 
> FWIW I don't see anything contradicting the idea of just encrypting
> everything (including vm, fsm etc.). The only case that seems to be an
> exception is the column-level encryption in Oracle, all the other
> options (especially the database-level ones) seem to be consistent with
> this principle.

I don't think I was arguing specifically about VM/FSM in particular but
rather about things which, for us, are cluster level.  Admittedly, some
other database systems put more things into tablespaces or databases
than we do (it'd sure be nice if we did in some cases too, but we
don't...), but they do also have things *outside* of those, such that
you can at least bring the system up, to some extent, even if you can't
access a given tablespace or database.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Auxiliary Processes and MyAuxProc

2019-10-03 Thread Andres Freund
Hi,

On 2019-09-30 15:13:18 -0400, Mike Palmiotto wrote:
> Attached is the reworked and rebased patch set. I put the hook on top
> and a separate commit for each process type. Note that avworker and
> avlauncher were intentionally left together. Let me know if you think
> those should be split out as well.

I've not looked at this before, nor the thread. Just jumping here
because of the reference to this patch you made in another thread...

In short, I think this is far from ready, nor am I sure this is going in
the right direction. There's a bit more detail about where I think this
ought to go interspersed below (in particular search for PgSubProcess,
EXEC_BACKEND).



> From 205ea86dde898f7edac327d46b2b43b04721aadc Mon Sep 17 00:00:00 2001
> From: Mike Palmiotto 
> Date: Mon, 30 Sep 2019 14:42:53 -0400
> Subject: [PATCH 7/8] Add backends to process centralization

.oO(out of order attached patches, how fun). 7, 8, 6, 5, 4, 2, 3, 1...



> From 2a3a35f2e80cb2badcb0efbce1bad2484e364b7b Mon Sep 17 00:00:00 2001
> From: Mike Palmiotto 
> Date: Fri, 27 Sep 2019 12:28:19 -0400
> Subject: [PATCH 1/8] Add ForkProcType infrastructure

A better explanation would be nice. It's hard to review non-trivial
patches that have little explanation as to what they're
achieving. Especially when there are some unrelated-seeming changes.


> diff --git a/src/backend/bootstrap/bootstrap.c 
> b/src/backend/bootstrap/bootstrap.c
> index 9238fbe98d..9f3dad1c6d 100644
> --- a/src/backend/bootstrap/bootstrap.c
> +++ b/src/backend/bootstrap/bootstrap.c
> @@ -70,6 +70,7 @@ static void cleanup(void);
>   */
>
>  AuxProcType MyAuxProcType = NotAnAuxProcess; /* declared in miscadmin.h */
> +ForkProcType MyForkProcType = NoForkProcess; /* declared in fork_process.h */
>
>  Relation boot_reldesc;   /* current relation descriptor
>  */

IMO, before this stuff gets massaged meaningfully, we ought to move code
like this (including AuxiliaryProcessMain etc) somewhere where it makes
sense. If we're going to rejigger things, it ought to be towards a more
understandable architecture, rather than keeping weird quirks around.

I'm inclined to think that this better also include properly
centralizing a good portion of the EXEC_BACKEND code. Building
infrastructure that's supposed to be long-lived with that being left as
is just seems like recipe for yet another odd framework.


> @@ -538,11 +539,11 @@ static void ShmemBackendArrayAdd(Backend *bn);
>  static void ShmemBackendArrayRemove(Backend *bn);
>  #endif   /* EXEC_BACKEND 
> */
>
> -#define StartupDataBase()StartChildProcess(StartupProcess)
> -#define StartBackgroundWriter() StartChildProcess(BgWriterProcess)
> -#define StartCheckpointer()  StartChildProcess(CheckpointerProcess)
> -#define StartWalWriter() StartChildProcess(WalWriterProcess)
> -#define StartWalReceiver()   StartChildProcess(WalReceiverProcess)
> +#define StartupDataBase()StartChildProcess(StartupFork)
> +#define StartBackgroundWriter() StartChildProcess(BgWriterFork)
> +#define StartCheckpointer()  StartChildProcess(CheckpointerFork)
> +#define StartWalWriter() StartChildProcess(WalWriterFork)
> +#define StartWalReceiver()   StartChildProcess(WalReceiverFork)

FWIW, I'd just rip this out, it adds exactly nothing but one more place
to change.


> +/*
> + * PrepAuxProcessFork
> + *
> + * Prepare a ForkProcType struct for the auxiliary process specified by

ForkProcData, not ForkProcType


> + * AuxProcType. This does all prep related to av parameters and error 
> messages.
> + */
> +static void
> +PrepAuxProcessFork(ForkProcData *aux_fork)
> +{
> + int ac = 0;
> +
> + /*
> +  * Set up command-line arguments for subprocess
> +  */
> + aux_fork->av[ac++] = pstrdup("postgres");

There's no documentation as to what 'av' is actually supposed to mean. I
assume auxvector or such, but if so that's neither obvious, nor
necessarily good.


> +#ifdef EXEC_BACKEND
> + aux_fork->av[ac++] = pstrdup("--forkboot");
> + aux_fork->av[ac++] = NULL;  /* filled in by 
> postmaster_forkexec */
> +#endif

What's the point of pstrdup()ing constant strings here? Afaict you're
not freeing them ever?


> + aux_fork->av[ac++] = psprintf("-x%d", MyForkProcType);
> +
> + aux_fork->av[ac] = NULL;
> + Assert(ac < lengthof(*aux_fork->av));

Unless I miss something, the lengthof here doesn't do anything
useful. ForkProcData->av is defined as:

> +typedef struct ForkProcData
> +{
> +   char   **av;

which I think means you're computing sizeof(char*)/sizeof(char)?
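
A tiny standalone illustration of that point (not from the patch; the
lengthof macro from c.h spelled out explicitly):

#include <stdio.h>

#define lengthof(array) (sizeof (array) / sizeof ((array)[0]))

int
main(void)
{
    char      **av = NULL;

    /* sizeof() does not evaluate its operand, so the NULL is never dereferenced */
    printf("%zu\n", lengthof(*av));     /* prints sizeof(char *), e.g. 8 */
    return 0;
}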



> + aux_fork->ac = ac;
> + switch (MyForkProcType)
> + {
> + case StartupProcess:
> + aux_fork->type_desc = pstrdup("startup");
> + break;
> + case BgWriterProcess:
> + aux_

Re: dropdb --force

2019-10-03 Thread vignesh C
On Wed, Oct 2, 2019 at 10:21 PM Pavel Stehule  wrote:
>
> Thank you for careful review. I hope so all issues are out.
>
>
Thanks Pavel for fixing the comments.
A few comments:
The else part cannot be hit in the DropDatabase function, as gram.y only accepts FORCE.
+
+ if (strcmp(opt->defname, "force") == 0)
+ force = true;
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("unrecognized DROP DATABASE option \"%s\"", opt->defname),
+ parser_errposition(pstate, opt->location)));
+ }
+

We should change gram.y to accept any keyword and throw the error from
the DropDatabase function.
+ */
+drop_option: FORCE
+ {
+ $$ = makeDefElem("force", NULL, @1);
+ }
+ ;

"This will also fail" should be "This will fail"
+ 
+  This will fail, if current user has no permissions to terminate other
+  connections. Required permissions are the same as with
+  pg_terminate_backend, described
+  in .
+
+  This will also fail if we are not able to terminate connections or
+  when there are active prepared transactions or active logical replication
+  slots.
+ 

Can we add a few tests for this feature?

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Re: Auxiliary Processes and MyAuxProc

2019-10-03 Thread Mike Palmiotto
On Thu, Oct 3, 2019 at 1:31 PM Andres Freund  wrote:
>
> Hi,
>
> On 2019-09-30 15:13:18 -0400, Mike Palmiotto wrote:
> > Attached is the reworked and rebased patch set. I put the hook on top
> > and a separate commit for each process type. Note that avworker and
> > avlauncher were intentionally left together. Let me know if you think
> > those should be split out as well.
>
> I've not looked at this before, nor the thread. Just jumping here
> because of the reference to this patch you made in another thread...
>
> In short, I think this is far from ready, nor am I sure this is going in
> the right direction. There's a bit more detail about where I think this
> ought to go interspersed below (in particular search for PgSubProcess,
> EXEC_BACKEND).

Great, thanks for the review!

>
>
>
> > From 205ea86dde898f7edac327d46b2b43b04721aadc Mon Sep 17 00:00:00 2001
> > From: Mike Palmiotto 
> > Date: Mon, 30 Sep 2019 14:42:53 -0400
> > Subject: [PATCH 7/8] Add backends to process centralization
>
> .oO(out of order attached patches, how fun). 7, 8, 6, 5, 4, 2, 3, 1...

Yeah, hm. Not really sure why/how that happened.

>
>
>
> > From 2a3a35f2e80cb2badcb0efbce1bad2484e364b7b Mon Sep 17 00:00:00 2001
> > From: Mike Palmiotto 
> > Date: Fri, 27 Sep 2019 12:28:19 -0400
> > Subject: [PATCH 1/8] Add ForkProcType infrastructure
>
> A better explanation would be nice. It's hard to review non-trivial
> patches that have little explanation as to what they're
> achieving. Especially when there's some unrelated seeming changes.

I should have maintained the initial patch description from the last
cut -- there is a lot more detail there. I will pull that back in
after addressing the other concerns. :)

>
>
> > diff --git a/src/backend/bootstrap/bootstrap.c 
> > b/src/backend/bootstrap/bootstrap.c
> > index 9238fbe98d..9f3dad1c6d 100644
> > --- a/src/backend/bootstrap/bootstrap.c
> > +++ b/src/backend/bootstrap/bootstrap.c
> > @@ -70,6 +70,7 @@ static void cleanup(void);
> >   */
> >
> >  AuxProcType MyAuxProcType = NotAnAuxProcess; /* declared in miscadmin.h */
> > +ForkProcType MyForkProcType = NoForkProcess; /* declared in fork_process.h 
> > */
> >
> >  Relation boot_reldesc;   /* current relation descriptor
> >  */
>
> IMO, before this stuff gets massaged meaningfully, we ought to move code
> like this (including AuxiliaryProcessMain etc) somewhere where it makes
> sense. If we're going to rejigger things, it ought to be towards a more
> understandable architecture, rather than keeping weird quirks around.

Sounds good. I'll address this in the next round.

>
> I'm inclined to think that this better also include properly
> centralizing a good portion of the EXEC_BACKEND code. Building
> infrastructure that's supposed to be long-lived with that being left as
> is just seems like recipe for yet another odd framework.

+1

>
>
> > @@ -538,11 +539,11 @@ static void ShmemBackendArrayAdd(Backend *bn);
> >  static void ShmemBackendArrayRemove(Backend *bn);
> >  #endif   /* 
> > EXEC_BACKEND */
> >
> > -#define StartupDataBase()StartChildProcess(StartupProcess)
> > -#define StartBackgroundWriter() StartChildProcess(BgWriterProcess)
> > -#define StartCheckpointer()  StartChildProcess(CheckpointerProcess)
> > -#define StartWalWriter() StartChildProcess(WalWriterProcess)
> > -#define StartWalReceiver()   StartChildProcess(WalReceiverProcess)
> > +#define StartupDataBase()StartChildProcess(StartupFork)
> > +#define StartBackgroundWriter() StartChildProcess(BgWriterFork)
> > +#define StartCheckpointer()  StartChildProcess(CheckpointerFork)
> > +#define StartWalWriter() StartChildProcess(WalWriterFork)
> > +#define StartWalReceiver()   StartChildProcess(WalReceiverFork)
>
> FWIW, I'd just rip this out, it adds exactly nothing but one more place
> to change.

+1

>
>
> > +/*
> > + * PrepAuxProcessFork
> > + *
> > + * Prepare a ForkProcType struct for the auxiliary process specified by
>
> ForkProcData, not ForkProcType
>
>
> > + * AuxProcType. This does all prep related to av parameters and error 
> > messages.
> > + */
> > +static void
> > +PrepAuxProcessFork(ForkProcData *aux_fork)
> > +{
> > + int ac = 0;
> > +
> > + /*
> > +  * Set up command-line arguments for subprocess
> > +  */
> > + aux_fork->av[ac++] = pstrdup("postgres");
>
> There's no documentation as to what 'av' is actually supposed to mean. I
> assume auxvector or such, but if so that's neither obvious, nor
> necessarily good.

Got it, better variable names here.

>
>
> > +#ifdef EXEC_BACKEND
> > + aux_fork->av[ac++] = pstrdup("--forkboot");
> > + aux_fork->av[ac++] = NULL;  /* filled in by 
> > postmaster_forkexec */
> > +#endif
>
> What's the point of pstrdup()ing constant strings here? Afaict you're
> not freeing them ever?

The idea was supposed to be that the prep work lives up until the fork
happens and then is freed en masse with the rest of the MemoryContext.
Perhaps there was some oversight here. I'll revisit this and others in
the next pass.

Re: Wrong results using initcap() with non normalized string

2019-10-03 Thread Juan José Santamaría Flecha
On Sun, Sep 29, 2019 at 3:38 AM Alvaro Herrera  wrote:
>
> The UTF8 bits looks reasonable to me.  I guess the other part of that
> question is whether we support any other multibyte encoding that
> supports combining characters.  Maybe for cases other than UTF8 we can
> test for 0-width chars (using pg_encoding_dsplen() perhaps?) and drive
> the upper/lower decision off that?  (For the UTF8 case, I don't know if
> Juanjo's proposal is better than pg_encoding_dsplen.  Both seem to boil
> down to a bsearch, though unicode_norm.c's table seems much larger than
> wchar.c's).
>

Using pg_encoding_dsplen() looks like the way to go. The normalization
logic included in ucs_wcwidth() already does what is needed to avoid the
issue, so there is no need to use unicode_norm_table.h. UTF8 is the
only multibyte encoding that can return a 0-width dsplen, so this
approach would also work for all the other encodings that do not use
combining characters.

Please find attached a patch with this approach.

Regards,

Juan José Santamaría Flecha
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index f7175df..8af62d2 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -1947,7 +1947,10 @@ str_initcap(const char *buff, size_t nbytes, Oid collid)
 wchar_t*workspace;
 size_t		curr_char;
 size_t		result_size;
+int			encoding;
+char		wdsplen[MAX_MULTIBYTE_CHAR_LEN];
 
+encoding = GetDatabaseEncoding();
 /* Overflow paranoia */
 if ((nbytes + 1) > (INT_MAX / sizeof(wchar_t)))
 	ereport(ERROR,
@@ -1968,7 +1971,9 @@ str_initcap(const char *buff, size_t nbytes, Oid collid)
 			workspace[curr_char] = towlower_l(workspace[curr_char], mylocale->info.lt);
 		else
 			workspace[curr_char] = towupper_l(workspace[curr_char], mylocale->info.lt);
-		wasalnum = iswalnum_l(workspace[curr_char], mylocale->info.lt);
+		wchar2char(wdsplen, &workspace[curr_char], MAX_MULTIBYTE_CHAR_LEN, mylocale);
+		if (pg_encoding_dsplen(encoding, wdsplen) != 0)
+			wasalnum = iswalnum_l(workspace[curr_char], mylocale->info.lt);
 	}
 	else
 #endif
@@ -1977,7 +1982,9 @@ str_initcap(const char *buff, size_t nbytes, Oid collid)
 			workspace[curr_char] = towlower(workspace[curr_char]);
 		else
 			workspace[curr_char] = towupper(workspace[curr_char]);
-		wasalnum = iswalnum(workspace[curr_char]);
+		wchar2char(wdsplen, &workspace[curr_char], MAX_MULTIBYTE_CHAR_LEN, mylocale);
+		if (pg_encoding_dsplen(encoding, wdsplen) != 0)
+			wasalnum = iswalnum(workspace[curr_char]);
 	}
 }
 


Re: Auxiliary Processes and MyAuxProc

2019-10-03 Thread Andres Freund
Hi,

On 2019-10-03 14:33:26 -0400, Mike Palmiotto wrote:
> > > +#ifdef EXEC_BACKEND
> > > + aux_fork->av[ac++] = pstrdup("--forkboot");
> > > + aux_fork->av[ac++] = NULL;  /* filled in by 
> > > postmaster_forkexec */
> > > +#endif
> >
> > What's the point of pstrdup()ing constant strings here? Afaict you're
> > not freeing them ever?
>
> The idea was supposed to be that the prep work lives up until the fork
> happens and then is freed en masse with the rest of the MemoryContext.
> Perhaps there was some oversight here. I'll revisit this and others in
> the next pass.

Well, two things:
1) That'd *NEVER* be more efficient than just referencing constant
   strings. Unless you pfree() the variable, and it can sometimes hold
   constant and sometimes non-constant strings, there is simply no
   reason to ever pstrdup a constant string (see the sketch below).
2) Which context are you talking about / thinking of? A forked
   process doing MemoryContextDelete(PostmasterContext); doesn't free
   that memory in postmaster.
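
For illustration, a minimal sketch of what point 1 suggests, reusing the av
array and option strings from the patch hunk quoted above (those names come
from that patch, not from anything in core):

	/*
	 * String literals have static storage duration, so they can be
	 * referenced directly; there is nothing to allocate or free.
	 */
	aux_fork->av[ac++] = "postgres";
#ifdef EXEC_BACKEND
	aux_fork->av[ac++] = "--forkboot";
	aux_fork->av[ac++] = NULL;	/* filled in by postmaster_forkexec */
#endif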


> > > @@ -58,4 +58,8 @@ extern pid_t fork_process(void);
> > >  extern pid_t postmaster_forkexec(ForkProcData *fork_data);
> > >  #endif
> > >
> > > +/* Hook for plugins to get control after a successful fork_process() */
> > > +typedef void (*ForkProcess_post_hook_type) ();
> > > +extern PGDLLIMPORT ForkProcess_post_hook_type ForkProcess_post_hook;
> > > +
> > >  #endif   /* 
> > > FORK_PROCESS_H */
> > > --
> > > 2.23.0
> > >
> >
> > Why do we want libraries to allow to hook into processes like the
> > startup process etc?
>
> There are a number of OS-level process manipulations that this could
> afford you as an extension developer. For instance, you could roll
> your own seccomp implementation (to limit syscalls per-process-type,
> perhaps?), perform some setcap magic, or some other security-related
> magic.

Color me unconvinced.

Greetings,

Andres Freund




consider including server_version in explain(settings)

2019-10-03 Thread Justin Pryzby
explain(SETTINGS) was implemented to show relevant settings for which an odd
value could affect a query but could be forgotten during troubleshooting.

This is a "concept" patch to show the version, which is frequently requested on
the pgsql-performance list and in other support requests.  If someone sends
explain(settings), they don't need to also (remember to) send the version.

postgres=# explain(settings)SELECT;
 Result  (cost=0.00..0.01 rows=1 width=0)
 Settings: server_version_num = '13', work_mem = '128MB'

diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 85ca2b3..2edc83c 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -3143,7 +3143,7 @@ static struct config_int ConfigureNamesInt[] =
{"server_version_num", PGC_INTERNAL, PRESET_OPTIONS,
gettext_noop("Shows the server version as an integer."),
NULL,
-   GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+   GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE | GUC_EXPLAIN
},
&server_version_num,
PG_VERSION_NUM, PG_VERSION_NUM, PG_VERSION_NUM,
@@ -8955,7 +8955,7 @@ get_explain_guc_options(int *num)
}
 
/* skip GUC variables that match the built-in default */
-   if (!modified)
+   if (!modified && strcmp(conf->name, "server_version_num"))
continue;
 
/* assign to the values array */




Re: consider including server_version in explain(settings)

2019-10-03 Thread Tom Lane
Justin Pryzby  writes:
> This is a "concept" patch to show the version, which is frequently requested
> on the pgsql-performance list and in other support requests.  If someone sends
> explain(settings), they don't need to also (remember to) send the version.

I'm not really on board with the proposal at all here; I think it'll
be useless clutter most of the time.  I do not agree with the position
that the only use-case for explain(settings) is performance trouble
reports.  Moreover, if we start including fixed settings then where
do we stop?  People might also want "pg_config" output for example,
and that's surely not reasonable to include in EXPLAIN.

Independently of that, however:
 
>   /* skip GUC variables that match the built-in default */
> - if (!modified)
> + if (!modified && strcmp(conf->name, "server_version_num"))
>   continue;

This is both horribly contorted logic (it could at least do with a
comment) and against project coding conventions (do not use the result
of strcmp() as if it were a boolean).
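
For illustration, a minimal sketch of the conventional spelling, using the
same variables as the quoted hunk (it would still need the explanatory
comment asked for above):

	/* skip GUC variables that match the built-in default, but always
	 * include server_version_num */
	if (!modified && strcmp(conf->name, "server_version_num") != 0)
		continue;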

regards, tom lane




Re: fairywren failures

2019-10-03 Thread Tom Lane
Andres Freund  writes:
> * It's certainly curious that the failures so far only have happended as
>   part of pg_upgradeCheck, rather than the plain regression tests.

Isn't it though.  We spent a long time wondering why we saw parallel
plan instability mostly in pg_upgradeCheck, too [1].  We eventually
decided that the cause of that instability was chance timing collisions
with bgwriter/checkpointer, but nobody ever really explained why
pg_upgradeCheck should be more prone to hit those windows than the plain
tests are.  I feel like there's something still to be understood there.

Whether this is related, who's to say.  But given your thought about
stack alignment, I'm half thinking that the crash is seen when we get a
signal (e.g. SIGUSR1 from sinval processing) at the wrong time, allowing
the stack to become unaligned, and that the still-unexplained timing
difference in pg_upgradeCheck accounts for that test being more prone to
show it.

regards, tom lane

[1] 
https://www.postgresql.org/message-id/20190605050037.ga33...@rfd.leadboat.com




Re: Value of Transparent Data Encryption (TDE)

2019-10-03 Thread David Fetter
On Thu, Oct 03, 2019 at 10:26:15AM -0400, Robert Haas wrote:
> On Tue, Oct 1, 2019 at 12:19 PM Bruce Momjian  wrote:
> > Just to give more detail.  Initially, there was a desire to store
> > keys in only one place, either in the file system or in database
> > tables.  However, it became clear that the needs of booting the
> > server and crash recovery required file system keys, and
> > per-user/db keys were best done at the SQL level, so that indexing
> > can be used, and logical dumps contain the locked keys.  SQL-level
> > storage allows databases to be completely independent of other
> > databases in terms of key storage and usage.
> 
> Wait, we're going to store the encryption keys with the database?

Encryption keys are fine there so long as decryption keys are
separate.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: Connections hang indefinitely while taking a gin index's LWLock buffer_content lock(PG10.7)

2019-10-03 Thread Alexander Korotkov
On Tue, Oct 1, 2019 at 5:55 AM Alexander Korotkov
 wrote:
> On Mon, Sep 30, 2019 at 10:54 PM Peter Geoghegan  wrote:
> >
> > On Sun, Sep 29, 2019 at 8:12 AM Alexander Korotkov
> >  wrote:
> > > I just managed to reproduce this using two sessions on master branch.
> > >
> > > session 1
> > > session 2
> >
> > Was the involvement of the pending list stuff in Chen's example just a
> > coincidence? Can you recreate the problem while eliminating that
> > factor (i.e. while setting fastupdate to off)?
> >
> > Chen's example involved an INSERT that deadlocked against VACUUM --
> > not a SELECT. Is this just a coincidence?
>
> Chen wrote.
>
> > Unfortunately the insert process(run by gcore) held no lwlock, it should be 
> > another process(we did not fetch core file) that hold the lwlock needed for 
> > autovacuum process.
>
> So, he catched backtrace for INSERT and post it for information.  But
> since INSERT has no lwlocks held, it couldn't participate deadlock.
> It was just side waiter.
>
> I've rerun my reproduction case and it still deadlocks.  Just the same
> steps but GIN index with (fastupdate = off).

BTW, while trying to revise the README I found another bug.  It appears to
be possible to reach a deleted page from a downlink.  The reproduction
case is as follows.

session 1
session 2

# create table tmp (ar int[]) with (autovacuum_enabled = false);
# insert into tmp (select '{1}' from generate_series(1,1000) i);
# insert into tmp values ('{1,2}');
# insert into tmp (select '{1}' from generate_series(1,1000) i);
# create index tmp_idx on tmp using gin(ar);

# delete from tmp;

# set max_parallel_workers_per_gather = 0;
/* Breakpoint where entryLoadMoreItems() calls ginFindLeafPage() to
search GIN posting tree  */
gdb> b ginget.c:682
gdb> select * from tmp where ar @> '{1,2}';
gdb> /* step till ReleaseAndReadBuffer() releases a buffer */

# vacuum tmp;

# continue

It also appears that the previous version of the deadlock fix didn't supply
a left sibling to the leftmost child of any page.  As a result, internal
pages were never deleted.  The revised fix is attached as the first patch.

The second patch fixes traversing to a deleted page via a downlink.
Similarly to nbtree, we just always move right if we land on a deleted
page.  Also, it appears that we clear all other flags while marking a
page as deleted, which caused an assert to fire.  With the patch, we just
add the deleted flag without erasing the others.  I also had to remove the
assert that ginStepRight() never steps to a deleted page: if we landed on
a deleted page from a downlink, then we can reach another deleted page by
its rightlink.

I'm planning to continue work on README, comments and commit messages.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


0001-gin_ginDeletePage_ginStepRight_deadlock_fix-2.patch
Description: Binary data


0002-gin-fix-traversing-to-deleted-page-by-downlink-2.patch
Description: Binary data


Re: allocation limit for encoding conversion

2019-10-03 Thread Tom Lane
I wrote:
> [ v2-0001-cope-with-large-encoding-conversions.patch ]
> [ v2-0002-actually-recover-space-in-repalloc.patch ]

I've pushed these patches (after some more review and cosmetic
adjustments) and marked the CF entry closed.  Andres is welcome
to see if he can improve the situation further.

regards, tom lane




Re: allocation limit for encoding conversion

2019-10-03 Thread Alvaro Herrera
On 2019-Oct-03, Tom Lane wrote:

> I wrote:
> > [ v2-0001-cope-with-large-encoding-conversions.patch ]
> > [ v2-0002-actually-recover-space-in-repalloc.patch ]
> 
> I've pushed these patches (after some more review and cosmetic
> adjustments) and marked the CF entry closed.  Andres is welcome
> to see if he can improve the situation further.

Many thanks!

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Transparent Data Encryption (TDE) and encrypted files

2019-10-03 Thread Robert Haas
On Thu, Oct 3, 2019 at 1:29 PM Stephen Frost  wrote:
> I don't think I was arguing specifically about VM/FSM in particular but
> rather about things which, for us, are cluster level.  Admittedly, some
> other database systems put more things into tablespaces or databases
> than we do (it'd sure be nice if we did in some cases too, but we
> don't...), but they do also have things *outside* of those, such that
> you can at least bring the system up, to some extent, even if you can't
> access a given tablespace or database.

It sounds like you're making this up as you go along. The security
ramifications of encrypting a file don't depend on whether that file
is database-level or cluster-level, but rather on whether the contents
could be useful to an attacker. It doesn't seem like it would require
much work at all to construct an argument that a hacker might enjoy
having unfettered access to pg_clog even if no other part of the
database can be read.

My perspective on this feature is, and has always been, that there are
two different things somebody might want, both of which we seem to be
calling "TDE." One is to encrypt every single data page in the cluster
(and possibly things other than data pages, but at least those) with a
single encryption key, much as filesystem encryption would do, but
internal to the database. Contrary to your assertions, such a solution
has useful properties. One is that it will work the same way on any
system where PostgreSQL runs, whereas filesystem encryption solutions
vary. Another is that it does not require the cooperation of the
person who has root in order to set up. A third is that someone with
access to the system does not have automatic and unfettered access to
the database's data; sure, they can get it with enough work, but it's
significantly harder to fish the encryption keys out of the memory
space of a running process than to tar up the data directory that the
filesystem has already decrypted for you. I would personally not care
about any of this based on my own background as somebody who generally
had to set up systems from scratch, starting with buying the
hardware, but in enterprise and government environments they can pose
significant problems.

The other thing people sometimes want is to encrypt some of the data
within the database but not all of it. In my view, trying to implement
this is not a great idea, because it's vastly more complicated than
just encrypting everything with one key. Would I like to have the
feature? Sure. Do I expect that we're going to get that feature any
time soon? Nope. Even the thing I described in the previous paragraph,
as limited as it is, is complicated and could take several release
cycles to get into committable shape. Fine-grained encryption is
probably an order of magnitude more complicated. The problem of
figuring out which keys apply to which objects does not seem to have
any reasonably simple solution, assuming you want something that's
neither insecure nor a badly-done hack.

I am unsure what the thought process is among people, such as
yourself, who are arguing that fine-grained encryption is the only way
to go. It seems like you're determined to refuse a free Honda Civic on
the grounds that it's not a Cadillac. It's not even like accepting the
patch for the Honda Civic solution would somehow block accepting the
Cadillac if that shows up later. It wouldn't. It would just mean that,
unless or until that patch shows up, we'd have something rather than
nothing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Transparent Data Encryption (TDE) and encrypted files

2019-10-03 Thread Stephen Frost
Greetings,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Thu, Oct 3, 2019 at 1:29 PM Stephen Frost  wrote:
> > I don't think I was arguing specifically about VM/FSM in particular but
> > rather about things which, for us, are cluster level.  Admittedly, some
> > other database systems put more things into tablespaces or databases
> > than we do (it'd sure be nice if we did in some cases too, but we
> > don't...), but they do also have things *outside* of those, such that
> > you can at least bring the system up, to some extent, even if you can't
> > access a given tablespace or database.
> 
> It sounds like you're making this up as you go along. 

I'm not surprised, and I doubt that's really got much to do with the
actual topic.

> The security
> ramifications of encrypting a file don't depend on whether that file
> is database-level or cluster-level, but rather on whether the contents
> could be useful to an attacker.

I don't believe that I claimed otherwise.  I agree with this.

> It doesn't seem like it would require
> much work at all to construct an argument that a hacker might enjoy
> having unfettered access to pg_clog even if no other part of the
> database can be read.

The question isn't about what hackers would like to have access to, it's
about what would actually provide them with a channel to get information
that's sensitive, and at what rate.  Perhaps there's an argument to be
made that clog would provide a high enough rate of information that
could be used to glean sensitive information, but that's certainly not
an argument that's been put forth, instead it's the knee-jerk reaction
of "oh goodness, if anything isn't encrypted then hackers will be able
to get access to everything" and that's just not a real argument.

> My perspective on this feature is, and has always been, that there are
> two different things somebody might want, both of which we seem to be
> calling "TDE." One is to encrypt every single data page in the cluster
> (and possibly things other than data pages, but at least those) with a
> single encryption key, much as filesystem encryption would do, but
> internal to the database. 

Making it all up as I go along notwithstanding, I did go look at other
database systems which I considered on-par with PG, shared that
information here, and am basing my comments on that review.

Which database systems have you looked at which have the properties
you're describing above that we should be working hard towards?

> The other thing people sometimes want is to encrypt some of the data
> within the database but not all of it. In my view, trying to implement
> this is not a great idea, because it's vastly more complicated than
> just encrypting everything with one key. 

Which database systems that you'd consider to be on-par with PG, and
which do have TDE, don't have some mechanism for supporting multiple
keys and for encrypting only a subset of the data?

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Partitioning versus autovacuum

2019-10-03 Thread Amit Langote
Hi Greg,

On Tue, Oct 1, 2019 at 4:03 AM Greg Stark  wrote:
>
> Actually -- I'm sorry to followup to myself (twice) -- but that's
> wrong. That Todo item predates the modern partitioning code. It came
> from when the partitioned statistics were added for inheritance trees.
> The resulting comment almost doesn't make sense any more since it
> talks about updates to the parent table and treats them as distinct
> from updates to the children.
>
> In any case it's actually not true any more as updates to the parent
> table aren't even tracked any more -- see below. My modest proposal is
> that we should count any updates that arrive through the parent table
> as mods for both the parent and child.

Yeah, we need to teach autovacuum to consider analyzing partitioned
tables.  That is still a TODO for declarative partitioning.

We do need to weigh the trade-offs here.  In the thread quoted in your
previous email, Tom expresses a concern [1] about ending up doing
excessive work, because partitions would be scanned twice -- first to
collect their own statistics and then to collect the parent's when the
parent table is analyzed.  Maybe if we find a way to calculate
parent's stats from the partitions' stats without scanning the
partitions, that would be great.

Another thing to consider is that users now (as of v11) have the
option of using partitionwise plans.  Consider joining two huge
partitioned tables.  If they are identically partitioned, Postgres
planner considers joining pairs of matching partitions and appending
the outputs of these smaller joins.  In this case, even if the
non-partitionwise join couldn't use hash join, individual smaller
joins could, because partition stats would be up to date.  The
requirements that the tables being joined be identically partitioned
(or be partitioned at all) might be a bit too restrictive though.

> A more ambitious proposal would have updates to the children also
> count against the parent somehow but I'm not sure exactly how. And I'm
> not sure we shouldn't be updating the parent statistics whenever we
> run analyze on a child anyways but again I'm not sure how.

As I mentioned above, we could try to figure out a way to "merge" the
individual partitions' statistics when they're refreshed into the
parent's stats.

Thanks,
Amit

[1] https://www.postgresql.org/message-id/489.1276114285%40sss.pgh.pa.us




Re: Change atoi to strtol in same place

2019-10-03 Thread Joe Nelson
Kyotaro Horiguchi wrote:
> > pg_standby: -k keepfiles could not parse 'hoge' as integer
>
> I didn't checked closely, but -k of pg_standby's message looks
> somewhat strange. Needs a separator?

Good point, how about this:

pg_standby: -k keepfiles: 

> Building a sentense just concatenating multiple nonindependent
> (or incomplete) subphrases makes translation harder.

I could have pg_strtoint64_range() wrap its error messages in _() so
that translators could customize the messages prior to concatenation.

*error = psprintf(_("could not parse '%s' as integer"), str);

Would this suffice?




Re: Proposal: Make use of C99 designated initialisers for nulls/values arrays

2019-10-03 Thread Amit Kapila
On Wed, Oct 2, 2019 at 9:16 PM Tom Lane  wrote:

> Joe Nelson  writes:
> > Isaac Morland wrote:
> >> I hope you'll forgive a noob question. Why does the "After"
> >> initialization for the boolean array have {0} rather than {false}?
>
> > I think using a value other than {0} potentially gives the incorrect
> > impression that the value is used for *all* elements of the
> > array/structure, whereas it is only used for the first element.
>
> There's been something vaguely bothering me about this proposal,
> and I think you just crystallized it.
>
> > Using {false} may encourage the unwary to try
> >   bool foo[2] = {true};
> > which will not set all elements to true.
>
> Right.  I think that in general it's bad practice for an initializer
> to not specify all fields/elements of the target.  It is okay in the
> specific case that we're substituting for a memset(..., 0, ...).
> Perhaps we could make this explicit by using a coding style like
>
> /* in c.h or some such place: */
> #define INIT_ALL_ZEROES  {0}
>
> /* in code: */
> Datum values[N] = INIT_ALL_ZEROES;
>

This is a good idea, but by reading the thread it is not completely clear
if we want to pursue this or want to explore something else or leave the
current code as it is.  Also, if we want to pursue, do we want to
use INIT_ALL_ZEROES for bool arrays as well?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


RE: Proposal: Make use of C99 designated initialisers for nulls/values arrays

2019-10-03 Thread Smith, Peter
From: Amit Kapila  Sent: Friday, 4 October 2019 1:32 PM

>  Also, if we want to pursue, do we want to use INIT_ALL_ZEROES for bool 
>arrays as well?

FYI - In case it went unnoticed - my last patch addresses this by defining 2 
macros:

#define INIT_ALL_ELEMS_ZERO {0}
#define INIT_ALL_ELEMS_FALSE {false}

Kind Regards
--
Peter Smith
Fujitsu Australia


Re: Proposal: Make use of C99 designated initialisers for nulls/values arrays

2019-10-03 Thread Tom Lane
"Smith, Peter"  writes:
> From: Amit Kapila  Sent: Friday, 4 October 2019 1:32 
> PM
>>   Also, if we want to pursue, do we want to use INIT_ALL_ZEROES for bool 
>> arrays as well?

> FYI - In case it went unnoticed - my last patch addresses this by defining 2 
> macros:

> #define INIT_ALL_ELEMS_ZERO   {0}
> #define INIT_ALL_ELEMS_FALSE  {false}

I would say that's 100% wrong.  The entire point here is that it's
memset-equivalent, and therefore writes zeroes, regardless of what
the datatype is.  As a counterexample, the coding you have above
looks a lot like it would work to add

#define INIT_ALL_ELEMS_TRUE {true}

which as previously noted will *not* work.  So I think the
one-size-fits-all approach is what to use.

regards, tom lane




Re: [HACKERS] proposal: schema variables

2019-10-03 Thread Pavel Stehule
Hi

On Sat, Aug 10, 2019 at 9:10 AM Pavel Stehule 
wrote:

> Hi
>
> just rebase
>

fresh rebase

Regards

Pavel


> Regards
>
> Pavel
>


schema-variables-20191004.patch.gz
Description: application/gzip


Re: Memory Accounting

2019-10-03 Thread Tom Lane
So ... why exactly did this patch define MemoryContextData.mem_allocated
as int64?  That seems to me to be doubly wrong: it is not the right width
on 32-bit machines, and it is not the right signedness anywhere.  I think
that field ought to be of type Size (a/k/a size_t, but memnodes.h always
calls it Size).

I let this pass when the patch went in, but now I'm on the warpath
about it, because since c477f3e449 went in, some of the 32-bit buildfarm
members are failing with

2019-10-04 00:41:56.569 CEST [66916:86] pg_regress/_int LOG:  statement: CREATE 
INDEX text_idx on test__int using gist ( a gist__int_ops );
TRAP: FailedAssertion("total_allocated == context->mem_allocated", File: 
"aset.c", Line: 1533)
2019-10-04 00:42:25.505 CEST [63836:11] LOG:  server process (PID 66916) was 
terminated by signal 6: Abort trap
2019-10-04 00:42:25.505 CEST [63836:12] DETAIL:  Failed process was running: 
CREATE INDEX text_idx on test__int using gist ( a gist__int_ops );

What I think is happening is that c477f3e449 allowed this bit in
AllocSetRealloc:

context->mem_allocated += blksize - oldblksize;

to be executed in situations where blksize < oldblksize, where before
that was not possible.  Of course blksize and oldblksize are of type
Size, hence unsigned, so the subtraction result underflows in this
case.  If mem_allocated is of the same width as Size then this does
not matter because the final result wraps around to the proper value,
but if we're going to allow mem_allocated to be wider than Size then
we will need more logic here to add or subtract as appropriate.
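
For what it's worth, here is a minimal standalone sketch of that wraparound
argument, simulating a 32-bit build by using uint32_t in place of Size (all
of the variable names are made up for illustration):

#include <stdint.h>
#include <stdio.h>

int
main(void)
{
	uint32_t	blksize = 100;		/* new, smaller block */
	uint32_t	oldblksize = 200;	/* old, larger block */
	uint32_t	mem_as_size = 1000; /* mem_allocated declared as Size */
	int64_t		mem_as_int64 = 1000;	/* mem_allocated declared as int64 */

	/* blksize - oldblksize underflows to 4294967196 in 32-bit unsigned math */
	mem_as_size += blksize - oldblksize;	/* wraps around again: 900, correct */
	mem_as_int64 += blksize - oldblksize;	/* widened to int64: 4294968196, wrong */

	printf("as Size: %u   as int64: %lld\n",
		   mem_as_size, (long long) mem_as_int64);
	return 0;
}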

(I'm not quite sure why we're not seeing this failure on *all* the
32-bit machines; maybe there's some other factor involved?)

I see no value in defining mem_allocated to be wider than Size.
Yes, the C standard contemplates the possibility that the total
available address space is larger than the largest chunk you can
ever ask malloc for; but nobody has built a platform like that in
this century, and they sucked to program on back in the dark ages
when they did exist.  (I speak from experience.)  I do not think
we need to design Postgres to allow for that.

Likewise, there's no evident value in allowing mem_allocated
to be negative.

I haven't chased down exactly what else would need to change.
It might be that s/int64/Size/g throughout the patch is
sufficient, but I haven't analyzed it.

regards, tom lane




Re: Regarding extension

2019-10-03 Thread Natarajan R
On Thu, 3 Oct 2019 at 20:54, Tomas Vondra 
wrote:

> On Thu, Oct 03, 2019 at 07:51:04PM +0530, Natarajan R wrote:
> >Thanks for your response Euler.
> >
> >1)
> >"id" i meant by database id
> >
> >I make my question simple, " during pg_init i want to get databaseid's in
> >which my extension is installed... "
> >1. by using pg_database and pg_extension catalogs
> >2. if there any other way, kindly suggest me.
> >
>
> Well, there's also MyDatabaseId variable, which tells you the OID of the
> current database. So you can use that, from the C code. In SQL, you can
> simply run "SELECT current_database()" or something like that.
>
> Me:  Thanks Tomas, But this is for that particular database only, I want
to get the *list of database Id's* on which my extension is installed
during *PG_INIT* itself...

>
> >2)
> >I have one sql file which will be loaded during create extension, in that
> >file only i have code for event trigger for create extension on
> >ddl_command_end event
> >My question is "When giving create extension, sql file will be loaded at
> >that time only, if that is the case, this event trigger will be invoked or
> >not? "
> >
>
> I'm not sure I understand the question. Are you asking if the event
> trigger will be invoked to notify you about creation of the extension
> containing it? I'm pretty sure that won't happen - it will be executed
> only for future CREATE EXTENSION commands.
>
> Me: Thanks Tomas. Yeah, what you said above is the way it should behave,
but this trigger has been invoked in postgres 10.0 and not in postgres
10.4. So, I am asking whether any GUC or anything else needs to be enabled
to invoke this type of event trigger in the 10.4 version too.


Re: [HACKERS] Block level parallel vacuum

2019-10-03 Thread Masahiko Sawada
On Thu, Oct 3, 2019 at 9:06 PM Dilip Kumar  wrote:
>
> On Wed, Oct 2, 2019 at 7:29 PM Masahiko Sawada  wrote:
> >
> I have started reviewing this patch and I have some cosmetic comments.
> I will continue the review tomorrow.
>

Thank you for reviewing the patch!

> +This change adds PARALLEL option to VACUUM command that enable us to
> +perform index vacuuming and index cleanup with background
> +workers. Indivisual
>
> /s/Indivisual/Individual/

Fixed.

>
> + * parallel worker processes. Individual indexes is processed by one vacuum
> + * process. At beginning of lazy vacuum (at lazy_scan_heap) we prepare the
>
> /s/Individual indexes is processed/Individual indexes are processed/
> /s/At beginning/ At the beginning

Fixed.

>
> + * parallel workers. In parallel lazy vacuum, we enter parallel mode and
> + * create the parallel context and the DSM segment before starting heap
> + * scan.
>
> Can we extend the comment to explain why we do that before starting
> the heap scan?

Added more comment.

>
> + else
> + {
> + if (for_cleanup)
> + {
> + if (lps->nworkers_requested > 0)
> + appendStringInfo(&buf,
> + ngettext("launched %d parallel vacuum worker for index cleanup
> (planned: %d, requested %d)",
> +   "launched %d parallel vacuum workers for index cleanup (planned:
> %d, requsted %d)",
> +   lps->pcxt->nworkers_launched),
> + lps->pcxt->nworkers_launched,
> + lps->pcxt->nworkers,
> + lps->nworkers_requested);
> + else
> + appendStringInfo(&buf,
> + ngettext("launched %d parallel vacuum worker for index cleanup (planned: 
> %d)",
> +   "launched %d parallel vacuum workers for index cleanup (planned: %d)",
> +   lps->pcxt->nworkers_launched),
> + lps->pcxt->nworkers_launched,
> + lps->pcxt->nworkers);
> + }
> + else
> + {
> + if (lps->nworkers_requested > 0)
> + appendStringInfo(&buf,
> + ngettext("launched %d parallel vacuum worker for index vacuuming
> (planned: %d, requested %d)",
> +   "launched %d parallel vacuum workers for index vacuuming (planned:
> %d, requested %d)",
> +   lps->pcxt->nworkers_launched),
> + lps->pcxt->nworkers_launched,
> + lps->pcxt->nworkers,
> + lps->nworkers_requested);
> + else
> + appendStringInfo(&buf,
> + ngettext("launched %d parallel vacuum worker for index vacuuming
> (planned: %d)",
> +   "launched %d parallel vacuum workers for index vacuuming (planned: %d)",
> +   lps->pcxt->nworkers_launched),
> + lps->pcxt->nworkers_launched,
> + lps->pcxt->nworkers);
> + }
>
> Multiple places I see a lot of duplicate code for for_cleanup is true
> or false.  The only difference is in the error message whether we give
> index cleanup or index vacuuming otherwise complete code is the same
> for
> both the cases.  Can't we create some string and based on the value of
> the for_cleanup and append it in the error message that way we can
> avoid duplicating this at many places?

I think it's necessary for translation. IIUC if we construct the
message it cannot be translated.

Attached the updated patch.

Regards,

--
Masahiko Sawada
From 9eb0b5e4e010783e04882cab4e4bab5063eabc56 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada 
Date: Wed, 23 Jan 2019 16:07:53 +0900
Subject: [PATCH v27 2/2] Add --paralell, -P option to vacuumdb command

---
 doc/src/sgml/ref/vacuumdb.sgml| 16 +++
 src/bin/scripts/t/100_vacuumdb.pl | 10 ++-
 src/bin/scripts/vacuumdb.c| 48 ++-
 3 files changed, 72 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/ref/vacuumdb.sgml b/doc/src/sgml/ref/vacuumdb.sgml
index 47d93456f8..f6ac0c6e5a 100644
--- a/doc/src/sgml/ref/vacuumdb.sgml
+++ b/doc/src/sgml/ref/vacuumdb.sgml
@@ -226,6 +226,22 @@ PostgreSQL documentation
   
  
 
+ 
+  -P workers
+  --parallel=workers
+  
+   
+Execute parallel vacuum with PostgreSQL's
+workers background workers.
+   
+   
+This option will require background workers, so make sure your
+ setting is more
+than one.
+   
+  
+ 
+
  
   -q
   --quiet
diff --git a/src/bin/scripts/t/100_vacuumdb.pl b/src/bin/scripts/t/100_vacuumdb.pl
index b685b35282..8fe80719e8 100644
--- a/src/bin/scripts/t/100_vacuumdb.pl
+++ b/src/bin/scripts/t/100_vacuumdb.pl
@@ -3,7 +3,7 @@ use warnings;
 
 use PostgresNode;
 use TestLib;
-use Test::More tests => 44;
+use Test::More tests => 48;
 
 program_help_ok('vacuumdb');
 program_version_ok('vacuumdb');
@@ -48,6 +48,14 @@ $node->issues_sql_like(
 $node->command_fails(
 	[ 'vacuumdb', '--analyze-only', '--disable-page-skipping', 'postgres' ],
 	'--analyze-only and --disable-page-skipping specified together');
+$node->issues_sql_like(
+	[ 'vacuumdb', '-P2', 'postgres' ],
+	qr/statement: VACUUM \(PARALLEL 2\).*;/,
+	'vacuumdb -P2');
+$node->issues_sql_like(
+	[ 'vacuumdb', '-P', 'postgres' ],
+	qr/statement: VACUUM \(PARALLEL\).*;/,
+	'vacuumdb -P');
 $node->command_ok([qw(vacuumdb -Z --table=pg_am dbname=template1)],
 	'vacuumdb with connection string');
 
diff --g

Re: [HACKERS] Block level parallel vacuum

2019-10-03 Thread Amit Kapila
On Wed, Oct 2, 2019 at 7:29 PM Masahiko Sawada 
wrote:

> On Sat, Sep 21, 2019 at 9:31 PM Amit Kapila 
> wrote:
> > *
> > In function compute_parallel_workers, don't we want to cap the number
> > of workers based on maintenance_work_mem as we do in
> > plan_create_index_workers?
> >
> > The basic point is how do we want to treat maintenance_work_mem for
> > this feature.  Do we want all workers to use at max the
> > maintenance_work_mem or each worker is allowed to use
> > maintenance_work_mem?  I would prefer earlier unless we have good
> > reason to follow the later strategy.
> >
> > Accordingly, we might need to update the below paragraph in docs:
> > "Note that parallel utility commands should not consume substantially
> > more memory than equivalent non-parallel operations.  This strategy
> > differs from that of parallel query, where resource limits generally
> > apply per worker process.  Parallel utility commands treat the
> > resource limit maintenance_work_mem as a limit to
> > be applied to the entire utility command, regardless of the number of
> > parallel worker processes."
>
> I'd also prefer to use maintenance_work_mem at max during parallel
> vacuum regardless of the number of parallel workers. This is current
> implementation. In lazy vacuum the maintenance_work_mem is used to
> record itempointer of dead tuples. This is done by leader process and
> worker processes just refers them for vacuuming dead index tuples.
> Even if user sets a small amount of maintenance_work_mem the parallel
> vacuum would be helpful as it still would take a time for index
> vacuuming. So I thought we should cap the number of parallel workers
> by the number of indexes rather than maintenance_work_mem.
>
>
Isn't that true only if we never use maintenance_work_mem during index
cleanup?  However, I think we are using it during index cleanup, see for
example ginInsertCleanup.  I think before reaching any conclusion about what to do
about this, first we need to establish whether this is a problem.  If I am
correct, then only some of the index cleanups (like gin index) use
maintenance_work_mem, so we need to consider that point while designing a
solution for this.


> > *
> > + keys++;
> > +
> > + /* Estimate size for dead tuples -- PARALLEL_VACUUM_KEY_DEAD_TUPLES */
> > + maxtuples = compute_max_dead_tuples(nblocks, true);
> > + est_deadtuples = MAXALIGN(add_size(sizeof(LVDeadTuples),
> > +mul_size(sizeof(ItemPointerData), maxtuples)));
> > + shm_toc_estimate_chunk(&pcxt->estimator, est_deadtuples);
> > + keys++;
> > +
> > + shm_toc_estimate_keys(&pcxt->estimator, keys);
> > +
> > + /* Finally, estimate VACUUM_KEY_QUERY_TEXT space */
> > + querylen = strlen(debug_query_string);
> > + shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1);
> > + shm_toc_estimate_keys(&pcxt->estimator, 1);
> >
> > The code style looks inconsistent here.  In some cases, you are
> > calling shm_toc_estimate_keys immediately after shm_toc_estimate_chunk
> > and in other cases, you are accumulating keys.  I think it is better
> > to call shm_toc_estimate_keys immediately after shm_toc_estimate_chunk
> > in all cases.
>
> Fixed. But there are some code that call shm_toc_estimate_keys for
> multiple keys in for example nbtsort.c and parallel.c. What is the
> difference?
>
>
We can do it either way, depending on the situation.  For example, in
nbtsort.c, there is an if check based on which the 'number of keys' can vary.
I think here we should try to write it in a way that does not leave the
reader confused about why it is done in a particular way.  This is the
reason I told you to be consistent.
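
For illustration, a rough sketch of the consistent style being suggested,
using the estimator calls from the quoted hunk (with the sizes as computed
there):

	/* Estimate space for dead tuples -- PARALLEL_VACUUM_KEY_DEAD_TUPLES */
	shm_toc_estimate_chunk(&pcxt->estimator, est_deadtuples);
	shm_toc_estimate_keys(&pcxt->estimator, 1);

	/* Estimate space for the query text -- VACUUM_KEY_QUERY_TEXT */
	shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1);
	shm_toc_estimate_keys(&pcxt->estimator, 1);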


> >
> > *
> > +void
> > +heap_parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
> > +{
> > ..
> > + /* Open table */
> > + onerel = heap_open(lvshared->relid, ShareUpdateExclusiveLock);
> > ..
> > }
> >
> > I don't think it is a good idea to assume the lock mode as
> > ShareUpdateExclusiveLock here.  Tomorrow, if due to some reason there
> > is a change in lock level for the vacuum process, we might forget to
> > update it here.  I think it is better if we can get this information
> > from the master backend.
>
> So did you mean to declare the lock mode for lazy vacuum somewhere as
> a global variable and use it in both try_relation_open in the leader
> process and relation_open in the worker process? Otherwise we would
> end up with adding something like shared->lmode =
> ShareUpdateExclusiveLock during parallel context initialization, which
> seems not to resolve your concern.
>
>
I was thinking that we could find a way to pass the lockmode we used in
vacuum_rel, but I guess we would need to pass it through multiple functions,
which will be a bit inconvenient.  OTOH, today, I checked nbtsort.c
(_bt_parallel_build_main) and found that there also we are using it
directly instead of passing it from the master backend.  I think we can
leave it as you have in the patch, but add a comment on why it is okay to
use that lock mode?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Regarding extension

2019-10-03 Thread Tom Lane
Natarajan R  writes:
> Me:  Thanks Tomas, But this is for that particular database only, I want
> to get the *list of database Id's* on which my extension is installed
> during *PG_INIT* itself...

You can't.  In the first place, that information simply isn't obtainable,
because a session running within one database doesn't have access to the
catalogs of other databases in the cluster.  (You could perhaps imagine
firing up connections to other DBs a la dblink/postgres_fdw, but that will
fail because you won't necessarily have permissions to connect to every
database.)  In the second place, it's a pretty terrible design to be
attempting any sort of database access within _PG_init, because that
precludes loading that module outside a transaction; for example you
will not be able to preload it via shared_preload_libraries or allied
features.
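
For illustration only, a minimal sketch of a _PG_init that stays safe to
preload (the extension name, GUC, and variable are made-up examples; the
point is just that it defines settings and installs hooks rather than
touching the catalogs):

#include "postgres.h"
#include "fmgr.h"
#include "utils/guc.h"

PG_MODULE_MAGIC;

static bool my_ext_enabled = true;

void
_PG_init(void)
{
	/*
	 * No catalog or database access here: the library may be loaded via
	 * shared_preload_libraries, outside any transaction.
	 */
	DefineCustomBoolVariable("my_ext.enabled",
							 "Enables my_ext processing.",
							 NULL,
							 &my_ext_enabled,
							 true,
							 PGC_SUSET,
							 0,
							 NULL, NULL, NULL);
}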

We should back up about three steps and ask why you think you need
to do this.  Generally speaking, database code shouldn't be concerning
itself with what is happening in other databases.

regards, tom lane




Re: [HACKERS] Block level parallel vacuum

2019-10-03 Thread Amit Kapila
On Fri, Oct 4, 2019 at 10:28 AM Masahiko Sawada 
wrote:

> On Thu, Oct 3, 2019 at 9:06 PM Dilip Kumar  wrote:
> >
> > On Wed, Oct 2, 2019 at 7:29 PM Masahiko Sawada 
> wrote:
> >
> > + else
> > + {
> > + if (for_cleanup)
> > + {
> > + if (lps->nworkers_requested > 0)
> > + appendStringInfo(&buf,
> > + ngettext("launched %d parallel vacuum worker for index cleanup
> > (planned: %d, requested %d)",
> > +   "launched %d parallel vacuum workers for index cleanup (planned:
> > %d, requsted %d)",
> > +   lps->pcxt->nworkers_launched),
> > + lps->pcxt->nworkers_launched,
> > + lps->pcxt->nworkers,
> > + lps->nworkers_requested);
> > + else
> > + appendStringInfo(&buf,
> > + ngettext("launched %d parallel vacuum worker for index cleanup
> (planned: %d)",
> > +   "launched %d parallel vacuum workers for index cleanup (planned:
> %d)",
> > +   lps->pcxt->nworkers_launched),
> > + lps->pcxt->nworkers_launched,
> > + lps->pcxt->nworkers);
> > + }
> > + else
> > + {
> > + if (lps->nworkers_requested > 0)
> > + appendStringInfo(&buf,
> > + ngettext("launched %d parallel vacuum worker for index vacuuming
> > (planned: %d, requested %d)",
> > +   "launched %d parallel vacuum workers for index vacuuming (planned:
> > %d, requested %d)",
> > +   lps->pcxt->nworkers_launched),
> > + lps->pcxt->nworkers_launched,
> > + lps->pcxt->nworkers,
> > + lps->nworkers_requested);
> > + else
> > + appendStringInfo(&buf,
> > + ngettext("launched %d parallel vacuum worker for index vacuuming
> > (planned: %d)",
> > +   "launched %d parallel vacuum workers for index vacuuming (planned:
> %d)",
> > +   lps->pcxt->nworkers_launched),
> > + lps->pcxt->nworkers_launched,
> > + lps->pcxt->nworkers);
> > + }
> >
> > Multiple places I see a lot of duplicate code for for_cleanup is true
> > or false.  The only difference is in the error message whether we give
> > index cleanup or index vacuuming otherwise complete code is the same
> > for
> > both the cases.  Can't we create some string and based on the value of
> > the for_cleanup and append it in the error message that way we can
> > avoid duplicating this at many places?
>
> I think it's necessary for translation. IIUC if we construct the
> message it cannot be translated.
>
>
Do we really need to log all those messages?  The other places where we
launch parallel workers don't seem to be using such messages.  Why do you
think it is important to log the messages here when other cases don't use
it?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: Transparent Data Encryption (TDE) and encrypted files

2019-10-03 Thread Magnus Hagander
On Fri, Oct 4, 2019 at 3:42 AM Stephen Frost  wrote:

>
> > It doesn't seem like it would require
> > much work at all to construct an argument that a hacker might enjoy
> > having unfettered access to pg_clog even if no other part of the
> > database can be read.
>
> The question isn't about what hackers would like to have access to, it's
> about what would actually provide them with a channel to get information
> that's sensitive, and at what rate.  Perhaps there's an argument to be
> made that clog would provide a high enough rate of information that
> could be used to glean sensitive information, but that's certainly not
> an argument that's been put forth, instead it's the knee-jerk reaction
> of "oh goodness, if anything isn't encrypted then hackers will be able
> to get access to everything" and that's just not a real argument.
>

Huh. That is *exactly* the argument I made. Though granted the example was
on multixact primarily, because I think that is much more likely to leak
interesting information, but the basis certainly applies to all the
metadata.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: Transparent Data Encryption (TDE) and encrypted files

2019-10-03 Thread Magnus Hagander
On Thu, Oct 3, 2019 at 4:40 PM Stephen Frost  wrote:

>
> * Robert Haas (robertmh...@gmail.com) wrote:
> > On Mon, Sep 30, 2019 at 5:26 PM Bruce Momjian  wrote:
> > > For full-cluster Transparent Data Encryption (TDE), the current plan is
> > > to encrypt all heap and index files, WAL, and all pgsql_tmp (work_mem
> > > overflow).  The plan is:
> > >
> > >
> https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#TODO_for_Full-Cluster_Encryption
> > >
> > > We don't see much value to encrypting vm, fsm, pg_xact, pg_multixact,
> or
> > > other files.  Is that correct?  Do any other PGDATA files contain user
> > > data?
> >
> > As others have said, that sounds wrong to me.  I think you need to
> > encrypt everything.
>
> That isn't what other database systems do though and isn't what people
> actually asking for this feature are expecting to have or deal with.
>

Do any of said other databases even *have* the equivalent of, say, pg_clog or
pg_multixact *stored outside their tablespaces*? (Because as long as the
data is in the tablespace, it's encrypted when using tablespace
encryption..)

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


RE: Proposal: Make use of C99 designated initialisers for nulls/values arrays

2019-10-03 Thread Smith, Peter
From: Tom Lane  Sent: Friday, 4 October 2019 2:08 PM

>> #define INIT_ALL_ELEMS_ZERO  {0}
>> #define INIT_ALL_ELEMS_FALSE {false}

>I would say that's 100% wrong.  The entire point here is that it's 
>memset-equivalent, and therefore writes zeroes, regardless of what the 
>datatype is.  

I agree it is memset-equivalent.

All examples of the memset code that INIT_ALL_ELEMS_ZERO replaces looked like 
this:
memset(values, 0, sizeof(values));

Most examples of the memset code that INIT_ALL_ELEMS_FALSE replaces looked like 
this:
memset(nulls, false, sizeof(nulls));

~

I made the 2nd macro because I anticipate that the same folks who don't like
assigning 0 to a bool will also not like applying something called
INIT_ALL_ELEMS_ZERO to a bool array.

How about I just define them both the same?
#define INIT_ALL_ELEMS_ZERO {0}
#define INIT_ALL_ELEMS_FALSE {0}

>As a counterexample, the coding you have above looks a lot like it would work 
>to add
>
>#define INIT_ALL_ELEMS_TRUE{true}
> which as previously noted will *not* work.  So I think the one-size-fits-all 
> approach is what to use.

I agree it looks that way; in my previous email I should have provided more 
context to the code.
Below is the full fragment of the last shared patch, which included a note to 
prevent anybody from doing such a thing.

~~

/*
 * Macros for C99 designated-initialiser syntax to set all array elements
 * to 0/false.
 *
 * Use these macros in preference to explicit {0} syntax to avoid giving a
 * misleading impression that the same value is always used for all elements.
 * e.g.
 * bool foo[2] = {false}; // sets both elements false
 * bool foo[2] = {true}; // does NOT set both elements true
 *
 * Reference: C99 [$6.7.8/21] If there are fewer initializers in a
 * brace-enclosed list than there are elements or members of an aggregate,
 * or fewer characters in a string literal used to initialize an array of
 * known size than there are elements in the array, the remainder of the
 * aggregate shall be initialized implicitly the same as objects that have
 * static storage duration
 */
#define INIT_ALL_ELEMS_ZERO {0}
#define INIT_ALL_ELEMS_FALSE {false}

~~


Kind Regards,
--
Peter Smith
Fujitsu Australia


Re: Proposal: Make use of C99 designated initialisers for nulls/values arrays

2019-10-03 Thread Amit Kapila
On Fri, Oct 4, 2019 at 12:10 PM Smith, Peter 
wrote:

> From: Tom Lane  Sent: Friday, 4 October 2019 2:08 PM
>
> >> #define INIT_ALL_ELEMS_ZERO  {0}
> >> #define INIT_ALL_ELEMS_FALSE {false}
>
> >I would say that's 100% wrong.  The entire point here is that it's
> memset-equivalent, and therefore writes zeroes, regardless of what the
> datatype is.
>
> I agree it is memset-equivalent.
>
> All examples of the memset code that INIT_ALL_ELEMS_ZERO replaces looked
> like this:
> memset(values, 0, sizeof(values));
>
> Most examples of the memset code that INIT_ALL_ELEMS_FALSE replaces looked
> like this:
> memset(nulls, false, sizeof(nulls));
>
> ~
>
> I made the 2nd macro because I anticipate the same folk that don't like
> setting 0 to a bool will also not like setting something called
> INIT_ALL_ELEMS_ZERO to a bool array.
>
> How about I just define them both the same?
> #define INIT_ALL_ELEMS_ZERO {0}
> #define INIT_ALL_ELEMS_FALSE{0}
>
>
I think using one define would be preferred, but you can wait and see if
others prefer defining different macros for the same thing.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com