Re: ICU for global collation

2019-10-31 Thread Peter Eisentraut

On 2019-09-17 15:08, Daniel Verite wrote:

When trying databases defined with ICU locales, I see that backends
that serve such databases seem to have their LC_CTYPE inherited from
the environment (as opposed to a per-database fixed value).



fr-utf8=# select to_tsvector('été');
ERROR:  invalid multibyte character for locale
HINT:  The server's LC_CTYPE locale is probably incompatible with the
database encoding.


I looked into this problem.  The way to address this would be adding 
proper collation support to the text search subsystem.  See the TODO 
markers in src/backend/tsearch/ts_locale.c for starting points.  These 
APIs spread out to a lot of places, so it will take some time to finish.
In the meantime, I'm pausing this thread and will set the CF entry to
RwF (Returned with Feedback).


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Allow CREATE OR REPLACE VIEW to rename the columns

2019-10-31 Thread Fujii Masao
On Thu, Oct 31, 2019 at 1:42 PM Tom Lane  wrote:
>
> Fujii Masao  writes:
> > Currently CREATE OR REPLACE VIEW command fails if the column names
> > are changed.
>
> That is, I believe, intentional.  It's an effective aid to catching
> mistakes in view redefinitions, such as misaligning the new set of
> columns relative to the old.  That's particularly important given
> that we allow you to add columns during CREATE OR REPLACE VIEW.
> Consider the oversimplified case where you start with
>
> CREATE VIEW v AS SELECT 1 AS x, 2 AS y;
>
> and you want to add a column z, and you get sloppy and write
>
> CREATE OR REPLACE VIEW v AS SELECT 1 AS x, 3 AS z, 2 AS y;
>
> If we did not throw an error on this, references that formerly
> pointed to column y would now point to z (as that's still attnum 2),
> which is highly unlikely to be what you wanted.

This example makes me wonder if the addition of columns by
CREATE OR REPLACE VIEW has the same (or even worse) issue.
That is, it may increase the opportunity for user mistakes.
I'm thinking of the case where a user mistakenly adds a new column
to the view when replacing the view definition. This mistake can
happen because CREATE OR REPLACE VIEW allows new columns to
be added. What's worse is that there is currently no way to
drop a column from a view, short of recreating the view.
Neither CREATE OR REPLACE VIEW nor ALTER TABLE supports
dropping a column from a view. So, to fix the mistake, the user
would need to drop the view itself and recreate it. If there are
objects depending on the view, they might need to be recreated as
well. That is not good. Since the feature has long been supported,
it's too late to say that, though...

At least, the support for ALTER VIEW DROP COLUMN might be
necessary to alleviate that situation.
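
To make this concrete, here is the limitation today (the ALTER VIEW error
is the one Ibrar reports downthread; the only way out is to drop and
recreate the view):

=# CREATE VIEW v AS SELECT 1 AS x, 2 AS y;
CREATE VIEW
=# ALTER VIEW v DROP COLUMN y;
ERROR:  "v" is not a table, composite type, or foreign table
=# DROP VIEW v;
DROP VIEW
=# CREATE VIEW v AS SELECT 1 AS x;
CREATE VIEW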

Regards,

-- 
Fujii Masao




Re: [PATCH] Do not use StdRdOptions in Access Methods

2019-10-31 Thread Amit Langote
Hi Michael,

On Wed, Oct 30, 2019 at 12:11 PM Michael Paquier  wrote:
> Looks fine.  I have done some refinements as per the attached.

Thanks.  This stood out to me:

+ * The result is a structure containing all the parsed option values in
+ * text-array format.

This sentence sounds wrong, because the result structure doesn't
contain values in text-array format.  Individual values in the struct
are in their native format (a C bool for RELOPT_TYPE_BOOL options,
and so on).

Maybe we don't need this sentence, because the first line already says
we return a struct.
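
To illustrate the point, a parsed options struct for an AM looks roughly
like this (a hypothetical sketch, not a struct from the tree; the per-type
representations follow reloptions.c):

typedef struct MyAmOptions
{
	int32		vl_len_;		/* varlena header, filled in by the parsing code */
	bool		deduplicate;	/* RELOPT_TYPE_BOOL -> C bool */
	int			pages_per_range;	/* RELOPT_TYPE_INT -> C int */
	double		scale_factor;	/* RELOPT_TYPE_REAL -> C double */
	int			label_offset;	/* RELOPT_TYPE_STRING -> byte offset of the
								 * nul-terminated value within this struct */
} MyAmOptions;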

> Running the regression tests of dummy_index_am has proved to be able
> to break the assertion you have added.

This breakage seems to have to do with the fact that the definition of
DummyIndexOptions struct and the entries of relopt_parse_elt table
don't agree?

These are the last two members of DummyIndexOptions struct:

int option_string_val_offset;
int option_string_null_offset;
} DummyIndexOptions;

but di_relopt_tab's last two entries are these:

add_string_reloption(di_relopt_kind, "option_string_val",
 "String option for dummy_index_am with non-NULL default",
 "DefaultValue", &validate_string_option,
 AccessExclusiveLock);
di_relopt_tab[4].optname = "option_string_val";
di_relopt_tab[4].opttype = RELOPT_TYPE_STRING;
di_relopt_tab[4].offset = offsetof(DummyIndexOptions,
   option_string_val_offset);

/*
 * String option for dummy_index_am with NULL default, and without
 * description.
 */
add_string_reloption(di_relopt_kind, "option_string_null",
 NULL,  /* description */
 NULL, &validate_string_option,
 AccessExclusiveLock);
di_relopt_tab[5].optname = "option_string_null";
di_relopt_tab[5].opttype = RELOPT_TYPE_STRING;
di_relopt_tab[5].offset = offsetof(DummyIndexOptions,
   option_string_null_offset);

If I fix the above code like this:

@@ -113,7 +113,7 @@ create_reloptions_table(void)
  "DefaultValue", &validate_string_option,
  AccessExclusiveLock);
  di_relopt_tab[4].optname = "option_string_val";
- di_relopt_tab[4].opttype = RELOPT_TYPE_STRING;
+ di_relopt_tab[4].opttype = RELOPT_TYPE_INT;
  di_relopt_tab[4].offset = offsetof(DummyIndexOptions,
 option_string_val_offset);

@@ -126,7 +126,7 @@ create_reloptions_table(void)
  NULL, &validate_string_option,
  AccessExclusiveLock);
  di_relopt_tab[5].optname = "option_string_null";
- di_relopt_tab[5].opttype = RELOPT_TYPE_STRING;
+ di_relopt_tab[5].opttype = RELOPT_TYPE_INT;
  di_relopt_tab[5].offset = offsetof(DummyIndexOptions,
 option_string_null_offset);
 }

test passes.

But maybe this Assert isn't all that robust, so I'm happy to take it out.

> Also, if some options are parsed, "options" will never be NULL, so there
> is no need to check for it before pfree()-ing it, no?

I haven't fully read parseRelOptions(), but I will trust you. :)

Thanks,
Amit




Re: Allow cluster_name in log_line_prefix

2019-10-31 Thread Fujii Masao
On Mon, Oct 28, 2019 at 1:33 PM Craig Ringer  wrote:
>
> Hi folks
>
> I was recently surprised to notice that log_line_prefix doesn't support a 
> cluster_name placeholder. I suggest adding one. If I don't hear objections 
> I'll send a patch.

If we do this, should cluster_name also be included in csvlog?
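
For illustration only (no patch has been posted yet, so the escape letter
below is hypothetical), the usage might look like this in postgresql.conf:

cluster_name = 'pgprod1'
log_line_prefix = '%m [%p] %k %q%u@%d '   # hypothetical %k = cluster_name

where %m, %p, %q, %u and %d are existing escapes.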

Regards,

-- 
Fujii Masao




Re: [PATCH] Do not use StdRdOptions in Access Methods

2019-10-31 Thread Michael Paquier
On Thu, Oct 31, 2019 at 04:38:55PM +0900, Amit Langote wrote:
> On Wed, Oct 30, 2019 at 12:11 PM Michael Paquier  wrote:
> This sentence sounds wrong, because the result structure doesn't
> contain values in text-array format.  Individual values in the struct
> are in their native format (a C bool for RELOPT_TYPE_BOOL options,
> and so on).
> 
> Maybe we don't need this sentence, because the first line already says
> we return a struct.

Let's remove it then.

> This breakage seems to have to do with the fact that the definition of
> DummyIndexOptions struct and the entries of relopt_parse_elt table
> don't agree?
> 
> These are the last two members of DummyIndexOptions struct:
> 
> @@ -126,7 +126,7 @@ create_reloptions_table(void)
>   NULL, &validate_string_option,
>   AccessExclusiveLock);
>   di_relopt_tab[5].optname = "option_string_null";
> - di_relopt_tab[5].opttype = RELOPT_TYPE_STRING;
> + di_relopt_tab[5].opttype = RELOPT_TYPE_INT;
>   di_relopt_tab[5].offset = offsetof(DummyIndexOptions,
>  option_string_null_offset);
>  }
> 
> test passes.
> 
> But maybe this Assert isn't all that robust, so I'm happy to take it out.

This one should remain a string reloption, and what's stored in the
structure is an offset to get the string.  See for example around
RelationHasCascadedCheckOption before it got switched to an enum in
773df88 regarding the way to get the value.
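
To spell that out, for a string reloption the struct member holds a byte
offset from the start of the options struct to the nul-terminated value,
with 0 meaning the option is not set.  A rough sketch of the access pattern
(equivalent to what the GET_STRING_RELOPTION macro does):

    DummyIndexOptions *options = (DummyIndexOptions *) relation->rd_options;
    char	   *option_string_val;

    if (options->option_string_val_offset == 0)
        option_string_val = NULL;	/* option not set, NULL default */
    else
        option_string_val = (char *) options +
            options->option_string_val_offset;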
--
Michael


signature.asc
Description: PGP signature


Re: update ALTER TABLE with ATTACH PARTITION lock mode

2019-10-31 Thread Amit Langote
Hello,

On Tue, Oct 29, 2019 at 12:13 AM Alvaro Herrera
 wrote:
> On 2019-Oct-28, Michael Paquier wrote:
> > On Sun, Oct 27, 2019 at 07:12:07PM -0500, Justin Pryzby wrote:
> > > commit #898e5e32 (Allow ATTACH PARTITION with only ShareUpdateExclusiveLock)
> > > updates ddl.sgml but not alter_table.sgml, which only says:
> > >
> > > https://www.postgresql.org/docs/12/release-12.html
> > > |An ACCESS EXCLUSIVE lock is held unless explicitly noted.
> >
> > + 
> > +  Attaching a partition acquires a SHARE UPDATE EXCLUSIVE
> > +  lock on the partitioned table, in addition to an
> > +  ACCESS EXCLUSIVE lock on the partition.
> > + 
> > Updating the docs of ALTER TABLE sounds like a good idea.  This
> > sentence looks fine to me.  Perhaps others have suggestions?
>
> Doesn't the command also acquire a lock on the default partition if
> there is one?  It sounds worth noting.
>
> > > Find attached patch, which also improves language in several related places.
> >
> > Not sure that these are actually improvements.
>
> I think some of them (most?) are clear improvements.

As someone who wrote some of those lines, I agree that Justin's
tweaks make them more readable, so +1 for applying the 0002 patch too.

Thanks,
Amit




Re: [PATCH] Do not use StdRdOptions in Access Methods

2019-10-31 Thread Amit Langote
On Thu, Oct 31, 2019 at 4:49 PM Michael Paquier  wrote:
> On Thu, Oct 31, 2019 at 04:38:55PM +0900, Amit Langote wrote:
> > On Wed, Oct 30, 2019 at 12:11 PM Michael Paquier  
> > wrote:
> > This sentence sounds wrong, because the result structure doesn't
> > contain values in text-array format.  Individual values in the struct
> > are in their native format (a C bool for RELOPT_TYPE_BOOL options,
> > and so on).
> >
> > Maybe we don't need this sentence, because the first line already says
> > we return a struct.
>
> Let's remove it then.

Removed in the attached.

> > This breakage seems to have to do with the fact that the definition of
> > DummyIndexOptions struct and the entries of relopt_parse_elt table
> > don't agree?
> > [earlier quote of the DummyIndexOptions diff and Assert discussion trimmed]
>
> This one should remain a string reloption, and what's stored in the
> structure is an offset to get the string.  See for example around
> RelationHasCascadedCheckOption before it got switched to an enum in
> 773df88 regarding the way to get the value.

I see.  I didn't know that about STRING options when writing that
Assert, so it was indeed broken to begin with.

v5 attached.

Thanks,
Amit


build_reloptions-v5.patch
Description: Binary data


Re: Problem with synchronous replication

2019-10-31 Thread Fujii Masao
On Thu, Oct 31, 2019 at 11:12 AM Michael Paquier  wrote:
>
> On Wed, Oct 30, 2019 at 05:43:04PM +0900, Kyotaro Horiguchi wrote:
> > At Wed, 30 Oct 2019 17:21:17 +0900, Fujii Masao  
> > wrote in
> >> This change causes every exiting backend to always take the exclusive lock
> >> even when it's not in SyncRep queue. This may be problematic, for example,
> >> when terminating multiple backends at the same time? If yes,
> >> it might be better to check SHMQueueIsDetached() again after taking the 
> >> lock.
> >> That is,
> >
> > I'm not sure how much that harms but double-checked locking
> > (releasing) is simple enough for reducing possible congestion here, I
> > think.
>
> FWIW, I could not measure any actual difference with pgbench -C, up to
> 500 sessions and an empty input file (just have one meta-command) and
> -c 20.
>
> I have added some comments in SyncRepCleanupAtProcExit(), and adjusted
> the patch with the suggestion from Fujii-san.  Any comments?

Thanks for the patch! Looks good to me.
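
For readers following the thread, the double-checked pattern under
discussion looks roughly like this (a sketch reconstructed from the
description above, not necessarily the committed code):

static void
SyncRepCleanupAtProcExit(void)
{
	/*
	 * First check without the lock: most exiting backends were never put
	 * on the sync-rep wait queue, so they can skip the exclusive lock.
	 */
	if (!SHMQueueIsDetached(&(MyProc->syncRepLinks)))
	{
		LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);

		/* Recheck under the lock; we may have been detached concurrently. */
		if (!SHMQueueIsDetached(&(MyProc->syncRepLinks)))
			SHMQueueDelete(&(MyProc->syncRepLinks));

		LWLockRelease(SyncRepLock);
	}
}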

Regards,

-- 
Fujii Masao




Can avoid list_copy in recomputeNamespacePath() conditionally?

2019-10-31 Thread amul sul
Hi all,

I wonder whether we can have a shortcut, somewhat like the following POC,
in recomputeNamespacePath() for when the recomputed path is the same as the
previous baseSearchPath/activeSearchPath:

== POC patch ==
diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index e251f5a9fdc..b25ef489e47 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -3813,6 +3813,9 @@ recomputeNamespacePath(void)
 		!list_member_oid(oidlist, myTempNamespace))
 		oidlist = lcons_oid(myTempNamespace, oidlist);
 
+	/* TODO: POC */
+	if (equal(oidlist, baseSearchPath))
+		return;
 	/*
 	 * Now that we've successfully built the new list of namespace OIDs, save
 	 * it in permanent storage.
== POC patch end ==

This would have two advantages:

1. It avoids an unnecessary list_copy() into TopMemoryContext.
2. Global pointers like activeSearchPath/baseSearchPath will not change if
   some code path ends up making cascaded calls to recomputeNamespacePath().
Thoughts/Comments?

Regards,
Amul


Re: MarkBufferDirtyHint() and LSN update

2019-10-31 Thread Antonin Houska
Tomas Vondra  wrote:

> On Wed, Oct 30, 2019 at 02:44:18PM +0100, Antonin Houska wrote:
> >Please consider this scenario (race conditions):
> >
> >1. FlushBuffer() has written the buffer but hasn't yet managed to clear the
> >BM_DIRTY flag (however BM_JUST_DIRTIED could be cleared by now).
> >
> >2. Another backend modified a hint bit and called MarkBufferDirtyHint().
> >
> >3. In MarkBufferDirtyHint(), if XLogHintBitIsNeeded() evaluates to true
> >(e.g. due to checksums enabled), new LSN is computed, however it's not
> >assigned to the page because the buffer is still dirty:
> >
> > if (!(buf_state & BM_DIRTY))
> > {
> > ...
> >
> > if (!XLogRecPtrIsInvalid(lsn))
> > PageSetLSN(page, lsn);
> > }
> >
> >4. MarkBufferDirtyHint() completes.
> >
> >5. In the first session, FlushBuffer()->TerminateBufferIO() will not clear
> >BM_DIRTY because MarkBufferDirtyHint() has eventually set
> >BM_JUST_DIRTIED. Thus the hint bit change itself will be written by the next
> >call of FlushBuffer(). However, the page LSN hasn't been updated, so the
> >requirement that WAL must be flushed first is not met.
> >
> >I think that PageSetLSN() should be called regardless of BM_DIRTY. Do I miss
> >any subtle detail?
> >
> >
> 
> Isn't this prevented by locking of the buffer header? Both FlushBuffer
> and MarkBufferDirtyHint do obtain that lock. I see MarkBufferDirtyHint
> does a bit of work before, but that's related to BM_PERMANENT.
> 
> If there really is a race condition, it shouldn't be that difficult to
> trigger it by adding a sleep or a breakpoint, I guess.

Yes, I had verified it using gdb: inserted a row into a table, then attached
gdb to checkpointer and stopped it when FlushBuffer() was about to call
TerminateBufferIO(). Then, when scanning the table, I saw that
MarkBufferDirtyHint() skipped the call of PageSetLSN(). Finally, before
checkpointer unlocked the buffer header in TerminateBufferIO(), buf_state was
3553624066 ~ 0b11010011110100000000000000000010.

With BM_DIRTY defined as

#define BM_DIRTY (1U << 23)

the flag appeared to be set (bit 23 is 1).

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com




Remove configure --disable-float4-byval and --disable-float8-byval

2019-10-31 Thread Peter Eisentraut
AFAICT, these build options were only useful to maintain compatibility 
for version-0 functions, but those are no longer supported, so these 
options can be removed.  There is a fair amount of code all over the 
place to support these options, so the cleanup is quite significant.


The current behavior became the default in PG9.3.  Note that this does 
not affect on-disk storage.  The only upgrade issue that I can see is 
that pg_upgrade refuses to upgrade incompatible clusters if you have 
contrib/isn installed.  But hopefully everyone who is affected by that 
will have upgraded at least once since PG9.2 already.


float4 is now always pass-by-value; the pass-by-reference code path is 
completely removed.


float8 and related types are now hardcoded to pass-by-value or 
pass-by-reference depending on whether the build is 64- or 32-bit, as 
was previously also the default.
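
Concretely, the build-time decision now reduces to something like the
following in the headers (a sketch of the effect; exact macro placement in
the patch may differ):

/* float8, int8 etc. are passed by value iff pointers are 8 bytes wide */
#if SIZEOF_VOID_P >= 8
#define USE_FLOAT8_BYVAL 1
#endif

#ifdef USE_FLOAT8_BYVAL
#define FLOAT8PASSBYVAL true
#else
#define FLOAT8PASSBYVAL false
#endif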


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From e009755a160d3d67900c80c9bc17276e27f79baa Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 31 Oct 2019 09:21:38 +0100
Subject: [PATCH 1/3] Remove configure --disable-float4-byval and
 --disable-float8-byval

These build options were only useful to maintain compatibility for version-0
functions, but those are no longer supported, so these options can be
removed.

float4 is now always pass-by-value; the pass-by-reference
code path is completely removed.

float8 and related types are now hardcoded to pass-by-value or
pass-by-reference depending on whether the build is 64- or 32-bit, as
was previously also the default.
---
 configure   | 118 
 configure.in|  33 ---
 doc/src/sgml/func.sgml  |  10 --
 doc/src/sgml/installation.sgml  |  28 --
 src/backend/access/index/indexam.c  |   5 -
 src/backend/access/transam/xlog.c   |  35 ---
 src/backend/bootstrap/bootstrap.c   |   2 +-
 src/backend/catalog/genbki.pl   |   2 +-
 src/backend/commands/analyze.c  |   2 +-
 src/backend/utils/adt/numeric.c |   5 +-
 src/backend/utils/fmgr/dfmgr.c  |  18 
 src/backend/utils/fmgr/fmgr.c   |  23 +
 src/backend/utils/misc/pg_controldata.c |  18 +---
 src/bin/initdb/initdb.c |   3 -
 src/bin/pg_controldata/pg_controldata.c |   4 -
 src/bin/pg_resetwal/pg_resetwal.c   |   6 --
 src/bin/pg_upgrade/controldata.c|   7 ++
 src/include/c.h |  15 +++
 src/include/catalog/catversion.h|   2 +-
 src/include/catalog/pg_control.h|   6 +-
 src/include/catalog/pg_proc.dat |   6 +-
 src/include/catalog/pg_type.dat |   2 +-
 src/include/fmgr.h  |   4 -
 src/include/pg_config.h.in  |  15 ---
 src/include/postgres.h  |  23 +
 src/tools/msvc/Solution.pm  |  25 -
 src/tools/msvc/config_default.pl|   5 -
 27 files changed, 40 insertions(+), 382 deletions(-)

diff --git a/configure b/configure
index 6b1c779ee3..b38ad1526d 100755
--- a/configure
+++ b/configure
@@ -866,8 +866,6 @@ with_system_tzdata
 with_zlib
 with_gnu_ld
 enable_largefile
-enable_float4_byval
-enable_float8_byval
 '
   ac_precious_vars='build_alias
 host_alias
@@ -1525,8 +1523,6 @@ Optional Features:
   --enable-cassertenable assertion checks (for debugging)
   --disable-thread-safety disable thread-safety in client libraries
   --disable-largefile omit support for large files
-  --disable-float4-byval  disable float4 passed by value
-  --disable-float8-byval  disable float8 passed by value
 
 Optional Packages:
   --with-PACKAGE[=ARG]use PACKAGE [ARG=yes]
@@ -16801,120 +16797,6 @@ _ACEOF
 
 
 
-# Decide whether float4 is passed by value: user-selectable, enabled by default
-{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether to build with float4 passed by value" >&5
-$as_echo_n "checking whether to build with float4 passed by value... " >&6; }
-
-
-# Check whether --enable-float4-byval was given.
-if test "${enable_float4_byval+set}" = set; then :
-  enableval=$enable_float4_byval;
-  case $enableval in
-yes)
-
-$as_echo "#define USE_FLOAT4_BYVAL 1" >>confdefs.h
-
-   float4passbyval=true
-  ;;
-no)
-  float4passbyval=false
-  ;;
-*)
-  as_fn_error $? "no argument expected for --enable-float4-byval option" "$LINENO" 5
-  ;;
-  esac
-
-else
-  enable_float4_byval=yes
-
-$as_echo "#define USE_FLOAT4_BYVAL 1" >>confdefs.h
-
-   float4passbyval=true
-fi
-
-
-{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $enable_float4_byval" >&5
-$as_echo "$enable_float4_byval" >&6; }
-
-cat >>confdefs.h <<_ACEOF
-#define FLOAT4PASSBYVAL $float4passbyval
-_ACEOF
-
-
-# Decide whether float8 is passed by value.
-# Note: this setting also controls int8 and related types such as 

Re: [PATCH] Do not use StdRdOptions in Access Methods

2019-10-31 Thread Michael Paquier
On Thu, Oct 31, 2019 at 05:18:46PM +0900, Amit Langote wrote:
> On Thu, Oct 31, 2019 at 4:49 PM Michael Paquier  wrote:
>> Let's remove it then.
> 
> Removed in the attached.

Thanks.  I exactly did the same thing on my local branch.
--
Michael


signature.asc
Description: PGP signature


Re: update ALTER TABLE with ATTACH PARTITION lock mode (docs)

2019-10-31 Thread Michael Paquier
On Mon, Oct 28, 2019 at 10:56:33PM -0500, Justin Pryzby wrote:
> I suppose it should say something other than partition(ed), since
> partitions can be partitioned, too...
> 
>   Attaching a partition acquires a SHARE UPDATE 
> EXCLUSIVE
>   lock on the parent table, in addition to
>   ACCESS EXCLUSIVE locks on the child table and the
>   DEFAULT partition (if any).

In this context, "on the child table" sounds a bit confusing.  Would it
make more sense to say "on the table to be attached" instead?
--
Michael


signature.asc
Description: PGP signature


Re: Getting psql to redisplay command after \e

2019-10-31 Thread Fabien COELHO



Hello Tom,


   psql=> select 1...
   psql-> 



I cannot move back with readline to edit further, I'm stuck there, which
is strange.


I don't follow.  readline doesn't allow you to edit already-entered lines
today, that is, after typing "select 1" you see

regression=# select 1
regression-#

and there isn't any way to move back and edit the already-entered line
within readline.


Yep.

My point is to possibly not implicitly add the final newline at the end of
\e, but to behave as if we were moving in history, which allows editing the
lines, so that you would get


  psql=> select 1

Instead of the above.

I agree it might be nicer if you could do that, but that's *far* beyond 
the scope of this patch.  It would take entirely fundamental rethinking 
of our use of libreadline, if indeed it's possible at all.  I also don't 
see how we could have syntax-aware per-line prompts if we were allowing 
readline to treat the whole query as one line.


I was suggesting something much simpler than rethinking readline handling.
That does not mean it is a good idea, but while testing the patch I would
have liked the unfinished line to be in the current editing buffer,
basically as if I had not typed the final newline.


ISTM more natural that \e behaves like history when coming back from 
editing, i.e. the \e-edited line is set as the current buffer for 
readline.



In the larger picture, tinkering with how that works would affect
every psql user at the level of "muscle memory" editing habits,
and I suspect that their reactions would not be uniformly positive.
What I propose here doesn't affect anyone who doesn't use \e at all.
Even for \e users it doesn't have any effect on what you need to type.


--
Fabien.




Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2019-10-31 Thread Etsuro Fujita
On Tue, Oct 29, 2019 at 7:29 PM Etsuro Fujita  wrote:
> On Fri, Oct 25, 2019 at 6:59 PM amul sul  wrote:
> > On Wed, Oct 16, 2019 at 6:20 PM Etsuro Fujita  
> > wrote:
> >> So I'd like to propose to introduce separate functions like
> >> process_outer_partition() and process_inner_partition() in the
> >> attached, instead of handle_missing_partition().  (I added a fast path
> >> to these functions that if both outer/inner sides have the default
> >> partitions, give up on merging partitions.  Also, I modified the
> >> caller sites of proposed functions so that they call these if
> >> necessary.)
>
> > Agree -- process_outer_partition() & process_inner_partition() approach
> > looks much cleaner than before.

> > Note that LHS numbers are the line numbers in your previously posted 
> > patch[1].
> >
> >  455 +   if (inner_has_default ||
> >  456 +   jointype == JOIN_LEFT ||
> >  457 +   jointype == JOIN_ANTI ||
> >  458 +   jointype == JOIN_FULL)
> >  459 +   {
> >  460 +   if (!process_outer_partition(&outer_map,
> >
> >  512 +   if (outer_has_default || jointype == JOIN_FULL)
> >  513 +   {
> >  514 +   if (!process_inner_partition(&outer_map,
> >
> > How about adding these conditions to the else block of 
> > process_outer_partition()
> > & process_inner_partition() function respectively so that these functions 
> > can be
> > called unconditionally?  Thoughts/Comments?
>
> I'm not sure that's a good idea; these functions might be called many
> times, so I just thought it would be better to call these functions
> conditionally, to avoid useless function call overhead.

The overhead might be small, but it isn't zero, so I still think that we
should call these functions only when necessary.

> >  456 +   jointype == JOIN_LEFT ||
> >  457 +   jointype == JOIN_ANTI ||
> >  458 +   jointype == JOIN_FULL)
> >
> > Also, how about using IS_OUTER_JOIN() instead. But we need an assertion to
> > restrict JOIN_RIGHT or something.
>
> Seems like a good idea.

Done.

> > For the convenience, I did both aforesaid changes in the attached delta 
> > patch that
> > can be applied atop of your previously posted patch[1].  Kindly have a look 
> > & share
> > your thoughts, thanks.
>
> Thanks for the patch!
>
> > 1273 + * *next_index is incremented when creating a new merged partition 
> > associated
> > 1274 + * with the given outer partition.
> > 1275 + */
> >
> > Correction: s/outer/inner
> > ---
> >
> > 1338 +* In range partitioning, if the given outer partition is 
> > already
> > 1339 +* merged (eg, because we found an overlapping range earlier), 
> > we know
> > 1340 +* where it fits in the join result; nothing to do in that 
> > case.  Else
> > 1341 +* create a new merged partition.
> >
> > Correction: s/outer/inner.
> > ---
> >
> > 1712 /*
> > 1713  * If the NULL partition was missing from the inner side of 
> > the join,
> >
> > s/inner side/outer side
> > --
>
> Good catch!  Will fix.

Done.

I also added some assertions to process_outer_partition() and
process_inner_partition(), including ones as proposed in your patch.
Attached is an updated version.  If no objections, I'll merge this
with the main patch [1].

Best regards,
Etsuro Fujita

[1] 
https://www.postgresql.org/message-id/CAPmGK14WHKckT1P6UJV2B63TZAxPyMn8iZJ99XF=azunhg6...@mail.gmail.com


modify-partbounds-changes-2.patch
Description: Binary data


Postgres cache

2019-10-31 Thread Natarajan R
Hi,

I want to know, in depth, how Postgres caches catalog relations.  Is there
any documentation for that?


Re: allow online change primary_conninfo

2019-10-31 Thread Sergei Kornilov
Hello

> So, I'd like to propose to move the stuff to the second switch().
> (See the attached incomplete patch.) This is rather similar to
> Sergei's previous proposal, but the structure of the state
> machine is kept.

Very similar to my v4 proposal (which also moved the RequestXLogStreaming
call), but closer to where currentSource is read. No objections from me;
the attached patch is changed this way.
I renamed start_wal_receiver to startWalReceiver - this style looks more
consistent with the nearby code.

> + /*
> +  * Else, check if WAL receiver is still 
> active.
> +  */
> + else if (!WalRcvStreaming())

I think we still need to wait for WalRcvStreaming() after the
RequestXLogStreaming() call, so I removed the else branch and kept the
condition separate.

> In ProcessStartupSigHup, conninfo and slotname don't need to be
> retained until the end of the function.

Agreed, I moved the pfree calls.

> The log message in the function seems to be too detailed. On the
> other hand, if we changed primary_conninfo to '' (stop) or vise
> versa (start), the message (restart) looks strange.

I have no strong opinion here. These messages were changed many times over
this thread's lifetime and can be changed anytime. I think this is not an
issue, since we have no consensus about the overall design yet.
Writing detailed messages was proposed here:
https://www.postgresql.org/message-id/20190216151025.GJ2240%40paquier.xyz

> or vise versa (start)

I explicitly check currentSource and WalRcvRunning, so we get no such
messages if the user had no walreceiver before.

regards, Sergei

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 0191ec84b1..587031dea8 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3983,9 +3983,15 @@ ANY num_sync ( 
@@ -4000,9 +4006,13 @@ ANY num_sync ( ).
-  This parameter can only be set at server start.
+  This parameter can only be set in the postgresql.conf
+  file or on the server command line.
   This setting has no effect if primary_conninfo is not
-  set.
+  set or the server is not in standby mode.
+ 
+ 
+  The WAL receiver is restarted after an update of primary_slot_name.
  
 

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 2e3cc51006..8eee079eb2 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -803,6 +803,12 @@ static XLogSource readSource = 0;	/* XLOG_FROM_* code */
 static XLogSource currentSource = 0;	/* XLOG_FROM_* code */
 static bool lastSourceFailed = false;
 
+/*
+ * Need to restart the running walreceiver due to a configuration change.
+ * Suitable only for the XLOG_FROM_STREAM source.
+ */
+static bool pendingWalRcvRestart = false;
+
 typedef struct XLogPageReadPrivate
 {
 	int			emode;
@@ -11773,6 +11779,7 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 	for (;;)
 	{
 		int			oldSource = currentSource;
+		bool		startWalReceiver = false;
 
 		/*
 		 * First check if we failed to read from the current source, and
@@ -11806,54 +11813,13 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 	if (!StandbyMode)
 		return false;
 
-	/*
-	 * If primary_conninfo is set, launch walreceiver to try
-	 * to stream the missing WAL.
-	 *
-	 * If fetching_ckpt is true, RecPtr points to the initial
-	 * checkpoint location. In that case, we use RedoStartLSN
-	 * as the streaming start position instead of RecPtr, so
-	 * that when we later jump backwards to start redo at
-	 * RedoStartLSN, we will have the logs streamed already.
-	 */
-	if (PrimaryConnInfo && strcmp(PrimaryConnInfo, "") != 0)
-	{
-		XLogRecPtr	ptr;
-		TimeLineID	tli;
-
-		if (fetching_ckpt)
-		{
-			ptr = RedoStartLSN;
-			tli = ControlFile->checkPointCopy.ThisTimeLineID;
-		}
-		else
-		{
-			ptr = RecPtr;
-
-			/*
-			 * Use the record begin position to determine the
-			 * TLI, rather than the position we're reading.
-			 */
-			tli = tliOfPointInHistory(tliRecPtr, expectedTLEs);
-
-			if (curFileTLI > 0 && tli < curFileTLI)
-elog(ERROR, "according to history file, WAL location %X/%X belongs to timeline %u, but previous recovered WAL file came from timeline %u",
-	 (uint32) (tliRecPtr >> 32),
-	 (uint32) tliRecPtr,
-	 tli, curFileTLI);
-		}
-		curFileTLI = tli;
-		RequestXLogStreaming(tli, ptr, PrimaryConnInfo,
-			 PrimarySlotName);
-		receivedUpto = 0;
-	}
-
 	/*
 	 * Move to XLOG_FROM_STREAM state in either case. We'll
 	 * get immediate failure if we didn't launch walreceiver,
 	 * and move on to the next state.
 	 */
 	currentSource = XLOG_FROM_STREAM;
+	startWalReceiver = true;
 	break;
 
 case X

The command tag of "ALTER MATERIALIZED VIEW RENAME COLUMN"

2019-10-31 Thread Fujii Masao
Hi,

The command tag of ALTER MATERIALIZED VIEW is basically
"ALTER MATERIALIZED VIEW". For example,

=# ALTER MATERIALIZED VIEW test ALTER COLUMN j SET STATISTICS 100;
ALTER MATERIALIZED VIEW
=# ALTER MATERIALIZED VIEW test OWNER TO CURRENT_USER;
ALTER MATERIALIZED VIEW
=# ALTER MATERIALIZED VIEW test RENAME TO hoge;
ALTER MATERIALIZED VIEW

This is OK and looks intuitive to users. But I found that the command tag of
ALTER MATERIALIZED VIEW RENAME COLUMN is "ALTER TABLE", not
"ALTER MATERIALIZED VIEW".

=# ALTER MATERIALIZED VIEW hoge RENAME COLUMN j TO x;
ALTER TABLE

Is this intentional? Or bug?

Regards,

-- 
Fujii Masao




Re: Allow CREATE OR REPLACE VIEW to rename the columns

2019-10-31 Thread Ibrar Ahmed
On Thu, Oct 31, 2019 at 12:32 PM Fujii Masao  wrote:

> [Fujii Masao's mail, quoting Tom Lane's explanation of why CREATE OR
> REPLACE VIEW rejects column renames, trimmed; see upthread]
>
- Is it intentional that the "RENAME COLUMN" statement is not implemented
for VIEW, given that it is implemented for materialized views? I have made
a similar change for views and it works:

ALTER VIEW v RENAME COLUMN d to e;

- For "DROP COLUMN" for VIEW is throwing error.

postgres=# alter view v drop column e;
ERROR:  "v" is not a table, composite type, or foreign table




-- 
Ibrar Ahmed


001_alter_view_rename_column_ibrar_v1.patch
Description: Binary data


Re: Allow CREATE OR REPLACE VIEW to rename the columns

2019-10-31 Thread Fujii Masao
On Thu, Oct 31, 2019 at 7:59 PM Ibrar Ahmed  wrote:
>
> [earlier quoted discussion trimmed]
>
> - Is it intentional that the "RENAME COLUMN" statement is not implemented
> for VIEW, given that it is implemented for materialized views?

I'm not sure about that, but Tom's suggestion to support ALTER VIEW RENAME
COLUMN sounds reasonable whether or not we support renaming columns when
replacing the view definition. Attached is a patch that adds support for
the ALTER VIEW RENAME COLUMN command.

> I have made a similar change for views and it works.

Yeah, ISTM that we made the same patch at the same time. You changed gram.y,
but I added the following changes additionally.

- Update the docs
- Add a HINT message emitted when CREATE OR REPLACE VIEW fails to rename the columns
- Update tab-complete.c
- Add regression test

One issue I've not addressed yet is the command tag of
"ALTER VIEW RENAME COLUMN". Currently "ALTER TABLE" is returned as the tag,
as with ALTER MATERIALIZED VIEW RENAME COLUMN, but ISTM that "ALTER VIEW"
is better. I started a discussion about the command tag of
"ALTER MATERIALIZED VIEW" in another thread, and will update the patch
according to the result of that discussion.
https://postgr.es/m/CAHGQGwGUaC03FFdTFoHsCuDrrNvFvNVQ6xyd40==p25wvub...@mail.gmail.com

Regards,

-- 
Fujii Masao


alter_view_rename_column_v1.patch
Description: Binary data


Re: Allow CREATE OR REPLACE VIEW to rename the columns

2019-10-31 Thread Ibrar Ahmed
On Thu, Oct 31, 2019 at 5:01 PM Fujii Masao  wrote:

> On Thu, Oct 31, 2019 at 7:59 PM Ibrar Ahmed  wrote:
> >
> > [earlier quoted discussion trimmed]
> >
> > - Is it intentional that the "RENAME COLUMN" statement is not implemented
> > for VIEW, given that it is implemented for materialized views?
>
> I'm not sure about that, but Tom's suggestion to support ALTER VIEW RENAME
> COLUMN sounds reasonable whether or not we support renaming columns when
> replacing the view definition. Attached is a patch that adds support for
> the ALTER VIEW RENAME COLUMN command.
>
> > I have made a similar change for views and it works.
>
> Yeah, ISTM that we made the same patch at the same time. You changed
> gram.y,
> but I added the following changes additionally.
>
> - Update the docs
> - Add a HINT message emitted when CREATE OR REPLACE VIEW fails to rename
> the columns
> - Update tab-complete.c
> - Add regression test
>
>
Oh, I had just sent my patch to ask; good that you did it in all those places.

> One issue I've not addressed yet is the command tag of
> "ALTER VIEW RENAME COLUMN". Currently "ALTER TABLE" is returned as the tag,
> as with ALTER MATERIALIZED VIEW RENAME COLUMN, but ISTM that "ALTER VIEW"
> is better. I started a discussion about the command tag of
> "ALTER MATERIALIZED VIEW" in another thread, and will update the patch
> according to the result of that discussion.
>
> https://postgr.es/m/CAHGQGwGUaC03FFdTFoHsCuDrrNvFvNVQ6xyd40==p25wvub...@mail.gmail.com
>
The attached patch contains a small change for ALTER MATERIALIZED VIEW.





-- 
Ibrar Ahmed


alter_view_rename_column_v2.patch
Description: Binary data


Re: Allow CREATE OR REPLACE VIEW to rename the columns

2019-10-31 Thread Ibrar Ahmed
On Thu, Oct 31, 2019 at 5:11 PM Ibrar Ahmed  wrote:

>
> [earlier quoted discussion trimmed]
>
>>> Attached patch contains a small change for ALTER MATERIALIZED VIEW.
>
Hmm, my small change for "ALTER MATERIALIZED VIEW" does not work in some
cases; it needs more work.




-- 
Ibrar Ahmed


ECPG: proposal for new DECLARE STATEMENT

2019-10-31 Thread kuroda.hay...@fujitsu.com
Dear hackers,

As announced last month, I am proposing again the new ECPG grammar DECLARE
STATEMENT. This had been committed once, but it was removed from PG12
because of some problems.
In this mail, I report the problems the previous implementation had,
propose a new solution, and attach a WIP patch.

[Basic function, Grammar, and Use case]
This statement is used to designate a connection easily. Please see below:
https://www.postgresql.org/message-id/flat/4E72940DA2BF16479384A86D54D0988A4D80D3C9@G01JPEXMBKW04
Oracle's manual will also help your understanding:
https://docs.oracle.com/en/database/oracle/oracle-database/19/lnpcc/embedded-SQL-statements-and-directives.html#GUID-0A30B7B4-BD91-42EA-AACE-2E9CBF7E9C1A
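
As a minimal sketch of the intended usage (connection and statement names
are made up; :query is assumed to be a declared host variable):

EXEC SQL CONNECT TO testdb AS con1;

/* Associate the identifier "stmt" with connection con1. */
EXEC SQL AT con1 DECLARE stmt STATEMENT;

/* These are now routed to con1 without repeating AT con1. */
EXEC SQL PREPARE stmt FROM :query;
EXEC SQL EXECUTE stmt;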

[Issues]
These are the reasons why this feature was reverted.
1. The namespace of the identifier was not clear. If the same identifier was
   used for other SQL statements, they interfered with each other, and
   statements might be executed on an unexpected connection.
2. Declaring outside of functions was not allowed. This is quite different
   from the other declarative statements, so some users might be confused.
   For instance, the following example was rejected:
```
EXEC SQL DECLARE stmt STATEMENT;

int
main()
{
...
EXEC SQL DECLARE cur CURSOR FOR stmt;
...
}
```
3. These specifications were not compatible with other DBMSs.

[Solutions]
The namespace is file-scoped, which follows other DBMSs.
When a DECLARE STATEMENT is read, the name, the identifier,
and the related connection are recorded.
If you then use the declared identifier to prepare a statement or declare a
cursor, the fourth argument of ECPGdo() (which represents the connection)
is overwritten accordingly.
This declaration takes effect only at precompile time.

[Limitations]
The declaration must appear before the identifier is used.
This also follows the Pro*C precompiler.

A WIP patch is attached. All ECPG tests pass, but some documentation is
not included yet; it will be added later.
I ran pgindent over the patch, but the result may be imperfect, since this
is my first time using it.

Best regards

Hayato Kuroda
FUJITSU LIMITED
E-Mail:kuroda.hay...@fujitsu.com




DeclareStmt01.patch
Description: DeclareStmt01.patch


Re: Allow CREATE OR REPLACE VIEW to rename the columns

2019-10-31 Thread Ibrar Ahmed
On Thu, Oct 31, 2019 at 5:28 PM Ibrar Ahmed  wrote:

>
> [earlier quoted discussion trimmed]
>
> Hmm, my small change for "ALTER MATERIALIZED VIEW" does not work in some
> cases; it needs more work.
>

The AlterObjectTypeCommandTag function takes just one parameter, but to
show "ALTER MATERIALIZED VIEW" instead of "ALTER TABLE" we would need to
pass relationType = OBJECT_MATVIEW along with renameType = OBJECT_COLUMN
and handle that in the function. The AlterObjectTypeCommandTag function has
many callers. Do you think that, just for the command tag, we should make
all these changes?
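
To make the question concrete, the tag selection would have to branch on
the relation type, roughly like this (hypothetical sketch; today
AlterObjectTypeCommandTag receives only the rename target's ObjectType):

/* Hypothetical helper: pick the tag for RENAME COLUMN by relation type. */
static const char *
RenameColumnCommandTag(ObjectType relationType)
{
	switch (relationType)
	{
		case OBJECT_MATVIEW:
			return "ALTER MATERIALIZED VIEW";
		case OBJECT_VIEW:
			return "ALTER VIEW";
		default:
			return "ALTER TABLE";
	}
}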




-- 
Ibrar Ahmed


Application name for pg_basebackup and friends

2019-10-31 Thread Jesper Pedersen

Hi,

The attached patch adds an -a / --appname command-line switch to
pg_basebackup, pg_receivewal and pg_recvlogical.

This is useful when, for example, pg_receivewal needs to connect as a
synchronous client (synchronous_standby_names):

  pg_receivewal -h myhost -p 5432 -S replica1 -a replica1 --synchronous -D /wal

I'll add the patch to the CommitFest for discussion, as there is overlap
with the -d switch.


Best regards,
 Jesper
>From 3aee659423137a547ed178a1dab34fe3caf30702 Mon Sep 17 00:00:00 2001
From: jesperpedersen 
Date: Thu, 31 Oct 2019 08:34:41 -0400
Subject: [PATCH] Add an -a / --appname command line switch to control the
 application_name property.

Author: Jesper Pedersen 
---
 doc/src/sgml/ref/pg_basebackup.sgml| 11 +++
 doc/src/sgml/ref/pg_receivewal.sgml| 11 +++
 doc/src/sgml/ref/pg_recvlogical.sgml   | 11 +++
 src/bin/pg_basebackup/pg_basebackup.c  |  7 ++-
 src/bin/pg_basebackup/pg_receivewal.c  |  7 ++-
 src/bin/pg_basebackup/pg_recvlogical.c |  7 ++-
 src/bin/pg_basebackup/streamutil.c |  7 +++
 src/bin/pg_basebackup/streamutil.h |  1 +
 8 files changed, 59 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index fc9e222f8d..28b5dee206 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -543,6 +543,17 @@ PostgreSQL documentation
 The following command-line options control the database connection parameters.
 
 
+ 
+  -a application_name
+  --appname=application_name
+  
+   
+Specifies the application name used to connect to the server.
+See  for more information.
+   
+  
+ 
+
  
   -d connstr
   --dbname=connstr
diff --git a/doc/src/sgml/ref/pg_receivewal.sgml b/doc/src/sgml/ref/pg_receivewal.sgml
index 177e9211c0..0957e0a9f5 100644
--- a/doc/src/sgml/ref/pg_receivewal.sgml
+++ b/doc/src/sgml/ref/pg_receivewal.sgml
@@ -248,6 +248,17 @@ PostgreSQL documentation
 The following command-line options control the database connection parameters.
 
 
+ 
+  -a application_name
+  --appname=application_name
+  
+   
+Specifies the application name used to connect to the server.
+See  for more information.
+   
+  
+ 
+
  
   -d connstr
   --dbname=connstr
diff --git a/doc/src/sgml/ref/pg_recvlogical.sgml b/doc/src/sgml/ref/pg_recvlogical.sgml
index 4c79f90414..b2d9b35362 100644
--- a/doc/src/sgml/ref/pg_recvlogical.sgml
+++ b/doc/src/sgml/ref/pg_recvlogical.sgml
@@ -272,6 +272,17 @@ PostgreSQL documentation
 The following command-line options control the database connection parameters.
 
 
+  
+   -a application_name
+   --appname=application_name
+   
+
+ Specifies the application name used to connect to the server.
+ See  for more information.
+
+   
+  
+
   
-d database
--dbname=database
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index a9d162a7da..237945f879 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -351,6 +351,7 @@ usage(void)
 			 " do not verify checksums\n"));
 	printf(_("  -?, --help show this help, then exit\n"));
 	printf(_("\nConnection options:\n"));
+	printf(_("  -a, --appname=NAME application name\n"));
 	printf(_("  -d, --dbname=CONNSTR   connection string\n"));
 	printf(_("  -h, --host=HOSTNAMEdatabase server host or socket directory\n"));
 	printf(_("  -p, --port=PORTdatabase server port number\n"));
@@ -2031,6 +2032,7 @@ main(int argc, char **argv)
 		{"label", required_argument, NULL, 'l'},
 		{"no-clean", no_argument, NULL, 'n'},
 		{"no-sync", no_argument, NULL, 'N'},
+		{"appname", required_argument, NULL, 'a'},
 		{"dbname", required_argument, NULL, 'd'},
 		{"host", required_argument, NULL, 'h'},
 		{"port", required_argument, NULL, 'p'},
@@ -2070,7 +2072,7 @@ main(int argc, char **argv)
 
 	atexit(cleanup_directories_atexit);
 
-	while ((c = getopt_long(argc, argv, "CD:F:r:RS:T:X:l:nNzZ:d:c:h:p:U:s:wWkvP",
+	while ((c = getopt_long(argc, argv, "CD:F:r:RS:T:X:l:nNzZ:a:d:c:h:p:U:s:wWkvP",
 			long_options, &option_index)) != -1)
 	{
 		switch (c)
@@ -2176,6 +2178,9 @@ main(int argc, char **argv)
 	exit(1);
 }
 break;
+			case 'a':
+application_name = pg_strdup(optarg);
+break;
 			case 'd':
 connection_string = pg_strdup(optarg);
 break;
diff --git a/src/bin/pg_basebackup/pg_receivewal.c b/src/bin/pg_basebackup/pg_receivewal.c
index c0c8747982..863dcdb161 100644
--- a/src/bin/pg_basebackup/pg_receivewal.c
+++ b/src/bin/pg_basebackup/pg_receivewal.c
@@ -93,6 +93,7 @@ usage(void)
 	printf(_("  -Z, --compress=0-9 compress logs with given compression level\n"));
 	printf(_("  -?, --help show this

Re: Application name for pg_basebackup and friends

2019-10-31 Thread Heikki Linnakangas

On 31/10/2019 14:52, Jesper Pedersen wrote:

Hi,

The attached patch adds an -a / --appname command line switch to
pg_basebackup, pg_receivewal and pg_recvlogical.

This is useful when f.ex. pg_receivewal needs to connect as a
synchronous client (synchronous_standby_names),

   pg_receivewal -h myhost -p 5432 -S replica1 -a replica1 --synchronous
-D /wal

I'll add the patch to the CommitFest for discussion, as there is overlap
with the -d switch.


You can already set the application name with the environment variable or on 
the database connection string:


pg_receivewal -D /wal -d "host=myhost application_name=myreceiver"

I don't think we need a new command line switch for it.

- Heikki




Re: Collation versioning

2019-10-31 Thread Julien Rouhaud
Hello Thomas,

On Tue, May 28, 2019 at 9:00 PM Thomas Munro  wrote:
>
> Since there's a chance of an "unconference" session on locale versions
> tomorrow at PGCon, here's a fresh rebase of the patchset to add
> per-database-object collation version tracking.  It doesn't handle
> default collations yet (not hard AFAIK, will try that soon), but it
> does work well enough to demonstrate the general principle.  I won't
> attach the CHECK support just yet, because it needs more work, but the
> point of it was to demonstrate that pg_depend can handle this for all
> kinds of database objects in one standard way, rather than sprinkling
> collation version stuff all over the place in pg_index, pg_constraint,
> etc, and I think it did that already.

Are you planning to continue working on it?  For the record, that's
something needed to be able to implement a filter in REINDEX command
[1].

I'm not sending a review since the code isn't finished yet, but one
issue with the current approach is that the WARNING message recommending
to issue a REINDEX can be issued when running the required REINDEX,
which is at best unhelpful:

# update pg_depend set refobjversion = 'a' || refobjversion where
refobjversion != '';

# reindex table t1;
WARNING:  01000: index "t1_val_idx" depends on collation 13330 version
"a153.97.35.8", but the current version is "153.97.35.8"
DETAIL:  The index may be corrupted due to changes in sort order.
HINT:  REINDEX to avoid the risk of corruption.
LOCATION:  index_check_collation_version, index.c:1263


[1]: 
https://www.postgresql.org/message-id/a81069b1-fdaa-ff40-436e-7840bd639ccf%402ndquadrant.com




Re: Remove configure --disable-float4-byval and --disable-float8-byval

2019-10-31 Thread Tom Lane
Peter Eisentraut  writes:
> float4 is now always pass-by-value; the pass-by-reference code path is 
> completely removed.

I think this is OK.

> float8 and related types are now hardcoded to pass-by-value or 
> pass-by-reference depending on whether the build is 64- or 32-bit, as 
> was previously also the default.

I'm less happy with doing this.  It makes it impossible to test the
pass-by-reference code paths without actually firing up a 32-bit
environment.  It'd be fine to document --disable-float8-byval as
a developer-only option (it might be so already), but I don't want
to lose it completely.  I fail to see any advantage in getting rid
of it, anyway, since we do still have to maintain both code paths.

regards, tom lane




Re: Allow CREATE OR REPLACE VIEW to rename the columns

2019-10-31 Thread Tom Lane
Fujii Masao  writes:
> On Thu, Oct 31, 2019 at 1:42 PM Tom Lane  wrote:
>> Fujii Masao  writes:
>>> Currently CREATE OR REPLACE VIEW command fails if the column names
>>> are changed.

>> That is, I believe, intentional.  It's an effective aid to catching
>> mistakes in view redefinitions, such as misaligning the new set of
>> columns relative to the old.  [example]

> This example makes me wonder if the addition of a column by
> CREATE OR REPLACE VIEW also has the same (or even worse) issue.
> That is, it may increase the opportunity for user mistakes.

The idea in CREATE OR REPLACE VIEW is to allow addition of new
columns at the end, the same as you can do with tables.  Checking
the column name matchups is a way to ensure that you actually do
add at the end, rather than insert, which wouldn't act as you
expect.  Admittedly it's only a heuristic.

We could, perhaps, have insisted that adding a column also requires
special syntax, but we didn't.  Consider for example a case where
the new column needs to come from an additionally joined table;
then you have to be able to edit the underlying view definition along
with adding the column.  So that seems like kind of a pain in the
neck to insist on.

> But what's the worse is that, currently there is no way to
> drop the column from the view, except recreation of the view.

I think this has been discussed, as well.  It's not necessarily
simple to drop a view output column.  For example, if the view
uses SELECT DISTINCT, removing an output column would have
semantic effects on the set of rows that can be returned, since
distinct-ness would now mean something else than it did before.

It's conceivable that we could enumerate all such effects and
then allow DROP COLUMN (probably replacing the output column
with a null constant) if none of them apply, but I can't get
terribly excited about it.  The field demand for such a feature
has not been high.  I'd be a bit worried about bugs arising
from failures to check attisdropped for views, too; so that
the cost of getting this working might be greater than it seems
on the surface.

regards, tom lane




Re: The command tag of "ALTER MATERIALIZED VIEW RENAME COLUMN"

2019-10-31 Thread Tom Lane
Fujii Masao  writes:
> ... I found that the command tag of
> ALTER MATERIALIZED VIEW RENAME COLUMN is "ALTER TABLE", not "ALTER VIEW".

> =# ALTER MATERIALIZED VIEW hoge RENAME COLUMN j TO x;
> ALTER TABLE

> Is this intentional? Or bug?

Seems like an oversight.

regards, tom lane




function calls optimization

2019-10-31 Thread Andrzej Barszcz
Hi

I have almost finished a patch optimizing non-volatile function calls.

select f(t.n) from t where f(t.n) > 10 and f(t.n) < 100;  needs 3 calls of
f() for each tuple; after applying the patch, only 1.

Any pros and cons  ?


Re: function calls optimization

2019-10-31 Thread Andres Freund
Hi, 

On October 31, 2019 7:06:13 AM PDT, Andrzej Barszcz  wrote:
>Hi
>
>I have almost finished a patch optimizing non-volatile function calls.
>
>select f(t.n) from t where f(t.n) > 10 and f(t.n) < 100;  needs 3 calls
>of
>f() for each tuple,
>after applying the patch, only 1.
>
>Any pros and cons  ?

Depends on the actual way of implementing this proposal. Think we need more 
details than what you describe here.

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.




Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-10-31 Thread Masahiko Sawada
On Fri, Sep 6, 2019 at 3:34 PM Smith, Peter  wrote:
>
> -Original Message-
> From: Masahiko Sawada  Sent: Thursday, 15 August 2019 
> 7:10 PM
>
> > BTW I've created PoC patch for cluster encryption feature. Attached patch 
> > set has done some items of TODO list and some of them can be used even for 
> > finer granularity encryption. Anyway, the implemented components are 
> > followings:
>
> Hello Sawada-san,
>
> I guess your original patch code may be getting a bit out-dated by the 
> ongoing TDE discussions, but I have done some code review of it anyway.
>
> Hopefully a few comments below can still be of use going forward:
>
> ---
>
> REVIEW COMMENTS
>
> * src/backend/storage/encryption/enc_cipher.c – For functions 
> EncryptionCipherValue/String maybe should log warnings for unexpected values 
> instead of silently assigning to default 0/”off”.
>
> * src/backend/storage/encryption/enc_cipher.c – For function 
> EncryptionCipherString, purpose of returning ”unknown” if unclear because 
> that will map back to “off” again anyway via EncryptionCipherValue. Why not 
> just return "off" (with warning logged).
>
> * src/include/storage/enc_common.h – Typo in comment: "Encrypton".
>
> * src/include/storage/encryption.h - The macro DataEncryptionEnabled may be 
> better to be using enum TDE_ENCRYPTION_OFF instead of magic number 0
>
> * src/backend/storage/encryption/kmgr.c - Function BootStrapKmgr will report 
> error if USE_OPENSSL is not defined. The check seems premature because it 
> would fail even if the user is not using encryption. Shouldn't the lack of 
> openssl be OK when user is not using TDE at all (i.e. when encryption is 
> "none")?
>
> * src/backend/storage/encryption/kmgr.c - In function BootStrapMgr suggest 
> better to check if (bootstrap_data_encryption_cipher == TDE_ENCRYPTION_OFF) 
> using enum instead of the magic number 0.
>
> * src/backend/storage/encryption/kmgr.c - The function 
> run_cluster_passphrase_command function seems mostly a clone of an existing 
> run_ssl_passphrase_command function. Is it possible to refactor to share the 
> common code?
>
> * src/backend/storage/encryption/kmgr.c - The function derive_encryption_key 
> declares a char key_len. Why char? It seems int everywhere else.
>
> * src/backend/bootstrap/bootstrap.c - Suggest better if variable declaration 
> bootstrap_data_encryption_cipher = 0 uses enum TDE_ENCRYPTION_OFF instead of 
> magic number 0
>
> * src/backend/utils/misc/guc.c - It looks like the default value for GUC 
> variable data_encryption_cipher is AES128. Wouldn't "off" be the more 
> appropriate default value? Otherwise it seems inconsistent with the logic of 
> initdb (which insists that the -e option is mandatory if you wanted any 
> encryption).
>
> * src/backend/utils/misc/guc.c - There is a missing entry in the 
> config_group_names[]. The patch changed the config_group[] in guc_tables.h, 
> so I think there needs to be a matching item in the config_group_names.
>
> * src/bin/initdb/initdb.c - The function check_encryption_cipher would 
> disallow an encryption value of "none". Although maybe it is not very useful 
> to say -e none, it does seem inconsistent to reject it, given that "none" was 
> a valid value for the GUC variable data_encryption_cipher.
>
> * contrib/bloom/blinsert.c - In function btbuildempty the arguments for 
> PageEncryptionInPlace seem in the wrong order (forknum should be 2nd).
>
> * src/backend/access/hash/hashpage.c - In function _hash_alloc_buckets the 
> arguments for PageEncryptionInPlace seem in the wrong order (forknum should 
> be 2nd).
>
> * src/backend/access/spgist/spginsert.c - In function spgbuildempty the 
> arguments for PageEncryptionInPlace seem in the wrong order (forknum should 
> be 2nd). This error looks repeated 3X.
>
> * in multiple files - The encryption enums have equivalent strings ("off", 
> "aes-128", "aes-256") but they are scattered as string literals in many 
> places (e.g. pg_controldata.c, initdb.c, guc.c, enc_cipher.c). Suggest it 
> would be better to declare those as string constants in one place only.
>

Thank you for reviewing this patch.

I've updated the TDE patches. These patches implement the key system, buffer
encryption and WAL encryption. Please refer to the ToDo of cluster-wide
encryption for more details of the design and components. The patches still
lack temporary file encryption and frontend tool encryption. For temporary
file encryption, we are discussing which files should be encrypted on
another thread, and temporary file encryption might be related to that. So
I'm currently studying the temporary file encryption patch that Antonin
already submitted[1], but some changes might be needed based on that
discussion. For frontend tool support, Shawn will share his patch that is
built on top of my patch.

I haven't changed the usage of this feature. Please refer to the
email[2] for how to set up an encrypted database cluster.

[1] https://www.postgresql.org/message-id/7082.1562337694%40localh

Re: function calls optimization

2019-10-31 Thread Tom Lane
Andres Freund  writes:
> On October 31, 2019 7:06:13 AM PDT, Andrzej Barszcz  wrote:
>> I have almost finished a patch optimizing non-volatile function calls.
>> 
>> select f(t.n) from t where f(t.n) > 10 and f(t.n) < 100;  needs 3 calls
>> of
>> f() for each tuple,
>> after applying the patch, only 1.
>> 
>> Any pros and cons  ?

> Depends on the actual way of implementing this proposal. Think we need more 
> details than what you describe here.

We've typically supposed that the cost of searching for duplicate
subexpressions would outweigh the benefits of sometimes finding them.

regards, tom lane




Re: function calls optimization

2019-10-31 Thread Andres Freund
Hi, 

On October 31, 2019 7:45:26 AM PDT, Tom Lane  wrote:
>Andres Freund  writes:
>> On October 31, 2019 7:06:13 AM PDT, Andrzej Barszcz
> wrote:
>>> I have almost finished a patch optimizing non-volatile function calls.
>>> 
>>> select f(t.n) from t where f(t.n) > 10 and f(t.n) < 100;  needs 3
>calls
>>> of
>>> f() for each tuple,
>>> after applying the patch, only 1.
>>> 
>>> Any pros and cons  ?
>
>> Depends on the actual way of implementing this proposal. Think we
>need more details than what you describe here.
>
>We've typically supposed that the cost of searching for duplicate
>subexpressions would outweigh the benefits of sometimes finding them.

Based on profiles I've seen I'm not sure that's the right choice. Both for when 
the calls are expensive (say postgis stuff), and for when a lot of rows are 
processed.

I think one part of doing this in a realistic manner is an efficient search for 
redundant expressions. The other, also non trivial, is how to even represent 
references to the results of expressions in other parts of the expression tree 
/ other expressions.

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.




Re: pgbench - extend initialization phase control

2019-10-31 Thread Fabien COELHO


Hello Masao-san,


+ snprintf(sql, sizeof(sql),
+ "insert into pgbench_branches(bid,bbalance) "
+ "select bid, 0 "
+ "from generate_series(1, %d) as bid", scale);

"scale" should be "nbranches * scale".


Yep, even if nbranches is 1, it should be there.


+ snprintf(sql, sizeof(sql),
+ "insert into pgbench_accounts(aid,bid,abalance,filler) "
+ "select aid, (aid - 1) / %d + 1, 0, '' "
+ "from generate_series(1, %d) as aid", naccounts, scale * naccounts);

Like client-side data generation, INT64_FORMAT should be used here
instead of %d?


Indeed.


If large scale factor is specified, the query for generating pgbench_accounts
data can take a very long time. While that query is running, operators may be
likely to do Ctrl-C to cancel the data generation. In this case, IMO pgbench
should cancel the query, i.e., call PQcancel(). Otherwise, the query will keep
running to the end.


Hmmm. Why not. Now the infra to do that seems to already exist twice, 
once in "src/bin/psql/common.c" and once in "src/bin/scripts/common.c".


I cannot say I'm thrilled to replicate this once more. I think that the 
reasonable option is to share this in fe-utils and then to reuse it from 
there. However, ISTM that such a restructuring patch does not belong to 
this feature.



- for (step = initialize_steps; *step != '\0'; step++)
+ for (const char *step = initialize_steps; *step != '\0'; step++)

Per PostgreSQL basic coding style,


C99 (20 years ago) is now the norm, and this style is now allowed; there 
are over a hundred instances of these already. I tend to use that where 
appropriate.


- fprintf(stderr, "unrecognized initialization step \"%c\"\n",
+ fprintf(stderr,
+ "unrecognized initialization step \"%c\"\n"
+ "Allowed step characters are: \"" ALL_INIT_STEPS "\".\n",
 *step);
- fprintf(stderr, "allowed steps are: \"d\", \"t\", \"g\", \"v\",
\"p\", \"f\"\n");

The original message seems better to me. So what about just appending "G"
into the above latter message? That is,
"allowed steps are: \"d\", \"t\", \"g\", \"G\", \"v\", \"p\", \"f\"\n"


I needed this list in several places, so it makes sense to share the 
definition, and frankly the list of half a dozen comma-separated chars 
does not strike me as much better than just giving the allowed chars 
directly. So the simpler the better, from my point of view.



Isn't it better to explain a bit more what "client-side / server-side data
generation" is? For example, something like


Ok.

Attached v7 does most of the above, except the list-of-chars message and 
the signal handling. The first does not really look better to me, and the 
second belongs to a restructuring patch that I'll try to submit.


--
Fabien.diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index e3a0abb4c7..7be9c81c43 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -193,12 +193,25 @@ pgbench  options  d
   
  
  
-  g (Generate data)
+  g or G (Generate data, client-side or server-side)
   

 Generate data and load it into the standard tables,
 replacing any data already present.

+   
+With g (client-side data generation),
+data is generated in pgbench client and then
+sent to the server. This uses the client/server bandwidth
+extensively through a COPY.
+   
+   
+With G (server-side data generation),
+only limited queries are sent from pgbench
+client and then data is actually generated in the server.
+No significant bandwidth is required for this variant, but
+the server will do more work.
+   
   
  
  
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 03bcd22996..7f5c1d00c8 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -131,8 +131,14 @@ static int	pthread_join(pthread_t th, void **thread_return);
 /
  * some configurable parameters */
 
+/*
+ * we do all data generation in one transaction to enable the backend's
+ * data-loading optimizations
+ */
 #define DEFAULT_INIT_STEPS "dtgvp"	/* default -I setting */
 
+#define ALL_INIT_STEPS "dtgGvpf"	/* all possible steps */
+
 #define LOG_STEP_SECONDS	5	/* seconds between log messages */
 #define DEFAULT_NXACTS	10		/* default nxacts */
 
@@ -627,7 +633,7 @@ usage(void)
 		   "  %s [OPTION]... [DBNAME]\n"
 		   "\nInitialization options:\n"
 		   "  -i, --initialize invokes initialization mode\n"
-		   "  -I, --init-steps=[dtgvpf]+ (default \"dtgvp\")\n"
+		   "  -I, --init-steps=[" ALL_INIT_STEPS "]+ (default \"" DEFAULT_INIT_STEPS "\")\n"
 		   "   run selected initialization steps\n"
 		   "  -F, --fillfactor=NUM set fill factor\n"
 		   "  -n, --no-vacuum  do not run VACUUM du

Re: [BUG] Partition creation fails after dropping a column and adding a partial index

2019-10-31 Thread Ibrar Ahmed
On Thu, Oct 31, 2019 at 9:45 AM Michael Paquier  wrote:

> On Tue, Oct 29, 2019 at 01:16:58PM +0900, Michael Paquier wrote:
> > Yes, something looks wrong with that.  I have not looked at it in
> > details yet though.  I'll see about that tomorrow.
>
> So..  When building the attribute map for a cloned index (with either
> LIKE during the transformation or for partition indexes), we store
> each attribute number with 0 used for dropped columns.  Unfortunately,
> if you look at the way the attribute map is built in this case the
> code correctly generates the mapping with convert_tuples_by_name_map.
> But, the length of the mapping used is incorrect as this makes use of
> the number of attributes for the newly-created child relation, and not
> the parent which includes the dropped column in its count.  So the
> answer is simply to use the parent as reference for the mapping
> length.
>
> The patch is rather simple as per the attached, with extended
> regression tests included.  I have not checked on back-branches yet,
> but that's visibly wrong since 8b08f7d down to v11 (will do that when
> back-patching).
>
> There could be a point in changing convert_tuples_by_name_map & co so
> as they return the length of the map on top of the map to avoid such
> mistakes in the future.  That's a more invasive patch not really
> adapted for a back-patch, but we could do that on HEAD once this bug
> is fixed.  I have also checked other calls of this API and the
> handling is done correctly.
>
The patch works for me on master and on 12. I have rebased the patch for
Version 11.


> Wyatt, what do you think?
> --
> Michael
>


-- 
Ibrar Ahmed
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index b53f6ed3ac..9a1ef78a1f 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -961,7 +961,7 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId,
 gettext_noop("could not convert row type"));
 			idxstmt =
 generateClonedIndexStmt(NULL, RelationGetRelid(rel), idxRel,
-		attmap, RelationGetDescr(rel)->natts,
+		attmap, RelationGetDescr(parent)->natts,
 		&constraintOid);
 			DefineIndex(RelationGetRelid(rel),
 		idxstmt,
diff --git a/src/test/regress/expected/create_table.out b/src/test/regress/expected/create_table.out
index c25117e47c..d20ba3dbdf 100644
--- a/src/test/regress/expected/create_table.out
+++ b/src/test/regress/expected/create_table.out
@@ -1006,3 +1006,52 @@ CONTEXT:  SQL statement "create table tab_part_create_1 partition of tab_part_cr
 PL/pgSQL function func_part_create() line 3 at EXECUTE
 drop table tab_part_create;
 drop function func_part_create();
+-- tests of column drop with partition tables and indexes using
+-- predicates and expressions.
+create table part_column_drop (
+  useless_1 int,
+  id int,
+  useless_2 int,
+  d int,
+  b int,
+  useless_3 int
+) partition by range (id);
+alter table part_column_drop drop column useless_1;
+alter table part_column_drop drop column useless_2;
+alter table part_column_drop drop column useless_3;
+create index part_column_drop_b_pred on part_column_drop(b) where b = 1;
+create index part_column_drop_b_expr on part_column_drop((b = 1));
+create index part_column_drop_d_pred on part_column_drop(d) where d = 2;
+create index part_column_drop_d_expr on part_column_drop((d = 2));
+create table part_column_drop_1_10 partition of
+  part_column_drop for values from (1) to (10);
+\d part_column_drop
+  Table "public.part_column_drop"
+ Column |  Type   | Collation | Nullable | Default 
++-+---+--+-
+ id | integer |   |  | 
+ d  | integer |   |  | 
+ b  | integer |   |  | 
+Partition key: RANGE (id)
+Indexes:
+"part_column_drop_b_expr" btree ((b = 1))
+"part_column_drop_b_pred" btree (b) WHERE b = 1
+"part_column_drop_d_expr" btree ((d = 2))
+"part_column_drop_d_pred" btree (d) WHERE d = 2
+Number of partitions: 1 (Use \d+ to list them.)
+
+\d part_column_drop_1_10
+   Table "public.part_column_drop_1_10"
+ Column |  Type   | Collation | Nullable | Default 
++-+---+--+-
+ id | integer |   |  | 
+ d  | integer |   |  | 
+ b  | integer |   |  | 
+Partition of: part_column_drop FOR VALUES FROM (1) TO (10)
+Indexes:
+"part_column_drop_1_10_b_idx" btree (b) WHERE b = 1
+"part_column_drop_1_10_d_idx" btree (d) WHERE d = 2
+"part_column_drop_1_10_expr_idx" btree ((b = 1))
+"part_column_drop_1_10_expr_idx1" btree ((d = 2))
+
+drop table part_column_drop;
diff --git a/src/test/regress/sql/create_table.sql b/src/test/regress/sql/create_table.sql
index 27e6f59e87..601a4b9dc3 100644
--- a/src/test/regress/sql/create_table.sql
+++ b/src/test/regress/sql/create_table.sql
@@ -814,3 +814,26 @@ create trigger trig_part_create before 

Re: function calls optimization

2019-10-31 Thread Andres Freund
Hi

On October 31, 2019 7:53:20 AM PDT, Andres Freund  wrote:
>On October 31, 2019 7:45:26 AM PDT, Tom Lane  wrote:
>>We've typically supposed that the cost of searching for duplicate
>>subexpressions would outweigh the benefits of sometimes finding them.
>
>Based on profiles I've seen I'm not sure that's the right choice. Both
>for when the calls are expensive (say postgis stuff), and for when a
>lot of rows are processed.
>
>I think one part of doing this in a realistic manner is an efficient
>search for redundant expressions. 

One way to improve the situation - which is applicable in a lot of situations, 
e.g. setrefs.c etc - would be to compute hashes for (sub-) expression trees. 
Which makes it a lot easier to bail out early when trees are not the same, and 
also easier to build an efficient way to find redundant expressions. There's 
some complexity in invalidating such hashes once computed, and when to first 
compute them, obviously.
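
To illustrate the shape of that idea, here is a minimal stand-alone sketch of
bottom-up, memoized expression hashing; the Node layout and the FNV-style
mixing constants are invented for the example and are not the planner's
actual data structures:

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* toy expression node: tag identifies the operator or function */
typedef struct Node
{
    int          tag;
    uint32_t     hash;          /* 0 = not computed yet */
    struct Node *args[2];       /* children, NULL if absent */
} Node;

static uint32_t
expr_hash(Node *n)
{
    uint32_t    h;

    if (n == NULL)
        return 0;
    if (n->hash != 0)
        return n->hash;         /* memoized: each subtree hashed once */

    /* FNV-1a style mixing of the tag and the children's hashes */
    h = 2166136261u;
    h = (h ^ (uint32_t) n->tag) * 16777619u;
    h = (h ^ expr_hash(n->args[0])) * 16777619u;
    h = (h ^ expr_hash(n->args[1])) * 16777619u;

    n->hash = (h != 0) ? h : 1; /* keep 0 reserved for "not computed" */
    return n->hash;
}

/*
 * Cheap pre-filter: subtrees with different hashes cannot be duplicates,
 * so the expensive recursive equality check can usually be skipped.
 */
static bool
maybe_duplicate(Node *a, Node *b)
{
    return expr_hash(a) == expr_hash(b);
}

Equal hashes still require a full equality check, and any in-place mutation
of a subtree has to reset the cached hashes up to the root - which is the
invalidation problem mentioned above.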

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.




Re: function calls optimization

2019-10-31 Thread Andreas Karlsson

On 10/31/19 3:45 PM, Tom Lane wrote:

Andres Freund  writes:

On October 31, 2019 7:06:13 AM PDT, Andrzej Barszcz  wrote:

Any pros and cons  ?



Depends on the actual way of implementing this proposal. Think we need more 
details than what you describe here.


We've typically supposed that the cost of searching for duplicate
subexpressions would outweigh the benefits of sometimes finding them.


That is an important concern, but given how SQL does not make it 
convenient to re-use partial results of calculations I think such 
queries are quite common in real world workloads.


So if we can make it cheap enough I think that it is a worthwhile 
optimization.


Andreas




Re: function calls optimization

2019-10-31 Thread Tom Lane
Andres Freund  writes:
> On October 31, 2019 7:45:26 AM PDT, Tom Lane  wrote:
>> We've typically supposed that the cost of searching for duplicate
>> subexpressions would outweigh the benefits of sometimes finding them.

> Based on profiles I've seen I'm not sure that's the right choice. Both for 
> when the calls are expensive (say postgis stuff), and for when a lot of rows 
> are processed.

Yeah, if your mental model of a function call is some remarkably expensive
PostGIS geometry manipulation, it's easy to justify doing a lot of work
to try to detect duplicates.  But most functions in most queries are
more like int4pl or btint8cmp, and it's going to be extremely remarkable
if you can make back the planner costs of checking for duplicate usages
of those.

Possibly this could be finessed by only trying to find duplicates of
functions that have high cost estimates.  Not sure how high is high
enough.

> I think one part of doing this in a realistic manner is an efficient
> search for redundant expressions. The other, also non trivial, is how to
> even represent references to the results of expressions in other parts of 
> the expression tree / other expressions.

Yup, both of those would be critical to do right.

regards, tom lane




Re: function calls optimization

2019-10-31 Thread Andres Freund
Hi, 

On October 31, 2019 8:06:50 AM PDT, Tom Lane  wrote:
>Andres Freund  writes:
>> On October 31, 2019 7:45:26 AM PDT, Tom Lane 
>wrote:
>>> We've typically supposed that the cost of searching for duplicate
>>> subexpressions would outweigh the benefits of sometimes finding
>them.
>
>> Based on profiles I've seen I'm not sure that's the right choice.
>Both for when the calls are expensive (say postgis stuff), and for when
>a lot of rows are processed.
>
>Yeah, if your mental model of a function call is some remarkably
>expensive
>PostGIS geometry manipulation, it's easy to justify doing a lot of work
>to try to detect duplicates.  But most functions in most queries are
>more like int4pl or btint8cmp, and it's going to be extremely
>remarkable
>if you can make back the planner costs of checking for duplicate usages
>of those.

Well, if it's an expression containing those individually cheap calls on a 
seqscan on a large table below an aggregate, it'd likely still be a win. But we 
don't, to my knowledge, really have a good way to model optimizations like this 
that should only be done if the expressions are expensive or the loop count is 
high.

I guess one ugly way to deal with this would be to eliminate redundancies very 
late, e.g. during setrefs (where a better data structure for matching 
expressions would be good anyway), as we already know all the costs. 

But ideally we would want to do be able to take such savings into account 
earlier, when considering different paths. I suspect that it might be a good 
enough vehicle to tackle the rest of the work however, at least initially.

We could also "just" do such elimination during expression "compilation", but 
it'd be better to not have to do something as complicated as this for every 
execution of a prepared statement.


>> I think one part of doing this in a realistic manner is an efficient
>> search for redundant expressions. The other, also non trivial, is how
>to
>> even represent references to the results of expressions in other
>parts of the expression tree / other expressions.
>
>Yup, both of those would be critical to do right.

Potentially related note: for nodes like seqscan, combining the qual and 
projection processing into one expression seems to be a noticeable win (at least 
when taking care to emit two different sets of deform expression steps). Wonder 
if something like that would take care of avoiding the need for cross 
expression value passing in enough places.

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.




Re: Resume vacuum and autovacuum from interruption and cancellation

2019-10-31 Thread Ibrar Ahmed
On Thu, Sep 26, 2019 at 1:53 AM Alvaro Herrera 
wrote:

> Apparently this patch now has a duplicate OID.  Please do use random
> OIDs >8000 as suggested by the unused_oids script.
>
> --
> Álvaro Herrera    https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
>
>
I have updated the patch using OIDs > 8000


-- 
Ibrar Ahmed


v4-0001-Add-RESUME-option-to-VACUUM-and-autovacuum.patch
Description: Binary data


ssl passphrase callback

2019-10-31 Thread Andrew Dunstan

This is the first of a number of patches to enhance SSL functionality,
particularly w.r.t. passphrases.


This patch provides a hook for a function that can supply an SSL
passphrase. The hook can be filled in by a shared preloadable module. In
order for that to be effective, the startup order is modified slightly.
There is a test attached that builds and uses one trivial
implementation, which just takes a configuration setting and rot13's it
before supplying the result as the passphrase.
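
As a rough illustration of the shape of such a module (assuming the hook
declarations added by this patch; the GUC name and the rot13 handling here
are an illustrative sketch, not a copy of the attached test module):

#include "postgres.h"

#include "fmgr.h"
#include "libpq/libpq-be.h"
#include "utils/guc.h"

PG_MODULE_MAGIC;

/* hypothetical GUC holding the rot13-encoded passphrase */
static char *rot13_pass = NULL;

static int
rot13_passphrase_cb(char *buf, int size, int rwflag, void *userdata)
{
    int         len = 0;
    const char *p;

    if (rot13_pass == NULL)
        return 0;               /* nothing configured, no passphrase */

    /* decode rot13 into the buffer provided by OpenSSL */
    for (p = rot13_pass; *p != '\0' && len < size - 1; p++, len++)
    {
        char        c = *p;

        if (c >= 'a' && c <= 'z')
            c = 'a' + (c - 'a' + 13) % 26;
        else if (c >= 'A' && c <= 'Z')
            c = 'A' + (c - 'A' + 13) % 26;
        buf[len] = c;
    }
    buf[len] = '\0';

    return len;                 /* pem_password_cb convention: bytes written */
}

void
_PG_init(void)
{
    DefineCustomStringVariable("rot13.passphrase",
                               "SSL key passphrase, rot13-encoded",
                               NULL,
                               &rot13_pass,
                               NULL,
                               PGC_SIGHUP,
                               0,
                               NULL, NULL, NULL);

    ssl_passphrase_function = rot13_passphrase_cb;
    ssl_passphrase_function_supports_reload = true;
}

Because the startup order change makes secure_initialize() run after
process_shared_preload_libraries(), setting the hook from _PG_init() of a
shared_preload_libraries module is sufficient.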


cheers


andrew




-- 
Andrew Dunstan    https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index 629919cc6e..bf4493c94a 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -45,6 +45,9 @@
 #include "utils/memutils.h"
 
 
+ssl_passphrase_func_cb ssl_passphrase_function = NULL;
+bool ssl_passphrase_function_supports_reload = false;
+
 static int	my_sock_read(BIO *h, char *buf, int size);
 static int	my_sock_write(BIO *h, const char *buf, int size);
 static BIO_METHOD *my_BIO_s_socket(void);
@@ -124,12 +127,16 @@ be_tls_init(bool isServerStart)
 	 */
 	if (isServerStart)
 	{
-		if (ssl_passphrase_command[0])
+		if (ssl_passphrase_function)
+			SSL_CTX_set_default_passwd_cb(context, ssl_passphrase_function);
+		else if (ssl_passphrase_command[0])
 			SSL_CTX_set_default_passwd_cb(context, ssl_external_passwd_cb);
 	}
 	else
 	{
-		if (ssl_passphrase_command[0] && ssl_passphrase_command_supports_reload)
+		if (ssl_passphrase_function && ssl_passphrase_function_supports_reload)
+			SSL_CTX_set_default_passwd_cb(context, ssl_passphrase_function);
+		else if (ssl_passphrase_command[0] && ssl_passphrase_command_supports_reload)
 			SSL_CTX_set_default_passwd_cb(context, ssl_external_passwd_cb);
 		else
 
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 5f30359165..18dd8578d9 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -972,17 +972,6 @@ PostmasterMain(int argc, char *argv[])
 	 */
 	LocalProcessControlFile(false);
 
-	/*
-	 * Initialize SSL library, if specified.
-	 */
-#ifdef USE_SSL
-	if (EnableSSL)
-	{
-		(void) secure_initialize(true);
-		LoadedSSL = true;
-	}
-#endif
-
 	/*
 	 * Register the apply launcher.  Since it registers a background worker,
 	 * it needs to be called before InitializeMaxBackends(), and it's probably
@@ -996,6 +985,17 @@ PostmasterMain(int argc, char *argv[])
 	 */
 	process_shared_preload_libraries();
 
+	/*
+	 * Initialize SSL library, if specified.
+	 */
+#ifdef USE_SSL
+	if (EnableSSL)
+	{
+		(void) secure_initialize(true);
+		LoadedSSL = true;
+	}
+#endif
+
 	/*
 	 * Now that loadable modules have had their chance to register background
 	 * workers, calculate MaxBackends.
diff --git a/src/include/libpq/libpq-be.h b/src/include/libpq/libpq-be.h
index 541f970f99..dd2b5fc6c8 100644
--- a/src/include/libpq/libpq-be.h
+++ b/src/include/libpq/libpq-be.h
@@ -287,6 +287,17 @@ extern void be_tls_get_peer_serial(Port *port, char *ptr, size_t len);
 extern char *be_tls_get_certificate_hash(Port *port, size_t *len);
 #endif
 
+/*
+ * ssl_passphrase_function can be filled in by a shared preloaded module
+ * to supply a passphrase for a key file, as can the flag noting whether the
+ * function supports reloading.
+ */
+
+typedef int (* ssl_passphrase_func_cb) (char *buf, int size, int rwflag,
+		void *userdata);
+extern ssl_passphrase_func_cb ssl_passphrase_function;
+extern bool ssl_passphrase_function_supports_reload;
+
 #endif			/* USE_SSL */
 
 #ifdef ENABLE_GSS
diff --git a/src/test/modules/ssl_passphrase_callback/.gitignore b/src/test/modules/ssl_passphrase_callback/.gitignore
new file mode 100644
index 00..1dbadf7baf
--- /dev/null
+++ b/src/test/modules/ssl_passphrase_callback/.gitignore
@@ -0,0 +1 @@
+tmp_check
diff --git a/src/test/modules/ssl_passphrase_callback/Makefile b/src/test/modules/ssl_passphrase_callback/Makefile
new file mode 100644
index 00..876ede2f21
--- /dev/null
+++ b/src/test/modules/ssl_passphrase_callback/Makefile
@@ -0,0 +1,22 @@
+# ssl_passphrase Makefile
+
+MODULE_big = ssl_passphrase_func
+OBJS = ssl_passphrase_func.o $(WIN32RES)
+PGFILEDESC = "callback function to provide a passphrase"
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/ssl_passphrase_callback
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
+
+check: prove-check
+
+prove-check: | temp-install
+	@echo running prove ...
+	$(prove_check)
diff --git a/src/test/modules/ssl_passphrase_callback/ssl_passphrase_func.c b/src/test/modules/ssl_passphrase_callback/ssl_passphrase_func.c
new file mode 100644
index 00..e3e433b6f0
--- /dev/null
+++ b/

Re: function calls optimization

2019-10-31 Thread Tom Lane
Andres Freund  writes:
> Potentially related note: for nodes like seqscan, combining the qual and 
> projection processing into one expression seems to be a noticeable win (at 
> least when taking care to emit two different sets of deform expression steps).

There's just one problem: that violates SQL semantics, and not in
a small way.

select 1/x from tab where x <> 0

regards, tom lane




Re: function calls optimization

2019-10-31 Thread Andres Freund
Hi, 

On October 31, 2019 8:45:26 AM PDT, Tom Lane  wrote:
>Andres Freund  writes:
>> Potentially related note: for nodes like seqscan, combining the qual
>and projection processing into one expression seems to be a noticeable
>win (at least when taking care to emit two different sets of deform
>expression steps).
>
>There's just one problem: that violates SQL semantics, and not in
>a small way.
>
>   select 1/x from tab where x <> 0

The expression would obviously have to return early, before projecting, when 
not matching the qual? I'm basically just thinking of first executing the steps 
for the qual, and in the success case executing the projection steps before 
returning the qual's positive result. 
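
To make the control flow concrete, a stand-alone toy sketch (this is not the
actual ExprState step representation; the row layout and helper are invented
for the example):

#include <stdbool.h>
#include <stdio.h>

typedef struct Row { int x; int y; } Row;

/*
 * One combined "program": the qual steps run first and bail out before
 * any projection step, so "select y/x from tab where x <> 0" stays safe.
 */
static bool
eval_combined(const Row *row, int *out)
{
    /* qual steps: x <> 0 */
    if (row->x == 0)
        return false;           /* early return: nothing is projected */

    /* projection steps: y / x, reached only when the qual passed */
    *out = row->y / row->x;
    return true;
}

int
main(void)
{
    Row         rows[] = {{2, 10}, {0, 7}, {5, 25}};

    for (int i = 0; i < 3; i++)
    {
        int         result;

        if (eval_combined(&rows[i], &result))
            printf("row %d projects %d\n", i, result);
        else
            printf("row %d filtered out\n", i);
    }
    return 0;
}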

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.




Re: function calls optimization

2019-10-31 Thread Andrzej Barszcz
x <> 0 is evaluated first, and 1/x only when x <> 0, no?

On Thu, 31 Oct 2019 at 16:45, Tom Lane  wrote:

> Andres Freund  writes:
> > Potentially related note: for nodes like seqscan, combining the qual and
> projection processing into one expression seems to be a noticable win (at
> least when taking care do emit two different sets of deform expression
> steps).
>
> There's just one problem: that violates SQL semantics, and not in
> a small way.
>
> select 1/x from tab where x <> 0
>
> regards, tom lane
>


Adding percentile metrics to pg_stat_statements module

2019-10-31 Thread Igor Calabria
Hi everyone,

I was taking a look at the pg_stat_statements module and noticed that it does
not collect any percentile metrics. I believe that it would be really handy
to have those available, and I'd love to contribute this feature.

The basic idea is to accumulate the query execution times using an
approximation structure like q-digest or t-digest and add those results to
the pg_stat_statements table as fixed columns. Something like this:

p90_time:
p95_time:
p99_time:
p70_time:
...

Another solution is to persist the digest structure in a binary column and
use a function to extract the desired quantile, like this: SELECT
approx_quantile(digest_times, 0.99) FROM pg_stat_statements
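
As a rough sketch of the constant-memory accumulation idea (a crude
log-bucket histogram standing in for t-digest/q-digest; the bucket count and
resolution are made up for the example, and the struct is assumed
zero-initialized):

#include <math.h>

#define NBUCKETS 64

typedef struct QuantileSketch
{
    long        counts[NBUCKETS];   /* log-spaced buckets over exec time */
    long        total;
} QuantileSketch;

static int
bucket_for(double ms)
{
    int         b = (int) (log2(ms + 1.0) * 4.0);   /* quarter-octave buckets */

    if (b < 0)
        b = 0;
    if (b >= NBUCKETS)
        b = NBUCKETS - 1;
    return b;
}

/* called once per query execution: O(1) time, fixed memory */
static void
sketch_add(QuantileSketch *s, double exec_ms)
{
    s->counts[bucket_for(exec_ms)]++;
    s->total++;
}

/* approximate upper bound of the q-th quantile (0 < q < 1) */
static double
sketch_quantile(const QuantileSketch *s, double q)
{
    long        target = (long) ceil(q * (double) s->total);
    long        seen = 0;

    for (int b = 0; b < NBUCKETS; b++)
    {
        seen += s->counts[b];
        if (seen >= target)
            return exp2((b + 1) / 4.0) - 1.0;   /* bucket's upper edge */
    }
    return 0.0;
}

A real t-digest has much better tail accuracy for a similar footprint, but
the point is that per-entry memory stays fixed (here, 64 counters), which is
what would make storing one sketch per pg_stat_statements entry plausible.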

What do you guys think?
Cheers,


Re: function calls optimization

2019-10-31 Thread Andres Freund
Hi, 

On October 31, 2019 8:51:11 AM PDT, Andrzej Barszcz  wrote:
>x <> 0 is evaluated first, and 1/x only when x <> 0, no?
>
>On Thu, 31 Oct 2019 at 16:45, Tom Lane  wrote:
>
>> Andres Freund  writes:
>> > Potentially related note: for nodes like seqscan, combining the
>qual and
>> projection processing into one expression seems to be a noticable win
>(at
>> least when taking care do emit two different sets of deform
>expression
>> steps).
>>
>> There's just one problem: that violates SQL semantics, and not in
>> a small way.
>>
>> select 1/x from tab where x <> 0

On postgres lists the policy is to reply below the quoted bit, and to trim the 
quote appropriately.

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.




Re: Allow cluster_name in log_line_prefix

2019-10-31 Thread Andres Freund
Hi,

On 2019-10-28 12:33:00 +0800, Craig Ringer wrote:
> I was recently surprised to notice that log_line_prefix doesn't support a
> cluster_name placeholder. I suggest adding one. If I don't hear objections
> I'll send a patch.
> 
> Before anyone asks "but why?!":
> 
> * A constant (short) string in log_line_prefix is immensely useful when
> working with logs from multi-node systems. Whether that's physical
> streaming replication, logical replication, Citus, whatever, it doesn't
> matter. It's worth paying the small storage price for sanity when looking
> at logs.
> 
> * Yes you can embed it directly into log_line_prefix. But then it gets
> copied by pg_basebackup or whatever you're using to clone standbys etc, so
> you can easily land up with multiple instances reporting the same name.
> This rather defeats the purpose.

+1. For a while this was part of the patch that added cluster_name
(possibly worthwhile digging it up from that thread), but some people
thought it was unnecessary, so it was excised from the patch to get the
basic feature...

Greetings,

Andres Freund




Re: Adding percentile metrics to pg_stat_statements module

2019-10-31 Thread Pavel Stehule
On Thu, 31 Oct 2019 at 16:51, Igor Calabria 
wrote:

> Hi everyone,
>
> I was taking a look at pg_stat_statements module and noticed that it does
> not collect any percentile metrics. I believe that It would be really handy
> to have those available and I'd love to contribute with this feature.
>
> The basic idea is to accumulate the the query execution times using an
> approximation structure like q-digest or t-digest and add those results to
> the pg_stat_statements table as fixed columns. Something like this
>
> p90_time:
> p95_time:
> p99_time:
> p70_time:
> ...
>
> Another solution is to persist de digest structure in a binary column and
> use a function to extract the desired quantile ilke this SELECT
> approx_quantile(digest_times, 0.99) FROM pg_stat_statements
>
> What do you guys think?
>

+1 to the idea

But I am not sure about performance and memory overhead

Pavel

> Cheers,
>
>


idea - proposal - defining own psql commands

2019-10-31 Thread Pavel Stehule
Hi

For a long time we have been thinking about how to allow adding custom
commands in psql. I have the following idea:

1. psql can have a special buffer for custom queries. This buffer can be
filled by a special command \gdefq. This command will have two parameters -
a name and a number of arguments.

some like

select * from pg_class where relname = :'_first' \gdefq m1 1
select * from pg_class where relnamespace = :_first::regnamespace and
relname = :'_second' \gdefq m1 2

the custom queries can be executed via doubled backslash like

\\m1 pg_proc
\\m1 pg_catalog pg_proc

the runtime will count the number of parameters and choose the variant with
the selected name and the same number of arguments. Next, it saves the
parameters into variables like _first, _second. The last step is query
execution.
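
A minimal sketch of the lookup this implies (the structure and names are
purely illustrative, not taken from psql's code):

#include <stddef.h>
#include <string.h>

typedef struct CustomQuery
{
    const char *name;
    int         nargs;
    const char *query;          /* references :_first, :_second, ... */
} CustomQuery;

/* filled by \gdefq; the same name may be registered with several arities */
static CustomQuery registry[] = {
    {"m1", 1, "select * from pg_class where relname = :'_first'"},
    {"m1", 2, "select * from pg_class where relnamespace = :_first::regnamespace"
     " and relname = :'_second'"},
};

static const CustomQuery *
lookup_custom_query(const char *name, int nargs)
{
    for (size_t i = 0; i < sizeof(registry) / sizeof(registry[0]); i++)
    {
        if (strcmp(registry[i].name, name) == 0 && registry[i].nargs == nargs)
            return &registry[i];
    }
    return NULL;                /* no variant with that name and arity */
}

So \\m1 pg_catalog pg_proc would set _first and _second and run the
two-argument variant.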

What do you think about this?

Regards

Pavel


Re: RFC: split OBJS lines to one object per line

2019-10-31 Thread Mark Dilger




On 10/29/19 11:32 PM, Andres Freund wrote:

Hi,

On 2019-10-29 16:31:11 -0400, Tom Lane wrote:

Andres Freund  writes:

one of the most frequent conflicts I see is that two patches add files
to OBJS (or one of its other spellings), and there are conflicts because
another file has been added.
...
Now, obviously these types of conflicts are easy enough to resolve, but
it's still annoying.  It seems that this would be substantially less
often a problem if we just split such lines to one file per
line.


We did something similar not too long ago in configure.in (bfa6c5a0c),
and it seems to have helped.  +1


Cool. Any opinion on whether to go for

OBJS = \
dest.o \
fastpath.o \
...

or

OBJS = dest.o \
fastpath.o \
...

I'm mildly inclined to go for the former.


+1 for the former.



Greetings,

Andres Freund







TestLib::command_fails_like enhancement

2019-10-31 Thread Andrew Dunstan

This small patch, authored by my colleague Craig Ringer, enhances
TestLib's command_fails_like by allowing the passing of extra keyword-type
arguments. The keyword initially recognized is 'extra_ipcrun_opts'.
The value for this keyword needs to be an array, and is passed through
to the call to IPC::Run.

Some patches I will be submitting shortly rely on this enhancement.


cheers


andrew


-- 
Andrew Dunstan    https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

diff --git a/src/test/perl/TestLib.pm b/src/test/perl/TestLib.pm
index 905d0d178f..5264111d00 100644
--- a/src/test/perl/TestLib.pm
+++ b/src/test/perl/TestLib.pm
@@ -771,10 +771,16 @@ the given regular expression.
 sub command_fails_like
 {
 	local $Test::Builder::Level = $Test::Builder::Level + 1;
-	my ($cmd, $expected_stderr, $test_name) = @_;
+	my ($cmd, $expected_stderr, $test_name, %kwargs) = @_;
+	my @extra_ipcrun_opts = ();
+	if (defined($kwargs{'extra_ipcrun_opts'}))
+	{
+		push(@extra_ipcrun_opts, @{$kwargs{'extra_ipcrun_opts'}});
+	}
 	my ($stdout, $stderr);
 	print("# Running: " . join(" ", @{$cmd}) . "\n");
-	my $result = IPC::Run::run $cmd, '>', \$stdout, '2>', \$stderr;
+	my $result = IPC::Run::run $cmd, '>', \$stdout, '2>', \$stderr,
+	  @extra_ipcrun_opts;
 	ok(!$result, "$test_name: exit code not 0");
 	like($stderr, $expected_stderr, "$test_name: matches");
 	return;


fe-utils - share query cancellation code

2019-10-31 Thread Fabien COELHO


Hello Devs,

This patch moves duplicated query cancellation code from psql & 
scripts to fe-utils, so that it is shared and may be used by other 
commands.


This is because Masao-san suggested adding a query cancellation feature to 
pgbench for long queries (server-side data generation being discussed, but 
possibly pk and fk could use that as well).


--
Fabien.diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index b981ae81ff..f1d9e0298a 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -29,6 +29,7 @@
 #include "libpq-fe.h"
 #include "pqexpbuffer.h"
 #include "common/logging.h"
+#include "fe_utils/cancel.h"
 #include "fe_utils/print.h"
 #include "fe_utils/string_utils.h"
 
diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index 90f6380170..a00990d214 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -24,6 +24,7 @@
 #include "copy.h"
 #include "crosstabview.h"
 #include "fe_utils/mbprint.h"
+#include "fe_utils/cancel.h"
 #include "fe_utils/string_utils.h"
 #include "portability/instr_time.h"
 #include "settings.h"
@@ -222,7 +223,7 @@ NoticeProcessor(void *arg, const char *message)
 	pg_log_info("%s", message);
 }
 
-
+#ifndef WIN32
 
 /*
  * Code to support query cancellation
@@ -241,7 +242,7 @@ NoticeProcessor(void *arg, const char *message)
  *
  * SIGINT is supposed to abort all long-running psql operations, not only
  * database queries.  In most places, this is accomplished by checking
- * cancel_pressed during long-running loops.  However, that won't work when
+ * CancelRequested during long-running loops.  However, that won't work when
  * blocked on user input (in readline() or fgets()).  In those places, we
  * set sigint_interrupt_enabled true while blocked, instructing the signal
  * catcher to longjmp through sigint_interrupt_jmp.  We assume readline and
@@ -249,37 +250,11 @@ NoticeProcessor(void *arg, const char *message)
  * not work on win32, so control-C is less useful there)
  */
 volatile bool sigint_interrupt_enabled = false;
-
 sigjmp_buf	sigint_interrupt_jmp;
 
-static PGcancel *volatile cancelConn = NULL;
-
-#ifdef WIN32
-static CRITICAL_SECTION cancelConnLock;
-#endif
-
-/*
- * Write a simple string to stderr --- must be safe in a signal handler.
- * We ignore the write() result since there's not much we could do about it.
- * Certain compilers make that harder than it ought to be.
- */
-#define write_stderr(str) \
-	do { \
-		const char *str_ = (str); \
-		int		rc_; \
-		rc_ = write(fileno(stderr), str_, strlen(str_)); \
-		(void) rc_; \
-	} while (0)
-
-
-#ifndef WIN32
-
 static void
-handle_sigint(SIGNAL_ARGS)
+psql_sigint_callback(void)
 {
-	int			save_errno = errno;
-	char		errbuf[256];
-
 	/* if we are waiting for input, longjmp out of it */
 	if (sigint_interrupt_enabled)
 	{
@@ -288,74 +263,19 @@ handle_sigint(SIGNAL_ARGS)
 	}
 
 	/* else, set cancel flag to stop any long-running loops */
-	cancel_pressed = true;
-
-	/* and send QueryCancel if we are processing a database query */
-	if (cancelConn != NULL)
-	{
-		if (PQcancel(cancelConn, errbuf, sizeof(errbuf)))
-			write_stderr("Cancel request sent\n");
-		else
-		{
-			write_stderr("Could not send cancel request: ");
-			write_stderr(errbuf);
-		}
-	}
-
-	errno = save_errno;			/* just in case the write changed it */
-}
-
-void
-setup_cancel_handler(void)
-{
-	pqsignal(SIGINT, handle_sigint);
-}
-#else			/* WIN32 */
-
-static BOOL WINAPI
-consoleHandler(DWORD dwCtrlType)
-{
-	char		errbuf[256];
-
-	if (dwCtrlType == CTRL_C_EVENT ||
-		dwCtrlType == CTRL_BREAK_EVENT)
-	{
-		/*
-		 * Can't longjmp here, because we are in wrong thread :-(
-		 */
-
-		/* set cancel flag to stop any long-running loops */
-		cancel_pressed = true;
-
-		/* and send QueryCancel if we are processing a database query */
-		EnterCriticalSection(&cancelConnLock);
-		if (cancelConn != NULL)
-		{
-			if (PQcancel(cancelConn, errbuf, sizeof(errbuf)))
-write_stderr("Cancel request sent\n");
-			else
-			{
-write_stderr("Could not send cancel request: ");
-write_stderr(errbuf);
-			}
-		}
-		LeaveCriticalSection(&cancelConnLock);
-
-		return TRUE;
-	}
-	else
-		/* Return FALSE for any signals not being handled */
-		return FALSE;
+	CancelRequested = true;
 }
+#endif
 
 void
-setup_cancel_handler(void)
+psql_setup_cancel_handler(void)
 {
-	InitializeCriticalSection(&cancelConnLock);
-
-	SetConsoleCtrlHandler(consoleHandler, TRUE);
+#ifndef WIN32
+	setup_cancel_handler(psql_sigint_callback);
+#else
+	setup_cancel_handler();
+#endif /* WIN32 */
 }
-#endif			/* WIN32 */
 
 
 /* ConnectionUp
@@ -369,7 +289,6 @@ ConnectionUp(void)
 }
 
 
-
 /* CheckConnection
  *
  * Verify that we still have a good connection to the backend, and if not,
@@ -428,62 +347,6 @@ CheckConnection(void)
 
 
 
-/*
- * SetCancelConn
- *
- * Set cancelConn to point to the current database connection.
- */
-void
-SetCancelConn(void)
-{
-	PGcancel   *oldCancelConn;
-
-#ifdef WIN32
-	EnterCriticalSection(&cancel

Removing alignment padding for byval types

2019-10-31 Thread Andres Freund
Hi,

We currently align byval types such as int4/8, float4/8, timestamp *,
date etc, even though we mostly don't need to. When tuples are deformed,
all byval types are copied out from the tuple data into the
corresponding Datum array, therefore the original alignment in the tuple
data doesn't matter.  This is different from byref types, where the
Datum formed will often be a pointer into the tuple data.

While there are some older systems where it could be a bit slower to
copy data out from unaligned positions into the datum array, this is
more than bought back by the next point:


The fact that these types are aligned has substantial costs:

For one, we often waste substantial amounts of space inside tables with
alignment padding. It's not uncommon to see about 30% or more of space
wasted (especially when taking alignment of the first column into
account).
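
As a concrete illustration, a row of (int4, int8, int4, int8) on a typical
64-bit platform is laid out like this C struct:

#include <stdint.h>

struct padded_row
{
    int32_t     a;              /* offset  0, 4 bytes */
    /* 4 bytes of padding so that b is 8-byte aligned */
    int64_t     b;              /* offset  8 */
    int32_t     c;              /* offset 16, 4 bytes */
    /* 4 more bytes of padding */
    int64_t     d;              /* offset 24 */
};                              /* 32 bytes, 8 of them (25%) padding */

A quarter of the data area is padding, before even counting the alignment
of the first column relative to the tuple header.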

For another, and this I think is less obvious, we actually waste
substantial amounts of CPU maintaining the alignment. This is primarily
the case because we have to perform extra instructions to align the pointer
to the next field during tuple [de]forming. Those instructions [1] have to be
executed taking time, but what's worse, they also reduce the ability of
out-of-order execution to hide latencies. There's a hard dependency on
knowing the offset to the next column to be able to continue with the
next column.


There's two reasons why we can't just set the alignment for these types
to 'c'.
1) pg_upgrade, for fairly obvious reasons
2) We map catalog table rows to structs, in a *lot* of places.


It seems to me that, despite the above, it's still worth trying to
improve upon the current state, to benefit from reduced space and CPU
usage.

As it turns out we already separate out the alignment for a type, and a
column, between pg_type.typalign and pg_attribute.attalign. One way to
tackle this would be to allow specifying, for byval types only, at
column creation time whether a column uses a 'struct-mappable' alignment
or not. When set, set attalign to pg_type.typalign for alignment, when
not, to 'c'. By changing pg_dump in binary upgrade mode to emit the
necessary options, and by adding such options during bki processing,
we'd solve 1) and 2), but otherwise gain the benefits.

Alternatively we could declare such a property on the table level, but
that seems more restrictive, without a corresponding upside.


It's possible that we should do something related with a few varlena
datatypes. We currently use intalign for types like text, json, and as
far as I can tell that does not make all that much sense. They're not
struct mappable *anyway* (and if they were, they'd need to be 8 byte
aligned on common platforms, __alignof__(void*) is 8). We'd have to take
a bit of care to treat the varlena header as unaligned - but we need to
do so anyway, because of 1byte varlenas. Short varlenas seem to make it
less crucial to pursue this, as the average datum that'd benefit is long
enough to make padding a non-issue. So I don't think it'd make sense to
tackle this project at the same time.


To fully benefit from the increased tuple deforming speed, it might be
beneficial to branch very early between two different versions within
slot_deform_heap_tuple, having determined whether there's any byval
columns with alignment requirements at slot creation /
ExecSetSlotDescriptor() time (or even set a different getsomeattrs
callback, but that's a bit more complicated).
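
Roughly along these lines; the deform member and the two loop variants are
hypothetical names, just to show that the branch happens once per slot
rather than once per tuple:

/* decided once, at slot setup time */
static void
choose_deform_callback(TupleTableSlot *slot, TupleDesc tupdesc)
{
    bool        needs_align = false;

    for (int i = 0; i < tupdesc->natts; i++)
    {
        Form_pg_attribute att = TupleDescAttr(tupdesc, i);

        if (att->attbyval && att->attalign != 'c')
            needs_align = true; /* some byval column still wants alignment */
    }

    /* hypothetical function pointers for the two deforming loops */
    slot->deform = needs_align ? slot_deform_aligned : slot_deform_packed;
}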


Thoughts?


Independent of the above, I think it might make sense to replace
pg_attribute.attalign with a smallint or such. It's a bit absurd that we
need code like
#define att_align_nominal(cur_offset, attalign) \
( \
((attalign) == 'i') ? INTALIGN(cur_offset) : \
 (((attalign) == 'c') ? (uintptr_t) (cur_offset) : \
  (((attalign) == 'd') ? DOUBLEALIGN(cur_offset) : \
   ( \
AssertMacro((attalign) == 's'), \
SHORTALIGN(cur_offset) \
   ))) \
)

instead of just using TYPEALIGN(). There's no need to adapt CREATE TYPE,
or pg_type - we should just store the number in
pg_attribute.attalign. That keeps CREATE TYPE 32/64bit/arch independent,
doesn't require reconstructing c/s/i/d in pg_dump, simplifies the
generated code [1], and would also "just" work for what I described
earlier in this email.
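
With the numeric alignment requirement (1, 2, 4 or 8) stored in attalign,
the per-column dispatch collapses into the usual round-up computation, the
same shape as the existing TYPEALIGN macro (sketch):

#include <stdint.h>

/* round len up to the next multiple of alignval (a power of two) */
#define ALIGN_TO(alignval, len) \
    (((uintptr_t) (len) + ((alignval) - 1)) & ~((uintptr_t) ((alignval) - 1)))

static inline uintptr_t
att_align_numeric(uintptr_t cur_offset, int16_t attalign)
{
    /* branch-free: no 'c'/'s'/'i'/'d' character dispatch needed */
    return ALIGN_TO(attalign, cur_offset);
}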

Greetings,

Andres Freund


[1] E.g. as a function of (void *ptr, char attalign) this ends up with assembly
like
.globl  alignme
.type   alignme, @function
alignme:
.LFB210:
.cfi_startproc
movq%rdi, %rax
cmpb$105, %sil
je  .L496
cmpb$99, %sil
je  .L492
addq$7, %rax
leaq1(%rdi), %rdi
andq$-8, %rax
andq$-2, %rdi
cmpb$100, %sil
cmovne  %rdi, %rax
.L492:
ret
.p2align 4,,10
.p2align 3
.L496:
addq$3, %rax
andq$-4, %rax
ret
.cfi_endproc

using

Re: Proposal: Global Index

2019-10-31 Thread Stephen Frost
Greetings,

* Peter Geoghegan (p...@bowt.ie) wrote:
> On Wed, Oct 30, 2019 at 9:23 AM Tom Lane  wrote:
> > Well, the *effects* of the feature seem desirable, but that doesn't
> > mean that we want an implementation that actually has a shared index.
> > As soon as you do that, you've thrown away most of the benefits of
> > having a partitioned data structure in the first place.
> 
> Right, but that's only the case for the global index. Global indexes
> are useful when used judiciously. They enable the use of partitioning
> for use cases where not being able to enforce uniqueness across all
> partitions happens to be a deal breaker. I bet that this is fairly
> common.

Absolutely- our lack of such is a common point of issue when folks are
considering using or migrating to PostgreSQL.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Proposal: Global Index

2019-10-31 Thread Isaac Morland
On Thu, 31 Oct 2019 at 14:50, Stephen Frost  wrote:

> Greetings,
>
> * Peter Geoghegan (p...@bowt.ie) wrote:
>
[]

>
> Absolutely- our lack of such is a common point of issue when folks are
> considering using or migrating to PostgreSQL.
>

Not sure how similar my situation really is, but I find myself wanting to
have indices that cross non-partition members of an inheritance hierarchy:

create table t (
id int,
primary key (id)
);

create table t1 (
a text
) inherits (t);

create table t2 (
b int,
c int
) inherits (t);

So "t"s are identified by an integer; and one kind of "t" has a single text
attribute while a different kind of "t" has 2 int attributes. The idea is
that there is a single primary key constraint on the whole hierarchy that
ensures only one record with a particular id can exist in all the tables
together. I can imagine wanting to do this with other unique constraints
also.

At present I don't actually use inheritance; instead I put triggers on the
child tables that do an insert on the parent table, which has the effect of
enforcing the uniqueness I want.


Re: pgbench - extend initialization phase control

2019-10-31 Thread Fabien COELHO



Hello Masao-san,

If large scale factor is specified, the query for generating 
pgbench_accounts data can take a very long time. While that query is 
running, operators may be likely to do Ctrl-C to cancel the data 
generation. In this case, IMO pgbench should cancel the query, i.e., 
call PQcancel(). Otherwise, the query will keep running to the end.


Hmmm. Why not. Now the infra to do that seems to already exist twice, once 
in "src/bin/psql/common.c" and once in "src/bin/scripts/common.c".


I cannot say I'm thrilled to replicate this once more. I think that the 
reasonable option is to share this in fe-utils and then to reuse it from 
there. However, ISTM that such a restructuring patch does not belong to this 
feature. [...]


I just did a patch to share the code:

  
https://www.postgresql.org/message-id/flat/alpine.DEB.2.21.1910311939430.27369@lancre
  https://commitfest.postgresql.org/25/2336/

--
Fabien.




Re: Removing alignment padding for byval types

2019-10-31 Thread Tomas Vondra

On Thu, Oct 31, 2019 at 11:48:21AM -0700, Andres Freund wrote:

Hi,

We currently align byval types such as int4/8, float4/8, timestamp *,
date etc, even though we mostly don't need to. When tuples are deformed,
all byval types are copied out from the tuple data into the
corresponding Datum array, therefore the original alignment in the tuple
data doesn't matter.  This is different from byref types, where the
Datum formed will often be a pointer into the tuple data.

While there are some older systems where it could be a bit slower to
copy data out from unaligned positions into the datum array, this is
more than bought back by the next point:


The fact that these types are aligned has substantial costs:

For one, we often waste substantial amounts of space inside tables with
alignment padding. It's not uncommon to see about 30% or more of space
wasted (especially when taking alignment of the first column into
account).

For another, and this I think is less obvious, we actually waste
substantial amounts of CPU maintaining the alignment. This is primarily
the case because we have to perform extra instructions to align the pointer to the next
field during tuple [de]forming. Those instructions [1] have to be
executed taking time, but what's worse, they also reduce the ability of
out-of-order execution to hide latencies. There's a hard dependency on
knowing the offset to the next column to be able to continue with the
next column.
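
To make that concrete, the per-column step is essentially the following
rounding (a simplified sketch modeled on the backend's TYPEALIGN macro;
alignme() is an illustrative helper, not the actual deforming code):

#include <stdint.h>

/* Round the running offset up to the next multiple of alignval. */
#define TYPEALIGN(alignval, len) \
	(((uintptr_t) (len) + ((alignval) - 1)) & ~((uintptr_t) ((alignval) - 1)))

/*
 * Before the next column can be read, the offset must be rounded up
 * according to the column's attalign ('d', 'i', 's' or 'c').
 */
uintptr_t
alignme(uintptr_t off, char attalign)
{
	switch (attalign)
	{
		case 'd':
			return TYPEALIGN(8, off);	/* double alignment */
		case 'i':
			return TYPEALIGN(4, off);	/* int alignment */
		case 's':
			return TYPEALIGN(2, off);	/* short alignment */
		default:
			return off;					/* 'c': no alignment */
	}
}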



Right. Reducing this overhead was one of the goals to allow "logical
ordering" of columns in a table (while arbitrarily reordering the
physical ones), but that patch got out of hand pretty quickly. Also,
it'd still keep some of the overhead, because it was not removing the
alignment/padding entirely.



There's two reasons why we can't just set the alignment for these types
to 'c'.
1) pg_upgrade, for fairly obvious reasons
2) We map catalog table rows to structs, in a *lot* of places.


It seems to me that, despite the above, it's still worth trying to
improve upon the current state, to benefit from reduced space and CPU
usage.

As it turns out we already separate out the alignment for a type, and a
column, between pg_type.typalign and pg_attribute.attalign. One way to
tackle this would be to allow to specify, for byval types only, at
column creation time whether a column uses a 'struct-mappable' alignment
or not. When set, set attalign to pg_type.typalign for alignment, when
not, to 'c'. By changing pg_dump in binary upgrade mode to emit the
necessary options, and by adding such options during bki processing,
we'd solve 1) and 2), but otherwise gain the benefits.

Alternatively we could declare such a property at the table level, but
that seems more restrictive, without a corresponding upside.



I don't know, but it seems entirely sufficient specifying this at the
table level, no? What would be the use case for removing padding for
only some of the columns? I don't see the use case for that.



It's possible that we should do something related with a few varlena
datatypes. We currently use intalign for types like text, json, and as
far as I can tell that does not make all that much sense. They're not
struct mappable *anyway* (and if they were, they'd need to be 8 byte
aligned on common platforms, __alignof__(void*) is 8). We'd have to take
a bit of care to treat the varlena header as unaligned - but we need to
do so anyway, because of 1-byte varlenas. Short varlenas seem to make it
less crucial to pursue this, as the average datum that'd benefit is long
enough to make padding a non-issue. So I don't think it'd make sense to
tackle this project at the same time.



Not sure, but how come it's not failing on the picky platforms, then? On
x86 it's probably OK because it's pretty permissive, but I'd expect some
platforms (s390, parisc, itanium, powerpc, ...) to be much pickier.



To fully benefit from the increased tuple deforming speed, it might be
beneficial to branch very early between two different versions within
slot_deform_heap_tuple, having determined whether there's any byval
columns with alignment requirements at slot creation /
ExecSetSlotDescriptor() time (or even set a different callback
getsomeattrs callback, but that's a bit more complicated).


Thoughts?



Seems reasonable. I certainly agree this padding is pretty annoying, so
if we can get rid of it without causing issues, that'd be nice. 



regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Proposal: Global Index

2019-10-31 Thread Tomas Vondra

On Thu, Oct 31, 2019 at 03:02:40PM -0400, Isaac Morland wrote:

On Thu, 31 Oct 2019 at 14:50, Stephen Frost  wrote:


Greetings,

* Peter Geoghegan (p...@bowt.ie) wrote:


[]



Absolutely- our lack of such is a common point of issue when folks are
considering using or migrating to PostgreSQL.



Not sure how similar my situation really is, but I find myself wanting to
have indices that cross non-partition members of an inheritance hierarchy:

create table t (
   id int,
   primary key (id)
);

create table t1 (
   a text
) inherits (t);

create table t2 (
   b int,
   c int
) inherits (t);

So "t"s are identified by an integer; and one kind of "t" has a single text
attribute while a different kind of "t" has 2 int attributes. The idea is
that there is a single primary key constraint on the whole hierarchy that
ensures only one record with a particular id can exist in all the tables
together. I can imagine wanting to do this with other unique constraints
also.



IMO the chances of us supporting global indexes with generic inheritance
hierarchies are about zero. We don't even support creating "partition"
indexes on those hierarchies ...


At present I don't actually use inheritance; instead I put triggers on the
child tables that do an insert on the parent table, which has the effect of
enforcing the uniqueness I want.


Does it? Are you sure it actually works in READ COMMITTED? What exactly
does the trigger do?

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Removing alignment padding for byval types

2019-10-31 Thread Andres Freund
Hi,

On 2019-10-31 20:15:12 +0100, Tomas Vondra wrote:
> On Thu, Oct 31, 2019 at 11:48:21AM -0700, Andres Freund wrote:
> > We currently align byval types such as int4/8, float4/8, timestamp *,
> > date etc, even though we mostly don't need to. When tuples are deformed,
> > all byval types are copied out from the tuple data into the
> > corresponding Datum array, therefore the original alignment in the tuple
> > data doesn't matter.  This is different from byref types, where the
> > Datum formed will often be a pointer into the tuple data.
> > 
> > While there are some older systems where it could be a bit slower to
> > copy data out from unaligned positions into the datum array, this is
> > more than bought back by the next point:
> > 
> > 
> > The fact that these types are aligned has substantial costs:
> > 
> > For one, we often waste substantial amounts of space inside tables with
> > alignment padding. It's not uncommon to see about 30% or more of space
> > wasted (especially when taking alignment of the first column into
> > account).
> > 
> > For another, and this I think is less obvious, we actually waste
> > substantial amounts of CPU maintaining the alignment. This is primarily
> > the case because we have to perform to align the pointer to the next
> > field during tuple [de]forming. Those instructions [1] have to be
> > executed taking time, but what's worse, they also reduce the ability of
> > out-of-order execution to hide latencies. There's a hard dependency on
> > knowing the offset to the next column to be able to continue with the
> > next column.
> > 
> 
> Right. Reducing this overhead was one of the goals to allow "logical
> ordering" of columns in a table (while arbitrarily reordering the
> physical ones), but that patch got out of hand pretty quickly. Also,
> it'd still keep some of the overhead, because it was not removing the
> alignment/padding entirely.

Yea. It'd keep just about all the CPU overhead, because we'd still need
to align as soon as there is a preceding nulled or varlena column.

There's still some benefit for logical column order, as grouping NOT
NULL fixed-length columns at the start is beneficial. And it's also
beneficial to have frequently accessed columns at the start. But I think
this proposal gains a lot of the space related benefits, at a much lower
complexity, together with a lot of other benefits.


> > There's two reasons why we can't just set the alignment for these types
> > to 'c'.
> > 1) pg_upgrade, for fairly obvious reasons
> > 2) We map catalog table rows to structs, in a *lot* of places.
> > 
> > 
> > It seems to me that, despite the above, it's still worth trying to
> > improve upon the current state, to benefit from reduced space and CPU
> > usage.
> > 
> > As it turns out we already separate out the alignment for a type, and a
> > column, between pg_type.typalign and pg_attribute.attalign. One way to
> > tackle this would be to allow to specify, for byval types only, at
> > column creation time whether a column uses a 'struct-mappable' alignment
> > or not. When set, set attalign to pg_type.typalign for alignment, when
> > not, to 'c'. By changing pg_dump in binary upgrade mode to emit the
> > necessary options, and by adding such options during bki processing,
> > we'd solve 1) and 2), but otherwise gain the benefits.
> > 
> > Alternatively we could declare such a propert on the table level, but
> > that seems more restrictive, without a corresponding upside.

> I don't know, but it seems entirely sufficient specifying this at the
> table level, no? What would be the use case for removing padding for
> only some of the columns? I don't see the use case for that.

Well, if we had it on a per-table level, we'd also align
a) catalog table columns that follow the first varlena column - which we don't
   need to align, as they can't be accessed via mapping
b) columns in pg_upgraded tables that have been added after the upgrade


> > It's possible that we should do something related with a few varlena
> > datatypes. We currently use intalign for types like text, json, and as
> > far as I can tell that does not make all that much sense. They're not
> > struct mappable *anyway* (and if they were, they'd need to be 8 byte
> > aligned on common platforms, __alignof__(void*) is 8). We'd have to take
> > a bit of care to treat the varlena header as unaligned - but we need to
> > do so anyway, because of 1byte varlenas. Short varlenas seems to make it
> > less crucial to pursue this, as the average datum that'd benefit is long
> > enough to make padding a non-issue. So I don't think it'd make sense to
> > tackle this project at the same time.
> > 
> 
> Not sure, but how come it's not failing on the picky platforms, then? On
> x86 it's probably OK because it's pretty permissive, but I'd expect some
> platforms (s390, parisc, itanium, powerpc, ...) to be much pickier.

I'm not quite following? As I said, we already need to use alignment
aware code due to caring

Re: [PATCH] Implement INSERT SET syntax

2019-10-31 Thread Ibrar Ahmed
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:not tested

Patch looks good to me and works on my machine (HEAD at 
73025140885c889410b9bfc4a30a3866396fc5db). I have not reviewed the 
documentation changes.

The new status of this patch is: Ready for Committer


Re: Adding percentile metrics to pg_stat_statements module

2019-10-31 Thread Tomas Vondra

On Thu, Oct 31, 2019 at 12:51:17PM -0300, Igor Calabria wrote:

Hi everyone,

I was taking a look at pg_stat_statements module and noticed that it does
not collect any percentile metrics. I believe that It would be really handy
to have those available and I'd love to contribute with this feature.

The basic idea is to accumulate the query execution times using an
approximation structure like q-digest or t-digest and add those results to
the pg_stat_statements table as fixed columns. Something like this:

p90_time:
p95_time:
p99_time:
p70_time:
...

Another solution is to persist the digest structure in a binary column and
use a function to extract the desired quantile, like this: SELECT
approx_quantile(digest_times, 0.99) FROM pg_stat_statements
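
As a rough illustration of the accumulation side, here is a deliberately
simple fixed-bucket log-scale histogram with percentile read-back. This is
only a stand-in sketch: a real q-digest or t-digest keeps adaptive
centroids and gives much tighter error bounds.

#include <math.h>

#define NBUCKETS 128			/* 4 buckets per doubling of latency */

typedef struct LatencyHist
{
	long		counts[NBUCKETS];
	long		total;
} LatencyHist;

/* Map a non-negative latency (in microseconds) to a log-scale bucket. */
static int
bucket_for(double usec)
{
	int			b = (int) (log2(usec + 1.0) * 4.0);

	return (b < 0) ? 0 : (b >= NBUCKETS ? NBUCKETS - 1 : b);
}

static void
hist_add(LatencyHist *h, double usec)
{
	h->counts[bucket_for(usec)]++;
	h->total++;
}

/* Approximate p-th percentile (0 < p < 1): lower bound of covering bucket. */
static double
hist_percentile(const LatencyHist *h, double p)
{
	long		target = (long) (p * h->total);
	long		seen = 0;
	int			b;

	for (b = 0; b < NBUCKETS; b++)
	{
		seen += h->counts[b];
		if (seen >= target)
			return pow(2.0, b / 4.0) - 1.0;
	}
	return 0.0;
}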



IMO having some sort of CDF approximation (be it a q-digest or t-digest)
would be useful, although it'd probably need to be optional (mostly
because of memory consumption).

I don't see why we would not store the digests themselves. Storing just
some selected percentiles would be pretty problematic due to losing a
lot of information on restart. Also, pg_stat_statements is not a table
but a view on in-memory hash table.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Removing alignment padding for byval types

2019-10-31 Thread Tomas Vondra

On Thu, Oct 31, 2019 at 12:24:33PM -0700, Andres Freund wrote:

Hi,

On 2019-10-31 20:15:12 +0100, Tomas Vondra wrote:

On Thu, Oct 31, 2019 at 11:48:21AM -0700, Andres Freund wrote:
> We currently align byval types such as int4/8, float4/8, timestamp *,
> date etc, even though we mostly don't need to. When tuples are deformed,
> all byval types are copied out from the tuple data into the
> corresponding Datum array, therefore the original alignment in the tuple
> data doesn't matter.  This is different from byref types, where the
> Datum formed will often be a pointer into the tuple data.
>
> While there are some older systems where it could be a bit slower to
> copy data out from unaligned positions into the datum array, this is
> more than bought back by the next point:
>
>
> The fact that these types are aligned has substantial costs:
>
> For one, we often waste substantial amounts of space inside tables with
> alignment padding. It's not uncommon to see about 30% or more of space
> wasted (especially when taking alignment of the first column into
> account).
>
> For another, and this I think is less obvious, we actually waste
> substantial amounts of CPU maintaining the alignment. This is primarily
> the case because we have to perform to align the pointer to the next
> field during tuple [de]forming. Those instructions [1] have to be
> executed taking time, but what's worse, they also reduce the ability of
> out-of-order execution to hide latencies. There's a hard dependency on
> knowing the offset to the next column to be able to continue with the
> next column.
>

Right. Reducing this overhead was one of the goals to allow "logical
ordering" of columns in a table (while arbitrarily reordering the
physical ones), but that patch got out of hand pretty quickly. Also,
it'd still keep some of the overhead, because it was not removing the
alignment/padding entirely.


Yea. It'd keep just about all the CPU overhead, because we'd still need
to align as soon as there is a preceding nulled or varlena colum.

There's still some benefit for logical column order, as grouping NOT
NULL fixed-length columns at the start is beneficial. And it's also
beneficial to have frequently accessed columns at the start. But I think
this proposal gains a lot of the space related benefits, at a much lower
complexity, together with a lot of other benefits.



+1




> There's two reasons why we can't just set the alignment for these types
> to 'c'.
> 1) pg_upgrade, for fairly obvious reasons
> 2) We map catalog table rows to structs, in a *lot* of places.
>
>
> It seems to me that, despite the above, it's still worth trying to
> improve upon the current state, to benefit from reduced space and CPU
> usage.
>
> As it turns out we already separate out the alignment for a type, and a
> column, between pg_type.typalign and pg_attribute.attalign. One way to
> tackle this would be to allow to specify, for byval types only, at
> column creation time whether a column uses a 'struct-mappable' alignment
> or not. When set, set attalign to pg_type.typalign for alignment, when
> not, to 'c'. By changing pg_dump in binary upgrade mode to emit the
> necessary options, and by adding such options during bki processing,
> we'd solve 1) and 2), but otherwise gain the benefits.
>
> Alternatively we could declare such a propert on the table level, but
> that seems more restrictive, without a corresponding upside.



I don't know, but it seems entirely sufficient specifying this at the
table level, no? What would be the use case for removing padding for
only some of the columns? I don't see the use case for that.


Well, if we had it on a per-table level, we'd also align
a) catalog table columns that follow the first varlena column - which we don't 
need
  to align, as they can't be accessed via mapping
b) columns in pg_upgraded tables that have been added after the upgrade



Hmm, OK. I think the question is whether it's worth the extra
complexity. I'd say it's not, but perhaps I'm wrong.




> It's possible that we should do something related with a few varlena
> datatypes. We currently use intalign for types like text, json, and as
> far as I can tell that does not make all that much sense. They're not
> struct mappable *anyway* (and if they were, they'd need to be 8 byte
> aligned on common platforms, __alignof__(void*) is 8). We'd have to take
> a bit of care to treat the varlena header as unaligned - but we need to
> do so anyway, because of 1byte varlenas. Short varlenas seems to make it
> less crucial to pursue this, as the average datum that'd benefit is long
> enough to make padding a non-issue. So I don't think it'd make sense to
> tackle this project at the same time.
>

Not sure, but how come it's not failing on the picky platforms, then? On
x86 it's probably OK because it's pretty permissive, but I'd expect some
platforms (s390, parisc, itanium, powerpc, ...) to be much pickier.


I'm not quite following? As I said, we already need to

Re: fe-utils - share query cancellation code

2019-10-31 Thread Ibrar Ahmed
On Thu, Oct 31, 2019 at 11:43 PM Fabien COELHO  wrote:

>
> Hello Devs,
>
This patch moves duplicated query cancellation code from psql &
> scripts to fe-utils, so that it is shared and may be used by other
> commands.
>
> This is because Masao-san suggested to add a query cancellation feature to
> pgbench for long queries (server-side data generation being discussed, but
> possibly pk and fk could use that as well).
>
> --
> Fabien.


I gave it a quick look, and I think we can change

void
psql_setup_cancel_handler(void)
{
#ifndef WIN32
setup_cancel_handler(psql_sigint_callback);
#else
setup_cancel_handler();
#endif /* WIN32 */
}

to

void
psql_setup_cancel_handler(void)
{
setup_cancel_handler(psql_sigint_callback);
}

This works because it does not matter what we pass to setup_cancel_handler:
it ignores the argument on Windows.

Hmm, we would also need to remove the assert in the function
"setup_cancel_handler".

-- Ibrar Ahmed


Re: Creating foreign key on partitioned table is too slow

2019-10-31 Thread David Rowley
On Thu, 31 Oct 2019 at 17:56, Tom Lane  wrote:
>
> David Rowley  writes:
> > In Ottawa this year, Andres and I briefly talked about the possibility
> > of making a series of changes to how equalfuncs.c works. The idea was
> > to make it easy by using some pre-processor magic to allow us to
> > create another version of equalfuncs which would let us have an equal
> > comparison function that returns -1 / 0 / +1 rather than just true or
> > false.  This would allow us to Binary Search Trees of objects. I
> > identified that EquivalenceClass.ec_members would be better written as
> > a BST to allow much faster lookups in get_eclass_for_sort_expr().
>
> This seems like a good idea, but why would we want to maintain two
> versions?  Just change equalfuncs.c into comparefuncs.c, full stop.
> equal() would be a trivial wrapper for (compare_objects(a,b) == 0).

If we can do that without slowing down the comparison, then sure.
Checking which node sorts earlier is a bit more expensive than just
checking for equality. But if that's not going to be noticeable in
real-world test cases, then I agree.
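
For reference, the wrapper itself would be trivial (a sketch, with
compare_objects() as the hypothetical three-way comparator):

#include <stdbool.h>

/* Hypothetical three-way node comparator returning -1 / 0 / +1. */
extern int	compare_objects(const void *a, const void *b);

/* equal() becomes a thin wrapper around the comparator. */
static inline bool
equal(const void *a, const void *b)
{
	return compare_objects(a, b) == 0;
}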

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Allow superuser to grant passwordless connection rights on postgres_fdw

2019-10-31 Thread Andrew Dunstan

This patch allows the superuser to grant passwordless connection rights
in postgres_fdw user mappings.


The patch is authored by my colleague Craig Ringer, with slight bitrot
fixed by me.


One use case for this is with passphrase-protected client certificates,
a patch for which will follow shortly.


Here are Craig's remarks on the patch:

  
    postgres_fdw denies a non-superuser the ability to establish a connection
    that doesn't have a password in the connection string, or one that fails
    to actually use the password in authentication. This is to stop the
    unprivileged user using OS-level authentication as the postgres server
    (peer, ident, trust). It also stops unauthorized use of local credentials
    like .pgpass, a service file, client certificate files, etc.

    Add the ability for a superuser to create user mappings that override
    this behaviour by setting the passwordless_ok attribute to true in a
    user mapping for a non-superuser. The non-superuser gains the ability
    to use the FDW the mapping applies to even if there's no password in
    their mapping or in the connection string.

    This is only safe if the superuser has established that the local
    server is configured safely. It must be configured not to allow
    trust/peer/ident/sspi/gssapi auth to allow the OS user the postgres
    server runs as to log in to postgres as a superuser. Client certificate
    keys can be used too, if accessible. But the superuser can already
    GRANT superrole TO normalrole, so it's not any sort of new power.

cheers


andrew



-- 
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index 57ed5f4b90..4739ab0630 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -17,6 +17,7 @@
 #include "access/htup_details.h"
 #include "access/xact.h"
 #include "catalog/pg_user_mapping.h"
+#include "commands/defrem.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -91,7 +92,7 @@ static bool pgfdw_exec_cleanup_query(PGconn *conn, const char *query,
 	 bool ignore_errors);
 static bool pgfdw_get_cleanup_result(PGconn *conn, TimestampTz endtime,
 	 PGresult **result);
-
+static bool UserMappingPasswordRequired(UserMapping *user);
 
 /*
  * Get a PGconn which can be used to execute queries on the remote PostgreSQL
@@ -274,14 +275,16 @@ connect_pg_server(ForeignServer *server, UserMapping *user)
 		/*
 		 * Check that non-superuser has used password to establish connection;
 		 * otherwise, he's piggybacking on the postgres server's user
-		 * identity. See also dblink_security_check() in contrib/dblink.
+		 * identity. See also dblink_security_check() in contrib/dblink
+		 * and check_conn_params.
 		 */
-		if (!superuser_arg(user->userid) && !PQconnectionUsedPassword(conn))
+		if (!superuser_arg(user->userid) && UserMappingPasswordRequired(user) &&
+			!PQconnectionUsedPassword(conn))
 			ereport(ERROR,
 	(errcode(ERRCODE_S_R_E_PROHIBITED_SQL_STATEMENT_ATTEMPTED),
 	 errmsg("password is required"),
 	 errdetail("Non-superuser cannot connect if the server does not request a password."),
-	 errhint("Target server's authentication method must be changed.")));
+	 errhint("Target server's authentication method must be changed or password_required=false set in the user mapping attributes.")));
 
 		/* Prepare new session for use */
 		configure_remote_session(conn);
@@ -314,12 +317,31 @@ disconnect_pg_server(ConnCacheEntry *entry)
 	}
 }
 
+/*
+ * Return true if the password_required is defined and false for this user
+ * mapping, otherwise false. The mapping has been pre-validated.
+ */
+static bool
+UserMappingPasswordRequired(UserMapping *user)
+{
+	ListCell   *cell;
+
+	foreach(cell, user->options)
+	{
+		DefElem*def = (DefElem *) lfirst(cell);
+		if (strcmp(def->defname, "password_required") == 0)
+			return defGetBoolean(def);
+	}
+
+	return true;
+}
+
 /*
  * For non-superusers, insist that the connstr specify a password.  This
- * prevents a password from being picked up from .pgpass, a service file,
- * the environment, etc.  We don't want the postgres user's passwords
- * to be accessible to non-superusers.  (See also dblink_connstr_check in
- * contrib/dblink.)
+ * prevents a password from being picked up from .pgpass, a service file, the
+ * environment, etc.  We don't want the postgres user's passwords,
+ * certificates, etc to be accessible to non-superusers.  (See also
+ * dblink_connstr_check in contrib/dblink.)
  */
 static void
 check_conn_params(const char **keywords, const char **values, UserMapping *user)
@@ -337,6 +359,10 @@ check_conn_params(const char **keywords, const char **values, UserMapping *user)
 			return;
 	}
 
+	/* ok if the superuser explicitly said so at user mapping creation time */
+	if (!UserMa

Re: Postgres cache

2019-10-31 Thread Tomas Vondra

On Thu, Oct 31, 2019 at 03:19:23PM +0530, Natarajan R wrote:

Hi,

I want to know how postgres stores catalog relations in cache in-depth. Is
there any documentation for that?


Not sure what exactly you mean by "cache" - whether shared buffers (as a
shared general database cache) or syscache/catcache, i.e. the special
cache of catalog records each backend maintains privately.

I'm not aware of exhaustive developer docs explaining these parts, but
for shared buffers you might want to look at

   src/backend/storage/buffer/README

while for catcache/syscache you probably need to look at the code and
comments in the related files, particularly in 


   src/backend/utils/cache/syscache.c
   src/backend/utils/cache/relcache.c

Not sure if there's a better source of information :-(

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: The command tag of "ALTER MATERIALIZED VIEW RENAME COLUMN"

2019-10-31 Thread Ibrar Ahmed
On Thu, Oct 31, 2019 at 6:56 PM Tom Lane  wrote:

> Fujii Masao  writes:
> > ... I found that the command tag of
> > ALTER MATERIALIZED VIEW RENAME COLUMN is "ALTER TABLE", not "ALTER VIEW".
>
> > =# ALTER MATERIALIZED VIEW hoge RENAME COLUMN j TO x;
> > ALTER TABLE
>
> > Is this intentional? Or bug?
>
> Seems like an oversight.
>
> regards, tom lane
>
>
>
The same issue exists with ALTER FOREIGN TABLE

# ALTER FOREIGN TABLE ft RENAME COLUMN a to t;

ALTER TABLE


# ALTER MATERIALIZED VIEW mv RENAME COLUMN a to r;

ALTER TABLE



Attached patch fixes that for ALTER VIEW , ALTER MATERIALIZED VIEW and
ALTER FOREIGN TABLE


# ALTER MATERIALIZED VIEW mv RENAME COLUMN a to r;

ALTER MATERIALIZED VIEW


# ALTER FOREIGN TABLE ft RENAME COLUMN a to t;

ALTER FOREIGN TABLE


-- 
Ibrar Ahmed


001_alter_tag_ibrar_v1.patch
Description: Binary data


libpq sslpassword parameter and callback function

2019-10-31 Thread Andrew Dunstan


This patch provides for an sslpassword parameter for libpq, and a hook
that a client can fill in for a callback function to set the password.


This provides similar facilities to those already available in the JDBC
driver.


There is also a function to fetch the sslpassword from the connection
parameters, in the same way that other settings can be fetched.


This is mostly the excellent work of my colleague Craig Ringer, with a
few embellishments from me.


Here are his notes:


    Allow libpq to non-interactively decrypt client certificates that are
    stored encrypted by adding a new "sslpassword" connection option.

    The sslpassword option offers a middle ground between a cleartext key
    and setting up advanced key management via openssl engines, PKCS#11,
    USB crypto offload and key escrow, etc.

    Previously, use of encrypted client certificate keys only worked if
    the user could enter the key's password interactively on stdin, in
    response to openssl's default prompt callback:

    Enter PEM pass phrase:

    That's infeasible in many situations, especially things like use from
    postgres_fdw.

    This change also allows admins to prevent libpq from ever prompting
    for a password by calling:

    PQsetSSLKeyPassHook(PQdefaultSSLKeyPassHook);

    which is useful since OpenSSL likes to open /dev/tty to prompt for a
    password, so even closing stdin won't stop it blocking if there's no
    user input available. Applications may also override or extend SSL
    password fetching with their own callback.

    There is deliberately no environment variable equivalent for the
    sslpassword option.
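
As an illustration of the proposed API, an application callback might look
like this (a sketch based on the signatures in the patch's documentation;
fetch_password_from_vault() is a hypothetical application helper):

#include <string.h>
#include "libpq-fe.h"

/* Hypothetical application-specific password source. */
extern const char *fetch_password_from_vault(void);

static int
my_sslkey_password_cb(char *buf, int size, PGconn *conn)
{
	const char *pw = fetch_password_from_vault();

	(void) conn;
	if (pw == NULL || strlen(pw) + 1 > (size_t) size)
	{
		/* Per the proposed contract: empty the buffer and return 0. */
		buf[0] = '\0';
		return 0;
	}
	strcpy(buf, pw);
	return (int) strlen(pw);
}

/* Installed once at startup, before connecting:
 *     PQsetSSLKeyPassHook(my_sslkey_password_cb);
 */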


cheers


andrew


-- 
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: libpq sslpassword parameter and callback function

2019-10-31 Thread Andrew Dunstan

This time with attachment.


On 10/31/19 6:33 PM, Andrew Dunstan wrote:
> This patch provides for an sslpassword parameter for libpq, and a hook
> that a client can fill in for a callback function to set the password.
>
>
> This provides similar facilities to those already available in the JDBC
> driver.
>
>
> There is also a function to fetch the sslpassword from the connection
> parameters, in the same way that other settings can be fetched.
>
>
> This is mostly the excellent work of my colleague Craig Ringer, with a
> few embellishments from me.
>
>
> Here are his notes:
>
>
>     Allow libpq to non-interactively decrypt client certificates that
> are stored
>     encrypted by adding a new "sslpassword" connection option.
>    
>     The sslpassword option offers a middle ground between a cleartext
> key and
>     setting up advanced key mangement via openssl engines, PKCS#11, USB
> crypto
>     offload and key escrow, etc.
>    
>     Previously use of encrypted client certificate keys only worked if
> the user
>     could enter the key's password interactively on stdin, in response
> to openssl's
>     default prompt callback:
>    
>     Enter PEM passhprase:
>    
>     That's infesible in many situations, especially things like use from
>     postgres_fdw.
>    
>     This change also allows admins to prevent libpq from ever prompting
> for a
>     password by calling:
>    
>     PQsetSSLKeyPassHook(PQdefaultSSLKeyPassHook);
>    
>     which is useful since OpenSSL likes to open /dev/tty to prompt for a
> password,
>     so even closing stdin won't stop it blocking if there's no user
> input available.
>     Applications may also override or extend SSL password fetching with
> their own
>     callback.
>    
>     There is deliberately no environment variable equivalent for the
> sslpassword
>     option.
>
>
> cheers
>
>
> andrew
>
>
-- 
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

diff --git a/contrib/dblink/expected/dblink.out b/contrib/dblink/expected/dblink.out
index 6ceabb453c..6516d4f131 100644
--- a/contrib/dblink/expected/dblink.out
+++ b/contrib/dblink/expected/dblink.out
@@ -879,7 +879,7 @@ $d$;
 CREATE USER MAPPING FOR public SERVER fdtest
   OPTIONS (server 'localhost');  -- fail, can't specify server here
 ERROR:  invalid option "server"
-HINT:  Valid options in this context are: user, password
+HINT:  Valid options in this context are: user, password, sslpassword
 CREATE USER MAPPING FOR public SERVER fdtest OPTIONS (user :'USER');
 GRANT USAGE ON FOREIGN SERVER fdtest TO regress_dblink_user;
 GRANT EXECUTE ON FUNCTION dblink_connect_u(text, text) TO regress_dblink_user;
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index c58527b0c3..a606680182 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -776,6 +776,72 @@ PGPing PQping(const char *conninfo);
  
 
 
+
+ PQsetSSLKeyPassHookPQsetSSLKeyPassHook
+ 
+  
+   PQsetSSLKeyPassHook lets an application override
+   libpq's default
+   handling of encrypted client certificate key files using
+or interactive prompting.
+
+
+void PQsetSSLKeyPassHook(PQsslKeyPassHook_type hook);
+
+
+   The application passes a pointer to a callback function with signature:
+   
+   int callback_fn(char *buf, int size, PGconn *conn);
+   
+   which libpq will then call instead of
+   its default PQdefaultSSLKeyPassHook handler. The callback
+   should determine the password for the key and copy it to result-buffer
+   buf of size size. The string in 
+   buf must be null-terminated. The callback must return the length of
+   the password stored in buf excluding the null terminator.
+   On failure, the callback should set buf[0] = '\0' and return 0.
+   See PQdefaultSSLKeyPassHook in libpq's
+   source code for an example.
+  
+   
+  
+   If the user specified an explicit key location,
+   its path will be in conn->pgsslkey when the callback
+   is invoked. This will be empty if the default key path is being used.
+   For keys that are engine specifiers, it is up to engine implementations
+   whether they use the OpenSSL password callback or define their own handling.
+  
+
+  
+   The app callback may choose to delegate unhandled cases to
+   PQdefaultSSLKeyPassHook,
+   or call it first and try something else if it returns 0, or completely override it.
+  
+
+  
+   The callback must not escape normal flow control with exceptions,
+   longjmp(...), etc. It must return normally.
+  
+
+ 
+
+
+
+ PQgetSSLKeyPassHookPQgetSSLKeyPassHook
+ 
+  
+   PQgetSSLKeyPassHook returns the current
+   client certificate key password hook, or NULL
+   if none has been set.
+
+
+PQsslKeyPassHook_type PQgetSSLKeyPassHook(void);
+
+  
+
+ 

Re: merging HashJoin and Hash nodes

2019-10-31 Thread Tomas Vondra

On Tue, Oct 29, 2019 at 02:00:00PM +1300, Thomas Munro wrote:

On Tue, Oct 29, 2019 at 12:15 PM Andres Freund  wrote:

I've groused about this a few times, but to me it seems wrong that
HashJoin and Hash are separate nodes. They're so tightly bound together
that keeping them separate just doesn't architecturally makes sense,
imo. So I wrote a prototype.

Evidence of being tightly bound together:
- functions in nodeHash.h that take a HashJoinState etc
- how many functions in nodeHash.h and nodeHashjoin.h are purely exposed
  so the other side can call them
- there's basically no meaningful separation of concerns between code in
  nodeHash.c and nodeHashjoin.c
- the Hash node doesn't really exist during most of the planning, it's
  kind of faked up in create_hashjoin_plan().
- HashJoin knows that the inner node is always going to be a Hash node.
- HashJoinState and HashState both have pointers to HashJoinTable, etc

Besides violating some aesthetical concerns, I think it also causes
practical issues:

- code being in different translation code prevents the compiler from
  inlining etc. There's a lot of HOT calls going between both. For each
  new outer tuple we e.g. call, from nodeHashjoin.c separately into
  nodeHash.c for ExecHashGetHashValue(), ExecHashGetBucketAndBatch(),
  ExecHashGetSkewBucket(), ExecScanHashBucket(). They each have to
  do memory loads from HashJoinState/HashJoinTable, even though previous
  code *just* has done so.


I wonder how much we can gain by this. I don't expect any definitive
figures from a patch at this stage, but maybe you have some guesses? 


- a separate executor node, and all the ancillary data (slots,
  expression context, target lists etc) is far from free
- instead of just applying an "empty outer" style optimization to both
  sides of the HashJoin, we have to choose. Once unified it's fairly
  easy to just use it on both.
- generally, a lot of improvements are harder to develop because of the
  odd separation.


I agree with all of that.


Does anybody have good arguments for keeping them separate? The only
real one I can see is that it's not a small change, and will make
bugfixes etc a bit harder.  Personally I think that's outweighed by the
disadvantages.


Yeah, the ~260KB of churn you came up with is probably the reason I
didn't even think of suggesting something along these lines while
working on PHJ, though it did occur to me that the division was
entirely artificial as I carefully smashed more holes in both
directions through that wall.

Trying to think of a reason to keep Hash, I remembered Kohei KaiGai's
speculation about Hash nodes that are shared by different Hash Join
nodes (in the context of a partition-wise join where each partition is
joined against one table).  But even if we were to try to do that, a
Hash node isn't necessary to share the hash table, so that's not an
argument.


Attached is a quick prototype that unifies them. It's not actually that hard,
I think? Obviously this is far from ready, but I thought it'd be a good
basis to get a discussion started?


I haven't looked at the patch yet but this yet but it sounds like a
good start from the description.


Comments on the prototype:

- I've hacked EXPLAIN to still show the Hash node, to reduce the size of
  the diffs. I'm doubtful that that's the right approach (and I'm sure
  it's not the right approach to do so with the code I injected) - I
  think the Hash node in the explain doesn't really help users, and just
  makes the explain bigger (except for making it clearer which side is
  hashed)


Yeah, I'm not sure why you'd want to show a Hash node to users if
there is no way to use it in any other context than a Hash Join.

FWIW, Oracle, DB2 and SQL Server don't show an intermediate Hash node
in their plans, and generally you just have to know which way around
the input relations are shown (from a quick a glance at some examples
found on the web, Oracle and SQL Server show hash relation above probe
relation, while DB2, PostgreSQL and MySQL show probe relation above
hash relation).  Curiously, MySQL 8 just added hash joins, and they do
show a Hash node (at least in FORMAT=TREE mode, which looks a bit like
our EXPLAIN).



Not sure. Maybe we don't need to show an explicit Hash node, because
that might seem to imply there's a separate executor step. And that
would be misleading.

But IMO we should make it obvious which side of the join is hashed,
instead of relying on users to "know which way around the relations are
shown". The explain is often used by users who're learning stuff, or
maybe investigating it for the first time, and we should not make it
unnecessarily hard to understand.

I don't think we have any other "explain node" that would not represent
an actual executor node, so not sure what's the right approach. So maybe
that's not the right way to do that ...

OTOH we have tools visualizing execution plans, so maybe backwards
compatibility of the output is a concern too (I know we don't promise
anything, though).

A wiki page to track hash join projects and ideas

2019-10-31 Thread Thomas Munro
Hello hackers,

Please feel free to edit this new page, which I'd like to use to keep
track of observations, ideas and threads relating to hash joins.

https://wiki.postgresql.org/wiki/Hash_Join




Re: libpq sslpassword parameter and callback function

2019-10-31 Thread Andrew Dunstan


On 10/31/19 6:34 PM, Andrew Dunstan wrote:
> This time with attachment.
>
>
> On 10/31/19 6:33 PM, Andrew Dunstan wrote:
>> This patch provides for an sslpassword parameter for libpq, and a hook
>> that a client can fill in for a callback function to set the password.
>>
>>
>> This provides similar facilities to those already available in the JDBC
>> driver.
>>
>>
>> There is also a function to fetch the sslpassword from the connection
>> parameters, in the same way that other settings can be fetched.
>>
>>
>> This is mostly the excellent work of my colleague Craig Ringer, with a
>> few embellishments from me.
>>
>>
>> Here are his notes:
>>
>>
>>     Allow libpq to non-interactively decrypt client certificates that
>> are stored
>>     encrypted by adding a new "sslpassword" connection option.
>>    
>>     The sslpassword option offers a middle ground between a cleartext
>> key and
>>     setting up advanced key mangement via openssl engines, PKCS#11, USB
>> crypto
>>     offload and key escrow, etc.
>>    
>>     Previously use of encrypted client certificate keys only worked if
>> the user
>>     could enter the key's password interactively on stdin, in response
>> to openssl's
>>     default prompt callback:
>>    
>>     Enter PEM passhprase:
>>    
>>     That's infesible in many situations, especially things like use from
>>     postgres_fdw.
>>    
>>     This change also allows admins to prevent libpq from ever prompting
>> for a
>>     password by calling:
>>    
>>     PQsetSSLKeyPassHook(PQdefaultSSLKeyPassHook);
>>    
>>     which is useful since OpenSSL likes to open /dev/tty to prompt for a
>> password,
>>     so even closing stdin won't stop it blocking if there's no user
>> input available.
>>     Applications may also override or extend SSL password fetching with
>> their own
>>     callback.
>>    
>>     There is deliberately no environment variable equivalent for the
>> sslpassword
>>     option.
>>
>>

I should also mention that this patch provides support for DER-format
certificates and keys.


cheers


andrew


-- 
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: merging HashJoin and Hash nodes

2019-10-31 Thread Andres Freund
Hi,

On 2019-10-31 23:59:19 +0100, Tomas Vondra wrote:
> On Tue, Oct 29, 2019 at 02:00:00PM +1300, Thomas Munro wrote:
> > On Tue, Oct 29, 2019 at 12:15 PM Andres Freund  wrote:
> > > I've groused about this a few times, but to me it seems wrong that
> > > HashJoin and Hash are separate nodes. They're so tightly bound together
> > > that keeping them separate just doesn't architecturally makes sense,
> > > imo. So I wrote a prototype.
> > > 
> > > Evidence of being tightly bound together:
> > > - functions in nodeHash.h that take a HashJoinState etc
> > > - how many functions in nodeHash.h and nodeHashjoin.h are purely exposed
> > >   so the other side can call them
> > > - there's basically no meaningful separation of concerns between code in
> > >   nodeHash.c and nodeHashjoin.c
> > > - the Hash node doesn't really exist during most of the planning, it's
> > >   kind of faked up in create_hashjoin_plan().
> > > - HashJoin knows that the inner node is always going to be a Hash node.
> > > - HashJoinState and HashState both have pointers to HashJoinTable, etc
> > > 
> > > Besides violating some aesthetical concerns, I think it also causes
> > > practical issues:
> > > 
> > > - code being in different translation code prevents the compiler from
> > >   inlining etc. There's a lot of HOT calls going between both. For each
> > >   new outer tuple we e.g. call, from nodeHashjoin.c separately into
> > >   nodeHash.c for ExecHashGetHashValue(), ExecHashGetBucketAndBatch(),
> > >   ExecHashGetSkewBucket(), ExecScanHashBucket(). They each have to
> > >   do memory loads from HashJoinState/HashJoinTable, even though previous
> > >   code *just* has done so.
> 
> I wonder how much we can gain by this. I don't expect any definitive
> figures from a patch at this stage, but maybe you have some guesses?

It's measurable, but not a world-changing difference. Some of the gains
are limited by the compiler not realizing that it does not have to
reload values across some external function calls. I saw somewhere
around ~3% for a case that was bottlenecked by HJ lookups (not
build).

I think part of the effect size is also limited by other unnecessary
inefficiencies being a larger bottleneck. E.g.

1) the fact that ExecScanHashBucket() contains branches that have
   roughly 50% likelihoods, making them unpredictable (1. on a
   successful lookup we oscillate between the first hashTuple != NULL
   test succeeding and failing, except in case of bucket conflict; 2. the
   while (hashTuple != NULL) oscillates similarly, because it tests for
   I. initial lookup succeeding, II. further tuple in previous bucket
   lookup, III. further tuples in case of hashvalue mismatch. Quite
   observable by profiling for br_misp_retired.conditional.)
2) The fact that there are *two* indirections for a successful lookup
   that are very likely to be cache misses. First we need to look up the
   relevant bucket, second we need to actually fetch the hashvalue from
   the pointer stored in the bucket.
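
To illustrate the second point, the two dependent loads look roughly like
this (a generic sketch of the access pattern, not the actual nodeHash.c
code):

typedef struct HashTupleSketch
{
	struct HashTupleSketch *next;
	unsigned int hashvalue;
	/* ... tuple data follows ... */
} HashTupleSketch;

static HashTupleSketch *
scan_bucket(HashTupleSketch **buckets, unsigned int nbuckets,
			unsigned int hashvalue)
{
	/* load #1, likely cache miss: the bucket head pointer */
	HashTupleSketch *tuple = buckets[hashvalue % nbuckets];

	/* load #2, likely cache miss: hashvalue fetched through that pointer */
	while (tuple != NULL && tuple->hashvalue != hashvalue)
		tuple = tuple->next;

	return tuple;
}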


But even independent of these larger inefficiencies, I suspect we could
also gain more from inlining by changing nodeHashjoin a bit. E.g. moving
the HJ_SCAN_BUCKET code into an always_inline function, and also
referencing it from the tail end of the HJ_NEED_NEW_OUTER code, instead
of falling through, would allow to optimize away a number of loads (and
I think also stores), and improve branch predictor
efficiency. E.g. optimizing away store/load combinations for
node->hj_CurHashValue, node->hj_CurBucketNo, node->hj_CurSkewBucketNo;
loads of hj_HashTable, ...; stores of node->hj_JoinState,
node->hj_MatchedOuter.  And probably make the code easier to read, to
boot.


> But IMO we should make it obvious which side of the join is hashed,
> instead of relyin on users to "know which way around the relations are
> shown". The explain is often used by users who're learning stuff, or
> maybe investigating it for the first time, and we should not make it
> unnecessarily hard to understand.

I agree. I wonder if just outputting something like 'Hashed Side:
second' (or "right", or ...) could work. Not perfect, but I don't really
have a better idea.

We somewhat rely on users understanding inner/outer for explain output
(I doubt this is good, to be clear), e.g. "Inner Unique: true ". Made
worse by the fact that "inner"/"outer" is also used to describe
different kinds of joins, with a mostly independent meaning.



> OTOH we have tools visualizing execution plans, so maybe backwards
> compatibility of the output is a concern too (I know we don't promise
> anything, though).

Well, execution *would* work a bit differently, so I don't feel too bad
about tools having to adapt to that. E.g. graphical explain tools really
shouldn't display a separate Hash node anymore.


> > The fact that EXPLAIN doesn't label relations seems to be a separate
> > concern that applies equally to nestloop joins, and could perhaps be
> > addressed with some more optional verbosity, not a fake

Improve checking for pg_index.xmin

2019-10-31 Thread Alexander Korotkov
Hi!

Our customer faced an issue where an index is invisible after creation.
The reproducible case is as follows.

$ psql db2
# begin;
# select txid_current();
$ psql db1
# select i as id, 0 as v into t from generate_series(1, 10) i;
# create unique index idx on t (id);
# update t set v = v + 1 where id = 1;
# update t set v = v + 1 where id = 1;
# update t set v = v + 1 where id = 1;
# update t set v = v + 1 where id = 1;
# update t set v = v + 1 where id = 1;
# drop index idx;
# create unique index idx on t (id);
# explain analyze select v from t where id = 1;

There is no issue if there is no parallel session in database db2.
The fact that index visibility depends on an open transaction in a
different database is ridiculous for users.

This happens because we check that there are no broken HOT chains after
index creation, by comparing pg_index.xmin with TransactionXmin. That is,
we check that pg_index.xmin is in the past for the current transaction in
a lossy way, by comparing just the xmins. The attached patch changes this
check to use XidInMVCCSnapshot().
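
In outline, the change looks like this (a paraphrased sketch with stand-in
declarations, not the actual planner code):

#include <stdbool.h>

/* Stand-in declarations for the backend facilities involved. */
typedef unsigned int TransactionId;
typedef struct SnapshotData *Snapshot;
extern TransactionId TransactionXmin;
extern bool TransactionIdPrecedes(TransactionId a, TransactionId b);
extern bool XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot);
extern Snapshot GetActiveSnapshot(void);

/*
 * Before: lossy. Any old-enough concurrent transaction, in any database,
 * keeps pg_index.xmin "not in the past" and the index unusable.
 */
static bool
index_usable_old(bool indcheckxmin, TransactionId index_xmin)
{
	return !indcheckxmin ||
		TransactionIdPrecedes(index_xmin, TransactionXmin);
}

/*
 * After (proposed): precise. Ask the active snapshot whether the
 * pg_index row's xmin is still in progress for us.
 */
static bool
index_usable_new(bool indcheckxmin, TransactionId index_xmin)
{
	return !indcheckxmin ||
		!XidInMVCCSnapshot(index_xmin, GetActiveSnapshot());
}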

With the patch the issue is gone. My doubt about this patch is that it
changes a check against TransactionXmin into a check against
GetActiveSnapshot(), which might be more recent. However, a query
shouldn't be executed with an older snapshot than the one it was planned
with.

Any thoughts?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


0001-improve-check-for-pg_index-xmin.patch
Description: Binary data


Allow 'sslkey' and 'sslcert' in postgres_fdw user mappings

2019-10-31 Thread Andrew Dunstan

This patch achieves $SUBJECT and also provides some testing of the
sslpassword setting.


cheers


andrew



-- 
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 298f652afc..c2661320bc 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -174,6 +174,18 @@ WARNING:  extension "bar" is not installed
 ALTER SERVER testserver1 OPTIONS (DROP extensions);
 ALTER USER MAPPING FOR public SERVER testserver1
 	OPTIONS (DROP user, DROP password);
+-- Attempt to add a valid option that's not allowed in a user mapping
+ALTER USER MAPPING FOR public SERVER testserver1
+	OPTIONS (ADD sslmode 'require');
+ERROR:  invalid option "sslmode"
+HINT:  Valid options in this context are: user, password, sslpassword, password_required, sslcert, sslkey
+-- But we can add valid ones fine
+ALTER USER MAPPING FOR public SERVER testserver1
+	OPTIONS (ADD sslpassword 'dummy');
+-- Ensure valid options we haven't used in a user mapping yet are
+-- permitted to check validation.
+ALTER USER MAPPING FOR public SERVER testserver1
+	OPTIONS (ADD sslkey 'value', ADD sslcert 'value');
 ALTER FOREIGN TABLE ft1 OPTIONS (schema_name 'S 1', table_name 'T 1');
 ALTER FOREIGN TABLE ft2 OPTIONS (schema_name 'S 1', table_name 'T 1');
 ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
diff --git a/contrib/postgres_fdw/option.c b/contrib/postgres_fdw/option.c
index 07fc7fa00a..0f35dfe54f 100644
--- a/contrib/postgres_fdw/option.c
+++ b/contrib/postgres_fdw/option.c
@@ -196,6 +196,16 @@ InitPgFdwOptions(void)
 		{"fetch_size", ForeignServerRelationId, false},
 		{"fetch_size", ForeignTableRelationId, false},
 		{"password_required", UserMappingRelationId, false},
+		/*
+		 * Extra room for the user mapping copies of sslcert and sslkey. These
+		 * are really libpq options but we repeat them here to allow them to
+		 * appear in both foreign server context (when we generate libpq
+		 * options) and user mapping context (from here). Bit of a hack
+		 * putting this in "non_libpq_options".
+		 */
+		{"sslcert", UserMappingRelationId, true},
+		{"sslkey", UserMappingRelationId, true},
+
 		{NULL, InvalidOid, false}
 	};
 
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 5f50c65566..bfdd5dd4d2 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -188,6 +188,19 @@ ALTER SERVER testserver1 OPTIONS (DROP extensions);
 ALTER USER MAPPING FOR public SERVER testserver1
 	OPTIONS (DROP user, DROP password);
 
+-- Attempt to add a valid option that's not allowed in a user mapping
+ALTER USER MAPPING FOR public SERVER testserver1
+	OPTIONS (ADD sslmode 'require');
+
+-- But we can add valid ones fine
+ALTER USER MAPPING FOR public SERVER testserver1
+	OPTIONS (ADD sslpassword 'dummy');
+
+-- Ensure valid options we haven't used in a user mapping yet are
+-- permitted to check validation.
+ALTER USER MAPPING FOR public SERVER testserver1
+	OPTIONS (ADD sslkey 'value', ADD sslcert 'value');
+
 ALTER FOREIGN TABLE ft1 OPTIONS (schema_name 'S 1', table_name 'T 1');
 ALTER FOREIGN TABLE ft2 OPTIONS (schema_name 'S 1', table_name 'T 1');
 ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
index bdd6ff8ae4..ef203d225c 100644
--- a/doc/src/sgml/postgres-fdw.sgml
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -107,13 +107,13 @@
 A foreign server using the postgres_fdw foreign data wrapper
 can have the same options that libpq accepts in
 connection strings, as described in ,
-except that these options are not allowed:
+except that these options are not allowed or have special handling:
 
 
  
   
user, password and sslpassword (specify these
-   in a user mapping, instead)
+   in a user mapping, instead, or use a service file)
   
  
  
@@ -128,6 +128,14 @@
postgres_fdw)
   
  
+ 
+  
+   sslkey and sslpassword - these may
+   appear in either or both a connection and a user
+   mapping. If both are present, the user mapping setting overrides the
+   connection setting.
+  
+ 
 

 


Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-10-31 Thread Moon, Insung
Hello.

On Thu, Oct 31, 2019 at 11:25 PM Masahiko Sawada  wrote:
>
> On Fri, Sep 6, 2019 at 3:34 PM Smith, Peter  
> wrote:
> >
> > -Original Message-
> > From: Masahiko Sawada  Sent: Thursday, 15 August 
> > 2019 7:10 PM
> >
> > > BTW I've created PoC patch for cluster encryption feature. Attached patch 
> > > set has done some items of TODO list and some of them can be used even 
> > > for finer granularity encryption. Anyway, the implemented components are 
> > > followings:
> >
> > Hello Sawada-san,
> >
> > I guess your original patch code may be getting a bit out-dated by the 
> > ongoing TDE discussions, but I have done some code review of it anyway.
> >
> > Hopefully a few comments below can still be of use going forward:
> >
> > ---
> >
> > REVIEW COMMENTS
> >
> > * src/backend/storage/encryption/enc_cipher.c – For functions 
> > EncryptionCipherValue/String maybe should log warnings for unexpected 
> > values instead of silently assigning to default 0/”off”.
> >
> > * src/backend/storage/encryption/enc_cipher.c – For function 
> > EncryptionCipherString, purpose of returning ”unknown” if unclear because 
> > that will map back to “off” again anyway via EncryptionCipherValue. Why not 
> > just return "off" (with warning logged).
> >
> > * src/include/storage/enc_common.h – Typo in comment: "Encrypton".
> >
> > * src/include/storage/encryption.h - The macro DataEncryptionEnabled may be 
> > better to be using enum TDE_ENCRYPTION_OFF instead of magic number 0
> >
> > * src/backend/storage/encryption/kmgr.c - Function BootStrapKmgr will 
> > report error if USE_OPENSSL is not defined. The check seems premature 
> > because it would fail even if the user is not using encryption. Shouldn't 
> > the lack of openssl be OK when user is not using TDE at all (i.e. when 
> > encryption is "none")?
> >
> > * src/backend/storage/encryption/kmgr.c - In function BootStrapMgr suggest 
> > better to check if (bootstrap_data_encryption_cipher == TDE_ENCRYPTION_OFF) 
> > using enum instead of the magic number 0.
> >
> > * src/backend/storage/encryption/kmgr.c - The function 
> > run_cluster_passphrase_command function seems mostly a clone of an existing 
> > run_ssl_passphrase_command function. Is it possible to refactor to share 
> > the common code?
> >
> > * src/backend/storage/encryption/kmgr.c - The function 
> > derive_encryption_key declares a char key_len. Why char? It seems int 
> > everywhere else.
> >
> > * src/backend/bootstrap/bootstrap.c - Suggest better if variable 
> > declaration bootstrap_data_encryption_cipher = 0 uses enum 
> > TDE_ENCRYPTION_OFF instead of magic number 0
> >
> > * src/backend/utils/misc/guc.c - It looks like the default value for GUC 
> > variable data_encryption_cipher is AES128. Wouldn't "off" be the more 
> > appropriate default value? Otherwise it seems inconsistent with the logic 
> > of initdb (which insists that the -e option is mandatory if you wanted any 
> > encryption).
> >
> > * src/backend/utils/misc/guc.c - There is a missing entry in the 
> > config_group_names[]. The patch changed the config_group[] in guc_tables.h, 
> > so I think there needs to be a matching item in the config_group_names.
> >
> > * src/bin/initdb/initdb.c - The function check_encryption_cipher would 
> > disallow an encryption value of "none". Although maybe it is not very 
> > useful to say -e none, it does seem inconsistent to reject it, given that 
> > "none" was a valid value for the GUC variable data_encryption_cipher.
> >
> > * contrib/bloom/blinsert.c - In function btbuildempty the arguments for 
> > PageEncryptionInPlace seem in the wrong order (forknum should be 2nd).
> >
> > * src/backend/access/hash/hashpage.c - In function _hash_alloc_buckets the 
> > arguments for PageEncryptionInPlace seem in the wrong order (forknum should 
> > be 2nd).
> >
> > * src/backend/access/spgist/spginsert.c - In function spgbuildempty the 
> > arguments for PageEncryptionInPlace seem in the wrong order (forknum should 
> > be 2nd). This error looks repeated 3X.
> >
> > * in multiple files - The encryption enums have equivalent strings ("off",
> > "aes-128", "aes-256") but they are scattered as string literals in many
> > places (e.g. pg_controldata.c, initdb.c, guc.c, enc_cipher.c). It
> > would be better to declare those as string constants in one place only.
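
(As an illustration of the enum-based checks suggested in several items above; this is only a sketch, and the type and constant names are assumed from the patch:)

    /* assumed from the patch: TDE_ENCRYPTION_OFF is the 0 value */
    if (bootstrap_data_encryption_cipher == TDE_ENCRYPTION_OFF)
        return;     /* encryption disabled, nothing to bootstrap */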
> >
>
> Thank you for reviewing this patch.
>
> I've updated the TDE patches. These patches implement the key system, buffer
> encryption, and WAL encryption. Please refer to the ToDo of cluster-wide
> encryption for more details of the design and components. It lacks
> temporary file encryption and front-end tool encryption. For
> temporary file encryption, we are discussing which files should be
> encrypted on another thread, and I thought that temporary file encryption
> might be related to that. So I'm currently studying the temporary file
> encryption patch that Antonin already submitted[1], but some changes
> might be needed based on that discussion.

Re: [BUG] Partition creation fails after dropping a column and adding a partial index

2019-10-31 Thread Amit Langote
On Thu, Oct 31, 2019 at 1:45 PM Michael Paquier  wrote:
> On Tue, Oct 29, 2019 at 01:16:58PM +0900, Michael Paquier wrote:
> > Yes, something looks wrong with that.  I have not looked at it in
> > details yet though.  I'll see about that tomorrow.
>
> So..  When building the attribute map for a cloned index (with either
> LIKE during the transformation or for partition indexes), we store
> each attribute number, with 0 used for dropped columns.  Unfortunately,
> if you look at the way the attribute map is built in this case, the
> code correctly generates the mapping with convert_tuples_by_name_map,
> but the length of the mapping used is incorrect, as it makes use of
> the number of attributes of the newly-created child relation, and not
> the parent, which includes the dropped column in its count.  So the
> answer is simply to use the parent as the reference for the mapping
> length.
>
> The patch is rather simple as per the attached, with extended
> regression tests included.  I have not checked on back-branches yet,
> but that's visibly wrong since 8b08f7d down to v11 (will do that when
> back-patching).

The patch looks correct and applies to both v12 and v11.

> There could be a point in changing convert_tuples_by_name_map & co so
> that they return the length of the map along with the map itself, to
> avoid such mistakes in the future.  That's a more invasive patch, not
> really suited for a back-patch, but we could do that on HEAD once this
> bug is fixed.  I have also checked the other callers of this API, and
> their handling is done correctly.
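
In code terms, the corrected pattern is roughly this (a sketch, not the exact hunk; convert_tuples_by_name_map() here is the pre-v13 three-argument version):

    /* The map is indexed by the parent's attribute numbers, so its length
     * must be the parent's natts, which still counts dropped columns. */
    attmap = convert_tuples_by_name_map(RelationGetDescr(childrel),
                                        RelationGetDescr(parentrel),
                                        gettext_noop("could not convert row type"));
    maplen = RelationGetDescr(parentrel)->natts;    /* parent, not child */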

I've been bitten by this same logical error when deciding what length to
use for the map, so that seems like a good idea.

Thanks,
Amit




Re: Restore replication settings when modifying a field type

2019-10-31 Thread Quan Zongliang

On 2019/10/28 12:39, Kyotaro Horiguchi wrote:

Hello.

# The patch no longer applies on the current master. Needs a rebase.

At Sat, 26 Oct 2019 16:50:48 +0800, Quan Zongliang  
wrote in

In fact, the replication property of the table has not been modified,
and it is still 'i'(REPLICA_IDENTITY_INDEX). But the previously
specified index property 'indisreplident' is set to false because of
the rebuild.


I suppose that the behavior is intended. Change of column types on the
publisher side can break the agreement on replica identity with
subscribers. Thus the replica identity setting cannot be restored
unconditionally. For (somewhat artificial :p) example:

P=# create table t (c1 integer, c2 text unique not null);
P=# alter table t replica identity using index t_c2_key;
P=# create publication p1 for table t;
P=# insert into t values (0, '00'), (1, '01');
S=# create table t (c1 integer, c2 text unique not null);
S=# alter table t replica identity using index t_c2_key;
S=# create subscription s1 connection '...' publication p1;

Your patch allows change of the type of c2 into integer.

P=# alter table t alter column c2 type integer using c2::integer;
P=# update t set c1 = c1 + 1 where c2 = '01';

This change, perhaps unexpectedly, does not propagate to the subscriber:

S=# select * from t;
 c1 | c2
----+----
  0 | 00
  1 | 01
(2 rows)



So I developed a patch: if the user modifies the column type and the
associated index is the REPLICA IDENTITY, rebuild it and restore the
replication settings.


Explicit setting of replica identity presumes that users are sure that
the setting works correctly. Implicit rebuilding after a type change
can silently break it.

At least we need to guarantee that the restored replica identity
setting is truly compatible with all existing subscribers. I'm not
sure about potential subscribers..

Anyway, I think it is a problem that the replica identity setting is
dropped silently. Perhaps a message something like "REPLICA IDENTITY
setting is lost, please redefine after confirming compatibility
with subscribers." is needed.

In fact, the scenario we encountered is like this: a column of a user's
table is of type "smallint", and it turns out that this range is not
sufficient, so they change it to "int". At this point, the REPLICA IDENTITY
is lost and the user does not realize it. By the time they found out, the
logical replication for this period of time had not been output normally,
and they had to find other ways to get the data back.
This user's logical replication setup issues standard SQL statements
to other relational databases using a plugin they developed themselves,
and they have thousands of tables to replicate.
So I think this patch is appropriate in this scenario. As for the
matching problem between publishers and subscribers, I'm afraid it's
hard to solve here. If this is not a suitable modification, I can
withdraw it and see if there's a better way.
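
A minimal reproduction of the reported behavior (hypothetical table and index names):

    CREATE TABLE t (id smallint UNIQUE NOT NULL, payload text);
    ALTER TABLE t REPLICA IDENTITY USING INDEX t_id_key;
    ALTER TABLE t ALTER COLUMN id TYPE int;   -- rebuilds t_id_key

    SELECT relreplident FROM pg_class WHERE relname = 't';
    -- still 'i', but pg_index.indisreplident on the rebuilt t_id_key is
    -- now false, so UPDATE/DELETE decoding silently loses the identity key.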


If necessary, I'll modify it again and rebase it onto the master branch.


regards.







abs function for interval

2019-10-31 Thread Euler Taveira
Hi,

Sometimes you want to know whether the difference between two timestamps
is less than x minutes, but you are not sure which timestamp is greater
than the other (to obtain a positive result -- it is not always
possible). If you cannot obtain the absolute value of the
subtraction, you have to write two conditions.

The attached patch implements abs function and @ operator for
intervals. The following example illustrates the use case:

postgres=# create table xpto (a timestamp, b timestamp);
CREATE TABLE
postgres=# insert into xpto (a, b) values(now(), now() - interval '1
day'),(now() - interval '5 hour', now()),(now() + '3 hour', now());
INSERT 0 3
postgres=# select *, a - b as t from xpto;
              a             |              b             |     t
----------------------------+----------------------------+-----------
 2019-10-31 22:43:30.601861 | 2019-10-30 22:43:30.601861 | 1 day
 2019-10-31 17:43:30.601861 | 2019-10-31 22:43:30.601861 | -05:00:00
 2019-11-01 01:43:30.601861 | 2019-10-31 22:43:30.601861 | 03:00:00
(3 rows)

postgres=# select *, a - b as i from xpto where abs(a - b) < interval '12 hour';
              a             |              b             |     i
----------------------------+----------------------------+-----------
 2019-10-31 17:43:30.601861 | 2019-10-31 22:43:30.601861 | -05:00:00
 2019-11-01 01:43:30.601861 | 2019-10-31 22:43:30.601861 | 03:00:00
(2 rows)

postgres=# select @ interval '1 years -2 months 3 days 4 hours -5
minutes 6.789 seconds' as t;
              t
-----------------------------
 10 mons 3 days 03:55:06.789
(1 row)


-- 
   Euler Taveira   Timbira -
http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
From b11a05e3304250803c7aa2ac811e0d49b0adfc00 Mon Sep 17 00:00:00 2001
From: Euler Taveira 
Date: Thu, 31 Oct 2019 23:07:00 -0300
Subject: [PATCH] Add abs function for interval.

Sometimes you want to answer if a difference between two timestamps is
lesser than x minutes. However, if you cannot obtain the absolute value
of subtraction, you have to add two conditions. Let's make it simple and
add abs function and @ operator for intervals.
---
 doc/src/sgml/func.sgml                 | 19 +++++++++++++++++++
 src/backend/utils/adt/timestamp.c      | 17 +++++++++++++++++
 src/include/catalog/pg_operator.dat    |  3 +++
 src/include/catalog/pg_proc.dat        |  6 ++++++
 src/test/regress/expected/interval.out |  8 ++++++++
 src/test/regress/sql/interval.sql      |  4 ++++
 6 files changed, 57 insertions(+)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 28eb322f3f..9882742aba 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -7370,6 +7370,12 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
         <entry><literal>interval '1 hour' / double precision '1.5'</literal></entry>
         <entry><literal>interval '00:40:00'</literal></entry>
        </row>
+
+       <row>
+        <entry> <literal>@</literal> </entry>
+        <entry><literal>@ interval '-2 hour'</literal></entry>
+        <entry><literal>interval '02:00:00'</literal></entry>
+       </row>
       </tbody>
      </tgroup>
     </table>
@@ -7391,6 +7397,19 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
    <row>
     <entry>
      <indexterm>
+      <primary>abs</primary>
+     </indexterm>
+     <literal><function>abs(<type>interval</type>)</function></literal>
+    </entry>
+    <entry><type>interval</type></entry>
+    <entry>Absolute value</entry>
+    <entry><literal>abs(interval '6 days -08:16:27')</literal></entry>
+    <entry><literal>6 days 08:16:27</literal></entry>
+   </row>
+
+   <row>
+    <entry>
+     <indexterm>
      <primary>age</primary>
     </indexterm>
     <literal><function>age(<type>timestamp</type>, <type>timestamp</type>)</function></literal>
diff --git a/src/backend/utils/adt/timestamp.c b/src/backend/utils/adt/timestamp.c
index 1dc4c820de..a6b8b8c221 100644
--- a/src/backend/utils/adt/timestamp.c
+++ b/src/backend/utils/adt/timestamp.c
@@ -2435,6 +2435,23 @@ interval_cmp(PG_FUNCTION_ARGS)
 	PG_RETURN_INT32(interval_cmp_internal(interval1, interval2));
 }
 
+Datum
+interval_abs(PG_FUNCTION_ARGS)
+{
+	Interval   *interval = PG_GETARG_INTERVAL_P(0);
+	Interval   *result;
+
+	result = palloc(sizeof(Interval));
+	*result = *interval;
+
+	/* convert all struct Interval members to absolute values */
+	result->month = (interval->month < 0) ? (-1 * interval->month) : interval->month;
+	result->day = (interval->day < 0) ? (-1 * interval->day) : interval->day;
+	result->time = (interval->time < 0) ? (-1 * interval->time) : interval->time;
+
+	PG_RETURN_INTERVAL_P(result);
+}
+
 /*
  * Hashing for intervals
  *
diff --git a/src/include/catalog/pg_operator.dat b/src/include/catalog/pg_operator.dat
index fa7dc96ece..09ce9f2765 100644
--- a/src/include/catalog/pg_operator.dat
+++ b/src/include/catalog/pg_operator.dat
@@ -2164,6 +2164,9 @@
 { oid => '1803', descr => 'subtract',
   oprname => '-', oprleft => 'timetz', oprright => 'interval',
   oprresult => 'timetz', oprcode => 'timetz_mi_interval' },
+{ oid => '8302', descr => 'absolute value',
+  oprname => '@', oprkind => 'l', oprleft => '0', oprright => 'interval',
+  oprresult => 'interval', oprcode => 'interval_abs' },
 
 { oid => '1804', descr => 'equal',
   oprname => '=', oprcanmerge => 't', oprleft => 'varbit', oprright => 'varbit',
diff --git a/src/include/c

Re: abs function for interval

2019-10-31 Thread Andres Freund
Hi,

On 2019-10-31 23:20:07 -0300, Euler Taveira wrote:
> diff --git a/src/backend/utils/adt/timestamp.c 
> b/src/backend/utils/adt/timestamp.c
> index 1dc4c820de..a6b8b8c221 100644
> --- a/src/backend/utils/adt/timestamp.c
> +++ b/src/backend/utils/adt/timestamp.c
> @@ -2435,6 +2435,23 @@ interval_cmp(PG_FUNCTION_ARGS)
>   PG_RETURN_INT32(interval_cmp_internal(interval1, interval2));
>  }
>
> +Datum
> +interval_abs(PG_FUNCTION_ARGS)
> +{
> + Interval   *interval = PG_GETARG_INTERVAL_P(0);
> + Interval   *result;
> +
> + result = palloc(sizeof(Interval));
> + *result = *interval;
> +
> + /* convert all struct Interval members to absolute values */
> + result->month = (interval->month < 0) ? (-1 * interval->month) : interval->month;
> + result->day = (interval->day < 0) ? (-1 * interval->day) : interval->day;
> + result->time = (interval->time < 0) ? (-1 * interval->time) : interval->time;
> +
> + PG_RETURN_INTERVAL_P(result);
> +}
> +

Several points:

1) I don't think you can do the < 0 check on an elementwise basis. Your
   code would e.g. make a hash out of abs('1 day -1 second'), by
   inverting the second, but not the day (whereas nothing should be
   done).

   It'd probably be easiest to implement this by comparing with a 0
   interval using interval_lt() or interval_cmp_internal().

2) This will not correctly handle overflows, I believe. What happens if you
   do SELECT abs('-2147483648 days'::interval)? You probably should
   reuse interval_um() for this.
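
For reference, a minimal sketch along those lines might look like this (assuming it lives in timestamp.c, where the static interval_cmp_internal() is available; interval_um() already checks for overflow when negating):

    Datum
    interval_abs(PG_FUNCTION_ARGS)
    {
        Interval   *interval = PG_GETARG_INTERVAL_P(0);
        Interval    zero = {0, 0, 0};

        /* negate only when the interval as a whole sorts below zero */
        if (interval_cmp_internal(interval, &zero) < 0)
            return DirectFunctionCall1(interval_um,
                                       IntervalPGetDatum(interval));

        PG_RETURN_INTERVAL_P(interval);
    }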


> --- a/src/test/regress/expected/interval.out
> +++ b/src/test/regress/expected/interval.out
> @@ -927,3 +927,11 @@ select make_interval(secs := 7e12);
>   @ 19 hours 26 mins 40 secs
>  (1 row)
>
> +-- test absolute operator
> +set IntervalStyle to postgres;
> +select @ interval '1 years -2 months 3 days 4 hours -5 minutes 6.789 seconds' as t;
> +              t
> +-----------------------------
> + 10 mons 3 days 03:55:06.789
> +(1 row)
> +
> diff --git a/src/test/regress/sql/interval.sql 
> b/src/test/regress/sql/interval.sql
> index bc5537d1b9..8f9a2bda29 100644
> --- a/src/test/regress/sql/interval.sql
> +++ b/src/test/regress/sql/interval.sql


> @@ -308,3 +308,7 @@ select make_interval(months := 'NaN'::float::int);
>  select make_interval(secs := 'inf');
>  select make_interval(secs := 'NaN');
>  select make_interval(secs := 7e12);
> +
> +-- test absolute operator
> +set IntervalStyle to postgres;
> +select @ interval '1 years -2 months 3 days 4 hours -5 minutes 6.789 seconds' as t;
> --
> 2.11.0

This is not even remotely close to enough tests. In your only test, abs()
does not change the value, as there's no negative component (the 1 year
-2 months results in a positive 10 months, and the hours, minutes and
seconds get folded together too).

At the very least a few boundary conditions need to be tested (see 2)
above), a few more complicated cases with different components being
of different signs, and you need to show the values with and without
applying abs().

Greetings,

Andres Freund




Re: The command tag of "ALTER MATERIALIZED VIEW RENAME COLUMN"

2019-10-31 Thread Fujii Masao
On Fri, Nov 1, 2019 at 6:34 AM Ibrar Ahmed  wrote:
>
>
>
> On Thu, Oct 31, 2019 at 6:56 PM Tom Lane  wrote:
>>
>> Fujii Masao  writes:
>> > ... I found that the command tag of
>> > ALTER MATERIALIZED VIEW RENAME COLUMN is "ALTER TABLE", not "ALTER VIEW".
>>
>> > =# ALTER MATERIALIZED VIEW hoge RENAME COLUMN j TO x;
>> > ALTER TABLE
>>
>> > Is this intentional? Or bug?
>>
>> Seems like an oversight.

Thanks for the check!

> The same issue exists with ALTER FOREIGN TABLE

Yes.

> The attached patch fixes that for ALTER VIEW, ALTER MATERIALIZED VIEW and
> ALTER FOREIGN TABLE

You introduced a subtype in your patch, but I think it's better and simpler
to just give relationType to AlterObjectTypeCommandTag()
when renaming columns (i.e., renameType = OBJECT_COLUMN).
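
Something like this hypothetical sketch against CreateCommandTag() (the actual code layout may differ):

    case T_RenameStmt:
        {
            RenameStmt *stmt = (RenameStmt *) parsetree;

            /* For column renames, report the tag of the owning relation. */
            tag = AlterObjectTypeCommandTag(stmt->renameType == OBJECT_COLUMN ?
                                            stmt->relationType :
                                            stmt->renameType);
        }
        break;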

To avoid this kind of command-tag oversight, I'd like to add regression
tests to make sure that SQL returns a valid and correct command tag.
But currently there seems to be no mechanism for such a test in the
regression framework. Right?
Maybe we will need that mechanism.

Regards,

-- 
Fujii Masao


cmdtag_of_alter_mv_v1.patch
Description: Binary data


Re: Restore replication settings when modifying a field type

2019-10-31 Thread Euler Taveira
On Mon, Oct 28, 2019 at 1:41 AM, Kyotaro Horiguchi wrote:
>
> At Sat, 26 Oct 2019 16:50:48 +0800, Quan Zongliang  
> wrote in
> > In fact, the replication property of the table has not been modified,
> > and it is still 'i'(REPLICA_IDENTITY_INDEX). But the previously
> > specified index property 'indisreplident' is set to false because of
> > the rebuild.
>
> I suppose that the behavior is intended. Change of column types on the
> publisher side can break the agreement on replica identity with
> subscribers. Thus the replica identity setting cannot be restored
> unconditionally. For (somewhat artificial :p) example:
>
I don't think so. The actual logical replication behavior is that DDL
will always break replication. If you add or drop a column, replication
for that table stops until you execute the same DDL on the subscriber.
What happens in the OP's case is that a DDL is *silently* breaking the
logical replication. IMHO it is a bug. If the behavior were intended,
it would also clear pg_class.relreplident, but it does not.

> Explicit setting of replica identity premises that they are sure that
> the setting works correctly. Implicit rebuilding after a type change
> can silently break it.
>
The current behavior forces the OP to execute 2 DDLs in the same
transaction to ensure that it won't "lose" transactions for logical
replication.

> At least we need to guarantee that the restored replica identity
> setting is truly compatible with all existing subscribers. I'm not
> sure about potential subscribers..
>
Why? Replication will break, and to fix it you should apply the same
DDL you applied on the publisher. It is the right thing to do.

[poking the code...]

ATExecAlterColumnType records everything that depends on the column,
and for indexes it saves the definition (via pg_get_indexdef_string).
The definition is not sufficient for reconstructing the replica identity
information because there is no such keyword for replica identity in
CREATE INDEX. The code that rebuilds the new index should call
relation_mark_replica_identity to fix pg_index.indisreplident.


-- 
   Euler Taveira   Timbira -
http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento




Re: Allow CREATE OR REPLACE VIEW to rename the columns

2019-10-31 Thread Fujii Masao
On Thu, Oct 31, 2019 at 10:54 PM Tom Lane  wrote:
>
> Fujii Masao  writes:
> > On Thu, Oct 31, 2019 at 1:42 PM Tom Lane  wrote:
> >> Fujii Masao  writes:
> >>> Currently CREATE OR REPLACE VIEW command fails if the column names
> >>> are changed.
>
> >> That is, I believe, intentional.  It's an effective aid to catching
> >> mistakes in view redefinitions, such as misaligning the new set of
> >> columns relative to the old.  [example]
>
> > This example makes me wonder if the addition of a column by
> > CREATE OR REPLACE VIEW also has the same (or even worse) issue.
> > That is, it may increase the opportunity for user mistakes.
>
> The idea in CREATE OR REPLACE VIEW is to allow addition of new
> columns at the end, the same as you can do with tables.  Checking
> the column name matchups is a way to ensure that you actually do
> add at the end, rather than insert, which wouldn't act as you
> expect.  Admittedly it's only heuristic.
>
> We could, perhaps, have insisted that adding a column also requires
> special syntax, but we didn't.  Consider for example a case where
> the new column needs to come from an additionally joined table;
> then you have to be able to edit the underlying view definition along
> with adding the column.  So that seems like kind of a pain in the
> neck to insist on.
>
> > But what's worse is that currently there is no way to
> > drop a column from the view, except by recreating the view.
>
> I think this has been discussed, as well.  It's not necessarily
> simple to drop a view output column.  For example, if the view
> uses SELECT DISTINCT, removing an output column would have
> semantic effects on the set of rows that can be returned, since
> distinct-ness would now mean something else than it did before.
>
> It's conceivable that we could enumerate all such effects and
> then allow DROP COLUMN (probably replacing the output column
> with a null constant) if none of them apply, but I can't get
> terribly excited about it.  The field demand for such a feature
> has not been high.  I'd be a bit worried about bugs arising
> from failures to check attisdropped for views, too; so that
> the cost of getting this working might be greater than it seems
> on the surface.
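
To illustrate the SELECT DISTINCT point with a hypothetical example:

    CREATE TABLE t (a int, b int);
    INSERT INTO t VALUES (1, 1), (1, 2);
    CREATE VIEW v AS SELECT DISTINCT a, b FROM t;   -- returns two rows
    -- If b could simply be dropped, the view would effectively become
    -- SELECT DISTINCT a FROM t, which returns one row: distinct-ness
    -- now means something different.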

Thanks for the explanation! Understood.

Regards,

-- 
Fujii Masao




Re: abs function for interval

2019-10-31 Thread Euler Taveira
On Thu, Oct 31, 2019 at 11:45 PM, Andres Freund wrote:
>
> 1) I don't think you can do the < 0 check on an elementwise basis. Your
>code would e.g. make a hash out of abs('1 day -1 second'), by
>inverting the second, but not the day (whereas nothing should be
>done).
>
>It'd probably be easiest to implement this by comparing with a 0
> interval using interval_lt() or interval_cmp_internal().
>
Hmm. Good idea. Let me try it.


-- 
   Euler Taveira   Timbira -
http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento




Re: Problem with synchronous replication

2019-10-31 Thread lingce.ldm
On Oct 31, 2019, at 10:11, Michael Paquier  wrote:
> 
> On Wed, Oct 30, 2019 at 05:43:04PM +0900, Kyotaro Horiguchi wrote:
>> At Wed, 30 Oct 2019 17:21:17 +0900, Fujii Masao  
>> wrote in 
>>> This change causes every ending backends to always take the exclusive lock
>>> even when it's not in SyncRep queue. This may be problematic, for example,
>>> when terminating multiple backends at the same time? If yes,
>>> it might be better to check SHMQueueIsDetached() again after taking the 
>>> lock.
>>> That is,
>> 
>> I'm not sure how much that harms but double-checked locking
>> (releasing) is simple enough for reducing possible congestion here, I
>> think.
> 
> FWIW, I could not measure any actual difference with pgbench -C, up to
> 500 sessions and an empty input file (just have one meta-command) and
> -c 20.
> 
> I have added some comments in SyncRepCleanupAtProcExit(), and adjusted
> the patch with the suggestion from Fujii-san.  Any comments?

Thanks for the patch. Looks good to me +1.
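
For context, the double-checked pattern under discussion looks roughly like this (a sketch using the existing SHM_QUEUE API):

    if (!SHMQueueIsDetached(&(MyProc->syncRepLinks)))
    {
        LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
        /* re-check under the lock: a walsender may have detached us meanwhile */
        if (!SHMQueueIsDetached(&(MyProc->syncRepLinks)))
            SHMQueueDelete(&(MyProc->syncRepLinks));
        LWLockRelease(SyncRepLock);
    }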

Regards,

--
Dongming Liu



smime.p7s
Description: S/MIME cryptographic signature


Re: progress report for ANALYZE

2019-10-31 Thread Tatsuro Yamada

Hi vignesh!


On 2019/09/17 20:51, vignesh C wrote:

On Thu, Sep 5, 2019 at 2:31 AM Alvaro Herrera  wrote:


There were some minor problems in v5 -- bogus Docbook as well as
outdated rules.out, small "git diff --check" complaint about whitespace.
This v6 (on today's master) fixes those, no other changes.


+ <entry>
+   The command is preparing to begin scanning the heap.  This phase is
+   expected to be very brief.
+ </entry>
In the above, after "." there is an extra space; is this intentional? I
had noticed that in a lot of places there are a couple of spaces and
sometimes a single space across this file.

Like below, there is a single space after ".":
  <entry>Time when this process' current transaction was started, or null
   if no transaction is active. If the current
   query is the first of its transaction, this column is equal to the
   <structfield>query_start</structfield> column.
  </entry>



Sorry for the late reply.

Probably it is intentional, because there are many extra spaces
in other documents. See below:

# Results of grep
=
$ grep '\.  ' doc/src/sgml/monitoring.sgml | wc -l
114
$ grep '\.  ' doc/src/sgml/information_schema.sgml | wc -l
184
$ grep '\.  ' doc/src/sgml/func.sgml | wc -l
577
=

Therefore, I'm going to leave it as is. :)


Thanks,
Tatsuro Yamada






On disable_cost

2019-10-31 Thread Zhenghua Lyu
Hi,

    Postgres has a global variable `disable_cost`. It is set to the value
1.0e10.

    This value is added to the cost of a path if the related GUC is set off.
For example, if enable_nestloop is set off, when the planner tries to add a
nestloop join path, it still adds such a path, but with the huge cost
`disable_cost` added in.
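
For reference, the mechanism in costsize.c is just an additive penalty, roughly:

    /* paraphrased from cost_nestloop() in costsize.c */
    if (!enable_nestloop)
        startup_cost += disable_cost;   /* disable_cost = 1.0e10 */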

    But 1.0e10 may not be large enough. I encountered this issue in
Greenplum (based on Postgres). Heikki told me that someone also
encountered the same issue on Postgres, so I am raising it here for
discussion.

    My issue: I did some spikes and tests on TPC-DS with 1TB of data. For
query 104, it generates a nestloop join even with enable_nestloop set
off, and the final plan's total cost is very huge (about 1e24). But if I
enlarge disable_cost to 1e30, then the planner generates a hash join.

    So I guess that disable_cost is not large enough for huge amounts of
data.

    It is tricky to set disable_cost to a huge number. Can we come up with
a better solution?

The following thoughts are from Heikki:

> Aside from not having a large enough disable cost, there's also the
> fact that the high cost might affect the rest of the plan, if we have to
> use a plan type that's disabled. For example, if a table doesn't have any
> indexes, but enable_seqscan is off, we might put the unavoidable Seq Scan
> on a different side of a join than we would with enable_seqscan=on,
> because of the high cost estimate.



> I think a more robust way to disable forbidden plan types would be to
> handle the disabling in add_path(). Instead of having a high disable cost
> on the Path itself, the comparison add_path() would always consider
> disabled paths as more expensive than others, regardless of the cost.
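
In code terms, that idea might look roughly like this (hypothetical: Path has no such "disabled" flag today; compare_path_costs() is the existing cost comparator):

    /* Hypothetical add_path() comparison: a disabled flag dominates cost. */
    if (new_path->disabled != old_path->disabled)
        accept_new = !new_path->disabled;   /* enabled always beats disabled */
    else
        accept_new = (compare_path_costs(new_path, old_path, TOTAL_COST) < 0);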


  Any thoughts or ideas on the problem? Thanks!

Best Regards,
Zhenghua Lyu


Re: On disable_cost

2019-10-31 Thread Thomas Munro
On Fri, Nov 1, 2019 at 7:42 PM Zhenghua Lyu  wrote:
> It is tricky to set disable_cost to a huge number. Can we come up with
> a better solution?

What happens if you use DBL_MAX?