Re: Fix some error handling for read() and errno

2018-06-25 Thread Michael Paquier
On Sat, Jun 23, 2018 at 08:48:03AM +0900, Michael Paquier wrote:
> That's exactly why I have started this thread so as both problems are
> addressed separately:
> https://www.postgresql.org/message-id/20180622061535.gd5...@paquier.xyz
> And back-patching the errno patch while only bothering about messages on
> HEAD matches also what I got in mind.  I'll come back to this thread
> once the errno issues are all addressed.

As this one is done, I have been looking at this thread again.
Peter Eisentraut has pushed e5d11b9, so there is no need to worry about
pluralization of error messages anymore.  I have therefore moved all
messages to this message style.  All of this is done as 0001.

I have been thinking as well about a common interface which could be
used to read/write/fsync transient files:
void WriteTransientFile(int fd, char *buf, Size count, int elevel,
                        const char *filename, uint32 wait_event_info);
bool ReadTransientFile(int fd, char *buf, Size count, int elevel,
                       const char *filename, uint32 wait_event_info);
void SyncTransientFile(int fd, int elevel, const char *filename,
                       uint32 wait_event_info);

FileRead and FileWrite provide rather equivalent functionality, but their
purpose is different.
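
To illustrate the idea, here is a minimal sketch of what ReadTransientFile()
could look like with that kind of error reporting.  This is only an
illustration under assumptions of mine, not the code in the attached 0002:

/*
 * Illustrative sketch only (assumes the usual backend includes such as
 * postgres.h, pgstat.h and storage/fd.h): read "count" bytes from a
 * transient file, reporting I/O errors and short reads at "elevel".
 */
bool
ReadTransientFile(int fd, char *buf, Size count, int elevel,
				  const char *filename, uint32 wait_event_info)
{
	int			r;

	pgstat_report_wait_start(wait_event_info);
	r = read(fd, buf, count);
	pgstat_report_wait_end();

	if (r < 0)
		ereport(elevel,
				(errcode_for_file_access(),
				 errmsg("could not read file \"%s\": %m", filename)));
	else if (r != (int) count)
		ereport(elevel,
				(errcode(ERRCODE_DATA_CORRUPTED),
				 errmsg("could not read file \"%s\": read %d of %d",
						filename, r, (int) count)));

	return (r == (int) count);
}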

If you look at 0002, this shaves a bit of code:
6 files changed, 128 insertions(+), 200 deletions(-)

There are also a couple of advantages here:
- errno handling for transient files is centralized, covering ENOSPC for
write(2) and short read counts for read(2).
- Wait events have to be defined, so they are unlikely to get forgotten
in the future.
- Error handling for CloseTransientFile in code paths is centralized.

ReadTransientFile could be redefined to return the number of bytes read
as its result, with the caller checking errno, but that feels like a bit
of duplicated work for twophase.c.  WriteTransientFile and
SyncTransientFile could get the same treatment for consistency, but they
would not really be used that way now.  Do you guys think that this is
worth pursuing?  Merging 0001 and 0002 together may make sense then.
--
Michael
From 041068b821dd555e437f77a99e95795eceff3189 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Mon, 25 Jun 2018 14:37:18 +0900
Subject: [PATCH 1/2] Rework error messages around file handling

Some error messages related to file handling use the code path context
to define their state.  For example, 2PC-related errors refer to
"two-phase status files", and "relation mapping file" is used for the
catalog-to-filenode mapping.  However, those prove to be difficult to
translate, and are not more helpful than just referring to the path of
the file being manipulated.  So simplify all those error messages by
just referring to files with their paths.  In some cases, like the
manipulation of WAL segments, the context is helpful, so those are kept.

Error handling for calls to the system function read() has also been
rather inconsistent, sometimes not reporting the number of bytes read,
and in some other code paths trying to use an errno which has not been
set.  With this patch the in-core functions use a more consistent
pattern, which checks both for errno when it is set and for inconsistent
(short) reads.

To avoid pluralization issues when an unexpected number of bytes is
read, "could not read: read %d of %d" is used as the error message.

Author: Michael Paquier
Discussion: https://postgr.es/m/20180622234803.ga1...@paquier.xyz
---
 src/backend/access/transam/twophase.c   | 40 ++--
 src/backend/access/transam/xlog.c   | 40 
 src/backend/replication/logical/origin.c| 13 +++-
 src/backend/replication/logical/snapbuild.c | 68 +++--
 src/backend/replication/slot.c  | 26 +---
 src/backend/replication/walsender.c | 16 -
 src/backend/utils/cache/relmapper.c | 28 ++---
 src/bin/pg_basebackup/pg_receivewal.c   | 12 +++-
 src/bin/pg_rewind/file_ops.c| 14 -
 src/bin/pg_rewind/parsexlog.c   | 14 -
 src/bin/pg_waldump/pg_waldump.c | 17 --
 11 files changed, 200 insertions(+), 88 deletions(-)

diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index a9ef1b3d73..10c1e31c0f 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -1219,6 +1219,7 @@ ReadTwoPhaseFile(TransactionId xid, bool give_warnings)
 	uint32		crc_offset;
 	pg_crc32c	calc_crc,
 file_crc;
+	int			r;
 
 	TwoPhaseFilePath(path, xid);
 
@@ -1228,8 +1229,7 @@ ReadTwoPhaseFile(TransactionId xid, bool give_warnings)
 		if (give_warnings)
 			ereport(WARNING,
 	(errcode_for_file_access(),
-	 errmsg("could not open two-phase state file \"%s\": %m",
-			path)));
+	 errmsg("could not open file \"%s\": %m", path)));
 		return NULL;
 	}
 
@@ -1249,8 +1249,7 @@ ReadTwoPhaseFile(TransactionId xid, bool give_warnings)
 			errno = save_errno;
 			ereport(WARNING,
 	(errcode_for_file_access(),

Re: [bug fix] ECPG: freeing memory for pgtypes crashes on Windows

2018-06-25 Thread Kyotaro HORIGUCHI
Hello. Thank you for committing this, Thomas.

At Mon, 18 Jun 2018 07:25:13 +, "Tsunakawa, Takayuki" 
 wrote in 
<0A3221C70F24FB45833433255569204D1FA1BCD9@G01JPEXMBYT05>
> > > I want some remedy for older releases.  Our customer worked around this
> > problem by getting a libpq connection in their ECPG application and calling
> > PQfreemem().  That's an ugly kludge, and I don't want other users to follow
> > it.
> > >
> > > I don't see a problem with back-patching as-is, because existing users
> > who just call free() or don't call free() won't be affected.  I think that
> > most serious applications somehow state their supported minor releases like
> > "this application supports (or is certified against) PostgreSQL 10.5 or
> > later", just like other applications support "RHEL 6.2 or later" or "Windows
> > XP Sp2 or later."
> > 
> > If there is a consensus that we should do that then I'll back-patch,
> > but so far no one else has spoken up in support.
> 
> I'll follow the community decision.  But I'm afraid that not
> enough people will comment on this to call it a consensus,
> because this topic will not be interesting...

The combination of ECPG and Windows has a very small user base, even
if it is not zero. Anyway, the vast majority of developers here are
only working on Unix-like OSes. So only two or three persons can form
a consensus on this issue :p

> FWIW, I thought back-patching would make committers' future
> burden smaller thanks to the smaller difference in the code of
> multiple major versions.

That said, I also don't see a problem with back-patching it; I don't
see a problem with such a difference between versions.

I vote for back-patching this up to 9.5 (quite arbitrary..), but
it is fine for me if the documentation of 9.6 and earlier mentions
a restriction like "On Windows, the application may crash when it
uses free() on the return values of the PGTYPES*_to_asc functions.
Make sure to use the same version of the CRT library as the ECPG
compiler in that case."
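
(For illustration only, and not something from this thread: the pattern the
note is about would look like the following, assuming a hypothetical helper
that prints a numeric value.)

#include <pgtypes_numeric.h>
#include <stdio.h>
#include <stdlib.h>

void
print_numeric(numeric *value)
{
	/* string allocated by libpgtypes */
	char	   *text = PGTYPESnumeric_to_asc(value, 2);

	printf("%s\n", text);

	/*
	 * Freeing with the application's free() can crash on Windows if the
	 * application links against a different CRT than libpgtypes does.
	 */
	free(text);
}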

.. Is there any means of finding the library version in the
installed environment?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




RE: [bug fix] ECPG: freeing memory for pgtypes crashes on Windows

2018-06-25 Thread Tsunakawa, Takayuki
From: Kyotaro HORIGUCHI [mailto:horiguchi.kyot...@lab.ntt.co.jp]
> However I also don't see a problem to back-patch it, I don't see
> a problem on such difference between versions.

Thank you, Horiguchi-san and Robert.


> .. Is there any means to find the library version on the
> installed environment?

You can manually see the library version on the [Details] tab in the properties 
dialog with Windows Explorer.  I don't know how to get the version in a program.


Regards
Takayuki Tsunakawa






Re: [bug fix] ECPG: freeing memory for pgtypes crashes on Windows

2018-06-25 Thread Kyotaro HORIGUCHI
At Mon, 25 Jun 2018 08:16:10 +, "Tsunakawa, Takayuki" 
 wrote in 
<0A3221C70F24FB45833433255569204D1FA241F9@G01JPEXMBYT05>
> From: Kyotaro HORIGUCHI [mailto:horiguchi.kyot...@lab.ntt.co.jp]
> > However I also don't see a problem to back-patch it, I don't see
> > a problem on such difference between versions.
> 
> Thank you, Horiguchi-san and Robert.
> 
> 
> > .. Is there any means to find the library version on the
> > installed environment?
> 
> You can manually see the library version on the [Details] tab in the 
> properties dialog with Windows Explorer.  I don't know how to get the version 
> in a program.

By "version" I meant not the version number but the /MT vs /MD
identification of the CRT libraries. Such a description (mentioned
in my previous mail) only makes sense if the reader has some means
of seeing what "version" of the CRT library the target ECPG
library (and/or compiler) uses.
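
(A side note from me, not from the thread: one way to see which CRT flavor a
given library or application uses on Windows is to list its DLL dependencies
with the MSVC toolchain; a statically linked /MT build shows no CRT DLL at
all, while /MD builds list one.)

rem Illustrative only: list the DLLs (including any CRT DLL such as
rem msvcr*.dll or vcruntime*.dll) that a binary depends on.
dumpbin /dependents libpgtypes.dll
dumpbin /dependents myapp.exe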

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Incorrect fsync handling in pg_basebackup's tar_finish

2018-06-25 Thread Magnus Hagander
On Mon, Jun 25, 2018 at 4:43 AM, Michael Paquier 
wrote:

> Hi all,
>
> I was just looking at the code of pg_basebackup, and noticed that we
> don't actually check if the two last empty blocks of any tar file
> produced are correctly fsync'd or not:
> @@ -957,7 +957,10 @@ tar_finish(void)
>
>  /* sync the empty blocks as well, since they're after the last file */
>  if (tar_data->sync)
> -   fsync(tar_data->fd);
> +   {
> +   if (fsync(tar_data->fd) != 0)
> +   return false;
> +   }
>
> That looks incorrect to me, hence shouldn't something like the attached
> be done?  Magnus and others, any opinions?
>

Yup, that seems like an issue and a correct fix to me.
-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: pg_verify_checksums review

2018-06-25 Thread sixela
Hello,

There is a similar utility that I wrote that does offline checksum
verification as well.

https://github.com/google/pg_page_verification

This utility includes a verbose option as well as scanning multiple
subsequent segment files.

Alexis



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html



Re: PATCH: backtraces for error messages

2018-06-25 Thread Craig Ringer
On 25 June 2018 at 14:21, Kyotaro HORIGUCHI  wrote:

> Hi.
>
> At Mon, 25 Jun 2018 09:32:36 +0800, Craig Ringer wrote:
> > On 21 June 2018 at 19:09, Kyotaro HORIGUCHI
> > <horiguchi.kyot...@lab.ntt.co.jp> wrote:
> >
> > I think this for assertion failure is no problem but I'm not sure
> > > for other cases.
> >
> >
> > I think it's pretty strongly desirable for PANIC.
>
> Ah, I forgot about that. I agree with that. The cost of collecting the
> information is not a problem on PANIC. However, I still don't
> think a stack trace is valuable for all PANIC messages. I can accept
> a GUC to control it, but it is preferable that this works fine
> without such extra setup.
>

Places such as?


>
> > > We could set proper context description or other
> > > additional information in error messages before just dumping a
> > > trace for known cases.
> > >
> >
> > Yeah. The trouble there is that there are a _lot_ of places to touch for
> > such things, and inevitably the one you really want to see will be
> > something that didn't get suitably annotated.
>
> Agreed, that is the reality.  Instead, can't we make new error
> classes PANIC_STACKDUMP and ERROR_STACKDUMP to explicitly
> restrict stack dumps for elog()?  Or elog_stackdump() and elog()
> would also be fine for me. Using them is easier than proper
> annotating. It would be perfect if we could invent an automated
> way, but I don't think it is realistic.
>

That needlessly complicates error severity levels with information not
really related to the severity. -1 from me.


> Mmm. If I understand you correctly, my point is that perf doesn't dump
> a backtrace on a probe point, but trace points are usable for taking
> a symbolic backtrace. (I'm sorry that I cannot provide an example
> since stap doesn't work on my box..)
>

perf record --call-graph dwarf -e sdt_postgresql:checkpoint__start -u postgres
perf report -g


> If your intention is to take backtraces without any setup (I
> think that is preferable), it should be restricted to the required
> points. That can be achieved with the additional error classes or
> substitute error output functions.
>

Who's classifying all the possible points?

Which PANICs or assertion failures do you want to exempt?

I definitely do not want to emit stacks for everything, like my patch
currently does. It's just a proof of concept. Later on I'll want fine-grained
runtime control over when that happens, but that's out of scope for this.
For now the goal is to emit stacks at times when it's obviously sensible
to have a stack, and to do it in a way that doesn't require per-error-site
maintenance/changes or create backport hassle.


> Just an idea, but can't we use a definition file in which the
> LOCATIONs of error messages that need to dump a backtrace are
> listed? That list would usually be empty and should be very short
> if not. The LOCATION information is easily obtained from a verbose
> error message itself once it has been shown, but is a bit hard to
> find otherwise..
>

That's again veering into selective logging control territory. Rather than
doing it for stack dump control only, it should be part of a broader
control over dynamic and scoped verbosity, selective logging, and log
options, like Pavan raised. I see stacks as just one knob that can be
turned on/off here.

> (That reminds me, I need to chat with Devrim about creating a longer lived
> > debuginfo + old versions rpms repo for Pg its self, if not the accessory
> > bits and pieces. I'm so constantly frustrated by not being able to get
> > needed debuginfo packages to investigate some core or running system
> > problem because they've been purged from the PGDG yum repo as soon as a
> new
> > version comes out.)
>
> We in our department take care to preserve them for ourselves out of
> the necessity of supporting older systems. I sometimes feel that
> it would be very helpful if they were available on the official


Maybe if I can get some interest in that, you might be willing to
contribute your archives as a starter so we have them for back-versions?

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: libpq compression

2018-06-25 Thread Konstantin Knizhnik



On 18.06.2018 23:34, Robbie Harwood wrote:


###

Documentation!  You're going to need it.  There needs to be enough
around for other people to implement the protocol (or if you prefer,
enough for us to debug the protocol as it exists).

In conjunction with that, please add information on how to set up
compressed vs. uncompressed connections - similarly to how we have
documentation on setting up TLS connections (though presumably compressed
connection documentation will be shorter).



I have documented the protocol changes needed for libpq compression.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/configure b/configure
index 0aafd9c..fc5685c 100755
--- a/configure
+++ b/configure
@@ -699,6 +699,7 @@ ELF_SYS
 EGREP
 GREP
 with_zlib
+with_zstd
 with_system_tzdata
 with_libxslt
 with_libxml
@@ -863,6 +864,7 @@ with_libxml
 with_libxslt
 with_system_tzdata
 with_zlib
+with_zstd
 with_gnu_ld
 enable_largefile
 enable_float4_byval
@@ -8017,6 +8019,86 @@ fi
 
 
 #
+# ZStd
+#
+
+
+
+# Check whether --with-zstd was given.
+if test "${with_zstd+set}" = set; then :
+  withval=$with_zstd;
+  case $withval in
+yes)
+  ;;
+no)
+  :
+  ;;
+*)
+  as_fn_error $? "no argument expected for --with-zstd option" "$LINENO" 5
+  ;;
+  esac
+
+else
+  with_zstd=no
+
+fi
+
+
+
+
+if test "$with_zstd" = yes ; then
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking for ZSTD_compress in -lzstd" >&5
+$as_echo_n "checking for ZSTD_compress in -lzstd... " >&6; }
+if ${ac_cv_lib_zstd_ZSTD_compress+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  ac_check_lib_save_LIBS=$LIBS
+LIBS="-lzstd  $LIBS"
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+/* Override any GCC internal prototype to avoid an error.
+   Use char because int might match the return type of a GCC
+   builtin and then its argument prototype would still apply.  */
+#ifdef __cplusplus
+extern "C"
+#endif
+char ZSTD_compress ();
+int
+main ()
+{
+return ZSTD_compress ();
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_link "$LINENO"; then :
+  ac_cv_lib_zstd_ZSTD_compress=yes
+else
+  ac_cv_lib_zstd_ZSTD_compress=no
+fi
+rm -f core conftest.err conftest.$ac_objext \
+conftest$ac_exeext conftest.$ac_ext
+LIBS=$ac_check_lib_save_LIBS
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_zstd_ZSTD_compress" >&5
+$as_echo "$ac_cv_lib_zstd_ZSTD_compress" >&6; }
+if test "x$ac_cv_lib_zstd_ZSTD_compress" = xyes; then :
+  cat >>confdefs.h <<_ACEOF
+#define HAVE_LIBZSTD 1
+_ACEOF
+
+  LIBS="-lzstd $LIBS"
+
+else
+  as_fn_error $? "library 'zstd' is required for ZSTD support" "$LINENO" 5
+fi
+
+fi
+
+
+
+#
 # Elf
 #
 
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 498b8df..40c3187 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -1115,6 +1115,17 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
   
  
 
+ 
+  compression
+  
+  
+Request compression of libpq traffic. If the server supports compression, then all libpq messages sent both from client to server and
+vice versa will be compressed. Right now the compression algorithm is hardcoded: it is either zlib (default) or zstd (if Postgres was
+configured with the --with-zstd option). In both cases streaming mode is used.
+  
+  
+ 
+
  
   client_encoding
   
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index cfc805f..c01a51d 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -92,6 +92,15 @@
such as COPY.
   
 
+  
+It is possible to compress protocol data to reduce traffic and speed up client-server interaction.
+Compression is especially useful for importing/exporting data to/from the database using the COPY command
+and for replication (both physical and logical). Compression can also reduce server response time
+for queries requesting larger amounts of data (for example returning JSON, BLOBs, text, ...).
+Right now the compression algorithm is hardcoded: it is either zlib (default) or zstd (if Postgres was
+configured with the --with-zstd option). In both cases streaming mode is used.
+  
+
  
   Messaging Overview
 
@@ -263,6 +272,18 @@
  
 
  
+  CompressionOk
+  
+   
+ The server acknowledges the use of compression for the client-server communication protocol.
+ Compression can be requested by the client by including the "compression" option in the connection string.
+ Right now the compression algorithm is hardcoded, but in the future the client and server may negotiate to
+ choose a proper compression algorithm.
+   
+  
+ 
+
+ 
   AuthenticationOk
   

@@ -3410,6 +3431,33 @@ AuthenticationSASLFinal (B)
 
 
 
+
+
+CompressionOk (B)
+
+
+
+
+
+
+
+Byte1('z')
+
+
+
+  Acknowledges use of compression for protocol data. After receiving this message both the server 

Re: Concurrency bug in UPDATE of partition-key

2018-06-25 Thread Amit Khandekar
On 23 June 2018 at 15:46, Amit Kapila  wrote:
> On Fri, Jun 22, 2018 at 10:25 PM, Amit Khandekar  
> wrote:
>> On 20 June 2018 at 18:54, Amit Kapila  wrote:
>>> On Tue, Jun 19, 2018 at 9:33 PM, Amit Khandekar  
>>> wrote:
>>>
>>> 2.
>>> ExecBRDeleteTriggers(..)
>>> {
>>> ..
>>> + /* If requested, pass back the concurrently updated tuple, if any. */
>>> + if (epqslot != NULL)
>>> + *epqslot = newSlot;
>>> +
>>>   if (trigtuple == NULL)
>>>   return false;
>>> +
>>> + /*
>>> + * If the tuple was concurrently updated and the caller of this
>>> + * function requested for the updated tuple, skip the trigger
>>> + * execution.
>>> + */
>>> + if (newSlot != NULL && epqslot != NULL)
>>> + return false;
>>> ..
>>> }
>>>
>>> Can't we merge the first change into second, something like:
>>>
>>> if (newSlot != NULL && epqslot != NULL)
>>> {
>>> *epqslot = newSlot;
>>> return false;
>>> }
>>
>> We want to update *epqslot with whatever the value of newSlot is. So
>> if newSlot is NULL, epqslot should be NULL. So we can't do that in the
>> "if (newSlot != NULL && epqslot != NULL)" condition.
>>
>
> Why do you need to update when newslot is NULL?  Already *epqslot is
> initialized as NULL in the caller (ExecUpdate). I think we only want
> to update it when trigtuple is not null.

But GetTupleForTrigger() updates the epqslot to NULL even when it
returns NULL. So to be consistent with it, we want to do the same
thing for ExecBRDeleteTriggers() : Update the epqslot even when
GetTupleForTrigger() returns NULL.

I think the reason we are doing "*newSlot = NULL" as the very first
thing in the GetTupleForTrigger() code, is so that callers don't have
to initialise newSlot to NULL before calling GetTupleForTrigger. And
so ExecBRUpdateTriggers() does not initialize newSlot with NULL. We
can follow the same approach while calling ExecDelete().

I understand that before calling ExecDelete() epqslot is initialized
to NULL, so it is again not required to do the same inside
GetTupleForTrigger(), but as per my above explanation, it is actually
not necessary to initialize to NULL before calling ExecDelete(). So I
can even remove that initialization.

> Apart from looking bit
> awkward, I think it is error-prone as well because the code of
> GetTupleForTrigger is such that it can return NULL with a valid value
> of newSlot in which case the behavior will become undefined. The case
> which I am worried is when first-time EvalPlanQual returns some valid
> value of epqslot, but in the next execution after heap_lock_tuple,
> returns NULL.  In practise, it won't happen because EvalPlanQual locks
> the tuple, so after that heap_lock_tuple shouldn't fail again, but it
> seems prudent to not rely on that unless we need to.

Yes, I had given thought to exactly this. I think the intention of
the code is to pass back NULL epqslot when there is no concurrently
updated tuple. I think the code in GetTupleForTrigger() is well aware
that EvalPlanQual() will never be called twice. The only reason it
goes back to ltrmark is to re-fetch the locked tuple. The comment also
says so :
/*
* EvalPlanQual already locked the tuple, but we
* re-call heap_lock_tuple anyway as an easy way of
* re-fetching the correct tuple.  Speed is hardly a
* criterion in this path anyhow.
*/

So newSlot is supposed to be updated only once.



>
> For now, I have removed it in the attached patch, but if you have any
> valid reason, then we can add back.
>
>>>
>>> 4.
>>> +/* --
>>> + * ExecBRDeleteTriggers()
>>> + *
>>> + * Called to execute BEFORE ROW DELETE triggers.
>>> + *
>>> + * Returns false in following scenarios :
>>> + * 1. Trigger function returned NULL.
>>> + * 2. The tuple was concurrently deleted OR it was concurrently updated 
>>> and the
>>> + * new tuple didn't pass EvalPlanQual() test.
>>> + * 3. The tuple was concurrently updated and the new tuple passed the
>>> + * EvalPlanQual() test, BUT epqslot parameter is not NULL. In such case, 
>>> this
>>> + * function skips the trigger function execution, because the caller has
>>> + * indicated that it wants to further process the updated tuple. The 
>>> updated
>>> + * tuple slot is passed back through epqslot.
>>> + *
>>> + * In all other cases, returns true.
>>> + * --
>>> + */
>>>  bool
>>>  ExecBRDeleteTriggers(EState *estate, EPQState *epqstate,
>>>
> ..
>>
>> If it looks complicated, can you please suggest something to make it a
>> bit simpler.
>>
>
> See attached.
>
> Apart from this, I have changed few comments and fixed indentation
> issues.  Let me know what you think about attached?

Yes, the changes look good, except for the above one point on which we
haven't concluded yet.

-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company



Re: bug with expression index on partition

2018-06-25 Thread Amit Langote
On 2018/06/23 6:51, Alvaro Herrera wrote:
> On 2018-Jun-21, Amit Langote wrote:
> 
>> explain (costs off) select p from p order by p;
>>   QUERY PLAN
>> ---
>>  Merge Append
>>Sort Key: ((p1.*)::p)
>>->  Index Scan using p1_p_idx on p1
>>->  Index Scan using p2_p_idx on p2
>>->  Index Scan using p3_p_idx on p3
>> (5 rows)
> 
> Nice, but try adding a row > operator in the where clause.
> 
> I think it's clearly desirable to allow this row-based search to use indexes;
> as I recall, we mostly enable pagination of results via this kind of
> constructs.  However, we're lacking planner or executor features apparently,
> because a query using a row > operator does not use indexes:
> 
> create table partp (a int, b int) partition by range (a);
> create table partp1 partition of partp for values from (0) to (35);
> create table partp2 partition of partp for values from (35) to (100);
> create index on partp1 ((partp1.*));
> create index on partp2 ((partp2.*));
> explain select * from partp where partp > row(0,0) order by partp limit 25 ;
> QUERY PLAN
> ──
>  Limit  (cost=6.69..6.75 rows=25 width=40)
>->  Sort  (cost=6.69..6.86 rows=66 width=40)
>  Sort Key: ((partp1.*)::partp)
>  ->  Append  (cost=0.00..4.83 rows=66 width=40)
>->  Seq Scan on partp1  (cost=0.00..1.88 rows=23 width=40)
>  Filter: ((partp1.*)::partp > '(0,0)'::record)
>->  Seq Scan on partp2  (cost=0.00..2.62 rows=43 width=40)
>  Filter: ((partp2.*)::partp > '(0,0)'::record)
> (8 filas)
> 
> Note the indexes are ignored, as opposed to what it does in a non-partitioned
> table:

Ah, yes.

IIUC, that happens because whole-row Vars in WHERE quals and
EquivalenceClass expressions corresponding to child relations each have a
ConvertRowtypeExpr on top, whereas a child index's expressions read off
pg_index don't contain ConvertRowtypeExpr expressions.  So, WHERE quals
and/or ORDER BY expressions containing references to the parent's
whole-row Vars cannot be matched to a child index containing the same
whole-row Vars.

It's a bit unfortunate that the WHERE quals and EC expressions are
transformed such that they contain ConvertRowtypeExpr nodes at a point
where they're perhaps not necessary (such as the point when a WHERE clause
or EC expression is matched with an index expression).  A related
discussion is underway on a nearby thread titled "Expression errors with
"FOR UPDATE" and postgres_fdw with partition wise join enabled", so it'd
be nice if that thread concludes such that whole-row child indexes start
becoming useful.

Thanks,
Amit




Re: Concurrency bug in UPDATE of partition-key

2018-06-25 Thread Amit Kapila
On Mon, Jun 25, 2018 at 3:06 PM, Amit Khandekar  wrote:
> On 23 June 2018 at 15:46, Amit Kapila  wrote:
>>>
>>
>> Why do you need to update when newslot is NULL?  Already *epqslot is
>> initialized as NULL in the caller (ExecUpdate). I think we only want
>> to update it when trigtuple is not null.
>
> But GetTupleForTrigger() updates the epqslot to NULL even when it
> returns NULL. So to be consistent with it, we want to do the same
> thing for ExecBRDeleteTriggers() : Update the epqslot even when
> GetTupleForTrigger() returns NULL.
>
> I think the reason we are doing "*newSlot = NULL" as the very first
> thing in the GetTupleForTrigger() code, is so that callers don't have
> to initialise newSlot to NULL before calling GetTupleForTrigger. And
> so ExecBRUpdateTriggers() does not initialize newSlot with NULL. We
> can follow the same approach while calling ExecDelete().
>

Yeah, we can do that if it is required.  I see your point, but I feel
that it makes the code a bit less readable.

> I understand that before calling ExecDelete() epqslot is initialized
> to NULL, so it is again not required to do the same inside
> GetTupleForTrigger(), but as per my above explanation, it is actually
> not necessary to initialize to NULL before calling ExecDelete(). So I
> can even remove that initialization.
>

If you remove that initialization, then won't you need to do something
in ExecDelete() as well because there the patch assigns a value to
epqslot only if EvalPlanQual returns non-null value?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Loaded footgun open_datasync on Windows

2018-06-25 Thread Kuntal Ghosh
On Wed, Jun 20, 2018 at 7:36 PM, Laurenz Albe  wrote:
> Kuntal Ghosh wrote:
> [pg_test_fsync doesn't use pgwin32_open and hence doesn't respect O_DSYNC]
>> On Wed, Jun 6, 2018 at 2:39 PM, Amit Kapila  wrote:
>> > On Wed, Jun 6, 2018 at 10:18 AM, Michael Paquier  
>> > wrote:
>> > > On Wed, Jun 06, 2018 at 09:58:34AM +0530, Amit Kapila wrote:
>> > > > One another alternative could be that we
>> > > > define open as pgwin32_open (for WIN32) wherever we need it.
>> > >
>> > > Which is what basically happens on any *nix platform, are you foreseeing
>> > > anything bad here?
>> >
>> > Nothing apparent, but I think we should try to find out why in the first
>> > place this has been made backend specific.
>>
>> It seems the "#ifndef FRONTEND" restriction was added around
>> pgwin32_open() for building libpq with Visual C++ or Borland C++.  The
>> restriction was added in commit 422d4819 to build libpq with VC++[1].
>> Later, in commit fd7c3f67e0bc4, the support for Borland C++ was also
>> added.
>>
>> So, I'm not sure whether removing that restriction will work for all
>> front-end modules.
>
> Thanks for the research, and sorry for my long silence.
>
> If I remove the "#ifndef FRONTEND", building with MSVC fails for all
> call sites that use the two-argument version of open(2).
>
> If I use the three-argument version throughout, which should be no problem,
> PostgreSQL builds just fine with MSVC.
>
I don't have enough experience on MSVC/Windows to comment on the same.

> How about checking what the buildfarm thinks about the attached?
+1
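
For illustration, a rough sketch of what the discussed change implies (my own
sketch under assumptions, not the attached patch): once open() is mapped to
pgwin32_open() for frontend code as well, every call site needs the
three-argument form, which is why the two-argument calls stop building.

#include <fcntl.h>

#ifdef WIN32
extern int	pgwin32_open(const char *, int, ...);
#define open(a, b, c)	pgwin32_open((a), (b), (c))
#endif

static int
open_config_file(const char *path)
{
	/*
	 * open(path, O_RDONLY) would no longer compile under MSVC with the
	 * three-argument macro above; the third argument must be supplied.
	 */
	return open(path, O_RDONLY, 0);
}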


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com



Auto-partitioning in PostgreSQL 10

2018-06-25 Thread zafiirah jumeen
Hello,

I was trying to do auto-partitioning in PostgreSQL 10.
First of all, I created 2 tables, t_dossier_bac_history_insert_table and
t_dossier_bac_history_sensor_collections.
Then I created a trigger which executes a function (which creates my
partitions) before inserting data into
t_dossier_bac_history_insert_table.
But I am getting the following error when I try to insert data from an
existing table into t_dossier_bac_history_insert_table:

ERROR: query string argument of EXECUTE is null CONTEXT: PL/pgSQL function 
sensor_partition() line 27 at EXECUTE SQL state: 22004

Could you please help me in resolving this issue.
Please see below for part of my codes.
Thank you in advance.

---
CREATE TABLE t_dossier_bac_history_insert_table
(
id_dossier_bac_history character(32) COLLATE pg_catalog."default" NOT NULL,
boo_supprime boolean,
dat_create timestamp without time zone DEFAULT timezone('utc'::text, now()),
dat_supprime timestamp without time zone,
dat_update timestamp without time zone,
num_version bigint NOT NULL,
dat_date_entree timestamp without time zone,
dat_date_sortie timestamp without time zone,
id_bac character(32) COLLATE pg_catalog."default" NOT NULL,
id_dossier character(32) COLLATE pg_catalog."default" NOT NULL,
boo_en_migration boolean DEFAULT false

)
WITH (
OIDS = FALSE
)
TABLESPACE pg_default;


CREATE TABLE t_dossier_bac_history_sensor_collections
(
id_dossier_bac_history character(32) COLLATE pg_catalog."default" NOT NULL,
boo_supprime boolean,
dat_create timestamp without time zone DEFAULT timezone('utc'::text, now()),
dat_supprime timestamp without time zone,
dat_update timestamp without time zone,
num_version bigint NOT NULL,
dat_date_entree timestamp without time zone,
dat_date_sortie timestamp without time zone,
id_bac character(32) COLLATE pg_catalog."default" NOT NULL,
id_dossier character(32) COLLATE pg_catalog."default" NOT NULL,
boo_en_migration boolean DEFAULT false

)
PARTITION BY LIST (id_bac)
WITH (
OIDS = FALSE
)
TABLESPACE pg_default;


CREATE TRIGGER insert_to_t_dossier_bac_history_sensor_collections
BEFORE INSERT
ON t_dossier_bac_history_insert_table
FOR EACH ROW
EXECUTE PROCEDURE sensor_partition();



CREATE OR REPLACE FUNCTION sensor_partition()
RETURNS TRIGGER AS $$
DECLARE
sensor_table TEXT;
new_table TEXT;
new_insert TEXT;
BEGIN
sensor_table='id_bac_'||NEW.id_bac;

   IF NOT EXISTS (SELECT relname FROM pg_class  
--CHECK IF TABLE 'sensor_table' exists. If not, create the table.
   WHERE relname=sensor_table) THEN 
   RAISE NOTICE 'Creating Partition:%', sensor_table;
   
   new_table := 'CREATE TABLE '|| sensor_table  
--Table does not exists, create table/partition
 || ' PARTITION OF 
t_dossier_bac_history_sensor_collections' || ' (id_dossier_bac_history, 
dat_create, dat_supprime, dat_update, num_version, dat_date_entree, 
dat_date_sortie, id_bac, id_dossier, boo_en_migration)'
 || ' FOR VALUES IN ( 
'''|| NEW.id_bac ||''' ) ;';
 


 
   EXECUTE new_table;
   ELSE
   new_table:= "The table exist already";   
  --Table already exists, do not create table
   END IF;
   
  
   new_insert := 'INSERT INTO 
t_dossier_bac_history_sensor_collections VALUES('''
|| 
NEW.id_dossier_bac_history ||''', '
|| NEW.boo_supprime 
||', '''
|| NEW.dat_create 
||''','' '
|| NEW.dat_supprime 
||''','' '
|| NEW.dat_update 
||''', '
|| NEW.num_version 
||','' '
|| NEW.dat_date_entree 
||''','' '
|| NEW.dat_date_sortie 
||''','' '
|| NEW.id_bac ||''','' '
|| NEW.id_dossier 
||''', '
   

Re: Incorrect fsync handling in pg_basebackup's tar_finish

2018-06-25 Thread Kuntal Ghosh
On Mon, Jun 25, 2018 at 2:27 PM, Magnus Hagander  wrote:
>
>
> On Mon, Jun 25, 2018 at 4:43 AM, Michael Paquier 
> wrote:
>>
>> Hi all,
>>
>> I was just looking at the code of pg_basebackup, and noticed that we
>> don't actually check if the two last empty blocks of any tar file
>> produced are correctly fsync'd or not:
>> @@ -957,7 +957,10 @@ tar_finish(void)
>>
>>  /* sync the empty blocks as well, since they're after the last file */
>>  if (tar_data->sync)
>> -   fsync(tar_data->fd);
>> +   {
>> +   if (fsync(tar_data->fd) != 0)
>> +   return false;
>> +   }
>>
>> That looks incorrect to me, hence shouldn't something like the attached
>> be done?  Magnus and others, any opinions?
On the same note, in tar_close(), we fsync on close. We're not
checking the status of fsync there. Should we introduce the same check
there as well?


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com



Re: PATCH: backtraces for error messages

2018-06-25 Thread Korry Douglas
On Sat, Jun 23, 2018 at 8:22 AM, Craig Ringer  wrote:

> On 22 June 2018 at 23:26, Korry Douglas 
> wrote:
>
>> A few misc comments:
>>
>> In general, +1.
>>
>> It might be nice to move the stack trace code into a separate function
>> (not static to elog.c) - I have often found it useful to attach backtraces
>> to objects so I can debug complex allocation/deallocation problems.
>>
>
> Good suggestion.
>
>
>>
>> The major expense in capturing a stack trace is in the symbolization of
>> the stack addresses, not the stack walk itself.  You typically want to
>> symbolize the addresses at the time you generate the trace, but that's not
>> always the case.  For example, if you want to record the stack trace of
>> every call to AllocSetAlloc() (and attach the trace to the AllocChunk)
>> performance gets unbearable if you symbolize every trace.  So a flag that
>> specifies whether to symbolize would be nice too.
>>
>
> libunwind has some nifty caching that makes that a _lot_ cheaper; that's
> part of why I went with it despite the extra lib requirement.
>

I've not used libunwind before - looking through the docs now.  It does
seem much more flexible than backtrace(), thanks.
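
For what it's worth, here's a minimal local-unwind sketch with libunwind (my
own illustration assuming UNW_LOCAL_ONLY, not code from the patch):

#define UNW_LOCAL_ONLY
#include <libunwind.h>
#include <stdio.h>

/* Print a symbolized backtrace of the calling thread to stderr. */
static void
print_backtrace(void)
{
	unw_cursor_t cursor;
	unw_context_t context;
	unw_word_t	offset;
	char		symbol[256];

	unw_getcontext(&context);
	unw_init_local(&cursor, &context);

	while (unw_step(&cursor) > 0)
	{
		if (unw_get_proc_name(&cursor, symbol, sizeof(symbol), &offset) == 0)
			fprintf(stderr, "  %s+0x%lx\n", symbol, (unsigned long) offset);
		else
			fprintf(stderr, "  [unknown frame]\n");
	}
}

The symbolization done by unw_get_proc_name() is the expensive part, so
skipping it (and printing only raw instruction pointers) is what makes a
"don't symbolize now" mode cheap.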


>
>
>> If you don't symbolize, you need a way to capture the address range of
>> each dynamically loaded shared object (so that you can later symbolize
>> using something like addr2line).
>>
>
> Yeah. libunwind doesn't expose any interface to get that information, so
> you'd have to use platform APIs, like glibc's dl_iterate_phdr or dladdr, or
> capture /proc/self/maps. You have to make sure that info makes it to the
> log, and is re-output often enough not to be lost to log rotation, and is
> invalidated and re-output if mappings change due to new lib loads etc. But
> you don't want to print it all the time either.
>
> Then you have to walk the stack and print the instruction pointers and
> stack pointers and spit out raw traces for the user to reassemble.
>
> I'd argue that if you're doing the sort of thing where you want a stack of
> every AllocSetAlloc, you shouldn't be trying to do that in-process. That's
> where tools like perf, systemtap, etc really come into their own. I'm
> focused on making additional diagnostic info for errors and key log
> messages collectable for systems that aren't easily accessed, like users
> who have 12-hour read/response latencies and security setups as strict as
> they are insane and nonsensical.
>

Understood - I realized after I replied that instrumenting AllocSetAlloc()
is a bit more complex than I had suggested.  When I need to capture the
call stack of each allocation, I store the stack trace in malloc'ed
buffers, otherwise it gets recursive (in AllocSetAlloc()).  I suspect that's
unreasonable for a general-purpose solution.

I'd have no objection to the option to disable symbolic back traces and
> print the raw call stack. It's trivial to do; in fact, I only removed the
> ip/sp info to keep the output more concise.
>
> I'm not interested in writing anything to handle the library load address
> mapping collection etc though. I don't see a really sensible way to do that
> in a log-based system.
>

Agreed - I tend to use (saved, not logged) stack traces to find memory
leaks that valgrind can't see (or more precisely, excessive resource
consumption that is not actually leaked).

Thanks for your consideration.

 -- Korry


Re: Auto-partitioning in PostgreSQL 10

2018-06-25 Thread Mark Dilger


> On Jun 25, 2018, at 3:00 AM, zafiirah jumeen  wrote:
> 
> Hello,
>  
> I was trying to do auto partitioning in PostgreSQL 10.
> First of all, I created 2 tables t_dossier_bac_history_insert_table and 
> t_dossier_bac_history_sensor_collections.
> And then, I created a trigger which would execute a function (which would 
> create my partitions) before inputting data in 
> t_dossier_bac_history_insert_table.
> But, I am having an error which is as follows when I try to insert data from 
> an existing table to t_dossier_bac_history_insert_table:
>  
> ERROR: query string argument of EXECUTE is null CONTEXT: PL/pgSQL function 
> sensor_partition() line 27 at EXECUTE SQL state: 22004
>  
> Could you please help me in resolving this issue.


The variable 'new_insert' will be null when any of the new columns
are null, because concatenation of null with anything else renders
a null result.  You could look at the documentation for the function
named COALESCE() and see how to use that to prevent the 'new_insert'
variable from becoming null.
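
As an illustration only (not from the original function), the same effect can
be had by building the string with format() and %L, which quotes each value
and renders SQL NULLs as the literal NULL, so the query string itself never
becomes null:

-- sketch of the insert built with format() instead of plain concatenation
new_insert := format('INSERT INTO t_dossier_bac_history_sensor_collections '
    'VALUES (%L, %L, %L, %L, %L, %L, %L, %L, %L, %L, %L)',
    NEW.id_dossier_bac_history, NEW.boo_supprime, NEW.dat_create,
    NEW.dat_supprime, NEW.dat_update, NEW.num_version,
    NEW.dat_date_entree, NEW.dat_date_sortie, NEW.id_bac,
    NEW.id_dossier, NEW.boo_en_migration);
EXECUTE new_insert;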

There may be broader problems in the general design of your solution,
too.

I think you should post questions of this sort to pgsql-general
rather than here.

> Please see below for part of my codes.
> Thank you in advance.
>  
> ---
> CREATE TABLE t_dossier_bac_history_insert_table
> (
> id_dossier_bac_history character(32) COLLATE pg_catalog."default" NOT 
> NULL,
> boo_supprime boolean,
> dat_create timestamp without time zone DEFAULT timezone('utc'::text, 
> now()),
> dat_supprime timestamp without time zone,
> dat_update timestamp without time zone,
> num_version bigint NOT NULL,
> dat_date_entree timestamp without time zone,
> dat_date_sortie timestamp without time zone,
> id_bac character(32) COLLATE pg_catalog."default" NOT NULL,
> id_dossier character(32) COLLATE pg_catalog."default" NOT NULL,
> boo_en_migration boolean DEFAULT false
>  
> )
> WITH (
> OIDS = FALSE
> )
> TABLESPACE pg_default;
>  
>  
> CREATE TABLE t_dossier_bac_history_sensor_collections
> (
> id_dossier_bac_history character(32) COLLATE pg_catalog."default" NOT 
> NULL,
> boo_supprime boolean,
> dat_create timestamp without time zone DEFAULT timezone('utc'::text, 
> now()),
> dat_supprime timestamp without time zone,
> dat_update timestamp without time zone,
> num_version bigint NOT NULL,
> dat_date_entree timestamp without time zone,
> dat_date_sortie timestamp without time zone,
> id_bac character(32) COLLATE pg_catalog."default" NOT NULL,
> id_dossier character(32) COLLATE pg_catalog."default" NOT NULL,
> boo_en_migration boolean DEFAULT false
>  
> )
> PARTITION BY LIST (id_bac)
> WITH (
> OIDS = FALSE
> )
> TABLESPACE pg_default;
>  
>  
> CREATE TRIGGER insert_to_t_dossier_bac_history_sensor_collections
> BEFORE INSERT
> ON t_dossier_bac_history_insert_table
> FOR EACH ROW
> EXECUTE PROCEDURE sensor_partition();
>  
>  
>  
> CREATE OR REPLACE FUNCTION sensor_partition()
> RETURNS TRIGGER AS $$
> DECLARE
> sensor_table TEXT;
> new_table TEXT;
> new_insert TEXT;
> BEGIN
> sensor_table='id_bac_'||NEW.id_bac;
>  
>IF NOT EXISTS (SELECT relname FROM pg_class
>   --CHECK IF TABLE 'sensor_table' exists. If not, create the table.
>WHERE 
> relname=sensor_table) THEN 
>RAISE NOTICE 'Creating 
> Partition:%', sensor_table;
>
>new_table := 'CREATE TABLE 
> '|| sensor_table  --Table does not exists, 
> create table/partition
>  || ' 
> PARTITION OF t_dossier_bac_history_sensor_collections' || ' 
> (id_dossier_bac_history, dat_create, dat_supprime, dat_update, num_version, 
> dat_date_entree, dat_date_sortie, id_bac, id_dossier, boo_en_migration)'
>   
>   || ' FOR VALUES IN ( '''|| NEW.id_bac ||''' 
> ) ;';
>   
>   
>   
>   
> 
>   
>
>EXECUTE new_table;
>ELSE
>new_table:= "The t

Re: Incorrect fsync handling in pg_basebackup's tar_finish

2018-06-25 Thread Michael Paquier
On Mon, Jun 25, 2018 at 05:48:54PM +0530, Kuntal Ghosh wrote:
> In the same note, in tar_close(), we fsync on close. We're not
> checking the status of fsync there. Should we introduce the same check
> there as well?

Yes, there is a second one.  I just looked at walmethods.c and I did not
spot any other issues.  What do you think about the updated version
attached?
--
Michael
diff --git a/src/bin/pg_basebackup/walmethods.c b/src/bin/pg_basebackup/walmethods.c
index 331d0e7275..fbfee05a5a 100644
--- a/src/bin/pg_basebackup/walmethods.c
+++ b/src/bin/pg_basebackup/walmethods.c
@@ -865,7 +865,8 @@ tar_close(Walfile f, WalCloseMethod method)
 		return -1;
 
 	/* Always fsync on close, so the padding gets fsynced */
-	tar_sync(f);
+	if (tar_sync(f) < 0)
+		return -1;
 
 	/* Clean up and done */
 	pg_free(tf->pathname);
@@ -896,7 +897,7 @@ tar_finish(void)
 			return false;
 	}
 
-	/* A tarfile always ends with two empty  blocks */
+	/* A tarfile always ends with two empty blocks */
 	MemSet(zerobuf, 0, sizeof(zerobuf));
 	if (!tar_data->compression)
 	{
@@ -957,7 +958,10 @@ tar_finish(void)
 
 	/* sync the empty blocks as well, since they're after the last file */
 	if (tar_data->sync)
-		fsync(tar_data->fd);
+	{
+		if (fsync(tar_data->fd) != 0)
+			return false;
+	}
 
 	if (close(tar_data->fd) != 0)
 		return false;


signature.asc
Description: PGP signature


Regarding the correct and best way to fetching a tablename in contrib module

2018-06-25 Thread Gaurav Mishra
Hi folks ,

I have developed a new data type in PostgreSQL. The handler associated with
this data type is written in C. This handler encrypts the column
data. I need help figuring out how to get the table name when my
handler code is called during CRUD and ALTER TABLE operations.

Let me elaborate a bit on what I am going to do here. The thing
is, I am developing a contrib module and want to capture the table name
inside that contrib module.

question:

1. Is there any way to capture a table name during CRUD and ALTER TABLE?
I have looked at some of the triggers but was not able to make it work (I
don't think any CREATE TABLE trigger exists). If it is possible, please
tell me a way to achieve it.

2. I thought of extracting metadata using pg_class, but that does not seem
to help because I have to explicitly give a relname (table name) in the
WHERE clause. Do you think there is any other way to achieve it? Please
elaborate if there is and let me know.

3. Are there any callback functions available that I can call inside my
code, and if yes, at what level do we get the table name attached to the
column name?

Here is some example which will make you understand a bit about the things
i am gonna do .

create table new_table(name varchar , new integer) ;

insert into new_table values('abcdefghijkl' , 5004);

create table new_table1(name1 varchar , new1 integer) ;

insert into new_table1 values('mnopqrst' , 5005);

So my extension should extract a table name, meaning I should know the
table name along with the built-in datatype I have declared.

Here is what I expect as output:

create extension table_name-extract;

select extract_tablename();

table-name  datatype-name
new_table  name new
new_table1 name1 new1

extract_tablename: this function should give me a table name with the
data type inside the contrib module so that I can use it in the extension.

Any help would be much appreciated.

Thanks and Regards,
Gaurav Mishra
9986425323


Re: comma to delimit fractional seconds

2018-06-25 Thread Chris Howard





p.s.  I've noticed that the error msg for badly formed time strings always
says the same thing "at character 13" no matter how long the time string is.




nevermind, user error





Re: Incorrect fsync handling in pg_basebackup's tar_finish

2018-06-25 Thread Kuntal Ghosh
On Mon, Jun 25, 2018 at 6:47 PM, Michael Paquier  wrote:
> Yes, there is a second one.  I just looked at walmethods.c and I did not
> spot any other issues.  What do you think about the updated version
> attached?
> --
I've also verified the same. The patch looks good to me.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com



Regarding the correct and best way to fetching a tablename in contrib module

2018-06-25 Thread Gaurav Mishra
 Hi ,

I have developed a new data type in PostgreSQL. The handler associated with
this data type is written in C. This handler encrypts the column
data. I need help figuring out how to get the table name when my
handler code is called during CRUD and ALTER TABLE operations.

Let me elaborate a bit on what I am going to do here. The thing
is, I am developing a contrib module and want to capture the table name
inside that contrib module.

question:

1. Is there any way to capture a table name during CRUD and ALTER TABLE?
I have looked at some of the triggers but was not able to make it work (I
don't think any CREATE TABLE trigger exists). If it is possible, please
tell me a way to achieve it.

2. I thought of extracting metadata using pg_class, but that does not seem
to help because I have to explicitly give a relname (table name) in the
WHERE clause. Do you think there is any other way to achieve it? Please
elaborate if there is and let me know.

3. Are there any callback functions available that I can call inside my
code, and if yes, at what level do we get the table name attached to the
new datatype?

Here is some example which will make you understand a bit about the things
i am gonna do .

create table new_table(name varchar , newdata newdatatype) ;

insert into new_table values('abcdefghijkl' , '5004');

create table new_table1(name1 varchar ,   newdata newdatatype ) ;

insert into new_table1 values('mnopqrst' , 'abcde');

So my extension should extract a table name, meaning I should know the
table name along with the built-in datatype I have declared.

Here is what I expect as output:

create extension table_name-extract;

select extract_tablename();

table-name  datatype-name

new_table  newdatatype
new_table1 newdatatype

extract_tablename: this function should give me a table name with the new
data type inside the contrib module so that I can use it in the extension.

Any help would be much appreciated.

Thanks and Regards,
Gaurav Mishra


Re: Constraint documentation

2018-06-25 Thread Lætitia Avrot
Thanks!
I'll correct the patch ASAP including your modifications.

On Sat, 23 Jun 2018 at 19:15, Fabien COELHO wrote:

>
> Hello lætitia,
>
> My 0.02 € to try to simplify the suggested documentation.
>
> > Check constraint were not
>
> are not
>
> > designed to enforce business rules across tables.
>
> > Avoid using check constraints with function accessing to other tables
>
> accessing other tables (no "to")
>
> > and prefer triggers instead (please refer to 
> > for more information about triggers).
>
> ... and use  instead.
>
> > PostgreSQL won't prevent you from doing so,
>
> Although PostgreSQL ... so,
>
> > but be aware you might encounter difficulties to restore dumps
> > (generated with pg_dump or pg_dumpall) if you do.
>
> beware that dumps generated by pg_*<...> or <...> may be hard
> to restore because of such checks, as the underlying dependencies are not
> taken into account.
>
> --
> Fabien.



-- 
*Think! Do you really need to print this email ? *
*There is no Planet B.*
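
For illustration (my own example, not part of the proposed wording or patch),
the kind of cross-table check the paragraph warns about, and why a dump can
be hard to restore:

-- a CHECK constraint whose function reads another table
CREATE TABLE currencies (code text PRIMARY KEY);

CREATE FUNCTION currency_exists(c text) RETURNS boolean
LANGUAGE sql STABLE
AS $$ SELECT EXISTS (SELECT 1 FROM currencies WHERE code = c) $$;

CREATE TABLE prices (
    amount   numeric,
    currency text CHECK (currency_exists(currency))  -- cross-table check
);

-- pg_dump does not record that the contents of "prices" depend on the
-- contents of "currencies", so restoring the dump can fail when the data
-- is loaded in an unfavorable order; a trigger (or a foreign key, where
-- one fits) is the supported way to express such a rule.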


Re: Removing obsolete comment block at the top of nbtsort.c.

2018-06-25 Thread Alvaro Herrera
On 2018-Jun-24, Peter Geoghegan wrote:

> nbtsort.c has a comment block from the Berkeley days that reads:
> 
>  * This code is moderately slow (~10% slower) compared to the regular
>  * btree (insertion) build code on sorted or well-clustered data. On
>  * random data, however, the insertion build code is unusable -- the
>  * difference on a 60MB heap is a factor of 15 because the random
>  * probes into the btree thrash the buffer pool. (NOTE: the above
>  * "10%" estimate is probably obsolete, since it refers to an old and
>  * not very good external sort implementation that used to exist in
>  * this module. tuplesort.c is almost certainly faster.)
> 
> I propose removing this whole comment block (patch attached),

Makes sense to me, +1.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Using JSONB directly from application

2018-06-25 Thread Andrew Dunstan




On 06/24/2018 12:42 PM, Tomas Vondra wrote:



Sending raw JSONB to Postgres might also be interesting, but I'd start
with receiving.

Would implementing raw_jsonb be as trivial as it sounds?  What about
usages like SELECT raw_jsonb(col3->'foo'); does the subobject returned
by '->' share structure with the containing object, making the
conversion to a self-contained JSONB value less direct?

Can these conversions be implemented without copying the bytes?


I don't think you need the function, actually. PostgreSQL protocol
supports both text and binary mode - in the text mode the server formats
everything as text before sending it to the client. I guess this is what
you mean by "convert to json".

But with the extended protocol you (or rather the connection library
you're using) can specify that the output should be handed in binary,
i.e. as exact copy of the data. This happens at "bind" phase, see the
"Bind" message docs here:

https://www.postgresql.org/docs/current/static/protocol-message-formats.html






jsonb_send just sends a version byte of 1 followed by the stringified
value. If you want real binary transmission I think you'll need a new
output version, by adjusting jsonb_send and jsonb_recv.
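
For illustration (a hedged sketch of mine, with hypothetical table and column
names): with the extended protocol you can ask libpq for the result in binary
format, and with the current jsonb_send that binary value is the one-byte
version header followed by the JSON text.

#include <libpq-fe.h>
#include <stdio.h>

void
fetch_jsonb_binary(PGconn *conn)
{
	/* last argument: resultFormat, 1 = binary */
	PGresult   *res = PQexecParams(conn,
								   "SELECT col3->'foo' FROM tab",
								   0, NULL, NULL, NULL, NULL, 1);

	if (PQresultStatus(res) == PGRES_TUPLES_OK && PQntuples(res) > 0)
	{
		char	   *val = PQgetvalue(res, 0, 0);
		int			len = PQgetlength(res, 0, 0);

		printf("version byte: %d, payload bytes: %d\n", val[0], len - 1);
	}
	PQclear(res);
}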


cheers

andrew

--
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Removing unneeded self joins

2018-06-25 Thread Alexander Kuzmenkov

David,

I tried to implement your suggestions, here are the patches.

The first patch mostly moves some code around without changing
functionality. It modifies innerrel_is_unique to not only cache the fact
that the relation is unique, but also cache the index that guarantees
uniqueness.

The second patch adds the unique self join removal code. It goes along 
the lines of your plan. I didn't even have to examine join clauses 
because the constraints imposed by innerrel_is_unique are strong enough 
to guarantee that when it finds the same unique index for both 
relations, the join can be removed. Namely, it requires mergejoinable 
equality clauses for all columns of a unique index.


As a simple benchmark, I measured the duration of query_planner and 
remove_useless_self_joins with clock_gettime() on the regression tests. 
The following table shows average times in microseconds, median over 11 
runs. First row is with this patch, and the second row doesn't call 
remove_useless_self_joins and just calls clock_gettime to measure its 
overhead.


  query_planner  remove_useless_self_joins

with removal   39.61    0.61

no removal     39.45    0.38


So, on this workload, unique self join removal adds about 0.2 mcs, or
0.6% of total time, to query_planner. I also tried a query that joins 26
relations, for which remove_useless_self_joins takes about 40 mcs. Still,
this time grows quadratically with the number of relations we have to
process, so in the final patch I limit it to join_collapse_limit, which
brings the time down to 15 mcs. This is negligible compared to the total
query_planner time, which for 8 relations is about 3 ms, that is, 3
orders of magnitude higher.


These benchmarks mostly measure the path where we don't remove any
joins. I didn't time the join removal itself, because it wins quite some
time by allowing us to plan one relation and one join less.


--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

>From ad5f5b59265762b3674068ee3656baec3ec66b70 Mon Sep 17 00:00:00 2001
From: Alexander Kuzmenkov 
Date: Wed, 13 Jun 2018 21:45:32 +0300
Subject: [PATCH 2/2] Remove unique self joins.

---
 src/backend/optimizer/plan/analyzejoins.c | 513 ++
 src/backend/optimizer/plan/planmain.c |   5 +
 src/include/optimizer/planmain.h  |   1 +
 src/test/regress/expected/join.out|  38 ++-
 src/test/regress/sql/join.sql |  16 +
 5 files changed, 569 insertions(+), 4 deletions(-)

diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index c03010c..f593a17 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,12 +22,15 @@
  */
 #include "postgres.h"
 
+#include "catalog/pg_class.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/clauses.h"
 #include "optimizer/joininfo.h"
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 #include "optimizer/planmain.h"
+#include "optimizer/predtest.h"
+#include "optimizer/restrictinfo.h"
 #include "optimizer/tlist.h"
 #include "optimizer/var.h"
 #include "utils/lsyscache.h"
@@ -1122,3 +1125,513 @@ is_innerrel_unique_for(PlannerInfo *root,
 	/* Let rel_is_distinct_for() do the hard work */
 	return rel_is_distinct_for(root, innerrel, clause_list, unique_index);
 }
+
+static bool
+set_varno_walker(Node *node, void *context)
+{
+	if (node == NULL)
+		return false;
+
+	if (IsA(node, Var))
+		((Var *) node)->varno = (Index) (intptr_t) context;
+
+	return expression_tree_walker(node, set_varno_walker, context);
+}
+
+/*
+ * Replace varno with 'relid' for all Vars in the expression.
+ */
+static void
+set_varno(Expr *expr, Index relid)
+{
+	set_varno_walker((Node *) expr, (void*) (intptr_t) relid);
+}
+
+static bool
+references_relation_walker(Node *node, void *context)
+{
+	if (node == NULL)
+		return false;
+
+	if (IsA(node, Var))
+		return ((Var *) node)->varno == (Index) (intptr_t) context;
+
+	return expression_tree_walker(node, references_relation_walker, context);
+}
+
+/*
+ * Check whether the expression has any Vars from the relation identified
+ * by 'relid'. For EquivalenceClass nodes, checks the equivalence members.
+ */
+static bool
+references_relation(Node *node, Index relid)
+{
+	if (IsA(node, EquivalenceClass))
+	{
+		EquivalenceClass *ec = (EquivalenceClass *) node;
+		ListCell *lc;
+		foreach (lc, ec->ec_members)
+		{
+			if (references_relation(
+		(Node *) ((EquivalenceMember *) lfirst(lc))->em_expr,
+		relid))
+			{
+return true;
+			}
+		}
+		return false;
+	}
+	return references_relation_walker(node, (void *) (intptr_t) relid);
+}
+
+/*
+ * Substitute newId for oldId in relids.
+ */
+static void
+change_relid(Relids *relids, Index oldId, Index newId)
+{
+	if (bms_is_member(oldId, *relids))
+		*relids = bms_add_member(bms_del_member(*relids, oldId), newId);
+}
+
+/*
+ * Remove a relation after we h

Some pgq table rewrite incompatibility with logical decoding?

2018-06-25 Thread Jeremy Finzel
I am hoping someone here can shed some light on this issue - I apologize if
this isn't the right place to ask this, but I'm almost certain some of you
were involved in pgq's development and might be able to answer this.

We are actually running 2 replication technologies on a few of our dbs,
skytools and pglogical.  Although we are moving towards only using logical
decoding-based replication, right now we have both for different purposes.

There seems to be a table rewrite happening on table pgq.event_58_1 that
has happened twice, and it ends up in the decoding stream, resulting in the
following error:

ERROR,XX000,"could not map filenode ""base/16418/1173394526"" to relation
OID"

In retracing what happened, we discovered that this relfilenode was
rewritten.  But somehow it is ending up in the logical decoding stream and
is "undecodable".  This is pretty disastrous, because the only way to fix it
really is to advance the replication slot and lose data.
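
(For what it's worth, the mapping can be checked against the current
catalogs like this; it only shows the table's *current* filenode, which
after a rewrite no longer matches the one in the decoding error.)

-- current filenode of the table
SELECT pg_relation_filenode('pgq.event_58_1');

-- map a filenode back to a relation, if it still exists in the current
-- catalog (a tablespace of 0 means the database's default tablespace)
SELECT pg_filenode_relation(0, 1173394526);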

The only obvious table rewrite I can find in the pgq codebase is a truncate
in pgq.maint_rotate_tables.sql.  But there isn't anything surprising
there.  If anyone has any ideas as to what might cause this so that we
could somehow mitigate the possibility of this happening again until we
move off pgq, that would be much appreciated.

Thanks,
Jeremy


Re: Constraint documentation

2018-06-25 Thread Lætitia Avrot
Hello,

Ok, I corrected the patch as suggested. I hope I did it right.

Have a nice day,

Lætitia

On Mon, Jun 25, 2018 at 16:02, Lætitia Avrot  wrote:

> Thanks!
> I'll correct the patch ASAP including your modifications.
>
> On Sat, Jun 23, 2018 at 19:15, Fabien COELHO  wrote:
>
>>
>> Hello lætitia,
>>
>> My 0.02 € to try to simplify the suggested documentation.
>>
>> > Check constraint were not
>>
>> are not
>>
>> > designed to enforce business rules across tables.
>>
>> > Avoid using check constraints with function accessing to other tables
>>
>> accessing other tables (no "to")
>>
>> > and prefer triggers instead (please refer to 
>> > for more information about triggers).
>>
>> ... and use  instead.
>>
>> > PostgreSQL won't prevent you from doing so,
>>
>> Although PostgreSQL ... so,
>>
>> > but be aware you might encounter difficulties to restore dumps
>> > (generated with pg_dump or pg_dumpall) if you do.
>>
>> beware that dumps generated by pg_*<...> or <...> may be
>> hard
>> to restore because of such checks, as the underlying dependencies are not
>> taken into account.
>>
>> --
>> Fabien.
>
>
>
> --
> *Think! Do you really need to print this email ? *
> *There is no Planet B.*
>


-- 
*Think! Do you really need to print this email ? *
*There is no Planet B.*
diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 50dc25f..c677dc4 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -403,6 +403,18 @@ CREATE TABLE products (
 ensure that a column does not contain null values, the not-null
 constraint described in the next section can be used.

+
+   
+
+ Check constraint are not designed to enforce business rules across tables.
+ Avoid using check constraints with function accessing other tables and
+ use  instead. Although PostgreSQL won't prevent you
+ from doing so, but beware that dumps generated by pg_dump
+ or pg_dumpall may be hard
+ to restore because of such checks, as the underlying dependencies are not
+ taken into account.
+
+   
   
 
   

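
To illustrate the pattern the added paragraph warns against, here is a
minimal SQL sketch (all names are made up for the example):

create table cities (name text primary key);

-- a function that reads another table
create function city_is_known(c text) returns boolean as
  'select exists (select 1 from cities where name = c)'
  language sql;

create table employees (
    person text,
    city   text,
    -- cross-table business rule in a CHECK constraint: PostgreSQL allows
    -- it, but pg_dump does not know about the dependency on the cities
    -- data, so restoring a dump may fail depending on load order; a
    -- trigger (or a foreign key) is safer
    constraint city_must_exist check (city_is_known(city))
);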

Re: [PATCH] Include application_name in "connection authorized" log message

2018-06-25 Thread Gilles Darold

On 21/06/2018 at 16:21, Don Seiler wrote:
On Wed, Jun 20, 2018 at 2:45 PM, Stephen Frost wrote:



diff --git a/src/include/libpq/libpq-be.h
b/src/include/libpq/libpq-be.h
index 7698cd1f88..088ef346a8 100644
--- a/src/include/libpq/libpq-be.h
+++ b/src/include/libpq/libpq-be.h
@@ -135,6 +135,7 @@ typedef struct Port
         */
        char       *database_name;
        char       *user_name;
+       char       *application_name;
        char       *cmdline_options;
        List       *guc_options;

We should have some comments here about how this is the "startup
packet
application_name" and that it's specifically used for the "connection
authorized" message and that it shouldn't really be used
post-connection-startup (the GUC should be used instead, as
applications
can change it post-startup).


Makes sense. I've now moved that variable declaration underneath the 
others with a small comment block above it.
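
(To illustrate the distinction: the startup-packet value is whatever the
client sends at connection time, while the GUC can be changed afterwards.
The names below are only examples.)

-- supplied at connection time (ends up in the startup packet), e.g.:
--   psql "dbname=postgres application_name=myapp"

-- changed after startup via the GUC:
SET application_name = 'reporting_job';
SHOW application_name;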



diff --git a/src/backend/postmaster/postmaster.c
b/src/backend/postmaster/postmaster.c
index a4b53b33cd..8e75c80728 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -2094,6 +2094,10 @@ retry1:
                              pstrdup(nameptr));
                                port->guc_options =
lappend(port->guc_options,
                              pstrdup(valptr));
+
+                               /* Copy application_name to port
as well */
+                               if (strcmp(nameptr,
"application_name") == 0)
+  port->application_name = pstrdup(valptr);
                        }
                        offset = valoffset + strlen(valptr) + 1;
                }

That comment feels a bit lacking- this is a pretty special case which
should be explained.


I've added a longer comment explaining again why we are doing this here.


diff --git a/src/backend/utils/init/postinit.c
b/src/backend/utils/init/postinit.c
index 09e0df290d..86db7630d5 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -266,8 +266,8 @@ PerformAuthentication(Port *port)
 #ifdef USE_SSL
                        if (port->ssl_in_use)
                                ereport(LOG,
-  (errmsg("connection authorized: user=%s database=%s SSL enabled
(protocol=%s, cipher=%s, bits=%d, compression=%s)",
-      port->user_name, port->database_name,
+  (errmsg("connection authorized: user=%s database=%s
application=%s SSL enabled (protocol=%s, cipher=%s, bits=%d,
compression=%s)",
+      port->user_name, port->database_name, port->application_name
      be_tls_get_version(port),
      be_tls_get_cipher(port),
      be_tls_get_cipher_bits(port),
@@ -275,8 +275,8 @@ PerformAuthentication(Port *port)
                        else
 #endif
                                ereport(LOG,
-  (errmsg("connection authorized: user=%s database=%s",
-      port->user_name, port->database_name)));
+  (errmsg("connection authorized: user=%s database=%s
application=%s",
+      port->user_name, port->database_name,
port->application_name)));
                }
        }

You definitely need to be handling the case where application_name is
*not* passed in more cleanly.  I don't think we should output anything
different from what we do today in those cases.


I've added some conditional logic similar to the ternary operator 
usage in src/backend/catalog/pg_collation.c:92. If application_name is 
set, we'll include "application=%s" in the log message, otherwise we 
make no mention of it now (output will be identical to what we 
currently do).


Also, these don't need to be / shouldn't be 3 separate
patches/commits,
and there should be a sensible commit message which explains what the
change is entirely.


After much head scratching/banging on both our parts yesterday, I've 
finally figured this out. Thanks again for your patience and time.


If you could update the patch accordingly and re-send, that'd be
awesome. :)


See attached.



--
Don Seiler
www.seiler.us 



Hi,


I've reviewed this patch; the log entry is the following when 
"log_connections = on" and application_name is set:



    2018-06-25 17:01:30.079 CEST [22002] LOG:  connection authorized: 
user=postgres database=postgres application=psql



when application_name is not set:


    2018-06-25 17:04:22.653 CEST [22076] LOG:  connection authorized: 
user=dbuser database=test



I would prefer the following output to conform to elog.c output:


    2018-06-25 17:32:46.854 CEST [23265] LOG:  connection authorized: 
user=postgres database=postgres application=[unknown]


I think this will make the changes to the src/backend/utils/init/postinit.c 
code more readable.

Re: Loaded footgun open_datasync on Windows

2018-06-25 Thread Laurenz Albe
Kuntal Ghosh wrote:
> > If I use the three-argument version throughout, which should be no problem,
> > PostgreSQL builds just fine with MSVC.
> > 
> 
> I don't have enough experience on MSVC/Windows to comment on the same.
> 
> > How about checking what the buildfarm thinks about the attached?
> 
> +1

I have added it to the July commitfest.

Yours,
Laurenz Albe



Re: MERGE SQL Statement for PG11

2018-06-25 Thread André Brière
The status of this is quite unclear to me:
- There are two email threads and the most recent email is in the original one 
(https://www.postgresql.org/message-id/flat/canp8+jkitbsrb7otgt9cy2i1obfot36z0xmraqc+xrz8qb0...@mail.gmail.com/);
 I think the status should be set to "Waiting for Author" but it was reset to 
"Needs review" by Pavan on 06/19/2018 (based on the second email thread?)
- The patch was reverted in 
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=08ea7a2291db21a618d19d612c8060cda68f1892

Re: Some pgq table rewrite incompatibility with logical decoding?

2018-06-25 Thread Andres Freund
Hi,

On 2018-06-25 10:37:18 -0500, Jeremy Finzel wrote:
> I am hoping someone here can shed some light on this issue - I apologize if
> this isn't the right place to ask this but I'm almost some of you all were
> involving in pgq's dev and might be able to answer this.
> 
> We are actually running 2 replication technologies on a few of our dbs,
> skytools and pglogical.  Although we are moving towards only using logical
> decoding-based replication, right now we have both for different purposes.
> 
> There seems to be a table rewrite happening on table pgq.event_58_1 that
> has happened twice, and it ends up in the decoding stream, resulting in the
> following error:
> 
> ERROR,XX000,"could not map filenode ""base/16418/1173394526"" to relation
> OID"
> 
> In retracing what happened, we discovered that this relfilenode was
> rewritten.  But somehow, it is ending up in the logical decoding stream as
> is "undecodable".  This is pretty disastrous because the only way to fix it
> really is to advance the replication slot and lose data.
> 
> The only obvious table rewrite I can find in the pgq codebase is a truncate
> in pgq.maint_rotate_tables.sql.  But there isn't anything surprising
> there.  If anyone has any ideas as to what might cause this so that we
> could somehow mitigate the possibility of this happening again until we
> move off pgq, that would be much appreciated.

I suspect the issue might be that pgq does some updates to catalog
tables. Is that indeed the case?

Greetings,

Andres Freund



Re: MERGE SQL statement for PG12

2018-06-25 Thread Alvaro Herrera
On 2018-Jun-19, Pavan Deolasee wrote:

> Hello,
> 
> I would like to resubmit the MERGE patch for PG12. The past discussions
> about the patch can be found here [1] [2].

Hello.  A very minor thing, please see commit 15a8f8caad14 and the
discussion that led to it.

Thanks

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Some pgq table rewrite incompatibility with logical decoding?

2018-06-25 Thread Jeremy Finzel
On Mon, Jun 25, 2018 at 12:41 PM, Andres Freund  wrote:

> Hi,
>
> On 2018-06-25 10:37:18 -0500, Jeremy Finzel wrote:
> > I am hoping someone here can shed some light on this issue - I apologize
> if
> > this isn't the right place to ask this but I'm almost some of you all
> were
> > involving in pgq's dev and might be able to answer this.
> >
> > We are actually running 2 replication technologies on a few of our dbs,
> > skytools and pglogical.  Although we are moving towards only using
> logical
> > decoding-based replication, right now we have both for different
> purposes.
> >
> > There seems to be a table rewrite happening on table pgq.event_58_1 that
> > has happened twice, and it ends up in the decoding stream, resulting in
> the
> > following error:
> >
> > ERROR,XX000,"could not map filenode ""base/16418/1173394526"" to relation
> > OID"
> >
> > In retracing what happened, we discovered that this relfilenode was
> > rewritten.  But somehow, it is ending up in the logical decoding stream
> as
> > is "undecodable".  This is pretty disastrous because the only way to fix
> it
> > really is to advance the replication slot and lose data.
> >
> > The only obvious table rewrite I can find in the pgq codebase is a
> truncate
> > in pgq.maint_rotate_tables.sql.  But there isn't anything surprising
> > there.  If anyone has any ideas as to what might cause this so that we
> > could somehow mitigate the possibility of this happening again until we
> > move off pgq, that would be much appreciated.
>
> I suspect the issue might be that pgq does some updates to catalog
> tables. Is that indeed the case?
>

I also suspected this.  The only such case I found is that it does
deletes and inserts on pg_autovacuum.  I could not quickly find anything
else, but I'm not sure if I'm missing something in some of the C code.

Thanks,
Jeremy


Re: [WIP] [B-Tree] Retail IndexTuple deletion

2018-06-25 Thread Andrey V. Lepikhov




On 23.06.2018 01:14, Peter Geoghegan wrote:

On Fri, Jun 22, 2018 at 12:43 PM, Peter Geoghegan  wrote:

On Fri, Jun 22, 2018 at 4:24 AM, Andrey V. Lepikhov
 wrote:

According to your feedback, I have developed a second version of the patch.
In this version:
1. High-level functions index_beginscan(), index_rescan() not used. Tree
descent made by _bt_search(). _bt_binsrch() used for positioning on the
page.
2. TID list introduced in amtargetdelete() interface. Now only one tree
descent needed for deletion all tid's from the list with equal scan key
value - logical duplicates deletion problem.
3. Only one WAL record for index tuple deletion per leaf page per
amtargetdelete() call.


Cool.

What is this "race" code about?


I introduced this check with other vacuum workers in mind, which can be 
cleaning the relation concurrently. Maybe it is redundant.



I noticed another bug in your patch, when running a
"wal_consistency_checking=all" smoke test. I do this simple, generic
test for anything that touches WAL-logging, actually -- it's a good
practice to adopt.

I enable "wal_consistency_checking=all" on the installation, create a
streaming replica with pg_basebackup (which also has
"wal_consistency_checking=all"), and then run "make installcheck"
against the primary. Here is what I see on the standby when I do this
with v2 of your patch applied:

9524/2018-06-22 13:03:12 PDT LOG:  entering standby mode
9524/2018-06-22 13:03:12 PDT LOG:  consistent recovery state reached
at 0/3D0
9524/2018-06-22 13:03:12 PDT LOG:  invalid record length at 0/3D0:
wanted 24, got 0
9523/2018-06-22 13:03:12 PDT LOG:  database system is ready to accept
read only connections
9528/2018-06-22 13:03:12 PDT LOG:  started streaming WAL from primary
at 0/300 on timeline 1
9524/2018-06-22 13:03:12 PDT LOG:  redo starts at 0/3D0
9524/2018-06-22 13:03:32 PDT FATAL:  inconsistent page found, rel
1663/16384/1259, forknum 0, blkno 0
9524/2018-06-22 13:03:32 PDT CONTEXT:  WAL redo at 0/3360B00 for
Heap2/CLEAN: remxid 599
9523/2018-06-22 13:03:32 PDT LOG:  startup process (PID 9524) exited
with exit code 1
9523/2018-06-22 13:03:32 PDT LOG:  terminating any other active server processes
9523/2018-06-22 13:03:32 PDT LOG:  database system is shut down

I haven't investigated this at all, but I assume that the problem is a
simple oversight. The new ItemIdSetDeadRedirect() concept that you've
introduced probably necessitates changes in both the WAL logging
routines and the redo/recovery routines. You need to go make those
changes. (By the way, I don't think you should be using the constant
"3" with the ItemIdIsDeadRedirection() macro definition.)

Let me know if you get stuck on this, or need more direction.

I investigated the bug from the simple smoke test. You're right: making 
any manipulations of line pointers in heap_page_prune() without reflecting 
them in the WAL record is not a good idea.
But this consistency problem arises even on a clean PostgreSQL 
installation (without my patch) if ItemIdSetDead() is replaced with 
ItemIdMarkDead().
A byte-by-byte comparison of the master and replayed pages shows only a 
2-byte difference in the tuple storage part of the page.

I'm not stuck on it yet, but good ideas are welcome.

--
Andrey Lepikhov
Postgres Professional:
https://postgrespro.com
The Russian Postgres Company



Re: Constraint documentation

2018-06-25 Thread Vik Fearing
On 25/06/18 17:45, Lætitia Avrot wrote:
> +   
> +
> + Check constraint are not designed to enforce business rules across 
> tables.
> + Avoid using check constraints with function accessing other tables and

"with functions" or "with a function".  I prefer the former.

> + use  instead. Although PostgreSQL won't 
> prevent you
> + from doing so, but beware that dumps generated by 
> pg_dump

No but.

> + or pg_dumpall may be hard
> + to restore because of such checks, as the underlying dependencies are 
> not
> + taken into account.
> +
> +   

-- 
Vik Fearing  +33 6 46 75 15 36
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [PATCH] Include application_name in "connection authorized" log message

2018-06-25 Thread Don Seiler
On Mon, Jun 25, 2018 at 10:50 AM, Gilles Darold 
wrote:

> I would prefer the following output to conform to elog.c output:
>
> 2018-06-25 17:32:46.854 CEST [23265] LOG:  connection authorized:
> user=postgres database=postgres application=[unknown]
>
Hello Gilles,

The different cases were for Stephen's request earlier in this thread. I'm
happy to write it either way and agree it would be nice to not have a lot
of duplicate code. Do you want me to submit another patch?

Don.

-- 
Don Seiler
www.seiler.us


Re: [PATCH] Include application_name in "connection authorized" log message

2018-06-25 Thread Stephen Frost
Greetings

On Mon, Jun 25, 2018 at 15:57 Don Seiler  wrote:

> On Mon, Jun 25, 2018 at 10:50 AM, Gilles Darold 
> wrote:
>
>> I would prefer the following output to conform to elog.c output:
>>
>> 2018-06-25 17:32:46.854 CEST [23265] LOG:  connection authorized:
>> user=postgres database=postgres application=[unknown]
>>
> Hello Gilles,
>
> The different cases were for Stephen's request earlier in this thread. I'm
> happy to write it either way and agree it would be nice to not have a lot
> of duplicate code. Do you want me to submit another patch?
>

I think it’s better to not change the line when application name isn’t
being set, as the latest patch has. That also matches how we handle the SSL
info- it’s only shown if ssl is in place on the connection.

Thanks!

Stephen

>


Re: [PATCH] Include application_name in "connection authorized" log message

2018-06-25 Thread Gilles Darold

On 25/06/2018 at 21:57, Don Seiler wrote:
On Mon, Jun 25, 2018 at 10:50 AM, Gilles Darold  wrote:


I would prefer the following output to conform to elog.c output:

    2018-06-25 17:32:46.854 CEST [23265] LOG: connection
authorized: user=postgres database=postgres application=[unknown]

Hello Gilles,

The different cases were for Stephen's request earlier in this thread. 
I'm happy to write it either way and agree it would be nice to not 
have a lot of duplicate code. Do you want me to submit another patch?


Yes, please send a new patch; that will be easiest to manage as a single 
patch. I will give it a new test and change the status to "ready for 
committer". I will also prepare a patch for pgbadger to support the change.


Best regards,

--
Gilles Darold
Consultant PostgreSQL
http://dalibo.com - http://dalibo.org



Re: [PATCH] Include application_name in "connection authorized" log message

2018-06-25 Thread Gilles Darold

On 25/06/2018 at 22:32, Stephen Frost wrote:

Greetings

On Mon, Jun 25, 2018 at 15:57 Don Seiler  wrote:


On Mon, Jun 25, 2018 at 10:50 AM, Gilles Darold  wrote:

I would prefer the following output to conform to elog.c output:

    2018-06-25 17:32:46.854 CEST [23265] LOG: connection
authorized: user=postgres database=postgres application=[unknown]

Hello Gilles,

The different cases were for Stephen's request earlier in this
thread. I'm happy to write it either way and agree it would be
nice to not have a lot of duplicate code. Do you want me to submit
another patch?


I think it’s better to not change the line when application name isn’t 
being set, as the latest patch has. That also matches how we handle 
the SSL info- it’s only shown if ssl is in place on the connection.


Oh ok, sorry Stephen, I had not seen your answer before my last reply. I 
would prefer to have the information that the application_name is 
unknown, but that is just to conform with the log parser, and obviously 
the actual patch is enough. I have changed the status to "ready for 
committer"; nothing needs to be changed.


--
Gilles Darold
Consultant PostgreSQL
http://dalibo.com - http://dalibo.org



Re: Fix slot's xmin advancement and subxact's lost snapshots in decoding.

2018-06-25 Thread Alvaro Herrera
Hello

Firstly -- this is top-notch detective work, kudos and thanks for the
patch and test cases.  (I verified that both fail before the code fix.)

Here's a v3.  I applied a lot of makeup in order to try to understand
what's going on.  I *think* I have a grasp on the original code and your
bugfix, not terribly firm I admit.

Some comments

* you don't need to Assert that things are not NULL if you're
  immediately going to dereference them.  The assert is there to make
  the code crash in case it's a NULL pointer, but the subsequent
  dereference is going to have the same effect, so the assert is
  redundant.

* I think setting snapshot_base to NULL in ReorderBufferCleanupTXN is
  pointless, since the struct is gonna be freed shortly afterwards.

* I rewrote many comments (both existing and some of the ones your patch
  adds), and added lots of comments where there were none.

* I'm a bit unsure about the comment atop ReorderBufferSetBaseSnapshot.
  Obviously, the bit within the #if 0/#endif I'm going to remove before
  push.  I don't understand why it says "Needs to be called before any
  changes are added with ReorderBufferQueueChange"; but if you edit that
  function and add an assert that the base snapshot is set, it crashes
  pretty quickly in the test_decoding tests.  (The supposedly bogus
  comment was there before your patch -- I'm not saying your comment
  addition is faulty.)

* I also noticed that we're doing subtxn cleanup one by one in both
  ReorderBufferAssignChild and ReorderBufferCommitChild, which means the
  top-level txn is sought in the hash table over and over, which seems a
  bit silly.  Not this patch's problem to fix ...

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
index 1d601d8144..afcab930f7 100644
--- a/contrib/test_decoding/Makefile
+++ b/contrib/test_decoding/Makefile
@@ -50,7 +50,8 @@ regresscheck-install-force: | submake-regress 
submake-test_decoding temp-install
$(pg_regress_installcheck) \
$(REGRESSCHECKS)
 
-ISOLATIONCHECKS=mxact delayed_startup ondisk_startup concurrent_ddl_dml
+ISOLATIONCHECKS=mxact delayed_startup ondisk_startup concurrent_ddl_dml \
+   oldest_xmin snapshot_transfer
 
 isolationcheck: | submake-isolation submake-test_decoding temp-install
$(pg_isolation_regress_check) \
diff --git a/contrib/test_decoding/expected/oldest_xmin.out 
b/contrib/test_decoding/expected/oldest_xmin.out
new file mode 100644
index 00..d09342c4be
--- /dev/null
+++ b/contrib/test_decoding/expected/oldest_xmin.out
@@ -0,0 +1,27 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s0_begin s0_getxid s1_begin s1_insert s0_alter s0_commit 
s0_checkpoint s0_get_changes s1_commit s0_vacuum s0_get_changes
+step s0_begin: BEGIN;
+step s0_getxid: SELECT txid_current() IS NULL;
+?column?   
+
+f  
+step s1_begin: BEGIN;
+step s1_insert: INSERT INTO harvest VALUES ((1, 2, 3));
+step s0_alter: ALTER TYPE basket DROP ATTRIBUTE mangos;
+step s0_commit: COMMIT;
+step s0_checkpoint: CHECKPOINT;
+step s0_get_changes: SELECT data FROM 
pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 
'skip-empty-xacts', '1');
+data   
+
+step s1_commit: COMMIT;
+step s0_vacuum: VACUUM FULL;
+step s0_get_changes: SELECT data FROM 
pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 
'skip-empty-xacts', '1');
+data   
+
+BEGIN  
+table public.harvest: INSERT: fruits[basket]:'(1,2,3)'
+COMMIT 
+?column?   
+
+stop   
diff --git a/contrib/test_decoding/expected/snapshot_transfer.out 
b/contrib/test_decoding/expected/snapshot_transfer.out
new file mode 100644
index 00..87bed03f76
--- /dev/null
+++ b/contrib/test_decoding/expected/snapshot_transfer.out
@@ -0,0 +1,49 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s0_begin s0_begin_sub0 s0_log_assignment 
s0_sub_get_base_snap s1_produce_new_snap s0_insert s0_end_sub0 s0_commit 
s0_get_changes
+step s0_begin: BEGIN;
+step s0_begin_sub0: SAVEPOINT s0;
+step s0_log_assignment: SELECT txid_current() IS NULL;
+?column?   
+
+f  
+step s0_sub_get_base_snap: INSERT INTO dummy VALUES (0);
+step s1_produce_new_snap: ALTER TABLE harvest ADD COLUMN mangos int;
+step s0_insert: INSERT INTO harvest VALUES (1, 2, 3);
+step s0_end_sub0: RELEASE SAVEPOINT s0;
+step s0_commit: COMMIT;
+step s0_get_changes: SELECT data FROM 
pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 
'skip-empty-xacts', '1');
+data   
+
+BEGIN  
+table public.dummy: INSERT: i[integer]:0
+table public.harvest: INSERT: apples[integer]:1 pears[integer]:2 
mangos[integer]:3
+COMMIT 
+?column?   
+
+stop   
+
+starting permutation: s0_begin s0_begin_sub0 s0_log_assignment s0_begin_sub1 
s0

Re: Psql patch to show access methods info

2018-06-25 Thread Nikita Glukhov

On 22.06.2018 16:48, Sergey Cherkashin wrote:


Hello!

There is a command in psql to list access methods, but there is no fast
way to look at detailed info about them. So here is a patch with new
commands:


Hi!

I've done a preliminary in-company review of this patch several times.
Here is my review of its first published version.


\dAp [PATTERN]   list access methods with properties (Table
pg_am)


 * Should we rename it to \dAip and include the word "index" in the table header?
   As you know, we are going to support table AMs in the future.


\dAf[+]  [AMPTRN [OPFPTRN]]  list operator families of access method. +
prints owner of operator family. (Table pg_opfamily)



\dAfp[AMPTRN [OPFPTRN]]  list procedures of operator family related
to access method (Table pg_amproc)


 * Reorder "Left"/"Right" and "Strategy"/"Proc name" columns.
 * Include "Left"/"Right" columns into ORDER BY clause.
 * Show the procedure's argument types, because the procedure's name does not
   completely identify it (for example, in_range() is used in several opclasses
   with different signatures).  Or maybe show arguments only if the procedure
   name is not unique?


\dAfo[AMPTRN [OPFPTRN]]  list operators of family related to access
method (Table pg_amop)


 * Reorder "Left"/"Right" and "Strategy"/"Operator" columns.
 * Include "Left"/"Right" columns into ORDER BY clause.
 * Operator's schema is shown only if operator is invisible for the current
   user -- I'm not sure if this is correct.


\dAoc[+] [AMPTRN [OPCPTRN]]  list operator classes of index access
methods. + prints owner of operator class. (Table pg_opclass)


 * Maybe it would be better to show stored type only if it differs from the
   indexed type?


\dip[S]  [PATTERN]   list indexes with properties (Table
pg_class)



\dicp[S] [IDXNAME [COLNAME]] show index column properties (Table
pg_class)


 * Fix duplicate rows that appear in the table for composite indices.
 * Include "Column #" into ORDER BY clause.
 * Rename column "Null first" to "Nulls First" or "NULLS LAST".
 * Maybe it is not necessary to show "Access method" column here?
 * ASC, NULLS are shown as TRUE/FALSE only if the index is orderable, and as
   NULL if unorderable -- I'm not sure if this is correct.  Maybe we should
   simply show these properties in the literal form, not as booleans
   (as strings 'ASC'/'DESC', 'NULLS FIRST'/'NULLS LAST')?
 * I think we should show column's properties in the separate table for each
   index, because it is not so easy to understand the combined table.
   The same, perhaps, can be applied to \dAfp and \dAfo commands.
  


I also have a question about testing the commands \dAf+ and \dAoc+: is it
a good idea to test them by changing the owner of one operator family or
class to a newly created one, checking the output, and restoring the owner
back? Or should we create a new opclass or opfamily with the proper owner?
Or maybe it is not necessary to test these commands at all?

--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Performance regression with PostgreSQL 11 and partitioning

2018-06-25 Thread Alvaro Herrera
On 2018-Jun-18, David Rowley wrote:

> I've attached a patch which cleans up my earlier version and moves the
> setup of the append_rel_array into its own function instead of
> sneaking code into setup_simple_rel_arrays(). I've also now updated
> the comment above find_childrel_appendrelinfo(), which is now an
> unused function.

I checked that this patch fixes the originally reported performance
regression.

Unless there are objections, I intend to push this patch tomorrow.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Scariest patch tournament, PostgreSQL 11 edition

2018-06-25 Thread Alvaro Herrera
Hackers,

One month of beta testing has flown by, and enough bugs have already
been reported that your view of what patches are scariest might have
matured.  You still have a few days before we close the contest at the
end of the month.  Let us know what patches you think are scariest:

  
https://docs.google.com/forms/d/e/1FAIpQLSeE20Zzny34U6cEk20HYlQN9vsJzmT9maiLYCp1BdNHYvYCPA/viewform

Thanks,

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Constraint documentation

2018-06-25 Thread Brad DeJong
 On 25/06/18 17:45, Lætitia Avrot wrote:
> +   
> +
> + Check constraint are not designed to enforce business rules across
tables.
> + Avoid using check constraints with function accessing other tables
and

Subject/verb agreement - either "A check constraint is ..." or "Check
constraints are ..." would be appropriate.


Re: Scariest patch tournament, PostgreSQL 11 edition

2018-06-25 Thread Jeff Janes
On Mon, Jun 25, 2018 at 6:52 PM, Alvaro Herrera 
wrote:

> Hackers,
>
> One month of beta testing has flown by, and enough bugs have already
> been reported that your view of what patches are scariest might have
> matured.  You still have a few days before we close the contest at the
> end of the month.  Let us know what patches you think are scariest:
>
>   https://docs.google.com/forms/d/e/1FAIpQLSeE20Zzny34U6cEk20HYlQN
> 9vsJzmT9maiLYCp1BdNHYvYCPA/viewform
>
> Thanks,
>
>
Is there a summary of the results of the previous rounds?  I didn't see any
announcements of them.  I've been trying to find some crash recovery
torture testing to do for v11 release, but can't find features to focus on
which might be scariest from a WAL perspective.

Cheers,

Jeff


Re: Performance regression with PostgreSQL 11 and partitioning

2018-06-25 Thread Tom Lane
Alvaro Herrera  writes:
> On 2018-Jun-18, David Rowley wrote:
>> I've attached a patch which cleans up my earlier version and moves the
>> setup of the append_rel_array into its own function instead of
>> sneaking code into setup_simple_rel_arrays(). I've also now updated
>> the comment above find_childrel_appendrelinfo(), which is now an
>> unused function.

> I checked that this patch fixes the originally reported performance
> regression.
> Unless there are objections, I intend to push this patch tomorrow.

If find_childrel_appendrelinfo is now unused, we should remove it.

regards, tom lane



Re: Incorrect fsync handling in pg_basebackup's tar_finish

2018-06-25 Thread Michael Paquier
On Mon, Jun 25, 2018 at 07:21:27PM +0530, Kuntal Ghosh wrote:
> I've also verified the same. The patch looks good to me.

Thanks for confirming.  I have pushed the fix down to 10.
--
Michael


signature.asc
Description: PGP signature


automatic restore point

2018-06-25 Thread Yotsunaga, Naoki
Hi, I'm a newbie to -hackers, but I'd like to propose an "automatic restore 
point" feature. 
This feature automatically creates a restore point just before a huge change 
is made to the DB. It's useful when that change is accidental.

The following is a description of "automatic restore point".
【Background】
  When a DBA operation fails, for example the DBA accidentally drops a table, 
the database is restored from the file system backup and recovered by using a 
time or transaction ID. The transaction ID has to be identified from the WAL.
  But there are the following problems with using a time or transaction ID.
   -Time
   ・You need to remember the time of the failed operation.
 (It is possible to identify the time from the WAL, but it takes time and 
effort to do so.)
   ・It is difficult to specify the exact point.
   -Transaction ID
   ・It takes time and effort to identify the transaction ID.
  
  In order to solve the above problems, I'd like to propose a feature that 
automatically records a recovery point.
  
【Feature Description】
  In PostgreSQL, there is a backup control function, pg_create_restore_point().
  A user can create a named point to restore to by calling 
pg_create_restore_point(), and can later recover to that named point.
  So, execute pg_create_restore_point() automatically before executing the 
following commands, to create a point to restore to (a recovery point).
  The name of the recovery point is the date and time at which the command was 
executed.
  When this happens, the target resource (database name, table name) and the 
recovery point name are written as a message to the PostgreSQL server log.
  
  - Commands to which this feature applies:
   ・TRUNCATE
  ・DROP
   ・DELETE (without a WHERE clause)
  ・UPDATE (without a WHERE clause)
  ・COPY FROM
  
【How to use】
  1) After the failed operation, find in the server log the command and the 
recovery point name that match the resource involved in the failure.
 
 ex) Message for executing TRUNCATE at 2018/6/1 12:30:30 (database 
name: testdb, table name: testtb)
    set recovery point. operation = 'truncate'
    database = 'testdb' relation = 'testtb' recovery_point_name = 
'2018-06-01-12:30:30'

   2) Follow the procedure in section 25.3.4, 'Recovering Using a Continuous 
Archive Backup', of the PostgreSQL documentation.
 ※Set "recovery_target_name = 'recovery_point name'" in recovery.conf.
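
 (For comparison, the manual equivalent that this feature would automate 
looks roughly like this; the restore point name is only an example.)

 -- on the running server, just before the risky operation
 SELECT pg_create_restore_point('2018-06-01-12:30:30');

 -- later, in recovery.conf, to recover to that point:
 -- recovery_target_name = '2018-06-01-12:30:30'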

【Setting file】
  Set in postgresql.conf:
  auto_create_restore_point = on # Switches the automatic recording of recovery 
points on/off. The default value is 'off'.

So what do you think about it? Do you think it is useful?

Also, when recovering with the current specification, tables other than the 
one being restored also return to the state of the specified recovery point. 
So I'm looking for ways to recover only specific tables. Do you have any ideas?

--
Naoki Yotsunaga





Re: automatic restore point

2018-06-25 Thread David G. Johnston
On Mon, Jun 25, 2018 at 6:17 PM, Yotsunaga, Naoki <
yotsunaga.na...@jp.fujitsu.com> wrote:

> ​​
> So what do you think about it? Do you think is it useful?
>

​The cost/benefit ratio seems low...​

Also, when recovering with the current specification, tables other than the
> returned table also return to the state of the specified recovery point.
> So, I’m looking for ways to recover only specific tables. Do you have any
> ideas?
>

...and this lowers it even further.
​
I'd rather spend effort making the initial execution of said commands less
likely.  Something like:

TRUNCATE table YES_I_REALLY_WANT_TO_DO_THIS;

which will fail if you don't add the keyword "YES_I..." to the end of the
command and the system was setup to require it.

Or, less annoyingly:

BEGIN;
SET LOCAL perform_dangerous_action = true; --can we require local?
TRUNCATE table;
COMMIT;

David J.
​


Re: using expression syntax for partition bounds

2018-06-25 Thread Amit Langote
Horiguchi-san,

Thanks a lot for the review and sorry it took me a while to reply.
Thought I'd refresh the patch as it's in the July CF.

On 2018/04/24 18:08, Kyotaro HORIGUCHI wrote:
> Thanks. I have almost missed this.
> 
> At Mon, 23 Apr 2018 11:44:14 +0900, Amit Langote wrote:
>> On 2018/04/23 11:37, Amit Langote wrote:
>>> I tried to update the patch to do things that way.  I'm going to create a
>>> new entry in the next CF titled "generalized expression syntax for
>>> partition bounds" and add the patch there.
>>
>> Tweaked the commit message to credit all the authors.
> 
> +  any variable-free expression (subqueries, window functions, aggregate
> +  functions, and set-returning functions are not allowed).  The data type
> +  of the partition bound expression must match the data type of the
> +  corresponding partition key column.
> 
> A partitioning value is similar to a column default expression in
> the sense that it appears within a DDL. The description for
> DEFAULT is:
> 
> |  The value is any variable-free expression (subqueries and
> |  cross-references to other columns in the current table are not
> |  allowed)
> 
> It actually refuses aggregates, window functions and SRFs, but that
> is not mentioned.  Whichever we choose, the two ought to have
> similar descriptions.

Actually, I referenced the default value expression syntax a lot when
working on this issue, so agree that there's a close resemblance.

Maybe, we should fix the description for default expression to say that it
can't contain subqueries, cross-references to other columns in the table,
aggregate expressions, window functions, and set-returning functions.  I
also sent a patch separately:

https://www.postgresql.org/message-id/2059e8f2-e6e6-7a9f-0de8-11eed291e...@lab.ntt.co.jp

>>   /*
>>* Strip any top-level COLLATE clause, as they're inconsequential.
>>* XXX - Should we add a WARNING here?
>>*/
>>   while (IsA(value, CollateExpr))
>>   value = (Node *) ((CollateExpr *) value)->arg;
> 
> Ok, I'll follow Tom's suggestion, but a COLLATE clause does not
> necessarily appear in this shape. For example (('a' collate "de_DE") || 'b')
> has the collate node under the top ExprOp, and we get a complaint
> like the following with it.
> 
>> ERROR:  unrecognized node type: 123
> 
> 123 is T_CollateExpr. The expression "value" (mmm..) requires
> eval_const_expressions to eliminate CollateExprs.  It requires
> assign_expr_collations to retrieve value's collation, but we don't
> do that since we ignore collation in this case.

Ah yes, it seems better to call eval_const_expressions as a first step to
get rid of CollateExpr's, followed by evaluate_expr if the first step
didn't return a Const node.
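
To make that concrete, here is a small sketch of the kind of bound this has
to handle (the table is made up; the bound expression is the one from your
example and assumes that collation exists):

create table pl (a text) partition by list (a);

-- a variable-free bound expression with a CollateExpr nested under the
-- operator; eval_const_expressions reduces it to a constant before the
-- bound is checked
create table pl_ab partition of pl
  for values in (('a' collate "de_DE") || 'b');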

> The following results in a strange-looking error.
> 
>> =# create table pt1 partition of p for values in (a);
>> ERROR:  column "a" does not exist
>> LINE 1: create table pt1 partition of p for values in (a);
> 
> The p/pt1 actually have a column a.
> 
> The following results in similar error and it is wrong, too.
> 
>> =# create table pr1 partition of pr for values from (a + 1) to (a + 2);
>> ERROR: column "a" does not exist
>> LINE 1: create table pr1 partition of pr for values from (a + 1) to ...

This one is better fixed by initializing ParseState properly as
demonstrated in your v3-2 patch.

> Being similar but a bit different, the following command leads to
> a SEGV since it creates PARTITION_RANGE_DATUM_VALUE with value =
> NULL. Even if it leaves the original node, validateInfiniteBounds
> rejects it and aborts.
> 
>> =# create table pr1 partition of pr for values from (hoge) to (hige);
> (crash)

Oops.

> I fixed this using a pre-columnref hook in the attached v3. This
> behaves differently from DEFAULT for column refs.
> 
> The v3-2 does the same thing as DEFAULT. The behavior differs
> depending on whether the column exists or not.
> 
>> ERROR:  cannot use column referencees in bound value
>> ERROR:  column "b" does not exist
> 
> I personally think that such similarity between DEFAULT and
> partition bound (v3-2) is not required. But inserting the hook
> (v3) doesn't look good to me.

Actually, I too like the solution of v3-2 better, instead of using the
hook, so I adopted it in the updated patch.

I also changed how range bounds are processed in transformPartitionBound,
moving some code into newly added transformPartitionRangeBounds to reduce
code duplication.

Updated patch attached.

Thanks,
Amit
>From 81aacd6f66fd935aa1df6997b3678726e3d951ee Mon Sep 17 00:00:00 2001
From: amit 
Date: Wed, 18 Apr 2018 18:53:44 +0900
Subject: [PATCH v4] Allow generalized expression syntax for partition bounds

Authors: Kyotaro Horiguchi, Tom Lane, Amit Langote
---
 doc/src/sgml/ref/alter_table.sgml  |   6 +-
 doc/src/sgml/ref/create_table.sgml |  16 +--
 src/backend/commands/tablecmds.c   |   8 ++
 src/backend/optimizer/util/clauses.c   |   4 +-
 src/backend/parser/gram.y  |  60 +
 src/backend/parser/par

Re: PATCH: backtraces for error messages

2018-06-25 Thread Kyotaro HORIGUCHI
Hello. Thanks for the discussion.

At Mon, 25 Jun 2018 17:08:41 +0800, Craig Ringer  wrote 
in 
> On 25 June 2018 at 14:21, Kyotaro HORIGUCHI wrote:
> > > I think it's pretty strongly desirable for PANIC.
> >
> > Ah, I forgot about that. I agree to that. The cost to collect the
> > information is not a problem on PANIC. However still I don't
> > think stack trace is valuable on all PANIC messages. I can accept
> > the guc to control it but it is preferable that this works fine
> > without such an extra setup.
> >
> 
> Places such as?

Sorry but I didn't get this.

> > Agreed, it is the reality.  Instaed, can't we make a new error
> > classes PANIC_STACKDUMP and ERROR_STACKDUMP to explicitly
> > restrict stack dump for elog()?  Or elog_stackdump() and elog()
> > is also fine for me. Using it is easier than proper
> > annotating. It would be perfect if we could invent an automated
> > way but I don't think it is realistic.
> >
> 
> That needlessly complicates error severity levels with information not
> really related to the severity. -1 from me.

Ok.

> > Mmm. If I understand you correctly, I mean that perf doesn't dump
> > a backtrace on a probe point but trace points are usable to take
> > a symbolic backtrace. (I'm sorry that I cannot provide an example
> > since stap doesn't work in my box..)
> >
> 
> perf record --call-graph dwarf -e sdt_postgresql:checkpoint__start -u
> postgres
> perf report -g

Good to know, but unfortunately it doesn't work for me. It
(USDT support) is still on the perf wiki's todo list. Some googling
told me that it is available on Linux 4.4; CentOS 7's kernel is
3.10. It seems to be a cutting-edge feature. However, I suppose that
what it emits is different from what is wanted here.

But regardless of that, I agree that it is a bit hard to use
on-site.. (stap doesn't work for me for some uncertain reason..)

> > If your intention is to take back traces without any setting (I
> > think it is preferable), it should be restricted to the required
> > points. It can be achieved by the additional error classes or
> > substitute error output functions.
> >
> 
> Who's classifying all the possible points?

I didn't mean classifying all points. I meant that covering only the
points where it is needed would be enough for a first step.

> Which PANICs or assertion failures do you want to exempt?

I didn't intend to cover all required cases. I intended to cover
just a few or several obviously valuable cases, where we
gave up adding adequate context information for some reason and a
backtrace gives valuable information. Most xlog stuff wouldn't
need it. Lock stuff might need it. Heapam doesn't seem to need it,
since those errors seem to be caused by calling order, etc. But such
control is out of scope for this patch, I know.

> I definitely do not want to emit stacks for everything, like my patch
> currently does. It's just a proof of concept. Later on I'll want control on
> a fine grained level at runtime of when that happens, but that's out of
> scope for this. For now the goal is emit stacks at times it's obviously
> pretty sensible to have a stack, and do it in a way that doesn't require
> per-error-site maintenance/changes or create backport hassle.

Ok. The PoC focuses on the means to emit backtraces, and
controlling the logging is a different issue, as Andres said
upthread.

I think that backtrace() generates somewhat strange-looking
output and we can get a cleaner one using libunwind, so I vote for
libunwind to implement this. The cost does not matter as long as we
intend to emit this only for Asserts or PANICs. Choosing some
ERRORs via a GUC is also fine with me.

> > As just an idea but can't we use an definition file on that
> > LOCATION of error messages that needs to dump a backtrace are
> > listed? That list is usually empty and should be very short if
> > any. The LOCATION information is easily obtained from a verbose
> > error message itself if once shown but a bit hard to find
> > otherwise..
> >
> 
> That's again veering into selective logging control territory. Rather than
> doing it for stack dump control only, it should be part of a broader
> control over dynamic and scoped verbosity, selective logging, and log
> options, like Pavan raised. I see stacks as just one knob that can be
> turned on/off here.
> 
> > (That reminds me, I need to chat with Devrim about creating a longer lived
> > > debuginfo + old versions rpms repo for Pg its self, if not the accessory
> > > bits and pieces. I'm so constantly frustrated by not being able to get
> > > needed debuginfo packages to investigate some core or running system
> > > problem because they've been purged from the PGDG yum repo as soon as a
> > new
> > > version comes out.)
> >
> > We in our department take care to preserve them for ourselves for
> > the necessity of supporting older systems. I sometimes feel that
> > It is very helpful if they were available on the official
> 
> 
> Maybe if I can get some interest in that, you might be willing to
> contribute your archives as a starter so w

Re: automatic restore point

2018-06-25 Thread Isaac Morland
On 25 June 2018 at 21:33, David G. Johnston 
wrote:

> On Mon, Jun 25, 2018 at 6:17 PM, Yotsunaga, Naoki <
> yotsunaga.na...@jp.fujitsu.com> wrote:
>
>> ​​
>> So what do you think about it? Do you think is it useful?
>>
> ​
> I'd rather spend effort making the initial execution of said commands less
> likely.  Something like:
>
> TRUNCATE table YES_I_REALLY_WANT_TO_DO_THIS;
>

I think an optional setting making DELETE and UPDATE without a WHERE clause
illegal would be handy. Obviously this would have to be optional for
backward compatibility. Perhaps even just a GUC setting, with the intent
being that one would set it in .psqlrc so that omitting the WHERE clause at
the command line would just be a syntax error. If one actually does need to
affect the whole table one can just say WHERE TRUE. For applications, which
presumably have their SQL queries tightly controlled and pre-written
anyway, this would most likely not be particularly useful.
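
Something like the following, say (the setting name and table are purely
hypothetical, just to illustrate the idea):

-- hypothetical GUC, for illustration only
SET forbid_unqualified_dml = on;

DELETE FROM accounts;              -- rejected: no WHERE clause
DELETE FROM accounts WHERE true;   -- allowed: the whole-table delete is explicit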


Re: Few cosmetic suggestions for commit 16828d5c (Fast Alter Table Add Column...)

2018-06-25 Thread Amit Kapila
On Fri, Jun 15, 2018 at 9:46 AM, Amit Kapila  wrote:
> On Fri, Jun 15, 2018 at 12:54 AM, Andrew Dunstan
>  wrote:
>>
>> On 06/14/2018 02:01 PM, Alvaro Herrera wrote:
>>>
>>> We used to use prefixes for common struct members names to help
>>> disambiguate across members that would otherwise have identical names in
>>> different structs.  Our convention was to use _ as a separator.  This
>>> convention has been partially lost, but seems we can use it to good
>>> effect here, by renaming ammissingPresent to am_present and ammissing to
>>> am_missing (I would go as far as suggesting am_default or am_substitute
>>> or something like that).
>>
>> am_present and am_value perhaps? I'm not dogmatic about it.
>>
>
> +1.  Attached patch changed the names as per suggestion.
>
>>
>>
>>
>>> BTW I think "the result stored" is correct English.
>>>
>>
>> Yes, it certainly is.
>>
>
> Okay.
>
> How about attached?
>

Andrew, Alvaro, do you think we can go ahead with above naming
suggestions or do we want to brainstorm more on it?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: automatic restore point

2018-06-25 Thread Rui DeSousa
Why not use auto commit off in the session or .psqlrc file or begin and then 
use rollback?  \set AUTOCOMMIT off

What would be nice is if a syntax error didn’t abort the transaction when auto 
commit is off — being a bad typist.








Re: Problem while updating a foreign table pointing to a partitioned table on foreign server

2018-06-25 Thread Kyotaro HORIGUCHI
Hello.

At Fri, 15 Jun 2018 11:19:21 +0530, Ashutosh Bapat 
 wrote in 

> On Tue, Jun 12, 2018 at 8:49 AM, Kyotaro HORIGUCHI
>  wrote:
> > Thanks for the discussion.
> >
> > At Thu, 7 Jun 2018 19:16:57 +0530, Ashutosh Bapat 
> >  wrote in 
> > 
> >> What's the problem that you faced?
> >
> > The required condition for PARAM_EXEC to work is that the executor
> > ensures the correspondence between the setter and the reader of a
> > param, as ExecNestLoop does. The Sort node breaks the
> > correspondence between the tuple obtained from the Foreign Scan
> > and the one that ForeignUpdate is updating. Specifically, Foreign Update
> > updates the first tuple using the tableoid for the last tuple
> > from the Foreign Scan.
> 
> Ok. Thanks for the explanation.
> 
> >
> >> > Even if this worked fine, it cannot be back-patched.  We need an
> >> > extra storage moves together with tuples or prevent sorts or
> >> > something like from being inserted there.
> >>
> >> I think your approach still has the same problem that it's abusing the
> >> tableOid field in the heap tuple to store tableoid from the remote as
> >> well as local table. That's what Robert and Tom objected to [1], [2]
> >
> > It's wrong understanding. PARAM_EXEC conveys remote tableoids
> > outside tuples and each tuple is storing correct (= local)
> > tableoid.
> 
> In the patch I saw that we were setting tableoid field of HeapTuple to
> the remote table oid somewhere. Hence the comment. I might be wrong.

You should have seen make_tuple_from_result_row. The patch sets
the real tableOid on returned tuples since I didn't find any usable
storage for the per-tuple value. Afterwards the parameters are
set from tup->t_tableOid in postgresIterateForeignScan.

ForeignNext overwrites t_tableOid of returned tuples with the
foreign table's OID if system column is requested.

> >> > At Fri, 1 Jun 2018 10:21:39 -0400, Ashutosh Bapat 
> >> >  wrote in 
> >> > 
> >> >> I am not suggesting to commit 0003 in my patch set, but just 0001 and
> >> >> 0002 which just raise an error when multiple rows get updated when
> >> >> only one row is expected to be updated.
> >> >
> >> > So I agree to commit the two at least in order to prevent doing
> >> > wrong silently.
> >>
> >> I haven't heard any committer's opinion on this one yet.
> >>
> >> [1] 
> >> https://www.postgresql.org/message-id/CA+TgmobUHHZiDR=hcu4n30yi9_pe175ittbfk6t8jxzwkra...@mail.gmail.com
> >> [2] https://www.postgresql.org/message-id/7936.1526590932%40sss.pgh.pa.us
> >
> > Agreed. We need any comment to proceed.
> >
> > I have demonstrated and actually shown a problem of the
> > PARAM_EXEC case. (It seems a bit silly that I actually found the
> > problem after it became almost workable, though..)
> 
> I think the general idea behind Tom's suggestion is that we have to
> use some node other than Var node when we update the targetlist with
> junk columns. He suggested Param since that gives us some place to
> store remote tableoid. But if that's not working, another idea (that
> Tom mentioned during our discussion at PGCon) is to invent a new node
> type like ForeignTableOid or something like that, which gets deparsed
> to "tableoid" and evaluated to the table oid on the foreign server.
> That will not work as it is since postgres_fdw code treats a foreign
> table almost like a local table in many ways e.g. it uses attr_used to

I think treating a foreign table as a local object is right. But
anyway it doesn't work.

> know which attributes are to be requested from the foreign server,
> build_tlist_to_deparse() only pulls Var nodes from the targelist of
> foreign table and so on. All of those assumptions will need to change
> with this approach.

Maybe. I agree.

> But good thing is because of join and aggregate
> push-down we already have ability to push arbitrary kinds of nodes
> down to the foreign server through the targetlist. We should be able
> to leverage that capability. It looks like a lot of change, which
> again doesn't seem to be back-portable.

After some struggles, as you know, I have come to agree with that opinion. 
My first impression is that giving (physical) base relations (*1) the
ability to have junk attributes is rather straightforward.

Well, is our conclusion here like this?

- For existing versions, check for the erroneous situation and ERROR out.
  (Documentation will be needed.)

- For 12, we try the above approach.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: automatic restore point

2018-06-25 Thread Justin Pryzby
On Tue, Jun 26, 2018 at 12:04:59AM -0400, Rui DeSousa wrote:
> Why not use auto commit off in the session or .psqlrc file or begin and then 
> use rollback?  \set AUTOCOMMIT off
> 
> What would be nice is if a syntax error didn’t abort the transaction when 
> auto commit is off — being a bad typist.

I think you'll get that behavior with ON_ERROR_ROLLBACK.
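
For example (ON_ERROR_ROLLBACK is a psql variable, settable per session or
in .psqlrc):

\set ON_ERROR_ROLLBACK interactive
-- with this, a failed statement inside an open transaction is rolled back
-- to an implicit savepoint instead of aborting the whole transaction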

Justin



Re: automatic restore point

2018-06-25 Thread Rui DeSousa


> On Jun 26, 2018, at 12:37 AM, Justin Pryzby  wrote:
> 
> I think you'll get that behavior with ON_ERROR_ROLLBACK.
> 

Awesome. Thanks! 



Re: Problem while updating a foreign table pointing to a partitioned table on foreign server

2018-06-25 Thread Ashutosh Bapat
On Tue, Jun 26, 2018 at 9:59 AM, Kyotaro HORIGUCHI
 wrote:
>
>> But good thing is because of join and aggregate
>> push-down we already have ability to push arbitrary kinds of nodes
>> down to the foreign server through the targetlist. We should be able
>> to leverage that capability. It looks like a lot of change, which
>> again doesn't seem to be back-portable.
>
> After some struggles as you know, I agree to the opinion. As my
> first impression, giving (physical) base relations (*1) an
> ability to have junk attribute is rather straightforward.

By giving base relations the ability to have junk attributes, you mean
adding a junk attribute to the targetlist of the DML, something like
postgresAddForeignUpdateTargets() does? You seem to be fine with the new
node approach described above. Just confirming.

>
> Well, is our conclusion here like this?
>
> - For existing versions, check the errorneous situation and ERROR out.
>   (documentaion will be needed.)
>
> - For 12, we try the above thing.

I think we have to see how invasive the fix is, and whether it's
back-portable. If it's back-portable, we back-port it and the problem
is fixed in previous versions as well. If not, we fix previous
versions to ERROR out instead of corrupting the database.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



partition tree inspection functions

2018-06-25 Thread Amit Langote
Hi.

As discussed a little while back [1] and also recently mentioned [2], here
is a patch that adds a set of functions to inspect the details of a
partition tree.  There are three functions:

pg_partition_parent(regclass) returns regclass
pg_partition_root_parent(regclass) returns regclass
pg_partition_tree_tables(regclass) returns setof regclass

Here is an example showing how one may want to use them.

create table p (a int, b int) partition by range (a);
create table p0 partition of p for values from (minvalue) to (0) partition
by hash (b);
create table p00 partition of p0 for values with (modulus 2, remainder 0);
create table p01 partition of p0 for values with (modulus 2, remainder 1);
create table p1 partition of p for values from (0) to (maxvalue) partition
by hash (b);
create table p10 partition of p1 for values with (modulus 2, remainder 0);
create table p11 partition of p1 for values with (modulus 2, remainder 1);
insert into p select i, i from generate_series(-5, 5) i;

select pg_partition_parent('p0') as parent;
 parent

 p
(1 row)

Time: 1.469 ms
select pg_partition_parent('p01') as parent;
 parent

 p0
(1 row)

Time: 1.330 ms
select pg_partition_root_parent('p01') as root_parent;
 root_parent
-------------
 p
(1 row)

select p as relname,
       pg_partition_parent(p) as parent,
       pg_partition_root_parent(p) as root_parent
from   pg_partition_tree_tables('p') p;
 relname | parent | root_parent
---------+--------+-------------
 p       |        | p
 p0      | p      | p
 p1      | p      | p
 p00     | p0     | p
 p01     | p0     | p
 p10     | p1     | p
 p11     | p1     | p
(7 rows)

select p as relname,
       pg_partition_parent(p) as parent,
       pg_partition_root_parent(p) as root_parent,
       pg_relation_size(p) as size
from   pg_partition_tree_tables('p') p;
 relname | parent | root_parent | size
---------+--------+-------------+------
 p       |        | p           |    0
 p0      | p      | p           |    0
 p1      | p      | p           |    0
 p00     | p0     | p           | 8192
 p01     | p0     | p           | 8192
 p10     | p1     | p           | 8192
 p11     | p1     | p           | 8192
(7 rows)


select sum(pg_relation_size(p)) as total_size
from   pg_partition_tree_tables('p') p;
 total_size
------------
      32768
(1 row)

Feedback is welcome!

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/flat/495cec7e-f8d9-7e13-4807-90dbf4eec4ea%40lab.ntt.co.jp

[2]
https://www.postgresql.org/message-id/18e000e8-9bcc-1bb5-2f50-56d434c8be1f%40lab.ntt.co.jp
From 190158bd4b937b1978bfa29e8e9801fa04e0df0d Mon Sep 17 00:00:00 2001
From: amit 
Date: Tue, 16 Jan 2018 19:02:13 +0900
Subject: [PATCH v1] Add assorted partition reporting functions

---
 doc/src/sgml/func.sgml  |  34 ++
 src/backend/catalog/partition.c | 129 ++--
 src/backend/utils/cache/lsyscache.c |  22 ++
 src/include/catalog/partition.h |   1 +
 src/include/catalog/pg_proc.dat |  18 +
 src/include/utils/lsyscache.h   |   1 +
 6 files changed, 201 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 5dce8ef178..df621d1e17 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -19995,6 +19995,40 @@ postgres=# SELECT * FROM 
pg_walfile_name_offset(pg_stop_backup());
 The function returns the number of new collation objects it created.

 
+   
+Partitioning Information Functions
+
+ 
+  Name Return Type 
Description
+ 
+
+ 
+  
+   
pg_partition_root_parent(regclass)
+   regclass
+   get root table of partition tree of which the table is 
part
+  
+  
+   
pg_partition_parent(regclass)
+   regclass
+   get parent table if the table is a partition, 
NULL otherwise
+  
+  
+   
pg_partition_tree_tables(regclass)
+   setof regclass
+   get all tables in partition tree under given root table
+  
+ 
+
+   
+
+   
+If the table passed to pg_partition_root_parent is not
+a partition, the same table is returned as the result.  Result of
+pg_partition_tree_tables also contains the table
+that's passed to it as the first row.
+   
+
   
 
   
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 558022647c..5b3e8d52c5 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -23,13 +23,16 @@
 #include "catalog/partition.h"
 #include "catalog/pg_inherits.h"
 #include "catalog/pg_partitioned_table.h"
+#include "funcapi.h"
 #include "nodes/makefuncs.h"
 #include "optimizer/clauses.h"
 #include "optimizer/prep.h"
 #include "optimizer/var.h"
 #include "partitioning/partbounds.h"
 #include "rewrite/rewriteManip.h"
+#include "utils/builtins.h"
 #include "utils/fmgroids.h"
+#include "utils/lsyscache.h"
 #include "utils/partcache.h"
 #include "utils/rel.h"
 #include "ut

Re: automatic restore point

2018-06-25 Thread Michael Paquier
On Mon, Jun 25, 2018 at 11:01:06PM -0400, Isaac Morland wrote:
> I think an optional setting making DELETE and UPDATE without a WHERE clause
> illegal would be handy. Obviously this would have to be optional for
> backward compatibility. Perhaps even just a GUC setting, with the intent
> being that one would set it in .psqlrc so that omitting the WHERE clause at
> the command line would just be a syntax error. If one actually does need to
> affect the whole table one can just say WHERE TRUE. For applications, which
> presumably have their SQL queries tightly controlled and pre-written
> anyway, this would most likely not be particularly useful.

There was a patch doing exactly that which was discussed last year:
https://commitfest.postgresql.org/13/948/
https://www.postgresql.org/message-id/20160721045746.ga25...@fetter.org
What was proposed was rather limiting though; see my messages on the
thread.  Using a hook, it is simple enough to develop an extension
which does that.
--
Michael


signature.asc
Description: PGP signature


Re: automatic restore point

2018-06-25 Thread Michael Paquier
On Tue, Jun 26, 2018 at 01:17:31AM +, Yotsunaga, Naoki wrote:
> The following is a description of "automatic restore point".
> 【Background】
>   When a DBA operation fails, for example the DBA accidentally drops a
> table, the database is restored from the file system backup and recovered
> by using a time or transaction ID. The transaction ID is identified from
> the WAL.
>   
>   In order to solve the above problem, 
>   I'd like to propose a feature to implement automatic recording
>   of recovery points.

There is also recovery_target_lsn, which is new as of v10.  This
parameter is way better than having to track down a time or XID, which is
a reason why I developed it.  Please note that this is also one of the
reasons why it is possible to delay WAL replay on standbys, so that an
operator has room to fix such operator errors.  Of course, having cold
backups with a proper WAL archive and a correct retention policy never
hurts.
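
As a rough illustration of that workflow (the LSN value below is just an
example), one can note the current LSN before a risky operation and use it
later as the recovery target:

SELECT pg_current_wal_lsn();   -- say this returns 0/3000028
-- then, in recovery.conf on the restored cluster:
--   recovery_target_lsn = '0/3000028'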

> 【Setting file】
>   Set it in postgresql.conf.
>   auto_create_restore_point = on # Switch on/off automatic recording
>   function of recovery point. The default value is 'off'.
> 
> So what do you think about it? Do you think is it useful?

So basically what you are looking for here is a way to enforce a restore
point to be created depending on a set of pre-defined conditions?  How
would you define and choose those?

> Also, when recovering with the current specification, tables other
> than the returned table also return to the state of the specified
> recovery point.
> So, I’m looking for ways to recover only specific tables. Do you have
> any ideas? 

Why not use the utility hook to filter out commands you'd like to forbid,
in this case TRUNCATE or DROP TABLE on a given relation?  Or why not
simply use an event trigger at your application level so that you can
actually *prevent* the error from happening in the first place?  With
the latter option you don't have to write C code, but it would not
filter TRUNCATE.  In short, what you propose looks over-complicated to
me, and there are options on the table which keep the problem you are
trying to solve from happening at all.  You could also use the utility
hook to log or register somewhere the XID/time/LSN associated with a given
command and then use it as your restore point.  This could also happen
out of core.
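
As a rough, untested sketch of the event trigger idea (the function and
trigger names here are made up), one could record a restore point just
before any DROP TABLE, or raise an error there instead to block the command
outright:

CREATE FUNCTION restore_point_before_drop() RETURNS event_trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- record an LSN that can later be used as a recovery target
    PERFORM pg_create_restore_point('before_' || tg_tag || '_' || now());
END;
$$;

CREATE EVENT TRIGGER auto_restore_point
    ON ddl_command_start
    WHEN TAG IN ('DROP TABLE')
    EXECUTE PROCEDURE restore_point_before_drop();

Note that pg_create_restore_point() is restricted to superusers unless
EXECUTE is granted, and, as said above, this does not catch TRUNCATE.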
--
Michael


signature.asc
Description: PGP signature


Re: Concurrency bug in UPDATE of partition-key

2018-06-25 Thread Amit Khandekar
On 25 June 2018 at 17:20, Amit Kapila  wrote:
> On Mon, Jun 25, 2018 at 3:06 PM, Amit Khandekar  
> wrote:
>> On 23 June 2018 at 15:46, Amit Kapila  wrote:

>>>
>>> Why do you need to update when newslot is NULL?  Already *epqslot is
>>> initialized as NULL in the caller (ExecUpdate). I think we only want
>>> to update it when trigtuple is not null.
>>
>> But GetTupleForTrigger() updates the epqslot to NULL even when it
>> returns NULL. So to be consistent with it, we want to do the same
>> thing for ExecBRDeleteTriggers() : Update the epqslot even when
>> GetTupleForTrigger() returns NULL.
>>
>> I think the reason we are doing "*newSlot = NULL" as the very first
>> thing in the GetTupleForTrigger() code, is so that callers don't have
>> to initialise newSlot to NULL before calling GetTupleForTrigger. And
>> so ExecBRUpdateTriggers() does not initialize newSlot with NULL. We
>> can follow the same approach while calling ExecDelete().
>>
>
> Yeah, we can do that if it is required.

It is not required as such, and there is no correctness issue.

> I see your point, but I feel that is making code bit less readable.

I did it that way only to be consistent with the existing trigger.c
code, namely ExecBRUpdateTriggers() where it sets newSlot to NULL
immediately. If you find some appropriate comments to make that
snippet more readable, that can work.

Or else, we can go ahead with the way you did it in your patch, and
additionally mention in the ExecBRDeleteTriggers() header comments
that the epqslot value is undefined if no concurrently updated tuple
was found. To me, though, having to explain that last part seems confusing.

>
>> I understand that before calling ExecDelete() epqslot is initialized
>> to NULL, so it is again not required to do the same inside
>> GetTupleForTrigger(), but as per my above explanation, it is actually
>> not necessary to initialize to NULL before calling ExecDelete(). So I
>> can even remove that initialization.
>>
>
> If you remove that initialization, then won't you need to do something
> in ExecDelete() as well because there the patch assigns a value to
> epqslot only if EvalPlanQual returns non-null value?

Ah, right. So skipping the initialization before calling ExecDelete()
is not a good idea then.



-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company



Re: Loaded footgun open_datasync on Windows

2018-06-25 Thread Michael Paquier
On Mon, Jun 25, 2018 at 06:10:21PM +0200, Laurenz Albe wrote:
> I have added it to the July commitfest.

Have you looked at the possibility of removing the log file constraints
in pg_upgrade with the change you are doing here, so that things would be
more consistent with non-Windows platforms, simplifying some code on the
way?  This could also be exercised with "vcregress upgradecheck" to see if
the concurrency works as expected, validating the proposed patch...
--
Michael


signature.asc
Description: PGP signature


Re: Regarding the correct and best way to fetching a tablename in contrib module

2018-06-25 Thread Michael Paquier
On Mon, Jun 25, 2018 at 07:06:06PM +0530, Gaurav Mishra wrote:
> here is what I expect as output:
> 
> create extension table_name-extract;
> 
> select extract_tablename();
> 
> table-name  datatype-name
> new_table  name new
> new_table1 name1 new1
> 
> extract_tablename: this function should give me the table names with their
> data types inside the contrib module so that I can use them in the extension.

It is possible to know the set of objects registered within an
extension using a lookup of the system catalogs.  The internal set of
queries used by psql is full of good patterns, so why not look at
what psql -E does when using psql shortcuts and build up a query which
does what you are looking for?  If you filter the set of objects by type
(a table), then you can look at pg_attribute and get all the column names
you need.  There is also no special need to write C code for
that.
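
For instance, something along these lines (untested, and the extension name
is just a placeholder) lists the tables owned by an extension together with
their column names and data types, without any C code:

SELECT c.relname AS table_name,
       a.attname AS column_name,
       format_type(a.atttypid, a.atttypmod) AS data_type
FROM pg_depend d
     JOIN pg_extension e ON e.oid = d.refobjid
     JOIN pg_class c ON c.oid = d.objid
     JOIN pg_attribute a ON a.attrelid = c.oid
WHERE d.refclassid = 'pg_extension'::regclass
      AND d.classid = 'pg_class'::regclass
      AND d.deptype = 'e'
      AND e.extname = 'my_extension'
      AND c.relkind = 'r'
      AND a.attnum > 0
      AND NOT a.attisdropped
ORDER BY c.relname, a.attnum;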
--
Michael


signature.asc
Description: PGP signature


"Access privileges" is missing after pg_dumpall

2018-06-25 Thread Prabhat Sahu
Hi,

I have taken a pg_dumpall dump on pg-master, and after restoring the dump I
am not able to see the "Access privileges", as shown below.
The same is reproducible in back branches as well; is this fine?


CREATE ROLE user1 PASSWORD 'user1' SUPERUSER LOGIN;
CREATE DATABASE db1 OWNER=user1;
GRANT ALL ON DATABASE db1 TO user1;

postgres=# \l+ db1
                                              List of databases
 Name | Owner | Encoding |  Collate   |   Ctype    | Access privileges |  Size   | Tablespace | Description
------+-------+----------+------------+------------+-------------------+---------+------------+-------------
 db1  | user1 | UTF8     | en_US.utf8 | en_US.utf8 | =Tc/user1        +| 7621 kB | pg_default |
      |       |          |            |            | user1=CTc/user1   |         |            |
(1 row)

postgres=# SELECT datname as "Relation", datacl as "Access permissions"
FROM pg_database WHERE datname = 'db1';
 Relation |     Access permissions
----------+-----------------------------
 db1      | {=Tc/user1,user1=CTc/user1}
(1 row)


-- pg_dumpall
./pg_dumpall > /tmp/dumpall.sql

-- Restore
./psql -a -f /tmp/dumpall.sql


postgres=# \l+ db1
                                              List of databases
 Name | Owner | Encoding |  Collate   |   Ctype    | Access privileges |  Size   | Tablespace | Description
------+-------+----------+------------+------------+-------------------+---------+------------+-------------
 db1  | user1 | UTF8     | en_US.utf8 | en_US.utf8 |                   | 7699 kB | pg_default |
(1 row)

postgres=# SELECT datname as "Relation", datacl as "Access permissions"
FROM pg_database WHERE datname = 'db1';
 Relation | Access permissions
----------+--------------------
 db1      |
(1 row)

--

With Regards,

Prabhat Kumar Sahu
Skype ID: prabhat.sahu1984
EnterpriseDB Corporation

The Postgres Database Company


Global shared meta cache

2018-06-25 Thread Ideriha, Takeshi
Hi, hackers!

My customer created hundreds of thousands of partition tables and selected
data from them through hundreds of applications, which resulted in enormous
memory consumption: the number of backends multiplied by the size of the
local cache per backend (e.g. 100 backends x 1GB = 100GB).
Relation caches are loaded into each backend's local memory.

To address this issue I'm trying to move meta caches like catcache or relcache 
into shared memory.

This topic seems to have been discussed several times.
For instance this thread: 
https://www.postgresql.org/message-id/CA%2BTgmobjDw_SWsxyJwT9z-YOwWv0ietuQx5fb%3DWEYdDfvCbzGQ%40mail.gmail.com
 

In my understanding, it discussed moving the catcache and relcache to shared
memory rather than the current backend-local memory, and the main concern
was the performance overhead.

Robert Haas wrote:
> I think it would be interesting for somebody to build a prototype here
> that ignores all the problems but the first and uses some
> straightforward, relatively unoptimized locking strategy for the first
> problem. Then benchmark it. If the results show that the idea has
> legs, then we can try to figure out what a real implementation would
> look like.
> (One possible approach: use Thomas Munro's DHT stuff to build the shared 
> cache.)

Inspired by this comment, I am now developing a prototype (please see
attached), but I haven't yet put the cache structure in shared memory.
Instead, I put dummy data in shared memory which is initialized at startup,
and then acquire/release a lock just before/after searching for or creating
a catcache entry.

I haven't considered the relcache or the catcache lists either.
It is difficult for me to do everything at once in the right direction, so
I'm starting with a small prototype to check whether I'm heading the proper
way.

I tested pgbench to compare master branch with my patch. 

0) Environment
   - RHEL 7.4
   - 16 cores
   - 128 GB memory

1) Initialized with pgbench -i -s10

2) Benchmarked 3 times for each condition and took the average TPS.

                                    | master branch | prototype | proto/master (%)
    --------------------------------+---------------+-----------+------------------
    pgbench -c48 -T60 -Msimple -S   | 131297        | 130541    | 101%
    pgbench -c48 -T60 -Msimple      | 4956          | 4965      | 95%
    pgbench -c48 -T60 -Mprepared -S | 129688        | 132538    | 97%
    pgbench -c48 -T60 -Mprepared    | 5113          | 4615      | 84%

   This result seems to show that, except for the prepared protocol with a
   workload that is not SELECT-only, it didn't make much difference.
   

What do you think about it?
Before I dig deeper, I want to hear your thoughts.

Best regards,
Takeshi Ideriha



001_global_meta_cache.patch
Description: 001_global_meta_cache.patch