Re: libpq compression

2020-11-01 Thread Konstantin Knizhnik

Hi

On 01.11.2020 12:37, Daniil Zakhlystov wrote:

Hi,

I have a couple of comments regarding the last patch, mostly these are 
minor issues.


In src/backend/libpq/pqcomm.c, starting from the line 1114:

int
pq_getbyte_if_available(unsigned char *c)
{
int         r;

Assert(PqCommReadingMsg);

if (PqRecvPointer < PqRecvLength || (0) > 0)   // not easy to 
understand optimization (maybe add a comment?)

{
*c = PqRecvBuffer[PqRecvPointer++];
return 1;
}
return r;             // returned value is not initialized
}


Sorry, I don't understand it.
This is the code of pq_getbyte_if_available:

int
pq_getbyte_if_available(unsigned char *c)
{
    int            r;

    Assert(PqCommReadingMsg);

    if (PqRecvPointer < PqRecvLength || (r = pq_recvbuf(true)) > 0)
    {
        *c = PqRecvBuffer[PqRecvPointer++];
        return 1;
    }
    return r;
}


So "return r" branch is executed when both conditions are false: 
(PqRecvPointer < PqRecvLength)

and ((r = pq_recvbuf(true)) > 0)
Last condition cause assignment of "r" variable.

I wonder how did you get this "returned value is not initialized" warning?
Is it produced by some static analyze tool or compiler?

In any case, I will initialize "r" variable to make compiler happy.



In src/interfaces/libpq/fe-connect.c, starting from the line 3255:

pqGetc(&algorithm, conn);
impl = zpq_get_algorithm_impl(algorithm);
{// I believe that if (impl < 0) condition is missing here, otherwise 
there is always an error

   appendPQExpBuffer(&conn->errorMessage,
                 libpq_gettext(
"server is not supported requested compression algorithm %c\n"), 
algorithm);

   goto error_return;
}



Sorry, I fixed this typo several days ago in the Git repository
 g...@github.com:postgrespro/libpq_compression.git
but didn't attach a new version of the patch because I plan to make more
changes as a result of Andres' review.





In configure, starting from the line 1587:

--without-zlib          do not use Zlib
--with-zstd             do not use zstd // is this correct?



Thank you for noting it: fixed.


diff --git a/configure b/configure
index ace4ed5dec..deba608bcf 100755
--- a/configure
+++ b/configure
@@ -700,6 +700,7 @@ LD
 LDFLAGS_SL
 LDFLAGS_EX
 with_zlib
+with_zstd
 with_system_tzdata
 with_libxslt
 XML2_LIBS
@@ -867,6 +868,7 @@ with_libxml
 with_libxslt
 with_system_tzdata
 with_zlib
+with_zstd
 with_gnu_ld
 enable_largefile
 '
@@ -8571,6 +8573,85 @@ fi
 
 
 
+#
+# ZStd
+#
+
+
+
+# Check whether --with-zstd was given.
+if test "${with_zstd+set}" = set; then :
+  withval=$with_zstd;
+  case $withval in
+yes)
+  ;;
+no)
+  :
+  ;;
+*)
+  as_fn_error $? "no argument expected for --with-zstd option" "$LINENO" 5
+  ;;
+  esac
+
+else
+  with_zstd=no
+
+fi
+
+
+
+
+if test "$with_zstd" = yes ; then
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking for ZSTD_compress in -lzstd" >&5
+$as_echo_n "checking for ZSTD_compress in -lzstd... " >&6; }
+if ${ac_cv_lib_zstd_ZSTD_compress+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  ac_check_lib_save_LIBS=$LIBS
+LIBS="-lzstd  $LIBS"
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+/* Override any GCC internal prototype to avoid an error.
+   Use char because int might match the return type of a GCC
+   builtin and then its argument prototype would still apply.  */
+#ifdef __cplusplus
+extern "C"
+#endif
+char ZSTD_compress ();
+int
+main ()
+{
+return ZSTD_compress ();
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_link "$LINENO"; then :
+  ac_cv_lib_zstd_ZSTD_compress=yes
+else
+  ac_cv_lib_zstd_ZSTD_compress=no
+fi
+rm -f core conftest.err conftest.$ac_objext \
+conftest$ac_exeext conftest.$ac_ext
+LIBS=$ac_check_lib_save_LIBS
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_zstd_ZSTD_compress" >&5
+$as_echo "$ac_cv_lib_zstd_ZSTD_compress" >&6; }
+if test "x$ac_cv_lib_zstd_ZSTD_compress" = xyes; then :
+  cat >>confdefs.h <<_ACEOF
+#define HAVE_LIBZSTD 1
+_ACEOF
+
+  LIBS="-lzstd $LIBS"
+
+else
+  as_fn_error $? "library 'zstd' is required for ZSTD support" "$LINENO" 5
+fi
+
+fi
+
+
 
 #
 # Zlib
diff --git a/configure.ac b/configure.ac
index 5b91c83fd0..93a528539a 100644
--- a/configure.ac
+++ b/configure.ac
@@ -999,6 +999,13 @@ PGAC_ARG_BOOL(with, zlib, yes,
   [do not use Zlib])
 AC_SUBST(with_zlib)
 
+#
+# Zstd
+#
+PGAC_ARG_BOOL(with, zstd, no,
+  [use zstd])
+AC_SUBST(with_zstd)
+
 #
 # Assignments
 #
@@ -1186,6 +1193,14 @@ failure.  It is possible the compiler isn't looking in the proper directory.
 Use --without-zlib to disable zlib support.])])
 fi
 
+if test "$with_zstd" = yes; then
+  AC_CHECK_LIB(zstd, ZSTD_decompressStream, [],
+   [AC_MSG_ERROR([zstd library not found
+If you have zstd already installed, see config.log for details on the
+failure.  It is possible the compiler isn't looking in the proper directory.
+Use --without-zstd to disable zstd support.])])
+fi
+
 if test "$enable_spinlocks" = yes; then
  

Re: Consistent error reporting for encryption/decryption in pgcrypto

2020-11-01 Thread Michael Paquier
On Sat, Oct 31, 2020 at 09:40:12PM +0100, Daniel Gustafsson wrote:
> Ah yes, I accidentally fat-fingered the git add -p when splitting up the NSS
> patch into bite size pieces. Sorry about that. The attached v2 has the error
> declaration.

Thanks for updating the patch.  Applied.
--
Michael




Re: libpq compression

2020-11-01 Thread Daniil Zakhlystov
Hi,

I have a couple of comments regarding the last patch, mostly these are minor 
issues.

In src/backend/libpq/pqcomm.c, starting from the line 1114:

int
pq_getbyte_if_available(unsigned char *c)
{
int r;

Assert(PqCommReadingMsg);

if (PqRecvPointer < PqRecvLength || (0) > 0)   // not easy to 
understand optimization (maybe add a comment?)
{
*c = PqRecvBuffer[PqRecvPointer++];
return 1;
}
return r; // returned value is not initialized
}

In src/interfaces/libpq/fe-connect.c, starting from the line 3255:

pqGetc(&algorithm, conn);
impl = zpq_get_algorithm_impl(algorithm);
{   
 // I believe that if (impl < 0) condition is missing here, otherwise there is 
always an error
   appendPQExpBuffer(&conn->errorMessage,
 libpq_gettext(
"server is not supported 
requested compression algorithm %c\n"), algorithm);
   goto error_return;
}

In configure, starting from the line 1587:

--without-zlib  do not use Zlib
--with-zstd do not use zstd // is this correct?

Thanks

--
Daniil Zakhlystov

> On Nov 1, 2020, at 12:08 AM, Konstantin Knizhnik  
> wrote:
> 
> Hi
> 
> On 31.10.2020 00:03, Andres Freund wrote:
>> Hi,
>> 
>> On 2020-10-29 16:45:58 +0300, Konstantin Knizhnik wrote:
>>>> - What does " and it is up to the client whether to continue work
>>>> without compression or report error" actually mean for a libpq
>>>> parameter?
>>> It cannot happen.
>>> The client requests from the server use of the compressed protocol only if
>>> "compression=XXX" was specified in the connection string.
>>> But XXX should be supported by the client, otherwise this request will be
>>> rejected.
>>> So the supported-protocols string sent by the client can never be empty.
>> I think it's pretty important for features like this to be able to fail
>> softly when the feature is not available on the other side. Otherwise a
>> lot of applications need to have unnecessary version dependencies coded
>> into them.
> Sorry, maybe I do not completely understand your suggestion.
> Right now the user just specifies that they want to use compression.
> The libpq client sends the server a list of supported algorithms,
> and the server chooses among them the best one that it supports.
> It sends its choice to the client, and both then use this algorithm.
> 
> Sorry that in the previous mail I used incorrect examples: the client does not
> explicitly specify a compression algorithm -
> it just requests compression. The server then chooses the most efficient algorithm
> which is supported by both client and server.
> So the client does not need to know the names of the particular algorithms (i.e. zlib,
> zstd), and
> the choice is based on the assumption that the server (or better say the programmer)
> knows better than the user which algorithm is more efficient.
> The last assumption may be contested, because the user knows better which content will
> be sent and which
> algorithm is more efficient for that content. But right now, when the choice
> is between zstd and zlib,
> the first one is always better: faster and with better compression.
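
For illustration, the negotiation described above might look roughly like the
following standalone sketch (the names are invented and this is not the
patch's zpq code):

#include <stdio.h>
#include <string.h>

/*
 * The server walks its own preference-ordered list and picks the first
 * algorithm that also appears in the list sent by the client.  Returns
 * NULL if the two sides have no algorithm in common.
 */
static const char *
choose_compression_algorithm(const char *const *server_prefs,
                             const char *const *client_algos)
{
    for (int i = 0; server_prefs[i] != NULL; i++)
        for (int j = 0; client_algos[j] != NULL; j++)
            if (strcmp(server_prefs[i], client_algos[j]) == 0)
                return server_prefs[i];
    return NULL;
}

int
main(void)
{
    const char *const server_prefs[] = {"zstd", "zlib", NULL};
    const char *const client_algos[] = {"zlib", NULL};
    const char *chosen = choose_compression_algorithm(server_prefs, client_algos);

    printf("chosen: %s\n", chosen ? chosen : "(none)");
    return 0;
}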
> 
>> 
>>>> - What is the point of the "streaming mode" reference?
>>> There are two ways of performing compression:
>>> - block mode, where each block is individually compressed (the compressor stores
>>> a dictionary in each compressed block)
>>> - stream mode
>>> Block mode allows each page to be decompressed independently. It is good for
>>> implementing page or field compression (as pglz is used to compress toast
>>> values).
>>> But it is not efficient for compressing client-server protocol commands.
>>> It seems to me important to explain that libpq is using stream mode
>>> and why there is no pglz compressor.
>> To me that seems like unnecessary detail in the user oriented parts of
>> the docs at least.
>> 
> Ok, I will remove this phrase.
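
To make the stream-mode point concrete, here is a small standalone sketch
(assuming libzstd is installed; build with -lzstd).  It is illustrative only
and not the patch's zpq code: a single long-lived ZSTD_CCtx is reused for
every message, so later messages can reference history from earlier ones,
which per-block compression cannot do.

#include <stdio.h>
#include <string.h>
#include <zstd.h>

int
main(void)
{
    const char *messages[] = {
        "SELECT * FROM pgbench_accounts;",
        "SELECT * FROM pgbench_accounts WHERE aid = 1;"
    };
    char        out[1024];
    ZSTD_CCtx  *cctx = ZSTD_createCCtx();

    ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, 3);

    for (int i = 0; i < 2; i++)
    {
        ZSTD_inBuffer in = {messages[i], strlen(messages[i]), 0};
        ZSTD_outBuffer output = {out, sizeof(out), 0};

        /* flush after each message so the peer can decompress it promptly */
        size_t      rc = ZSTD_compressStream2(cctx, &output, &in, ZSTD_e_flush);

        if (ZSTD_isError(rc))
            return 1;
        printf("message %d: %zu bytes in, %zu bytes out\n",
               i + 1, in.size, output.pos);
    }

    ZSTD_freeCCtx(cctx);
    return 0;
}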
> 
>>>> Why are compression methods identified by one-byte identifiers? That
>>>> seems unnecessarily small, given this is commonly a once-per-connection
>>>> action?
>>> It is mostly for simplicity of implementation: it is always simpler to work
>>> with fixed-size messages (or with an array of chars rather than an array of
>>> strings).
>>> And I do not think that it decreases flexibility in any way: these one-letter
>>> algorithm codes are not visible to the user. And I do not think that we
>>> will ever support more than 127 (or even 64) different compression
>>> algorithms.
>> It's pretty darn trivial to have a variable width protocol message,
>> given that we have all the tooling for that. Having a variable length
>> descriptor allows us to extend the compression identifier with e.g. the
>> level without needing to change the protocol. E.g. zstd:9 or
>> zstd:level=9 or such.
>> 
>> I suggest using a variable length string as the identifier, and split it
>> at the first : for the lookup, and pas
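
A minimal sketch of the split-at-the-first-colon lookup suggested above
(the function names are made up for illustration and not taken from any
posted patch):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Split a descriptor such as "zstd:level=9" into the algorithm name used
 * for the lookup and an optional options string that is passed through to
 * the algorithm-specific setup code.
 */
static void
parse_compression_descriptor(const char *descriptor,
                             char **algorithm, char **options)
{
    const char *colon = strchr(descriptor, ':');

    if (colon == NULL)
    {
        *algorithm = strdup(descriptor);
        *options = NULL;
    }
    else
    {
        size_t      len = (size_t) (colon - descriptor);

        *algorithm = malloc(len + 1);
        memcpy(*algorithm, descriptor, len);
        (*algorithm)[len] = '\0';
        *options = strdup(colon + 1);
    }
}

int
main(void)
{
    char       *algorithm;
    char       *options;

    parse_compression_descriptor("zstd:level=9", &algorithm, &options);
    printf("algorithm=%s options=%s\n", algorithm,
           options ? options : "(none)");

    free(algorithm);
    free(options);
    return 0;
}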

Re: libpq compression

2020-11-01 Thread Daniil Zakhlystov
It looks like there was a different version of pq_getbyte_if_available on my side,
my bad.

I’ve reapplied the patch and the compiler is happy now, thanks!

—
Daniil Zakhlystov

> On Nov 1, 2020, at 3:01 PM, Konstantin Knizhnik  
> wrote:
> 
> Hi
> 
> On 01.11.2020 12:37, Daniil Zakhlystov wrote:
>> Hi,
>> 
>> I have a couple of comments regarding the last patch, mostly these are minor 
>> issues.
>> 
>> In src/backend/libpq/pqcomm.c, starting from the line 1114:
>> 
>> int
>> pq_getbyte_if_available(unsigned char *c)
>> {
>>  int r;
>> 
>>  Assert(PqCommReadingMsg);
>> 
>>  if (PqRecvPointer < PqRecvLength || (0) > 0)   // not easy to 
>> understand optimization (maybe add a comment?)
>>  {
>>  *c = PqRecvBuffer[PqRecvPointer++];
>>  return 1;
>>  }
>>  return r; // returned value is not initialized
>> }
> 
> Sorry, I don't understand it.
> This is the code of pq_getbyte_if_available:
> 
> int
> pq_getbyte_if_available(unsigned char *c)
> {
> int         r;
> 
> Assert(PqCommReadingMsg);
> 
> if (PqRecvPointer < PqRecvLength || (r = pq_recvbuf(true)) > 0)
> {
> *c = PqRecvBuffer[PqRecvPointer++];
> return 1;
> }
> return r;
> }
> 
> 
> So "return r" branch is executed when both conditions are false: 
> (PqRecvPointer < PqRecvLength)
> and ((r = pq_recvbuf(true)) > 0)
> Last condition cause assignment of "r" variable.
> 
> I wonder how did you get this "returned value is not initialized" warning?
> Is it produced by some static analyze tool or compiler? 
> 
> In any case, I will initialize "r" variable to make compiler happy.
> 
> 
>> In src/interfaces/libpq/fe-connect.c, starting from the line 3255:
>> 
>> pqGetc(&algorithm, conn);
>> impl = zpq_get_algorithm_impl(algorithm);
>> {
>>  // I believe that if (impl < 0) condition is missing here, otherwise there 
>> is always an error
>>appendPQExpBuffer(&conn->errorMessage,
>>  libpq_gettext(
>> "server is not supported 
>> requested compression algorithm %c\n"), algorithm);
>>goto error_return;
>> }
>> 
> 
> Sorry, I fixed this typo several days ago in the Git repository
>  g...@github.com:postgrespro/libpq_compression.git 
> 
> but didn't attach a new version of the patch because I plan to make more changes
> as a result of Andres' review.
> 
> 
> 
>> In configure, starting from the line 1587:
>> 
>> --without-zlib  do not use Zlib
>> --with-zstd do not use zstd // is this correct?
>> 
> 
> Thank you for noting it: fixed.
> 
> 
> 



Re: Add important info about ANALYZE after create Functional Index

2020-11-01 Thread Michael Paquier
On Sun, Nov 01, 2020 at 09:23:44AM +0900, Michael Paquier wrote:
> By doing so, there is no need to include pg_statistic.h in index.c.
> Except that, the logic looks fine at quick glance.  In the long-term,
> I also think that it would make sense to move both routines out of
> heap.c into a separate pg_statistic.c.  That's material for a separate
> patch of course.

I have looked again at that, and applied it after some tweaks.
Particularly, I did not really like the use of "old" and "new" for the
copy from the old to a new relation in the new function, so I have
replaced that by "from" and "to".
--
Michael




how to replicate test results in cf-bot on travis

2020-11-01 Thread Dave Cramer
For my patch https://commitfest.postgresql.org/30/2522/

When I run

make -j4 all contrib && make check-world
locally

I see 2 errors.

When cf-bot runs this it sees
35 out of 93 failed.

How can I see the same errors?

Dave Cramer


Re: Support for NSS as a libpq TLS backend

2020-11-01 Thread Andrew Dunstan


On 10/29/20 11:20 AM, Daniel Gustafsson wrote:
>> On 28 Oct 2020, at 07:39, Andres Freund  wrote:
>> Have you done testing to ensure that NSS PG cooperates correctly with
>> openssl PG? Is there a way we can make that easier to do? E.g. allowing
>> to build frontend with NSS and backend with openssl and vice versa?
> When I wrote the Secure Transport patch I had a patch against PostgresNode
> which allowed for overriding the server binaries like so:
>
>SSLTEST_SERVER_BIN=/path/bin/ make -C src/test/ssl/ check
>
> I've used that coupled with manual testing so far to make sure that an openssl
> client can talk to an NSS backend and so on.  Before any other backend is 
> added
> we clearly need *a* way of doing this, one which no doubt will need to be
> improved upon to suit more workflows.
>
> This is sort of the same situation as pg_upgrade, where two trees are needed to
> really test it.
>
> I can clean that patch up and post as a starting point for discussions.
>
> if test "$with_openssl" = yes ; then
> +  if test x"$with_nss" = x"yes" ; then
> +AC_MSG_ERROR([multiple SSL backends cannot be enabled 
> simultaneously"])
> +  fi
>>>> Based on a quick look there's no similar error check for the msvc
>>>> build. Should there be?
>>> That's a good question.  When embarking on this it seemed quite natural to me
>>> that it should be, but now I'm not so sure.  Maybe there should be a
>>> --with-openssl-preferred like how we handle readline/libedit or just allow
>>> multiple and let the last one win?  Do you have any input on what would make
>>> sense?
>>>
>>> The only thing I think makes no sense is to allow multiple ones at the same
>>> time given the current autoconf switches, even if it would just be to pick 
>>> say
>>> pg_strong_random from one and libpq TLS from another.
>> Maybe we should just have --with-ssl={openssl,nss}? That'd avoid needing
>> to check for errors.
> That's another option, with --with-openssl being an alias for 
> --with-ssl=openssl.
>
> After another round of thinking I like this even better as it makes the build
> infra cleaner, so the attached patch has this implemented.
>
>> Even better, of course, would be to allow switching of the SSL backend
>> based on config options (PGC_POSTMASTER GUC for backend, connection
>> string for frontend). Mainly because that would make testing of
>> interoperability so much easier.  Obviously still a few places like
>> pgcrypto, randomness, etc, where only a compile time decision seems to
>> make sense.
> It would make testing easier, but the expense seems potentially rather high.
> How would a GUC switch be allowed to operate: would we have mixed backends, or
> would we require all openssl connections to be dropped before serving nss ones?
>
> +  CLEANLDFLAGS="$LDFLAGS"
> +  # TODO: document this set of LDFLAGS
> +  LDFLAGS="-lssl3 -lsmime3 -lnss3 -lplds4 -lplc4 -lnspr4 $LDFLAGS"
>>>> Shouldn't this use nss-config or such?
>>> Indeed it should, where available.  I've added rudimentary support for that
>>> without a fallback as of now.
>> When would we need a fallback?
> On one of my boxes I have NSS/NSPR installed via homebrew and they don't ship
> an nss-config AFAICT. I wouldn't be surprised if there are other cases.
>
>>>> I think it'd also be better if we could include these files as nss/ssl.h
>>>> etc - ssl.h is a name way too likely to conflict imo.
>>> I've changed this to be nss/ssl.h and nspr/nspr.h etc, but the include path
>>> will still need the direct path to the headers (from autoconf) since nss.h
>>> includes NSPR headers as #include  and so on.
>> Hm. Then it's probably not worth going there...
> It does however make visual parsing of the source files easier since it's clear
> which ssl.h is being referred to.  I'm in favor of keeping it.
>
> +static SECStatus
> +pg_cert_auth_handler(void *arg, PRFileDesc * fd, PRBool checksig, PRBool 
> isServer)
> +{
> + SECStatus   status;
> + Port   *port = (Port *) arg;
> + CERTCertificate *cert;
> + char   *peer_cn;
> + int len;
> +
> + status = SSL_AuthCertificate(CERT_GetDefaultCertDB(), port->pr_fd, 
> checksig, PR_TRUE);
> + if (status == SECSuccess)
> + {
> + cert = SSL_PeerCertificate(port->pr_fd);
> + len = strlen(cert->subjectName);
> + peer_cn = MemoryContextAllocZero(TopMemoryContext, len + 1);
> + if (strncmp(cert->subjectName, "CN=", 3) == 0)
> + strlcpy(peer_cn, cert->subjectName + strlen("CN="), len 
> + 1);
> + else
> + strlcpy(peer_cn, cert->subjectName, len + 1);
> + CERT_DestroyCertificate(cert);
> +
> + port->peer_cn = peer_cn;
> + port->peer_cert_valid = true;
 Hm. We either should have something similar to

/*
 * Reject embedded NULLs in certificate com

Re: psql \df choose functions by their arguments

2020-11-01 Thread Greg Sabino Mullane
Thanks for the feedback, attached is version two of the patch. Major
changes:

* Use booleans not generic "int x"
* Build a quick list of abbreviations at the top of the function
* Add array mapping for all types
* Removed the tab-complete bit, it was too fragile and unhelpful

Cheers,
Greg
diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 221a967bfe..cf3e8c7134 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -1574,7 +1574,7 @@ testdb=>
 
 
   
-\df[anptwS+] [ pattern ]
+\df[anptwS+] [ pattern ] [ types ]
 
 
 
@@ -1587,6 +1587,7 @@ testdb=>
 If pattern is specified, only
 functions whose names match the pattern are shown.
+Any additional words are considered type arguments to help narrow the list of returned functions.
 By default, only user-created
 objects are shown; supply a pattern or the S
 modifier to include system objects.
@@ -1598,7 +1599,7 @@ testdb=>
 
 
 
-To look up functions taking arguments or returning values of a specific
+To look up functions returning values of a specific
 data type, use your pager's search capability to scroll through the
 \df output.
 
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index c7a83d5dfc..426603b0cb 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -783,6 +783,8 @@ exec_command_d(PsqlScanState scan_state, bool active_branch, const char *cmd)
 			case 'f':			/* function subsystem */
 switch (cmd[2])
 {
+		char	   *funcargs;
+
 	case '\0':
 	case '+':
 	case 'S':
@@ -791,7 +793,9 @@ exec_command_d(PsqlScanState scan_state, bool active_branch, const char *cmd)
 	case 'p':
 	case 't':
 	case 'w':
-		success = describeFunctions(&cmd[2], pattern, show_verbose, show_system);
+		funcargs = psql_scan_slash_option(scan_state, OT_WHOLE_LINE, NULL, true);
+		success = describeFunctions(&cmd[2], pattern, show_verbose, show_system, funcargs);
+		free(funcargs);
 		break;
 	default:
 		status = PSQL_CMD_UNKNOWN;
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 07d640021c..a8d3f3ba53 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -26,6 +26,7 @@
 #include "fe_utils/print.h"
 #include "fe_utils/string_utils.h"
 #include "settings.h"
+#include "stringutils.h"
 #include "variables.h"
 
 static bool describeOneTableDetails(const char *schemaname,
@@ -312,7 +313,7 @@ describeTablespaces(const char *pattern, bool verbose)
  * and you can mix and match these in any order.
  */
 bool
-describeFunctions(const char *functypes, const char *pattern, bool verbose, bool showSystem)
+describeFunctions(const char *functypes, const char *pattern, bool verbose, bool showSystem, const char *funcargs)
 {
 	bool		showAggregate = strchr(functypes, 'a') != NULL;
 	bool		showNormal = strchr(functypes, 'n') != NULL;
@@ -626,6 +627,67 @@ describeFunctions(const char *functypes, const char *pattern, bool verbose, bool
 		  "n.nspname", "p.proname", NULL,
 		  "pg_catalog.pg_function_is_visible(p.oid)");
 
+	/*
+	 * Check for any additional arguments to narrow down which functions are
+	 * desired
+	 */
+	if (funcargs)
+	{
+
+		bool		is_initial_run = true;
+		bool		found_abbreviation;
+		int			argoffset = 0;
+		char	   *functoken;
+
+		static const char *type_abbreviations[] = {
+			"bool", "boolean", "bool[]", "boolean[]",
+			"char", "character", "char[]", "character[]",
+			"double", "double precision", "double[]", "double precision[]",
+			"float", "double precision", "float[]", "double precision[]",
+			"int", "integer", "int[]", "integer[]",
+			"time", "time without time zone", "time[]", "time without time zone[]",
+			"timetz", "time with time zone", "timetz[]", "time with time zone[]",
+			"timestamp", "timestamp without time zone", "timestamp[]", "timestamp without time zone[]",
+			"timestamptz", "timestamp with time zone", "timestamptz[]", "timestamp with time zone[]",
+			"varbit", "bit varying", "varbit[]", "bit varying[]",
+			"varchar", "character varying", "varchar[]", "character varying[]",
+			NULL
+		};
+
+		while ((functoken = strtokx(is_initial_run ? funcargs : NULL, " \t\n\r", ".,();", "\"", 0, false, true, pset.encoding)))
+		{
+			is_initial_run = false;
+			found_abbreviation = false;
+
+			if (isalpha(functoken[0]))
+			{
+appendPQExpBuffer(&buf, "  AND p.proargtypes[%d]::regtype::text = ", argoffset++);
+for (int i = 0; NULL != *(type_abbreviations + i); i += 2)
+{
+	const char *shortname = *(type_abbreviations + i);
+	const char *longname = *(type_abbreviations + i + 1);
+
+	if (pg_strcasecmp(functoken, shortname) == 0)
+	{
+		appendPQExpBuffer(&buf, "LOWER('%s')::text\n", longname);
+		found_abbreviation = true;
+		break;
+	}
+}
+if (!found_abbreviatio

Re: Add important info about ANALYZE after create Functional Index

2020-11-01 Thread Fabrízio de Royes Mello
On Sun, 1 Nov 2020 at 09:29 Michael Paquier  wrote:

> On Sun, Nov 01, 2020 at 09:23:44AM +0900, Michael Paquier wrote:
> > By doing so, there is no need to include pg_statistic.h in index.c.
> > Except that, the logic looks fine at quick glance.  In the long-term,
> > I also think that it would make sense to move both routines out of
> > heap.c into a separate pg_statistic.c.  That's material for a separate
> > patch of course.
>
> I have looked again at that, and applied it after some tweaks.
> Particularly, I did not really like the use of "old" and "new" for the
> copy from the old to a new relation in the new function, so I have
> replaced that by "from" and "to".
>
>
Awesome thanks!!

Regards,

-- 
   Fabrízio de Royes Mello Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento


Re: psql \df choose functions by their arguments

2020-11-01 Thread David Christensen


> * Removed the tab-complete bit, it was too fragile and unhelpful

I can’t speak for the specific patch, but tab completion of proc args for \df, 
\ef and friends has long been a desired feature of mine, particularly when you 
are dealing with functions with huge numbers of arguments and the same name 
which I have (sadly) come across many times in the wild. 

Removing this because it was brittle is fine, but it would be good to see if we 
could figure out a way to have this kind of feature in psql IMHO. 

Best,

David



Re: how to replicate test results in cf-bot on travis

2020-11-01 Thread Thomas Munro
On Mon, Nov 2, 2020 at 1:58 AM Dave Cramer  wrote:
> For my patch https://commitfest.postgresql.org/30/2522/
>
> When I run
>
> make -j4 all contrib && make check-world
> locally
>
> I see 2 errors.
>
> When cf-bot runs this it sees
> 35 out of 93 failed.
>
> How can I see the same errors?

Hi Dave,
I applied your patch here and see the same 35 errors from
src/test/isolation, make check as cfbot reports.  Is it possible that
you forgot to add the changes under that directory when posting the
patch to the list?




Re: how to replicate test results in cf-bot on travis

2020-11-01 Thread Dave Cramer
On Sun., Nov. 1, 2020, 1:41 p.m. Thomas Munro, 
wrote:

> On Mon, Nov 2, 2020 at 1:58 AM Dave Cramer  wrote:
> > For my patch https://commitfest.postgresql.org/30/2522/
> >
> > When I run
> >
> > make -j4 all contrib && make check-world
> > locally
> >
> > I see 2 errors.
> >
> > When cf-bot runs this it sees
> > 35 out of 93 failed.
> >
> > How can I see the same errors?
>
> Hi Dave,
> I applied your patch here and see the same 35 errors from
> src/test/isolation, make check as cfbot reports.  Is it possible that
> you forgot to add the changes under that directory when posting the
> patch to the list?
>


I rebased my branch and tried it. I'll  try again

Thx

>


Getting Gen_fmgrtab.pl to generate macros for all pg_proc entries

2020-11-01 Thread Tom Lane
Over in [1] I noted a need for fmgroids.h macros for pg_proc entries
that overload a particular C function.  The F_XXX macros seem to be
designed on the assumption that the only thing they're needed for
is to call the function, so you don't really care which SQL alias
you use to get there.  That's unhelpful though when you are concerned
with the SQL-level function more than the C function.  I seem to recall
that we've run into this before, but up to now have been able to work
around it.

A narrow solution would be to revert the policy we just established
(in commit 36b931214) against allowing oid_symbol entries in
pg_proc.dat, and just manually label such entries when we need to refer
to them in the C code.  I don't especially care for that though, for
the reasons cited in 36b931214 --- it'll lead to a lot of random
variation in how people refer to pg_proc entries, rather than having
a uniform convention.

I experimented with generating the fmgroids.h macros by using, not
the prosrc field, but the concatenation of the proname and proargtypes
fields.  That's guaranteed unique within pg_proc, but it would have
resulted in several hundred required changes in our source tree,
because of common references to F_OIDEQ and suchlike.  But I discovered
that we could alter the rule to be "append proargtypes only if proname
is not unique", and that eliminates the pain almost completely: only
two existing code references need to change.

As an additional benefit, we can generate macros for all the bootstrap
pg_proc entries, not only ones that are for C functions; that fixes
another issue noted in [1].

Hence, I propose the attached.  Any objections?

regards, tom lane

[1] https://www.postgresql.org/message-id/298489.1604188643%40sss.pgh.pa.us

diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index e7d814651b..85ef873caa 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -779,7 +779,7 @@ contain_volatile_functions_not_nextval(Node *clause)
 static bool
 contain_volatile_functions_not_nextval_checker(Oid func_id, void *context)
 {
-	return (func_id != F_NEXTVAL_OID &&
+	return (func_id != F_NEXTVAL &&
 			func_volatile(func_id) == PROVOLATILE_VOLATILE);
 }
 
diff --git a/src/backend/utils/Gen_fmgrtab.pl b/src/backend/utils/Gen_fmgrtab.pl
index b7c7b4c8fa..a3f7307263 100644
--- a/src/backend/utils/Gen_fmgrtab.pl
+++ b/src/backend/utils/Gen_fmgrtab.pl
@@ -64,22 +64,26 @@ foreach my $datfile (@ARGV)
 
 # Collect certain fields from pg_proc.dat.
 my @fmgr = ();
+my %proname_counts;
 
 foreach my $row (@{ $catalog_data{pg_proc} })
 {
 	my %bki_values = %$row;
 
-	# Select out just the rows for internal-language procedures.
-	next if $bki_values{prolang} ne 'internal';
-
 	push @fmgr,
 	  {
 		oid=> $bki_values{oid},
+		name   => $bki_values{proname},
+		args   => $bki_values{proargtypes},
 		strict => $bki_values{proisstrict},
 		retset => $bki_values{proretset},
 		nargs  => $bki_values{pronargs},
+		lang   => $bki_values{prolang},
 		prosrc => $bki_values{prosrc},
 	  };
+
+	# Count so that we can detect overloaded pronames.
+	$proname_counts{ $bki_values{proname} }++;
 }
 
 # Emit headers for both files
@@ -122,13 +126,10 @@ print $ofh <{oid} <=> $b->{oid} } @fmgr)
 {
-	next if $seenit{ $s->{prosrc} };
-	$seenit{ $s->{prosrc} } = 1;
-	print $ofh "#define F_" . uc $s->{prosrc} . " $s->{oid}\n";
-	print $pfh "extern Datum $s->{prosrc}(PG_FUNCTION_ARGS);\n";
+	my $sqlname = $s->{name};
+	$sqlname .= "_" . $s->{args} if ($proname_counts{ $s->{name} } > 1);
+	$sqlname =~ tr/ /_/;
+	print $ofh "#define F_" . uc $sqlname . " $s->{oid}\n";
+	# We want only one extern per internal-language function
+	if ($s->{lang} eq 'internal' && !$seenit{ $s->{prosrc} })
+	{
+		$seenit{ $s->{prosrc} } = 1;
+		print $pfh "extern Datum $s->{prosrc}(PG_FUNCTION_ARGS);\n";
+	}
 }
 
 # Create the fmgr_builtins table, collect data for fmgr_builtin_oid_index
@@ -206,22 +213,16 @@ my $last_builtin_oid = 0;
 my $fmgr_count   = 0;
 foreach my $s (sort { $a->{oid} <=> $b->{oid} } @fmgr)
 {
+	next if $s->{lang} ne 'internal';
+
+	print $tfh ",\n" if ($fmgr_count > 0);
 	print $tfh
 	  "  { $s->{oid}, $s->{nargs}, $bmap{$s->{strict}}, $bmap{$s->{retset}}, \"$s->{prosrc}\", $s->{prosrc} }";
 
 	$fmgr_builtin_oid_index[ $s->{oid} ] = $fmgr_count++;
 	$last_builtin_oid = $s->{oid};
-
-	if ($fmgr_count <= $#fmgr)
-	{
-		print $tfh ",\n";
-	}
-	else
-	{
-		print $tfh "\n";
-	}
 }
-print $tfh "};\n";
+print $tfh "\n};\n";
 
 printf $tfh qq|
 const int fmgr_nbuiltins = (sizeof(fmgr_builtins) / sizeof(FmgrBuiltin));
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 6c656586e8..7446193252 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -10144,7 +10144,7 @@ get_from_clause_item(Node *jtnode, Query *query, deparse_context *context)
 		RangeTblFunction *rtfunc = (RangeTb

Getting rid of aggregate_dummy()

2020-11-01 Thread Tom Lane
While fooling with Gen_fmgrtab.pl for a nearby patch [1], I noticed
that fmgrtab.c had a lot of entries pointing at aggregate_dummy,
which seemed rather useless.  So I experimented with removing them.

It turns out that nodeWindowAgg.c is carelessly expecting them to be
there, because it does fmgr_info_cxt() on the target window function
even if it will never call it because it's a plain aggregate.
But that's pretty trivial to fix, just need to relocate that call.

With that, we don't actually need aggregate_dummy() to exist at
all, because it's never referenced.  Having "aggregate_dummy"
as the prosrc value for an aggregate function is now just a
random convention; any other string would do as well.  (We could
save a few bytes in pg_proc by choosing a shorter string, but
probably it's better to stick to the existing convention.)

Anyway, this saves about 3KB in fmgrtab.o, without any downside
that I can see.  If someone accidentally called an aggregate as
a normal function, they'd now get a different error message,
namely "internal function "aggregate_dummy" is not in internal lookup
table" instead of "aggregate function NNN called as normal function".
That doesn't really seem like a problem.

The attached patch is a delta over the one in [1].

regards, tom lane

[1] https://www.postgresql.org/message-id/472274.1604258384%40sss.pgh.pa.us

diff --git a/src/backend/catalog/pg_aggregate.c b/src/backend/catalog/pg_aggregate.c
index 0cf1da6ebb..7664bb6285 100644
--- a/src/backend/catalog/pg_aggregate.c
+++ b/src/backend/catalog/pg_aggregate.c
@@ -620,7 +620,7 @@ AggregateCreate(const char *aggName,
 			 GetUserId(),	/* proowner */
 			 INTERNALlanguageId,	/* languageObjectId */
 			 InvalidOid,	/* no validator */
-			 "aggregate_dummy", /* placeholder proc */
+			 "aggregate_dummy", /* placeholder (no such proc) */
 			 NULL,	/* probin */
 			 PROKIND_AGGREGATE,
 			 false, /* security invoker (currently not
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index 75e5bbf209..d87677d659 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -4935,24 +4935,6 @@ AggRegisterCallback(FunctionCallInfo fcinfo,
 }
 
 
-/*
- * aggregate_dummy - dummy execution routine for aggregate functions
- *
- * This function is listed as the implementation (prosrc field) of pg_proc
- * entries for aggregate functions.  Its only purpose is to throw an error
- * if someone mistakenly executes such a function in the normal way.
- *
- * Perhaps someday we could assign real meaning to the prosrc field of
- * an aggregate?
- */
-Datum
-aggregate_dummy(PG_FUNCTION_ARGS)
-{
-	elog(ERROR, "aggregate function %u called as normal function",
-		 fcinfo->flinfo->fn_oid);
-	return (Datum) 0;			/* keep compiler quiet */
-}
-
 /* 
  *		Parallel Query Support
  * 
diff --git a/src/backend/executor/nodeWindowAgg.c b/src/backend/executor/nodeWindowAgg.c
index 4cc7da268d..de58df3d3f 100644
--- a/src/backend/executor/nodeWindowAgg.c
+++ b/src/backend/executor/nodeWindowAgg.c
@@ -2446,11 +2446,6 @@ ExecInitWindowAgg(WindowAgg *node, EState *estate, int eflags)
 		perfuncstate->wfuncstate = wfuncstate;
 		perfuncstate->wfunc = wfunc;
 		perfuncstate->numArguments = list_length(wfuncstate->args);
-
-		fmgr_info_cxt(wfunc->winfnoid, &perfuncstate->flinfo,
-	  econtext->ecxt_per_query_memory);
-		fmgr_info_set_expr((Node *) wfunc, &perfuncstate->flinfo);
-
 		perfuncstate->winCollation = wfunc->inputcollid;
 
 		get_typlenbyval(wfunc->wintype,
@@ -2479,6 +2474,11 @@ ExecInitWindowAgg(WindowAgg *node, EState *estate, int eflags)
 			winobj->argstates = wfuncstate->args;
 			winobj->localmem = NULL;
 			perfuncstate->winobj = winobj;
+
+			/* It's a real window function, so set up to call it. */
+			fmgr_info_cxt(wfunc->winfnoid, &perfuncstate->flinfo,
+		  econtext->ecxt_per_query_memory);
+			fmgr_info_set_expr((Node *) wfunc, &perfuncstate->flinfo);
 		}
 	}
 
diff --git a/src/backend/utils/Gen_fmgrtab.pl b/src/backend/utils/Gen_fmgrtab.pl
index 1edd5195d7..34e27950e7 100644
--- a/src/backend/utils/Gen_fmgrtab.pl
+++ b/src/backend/utils/Gen_fmgrtab.pl
@@ -75,6 +75,7 @@ foreach my $row (@{ $catalog_data{pg_proc} })
 		oid=> $bki_values{oid},
 		name   => $bki_values{proname},
 		lang   => $bki_values{prolang},
+		kind   => $bki_values{prokind},
 		strict => $bki_values{proisstrict},
 		retset => $bki_values{proretset},
 		nargs  => $bki_values{pronargs},
@@ -195,8 +196,10 @@ foreach my $s (sort { $a->{oid} <=> $b->{oid} } @fmgr)
 	$sqlname .= "_" . $s->{args} if ($proname_counts{ $s->{name} } > 1);
 	$sqlname =~ tr/ /_/;
 	print $ofh "#define F_" . uc $sqlname . " $s->{oid}\n";
-	# We want only one extern per internal-language function
-	if ($s->{lang} eq 'internal' && !$seenit{ $s->{prosrc} })
+	# We wan

Re: how to replicate test results in cf-bot on travis

2020-11-01 Thread Dave Cramer
On Sun, 1 Nov 2020 at 13:46, Dave Cramer  wrote:

>
>
> On Sun., Nov. 1, 2020, 1:41 p.m. Thomas Munro, 
> wrote:
>
>> On Mon, Nov 2, 2020 at 1:58 AM Dave Cramer  wrote:
>> > For my patch https://commitfest.postgresql.org/30/2522/
>> >
>> > When I run
>> >
>> > make -j4 all contrib && make check-world
>> > locally
>> >
>> > I see 2 errors.
>> >
>> > When cf-bot runs this it sees
>> > 35 out of 93 failed.
>> >
>> > How can I see the same errors?
>>
>> Hi Dave,
>> I applied your patch here and see the same 35 errors from
>> src/test/isolation, make check as cfbot reports.  Is it possible that
>> you forgot to add the changes under that directory when posting the
>> patch to the list?
>>
>
>
> I rebased my branch and tried it. I'll  try again
>
> Thx
>

OK, checked and I definitely have the changes. I don't think the isolation
test is running. Is there some configuration that enables it?

Dave


Re: how to replicate test results in cf-bot on travis

2020-11-01 Thread Tom Lane
Dave Cramer  writes:
> OK, checked and I definitely have the changes. I don't think the isolation
> test is running. Is there some configuration that enables it?

No, top-level "check-world" should invoke that ... but if you're unsure,
you could cd to src/test/isolation and run check or installcheck there.

regards, tom lane




RE: Disable WAL logging to speed up data loading

2020-11-01 Thread osumi.takami...@fujitsu.com
Hello


On Friday, October 30, 2020 1:32 PM Masahiko Sawada wrote:
> > On 2020/10/29 19:21, Laurenz Albe wrote:
> > > On Thu, 2020-10-29 at 11:42 +0900, Fujii Masao wrote:
> > >>> But what if someone sets wal_level=none, performs some data
> > >>> modifications, sets wal_level=archive and after some more
> > >>> processing decides to restore from a backup that was taken before
> > >>> the cluster was set to wal_level=none?
> > >>> Then they would end up with a corrupted database, right?
> > >>
> > >> I think that the guard to prevent the server from starting up from
> > >> the corrupted database in that scenario is necessary.
> > >
> > > That wouldn't apply, I think, because the backup from which you
> > > start was taken with wal_level = replica, so the guard wouldn't alert.
> > >
> > > But as I said in the other thread, changing wal_level emits a WAL
> > > record, and I am sure that recovery will refuse to proceed if
> > > wal_level < replica.
> >
> > Yes. What I meant was such a safe guard needs to be implemented.
> >
> > This may mean that if we want to recover the database from that
> > backup, we need to specify the recovery target so that the archive
> > recovery stops just before the WAL record indicating wal_level change.
> 
> Yeah, it also means that setting wal_level to none makes the previous backup
> useless even if the user has several generations of backups.
> 
> Would it make things simpler if the usage of wal_level = 'none' were limited to
> initial data loading, for example? I mean, we add a special flag to initdb that
> sets wal_level to 'none' after initialization, and the user does the initial data
> loading and then sets wal_level to >= minimal.
> That is, we allow users to go from none to >= minimal but not the
> reverse.
> Since we prevent the database cluster from being backed up while wal_level is none,
> recovery never crosses wal_level = none. Not sure this idea can address
> the case Osumi-san is concerned about, though.
Thank you so much for the discussion.

Hmm, this is a good idea: implement
some kind of valve to prevent backflow between wal_levels.
But ideally I'd like to allow users to switch wal_level both from and to 'none',
in order to let them run bulk loads faster with this feature
even if that operation isn't the initial data loading.
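
For illustration, the one-way "valve" Sawada-san describes could look roughly
like the following standalone sketch (not code from the patch, just a way to
visualize the allowed transitions):

#include <stdbool.h>
#include <stdio.h>

typedef enum
{
    WAL_LEVEL_NONE,
    WAL_LEVEL_MINIMAL,
    WAL_LEVEL_REPLICA,
    WAL_LEVEL_LOGICAL
} WalLevel;

/*
 * One-way valve: wal_level may be raised out of 'none' (set at initdb time),
 * but a running cluster may never be switched back to 'none'.
 */
static bool
wal_level_transition_allowed(WalLevel old_level, WalLevel new_level)
{
    if (new_level == WAL_LEVEL_NONE && old_level != WAL_LEVEL_NONE)
        return false;
    return true;
}

int
main(void)
{
    printf("none -> minimal allowed: %d\n",
           wal_level_transition_allowed(WAL_LEVEL_NONE, WAL_LEVEL_MINIMAL));
    printf("replica -> none allowed: %d\n",
           wal_level_transition_allowed(WAL_LEVEL_REPLICA, WAL_LEVEL_NONE));
    return 0;
}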


Regards,
Takamichi Osumi


Re: Dereference before NULL check (src/backend/storage/ipc/latch.c)

2020-11-01 Thread Kyotaro Horiguchi
At Sat, 31 Oct 2020 11:40:53 -0300, Ranier Vilela  wrote 
in 
> Hi,
> 
> Per Coverity.
> 
> If the code tests set->latch against NULL, that implies it can be NULL.
> ResetEvent can then dereference NULL.

If the returned event is WL_LATCH_SET, set->latch cannot be NULL. We
shouldn't inadvertently ignore an unexpected or broken situation. We
could put an Assert there instead, but I think that we don't need to do anything
here at all, since SIGSEGV would be raised at the right location.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Parallel Append can break run-time partition pruning

2020-11-01 Thread David Rowley
On Tue, 27 Oct 2020 at 19:40, Amit Langote  wrote:
> Some comments:

Thanks for having a look at this.

I've made some adjustments to those comments and pushed.

David




Re: Online checksums verification in the backend

2020-11-01 Thread Michael Paquier
On Thu, Oct 29, 2020 at 10:08:52PM -0700, Andres Freund wrote:
> I think it's pretty much *never* OK to do IO while holding a buffer
> mapping lock. You're locking a significant fraction of shared buffers
> over IO. That's just not OK.  Don't think there's any place doing so
> currently either.

There is no place doing that on HEAD.

This specific point was mentioned in the first message of this thread,
7th paragraph.  That was a long thread, so it is easy to miss:
https://www.postgresql.org/message-id/CAOBaU_aVvMjQn=ge5qPiJOPMmOj5=ii3st5q0y+wulml5sr...@mail.gmail.com

I am wondering what you have in mind regarding the use of
BM_IO_IN_PROGRESS or a similar flag.  Wouldn't that imply some
consequences for other existing buffers in the table, like a possible
eviction?  I'd like to think that we should not do any manipulation of
the buffer tables in this case.  Hence, in order to prevent a
concurrent activity to load in shared buffers the page currently
checked on disk, I got to think that we would need something new here,
like a filtering hash table that would be checked each time a backend
tries to insert an entry into the buffer tables.  That's something I
was wondering here:
https://www.postgresql.org/message-id/20200316030638.ga2...@paquier.xyz
I called that a preemptive lock, but you could also call that a
discard filter or a virtual pin, just something to mean that a page
locked this way cannot be loaded into the shared buffers.  I'd like to
think that this should not touch the existing buffer table, but it
would impact the performance when attempting to insert an entry in the
tables, as anything would need to be pre-checked.

Assuming that we could make this thing work without holding the
partition lock, and assuming that we only hold a share lock on the 
relation, we have two cases:
1) If the buffer is in shared buffers, we have the APIs to solve that
by using a pin, unlock the partition, and then do the I/O.  (Still
that's unsafe with the smgrwrite() argument?)
2) If the buffer is not in shared buffers, we don't have what it takes
to solve the problem yet.  But even if we solve this problem, we will
never really be sure that this is entirely safe, as per the argument
with concurrent smgrwrite() calls.  Current in-core code assumes that
this can happen only for init forks of unlogged relations which would
not be visible yet in the backend doing a page check, still it can be
really easy to break this assumption with any new code added by a new
feature.

These arguments boil down to reducing the scope of CheckBuffer() as
follows:
- Use an AEL on the relation, pass down a Relation instead of
SMgrRelation, and add on the way an assertion to make sure that the
caller holds an AEL on the relation.  I wanted to study the possibility
of using that stuff for base backups, but if you bring the concurrent
smgrwrite() calls into the set of possibilities this shuts down the
whole study from the start.
- It is still useful to check whether a page is in shared buffers IMO, so
that if it is dirty we just discard it from the checks and rely on the
next checkpoint to do a flush.  It is also useful to check whether the
state of the on-disk data is good if the page is not dirty, as the
page could have gone rogue on-disk while a system was up for weeks.
--
Michael




Re: Explicit NULL dereference (src/backend/utils/adt/ruleutils.c)

2020-11-01 Thread Kyotaro Horiguchi
At Sat, 31 Oct 2020 11:49:07 -0300, Ranier Vilela  wrote 
in 
> Per Coverity.
> 
> make_ruledef function can dereference a NULL pointer (actions),
> if "ev_qual" is provided and "actions" does not exist.
> 
> The comment there is contradictory: " /* these could be nulls */ "
> Because if "ev_qual" is not null, "actions" cannot be either.
> 
> Solution proposed merely as a learning experience.

We cannot reach there with ev_action == NULL since it comes from a
non-nullable column. Since most of the other columns have an assertion
that !isnull, I think we should do the same thing for ev_action (and
ev_qual).  SPI_getvalue() returns C-NULL for SQL-NULL (or in some
other unexpected situations).

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 6c656586e8..6ba2b4a5ae 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -4766,12 +4766,13 @@ make_ruledef(StringInfo buf, HeapTuple ruletup, TupleDesc rulettc,
 	/* these could be nulls */
 	fno = SPI_fnumber(rulettc, "ev_qual");
 	ev_qual = SPI_getvalue(ruletup, rulettc, fno);
+	Assert(ev_qual != NULL);
 
 	fno = SPI_fnumber(rulettc, "ev_action");
 	ev_action = SPI_getvalue(ruletup, rulettc, fno);
-	if (ev_action != NULL)
-		actions = (List *) stringToNode(ev_action);
-
+	Assert(ev_action != NULL);
+	actions = (List *) stringToNode(ev_action);
+		
 	ev_relation = table_open(ev_class, AccessShareLock);
 
 	/*


Re: Online checksums verification in the backend

2020-11-01 Thread Andres Freund
Hi,

I'm a bit limited writing - one handed for a while following an injury
on Friday...

On 2020-11-02 10:05:25 +0900, Michael Paquier wrote:
> On Thu, Oct 29, 2020 at 10:08:52PM -0700, Andres Freund wrote:
> > I think it's pretty much *never* OK to do IO while holding a buffer
> > mapping lock. You're locking a significant fraction of shared buffers
> > over IO. That's just not OK.  Don't think there's any place doing so
> > currently either.
> 
> There is no place doing that on HEAD.

Err?

/* see if the block is in the buffer pool or not */
LWLockAcquire(partLock, LW_SHARED);
buf_id = BufTableLookup(&buf_tag, buf_hash);
if (buf_id >= 0)
{
...
/* Read the buffer from disk, with the I/O lock still held */
smgrread(smgr, forknum, blkno, buffer);
LWLockRelease(BufferDescriptorGetIOLock(bufdesc));
}
else
{
/*
 * Simply read the buffer.  There's no risk of modification on 
it as
 * we are holding the buffer pool partition mapping lock.
 */
smgrread(smgr, forknum, blkno, buffer);
}

/* buffer lookup done, so now do its check */
LWLockRelease(partLock);

How is this not doing IO while holding a buffer mapping lock?


> This specific point was mentioned in the first message of this thread,
> 7th paragraph.  That was a long thread, so it is easy to miss:
> https://www.postgresql.org/message-id/CAOBaU_aVvMjQn=ge5qPiJOPMmOj5=ii3st5q0y+wulml5sr...@mail.gmail.com

The code clearly doesn't implement it that way.


> I am wondering what you have in mind regarding the use of
> BM_IO_IN_PROGRESS or a similar flag.  Wouldn't that imply some
> consequences for other existing buffers in the table, like a possible
> eviction?

You'd need exactly one empty buffer for that - it can be reused for the
next to-be-checked buffer.


> I'd like to think that we should not do any manipulation of
> the buffer tables in this case.

Why? It's the way we lock buffers - why is this so special that we need
to do it differently?


> Hence, in order to prevent a
> concurrent activity to load in shared buffers the page currently
> checked on disk, I got to think that we would need something new here,
> like a filtering hash table that would be checked each time a backend
> tries to insert an entry into the buffer tables.

That's going to slow down everything a bit - the mapping already is a
bottleneck.


> 1) If the buffer is in shared buffers, we have the APIs to solve that
> by using a pin, unlock the partition, and then do the I/O.  (Still
> that's unsafe with the smgrwrite() argument?)

That's why you need an appropriate relation lock... something CheckBuffer
didn't bother to document. It's a restriction, but one we probably can
live with.


> 2) If the buffer is not in shared buffers, we don't have what it takes
> to solve the problem yet.

We do. Set up enough state for the case to be otherwise the same as the
in s_b case.

> But even if we solve this problem, we will
> never really be sure that this is entirely safe, as per the argument
> with concurrent smgrwrite() calls.  Current in-core code assumes that
> this can happen only for init forks of unlogged relations which would
> not be visible yet in the backend doing a page check, still it can be
> really easy to break this assumption with any new code added by a new
> feature.

It also happens in a few other cases than just init forks. But
visibility & relation locking can take care of that. But you need to
document that. If the locking allows concurrent readers - and especially
concurrent writers, then you can't really use smgrwrite for anything but
relation extension.

Greetings,

Andres Freund




Re: Online checksums verification in the backend

2020-11-01 Thread Michael Paquier
On Sun, Nov 01, 2020 at 05:39:40PM -0800, Andres Freund wrote:
> I'm a bit limited writing - one handed for a while following an injury
> on Friday...

Oops.  Take care.

> On 2020-11-02 10:05:25 +0900, Michael Paquier wrote:
> > There is no place doing that on HEAD.
> 
> Err?
> 
>   /* see if the block is in the buffer pool or not */
>   LWLockAcquire(partLock, LW_SHARED);
>   buf_id = BufTableLookup(&buf_tag, buf_hash);
>
> [...]
>
> How is this not doing IO while holding a buffer mapping lock?

Well, other than the one we are discussing of course :)

> 
> 
> > This specific point was mentioned in the first message of this thread,
> > 7th paragraph.  That was a long thread, so it is easy to miss:
> > https://www.postgresql.org/message-id/CAOBaU_aVvMjQn=ge5qPiJOPMmOj5=ii3st5q0y+wulml5sr...@mail.gmail.com
> 
> The code clearly doesn't implement it that way.
> 
> 
> > I am wondering what you have in mind regarding the use of
> > BM_IO_IN_PROGRESS or a similar flag.  Wouldn't that imply some
> > consequences for other existing buffers in the table, like a possible
> > eviction?
> 
> You'd need exactly one empty buffer for that - it can be reused for the
> next to-be-checked buffer.
> 
> 
> > I'd like to think that we should not do any manipulation of
> > the buffer tables in this case.
> 
> Why? It's the way we lock buffers - why is this so special that we need
> to do it differently?
> 
> 
> > Hence, in order to prevent a
> > concurrent activity to load in shared buffers the page currently
> > checked on disk, I got to think that we would need something new here,
> > like a filtering hash table that would be checked each time a backend
> > tries to insert an entry into the buffer tables.
> 
> That's going to slow down everything a bit - the mapping already is a
> bottleneck.
> 
> 
> > 1) If the buffer is in shared buffers, we have the APIs to solve that
> > by using a pin, unlock the partition, and then do the I/O.  (Still
> > that's unsafe with the smgrwrite() argument?)
> 
> That's why you need an appropriate relation lock... something CheckBuffer
> didn't bother to document. It's a restriction, but one we probably can
> live with.
> 
> 
> > 2) If the buffer is not in shared buffers, we don't have what it takes
> > to solve the problem yet.
> 
> We do. Set up enough state for the case to be otherwise the same as the
> in s_b case.
> 
> > But even if we solve this problem, we will
> > never really be sure that this is entirely safe, as per the argument
> > with concurrent smgrwrite() calls.  Current in-core code assumes that
> > this can happen only for init forks of unlogged relations which would
> > not be visible yet in the backend doing a page check, still it can be
> > really easy to break this assumption with any new code added by a new
> > feature.
> 
> It also happens in a few other cases than just init forks. But
> visibility & relation locking can take care of that. But you need to
> document that. If the locking allows concurrent readers - and especially
> concurrent writers, then you can't really use smgrwrite for anything but
> relation extension.

--
Michael




Re: Online checksums verification in the backend

2020-11-01 Thread Andres Freund
Hi

On 2020-11-02 10:45:00 +0900, Michael Paquier wrote:
> On Sun, Nov 01, 2020 at 05:39:40PM -0800, Andres Freund wrote:
> > I'm a bit limited writing - one handed for a while following an injury
> > on Friday...
> 
> Oops.  Take care.

Thanks!


> > On 2020-11-02 10:05:25 +0900, Michael Paquier wrote:
> > > There is no place doing that on HEAD.
> > 
> > Err?
> > How is this not doing IO while holding a buffer mapping lock?
> 
> Well, other than the one we are discussing of course :)

I am not following. Were you just confirming that it's not a thing we do?


Greetings,

Andres Freund




Re: Parallel Append can break run-time partition pruning

2020-11-01 Thread Amit Langote
On Mon, Nov 2, 2020 at 9:51 AM David Rowley  wrote:
>
> On Tue, 27 Oct 2020 at 19:40, Amit Langote  wrote:
> > Some comments:
>
> Thanks for having a look at this.
>
> I've made some adjustments to those comments and pushed.

Thank you.

-- 
Amit Langote
EDB: http://www.enterprisedb.com




Re: Explicit NULL dereference (src/backend/utils/adt/ruleutils.c)

2020-11-01 Thread Tom Lane
Kyotaro Horiguchi  writes:
> We cannot reach there with ev_action == NULL since it comes from a
> non-nullable column. Since most of the other columns has an assertion
> that !isnull, I think we should do the same thing for ev_action (and
> ev_qual).  SPI_getvalue() returns C-NULL for SQL-NULL (or for some
> other unexpected situations.).

Isn't the comment just above there wrong?

/* these could be nulls */

I wonder just when that became outdated.

regards, tom lane




Re: Explicit NULL dereference (src/backend/utils/adt/ruleutils.c)

2020-11-01 Thread Kyotaro Horiguchi
At Mon, 02 Nov 2020 10:36:10 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> At Sat, 31 Oct 2020 11:49:07 -0300, Ranier Vilela  wrote 
> in 
> > Per Coverity.
> > 
> > make_ruledef function can dereference a NULL pointer (actions),
> > if "ev_qual" is provided and "actions" does not exist.
> > 
> > The comment there is contradictory: " /* these could be nulls */ "
> > Because if "ev_qual" is not null, "actions" cannot be either.
> > 
> > Solution proposed merely as a learning experience.
> 
> We cannot reach there with ev_action == NULL since it comes from a
> non-nullable column. Since most of the other columns have an assertion
> that !isnull, I think we should do the same thing for ev_action (and
> ev_qual).  SPI_getvalue() returns C-NULL for SQL-NULL (or in some
> other unexpected situations).

The following code has been there since 1998 (15cb32d93e).

>   /* If the rule has an event qualification, add it */
>   if (ev_qual == NULL)
>   ev_qual = "";

The problem code here was originally written as follows.

+fno = SPI_fnumber(rulettc, "is_instead");
+is_instead = (bool)SPI_getbinval(ruletup, rulettc, fno, &isnull);
+
+fno = SPI_fnumber(rulettc, "ev_qual");
+ev_qual = SPI_getvalue(ruletup, rulettc, fno);
+if (isnull) ev_qual = NULL;
+
+fno = SPI_fnumber(rulettc, "ev_action");
+ev_action = SPI_getvalue(ruletup, rulettc, fno);
+if (isnull) ev_action = NULL;
+if (ev_action != NULL) {
+actions = (List *)stringToNode(ev_action);
+}

I'm not sure what the code meant just from reading it, but at least it
seems impossible for the current code to return NULL for legit values.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 6c656586e8..ab4f102e9b 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -4766,12 +4766,13 @@ make_ruledef(StringInfo buf, HeapTuple ruletup, TupleDesc rulettc,
 	/* these could be nulls */
 	fno = SPI_fnumber(rulettc, "ev_qual");
 	ev_qual = SPI_getvalue(ruletup, rulettc, fno);
+	Assert(ev_qual != NULL);
 
 	fno = SPI_fnumber(rulettc, "ev_action");
 	ev_action = SPI_getvalue(ruletup, rulettc, fno);
-	if (ev_action != NULL)
-		actions = (List *) stringToNode(ev_action);
-
+	Assert(ev_action != NULL);
+	actions = (List *) stringToNode(ev_action);
+		
 	ev_relation = table_open(ev_class, AccessShareLock);
 
 	/*
@@ -4819,9 +4820,6 @@ make_ruledef(StringInfo buf, HeapTuple ruletup, TupleDesc rulettc,
 	 generate_relation_name(ev_class, NIL) :
 	 generate_qualified_relation_name(ev_class));
 
-	/* If the rule has an event qualification, add it */
-	if (ev_qual == NULL)
-		ev_qual = "";
 	if (strlen(ev_qual) > 0 && strcmp(ev_qual, "<>") != 0)
 	{
 		Node	   *qual;


[PATCH] Support negative indexes in split_part

2020-11-01 Thread Nikhil Benesch

I posted the idea of allowing negative indexes in split_part to pgsql-general
last week, and it seemed there was some interest:

http://postgr.es/m/CAPWqQZR%2B-5pAZNSSrnmYczRaX-huemc%3DoO8URvDZvUA-M%3DMOBA%40mail.gmail.com

Attached is a patch, based on master, that implements the approach as described
in that discussion.

The motivation is that the existing idioms for splitting a string and selecting
the nth-to-last element are rather complicated and/or inefficient:

  1. (string_to_array('foo bar baz', ' '))[cardinality(string_to_array('foo bar baz', ' ')) - 1]
  2. reverse(split_part(reverse('foo bar baz'), ' ', 1))
  3. (regexp_match('foo baz bar', '\S*$'))[1]

With the patch, split_part(haystack, needle, -1) selects the last field of the
string, split_part(haystack, needle, -2) selects the second-to-last field, and
so on. Per Tom Lane, there is precedent for this design, with negative indices
meaning "count from the end", namely the left and right string functions.

The patch includes updates to the docs and regression tests. If the feature is
deemed desirable, I believe the patch is "commit quality" (though, fair warning,
this is my first contribution to Postgres, so I might have the wrong notion
of what a committable patch looks like).

Note that the implementation is deliberately a bit inefficient to keep things
simple. When presented with a negative index, the implementation does an extra
pass over the string to count the total number of fields, in order to convert
the negative index to a positive index. Then it proceeds as it normally would.

One can imagine adding support for backwards B-M-H, but I'm not sure that could
be made to work with all multibyte encodings. We could at least avoid the extra
pass over the string by allocating a circular buffer of size |n| when n is
negative, but that wasn't clearly worthwhile. (I did implement the optimization
for the special case of -1, since its implementation was trivial.)
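
To make the conversion concrete, here is a tiny standalone sketch
(illustrative only, not code from the patch; the helper name and layout
are made up) of mapping a negative field number onto its positive
equivalent once the field count is known:

#include <stdio.h>

/* Hypothetical helper: map a 1-based field number, possibly negative,
 * onto a positive one, given the total number of fields counted in the
 * extra pass.  A result <= 0 means the requested field does not exist. */
static int
normalize_fldnum(int fldnum, int nfields)
{
	if (fldnum < 0)
		fldnum = nfields + fldnum + 1;	/* -1 -> nfields, -2 -> nfields - 1 */
	return fldnum;
}

int
main(void)
{
	/* 'a@b@c@d' split on '@' has 4 fields, so -2 maps to field 3 ("c"). */
	printf("%d\n", normalize_fldnum(-2, 4));	/* 3 */
	printf("%d\n", normalize_fldnum(-1, 4));	/* 4 */
	printf("%d\n", normalize_fldnum(-5, 4));	/* 0, i.e. out of range */
	return 0;
}

split_part() would then look the field up by that positive number exactly
as it does today.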

Cheers,
Nikhil
From a09f95bf56a7b6400ebce9e6faad90a434d21ac8 Mon Sep 17 00:00:00 2001
From: Nikhil Benesch 
Date: Sun, 1 Nov 2020 15:53:17 -0500
Subject: [PATCH] Support negative indexes in split_part

Negative indexes count from the right instead of the left.
For example:

split_part('a@b@c@d', '@', -2) -> c

The motivation is to provide a straightforward alternative to the
complicated and inefficient idioms available for this today, e.g.:

 reverse(split_part(reverse(haystack), reverse(needle), 1))
 (regexp_match(haystack, needle || '$'))[1]

Discussion: 
http://postgr.es/m/CAPWqQZR%2B-5pAZNSSrnmYczRaX-huemc%3DoO8URvDZvUA-M%3DMOBA%40mail.gmail.com
---
 doc/src/sgml/func.sgml|  4 +-
 src/backend/utils/adt/varlena.c   | 62 ---
 src/test/regress/expected/strings.out | 62 ++-
 src/test/regress/sql/strings.sql  | 20 +
 4 files changed, 139 insertions(+), 9 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 7ef2ec9972..d0880ab1f3 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -3356,7 +3356,9 @@ repeat('Pg', 4) PgPgPgPg

 Splits string at occurrences
 of delimiter and returns
-the n'th field (counting from one).
+the n'th field (counting from one),
+or the |n|'th-to-last field if
+n is negative.


 split_part('abc~@~def~@~ghi', '~@~', 2)
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index d7bc330541..c90550a183 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -1471,6 +1471,19 @@ text_position_get_match_pos(TextPositionState *state)
}
 }
 
+/*
+ * Resets to the initial state installed by text_position_setup.
+ * The next call to text_position_next will search from the beginning
+ * of the string.
+ */
+static void
+text_position_reset(TextPositionState *state)
+{
+   state->last_match = NULL;
+   state->refpoint = state->str1;
+   state->refpos = 0;
+}
+
 static void
 text_position_cleanup(TextPositionState *state)
 {
@@ -4582,7 +4595,7 @@ replace_text_regexp(text *src_text, void *regexp,
 /*
  * split_part
  * parse input string
- * return ord item (1 based)
+ * return ord item (1 based, negative counts from end)
  * based on provided field separator
  */
 Datum
@@ -4600,10 +4613,10 @@ split_part(PG_FUNCTION_ARGS)
boolfound;
 
/* field number is 1 based */
-   if (fldnum < 1)
+   if (fldnum == 0)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-errmsg("field position must be greater than 
zero")));
+errmsg("field position must not be zero")));
 
inputstring_len = VARSIZE_ANY_EXHDR(inputstring);
fldsep_len = VARSIZE_ANY_EXHDR(fldsep);
@@ -4615,8 +4628,8 @@ split_part(PG_FUNCTION_ARGS)
/* empty field separ

Re: [PATCH] Support negative indexes in split_part

2020-11-01 Thread Tom Lane
Nikhil Benesch  writes:
> I posted the idea of allowing negative indexes in split_part to pgsql-general
> last week, and it seemed there was some interest:
> http://postgr.es/m/CAPWqQZR%2B-5pAZNSSrnmYczRaX-huemc%3DoO8URvDZvUA-M%3DMOBA%40mail.gmail.com
> Attached is a patch, based on master, that implements the approach as 
> described
> in that discussion.

Please add an entry to the upcoming commitfest, to make sure we don't
lose track of this:

https://commitfest.postgresql.org/30/

regards, tom lane




Re: Dereference before NULL check (src/backend/storage/ipc/latch.c)

2020-11-01 Thread Thomas Munro
On Mon, Nov 2, 2020 at 1:49 PM Kyotaro Horiguchi
 wrote:
> At Sat, 31 Oct 2020 11:40:53 -0300, Ranier Vilela  wrote 
> in
> > Per Coverity.
> >
> > If test set->latch against NULL, is why it can be NULL.
> > ResetEvent can dereference NULL.
>
> If the returned event is WL_LATCH_SET, set->latch cannot be NULL. We
> shouldn't inadvertently ignore the unexpected or broken situation.We
> could put Assert instead, but I think that we don't need do something
> here at all since SIGSEGV would be raised at the right location.

Hmm.  I changed that to support set->latch == NULL, so that you can
use the long lived WES in the rare code paths that call WaitLatch()
without a latch (for example the code I proposed at [1]).  The Windows
version leaves the event handle of the most recently used latch in
set->handles[n] (because AFAICS there is no way to have a "hole" in
the handles array).  The event can fire while you are waiting on "no
latch".  Perhaps it should be changed to
ResetEvent(set->handles[cur_event->pos + 1])?

[1] 
https://www.postgresql.org/message-id/flat/CA%2BhUKGK1607VmtrDUHQXrsooU%3Dap4g4R2yaoByWOOA3m8xevUQ%40mail.gmail.com




Re: [PATCH] Support negative indexes in split_part

2020-11-01 Thread Nikhil Benesch

On 11/1/20 9:56 PM, Tom Lane wrote:

Please add an entry to the upcoming commitfest, to make sure we don't
lose track of this:

https://commitfest.postgresql.org/30/


Done: https://commitfest.postgresql.org/30/2816/




Re: Explicit NULL dereference (src/backend/utils/adt/ruleutils.c)

2020-11-01 Thread Kyotaro Horiguchi
At Sun, 01 Nov 2020 21:05:29 -0500, Tom Lane  wrote in 
> Kyotaro Horiguchi  writes:
> > We cannot reach there with ev_action == NULL since it comes from a
> > non-nullable column. Since most of the other columns has an assertion
> > that !isnull, I think we should do the same thing for ev_action (and
> > ev_qual).  SPI_getvalue() returns C-NULL for SQL-NULL (or for some
> > other unexpected situations.).
> 
> Isn't the comment just above there wrong?
> 
>   /* these could be nulls */
> 
> I wonder just when that became outdated.

Mmm. I investigated that.

At the very beginning of CREATE RULE (d31084e9d1, 1996), InsertRule()
did the following.

>template = "INSERT INTO pg_rewrite \
>(rulename, ev_type, ev_class, ev_attr, action, ev_qual, is_instead) VALUES \
>('%s', %d::char, %d::oid, %d::int2, '%s'::text, '%s'::text, \
> '%s'::bool);";
>if (strlen(template) + strlen(rulname) + strlen(actionbuf) +
>   strlen(qualbuf) + 20 /* fudge fac */ >  RULE_PLAN_SIZE) {
>   elog(WARN, "DefineQueryRewrite: rule plan string too big.");
>}
>sprintf(rulebuf, template,
>   rulname, evtype, eventrel_oid, evslot_index, actionbuf,
>   qualbuf, is_instead);

It doesn't seem that ev_qual and ev_action can be NULL.  The same
function currently converts the action list to a string using
nodeToString, so NIL is converted into '<>', which is not NULL.

So I think ev_action cannot have been null at any point in the history
unless the column is modified manually.  ev_qual and ev_action are
marked as non-nullable (9b39b799db, in 2018). They could be null if we
made those columns nullable and then set them to NULL, but the same
could happen to all the other columns in the pg_rewrite catalog, which
are Assert(!isnull)ed.

Also, ev_action cannot be an empty list via the SQL interface, so we
can get rid of the list_length(actions) == 0 case, but I'm not sure
it's worth doing (the attached does, though).

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 6c656586e8..120b8869ea 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -4763,15 +4763,15 @@ make_ruledef(StringInfo buf, HeapTuple ruletup, TupleDesc rulettc,
 	Assert(!isnull);
 	is_instead = DatumGetBool(dat);
 
-	/* these could be nulls */
 	fno = SPI_fnumber(rulettc, "ev_qual");
 	ev_qual = SPI_getvalue(ruletup, rulettc, fno);
+	Assert(ev_qual != NULL);
 
 	fno = SPI_fnumber(rulettc, "ev_action");
 	ev_action = SPI_getvalue(ruletup, rulettc, fno);
-	if (ev_action != NULL)
-		actions = (List *) stringToNode(ev_action);
-
+	Assert(ev_action != NULL);
+	actions = (List *) stringToNode(ev_action);
+		
 	ev_relation = table_open(ev_class, AccessShareLock);
 
 	/*
@@ -4819,10 +4819,8 @@ make_ruledef(StringInfo buf, HeapTuple ruletup, TupleDesc rulettc,
 	 generate_relation_name(ev_class, NIL) :
 	 generate_qualified_relation_name(ev_class));
 
-	/* If the rule has an event qualification, add it */
-	if (ev_qual == NULL)
-		ev_qual = "";
-	if (strlen(ev_qual) > 0 && strcmp(ev_qual, "<>") != 0)
+	Assert(strlen(ev_qual) > 0);
+	if (strcmp(ev_qual, "<>") != 0)
 	{
 		Node	   *qual;
 		Query	   *query;
@@ -4875,6 +4873,7 @@ make_ruledef(StringInfo buf, HeapTuple ruletup, TupleDesc rulettc,
 		appendStringInfoString(buf, "INSTEAD ");
 
 	/* Finally the rules actions */
+	Assert(list_length(actions) > 0);
 	if (list_length(actions) > 1)
 	{
 		ListCell   *action;
@@ -4893,10 +4892,6 @@ make_ruledef(StringInfo buf, HeapTuple ruletup, TupleDesc rulettc,
 		}
 		appendStringInfoString(buf, ");");
 	}
-	else if (list_length(actions) == 0)
-	{
-		appendStringInfoString(buf, "NOTHING;");
-	}
 	else
 	{
 		Query	   *query;


Re: enable pg_stat_statements to track rows processed by REFRESH MATERIALIZED VIEW

2020-11-01 Thread Yuki Seino
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:tested, passed

+1.
I checked the patch and there were no problems.
I hope this fix will be reflected.

The new status of this patch is: Ready for Committer


Re: Dereference before NULL check (src/backend/storage/ipc/latch.c)

2020-11-01 Thread Kyotaro Horiguchi
At Mon, 2 Nov 2020 16:22:09 +1300, Thomas Munro  wrote 
in 
> On Mon, Nov 2, 2020 at 1:49 PM Kyotaro Horiguchi
>  wrote:
> > At Sat, 31 Oct 2020 11:40:53 -0300, Ranier Vilela  
> > wrote in
> > > Per Coverity.
> > >
> > > If test set->latch against NULL, is why it can be NULL.
> > > ResetEvent can dereference NULL.
> >
> > If the returned event is WL_LATCH_SET, set->latch cannot be NULL. We
> > shouldn't inadvertently ignore the unexpected or broken situation.We
> > could put Assert instead, but I think that we don't need do something
> > here at all since SIGSEGV would be raised at the right location.
> 
> Hmm.  I changed that to support set->latch == NULL, so that you can
> use the long lived WES in the rare code paths that call WaitLatch()
> without a latch (for example the code I proposed at [1]).  The Windows

Ooo.  We don't update epoll events in that case. Ok, I understand
WL_LATCH_SET can fire while set->latch == NULL.

(I was confused by WaitEventAdjust* asserting set->latch != NULL for
 WL_LATCH_SET.  Shouldn't we move the check from ModifyWaitEvent() to
 WaitEventAdjust*()?)

> version leaves the event handle of the most recently used latch in
> set->handles[n] (because AFAICS there is no way to have a "hole" in
> the handles array).  The event can fire while you are waiting on "no
> latch".  Perhaps it should be changed to
> ResetEvent(set->handles[cur_event->pos + 1])?
> 
> [1] 
> https://www.postgresql.org/message-id/flat/CA%2BhUKGK1607VmtrDUHQXrsooU%3Dap4g4R2yaoByWOOA3m8xevUQ%40mail.gmail.com

Seems right. Just doing that *seems* to work fine, but somehow I
cannot build on Windows for now...

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index 24d44c982d..318261962e 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -1835,7 +1835,11 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
 
 	if (cur_event->events == WL_LATCH_SET)
 	{
-		if (!ResetEvent(set->latch->event))
+		/*
+		 * We cannot use set->latch->event to reset the fired event if we
+		 * actually aren't waiting on this latch now.
+		 */
+		if (!ResetEvent(set->handles[cur_event->pos + 1]))
 			elog(ERROR, "ResetEvent failed: error code %lu", GetLastError());
 
 		if (set->latch && set->latch->is_set)


Re: extension patch of CREATE OR REPLACE TRIGGER

2020-11-01 Thread Peter Smith
Hello Osumi-san.

Below are my v14 patch review comments for your consideration.

===

(1) COMMENT
File: NA
Maybe next time consider using format-patch to make the patch. Then
you can include a comment to give the background/motivation for this
change.

===

(2) COMMENT
File: doc/src/sgml/ref/create_trigger.sgml
@@ -446,6 +454,13 @@ UPDATE OF column_name1
[, column_name2replace)
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("trigger \"%s\" for relation \"%s\" already exists",
+ stmt->trigname, RelationGetRelationName(rel;
+ else
+ {
+ /*
+ * An internal trigger cannot be replaced by another user defined
+ * trigger. This should exclude the case that internal trigger is
+ * child trigger of partition table and needs to be rewritten when
+ * the parent trigger is replaced by user.
+ */
+ if (internal_trigger && isInternal == false && in_partition == false)
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("trigger \"%s\" for relation \"%s\" is an internal trigger",
+ stmt->trigname, RelationGetRelationName(rel;
+
+ /*
+ * CREATE OR REPLACE TRIGGER command can't replace constraint
+ * trigger.
+ */
+ if (OidIsValid(existing_constraint_oid))
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("trigger \"%s\" for relation \"%s\" is a constraint trigger",
+ stmt->trigname, RelationGetRelationName(rel;
+ }

It is not really necessary for the "OR REPLACE" code to be inside an
"else" block like that, because the "if (!stmt->replace)" has already
been tested above. Consider removing the "else {" to remove the
unnecessary indent if you want to.

===

(6) COMMENT
File: src/backend/commands/trigger.c
(same code block as above)

Condition is strangely written:
e.g.
Before: if (internal_trigger && isInternal == false && in_partition == false)
After: if (internal_trigger && !isInternal && !in_partition)

===

(7) COMMENT
File: src/backend/commands/trigger.c
(same code block as above)
/*
 * CREATE OR REPLACE TRIGGER command can't replace constraint
 * trigger.
 */

--

Only need to say
/* Can't replace a constraint trigger. */

===

(8) COMMENT
File: src/include/nodes/parsenodes.h
@@ -2429,6 +2429,9 @@ typedef struct CreateAmStmt

The comment does not need to say "when true,". Just saying "replace
trigger if already exists" is enough.

===

(9) COMMENT
File: src/test/regress/expected/triggers.out
+-- test for detecting incompatible replacement of trigger
+create table my_table (id integer);
+create function funcA() returns trigger as $$
+begin
+  raise notice 'hello from funcA';
+  return null;
+end; $$ language plpgsql;
+create function funcB() returns trigger as $$
+begin
+  raise notice 'hello from funcB';
+  return null;
+end; $$ language plpgsql;
+create or replace trigger my_trig
+  after insert on my_table
+  for each row execute procedure funcA();
+create constraint trigger my_trig
+  after insert on my_table
+  for each row execute procedure funcB(); -- should fail
+ERROR:  trigger "my_trig" for relation "my_table" already exists

--

I think this test has been broken in v14. That last "create constraint
trigger my_trig" above can never be expected to work simply because
you are not specifying the "OR REPLACE" syntax. So in fact this is not
properly testing for incompatible types at all. It needs to say
"create OR REPLACE constraint trigger my_trig" to be testing what it
claims to be testing.

I also think there is a missing check in the code - see COMMENT (2) -
for handling this scenario. But since this test case is broken, you
don't notice that the code check is missing.

===

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Parallel copy

2020-11-01 Thread Amit Kapila
On Fri, Oct 30, 2020 at 10:11 PM Heikki Linnakangas  wrote:
>
> Leader process:
>
> The leader process is simple. It picks the next FREE buffer, fills it
> with raw data from the file, and marks it as FILLED. If no buffers are
> FREE, wait.
>
> Worker process:
>
> 1. Claim next READY block from queue, by changing its state to
> PROCESSING. If the next block is not READY yet, wait until it is.
>
> 2. Start scanning the block from 'startpos', finding end-of-line
> markers. (in CSV mode, need to track when we're in-quotes).
>
> 3. When you reach the end of the block, if the last line continues to
> next block, wait for the next block to become FILLED. Peek into the
> next block, and copy the remaining part of the split line to a local
> buffer, and set the 'startpos' on the next block to point to the end
> of the split line. Mark the next block as READY.
>
> 4. Process all the lines in the block, call input functions, insert
> rows.
>
> 5. Mark the block as DONE.
>
> In this design, you don't need to keep line boundaries in shared memory,
> because each worker process is responsible for finding the line
> boundaries of its own block.
>
> There's a point of serialization here, in that the next block cannot be
> processed, until the worker working on the previous block has finished
> scanning the EOLs, and set the starting position on the next block,
> putting it in READY state. That's not very different from your patch,
> where you had a similar point of serialization because the leader
> scanned the EOLs,
>

But in the design (single producer multiple consumer) used by the
patch, the worker doesn't need to wait till the complete block is
processed; it can start processing the lines already found. This also
allows workers to start processing the data much earlier, as they don't
need to wait for all the offsets corresponding to a 64K block to be
ready. However, in the design where each worker processes a 64K block,
there can be much longer waits. I think this will impact the COPY FROM
STDIN case more, where in most cases (200-300 byte tuples) we receive
data line-by-line from the client and the leader finds the
line-endings. If the leader hasn't found the line-endings, the workers
need to wait till the leader fills the entire 64K chunk; OTOH, with the
current approach a worker can start as soon as the leader has populated
some minimum number of line-endings.

The other point is that the leader backend won't be used completely as
it is only doing a very small part (primarily reading the file) of the
overall work.

We have discussed both these approaches (a) single producer multiple
consumer, and (b) all workers doing the processing as you are saying
in the beginning and concluded that (a) is better, see some of the
relevant emails [1][2][3].

[1] - 
https://www.postgresql.org/message-id/20200413201633.cki4nsptynq7blhg%40alap3.anarazel.de
[2] - 
https://www.postgresql.org/message-id/20200415181913.4gjqcnuzxfzbbzxa%40alap3.anarazel.de
[3] - 
https://www.postgresql.org/message-id/78C0107E-62F2-4F76-BFD8-34C73B716944%40anarazel.de

-- 
With Regards,
Amit Kapila.




Re: Additional Chapter for Tutorial

2020-11-01 Thread Erik Rijkers

On 2020-11-01 16:38, Jürgen Purtz wrote:

On 30.10.20 17:45, Erik Rijkers wrote:


And I wrote down some separate items:

1.
'Two Phase Locking' and 'TPL' should be, I think,
'Two-Phase Commit'. Please someone confirm.
(no changes made)

Erik Rijkers


All suggestions so far are summarized in the attached patch with the
following exceptions:

- 'Two Phase Locking' is the intended term.


OK, so what is 'Two Phase Locking'?  The term is not explained, and not 
used anywhere else in the manual.  You propose to introduce it here, in 
the tutorial.  I don't know what it means, and I am not really a 
beginner.


'Two Phase Locking' should be explained somewhere, and how it relates 
(or not) to Two-Phase Commit (2PC), don't you agree?



Erik Rijkers




Re: bulk typos

2020-11-01 Thread Michael Paquier
On Fri, Oct 30, 2020 at 09:08:01PM -0500, Justin Pryzby wrote:
> On Sun, Oct 25, 2020 at 02:48:49PM -0500, Justin Pryzby wrote:
>> Pavel, I can't understand this one.
>> I guess s/producement/producing/ is too simple of a fix.
>> Please check my proposed language.
>> +XmlTableFetchRow(TableFuncScanState *state)
>> +* XmlTable returns table - set of composite values. The error 
>> context, is
>> +* used for producement more values, between two calls, there can be
>> +* created and used another libxml2 error context. ...
>> 
>> Surafel, this typo existed twice in the original commit (357889eb1).
>> One instance was removed by Tom in 35cb574aa.  Should we simply fix the typo,
>> or borrow Julien/Tom's lanuage?
>> +   * Tuple at limit is needed for comparation in subsequent
>> +   * execution to detect ties.

What you have sent for xml.c is a clear improvement, but I am not sure
if that's the best we can do.  So I have left this part out, and
applied the rest.

PS: I am also quite fond of "unbetimes".
--
Michael




Re: Parallel INSERT (INTO ... SELECT ...)

2020-11-01 Thread Greg Nancarrow
On Fri, Oct 30, 2020 at 5:00 PM Amit Kapila  wrote:
>
> > So then to avoid
> > errors from when parallel-mode is forced on and such unsafe INSERTs
> > are run, the only real choice is to only allow parallelModeNeeded to
> > be true for SELECT only (not INSERT), and this is kind of cheating and
> > also not picking up cases where parallel-safe INSERT is run but
> > invokes parallel-mode-unsafe features.
> > My conclusion, at least for the moment, is to leave the check where it is.
> >
>
> Okay, then can we integrate the functionality of
> MaxParallelHazardForModify in max_parallel_hazard? Calling it
> separately looks bit awkward.
>

Looking into that.

> >
> > > Few other comments on latest patch:
> > > ===
> > > 1.
> > > MaxRelParallelHazardForModify()
> > > {
> > > ..
> > > + if (commandType == CMD_INSERT || commandType == CMD_UPDATE)
> > > + {
> > > + /*
> > > ..
> > >
> > > Why to check CMD_UPDATE here?
> > >
> >
> > That was a bit of forward-thinking, for when/if UPDATE/DELETE is
> > supported in parallel-mode.
> > Column default expressions and check-constraints are only applicable
> > to INSERT and UPDATE.
> > Note however that currently this function can only ever be called with
> > commandType == CMD_INSERT.
> >
>
> I feel then for other command types there should be an Assert rather
> than try to handle something which is not yet implemented nor it is
> clear what all is required for that. It confuses the reader, at least
> it confused me. Probably we can write a comment but I don't think we
> should have any check for Update at this stage of work.
>

OK, for now I'll restrict the checks to INSERT, but I'll add comments
to assist with potential future UPDATE support.

> > > 2.
> > > +void PrepareParallelModeForModify(CmdType commandType, bool
> > > isParallelModifyLeader)
> > > +{
> > > + Assert(!IsInParallelMode());
> > > +
> > > + if (isParallelModifyLeader)
> > > + (void)GetCurrentCommandId(true);
> > > +
> > > + (void)GetCurrentFullTransactionId();
> > >
> > > Here, we should use GetCurrentTransactionId() similar to heap_insert
> > > or other heap operations. I am not sure why you have used
> > > GetCurrentFullTransactionId?
> > >
> >
> > GetCurrentTransactionId() and GetCurrentFullTransactionId() actually
> > have the same functionality, just a different return value (which is
> > not being used here).
> >
>
> Sure but lets use what is required.
>
> > But anyway I've changed it to use GetCurrentTransactionId().
> >
>
> But comments in ExecutePlan and PrepareParallelModeForModify still
> refer to FullTransactionId.
>

I believe those comments are technically correct.
GetCurrentTransactionId() calls AssignTransactionId() to do all the
work - and the comment for that function says "Assigns a new permanent
FullTransactionId to the given TransactionState".

> >
> >
> > > 4. Have you checked the overhead of this on the planner for different
> > > kinds of statements like inserts into tables having 100 or 500
> > > partitions? Similarly, it is good to check the overhead of domain
> > > related checks added in the patch.
> > >
> >
> > Checking that now and will post results soon.
> >
>

I am seeing a fair bit of overhead in the planning for the INSERT
parallel-safety checks (mind you, compared to the overall performance
gain, it's not too bad).
Some representative timings for a parallel INSERT of a million rows
into 100, 250 and 500 partitions are shown below.

(1) Without patch

# Partitions    Planning Time (ms)    Execution Time (ms)
100                  1.014                4176.435
250                  0.404                3842.414
500                  0.529                4440.633


(2) With Parallel INSERT patch

# Partitions    Planning Time (ms)    Execution Time (ms)
100                 11.420                2131.148
250                 23.269                3472.259
500                 36.531                3238.868

I'm looking into how this can be improved by better integration into
the current code, and addressing locking concerns that you've
previously mentioned.

Regards,
Greg Nancarrow
Fujitsu Australia




Re: should INSERT SELECT use a BulkInsertState?

2020-11-01 Thread Luc Vlaming

On 30.10.20 05:51, Justin Pryzby wrote:

On Thu, Oct 22, 2020 at 01:29:53PM +0100, Simon Riggs wrote:

On Fri, 16 Oct 2020 at 22:05, Justin Pryzby  wrote:


I made this conditional on BEGIN BULK/SET bulk, so I'll solicit comments on 
that.


I think it would be better if this was self-tuning. So that we don't
allocate a bulkinsert state until we've done say 100 (?) rows
inserted.


I made it an optional, non-default behavior in response to the legitimate
concern for performance regression for the cases where a loader needs to be as
fast as possible - as compared with our case, where we want instead to optimize
for our reports by making the loaders responsible for their own writes, rather
than leaving behind many dirty pages, and clobbering the cache, too.

Also, INSERT SELECT doesn't immediately help us (telsasoft), since we use
INSERT .. VALUES () .. ON CONFLICT.  This would handle that case, which is
great, even though that wasn't a design goal.  It could also be an integer GUC
to allow configuring the size of the ring buffer.


You should also use table_multi_insert() since that will give further
performance gains by reducing block access overheads. Switching from
single row to multi-row should also only happen once we've loaded a
few rows, so we don't introduce overahads for smaller SQL statements.


Good idea...multi_insert (which reduces the overhead of individual inserts) is
mostly independent from BulkInsert state (which uses a ring-buffer to avoid
dirtying the cache).  I made this 0002.

This makes INSERT SELECT several times faster, and not clobber the cache too.

Time: 4700.606 ms (00:04.701)
123 |  1
 37 |  2
 20 |  3
 11 |  4
   4537 |  5
  11656 |

Time: 1125.302 ms (00:01.125)
   2171 |  1
 37 |  2
 20 |  3
 11 |  4
111 |  5
  14034 |

When enabled, this passes nearly all regression tests, and all but 2 of the
changes are easily understood.  The 2nd patch still needs work.



Hi,

I came across this thread because I'm working on an improvement to
relation extension to speed up the bulk insert itself in (highly)
parallel cases, and I would like to make sure that our approaches work
nicely together.


Given what I've seen and tried so far with various benchmarks, I would
also really like to see a different approach here. The "BEGIN BULK" can
be problematic, for example, if you mix small and large numbers of
inserts in the same transaction, or if your application occasionally
does a bulk insert but otherwise runs mostly OLTP transactions.


To me, Simon's idea of only using a bulk insert state after inserting
e.g. 1000 rows sounds good, and it also seems more applicable to most
applications than requiring a change to every application that wants
faster ingest.
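
For what it's worth, a minimal sketch of that idea (purely illustrative,
not the proposed patch; the threshold, struct and function names are
assumptions) might look like:

#include "postgres.h"

#include "access/heapam.h"
#include "access/tableam.h"
#include "access/xact.h"
#include "executor/tuptable.h"
#include "utils/rel.h"

#define BULK_INSERT_THRESHOLD	1000	/* hypothetical cut-over point */

typedef struct InsertLoopState
{
	uint64			ntuples;	/* tuples inserted so far */
	BulkInsertState	bistate;	/* NULL until the threshold is crossed */
} InsertLoopState;

/* Insert one tuple, switching to a BulkInsertState after the threshold. */
static void
insert_one_tuple(Relation rel, TupleTableSlot *slot, InsertLoopState *state)
{
	if (state->bistate == NULL &&
		++state->ntuples > BULK_INSERT_THRESHOLD)
		state->bistate = GetBulkInsertState();

	/* A NULL bistate simply means "no ring buffer", as today. */
	table_tuple_insert(rel, slot, GetCurrentCommandId(true), 0,
					   state->bistate);
}

static void
finish_insert_loop(InsertLoopState *state)
{
	if (state->bistate != NULL)
		FreeBulkInsertState(state->bistate);
}

That way short statements never pay for the ring buffer, and only
genuinely bulky statements change behavior.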


Another approach could be to combine this, for example, with a few extra
requirements to limit the risk of regressions and first learn more about
how this behaves in the field.

We could, for example (just throwing out some ideas), require that:
- the relation has a certain minimum size
- a BufferStrategy of a certain maximum size is used
- there is a certain number of lock waiters on relation extension (like
we do with bulk extend)
- we have extended the relation by at least e.g. 4 MB without using the
FSM, thereby showing that we are doing bulk operations instead of random
small extensions all over the relation that use the FSM.


Another option is to first try to improve the bulk operation facilities
in general and then have another shot at this. I'm not sure if there is
a benchmark/query that shows where such a 10x slowdown would appear, but
maybe that would be worth a look as well.


Regards,
Luc




Re: Collation versioning

2020-11-01 Thread Julien Rouhaud
On Sat, Oct 31, 2020 at 10:28 AM Thomas Munro  wrote:
>
> On Fri, Oct 30, 2020 at 1:20 AM Thomas Munro  wrote:
> > 4.  I didn't really like the use of '' for unknown.  I figured out how
> > to use NULL for that.
>
> Hrmph.  That logic didn't work for the pattern ops case.  I fixed
> that, by partially reintroducing a special value, but this time
> restricting the code that knows about that to just pg_dump, and I used
> the value 'unknown', so really it's not special at all as far as the
> server is concerned and there is only one kind of warning message.

Ok, I'm fine with that.

> Furthermore, I realised that I really don't like the policy of
> assuming that all text-related indexes imported from older releases
> need the "unknown" warning.  That'll just make this feature
> unnecessarily noisy and unpopular when 14 comes out, essentially
> crying wolf, even though it's technically true that the collations in
> imported-from-before-14 indexes are of unknown version.  Even worse,
> instructions might be shared around the internet to show how to shut
> the warnings up without reindexing, and then later when there really
> is a version change, someone might find those instructions and follow
> them!  So I propose that the default should be to assume that indexes
> are not corrupted, unless you opt into the more pessimistic policy
> with --index-collation-versions-unknown.  Done like that.

Yes, I was also worried about spamming this kind of message after an
upgrade.  Note that this was initially planned for REL_13_STABLE,
whose release date was very close to glibc 2.28, so at that time a
real corruption of the indexes would have been more likely.
I'm fine with the new behavior.

> I also realised that I don't like carrying a bunch of C code to
> support binary upgrades, when it's really just a hand-coded trivial
> UPDATE of pg_depend.  Is there any reason pg_dump --binary-upgrade
> shouldn't just dump UPDATE statements, and make this whole feature a
> lot less mysterious, and shorter?  Done like that.

I just thought that it wouldn't be acceptable to do plain DML on the
catalogs.  If that's ok, then definitely this approach is better.

> While testing on another OS that will be encountered in the build farm
> when I commit this, I realised that I needed to add --encoding=UTF8 to
> tests under src/bin/pg_dump and src/test/locale, because they now do
> things with ICU collations (if built with ICU support) and that only
> works with UTF8.

Oh I didn't know that.

> Another option would be to find a way to skip those
> tests if the encoding is not UTF8.  Hmm, I wonder if it's bad to
> effectively remove the testing that comes for free from buildfarm
> animals running this under non-UTF8 encodings; but if we actually
> valued that, I suppose we'd do it explicitly as another test pass with
> SQL_ASCII.

I guess it would be better to keep checking non-UTF8 encodings, but
the current approach looks quite random and I don't have any better
suggestions.

Note that v34 now fails when run on a system that doesn't have
defined(__GLIBC__) (e.g. macOS).  The failures are in
collate.icu.utf8.sql, of the form:

- icuidx06_d_en_fr_ga   | "default"  | up to date
+ icuidx06_d_en_fr_ga   | "default"  | version not tracked

Given the new approach, the only option I can see is to simply remove
any attempt to cover the default collation in the tests.  An
alternative file would be a pain to maintain and it wouldn't bring any
value apart from checking that the default collation is either always
tracked or never, but not a mix of those.




reindex partitioned indexes: refactor ReindexRelationConcurrently ?

2020-11-01 Thread Justin Pryzby
This is a new thread about possible refactoring following commit a6642b3ae
("Add support for partitioned tables and indexes in REINDEX").

Currently, ReindexPartitions() calls ReindexMultipleInternal(), which loops
around ReindexRelationConcurrently(), similar to what's done with
ReindexMultipleTables().

Contrast that with ReindexTable(), which calls ReindexRelationConcurrently()
with the table's OID, which then handles all indexes at once:

|postgres=# REINDEX TABLE CONCURRENTLY onek;
|DEBUG:  0: building index "onek_unique1_ccnew" on table "onek" serially
|LOCATION:  index_build, index.c:2853
|DEBUG:  0: index "onek_unique1_ccnew" can safely use deduplication
|LOCATION:  _bt_allequalimage, nbtutils.c:2742
|DEBUG:  0: building index "onek_unique2_ccnew" on table "onek" serially
|LOCATION:  index_build, index.c:2853
|DEBUG:  0: index "onek_unique2_ccnew" can safely use deduplication
|LOCATION:  _bt_allequalimage, nbtutils.c:2742
|DEBUG:  0: building index "onek_hundred_ccnew" on table "onek" serially
|LOCATION:  index_build, index.c:2853
|DEBUG:  0: index "onek_hundred_ccnew" can safely use deduplication
|LOCATION:  _bt_allequalimage, nbtutils.c:2742
|DEBUG:  0: building index "onek_stringu1_ccnew" on table "onek" serially
|LOCATION:  index_build, index.c:2853
|DEBUG:  0: index "onek_stringu1_ccnew" can safely use deduplication
|LOCATION:  _bt_allequalimage, nbtutils.c:2742
|DEBUG:  0: validate_index found 1000 heap tuples, 1000 index tuples; 
inserted 0 missing tuples
|LOCATION:  validate_index, index.c:3281
|DEBUG:  0: validate_index found 1000 heap tuples, 1000 index tuples; 
inserted 0 missing tuples
|LOCATION:  validate_index, index.c:3281
|DEBUG:  0: validate_index found 1000 heap tuples, 1000 index tuples; 
inserted 0 missing tuples
|LOCATION:  validate_index, index.c:3281
|DEBUG:  0: validate_index found 1000 heap tuples, 1000 index tuples; 
inserted 0 missing tuples
|LOCATION:  validate_index, index.c:3281
|REINDEX

Should we refactor ReindexRelationConcurrently() into two functions?  One to
build a list of indexes on a relation, and one to concurrently reindex a
list of indexes.  Then, ReindexPartitions() would make a list of leaf indexes
of a partitioned index, or leaf indexes of partitioned indexes of a partitioned
table, and then reindex those indexes all at once.  For CIC, we could call
that from DefineIndex().

The reason is that concurrent reindex must wait for long-running transactions,
and if we call it in a loop, then we wait for long-running transactions N times.
I can imagine scenarios where it's easy for a DBA to schedule maintenance to
do reindex concurrently and restart processes to allow the reindex to proceed.
But it might be infeasible to restart processes every 5 minutes for 3 hours to
allow the reindex to proceed on each partition.

ReindexMultipleTables avoids doing that to avoid deadlocks, which makes great
sense for REINDEX SCHEMA/DATABASE/SYSTEM.  But I wonder if that reasoning
doesn't apply to partitioned tables.

We currently have this:

ReindexIndex()
=> ReindexPartitions
=> ReindexRelationConcurrently
=> reindex_index
ReindexTable()
=> ReindexPartitions
=> ReindexRelationConcurrently
=> reindex_relation
ReindexPartitions()
=> ReindexMultipleInternal()
=> ReindexRelationConcurrently()
=> reindex_relation()
=> reindex_index()

And I'm proposing to consider this:

ReindexIndex()
=> ReindexPartitions
=> ReindexIndexesConcurrently
=> reindex_index
ReindexTable()
=> ReindexPartitions
=> ReindexRelationConcurrently - this would be just a wrapper to 
collect the list of index Oids
=> reindex_relation
ReindexPartitions - this exists mainly to make a list of all leaf indexes on 
all children of a partitioned table
=> ReindexIndexesConcurrently - this processes all indexes at once
=> ReindexMultipleInternal - this loops around everything
=> ReindexRelationConcurrently()
=> reindex_index()
=> reindex_relation()
ReindexRelationConcurrently
=> ReindexIndexesConcurrently - this is the worker function factored 
out of ReindexRelationConcurrently

I'm not sure if I'm missing any opportunities to simplify...

So then it's processed similar to REINDEX TABLE (rather than REINDEX DATABASE).

|postgres=# REINDEX TABLE CONCURRENTLY hash_parted;
|DEBUG:  building index "hpart0_a_idx_ccnew" on table "hpart0" serially
|DEBUG:  index "hpart0_a_idx_ccnew" can safely use deduplication
|DEBUG:  building index "hpart1_a_idx_ccnew" on table "hpart1" serially
|DEBUG:  index "hpart1_a_idx_ccnew" can safely use deduplication
|DEBUG:  building index "hpart2_a_idx_ccnew" on table "hpart2" serially
|DEBUG:  index "hpart2_a_idx_ccnew" can safely use deduplication
|DEBUG:  building index "hpart3_a_idx_ccnew" on table "hpart3" serially
|DEBUG:  index "hpart3_a_

Re: Parallel copy

2020-11-01 Thread Heikki Linnakangas

On 02/11/2020 08:14, Amit Kapila wrote:

On Fri, Oct 30, 2020 at 10:11 PM Heikki Linnakangas  wrote:


Leader process:

The leader process is simple. It picks the next FREE buffer, fills it
with raw data from the file, and marks it as FILLED. If no buffers are
FREE, wait.

Worker process:

1. Claim next READY block from queue, by changing its state to
 PROCESSING. If the next block is not READY yet, wait until it is.

2. Start scanning the block from 'startpos', finding end-of-line
 markers. (in CSV mode, need to track when we're in-quotes).

3. When you reach the end of the block, if the last line continues to
 next block, wait for the next block to become FILLED. Peek into the
 next block, and copy the remaining part of the split line to a local
 buffer, and set the 'startpos' on the next block to point to the end
 of the split line. Mark the next block as READY.

4. Process all the lines in the block, call input functions, insert
 rows.

5. Mark the block as DONE.

In this design, you don't need to keep line boundaries in shared memory,
because each worker process is responsible for finding the line
boundaries of its own block.

There's a point of serialization here, in that the next block cannot be
processed, until the worker working on the previous block has finished
scanning the EOLs, and set the starting position on the next block,
putting it in READY state. That's not very different from your patch,
where you had a similar point of serialization because the leader
scanned the EOLs,


But in the design (single producer multiple consumer) used by the
patch, the worker doesn't need to wait till the complete block is
processed; it can start processing the lines already found. This also
allows workers to start processing the data much earlier, as they don't
need to wait for all the offsets corresponding to a 64K block to be
ready. However, in the design where each worker processes a 64K block,
there can be much longer waits. I think this will impact the COPY FROM
STDIN case more, where in most cases (200-300 byte tuples) we receive
data line-by-line from the client and the leader finds the
line-endings. If the leader hasn't found the line-endings, the workers
need to wait till the leader fills the entire 64K chunk; OTOH, with the
current approach a worker can start as soon as the leader has populated
some minimum number of line-endings.


You can use a smaller block size. However, the point of parallel copy is 
to maximize bandwidth. If the workers ever have to sit idle, it means 
that the bottleneck is in receiving data from the client, i.e. the 
backend is fast enough, and you can't make the overall COPY finish any 
faster no matter how you do it.



The other point is that the leader backend won't be used completely as
it is only doing a very small part (primarily reading the file) of the
overall work.


An idle process doesn't cost anything. If you have free CPU resources, 
use more workers.



We have discussed both these approaches (a) single producer multiple
consumer, and (b) all workers doing the processing as you are saying
in the beginning and concluded that (a) is better, see some of the
relevant emails [1][2][3].

[1] - 
https://www.postgresql.org/message-id/20200413201633.cki4nsptynq7blhg%40alap3.anarazel.de
[2] - 
https://www.postgresql.org/message-id/20200415181913.4gjqcnuzxfzbbzxa%40alap3.anarazel.de
[3] - 
https://www.postgresql.org/message-id/78C0107E-62F2-4F76-BFD8-34C73B716944%40anarazel.de


Sorry I'm late to the party. I don't think the design I proposed was
discussed in those threads. The alternative that's discussed in that 
thread seems to be something much more fine-grained, where processes 
claim individual lines. I'm not sure though, I didn't fully understand 
the alternative designs.


I want to throw out one more idea. It's an interim step, not the final
solution we want, but a useful step in getting there:


Have the leader process scan the input for line-endings. Split the input 
data into blocks of slightly under 64 kB in size, so that a line never 
crosses a block. Put the blocks in shared memory.


A worker process claims a block from shared memory, processes it from 
beginning to end. It *also* has to parse the input to split it into lines.
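
To illustrate just the block-cutting rule, here is a toy, single-process
sketch (not patch code; the block size is shrunk so the effect is visible
on a small input, and the toy assumes no line exceeds it):

#include <stdio.h>
#include <string.h>

/*
 * The "leader" cuts the input into blocks of at most BLOCK_SIZE bytes,
 * always ending on a newline, so that each block can later be parsed
 * into lines by a "worker" without looking at any other block.
 * BLOCK_SIZE stands in for "slightly under 64 kB".
 */
#define BLOCK_SIZE 16

int
main(void)
{
	const char *input =
		"1,foo\n2,bar\n3,baz\n4,quux\n5,corge\n6,grault\n7,garply\n";
	size_t		inputlen = strlen(input);
	size_t		pos = 0;
	int			blockno = 0;

	while (pos < inputlen)
	{
		size_t		end = pos + BLOCK_SIZE;
		size_t		i;
		int			nlines = 0;

		if (end >= inputlen)
			end = inputlen;
		else
		{
			/* back up to the last newline so no line crosses the block */
			while (end > pos && input[end - 1] != '\n')
				end--;
		}

		/* a "worker" could now parse lines in [pos, end) independently */
		for (i = pos; i < end; i++)
			if (input[i] == '\n')
				nlines++;

		printf("block %d: %zu bytes, %d line(s)\n",
			   blockno++, end - pos, nlines);
		pos = end;
	}
	return 0;
}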


In this design, the line-splitting is done twice. That's clearly not 
optimal, and we want to avoid that in the final patch, but I think it 
would be a useful milestone. After that patch is done, write another 
patch to either a) implement the design I sketched, where blocks are 
fixed-size and a worker notifies the next worker on where the first line 
in the next block begins, or b) have the leader process report the 
line-ending positions in shared memory, so that workers don't need to 
scan them again.


Even if we apply the patches together, I think splitting them like that 
would make for easier review.


- Heikki




Re: Online checksums verification in the backend

2020-11-01 Thread Michael Paquier
On Sun, Nov 01, 2020 at 05:50:06PM -0800, Andres Freund wrote:
> On 2020-11-02 10:45:00 +0900, Michael Paquier wrote:
> > On 2020-11-02 10:05:25 +0900, Michael Paquier wrote:
> > > > There is no place doing that on HEAD.
> > > 
> > > Err?
> > > How is this not doing IO while holding a buffer mapping lock?
> > 
> > Well, other than the one we are discussing of course :)
> 
> I am not following? Were you just confirming that its not a thing we do?

I meant that this is not done in any place other than the one
introduced by c780a7a.  So we have one place where it happens, and
no places before c780a7a.
--
Michael




Reduce the number of special cases to build contrib modules on windows

2020-11-01 Thread David Rowley
Hi,

At the moment we do very basic parsing of makefiles to build the
visual studio project file in order to build our contrib modules on
MSVC.  This parsing is quite basic and still requires a number of
special cases to get enough information into the project file in order
for the build to succeed.  It might be nice if we could reduce some of
those special cases so that:

a) We reduce the amount of work specific to windows when we add new
contrib modules, and;
b) We can work towards a better way for people to build their own
extensions on windows.

I admit to not being much of an expert in either perl or make, but I
came up with the attached which does allow a good number of the
special cases to be removed.

I'm keen to get some feedback on this idea.

Patch attached.

David
diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
index 90594bd41b..491a465e2f 100644
--- a/src/tools/msvc/Mkvcbuild.pm
+++ b/src/tools/msvc/Mkvcbuild.pm
@@ -32,16 +32,13 @@ my $libpq;
 my @unlink_on_exit;
 
 # Set of variables for modules in contrib/ and src/test/modules/
-my $contrib_defines = { 'refint' => 'REFINT_VERBOSE' };
-my @contrib_uselibpq = ('dblink', 'oid2name', 'postgres_fdw', 'vacuumlo');
-my @contrib_uselibpgport   = ('oid2name', 'pg_standby', 'vacuumlo');
-my @contrib_uselibpgcommon = ('oid2name', 'pg_standby', 'vacuumlo');
+my $contrib_defines = undef;
+my @contrib_uselibpq = undef;
+my @contrib_uselibpgport   = ('pg_standby');
+my @contrib_uselibpgcommon = ('pg_standby');
 my $contrib_extralibs  = undef;
 my $contrib_extraincludes = { 'dblink' => ['src/backend'] };
-my $contrib_extrasource = {
-   'cube' => [ 'contrib/cube/cubescan.l', 'contrib/cube/cubeparse.y' ],
-   'seg'  => [ 'contrib/seg/segscan.l',   'contrib/seg/segparse.y' ],
-};
+my $contrib_extrasource = undef;
 my @contrib_excludes = (
'bool_plperl',  'commit_ts',
'hstore_plperl','hstore_plpython',
@@ -163,7 +160,7 @@ sub mkvcbuild
$postgres = $solution->AddProject('postgres', 'exe', '', 'src/backend');
$postgres->AddIncludeDir('src/backend');
$postgres->AddDir('src/backend/port/win32');
-   $postgres->AddFile('src/backend/utils/fmgrtab.c');
+   $postgres->AddFile('src/backend/utils/fmgrtab.c', 1);
$postgres->ReplaceFile('src/backend/port/pg_sema.c',
'src/backend/port/win32_sema.c');
$postgres->ReplaceFile('src/backend/port/pg_shmem.c',
@@ -316,8 +313,8 @@ sub mkvcbuild
 
my $pgregress_ecpg =
  $solution->AddProject('pg_regress_ecpg', 'exe', 'misc');
-   $pgregress_ecpg->AddFile('src/interfaces/ecpg/test/pg_regress_ecpg.c');
-   $pgregress_ecpg->AddFile('src/test/regress/pg_regress.c');
+   $pgregress_ecpg->AddFile('src/interfaces/ecpg/test/pg_regress_ecpg.c', 
1);
+   $pgregress_ecpg->AddFile('src/test/regress/pg_regress.c', 1);
$pgregress_ecpg->AddIncludeDir('src/port');
$pgregress_ecpg->AddIncludeDir('src/test/regress');
$pgregress_ecpg->AddDefine('HOST_TUPLE="i686-pc-win32vc"');
@@ -327,10 +324,10 @@ sub mkvcbuild
 
my $isolation_tester =
  $solution->AddProject('isolationtester', 'exe', 'misc');
-   $isolation_tester->AddFile('src/test/isolation/isolationtester.c');
-   $isolation_tester->AddFile('src/test/isolation/specparse.y');
-   $isolation_tester->AddFile('src/test/isolation/specscanner.l');
-   $isolation_tester->AddFile('src/test/isolation/specparse.c');
+   $isolation_tester->AddFile('src/test/isolation/isolationtester.c', 1);
+   $isolation_tester->AddFile('src/test/isolation/specparse.y', 1);
+   $isolation_tester->AddFile('src/test/isolation/specscanner.l', 1);
+   $isolation_tester->AddFile('src/test/isolation/specparse.c', 1);
$isolation_tester->AddIncludeDir('src/test/isolation');
$isolation_tester->AddIncludeDir('src/port');
$isolation_tester->AddIncludeDir('src/test/regress');
@@ -342,8 +339,8 @@ sub mkvcbuild
 
my $pgregress_isolation =
  $solution->AddProject('pg_isolation_regress', 'exe', 'misc');
-   $pgregress_isolation->AddFile('src/test/isolation/isolation_main.c');
-   $pgregress_isolation->AddFile('src/test/regress/pg_regress.c');
+   $pgregress_isolation->AddFile('src/test/isolation/isolation_main.c', 1);
+   $pgregress_isolation->AddFile('src/test/regress/pg_regress.c', 1);
$pgregress_isolation->AddIncludeDir('src/port');
$pgregress_isolation->AddIncludeDir('src/test/regress');
$pgregress_isolation->AddDefine('HOST_TUPLE="i686-pc-win32vc"');
@@ -363,22 +360,22 @@ sub mkvcbuild
}
 
my $pgbasebackup = AddSimpleFrontend('pg_basebackup', 1);
-   $pgbasebackup->AddFile('src/bin/pg_basebackup/pg_basebackup.c');
+   $pgbasebackup->AddFile('src/bin/pg_basebackup/pg_basebackup.c', 1);
$pgbasebackup->AddLibrary('ws2_32.lib');
 
my $pgreceivewal = AddSimpleFrontend('pg_basebackup', 1);

Re: making update/delete of inheritance trees scale better

2020-11-01 Thread Amit Langote
On Sat, Oct 31, 2020 at 7:26 AM Heikki Linnakangas  wrote:
> On 31/10/2020 00:12, Tom Lane wrote:
> > Heikki Linnakangas  writes:
> >>  But if you do:
> >
> >> postgres=# explain verbose update tab set a = 1, b = 2;
> >>  QUERY PLAN
> >> -
> >>Update on public.tab  (cost=0.00..269603.27 rows=0 width=0)
> >>  ->  Seq Scan on public.tab  (cost=0.00..269603.27 rows=10028327
> >> width=14)
> >>Output: 1, 2, ctid
> >
> >> The Modify Table will still fetch the old tuple, but in this case, it's
> >> not really necessary, because both columns are overwritten.
> >
> > Ah, that I believe.  Not sure it's a common enough case to spend cycles
> > looking for, though.
> >
> > In any case, we still have to access the old tuple, don't we?
> > To lock it and update its t_ctid, whether or not we have use for
> > its user columns.  Maybe there's some gain from not having to
> > deconstruct the tuple, but it doesn't seem like it'd be much.

With the patch, the old tuple fetched by the ModifyTable node will not
be deconstructed in this case, because all the values needed to form
the new tuple will be obtained from the plan's output tuple, so there
is no need to read the user columns from the old tuple.  Given that,
it indeed sounds a bit wasteful to have read the tuple as Heikki
points out, but again, that's in a rare case.

> Yeah, you need to access the old tuple to update its t_ctid, but
> accessing it twice is still more expensive than accessing it once. Maybe
> you could optimize it somewhat by keeping the buffer pinned or
> something.

The buffer containing the old tuple is already pinned first when
ExecModifyTable() fetches the tuple to form the new tuple, and then
when, in this example, heap_update() fetches it to update the old
tuple contents.

> Or push the responsibility down to the table AM, passing the
> AM only the modified columns, and let the AM figure out how to deal with
> the columns that were not modified, hoping that it can do something smart.

That sounds interesting, but maybe a sizable project on its own?

Thanks a lot for taking a look at this, BTW.

--
Amit Langote
EDB: http://www.enterprisedb.com




Re: Parallel copy

2020-11-01 Thread Heikki Linnakangas

On 02/11/2020 09:10, Heikki Linnakangas wrote:

On 02/11/2020 08:14, Amit Kapila wrote:

We have discussed both these approaches (a) single producer multiple
consumer, and (b) all workers doing the processing as you are saying
in the beginning and concluded that (a) is better, see some of the
relevant emails [1][2][3].

[1] - 
https://www.postgresql.org/message-id/20200413201633.cki4nsptynq7blhg%40alap3.anarazel.de
[2] - 
https://www.postgresql.org/message-id/20200415181913.4gjqcnuzxfzbbzxa%40alap3.anarazel.de
[3] - 
https://www.postgresql.org/message-id/78C0107E-62F2-4F76-BFD8-34C73B716944%40anarazel.de


Sorry I'm late to the party. I don't think the design I proposed was
discussed in those threads. The alternative that's discussed in that
thread seems to be something much more fine-grained, where processes
claim individual lines. I'm not sure though, I didn't fully understand
the alternative designs.


I read the thread more carefully, and I think Robert had basically the 
right idea here 
(https://www.postgresql.org/message-id/CA%2BTgmoZMU4az9MmdJtg04pjRa0wmWQtmoMxttdxNrupYJNcR3w%40mail.gmail.com):



I really think we don't want a single worker in charge of finding
tuple boundaries for everybody. That adds a lot of unnecessary
inter-process communication and synchronization. Each process should
just get the next tuple starting after where the last one ended, and
then advance the end pointer so that the next process can do the same
thing. [...]


And here 
(https://www.postgresql.org/message-id/CA%2BTgmoZw%2BF3y%2BoaxEsHEZBxdL1x1KAJ7pRMNgCqX0WjmjGNLrA%40mail.gmail.com):



On Thu, Apr 9, 2020 at 2:55 PM Andres Freund

 wrote:

I'm fairly certain that we do *not* want to distribute input data
between processes on a single tuple basis. Probably not even below
a few

hundred kb. If there's any sort of natural clustering in the loaded data
- extremely common, think timestamps - splitting on a granular basis
will make indexing much more expensive. And have a lot more contention.


That's a fair point. I think the solution ought to be that once any
process starts finding line endings, it continues until it's grabbed
at least a certain amount of data for itself. Then it stops and lets
some other process grab a chunk of data.
Yes! That's pretty close to the design I sketched. I imagined that the 
leader would divide the input into 64 kB blocks, and each block would 
have a few metadata fields, notably the starting position of the first 
line in the block. I think Robert envisioned having a single "next 
starting position" field in shared memory. That works too, and is even 
simpler, so +1 for that.
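
As a rough sketch of that shared "next starting position" (hypothetical
names; the atomics are existing PostgreSQL primitives, but everything
else is made up and assumes the raw input is visible to every worker):

#include "postgres.h"

#include "port/atomics.h"

/* Hypothetical sketch, not from the patch: workers claim chunks by
 * atomically advancing a single shared "next start" offset, always
 * stopping at a line boundary and taking at least CHUNK_MIN bytes. */
#define CHUNK_MIN	(64 * 1024)		/* illustrative minimum chunk size */

static bool
claim_chunk(pg_atomic_uint64 *next_start,
			const char *input, uint64 input_len,
			uint64 *chunk_start, uint64 *chunk_end)
{
	uint64		start = pg_atomic_read_u64(next_start);

	for (;;)
	{
		uint64		end;

		if (start >= input_len)
			return false;		/* all input has been claimed */

		/* grab at least CHUNK_MIN bytes, then extend to the next newline */
		end = Min(start + CHUNK_MIN, input_len);
		while (end < input_len && input[end - 1] != '\n')
			end++;

		if (pg_atomic_compare_exchange_u64(next_start, &start, end))
		{
			*chunk_start = start;
			*chunk_end = end;
			return true;		/* this worker owns [start, end) */
		}
		/* CAS failed; 'start' now holds the current value, so retry. */
	}
}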


For some reason, the discussion took a different turn from there, to 
discuss how the line-endings (called "chunks" in the discussion) should 
be represented in shared memory. But none of that is necessary with 
Robert's design.


- Heikki