Re: [HACKERS] kqueue
Hi Thomas, On 28/09/2018 00:55, Thomas Munro wrote: > I would like to commit this patch for PostgreSQL 12, based on this > report. We know it helps performance on macOS developer machines and > big FreeBSD servers, and it is the right kernel interface for the job > on principle. Matteo Beccati reported a 5-10% performance drop on a > low-end Celeron NetBSD box which we have no explanation for, and we > have no reports from server-class machines on that OS -- so perhaps we > (or the NetBSD port?) should consider building with WAIT_USE_POLL on > NetBSD until someone can figure out what needs to be fixed there > (possibly on the NetBSD side)? Thanks for keeping me in the loop. Out of curiosity (and time permitting) I'll try to spin up a NetBSD 8 VM and run some benchmarks, but I guess we should leave it up to the pkgsrc people to eventually change the build flags. Cheers -- Matteo Beccati Development & Consulting - http://www.beccati.com/
Re: heap_sync seems rather oblivious to partitioned tables (wal_level=minimal)
On 28 September 2018 at 15:12, Michael Paquier wrote:
> On Fri, Sep 28, 2018 at 02:46:30PM +1200, David Rowley wrote:
>> I don't agree that we can skip explaining why one of the optimisations
>> can't be applied just because we've explained why a similar
>> optimisation cannot be applied somewhere close by. I think that the
>> WAL/FSM optimisation can fairly easily be improved on and probably
>> fixed in PG12 as we can just lazily determine per-partition if it can
>> be applied to that partition or not.
>
> Have you guys looked at what the following patch does for partitions and
> how it interacts with it?
> https://commitfest.postgresql.org/19/528/

I've glanced at it. I don't think we're taking anything in the wrong direction. The patch looks like it would need to be rebased if this gets in first.

> The proposed patch is missing the point that documentation also mentions
> the optimizations for COPY with wal_level = minimal:
>
> COPY is fastest when used within the same
> transaction as an earlier CREATE TABLE or
> TRUNCATE command. In such cases no WAL
> needs to be written, because in case of an error, the files
> containing the newly loaded data will be removed anyway.
> However, this consideration only applies when
> wal_level is minimal as all commands
> must write WAL otherwise.

I've edited that in the attached patch. Also reworded a comment that Amit mentioned and made a small change to the COPY FREEZE docs to mention no support for partitioned tables.

-- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services

fix_incorrect_setting_of_hi_options_for_partitioned_tables_v2.patch Description: Binary data
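To illustrate the optimization the quoted documentation describes, here is a minimal sketch (hypothetical table and file names, assuming wal_level = minimal). The second block shows the case the thread is about: a partitioned target, for which the patch disables the optimization because rows are routed to partitions that were not themselves created or truncated in this transaction.

-- COPY in the same transaction as the CREATE TABLE / TRUNCATE of its
-- target can skip WAL for the loaded data under wal_level = minimal.
BEGIN;
TRUNCATE plain_table;
COPY plain_table FROM '/tmp/data.csv' WITH (FORMAT csv);
COMMIT;

-- With a partitioned target, the per-partition heaps were not created or
-- truncated in this transaction, so the WAL/FSM-skip optimization is not
-- applied and normal WAL logging is used.
BEGIN;
TRUNCATE parted_table;
COPY parted_table FROM '/tmp/data.csv' WITH (FORMAT csv);
COMMIT;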
Re: Collation versioning
Re: Thomas Munro 2018-09-27 > > > 4. After creating a new database, update that row as appropriate in > > > the new database (!). Or find some other way to write a new table out > > > and switch it around, or something like that. > > > > I've been hatching this exact scheme since the very beginning, even > > thinking about using the background session functionality to do this. > > It would solve a lot of problems, but there is the question of exactly > > how to do that "(!)" part. Making (!) work would also allow reassigning the "public" schema to the database owner. That would fix that gross security gap that is left with the default search_path, while still keeping usability. It would make a whole lot of sense to work on making this feasible. Christoph
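For reference, a minimal sketch (hypothetical role name) of what "reassigning the public schema to the database owner" means concretely; the point of making "(!)" work is that CREATE DATABASE could do this automatically instead of a DBA doing it by hand in every new database:

-- Manual equivalent today, run per database after creation.
ALTER SCHEMA public OWNER TO db_owner;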
Re: Libpq support to connect to standby server as priority
On Thu, Jul 19, 2018 at 10:59 PM Haribabu Kommi wrote:
>
> On Wed, Jul 18, 2018 at 10:53 PM Robert Haas wrote:
>
>> On Wed, Jul 4, 2018 at 9:14 AM, Laurenz Albe wrote:
>> > What about keeping the first successful connection open and storing it in a
>> > variable if we are in "prefer-read" mode.
>> > If we get the read-only connection we desire, close that cached connection,
>> > otherwise use it.
>>
>> I like this idea. If I recall correctly, the logic in this area is
>> getting pretty complex, so we might need to refactor it for better
>> readability and maintainability.
>>
>
> OK. I will work on the code refactoring first and then provide the
> prefer-read option on top of it.

Commits d1c6a14bacf and 5ca00774194 have refactored the logic of handling the different connection states. Attached is a rebased patch after further refactoring the new option code for easier maintenance.

Regards,
Haribabu Kommi
Fujitsu Australia

0001-Allow-taget-session-attrs-to-accept-prefer-read-opti_v3.patch Description: Binary data
Re: transction_timestamp() inside of procedures
On 26/09/2018 23:48, Peter Eisentraut wrote:
> On 26/09/2018 17:54, Alvaro Herrera wrote:
>> What could be the use for the transaction timestamp? I think one of the
>> most important uses (at least in pg_stat_activity) is to verify that
>> transactions are not taking excessively long time to complete; that's
>> known to cause all sorts of trouble in Postgres, and probably other
>> DBMSs too. If we don't accurately measure what it really is, and
>> instead keep the compatibility behavior, we risk panicking people
>> because they think some transaction has been running for a long time
>> when in reality it's just a very long procedure which commits frequently
>> enough not to be a problem.
>
> That's certainly a good argument. Note that if we implemented that the
> transaction timestamp is advanced inside procedures, that would also
> mean that the transaction timestamp as observed in pg_stat_activity
> would move during VACUUM, for example. That might or might not be
> desirable.

Attached is a rough implementation. I'd be mildly in favor of doing this, but we have mentioned tradeoffs in this thread.

-- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From 2a8c214479adb4b98c8b6c95875d8bebd88cb940 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut
Date: Fri, 28 Sep 2018 09:33:24 +0200
Subject: [PATCH] Advance transaction timestamp in intra-procedure transactions

---
 src/backend/access/transam/xact.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 9aa63c8792..245735420c 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -1884,14 +1884,19 @@ StartTransaction(void)
 	TRACE_POSTGRESQL_TRANSACTION_START(vxid.localTransactionId);
 
 	/*
-	 * set transaction_timestamp() (a/k/a now()).  We want this to be the same
-	 * as the first command's statement_timestamp(), so don't do a fresh
-	 * GetCurrentTimestamp() call (which'd be expensive anyway).  Also, mark
-	 * xactStopTimestamp as unset.
-	 */
-	xactStartTimestamp = stmtStartTimestamp;
-	xactStopTimestamp = 0;
+	 * set transaction_timestamp() (a/k/a now()).  Normally, we want this to
+	 * be the same as the first command's statement_timestamp(), so don't do a
+	 * fresh GetCurrentTimestamp() call (which'd be expensive anyway).  But
+	 * for transactions started inside statements (e.g., procedure calls), we
+	 * want to advance the timestamp.
+	 */
+	if (xactStartTimestamp < stmtStartTimestamp)
+		xactStartTimestamp = stmtStartTimestamp;
+	else
+		xactStartTimestamp = GetCurrentTimestamp();
 	pgstat_report_xact_timestamp(xactStartTimestamp);
+	/* Mark xactStopTimestamp as unset. */
+	xactStopTimestamp = 0;
 
 	/*
 	 * initialize current transaction state fields
-- 
2.19.0
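To make the user-visible question concrete, here is a minimal sketch (hypothetical table and procedure names, assuming the procedure is called outside an explicit transaction block so that COMMIT is allowed):

CREATE TABLE ts_log (xact_ts timestamptz, stmt_ts timestamptz);

CREATE PROCEDURE log_timestamps() LANGUAGE plpgsql AS $$
BEGIN
    FOR i IN 1..3 LOOP
        PERFORM pg_sleep(1);
        INSERT INTO ts_log VALUES (transaction_timestamp(), statement_timestamp());
        COMMIT;  -- starts a new transaction inside the procedure
    END LOOP;
END;
$$;

CALL log_timestamps();

-- Without the patch, xact_ts is the same in all three rows (the time the
-- CALL's transaction started).  With the patch, xact_ts advances for each
-- intra-procedure transaction and can be newer than the CALL's
-- statement_timestamp().
SELECT * FROM ts_log;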
Re: executor relation handling
On 28 September 2018 at 20:00, Amit Langote wrote: > I've made minor tweaks, which you can find in > the attached updated patches (a .diff file containing changes from v6 to > v7 is also attached). Thanks for looking over the changes. I've looked at the v6 to v7 diff and it seems all good, apart from: + * The following asserts that the necessary lock on the relation I think we maybe should switch the word "assert" for "verifies". The Assert is just checking we didn't get a NoLock and I don't think you're using "assert" meaning the Assert() macro, so likely should be changed to avoid confusion. Apart from that, I see nothing wrong with the patches, so I think we should get someone else to look. I'm marking it as ready for committer. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Removing LEFT JOINs in more cases
On 18 September 2018 at 20:02, Antonin Houska wrote: > Following are the findings from my review. Thanks for reviewing this. Time is short in this commitfest to make any changes, so I'm going to return this with feedback with the intention of addressing the items from your review for the next 'fest. Cheers -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
RE: [HACKERS] Cached plans and statement generalization
On Tuesday, August 7, 2018 at 0:36 AM, Konstantin Knizhnik wrote:
> I have registered the patch for next commitfest.
> For some reason it doesn't find the latest autoprepare-10.patch and still
> refers to autoprepare-6.patch as the latest attachment.

I'm sorry for the late reply. I'm currently reviewing autoprepare. I could not make it in time for the commitfest in September, but I will continue to review for the next commitfest.

The performance tests show good results. The results are shown below.

pgbench -s 100 -c 8 -t 1 -S (average of 3 trials)
- all autoprepare statements use the same memory context: 18052.22706 TPS
- each autoprepare statement uses a separate memory context: 18607.95889 TPS
- calculate memory usage (autoprepare_memory_limit): 19171.60457 TPS

From the above results, I think that adding/changing functions will not affect performance. I am currently checking the behavior when autoprepare_memory_limit is specified.
Re: executor relation handling
On 28 September 2018 at 20:28, Amit Langote wrote: > On 2018/09/28 17:21, David Rowley wrote: >> I think we maybe should switch the word "assert" for "verifies". The >> Assert is just checking we didn't get a NoLock and I don't think >> you're using "assert" meaning the Assert() marco, so likely should be >> changed to avoid confusion. > > Okay, I've revised the text in the attached updated patch. Meh, I just noticed that the WARNING text claims "InitPlan" is the function name. I think it's best to get rid of that. It's pretty much redundant anyway if you do: \set VERBOSITY verbose -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Postgres, fsync, and OSs (specifically linux)
On Thu, Aug 30, 2018 at 2:44 PM Craig Ringer wrote: > On 15 August 2018 at 07:32, Thomas Munro > wrote: >> I will soon post some more fix-up patches that add EXEC_BACKEND >> support, Windows support, and a counting scheme to fix the timing >> issue that I mentioned in my first review. I will probably squash it >> all down to a tidy patch-set after that. I went down a bit of a rabbit hole with the Windows support for Andres's patch set. I have something that works as far as I can tell, but my Windows environment consists of throwing things at Appveyor and seeing what sticks, so I'm hoping that someone with a real Windows system and knowledge will be able to comment. New patches in this WIP patch set: 0012: Fix for EXEC_BACKEND. 0013: Windows. This involved teaching latch.c to deal with Windows asynchronous IO events, since you can't wait for pipe readiness via WSAEventSelect. Pipes and sockets exist in different dimensions on Windows, and there are no "Unix" domain sockets (well, there are but they aren't usable yet[1]). An alternative would be to use TCP sockets for this, and then the code would look more like the Unix code, but that seems a bit strange. Note that the Windows version doesn't actually hand off file handles like the Unix code (it could fairly easily, but there is no reason to think that would actually be useful on that platform). I may be way off here... The 0013 patch also fixes a mistake in the 0010 patch: it is not appropriate to call CFI() while waiting to notify the checkpointer of a dirty segment, because then ^C could cause the following checkpoint not to flush dirty data. SendFsyncRequest() is essentially blocking, except that it uses non-blocking IO so that it multiplex postmaster death detection. 0014: Fix the ordering race condition mentioned upthread[2]. All files are assigned an increasing sequence number after [re]opening (ie before their first write), so that the checkpointer process can track the fd that must have the oldest Linux f_wb_err that could be relevant for writes done by PostgreSQL. The other patches in this tarball are all as posted already, but are now rebased and assembled in one place. Also pushed to https://github.com/macdice/postgres/tree/fsyncgate . Thoughts? [1] https://blogs.msdn.microsoft.com/commandline/2017/12/19/af_unix-comes-to-windows/ [2] https://www.postgresql.org/message-id/CAEepm%3D04ZCG_8N3m61kXZP-7Ecr02HUNNG-QsAhwyFLim4su2g%40mail.gmail.com -- Thomas Munro http://www.enterprisedb.com fsyncgate-v3.tgz Description: GNU Zip compressed data
Re: Postgres, fsync, and OSs (specifically linux)
On Fri, Sep 28, 2018 at 9:37 PM Thomas Munro wrote: > The 0013 patch also fixes a mistake in the 0010 patch: it is not > appropriate to call CFI() while waiting to notify the checkpointer of > a dirty segment, because then ^C could cause the following checkpoint > not to flush dirty data. (Though of course it wouldn't actually do that due to an LWLock being held, but still, I removed the CFI because it was at best misleading). -- Thomas Munro http://www.enterprisedb.com
Re: Slotification of partition tuple conversion
On Wed, 26 Sep 2018 at 03:33, Andres Freund wrote: > > Hi Amit, > > Could you rebase this patch, it doesn't apply anymore. Thanks for letting me know. Both my patch and Amit Langote's patch are rebased and attached ... -- Thanks, -Amit Khandekar EnterpriseDB Corporation The Postgres Database Company v2-rebased-0001-Allocate-dedicated-slots-of-partition-tuple-conve.patch Description: Binary data v4_rebased-0002-Slotify-partition-tuple-conversion.patch Description: Binary data
Re: Slotification of partition tuple conversion
On 2018/09/28 19:06, Amit Khandekar wrote: > On Wed, 26 Sep 2018 at 03:33, Andres Freund wrote: >> >> Hi Amit, >> >> Could you rebase this patch, it doesn't apply anymore. > > Thanks for informing. Attached are both mine and Amit Langote's patch > rebased and attached ... Thanks Amit for also taking care of the other one. I don't have time today, but will try to take a look on Monday. Regards, Amit
Re: transction_timestamp() inside of procedures
On Wed, Sep 26, 2018 at 09:23:58PM +0200, Daniel Verite wrote: > Tom Lane wrote: > > > I agree that it would be surprising for transaction timestamp to be newer > > than statement timestamp. > > To me it's more surprising to start a new transaction and having > transaction_timestamp() still pointing at the start of a previous > transaction. > This feels like a side-effect of being spawned by a procedure, > and an exception to what transaction_timestamp() normally means > or meant until now. > > OTOH transaction_timestamp() being possibly newer than > statement_timestamp() seems like a normal consequence of > transactions being created intra-statement. Yes, that is a good point. My thought has always been that statements are inside of transactions, but the opposite is now possible. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Re: Postgres 11 release notes
On Fri, Sep 28, 2018 at 10:21:16AM +0900, Amit Langote wrote: > On 2018/09/27 23:24, Alvaro Herrera wrote: > > On 2018-Sep-27, Amit Langote wrote: > > > >> Sorry I couldn't reply sooner, but the following of your proposed text > >> needs to be updated a bit: > >> > >> + > >> + > >> + Having a "default" partition for storing data that does not > >> match a > >> + partition key > >> + > >> + > >> > >> I think "does not match a partition key" is not accurate. Description of > >> default partitions further below in the release notes says this: > >> > >> "The default partition can store rows that don't match any of the other > >> defined partitions, and is searched accordingly." > >> > >> So, we could perhaps write it as: > >> > >> Having a "default" partition for storing data that does not match any of > >> the remaining partitions > > > > Yeah, I agree that "a partition key" is not the right term to use there > > (and that term is used in the press release text also). However I don't > > think "remaining" is the right word there either, because it sounds as > > if you're removing something. > > > > For the Spanish translation of the press release, we ended up using the > > equivalent of "for the data that does not match any other partition". > > Yeah, "any other partition" is what the existing description uses too, so: > > Having a "default" partition for storing data that does not match any > other partition Uh, what text are you talking about? I see this text in the release notes since May: The default partition can store rows that don't match any of the other defined partitions, and is searched accordingly. The press release? -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
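For reference, the behavior that the release-note sentence is trying to describe, as a minimal sketch with hypothetical table names:

CREATE TABLE measurements (city text, logdate date) PARTITION BY LIST (city);
CREATE TABLE measurements_paris PARTITION OF measurements FOR VALUES IN ('Paris');
CREATE TABLE measurements_tokyo PARTITION OF measurements FOR VALUES IN ('Tokyo');
CREATE TABLE measurements_other PARTITION OF measurements DEFAULT;

-- A row that does not match any other partition is routed to the default
-- partition, and queries scan the default partition accordingly.
INSERT INTO measurements VALUES ('Berlin', '2018-09-28');
SELECT tableoid::regclass, * FROM measurements;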
Re: Problem while setting the fpw with SIGHUP
On Fri, Sep 28, 2018 at 4:23 AM Michael Paquier wrote: > > On Thu, Sep 27, 2018 at 07:38:31PM +0530, Amit Kapila wrote: > > Okay, I am planning to commit the attached patch tomorrow unless you > > or anybody else has any objections to it. > > None from here. Thanks for taking care of it. > Thanks, pushed! I have backpatched till 9.5 as this has been introduced by the commit 2c03216d83 which added the XLOG machinery initialization in RecoveryInProgress code path. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: pgsql: Build src/port files as a library with -fPIC, and use that in li
Re: Tom Lane 2018-09-27
> Build src/port files as a library with -fPIC, and use that in libpq.

This made the "pqsignal" symbol disappear from libpq5.so:

13:27:55 dpkg-gensymbols: error: some symbols or patterns disappeared in the symbols file: see diff output below
13:27:55 dpkg-gensymbols: warning: debian/libpq5/DEBIAN/symbols doesn't match completely debian/libpq5.symbols
13:27:55 --- debian/libpq5.symbols (libpq5_12~~devel~20180928.1058-1~226.git92a0342.pgdg+1_amd64)
13:27:55 +++ dpkg-gensymbolsoXZn54 2018-09-28 11:27:55.499157237 +
13:27:55 @@ -168,7 +168,7 @@
13:27:55  pg_valid_server_encoding@Base 0
13:27:55  pg_valid_server_encoding_id@Base 8.3~beta1-2~
13:27:55  pgresStatus@Base 0
13:27:55 - pqsignal@Base 0
13:27:55 +#MISSING: 12~~devel~20180928.1058-1~226.git92a0342.pgdg+1# pqsignal@Base 0
13:27:55  printfPQExpBuffer@Base 0
13:27:55  resetPQExpBuffer@Base 0
13:27:55  termPQExpBuffer@Base 0

Is this a problem for libpq5 users?

Christoph
Re: clarify documentation of BGW_NEVER_RESTART ?
On Wed, Sep 26, 2018 at 3:17 AM Chapman Flack wrote: > > I did not notice until today that there is some ambiguity in > this paragraph: > > bgw_restart_time is the interval, in seconds, that postgres should > wait before restarting the process, in case it crashes. It can be > any positive value, or BGW_NEVER_RESTART, indicating not to restart > the process in case of a crash. > > I had been reading "in case _it_ crashes" and "in case of _a_ crash" > as "in case _the background worker_ crashes", so I assumed with > BGW_NEVER_RESTART I was saying I don't want my worker restarted if > _it_ flakes out while PG is otherwise operating normally. > > But I was surprised when the unrelated crash of a different, normal > backend left my background worker killed and never restarted. I had > always regarded the fatal-error kick-out-all-backends-and-recover > handling as essentially equivalent to a PG restart, so I had expected > it to start the bgworker over just as a real restart would. > > But sure enough, ResetBackgroundWorkerCrashTimes() gets called in > that case, and treats every worker with BGW_NEVER_RESTART as gone > and forgotten. So it seems the "it" in "it crashes" can be "the > background worker" or "postgres itself" or "any shmem-connected > backend". > I think that kind of wording might suit for BGW_NEVER_RESTART value, but for any positive value, the current wording appears fine to me. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] kqueue
On Fri, Sep 28, 2018 at 11:09 AM Andres Freund wrote: > On 2018-09-28 10:55:13 +1200, Thomas Munro wrote: > > Matteo Beccati reported a 5-10% performance drop on a > > low-end Celeron NetBSD box which we have no explanation for, and we > > have no reports from server-class machines on that OS -- so perhaps we > > (or the NetBSD port?) should consider building with WAIT_USE_POLL on > > NetBSD until someone can figure out what needs to be fixed there > > (possibly on the NetBSD side)? > > Yea, I'm not too worried about that. It'd be great to test that, but > otherwise I'm also ok to just plonk that into the template. Thanks for the review! Ok, if we don't get a better idea I'll put this in src/template/netbsd: CPPFLAGS="$CPPFLAGS -DWAIT_USE_POLL" > > @@ -576,6 +592,10 @@ CreateWaitEventSet(MemoryContext context, int nevents) > > if (fcntl(set->epoll_fd, F_SETFD, FD_CLOEXEC) == -1) > > elog(ERROR, "fcntl(F_SETFD) failed on epoll descriptor: %m"); > > #endif /* > > EPOLL_CLOEXEC */ > > +#elif defined(WAIT_USE_KQUEUE) > > + set->kqueue_fd = kqueue(); > > + if (set->kqueue_fd < 0) > > + elog(ERROR, "kqueue failed: %m"); > > #elif defined(WAIT_USE_WIN32) > > Is this automatically opened with some FD_CLOEXEC equivalent? No. Hmm, I thought it wasn't necessary because kqueue descriptors are not inherited and backends don't execve() directly without forking, but I guess it can't hurt to add a fcntl() call. Done. > > + *((WaitEvent **)(&k_ev->udata)) = event; > > I'm mildly inclined to hide that behind a macro, so the other places > have a reference, via the macro definition, to this too. Done. > > + if (rc < 0 && event->events == WL_POSTMASTER_DEATH && errno == ESRCH) > > + { > > + /* > > + * The postmaster is already dead. Defer reporting this to > > the caller > > + * until wait time, for compatibility with the other > > implementations. > > + * To do that we will now add the regular alive pipe. > > + */ > > + WaitEventAdjustKqueueAdd(&k_ev[0], EVFILT_READ, EV_ADD, > > event); > > + rc = kevent(set->kqueue_fd, &k_ev[0], count, NULL, 0, NULL); > > + } > > That's, ... not particulary pretty. Kinda wonder if we shouldn't instead > just add a 'pending_events' field, that we can check at wait time. Done. > > +/* Define to 1 if you have the `kqueue' function. */ > > +#undef HAVE_KQUEUE > > + > Should adjust pg_config.win32.h too. Done. -- Thomas Munro http://www.enterprisedb.com 0001-Add-kqueue-2-support-for-WaitEventSet-v11.patch Description: Binary data
Re: Progress reporting for pg_verify_checksums
Hi, On Wed, Sep 26, 2018 at 10:46:06AM +0200, Fabien COELHO wrote: > The xml documentation should be updated! (It is kind of hard to notice what > is not there:-) > > See "doc/src/sgml/ref/pg_verify_checksums.sgml". Right, I've added a paragraph. > >>The time() granularity to the second makes the result awkward on small > >>tests: > >> > >> 8/1554552 kB (0%, 8 kB/s) > >> 1040856/1554552 kB (66%, 1040856 kB/s) > >> 1554552/1554552 kB (100%, 1554552 kB/s) > >> > >>I'd suggest to reuse the "portability/instr_time.h" stuff which allows a > >>much better precision. > > I still think it would make sense to use that instead of low-precision > time(). As the use-case of this is not for small tests, I don't think it is useful to make the code more complicated for this. > >>The "kB" kilo-byte symbols stands for 1000 bytes, but you are displaying > >>1024-bytes kilos, which symbol is either "KiB" or "KB". Given that we are > >>about storage which is usually measured in powers of 1,000, so I'd suggest > >>to use thousands. > >> > >>Another reserve I have is that on any hardware it is likely that there will > >>be a lot of kilos, so maybe megas (MB) should be used instead. > > > >The display is exactly the same as in pg_basebackup (except the > >bandwith is shown as well), so right now I think it is more useful to be > >consistent here. > > Hmmm... units are units, and the display is wrong. The fact that other > commands are wrong as well does not look like a good argument to persist in > an error. I've had a look around, and "kB" seems to be a common unit for 1024 bytes, e.g. in pg_test_fsync etc., unless I am mistaken? > >So if we change that, pg_basebackup should be changed as well I think. > > I'm okay with fixing pg_basebackup as well! I'm unsure about the best place > to put such a function, though. Probably under "src/common/" where there is > already some front-end specific code ("fe_memutils.c"). > > >Maybe the code could be factored out to some common file in the future. > > I would be okay with a progress printing function shared between some > commands. It just needs some place. pg_rewind also has a --rewind option. I guess you mean pg_rewind also has a --progress option. I do agree it makes sense to refactor that, but I don't think this should be part of this patch. > >> + memset(&act, '\0', sizeof(act)); > >> > >>pg uses 0 instead of '\0' everywhere else. > > > >Ok. > > Not '0', plain 0 (the integer of value zero). Oops, thanks for spotting that. I've attached v4 of the patch. Michael -- Michael Banck Projektleiter / Senior Berater Tel.: +49 2166 9901-171 Fax: +49 2166 9901-100 Email: michael.ba...@credativ.de credativ GmbH, HRB Mönchengladbach 12080 USt-ID-Nummer: DE204566209 Trompeterallee 108, 41189 Mönchengladbach Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer Unser Umgang mit personenbezogenen Daten unterliegt folgenden Bestimmungen: https://www.credativ.de/datenschutz diff --git a/doc/src/sgml/ref/pg_verify_checksums.sgml b/doc/src/sgml/ref/pg_verify_checksums.sgml index 905b8f1222..5550b7b2bf 100644 --- a/doc/src/sgml/ref/pg_verify_checksums.sgml +++ b/doc/src/sgml/ref/pg_verify_checksums.sgml @@ -80,6 +80,19 @@ PostgreSQL documentation + -P + --progress + + +Enable progress reporting. Turning this on will deliver an approximate +progress report during the checksum verification. Sending the +SIGUSR1 signal will toggle progress reporting +on or off during the verification run. 
[Remainder of the attached v4 patch -- the changes to src/bin/pg_verify_checksums/pg_verify_checksums.c adding the -P/--progress option, the progress-tracking counters, and the SIGUSR1 toggle handler -- is truncated in the archived copy.]
Re: pgsql: Build src/port files as a library with -fPIC, and use that in li
Christoph Berg writes: > Re: Tom Lane 2018-09-27 >> Build src/port files as a library with -fPIC, and use that in libpq. > This made the "pqsignal" symbol disappear from libpq5.so: Oh, interesting. I'd seen an actual error on prairiedog, but apparently some other linkers just silently omit the export, if the symbol is in a .a file rather than .o. > Is this is a problem for libpq5 users? I proposed in https://www.postgresql.org/message-id/19581.1538077...@sss.pgh.pa.us that we should remove pqsignal, as well as pg_utf_mblen pg_encoding_to_char pg_char_to_encoding pg_valid_server_encoding pg_valid_server_encoding_id from libpq's exports, on the grounds that (a) nobody should be using those (they're undocumented as exports), and (b) anybody who is using them should get them from libpgport/libpgcommon instead. Thoughts? regards, tom lane
Re: pgsql: Build src/port files as a library with -fPIC, and use that in li
Re: Tom Lane 2018-09-28 <19404.1538140...@sss.pgh.pa.us> > > Is this is a problem for libpq5 users? > > I proposed in > https://www.postgresql.org/message-id/19581.1538077...@sss.pgh.pa.us > > that we should remove pqsignal, as well as > pg_utf_mblen > pg_encoding_to_char > pg_char_to_encoding > pg_valid_server_encoding > pg_valid_server_encoding_id > > from libpq's exports, on the grounds that (a) nobody should be using > those (they're undocumented as exports), and (b) anybody who is using > them should get them from libpgport/libpgcommon instead. Thoughts? I'm fine with that if we say (a) should be true, and even if it is not, (b) is an easy workaround. Bumping the libpq SONAME just because of this seems excessive. On the Debian side, I can simply remove the symbol from the tracking file and the buildsystem will be happy again. Christoph
Re: pgsql: Build src/port files as a library with -fPIC, and use that in li
Christoph Berg writes: > Re: Tom Lane 2018-09-28 <19404.1538140...@sss.pgh.pa.us> >> I proposed in >> https://www.postgresql.org/message-id/19581.1538077...@sss.pgh.pa.us >> >> that we should remove pqsignal, as well as >> pg_utf_mblen >> pg_encoding_to_char >> pg_char_to_encoding >> pg_valid_server_encoding >> pg_valid_server_encoding_id >> >> from libpq's exports, on the grounds that (a) nobody should be using >> those (they're undocumented as exports), and (b) anybody who is using >> them should get them from libpgport/libpgcommon instead. Thoughts? > I'm fine with that if we say (a) should be true, and even if it is > not, (b) is an easy workaround. Bumping the libpq SONAME just because > of this seems excessive. Yeah, if anyone insists that this requires a soname bump, I'd probably look for another solution. Making the makefiles a bit cleaner is not worth the churn that would cause, IMO. The place where we'd probably end up if anyone's unhappy about this is that we'd still be symlinking the three files pqsignal.c, wchar.c, and encnames.c into libpq. That's not very desirable, but at least it'd be a fixed list rather than something we're continually having to change. regards, tom lane
Re: executor relation handling
Hi, On 9/28/18 4:58 AM, Amit Langote wrote: Okay, I've revised the text in the attached updated patch. Meh, I just noticed that the WARNING text claims "InitPlan" is the function name. I think it's best to get rid of that. It's pretty much redundant anyway if you do: \set VERBOSITY verbose Oops, good catch that one. Removed "InitPlan: " from the message in the attached. I have looked at the patch (v9), and have no further comments. I can confirm a speedup in the SELECT FOR SHARE case. Thanks for working on this ! Best regards, Jesper
Re: buildfarm and git pull
On 09/27/2018 11:32 AM, Alexander Kuzmenkov wrote:

Hi Andrew,

I have a question about how buildfarm works with git, could you please help? We use buildfarm locally at PGPro to test our branches, and it breaks when I rebase and force push to the repository. To get the remote changes, buildfarm does 'git checkout .' followed by 'git pull', and the latter breaks when the remote branch was rebased. I was wondering if the buildfarm really has to do 'git pull'? Pull is supposed to be used to integrate local changes with remote ones, but buildfarm doesn't have any local changes, does it? It just has to check out the remote branch as-is. To do that, when the state of the working directory is not known, I'd do the following commands:

git fetch                          # get the remote changes
git checkout -f <remote>/<branch>  # check out the needed remote branch; on conflict, use the remote files
git reset --hard                   # revert all modifications in tracked files
git clean -xfd                     # recursively delete all unversioned and ignored files

Do you think this approach is correct or am I missing something?

Possibly. It seems a little violent. We don't do rebase + forced push in Postgres - it's something of a no-no in public repositories according to my understanding. Send me a patch and I'll take a look at it.

cheers

andrew

-- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Let's stop with the retail rebuilds of src/port/ files already
On 09/27/2018 03:48 PM, Tom Lane wrote: I wrote: Here's a partial patch for that: it adds the third build variant to src/port/ and teaches libpq to use it. We'd want to likewise modify src/common/ and fix up other callers such as ecpg, but this seems to be enough to test whether the idea works or not. ... What I think would make sense is to push this and see what the buildfarm thinks of it. If there are unfixable problems then we won't have wasted time fleshing out the concept. Otherwise, I'll do the remaining pieces. Well, the buildfarm did turn up one problem: on really old macOS (cf prairiedog) the libpq link step fails with ld: symbols names listed in -exported_symbols_list: exports.list not in linked objects _pqsignal Apparently, with that linker, the exported symbols list is resolved against just what is found in the listed *.o files, not anything pulled in from a library file. Now, the question that raises in my mind is why is libpq.so exporting pqsignal() at all? Probably there was once a reason for it, but nowadays I would think that any client program using pqsignal() should get it from -lpgport, not from libpq. Having more than one place to get it from seems more likely to create issues than solve them. And we certainly do not document it as a function available from libpq. So my recommendation is to remove pqsignal from libpq's exports.txt. I've verified that prairiedog builds happily with that change, confirming my expectation that all consumers of the symbol can get it from someplace else. Now, if we go forward with that solution, there will be issues with some other things that libpq exports without having defined itself: src/backend/utils/mb/wchar.c: pg_utf_mblen src/backend/utils/mb/encnames.c: pg_encoding_to_char pg_char_to_encoding pg_valid_server_encoding pg_valid_server_encoding_id Now, I'd already had my eye on those two files, because after applying a similar fix for src/common/, those two files would be the only ones that libpq needs to symlink from somewhere else. What I was thinking of proposing was to move those two files out of the backend and into src/common/, thereby normalizing their status as modules available in both frontend and backend, and removing the need for a special build rule for them in libpq. (initdb could be simplified too.) Per this discovery, we'd need to also remove these symbols from libpq's exports list, meaning that clients *must* get them from -lpgcommon not from libpq. There's a small chance that this'd break third-party clients that are using these symbols out of libpq. We've never documented them as being available, but somebody might be using them anyway. If that does happen, it could be repaired by linking against -lpgcommon along with libpq, but it'd possibly still make people unhappy. Seems a small enough price to pay. cheers andrew -- Andrew Dunstanhttps://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: heap_sync seems rather oblivious to partitioned tables (wal_level=minimal)
On 09/28/2018 06:02 AM, Amit Langote wrote: > On 2018/09/28 12:12, Michael Paquier wrote: >> On Fri, Sep 28, 2018 at 02:46:30PM +1200, David Rowley wrote: >>> I don't agree that we can skip explaining why one of the optimisations >>> can't be applied just because we've explained why a similar >>> optimisation cannot be applied somewhere close by. I think that the >>> WAL/FSM optimisation can fairly easily be improved on and probably >>> fixed in PG12 as we can just lazily determine per-partition if it can >>> be applied to that partition or not. >> >> Have you guys looked at what the following patch does for partitions and >> how it interacts with it? >> https://commitfest.postgresql.org/19/528/ > > Just looked at that patch and noticed that the following hunk won't cope > if COPY's target table is partitioned: > > diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c > index 7674369613..7b9a7af2d2 100644 > --- a/src/backend/commands/copy.c > +++ b/src/backend/commands/copy.c > @@ -2416,10 +2416,8 @@ CopyFrom(CopyState cstate) > { > hi_options |= HEAP_INSERT_SKIP_FSM; > > - if (!XLogIsNeeded() && > - cstate->rel->trigdesc == NULL && > - RelationGetNumberOfBlocks(cstate->rel) == 0) > - hi_options |= HEAP_INSERT_SKIP_WAL; > + if (!XLogIsNeeded() && RelationGetNumberOfBlocks(cstate->rel) > == 0) > + hi_options |= HEAP_INSERT_SKIP_WAL; > } > > /* > > RelationGetNumberOfBlocks would blow up if passed a partitioned table to it. > > Applying David's patch will take care of that though. > >> The proposed patch is missing the point that documentation also mentions >> the optimizations for COPY with wal_level = minimal: >> >> COPY is fastest when used within the same >> transaction as an earlier CREATE TABLE or >> TRUNCATE command. In such cases no WAL >> needs to be written, because in case of an error, the files >> containing the newly loaded data will be removed anyway. >> However, this consideration only applies when >> is minimal as all >> commands >> must write WAL otherwise. >> > > I might be wrong but I'm not sure if we should mention here that this > optimization is not applied to partitioned tables due to what appears to > be a implementation-level limitation? > Aren't most limitations implementation-level? ;-) IMHO it's entirely appropriate to mention this in user-level docs. Based on the current wording, it's quite natural to assume the optimization applies to both partitioned and non-partitioned tables. And in fact it does not. It will still work for individual partitions, though. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Making all nbtree entries unique by having heap TIDs participate in comparisons
On 19/09/2018 20:23, Peter Geoghegan wrote: > Attached is v5, So. I don't know much about the btree code, so don't believe anything I say. I was very interested in the bloat test case that you posted on 2018-07-09 and I tried to understand it more. The current method for inserting a duplicate value into a btree is going to the leftmost point for that value and then move right until we find some space or we get "tired" of searching, in which case just make some space right there. The problem is that it's tricky to decide when to stop searching, and there are scenarios when we stop too soon and repeatedly miss all the good free space to the right, leading to bloat even though the index is perhaps quite empty. I tried playing with the getting-tired factor (it could plausibly be a reloption), but that wasn't very successful. You can use that to postpone the bloat, but you won't stop it, and performance becomes terrible. You propose to address this by appending the tid to the index key, so each key, even if its "payload" is a duplicate value, is unique and has a unique place, so we never have to do this "tiresome" search. This makes a lot of sense, and the results in the bloat test you posted are impressive and reproducible. I tried a silly alternative approach by placing a new duplicate key in a random location. This should be equivalent since tids are effectively random. I didn't quite get this to fully work yet, but at least it doesn't blow up, and it gets the same regression test ordering differences for pg_depend scans that you are trying to paper over. ;-) As far as the code is concerned, I agree with Andrey Lepikhov that one more abstraction layer that somehow combines the scankey and the tid or some combination like that would be useful, instead of passing the tid as a separate argument everywhere. I think it might help this patch move along if it were split up a bit, for example 1) suffix truncation, 2) tid stuff, 3) new split strategies. That way, it would also be easier to test out each piece separately. For example, how much space does suffix truncation save in what scenario, are there any performance regressions, etc. In the last few versions, the patches have still been growing significantly in size and functionality, and most of the supposed benefits are not readily visible in tests. And of course we need to think about how to handle upgrades, but you have already started a separate discussion about that. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
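For readers who want to experiment, a minimal sketch (hypothetical names, not the exact 2018-07-09 test case) of the kind of duplicate-heavy workload where the "getting tired" heuristic can fail to reuse freed space in a low-cardinality secondary index:

CREATE TABLE dup_test (id serial PRIMARY KEY, val integer);
CREATE INDEX dup_test_val_idx ON dup_test (val);

-- Many duplicates per key value in the secondary index.
INSERT INTO dup_test (val) SELECT g % 10 FROM generate_series(1, 1000000) g;
DELETE FROM dup_test WHERE id % 2 = 0;
VACUUM dup_test;
INSERT INTO dup_test (val) SELECT g % 10 FROM generate_series(1, 500000) g;

-- Compare before and after to see how much of the freed space was reused.
SELECT pg_size_pretty(pg_relation_size('dup_test_val_idx'));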
Re: On-disk compatibility for nbtree-unique-key enhancement
On 21/09/2018 01:18, Peter Geoghegan wrote: > * This means that there is a compatibility issue for anyone that is > already right on the threshold -- we *really* don't want to see a > REINDEX fail, but that seems like a possibility that we need to talk > about now. When would the REINDEX need to happen? Will the code still be able to read and write v3 btrees? Could there perhaps be an amcheck or pageinspect feature that tells you ahead of time if there are too large items in an old index? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: C99 compliance for src/port/snprintf.c
Hi, On 2018-08-25 13:08:18 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2018-08-16 11:41:30 -0400, Tom Lane wrote: > >> Andres Freund writes: > >>> While I'd personally have no problem kicking gcc 3.4 to the curb, I'm > >>> still confused what causes this error mode. Kinda looks like > >>> out-of-sync headers with gcc or something. > > >> Yeah, this is *absolutely* unsurprising for a non-native gcc installation > >> on an old platform. > > > Sure, but that still requires the headers to behave differently between > > C89 and C99 mode, as this worked before. But it turns out there's two > > different math.h implementation headers, depending on c99 being enabled > > (math_c99.h being the troublesome). If I understand correctly the > > problem is more that the system library headers are *newer* (and assume > > a sun studio emulating/copying quite a bit of gcc) than the gcc that's > > being used, and therefore gcc fails. > > I have some more info on this issue, based on having successfully > updated "gaur" using gcc 3.4.6 (which I picked because it was the last > of the 3.x release series). It seems very unlikely that there's much > difference between 3.4.3 and 3.4.6 as far as external features go. > What I find in the 3.4.6 documentation is > > -- Built-in Function: double __builtin_inf (void) > Similar to `__builtin_huge_val', except a warning is generated if > the target floating-point format does not support infinities. > This function is suitable for implementing the ISO C99 macro > `INFINITY'. > > Note that the function is called "__builtin_inf", whereas what we see > protosciurus choking on is "__builtin_infinity". So I don't think this > is a version skew issue at all. I think that the system headers are > written for the Solaris cc, and its name for the equivalent function is > __builtin_infinity, whereas what gcc wants is __builtin_inf. Likewise, > the failures we see for __builtin_isinf and __builtin_isnan are because > Solaris cc provides those but gcc does not. > > If we wanted to keep protosciurus going without a compiler update, my > thought would be to modify gcc's copy of math_c99.h to correct the > function name underlying INFINITY, and change the definitions of isinf() > and isnan() back to whatever was being used pre-C99. > > It's possible that newer gcc releases have been tweaked so that they > make appropriate corrections in this header file automatically, but > that's not a sure thing. I've pinged Dave about this, and he said: On 2018-09-26 17:04:29 -0400, Dave Page wrote: > Unfortunately, i think that whole machine is basically EOL now. It's > already skipping OpenSSL for some builds, as being stuck on a very old > version of the buildfarm client, as some of the modules used in newer > versions just won't compile or work. I don't have any support contract, or > access to newer versions of SunStudio, and the guys that used to package > GCC for Solaris now charge to download their packages. > > I could potentially build my own version of GCC, but I question whether > it's really worth it, given the other problems. He's now disabled building master on protosciurus and casteroides. We still have damselfly and rover_firefly so I don't feel too bad about that. I've pinged their owners to ask whether they could set up a sun studio (or however that's called in their solaris descendants) version. Greetings, Andres Freund
Re: master, static inline and #ifndef FRONTEND
Hi, On 2018-09-10 09:50:15 -0700, Andres Freund wrote: > On 2018-09-10 12:39:16 -0400, Tom Lane wrote: > > Andres Freund writes: > > > In a recent commit [1] I added a static inline which castoroides > > > dislikes: ... > > > It's obviously trivial to fix this case with by adding an #ifndef > > > FRONTEND. But as castoroides appears to be the only animal showing the > > > problem - after subtracting the animals dead due to the C99 requirement > > > - I wonder if we shouldn't just require "proper" static inline > > > support. castoroides runs a 13yo OS, and newer compilers that do not > > > have the problem are readily available. > > > > Given that we currently have *no* working Solaris buildfarm members, > > I'm not prepared to tolerate "I can't be bothered to fix this", > > which is what your argument boils down to. > > Uh, this seems unnecessarily dismissive. I wrote the above email minutes > after noticing the issue ( which in turn was shortly after the issue > occured), asking for feedback. Hardly "I can't be bothered to fix it" > territory. But adding a lot of #ifndef FRONTENDs isn't entirely free > either... Fwi, I've pinged Dave about upgrading the compiler on that machine, and he wrote: On 2018-09-26 17:04:29 -0400, Dave Page wrote: > Unfortunately, i think that whole machine is basically EOL now. It's > already skipping OpenSSL for some builds, as being stuck on a very old > version of the buildfarm client, as some of the modules used in newer > versions just won't compile or work. I don't have any support contract, or > access to newer versions of SunStudio, and the guys that used to package > GCC for Solaris now charge to download their packages. he has subsequently disabled building master on protosciurus and casteroides. Greetings, Andres Freund
Re: executor relation handling
On 2018-Sep-28, Amit Langote wrote: > On 2018/09/28 17:48, David Rowley wrote: > > Meh, I just noticed that the WARNING text claims "InitPlan" is the > > function name. I think it's best to get rid of that. It's pretty much > > redundant anyway if you do: \set VERBOSITY verbose > > Oops, good catch that one. Removed "InitPlan: " from the message in the > attached. Were there two cases of that? Because one still remains. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: SQL/JSON: documentation
On 28/06/2018 01:36, Nikita Glukhov wrote: > Attached patch with draft of SQL/JSON documentation written by > Liudmila Mantrova, Oleg Bartunov and me. > > Also it can be found in our sqljson repository on sqljson_doc branch: > https://github.com/postgrespro/sqljson/tree/sqljson_doc > > We continue to work on it. Some structural comments: - I don't think this should be moved to a separate file. Yes, func.sgml is pretty big, but if we're going to split it up, we should do it in a systematic way, not just one section. - The refentries are not a bad idea, but again, if we just used them for this one section, the navigation will behave weirdly. So I'd do it without them, just using normal subsections. - Stick to one-space indentation in XML. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Making all nbtree entries unique by having heap TIDs participate in comparisons
On Fri, Sep 28, 2018 at 7:50 AM Peter Eisentraut wrote: > So. I don't know much about the btree code, so don't believe anything I > say. I think that showing up and reviewing this patch makes you somewhat of an expert, by default. There just isn't enough expertise in this area. > I was very interested in the bloat test case that you posted on > 2018-07-09 and I tried to understand it more. Up until recently, I thought that I would justify the patch primarily as a project to make B-Trees less bloated when there are many duplicates, with maybe as many as a dozen or more secondary benefits. That's what I thought it would say in the release notes, even though the patch was always a broader strategic thing. Now I think that the TPC-C multiple insert point bloat issue might be the primary headline benefit, though. I hate to add more complexity to get it to work well, but just look at how much smaller the indexes are following an initial bulk load (bulk insertions) using my working copy of the patch: Master customer_pkey: 75 MB district_pkey: 40 kB idx_customer_name: 107 MB item_pkey: 2216 kB new_order_pkey: 22 MB oorder_o_w_id_o_d_id_o_c_id_o_id_key: 60 MB oorder_pkey: 78 MB order_line_pkey: 774 MB stock_pkey: 181 MB warehouse_pkey: 24 kB Patch customer_pkey: 50 MB district_pkey: 40 kB idx_customer_name: 105 MB item_pkey: 2216 kB new_order_pkey: 12 MB oorder_o_w_id_o_d_id_o_c_id_o_id_key: 61 MB oorder_pkey: 42 MB order_line_pkey: 429 MB stock_pkey: 111 MB warehouse_pkey: 24 kB All of the indexes used by oltpbench to do TPC-C are listed, so you're seeing the full picture for TPC-C bulk loading here (actually, there is another index that has an identical definition to oorder_o_w_id_o_d_id_o_c_id_o_id_key for some reason, which is omitted as redundant). As you can see, all the largest indexes are *significantly* smaller, with the exception of oorder_o_w_id_o_d_id_o_c_id_o_id_key. You won't be able to see this improvement until I post the next version, though, since this is a brand new development. Note that VACUUM hasn't been run at all, and doesn't need to be run, as there are no dead tuples. Note also that this has *nothing* to do with getting tired -- almost all of these indexes are unique indexes. Note that I'm also testing TPC-E and TPC-H in a very similar way, which have both been improved noticeably, but to a degree that's much less compelling than what we see with TPC-C. They have "getting tired" cases that benefit quite a bit, but those are the minority. Have you ever used HammerDB? I got this data from oltpbench, but I think that HammerDB might be the way to go with TPC-C testing Postgres. > You propose to address this by appending the tid to the index key, so > each key, even if its "payload" is a duplicate value, is unique and has > a unique place, so we never have to do this "tiresome" search.This > makes a lot of sense, and the results in the bloat test you posted are > impressive and reproducible. Thanks. > I tried a silly alternative approach by placing a new duplicate key in a > random location. This should be equivalent since tids are effectively > random. You're never going to get any other approach to work remotely as well, because while the TIDs may seem to be random in some sense, they have various properties that are very useful from a high level, data life cycle point of view. For insertions of duplicates, heap TID has temporal locality -- you are only going to dirty one or two leaf pages, rather than potentially dirtying dozens or hundreds. 
Furthermore, heap TID is generally strongly correlated with primary key values, so VACUUM can be much much more effective at killing duplicates in low cardinality secondary indexes when there are DELETEs with a range predicate on the primary key. This is a lot more realistic than the 2018-07-09 test case, but it still could make as big of a difference. > I didn't quite get this to fully work yet, but at least it > doesn't blow up, and it gets the same regression test ordering > differences for pg_depend scans that you are trying to paper over. ;-) FWIW, I actually just added to the papering over, rather than creating a new problem. There are plenty of instances of "\set VERBOSITY terse" in the regression tests already, for the same reason. If you run the regression tests with ignore_system_indexes=on, there are very similar failures [1]. > As far as the code is concerned, I agree with Andrey Lepikhov that one > more abstraction layer that somehow combines the scankey and the tid or > some combination like that would be useful, instead of passing the tid > as a separate argument everywhere. I've already drafted this in my working copy. It is a clear improvement. You can expect it in the next version. > I think it might help this patch move along if it were split up a bit, > for example 1) suffix truncation, 2) tid stuff, 3) new split strategies. > That way, it would also be easier to test out each piece separately. > For example,
Re: Adding a note to protocol.sgml regarding CopyData
On 25/09/2018 13:55, Amit Kapila wrote: > On Tue, Sep 25, 2018 at 4:51 AM Bradley DeJong wrote: >> >> On 2018-09-22, Amit Kapila wrote ... >> > ... duplicate the same information in different words at three >> different places ... >> >> I count 7 different places. In the protocol docs, there is the old >> mention in the "Summary of Changes since Protocol 2.0" section >> > > Below text is present in the section quoted by you: > The special \. last line is not > needed anymore, and is not sent during COPY OUT. > (It is still recognized as a terminator during COPY > IN, but its use is deprecated and will eventually be > removed.) > > This email started with the need to mention this in protocol.sgml and > it is already present although at a different place than the place at > which it is proposed. Personally, I don't see the need to add it to > more places. Does anybody else have any opinion on this matter? Yeah, I don't see why we need to document it three times in the same chapter. Also, that chapter is specifically about version 3.0 of the protocol, so documenting version 2.0 is out of scope. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Use durable_unlink for .ready and .done files for WAL segment removal
Greetings, * Michael Paquier (mich...@paquier.xyz) wrote: > While reviewing the archiving code, I have bumped into the fact that > XLogArchiveCleanup() thinks that it is safe to do only a plain unlink() > for .ready and .done files when removing a past segment. I don't think > that it is a smart move, as on a subsequent crash we may still see > those, but the related segment would have gone away. This is not really > a problem for .done files, but it could confuse the archiver to see some > .ready files about things that have already gone away. Is there an issue with making the archiver able to understand that situation instead of being confused by it..? Seems like that'd probably be a good thing to do regardless of this, but that would then remove the need for this kind of change.. Thanks! Stephen signature.asc Description: PGP signature
Pithy patch for more detailed error reporting when effective_io_concurrency is set to nonzero on platforms lacking posix_fadvise()
Per Tom's suggestion on bug #15396, here's a patch to have platforms such as OSX give a more descriptive message when rejecting a nonzero value for effective_io_concurrency. I had to adjust the GUC's wiring in the #ifndef case so that check_effective_io_concurrency() would be called when a nonzero value is supplied instead of just short-circuiting in parse_and_validate_value() when outside of [conf->min, conf->max]. James no_fadvise_better_warning_bug_15396.patch Description: Binary data - James Robinson ja...@jlr-photo.com http://jlr-photo.com/
Odd 9.4, 9.3 buildfarm failure on s390x
Hi, It seems Mark started a new buildfarm animal on s390x. It shows a pretty odd failure on 9.3 and 9.4, but *not* on newer animals: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lumpsucker&dt=2018-09-26%2020%3A30%3A58 == pgsql.build/src/test/regress/regression.diffs === *** /home/linux1/build-farm-8-clang/buildroot/REL9_4_STABLE/pgsql.build/src/test/regress/expected/uuid.out Mon Sep 24 17:49:30 2018 --- /home/linux1/build-farm-8-clang/buildroot/REL9_4_STABLE/pgsql.build/src/test/regress/results/uuid.out Wed Sep 26 16:31:31 2018 *** *** 64,72 SELECT guid_field FROM guid1 ORDER BY guid_field DESC; guid_field -- - 3f3e3c3b-3a30-3938-3736-353433a2313e - ---- ---- (3 rows) -- = operator test --- 64,72 SELECT guid_field FROM guid1 ORDER BY guid_field DESC; guid_field -- ---- + ---- + 3f3e3c3b-3a30-3938-3736-353433a2313e (3 rows) -- = operator test == Mark, is there anything odd for specific branches? I don't see anything immediately suspicious in the relevant comparator code... Greetings, Andres Freund
Re: Let's stop with the retail rebuilds of src/port/ files already
I wrote: > Now, if we go forward with that solution, there will be issues with > some other things that libpq exports without having defined itself: > src/backend/utils/mb/wchar.c: > pg_utf_mblen > src/backend/utils/mb/encnames.c: > pg_encoding_to_char > pg_char_to_encoding > pg_valid_server_encoding > pg_valid_server_encoding_id > What I was thinking of proposing was to move those two files out of the > backend and into src/common/, thereby normalizing their status as > modules available in both frontend and backend, and removing the need > for a special build rule for them in libpq. (initdb could be simplified > too.) Per this discovery, we'd need to also remove these symbols from > libpq's exports list, meaning that clients *must* get them from -lpgcommon > not from libpq. After further study I've concluded that moving those two files would be more neatnik-ism than is justified. While it'd get rid of the symlink-a-source-file technique in libpq, there'd still be other occurrences of that in our tree, so the actual cleanup benefit seems pretty limited. And while I'm prepared to believe that nobody outside PG uses pqsignal() or should do so, it's a little harder to make that case for the encnames.c functions; so the risk of causing problems seems noticeably greater. Accordingly, I cleaned up the usage of the existing src/common/ files but didn't move anything around. I plan to stop here unless the buildfarm shows more issues. regards, tom lane
Re: doc - improve description of default privileges
Some thoughts: We should keep the GRANT reference page about GRANT. There is a section about Privileges in the Data Definition chapter, which we could use to expand on general concepts. The ALTER DEFAULT PRIVILEGES reference page would be another place this could be put. The Owner column is redundant, because it's always all applicable privileges. (Having this column would give the impression that it's not always all privileges, so it would be confusing.) Privileges should be listed using their full name (e.g., "SELECT"), not their internal abbreviation letter. The psql commands seem out of place here. If you want to learn about how to use psql, you can go to the psql documentation. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: ALTER TABLE on system catalogs
On 2018-Aug-21, Andres Freund wrote: > On 2018-08-21 17:04:41 +0200, Peter Eisentraut wrote: > > > That doesn't solve the original problem, which is being able to set > > reloptions on pg_attribute, because pg_attribute doesn't have a toast > > table but would need one according to existing rules. > > I still think it's wrong to work around this than to actually fix the > issue with pg_attribute not having a toast table. FWIW I'm still bothered by the inability to move pg_largeobject to a different tablespace, per https://postgr.es/m/20160502163033.GA15384@alvherre.pgsql While that needs even more work (preservability across pg_dump for one), this item here would be one thing to fix. Also, I don't quite understand what's so horrible about setting autovacuum options for system catalogs, including those that don't have toast tables. That seems a pretty general feature that needs fixing, too. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: automatic restore point
On 06/09/2018 02:16, Yotsunaga, Naoki wrote: > -Original Message- > From: Yotsunaga, Naoki [mailto:yotsunaga.na...@jp.fujitsu.com] > Sent: Tuesday, June 26, 2018 10:18 AM > To: Postgres hackers > Subject: automatic restore point > >> Hi, I attached a patch to output the LSN before execution to the server log >> >when executing a specific command and accidentally erasing data. > > Since there was an error in the attached patch, I attached the modified patch. I think this should be done using event triggers. Right now, you just have it hardcoded to TRUNCATE and DROP TABLE, which seems somewhat arbitrary. With event triggers, you have the full flexibility to do what you want. You can pick which commands to apply it to, you can log the LSN, you can create restore points, etc. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: ALTER TABLE on system catalogs
On 2018-09-28 16:06:30 -0300, Alvaro Herrera wrote: > On 2018-Aug-21, Andres Freund wrote: > > > On 2018-08-21 17:04:41 +0200, Peter Eisentraut wrote: > > > > > That doesn't solve the original problem, which is being able to set > > > reloptions on pg_attribute, because pg_attribute doesn't have a toast > > > table but would need one according to existing rules. > > > > I still think it's wrong to work around this than to actually fix the > > issue with pg_attribute not having a toast table. > > FWIW I'm still bothered by the inability to move pg_largeobject to a > different tablespace, per > https://postgr.es/m/20160502163033.GA15384@alvherre.pgsql > While that needs even more work (preservability across pg_dump for one), > this item here would be one thing to fix. > > Also, I don't quite understand what's so horrible about setting > autovacuum options for system catalogs, including those that don't have > toast tables. That seems a pretty general feature that needs fixing, > too. I'm not sure what that has to do with my point? What I'm saying is that we shouldn't have some weird "should have a toast table but doesn't" exception, not that we shouldn't allow any sort of DDL on catalogs. Greetings, Andres Freund
Re: doc - improve description of default privileges
On 2018-Sep-28, Peter Eisentraut wrote: > The psql commands seem out of place here. If you want to learn about > how to use psql, you can go to the psql documentation. There is a legitimate point in doing this, though, since the GRANT page already explains how psql displays privileges. Maybe the right solution is to move all that stuff to the psql documentation, and alter the GRANT page to list privileges in terms of their names rather than the way psql displays them. (And of course add cross-links, so that it all makes sense.) -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: ALTER TABLE on system catalogs
On 2018-Sep-28, Andres Freund wrote: > On 2018-09-28 16:06:30 -0300, Alvaro Herrera wrote: > > On 2018-Aug-21, Andres Freund wrote: > > > > > I still think it's wrong to work around this than to actually fix the > > > issue with pg_attribute not having a toast table. > > > > FWIW I'm still bothered by the inability to move pg_largeobject to a > > different tablespace, per > > https://postgr.es/m/20160502163033.GA15384@alvherre.pgsql > > While that needs even more work (preservability across pg_dump for one), > > this item here would be one thing to fix. > > > > Also, I don't quite understand what's so horrible about setting > > autovacuum options for system catalogs, including those that don't have > > toast tables. That seems a pretty general feature that needs fixing, > > too. > > I'm not sure what that has to do with my point? What I'm saying is that > we shouldn't have some weird "should have a toast table but doesn't" > exception, not that we shouldn't allow any sort of DDL on catalogs. Well, the subtext in your argument seemed to be "let's just add a toast table to pg_attribute and then we don't need any of this". I'm just countering that if we don't have toast tables for some catalogs, it's because that's something we've already beaten to death; so rather than continue to beat it, we should discuss alternative ways to attack the resulting side-effects. I mean, surely adding a toast table to pg_largeobject would be completely silly. Yet if we leave this code unchanged, trying to move it to a different tablespace would "blow up" in a way. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Odd 9.4, 9.3 buildfarm failure on s390x
Andres Freund writes: > It seems Mark started a new buildfarm animal on s390x. It shows a pretty > odd failure on 9.3 and 9.4, but *not* on newer animals: No, lumpsucker is showing the same failure on 9.5 as well. I suspect that the reason 9.6 and up are OK is that 9.6 is where we introduced the abbreviated-sort-key machinery. IOW, the problem exists in the old-style UUID sort comparator but not the new one. Which is pretty darn odd, because the old-style comparator is just memcmp(). How could that be broken without causing lots more issues? regards, tom lane
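For context, the old-style code in uuid.c is, as far as I recall (a sketch only, not verified against the failing branches), just a thin wrapper around memcmp(), with the raw int result handed straight back to the caller:

static int
uuid_internal_cmp(const pg_uuid_t *arg1, const pg_uuid_t *arg2)
{
	/* plain byte-wise comparison of the 16-byte UUID payload */
	return memcmp(arg1->data, arg2->data, UUID_LEN);
}

Datum
uuid_cmp(PG_FUNCTION_ARGS)
{
	pg_uuid_t  *arg1 = PG_GETARG_UUID_P(0);
	pg_uuid_t  *arg2 = PG_GETARG_UUID_P(1);

	/* whatever memcmp() returned becomes the SQL-visible comparison result */
	PG_RETURN_INT32(uuid_internal_cmp(arg1, arg2));
}

If that is accurate, the only platform- or compiler-dependent ingredient is the value memcmp() itself returns.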
Re: On-disk compatibility for nbtree-unique-key enhancement
On Fri, Sep 28, 2018 at 8:00 AM Peter Eisentraut wrote: > On 21/09/2018 01:18, Peter Geoghegan wrote: > > * This means that there is a compatibility issue for anyone that is > > already right on the threshold -- we *really* don't want to see a > > REINDEX fail, but that seems like a possibility that we need to talk > > about now. > > When would the REINDEX need to happen? Will the code still be able to > read and write v3 btrees? The patch doesn't do that at the moment, because I've been busy refining it, and because there are a couple of outstanding questions about how to go about it -- the questions I pose on this thread. I accept that it's absolutely essential that nbtree be able to read both v2 and v3 indexes as part of any new v4. Without a measurable performance penalty. That's the standard that this project should be held to. A REINDEX will never *need* to happen. v2 and v3 indexes will gradually go extinct, without many users actually noticing. The on-disk representation of my patch leaves several free status bits in INDEX_ALT_TID_MASK tuples free (3 total will remain, since I'm now using 1 of the 4 for BT_HEAP_TID_ATTR), so it should be easier to add various further enhancements to a v5 or v6 of nbtree. This is similar to how changes to GIN were managed in the past (it may be interesting to look at a GIN leaf page with pg_hexedit, since it'll show you the gory details in a relatively accessible way). I can imagine a INDEX_ALT_TID_MASK bit being used for tuples that point to the heap -- not just for pivot tuples. I have an eye on things like duplicate lists on the leaf level, which would probably work like a GIN posting list. > Could there perhaps be an amcheck or > pageinspect feature that tells you ahead of time if there are too large > items in an old index? That would be easy, but it might not be any better than just having REINDEX or CREATE INDEX [CONCURRENTLY] throw an error. They're already pretty fast. I could easily raise a WARNING when amcheck is run against an index of a version before v4, that has an index tuple that's too big to make it under the lower limit. Actually, I could even write an SQL query that had pageinspect notice affected tuples, without changing any C code. Bear in mind that TOAST compression accidentally plays a big role here. It makes it very unlikely that indexes in the field are right at the old 2712 byte threshold, without even 8 bytes of wiggle room, because it's impossible to predict how well the pglz compression will work with that kind of precision. Several highly improbable things need to happen at the same time before REINDEX can break. I cannot see how any app could have evolved to depend on having 2712 bytes, without even a single MAXALIGN() quantum to spare. I wrote a stress test around the new "1/3 of a page" restriction. It involved a large text attribute with PLAIN storage, since I couldn't sensibly test the restriction while using pglz compression in the index. When all of your tuples are 2704 bytes, you end up with a ridiculously tall B-Tree, that performs horribly. I think that I saw that it had 11 levels with the test case. The tallest B-Tree that you'll ever see in the wild is probably one that's 5 levels deep, which is very tall indeed. Because of the logarithmic nature of how a new level is added to a B-Tree, 11 levels is just ludicrous. (Granted, you only have to have one tuple that's precisely 2712 bytes in length for REINDEX to break.) -- Peter Geoghegan
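For anyone wondering where the 2712 byte figure comes from, it falls out of the existing BTMaxItemSize() macro for the default 8 kB block size. The arithmetic below is a back-of-the-envelope sketch using the header sizes as found in the current tree (8-byte MAXALIGN assumed); it is meant only to illustrate the limit, not to restate the patch.

/* as defined in access/nbtree.h (quoted from memory) */
#define BTMaxItemSize(page) \
	MAXALIGN_DOWN((PageGetPageSize(page) - \
				   MAXALIGN(SizeOfPageHeaderData + 3 * sizeof(ItemIdData)) - \
				   MAXALIGN(sizeof(BTPageOpaqueData))) / 3)

/*
 * With BLCKSZ = 8192:
 *   page header + 3 line pointers: MAXALIGN(24 + 3 * 4) = 40
 *   btree special space:           MAXALIGN(16)         = 16
 *   (8192 - 40 - 16) / 3                                = 2712
 */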
Re: Pithy patch for more detailed error reporting when effective_io_concurrency is set to nonzero on platforms lacking posix_fadvise()
James Robinson writes: > Per Tom's suggestion on bug #15396, here's a patch to have platforms such as > OSX give a more descriptive message when rejecting a nonzero value for > effective_io_concurrency. Pushed with minor editorialization. regards, tom lane
Re: Continue work on changes to recovery.conf API
On 29/08/2018 17:43, Sergei Kornilov wrote: > Current patch moves recovery.conf settings into GUC system: > - if startup process found recovery.conf - it throw error OK > - recovery mode is turned on if new special file recovery.signal found OK > - standby_mode setting was removed. Standby mode can be enabled if startup > found new special file standby.signal OK > - if present both standby.signal and recovery.signal - we use standby mode > (this design from previous thread) Didn't find the discussion on that and don't understand the reasons, but seems OK and easy to change if we don't like it. > - recovery parameters recovery_target_inclusive, recovery_target_timeline, > recovery_target_action are new GUC without logic changes OK > - recovery_target (immediate), recovery_target_name, recovery_target_time, > recovery_target_xid, recovery_target_lsn are replaced to recovery_target_type > and recovery_target_value (was discussed and changed in previous thread) I think this was the major point of contention. I reread the old thread, and it's still not clear why we need to change this. _type and _value look like an EAV system to me. GUC variables should be verifiable independent of another variable. The other idea of using only one variable seems better, but using two variables seems like a poor compromise between one and five. I propose to move this patch forward, keep the settings as they are. It can be another patch to rename or reshuffle them. > - restore_command, archive_cleanup_command, recovery_end_command, > primary_conninfo, primary_slot_name and recovery_min_apply_delay are just new > GUC OK > - trigger_file was renamed to promote_signal_file for clarify (my change, in > prev thread was removed) OK to add the "promote" prefix, but I'd prefer to keep the "trigger" name. There is a lot of discussion and knowledge around that. Seems unnecessary to change. > - all new GUC are PGC_POSTMASTER (discussed in prev thread) OK > - pg_basebackup with -R (--write-recovery-conf) option will create > standby.signal file and append primary_conninfo and primary_slot_name (if was > used) to postgresql.auto.conf instead of writing new recovery.conf I wonder if that would cause any problems. The settings in postgresql.auto.conf are normally not PGC_POSTMASTER, otherwise you couldn't use ALTER SYSTEM to put them there. Maybe it's OK. > - appropriate changes in tests and docs looks good -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: SQL/JSON: documentation
On 09/28/2018 01:29 PM, Peter Eisentraut wrote: > On 28/06/2018 01:36, Nikita Glukhov wrote: >> Attached patch with draft of SQL/JSON documentation written by Liudmila Mantrova, Oleg Bartunov and me. Also it can be found in our sqljson repository on sqljson_doc branch: https://github.com/postgrespro/sqljson/tree/sqljson_doc We continue to work on it. > Some structural comments: > - I don't think this should be moved to a separate file. Yes, func.sgml is pretty big, but if we're going to split it up, we should do it in a systematic way, not just one section. I'm in favor of doing that. It's rather a monster. I agree it should not be done piecemeal. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Continue work on changes to recovery.conf API
Peter Eisentraut writes: > I think this was the major point of contention. I reread the old > thread, and it's still not clear why we need to change this. _type and > _value look like an EAV system to me. GUC variables should be > verifiable independent of another variable. No, they MUST be independently verifiable. The interactions between the check_xxx functions in this patch are utterly unsafe. We've learned that lesson before. > I propose to move this patch forward, keep the settings as they are. It > can be another patch to rename or reshuffle them. Please do not commit this in this state. > I wonder if that would cause any problems. The settings in > postgresql.auto.conf are normally not PGC_POSTMASTER, otherwise you > couldn't use ALTER SYSTEM to put them there. Maybe it's OK. Actually, that works fine. You do have to restart the postmaster to make them take effect, but it's the same as if you edited the main config file by hand. regards, tom lane
Re: Continue work on changes to recovery.conf API
Hi, On 2018-09-28 16:36:35 -0400, Tom Lane wrote: > Peter Eisentraut writes: > > I think this was the major point of contention. I reread the old > > thread, and it's still not clear why we need to change this. _type and > > _value look like an EAV system to me. GUC variables should be > > verifiable independent of another variable. > > No, they MUST be independently verifiable. The interactions between > the check_xxx functions in this patch are utterly unsafe. We've > learned that lesson before. I'm not sure those concerns apply quite the same way here - we can move the interdependent verification to the point where the values are first used rather than relying on guc.c infrastructure. We already have plenty of checks that are interdependent in that way, without it causing many problems. UI-wise that's not too bad, if they're things that cannot be changed arbitrarily at runtime. Greetings, Andres Freund
Re: SQL/JSON: documentation
Andrew Dunstan writes: > On 09/28/2018 01:29 PM, Peter Eisentraut wrote: >> - I don't think this should be moved to a separate file. Yes, func.sgml >> is pretty big, but if we're going to split it up, we should do it in a >> systematic way, not just one section. > I'm in favor of doing that. It's rather a monster. > I agree it should not be done piecemeal. Maybe split it into one file per existing section? Although TBH, I am not convinced that the benefits of doing that will exceed the back-patching pain we'll incur. regards, tom lane
Re: Continue work on changes to recovery.conf API
Andres Freund writes: > On 2018-09-28 16:36:35 -0400, Tom Lane wrote: >> No, they MUST be independently verifiable. The interactions between >> the check_xxx functions in this patch are utterly unsafe. We've >> learned that lesson before. > I'm not sure those concerns apply quite the same way here - we can move > the interdependent verification to the the point where they're used > first rather than relying on guc.c infrastructure. And, if they're bad, what happens? Recovery fails? I don't think it's a great idea to lose out on whatever error checking the existing GUC infrastructure can provide, just so as to use a GUC design that's not very nice in the first place. regards, tom lane
Re: ALTER TABLE on system catalogs
On 21/08/2018 17:24, Andres Freund wrote: >> Attached is a patch that instead moves those special cases into >> needs_toast_table(), which was one of the options suggested by Andres. >> There were already similar checks there, so I guess this makes a bit of >> sense. > The big difference is that that then only takes effect when called with > check=true. The callers without it, importantly NewHeapCreateToastTable > & NewRelationCreateToastTable, then won't run into this check. But still > into the error (see below). I don't follow. The call to needs_toast_table() is not conditional on the check argument. The check argument only checks that the correct lock level is passed in. >> @@ -145,21 +146,6 @@ create_toast_table(Relation rel, Oid toastOid, Oid >> toastIndexOid, >> ObjectAddress baseobject, >> toastobject; >> >> -/* >> - * Toast table is shared if and only if its parent is. >> - * >> - * We cannot allow toasting a shared relation after initdb (because >> - * there's no way to mark it toasted in other databases' pg_class). >> - */ >> -shared_relation = rel->rd_rel->relisshared; >> -if (shared_relation && !IsBootstrapProcessingMode()) >> -ereport(ERROR, >> - >> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), >> - errmsg("shared tables cannot be toasted after >> initdb"))); > This error check imo shouldn't be removed, but moved down. We could keep it, but it would probably be dead code since needs_toast_table() would return false for all shared tables anyway. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Odd 9.4, 9.3 buildfarm failure on s390x
On Fri, Sep 28, 2018 at 11:52:15AM -0700, Andres Freund wrote: > Mark, is there anything odd for specific branches? No... I don't have anything in the config that would be applied to specific branches... Regards, Mark -- Mark Wong http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Odd 9.4, 9.3 buildfarm failure on s390x
Hi, On 2018-09-28 15:22:23 -0700, Mark Wong wrote: > On Fri, Sep 28, 2018 at 11:52:15AM -0700, Andres Freund wrote: > > Mark, is there anything odd for specific branches? > > No... I don't have anything in the config that would be applied to > specific branches... Could you perhaps do some manual debugging on that machine? Maybe starting with manually running something like: SELECT uuid_cmp('----'::uuid, '----'::uuid); SELECT uuid_cmp('----'::uuid, '----'::uuid); SELECT uuid_cmp('----'::uuid, '1113----'::uuid); SELECT uuid_cmp('----'::uuid, '1110----'::uuid); on both master and one of the failing branches? Greetings, Andres Freund
Re: Use durable_unlink for .ready and .done files for WAL segment removal
On Fri, Sep 28, 2018 at 02:36:19PM -0400, Stephen Frost wrote: > Is there an issue with making the archiver able to understand that > situation instead of being confused by it..? Seems like that'd probably > be a good thing to do regardless of this, but that would then remove the > need for this kind of change.. I thought about that a bit, and a lot can also be done within the archive_command itself regarding that, so I am not sure there is an argument for making pgarch.c more complicated than it needs to be. Now it is true that for most users, having a .ready file but no segment would most likely lead to a failure. I suspect that a large user base is still just using plain cp in archive_command, which would cause the archiver to get stuck. So we could actually just tweak pgarch_readyXlog to check whether the segment fetched actually exists (see the bottom of that function). If it doesn't, the archiver removes the .ready file and retries fetching a new segment. -- Michael signature.asc Description: PGP signature
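A minimal sketch of what that tweak around pgarch_readyXlog() could look like (the helper name and placement are hypothetical, and the call to durable_unlink() simply follows the proposal upthread; this only illustrates dropping an orphaned .ready file before the segment name is handed to archive_command):

/*
 * Hypothetical helper for pgarch.c: before archiving a segment found via
 * its .ready file, check that the segment itself still exists.  If it is
 * gone, remove the stale .ready file so the archiver can move on.
 */
static bool
ready_file_has_segment(const char *xlog)
{
	char		pathname[MAXPGPATH];
	struct stat stat_buf;

	snprintf(pathname, MAXPGPATH, XLOGDIR "/%s", xlog);
	if (stat(pathname, &stat_buf) == 0)
		return true;			/* segment exists, go ahead and archive it */

	/* segment has gone away: clean up its .ready file and skip it */
	StatusFilePath(pathname, xlog, ".ready");
	durable_unlink(pathname, WARNING);
	return false;
}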
Re: [PATCH] Include application_name in "connection authorized" log message
Greetings, * Stephen Frost (sfr...@snowman.net) wrote: > * Stephen Frost (sfr...@snowman.net) wrote: > > I still don't see that as a reason for tools to be suseptible to serious > > issues if a funky user gets created and I'd be surprised if there > > weren't other ways to get funky characters into the log file, but that's > > all ultimately an independent issue from this. I'll add the comments as > > discussed and discourage using the clean ascii function, but otherwise > > keep things as-is in that regard. > > Updated patch with lots of comments added around pg_clean_ascii() about > why it exists, why we do the cleaning, and warnings to discourage people > from using it without good cause. I've also done some additional > testing with it and it seems to be working well. I'll poke at it a bit > more tomorrow but if there aren't any other concerns, I'll commit it > towards the end of the day. Pushed, thanks all for the help and discussion, and thanks to Don for his patch! Thanks! Stephen signature.asc Description: PGP signature
Re: Use durable_unlink for .ready and .done files for WAL segment removal
Greetings, * Michael Paquier (mich...@paquier.xyz) wrote: > On Fri, Sep 28, 2018 at 02:36:19PM -0400, Stephen Frost wrote: > > Is there an issue with making the archiver able to understand that > > situation instead of being confused by it..? Seems like that'd probably > > be a good thing to do regardless of this, but that would then remove the > > need for this kind of change.. > > I thought about that a bit, and there is as well a lot which can be done > within the archive_command itself regarding that, so I am not sure that > there is the argument to make pgarch.c more complicated than it should. > Now it is true that for most users having a .ready file but no segment > would most likely lead in a failure. I suspect that a large user base > is still just using plain cp in archive_command, which would cause the > archiver to be stuck. So we could actually just tweak pgarch_readyXlog > to check if the segment fetched actually exists (see bottom of the > so-said function). If it doesn't, then the archiver removes the .ready > file and retries fetching a new segment. Yes, checking if the WAL file exists before calling archive_command on it is what I was thinking we'd do here, and if it doesn't, then just remove the .ready file. An alternative would be to go through the .ready files on crash-recovery and remove any .ready files that don't have corresponding WAL files, or if we felt that it was necessary, we could do that on every restart but do we really think we'd need to do that..? Thanks! Stephen signature.asc Description: PGP signature
Re: [PATCH] Include application_name in "connection authorized" log message
Thanks for all the guidance! Don. On Fri, Sep 28, 2018, 18:12 Stephen Frost wrote: > Greetings, > > * Stephen Frost (sfr...@snowman.net) wrote: > > * Stephen Frost (sfr...@snowman.net) wrote: > > > I still don't see that as a reason for tools to be suseptible to > serious > > > issues if a funky user gets created and I'd be surprised if there > > > weren't other ways to get funky characters into the log file, but > that's > > > all ultimately an independent issue from this. I'll add the comments > as > > > discussed and discourage using the clean ascii function, but otherwise > > > keep things as-is in that regard. > > > > Updated patch with lots of comments added around pg_clean_ascii() about > > why it exists, why we do the cleaning, and warnings to discourage people > > from using it without good cause. I've also done some additional > > testing with it and it seems to be working well. I'll poke at it a bit > > more tomorrow but if there aren't any other concerns, I'll commit it > > towards the end of the day. > > Pushed, thanks all for the help and discussion, and thanks to Don for > his patch! > > Thanks! > > Stephen >
Re: Odd 9.4, 9.3 buildfarm failure on s390x
Hi Andres, On Fri, Sep 28, 2018 at 03:41:27PM -0700, Andres Freund wrote: > On 2018-09-28 15:22:23 -0700, Mark Wong wrote: > > On Fri, Sep 28, 2018 at 11:52:15AM -0700, Andres Freund wrote: > > > Mark, is there anything odd for specific branches? > > > > No... I don't have anything in the config that would be applied to > > specific branches... > > Could you perhaps do some manual debugging on that machine? > > Maybe starting with manually running something like: > > SELECT uuid_cmp('----'::uuid, > '----'::uuid); > SELECT uuid_cmp('----'::uuid, > '----'::uuid); > SELECT uuid_cmp('----'::uuid, > '1113----'::uuid); > SELECT uuid_cmp('----'::uuid, > '1110----'::uuid); > > on both master and one of the failing branches? I've attached the output for head and the 9.4 stable branch. It appears they are returning the same results. I built them both by: CC=/usr/bin/clang ./configure --enable-cassert --enable-debug \ --enable-nls --with-perl --with-python --with-tcl \ --with-tclconfig=/usr/lib64 --with-gssapi --with-openssl \ --with-ldap --with-libxml --with-libxslt What should I try next? Regards, Mark -- Mark Wonghttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, RemoteDBA, Training & Services psql (12devel) Type "help" for help. postgres=# SELECT uuid_cmp('----'::uuid, '----'::uuid); uuid_cmp - -2147483648 (1 row) postgres=# SELECT uuid_cmp('----'::uuid, '----'::uuid); uuid_cmp -- 0 (1 row) postgres=# SELECT uuid_cmp('----'::uuid, '1113----'::uuid); uuid_cmp - -2147483648 (1 row) postgres=# SELECT uuid_cmp('----'::uuid, '1110----'::uuid); uuid_cmp -- 1 (1 row) psql (9.4.19) Type "help" for help. postgres=# SELECT uuid_cmp('----'::uuid, '----'::uuid); uuid_cmp - -2147483648 (1 row) postgres=# SELECT uuid_cmp('----'::uuid, '----'::uuid); uuid_cmp -- 0 (1 row) postgres=# SELECT uuid_cmp('----'::uuid, '1113----'::uuid); uuid_cmp - -2147483648 (1 row) postgres=# SELECT uuid_cmp('----'::uuid, '1110----'::uuid); uuid_cmp -- 1 (1 row)
Oid returned from AddSubscriptionRelState/UpdateSubscriptionRelState
Hi, How come those two functions return OIDs, even though, as far as I understand, the underlying relation is BKI_WITHOUT_OIDS? I assume that's just an oversight? Greetings, Andres Freund
Re: [HACKERS] Something for the TODO list: deprecating abstime and friends
Andres Freund writes: > Here's a refreshed version of this patch. First patch removes > contrib/spi/timetravel, second patch removes abstime, reltime, tinterval > together with timeofday(). I'd kind of like to keep timeofday(); it's the only simple way to get a time display that includes "native" timezone info. For instance, I get regression=# select timeofday(); timeofday - Sat Sep 29 00:37:35.490977 2018 EDT (1 row) I think every other option would show me "-04" not "EDT". +1 for removing the rest of that, though. Unless somebody is motivated to recast contrib/spi/timetravel with timestamptz as the datetime type? regards, tom lane
RE: auto_explain: Include JIT output if applicable
Hi, I tried this feature. I think the 'if (es->costs)' check in auto_explain.c will always be true, because nothing changes it after 'es->costs = true' is set in NewExplainState() a few lines earlier. So I attached a patch that deletes this if statement. Regards, Noriyoshi Shinoda From: Lukas Fittl [mailto:lu...@fittl.com] Sent: Tuesday, September 25, 2018 6:38 AM To: Andres Freund Cc: Pg Hackers Subject: Re: auto_explain: Include JIT output if applicable On Mon, Sep 24, 2018 at 1:48 PM, Andres Freund wrote: Thanks for noticing - pushed! Thanks! Best, Lukas -- Lukas Fittl auto_explain.diff Description: auto_explain.diff
Re: auto_explain: Include JIT output if applicable
Hi, On 2018-09-29 05:04:25 +, Shinoda, Noriyoshi (PN Japan GCS Delivery) wrote: > I tried this feature. > I think that 'if (es->costs)' of the source code auto_explain.c will always > be ‘true’. > > Because it is not changed after 'es-> cost = true' in NewExplainState () > function several rows ago. > So I attached a patch to delete this if statement. I think it's better to stay closer to what explain.c itself is doing - it's not like that if statement costs us anything really... - Andres
RE: auto_explain: Include JIT output if applicable
Hi, > I think it's better to stay closer to what explain.c itself is doing - it's > not like that if statement costs us anything really... Oh, I understood. Thank you. -Original Message- From: Andres Freund [mailto:and...@anarazel.de] Sent: Saturday, September 29, 2018 2:11 PM To: Shinoda, Noriyoshi (PN Japan GCS Delivery) Cc: Lukas Fittl ; Pg Hackers Subject: Re: auto_explain: Include JIT output if applicable Hi, On 2018-09-29 05:04:25 +, Shinoda, Noriyoshi (PN Japan GCS Delivery) wrote: > I tried this feature. > I think that 'if (es->costs)' of the source code auto_explain.c will always > be ‘true’. > > Because it is not changed after 'es-> cost = true' in NewExplainState () > function several rows ago. > So I attached a patch to delete this if statement. I think it's better to stay closer to what explain.c itself is doing - it's not like that if statement costs us anything really... - Andres
Re: Odd 9.4, 9.3 buildfarm failure on s390x
> "Mark" == Mark Wong writes: Mark> What should I try next? What is the size of a C "int" on this platform? -- Andrew (irc:RhodiumToad)
Re: Making all nbtree entries unique by having heap TIDs participate in comparisons
28.09.2018 23:08, Peter Geoghegan wrote: > On Fri, Sep 28, 2018 at 7:50 AM Peter Eisentraut wrote: >> I think it might help this patch move along if it were split up a bit, for example 1) suffix truncation, 2) tid stuff, 3) new split strategies. That way, it would also be easier to test out each piece separately. For example, how much space does suffix truncation save in what scenario, are there any performance regressions, etc. > I'll do my best. I don't think I can sensibly split out suffix truncation from the TID stuff -- those seem truly inseparable, since my mental model for suffix truncation breaks without fully unique keys. I can break out all the cleverness around choosing a split point into its own patch, though -- _bt_findsplitloc() has only been changed to give weight to several factors that become important. It's the "brain" of the optimization, where 90% of the complexity actually lives. Removing the _bt_findsplitloc() changes will make the performance of the other stuff pretty poor, and in particular will totally remove the benefit for cases that "become tired" on the master branch. That could be slightly interesting, I suppose. I am reviewing this patch too, and I share Peter Eisentraut's opinion that it should be split into a hierarchy of two or three patches: a "functional" part (the TID stuff) and an "optimizational" part (suffix truncation & splitting). My reasons are that it would simplify code review, investigation and benchmarking. Right now the benchmarking picture is not clear: possible performance degradation from TID ordering interferes with the positive effects of the optimizations in a non-trivial way. -- Andrey Lepikhov Postgres Professional https://postgrespro.com The Russian Postgres Company