date:20190221

On Wed, Feb 20, 2019 at 11:51:08AM +0100, Peter Eisentraut wrote:
> So here is a patch doing it the "normal" way of nulling out all the rows
> the user shouldn't see.

That looks fine to me.

> I haven't found any documentation of these access restrictions in the
> context of pg_stat_activity.  Is there something that I'm not seeing or
> something that should be added?

Yes, there is nothing.  I agree that it would be good to mention that
some fields are set to NULL and only visible to superusers or members of
pg_read_all_stats with a note for pg_stat_activity and
pg_stat_get_activity().  Here is an idea:
"Backend and SSL information are restricted to superusers and members
of the pg_read_all_stats role. Access may be
granted to others using GRANT.

Getting that back-patched would be nice where pg_read_all_stats was
introduced.
--
Michael

signature.asc
Description: PGP signature

RE: libpq debug log

2019-02-21 Thread Iwata, Aya

Hi Kirk,

> Currently, the patch fails to build according to CF app.
> As you know, it has something to do with the misspelling of function.
> GetTimezoneInformation --> GetTimeZoneInformation
Thank you. I fixed it. Please see my v7 patch.

Regards,
Aya Iwata

RE: Timeout parameters

On Thursday, February 21, 2019 2:56 PM (GMT+9), Tsunakawa, Takayuki wrote:

>> 1) tcp_user_timeout parameter
>> I think this can be "committed" separately when it's finalized.

> Do you mean you've reviewed and tested the patch by simulating a 
> communication failure in the way Nagaura-san suggested?

Although I did review and followed the suggested way in previous email
way back (which uses root user) and it worked as intended, I'd also like
to hear feedback also from Fabien whether it's alright without the test
script, or if there's another way we can test it (maybe none?).
Then I think it's in good shape.
However, what I meant by my statement above was to have a separate
decision in committing the two parameters, because based from the thread
tcp_user_timeout has got more progress when it comes to reviews.
So once it's ready, it can be committed separately without waiting for
the socket_timeout param to be ready. (I dont think that requires a
separate thread)

>> 2) tcp_socket_timeout parameter
>> - whether (a) it should abort the connection from pqWait() or
>>   other means, or
>> (b) cancel the statement similar to how psql
>>   does it as suggested by Fabien

> We have no choice but to terminate the connection, because we can't tell
> whether we can recover from the problem and continue to use the connection
> (e.g. long-running query) or not (permanent server or network failure).
>
> Regarding the place, pqWait() is the best (and possibly only) place. 
> The purpose of this feature is to avoid waiting for response from the
> server forever (or too long) in any case, as a last resort.

Thank you for explaining further. I think that clarifies the implementation
argument on why it needs to terminate the connection over cancelling the query.
In short, I think it's intended purpose is to prevent dead connection.
It addresses the case for when network or server failure occurs and
when DBMS is terminated abruptly. So far, no existing parameter covers that yet.

As for the place, my knowledge is limited so I won't be able to provide
substantial help on it. Fabien pointed out some comments about it that needs
clarification though.
Quoting what Fabien wrote:
"The implementation looks awkward, because part of the logic of 
pqWaitTimed 
is reimplemented in pqWait. Also, I do not understand the computation 
of finish_time, which seems to assume that pqWait is going to be called 
immediately after sending a query, which may or may not be the case, 
and 
if it is not the time() call there is not the start of the statement."

> Oracle has similar parameters called SQLNET.RECV_TIMEOUT and 
> SQLNET.SEND_TIMEOUT.
> From those names, I guess they use SO_RCVTIMEO and SO_SNDTIMEO socket 
> options. 
> However, we can't use them because use non-blocking sockets and poll(), while
> SO_RCV/SND_TIMEO do ont have an effect for poll():
> [..snipped]
>
>> - proper parameter name
>
> I think the name is good because it indicates the socket-level timeout.
> That's just like PgJDBC and Oracle, and I didn't feel strange when I read 
> their manuals.

Agree. I'm actually +1 with tcp_socket_timeout

>> Perhaps you could also clarify a bit more through documentation on how 
>> socket_timeout handles the timeout differently from statement_timeout 
>> and tcp_user_timeout.

> Maybe.  Could you suggest good description?

Ok. I just realized that requires careful explanation.
How about the one below? 
I got it from combined explanations above. I did not include the 
differences though. This can be improved as the discussion
continues along the way.

tcp_socket_timeout (integer)

Terminate and restart any session that has been idle for more than
the specified number of milliseconds to prevent client from infinite
waiting for server due to dead connection. This can be used both as
a brute force global query timeout and detecting network problems.
A value of zero (the default) turns this off.

Regards,
Kirk Jamison

Journal based VACUUM FULL

2019-02-21 Thread Ryan David Sheasby

Hi Team.

New to contributing so hopefully this is the right place. I've searched the
forum and it seems this is the place for feature requests/suggestions.

I was reading on VACUUM and VACUUM FULL and saw that the current
implementation of VACUUM FULL writes to an entirely new file and then
switches to it, as opposed to rewriting the current file in-place. I assume
the reason for this is safety in the event the database shuts down in the
middle of the vacuum, the most you will lose is the progress of the vacuum
and have to start from scratch but otherwise the database will retain its
integrity and not become corrupted. This seems to be an effective but
somewhat rudimentary system that could be improved. Most notably, the
rewriting of almost the entire database takes up basically double the
storage during the duration of the rewrite which can become expensive or
even simply inconvenient in IaaS(and probably other) installations where
the drive sizes are scaled on demand. Even if the VACUUM FULL doesn't need
to run often, having to reallocate drive space for an effective duplication
is not ideal. My suggestion is a journal based in-place rewrite of the
database files.

This means that the VACUUM FULL will do a "pre-processing" pass over the
database and figure out at a fairly low level what operations need to be
done to compact the database back to it's "correct" size. These operations
will be written in their entirety to a journal which records all the
operations about to be performed, with some mechanism for checking if they
have already been performed, using the same principle described here:
https://en.wikipedia.org/wiki/Journaling_file_system. This will allow an
in-place rewrite of the file in a safe manner such that you are able to
recover from an unexpected shutdown by resuming the VACUUM FULL from the
journal, or by detecting where the copy hole is in the file and
recording/ignoring it

The journal could be something as simple as a record of which byte ranges
need to be copied into which other byte ranges locations. The journal
should record whenever a byte range copy completes for the sake of error
recovery. Obviously, each byte range will have a max size of the copy
distance from the source to the destination so that the destination will
not overwrite the source, therefore making recovery impossible(how can you
know where in the copy you stopped?). However, this will have a snowball
effect as the further you are in the rewrite, the further the source and
destination ranges will be so you can copy bigger chunks at a time, and
won't have to update the journal's completion flags as often. In the case
of a shutdown during a copy, you merely read the journal, looking for the
first copy that isn't completed yet, and continue rewriting from there.
Even if some of the bytes have been copied already, there will be no
corruption as you haven't overwritten the source bytes at all. Finally, a
simple file truncate can take place once all the copies are complete, and
the journal can be deleted. This means the headroom required in the
filesystem would be much smaller, and would pay for itself in any copy of
at least 17 bytes or more (assuming 2*8 byte pointers plus a bit as a
completion flag). The only situation in which this system would consume
more space than a total copy is if the database has more than 50% garbage,
and the garbage is perfectly spread out i.e. isn't in large chunks that can
be copied at once and therefore recorded in the journal at once, and each
piece of garbage is smaller than 17 bytes. Obviously, the journal itself
would need a error checking mechanism to ensure the journal was correctly
and completely written, but this can be as simple as a total file hash at
the end of the file.

An alternative to the completion flags is to compute a hash of the data to
be copied and store it in the journal, and then in recovery you can compare
the destination with the hash. This has the advantage of not needing to
write to the journal to keep it up to date during the operations, but the
disadvantages associated with having to compute many hashes while
recovering and storing the hashes in the journal, taking up more space.
It's also arguably less safe as there is always the chance(albeit extremely
unlikely) of a collision, which would mean that the data is not actually
validated. I would argue the risk of this is lower than the risk of bit-rot
flipping the completion bit, however.

A journaling system like this *might* have performance benefits too,
specifically when running in less intelligent file systems like NTFS which
can become easily fragmented(causing potentially big performance issues on
spinning disks). Rewriting the same file will never require a file-system
de-fragment. The other advantage as mentioned before is in the case of
auto-scaling drives if used as storage for the DB(as they often are in
IaaS/Paas services). Not having to scale up rapidly could be a performance
boost in some cases.

Re: insensitive collations

2019-02-21 Thread Peter Eisentraut

On 2019-02-21 03:17, Peter Geoghegan wrote:
> I wonder if it would be better to break this into distinct commits?

I thought about that.  Especially the planner/executor changes could be
done separately, sort of as a way to address the thread
"ExecBuildGroupingEqual versus collations".  But I'm not sure if they
would have good test coverage on their own.  I can work on this if
people think this would be useful.

> * Why is support for non-deterministic comparisons limited to the ICU
> provider? If that's the only case that could possibly be affected,
> then why did we ever add the varstrcmp() tie-breaker call to strcmp()
> in the first place? The behavior described in the commit message of
> bugfix commit 656beff5 describes a case where Hungarian text caused
> index corruption by being strcoll()-wise equal but not bitwise equal.
> Besides, I think that you can vendor your own case insensitive
> collation with glibc, since it's based on UCA.

The original test case (described here:
https://www.postgresql.org/message-id/27064.1134753128%40sss.pgh.pa.us)
no longer works, so it was probably fixed on the glibc side.  The git
log of the hu_HU locale definition shows that it has been "fixed" in
major ways several times over the years, so that's plausible.

I tried reproducing some more practical scenarios involving combining
diacritics, but glibc apparently doesn't believe strings in different
normal forms are equal.  At that point I gave up because this doesn't
seem worthwhile to support.

Moreover, I think allowing this would require a "trusted" strxfrm(),
which is currently disabled.

>> +   return DatumGetBool(FunctionCall2Coll(&entry->eq_opr_finfo,
>> + DEFAULT_COLLATION_OID,
>> + oldvalue, newvalue));
>>  }
> 
> The existing restriction on ICU collations (that they cannot be
> database-wide) side-steps the issue.
> 
> * Can said restriction somehow be lifted? That seems like it would be
> a lot cleaner.

Lift what restriction?  That ICU collations cannot be database-wide?
Sure that would be nice, but it's a separate project.  Even then, I'm
not sure that we would allow a database-wide collation to be
nondeterministic.  That would for example disallow the use of LIKE,
which would be weird.  In any case, the above issue can be addressed
then.  I think it's not worth complicating this right now.

> * Why have you disable this optimization?:
> 
>> /* Fast pre-check for equality, as discussed in varstr_cmp() */
>> -   if (len1 == len2 && memcmp(a1p, a2p, len1) == 0)
>> +   if ((!sss->locale || sss->locale->deterministic) &&
>> +   len1 == len2 && memcmp(a1p, a2p, len1) == 0)
> 
> I don't see why this is necessary. A non-deterministic collation
> cannot indicate that bitwise identical strings are non-equal.

Right, I went too far there.

> * Perhaps you should add a "Tip" referencing the feature to the
> contrib/citext documentation.

Good idea.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

[PATCH] Allow transaction to partition table using FDW

2019-02-21 Thread mitani

Hi,

I would like to use transactions with partitioning table using FDW, but 
transactions can not be used with the following error.
 'ERROR:  could not serialize access due to concurrent update

So, I tried to write a very simple patch.
This patch works for my purpose, but I do not know if it matches general usage.
I'd like to improve this feature which can be used generally, so please check 
it.

Please find attached file.

Regards,

-- 
mitani 


allow-transaction-to-partition-table-using-FDW.patch
Description: Binary data

Re: SQL statement PREPARE does not work in ECPG

2019-02-21 Thread Michael Meskes

Takahashi-san,

> It works well for my statement
> 
> "EXEC SQL PREPARE test_prep (int) AS SELECT id from test_table where
> id = $1;".
> 
> However, since data type information is not used, it does not works
> well
> for prepare statements which need data type information such as 
> "EXEC SQL PREPARE test_prep (int, int) AS SELECT $1 + $2;".

Yes, that was to be expected. This is what Matsumura-san was trying to
fix. However, I'm not sure yet which of his ideas is the best. 

> It also works.
> (Actually, I wrote "EXEC SQL EXECUTE test_prep (:i) INTO :ID;".)

Ok, thanks. That means the parser has to handle the list of execute
arguments differently, which in hindsight is pretty obvious. 

Michael
-- 
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Meskes at (Debian|Postgresql) dot Org
Jabber: michael at xmpp dot meskes dot org
VfL Borussia! Força Barça! SF 49ers! Use Debian GNU/Linux, PostgreSQL

Inappropriate scope of local variable

2019-02-21 Thread Antonin Houska

In AfterTriggerSaveEvent(), the "new_shared" variable is not used outside the
"for" loop, so I think it should be defined only within the loop. The
following patch makes reading the code a little bit more convenient for me.

diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 409bee24f8..d95c57f244 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -5743,7 +5743,6 @@ AfterTriggerSaveEvent(EState *estate, ResultRelInfo 
*relinfo,
Relationrel = relinfo->ri_RelationDesc;
TriggerDesc *trigdesc = relinfo->ri_TrigDesc;
AfterTriggerEventData new_event;
-   AfterTriggerSharedData new_shared;
charrelkind = rel->rd_rel->relkind;
int tgtype_event;
int tgtype_level;
@@ -5937,6 +5936,7 @@ AfterTriggerSaveEvent(EState *estate, ResultRelInfo 
*relinfo,
for (i = 0; i < trigdesc->numtriggers; i++)
{
Trigger*trigger = &trigdesc->triggers[i];
+   AfterTriggerSharedData new_shared;

if (!TRIGGER_TYPE_MATCHES(trigger->tgtype,
  tgtype_level,

--
Antonin Houska
https://www.cybertec-postgresql.com

Re: WAL insert delay settings

2019-02-21 Thread Ants Aasma

On Thu, Feb 21, 2019 at 2:20 AM Stephen Frost  wrote:

> * Andres Freund (and...@anarazel.de) wrote:
> > On 2019-02-20 18:46:09 -0500, Stephen Frost wrote:
> > > * Tomas Vondra (tomas.von...@2ndquadrant.com) wrote:
> > > > On 2/20/19 10:43 PM, Stephen Frost wrote:
> > > > > Just to share a few additional thoughts after pondering this for a
> > > > > while, but the comment Andres made up-thread really struck a
> chord- we
> > > > > don't necessairly want to throttle anything, what we'd really
> rather do
> > > > > is *prioritize* things, whereby foreground work (regular queries
> and
> > > > > such) have a higher priority than background/bulk work (VACUUM,
> REINDEX,
> > > > > etc) but otherwise we use the system to its full capacity.  We
> don't
> > > > > actually want to throttle a VACUUM run any more than a CREATE
> INDEX, we
> > > > > just don't want those to hurt the performance of regular queries
> that
> > > > > are happening.
> > > >
> > > > I think you're forgetting the motivation of this very patch was to
> > > > prevent replication lag caused by a command generating large amounts
> of
> > > > WAL (like CREATE INDEX / ALTER TABLE etc.). That has almost nothing
> to
> > > > do with prioritization or foreground/background split.
> > > >
> > > > I'm not arguing against ability to prioritize stuff, but I disagree
> it
> > > > somehow replaces throttling.
> > >
> > > Why is replication lag an issue though?  I would contend it's an issue
> > > because with sync replication, it makes foreground processes wait, and
> > > with async replication, it makes the actions of foreground processes
> > > show up late on the replicas.
> >
> > I think reaching the bandwidth limit of either the replication stream,
> > or of the startup process is actually more common than these. And for
> > that prioritization doesn't help, unless it somehow reduces the total
> > amount of WAL.
>
> The issue with hitting those bandwidth limits is that you end up with
> queues outside of your control and therefore are unable to prioritize
> the data going through them.  I agree, that's an issue and it might be
> necessary to ask the admin to provide what the bandwidth limit is, so
> that we could then avoid running into issues with downstream queues that
> are outside of our control causing unexpected/unacceptable lag.
>

If there is a global rate limit on WAL throughput it could be adjusted by a
control loop, measuring replication queue length and/or apply delay. I
don't see any sane way how one would tune a per command rate limit, or even
worse, a cost-delay parameter. It would have the same problems as work_mem
settings.

Rate limit in front of WAL insertion would allow for allocating the
throughput between foreground and background tasks, and even allow for
priority inheritance to alleviate priority inversion due to locks.

There is also an implicit assumption here that a maintenance command is a
background task and a normal DML query is a foreground task. This is not
true for all cases, users may want to throttle transactions doing lots of
DML to keep synchronous commit latencies for smaller transactions within
reasonable limits.

As a wild idea for how to handle the throttling, what if when all our wal
insertion credits are used up XLogInsert() sets InterruptPending and the
actual sleep is done inside ProcessInterrupts()?

Regards,
Ants Aasma

Solaris 10 (sparc) and unixODBC problem

2019-02-21 Thread Nariman Ibadullaev

I installed latest PostgreSQL-ODBC , OS=Solaris 10, Arch = Sparc. I can not
use PostgreSQL ODBC in unixODBC.

-bash-3.2$ LD_DEBUG=libs ./isql -v PG
debug:
debug: Solaris Linkers: 5.10-1.1518
debug:
19267:
19267: platform capability (CA_SUNW_PLAT) - sun4v
19267: machine capability (CA_SUNW_MACH) - sun4v
19267: hardware capabilities (CA_SUNW_HW_1) - 0x3ffe8dfb  [ CRC32C CBCOND
PAUSE MONT MPMUL SHA512 SHA256 SHA1 MD5 CAMELLIA KASUMI DES AES IMA HPC
VIS3 FMAF ASI_BLK_INIT VIS2 VIS POPC V8PLUS DIV32 MUL32 ]
19267:
19267:
19267: configuration file=/var/ld/ld.config: unable to process file
19267:
19267:
19267: find object=libodbc.so.2; searching
19267:  search path=/usr/local/unixODBC/lib:/usr/local/lib  (RUNPATH/RPATH
from file isql)
19267:  trying path=/usr/local/unixODBC/lib/libodbc.so.2
19267:
19267: find object=libiconv.so.2; searching
19267:  search path=/usr/local/unixODBC/lib:/usr/local/lib  (RUNPATH/RPATH
from file isql)
19267:  trying path=/usr/local/unixODBC/lib/libiconv.so.2
19267:  trying path=/usr/local/lib/libiconv.so.2
19267:
19267: find object=libthread.so.1; searching
19267:  search path=/usr/local/unixODBC/lib:/usr/local/lib  (RUNPATH/RPATH
from file isql)
19267:  trying path=/usr/local/unixODBC/lib/libthread.so.1
19267:  trying path=/usr/local/lib/libthread.so.1
19267:  search path=/lib  (default)
19267:  search path=/usr/lib  (default)
19267:  trying path=/lib/libthread.so.1
19267:
19267: find object=libc.so.1; searching
19267:  search path=/usr/local/unixODBC/lib:/usr/local/lib  (RUNPATH/RPATH
from file isql)
19267:  trying path=/usr/local/unixODBC/lib/libc.so.1
19267:  trying path=/usr/local/lib/libc.so.1
19267:  search path=/lib  (default)
19267:  search path=/usr/lib  (default)
19267:  trying path=/lib/libc.so.1
19267:
19267: find object=libiconv.so.2; searching
19267:  search path=/usr/local/lib  (RUNPATH/RPATH from file
/usr/local/unixODBC/lib/libodbc.so.2)
19267:  trying path=/usr/local/lib/libiconv.so.2
19267:
19267: find object=libthread.so.1; searching
19267:  search path=/usr/local/lib  (RUNPATH/RPATH from file
/usr/local/unixODBC/lib/libodbc.so.2)
19267:  trying path=/usr/local/lib/libthread.so.1
19267:  search path=/lib  (default)
19267:  search path=/usr/lib  (default)
19267:  trying path=/lib/libthread.so.1
19267:
19267: find object=libc.so.1; searching
19267:  search path=/usr/local/lib  (RUNPATH/RPATH from file
/usr/local/unixODBC/lib/libodbc.so.2)
19267:  trying path=/usr/local/lib/libc.so.1
19267:  search path=/lib  (default)
19267:  search path=/usr/lib  (default)
19267:  trying path=/lib/libc.so.1
19267:
19267: find object=libgcc_s.so.1; searching
19267:  search path=/usr/local/lib  (RUNPATH/RPATH from file
/usr/local/unixODBC/lib/libodbc.so.2)
19267:  trying path=/usr/local/lib/libgcc_s.so.1
19267:
19267: find object=libc.so.1; searching
19267:  search
path=/usr/local/lib:/usr/lib:/usr/openwin/lib:/usr/local/ssl/lib:/usr/local/BerkeleyDB.4.2/lib
(RUNPATH/RPATH from file /usr/local/lib/libiconv.so.2)
19267:  trying path=/usr/local/lib/libc.so.1
19267:  trying path=/usr/lib/libc.so.1
19267:
19267: find object=libgcc_s.so.1; searching
19267:  search
path=/usr/local/lib:/usr/lib:/usr/openwin/lib:/usr/local/ssl/lib:/usr/local/BerkeleyDB.4.2/lib
(RUNPATH/RPATH from file /usr/local/lib/libiconv.so.2)
19267:  trying path=/usr/local/lib/libgcc_s.so.1
19267:
19267: find object=libc.so.1; searching
19267:  search path=/usr/local/lib:/usr/local/ssl/lib  (RUNPATH/RPATH from
file /usr/local/lib/libgcc_s.so.1)
19267:  trying path=/usr/local/lib/libc.so.1
19267:  trying path=/usr/local/ssl/lib/libc.so.1
19267:  search path=/lib  (default)
19267:  search path=/usr/lib  (default)
19267:  trying path=/lib/libc.so.1
19267:
19267: find object=libc.so.1; searching
19267:  search path=/lib  (default)
19267:  search path=/usr/lib  (default)
19267:  trying path=/lib/libc.so.1
19267:
19267: 1:
19267: 1: transferring control: isql
19267: 1:
19267: 1:
19267: 1: find object=/platform/sun4v/lib/libc_psr.so.1; searching
19267: 1:  trying path=/platform/sun4v/lib/libc_psr.so.1
19267: 1:
19267: 1: find object=/usr/local/lib/psqlodbcw.so; searching
19267: 1:  trying path=/usr/local/lib/psqlodbcw.so
19267: 1:
19267: 1: find object=libpq.so.5; searching
19267: 1:  search path=/usr/local/unixODBC/lib  (RUNPATH/RPATH from file
/usr/local/lib/psqlodbcw.so)
19267: 1:  trying path=/usr/local/unixODBC/lib/libpq.so.5
19267: 1:  search path=/lib  (default)
19267: 1:  search path=/usr/lib  (default)
19267: 1:  trying path=/lib/libpq.so.5
19267: 1:  trying path=/usr/lib/libpq.so.5
19267: 1:
19267: 1: find object=libpthread.so.1; searching
19267: 1:  search path=/usr/local/unixODBC/lib  (RUNPATH/RPATH from file
/usr/local/lib/psqlodbcw.so)
19267: 1:  trying path=/usr/local/unixODBC/lib/libpthread.so.1
19267: 1:  search path=/lib  (default)
19267: 1:  search path=/usr/lib  (default)
19267: 1:  trying path=/lib/libpthread.so.1
19267: 1:
19267: 1: find object=libodbcinst.so.2; searching
19267: 1:  search path=/usr/local

Re: Reporting script runtimes in pg_regress

2019-02-21 Thread Christoph Berg

Re: Tom Lane 2019-02-18 <28360.1550506...@sss.pgh.pa.us>
> >>> We should also strive to align "FAILED" properly.
> 
> >> Yeah, not strictly required, but someone might want to play around with
> >> it a bit.
> 
> > FWIW I don't think we localize pg_regress output currently, so that
> > argument seems moot ... But I think we can get away with constant four
> > spaces for now.
> 
> I pushed Peter's suggestion for %8.0f; let's live with that for a little
> and see if it's still annoying.

The ryu changes make postgresql-unit fail quite loudly:

$ cat regression.out
test extension... ok  359 ms
test tables   ... FAILED  164 ms
test unit ... FAILED  867 ms
test binary   ... ok   20 ms
test unicode  ... ok   18 ms
test prefix   ... FAILED   43 ms
test units... FAILED  207 ms
test time ... FAILED   99 ms
test temperature  ... FAILED   22 ms
...

The misalignment annoyed me enough (especially the false alignment
between "ms" on the first row and "164" on the next row) to look into
it. Aligned it looks like this:

test extension... ok  399 ms
test tables   ... FAILED  190 ms
test unit ... FAILED  569 ms
test binary   ... ok   14 ms
test unicode  ... ok   15 ms
test prefix   ... FAILED   44 ms
test units... FAILED  208 ms
test time ... FAILED   99 ms
test temperature  ... FAILED   21 ms
...

It doesn't break translations because it prints the extra spaces
separately.

In run_single_test() (which this output is from), it simply aligns the
output with FAILED. In run_schedule(), there is the 3rd output string
"failed (ignored)" which is considerably longer. I aligned the output
with that, but also made the timestamp field shorter so it's not too
much to the right.

Christoph
>From 3acba7cc50f4711abedfb61cd06b1f8e640caac5 Mon Sep 17 00:00:00 2001
From: Christoph Berg 
Date: Thu, 21 Feb 2019 10:35:19 +0100
Subject: [PATCH] Align timestamps in pg_regress output

---
 src/test/regress/pg_regress.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index a18a6f6c45..8080626e94 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -1794,12 +1794,14 @@ run_schedule(const char *schedule, test_function tfunc)
 else
 {
 	status(_("FAILED"));
+	status("  "); /* align with failed (ignored) */
 	fail_count++;
 }
 			}
 			else
 			{
 status(_("ok"));
+status("  "); /* align with failed (ignored) */
 success_count++;
 			}
 
@@ -1807,7 +1809,7 @@ run_schedule(const char *schedule, test_function tfunc)
 log_child_failure(statuses[i]);
 
 			INSTR_TIME_SUBTRACT(stoptimes[i], starttimes[i]);
-			status(_(" %8.0f ms"), INSTR_TIME_GET_MILLISEC(stoptimes[i]));
+			status(_(" %5.0f ms"), INSTR_TIME_GET_MILLISEC(stoptimes[i]));
 
 			status_end();
 		}
@@ -1880,6 +1882,7 @@ run_single_test(const char *test, test_function tfunc)
 	else
 	{
 		status(_("ok"));
+		status(""); /* align with FAILED */
 		success_count++;
 	}
 
-- 
2.20.1

Re: [patch] Add schema total size to psql \dn+

Le 20/02/2019 à 23:26, Gilles Darold a écrit :
> Hi all,
>
>
> When we want to get total size of all relation in a schema we have to
> execute one of our favorite DBA query. It  is quite simple but what
> about displaying schema size when using \dn+ in psql ?
>
>
> gilles=# \dn+
>    List of schemas
>   Name  |  Owner   |  Access privileges   |  Size   |  Description 
> +--+--+-+
>  public | postgres | postgres=UC/postgres+| 608 kB  | standard public schema
>     |  | =UC/postgres | |
>  test   | gilles   |  | 57 MB   |
>  empty  | gilles   |  | 0 bytes |
> (3 rows)
>
> The attached simple patch adds this feature. Is there any cons adding
> this information? The patch tries to be compatible to all PostgreSQL
> version. Let me know if I have missed something.


Improve this patch by using LATERAL JOIN when version >= 9.3.


-- 
Gilles Darold
Consultant PostgreSQL
http://dalibo.com - http://dalibo.org

diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 4da6719ce7..9b57f59ec7 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -4188,6 +4188,19 @@ listSchemas(const char *pattern, bool verbose, bool showSystem)
 	{
 		appendPQExpBufferStr(&buf, ",\n  ");
 		printACLColumn(&buf, "n.nspacl");
+		/* As of PostgreSQL 9.3, use LATERAL JOIN */
+		if (pset.sversion >= 93000)
+		{
+			appendPQExpBuffer(&buf, ",\n  schema_size AS \"%s\"",
+		  gettext_noop("Size"));
+		}
+		else
+		{
+			appendPQExpBuffer(&buf,
+		  ",\n  ((SELECT pg_catalog.pg_size_pretty((sum(pg_catalog.pg_relation_size(c.oid)))::bigint) FROM pg_catalog.pg_class c\nLEFT JOIN pg_catalog.pg_namespace s ON s.oid = c.relnamespace WHERE s.nspname = n.nspname)) AS \"%s\"",
+		  gettext_noop("Size"));
+		}
+
 		appendPQExpBuffer(&buf,
 		  ",\n  pg_catalog.obj_description(n.oid, 'pg_namespace') AS \"%s\"",
 		  gettext_noop("Description"));
@@ -4195,6 +4208,12 @@ listSchemas(const char *pattern, bool verbose, bool showSystem)
 
 	appendPQExpBuffer(&buf,
 	  "\nFROM pg_catalog.pg_namespace n\n");
+	/* As of PostgreSQL 9.3, use LATERAL JOIN */
+	if (pset.sversion >= 93000)
+	{
+		appendPQExpBuffer(&buf,
+		  "  LEFT JOIN LATERAL (\nSELECT pg_catalog.pg_size_pretty(sum(pg_catalog.pg_relation_size(c.oid))::bigint) As \"schema_size\"\nFROM pg_catalog.pg_class c\n  LEFT JOIN pg_catalog.pg_namespace s ON s.oid = c.relnamespace\nWHERE s.nspname = n.nspname\n  ) l ON true\n");
+	}
 
 	if (!showSystem && !pattern)
 		appendPQExpBufferStr(&buf,

Re: WAL insert delay settings

2019-02-21 Thread Stephen Frost

Greetings,

* Ants Aasma (ants.aa...@eesti.ee) wrote:
> On Thu, Feb 21, 2019 at 2:20 AM Stephen Frost  wrote:
> > The issue with hitting those bandwidth limits is that you end up with
> > queues outside of your control and therefore are unable to prioritize
> > the data going through them.  I agree, that's an issue and it might be
> > necessary to ask the admin to provide what the bandwidth limit is, so
> > that we could then avoid running into issues with downstream queues that
> > are outside of our control causing unexpected/unacceptable lag.
> 
> If there is a global rate limit on WAL throughput it could be adjusted by a
> control loop, measuring replication queue length and/or apply delay. I
> don't see any sane way how one would tune a per command rate limit, or even
> worse, a cost-delay parameter. It would have the same problems as work_mem
> settings.

Yeah, having some kind of feedback loop would be interesting.  I agree
that a per-command rate limit would have similar problems to work_mem,
and that's definitely one problem we have with the way VACUUM is tuned
today but the ship has more-or-less sailed on that- I don't think we're
going to be able to simply remove the VACUUM settings.  Avoiding adding
new settings that are per-command would be good though, if we can sort
out a way how.

> Rate limit in front of WAL insertion would allow for allocating the
> throughput between foreground and background tasks, and even allow for
> priority inheritance to alleviate priority inversion due to locks.

I'm not sure how much we have to worry about priority inversion here as
you need to have conflicts for that and if there's actually a conflict,
then it seems like we should just press on.

That is, a non-concurrent REINDEX is going to prevent an UPDATE from
modifying anything in the table, which if the UPDATE is a higher
priority than the REINDEX would be priority inversion, but that doesn't
mean we should slow down the REINDEX to allow the UPDATE to happen
because the UPDATE simply can't happen until the REINDEX is complete.
Now, we might slow down the REINDEX because there's UPDATEs against
*other* tables that aren't conflicting and we want those UPDATEs to be
prioritized over the REINDEX but then that isn't priority inversion.

Basically, I'm not sure that there's anything we can do, or need to do,
differently from what we do today when it comes to priority inversion
risk, at least as it relates to this discussion.  There's an interesting
discussion to be had about if we should delay the REINDEX taking the
lock at all when there's an UPDATE pending, but you run the risk of
starving the REINDEX from ever getting the lock and being able to run in
that case.  A better approach is what we're already working on- arrange
for the REINDEX to not require a conflicting lock, so that both can run
concurrently.

> There is also an implicit assumption here that a maintenance command is a
> background task and a normal DML query is a foreground task. This is not
> true for all cases, users may want to throttle transactions doing lots of
> DML to keep synchronous commit latencies for smaller transactions within
> reasonable limits.

Agreed, that was something that I was contemplating too- and one could
possibly argue in the other direction as well (maybe that REINDEX is on
a small table but has a very high priority and we're willing to accept
that some regular DML is delayed a bit to allow that REINDEX to finish).
Still, I would think we'd basically want to use the heuristic that DDL
is bulk and DML is a higher priority for a baseline/default position,
but then provide users with a way to change the priority on a
per-session level, presumably with a GUC or similar, if they have a case
where that heuristic is wrong.

Again, just to be clear, this is all really 'food for thought' and
interesting discussion and shouldn't keep us from doing something simple
now, if we can, to help alleviate the immediate practical issue that
bulk commands can cause serious WAL lag.  I think it's good to have
these discussions since they may help us to craft the simple solution in
a way that could later be extended (or at least won't get in the way)
for these much larger changes, but even if that's not possible, we
should be open to accepting a simpler, short-term, improvement, as these
larger changes would very likely be multiple major releases away if
they're able to be done at all.

> As a wild idea for how to handle the throttling, what if when all our wal
> insertion credits are used up XLogInsert() sets InterruptPending and the
> actual sleep is done inside ProcessInterrupts()?

This comment might be better if it was made higher up in the thread,
closer to where the discussion was happening about the issues with
critical sections and the current patch's approach for throttle-based
rate limiting.  I'm afraid that it might get lost in this sub-thread
about these much larger and loftier ideas around where we might want to
go in the fut

Re: [patch] Add schema total size to psql \dn+

On Thu, Feb 21, 2019 at 11:49 AM Gilles Darold  wrote:
>
> > When we want to get total size of all relation in a schema we have to
> > execute one of our favorite DBA query. It  is quite simple but what
> > about displaying schema size when using \dn+ in psql ?
> > [...]
> > The attached simple patch adds this feature. Is there any cons adding
> > this information? The patch tries to be compatible to all PostgreSQL
> > version. Let me know if I have missed something.

I needed that quite often, so I'm +1 to add this!  Please register
this patch on the next commitfest.

Re: Pluggable Storage - Andres's take

2019-02-21 Thread Amit Khandekar

On Thu, 21 Feb 2019 at 04:17, Robert Haas  wrote:
>
> On Fri, Feb 8, 2019 at 5:18 AM Amit Khandekar  wrote:
> > In the attached v1 patch, the prefetch_distance is calculated as
> > effective_io_concurrency + 10. Also it has some cosmetic changes.
>
> I did a little brief review of this patch and noticed the following things.
>
> +} PrefetchState;
> That name seems too generic.

Ok, so something like XidHorizonPrefetchState ? On similar lines, does
prefetch_buffer() function name sound too generic as well ?

>
> +/*
> + * An arbitrary way to come up with a pre-fetch distance that grows with io
> + * concurrency, but is at least 10 and not more than the max effective io
> + * concurrency.
> + */
>
> This comment is kinda useless, because it only tells us what the code
> does (which is obvious anyway) and not why it does that.  Saying that
> your formula is arbitrary may not be the best way to attract support
> for it.

Well, I had checked the way the number of drive spindles
(effective_io_concurrency) is used to calculate the prefetch distance
for bitmap heap scans (ComputeIoConcurrency). Basically I think the
intention behind that method is to come up with a number that makes it
highly likely that we pre-fetch a block of each of the drive spindles.
But I didn't get how that exactly works, all the less for non-parallel
bitmap scans. Same is the case for the pre-fetching that we do here
for xid-horizon stuff, where we do the block reads sequentially. Me
and Andres discussed this offline, and he was of the opinion that this
formula won't help here, and instead we just keep a constant distance
that is some number greater than effective_io_concurrency. I agree
that instead of saying "arbitrary" we should explain why we have done
that, and before that, come up with an agreed-upon formula.

>
> + for (i = prefetch_state->next_item; i < nitems && count < prefetch_count; 
> i++)
>
> It looks strange to me that next_item is stored in prefetch_state and
> nitems is passed around as an argument.  Is there some reason why it's
> like that?

We could keep the max count in the structure itself as well. There
isn't any specific reason for not keeping it there. It's just that
this function prefetch_state () is not a general function for
maintaining a prefetch state that spans across function calls; so we
might as well just pass the max count to that function instead of
having another field in that structure. I am not inclined specifically
towards either of the approaches.

>
> + /* prefetch a fixed number of pages beforehand. */
>
> Not accurate -- the number of pages we prefetch isn't fixed.  It
> depends on effective_io_concurrency.

Yeah, will change that in the next patch version, according to what we
conclude about the prefetch distance calculation.

-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

Re: Pluggable Storage - Andres's take

On Thu, Feb 21, 2019 at 6:44 AM Amit Khandekar  wrote:
> Ok, so something like XidHorizonPrefetchState ? On similar lines, does
> prefetch_buffer() function name sound too generic as well ?

Yeah, that sounds good.  And, yeah, then maybe rename the function too.

> > +/*
> > + * An arbitrary way to come up with a pre-fetch distance that grows with io
> > + * concurrency, but is at least 10 and not more than the max effective io
> > + * concurrency.
> > + */
> >
> > This comment is kinda useless, because it only tells us what the code
> > does (which is obvious anyway) and not why it does that.  Saying that
> > your formula is arbitrary may not be the best way to attract support
> > for it.
>
> Well, I had checked the way the number of drive spindles
> (effective_io_concurrency) is used to calculate the prefetch distance
> for bitmap heap scans (ComputeIoConcurrency). Basically I think the
> intention behind that method is to come up with a number that makes it
> highly likely that we pre-fetch a block of each of the drive spindles.
> But I didn't get how that exactly works, all the less for non-parallel
> bitmap scans. Same is the case for the pre-fetching that we do here
> for xid-horizon stuff, where we do the block reads sequentially. Me
> and Andres discussed this offline, and he was of the opinion that this
> formula won't help here, and instead we just keep a constant distance
> that is some number greater than effective_io_concurrency. I agree
> that instead of saying "arbitrary" we should explain why we have done
> that, and before that, come up with an agreed-upon formula.

Maybe something like: We don't use the regular formula to determine
how much to prefetch here, but instead just add a constant to
effective_io_concurrency.  That's because it seems best to do some
prefetching here even when effective_io_concurrency is set to 0, but
if the DBA thinks it's OK to do more prefetching for other operations,
then it's probably OK to do more prefetching in this case, too.  It
may be that this formula is too simplistic, but at the moment we have
no evidence of that or any idea about what would work better.

> > + for (i = prefetch_state->next_item; i < nitems && count < prefetch_count; 
> > i++)
> >
> > It looks strange to me that next_item is stored in prefetch_state and
> > nitems is passed around as an argument.  Is there some reason why it's
> > like that?
>
> We could keep the max count in the structure itself as well. There
> isn't any specific reason for not keeping it there. It's just that
> this function prefetch_state () is not a general function for
> maintaining a prefetch state that spans across function calls; so we
> might as well just pass the max count to that function instead of
> having another field in that structure. I am not inclined specifically
> towards either of the approaches.

All right, count me as +0.5 for putting a copy in the structure.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [PROPOSAL] Shared Ispell dictionaries

On Wed, Feb 20, 2019 at 9:33 AM Arthur Zakirov  wrote:
> I'm working on the (b) approach. I thought about a priority queue
> structure. There no such ready structure within PostgreSQL sources
> except binaryheap.c, but it isn't for concurrent algorithms.

I don't see why you need a priority queue or, really, any other fancy
data structure.  It seems like all you need to do is somehow set it up
so that a backend which doesn't use a dictionary for a while will
dsm_detach() the segment.  Eventually an unused dictionary will have
no remaining references and will go away.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Unnecessary checks for new rows by some RI trigger functions?

On Wed, Feb 20, 2019 at 9:27 AM Antonin Houska  wrote:
> However I find it confusing that the trigger functions pass detectNewRows=true
> even if they only execute SELECT statement.

I don't quite see what those two things have to do with each other,
but I might be missing something.  I stuck in a debugging elog() to
see where ri_Check_Pk_Match() gets called and it fired for the
following query in the regression tests:

update pp set f1=f1+1;

That agrees with my intuition, which is that the logic here has
something to do with allowing an update that changes a referenced row
but also changes some other row in the same table so that the
reference remains valid -- it's just now a reference to some other
row.  Unfortunately, as you probably also noticed, making the change
you propose here doesn't make any tests fail in either the main
regression tests or the isolation tests.

I suspect that this is a defect in the tests rather than a sign that
the code doesn't need to be changed, though.  I'd try a statement like
the above in a multi-statement transaction with a higher-than-default
isolation level, probably REPEATABLE READ, and maybe some concurrent
activity in another session.  Sorry, I'm hand-waving here; I don't
know exactly what's going on.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: WAL insert delay settings

2019-02-21 Thread Ants Aasma

On Thu, Feb 21, 2019 at 12:50 PM Stephen Frost  wrote:

> > Rate limit in front of WAL insertion would allow for allocating the
> > throughput between foreground and background tasks, and even allow for
> > priority inheritance to alleviate priority inversion due to locks.
>
> I'm not sure how much we have to worry about priority inversion here as
> you need to have conflicts for that and if there's actually a conflict,
> then it seems like we should just press on.
>
> That is, a non-concurrent REINDEX is going to prevent an UPDATE from
> modifying anything in the table, which if the UPDATE is a higher
> priority than the REINDEX would be priority inversion, but that doesn't
> mean we should slow down the REINDEX to allow the UPDATE to happen
> because the UPDATE simply can't happen until the REINDEX is complete.
> Now, we might slow down the REINDEX because there's UPDATEs against
> *other* tables that aren't conflicting and we want those UPDATEs to be
> prioritized over the REINDEX but then that isn't priority inversion.
>

I was thinking along the lines that each backend gets a budget of WAL
insertion credits per time interval, and when the credits run out the
process sleeps. With this type of scheme it would be reasonably
straightforward to let UPDATEs being blocked by REINDEX to transfer their
WAL insertion budgets to the REINDEX, making it get a larger piece of the
total throughput pie.

Regards,
Ants Aasma

Re: WIP: Avoid creation of the free space map for small tables

2019-02-21 Thread Alvaro Herrera

On 2019-Feb-21, Amit Kapila wrote:

> On Wed, Feb 20, 2019 at 8:08 PM Alvaro Herrera  
> wrote:
> >
> > Please remember to keep serial_schedule in sync.
> 
> I don't understand what you mean by this?  It is already present in
> serial_schedule.  In parallel_schedule, we are just moving this test
> to one of the parallel groups.  Do we need to take care of something
> else?

Just to make sure it's in the same relative position.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [PROPOSAL] Shared Ispell dictionaries

2019-02-21 Thread Arthur Zakirov

On 21.02.2019 15:45, Robert Haas wrote:

On Wed, Feb 20, 2019 at 9:33 AM Arthur Zakirov wrote:

I'm working on the (b) approach. I thought about a priority queue
structure. There no such ready structure within PostgreSQL sources
except binaryheap.c, but it isn't for concurrent algorithms.

I don't see why you need a priority queue or, really, any other fancy
data structure. It seems like all you need to do is somehow set it up
so that a backend which doesn't use a dictionary for a while will
dsm_detach() the segment. Eventually an unused dictionary will have
no remaining references and will go away.

Hm, I didn't think in this way. Agree that using a new data structure is
overengineering.

Now in the current patch all DSM segments are pinned (and therefore
dsm_pin_segment() is called). So a dictionary lives in shared memory
even if nobody have the reference to it.

I thought about periodically scanning the shared hash table and
unpinning old and unused dictionaries. But this approach needs
sequential scan facility for dshash. Happily there is the patch from
Kyotaro-san (the v16-0001-sequential-scan-for-dshash.patch part):

https://www.postgresql.org/message-id/20190221.160555.191280262.horiguchi.kyot...@lab.ntt.co.jp

Your approach looks simpler. It is necessary just to periodically scan
dictionaries' cache hash table and not call dsm_pin_segment() when a DSM
segment initialized. It also means that a dictionary is loaded into DSM
only while there is a backend which attached the dictionary's DSM.

--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Re: WAL insert delay settings

2019-02-21 Thread Stephen Frost

Greetings,

* Ants Aasma (ants.aa...@eesti.ee) wrote:
> On Thu, Feb 21, 2019 at 12:50 PM Stephen Frost  wrote:
> 
> > > Rate limit in front of WAL insertion would allow for allocating the
> > > throughput between foreground and background tasks, and even allow for
> > > priority inheritance to alleviate priority inversion due to locks.
> >
> > I'm not sure how much we have to worry about priority inversion here as
> > you need to have conflicts for that and if there's actually a conflict,
> > then it seems like we should just press on.
> >
> > That is, a non-concurrent REINDEX is going to prevent an UPDATE from
> > modifying anything in the table, which if the UPDATE is a higher
> > priority than the REINDEX would be priority inversion, but that doesn't
> > mean we should slow down the REINDEX to allow the UPDATE to happen
> > because the UPDATE simply can't happen until the REINDEX is complete.
> > Now, we might slow down the REINDEX because there's UPDATEs against
> > *other* tables that aren't conflicting and we want those UPDATEs to be
> > prioritized over the REINDEX but then that isn't priority inversion.
> 
> I was thinking along the lines that each backend gets a budget of WAL
> insertion credits per time interval, and when the credits run out the
> process sleeps. With this type of scheme it would be reasonably
> straightforward to let UPDATEs being blocked by REINDEX to transfer their
> WAL insertion budgets to the REINDEX, making it get a larger piece of the
> total throughput pie.

Sure, that could possibly be done if it's a per-backend throttle
mechanism, but that's got more-or-less the same issue as a per-command
mechanism and work_mem as discussed up-thread.

Also seems like if we've solved for a way to do this transferring and
delay and such that we could come up with a way to prioritize (or 'give
more credits') to foreground and less to background (there was another
point made elsewhere in the thread that background processes should
still be given *some* amount of credits, always, so that they don't end
up starving completely, and I agree with that).

There's actually a lot of similarity or parallels between this and basic
network traffic shaping, it seems to me anyway, where you have a pipe of
a certain size and you want to prioritize some things (like interactive
SSH) while de-prioritizing other things (bulk SCP) but also using the
pipe fully.

Thanks!

Stephen

signature.asc
Description: PGP signature

Re: shared-memory based stats collector

2019-02-21 Thread Arthur Zakirov


Hello,

On 21.02.2019 10:05, Kyotaro HORIGUCHI wrote:

Done. This verison 16 looks as if the moving and splitting were
not happen. Major changes are:

  - Restored old pgstats_* names. This largily shrinks the patch
size to less than a half lines of v15.  More than that, it
gets easier to examine differences. (checkpointer.c and
bgwriter.c have a bit stale comments but it is an issue for
later.)

  - Removed "oneshot" feature at all. This simplifies pgstat API
and let this patch far less simple.

  - Moved StatsLock to LWTRANCHE_STATS, which is not necessary to
be in the main tranche.

  - Fixed several bugs revealed by the shrinked size of the patch.


I run regression tests. Unfortunately tests didn't pass, failed test is 
'rangetypes':


rangetypes ... FAILED (test process exited with exit code 2)

It seems to me that an autovacuum process terminates because of segfault.

Segfault occurs within get_pgstat_tabentry_relid(). If I'm not mistaken, 
somehow 'dbentry' hasn't valid pointer anymore.


'dbentry' is get in the line in do_autovacuum():

dbentry = pgstat_fetch_stat_dbentry(MyDatabaseId);

'dbentry' becomes invalid after calling pgstat_vacuum_stat().

--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Re: libpq host/hostaddr/conninfo inconsistencies

2019-02-21 Thread Fabien COELHO




Hello Robert,


(2) you are not against improving the documentation, although you find it
clear enough already, but you agree that some people could get confused.

The attached patch v4 only improves the documentation so that it reflects
what the implementation really does. I think it too bad to leave out the
user-friendly aspects of the patch, though.


Robert, any chance you could opine on the doc patch, given that's your
suggested direction?


I find it to be a more change than we really need, and I'm not sure
how much it helps to clarify the issue at hand. Here is a simpler
change which seems like it might do the trick (or maybe not, let's see
what others think).


It is a minimal diff on "hostaddr" documentation which clarifies what is 
its intended use. I'm okay with it.


However, it does not discuss that an IP can (and should, IMHO) be given 
through "host" if the point is to specify the target by its IP rather than 
a lookup shortcut.


--
Fabien.

Re: WIP: Avoid creation of the free space map for small tables

2019-02-21 Thread Amit Kapila

On Thu, Feb 21, 2019 at 6:39 PM Alvaro Herrera  wrote:
>
> On 2019-Feb-21, Amit Kapila wrote:
>
> > On Wed, Feb 20, 2019 at 8:08 PM Alvaro Herrera  
> > wrote:
> > >
> > > Please remember to keep serial_schedule in sync.
> >
> > I don't understand what you mean by this?  It is already present in
> > serial_schedule.  In parallel_schedule, we are just moving this test
> > to one of the parallel groups.  Do we need to take care of something
> > else?
>
> Just to make sure it's in the same relative position.
>

Okay, thanks for the input.  Attached, find the updated patch, will
try to commit it tomorrow unless you or someone else has any other
comments.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


v5-0001-Add-more-tests-for-FSM.patch
Description: Binary data

Re: Row Level Security − leakproof-ness and performance implications

2019-02-21 Thread Pierre Ducroquet

On Wednesday, February 20, 2019 5:24:17 PM CET Tom Lane wrote:
> Pierre Ducroquet  writes:
> > For simple functions like enum_eq/enum_ne, marking them leakproof is an
> > obvious fix (patch attached to this email, including also textin/textout).
> 
> This is not nearly as "obvious" as you think.  See prior discussions,
> notably
> https://www.postgresql.org/message-id/flat/31042.1546194242%40sss.pgh.pa.us
> 
> Up to now we've taken a very strict definition of what leakproofness
> means; as Noah stated, if a function can throw errors for some inputs,
> it's not considered leakproof, even if those inputs should never be
> encountered in practice.  Most of the things we've been willing to
> mark leakproof are straight-line code that demonstrably can't throw
> any errors at all.
> 
> The previous thread seemed to have consensus that the hazards in
> text_cmp and friends were narrow enough that nobody had a practical
> problem with marking them leakproof --- but we couldn't define an
> objective policy statement that would allow making such decisions,
> so nothing's been changed as yet.  I think it is important to have
> an articulable policy about this, not just a seat-of-the-pants
> conclusion about the risk level in a particular function.
> 
>   regards, tom lane


I undestand these decisions, but it makes RLS quite fragile, with numerous un-
documented side-effects. In order to save difficulties from future users, I 
wrote this patch to the documentation, listing the biggest restrictions I hit 
with RLS so far.

Regards

 Pierre
>From 050c9777cbe417bd53a17043b24928ba5c037250 Mon Sep 17 00:00:00 2001
From: Pierre Ducroquet 
Date: Thu, 21 Feb 2019 15:46:59 +0100
Subject: [PATCH] Document some of the row-level security limitations

RLS relies a lot on marking functions (and thus operators) LEAKPROOF.
The current policy in PostgreSQL is extremely strict, not allowing
functions that could, for instance, leak the input size through a
malloc() call. While strong on the security side, this policy has
side effects that are not documented currently and make RLS much
harder to implement than simply adding policies on the tables.
---
 doc/src/sgml/ref/create_policy.sgml | 30 -
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/ref/create_policy.sgml b/doc/src/sgml/ref/create_policy.sgml
index 2e1229c4f9..372b2935ea 100644
--- a/doc/src/sgml/ref/create_policy.sgml
+++ b/doc/src/sgml/ref/create_policy.sgml
@@ -595,7 +595,35 @@ AND
user-defined functions which might not be trustworthy.  However,
functions and operators marked by the system (or the system
administrator) as LEAKPROOF may be evaluated before
-   policy expressions, as they are assumed to be trustworthy.
+   policy expressions, as they are assumed to be trustworthy. Please note that
+   marking functions as LEAKPROOF on the default pg_catalog
+   is done very carefully, thus preventing at least the following features to 
+   work or to achieve the usually expected level of performance in default 
+   settings:
+
+   
+
+ 
+  for the following types, most operators and functions are not marked
+  as LEAKPROOF: arrays, enums, ranges
+ 
+
+
+ 
+  full-text search: operators and functions are not
+  LEAKPROOF,
+ 
+
+
+ 
+  functional indexes on non-leakproof functions are not
+  considered when executing queries and enforcing policies.
+ 
+
+   
+
+   Any query using these features on a table with a policy can not use 
+   indexes other than the ones required to enforce the policy.
   
 
   
-- 
2.20.1

RE: Timeout parameters

2019-02-21 Thread MikalaiKeida

Hello, all.

> tcp_socket_timeout (integer)
> 
> Terminate and restart any session that has been idle for more than
> the specified number of milliseconds to prevent client from infinite
> waiting for server due to dead connection. This can be used both as
> a brute force global query timeout and detecting network problems.
> A value of zero (the default) turns this off.

I am not very  familiar with the PostgreSQL source code. Nevertheless, the 
main idea of this parameter is clear for me - closing a connection when 
the PostgreSQL server does not response due to any reason. However, I have 
not found  in the discussion a reference that this parameter can be 
applied to the TCP as well as to the UNIX-domain sockets. Moreover, this 
parameter works out of communication layer. When we consider TCP 
communication, the failover is covered by keep_alive and tpc_user_timeout 
parameters.

According to it, we should not use 'tcp' prefix in this parameter name, 
'socket' sub string is not suitable too.

'timeout' is OK.

This parameter works on the client side. So the word 'client' is a good 
candidate for using in this parameter name.
This parameter affects only when we send a 'query' to the pg server.

Based on it, we can build a name for this parameter 
'client_query_timeout'.

The suggested explanation of this parameter does not follow the aim of 
integrating this parameter:

client_query_timeout

Specifies the number of seconds to prevent client from infinite waiting 
for server acknowledge to the sent query due to dead connection. This can 
be used both as a force global query timeout and network problems 
detector. A value of zero (the default) turns this off.

Best regards,
Mikalai Keida

Re: libpq host/hostaddr/conninfo inconsistencies

2019-02-21 Thread Fabien COELHO




Hello Tom,


My only complaint about this is that it makes it sound like you *must*
provide "host", even when giving "hostaddr":


That is the idea, "hostaddr" is moslty never needed.

It is the initial misleading issue I've been complaining about: one can 
specify an IP *both* in host (although it is not documented) and in 
hostaddr... and when both are provided, things started being messy: the 
information displayed was plain wrong (eg telling you that you were 
connected to an IP when you were really connected to another), and one 
could not get the actual information about the current connection out of 
libpq.


A committed patch has fixed the display (\conninfo) and connection 
(\c) issues by extending libpq and being carefull about the message 
displayed or the information used to reconnect, which were basically bug 
fixes.


About host/hostaddr, my initial submission was to force "host" as the 
target (whether name, directory, or ip which already work) and "hostaddr" 
only as a lookup shortcut, which is I think its initial intended use. This 
has been rejected.


Now I'm just trying to improve the documentation so that at least it 
reflects what is done, and implicitely advise about how to use the 
features properly even if it is not enforced: Basically, a user should 
always used "host" for all purposes because it works, and that makes one 
parameter for one purpose, and "hostaddr" is only needed for lookup 
shortcut, which is basically a very rare and specialized use case.


--
Fabien.

Re: Journal based VACUUM FULL

2019-02-21 Thread Andreas Karlsson


On 2/21/19 12:16 AM, Ryan David Sheasby wrote:
I was reading on VACUUM and VACUUM FULL and saw that the current 
implementation of VACUUM FULL writes to an entirely new file and then 
switches to it, as opposed to rewriting the current file in-place. I 
assume the reason for this is safety in the event the database shuts 
down in the middle of the vacuum, the most you will lose is the progress 
of the vacuum and have to start from scratch but otherwise the database 
will retain its integrity and not become corrupted. This seems to be an 
effective but somewhat rudimentary system that could be improved. Most 
notably, the rewriting of almost the entire database takes up basically 
double the storage during the duration of the rewrite which can become 
expensive or even simply inconvenient in IaaS(and probably other) 
installations where the drive sizes are scaled on demand. Even if the 
VACUUM FULL doesn't need to run often, having to reallocate drive space 
for an effective duplication is not ideal. My suggestion is a journal 
based in-place rewrite of the database files.


Hi,

VACUUM FULL used to modify the table in-place in PostgreSQL 8.4 and 
earlier but that solution was slow and did often cause plenty of index 
bloat while moving the rows around in the table. Which is why PostgreSQL 
9.0 switched it to rewiring the whole table and its indexes.


I have not heard many requests for bringing back the old behavior, but 
I could easily have missed them. Either way I do not think there would 
be much demand for an in-place VACUUM FULL unless the index bloat 
problem is also solved.


Additionally I do not think that the project would want a whole new kind 
of infrastructure just to solve this very narrow case. PostgreSQL 
already has its own journal (the write-ahead log) which is used to 
ensure crash safety, and I think any proposed solution for this would 
need to use the WAL.


Andreas

Re: libpq host/hostaddr/conninfo inconsistencies

Fabien COELHO  writes:
>> My only complaint about this is that it makes it sound like you *must*
>> provide "host", even when giving "hostaddr":

> That is the idea, "hostaddr" is moslty never needed.

Sure, you only need it if you want to bypass DNS lookup, but if you
do that, you don't necessarily need to give "host" as well.  Changing
the documentation to imply that you do would not be an improvement.

> It is the initial misleading issue I've been complaining about: one can 
> specify an IP *both* in host (although it is not documented) and in 
> hostaddr... and when both are provided, things started being messy

Indeed, but the verbiage being suggested here actually encourages
people to do that, by falsely implying that they have to supply
both parameters.

regards, tom lane

Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits

2019-02-21 Thread Kuntal Ghosh

Hello Pavan,

Thank you for the patch. It seems to me that while performing COPY
FREEZE, if we've copied tuples in a previously emptied page, we can
set the PageSetAllVisible(page) in heap_muli_insert only. Something
like,

bool init = (ItemPointerGetOffsetNumber(&(heaptuples[ndone]->t_self))
== FirstOffsetNumber &&
 PageGetMaxOffsetNumber(page) == FirstOffsetNumber + nthispage - 1);
if (init && (options & HEAP_INSERT_FROZEN))
 PageSetAllVisible(page);

Later, you can skip the pages for
CheckAndSetAllVisibleBulkInsertState() where PD_ALL_VISIBLE flag is
already set. Do you think it's correct?


On Thu, Feb 21, 2019 at 11:35 AM Pavan Deolasee
 wrote:
>
> Hi,
>
> Jeff Janes raised an issue [1] about PD_ALL_VISIBLE not being set correctly 
> while loading data via COPY FREEZE and had also posted a draft patch.
>
> I now have what I think is a more complete patch. I took a slightly different 
> approach and instead of setting PD_ALL_VISIBLE bit initially and then not 
> clearing it during insertion, we now recheck the page for all-frozen, 
> all-visible tuples just before switching to a new page. This allows us to 
> then also mark set the visibility map bit, like we do in vacuumlazy.c
>
> Some special treatment is required to handle the last page before bulk insert 
> it shutdown. We could have chosen not to do anything special for the last 
> page and let it remain unfrozen, but I thought it makes sense to take that 
> extra effort so that we can completely freeze the table and set all VM bits 
> at the end of COPY FREEZE.
>
> Let me know what you think.
>
> Thanks,
> Pavan
>
> [1] 
> https://www.postgresql.org/message-id/CAMkU%3D1w3osJJ2FneELhhNRLxfZitDgp9FPHee08NT2FQFmz_pQ%40mail.gmail.com
>
> --
>  Pavan Deolasee   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services



-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com

Re: libpq host/hostaddr/conninfo inconsistencies

On Thu, Feb 21, 2019 at 10:33 AM Tom Lane  wrote:
> Sure, you only need it if you want to bypass DNS lookup, but if you
> do that, you don't necessarily need to give "host" as well.  Changing
> the documentation to imply that you do would not be an improvement.

>From my point of view, the issue here is that the way the
documentation is written right now tends to lead people to believe
that if they have a host name, they must pass it via host, and if they
have an IP address, they must pass it via hostaddr.  Thus pgAdmin 3,
at least, had code to check whether the input looked like an IP
address and put it into one field or the other accordingly.  And
that's stupid.

My feeling is that it doesn't really matter whether people think that
both host and hostaddr are required.  What's more important is that
they understand that whatever form of host identifier they've got can
be put in host, and that hostaddr doesn't normally need to be set at
all.  That's less clear than it could be at present.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: libpq host/hostaddr/conninfo inconsistencies

Fabien COELHO  writes:
> However, it does not discuss that an IP can (and should, IMHO) be given 
> through "host" if the point is to specify the target by its IP rather than 
> a lookup shortcut.

Ah, that's the crux of the problem.  There are two ways that you could
consider to be "best practice" for use of these parameters.  The one
that is currently documented is:

1. If you want to give a host name, put it in "host".
2. If you want to give a host IP address (to skip DNS), put it in
   "hostaddr".
3. ... unless your security arrangements require specifying a host name,
   in which case provide the host IP address in "hostaddr" and
   the host name in "host".

What Fabien is basically proposing is replacing rule 2 with

2. If you want to give a host IP address (to skip DNS), put it in
   "host".

While that would perhaps be an equally good best practice if we'd
started there, it's not clear to me that it has any advantage that
would justify changing the recommendation.  In particular, the
existing rule is a lot clearer from a data-type standpoint: host
is for names, hostaddr is for IP addresses.

In any case, the existing doco never comes out and states either
rule set in so many words.  Maybe it should.

regards, tom lane

Re: libpq host/hostaddr/conninfo inconsistencies

Robert Haas  writes:
> From my point of view, the issue here is that the way the
> documentation is written right now tends to lead people to believe
> that if they have a host name, they must pass it via host, and if they
> have an IP address, they must pass it via hostaddr.  Thus pgAdmin 3,
> at least, had code to check whether the input looked like an IP
> address and put it into one field or the other accordingly.  And
> that's stupid.

True, but isn't that because we fail to document at all that you
can put an IP address in "host"?  Which your proposed patch didn't
change, IIRC.

regards, tom lane

Re: libpq host/hostaddr/conninfo inconsistencies

On Thu, Feb 21, 2019 at 10:57 AM Tom Lane  wrote:
> True, but isn't that because we fail to document at all that you
> can put an IP address in "host"?  Which your proposed patch didn't
> change, IIRC.

Well, that's another way to tackle the problem.  Personally, I see
pretty much no downside in approaching this by encouraging people to
use only 'host' in normal cases and adding 'hostaddr' as an additional
field only when necessary, so that's the approach I took.  Now you
seem to think that it's important for people to know that they could
use 'hostaddr' without specifying 'host', but I think that's a detail
that nobody really needs to know.  I'm looking for a way to give
people a clearer suggestion that they should just use 'host' and
forget the rest.  Perhaps we could get there via what you propose
here, namely documenting that 'host' can be either a name or an IP
address, but I'm worried that it won't come through clearly enough and
that people will still get confused.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [PROPOSAL] Shared Ispell dictionaries

On Thu, Feb 21, 2019 at 8:28 AM Arthur Zakirov  wrote:
> Your approach looks simpler. It is necessary just to periodically scan
> dictionaries' cache hash table and not call dsm_pin_segment() when a DSM
> segment initialized. It also means that a dictionary is loaded into DSM
> only while there is a backend which attached the dictionary's DSM.

Right.  I think that having a central facility that tries to decide
whether or not a dictionary should be kept in shared memory or not,
e.g. based on a cache size parameter, isn't likely to work well.  The
problem is that if we make a decision that a dictionary should be
evicted because it's causing us to exceed the cache size threshold,
then we have no way to implement that decision.  We can't force other
backends to remove the mapping immediately, nor can we really bound
the time before they respond to a request to unmap it.  They might be
in the middle of using it.

So I think it's better to have each backend locally make a decision
about when that particular backend no longer needs the dictionary, and
then let the system automatically clean up the ones that are needed by
nobody.

Perhaps a better approach still would be to do what Andres proposed
back in March:

#> Is there any chance we can instead can convert dictionaries into a form
#> we can just mmap() into memory?  That'd scale a lot higher and more
#> dynamicallly?

The current approach inherently involves double-buffering: you've got
the filesystem cache containing the data read from disk, and then the
DSM containing the converted form of the data.  Having something that
you could just mmap() would avoid that, plus it would become a lot
less critical to keep the mappings around.  You could probably just
have individual queries mmap() it for as long as they need it and then
tear out the mapping when they finish executing; keeping the mappings
across queries likely wouldn't be too important in this case.

The downside is that you'd probably need to teach resowner.c about
mappings created via mmap() so that you don't leak mappings on an
abort, but that's probably not a crazy difficult problem.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: proposal: variadic argument support for least, greatest function

2019-02-21 Thread Pavel Stehule

Hi

čt 21. 2. 2019 v 3:20 odesílatel Chapman Flack 
napsal:

> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, passed
> Implements feature:   tested, passed
> Spec compliant:   not tested
> Documentation:not tested
>
> Latest patch passes installcheck-world as of d26a810 and does what it sets
> out to do.
>
> I am not sure I have an answer to the objections being raised on grounds
> of taste. To me, it's persuasive that GREATEST and LEAST are described in
> the docco as functions, they are used much like variadic functions, and
> this patch allows them to be used in the ways you would expect variadic
> functions to be usable.
>
> But as to technical readiness, this builds and passes installcheck-world.
> The functions behave as expected (and return null if passed an empty array,
> or an array containing only nulls, or variadic null::foo[]).
>
> Still no corresponding regression tests or doc.
>
> The new status of this patch is: Waiting on Author
>

I wrote doc (just one sentence) and minimal test. Both can be enhanced.

Regards

Pavel
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 86ff4e5c9e..aa4e80cca7 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -12679,6 +12679,12 @@ SELECT NULLIF(value, '(none)') ...
 
 
 LEAST(value , ...)
+
+
+GREATEST(VARIADIC anyarray)
+
+
+LEAST(VARIADIC anyarray)
 
 

@@ -12697,6 +12703,11 @@ SELECT NULLIF(value, '(none)') ...
 make them return NULL if any argument is NULL, rather than only when
 all are NULL.

+
+   
+These functions supports passing parameters as a array after
+VARIADIC keyword.
+   
   
  
 
diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index db3777d15e..01d7f0e02c 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -1854,13 +1854,18 @@ ExecInitExprRec(Expr *node, ExprState *state,
 		case T_MinMaxExpr:
 			{
 MinMaxExpr *minmaxexpr = (MinMaxExpr *) node;
-int			nelems = list_length(minmaxexpr->args);
+int			nelems;
 TypeCacheEntry *typentry;
 FmgrInfo   *finfo;
 FunctionCallInfo fcinfo;
 ListCell   *lc;
 int			off;
 
+if (minmaxexpr->args)
+	nelems = list_length(minmaxexpr->args);
+else
+	nelems = 1;
+
 /* Look up the btree comparison function for the datatype */
 typentry = lookup_type_cache(minmaxexpr->minmaxtype,
 			 TYPECACHE_CMP_PROC);
@@ -1897,16 +1902,29 @@ ExecInitExprRec(Expr *node, ExprState *state,
 scratch.d.minmax.finfo = finfo;
 scratch.d.minmax.fcinfo_data = fcinfo;
 
-/* evaluate expressions into minmax->values/nulls */
-off = 0;
-foreach(lc, minmaxexpr->args)
+if (minmaxexpr->args)
 {
-	Expr	   *e = (Expr *) lfirst(lc);
+	scratch.d.minmax.variadic_arg = false;
 
-	ExecInitExprRec(e, state,
-	&scratch.d.minmax.values[off],
-	&scratch.d.minmax.nulls[off]);
-	off++;
+	/* evaluate expressions into minmax->values/nulls */
+	off = 0;
+	foreach(lc, minmaxexpr->args)
+	{
+		Expr	   *e = (Expr *) lfirst(lc);
+
+		ExecInitExprRec(e, state,
+		&scratch.d.minmax.values[off],
+		&scratch.d.minmax.nulls[off]);
+		off++;
+	}
+}
+else
+{
+	scratch.d.minmax.variadic_arg = true;
+
+	ExecInitExprRec(minmaxexpr->variadic_arg, state,
+		&scratch.d.minmax.values[0],
+		&scratch.d.minmax.nulls[0]);
 }
 
 /* and push the final comparison */
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index a018925d4e..748c950885 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -2819,6 +2819,55 @@ ExecEvalMinMax(ExprState *state, ExprEvalStep *op)
 	/* default to null result */
 	*op->resnull = true;
 
+	if (op->d.minmax.variadic_arg)
+	{
+		ArrayIterator array_iterator;
+		ArrayType  *arr;
+		Datum	value;
+		bool	isnull;
+
+		if (nulls[0])
+			return;
+
+		arr = DatumGetArrayTypeP(values[0]);
+
+		array_iterator = array_create_iterator(arr, 0, NULL);
+		while (array_iterate(array_iterator, &value, &isnull))
+		{
+			if (isnull)
+continue;
+
+			if (*op->resnull)
+			{
+/* first nonnull input */
+*op->resvalue = value;
+*op->resnull = false;
+			}
+			else
+			{
+int			cmpresult;
+
+Assert(fcinfo->nargs == 2);
+
+/* apply comparison function */
+fcinfo->args[0].value = *op->resvalue;
+fcinfo->args[1].value = value;
+
+fcinfo->isnull = false;
+cmpresult = DatumGetInt32(FunctionCallInvoke(fcinfo));
+if (fcinfo->isnull) /* probably should not happen */
+	continue;
+
+if (cmpresult > 0 && operator == IS_LEAST)
+	*op->resvalue = value;
+else if (cmpresult < 0 && operator == IS_GREATEST)
+	*op->resvalue = value;
+			}
+		}
+
+		return;
+	}
+
 	for (off = 0; off < op->d.minmax.nelems; off++)
 	{
 		/* i

Re: reloption to prevent VACUUM from truncating empty pages at the end of relation

Hi,

On Mon, Feb 4, 2019 at 1:25 AM Tsunakawa, Takayuki
 wrote:
>
> FYI, it seems that the user sees "shrink" rather than "truncate" in the 
> documentation as below, although these are about VACUUM FULL.
>
> https://www.postgresql.org/docs/devel/sql-vacuum.html
> would like the table to physically shrink to occupy less disk space
>
> https://www.postgresql.org/docs/devel/routine-vacuuming.html
> shrink a table back to its minimum size and return the disk space to the 
> operating system,
>
>
>
> Anyway, I don't have any favor about naming this, and I hope native English 
> speakers will choose the best name.  I won't object to whatever name any 
> committer chooses.

FWIW, I prefer shrink over truncate, though I'd rather go with
vacuum_shink_enabled as suggested previously.

The patch still applies cleanly and works as intended.  About:

+ * shrink_enabled can be set at ShareUpdateExclusiveLock because it
+ * is only used during VACUUM, which uses a ShareUpdateExclusiveLock,
+ * so the VACUUM will not be affected by in-flight changes. Changing its
+ * value has no affect until the next VACUUM, so no need for stronger lock.

I'm not sure that I get this comment.  Since both require a
ShareUpdateExclusiveLock, you can't change the parameter while a
VACUUM is active on that table.  Did you wanted to use another lock
mode?

Re: Delay locking partitions during INSERT and UPDATE

On Wed, Feb 20, 2019 at 4:56 PM David Rowley
 wrote:
> I've made a pass over this again and updated the header comments in
> functions that now obtain a lock to mention that fact.

Thanks.  I have committed this version.  I know Tomas Vondra was
planning to do that, but it's been close to a month since he mentioned
those plans so I hope he will not mind me jumping in.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Delay locking partitions during INSERT and UPDATE

2019-02-21 Thread Tomas Vondra

On 2/21/19 5:33 PM, Robert Haas wrote:
> On Wed, Feb 20, 2019 at 4:56 PM David Rowley
>  wrote:
>> I've made a pass over this again and updated the header comments in
>> functions that now obtain a lock to mention that fact.
> 
> Thanks.  I have committed this version.  I know Tomas Vondra was
> planning to do that, but it's been close to a month since he mentioned
> those plans so I hope he will not mind me jumping in.
> 

Fine with me. It was mostly a sign that I'd like to get that committed
eventually, trying to nudge others into looking at the patch a bit more.
Relaxing locks is tricky, and I wasn't quite sure I haven't missed
something obvious.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [patch] Add schema total size to psql \dn+

Le 21/02/2019 à 12:01, Julien Rouhaud a écrit :
> On Thu, Feb 21, 2019 at 11:49 AM Gilles Darold  
> wrote:
>>> When we want to get total size of all relation in a schema we have to
>>> execute one of our favorite DBA query. It  is quite simple but what
>>> about displaying schema size when using \dn+ in psql ?
>>> [...]
>>> The attached simple patch adds this feature. Is there any cons adding
>>> this information? The patch tries to be compatible to all PostgreSQL
>>> version. Let me know if I have missed something.
> I needed that quite often, so I'm +1 to add this!  Please register
> this patch on the next commitfest.


Added to next commitfest.


-- 
Gilles Darold
Consultant PostgreSQL
http://dalibo.com - http://dalibo.org

Re: ToDo: show size of partitioned table

2019-02-21 Thread Pavel Stehule

čt 21. 2. 2019 v 0:56 odesílatel Justin Pryzby 
napsal:

> On Sat, Feb 16, 2019 at 10:52:35PM +0100, Pavel Stehule wrote:
> > I like your changes. I merged all - updated patch is attached
>
> I applied and tested your v10 patch.
>
> Find attached some light modifications.
>

I have not any objection - I cannot to evaluate. It is question for some
native speaker.

I am not sure if we use labels in form "some: detail" somewhere.

Regards

Pavel

>
> Thanks,
> Justin
>

Re: Unnecessary checks for new rows by some RI trigger functions?

Robert Haas  writes:
> On Wed, Feb 20, 2019 at 9:27 AM Antonin Houska  wrote:
>> However I find it confusing that the trigger functions pass 
>> detectNewRows=true
>> even if they only execute SELECT statement.

> I don't quite see what those two things have to do with each other,
> but I might be missing something.

Me too.  It's quite easy to demonstrate that the second part of Antonin's
proposed patch (ie, don't check for new rows in the FK table during
ri_restrict) is wrong:

regression=# create table p(f1 int primary key);
CREATE TABLE
regression=# create table f(f1 int references p on delete no action deferrable 
initially deferred);
CREATE TABLE
regression=# insert into p values (1);
INSERT 0 1
regression=# begin transaction isolation level serializable ;
BEGIN
regression=# table f;
 f1 

(0 rows)

regression=# delete from p where f1=1;
DELETE 1

-- now, in another session:

regression=# insert into f values (1);
INSERT 0 1

-- next, back in first session:

regression=# commit;
ERROR:  update or delete on table "p" violates foreign key constraint 
"f_f1_fkey" on table "f"
DETAIL:  Key (f1)=(1) is still referenced from table "f".

With the patch, the first session's commit succeeds, and we're left
with a violated FK constraint.

I'm less sure about whether the change in ri_Check_Pk_Match would be
safe.  On its face, what that is on about is the case where you have
matching rows in p and f, and a serializable transaction tries to
delete the p row, and by the time it commits some other transaction
has inserted a new p row so we could allow our deletion to succeed.
If you try that, however, you'll notice that the other transaction
can't commit its insertion because we haven't committed our deletion,
so the unique constraint on the PK side would be violated.  So maybe
there's a case for simplifying the logic there.  Or maybe there are
actually live cases for that with more complex MATCH rules than what
I tried.

But TBH I think Antonin's question is backwards: the right thing to
be questioning here is whether detectNewRows = false is *ever* OK.
I think he mischaracterized what that option really does; the comments
in ri_PerformCheck explain it this way:

 * In READ COMMITTED mode, we just need to use an up-to-date regular
 * snapshot, and we will see all rows that could be interesting. But in
 * transaction-snapshot mode, we can't change the transaction snapshot. If
 * the caller passes detectNewRows == false then it's okay to do the query
 * with the transaction snapshot; otherwise we use a current snapshot, and
 * tell the executor to error out if it finds any rows under the current
 * snapshot that wouldn't be visible per the transaction snapshot.

The question therefore is under what scenarios is it okay for an FK
constraint to be enforced (in a serializable or repeatable-read
transaction) without attention to operations already committed by
concurrent transactions?  If your gut answer isn't "damn near none",
I don't think you're being paranoid enough ;-)

The one case where we use detectNewRows == false today is in
RI_FKey_check, which we run after an insert or update on the FK
table to see if there's a matching row on the PK side.  In this
case, failing to observe a newly-added PK row won't result in a
constraint violation, just an "unnecessary" error.  I think this
choice is reasonable, since if you're running in serializable
mode you're not supposed to want to depend on transactions
committed after you start.  (Note that if somebody concurrently
*deletes* the PK row we need to match, we will notice that,
because the SELECT FOR SHARE will, and then it'll throw a
serialization error which seems like the right thing here.)

Perhaps a similar argument can be constructed to justify changing
the behavior of ri_Check_Pk_Match, but I've not thought about it
very hard.  In any case, changing ri_restrict is provably wrong.

> Unfortunately, as you probably also noticed, making the change
> you propose here doesn't make any tests fail in either the main
> regression tests or the isolation tests.

Yeah.  All the RI code predates the existence of the isolation
test machinery, so it's not so surprising that we don't have good
coverage for this type of situation.  I thought I remembered that
Peter had added some more FK tests recently, but I don't see
anything in the commit log ...

regards, tom lane

Re: ATTACH/DETACH PARTITION CONCURRENTLY

On Mon, Feb 4, 2019 at 12:54 PM Robert Haas  wrote:
> On Mon, Feb 4, 2019 at 12:02 AM David Rowley
>  wrote:
> > If the PartitionDesc from the parallel worker has an extra partition
> > than what was there when the plan was built then the partition index
> > to subplan index translation will be incorrect as the
> > find_matching_subplans_recurse() will call get_matching_partitions()
> > using the context with the PartitionDesc containing the additional
> > partition. The return value from get_matching_partitions() is fine,
> > it's just that the code inside the while ((i =
> > bms_next_member(partset, i)) >= 0) loop that will do the wrong thing.
> > It could even crash if partset has an index out of bounds of the
> > subplan_map or subpart_map arrays.
>
> Is there any chance you've missed the fact that in one of the later
> patches in the series I added code to adjust the subplan_map and
> subpart_map arrays to compensate for any extra partitions?

In case that wasn't clear enough, my point here is that while the
leader and workers could end up with different ideas about the shape
of the PartitionDesc, each would end up with a subplan_map and
subpart_map array adapted to the view of the PartitionDesc with which
they ended up, and therefore, I think, everything should work.  So far
there is, to my knowledge, no situation in which a PartitionDesc index
gets passed between one backend and another, and as long as we don't
do that, it's not really necessary for them to agree; each backend
needs to individually ignore any concurrently added partitions not
contemplated by the plan, but it doesn't matter whether backend A and
backend B agree on which partitions were concurrently added, just that
each ignores the ones it knows about.

Since time is rolling along here, I went ahead and committed 0001
which seems harmless even if somebody finds a huge problem with some
other part of this.  If anybody wants to review the approach or the
code before I proceed further, that would be great, but please speak
up soon.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: list append syntax for postgresql.conf

On Wed, Feb 20, 2019 at 10:15 AM Peter Eisentraut
 wrote:
> Nowadays there are a number of methods for composing multiple
> postgresql.conf files for modularity.  But if you have a bunch of things
> you want to load via shared_preload_libraries, you have to put them all
> in one setting.  How about some kind of syntax for appending something
> to a list, like
>
> shared_preload_libraries += 'pg_stat_statements'
>
> Thoughts?

I like the idea of solving this problem but I'm not sure it's a good
idea to add this sort of syntax to postgresql.conf.  I think it would
be more useful to provide a way to tell ALTER SYSTEM that you want to
append rather than replace, as I see Euler also proposes.

Another off-ball idea is to maybe allow something like
shared_preload_statements.pg_stat_statments = 1.  The server would
load all libraries X for which shared_preload_statements.X is set to a
value that is one of the ways we spell a true value for a Boolean GUC.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

psql show URL with help

2019-02-21 Thread Peter Eisentraut

As mentioned on

https://www.cybertec-postgresql.com/en/looking-at-mysql-8-with-postgresql-goggles-on/

how about this:

=> \h analyze
Command: ANALYZE
Description: collect statistics about a database
Syntax:
ANALYZE [ ( option [, ...] ) ] [ table_and_columns [, ...] ]
ANALYZE [ VERBOSE ] [ table_and_columns [, ...] ]

where option can be one of:

VERBOSE
SKIP_LOCKED

and table_and_columns is:

table_name [ ( column_name [, ...] ) ]

URL: https://www.postgresql.org/docs/12/sql-analyze.html


(Won't actually work because the web site isn't serving "12" URLs yet,
but that's something that could probably be sorted out.)

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [patch] Add schema total size to psql \dn+

On Thu, Feb 21, 2019 at 5:42 PM Gilles Darold  wrote:
>
> Le 21/02/2019 à 12:01, Julien Rouhaud a écrit :
> > On Thu, Feb 21, 2019 at 11:49 AM Gilles Darold  
> > wrote:
> >>> When we want to get total size of all relation in a schema we have to
> >>> execute one of our favorite DBA query. It  is quite simple but what
> >>> about displaying schema size when using \dn+ in psql ?
> >>> [...]
> >>> The attached simple patch adds this feature. Is there any cons adding
> >>> this information? The patch tries to be compatible to all PostgreSQL
> >>> version. Let me know if I have missed something.

I have a few comments about the patch.

You're using pg_class LEFT JOIN pg_namespace while we need INNER JOIN
here AFAICT.  Also, you're using pg_relation_size(), so fsm, vm won't
be accounted for.  You should also be bypassing the size for 8.0-
servers where there's no pg_*_size() functions.

Re: list append syntax for postgresql.conf

Robert Haas  writes:
> I like the idea of solving this problem but I'm not sure it's a good
> idea to add this sort of syntax to postgresql.conf.  I think it would
> be more useful to provide a way to tell ALTER SYSTEM that you want to
> append rather than replace, as I see Euler also proposes.

> Another off-ball idea is to maybe allow something like
> shared_preload_statements.pg_stat_statments = 1.  The server would
> load all libraries X for which shared_preload_statements.X is set to a
> value that is one of the ways we spell a true value for a Boolean GUC.

Either one of these would avoid the problem of having more than one
place-of-definition of a GUC value, so I find them attractive.
The second one seems a tad more modular, perhaps, and it's also
easier to figure out how to undo a change.

A possible objection to the second one is that it binds us forevermore
to allowing setting of GUCs that weren't previously known, which is
something I don't much care for.  But it can probably be constrained
enough to remove the formal indeterminacy: we can insist that all
GUCs named "shared_preload_modules.something" will be booleans, and
then we'll throw error if any of them don't correspond to a known
loadable library.

regards, tom lane

Re: psql show URL with help

2019-02-21 Thread Pavel Stehule

čt 21. 2. 2019 v 18:28 odesílatel Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> napsal:

> As mentioned on
>
>
> https://www.cybertec-postgresql.com/en/looking-at-mysql-8-with-postgresql-goggles-on/
>
> how about this:
>
> => \h analyze
> Command: ANALYZE
> Description: collect statistics about a database
> Syntax:
> ANALYZE [ ( option [, ...] ) ] [ table_and_columns [, ...] ]
> ANALYZE [ VERBOSE ] [ table_and_columns [, ...] ]
>
> where option can be one of:
>
> VERBOSE
> SKIP_LOCKED
>
> and table_and_columns is:
>
> table_name [ ( column_name [, ...] ) ]
>
> URL: https://www.postgresql.org/docs/12/sql-analyze.html
> 
>
> (Won't actually work because the web site isn't serving "12" URLs yet,
> but that's something that could probably be sorted out.)
>

Why not? It can be useful

Pavel


> --
> Peter Eisentraut  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
>

Re: psql show URL with help

On Thu, Feb 21, 2019 at 6:33 PM Pavel Stehule  wrote:
>
> čt 21. 2. 2019 v 18:28 odesílatel Peter Eisentraut 
>  napsal:
>>
>> As mentioned on
>>
>> https://www.cybertec-postgresql.com/en/looking-at-mysql-8-with-postgresql-goggles-on/
>>
>> how about this:
>>
>> => \h analyze
>> Command: ANALYZE
>> Description: collect statistics about a database
>> Syntax:
>> ANALYZE [ ( option [, ...] ) ] [ table_and_columns [, ...] ]
>> ANALYZE [ VERBOSE ] [ table_and_columns [, ...] ]
>>
>> where option can be one of:
>>
>> VERBOSE
>> SKIP_LOCKED
>>
>> and table_and_columns is:
>>
>> table_name [ ( column_name [, ...] ) ]
>>
>> URL: https://www.postgresql.org/docs/12/sql-analyze.html
>> 
>>
>> (Won't actually work because the web site isn't serving "12" URLs yet,
>> but that's something that could probably be sorted out.)
>
>
> Why not? It can be useful

Or just use the devel version for not-released-yet versions?

Re: [PROPOSAL] Shared Ispell dictionaries

Robert Haas  writes:
> Perhaps a better approach still would be to do what Andres proposed
> back in March:

> #> Is there any chance we can instead can convert dictionaries into a form
> #> we can just mmap() into memory?  That'd scale a lot higher and more
> #> dynamicallly?

That seems awfully attractive.  I was about to question whether we could
assume that mmap() works everywhere, but it's required by SUSv2 ... and
if anybody has anything sufficiently lame for it not to work, we could
fall back on malloc-a-hunk-of-memory-and-read-in-the-file.

We'd need a bunch of work to design a position-independent binary
representation for dictionaries, and then some tool to produce disk files
containing that, so this isn't exactly a quick route to a solution.
On the other hand, it isn't sounding like the current patch is getting
close to committable either.

(Actually, I guess you need a PI representation of a dictionary to
put it in a DSM either, so presumably that part of the work is
done already; although we might also wish for architecture independence
of the disk files, which we probably don't have right now.)

regards, tom lane

Re: Using old master as new replica after clean switchover

2019-02-21 Thread RSR999GMAILCOM

Is there any link where  the required setup and the step by step procedure
for performing the controlled switchover are listed?

Thanks
Raj

On Tue, Feb 19, 2019 at 4:44 PM Michael Paquier  wrote:

> On Tue, Feb 19, 2019 at 04:27:02PM -0800, RSR999GMAILCOM wrote:
> > So  wanted to clarify if this procedure really requires the WAL archive
> > location on a shared storage ?
>
> Shared storage for WAL archives is not a requirement.  It is perfectly
> possible to use streaming replication to get correct WAL changes.
> Using an archive is recommended for some deployments and depending on
> your requirements and data retention policy, still you could have
> those archives on a different host and have the restore_command of the
> standbyt in recovery or the archive_command of the primary save the
> segments to it.  Depending on the frequency new WAL segments are
> generated, this depends of course.
> --
> Michael
>

Re: [PROPOSAL] Shared Ispell dictionaries

2019-02-21 Thread Andres Freund




On February 21, 2019 10:08:00 AM PST, Tom Lane  wrote:
>Robert Haas  writes:
>> Perhaps a better approach still would be to do what Andres proposed
>> back in March:
>
>> #> Is there any chance we can instead can convert dictionaries into a
>form
>> #> we can just mmap() into memory?  That'd scale a lot higher and
>more
>> #> dynamicallly?
>
>That seems awfully attractive.  I was about to question whether we
>could
>assume that mmap() works everywhere, but it's required by SUSv2 ... and
>if anybody has anything sufficiently lame for it not to work, we could
>fall back on malloc-a-hunk-of-memory-and-read-in-the-file.
>
>We'd need a bunch of work to design a position-independent binary
>representation for dictionaries, and then some tool to produce disk
>files
>containing that, so this isn't exactly a quick route to a solution.
>On the other hand, it isn't sounding like the current patch is getting
>close to committable either.
>
>(Actually, I guess you need a PI representation of a dictionary to
>put it in a DSM either, so presumably that part of the work is
>done already; although we might also wish for architecture independence
>of the disk files, which we probably don't have right now.)

That's what I was pushing for ages ago...
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Re: Using old master as new replica after clean switchover

2019-02-21 Thread Claudio Freire

On Tue, Feb 19, 2019 at 9:44 PM Michael Paquier  wrote:
>
> On Tue, Feb 19, 2019 at 04:27:02PM -0800, RSR999GMAILCOM wrote:
> > So  wanted to clarify if this procedure really requires the WAL archive
> > location on a shared storage ?
>
> Shared storage for WAL archives is not a requirement.  It is perfectly
> possible to use streaming replication to get correct WAL changes.
> Using an archive is recommended for some deployments and depending on
> your requirements and data retention policy, still you could have
> those archives on a different host and have the restore_command of the
> standbyt in recovery or the archive_command of the primary save the
> segments to it.  Depending on the frequency new WAL segments are
> generated, this depends of course.

If I'm not mistaken, if you don't have WAL archive set up (a shared
filesystem isn't necessary, but the standby has to be able to restore
WAL segments from the archive), a few transactions that haven't been
streamed at primary shutdown could be lost, since the secondary won't
be able to stream anything after the primary has shut down. WAL
archive can always be restored even without a primary running, hence
why a WAL archive is needed.

Or am I missing something?

Re: Journal based VACUUM FULL

2019-02-21 Thread Ryan David Sheasby

Thanks for getting back to me. I had a small discussion with @sfrost on the
slack team and understand the issue better now. I must admit I didn't
realize that the scope of WAL extended to VACUUM operations which is why I
suggested a new journaling system. I realize now the issue is not safety(as
the WAL already sorts out that issue), but performance. I will rethink my
suggestion and let you know if I come up with a useful/performant way of
doing this.


*ThanksRyan Sheasby*


On Thu, Feb 21, 2019 at 5:27 PM Andreas Karlsson  wrote:

> On 2/21/19 12:16 AM, Ryan David Sheasby wrote:
> > I was reading on VACUUM and VACUUM FULL and saw that the current
> > implementation of VACUUM FULL writes to an entirely new file and then
> > switches to it, as opposed to rewriting the current file in-place. I
> > assume the reason for this is safety in the event the database shuts
> > down in the middle of the vacuum, the most you will lose is the progress
> > of the vacuum and have to start from scratch but otherwise the database
> > will retain its integrity and not become corrupted. This seems to be an
> > effective but somewhat rudimentary system that could be improved. Most
> > notably, the rewriting of almost the entire database takes up basically
> > double the storage during the duration of the rewrite which can become
> > expensive or even simply inconvenient in IaaS(and probably other)
> > installations where the drive sizes are scaled on demand. Even if the
> > VACUUM FULL doesn't need to run often, having to reallocate drive space
> > for an effective duplication is not ideal. My suggestion is a journal
> > based in-place rewrite of the database files.
>
> Hi,
>
> VACUUM FULL used to modify the table in-place in PostgreSQL 8.4 and
> earlier but that solution was slow and did often cause plenty of index
> bloat while moving the rows around in the table. Which is why PostgreSQL
> 9.0 switched it to rewiring the whole table and its indexes.
>
> I have not heard many requests for bringing back the old behavior, but
> I could easily have missed them. Either way I do not think there would
> be much demand for an in-place VACUUM FULL unless the index bloat
> problem is also solved.
>
> Additionally I do not think that the project would want a whole new kind
> of infrastructure just to solve this very narrow case. PostgreSQL
> already has its own journal (the write-ahead log) which is used to
> ensure crash safety, and I think any proposed solution for this would
> need to use the WAL.
>
> Andreas
>

Re: Compressed TOAST Slicing

2019-02-21 Thread Paul Ramsey

On Wed, Feb 20, 2019 at 1:12 PM Stephen Frost  wrote:
>
> * Paul Ramsey (pram...@cleverelephant.ca) wrote:
> > On Wed, Feb 20, 2019 at 10:50 AM Daniel Verite  
> > wrote:
> > >
> > > What about starts_with(string, prefix)?
> > Thanks, I'll add that.
>
> That sounds good to me, I look forward to an updated patch.
> once the patch has been updated to catch
> this case, I'll review it (again) with an eye on committing it soon.

Merci! Attached are updated patches.


compressed-datum-slicing-20190221a.patch
Description: Binary data


compressed-datum-slicing-left-20190221a.patch
Description: Binary data

Re: restrict pg_stat_ssl to superuser?

2019-02-21 Thread Peter Eisentraut

On 2019-02-21 09:11, Michael Paquier wrote:
> On Wed, Feb 20, 2019 at 11:51:08AM +0100, Peter Eisentraut wrote:
>> So here is a patch doing it the "normal" way of nulling out all the rows
>> the user shouldn't see.
> 
> That looks fine to me.

Committed, thanks.

>> I haven't found any documentation of these access restrictions in the
>> context of pg_stat_activity.  Is there something that I'm not seeing or
>> something that should be added?
> 
> Yes, there is nothing.  I agree that it would be good to mention that
> some fields are set to NULL and only visible to superusers or members of
> pg_read_all_stats with a note for pg_stat_activity and
> pg_stat_get_activity().  Here is an idea:
> "Backend and SSL information are restricted to superusers and members
> of the pg_read_all_stats role. Access may be
> granted to others using GRANT.
> 
> Getting that back-patched would be nice where pg_read_all_stats was
> introduced.

I added something.  I don't know if it's worth backpatching.  This
situation goes back all the way to when pg_stat_activity was added.
pg_read_all_stats does have documentation, it's just not linked from
everywhere.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [patch] Add schema total size to psql \dn+

Le 21/02/2019 à 18:28, Julien Rouhaud a écrit :
> On Thu, Feb 21, 2019 at 5:42 PM Gilles Darold  
> wrote:
>> Le 21/02/2019 à 12:01, Julien Rouhaud a écrit :
>>> On Thu, Feb 21, 2019 at 11:49 AM Gilles Darold  
>>> wrote:
> When we want to get total size of all relation in a schema we have to
> execute one of our favorite DBA query. It  is quite simple but what
> about displaying schema size when using \dn+ in psql ?
> [...]
> The attached simple patch adds this feature. Is there any cons adding
> this information? The patch tries to be compatible to all PostgreSQL
> version. Let me know if I have missed something.
> I have a few comments about the patch.
>
> You're using pg_class LEFT JOIN pg_namespace while we need INNER JOIN
> here AFAICT.  Also, you're using pg_relation_size(), so fsm, vm won't
> be accounted for.  You should also be bypassing the size for 8.0-
> servers where there's no pg_*_size() functions.


I agree all points. Attached is a new version of the patch that use
pg_total_relation_size() and a filter on relkind IN ('r','m','S'), JOIN
fixes and no size report before 8.1.


Thanks for the review.

-- 
Gilles Darold
Consultant PostgreSQL
http://dalibo.com - http://dalibo.org

diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 4da6719ce7..2fed622889 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -4188,6 +4188,26 @@ listSchemas(const char *pattern, bool verbose, bool showSystem)
 	{
 		appendPQExpBufferStr(&buf, ",\n  ");
 		printACLColumn(&buf, "n.nspacl");
+		if (pset.sversion >= 80100)
+		{
+			/* As of PostgreSQL 9.3, use LATERAL JOIN */
+			if (pset.sversion >= 93000)
+			{
+appendPQExpBuffer(&buf, ",\n  schema_size AS \"%s\"",
+			  gettext_noop("Size"));
+			}
+			else
+			{
+appendPQExpBuffer(&buf,
+			  ",\n  ((SELECT pg_catalog.pg_size_pretty((sum(pg_catalog.pg_total_relation_size(c.oid)))::bigint) FROM pg_catalog.pg_class c\nJOIN pg_catalog.pg_namespace s ON s.oid = c.relnamespace WHERE s.nspname = n.nspname AND c.relkind IN (%s,%s,%s))) AS \"%s\"",
+			  gettext_noop("Size"),
+			  CppAsString2(RELKIND_RELATION),
+			  CppAsString2(RELKIND_MATVIEW),
+			  CppAsString2(RELKIND_SEQUENCE)
+			  );
+			}
+		}
+
 		appendPQExpBuffer(&buf,
 		  ",\n  pg_catalog.obj_description(n.oid, 'pg_namespace') AS \"%s\"",
 		  gettext_noop("Description"));
@@ -4195,6 +4215,16 @@ listSchemas(const char *pattern, bool verbose, bool showSystem)
 
 	appendPQExpBuffer(&buf,
 	  "\nFROM pg_catalog.pg_namespace n\n");
+	/* As of PostgreSQL 9.3, use LATERAL JOIN */
+	if (pset.sversion >= 93000)
+	{
+		appendPQExpBuffer(&buf,
+		  "  JOIN LATERAL (\nSELECT pg_catalog.pg_size_pretty(sum(pg_catalog.pg_total_relation_size(c.oid))::bigint) As \"schema_size\"\nFROM pg_catalog.pg_class c\n  JOIN pg_catalog.pg_namespace s ON s.oid = c.relnamespace\nWHERE s.nspname = n.nspname AND c.relkind IN (%s,%s,%s)\n  ) l ON true\n",
+			  CppAsString2(RELKIND_RELATION),
+			  CppAsString2(RELKIND_MATVIEW),
+			  CppAsString2(RELKIND_SEQUENCE)
+		  );
+	}
 
 	if (!showSystem && !pattern)
 		appendPQExpBufferStr(&buf,

Re: Delay locking partitions during INSERT and UPDATE

On Fri, 22 Feb 2019 at 05:33, Robert Haas  wrote:
>
> On Wed, Feb 20, 2019 at 4:56 PM David Rowley
>  wrote:
> > I've made a pass over this again and updated the header comments in
> > functions that now obtain a lock to mention that fact.
>
> Thanks.  I have committed this version.  I know Tomas Vondra was
> planning to do that, but it's been close to a month since he mentioned
> those plans so I hope he will not mind me jumping in.

Great. Thanks for doing that, and thanks to everyone who reviewed it too.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] WIP: Aggregation push-down

Antonin Houska  writes:
> Michael Paquier  wrote:
>> Latest patch set fails to apply, so moved to next CF, waiting on
>> author.

> Rebased.

This is in need of rebasing again :-(.  I went ahead and pushed the 001
part, since that seemed fairly uncontroversial.  (Note that I changed
estimate_hashagg_tablesize's result type to double on the way; you
probably want to make corresponding adjustments in your patch.)

I did not spend a whole lot of time looking at the patch today, but
I'm still pretty distressed at the data structures you've chosen.
I remain of the opinion that a grouped relation and a base relation
are, er, unrelated, even if they happen to share the same relid set.
So I do not see the value of the RelOptInfoSet struct you propose here,
and I definitely don't think there's any value in having, eg,
create_seqscan_path or create_index_path dealing with this stuff.

I also don't like changing create_nestloop_path et al to take a PathTarget
rather than using the RelOptInfo's pathtarget; IMO, it's flat out wrong
for a path to generate a tlist different from what its parent RelOptInfo
says that the relation produces.

I think basically the way this ought to work is that we generate baserel
paths pretty much the same as today, and then we run through the baserels
and see which ones are potentially worth doing partial aggregation on,
and for each one that is, we create a separate "upper relation" RelOptInfo
describing that.  The paths for this RelOptInfo would be
partial-aggregation paths using paths from the corresponding baserel as
input.  Then we'd run a join search that only considers joining grouped
rels with plain rels (I concur that joining two grouped rels is not worth
coping with, at least for now).

regards, tom lane

Re: WIP: Avoid creation of the free space map for small tables

2019-02-21 Thread Alvaro Herrera

I think this test is going to break on nonstandard block sizes.  While
we don't promise that all tests work on such installs (particularly
planner ones), it seems fairly easy to cope with this one -- just use a
record size expressed as a fraction of current_setting('block_size').
So instead of "1024" you'd write current_setting('block_size') / 8.
And then display the relation size in terms of pages, not bytes, so
divide pg_relation_size by block size.

(I see that there was already a motion for this, but was dismissed
because of lack of interest.)

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: NOT IN subquery optimization

On Thu, 21 Feb 2019 at 16:27, Jim Finnerty  wrote:
> We can always correctly transform a NOT IN to a correlated NOT EXISTS.  In
> almost all cases it is more efficient to do so.  In the one case that we've
> found that is slower it does come down to a more general costing issue, so
> that's probably the right way to think about it.

I worked on this over 4 years ago [1]. I think the patch there is not
completely broken and seems just to need a few things fixed. I rebased
it on top of current master and looked at it. I think the main
remaining issue is fixing the code that ensures the outer side join
quals can't be NULL.  The code that's there looks broken still since
it attempts to use quals from any inner joined rel for proofs that
NULLs will be removed.  That might not work so well in a case like:
SELECT * FROM t1 LEFT JOIN t2 ON t1.a = t2.a AND t2.b NOT IN(select b
from t3),  however, I'd need to think harder about that since if there
was such a qual then the planner should convert the left join into an
inner join. But anyway, the function expressions_are_not_nullable()
was more intended to work with targetlists to ensure exprs there can't
be NULL. I just had done a poor job of trying to modify that into
allowing it to take exprs from any random place, likely that should be
a new function and expressions_are_not_nullable() should be put back
to what Tom ended up with.

I've attached the rebased and still broken version.

[1] 
https://www.postgresql.org/message-id/CAApHDvqRB-iFBy68%3DdCgqS46aRep7AuN2pou4KTwL8kX9YOcTQ%40mail.gmail.com

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

not_in_anti_join_v1.0.patch
Description: Binary data

Re: [patch] Add schema total size to psql \dn+

2019-02-21 Thread Jerry Sievers

Gilles Darold  writes:

> Le 21/02/2019 à 18:28, Julien Rouhaud a écrit :
>
>> On Thu, Feb 21, 2019 at 5:42 PM Gilles Darold  
>> wrote:
>>> Le 21/02/2019 à 12:01, Julien Rouhaud a écrit :
 On Thu, Feb 21, 2019 at 11:49 AM Gilles Darold  
 wrote:
>> When we want to get total size of all relation in a schema we have to
>> execute one of our favorite DBA query. It  is quite simple but what
>> about displaying schema size when using \dn+ in psql ?
>> [...]
>> The attached simple patch adds this feature. Is there any cons adding
>> this information? The patch tries to be compatible to all PostgreSQL
>> version. Let me know if I have missed something.
>> I have a few comments about the patch.
>>
>> You're using pg_class LEFT JOIN pg_namespace while we need INNER JOIN
>> here AFAICT.  Also, you're using pg_relation_size(), so fsm, vm won't
>> be accounted for.  You should also be bypassing the size for 8.0-
>> servers where there's no pg_*_size() functions.
>
>
> I agree all points. Attached is a new version of the patch that use
> pg_total_relation_size() and a filter on relkind IN ('r','m','S'), JOIN
> fixes and no size report before 8.1.

Beware that those pg_relation_size() functions are going to block in
cases where existing objects are (for example) in transactionss such
as...

begin;
truncate foo;
big-nasty-reporting-jobs...;

Thus a bare-metal tallying of pg_class.relpages for heap/index/toast,
along with missing the FSM/VM size could be $preferred.

And/or at least mentioning this caveat in the related manual section :-)

FWIW



>
>
> Thanks for the review.

-- 
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consult...@comcast.net

Re: ATTACH/DETACH PARTITION CONCURRENTLY

On Tue, 5 Feb 2019 at 01:54, Robert Haas  wrote:
>
> On Mon, Feb 4, 2019 at 12:02 AM David Rowley
>  wrote:
> > If the PartitionDesc from the parallel worker has an extra partition
> > than what was there when the plan was built then the partition index
> > to subplan index translation will be incorrect as the
> > find_matching_subplans_recurse() will call get_matching_partitions()
> > using the context with the PartitionDesc containing the additional
> > partition. The return value from get_matching_partitions() is fine,
> > it's just that the code inside the while ((i =
> > bms_next_member(partset, i)) >= 0) loop that will do the wrong thing.
> > It could even crash if partset has an index out of bounds of the
> > subplan_map or subpart_map arrays.
>
> Is there any chance you've missed the fact that in one of the later
> patches in the series I added code to adjust the subplan_map and
> subpart_map arrays to compensate for any extra partitions?

I admit that I hadn't looked at the patch, I was just going on what I
had read here.   I wasn't sure how the re-map would have been done as
some of the information is unavailable during execution, but I see now
that you're modified it so we send a list of Oids that we expect and
remap based on if an unexpected Oid is found.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: proposal: variadic argument support for least, greatest function

Pavel Stehule  writes:
> čt 21. 2. 2019 v 3:20 odesílatel Chapman Flack 
> napsal:
>> I am not sure I have an answer to the objections being raised on grounds
>> of taste. To me, it's persuasive that GREATEST and LEAST are described in
>> the docco as functions, they are used much like variadic functions, and
>> this patch allows them to be used in the ways you would expect variadic
>> functions to be usable.

> I wrote doc (just one sentence) and minimal test. Both can be enhanced.

I remain of the opinion that this patch is a mess.

I don't share Pavel's opinion that this is a clean user API, though
I'll grant that others might have different opinions on that.
I could hold my nose and overlook that if it led to a clean internal
implementation.  But it doesn't: this patch just bolts a huge,
undocumented wart onto the side of MinMaxExpr.  (The arguments are
in the args field, except when they aren't?  And everyplace that
deals with MinMaxExpr needs to know that, as well as the fact that
the semantics are totally different?  Ick.)

An example of the lack of care here is that the change in struct
ExprEvalStep breaks that struct's size constraint:

 * Inline data for the operation.  Inline data is faster to access, but
 * also bloats the size of all instructions.  The union should be kept to
 * no more than 40 bytes on 64-bit systems (so that the entire struct is
 * no more than 64 bytes, a single cacheline on common systems).

Andres is going to be quite displeased if that gets committed ;-).

I still say we should reject this and invent array_greatest/array_least
functions instead.

regards, tom lane

Re: proposal: variadic argument support for least, greatest function

2019-02-21 Thread David Fetter

On Thu, Feb 21, 2019 at 04:04:41PM -0500, Tom Lane wrote:
> I still say we should reject this and invent array_greatest/array_least
> functions instead.

Might other array_* functions of this type be in scope for this patch?

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: proposal: variadic argument support for least, greatest function

David Fetter  writes:
> On Thu, Feb 21, 2019 at 04:04:41PM -0500, Tom Lane wrote:
>> I still say we should reject this and invent array_greatest/array_least
>> functions instead.

> Might other array_* functions of this type be in scope for this patch?

Uh ... no, I wouldn't expect that.  Why would we insist on more
functionality than is there now?  (I'm only arguing about how we
present the functionality, not what it does.)

regards, tom lane

Re: [patch] Add schema total size to psql \dn+

Le 21/02/2019 à 21:57, Jerry Sievers a écrit :
> Gilles Darold  writes:
>
>> Le 21/02/2019 à 18:28, Julien Rouhaud a écrit :
>>
>>> On Thu, Feb 21, 2019 at 5:42 PM Gilles Darold  
>>> wrote:
 Le 21/02/2019 à 12:01, Julien Rouhaud a écrit :
> On Thu, Feb 21, 2019 at 11:49 AM Gilles Darold  
> wrote:
>>> When we want to get total size of all relation in a schema we have to
>>> execute one of our favorite DBA query. It  is quite simple but what
>>> about displaying schema size when using \dn+ in psql ?
>>> [...]
>>> The attached simple patch adds this feature. Is there any cons adding
>>> this information? The patch tries to be compatible to all PostgreSQL
>>> version. Let me know if I have missed something.
>>> I have a few comments about the patch.
>>>
>>> You're using pg_class LEFT JOIN pg_namespace while we need INNER JOIN
>>> here AFAICT.  Also, you're using pg_relation_size(), so fsm, vm won't
>>> be accounted for.  You should also be bypassing the size for 8.0-
>>> servers where there's no pg_*_size() functions.
>>
>> I agree all points. Attached is a new version of the patch that use
>> pg_total_relation_size() and a filter on relkind IN ('r','m','S'), JOIN
>> fixes and no size report before 8.1.
> Beware that those pg_relation_size() functions are going to block in
> cases where existing objects are (for example) in transactionss such
> as...
>
> begin;
> truncate foo;
> big-nasty-reporting-jobs...;
>
> Thus a bare-metal tallying of pg_class.relpages for heap/index/toast,
> along with missing the FSM/VM size could be $preferred.
>
> And/or at least mentioning this caveat in the related manual section :-)


It's true but we already have this caveats with \d+ or \dt+. They are
interactive commands so they can be canceled if they takes too long time.


I've attached the v4 of the patch that adds psql documentation update
for the \dn command to add on-disk report in verbose mode. Thanks for
the reminder :-)


-- 
Gilles Darold
Consultant PostgreSQL
http://dalibo.com - http://dalibo.org

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index d7539ae743..4234fed01f 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -1594,7 +1594,8 @@ testdb=>
 By default, only user-created objects are shown; supply a
 pattern or the S modifier to include system objects.
 If + is appended to the command name, each object
-is listed with its associated permissions and description, if any.
+is listed with its associated permissions, on-disk size and description,
+if any.
 
 
   
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 4da6719ce7..2fed622889 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -4188,6 +4188,26 @@ listSchemas(const char *pattern, bool verbose, bool showSystem)
 	{
 		appendPQExpBufferStr(&buf, ",\n  ");
 		printACLColumn(&buf, "n.nspacl");
+		if (pset.sversion >= 80100)
+		{
+			/* As of PostgreSQL 9.3, use LATERAL JOIN */
+			if (pset.sversion >= 93000)
+			{
+appendPQExpBuffer(&buf, ",\n  schema_size AS \"%s\"",
+			  gettext_noop("Size"));
+			}
+			else
+			{
+appendPQExpBuffer(&buf,
+			  ",\n  ((SELECT pg_catalog.pg_size_pretty((sum(pg_catalog.pg_total_relation_size(c.oid)))::bigint) FROM pg_catalog.pg_class c\nJOIN pg_catalog.pg_namespace s ON s.oid = c.relnamespace WHERE s.nspname = n.nspname AND c.relkind IN (%s,%s,%s))) AS \"%s\"",
+			  gettext_noop("Size"),
+			  CppAsString2(RELKIND_RELATION),
+			  CppAsString2(RELKIND_MATVIEW),
+			  CppAsString2(RELKIND_SEQUENCE)
+			  );
+			}
+		}
+
 		appendPQExpBuffer(&buf,
 		  ",\n  pg_catalog.obj_description(n.oid, 'pg_namespace') AS \"%s\"",
 		  gettext_noop("Description"));
@@ -4195,6 +4215,16 @@ listSchemas(const char *pattern, bool verbose, bool showSystem)
 
 	appendPQExpBuffer(&buf,
 	  "\nFROM pg_catalog.pg_namespace n\n");
+	/* As of PostgreSQL 9.3, use LATERAL JOIN */
+	if (pset.sversion >= 93000)
+	{
+		appendPQExpBuffer(&buf,
+		  "  JOIN LATERAL (\nSELECT pg_catalog.pg_size_pretty(sum(pg_catalog.pg_total_relation_size(c.oid))::bigint) As \"schema_size\"\nFROM pg_catalog.pg_class c\n  JOIN pg_catalog.pg_namespace s ON s.oid = c.relnamespace\nWHERE s.nspname = n.nspname AND c.relkind IN (%s,%s,%s)\n  ) l ON true\n",
+			  CppAsString2(RELKIND_RELATION),
+			  CppAsString2(RELKIND_MATVIEW),
+			  CppAsString2(RELKIND_SEQUENCE)
+		  );
+	}
 
 	if (!showSystem && !pattern)
 		appendPQExpBufferStr(&buf,

Re: [PATCH v20] GSSAPI encryption support

2019-02-21 Thread Robbie Harwood

Stephen Frost  writes:

> * Robbie Harwood (rharw...@redhat.com) wrote:
>> Stephen Frost  writes:
>>> * Robbie Harwood (rharw...@redhat.com) wrote:
 Stephen Frost  writes:
> * Robbie Harwood (rharw...@redhat.com) wrote:
>
>> Attached please find version 20 of the GSSAPI encryption support.
>> This has been rebased onto master (thanks Stephen for calling out
>> ab69ea9).
>
> I've looked over this again and have been playing with it
> off-and-on for a while now.  One thing I was actually looking at
> implementing before we push to commit this was to have similar
> information to what SSL provides on connection, specifically what
> printSSLInfo() prints by way of cipher, bits, etc for the
> client-side and then something like pg_stat_ssl for the server
> side.
>
> I went ahead and added a printGSSENCInfo(), and then
> PQgetgssSASLSSF(), which calls down into
> gss_inquire_sec_context_by_oid() for GSS_C_SEC_CONTEXT_SASL_SSF to
> get the bits (which also required including gssapi_ext.h), and now
> have psql producing:
>
> I don't think these things are absolutely required to push the
> patch forward, just to be clear, but it's helping my confidence
> level, at least, and would make it closer to on-par with our
> current SSL implementation.  They also seem, well, reasonably
> straight-forward to add and quite useful.

 Auditability is definitely reasonable.  I think there are a couple
 additional wrinkles discussed above, but we can probably get them
 sorted.
>>>
>>> Fantastic.  Do you think you'll have time to work through some of
>>> the above with me over the next couple of weeks?  I was just
>>> starting to look into adding things into the beentry to hold
>>> information on the server side.
>> 
>> Sure!  I'll go ahead and hack up the checks and lucid stuff and get
>> back to you.
>
> Great!  I'll finish fleshing out the basics of how to have this work
> server-side and once the checks and lucid stuff are in on the psql
> side, it should be pretty straight-forward to copy that same
> information into beentry alongside the SSL info, and expose it through
> pg_stat_get_activity() into a pg_stat_gss view.

When the mech is gss_mech_krb5 under MIT krb5:

psql (12devel)
GSSAPI encrypted connection (krb5 using aes256-cts-hmac-sha384-192)
Type "help" for help.

And the same under Heimdal:

psql (12devel)
GSSAPI encrypted connection (krb5 using aes256-cts-hmac-sha1-96)
Type "help" for help

If the mech weren't gss_krb5, or Lucid introspection failed, but we're a
SASL-aware mech (and using MIT):

psql (12devel)
GSSAPI encrypted connection (~256 bits)
Type "help" for help.

The third case (no lucid, no SASL SSF):

psql (12devel)
GSSAPI encrypted connection (unknown mechanism)
Type "help" for help.

Feel free to tweak these strings as needed.

I've also adjusted the layering somewhat and moved the actual printf()
call down into fe-secure-gssapi.c  I don't know whether this model makes
more sense in the long run, but for PoC code it was marginally easier to
reason about.

Another thing I've been thinking about here is whether the SASL SSF
logic is useful.  It'll only get invoked when the mechanism isn't
gss_mech_krb5, and only under MIT.  SPNEGO (krb5), NTLM (gss-ntlmssp),
IAKERB (krb5), and EAP (moonshot) all don't support
GSS_C_SEC_CONTEXT_SASL_SSF; I actually couldn't come up with another
mechanism that does.  I defer to you on whether to remove that, though.

I did some testing with Heimdal since we're using some extensions here,
and found out that the current build machinery doesn't support Heimdal.
(The configure.in detection logic only seems to work for MIT and
Solaris.)  However, it's possible to explicitly pass in CFLAGS/LDFLAGS
and it worked prior to my changes, so I've preserved that behavior.

Finally, as this patch touches configure.in, configure needs to be
regenerated; I'm not sure what project policy is on when that happens
(and it produces rather a lot of churn on my machine).

Patch attached after the break; apply on top of -20.

Thanks,
--Robbie

signature.asc
Description: PGP signature
>From 0f32efdb9b201a3b2d25d3fc41c9758790c84b93 Mon Sep 17 00:00:00 2001
From: Robbie Harwood 
Date: Wed, 20 Feb 2019 17:23:06 -0500
Subject: [PATCH] Log encryption strength in libpq GSSAPI connections

---
 configure.in|  20 
 src/bin/psql/command.c  |   2 +
 src/include/pg_config.h.in  |   6 ++
 src/interfaces/libpq/exports.txt|   1 +
 src/interfaces/libpq/fe-secure-gssapi.c | 127 
 src/interfaces/libpq/fe-secure.c|  11 ++
 src/interfaces/libpq/libpq-fe.h |   3 +
 7 files changed, 170 insertions(+)

diff --git a/configure.in b/configure.in
index 89a0fb2470..be011778f3 100644
--- a/configure.in
+++ b/configure.in
@@ -1191,6 +1191,26 @@ if test "$

Re: Protect syscache from bloating with negative cache entries

2019-02-21 Thread 'Bruce Momjian'

On Tue, Feb 19, 2019 at 07:08:14AM +, Tsunakawa, Takayuki wrote:
> We all have to manage things within resource constraints.  The DBA
> wants to make sure the server doesn't overuse memory to avoid crash
> or slowdown due to swapping.  Oracle does it, and another open source
> database, MySQL, does it too.  PostgreSQL does it with shared_buffers,
> wal_buffers, and work_mem (within a single session).  Then, I thought
> it's natural to do it with catcache/relcache/plancache.

I already addressed these questions in an email from Feb 14:

https://www.postgresql.org/message-id/20190214154955.gb19...@momjian.us

I understand the operational needs of limiting resources in some cases,
but there is also the history of OS's using working set to allocate
things, which didn't work too well:

https://en.wikipedia.org/wiki/Working_set

I think we need to address the most pressing problem of unlimited cache size
bloat and then take a holistic look at all memory allocation.  If we
are going to address that in a global way, I don't see the relation
cache as the place to start.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +

Re: Vectors instead of lists, specialised qsort(), binary_search(), unique()

2019-02-21 Thread Thomas Munro

On Thu, Feb 21, 2019 at 7:27 PM David Rowley
 wrote:
> As for the ArrayList patch that I'd been working on, I was a bit
> blocked on it due to Tom's comment in [1], after having quickly looked
> over your patch I see there's no solution for that complaint.
>
> (I'd been trying to think of a solution to this as a sort of idle
> background task and so far I'd only come up with a sort of List
> indexing system, where we could build additional data structures atop
> of the list and have functions like list_nth() check for such an index
> and use it if available.   I don't particularly like the idea, but it
> was the only way I could think of to work around Tom's complaint.  I
> felt there was just too much list API code that relies on the list
> being a linked list.  e.g checking for empty lists with list == NIL.
> lnext() etc.)
>
> So in short, I'm in favour of not just having braindead linked lists
> all over the place to store things. I will rejoice the day we move
> away from that, but also Tom's comment kinda stuck with me.  He's
> often right too.   Likely backpatching pain that Tom talks about would
> be much less if we used C++, but... we don't.

Right, in this patch-set I wasn't really focused on how to replace the
existing lists in a back-patch friendly style (a very legitimate
concern), I was focused on how to get the best possible machine code
for basic data structures and algorithms that are used all over the
place, by inlining as many known-at-compile-time parameters as
possible.  And using the right data-structures in the first place by
making them easily available; I guess that's the bit where your ideas
and mine overlapped.  I'm also interested in skipping gratuitous
allocation and pointer chasing, even in cases where eg linked lists
might not be "wrong" according to the access pattern, just because
it's cache-destructive and a waste of cycles.  As Alexander Stepanov
said: "Use vectors whenever you can. If you cannot use vectors,
redesign your solution so that you can use vectors", and I think there
is also something interesting about keeping objects inside in their
containing object, instead of immediately using pointers to somewhere
else (obviously tricky with variable sized data, but that's what the
small vector optimisation idea is about; whether it's worth it after
the overhead of detecting the case, I don't know; std::string
implementors seem to think so).  I was primarily interested in new
code, though I'm pretty sure there are places all over the tree where
microoptimisations are possible through specialisation (think
partitioning, sorting, searching, uniquifying in cases where the types
are fixed, or vary at runtime but there are some common cases you
might want to specialise for).

For the cases you're interested in, maybe piecemeal replication of the
planner lists that are measurably very hot is the only practical way
to do it, along with evidence that it's really worth the future
backpatching pain in each case.  Or maybe there is some way to create
a set of macros that map to List operations in back branches, as you
were hinting at, to keep client code identical...  I dunno.

-- 
Thomas Munro
https://enterprisedb.com

Re: psql show URL with help

2019-02-21 Thread David Fetter

On Thu, Feb 21, 2019 at 06:28:09PM +0100, Peter Eisentraut wrote:
> As mentioned on
> 
> https://www.cybertec-postgresql.com/en/looking-at-mysql-8-with-postgresql-goggles-on/
> 
> how about this:
> 
> => \h analyze
> Command: ANALYZE
> Description: collect statistics about a database
> Syntax:
> ANALYZE [ ( option [, ...] ) ] [ table_and_columns [, ...] ]
> ANALYZE [ VERBOSE ] [ table_and_columns [, ...] ]
> 
> where option can be one of:
> 
> VERBOSE
> SKIP_LOCKED
> 
> and table_and_columns is:
> 
> table_name [ ( column_name [, ...] ) ]
> 
> URL: https://www.postgresql.org/docs/12/sql-analyze.html
> 
> 
> (Won't actually work because the web site isn't serving "12" URLs yet,
> but that's something that could probably be sorted out.)

Since there's no longer any mystery as to what the upcoming major
version of PostgreSQL will be, it should be pretty straightforward to
redirect integers > max(released version) to the devel docs.

This could cause some confusion because we branch long (feature- and
bug-wise) before we do the release, and a checkout from immediately
after branching will be *very* different from one just before the
release. The way I've come up with to clear this up is *way* too
expensive: lazily create doc builds for each SHA1 requested.  Pointing
people at the latest devel docs is less confusing than pointing them
at nothing or at the previous version, those being, as far as I can
tell, the viable alternatives.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: [PATCH v20] GSSAPI encryption support

2019-02-21 Thread Stephen Frost

Greetings,

* Robbie Harwood (rharw...@redhat.com) wrote:
> Stephen Frost  writes:
> > * Robbie Harwood (rharw...@redhat.com) wrote:
> >> Sure!  I'll go ahead and hack up the checks and lucid stuff and get
> >> back to you.
> >
> > Great!  I'll finish fleshing out the basics of how to have this work
> > server-side and once the checks and lucid stuff are in on the psql
> > side, it should be pretty straight-forward to copy that same
> > information into beentry alongside the SSL info, and expose it through
> > pg_stat_get_activity() into a pg_stat_gss view.
> 
> When the mech is gss_mech_krb5 under MIT krb5:
> 
> psql (12devel)
> GSSAPI encrypted connection (krb5 using aes256-cts-hmac-sha384-192)
> Type "help" for help.
> 
> And the same under Heimdal:
> 
> psql (12devel)
> GSSAPI encrypted connection (krb5 using aes256-cts-hmac-sha1-96)
> Type "help" for help
> 
> If the mech weren't gss_krb5, or Lucid introspection failed, but we're a
> SASL-aware mech (and using MIT):
> 
> psql (12devel)
> GSSAPI encrypted connection (~256 bits)
> Type "help" for help.
> 
> The third case (no lucid, no SASL SSF):
> 
> psql (12devel)
> GSSAPI encrypted connection (unknown mechanism)
> Type "help" for help.
> 
> Feel free to tweak these strings as needed.

That all looks fantastic!  Do you see any reason to not say:

GSSAPI encrypted connection (SASL SSF: ~256 bits)

instead, since that's what we are technically reporting there?

> I've also adjusted the layering somewhat and moved the actual printf()
> call down into fe-secure-gssapi.c  I don't know whether this model makes
> more sense in the long run, but for PoC code it was marginally easier to
> reason about.

No, I think we need to provide a way for libpq-using applications to get
at that information easily..

> Another thing I've been thinking about here is whether the SASL SSF
> logic is useful.  It'll only get invoked when the mechanism isn't
> gss_mech_krb5, and only under MIT.  SPNEGO (krb5), NTLM (gss-ntlmssp),
> IAKERB (krb5), and EAP (moonshot) all don't support
> GSS_C_SEC_CONTEXT_SASL_SSF; I actually couldn't come up with another
> mechanism that does.  I defer to you on whether to remove that, though.

Oh, hmm, I see.  Yeah, since there's no case where that could actually
end up being used today then perhaps it makes sense to remove it- it's
not a terribly good interface anyway since it doesn't provide the
actual encryption algorithm, I had just gone down that route because I
saw how to.  The lucid stuff is much better. :)

> I did some testing with Heimdal since we're using some extensions here,
> and found out that the current build machinery doesn't support Heimdal.

Hah!

> (The configure.in detection logic only seems to work for MIT and
> Solaris.)  However, it's possible to explicitly pass in CFLAGS/LDFLAGS
> and it worked prior to my changes, so I've preserved that behavior.

Alright.  That seems like an independent change, if we decide to make
it easier for people to build with heimdal, but there hasn't been much
call for it, clearly.

> Finally, as this patch touches configure.in, configure needs to be
> regenerated; I'm not sure what project policy is on when that happens
> (and it produces rather a lot of churn on my machine).

There's some magic there due to vendor changes to autoconf, as I recall,
which is likely why you're seeing a lot of churn.

> Patch attached after the break; apply on top of -20.

Ok.  I'm pretty amazed at how little code it was to do..  Is there
somewhere that these functions are publicly documented and how they can
be used from a GSSAPI handle is documented?

Thanks a lot for this work!

Stephen


signature.asc
Description: PGP signature

Re: proposal: variadic argument support for least, greatest function

2019-02-21 Thread Chapman Flack

On 02/21/19 11:31, Pavel Stehule wrote:
> I wrote doc (just one sentence) and minimal test. Both can be enhanced.

Attaching minmax_variadic-20190221b.patch, identical but for
s/supports/support/ and s/a/an/ in the doc.

Regards,
-Chap
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 86ff4e5c9e..aa4e80cca7 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -12679,6 +12679,12 @@ SELECT NULLIF(value, '(none)') ...
 
 
 LEAST(value , ...)
+
+
+GREATEST(VARIADIC anyarray)
+
+
+LEAST(VARIADIC anyarray)
 
 

@@ -12697,6 +12703,11 @@ SELECT NULLIF(value, '(none)') ...
 make them return NULL if any argument is NULL, rather than only when
 all are NULL.

+
+   
+These functions support passing parameters as an array after
+VARIADIC keyword.
+   
   
  
 
diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index db3777d15e..01d7f0e02c 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -1854,13 +1854,18 @@ ExecInitExprRec(Expr *node, ExprState *state,
 		case T_MinMaxExpr:
 			{
 MinMaxExpr *minmaxexpr = (MinMaxExpr *) node;
-int			nelems = list_length(minmaxexpr->args);
+int			nelems;
 TypeCacheEntry *typentry;
 FmgrInfo   *finfo;
 FunctionCallInfo fcinfo;
 ListCell   *lc;
 int			off;
 
+if (minmaxexpr->args)
+	nelems = list_length(minmaxexpr->args);
+else
+	nelems = 1;
+
 /* Look up the btree comparison function for the datatype */
 typentry = lookup_type_cache(minmaxexpr->minmaxtype,
 			 TYPECACHE_CMP_PROC);
@@ -1897,16 +1902,29 @@ ExecInitExprRec(Expr *node, ExprState *state,
 scratch.d.minmax.finfo = finfo;
 scratch.d.minmax.fcinfo_data = fcinfo;
 
-/* evaluate expressions into minmax->values/nulls */
-off = 0;
-foreach(lc, minmaxexpr->args)
+if (minmaxexpr->args)
 {
-	Expr	   *e = (Expr *) lfirst(lc);
+	scratch.d.minmax.variadic_arg = false;
 
-	ExecInitExprRec(e, state,
-	&scratch.d.minmax.values[off],
-	&scratch.d.minmax.nulls[off]);
-	off++;
+	/* evaluate expressions into minmax->values/nulls */
+	off = 0;
+	foreach(lc, minmaxexpr->args)
+	{
+		Expr	   *e = (Expr *) lfirst(lc);
+
+		ExecInitExprRec(e, state,
+		&scratch.d.minmax.values[off],
+		&scratch.d.minmax.nulls[off]);
+		off++;
+	}
+}
+else
+{
+	scratch.d.minmax.variadic_arg = true;
+
+	ExecInitExprRec(minmaxexpr->variadic_arg, state,
+		&scratch.d.minmax.values[0],
+		&scratch.d.minmax.nulls[0]);
 }
 
 /* and push the final comparison */
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index a018925d4e..748c950885 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -2819,6 +2819,55 @@ ExecEvalMinMax(ExprState *state, ExprEvalStep *op)
 	/* default to null result */
 	*op->resnull = true;
 
+	if (op->d.minmax.variadic_arg)
+	{
+		ArrayIterator array_iterator;
+		ArrayType  *arr;
+		Datum	value;
+		bool	isnull;
+
+		if (nulls[0])
+			return;
+
+		arr = DatumGetArrayTypeP(values[0]);
+
+		array_iterator = array_create_iterator(arr, 0, NULL);
+		while (array_iterate(array_iterator, &value, &isnull))
+		{
+			if (isnull)
+continue;
+
+			if (*op->resnull)
+			{
+/* first nonnull input */
+*op->resvalue = value;
+*op->resnull = false;
+			}
+			else
+			{
+int			cmpresult;
+
+Assert(fcinfo->nargs == 2);
+
+/* apply comparison function */
+fcinfo->args[0].value = *op->resvalue;
+fcinfo->args[1].value = value;
+
+fcinfo->isnull = false;
+cmpresult = DatumGetInt32(FunctionCallInvoke(fcinfo));
+if (fcinfo->isnull) /* probably should not happen */
+	continue;
+
+if (cmpresult > 0 && operator == IS_LEAST)
+	*op->resvalue = value;
+else if (cmpresult < 0 && operator == IS_GREATEST)
+	*op->resvalue = value;
+			}
+		}
+
+		return;
+	}
+
 	for (off = 0; off < op->d.minmax.nelems; off++)
 	{
 		/* ignore NULL inputs */
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index e15724bb0e..e8ad89799f 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -1942,6 +1942,7 @@ _copyMinMaxExpr(const MinMaxExpr *from)
 	COPY_SCALAR_FIELD(inputcollid);
 	COPY_SCALAR_FIELD(op);
 	COPY_NODE_FIELD(args);
+	COPY_NODE_FIELD(variadic_arg);
 	COPY_LOCATION_FIELD(location);
 
 	return newnode;
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 31499eb798..6623c7800d 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -635,6 +635,7 @@ _equalMinMaxExpr(const MinMaxExpr *a, const MinMaxExpr *b)
 	COMPARE_SCALAR_FIELD(inputcollid);
 	COMPARE_SCALAR_FIELD(op);
 	COMPARE_NODE_FIELD(args);
+	COMPARE_NODE_FIELD(variadic_arg);
 	COMPARE_LOCATION_FIELD(location);
 
 	return true;
diff --git a/src

Optimization of some jsonb functions

2019-02-21 Thread Nikita Glukhov


Attached set of patches with some jsonb optimizations that were made during
comparison of performance of ordinal jsonb operators and jsonpath operators.

1. Optimize JsonbExtractScalar():
   It is better to use getIthJsonbValueFromContainer(cont, 0) instead of
   JsonIterator to get 0th element of raw-scalar pseudoarray.
   JsonbExtractScalar() is used in jsonb casts, so they speed up a bit.

2. Optimize operator #>>, jsonb_each_text(), jsonb_array_elements_text():
   These functions have direct conversion (JsonbValue => text) only for
   jbvString scalars, but indirect conversion of other types of scalars
   (JsonbValue => jsonb => text) is obviously too slow.  Extracted common
   subroutine JsonbValueAsText() and used in all suitable places.

3. Optimize JsonbContainer type recognition in get_jsonb_path_all():
   Fetching of the first token from JsonbIterator is replaced with lightweight
   JsonbContainerIsXxx() macros.

4. Extract findJsonbKeyInObject():
   Extracted findJsonbKeyInObject() from findJsonbValueFromContainer(),
   which is slightly easier to use (key string and its length is passed instead
   of filled string JsonbValue).

5. Optimize resulting value allocation in findJsonbValueFromContainer() and
   getIthJsonbValueFromContainer():
   Added ability to pass stack-allocated JsonbValue that will be filled with
   the result of operation instead of returning unconditionally palloc()ated
   JsonbValue.

Patches #4 and #5 are mostly refactorings, but they can give small speedup
(up to 5% for upcoming jsonpath operators) due to elimination of unnecessary
palloc()s.  The whole interface of findJsonbValueFromContainer() with JB_OBJECT
and JB_ARRAY flags always seemed a bit strange to me, so I think it is worth to
have separate functions for searching keys in objects and elements in arrays.


Performance tests:
 - Test data for {"x": {"y": {"z": i}}}:
   CREATE TABLE t AS
   SELECT jsonb_build_object('x',
jsonb_build_object('y',
  jsonb_build_object('z', i))) js
   FROM generate_series(1, 300) i;

 - Sample query:
   EXPLAIN (ANALYZE) SELECT js -> 'x' -> 'y' -> 'z' FROM t;

 - Results:
|   execution time, ms
 query  |  master  |   optimized
---
  {"x": {"y": {"z": i}}}
 js #> '{x,y,z}'| 1148.632 | 1005.578 -10%
 js #>> '{x,y,z}'   | 1520.160 |  849.991 -40%
 (js #> '{x,y,z}')::numeric | 1310.881 | 1067.752 -20%
 (js #>> '{x,y,z}')::numeric| 1757.179 | 1109.495 -30%

 js -> 'x' -> 'y' -> 'z'| 1030.211 |  977.267
 js -> 'x' -> 'y' ->> 'z'   |  887.101 |  838.745
 (js -> 'x' -> 'y' -> 'z')::numeric | 1184.086 | 1050.462
 (js -> 'x' -> 'y' -> 'z')::int4| 1279.315 | 1133.032
 (js -> 'x' -> 'y' ->> 'z')::numeric| 1134.003 | 1100.047
 (js -> 'x' -> 'y' ->> 'z')::int4   | 1077.216 |  991.995

 js ? 'x'   |  523.111 |  495.387
 js ?| '{x,y,z}'|  612.880 |  607.455
 js ?& '{x,y,z}'|  674.786 |  643.987
 js -> 'x' -> 'y' ? 'z' |  712.623 |  698.588
 js @> '{"x": {"y": {"z": 1}}}' | 1154.926 | 1149.069

jsonpath:
 js @@ '$.x.y.z == 123' |  973,444 |   912,08  -5%

  {"x": i, "y": i, "z": i}
 jsonb_each(js) | 2281.577 | 2262.660
 jsonb_each_text(js)| 2603.539 | 2112.200 -20%
 
   [i, i, i]

 jsonb_array_elements(js)   | 1255.210 | 1205.939
 jsonb_array_elements(js)::numeric  | 1662.550 | 1576.227  -5%
 jsonb_array_elements_text(js)  | 1555.021 | 1067.031 -30%

 js @> '1'  |  798.858 |  768.664  -4%
 js <@ '[1,2,3]'|  820.795 |  785.086  -5%
 js <@ '[0,1,2,3,4,5,6,7,8,9]'  | 1214.170 | 1165.289  -5%


As it can be seen, #> operators are always slower than equivalent series of ->.
I think it is caused by array deconstruction in "jsonb #> text[]".

--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

>From 031c74c2c8de9f38cc484b55fb2a5f11279c9bb8 Mon Sep 17 00:00:00 2001
From: Nikita Glukhov 
Date: Thu, 21 Feb 2019 02:52:24 +0300
Subject: [PATCH 1/5] Optimize JsonbExtractScalar()

---
 src/backend/utils/adt/jsonb.c | 25 +++--
 1 file changed, 3 insertions(+), 22 deletions(-)

diff --git a/src/backend/utils/adt/jsonb.c b/src/backend/utils/adt/jsonb.c
index c02c856..7e9e99f 100644

Re: boolean and bool in documentation

2019-02-21 Thread Bruce Momjian

On Tue, Feb 19, 2019 at 12:56:19AM +0900, Masahiko Sawada wrote:
> AFAICS there seems not to be explicit rules and policies for usage of
> 'boolean' and 'bool'. We use 'bool' for colum data type of almost
> system catalogs but use 'boolean' for some catalogs (pg_pltemplate and
> pg_policy). The same is true for functions and system views. Is it
> better to unify them into 'boolean' for consistency and so as not
> unnecessarily confuse users? FYI the name of boolean type is 'boolean'
> in the standard.

Yes, I think so.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +

Re: Removing unneeded self joins

Alexander Kuzmenkov  writes:
> Here is a rebased version with some bugfixes.

I noticed this had bit-rotted again.  I've not really reviewed it, but
I rebased it up to HEAD, and fixed a couple small things:

* My compiler was bitching about misplaced declarations, so I moved
some variable declarations accordingly.  I couldn't help noticing
that many of those wouldn't have been a problem in the first place
if you were following project style for loops around list_delete_cell
calls, which usually look more like this:

prev = NULL;
for (cell = list_head(root->rowMarks); cell; cell = next)
{
PlanRowMark *rc = (PlanRowMark *) lfirst(cell);

next = lnext(cell);
if (rt_fetch(rc->rti, root->parse->rtable)->rtekind == RTE_RESULT)
root->rowMarks = list_delete_cell(root->rowMarks, cell, prev);
else
prev = cell;
}

* I saw you had a problem with an existing test in join.sql that
was being optimized away because it used an ill-advised self-join.
I've pushed a fix for that, so it's not a problem as of HEAD.

I notice though that there's one unexplained plan change remaining
in join.out:

@@ -4365,11 +4365,13 @@ explain (costs off)
 select p.* from
   (parent p left join child c on (p.k = c.k)) join parent x on p.k = x.k
   where p.k = 1 and p.k = 2;
-QUERY PLAN
---
+   QUERY PLAN   
+
  Result
One-Time Filter: false
-(2 rows)
+   ->  Index Scan using parent_pkey on parent x
+ Index Cond: (k = 1)
+(4 rows)
 
 -- bug 5255: this is not optimizable by join removal
 begin;

That sure looks like a bug.  I don't have time to look for the
cause right now.

I also noticed that the test results show that when a table
is successfully optimized away, the remaining reference seems
to have the alias of the second reference not the first one.
That seems a little ... weird.  It's just cosmetic of course, but
why is that?

Also, I did notice that you'd stuck a declaration for
"struct UniqueIndexInfo" into paths.h, which then compelled you
to include that header in planmain.h.  This seems like poor style;
I'd have been inclined to put the struct in pathnodes.h instead.
That's assuming you need it at all -- in those two usages, seems
like it'd be just about as easy to return two separate Lists.
On the other hand, given

+ * unique_for_rels - list of (Relids, UniqueIndexInfo*) lists, where Relids
+ * is a set of other rels for which this one has been proven
+ * unique, and UniqueIndexInfo* stores information about the
+ * index that makes it unique, if any.

I wonder why you didn't include the Relids into UniqueIndexInfo as well
... and maybe make it a proper Node so that unique_for_rels could be
printed by outfuncs.c.  So any way I slice it, it seems like this data
structure could use more careful contemplation.

Anyway, updated patch attached.

regards, tom lane

diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 3434219..86c9453 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3583,7 +3583,8 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
  * relation_has_unique_index_for
  *	  Determine whether the relation provably has at most one row satisfying
  *	  a set of equality conditions, because the conditions constrain all
- *	  columns of some unique index.
+ *	  columns of some unique index. If index_info is not null, it is set to
+ *	  point to a new UniqueIndexInfo containing the index and conditions.
  *
  * The conditions can be represented in either or both of two ways:
  * 1. A list of RestrictInfo nodes, where the caller has already determined
@@ -3604,7 +3605,8 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
 bool
 relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
 			  List *restrictlist,
-			  List *exprlist, List *oprlist)
+			  List *exprlist, List *oprlist,
+			  UniqueIndexInfo **index_info)
 {
 	ListCell   *ic;
 
@@ -3660,6 +3662,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
 	{
 		IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
 		int			c;
+		List *matched_restrictlist = NIL;
 
 		/*
 		 * If the index is not unique, or not immediately enforced, or if it's
@@ -3708,6 +3711,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
 if (match_index_to_operand(rexpr, c, ind))
 {
 	matched = true; /* column is unique */
+	matched_restrictlist = lappend(matched_restrictlist, rinfo);
 	break;
 }
 			}
@@ -3750,7 +3754,22 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
 
 		/* Matched all key columns of this index? */
 		if (c == ind->nkeycolumns)
+		{
+			if (index_info != NULL)
+			{
+/* This may be called

RE: Timeout parameters

Hi,

> > tcp_socket_timeout (integer)
> >
> > Terminate and restart any session that has been idle for more than
> > the specified number of milliseconds to prevent client from infinite
> > waiting for server due to dead connection. This can be used both as
> > a brute force global query timeout and detecting network problems.
> > A value of zero (the default) turns this off.

On Friday, February 22, 2019 12:01 AM (GMT+9), 
mikalaike...@ibagroup.eu
wrote:
> I am not very  familiar with the PostgreSQL source code. Nevertheless,
> the main idea of this parameter is clear for me - closing a connection
> when the PostgreSQL server does not response due to any reason. However,
> I have not found  in the discussion a reference that this parameter can
> be applied to the TCP as well as to the UNIX-domain sockets. Moreover,
> this parameter works out of communication layer. When we consider TCP
> communication, the failover is covered by keep_alive and tpc_user_timeout
> parameters.
>
> According to it, we should not use 'tcp' prefix in this parameter name,
> 'socket' sub string is not suitable too.
>
> 'timeout' is OK.
>
> This parameter works on the client side. So the word 'client' is a good
> candidate for using in this parameter name.
> This parameter affects only when we send a 'query' to the pg server.

#Shookedt. Yeah, there should be no “tcp”in that parameter, so it was
my mistake.
Regarding whether we use the “socket_timeout” or “client_query_timeout”,
why is socket not suitable?
Although I’m not arguing against client_query_timeout as it’s also
a good candidate parameter name.

> Based on it, we can build a name for this parameter 'client_query_timeout'.
>
> The suggested explanation of this parameter does not follow the aim of
> integrating this parameter:
>
> client_query_timeout
>
> Specifies the number of seconds to prevent client from infinite waiting
> for server acknowledge to the sent query due to dead connection. This
> can be used both as a force global query timeout and network problems
> detector. A value of zero (the default) turns this off.

Thanks for the fix. In the client side though, other parameters are specified
in milliseconds, so I used the same.

Regards,
Kirk Jamison

Re: boolean and bool in documentation

Bruce Momjian  writes:
> On Tue, Feb 19, 2019 at 12:56:19AM +0900, Masahiko Sawada wrote:
>> AFAICS there seems not to be explicit rules and policies for usage of
>> 'boolean' and 'bool'. We use 'bool' for colum data type of almost
>> system catalogs but use 'boolean' for some catalogs (pg_pltemplate and
>> pg_policy). The same is true for functions and system views. Is it
>> better to unify them into 'boolean' for consistency and so as not
>> unnecessarily confuse users? FYI the name of boolean type is 'boolean'
>> in the standard.

> Yes, I think so.

FWIW, I'm not excited about this.  We accept "bool" and "boolean"
interchangeably, and it does not seem like an improvement for the
docs to use only one form.  By that argument, somebody should go
through the docs and nuke every usage of "::" in favor of
SQL-standard CAST(...) notation, every usage of "float8"
in favor of DOUBLE PRECISION, every usage of "timestamptz" in
favor of the long form, etc etc.

regards, tom lane

Removal of duplicate variable declarations in fe-connect.c

2019-02-21 Thread Haribabu Kommi

Hi Hackers,

During the development of another feature, I found that same local
variables are declared twice.
IMO, there is no need of again declaring the local variables. Patch
attached.

Regards,
Haribabu Kommi
Fujitsu Australia


0001-Removal-of-duplicate-local-variable-declaration.patch
Description: Binary data

Re: proposal: variadic argument support for least, greatest function

2019-02-21 Thread Chapman Flack

The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:tested, passed

On 02/21/19 16:04, Tom Lane wrote:
> Pavel Stehule  writes:
>> I wrote doc (just one sentence) and minimal test. Both can be enhanced.

* dutifully submits review saying latest patch passes installcheck-world and 
has tests and docs, could be RfC

> I still say we should reject this and invent array_greatest/array_least
> functions instead.

* reflects on own pay grade, wanders off in search of other patch to review

The new status of this patch is: Ready for Committer

RE: Timeout parameters

From: Jamison, Kirk/ジャミソン カーク
> Although I did review and followed the suggested way in previous email
> way back (which uses root user) and it worked as intended, I'd also like
> to hear feedback also from Fabien whether it's alright without the test
> script, or if there's another way we can test it (maybe none?).
> Then I think it's in good shape.

I'm afraid there's not a way to incorporate the test into the regression test.  
If there is, Fabien should have responded earlier.



> However, what I meant by my statement above was to have a separate
> decision in committing the two parameters, because based from the thread
> tcp_user_timeout has got more progress when it comes to reviews.
> So once it's ready, it can be committed separately without waiting for
> the socket_timeout param to be ready. (I dont think that requires a
> separate thread)

I see, thanks.


> As for the place, my knowledge is limited so I won't be able to provide
> substantial help on it. Fabien pointed out some comments about it that needs
> clarification though.
> Quoting what Fabien wrote:
>   "The implementation looks awkward, because part of the logic of
> pqWaitTimed
>   is reimplemented in pqWait. Also, I do not understand the
> computation
>   of finish_time, which seems to assume that pqWait is going to be
> called
>   immediately after sending a query, which may or may not be the case,
> and
>   if it is not the time() call there is not the start of the statement."

The patch looks good in that respect.  This timeout is for individual socket 
operations like PgJDBC and Oracle, not the query timeout.  That's what the 
parameter name suggests.


> How about the one below?
> I got it from combined explanations above. I did not include the
> differences though. This can be improved as the discussion
> continues along the way.
> 
> tcp_socket_timeout (integer)
> 
> Terminate and restart any session that has been idle for more than
> the specified number of milliseconds to prevent client from infinite
> waiting for server due to dead connection. This can be used both as
> a brute force global query timeout and detecting network problems.
> A value of zero (the default) turns this off.

Looks better than the original one.  However,

* The unit is not millisecond, but second.
* Doesn't restart the session.
* Choose words that better align with the existing libpq parameters.  e.g. 
"connection" should be used instead of "session" as in keepalive_xxx 
parameters, because there's not a concept of database session at socket level.
* Indicate that the timeout is for individual socket operations.


Regards
Takayuki Tsunakawa

RE: libpq debug log

2019-02-21 Thread Iwata, Aya

Hi Robert, 

> I'm not really sure that I like the design of this patch in any way.
Aside from problems with my current documentation which I will fix,
could you explain more detail about the problem of the design? 
I would like to improve my current implementation based from feedback.

Regards,
Aya Iwata

Re: bgwriter_lru_maxpages limits in PG 10 sample conf

2019-02-21 Thread Bruce Momjian

On Wed, Feb 20, 2019 at 11:54:25PM +0300, Sergei Kornilov wrote:
> Hello
> 
> > I'm a bit reluctant to whack postgresql.conf around in back-branches
> > because sometimes that makes funny things happen when somebody
> > upgrades, e.g. via RPM.
> 
> If i remember correctly both deb and rpm packages will ask user about config 
> difference.
> But good point, comment change is too small difference. I am a bit late, good 
> time for such change was before last minor release (we add data_sync_retry 
> and config was changed anyway).

The other issue is that you will change share/postgresql.conf.sample,
but not $PGDATA/postgresql.conf until initdb is run, meaning if someone
diffs the two files, they will see differences that they did not make. 
Making defaults more accurate is not worth that confusion.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +

RE: Timeout parameters

From: mikalaike...@ibagroup.eu [mailto:mikalaike...@ibagroup.eu]
> I am not very  familiar with the PostgreSQL source code. Nevertheless,  the
> main idea of this parameter is clear for me - closing a connection when
> the PostgreSQL server does not response due to any reason. However, I have
> not found  in the discussion a reference that this parameter can be applied
> to the TCP as well as to the UNIX-domain sockets. Moreover, this parameter
> works out of communication layer. When we consider TCP communication, the
> failover is covered by keep_alive and tpc_user_timeout parameters.

Right.  This timeout is for individual socket operations, and the last resort 
to forcibly terminate the connection when other timeouts don't work (TCP 
keepalive, tcp user timeout, statement_timeout).  Also, there's no reason to 
restrict this timeout to TCP.


> According to it, we should not use 'tcp' prefix in this parameter name,
> 'socket' sub string is not suitable too.
> 
> This parameter works on the client side. So the word 'client' is a good
> candidate for using in this parameter name.
> This parameter affects only when we send a 'query' to the pg server.

No, this timeout works not only for queries but for any socket operations to 
communicate with the server.  For example, libpq sends query cancellation 
message to the server, receives server-side parameter value change 
notifications, etc.  So, like PgJDBC and Oracle, I think the name suggesting 
socket or communication is good.  That is, socket_timeout or 
communication_timeout is OK.


Regards
Takayuki Tsunakawa

Re: [HACKERS] Restricting maximum keep segments by repslots

2019-02-21 Thread Kyotaro HORIGUCHI

At Fri, 15 Feb 2019 19:13:23 -0800, Andres Freund  wrote in 
<20190216031323.t7tfrae4l6zqt...@alap3.anarazel.de>
> Hi,
> 
> On 2019-01-30 10:42:04 +0900, Kyotaro HORIGUCHI wrote:
> > From 270aff9b08ced425b4c4e23b53193285eb2359a6 Mon Sep 17 00:00:00 2001
> > From: Kyotaro Horiguchi 
> > Date: Thu, 21 Dec 2017 21:20:20 +0900
> > Subject: [PATCH 1/6] Add WAL relief vent for replication slots
> > 
> > Adds a capability to limit the number of segments kept by replication
> > slots by a GUC variable.
> 
> Maybe I'm missing something, but how does this prevent issues with
> active slots that are currently accessing the WAL this patch now
> suddenly allows to be removed? Especially for logical slots that seems
> not unproblematic?

No matter whether logical or physical, when reading an
overwritten page of a recycled/renamed segment file, page
validation at reading-in finds that it is of a different segment
than expected. 0006 in [1] introduces more active checking on
that.

[1] 
https://www.postgresql.org/message-id/20181220.162438.121484007.horiguchi.kyotaro%40lab.ntt.co.jp

> Besides that, this patch needs substantial spelling / language / comment
> polishing. Horiguchi-san, it'd probably be good if you could make a
> careful pass, and then maybe a native speaker could go over it?

Thank you for your kind suggestion. As I did for other patches,
I'll review it by myself and come up with a new version soon.

# I often don't understand what I wrote:(

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

RE: Timeout parameters

On Friday, February 22, 2019 9:46 AM (GMT+9), Tsunakawa, Takayuki wrote:

> > Terminate any session that has been idle for more than the 
> > specified number of seconds to prevent client from infinite 
> > waiting for server due to dead connection. This can be used both as a 
> > brute force global query timeout and detecting network problems.
> > A value of zero (the default) turns this off.
>
> Looks better than the original one.  However,
> 
> * The unit is not millisecond, but second.
> * Doesn't restart the session.
> * Choose words that better align with the existing libpq parameters.
> e.g. "connection" should be used instead of "session" as in 
> keepalive_xxx parameters, because there's not a concept of database
> session at socket level.
> * Indicate that the timeout is for individual socket operations.

My bad. I was looking at the wrong documentations, so seconds should be used.
I guess we should just choose socket_timeout as parameter name.

socket_timeout (integer)

Terminate any connection that has been inactive for more than the specified 
number of seconds to prevent client from infinite waiting for individual socket 
read operations due to dead connection. This can be used both as a force global 
query timeout and network problems detector. A value of zero (the default) 
turns this off.

or

Controls the number of seconds of connection inactivity to prevent client from 
infinite waiting for individual socket read operations due to dead connection. 
This can be used both as a force global query timeout and network problems 
detector. A value of zero (the default) turns this off. 

Regards,
Kirk Jamison

Re: pg_dump multi VALUES INSERT

On Tue, 19 Feb 2019 at 02:34, Surafel Temesgen  wrote:
> I see that there are already a test for zero column table in 
> test_fourth_table_zero_col
> and if am not wrong table_index_stats is empty table

Maybe Fabien would like to see a test that dumps that table with
--rows-per-insert= to ensure the output remains
as the other test.  I think it might be a good idea too.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

RE: libpq debug log

On Wednesday, February 20, 2019 12:56 PM GMT+9, Robert Haas wrote:

> On Mon, Feb 18, 2019 at 10:06 PM Jamison, Kirk  
> wrote:
> > It sounds more logical to me if there is a parameter that switches 
> > on/off the logging similar to other postgres logs. I suggest trace_log as 
> > the parameter name.

> I'm not really sure that I like the design of this patch in any way.
> But leaving that aside, trace_log seems like a poor choice of name, 
> because it's got two words telling you what kind of thing we are doing 
> (tracing, logging) and zero words describing the nature of the thing 
> being traced or logged (the wire protocol communication with the client).
> A tracing facility for the planner, or the executor, or any other part
> of the system would have an equally good claim on the same name.  That
> can't be right.

Agreed about the argument with trace_log parameter name. I just shootout
a quick idea. I didn't think about it too deeply, but just thought of a
switch that will enable or disable the feature. So there are 
definitely better names other than that. And as you suggested, should 
describe specifically what the feature does. 

Regards,
Kirk Jamison

RE: Timeout parameters

From: Jamison, Kirk [mailto:k.jami...@jp.fujitsu.com]
> socket_timeout (integer)

libpq documentation does not write the data type on the parameter name line.


> Terminate any connection that has been inactive for more than the specified
> number of seconds to prevent client from infinite waiting for individual
> socket read operations due to dead connection. This can be used both as
> a force global query timeout and network problems detector. A value of zero
> (the default) turns this off.
> 
> or
> 
> Controls the number of seconds of connection inactivity to prevent client
> from infinite waiting for individual socket read operations due to dead
> connection. This can be used both as a force global query timeout and network
> problems detector. A value of zero (the default) turns this off.

The second one is better, but:

* Just "connection inactivity" (i.e. the application hasn't submitted any 
request to the server for a long time) does not terminate the connection.  
* "due to dead connction" is not the cause for the timeout.  If the timeout 
occurs, consider the connection dead (see keepalives_count).
* Not restricted to "read" operation?


Regards
Takayuki Tsunakawa

RE: reloption to prevent VACUUM from truncating empty pages at the end of relation

From: Julien Rouhaud [mailto:rjuju...@gmail.com]
> FWIW, I prefer shrink over truncate, though I'd rather go with
> vacuum_shink_enabled as suggested previously.

Thanks.  I'd like to leave a committer to choose the name.  FWIW, I chose 
shrink_enabled rather than vacuum_shrink_enabled because this property may be 
used in other shrink situations in the future.  What I imagined was that with 
the zheap, DELETE or some maintenance operation, not vacuum, may try to shrink 
the table.  I meant this property to indicate "whether this table shrinks or 
not" regardless of the specific operation that can shrink the table.



> I'm not sure that I get this comment.  Since both require a
> ShareUpdateExclusiveLock, you can't change the parameter while a
> VACUUM is active on that table.  Did you wanted to use another lock
> mode?

No, changing the parameter acquires ShareUpdaeExclusive lock.  I just imitated 
the description for n_distinct in the same comment block.  The meaning is that 
the setting cannot be changed during VACUUM, so in-flight VACUUM is not 
affected.


Regards
Takayuki Tsunakawa

Re: WIP: Avoid creation of the free space map for small tables

2019-02-21 Thread Amit Kapila

On Fri, Feb 22, 2019 at 1:57 AM Alvaro Herrera  wrote:
>
> I think this test is going to break on nonstandard block sizes.  While
> we don't promise that all tests work on such installs (particularly
> planner ones),
>

The reason for not pushing much on making the test pass for
nonstandard block sizes is that when I tried existing tests, there
were already some failures.  For example, see the failures in the
attached regression diff files (for block_size as 16K and 32K
respectively).  I saw those failures during the previous
investigation, the situation on HEAD might or might not be exactly the
same.  Whereas I see the value in trying to make sure that tests pass
for nonstandard block sizes, but that doesn't seem to be followed for
all the tests.

> it seems fairly easy to cope with this one -- just use a
> record size expressed as a fraction of current_setting('block_size').
> So instead of "1024" you'd write current_setting('block_size') / 8.
> And then display the relation size in terms of pages, not bytes, so
> divide pg_relation_size by block size.
>

The idea sounds good.  John, would you like to give it a try?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

regression.16.diffs
Description: Binary data

regression.32.diffs
Description: Binary data

Unified security key managment

2019-02-21 Thread Bruce Momjian

I know there has been recent discussion about implementing transparent
data encryption (TDE) in Postgres:

https://www.postgresql.org/message-id/CAD21AoAqtytk0iH6diCJW24oyJdS4roN-VhrFD53HcNP0s8pzA%40mail.gmail.com

I would like to now post a new extension I developed to handle
cryptographic key management in Postgres. It could be used with TDE,
with pgcrypto, and with an auto-encrypted data type. It is called
pgcryptokey and can be downloaded from:

https://momjian.us/download/pgcryptokey/

I am attaching its README file to this email.

The extension uses two-layer key storage, and stores the key in a
Postgres table. It allows the encryption key to be unlocked by the
client, or at boot time. (This would need to be modified to be a global
table if it was used for block-level encryption like TDE.)

I am willing to continue to develop this extension if there is interest.
Should I put it on PGXN eventually? It is something we would want in
/contrib?

--
Bruce Momjian http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
pgcryptokey - cryptographic key management extension

OVERVIEW

pgcryptokey allows the creation, rotation, selection, and deletion
of cryptographic data keys. Each cryptographic data key is
encrypted/decrypted with (i.e., wrapped inside) a key access password.
Accessing a cryptographic data key requires the proper key access password,
as illustrated below:

++
||
| key access password |
||
| +--+ |
| |encrypted_data_key| |
| +--+ |
||
++

pgcryptokey operates in two security modes:

* The key access password is set by clients, so security is at the
session level

* The default key access password is set at boot time, so all sessions
can access cryptographic data keys that require that password

Cryptographic data keys are stored in the pgcryptokey table, which is
automatically created by the extension:

CREATE TABLE pgcryptokey (
key_id SERIAL PRIMARY KEY,
name TEXT DEFAULT 'main',
encrypted_data_key BYTEA NOT NULL,
created TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP NOT
NULL,
superseded_by INTEGER
);

INSTALLATION

To use pgcryptokey, you must install the extension with "CASCADE" to also
install the pgcrypto extension, e.g.:

CREATE EXTENSION pgcryptokey CASCADE;

PASSWORDS
-
There are three levels of passwords used by pgcryptokey:

1. A password entered by the client user, boot-time administrator, or
other method.

2. A hash of password #1 that is passed from the client to the server,
or set as a server variable at boot time. This is called the "key
access password".

3. A data encryption/decryption key stored in the table pgcryptokey and
unlocked via password #2. This is called the "cryptographic data key"
and is used as the 'password' argument to pgcrypto functions.

DEFAULT PASSWORDS
-
The server variable pgcryptokey.default_password is used as the key
access password (number two above) by all pgcryptkey functions when the
password is not supplied. This variable can be set via SQL or at database
server start.

To set pgcryptokey.default_password at server start, set the
postgresql.conf variable 'shared_preload_libraries' to
'pgcryptokey_default', copy the shell script
SHAREDIR/extension/pgcryptokey_default.sample to
PGDATA/pgcryptokey_default, set its execution permission, and restart
the database server. When pgcryptokey.default_password is set at server
start, the value is read-only.

By default, the executable gets the key access password by prompting the
terminal, but this can be modified to use a key management server,
cryptographic hardware, or ssh to access another computer. It is
insecure to store the key access password in the executable. All users
can view a boot-time-set pgcryptokey.default_password value, but they
need access to the pgcryptokey table to make use of it.

When using the default behavior of prompting the terminal, or using ssh,
the typed password is SHA-256-hashed before storing it in the
pgcryptokey.default_password server variable. This behavior can be
simulated at the session level using this SQL command (replace
'MyPassword' with the desired password):

SELECT set_config('pgcryptokey.default_password',
encode(digest('MyPassword', 'sha256'), 'hex'),
false)::VOID;

The SHA-256 hash can also be computed in psql:

\set hashed_password `printf '%s' 'MyPassword' | openssl dgst -sha256
-binary | xxd -p -c 999`
SELECT se

Re: Removal of duplicate variable declarations in fe-connect.c

On Fri, Feb 22, 2019 at 11:33:17AM +1100, Haribabu Kommi wrote:
> During the development of another feature, I found that same local
> variables are declared twice.
> IMO, there is no need of again declaring the local variables. Patch
> attached.

Indeed, fixed.  That's not a good practice, and each variable is
assigned in its own block before getting used, so there is no
overlap.
--
Michael


signature.asc
Description: PGP signature

Re: Using old master as new replica after clean switchover

On Thu, Feb 21, 2019 at 03:38:21PM -0300, Claudio Freire wrote:
> If I'm not mistaken, if you don't have WAL archive set up (a shared
> filesystem isn't necessary, but the standby has to be able to restore
> WAL segments from the archive), a few transactions that haven't been
> streamed at primary shutdown could be lost, since the secondary won't
> be able to stream anything after the primary has shut down. WAL
> archive can always be restored even without a primary running, hence
> why a WAL archive is needed.
> 
> Or am I missing something?

Well, my point is that you may not need an archive if you are able to
stream the changes from a primary using streaming if the primary has a
replication slot or if a checkpoint has not recycled yet the segments
that a standby may need.  If the primary is offline, and you need to
recover a standby, then an archive is mandatory.  When recovering from
an archive, the standby would be able to catch up to the end of the
segment archived as we don't enforce a segment switch when a node
shuts down.  If using pg_receivewal as a form of archiving with its
--synchronous mode, it is also possible to stream up to the point
where the primary has generated its shutdown checkpoint, so you would
not lose data included on the last segment the primary was working on
when stopped.
--
Michael


signature.asc
Description: PGP signature

Re: Using old master as new replica after clean switchover

On Thu, Feb 21, 2019 at 10:26:37AM -0800, RSR999GMAILCOM wrote:
> Is there any link where  the required setup and the step by step procedure
> for performing the controlled switchover are listed?

Docs about failover are here:
https://www.postgresql.org/docs/current/warm-standby-failover.html

Now I don't recall that we have a section about a step-by-step
procedure for one case of failover or another.  The docs could be
perhaps improved regarding that, particularly for the case mentioned
here where it is possible to relink a previous master to a promoted
standby without risks of corruption:
- Stop cleanly the primary with smart or fast mode.
- Promote the standby.
- Add recovery.conf to the previous primary.
- Restart the previous primary as a new standby.
--
Michael


signature.asc
Description: PGP signature

Re: reloption to prevent VACUUM from truncating empty pages at the end of relation

On Fri, Feb 22, 2019 at 02:38:56AM +, Tsunakawa, Takayuki wrote:
> From: Julien Rouhaud [mailto:rjuju...@gmail.com]
>> FWIW, I prefer shrink over truncate, though I'd rather go with
>> vacuum_shink_enabled as suggested previously.
> 
> Thanks.  I'd like to leave a committer to choose the name.  FWIW, I
> chose shrink_enabled rather than vacuum_shrink_enabled because this
> property may be used in other shrink situations in the future.  What
> I imagined was that with the zheap, DELETE or some maintenance
> operation, not vacuum, may try to shrink the table.  I meant this
> property to indicate "whether this table shrinks or not" regardless
> of the specific operation that can shrink the table. 

I don't think that we want to use a too generic name and it seems more
natural to reflect the context where it is used in the parameter name.
If we were to shrink with a similar option for other contexts, we
would most likely use a different option.  Depending on the load
pattern, users should also be able to disable or enable a subset of
contexts as well.

So I agree with Julien that [auto]vacuum_shrink_enabled is more
adapted for this stuff.
--
Michael


signature.asc
Description: PGP signature

Re: [HACKERS] Restricting maximum keep segments by repslots

2019-02-21 Thread Kyotaro HORIGUCHI

At Fri, 22 Feb 2019 10:12:51 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
 wrote in 
<20190222.101251.0542.horiguchi.kyot...@lab.ntt.co.jp>
horiguchi.kyotaro> At Fri, 15 Feb 2019 19:13:23 -0800, Andres Freund 
 wrote in 
<20190216031323.t7tfrae4l6zqt...@alap3.anarazel.de>
> > Maybe I'm missing something, but how does this prevent issues with
> > active slots that are currently accessing the WAL this patch now
> > suddenly allows to be removed? Especially for logical slots that seems
> > not unproblematic?
> 
> No matter whether logical or physical, when reading an
> overwritten page of a recycled/renamed segment file, page
> validation at reading-in finds that it is of a different segment
> than expected. 0006 in [1] introduces more active checking on
> that.
> 
> [1] 
> https://www.postgresql.org/message-id/20181220.162438.121484007.horiguchi.kyotaro%40lab.ntt.co.jp
>
> > Besides that, this patch needs substantial spelling / language / comment
> > polishing. Horiguchi-san, it'd probably be good if you could make a
> > careful pass, and then maybe a native speaker could go over it?
> 
> Thank you for your kind suggestion. As I did for other patches,
> I'll review it by myself and come up with a new version soon.

I checked spelling comments and commit messages, then perhaps
corrected and improved them. I hope they looks nice..

0006 is separate from 0001, since I still doubt the necessity.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 9bc7ca30006ebe0fe13c6ffbf4bfc87e52176876 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Thu, 21 Dec 2017 21:20:20 +0900
Subject: [PATCH 1/6] Add WAL relief vent for replication slots

Replication slot is useful to maintain replication connection in the
configurations where replication is so delayed that connection is
broken. On the other hand so many WAL files can fill up disk that the
master downs by a long delay. This feature, which is activated by a
GUC "max_slot_wal_keep_size", protects master servers from suffering
disk full by limiting the number of WAL files reserved by replication
slots.
---
 src/backend/access/transam/xlog.c | 128 +-
 src/backend/replication/slot.c|  57 
 src/backend/utils/misc/guc.c  |  12 +++
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/access/xlog.h |   1 +
 src/include/replication/slot.h|   1 +
 6 files changed, 175 insertions(+), 25 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index ecd12fc53a..998b779277 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -100,6 +100,7 @@ int			wal_level = WAL_LEVEL_MINIMAL;
 int			CommitDelay = 0;	/* precommit delay in microseconds */
 int			CommitSiblings = 5; /* # concurrent xacts needed to sleep */
 int			wal_retrieve_retry_interval = 5000;
+int			max_slot_wal_keep_size_mb = -1;
 
 #ifdef WAL_DEBUG
 bool		XLOG_DEBUG = false;
@@ -872,6 +873,7 @@ static void checkTimeLineSwitch(XLogRecPtr lsn, TimeLineID newTLI,
 static void LocalSetXLogInsertAllowed(void);
 static void CreateEndOfRecoveryRecord(void);
 static void CheckPointGuts(XLogRecPtr checkPointRedo, int flags);
+static XLogSegNo GetOldestKeepSegment(XLogRecPtr currpos, XLogRecPtr minSlotPtr);
 static void KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo);
 static XLogRecPtr XLogGetReplicationSlotMinimumLSN(void);
 
@@ -9329,6 +9331,54 @@ CreateRestartPoint(int flags)
 	return true;
 }
 
+/*
+ * Returns minimum segment number that the next checkpoint must leave
+ * considering wal_keep_segments, replication slots and
+ * max_slot_wal_keep_size.
+ *
+ * currLSN is the current insert location.
+ * minSlotLSN is the minimum restart_lsn of all active slots.
+ */
+static XLogSegNo
+GetOldestKeepSegment(XLogRecPtr currLSN, XLogRecPtr minSlotLSN)
+{
+	XLogSegNo	currSeg;
+	XLogSegNo	minSlotSeg;
+	uint64		keepSegs = 0;	/* # of segments actually kept */
+
+	XLByteToSeg(currLSN, currSeg, wal_segment_size);
+	XLByteToSeg(minSlotLSN, minSlotSeg, wal_segment_size);
+
+	/*
+	 * Calculate how many segments are kept by slots first. The second
+	 * term of the condition is just a sanity check.
+	 */
+	if (minSlotLSN != InvalidXLogRecPtr && minSlotSeg <= currSeg)
+		keepSegs = currSeg - minSlotSeg;
+
+	/* Cap keepSegs by max_slot_wal_keep_size */
+	if (max_slot_wal_keep_size_mb >= 0)
+	{
+		uint64 limitSegs;
+
+		limitSegs = ConvertToXSegs(max_slot_wal_keep_size_mb, wal_segment_size);
+
+		/* Reduce it if slots already reserves too many. */
+		if (limitSegs < keepSegs)
+			keepSegs = limitSegs;
+	}
+
+	/* but, keep at least wal_keep_segments segments if any */
+	if (wal_keep_segments > 0 && keepSegs < wal_keep_segments)
+		keepSegs = wal_keep_segments;
+
+	/* avoid underflow, don't go below 1 */
+	if (currSeg <= keepSegs)
+		return 1;
+
+	return currSeg - keepSegs;
+}
+
 /*
  * Retreat *logSegNo to the las

Re: Libpq support to connect to standby server as priority

2019-02-21 Thread Haribabu Kommi

On Thu, Feb 14, 2019 at 1:04 PM Tsunakawa, Takayuki <
tsunakawa.ta...@jp.fujitsu.com> wrote:

> From: Haribabu Kommi [mailto:kommi.harib...@gmail.com]
> >   No.  It's not good if the user has to be bothered by
> > default_transaction_read_only when he simply wants to a standby.
> >
> >
> >
> > OK. Understood.
> > so if we are going to differentiate between readonly and standby types,
> > then I still
> > feel that adding a prefer-read to target_session_attrs is still valid
> > improvement.
>
> I agree that it's valid improvement to add prefer-read to
> target_session_attr, as a means to "get a read-only session."
>
> > But the above improvement can be enhanced once the base work of
> GUC_REPORT
> > is finished.
>
> Is it already in progress in some thread, or are you trying to start from
> scratch?  (I may have done it, but I don't remember it well...)
>
> > Yes, I want to work on this patch, hopefully by next commitfest. In case
> > if I didn't get time,
> > I can ask for your help.
>
> I'm glad to hear that.  Sure.  I'd like to review your patch, and possibly
> add/modify code if necessary.  Are you going to add
> target_server_type={primary | standby | prefer_standby} as well as add
> prefer-read to target_session_attr?
>
>
> >   (I wonder which of server_type or server_role feels natural in
> > English.)
> >
> >
> >
> > server_type may be good as it stands with connection option
> > (target_server_type).
>
> Thanks, agreed.  That also follows PgJDBC's targetServerType.
>

Here I attached first set of patches that implemented the prefer-read
option after reporting
the transaction_read_only GUC to client. Along the lines of adding
prefer-read option patch,

1. I refactor the existing code to reduce the duplicate.
2. Added a enum to represent the user requested target_session_attrs type,
this is used in
comparison instead of doing a strcmp always.
3. Existing read-write code is modified to use the new reported GUC instead
of executing the
show command.

Basic patches are working, there may still need some documentation works.

Now I will add the another parameter target_server_type to choose the
primary, standby or prefer-standby
as discussed in the upthreads with a new GUC variable.

Regards,
Haribabu Kommi
Fujitsu Australia


0004-New-prefer-read-target_session_attrs-type.patch
Description: Binary data


0001-Restructure-the-code-to-remove-duplicate-code.patch
Description: Binary data


0002-New-TargetSessionAttrsType-enum.patch
Description: Binary data


0003-Make-transaction_read_only-as-GUC_REPORT-varaible.patch
Description: Binary data

RE: Problem during Windows service start