Re: Extra word in src/backend/optimizer/README

2018-08-31 Thread Etsuro Fujita

(2018/08/31 4:37), Magnus Hagander wrote:

On Thu, Aug 30, 2018 at 1:27 PM, Etsuro Fujita <fujita.ets...@lab.ntt.co.jp> wrote:
Here is a small patch to remove $SUBJECT: s/has contains/contains/



Definitely looks correct. A good first test to verify your own
commit/push privileges, perhaps?


Yeah, this is a good exercise for me, so I committed the patch.

Thanks!

Best regards,
Etsuro Fujita



Re: 10.5 but not 10.4: backend startup during reindex system: could not read block 0 in file "base/16400/..": read only 0 of 8192 bytes

2018-08-31 Thread Justin Pryzby
On Thu, Aug 30, 2018 at 01:25:00PM -0700, Andres Freund wrote:
> On August 30, 2018 1:24:12 PM PDT, Justin Pryzby  wrote:
> >On Thu, Aug 30, 2018 at 01:18:59PM -0700, Andres Freund wrote:
> >> Could you check if both of the proposed attempts at fixing the issue
> >> also solves your problem?
> >
> >Just checking that you're referring to:
> >
> >1)  relcache-rebuild.diff
> >https://www.postgresql.org/message-id/20180829083730.n645apqhb2gyih3g%40alap3.anarazel.de
> >2)  fix-missed-inval-msg-accepts-1.patch
> >https://www.postgresql.org/message-id/25024.1535579899%40sss.pgh.pa.us
> 
> Right. Either should fix the issue.

Confirmed that relcache-rebuild.diff seems to fix/avoid the issue under 10.5,
and fix-missed-inval-msg-accepts-1.patch does so for HEAD (but doesn't apply
cleanly to 10.5 so not tested).  That's for both the new easy-to-hit
symptom and the pre-existing hard-to-hit issue affecting pg10.4 and earlier.

Thanks,
Justin



Re: pg_verify_checksums and -fno-strict-aliasing

2018-08-31 Thread Michael Banck
Hi,

On Thursday, 2018-08-30 at 19:02 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2018-08-30 18:11:40 -0400, Tom Lane wrote:
> > > I suspect people will complain about the added cost of doing that.
> > I think the compiler will just optimize it away.
> 
> Maybe.  In any case, the attached version avoids any need for memcpy
> and is, I think, cleaner code anyhow.  It fixes the problem for me
> with Fedora's gcc, though I'm not sure that it'd be enough to get rid
> of the warning Michael sees (I wonder what compiler he's using).

This fixes the bogus checksums, thanks!

I am on Debian stable, i.e. 'gcc version 6.3.0 20170516 (Debian 6.3.0-
18+deb9u1)'. 

The warning shows up only if I add -Wall *and* -O2 to CFLAGS; I think I
did not mention that explicitly before.

As it stands in pg_verify_checksums.c, I need Magnus' patch as well to make
it go away. But the warning is independent of the runtime issue, so it is
less of a problem (though probably worth fixing).


Michael 

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz



RE: automatic restore point

2018-08-31 Thread Yotsunaga, Naoki
-Original Message-
From: Yotsunaga, Naoki [mailto:yotsunaga.na...@jp.fujitsu.com] 
Sent: Tuesday, June 26, 2018 10:18 AM
To: Postgres hackers 
Subject: automatic restore point

Hi, I attached a patch that outputs the LSN to the server log before specific
commands are executed, for the case where data is accidentally erased.

A detailed background has been presented before.
In short: after a DBA's operational mistake erases data, it is necessary to
perform PITR immediately. Since the information needed for that PITR is
currently not easy to obtain, I would like to solve that.

The specification has changed from the first proposal.
-Target command
 DROP TABLE
 TRUNCATE

-Setting file
 postgresql.conf
 log_recovery_points = on  # default value is 'off'. When the switch is turned
on, the LSN is output to the server log when DROP TABLE or TRUNCATE is executed.

-How to use (see the example settings below)
1) When executing the above commands, identify, from the server log, the command
and the recovery point that match the resource affected by the operational mistake.
   
ex) LOG:  recovery_point_lsn: 0/201BB70
   STATEMENT:  drop table test ;
 2) Follow PostgreSQL documentation section '25.3.4. Recovering Using a Continuous
Archive Backup.'
*Set "recovery_target_lsn = 'recovery_point_lsn'" in recovery.conf.

Although it was pointed out in the past that this makes the source more
complicated, we could add the function with only about 20 lines of code.

What do you think about it? Do you think it is useful?
--
Naoki Yotsunaga






automatic_restore_point.patch
Description: automatic_restore_point.patch


Re: CREATE ROUTINE MAPPING

2018-08-31 Thread Masahiko Sawada
On Thu, Jan 25, 2018 at 2:13 PM, David Fetter  wrote:
> On Thu, Jan 18, 2018 at 04:09:13PM -0500, Corey Huinker wrote:
>> >
>> >
>> > >
>> > > But other situations seem un-handle-able to me:
>> > >
>> > > SELECT remote_func1(l.x) FROM local_table l WHERE l.active = true;
>> >
>> > Do we have any way, or any plan to make a way, to push the set (SELECT
>> > x FROM local_table WHERE active = true) to the remote side for
>> > execution there?  Obviously, there are foreign DBs that couldn't
>> > support this, but I'm guessing they wouldn't have much by way of UDFs
>> > either.
>> >
>>
>> No. The remote query has to be generated at planning time, so it can't make
>> predicates out of anything that can't be resolved into constants by the
>> planner itself. The complexities of doing so would be excessive, far better
>> to let the application developer split the queries up because they know
>> better which parts have to resolve first.
>
> So Corey and I, with lots of inputs from Andrew Gierth and Matheus
> Oliveira, have come up with a sketch of how to do this, to wit:
>
> - Extend CREATE FUNCTION to take either FOREIGN and SERVER or AS and
>   LANGUAGE as parameters, but not both. This seems simpler, at least
>   in a proof of concept, than creating SQL standard compliant grammar
>   out of whole cloth.  The SQL standard grammar could be layered in
>   later via the rewriter if this turns out to work.

I'm also interested in this feature. While studying it, I understood that it
just pairs a local function with a remote function; it does not create a kind
of virtual function that can be invoked only on foreign servers. For example,
if we execute the following SQL, local_func() is invoked locally because the
col1 column of local_table is referenced by it.
SELECT * FROM local_table l WHERE local_func(l.col1) = 1;

On the other hand, suppose we have the following routine mapping,

CREATE ROUTINE MAPPING rmap FOR local_func(integer) OPTIONS
(remote_func_schema = 'myschema', remote_func_name = 'remote_func');

and execute similar SQL against a foreign table, we will get the following
remote SQL.

- Local SQL
SELECT * FROM foreign_table f WHERE local_func(f.col1) = 1;

- Remote SQL
SELECT * FROM foreign_table f WHERE my_schema.remote_func(f.col1) = 1;

In this concept, CREATE ROUTINE MAPPING doesn't need to specify the function's
return type, but it must refer to an existing function in the local PostgreSQL.
The mapped remote function is expected to have the same properties (arguments,
return type, etc.) as the local function. I might be missing something; please
give me feedback.

Please find attached a PoC patch of the ROUTINE MAPPING feature. This patch is
missing many things, such as the docs and the shippability support, but it adds
the new system catalog pg_routine_mapping with three attributes (name, procid,
serverid) and enables FDWs to refer to this mapping and to replace the function.
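
Conceptually, the catalog is just a (mapping name, local function, foreign
server) triple. A rough sketch of its shape, only for illustration (the
attribute types are my guess; the real definition is in the patch):

  -- conceptual shape of the new catalog, not the actual patch definition
  CREATE TABLE pg_routine_mapping (
      name     name,  -- routine mapping name
      procid   oid,   -- OID of the mapped local function
      serverid oid    -- OID of the foreign server the mapping applies to
  );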

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


0001-PoC-Support-ROUTINE-MAPPING.patch
Description: Binary data


0002-Poc-postgres_fdw-support-routine-mappings.patch
Description: Binary data


TR: pgadmin not displaying data from postgresql_fdw

2018-08-31 Thread Olivier Leprêtre
Hi,

 

Please find a question that didn't get an answer in the pgsql-sql list. I
hope I'll get an answer here.

 

Thanks,

 

Olivier

 

From: Olivier Leprêtre [mailto:o.lepre...@gmail.com]
Sent: Tuesday, August 28, 2018 5:37 PM
To: 'pgsql-...@lists.postgresql.org'
Subject: pgadmin not displaying data from postgresql_fdw

 

Hi,

 

While doing some testing of the postgresql_fdw extension to connect local and
remote databases, I was surprised to notice that the locally connected schema
seems to have no tables in the pgAdmin 3 or pgAdmin 4 trees.

 

Those commands ran fine:

CREATE SERVER fdw_prod01 FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host
'localhost', dbname '_prod01', port '5433');

CREATE USER MAPPING FOR postgres SERVER fdw_prod01 OPTIONS (user 'postgres',
password '');

CREATE SCHEMA fdw_prod01a;

IMPORT FOREIGN SCHEMA pre_prod FROM SERVER fdw_prod01 INTO fdw_prod01a;

 

It's then possible to run SELECT, INSERT or DELETE commands on the fdw_prod01a
schema. Going to the 'remote' database shows the inserted data in the pre_prod
schema.

 

This command returns the table list from the local fdw_prod01a:

SELECT distinct table_schema,table_name FROM information_schema.TABLES where
table_schema = 'fdw_prod01a'

 

So everything seems fine, except the pgAdmin tree, which does not display any
table under the fdw_prod01 branch, even though it's possible to query its
tables with SQL.

 

Did I forget something? Is that a bug? Does postgresql_fdw store
information in such a way that pgAdmin is not able to get it, even though it
exists in pg tables?

 

Thanks for any answer,

 

Olivier

 

 



Re: [HACKERS] Proposal to add work_mem option to postgres_fdw module

2018-08-31 Thread Masahiko Sawada
On Tue, Aug 28, 2018 at 12:55 PM, Shinoda, Noriyoshi (PN Japan GCS
Delivery)  wrote:
> Hi Everyone, thank you for your comment.
>
>>> I like the direction of your thinking, but it seems to me that this
>>> would cause a problem if you want to set search_path=foo,bar.
>>... OPTIONS ( host 'remhost1', port '5433', dbname 'demodb', 
>>option='option1=foo', option='option2=bar' );
>
> I do not want to allow postgres_fdw to set "all" GUCs. Because I think it is 
> very difficult to check the validity of all input values.
> So, I attached modified the patch so that we can easily add GUC that we can 
> set to postgres_fdw module.
> If you wish to add more GUC options to the module, add values to the 
> non_libpq_options[] array of the InitPgFdwOptions function,
> And add the validator code for the GUC in the postgres_fdw_validator function.
>

FWIW, in terms of usability I'm not sure that setting GUCs per foreign
server would be useful for users. A parameter set per foreign server
affects all statements that touch that server, which is different
behavior from setting the parameter locally, especially for query tuning
parameters. Is it worth considering other ways to specify GUC
parameters per statement or per transaction?
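
To illustrate the concern with per-server settings (illustrative syntax; the
server name is made up, the option is the one this patch adds):

  ALTER SERVER remote_server OPTIONS (ADD work_mem '64MB');

Once set this way, the value applies to every query that goes through
remote_server, which is quite different from being able to SET work_mem for
just one session or statement.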

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: remove ancient pre-dlopen dynloader code

2018-08-31 Thread Peter Eisentraut
On 16/08/2018 16:10, Andres Freund wrote:
>>> If I had my druthers, we'd just remove all that configure magic for
>>> selecting these files and just use ifdefs.  Personally I find it
>>> occasionally that they're linked into place, rather than built under
>>> their original name.
>>
>> Even if we all agreed that was an improvement (which I'm not sure of),
>> it wouldn't fix this problem would it?  On affected platforms, the
>> file would still be empty after preprocessing.
> 
> Well, that depends on what you put into that file, it seems
> realistically combinable with a bunch of non-conditional code...

How about this: We only have two nonstandard dlopen() implementations
left: Windows and (old) HP-UX.  We move those into src/port/dlopen.c and
treat it like a regular libpgport member.  That gets rid of all those
duplicative empty per-platform files.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 4511238fb6eab337e697e6080f1a1f33ba114f99 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 29 Aug 2018 20:57:07 +0200
Subject: [PATCH] Refactor dlopen() support

Nowadays, all platforms except Windows and older HP-UX have standard
dlopen() support.  So having a separate implementation per platform
under src/backend/port/dynloader/ is a bit excessive.  Instead, treat
dlopen() like other library functions that happen to be missing
sometimes and put a replacement implementation under src/port/.
---
 configure | 43 +--
 configure.in  |  8 +--
 src/backend/Makefile  |  2 +-
 src/backend/port/.gitignore   |  1 -
 src/backend/port/Makefile |  2 +-
 src/backend/port/dynloader/aix.c  |  7 --
 src/backend/port/dynloader/aix.h  | 39 --
 src/backend/port/dynloader/cygwin.c   |  3 -
 src/backend/port/dynloader/cygwin.h   | 36 --
 src/backend/port/dynloader/darwin.c   | 35 -
 src/backend/port/dynloader/darwin.h   |  8 ---
 src/backend/port/dynloader/freebsd.c  |  7 --
 src/backend/port/dynloader/freebsd.h  | 38 --
 src/backend/port/dynloader/hpux.c | 68 --
 src/backend/port/dynloader/hpux.h | 25 ---
 src/backend/port/dynloader/linux.c|  7 --
 src/backend/port/dynloader/linux.h| 38 --
 src/backend/port/dynloader/netbsd.c   |  7 --
 src/backend/port/dynloader/netbsd.h   | 38 --
 src/backend/port/dynloader/openbsd.c  |  7 --
 src/backend/port/dynloader/openbsd.h  | 38 --
 src/backend/port/dynloader/solaris.c  |  7 --
 src/backend/port/dynloader/solaris.h  | 38 --
 src/backend/port/dynloader/win32.h| 19 -
 src/backend/postmaster/postmaster.c   |  1 -
 src/backend/utils/fmgr/dfmgr.c| 31 
 src/include/.gitignore|  1 -
 src/include/Makefile  |  4 +-
 src/include/pg_config.h.in|  8 +++
 src/include/port.h| 23 ++
 src/include/utils/dynamic_loader.h| 25 ---
 .../port/dynloader/win32.c => port/dlopen.c}  | 72 +--
 src/tools/msvc/Install.pm |  5 +-
 src/tools/msvc/Mkvcbuild.pm   |  3 -
 src/tools/msvc/Solution.pm|  7 --
 src/tools/msvc/clean.bat  |  1 -
 36 files changed, 163 insertions(+), 539 deletions(-)
 delete mode 100644 src/backend/port/dynloader/aix.c
 delete mode 100644 src/backend/port/dynloader/aix.h
 delete mode 100644 src/backend/port/dynloader/cygwin.c
 delete mode 100644 src/backend/port/dynloader/cygwin.h
 delete mode 100644 src/backend/port/dynloader/darwin.c
 delete mode 100644 src/backend/port/dynloader/darwin.h
 delete mode 100644 src/backend/port/dynloader/freebsd.c
 delete mode 100644 src/backend/port/dynloader/freebsd.h
 delete mode 100644 src/backend/port/dynloader/hpux.c
 delete mode 100644 src/backend/port/dynloader/hpux.h
 delete mode 100644 src/backend/port/dynloader/linux.c
 delete mode 100644 src/backend/port/dynloader/linux.h
 delete mode 100644 src/backend/port/dynloader/netbsd.c
 delete mode 100644 src/backend/port/dynloader/netbsd.h
 delete mode 100644 src/backend/port/dynloader/openbsd.c
 delete mode 100644 src/backend/port/dynloader/openbsd.h
 delete mode 100644 src/backend/port/dynloader/solaris.c
 delete mode 100644 src/backend/port/dynloader/solaris.h
 delete mode 100644 src/backend/port/dynloader/win32.h
 delete mode 100644 src/include/utils/dynamic_loader.h
 rename src/{backend/port/dynloader/win32.c => port/dlopen.c} (51%)

diff --git a/configure b/configure
index dd94c5bbab..dd77742c46 100755
--- a/configure
+++ b/configure
@@ -15060,7 +15060,7 @@ fi
 LIBS_including_r

Re: Hint to set owner for tablespace directory

2018-08-31 Thread Rafia Sabih
On Fri, Aug 24, 2018 at 3:05 PM, Maksim Milyutin 
wrote:

> On 08/24/2018 05:18 AM, Michael Paquier wrote:
>
> On Thu, Aug 23, 2018 at 02:24:25PM +0300, Maksim Milyutin wrote:
>>
>>> I want to add patch that prints hint to set required owner for the
>>> tablespace directory if this is the cause of the problem (*errno ==
>>> EPERM*
>>> after calling *chmod*).
>>>
>> Please do not forget to add this patch to the next commit fest.
>>
>
> Thanks, done.


Adding to that, this patch needs a rebase. And please don't forget to run
'git diff --check', as it has some whitespace issues.

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


Re: Undo logs

2018-08-31 Thread Dilip Kumar
Hello hackers,

As Thomas has already mentioned upthread, we are working on undo-log-based
storage, and he has posted the patch sets for the lowest layer, called
undo-log-storage.

This is the next layer, which sits on top of the undo log storage and
provides an interface to prepare, insert, and fetch undo records. This layer
uses undo-log-storage to reserve the space for the undo records and the
buffer management routines to write and read them.

To prepare an undo record, it first allocates the required space using the
undo_log_storage module. Next, it pins and locks the required buffers and
returns an undo record pointer where the record will be inserted.  Finally,
the Insert routine is called for the final insertion of the prepared record.
Additionally, there is a multi-insert mechanism, wherein multiple records are
prepared and inserted at a time.

To fetch an undo record, a caller must provide a valid undo record pointer.
Optionally, the caller can provide a callback function with the block and
offset information, which helps retrieve the undo record faster; otherwise,
the undo chain has to be traversed.
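
To illustrate the intended calling pattern, here is a declaration-level sketch.
The names and signatures below are only indicative, not the exact API; the real
interface is in the attached undo_interface_v1.patch.

/* Indicative sketch only -- see the attached patch for the actual API. */

/* Reserve undo log space for the record, pin and lock the buffers that will
 * hold it, and return the location where the record will be written. */
UndoRecPtr PrepareUndoInsert(UnpackedUndoRecord *urec,
                             UndoPersistence upersistence,
                             TransactionId xid);

/* Copy the prepared record(s) into the locked buffers; this also covers the
 * multi-insert case where several records were prepared beforehand. */
void InsertPreparedUndo(void);

/* Fetch a record by its undo record pointer.  The optional callback lets the
 * caller supply block/offset information for faster retrieval; without it,
 * the undo chain is traversed. */
UnpackedUndoRecord *UndoFetchRecord(UndoRecPtr urp,
                                    SatisfyUndoRecordCallback callback);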

These patch sets will apply on top of the undo-log-storage branch [1],
commit id fa3803a048955c4961581e8757fe7263a98fe6e6.

[1] https://github.com/EnterpriseDB/zheap/tree/undo-log-storage/


undo_interface_v1.patch is the main patch for providing the undo interface.
undo_interface_test_v1.patch is a simple test module to test the undo
interface layer.


On Thu, May 31, 2018 at 4:27 AM, Thomas Munro
 wrote:
> Hi Simon,
>
> On Mon, May 28, 2018 at 11:40 PM, Simon Riggs  wrote:
>> On 24 May 2018 at 23:22, Thomas Munro  wrote:
>>> The lowest level piece of this work is a physical undo log manager,
>>
>>> 1.  Efficient appending of new undo data from many concurrent
>>> backends.  Like logs.
>>> 2.  Efficient discarding of old undo data that isn't needed anymore.
>>> Like queues.
>>> 3.  Efficient buffered random reading of undo data.  Like relations.
>>
>> Like an SLRU?
>
> Yes, but with some difference:
>
> 1.  There is a variable number of undo logs.  Each one corresponds to
> a range of the 64 bit address space, and has its own head and tail
> pointers, so that concurrent writers don't contend for buffers when
> appending data.  (Unlike SLRUs which are statically defined, one for
> clog.c, one for commit_ts.c, ...).
> 2.  Undo logs use regular buffers instead of having their own mini
> buffer pool, ad hoc search and reclamation algorithm etc.
> 3.  Undo logs support temporary, unlogged and permanent storage (=
> local buffers and reset-on-crash-restart, for undo data relating to
> relations of those persistence levels).
> 4.  Undo logs storage files are preallocated (rather than being
> extended block by block), and the oldest file is renamed to become the
> newest file in common cases, like WAL.
>
>>> [4] 
>>> https://github.com/EnterpriseDB/zheap/tree/undo-log-storage/src/backend/access/undo
>>> [5] 
>>> https://github.com/EnterpriseDB/zheap/tree/undo-log-storage/src/backend/storage/smgr
>>
>> I think there are quite a few design decisions there that need to be
>> discussed, so lets crack on and discuss them please.
>
> What do you think about using the main buffer pool?
>
> Best case: pgbench type workload, discard pointer following closely
> behind insert pointer, we never write anything out to disk (except for
> checkpoints when we write a few pages), never advance the buffer pool
> clock hand, and we use and constantly recycle 1-2 pages per connection
> via the free list (as can be seen by monitoring insert - discard in
> the pg_stat_undo_logs view).
>
> Worst case: someone opens a snapshot and goes out to lunch so we can't
> discard old undo data, and then we start to compete with other stuff
> for buffers, and we hope the buffer reclamation algorithm is good at
> its job (or can be improved).
>
> I just talked about this proposal at a pgcon unconference session.
> Here's some of the feedback I got:
>
> 1.  Jeff Davis pointed out that I'm probably wrong about not needing
> FPI, and there must at least be checksum problems with torn pages.  He
> also gave me an idea on how to fix that very cheaply, and I'm still
> processing that feedback.
> 2.  Andres Freund thought it seemed OK if we have smgr.c routing to
> md.c for relations and undofile.c for undo, but if we're going to
> generalise this technique to put other things into shared buffers
> eventually too (like the SLRUs, as proposed by Shawn Debnath in
> another unconf session) then it might be worth investigating how to
> get md.c to handle all of their needs.  They'd all just use fd.c
> files, after all, so it'd be weird if we had to maintain several
> different similar things.
> 3.  Andres also suggested that high frequency free page list access
> might be quite contended in the "best case" described above.  I'll look
> into that.
> 4.  Someone said that segment sizes probably shouldn't be hard coded
> (cf 

Re: automatic restore point

2018-08-31 Thread Pavel Stehule
2018-08-31 10:14 GMT+02:00 Yotsunaga, Naoki 
:

> -Original Message-
> From: Yotsunaga, Naoki [mailto:yotsunaga.na...@jp.fujitsu.com]
> Sent: Tuesday, June 26, 2018 10:18 AM
> To: Postgres hackers 
> Subject: automatic restore point
>
> Hi, I attached a patch to output the LSN before execution to the server
> log when executing a specific command and accidentally erasing data.
>
> A detailed background has been presented before.
> In short explain: After the DBA's operation failure and erases the data,
> it is necessary to perform PITR immediately.
> Since it is not possible to easily obtain information for doing the
> current PITR, I would like to solve it.
>
> The specification has changed from the first proposal.
> -Target command
>  DROP TABLE
>  TRUNCATE
>
> -Setting file
>  postgresql.conf
>  log_recovery_points = on #default value is 'off'. When the switch is
> turned on, LSN is output to the server log when DROP TABLE, TRUNCATE is
> executed.
>
> -How to use
> 1) When executing the above command, identify the command and recovery
> point that matches the resource indicating the operation failure from the
> server log.
> ex) LOG:  recovery_point_lsn: 0/201BB70
>STATEMENT:  drop table test ;
>  2) Implement PostgreSQL document '25 .3.4.Recovering Using a Continuous
> Archive Backup.'
> *Set "recovery_target_lsn = 'recovery_point_lsn'" at recovery.conf.
>
> Although there was pointed out that the source becomes complicated in the
> past, we could add the function by adding about 20 steps.
>
> What do you think about it? Do you think is it useful?
>

I think it is useful and simple.

Regards

Pavel


> --
> Naoki Yotsunaga
>
>
>
>
>


Re: Hint to set owner for tablespace directory

2018-08-31 Thread Maksim Milyutin

On 08/31/2018 11:55 AM, Rafia Sabih wrote:



Adding to that this patch needs a rebase. And, please don't forget to 
run 'git diff --check', as it has some white-space issues.


Hmm, that's odd. The provided patch applies cleanly on the current master
HEAD (2e39f69), and *git diff --check* doesn't print any warnings.


--
Regards, Maksim Milyutin



Re: Hint to set owner for tablespace directory

2018-08-31 Thread Rafia Sabih
On Fri, Aug 31, 2018 at 3:30 PM, Maksim Milyutin 
wrote:

> On 08/31/2018 11:55 AM, Rafia Sabih wrote:
>
>
> Adding to that this patch needs a rebase. And, please don't forget to run
> 'git diff --check', as it has some white-space issues.
>
>
>  git apply /home/edb/Desktop/patches/others/owner_tbsp_hint.patch
 /home/edb/Desktop/patches/others/owner_tbsp_hint.patch:9: trailing
whitespace.
{
/home/edb/Desktop/patches/others/owner_tbsp_hint.patch:10: trailing
whitespace.
bool wrong_owner = (errno == EPERM);
/home/edb/Desktop/patches/others/owner_tbsp_hint.patch:11: trailing
whitespace.

/home/edb/Desktop/patches/others/owner_tbsp_hint.patch:16: trailing
whitespace.
location),
/home/edb/Desktop/patches/others/owner_tbsp_hint.patch:17: trailing
whitespace.
 wrong_owner ? errhint("Install the PostgreSQL system
user as "
error: patch failed: src/backend/commands/tablespace.c:585
error: src/backend/commands/tablespace.c: patch does not apply

This is what I got. Please correct me if I am missing anything. I am using
CentOS 7, just an FYI.
--
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


Re: Negotiating the SCRAM channel binding type

2018-08-31 Thread Peter Eisentraut
On 05/08/2018 14:45, Michael Paquier wrote:
> On Sun, Aug 05, 2018 at 03:30:43PM +0300, Heikki Linnakangas wrote:
>> That test just tested that the scram_channel_binding libpq option works, but
>> I removed the option. I know you wanted to keep it as a feature flag, but as
>> discussed earlier, I don't think that'd be useful.
> 
> Sorry for the noise, I missed that there is still the test "Basic SCRAM
> authentication with SSL" so that would be fine.  I would have preferred
> keeping around the negative test so as we don't break SSL connections
> when the client enforced cbind_flag to 'n' as that would be useful when
> adding new SSL implementations as that would avoid manual tests which
> people will most likely forget, but well...

I was updating the gnutls patch for the changed channel binding setup,
and I noticed that the 002_scram.pl test now passes even though the
gnutls patch currently does not support channel binding.  So AFAICT,
we're not testing the channel binding functionality there at all.  Is
that as intended?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pg_verify_checksums vs windows

2018-08-31 Thread Amit Kapila
On Thu, Aug 30, 2018 at 5:04 PM Magnus Hagander  wrote:
>
> On Thu, Aug 30, 2018 at 1:32 PM, Amit Kapila  wrote:
>>
>>
>> Okay.  I will commit this in a day or so after once verifying it on
>> PG11 as well.  I think this needs to be backpatched, let me know if
>> you think otherwise.
>>
>
> Definitely a bug so yes, it needs backpatching.
>

Okay, pushed!

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Undo logs

2018-08-31 Thread Dilip Kumar
On Fri, Aug 31, 2018 at 3:08 PM, Dilip Kumar  wrote:
> Hello hackers,
>
> As Thomas has already mentioned upthread that we are working on an
> undo-log based storage and he has posted the patch sets for the lowest
> layer called undo-log-storage.
>
> This is the next layer which sits on top of the undo log storage,
> which will provide an interface for prepare, insert, or fetch the undo
> records. This layer will use undo-log-storage to reserve the space for
> the undo records and buffer management routine to write and read the
> undo records.
>
> To prepare an undo record, first, it will allocate required space
> using undo_log_storage module. Next, it will pin and lock the required
> buffers and return an undo record pointer where it will insert the
> record.  Finally, it calls the Insert routine for final insertion of
> prepared record. Additionally, there is a mechanism for multi-insert,
> wherein multiple records are prepared and inserted at a time.
>
> To fetch an undo record, a caller must provide a valid undo record
> pointer. Optionally, the caller can provide a callback function with
> the information of the block and offset, which will help in faster
> retrieval of undo record, otherwise, it has to traverse the undo-chain.
>
> These patch sets will apply on top of the undo-log-storage branch [1],
> commit id fa3803a048955c4961581e8757fe7263a98fe6e6.
>
> [1] https://github.com/EnterpriseDB/zheap/tree/undo-log-storage/
>
>
> undo_interface_v1.patch is the main patch for providing the undo interface.
> undo_interface_test_v1.patch is a simple test module to test the undo
> interface layer.
>

Thanks to Robert Haas for designing an early prototype for forming the undo
record. Later, I completed the remaining parts of the code, including undo
record prepare, insert, fetch, and other related APIs, with the help of
Rafia Sabih. Thanks to Amit Kapila for providing valuable design input.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Hint to set owner for tablespace directory

2018-08-31 Thread Maksim Milyutin

On 08/31/2018 01:12 PM, Rafia Sabih wrote:




On Fri, Aug 31, 2018 at 3:30 PM, Maksim Milyutin wrote:


On 08/31/2018 11:55 AM, Rafia Sabih wrote:



Adding to that this patch needs a rebase. And, please don't
forget to run 'git diff --check', as it has some white-space issues.


 git apply /home/edb/Desktop/patches/others/owner_tbsp_hint.patch
 /home/edb/Desktop/patches/others/owner_tbsp_hint.patch:9: trailing 
whitespace.

        {
/home/edb/Desktop/patches/others/owner_tbsp_hint.patch:10: trailing 
whitespace.

            bool wrong_owner = (errno == EPERM);


This is the hex+ASCII display (from *hexdump -C owner_tbsp_hint.patch*
output) of the code fragment above:

01a0  0a 20 09 09 65 6c 73 65  0a 2b 09 09 7b 0a 2b 09  |. ..else.+..{.+.|
01b0  09 09 62 6f 6f 6c 20 77  72 6f 6e 67 5f 6f 77 6e  |..bool wrong_own|
01c0  65 72 20 3d 20 28 65 72  72 6e 6f 20 3d 3d 20 45  |er = (errno == E|
01d0  50 45 52 4d 29 3b 0a 2b  0a 20 09 09 09 65 72 65  |PERM);.+. ...ere|


The problem lines don't have trailing whitespace, only line feed (0x0a) characters.

This is what I got. Please correct me if I am missng anything. I am 
using centos 7,  just an FYI.


My git version is 2.7.4.

--
Regards, Maksim Milyutin



RE: speeding up planning with partitions

2018-08-31 Thread Kato, Sho
Hi, Amit

Great!
With the total number of records fixed at 6400, I benchmarked while increasing
the number of partitions from 100 to 6400.
Applying all three patches, performance improved by 20% for 100 partitions.

I have the same number of records in each partition; do you do the same?

Also, in my case, performance was better without prepared statements.
I think these patches do not improve the execute case, so we need the faster
runtime pruning patch [1], right?

Details of measurement conditions and results are as follows.

- base source
  master(@777e6ddf17) + Speeding up Insert v8 patch[1]

- table definition(e.g. 100 partition)
  create table test.accounts(aid serial, abalance int) partition by range(aid);
  create table test.accounts_history(id serial, aid int, delta int, mtime timestamp without time zone) partition by range(aid);
  
  create table test.account_part_1 partition of test.accounts for values from (1) to (65);
  create table test.account_part_2 partition of test.accounts for values from (65) to (129);
  ...
  create table test.account_part_100 partition of test.accounts for values from (6337) to (6400);
  
  create table test.ah_part_1 partition of test.accounts_history for values from (1) to (65);
  create table test.ah_part_2 partition of test.accounts_history for values from (65) to (129);
  ...
  create table test.ah_part_100 partition of test.accounts_history for values from (6337) to (6400);

- benchmark script
  \set aid random(1, 6400)
  \set delta random(-5000, 5000)
  BEGIN;
  UPDATE test.accounts SET abalance = abalance + :delta WHERE aid = :aid;
  SELECT abalance FROM test.accounts WHERE aid = :aid;
  INSERT INTO test.accounts_history (aid, delta, mtime) VALUES (:aid, :delta, CURRENT_TIMESTAMP);
  END;

- command option
  pgbench -d testdb -f benchmark.pgbench -T 180 -r -n -M prepare
  pgbench -d testdb -f benchmark.pgbench -T 180 -r -n
  
-results
  base source no prepared
   part_num |   tps_ex   | update_latency | select_latency | insert_latency 
  ----------+------------+----------------+----------------+----------------
100 | 662.414805 |  0.357 |  0.265 |  0.421
200 | 494.478431 |  0.439 |  0.349 |  0.579
400 | 307.982089 |  0.651 |  0.558 |  0.927
800 | 191.360676 |  0.979 |  0.876 |  1.548
   1600 |  75.344947 |  2.253 |  2.003 |  4.301
   3200 |  30.643902 |  5.716 |  4.955 | 10.118
   6400 |  16.726056 | 12.512 |  8.582 | 18.054

  0001 no prepared
   part_num |   tps_ex   | update_latency | select_latency | insert_latency 
  ----------+------------+----------------+----------------+----------------
100 | 429.816329 |  0.811 |   0.75 |  0.365
200 | 275.211531 |  1.333 |  1.248 |  0.501
400 | 160.499833 |  2.384 |  2.252 |  0.754
800 |  79.387776 |  4.935 |  4.698 |  1.468
   1600 |  24.787377 | 16.593 | 15.954 |  4.302
   3200 |   9.846421 |  42.96 | 42.139 |  8.848
   6400 |   4.919772 |  87.43 | 83.109 |  16.56

  0001 prepared
   part_num |   tps_ex   | update_latency | select_latency | insert_latency 
  ----------+------------+----------------+----------------+----------------
100 | 245.100776 |  2.728 |  0.374 |  0.476
200 | 140.249283 |  5.116 |  0.603 |  0.686
400 |  67.608559 | 11.383 |  1.055 |  1.179
800 |  23.085806 | 35.781 |  2.585 |  2.677
   1600 |   6.211247 |141.093 |  7.096 |  6.785
   3200 |   1.808214 |508.045 | 15.741 | 13.243
   6400 |   0.495635 |   1919.362 | 37.691 | 28.177

  0001 + 0002 no prepared
   part_num |   tps_ex   | update_latency | select_latency | insert_latency 
  ----------+------------+----------------+----------------+----------------
100 |  682.53091 |  0.388 |   0.35 |   0.35
200 | 469.906601 |  0.543 |  0.496 |   0.51
400 | 321.915349 |   0.78 |  0.721 |  0.752
800 | 201.620975 |  1.246 |  1.156 |  1.236
   1600 |  94.438204 |  2.612 |  2.335 |  2.745
   3200 |  38.292922 |  6.657 |  5.579 |  6.808
   6400 |   21.48462 | 11.989 | 10.104 | 12.601

  0001 + 0002 prepared
   part_num |   tps_ex   | update_latency | select_latency | insert_latency 
  ----------+------------+----------------+----------------+----------------
100 |  591.10863 |  0.433 |  0.342 |  0.422
200

Re: Startup cost of sequential scan

2018-08-31 Thread Alexander Korotkov
On Thu, Aug 30, 2018 at 6:55 PM Tom Lane  wrote:
> Alexander Korotkov  writes:
> > I understand that startup cost is not "time to find the first row".
> > But I think this example highlight not one but two issues.
> > 1) Row count estimates for joins are wrong.
>
> Yup.
>
> > 2) Rows are assumed to be continuous while in reality they are
> > discrete.
>
> Where do you get that from?

I made a similar example where the estimates are good.  It's quite
artificial, because it selects a small LIMIT from an enormous virtual table
constructed by cross joins.  But it illustrates how our model, which assumes
tuples to be continuous, can differ from reality.

create table t1 (id int primary key, k int);
create table t2 (id int);

insert into t1 (select i, i from generate_series(1,100) i);
insert into t2 (select 0 from generate_series(1,100) i);
vacuum analyze t1, t2;

By default, the query is processed using a sequential scan on t1.

# explain analyze select * from t1,t2 x1,t2 x2,t2 x3,t2 x4,t2 x5 where
t1.id = 50 limit 100;
  QUERY PLAN
--
 Limit  (cost=0.00..1.26 rows=100 width=28) (actual
time=32.879..32.926 rows=100 loops=1)
   ->  Nested Loop  (cost=0.00..126279562.00 rows=100
width=28) (actual time=32.878..32.912 rows=100 loops=1)
 ->  Nested Loop  (cost=0.00..1279559.75 rows=1
width=24) (actual time=32.873..32.873 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..29557.50 rows=100
width=20) (actual time=32.868..32.868 rows=1 loops=1)
 ->  Nested Loop  (cost=0.00..17055.25 rows=1
width=16) (actual time=32.864..32.864 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..16928.00
rows=100 width=12) (actual time=32.856..32.856 rows=1 loops=1)
 ->  Seq Scan on t1
(cost=0.00..16925.00 rows=1 width=8) (actual time=32.841..32.841
rows=1 loops=1)
   Filter: (id = 50)
   Rows Removed by Filter: 501983
 ->  Seq Scan on t2 x1
(cost=0.00..2.00 rows=100 width=4) (actual time=0.011..0.011 rows=1
loops=1)
   ->  Materialize  (cost=0.00..2.50 rows=100
width=4) (actual time=0.007..0.007 rows=1 loops=1)
 ->  Seq Scan on t2 x2
(cost=0.00..2.00 rows=100 width=4) (actual time=0.003..0.003 rows=1
loops=1)
 ->  Materialize  (cost=0.00..2.50 rows=100
width=4) (actual time=0.003..0.003 rows=1 loops=1)
   ->  Seq Scan on t2 x3  (cost=0.00..2.00
rows=100 width=4) (actual time=0.002..0.003 rows=1 loops=1)
   ->  Materialize  (cost=0.00..2.50 rows=100 width=4)
(actual time=0.003..0.003 rows=1 loops=1)
 ->  Seq Scan on t2 x4  (cost=0.00..2.00 rows=100
width=4) (actual time=0.002..0.003 rows=1 loops=1)
 ->  Materialize  (cost=0.00..2.50 rows=100 width=4) (actual
time=0.004..0.024 rows=100 loops=1)
   ->  Seq Scan on t2 x5  (cost=0.00..2.00 rows=100
width=4) (actual time=0.003..0.010 rows=100 loops=1)
 Planning Time: 0.372 ms
 Execution Time: 32.987 ms
(20 rows)

But if we disable sequential scans and bitmap scans, then it uses an
index scan, which is much faster.
# set enable_seqscan = off;
# set enable_bitmapscan = off;

# explain analyze select * from t1,t2 x1,t2 x2,t2 x3,t2 x4,t2 x5 where
t1.id = 50 limit 100;

QUERY PLAN
-
 Limit  (cost=500.43..501.69 rows=100 width=28)
(actual time=0.041..0.092 rows=100 loops=1)
   ->  Nested Loop  (cost=500.43..50126262645.44
rows=100 width=28) (actual time=0.040..0.079 rows=100 loops=1)
 ->  Nested Loop  (cost=400.43..40001262643.19
rows=1 width=24) (actual time=0.036..0.036 rows=1 loops=1)
   ->  Nested Loop  (cost=300.42..3012640.94
rows=100 width=20) (actual time=0.033..0.033 rows=1 loops=1)
 ->  Nested Loop
(cost=200.42..2000138.69 rows=1 width=16) (actual
time=0.029..0.029 rows=1 loops=1)
   ->  Nested Loop
(cost=100.42..111.44 rows=100 width=12) (actual
time=0.023..0.023 rows=1 loops=1)
 ->  Index Scan using t1_pkey on t1
(cost=0.42..8.44 rows=1 width=8) (actual time=0.015..0.015 rows=1
loops=1)
   Index Cond: (id = 50)
 ->  Seq Scan on t2 x1
(cost=100.00..102.00 rows=100 width=4) (actual
time=0.007..0.007 rows=1 loops=1)
   ->  Materialize
(cost=100.00..102.50 rows=100 width=4) (a

Re: pg_verify_checksums -d option (was: Re: pg_verify_checksums -r option)

2018-08-31 Thread Yugo Nagata
On Mon, 27 Aug 2018 21:05:33 +0900
Yugo Nagata  wrote:

> On Mon, 27 Aug 2018 13:34:12 +0200
> Michael Banck  wrote:
> 
> > Hi,
> > 
> > On Mon, Aug 27, 2018 at 07:53:36PM +0900, Yugo Nagata wrote:
> > > On Fri, 24 Aug 2018 18:01:09 +0200
> > > Peter Eisentraut  wrote:
> > > > I'm curious about this option:
> > > > 
> > > >   -r RELFILENODE check only relation with specified relfilenode
> > > > 
> > > > but there is no facility to specify a database.
> > > > 
> > > > Also, referring to the relfilenode of a mapped relation seems a bit
> > > > inaccurate.
> > > > 
> > > > Maybe reframing this in terms of the file name of the file you want
> > > > checked would be better?
> > > 
> > > If we specified 1234 to -r option, pg_verify_shceksums checks not only 
> > > 1234
> > > but also 1234_vm, 1234_fsm, and 1234.1, 1234.2, ... and so on, so I think
> > > it makes senses to allow to specify a relfilenode instead of a file name.
> > > 
> > > I think it is reasonable to add a option to specify a database, although
> > > I don't know which character is good because both -d and -D are already 
> > > used
> > 
> > Maybe the -d (debug) option should be revisited as well. Mentioning
> > every scanned block generates a huge amount of output which might be
> > useful during development but does not seem very useful for a stable
> > release. AFAICT there is no other debug output for now.
> > 
> > So it could be renamed to -v (verbose) and only mention each scanned
> > file, e.g. (errors/checksum mismatches are still reported of course).
> > 
> > Then -d could (in the future, I guess that is too late for v11) be used
> > for -d/--dbname (or make that only a long option, if the above does not
> > work).
> 
> I realized after sending the previous post that we can not specify a database
> by name because pg_verify_checksum is run in offline and this can not get the
> OID from the database name.  Also, there are global and pg_tblspc directories
> not only base/. So, it seems to me good to specify a directories
> to scan which is under PGDATA. We would be able to use -d ( or --directory ?)
> for this purpose.

Attached is a patch to allow pg_verify_checksums to specify a database
to scan.  This is useful for users who want to verify checksums of relations
in a specific database. We can specify a database by OID using the -d or
--dboid option.
Also, when -g or --global-only is used, only shared relations are scanned.
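
Example invocations (the database OID is just an example value):

  pg_verify_checksums -D $PGDATA --dboid 16384   # only the database with OID 16384
  pg_verify_checksums -D $PGDATA --global-only   # only shared relations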

Regards,
-- 
Yugo Nagata 
>From 99135eec747b0f115b8fbdbf092c85808a70da85 Mon Sep 17 00:00:00 2001
From: Yugo Nagata 
Date: Thu, 30 Aug 2018 18:48:00 +0900
Subject: [PATCH] Allow pg_verify_checksums to specify a database to scan

---
 doc/src/sgml/ref/pg_verify_checksums.sgml | 20 +++
 .../pg_verify_checksums/pg_verify_checksums.c | 35 ---
 2 files changed, 51 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/ref/pg_verify_checksums.sgml b/doc/src/sgml/ref/pg_verify_checksums.sgml
index 3a3433b1c8..782cf35ca0 100644
--- a/doc/src/sgml/ref/pg_verify_checksums.sgml
+++ b/doc/src/sgml/ref/pg_verify_checksums.sgml
@@ -70,6 +70,26 @@ PostgreSQL documentation
   
  
 
+ 
+  -d oid
+  --dboid=oid
+  
+   
+Only validate checksums in the relations in the database with specified OID.
+   
+  
+ 
+
+ 
+  -g
+  --global-only
+  
+   
+Only validate checksums in the relations in the shared database.
+   
+  
+ 
+
  
   -r relfilenode
   
diff --git a/src/bin/pg_verify_checksums/pg_verify_checksums.c b/src/bin/pg_verify_checksums/pg_verify_checksums.c
index 1eb3bed2b9..0b9c03ac30 100644
--- a/src/bin/pg_verify_checksums/pg_verify_checksums.c
+++ b/src/bin/pg_verify_checksums/pg_verify_checksums.c
@@ -31,6 +31,8 @@ static int64 badblocks = 0;
 static ControlFileData *ControlFile;
 
 static char *only_relfilenode = NULL;
+static char *only_dboid = NULL;
+static bool only_global = false;
 static bool verbose = false;
 
 static const char *progname;
@@ -44,6 +46,8 @@ usage()
 	printf(_("\nOptions:\n"));
 	printf(_(" [-D, --pgdata=]DATADIR  data directory\n"));
 	printf(_("  -v, --verbose  output verbose messages, list all checked files\n"));
+	printf(_("  -d, --dboid=OIDcheck only relations in database with specified OID\n"));
+	printf(_("  -g, --global-only  check only shared relations\n"));
 	printf(_("  -r RELFILENODE check only relation with specified relfilenode\n"));
 	printf(_("  -V, --version  output version information, then exit\n"));
 	printf(_("  -?, --help show this help, then exit\n"));
@@ -198,7 +202,13 @@ scan_directory(char *basedir, char *subdir)
 #else
 		else if (S_ISDIR(st.st_mode) || pgwin32_is_junction(fn))
 #endif
+		{
+			if (atoi(de->d_name) != 0 && strcmp(subdir, "pg_tblspc") &&
+				only_dboid && strcmp(only_dboid, de->d_name) != 0)
+				continue;
+
 			scan_directory(path, de->d_name);
+		}
 	}
 	closedir(dir);
 }
@@ -

Re: pg_verify_checksums and -fno-strict-aliasing

2018-08-31 Thread Peter Eisentraut
On 31/08/2018 01:37, Tom Lane wrote:
> The fact that some of these are pretty old and we've not noticed is
> not good.  It suggests that we don't currently have any compilers in the
> buildfarm that under-align char[] arrays on the stack, which seems like
> a gotcha waiting to bite us.  I wonder if there is any way to persuade
> some compiler on a non-Intel box to do that.

-Wcast-align=strict (requires gcc-8) warns about the issue in
pg_verify_checksums.c, but also about a handful^W^W15211 other things,
so it wouldn't be immediately useful.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] GnuTLS support

2018-08-31 Thread Peter Eisentraut
On 20/08/2018 05:13, Michael Paquier wrote:
> Patch v6 of this thread is failing to apply.  Could you rebase?

attached

Changes in v7 since v6:

- Added support for ssl_passphrase_command.

- Test suite needed some adjustment because GnuTLS doesn't appear to
understand the previously used file format for encrypted keys.

- Removed tls-unique channel binding support.  Support for
tls-server-end-point still needs to be added, but it could be a separate
patch.
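
For anyone who wants to try it out, the build is selected at configure time
(and, per the configure changes, it is mutually exclusive with --with-openssl):

  ./configure --with-gnutls
  make
  make -C src/test/ssl check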

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 85465f4ad3e210a3948216ebc5c6fbd8c6993bdb Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Fri, 31 Aug 2018 13:00:26 +0200
Subject: [PATCH v7] GnuTLS support

---
 configure | 166 +++-
 configure.in  |  37 +-
 doc/src/sgml/client-auth.sgml |   2 +-
 doc/src/sgml/config.sgml  |  69 +-
 doc/src/sgml/installation.sgml|  47 +-
 doc/src/sgml/libpq.sgml   |  54 +-
 doc/src/sgml/runtime.sgml |  13 +-
 doc/src/sgml/sslinfo.sgml |   1 +
 src/Makefile.global.in|   1 +
 src/backend/libpq/Makefile|   4 +-
 src/backend/libpq/be-secure-gnutls.c  | 795 +
 src/backend/libpq/be-secure-openssl.c |   4 +-
 src/backend/libpq/be-secure.c |   3 +
 src/backend/libpq/hba.c   |   2 +-
 src/backend/utils/misc/guc.c  |  25 +
 src/backend/utils/misc/postgresql.conf.sample |  11 +-
 src/common/Makefile   |   4 +-
 src/common/sha2_gnutls.c  |  99 +++
 src/include/common/sha2.h |  14 +-
 src/include/libpq/libpq-be.h  |  15 +-
 src/include/libpq/libpq.h |   1 +
 src/include/pg_config.h.in|  17 +
 src/include/pg_config_manual.h|   2 +-
 src/interfaces/libpq/.gitignore   |   1 +
 src/interfaces/libpq/Makefile |  14 +-
 src/interfaces/libpq/fe-secure-gnutls.c   | 803 ++
 src/interfaces/libpq/fe-secure.c  |   2 +-
 src/interfaces/libpq/libpq-fe.h   |   2 +-
 src/interfaces/libpq/libpq-int.h  |  14 +-
 src/port/pg_strong_random.c   |  18 +-
 src/test/Makefile |   2 +-
 src/test/ssl/Makefile |   7 +-
 src/test/ssl/ssl/server-password.key  |  35 +-
 src/test/ssl/t/001_ssltests.pl|  63 +-
 src/test/ssl/t/002_scram.pl   |   2 +-
 src/tools/msvc/Mkvcbuild.pm   |  10 +
 src/tools/pgindent/typedefs.list  |   3 +
 37 files changed, 2248 insertions(+), 114 deletions(-)
 create mode 100644 src/backend/libpq/be-secure-gnutls.c
 create mode 100644 src/common/sha2_gnutls.c
 create mode 100644 src/interfaces/libpq/fe-secure-gnutls.c

diff --git a/configure b/configure
index dd94c5bbab..564f33ae5d 100755
--- a/configure
+++ b/configure
@@ -707,6 +707,7 @@ UUID_EXTRA_OBJS
 with_uuid
 with_systemd
 with_selinux
+with_gnutls
 with_openssl
 with_ldap
 with_krb_srvnam
@@ -853,6 +854,7 @@ with_bsd_auth
 with_ldap
 with_bonjour
 with_openssl
+with_gnutls
 with_selinux
 with_systemd
 with_readline
@@ -1553,6 +1555,7 @@ Optional Packages:
   --with-ldap build with LDAP support
   --with-bonjour  build with Bonjour support
   --with-openssl  build with OpenSSL support
+  --with-gnutls   build with GnuTLS support
   --with-selinux  build with SELinux support
   --with-systemd  build with systemd support
   --without-readline  do not use GNU Readline nor BSD Libedit for editing
@@ -7933,6 +7936,47 @@ fi
 $as_echo "$with_openssl" >&6; }
 
 
+
+#
+# GnuTLS
+#
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether to build with GnuTLS 
support" >&5
+$as_echo_n "checking whether to build with GnuTLS support... " >&6; }
+
+
+
+# Check whether --with-gnutls was given.
+if test "${with_gnutls+set}" = set; then :
+  withval=$with_gnutls;
+  case $withval in
+yes)
+
+$as_echo "#define USE_GNUTLS 1" >>confdefs.h
+
+  ;;
+no)
+  :
+  ;;
+*)
+  as_fn_error $? "no argument expected for --with-gnutls option" "$LINENO" 
5
+  ;;
+  esac
+
+else
+  with_gnutls=no
+
+fi
+
+
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $with_gnutls" >&5
+$as_echo "$with_gnutls" >&6; }
+
+
+if test "$with_openssl" = yes && test "$with_gnutls" = yes; then
+  as_fn_error $? "cannot specify both --with-openssl and --with-gnutls" 
"$LINENO" 5
+fi
+
+
 #
 # SELinux
 #
@@ -12052,6 +12096,107 @@ done
 
 fi
 
+if test "$with_gnutls" = yes ; then
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking for library containing 
gnutls_init" >&5
+$as_echo_n "checking for library 

Re: FETCH FIRST clause PERCENT option

2018-08-31 Thread Surafel Temesgen
On Tue, Aug 28, 2018 at 7:33 PM Erik Rijkers  wrote:

> ;
>
> TRAP: FailedAssertion("!(slot != ((void *)0))", File: "execTuples.c",
> Line: 42

The attached patch includes a fix for the crash. Can you check it again?
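
For reference, the syntax the patch adds looks like this (the table name is
just an example):

  SELECT * FROM mytable FETCH FIRST 10 PERCENT ROWS ONLY;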
Regards
Surafel
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 4db8142afa..8491b7831a 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -44,7 +44,7 @@ SELECT [ ALL | DISTINCT [ ON ( expressionexpression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ]
 [ LIMIT { count | ALL } ]
 [ OFFSET start [ ROW | ROWS ] ]
-[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY ]
+[ FETCH { FIRST | NEXT } [ count ] [ PERCENT ] { ROW | ROWS } ONLY ]
 [ FOR { UPDATE | NO KEY UPDATE | SHARE | KEY SHARE } [ OF table_name [, ...] ] [ NOWAIT | SKIP LOCKED ] [...] ]
 
 where from_item can be one of:
@@ -1397,7 +1397,7 @@ OFFSET start
 which PostgreSQL also supports.  It is:
 
 OFFSET start { ROW | ROWS }
-FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY
+FETCH { FIRST | NEXT } [ count ] [ PERCENT ] { ROW | ROWS } ONLY
 
 In this syntax, the start
 or count value is required by
@@ -1407,7 +1407,8 @@ FETCH { FIRST | NEXT } [ count ] {
 ambiguity.
 If count is
 omitted in a FETCH clause, it defaults to 1.
-ROW
+with PERCENT count specifies the maximum number of rows to return 
+in percentage.ROW
 and ROWS as well as FIRST
 and NEXT are noise words that don't influence
 the effects of these clauses.
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index bb28cf7d1d..e842e17542 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -43,6 +43,7 @@ ExecLimit(PlanState *pstate)
 	LimitState *node = castNode(LimitState, pstate);
 	ScanDirection direction;
 	TupleTableSlot *slot;
+	TupleDesc   tupleDescriptor;
 	PlanState  *outerPlan;
 
 	CHECK_FOR_INTERRUPTS();
@@ -52,6 +53,8 @@ ExecLimit(PlanState *pstate)
 	 */
 	direction = node->ps.state->es_direction;
 	outerPlan = outerPlanState(node);
+	slot = node->ps.ps_ResultTupleSlot;
+	tupleDescriptor = node->ps.ps_ResultTupleSlot->tts_tupleDescriptor;
 
 	/*
 	 * The main logic is a simple state machine.
@@ -60,6 +63,23 @@ ExecLimit(PlanState *pstate)
 	{
 		case LIMIT_INITIAL:
 
+			if (node->limitOption == PERCENTAGE)
+			{
+
+			/*
+			 * Find all rows the plan will return.
+			 */
+for (;;)
+{
+	slot = ExecProcNode(outerPlan);
+	if (TupIsNull(slot))
+	{
+		break;
+	}
+	tuplestore_puttupleslot(node->totalTuple, slot);
+}
+			}
+
 			/*
 			 * First call for this node, so compute limit/offset. (We can't do
 			 * this any earlier, because parameters from upper nodes will not
@@ -87,24 +107,46 @@ ExecLimit(PlanState *pstate)
 return NULL;
 			}
 
-			/*
-			 * Fetch rows from subplan until we reach position > offset.
-			 */
-			for (;;)
+			if (node->limitOption == PERCENTAGE)
 			{
-slot = ExecProcNode(outerPlan);
-if (TupIsNull(slot))
+for (;;)
 {
-	/*
-	 * The subplan returns too few tuples for us to produce
-	 * any output at all.
-	 */
-	node->lstate = LIMIT_EMPTY;
-	return NULL;
+	slot = MakeSingleTupleTableSlot(tupleDescriptor);
+	if (!tuplestore_gettupleslot(node->totalTuple, true, true, slot))
+	{
+		node->lstate = LIMIT_EMPTY;
+		return NULL;
+	}
+	else
+	{
+		node->subSlot = slot;
+		if (++node->position > node->offset)
+		break;
+	}
+}
+			}
+			else if (node->limitOption == EXACT_NUMBER)
+			{
+
+/*
+ * Fetch rows from subplan until we reach position > offset.
+ */
+for (;;)
+{
+	slot = ExecProcNode(outerPlan);
+	if (TupIsNull(slot))
+	{
+		/*
+		 * The subplan returns too few tuples for us to produce
+		 * any output at all.
+		 */
+		node->lstate = LIMIT_EMPTY;
+		return NULL;
+	}
+	node->subSlot = slot;
+	if (++node->position > node->offset)
+		break;
 }
-node->subSlot = slot;
-if (++node->position > node->offset)
-	break;
 			}
 
 			/*
@@ -144,18 +186,34 @@ ExecLimit(PlanState *pstate)
 
 	return NULL;
 }
+if (node->limitOption == PERCENTAGE)
+{
+	if (tuplestore_gettupleslot(node->totalTuple, true, false, slot))
+	{
+		node->subSlot = slot;
+		node->position++;
+	}
+	else
+	{
+		node->lstate = LIMIT_SUBPLANEOF;
+		return NULL;
+	}
+}
+else if (node->limitOption == EXACT_NUMBER)
+{
 
-/*
- * Get next tuple from subplan, if any.
- */
-slot = ExecProcNode(outerPlan);
-if (TupIsNull(slot))
-{
-	node->lstate = LIMIT_SUBPLANEOF;
-	return NULL;
+	 /*
+	 * Get next tuple from subplan, if any.
+	 */
+	slot = ExecProcNode(outerPlan);
+	if (TupIsNull(slot))
+	{
+		node->lstate = LIMIT_SUBPLANEOF;
+		return

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-08-31 Thread Etsuro Fujita

(2018/08/30 20:25), Etsuro Fujita wrote:

(2018/08/29 18:40), Etsuro Fujita wrote:

(2018/08/29 0:21), Jonathan S. Katz wrote:

On Aug 24, 2018, at 8:38 AM, Etsuro
Fujita wrote:
(2018/08/24 11:47), Michael Paquier wrote:

On Thu, Aug 23, 2018 at 10:00:49PM +0900, Etsuro Fujita wrote:

I tried this today, but doing git behind the corporate firewall
doesn't
work. I don't know the clear cause of that, so I'll investigate that
tomorrow.


You may be able to tweak that by using https as origin point or proper
git proxy settings?


Yeah, my proxy settings were not correct. With the help of my
colleagues Horiguchi-san and Yamada-san, I corrected them but still
can't clone the master repository. Running git with GIT_CURL_VERBOSE
shows that there is another issue in my terminal environment, so I'm
trying to resolve that.


Are there any updates on getting this patch committed?


That investigation has shown that the cause is my company firewall, not
my terminal environment; that firewall has to be configured to allow me
to access to that repository. So, I'm asking my company about that.


I got the approval from my boss today, so I think that I can have that
access soon.


I fixed typos in the commit message, which Alvaro pointed out off-list, 
and revised the message a little bit.  Also, I reread the patch and 
noticed that the latest version includes now-unnecessary regression test 
cases; those were added to the previous version 
(refuse-pwj-when-wrvs-involved-2.patch in [1]) to check that 
apply_scanjoin_target_to_paths() and create_ordinary_grouping_paths() 
work for cases where whole-row Vars are involved, because I modified 
those functions, but the latest version doesn't touch those functions 
anymore, so those test cases seem unnecessary.  So I removed those (no 
other changes), and committed the patch.


Best regards,
Etsuro Fujita

[1] https://www.postgresql.org/message-id/5B683F60.2070806%40lab.ntt.co.jp



Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-08-31 Thread Jonathan S. Katz
On 8/31/18 7:54 AM, Etsuro Fujita wrote:
> (2018/08/30 20:25), Etsuro Fujita wrote:
>> (2018/08/29 18:40), Etsuro Fujita wrote:
>>> (2018/08/29 0:21), Jonathan S. Katz wrote:
> On Aug 24, 2018, at 8:38 AM, Etsuro
> Fujita wrote:
> (2018/08/24 11:47), Michael Paquier wrote:
>> On Thu, Aug 23, 2018 at 10:00:49PM +0900, Etsuro Fujita wrote:
>>> I tried this today, but doing git behind the corporate firewall
>>> doesn't
>>> work. I don't know the clear cause of that, so I'll investigate
>>> that
>>> tomorrow.
>>
>> You may be able to tweak that by using https as origin point or
>> proper
>> git proxy settings?
>
> Yeah, my proxy settings were not correct. With the help of my
> colleagues Horiguchi-san and Yamada-san, I corrected them but still
> can't clone the master repository. Running git with GIT_CURL_VERBOSE
> shows that there is another issue in my terminal environment, so I'm
> trying to resolve that.

 Are there any updates on getting this patch committed?
>>>
>>> That investigation has shown that the cause is my company firewall, not
>>> my terminal environment; that firewall has to be configured to allow me
>>> to access to that repository. So, I'm asking my company about that.
>>
>> I got the approval from my boss today, so I think that I can have that
>> access soon.
>
> I fixed typos in the commit message, which Alvaro pointed out
> off-list, and revised the message a little bit.  Also, I reread the
> patch and noticed that the latest version includes now-unnecessary
> regression test cases; those were added to the previous version
> (refuse-pwj-when-wrvs-involved-2.patch in [1]) to check that
> apply_scanjoin_target_to_paths() and create_ordinary_grouping_paths()
> work for cases where whole-row Vars are involved, because I modified
> those functions, but the latest version doesn't touch those functions
> anymore, so those test cases seems unnecessary.  So I removed those
> (no other changes), and committed the patch.

Thank you for taking care of that and for committing the patch. I have
now closed this issue on the open items list.

Jonathan




signature.asc
Description: OpenPGP digital signature


Re: Hint to set owner for tablespace directory

2018-08-31 Thread Maksim Milyutin

30.08.2018 19:52, Peter Eisentraut wrote:


On 23/08/2018 13:24, Maksim Milyutin wrote:

I have noticed the novice users are stuck trying to create tablespace
over a directory whose owner is not the system postgres user. They
observed the message "could not set permissions on directory ...:
permission denied".

I want to add patch that prints hint to set required owner for the
tablespace directory if this is the cause of the problem (*errno ==
EPERM* after calling *chmod*).

I think the hint is backwards.  When you don't have permission to chmod
the tablespace directory, you probably want to fix the permissions on
the tablespace directory or its parent.


According to the man page of chmod(2), when permissions on the parent 
directories are insufficient, errno takes the EACCES value. However, the 
Linux man page of chmod(2) mentions that errno is EPERM when the process 
is not privileged (does not have the CAP_FOWNER capability).



But the hint says to run the database server as a different user.  That's a bit 
unusual.


In this case I propose to:
- replace my initial hint message with a guess about what to do if errno == 
EPERM, something like "You might need to install the PostgreSQL system user 
as the owner of this directory";
- add another hint if errno == EACCES: "Fix permissions on the parent 
directories".


Any thoughts?

--
Regards,
Maksim Milyutin




Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-08-31 Thread Etsuro Fujita

(2018/08/31 21:30), Jonathan S. Katz wrote:

On 8/31/18 7:54 AM, Etsuro Fujita wrote:

(2018/08/30 20:25), Etsuro Fujita wrote:

(2018/08/29 18:40), Etsuro Fujita wrote:

(2018/08/29 0:21), Jonathan S. Katz wrote:

On Aug 24, 2018, at 8:38 AM, Etsuro
Fujita  wrote:
(2018/08/24 11:47), Michael Paquier wrote:

On Thu, Aug 23, 2018 at 10:00:49PM +0900, Etsuro Fujita wrote:

I tried this today, but doing git behind the corporate firewall
doesn't
work. I don't know the clear cause of that, so I'll investigate
that
tomorrow.


You may be able to tweak that by using https as origin point or
proper
git proxy settings?


Yeah, my proxy settings were not correct. With the help of my
colleagues Horiguchi-san and Yamada-san, I corrected them but still
can't clone the master repository. Running git with GIT_CURL_VERBOSE
shows that there is another issue in my terminal environment, so I'm
trying to resolve that.


Are there any updates on getting this patch committed?


That investigation has shown that the cause is my company firewall, not
my terminal environment; that firewall has to be configured to allow me
to access to that repository. So, I'm asking my company about that.


I got the approval from my boss today, so I think that I can have that
access soon.


I fixed typos in the commit message, which Alvaro pointed out
off-list, and revised the message a little bit.  Also, I reread the
patch and noticed that the latest version includes now-unnecessary
regression test cases; those were added to the previous version
(refuse-pwj-when-wrvs-involved-2.patch in [1]) to check that
apply_scanjoin_target_to_paths() and create_ordinary_grouping_paths()
work for cases where whole-row Vars are involved, because I modified
those functions, but the latest version doesn't touch those functions
anymore, so those test cases seems unnecessary.  So I removed those
(no other changes), and committed the patch.


Thank you for taking care of that and for committing the patch. I have
now closed this issues on the open items list.


Thanks!

Best regards,
Etsuro Fujita



Progress reporting for pg_verify_checksums

2018-08-31 Thread Michael Banck
Hi,

my colleague Bernd Helmle recently added progress reporting to our
pg_checksums application[1]. I have now forward ported this to
pg_verify_checksums for the September commitfest, please see the
attached patch.

Here's the description:

This optionally prints the progress of pg_verify_checksums via read
kilobytes to the terminal with the new command line argument -P.

If -P was forgotten and pg_verify_checksums operates on a large cluster,
the caller can send SIGUSR1 to pg_verify_checksums to turn progress
status reporting on during runtime.


Michael

[1] https://github.com/credativ/pg_checksums/commit/2b691
-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz

commit ad69b43a24a6eef8a7deef4c9810ee312f3496fb
Author: Michael Banck 
Date:   Fri Aug 31 13:04:59 2018 +0200

Add progress reporting to pg_verify_checksums.

This optionally prints the progress of pg_verify_checksums via read kilobytes
to the terminal with the new command line argument -P.

If -P was forgotten and pg_verify_checksums operates on a large cluster, the
caller can send SIGUSR1 to pg_verify_checksums to turn progress status
reporting on during runtime.

Author: Bernd Helmle

diff --git a/src/bin/pg_verify_checksums/pg_verify_checksums.c b/src/bin/pg_verify_checksums/pg_verify_checksums.c
index a941236563..27b1103902 100644
--- a/src/bin/pg_verify_checksums/pg_verify_checksums.c
+++ b/src/bin/pg_verify_checksums/pg_verify_checksums.c
@@ -21,6 +21,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include "pg_getopt.h"
 
@@ -32,9 +34,18 @@ static ControlFileData *ControlFile;
 
 static char *only_relfilenode = NULL;
 static bool verbose = false;
+static bool show_progress = false;
 
 static const char *progname;
 
+/*
+ * Progress status information.
+ */
+int64		total_size = 0;
+int64		current_size = 0;
+pg_time_t	last_progress_update;
+pg_time_t	scan_started;
+
 static void
 usage()
 {
@@ -45,6 +56,7 @@ usage()
 	printf(_(" [-D, --pgdata=]DATADIR  data directory\n"));
 	printf(_("  -v, --verbose  output verbose messages\n"));
 	printf(_("  -r RELFILENODE check only relation with specified relfilenode\n"));
+	printf(_("  -P show progress information\n"));
 	printf(_("  -V, --version  output version information, then exit\n"));
 	printf(_("  -?, --help show this help, then exit\n"));
 	printf(_("\nIf no data directory (DATADIR) is specified, "
@@ -60,6 +72,69 @@ static const char *skip[] = {
 	NULL,
 };
 
+static void
+enable_progress_report(int signo,
+	   siginfo_t * siginfo,
+	   void *context)
+{
+
+	/* we handle SIGUSR1 only */
+	if (signo == SIGUSR1)
+	{
+		show_progress = true;
+	}
+
+}
+
+/*
+ * Report current progress status. Parts borrowed from
+ * PostgreSQLs' src/bin/pg_basebackup.c
+ */
+static void
+report_progress(bool force)
+{
+	pg_time_t	now = time(NULL);
+	int			total_percent = 0;
+
+	char		totalstr[32];
+	char		currentstr[32];
+	char		currspeedstr[32];
+
+	/* Make sure we just report at least once a second */
+	if ((now == last_progress_update) && !force)
+		return;
+
+	/* Save current time */
+	last_progress_update = now;
+
+	/*
+	 * Calculate current percent done, based on KiB...
+	 */
+	total_percent = total_size ? (int64) ((current_size / 1024) * 100 / (total_size / 1024)) : 0;
+
+	/* Don't display larger than 100% */
+	if (total_percent > 100)
+		total_percent = 100;
+
+	/* The same for total size */
+	if (current_size > total_size)
+		total_size = current_size / 1024;
+
+	snprintf(totalstr, sizeof(totalstr), INT64_FORMAT,
+			 total_size / 1024);
+	snprintf(currentstr, sizeof(currentstr), INT64_FORMAT,
+			 current_size / 1024);
+	snprintf(currspeedstr, sizeof(currspeedstr), INT64_FORMAT,
+			 (current_size / 1024) / (((time(NULL) - scan_started) == 0) ? 1 : (time(NULL) - scan_started)));
+	fprintf(stderr, "%s/%s kB (%d%%, %s kB/s)",
+			currentstr, totalstr, total_percent, currspeedstr);
+
+	if (isatty(fileno(stderr)))
+		fprintf(stderr, "\r");
+	else
+		fprintf(stderr, "\n");
+}
+
 static bool
 skipfile(char *fn)
 {
@@ -76,7 +151,7 @@ skipfile(char *fn)
 }
 
 static void
-scan_file(char *fn, int segmentno)
+scan_file(char *fn, int segmentno, bool sizeonly)
 {
 	char		buf[BLCKSZ];
 	PageHeader	header = (PageHeader) buf;
@@ -113,6 +188,7 @@ scan_file(char *fn, int segmentno)
 			continue;
 
 		csum = pg_checksum_page(buf, blockno + segmentno * RELSEG_SIZE);
+		current_size += r;
 		if (csum != header->pd_checksum)
 		{
 			if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
@@ -120,6 +196,9 @@ scan_file(char *fn, int segme

TR: redundant constraint_schema

2018-08-31 Thread Olivier Leprêtre
Hi,

Please find a question that didn't get an answer in the pgsql-sql list. I
hope I'll get an answer here.

Thanks,

Olivier

From: Olivier Leprêtre [mailto:o.lepre...@gmail.com]
Sent: Wednesday, August 29, 2018 15:49
To: 'pgsql-...@lists.postgresql.org'
Subject: redundant constraint_schema

Hi,

Can someone explain why, when a column is not created (add column if not
exists), a redundant constraint is still created from the REFERENCES part?

I have a patching script that is supposed to add a column if it does not exist:

ALTER TABLE myschem.table1
  ADD COLUMN IF NOT EXISTS col1 VARCHAR(254) REFERENCES myschem.table2(col2)

When col1 already exists, I expected that nothing would happen. But when
applying the previous query and then querying:

select constraint_name from information_schema.key_column_usage where
constraint_schema='myschem'

I notice that a new constraint "table1_col2_fkeyxxx" is created each time
the previous ALTER TABLE is called (with xxx being a new number each time).

Thanks,

Olivier



PostgreSQL logical decoder output plugin - unchanged toast data

2018-08-31 Thread Georgy Buranov
Hi. I have a question about the PostgreSQL logical decoder output plugin.

I am not a specialist in Postgres at all, so maybe I am missing some very
basic point. In the plugin, I want to always get all the values (even
those that are unchanged toast data).


When I try to get the datum that is internal on disk (here is code)

```
struct varlena *s = (struct varlena *)DatumGetPointer(origval);
struct varlena * ret = heap_tuple_untoast_attr(s);
Datum result = PointerGetDatum(PG_DETOAST_DATUM(ret));
```
it fails with a "no known snapshots" error (from heap_tuple_untoast_attr).
My question is why that is.

So, even theoretically, it is not possible to get the varlena on disk
from a logical replication plugin?



Mit freundlichen Grüßen,
Georgy Buranov



Re: Hint to set owner for tablespace directory

2018-08-31 Thread Tom Lane
Maksim Milyutin  writes:
> 30.08.2018 19:52, Peter Eisentraut wrote:
>> I think the hint is backwards.  When you don't have permission to chmod
>> the tablespace directory, you probably want to fix the permissions on
>> the tablespace directory or its parent.

> In this case I propose to:
> - replace my initial hint message to the guess what to do if errno == 
> EPERM, smth like "You might need to install the PostgreSQL system user 
> as the owner of this directory";
> - add another hint if errno == EACCES: "Fix permissions on the parent 
> directories".

I agree with what I think Peter is saying: the hint should just recommend
fixing permissions on the directory, for either errno code.  The more
detail you try to give, the more likely the hint will be wrong
... especially when you're dealing with errno codes that aren't all that
well standardized, as these aren't.

regards, tom lane



Re: psql \dC incorrectly shows casts "with inout" as "binary coercible" on 9.5.14 and 11beta3

2018-08-31 Thread Tom Lane
"jean.pierre.pelletier0"  writes:
> To reproduce, compare the output of \dC on two built-in casts (json to jsonb) 
> and (xml to text) where only the first is really "with inout".

Hm, yeah, it just does

  "   CASE WHEN castfunc = 0 THEN '(binary 
coercible)'\n"
  "ELSE p.proname\n"
  "   END as \"%s\",\n"

without regard for the castmethod column (which it can't necessarily
assume is there, anyway).  It's hard to be sure after all these
years whether this was intentional or just an oversight, unless maybe
Heikki remembers ... but I tend to agree that "(with inout)" would be
more apropos than "(binary coercible)".

Not sure if this rises to the level of a back-patchable bug.
People might be surprised if we change that output in minor releases.
But we could still squeeze it into v11, I think.

regards, tom lane



Re: PostgreSQL logical decoder output plugin - unchanged toast data

2018-08-31 Thread Andres Freund
Hi,

On 2018-08-31 15:36:26 +0200, Georgy Buranov wrote:
> I am not specialist in Postgres at all, so maybe I miss some very
> basic point. In the plugin, I want to always get all the values (even
> those that are unchanged  toast data)

> When I try to get the datum that is internal on disk (here is code)
> 
> ```
> struct varlena *s = (struct varlena *)DatumGetPointer(origval);
> struct varlena * ret = heap_tuple_untoast_attr(s);
> Datum result = PointerGetDatum(PG_DETOAST_DATUM(ret));
> ```
> it fails with no known snapshots error (from heap_tuple_untoast_attr).
> My question is why is it that.

Yes, that's not possible in general. On-disk toasted data for tuples
from the WAL are not guaranteed in any way to be retained. If that
weren't the case database tables would bloat while logical replication
is behind, and the sequential reads (i.e. fast) reads of logical
decoding would turn into random IO.

You can however alter the replication identity of tables to FULL. Then
the "old" tuple in change callbacks will have the full old tuple. But
that will increase the size of the WAL stream obviously.


> So, even theoretically it is not possible to the the varlena on disk
> from logical replication plugin?

Correct.

Greetings,

Andres Freund



Re: PostgreSQL logical decoder output plugin - unchanged toast data

2018-08-31 Thread Georgy Buranov
Ok, thank you very much for your explanation,

maybe I need something else in my case.

As far as I understand, "On-disk toasted data for tuples from the WAL
are not guaranteed in any way to be retained", but still, the LATEST
value for the same cell should exist in Postgres (in on-disk TOAST if
it is huge, or not).

If I cannot get access to the on-disk toasted data for a tuple from the
WAL, can I have access to the _latest_ value in this case
(hopefully I described it correctly)?

> Yes, that's not possible in general. On-disk toasted data for tuples
> from the WAL are not guaranteed in any way to be retained. If that
> weren't the case database tables would bloat while logical replication
> is behind, and the sequential reads (i.e. fast) reads of logical
> decoding would turn into random IO.

Mit freundlichen Grüßen,
Georgy Buranov



Re: pg_verify_checksums and -fno-strict-aliasing

2018-08-31 Thread Tom Lane
Michael Paquier  writes:
> On Thu, Aug 30, 2018 at 07:37:37PM -0400, Tom Lane wrote:
>> Anyway, I'll work on a patch for that, unless you were on it already?

> I have begun working on that and am less than halfway through it as I
> needed a fresh problem, however I am not sure I would be able to finish
> it today, perhaps tomorrow...  If you have time and want to press on as
> 11 would release soon, of course feel free to wrap up more quickly than
> I can.

Some of these places might be performance-critical enough that adding
a palloc/pfree cycle would not be nice.  What I was considering doing
was inventing something like

typedef union PGAlignedBuffer
{
    char        data[BLCKSZ];
    double      force_align;
} PGAlignedBuffer;

and replacing plain char[] variables with that as appropriate.
We might need one for XLOG_BLCKSZ too.

Since this'd be used in frontend as well as backend, it'd likely
have to end up in c.h, which is slightly annoying but not really
a big deal as long as we pick a nonconflicting type name.

regards, tom lane
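
As a quick illustration (a sketch only, not the actual patch -- the file
reading routine below is made up for the example), a stack variable declared
with such a union can be passed to read() and then safely reinterpreted as a
Page/PageHeader, which is the pattern currently written with char[BLCKSZ]
buffers:

```
#include "postgres.h"

#include <unistd.h>

#include "storage/bufpage.h"

typedef union PGAlignedBuffer
{
	char		data[BLCKSZ];
	double		force_align;
} PGAlignedBuffer;

static bool
block_is_new(int fd)
{
	PGAlignedBuffer buf;		/* instead of: char buf[BLCKSZ]; */

	if (read(fd, buf.data, BLCKSZ) != BLCKSZ)
		return false;

	/* buf.data is suitably aligned, so the PageHeader cast inside is safe */
	return PageIsNew((Page) buf.data);
}
```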



Re: [HACKERS] WIP: Aggregation push-down

2018-08-31 Thread Antonin Houska
Antonin Houska  wrote:

> I've reworked the patch so that separate RelOptInfo is used for grouped
> relation. The attached patch is only the core part. Neither postgres_fdw
> changes nor the part that tries to avoid the final aggregation is included
> here. I'll add these when the patch does not seem to need another major 
> rework.

This is the next version. After having posted a few notes to the [1] thread,
I'm returning to the original one so it can be referenced from my entry in the
CF application. As proposed in the other thread, it uses only "simple
aggregation". The 2-stage aggregation, which gives more power to the feature
and which also enables parallel processing for it, will be coded in a separate
patch.

I eventually decided to abandon the Var substitution proposed by Heikki in
[2]. It's not necessary if we accept the restriction that, so far, the feature
accepts only a simple column reference as a grouping expression.

[1] 
https://www.postgresql.org/message-id/c165b72e-8dbb-2a24-291f-113aeb67b76a%40iki.fi

[2] 
https://www.postgresql.org/message-id/113e9594-7c08-0f1f-ad15-41773b56a86b%40iki.fi

-- 
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
Web: https://www.cybertec-postgresql.com



agg_pushdown_v8.tgz
Description: GNU Zip compressed data


Re: remove ancient pre-dlopen dynloader code

2018-08-31 Thread Tom Lane
Peter Eisentraut  writes:
> How about this: We only have two nonstandard dlopen() implementations
> left: Windows and (old) HP-UX.  We move those into src/port/dlopen.c and
> treat it like a regular libpgport member.  That gets rid of all those
> duplicative empty per-platform files.

+1.  I eyeballed the patch quickly and it looks sane, but I didn't
try to test it.  Being a lazy person, I don't want to test it manually,
but I'll be happy to queue up a gaur run as soon as you push it
(and if by some chance that shows a problem, I'll fix it).

regards, tom lane



Re: PostgreSQL logical decoder output plugin - unchanged toast data

2018-08-31 Thread Andres Freund
Hi,

On 2018-08-31 16:55:37 +0200, Georgy Buranov wrote:
> Ok, thank you very much for your explanation,
> 
> maybe I need something else in my case.
> 
> As far as I understand, "On-disk toasted data for tuples from the WAL
> are not guaranteed in any way to be retain", but still, the LATEST
> value for the same cell should exist in postgres (in on-disk toast if
> it is huge, or not).

No, there's absolutely no such guarantee.  The tuple could since have
been deleted, the column could have been updated to non-toasted, the
table could have been dropped, ...


> If I cannot get access to the on-disk toasted data for tuple from the
> WAL, can I have the access to the _latest_ value in this case
> (hopefully I describe it correct)

Again, you can set REPLICA IDENTITY to FULL and it'll be there.

Greetings,

Andres Freund



Re: remove ancient pre-dlopen dynloader code

2018-08-31 Thread Andres Freund
On 2018-08-31 10:52:18 +0200, Peter Eisentraut wrote:
> How about this: We only have two nonstandard dlopen() implementations
> left: Windows and (old) HP-UX.  We move those into src/port/dlopen.c and
> treat it like a regular libpgport member.  That gets rid of all those
> duplicative empty per-platform files.

Great!  Quickly skimmed the patch and it looks good.  I don't quite know
why you moved the implementation to src/port rather than
src/backend/port, but either is fine with me... Long term the former
probably is better.

Greetings,

Andres Freund



Re: Dimension limit in contrib/cube (dump/restore hazard?)

2018-08-31 Thread Alvaro Herrera
On 2018-Aug-30, Alexander Korotkov wrote:

> Personally I get tired with cube.out
> and cube_2.out.  They are different with only few checks involving
> scientific notation.  But all the patches touching cube regression
> tests should update both cube.out and cube_2.out.  I propose to split
> scientific notation checks into separate test.

Good idea.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: TupleTableSlot abstraction

2018-08-31 Thread Andres Freund
Hi,

On 2018-08-31 10:05:05 +0530, Amit Khandekar wrote:
> On 28 August 2018 at 22:43, Ashutosh Bapat
> >> I think I was wrong at saying that we should remove this. I think you
> >> were right that it should become a callback...
> 
> > We have replaced all slot_getsysattrs() with heap_getsysattr(). Do you
> > want to reinstantiate those as well? If so, slot_getsysattr() becomes
> > a wrapper around getsysattr() callback.

Right.

> One option is that the getsysattr() callback function returns false if
> system attributes are not supported for that slot type. Other option
> is that in the not-supported case, the function errors out, meaning
> that the caller should be aware that the slot type is the one that
> supports system attributes.

> I had prepared changes for the first option, but Ashutosh Bapat
> offlist made me realize that it's worth considering the 2nd option(
> i.e. erroring out).

I think we should error out.


> >> I still think this is an optimization with a negative benefit,
> >> especially as it requires an extra callback. We should just rely on
> >> slot_deform_tuple and then access that. That'll also just access the
> >> null bitmap for the relevant column, and it'll make successive accesses
> >> cheaper.
> >
> > I don't understand why we have differing implementations for
> > slot_attisnull(), slot_getsomeattrs(), slot_getattr() in HEAD. If what
> > you are saying is true, we should have implemented all the first and
> > last as a call to slot_getsomeattrs() followed by returing values from
> > tts_values and tts_isnull. Since this is refactoring work, I am trying
> > not to change the existing functionality of those functions.
> 
> I agree that we should not change the way in which slot_getattr()
> finds the attr; i.e. first call att_isnull(), and only then try to
> deform the tuple. I mean there must be some good reason that is done
> on HEAD. Maybe we can change this separately after investigation, but
> not as part of the tuple abstraction patch.

There's really no good reason for the behaviour as it exists on HEAD. It
already can cause worse performance there. The price to pay for
continuing to have an optimization which isn't actually optimizing
anything is way too high if it requires us to have multiple functionally
unnecessary callbacks.

Greetings,

Andres Freund



Re: PostgreSQL logical decoder output plugin - unchanged toast data

2018-08-31 Thread Georgy Buranov
> Again, you can set REPLICA IDENTITY to FULL and it'll be there.
>

So, here is why I think this is complicated:

* We use primary keys for all tables, so we do not actually need REPLICA
IDENTITY FULL. As far as I understand, it will make
master/slave replication ineffective as well.
* I need the information about this primary key in the rd_replidindex of
the relation (to send it later to Kafka). As far as I understand, with
REPLICA IDENTITY FULL the rd_replidindex will not be the primary key.



Re: PostgreSQL logical decoder output plugin - unchanged toast data

2018-08-31 Thread Andres Freund
On 2018-08-31 17:34:02 +0200, Georgy Buranov wrote:
> > Again, you can set REPLICA IDENTITY to FULL and it'll be there.
> >
> 
> So, why I think this is complicated
> 
> * We use primary keys for all tables, so we do not need REPLICA
> IDENTITY full actually. As far as I understand, it will make
> master/slave replication ineffective as well

What do you mean with "ineffective"?  That there's more data in the WAL?
Sure. Otherwise I don't know what you could mean.

There's no free lunch :/


> * I need the information about this primary key in rd_replidindex  of
> relation (to send it later to kafka). As far as I understand, with
> REPLICA IDENTITY FULL the rd_replidindex will not be the primary key

If you want the pkey, that's in rd_pkindex. rd_replidindex will only
differ if the identity is manually set to another candidate key
(possibly because there's no pkey).

Greetings,

Andres Freund
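
To make that concrete, here is a rough sketch (not a complete or official
plugin; the callback body and its output format are invented for
illustration) of what a change callback sees once REPLICA IDENTITY is FULL:
the old row travels with the change itself, so no on-disk TOAST lookup is
needed, and the primary key index OID is still available via rd_pkindex:

```
#include "postgres.h"

#include "lib/stringinfo.h"
#include "replication/logical.h"
#include "replication/output_plugin.h"
#include "replication/reorderbuffer.h"
#include "utils/rel.h"

static void
example_change_cb(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
				  Relation rel, ReorderBufferChange *change)
{
	/* InvalidOid if the table has no primary key */
	Oid			pkindexoid = rel->rd_pkindex;

	if (change->action == REORDER_BUFFER_CHANGE_UPDATE &&
		change->data.tp.oldtuple != NULL)
	{
		/* the complete old tuple, including formerly toasted values */
		HeapTuple	oldtup = &change->data.tp.oldtuple->tuple;

		OutputPluginPrepareWrite(ctx, true);
		appendStringInfo(ctx->out,
						 "UPDATE on %s (pk index OID %u), old tuple length %u",
						 RelationGetRelationName(rel), pkindexoid,
						 oldtup->t_len);
		OutputPluginWrite(ctx, true);
	}
}
```

In a real output plugin this function would be registered as the change_cb
in _PG_output_plugin_init(); the point here is only that the old tuple is
present in the change when the identity is FULL.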



Re: pg_verify_checksums and -fno-strict-aliasing

2018-08-31 Thread Tom Lane
Michael Banck  writes:
> Am Donnerstag, den 30.08.2018, 19:02 -0400 schrieb Tom Lane:
>> Maybe.  In any case, the attached version avoids any need for memcpy
>> and is, I think, cleaner code anyhow.  It fixes the problem for me
>> with Fedora's gcc, though I'm not sure that it'd be enough to get rid
>> of the warning Michael sees (I wonder what compiler he's using).

> This fixes the bogus checksums, thanks!
> I am on Debian stable, i.e. 'gcc version 6.3.0 20170516 (Debian 6.3.0-
> 18+deb9u1)'.

Ah-hah.  I still can't replicate any warning with gcc 8.1.1, but
I do have a Raspbian installation at hand, with
gcc version 6.3.0 20170516 (Raspbian 6.3.0-18+rpi1+deb9u1) 
and on that I get

$ gcc  -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement 
-Werror=vla -Wendif-labels -Wmissing-format-attribute -Wformat-security -fwrapv 
-fexcess-precision=standard -g -O2 -I../../../src/include  -D_GNU_SOURCE   -c 
-o pg_verify_checksums.o pg_verify_checksums.c
pg_verify_checksums.c: In function 'scan_file':
pg_verify_checksums.c:112:3: warning: dereferencing type-punned pointer will 
break strict-aliasing rules [-Wstrict-aliasing]
   if (PageIsNew(buf))
   ^~

Adding -Wstrict-aliasing=2 makes it slightly more verbose:

$ gcc -Wstrict-aliasing=2 -Wall -Wmissing-prototypes -Wpointer-arith 
-Wdeclaration-after-statement -Werror=vla -Wendif-labels 
-Wmissing-format-attribute -Wformat-security -fwrapv 
-fexcess-precision=standard -g -O2 -I../../../src/include  -D_GNU_SOURCE   -c 
-o pg_verify_checksums.o pg_verify_checksums.c
pg_verify_checksums.c: In function 'scan_file':
pg_verify_checksums.c:82:2: warning: dereferencing type-punned pointer will 
break strict-aliasing rules [-Wstrict-aliasing]
  PageHeader header = (PageHeader) buf;
  ^~
pg_verify_checksums.c:112:3: warning: dereferencing type-punned pointer will 
break strict-aliasing rules [-Wstrict-aliasing]
   if (PageIsNew(buf))
   ^~

Compiling this way also causes pg_verify_checksums to compute wrong
checksums.  Installing the patched version of checksum_impl.h fixes
that, but doesn't change the warnings, meaning they have absolutely
nothing to do with the actual problem :-(

So, even aside from the difficulty of getting rid of aliasing-unsafe
code in Postgres, the state of the compiler warning technology is
still years short of where we could sanely consider dispensing with
-fno-strict-aliasing in general.  Still, as I said before, it'd be
a good idea for checksum_impl.h to not depend on that.  I'll go
ahead and push the fix.

regards, tom lane



Re: PostgreSQL logical decoder output plugin - unchanged toast data

2018-08-31 Thread Georgy Buranov
OK, I had the false understanding that REPLICA IDENTITY is used for
something more than the WAL. This is basically not true. So, what I can
do:

* Set the REPLICA IDENTITY to FULL; in this case I can still get
the pk from rd_pkindex. The WAL will be bigger, but we
will have all the values in it.
* The other solution is basically (since we know the exact primary key) to get
the row and take the latest value from there using an SQL query on the
client side (the other part of Kafka).

Thank you very much for your help

> What do you mean with "ineffective"?  That there's more data in the WAL?
> Sure. Otherwise I don't know what you could mean.
>
> There's no free lunch :/



Re: FailedAssertion on partprune

2018-08-31 Thread Jaime Casanova
On Thu, 16 Aug 2018 at 11:38, Alvaro Herrera  wrote:
>
> On 2018-Jul-23, Jaime Casanova wrote:
>
>
> > I was trying sqlsmith on REL_11_STABLE (commit
> > 1b957e59b92dc44c14708762f882d7910463a9ac) with a database i have at
> > hand, and got an assertion failure.
> > It seems to happen during planning on prunning time but only when
> > tables get bigger than certain size.
>
> Hi Jaime
>
> Did you try your luck running sqlsmith after this fix?  It'd be
> interesting to know if there are further failures.
>

Intermittently... so far so good!
I'm adding things (taken from the release notes) slowly, every so often.

-- 
Jaime Casanova  www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: FailedAssertion on partprune

2018-08-31 Thread Jonathan S. Katz
On 8/29/18 1:38 PM, Robert Haas wrote:
> On Mon, Aug 27, 2018 at 6:05 PM, Jonathan S. Katz  
> wrote:
>> On behalf of the RMT, I just want to make sure this keeps moving along.
>> It sounds like the next step is for Robert to verify that [3] is the
>> expected
>> behavior and then David can decide what to do from there.
> 
> Yes, that's the expected behavior.  If we didn't regenerate the paths
> there, we'd end up with
> 
> Result
> -> Append
>  -> [various paths that produce a tlist which needs to be adjusted
> later by the result node]
> 
> Instead of:
> 
> Append
> -> [various paths that produce an adjusted tlist]
> 
> Paths of the former kind have already been generated; we regenerate
> paths here to get the latter kind as well, which end up displacing the
> old ones on cost.
> 

An update from the RMT: after our meeting today, we decided that we
would drop this from the list of open items for the 11 major release.
The initial issue was already fixed and we understand that developing
the test for this will take some work and we do not think it is needed
for the v11 release.

Thanks again for working on this!

Jonathan



signature.asc
Description: OpenPGP digital signature


Re: Negotiating the SCRAM channel binding type

2018-08-31 Thread Michael Paquier
On Fri, Aug 31, 2018 at 12:18:52PM +0200, Peter Eisentraut wrote:
> I was updating the gnutls patch for the changed channel binding setup,
> and I noticed that the 002_scram.pl test now passes even though the
> gnutls patch currently does not support channel binding.  So AFAICT,
> we're not testing the channel binding functionality there at all.  Is
> that as intended?

As far as I understood, that's the intention.  One can still easily test
channel binding if you implement it, so you can make sure that the
default SSL connection still works.  And you can also make sure that if
you don't implement channel binding, an SSL connection still works.
But you cannot make sure that the disabled path works if you have
channel binding implemented.

I'd still like to think that having a way to enforce the disabled code
path over SSL has value, but you know, votes...
--
Michael


signature.asc
Description: PGP signature


Re: BUG #15346: Replica fails to start after the crash

2018-08-31 Thread Michael Paquier
On Thu, Aug 30, 2018 at 11:23:54PM -0700, Michael Paquier wrote:
> Yes that's a matter of safety, as I put into the truck any modules which
> may use XLogFlush().  And that maps with the old code, so there is no
> more surprise.

Okay, I have pushed my previous version as that's the safest approach.
If you have any idea of improvements or clarifications, let's discuss
about those on a different thread and only for HEAD.

Many thanks Alexander for the whole work and Horiguchi-san for the
review.
--
Michael


signature.asc
Description: PGP signature


Re: Bug in slot.c and are replication slots ever used at Window?

2018-08-31 Thread Michael Paquier
Hi Konstantin,

On Thu, Aug 30, 2018 at 11:27:31AM -0700, Michael Paquier wrote:
> It seems to me that you are right here, "path" points to
> pg_replslot/$SLOTNAME/state which is a file so the fsync is incorrect.
> I am not sure that we'd want to publish fsync_parent_path out of fd.c
> though, so perhaps we could just save the slot path in a different
> variable and use it?

I have spent more time on this bug, and the code path you have pointed
at is the only one having such an issue.  Attached is a patch to fix the
problem.  It includes the sanity checks I have used to check all code
paths calling fsync_fname() for both the frontend and the backend code.

The checks will not be included in the final fix, but they still look useful,
so I am planning to start a new thread on the matter, as perhaps other
folks have more and/or better ideas.
--
Michael
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 19978d9a9e..4f30904141 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1352,6 +1352,7 @@ RestoreSlotFromDisk(const char *name)
 {
 	ReplicationSlotOnDisk cp;
 	int			i;
+	char		slotdir[MAXPGPATH];
 	char		path[MAXPGPATH + 22];
 	int			fd;
 	bool		restored = false;
@@ -1361,13 +1362,14 @@ RestoreSlotFromDisk(const char *name)
 	/* no need to lock here, no concurrent access allowed yet */
 
 	/* delete temp file if it exists */
-	sprintf(path, "pg_replslot/%s/state.tmp", name);
+	sprintf(slotdir, "pg_replslot/%s", name);
+	sprintf(path, "%s/state.tmp", slotdir);
 	if (unlink(path) < 0 && errno != ENOENT)
 		ereport(PANIC,
 (errcode_for_file_access(),
  errmsg("could not remove file \"%s\": %m", path)));
 
-	sprintf(path, "pg_replslot/%s/state", name);
+	sprintf(path, "%s/state", slotdir);
 
 	elog(DEBUG1, "restoring replication slot from \"%s\"", path);
 
@@ -1402,7 +1404,7 @@ RestoreSlotFromDisk(const char *name)
 
 	/* Also sync the parent directory */
 	START_CRIT_SECTION();
-	fsync_fname(path, true);
+	fsync_fname(slotdir, true);
 	END_CRIT_SECTION();
 
 	/* read part of statefile that's guaranteed to be version independent */
diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index 8dd51f1767..49a5640c61 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -574,6 +574,14 @@ pg_flush_data(int fd, off_t offset, off_t nbytes)
 void
 fsync_fname(const char *fname, bool isdir)
 {
+#ifdef USE_ASSERT_CHECKING
+	struct stat statbuf;
+
+	stat(fname, &statbuf);
+	Assert((isdir && S_ISDIR(statbuf.st_mode)) ||
+		   (!isdir && !S_ISDIR(statbuf.st_mode)));
+#endif
+
 	fsync_fname_ext(fname, isdir, false, ERROR);
 }
 
diff --git a/src/common/file_utils.c b/src/common/file_utils.c
index 48876061c3..36095b01af 100644
--- a/src/common/file_utils.c
+++ b/src/common/file_utils.c
@@ -266,6 +266,14 @@ fsync_fname(const char *fname, bool isdir, const char *progname)
 	int			flags;
 	int			returncode;
 
+#ifdef USE_ASSERT_CHECKING
+	struct stat statbuf;
+
+	stat(fname, &statbuf);
+	Assert((isdir && S_ISDIR(statbuf.st_mode)) ||
+		   (!isdir && !S_ISDIR(statbuf.st_mode)));
+#endif
+
 	/*
 	 * Some OSs require directories to be opened read-only whereas other
 	 * systems don't allow us to fsync files opened read-only; so we need both


signature.asc
Description: PGP signature


Re: psql \dC incorrectly shows casts "with inout" as "binary coercible" on 9.5.14 and 11beta3

2018-08-31 Thread Jean-Pierre Pelletier
Btw, pg_dump is handling this right.

Jean-Pierre Pelletier

Le ven. 31 août 2018 10:33, Tom Lane  a écrit :

> "jean.pierre.pelletier0"  writes:
> > To reproduce, compare the output of \dC on two built-in casts(json to
> jsonb) and (xml to text) where only the first is really "with inout".
>
> Hm, yeah, it just does
>
>   "   CASE WHEN castfunc = 0 THEN '(binary
> coercible)'\n"
>   "ELSE p.proname\n"
>   "   END as \"%s\",\n"
>
> without regard for the castmethod column (which it can't necessarily
> assume is there, anyway).  It's hard to be sure after all these
> years whether this was intentional or just an oversight, unless maybe
> Heikki remembers ... but I tend to agree that "(with inout)" would be
> more apropos than "(binary coercible)".
>
> Not sure if this rises to the level of a back-patchable bug.
> People might be surprised if we change that output in minor releases.
> But we could still squeeze it into v11, I think.
>
> regards, tom lane
>


Re: pg_verify_checksums and -fno-strict-aliasing

2018-08-31 Thread Tom Lane
I wrote:
> Some of these places might be performance-critical enough that adding
> a palloc/pfree cycle would not be nice.  What I was considering doing
> was inventing something like
>
> typedef union PGAlignedBuffer
> {
>     char        data[BLCKSZ];
>   double  force_align;
> } PGAlignedBuffer;

Here's a proposed patch using that approach.

Although some of the places that were using "char buf[BLCKSZ]" variables
weren't doing anything that really requires better alignment, I made
almost all of them use PGAlignedBuffer variables anyway, on the grounds
that you typically get better memcpy speed for aligned than unaligned
transfers.

I also fixed a few places that were using the palloc solution, and
one that was actually doing hand-alignment of the pointer (ugh).
I didn't try to be thorough about getting rid of such pallocs,
just fix places that seemed likely to be performance-critical.

This needs to be back-patched as relevant, of course.

regards, tom lane

diff --git a/contrib/bloom/blinsert.c b/contrib/bloom/blinsert.c
index 4afdea7..5ae1dda 100644
--- a/contrib/bloom/blinsert.c
+++ b/contrib/bloom/blinsert.c
@@ -36,7 +36,7 @@ typedef struct
 	int64		indtuples;		/* total number of tuples indexed */
 	MemoryContext tmpCtx;		/* temporary memory context reset after each
  * tuple */
-	char		data[BLCKSZ];	/* cached page */
+	PGAlignedBuffer data;		/* cached page */
 	int			count;			/* number of tuples in cached page */
 } BloomBuildState;
 
@@ -52,7 +52,7 @@ flushCachedPage(Relation index, BloomBuildState *buildstate)
 
 	state = GenericXLogStart(index);
 	page = GenericXLogRegisterBuffer(state, buffer, GENERIC_XLOG_FULL_IMAGE);
-	memcpy(page, buildstate->data, BLCKSZ);
+	memcpy(page, buildstate->data.data, BLCKSZ);
 	GenericXLogFinish(state);
 	UnlockReleaseBuffer(buffer);
 }
@@ -63,8 +63,8 @@ flushCachedPage(Relation index, BloomBuildState *buildstate)
 static void
 initCachedPage(BloomBuildState *buildstate)
 {
-	memset(buildstate->data, 0, BLCKSZ);
-	BloomInitPage(buildstate->data, 0);
+	memset(buildstate->data.data, 0, BLCKSZ);
+	BloomInitPage(buildstate->data.data, 0);
 	buildstate->count = 0;
 }
 
@@ -84,7 +84,7 @@ bloomBuildCallback(Relation index, HeapTuple htup, Datum *values,
 	itup = BloomFormTuple(&buildstate->blstate, &htup->t_self, values, isnull);
 
 	/* Try to add next item to cached page */
-	if (BloomPageAddItem(&buildstate->blstate, buildstate->data, itup))
+	if (BloomPageAddItem(&buildstate->blstate, buildstate->data.data, itup))
 	{
 		/* Next item was added successfully */
 		buildstate->count++;
@@ -98,7 +98,7 @@ bloomBuildCallback(Relation index, HeapTuple htup, Datum *values,
 
 		initCachedPage(buildstate);
 
-		if (!BloomPageAddItem(&buildstate->blstate, buildstate->data, itup))
+		if (!BloomPageAddItem(&buildstate->blstate, buildstate->data.data, itup))
 		{
 			/* We shouldn't be here since we're inserting to the empty page */
 			elog(ERROR, "could not add new bloom tuple to empty page");
diff --git a/contrib/pg_prewarm/pg_prewarm.c b/contrib/pg_prewarm/pg_prewarm.c
index 3cbb7c2..1a00c59 100644
--- a/contrib/pg_prewarm/pg_prewarm.c
+++ b/contrib/pg_prewarm/pg_prewarm.c
@@ -36,7 +36,7 @@ typedef enum
 	PREWARM_BUFFER
 } PrewarmType;
 
-static char blockbuffer[BLCKSZ];
+static PGAlignedBuffer blockbuffer;
 
 /*
  * pg_prewarm(regclass, mode text, fork text,
@@ -178,7 +178,7 @@ pg_prewarm(PG_FUNCTION_ARGS)
 		for (block = first_block; block <= last_block; ++block)
 		{
 			CHECK_FOR_INTERRUPTS();
-			smgrread(rel->rd_smgr, forkNumber, block, blockbuffer);
+			smgrread(rel->rd_smgr, forkNumber, block, blockbuffer.data);
 			++blocks_done;
 		}
 	}
diff --git a/src/backend/access/gin/ginentrypage.c b/src/backend/access/gin/ginentrypage.c
index 8107697..ef5c367 100644
--- a/src/backend/access/gin/ginentrypage.c
+++ b/src/backend/access/gin/ginentrypage.c
@@ -616,7 +616,7 @@ entrySplitPage(GinBtree btree, Buffer origbuf,
 	Page		lpage = PageGetTempPageCopy(BufferGetPage(origbuf));
 	Page		rpage = PageGetTempPageCopy(BufferGetPage(origbuf));
 	Size		pageSize = PageGetPageSize(lpage);
-	char		tupstore[2 * BLCKSZ];
+	PGAlignedBuffer tupstore[2];	/* could need 2 pages' worth of tuples */
 
 	entryPreparePage(btree, lpage, off, insertData, updateblkno);
 
@@ -625,7 +625,7 @@ entrySplitPage(GinBtree btree, Buffer origbuf,
 	 * one after another in a temporary workspace.
 	 */
 	maxoff = PageGetMaxOffsetNumber(lpage);
-	ptr = tupstore;
+	ptr = tupstore[0].data;
 	for (i = FirstOffsetNumber; i <= maxoff; i++)
 	{
 		if (i == off)
@@ -658,7 +658,7 @@ entrySplitPage(GinBtree btree, Buffer origbuf,
 	GinInitPage(rpage, GinPageGetOpaque(lpage)->flags, pageSize);
 	GinInitPage(lpage, GinPageGetOpaque(rpage)->flags, pageSize);
 
-	ptr = tupstore;
+	ptr = tupstore[0].data;
 	maxoff++;
 	lsize = 0;
 
diff --git a/src/backend/access/gin/ginfast.c b/src/backend/access/gin/ginfast.c
index e32807e..915b9ca 100644
--- a/src/backend/access/gin/ginfast.c
+++ b/src/back

Re: psql \dC incorrectly shows casts "with inout" as "binary coercible" on 9.5.14 and 11beta3

2018-08-31 Thread Tom Lane
I wrote:
> Not sure if this rises to the level of a back-patchable bug.
> People might be surprised if we change that output in minor releases.
> But we could still squeeze it into v11, I think.

I pushed a fix into HEAD & v11.

regards, tom lane



Re: WIP: Covering + unique indexes.

2018-08-31 Thread Alvaro Herrera
I'm wondering what the genesis of this coninclude column actually is.
As far as I can tell, the only reason this column is there is to be
able to print the INCLUDE clause in a UNIQUE/PK constraint in ruleutils
... but surely the same list can be obtained from pg_index.indkey
instead?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: psql \dC incorrectly shows casts "with inout" as "binary coercible" on 9.5.14 and 11beta3

2018-08-31 Thread Jean-Pierre Pelletier
Awesome, thanks!

Le ven. 31 août 2018 16:46, Tom Lane  a écrit :

> I wrote:
> > Not sure if this rises to the level of a back-patchable bug.
> > People might be surprised if we change that output in minor releases.
> > But we could still squeeze it into v11, I think.
>
> I pushed a fix into HEAD & v11.
>
> regards, tom lane
>


Re: Dimension limit in contrib/cube (dump/restore hazard?)

2018-08-31 Thread Alexander Korotkov
On Fri, Aug 31, 2018 at 6:18 PM Alvaro Herrera  wrote:
> On 2018-Aug-30, Alexander Korotkov wrote:
>
> > Personally I get tired with cube.out
> > and cube_2.out.  They are different with only few checks involving
> > scientific notation.  But all the patches touching cube regression
> > tests should update both cube.out and cube_2.out.  I propose to split
> > scientific notation checks into separate test.
>
> Good idea.

Thank you for the feedback!  Pushed.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes

2018-08-31 Thread Andres Freund
On 2018-08-29 17:58:19 -0400, Tom Lane wrote:
> I wrote:
> > We could perhaps fix this with a less invasive change than what you
> > suggest here, by attacking the missed-call-due-to-recursion aspect
> > rather than monkeying with how relcache rebuild itself works.
> 
> Seeing that rearranging the relcache rebuild logic is looking less than
> trivial, I thought it'd be a good idea to investigate this alternative
> approach.
> 
> Essentially, the problem here is that lmgr.c supposes that
> AcceptInvalidationMessages is an atomic operation, which it most
> certainly isn't.  In reality, it's unsafe to suppose that we can skip
> reading incoming inval messages until we have *completed*
> AcceptInvalidationMessages, not just invoked it.
> 
> However, we can fix that very easily, by retaining one additional bit
> of state in each LOCALLOCK entry, which records whether we know we have
> completed AcceptInvalidationMessages at least once subsequent to obtaining
> that lock.  In the attached draft patch, I refer to that state as having
> "cleared" the lock, but I'm not super pleased with that terminology ---
> anybody have a better idea?
> 
> A small problem with the lock.c API as it stands is that we'd have to do
> an additional hashtable lookup to re-find the relevant LOCALLOCK entry.
> In the attached, I fixed that by extending LockAcquireExtended's API
> to allow the caller to obtain a pointer to the LOCALLOCK entry, making
> it trivially cheap to set the flag after we finish
> AcceptInvalidationMessages.  LockAcquireExtended has only one external
> caller right now, which makes me think it's OK to change its API rather
> than introduce YA variant entry point, but that could be argued.
> 
> There are two ways we could adjust the return value from
> LockAcquire(Extended) to cope with this requirement.  What I did here
> was to add an additional LockAcquireResult code, so that "already held"
> can be distinguished from "already held and cleared".  But we don't
> really need to make that distinction, at least not in the core code.
> So another way would be to redefine LOCKACQUIRE_ALREADY_HELD as meaning
> "held and cleared", and then just return LOCKACQUIRE_OK if you haven't
> called MarkLockClear for the lock.  I'm not entirely sure which way is
> less likely to break any third-party code that might be inspecting the
> result of LockAcquire.  You could probably argue it either way depending
> on what you think the third-party code is doing with the result.
> 
> Anyway, the attached appears to fix the problem: Andres' test script
> fails in fractions of a second with HEAD on my workstation, but it's
> run error-free for quite awhile now with this patch.  And this is sure
> a lot simpler than any relcache rebuild refactoring we're likely to come
> up with.
> 
> Discuss.

This is a neat idea.

I've one intuitive correctness concern that I can't quite nail down; I
plan to spend more time on those aspects (partially around previously
held locks and invalidations generated with non-conflicting lock modes).

Leaving that aside, I think there's one architectural aspect of my
approach that I prefer over yours: Deduplicating eager cache rebuilds
like my approach does seems quite advantageous. There are a lot of cases
where we end up sending out many cache rebuilds that invalidate the same
entries - as we do not defer rebuilds, that leads to repeated
rebuilds. My approach avoids overhead associated with that, as long as
the invalidations are queued close enough together.

We could obviously just do both, but that seems unnecessary.

Greetings,

Andres Freund
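
For readers following along, here is a rough sketch of the lmgr.c flow Tom
describes above (not the actual patch: the extra LockAcquireExtended output
argument and the LOCKACQUIRE_ALREADY_CLEAR result-code name are stand-ins
for what the quoted message outlines):

```
#include "postgres.h"

#include "miscadmin.h"
#include "storage/lmgr.h"
#include "storage/lock.h"
#include "utils/inval.h"

void
LockRelationOid(Oid relid, LOCKMODE lockmode)
{
	LOCKTAG		tag;
	LOCALLOCK  *locallock;
	LockAcquireResult res;

	SET_LOCKTAG_RELATION(tag, MyDatabaseId, relid);

	/* hypothetical extra argument returning the LOCALLOCK entry */
	res = LockAcquireExtended(&tag, lockmode, false, false, true, &locallock);

	/*
	 * Skip invalidation processing only if we already held the lock *and*
	 * had marked it clear, i.e. a full AcceptInvalidationMessages() pass
	 * completed after it was acquired.  Otherwise run it now, then mark
	 * the lock clear.
	 */
	if (res != LOCKACQUIRE_ALREADY_CLEAR)	/* placeholder result code */
	{
		AcceptInvalidationMessages();
		MarkLockClear(locallock);
	}
}
```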



Re: pg_verify_checksums and -fno-strict-aliasing

2018-08-31 Thread Michael Paquier
On Fri, Aug 31, 2018 at 03:55:51PM -0400, Tom Lane wrote:
> I wrote:
>> Some of these places might be performance-critical enough that adding
>> a palloc/pfree cycle would not be nice.  What I was considering doing
>> was inventing something like
>>
>> typedef union PGAlignedBuffer
>> {
>>  char        data[BLCKSZ];
>>  double  force_align;
>> } PGAlignedBuffer;
> 
> Here's a proposed patch using that approach.

This solution is smarter than using malloc/palloc to enforce alignment.
I was digging into the GIN and bloom code with this pattern, but except
if I used TopTransactionContext for a simple approach, which is not
acceptable by the way, I was finishing with ugly memory contexts
used...  I am still not sure if that was completely correct either, this
needed more time ;)

> Although some of the places that were using "char buf[BLCKSZ]" variables
> weren't doing anything that really requires better alignment, I made
> almost all of them use PGAlignedBuffer variables anyway, on the grounds
> that you typically get better memcpy speed for aligned than unaligned
> transfers.

Makes sense.

> I also fixed a few places that were using the palloc solution, and
> one that was actually doing hand-alignment of the pointer (ugh).
> I didn't try to be thorough about getting rid of such pallocs,
> just fix places that seemed likely to be performance-critical.

Okay, for the memo: replay_image_masked and master_image_masked
in xlog.c could make use of the new structure.  SetWALSegSize in
pg_standby.c and WriteEmptyXLOG in pg_resetwal.c, and pg_upgrade's
file.c could also be patched.

walmethods.c could also use some static buffers, not worth the
complication perhaps..  

> +typedef union PGAlignedBuffer
> +{
> + char        data[BLCKSZ];
> + double  force_align_d;
> + int64   force_align_i64;
> +} PGAlignedBuffer;
> +
> +/* Same, but for an XLOG_BLCKSZ-sized buffer */
> +typedef union PGAlignedXLogBuffer
> +{
> + char        data[XLOG_BLCKSZ];
> + double  force_align_d;
> + int64   force_align_i64;
> +} PGAlignedXLogBuffer;

One complain I have is about the name of those structures.  Perhaps
PGAlignedBlock and PGAlignedXlogBlock make more sense?
--
Michael


signature.asc
Description: PGP signature


Re: patch to allow disable of WAL recycling

2018-08-31 Thread Tomas Vondra

On 08/27/2018 03:59 AM, Thomas Munro wrote:
> On Mon, Aug 27, 2018 at 10:14 AM Tomas Vondra
> mailto:tomas.von...@2ndquadrant.com>> wrote:
>> zfs (Linux)
>> ---
>> On scale 200, there's pretty much no difference.
> 
> Speculation: It could be that the dnode and/or indirect blocks that
> point to data blocks are falling out of memory in my test setup[1] but
> not in yours.  I don't know, but I guess those blocks compete with
> regular data blocks in the ARC?  If so it might come down to ARC size
> and the amount of other data churning through it.
> 

Not sure, but I'd expect this to matter on the largest scale. The
machine has 64GB of RAM, and scale 8000 is ~120GB with mostly random
access. I've repeated the tests with scale 6000 to give ZFS a bit more
free space and prevent the issues when there's less than 20% of free
space (results later), but I still don't see any massive improvement.

> Further speculation:  Other filesystems have equivalent data structures,
> but for example XFS jams that data into the inode itself in a compact
> "extent list" format[2] if it can, to avoid the need for an external
> btree.  Hmm, I wonder if that format tends to be used for our segment
> files.  Since cached inodes are reclaimed in a different way than cached
> data pages, I wonder if that makes them more sticky in the face of high
> data churn rates (or I guess less, depending on your Linux
> vfs_cache_pressure setting and number of active files).  I suppose the
> combination of those two things, sticky inodes with internalised extent
> lists, might make it more likely that we can overwrite an old file
> without having to fault anything in.
> 

That's possible. The question is how that affects the cases in which it's
worth disabling the WAL reuse, and why you observe better performance
and I don't.

> One big difference between your test rig and mine is that your Optane
> 900P claims to do about half a million random IOPS.  That is about half
> a million more IOPS than my spinning disks.  (Actually I used my 5400RPM
> steam powered machine deliberately for that test: I disabled fsync so
> that commit rate wouldn't be slowed down but cache misses would be
> obvious.  I guess Joyent's storage is somewhere between these two
> extremes...)
> 

Yeah. It seems very much like a CPU vs. I/O trade-off, where disabling
the WAL reuse saves a bit of I/O but increases the CPU cost. On the SSD
the reduced number of I/O requests is not noticeable, but the extra CPU
cost does matter (thanks to the high tps values). On slower devices the
I/O savings will probably matter more.


>> On scale 2000, the
>> throughput actually decreased a bit, by about 5% - from the chart it
>> seems disabling the WAL reuse somewhat amplifies impact of checkpoints,
>> for some reason.
> 
> Huh.
> 

Not sure what's causing this. On SATA results it's not visible, though.

>> I have no idea what happened at the largest scale (8000) - on master
>> there's a huge drop after ~120 minutes, which somewhat recovers at ~220
>> minutes (but not fully). Without WAL reuse there's no such drop,
>> although there seems to be some degradation after ~220 minutes (i.e. at
>> about the same time the master partially recovers. I'm not sure what to
>> think about this, I wonder if it might be caused by almost filling the
>> disk space, or something like that. I'm rerunning this with scale 600.
> 
> There are lots of reports of ZFS performance degrading when free space
> gets below something like 20%.
> 


I've repeated the benchmarks on the Optane SSD with the largest scale
reduced to 6000, to see if it prevents the performance drop with less
than 20% of free space. It apparently does (see zfs2.pdf), although it
does not change the behavior - with WAL reuse disabled it's still a bit
slower.

I've also done the tests with SATA devices (3x 7.2k drives), to see if
it changes the behavior due to I/O vs. CPU trade-off. And it seems to be
the case (see zfs-sata.pdf), to some extent. For the smallest scale
(200) there's not much difference. For medium (2000) there seems to be a
clear improvement, although the behavior is not particularly smooth. On
the largest scale (8000) there seems to be a slight improvement, or at
least it's not slower like before.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


zfs-sata.pdf
Description: Adobe PDF document


zfs2.pdf
Description: Adobe PDF document


Re: Some pgq table rewrite incompatibility with logical decoding?

2018-08-31 Thread Tomas Vondra
Hi,

On 08/29/2018 12:01 AM, Tomas Vondra wrote:
> On 08/28/2018 07:41 PM, Jeremy Finzel wrote:
>> Jeremy, are you able to reproduce the issue locally, using pgq?
>> That would be very valuable.
>>
>>
>> Tomas et al:
>>
>> We have hit this error again, and we plan to snapshot the database as to
>> be able to do whatever troubleshooting we can.  If someone could provide
>> me guidance as to what exactly you would like me to do, please let me
>> know.  I am able to provide an xlog dump and also debugging information
>> upon request.
>>
>> This is actually a different database system that also uses skytools,
>> and the exact same table (pgq.event_58_1) is again the cause of the
>> relfilenode error.  I did a point-in-time recovery to a point after this
>> relfilenode appears using pg_xlogdump, and verified this was the table
>> that appeared, then disappeared.
>>
> 
> Great!
> 
> Can you attach to the decoding process using gdb, and set a breakpoint
> to the elog(ERROR) at reorderbuffer.c:1599, and find out at which LSN /
> record it fails?
> 
> https://github.com/postgres/postgres/blob/REL9_6_STABLE/src/backend/replication/logical/reorderbuffer.c#L1599
> 
> If it fails too fast, making it difficult to attach gdb before the
> crash, adding the LSN to the log message might be easier.
> 
> Once we have the LSN, it would be useful to see the pg_xlogdump
> before/around that position.
> 
> Another interesting piece of information would be to know the contents
> of the relmapper cache (which essentially means stepping through
> RelationMapFilenodeToOid or something like that).
> 

Have you managed to get the backtrace, or investigate where exactly it
fails (which LSN etc.)? We've managed to get a backtrace for "our"
failure, and it'd be interesting to compare those.

Attached is a subset of pg_waldump, for the two relevant transactions.
2554301859 is the transaction doing VACUUM FULL on the user table (so
essentially the pgq table), and 2554301862 (with a single subxact
2554301863, likely due to an exception handled in a plpgsql function) is the
transaction that fails during decoding - on the very first WAL record
after the 2554301859 commit.

In reality there are records from additional transactions intermixed,
but those are irrelevant I believe. It's 2554301862 that fails, because
it commits first (and thus gets decoded first).

At first I thought this might be related to the "could not read block 3"
issue discussed in another thread, but if you say upgrade to 9.6.10
fixes this, that seems unlikely (because that issue is still there).

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
rmgr: Transaction len (rec/tot): 38/38, tx: 2554301863, lsn: 48C2/03FF0F00, prev 48C2/03FF0EB0, desc: ASSIGNMENT xtop 2554301862: subxacts: 2554301863
rmgr: Heap        len (rec/tot): 54/54, tx: 2554301863, lsn: 48C2/03FF0F28, prev 48C2/03FF0F00, desc: LOCK off 7: xid 2554301863: flags 0 LOCK_ONLY EXCL_LOCK , blkref #0: rel 1663/16400/20087 blk 118253
rmgr: Heap        len (rec/tot): 124/124, tx: 2554301863, lsn: 48C2/03FF0F60, prev 48C2/03FF0F28, desc: HOT_UPDATE off 7 xmax 2554301863 ; new off 9 xmax 0, blkref #0: rel 1663/16400/20087 blk 118253
rmgr: Standby     len (rec/tot): 42/42, tx: 2554301859, lsn: 48C2/04073AF8, prev 48C2/04073AC8, desc: LOCK xid 2554301859 db 16400 rel 23934
rmgr: Storage     len (rec/tot): 42/42, tx: 2554301859, lsn: 48C2/04073B28, prev 48C2/04073AF8, desc: CREATE base/16400/609294572
rmgr: Heap2       len (rec/tot): 60/60, tx: 2554301859, lsn: 48C2/04073B58, prev 48C2/04073B28, desc: NEW_CID rel 1663/16400/1247; tid 23/89; cmin: 0, cmax: 4294967295, combo: 4294967295
rmgr: Heap        len (rec/tot): 54/7322, tx: 2554301859, lsn: 48C2/04073B98, prev 48C2/04073B58, desc: INSERT off 89, blkref #0: rel 1663/16400/1247 blk 23 FPW
rmgr: Btree       len (rec/tot): 53/6673, tx: 2554301859, lsn: 48C2/04075850, prev 48C2/04073B98, desc: INSERT_LEAF off 305, blkref #0: rel 1663/16400/2703 blk 15 FPW
rmgr: Btree       len (rec/tot): 53/5373, tx: 2554301859, lsn: 48C2/04077280, prev 48C2/04075850, desc: INSERT_LEAF off 92, blkref #0: rel 1663/16400/2704 blk 4 FPW
rmgr: Heap2       len (rec/tot): 60/60, tx: 2554301859, lsn: 48C2/04078898, prev 48C2/04078868, desc: NEW_CID rel 1663/16400/2608; tid 100/49; cmin: 0, cmax: 4294967295, combo: 4294967295
rmgr: Heap        len (rec/tot): 54/6550, tx: 2554301859, lsn: 48C2/040788D8, prev 48C2/04078898, desc: INSERT off 49, blkref #0: rel 1663/16400/2608 blk 100 FPW
rmgr: Btree       len (rec/tot): 53/7205, tx: 2554301859, lsn: 48C2/0407A288, prev 48C2/040788D8, desc: INSERT_LEAF off 131, blkref #0: rel 1663/16400/2673 blk 204 FPW
rmgr: Btree       len (rec/tot): 53/4741, tx: 2554301859, lsn: 48C2/0407BEB0, prev 48C2/0407A288, desc: INSERT_LEAF off 62, blkref #0:

Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes

2018-08-31 Thread Tom Lane
Andres Freund  writes:
> Leaving that aside, I think there's one architectural aspect of my
> approach that I prefer over yours: Deduplicating eager cache rebuilds
> like my approach does seems quite advantageous.

That is attractive, for sure, but the other side of the coin is that
getting there seems to require a lot of ticklish redesign.  We would
certainly not consider back-patching such a change normally, and I'm
unconvinced that we should do so in this case.

My thought is to do (and back-patch) my change, and then work on yours
as a performance improvement for HEAD only.  I don't believe that yours
would make mine redundant, either --- they are good complementary changes
to make real sure we have no remaining bugs of this ilk.  (In particular,
no matter how much de-duplication we do, we'll still have scenarios with
recursive cache flushes; so I'm not quite convinced that your solution
provides a 100% fix by itself.)

regards, tom lane



Re: pg_verify_checksums and -fno-strict-aliasing

2018-08-31 Thread Tom Lane
Michael Paquier  writes:
> On Fri, Aug 31, 2018 at 03:55:51PM -0400, Tom Lane wrote:
>> I also fixed a few places that were using the palloc solution, and
>> one that was actually doing hand-alignment of the pointer (ugh).
>> I didn't try to be thorough about getting rid of such pallocs,
>> just fix places that seemed likely to be performance-critical.

> Okay, for the memo replay_image_masked and master_image_masked
> in xlog.c could make use of the new structure.  SetWALSegSize in
> pg_standby.c and WriteEmptyXLOG in pg_resetwal.c, and pg_upgrade's
> file.c could also be patched.

I intentionally didn't change replay_image_masked/master_image_masked
to use statically-allocated buffers.  Since, AFAICS, those aren't
needed in most backend processes, they'd just be eating 16KB of
per-process data space to no purpose.

The others you mention could be changed, probably, but I didn't
bother as they didn't seem performance-critical.
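
For context, the union under discussion is roughly of this shape -- a
sketch based on the thread, where the member names are my assumptions:

    typedef union PGAlignedBuffer
    {
        char        data[BLCKSZ];
        double      force_align_d;
        int64       force_align_i64;
    } PGAlignedBuffer;

A static instance costs a full block of per-process data whether or not
it is ever used, which is the 16KB mentioned above for the two xlog
masking buffers.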

>> +typedef union PGAlignedBuffer

> One complaint I have is about the name of those structures.  Perhaps
> PGAlignedBlock and PGAlignedXlogBlock make more sense?

Don't have a strong preference, anybody else have an opinion?

(I also wondered whether to use "WAL" instead of "XLog" in that
struct name, but it seems like we've mostly stuck with "xlog"
in internal C names.)

regards, tom lane



Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes

2018-08-31 Thread Andres Freund
Hi,

On 2018-08-31 19:53:43 -0400, Tom Lane wrote:
> My thought is to do (and back-patch) my change, and then work on yours
> as a performance improvement for HEAD only.

That does make sense.

> I don't believe that yours would make mine redundant, either --- they
> are good complementary changes to make real sure we have no remaining
> bugs of this ilk.  (In particular, no matter how much de-duplication
> we do, we'll still have scenarios with recursive cache flushes; so I'm
> not quite convinced that your solution provides a 100% fix by itself.)

Yea this is a fair concern.

One concern I have with your approach is that it isn't particularly
bullet-proof for cases where the rebuild is triggered by something that
doesn't hold a conflicting lock.  If there's an independent cause to
reload - something that very commonly happens - we might not necessarily
re-trigger the recursive reloads.  That *should* be safe due such
changes hopefully being "harmless", but it's a bit disconcerting
nonetheless.

Greetings,

Andres Freund



Re: pg_verify_checksums and -fno-strict-aliasing

2018-08-31 Thread Michael Paquier
On Fri, Aug 31, 2018 at 07:59:58PM -0400, Tom Lane wrote:
> The others you mention could be changed, probably, but I didn't
> bother as they didn't seem performance-critical.

It is not really critical indeed.  There is an argument to change them
so as other folks get used to it though.

> (I also wondered whether to use "WAL" instead of "XLog" in that
> struct name, but it seems like we've mostly stuck with "xlog"
> in internal C names.)

XLOG_BLCKSZ is used, which makes me think that XLog is better than WAL
here.  A matter of taste of course.
--
Michael


signature.asc
Description: PGP signature


Re: pg_verify_checksums failure with hash indexes

2018-08-31 Thread Robert Haas
On Thu, Aug 30, 2018 at 7:27 AM, Amit Kapila  wrote:
> We have previously changed this define in 620b49a1 with the intent to
> allow many non-unique values in hash indexes without worrying to reach
> the limit of the number of overflow pages.  I think this didn't occur
> to us that it won't work for smaller block sizes.  As such, I don't
> see any problem with the suggested fix.  It will allow us the same
> limit for the number of overflow pages at 8K block size and a smaller
> limit at smaller block size.  I am not sure if we can do any better
> with the current design.  As it will change the metapage, I think we
> need to bump HASH_VERSION.

I wouldn't bother bumping HASH_VERSION.  First, the fix needs to be
back-patched, and you certainly can't back-patch a HASH_VERSION bump.
Second, you should just pick a formula that gives the same answer as
now for the cases where the overrun doesn't occur, and some other
sufficient value for the cases where an overrun currently does
occur.  If you do that, you're not changing the behavior in any case
that currently works, so there's really no reason for a version bump.
It just becomes a bug fix at that point.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



RE: [HACKERS] Proposal to add work_mem option to postgres_fdw module

2018-08-31 Thread Shinoda, Noriyoshi (PN Japan GCS Delivery)
Eisentraut-san
Thank you for becoming a reviewer.

> The reason is that postgres_fdw filters out connection settings that are 
> marked debug ("D"), and the "options" keyword is marked as such.  
> I think this is wrong.  Just remove that designation and then this will work. 
>  
> (Perhaps filtering out the debug options is also wrong, but I can see an 
> argument for it.)

Certainly, the PQconndefaults function specifies the Debug flag for the
"options" option.
I think that eliminating the Debug flag is the simplest solution.
With the attached patch, GUCs can be specified using the following syntax.

CREATE SERVER remsvr1 FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host 'host 1', 
..., options '-c work_mem=64MB -c geqo=off');
ALTER SERVER remsvr1 OPTIONS (SET options '-c work_mem=64MB -c geqo=off');

However, I am concerned that this patch changes the behavior of the
official PQconndefaults API.
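
For reference, "eliminating the Debug flag" is essentially a one-line
change in libpq's built-in connection-option table.  The sketch below is
only illustrative (the label and display size are assumptions, not the
exact fe-connect.c entry):

    /*
     * before:  {"options", "PGOPTIONS", DefaultPgOptions, NULL,
     *              "Backend-Options", "D", 40, ...},
     * after:   {"options", "PGOPTIONS", DefaultPgOptions, NULL,
     *              "Backend-Options", "",  40, ...},
     *
     * i.e. the dispchar "D" (debug) is dropped, so callers that filter
     * PQconndefaults() output, such as postgres_fdw, keep "options".
     */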

Best Regards,
Noriyoshi Shinoda

-Original Message-
From: Peter Eisentraut [mailto:peter.eisentr...@2ndquadrant.com] 
Sent: Friday, August 31, 2018 2:45 AM
To: Shinoda, Noriyoshi (PN Japan GCS Delivery) ; 
fabriziome...@gmail.com; [pgdg] Robert Haas ; 
mich...@paquier.xyz
Cc: Pgsql Hackers 
Subject: Re: [HACKERS] Proposal to add work_mem option to postgres_fdw module

On 28/08/2018 05:55, Shinoda, Noriyoshi (PN Japan GCS Delivery) wrote:
>>> I like the direction of your thinking, but it seems to me that this 
>>> would cause a problem if you want to set search_path=foo,bar.
>> ... OPTIONS ( host 'remhost1', port '5433', dbname 'demodb', 
>> option='option1=foo', option='option2=bar' );
> I do not want to allow postgres_fdw to set "all" GUCs. Because I think it is 
> very difficult to check the validity of all input values.
> So, I attached modified the patch so that we can easily add GUC that we can 
> set to postgres_fdw module.
> If you wish to add more GUC options to the module, add values to the 
> non_libpq_options[] array of the InitPgFdwOptions function, And add the 
> validator code for the GUC in the postgres_fdw_validator function.

We already have a method for libpq applications to pass GUC settings via 
connection parameters.  And postgres_fdw supports passing libpq connection 
parameters as server options.  So why doesn't this work here?

The reason is that postgres_fdw filters out connection settings that are marked 
debug ("D"), and the "options" keyword is marked as such.  I think this is 
wrong.  Just remove that designation and then this will work.  (Perhaps 
filtering out the debug options is also wrong, but I can see an argument for 
it.)

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


postgres_fdw_work_mem_v4.patch
Description: postgres_fdw_work_mem_v4.patch


Re: remove ancient pre-dlopen dynloader code

2018-08-31 Thread Peter Eisentraut
On 31/08/2018 10:52, Peter Eisentraut wrote:
> On 16/08/2018 16:10, Andres Freund wrote:
 If I had my druthers, we'd just remove all that configure magic for
 selecting these files and just use ifdefs.  Personally I find it
 occasionally that they're linked into place, rather than built under
 their original name.
>>>
>>> Even if we all agreed that was an improvement (which I'm not sure of),
>>> it wouldn't fix this problem would it?  On affected platforms, the
>>> file would still be empty after preprocessing.
>>
>> Well, that depends on what you put into that file, it seems
>> realistically combinable with a bunch of non-conditional code...
> 
> How about this: We only have two nonstandard dlopen() implementations
> left: Windows and (old) HP-UX.  We move those into src/port/dlopen.c and
> treat it like a regular libpgport member.  That gets rid of all those
> duplicative empty per-platform files.

Updated patch.  It needed some adjustments for Windows, per Appveyor.
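
For orientation, the new src/port/dlopen.c follows the usual
replacement-function pattern; a minimal sketch (the Windows details are
simplified assumptions -- the real code is in the attached patch):

    #include "c.h"

    #ifdef WIN32
    /* minimal shim; error reporting elided */
    void *
    dlopen(const char *file, int mode)
    {
        return (void *) LoadLibrary(file);
    }

    void *
    dlsym(void *handle, const char *symbol)
    {
        return (void *) GetProcAddress((HMODULE) handle, symbol);
    }

    int
    dlclose(void *handle)
    {
        return FreeLibrary((HMODULE) handle) ? 0 : 1;
    }
    #endif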

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From cb7ba655d4eb032412be0e63d75c4e4cd07ee049 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Sat, 1 Sep 2018 06:41:17 +0200
Subject: [PATCH v2] Refactor dlopen() support

Nowadays, all platforms except Windows and older HP-UX have standard
dlopen() support.  So having a separate implementation per platform
under src/backend/port/dynloader/ is a bit excessive.  Instead, treat
dlopen() like other library functions that happen to be missing
sometimes and put a replacement implementation under src/port/.
---
 configure | 43 +--
 configure.in  |  8 +--
 src/backend/Makefile  |  2 +-
 src/backend/port/.gitignore   |  1 -
 src/backend/port/Makefile |  2 +-
 src/backend/port/dynloader/aix.c  |  7 --
 src/backend/port/dynloader/aix.h  | 39 --
 src/backend/port/dynloader/cygwin.c   |  3 -
 src/backend/port/dynloader/cygwin.h   | 36 --
 src/backend/port/dynloader/darwin.c   | 35 -
 src/backend/port/dynloader/darwin.h   |  8 ---
 src/backend/port/dynloader/freebsd.c  |  7 --
 src/backend/port/dynloader/freebsd.h  | 38 --
 src/backend/port/dynloader/hpux.c | 68 --
 src/backend/port/dynloader/hpux.h | 25 ---
 src/backend/port/dynloader/linux.c|  7 --
 src/backend/port/dynloader/linux.h| 38 --
 src/backend/port/dynloader/netbsd.c   |  7 --
 src/backend/port/dynloader/netbsd.h   | 38 --
 src/backend/port/dynloader/openbsd.c  |  7 --
 src/backend/port/dynloader/openbsd.h  | 38 --
 src/backend/port/dynloader/solaris.c  |  7 --
 src/backend/port/dynloader/solaris.h  | 38 --
 src/backend/port/dynloader/win32.h| 19 -
 src/backend/postmaster/postmaster.c   |  1 -
 src/backend/utils/fmgr/dfmgr.c| 31 
 src/include/.gitignore|  1 -
 src/include/Makefile  |  4 +-
 src/include/pg_config.h.in|  8 +++
 src/include/pg_config.h.win32 |  8 +++
 src/include/port.h| 23 ++
 src/include/utils/dynamic_loader.h| 25 ---
 .../port/dynloader/win32.c => port/dlopen.c}  | 72 +--
 src/tools/msvc/Install.pm |  5 +-
 src/tools/msvc/Mkvcbuild.pm   |  5 +-
 src/tools/msvc/Solution.pm|  7 --
 src/tools/msvc/clean.bat  |  1 -
 37 files changed, 172 insertions(+), 540 deletions(-)
 delete mode 100644 src/backend/port/dynloader/aix.c
 delete mode 100644 src/backend/port/dynloader/aix.h
 delete mode 100644 src/backend/port/dynloader/cygwin.c
 delete mode 100644 src/backend/port/dynloader/cygwin.h
 delete mode 100644 src/backend/port/dynloader/darwin.c
 delete mode 100644 src/backend/port/dynloader/darwin.h
 delete mode 100644 src/backend/port/dynloader/freebsd.c
 delete mode 100644 src/backend/port/dynloader/freebsd.h
 delete mode 100644 src/backend/port/dynloader/hpux.c
 delete mode 100644 src/backend/port/dynloader/hpux.h
 delete mode 100644 src/backend/port/dynloader/linux.c
 delete mode 100644 src/backend/port/dynloader/linux.h
 delete mode 100644 src/backend/port/dynloader/netbsd.c
 delete mode 100644 src/backend/port/dynloader/netbsd.h
 delete mode 100644 src/backend/port/dynloader/openbsd.c
 delete mode 100644 src/backend/port/dynloader/openbsd.h
 delete mode 100644 src/backend/port/dynloader/solaris.c
 delete mode 100644 src/backend/port/dynloader/solaris.h
 delete mode 100644 src/backend/port/dynloader/win32.h
 delete mode 100644 src/include/utils/dynamic_loader.h
 rename src/{backen

Re: pg_verify_checksums failure with hash indexes

2018-08-31 Thread Dilip Kumar
On Sat, Sep 1, 2018 at 8:22 AM, Robert Haas  wrote:
> On Thu, Aug 30, 2018 at 7:27 AM, Amit Kapila  wrote:
>> We have previously changed this define in 620b49a1 with the intent to
>> allow many non-unique values in hash indexes without worrying to reach
>> the limit of the number of overflow pages.  I think this didn't occur
>> to us that it won't work for smaller block sizes.  As such, I don't
>> see any problem with the suggested fix.  It will allow us the same
>> limit for the number of overflow pages at 8K block size and a smaller
>> limit at smaller block size.  I am not sure if we can do any better
>> with the current design.  As it will change the metapage, I think we
>> need to bump HASH_VERSION.
>
> I wouldn't bother bumping HASH_VERSION.  First, the fix needs to be
> back-patched, and you certainly can't back-patch a HASH_VERSION bump.
> Second, you should just pick a formula that gives the same answer as
> now for the cases where the overrun doesn't occur, and some other
> sufficient value for the cases where an overrun currently does
> occur.  If you do that, you're not changing the behavior in any case
> that currently works, so there's really no reason for a version bump.
> It just becomes a bug fix at that point.
>

I think if we compute it with the formula below, which I suggested upthread,

#define HASH_MAX_BITMAPS   Min(BLCKSZ / 8, 1024)

then for BLCKSZ of 8K and bigger it will keep the same value as today,
where it does not overrun.  And for smaller BLCKSZ, I think it will still
give sufficient space for the hash map.  If BLCKSZ is 1K, then
sizeof(HashMetaPageData) + sizeof(HashPageOpaque) = 968, which is very
close to BLCKSZ.
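
To spell out the arithmetic (illustrative values, assuming 4-byte block
numbers in the metapage's hashm_mapp[] array):

    /*
     *   BLCKSZ    Min(BLCKSZ / 8, 1024)    bytes for hashm_mapp[]
     *     1024                      128        128 * 4 =  512
     *     2048                      256        256 * 4 = 1024
     *     4096                      512        512 * 4 = 2048
     *     8192                     1024       1024 * 4 = 4096  (same as today)
     *    32768                     1024       1024 * 4 = 4096  (same as today)
     */
    #define HASH_MAX_BITMAPS   Min(BLCKSZ / 8, 1024)

So block sizes of 8K and above keep today's limit, while smaller block
sizes get a cap that still lets the metapage fit within BLCKSZ.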

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: pg_verify_checksums and -fno-strict-aliasing

2018-08-31 Thread Fabien COELHO



Hello,


>> Okay, for the memo replay_image_masked and master_image_masked
>> in xlog.c could make use of the new structure.  SetWALSegSize in
>> pg_standby.c and WriteEmptyXLOG in pg_resetwal.c, and pg_upgrade's
>> file.c could also be patched.


> I intentionally didn't change replay_image_masked/master_image_masked
> to use statically-allocated buffers.  Since, AFAICS, those aren't
> needed in most backend processes, they'd just be eating 16KB of
> per-process data space to no purpose.
>
> The others you mention could be changed, probably, but I didn't
> bother as they didn't seem performance-critical.


I'd go for having just one approach everywhere, for code-base
homogeneity.



>>> +typedef union PGAlignedBuffer



>> One complaint I have is about the name of those structures.  Perhaps
>> PGAlignedBlock and PGAlignedXlogBlock make more sense?


> Don't have a strong preference, anybody else have an opinion?


I like "Block" better, because it's more precise.


> (I also wondered whether to use "WAL" instead of "XLog" in that
> struct name, but it seems like we've mostly stuck with "xlog"
> in internal C names.)


Best to blend with the surrounding code in the header file?

--
Fabien.



Re: file cloning in pg_upgrade and CREATE DATABASE

2018-08-31 Thread Peter Eisentraut
rebased patch, no functionality changes
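
For readers who want the gist without reading the whole diff: the cloning
path presumably boils down to platform calls along these lines (a sketch
with simplified error handling, not the patch's actual file.c code):

    #ifdef __linux__
    #include <sys/ioctl.h>
    #include <linux/fs.h>           /* FICLONE */

    /* clone the contents of src_fd into dest_fd; both already open */
    static int
    clone_file_fd(int src_fd, int dest_fd)
    {
        return ioctl(dest_fd, FICLONE, src_fd);     /* 0 on success */
    }
    #elif defined(HAVE_COPYFILE)
    #include <copyfile.h>

    /* macOS/APFS: clone src into dst, failing if cloning is unsupported */
    static int
    clone_file_path(const char *src, const char *dst)
    {
        return copyfile(src, dst, NULL, COPYFILE_CLONE_FORCE);
    }
    #endif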

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 5e0f60e9cf182063f2f711251430d79282be1f93 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Sat, 1 Sep 2018 07:05:02 +0200
Subject: [PATCH v4] pg_upgrade: Allow use of file cloning

For file copying in pg_upgrade, allow using special file cloning calls
if available.  This makes the copying faster and more space efficient.
This achieves speed similar to --link mode without the associated
drawbacks.

Add an option --reflink to select whether file cloning is turned on,
off, or automatic.  Automatic is the default.

On Linux, file cloning is supported on Btrfs and XFS (if formatted with
reflink support).  On macOS, file cloning is supported on APFS.
---
 configure|   2 +-
 configure.in |   2 +-
 doc/src/sgml/ref/pgupgrade.sgml  |  33 +
 src/bin/pg_upgrade/check.c   |   2 +
 src/bin/pg_upgrade/file.c| 123 +++
 src/bin/pg_upgrade/option.c  |  14 
 src/bin/pg_upgrade/pg_upgrade.h  |  15 
 src/bin/pg_upgrade/relfilenode.c |  31 +++-
 src/include/pg_config.h.in   |   3 +
 9 files changed, 220 insertions(+), 5 deletions(-)

diff --git a/configure b/configure
index dd94c5bbab..a324dd9ff7 100755
--- a/configure
+++ b/configure
@@ -15060,7 +15060,7 @@ fi
 LIBS_including_readline="$LIBS"
 LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
 
-for ac_func in cbrt clock_gettime dlopen fdatasync getifaddrs getpeerucred 
getrlimit mbstowcs_l memmove poll posix_fallocate pstat pthread_is_threaded_np 
readlink setproctitle setproctitle_fast setsid shm_open symlink sync_file_range 
utime utimes wcstombs_l
+for ac_func in cbrt clock_gettime copyfile dlopen fdatasync getifaddrs 
getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate pstat 
pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open 
symlink sync_file_range utime utimes wcstombs_l
 do :
   as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
 ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
diff --git a/configure.in b/configure.in
index 3280afa0da..614262fc52 100644
--- a/configure.in
+++ b/configure.in
@@ -1544,7 +1544,7 @@ PGAC_FUNC_WCSTOMBS_L
 LIBS_including_readline="$LIBS"
 LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
 
-AC_CHECK_FUNCS([cbrt clock_gettime dlopen fdatasync getifaddrs getpeerucred 
getrlimit mbstowcs_l memmove poll posix_fallocate pstat pthread_is_threaded_np 
readlink setproctitle setproctitle_fast setsid shm_open symlink sync_file_range 
utime utimes wcstombs_l])
+AC_CHECK_FUNCS([cbrt clock_gettime copyfile dlopen fdatasync getifaddrs 
getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate pstat 
pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open 
symlink sync_file_range utime utimes wcstombs_l])
 
 AC_REPLACE_FUNCS(fseeko)
 case $host_os in
diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index d51146d641..d994218c44 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -182,6 +182,39 @@ Options
   display version information, then exit
  
 
+ 
+  
--reflink={always|auto|never}
+  
+   
+Determines whether pg_upgrade, when in copy
+mode, should use efficient file cloning (also known as
+reflinks) on some operating systems and file systems.
+This can result in near-instantaneous copying of the data files,
+giving the speed advantages of
+-k/--link while leaving the old
+cluster untouched.
+   
+
+   
+The setting always requires the use of reflinks.  If
+they are not supported, the pg_upgrade run
+will abort.  Use this in production to limit the upgrade run time.
+The setting auto uses reflinks when available,
+otherwise it falls back to a normal copy.  This is the default.  The
+setting never prevents use of reflinks and always
+uses a normal copy.  This can be useful to ensure that the upgraded
+cluster has its disk space fully allocated and not shared with the old
+cluster.
+   
+
+   
+At present, reflinks are supported on Linux (kernel 4.5 or later) with
+Btrfs and XFS (on file systems created with reflink support, which is
+not the default for XFS at this writing), and on macOS with APFS.
+   
+  
+ 
+
  
   -?
   --help
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 5a78d603dc..eb1f18180a 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -151,6 +151,8 @@ check_new_cluster(void)
 
if (user_opts.transfer_mode == TRANSFER_MODE_LINK)
check_hard_link();
+   else if (user_opts.transfer_mode == TRANSF

Re: Progress reporting for pg_verify_checksums

2018-08-31 Thread Fabien COELHO



Hallo Michael,


> my colleague Bernd Helmle recently added progress reporting to our
> pg_checksums application[1]. I have now forward ported this to
> pg_verify_checksums for the September commitfest, please see the
> attached patch.


It seems that the patch does not apply cleanly on master with either
"git apply" or "patch".  Could you rebase it?



> Here's the description:
>
> This optionally prints the progress of pg_verify_checksums via read
> kilobytes


MB? GB?


> to the terminal with the new command line argument -P.

> If -P was forgotten and pg_verify_checksums operates on a large cluster,
> the caller can send SIGUSR1 to pg_verify_checksums to turn progress
> status reporting on during runtime.


Hmmm. I cannot say I like the signal feature much. Would it make sense for 
the progress to be on by default, and to have a quiet option instead?
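
For reference, the signal feature under discussion amounts to a runtime
toggle along these lines (a generic sketch, not the actual patch):

    #include <signal.h>

    static volatile sig_atomic_t show_progress = 0;

    /* illustrative handler; would be registered for SIGUSR1 at startup */
    static void
    progress_request_handler(int signo)
    {
        (void) signo;
        show_progress = 1;
    }

An on-by-default progress display with a --quiet switch would avoid the
handler entirely.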


--
Fabien.



Re: pg_verify_checksums -d option (was: Re: pg_verify_checksums -r option)

2018-08-31 Thread Fabien COELHO



Hello Yugo-san,


> Attached is a patch to allow pg_verify_checksums to specify a database
> to scan.  This is useful for users who want to verify checksums of relations
> in a specific database.  We can specify a database by OID using the -d or
> --dboid option.
> Also, when -g or --global-only is used, only shared relations are scanned.


It seems that the patch does not apply anymore. Could you rebase it?

--
Fabien.



Re: Online verification of checksums

2018-08-31 Thread Fabien COELHO



Hallo Michael,


> I've now forward-ported this change to pg_verify_checksums, in order to
> make this application useful for online clusters, see attached patch.


Patch does not seem to apply anymore, could you rebase it?

--
Fabien.