Re: Commit fest 2019-09

2019-10-01 Thread Michael Paquier
On Mon, Sep 30, 2019 at 06:20:31PM -0400, David Steele wrote:
> On 9/30/19 5:26 PM, David Fetter wrote:
>> 
>> Thanks for doing all this work!
> 
> +1!

Thanks, Alvaro!
--
Michael




Re: Transparent Data Encryption (TDE) and encrypted files

2019-10-01 Thread Tels

Moin,

On 2019-09-30 23:26, Bruce Momjian wrote:

For full-cluster Transparent Data Encryption (TDE), the current plan is
to encrypt all heap and index files, WAL, and all pgsql_tmp (work_mem
overflow).  The plan is:


https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#TODO_for_Full-Cluster_Encryption

We don't see much value to encrypting vm, fsm, pg_xact, pg_multixact, or
other files.  Is that correct?  Do any other PGDATA files contain user
data?


IMHO the general rule in crypto is: encrypt everything, or don't bother.

If you don't encrypt some things, somebody is going to find loopholes 
and sidechannels
and partial-plaintext attacks. Just a silly example: If you trick the DB 
into putting only one row per page,
any "bit-per-page" map suddenly reveals information about a single 
encrypted row that it shouldn't reveal.


Many people with a lot of free time on their hands will sit around, 
drink a nice cup of tea and come up
with all sorts of attacks on these things that you didn't (and couldn't) 
anticipate now.


So IMHO it would be much better to err on the side of caution and 
encrypt everything possible.


Best regards,

Tels




Declaring a strict function returns not null / eval speed

2019-10-01 Thread Andres Freund
Hi,


We spend a surprising amount of time during expression evaluation to reevaluate 
whether input to a strict function (or similar) is not null, even though the 
value either comes from a strict function, or a column declared not null.

Now you can rightfully say that a strict function still can return NULL, even 
when called with non-NULL input. But practically that's quite rare. Most of the 
common byvalue type operators are strict, and approximately none of those 
return NULL when actually called.
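
As an illustration, how pervasive strictness already is can be seen straight
from the catalog; for instance, the functions behind the basic int4 operators
are all marked strict (illustrative query only):

    -- proisstrict = true means "returns NULL only for NULL input"
    SELECT proname, proisstrict
    FROM pg_proc
    WHERE proname IN ('int4pl', 'int4mi', 'int4eq', 'int4lt');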

That makes me wonder if it's worthwhile to invent a function property declaring 
strict strictness or such. It'd allow for some quite noticeable improvements for 
e.g. queries aggregating a lot of rows, where we spend a fair amount of time 
checking whether the transition value has "turned" not null. I'm about to submit 
a patch making that less expensive, but it's still expensive.

I can also imagine that being able to propagate NOT NULL further up the 
parse-analysis tree could be beneficial for planning, but I've not looked at it 
in any detail.


A related issue is that we, during executor initialization, currently "loose" 
information about a column's NOT NULLness just above the lower scan nodes. 
Efficiency wise that's a substantial loss for many realistic queries: For JITed 
deforming that basically turns a bunch of mov instructions with constant 
offsets into much slower attribute by attribute trawling through the tuple. The 
latter can approximately not take advantage of the superscalar nature of just 
about any relevant processor.  And for non JITed execution an expression step 
that used a cheaper deforming routine for the cases where only leading not null 
columns are accessed would also yield significant speedups.  This is made worse 
by the fact that we often not actually deform at the scan nodes, due to the 
physical tlist optimization.   This is especially bad for nodes storing tuples 
as minimal tuples (e.g. hashjoin, hashagg), where often a very significant 
fraction of time of spent re-deforming columns that already were deformed 
earlier.

It doesn't seem very hard to propagate attnotnull upwards in a good number of 
the cases. We don't need to do so everywhere for it to be beneficial.

Comments?

Andres




Proposal: Make use of C99 designated initialisers for nulls/values arrays

2019-10-01 Thread Smith, Peter
Dear Hackers,

I have identified some OSS code which may be able to make use of C99 designated 
initialisers for nulls/values arrays.

~

Background:
There are lots of tuple operations where arrays of values and flags are being 
passed.
Typically these arrays have previously been initialised to 0/false by memset.
By modifying code to use C99 designated initialiser syntax [1], most of these 
memsets can become redundant.
Actually, this mechanism is already being used in some of the existing OSS 
code. This patch/proposal just propagates the same idea to all other similar 
places I could find.

~

Result:
Less code. Removes ~200 unnecessary memsets.
More consistent initialisation.

~

Typical Example:
Before:
Datum   values[Natts_pg_attribute];
bool    nulls[Natts_pg_attribute];
...
memset(values, 0, sizeof(values));
memset(nulls, false, sizeof(nulls));
After:
Datum   values[Natts_pg_attribute] = {0};
bool    nulls[Natts_pg_attribute] = {0};
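
As an aside, the same C99 rule also lets a designated initialiser target
individual elements while everything else is still implicitly zeroed; a minimal
illustrative sketch (Anum_pg_attribute_attname is used only as an example
index, it is not part of the patch):

/* every element is (Datum) 0 / false */
Datum   values[Natts_pg_attribute] = {0};
bool    nulls[Natts_pg_attribute]  = {0};

/* designated initialisers can also set individual slots;
 * the remaining elements are still implicitly zeroed */
bool    nulls_with_one_set[Natts_pg_attribute] = {
    [Anum_pg_attribute_attname - 1] = true,
};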


---
[1] REF C99 [§6.7.8/21]: "If there are fewer initializers in a brace-enclosed 
list than there are elements or members of an aggregate, or fewer characters in 
a string literal used to initialize an array of known size than there are 
elements in the array, the remainder of the aggregate shall be initialized 
implicitly the same as objects that have static storage duration."

~

Please refer to the attached patch.

Kind Regards,

---
Peter Smith
Fujitsu Australia






init_nulls.patch
Description: init_nulls.patch


Re: Transparent Data Encryption (TDE) and encrypted files

2019-10-01 Thread Moon, Insung
Dear Tels.

On Tue, Oct 1, 2019 at 4:33 PM Tels  wrote:
>
> Moin,
>
> On 2019-09-30 23:26, Bruce Momjian wrote:
> > For full-cluster Transparent Data Encryption (TDE), the current plan is
> > to encrypt all heap and index files, WAL, and all pgsql_tmp (work_mem
> > overflow).  The plan is:
> >
> >   
> > https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#TODO_for_Full-Cluster_Encryption
> >
> > We don't see much value to encrypting vm, fsm, pg_xact, pg_multixact,
> > or
> > other files.  Is that correct?  Do any other PGDATA files contain user
> > data?
>
> IMHO the general rule in crypto is: encrypt everything, or don't bother.
>
> If you don't encrypt some things, somebody is going to find loopholes
> and sidechannels
> and partial-plaintext attacks. Just a silly example: If you trick the DB
> into putting only one row per page,
> any "bit-per-page" map suddenly reveals information about a single
> encrypted row that it shouldn't reveal.
>
> Many people with a lot of free time on their hands will sit around,
> drink a nice cup of tea and come up
> with all sorts of attacks on these things that you didn't (and couldn't)
> anticipate now.

This is my thinking: to minimize overhead, we try not to encrypt
data that does not contain confidential information.

I'm not a security expert, so my thoughts may be wrong,
but isn't it more dangerous to encrypt predictable data?

For example, when encrypting data other than data entered by the user,
it may be possible to predict the plaintext.
And if such data is encrypted, I think there could be a security problem.

Of course, a separate encryption key would be used.
But I thought it would be a problem if there were confidential data
encrypted using the same key as the attacked data.

Best regards.
Moon.


>
> So IMHO it would be much better to err on the side of caution and
> encrypt everything possible.
>
> Best regards,
>
> Tels
>
>




Re: Transparent Data Encryption (TDE) and encrypted files

2019-10-01 Thread Magnus Hagander
On Tue, Oct 1, 2019 at 9:33 AM Tels  wrote:

> Moin,
>
> On 2019-09-30 23:26, Bruce Momjian wrote:
> > For full-cluster Transparent Data Encryption (TDE), the current plan is
> > to encrypt all heap and index files, WAL, and all pgsql_tmp (work_mem
> > overflow).  The plan is:
> >
> >
> https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#TODO_for_Full-Cluster_Encryption
> >
> > We don't see much value to encrypting vm, fsm, pg_xact, pg_multixact,
> > or
> > other files.  Is that correct?  Do any other PGDATA files contain user
> > data?
>
> IMHO the general rule in crypto is: encrypt everything, or don't bother.
>
> If you don't encrypt some things, somebody is going to find loopholes
> and sidechannels
> and partial-plaintext attacks. Just a silly example: If you trick the DB
> into putting only one row per page,
> any "bit-per-page" map suddenly reveals information about a single
> encrypted row that it shouldn't reveal.
>
> Many people with a lot of free time on their hands will sit around,
> drink a nice cup of tea and come up
> with all sorts of attacks on these things that you didn't (and couldn't)
> anticipate now.
>
> So IMHO it would be much better to err on the side of caution and
> encrypt everything possible.
>

+1.

Unless we are *absolutely* certain, I bet someone will be able to find a
side-channel that somehow leaks some data or data-about-data, if we don't
encrypt everything. If nothing else, you can get use patterns out of it,
and you can make a lot from that. (E.g. by whether transactions are using
multixacts or not you can potentially determine which transaction they are,
if you know what type of transactions are being issued by the application.
In the simplest case, there might be a single pattern where multixacts end
up actually being used, and in that case being able to see the multixact
data tells you a lot about the system).

As for other things -- by default, we store the log files in text format in
the data directory. That contains *loads* of sensitive data in a lot of
cases. Will those also be encrypted?

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: Attempt to consolidate reading of XLOG page

2019-10-01 Thread Kyotaro Horiguchi
At Tue, 01 Oct 2019 08:28:03 +0200, Antonin Houska  wrote in 
<2188.1569911283@antos>
> Kyotaro Horiguchi  wrote:
> > > > XLogRead() tests for NULL so it should not crash but I don't insist on 
> > > > doing
> > > > it this way. XLogRead() actually does not have to care whether the "open
> > > > segment callback" determines the TLI or not, so it (XLogRead) can always
> > > > receive a valid pointer to seg.ws_tli.
> > > 
> > > This is actually wrong - seg.ws_tli is not always the correct value to
> > > pass. If seg.ws_tli refers to the segment from which data was read last 
> > > time,
> > > then XLogRead() still needs a separate argument to specify from which TLI 
> > > the
> > > current call should read. If these two differ, new file needs to be 
> > > opened.
> > 
> > openSegment represents the file *currently* opened.
> 
> I suppose you mean the "seg" argument.
> 
> > XLogRead needs the TLI *to be* opened. If they are different, as far as wal
> > logical wal sender and pg_waldump is concerned, XLogRead switches to the new
> > TLI and the new TLI is set to openSegment.ws_tli.
> 
> Yes, it works in these cases.
> 
> > So, it seems to me that the parameter doesn't need to be inout? It is enough
> > that it is an "in" parameter.
> 
> I did consider "TimeLineID *tli_p" to be "in" parameter in the last patch
> version. The reason I used pointer was the special meaning of the NULL value:
> if NULL is passed, then the timeline should be ignored (because of the other
> cases, see below).

Understood.

> > > The problem of walsender.c is that its implementation of XLogRead() does 
> > > not
> > > care about the TLI of the previous read. If the behavior of the new, 
> > > generic
> > > implementation should be exactly the same, we need to tell XLogRead() 
> > > that in
> > > some cases it also should not compare the current TLI to the previous
> > > one. That's why I tried to use the NULL pointer, or the InvalidTimeLineID
> > > earlier.
> > 
> > Physical wal sender doesn't switch TLI. So I don't think the
> > behavior doesn't harm (or doesn't fire). openSegment holds the
> > TLI set at the first call. (Even if future wal sender switches
> > TLI, the behavior should be needed.)
> 
> Note that walsender.c:XLogRead() has no TLI argument, however the XLogRead()
> introduced by the patch does have one. What should be passed for TLI to the
> new implementation if it's called from walsender.c? If the check for a segment
> change looks like this (here "tli" is the argument representing the desired
> TLI)

A TLI is mandatory to generate a WAL file name, so it must be passed
to the function anyway. In the current code it is sendTimeLine
for walsender.c:XLogRead(). logical_read_xlog_page sets the
variable every time immediately before calling
XLogRead(). CreateReplicationSlot and StartReplication set the
variable to the desired TLI immediately before calling, and once it is
set by StartReplication it is not changed by XLogSendPhysical;
the walsender ends at the end of the current timeline. In
XLogRead, the value is copied to sendSeg->ws_tli when the file
for the new timeline is read.

>   if (seg->ws_file < 0 ||
>   !XLByteInSeg(recptr, seg->ws_segno, segcxt->ws_segsize) ||
>   tli != seg->ws_tli)
>   {
>   XLogSegNo   nextSegNo;
> 
>   /* Switch to another logfile segment */
>   if (seg->ws_file >= 0)
>   close(seg->ws_file);
> 
> then any valid TLI can result in accidental closing of the current segment
> file. Since this is only refactoring patch, we should not allow such a change
> of behavior even if it seems that the same segment will be reopened
> immediately.

Mmm. ws_file must be -1 in that case? tli != seg->ws_tli may be true,
but seg->ws_file < 0 is also always true at that point. In other
words, "tli != seg->ws_tli" is not even evaluated.

If the walsender had an open file (ws_file >= 0) and the new TLI were
different from ws_tli, that would be the sign of a serious bug.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




About Google Code-in

2019-10-01 Thread Sakshi Munjal
Greetings,
I am Sakshi Munjal and I would like to know how I can apply to Google
Code-in as a mentor with your organization.

I have great interest in coding and I like to spend my free time learning
new things. I have developed my skills in full stack web development, I
have knowledge about machine learning and I am currently pursuing my
interest in cyber security. I like to spend my free time playing piano and
serving in church. I have been an all time merit holder in school and I
managed to score 9.5 CGPA in college semester as well. I have contributed
to open source organizations earlier and I would like to enhance my
thinking horizon.

Thank you for taking out your valuable time.
I would definitely wait for a response from you.
Regards


Re: Transparent Data Encryption (TDE) and encrypted files

2019-10-01 Thread Moon, Insung
Dear  Magnus Hagander.

On Tue, Oct 1, 2019 at 5:37 PM Magnus Hagander  wrote:
>
>
>
> On Tue, Oct 1, 2019 at 9:33 AM Tels  wrote:
>>
>> Moin,
>>
>> On 2019-09-30 23:26, Bruce Momjian wrote:
>> > For full-cluster Transparent Data Encryption (TDE), the current plan is
>> > to encrypt all heap and index files, WAL, and all pgsql_tmp (work_mem
>> > overflow).  The plan is:
>> >
>> >   
>> > https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#TODO_for_Full-Cluster_Encryption
>> >
>> > We don't see much value to encrypting vm, fsm, pg_xact, pg_multixact,
>> > or
>> > other files.  Is that correct?  Do any other PGDATA files contain user
>> > data?
>>
>> IMHO the general rule in crypto is: encrypt everything, or don't bother.
>>
>> If you don't encrypt some things, somebody is going to find loopholes
>> and sidechannels
>> and partial-plaintext attacks. Just a silly example: If you trick the DB
>> into putting only one row per page,
>> any "bit-per-page" map suddenly reveals information about a single
>> encrypted row that it shouldn't reveal.
>>
>> Many people with a lot of free time on their hands will sit around,
>> drink a nice cup of tea and come up
>> with all sorts of attacks on these things that you didn't (and couldn't)
>> anticipate now.
>>
>> So IMHO it would be much better to err on the side of caution and
>> encrypt everything possible.
>
>
> +1.
>
> Unless we are *absolutely* certain, I bet someone will be able to find a 
> side-channel that somehow leaks some data or data-about-data, if we don't 
> encrypt everything. If nothing else, you can get use patterns out of it, and 
> you can make a lot from that. (E.g. by whether transactions are using 
> multixacts or not you can potentially determine which transaction they are, 
> if you know what type of transactions are being issued by the application. In 
> the simplest case, there might be a single pattern where multixacts end up 
> actually being used, and in that case being able to see the multixact data 
> tells you a lot about the system).
>
> As for other things -- by default, we store the log files in text format in 
> the data directory. That contains *loads* of sensitive data in a lot of 
> cases. Will those also be encrypted?


Maybe... as a result of the discussion so far, we are not encrypting
the server log.

https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#What_to_encrypt.2Fdecrypt

I think encrypting the server logs would be a very difficult challenge,
and we would probably need to develop another application to view the
encrypted server logs.

Best regards.
Moon.


>
> --
>  Magnus Hagander
>  Me: https://www.hagander.net/
>  Work: https://www.redpill-linpro.com/




Re: Modest proposal for making bpchar less inconsistent

2019-10-01 Thread Kyotaro Horiguchi
At Sat, 28 Sep 2019 08:22:22 -0400, Bruce Momjian  wrote in 
<2019092812.ga26...@momjian.us>
> On Fri, Sep 13, 2019 at 09:50:10PM +0200, Pavel Stehule wrote:
> > 
> > 
> > Dne pá 13. 9. 2019 16:43 uživatel Tom Lane  napsal:
> > 
> > It struck me that the real reason that we keep getting gripes about
> > the weird behavior of CHAR(n) is that these functions (and, hence,
> > their corresponding operators) fail to obey the "trailing blanks
> > aren't significant" rule:
> > 
> >                regprocedure                |        prosrc       
> > ---+--
> >  bpcharlike(character,text)                | textlike
> >  bpcharnlike(character,text)               | textnlike
> >  bpcharicregexeq(character,text)           | texticregexeq
> >  bpcharicregexne(character,text)           | texticregexne
> >  bpcharregexeq(character,text)             | textregexeq
> >  bpcharregexne(character,text)             | textregexne
> >  bpchariclike(character,text)              | texticlike
> >  bpcharicnlike(character,text)             | texticnlike
> > 
> > They're just relying on binary compatibility of bpchar to text ...
> > but of course textlike etc. think trailing blanks are significant.
> > 
> > Every other primitive operation we have for bpchar correctly ignores
> > the trailing spaces.
> > 
> > We could fix this, and save some catalog space too, if we simply
> > deleted these functions/operators and let such calls devolve
> > into implicit casts to text.
> > 
> > This might annoy people who are actually writing trailing spaces
> > in their patterns to make such cases work.  But I think there
> > are probably not too many such people, and having real consistency
> > here is worth something.
> > 
> > 
> > has sense
> 
> Yes, I think this is a great idea!

I totally agree.
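
A small illustration of the inconsistency, assuming the usual bpchar vs. text
semantics (the pattern operators see the padding, the comparison operators do
not):

    SELECT 'abc'::char(5) = 'abc';     -- true:  bpchareq ignores the trailing blanks
    SELECT 'abc'::char(5) LIKE 'abc';  -- false: bpcharlike compares the padded 'abc  ' as text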

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


Re: Optimize partial TOAST decompression

2019-10-01 Thread Tomas Vondra

On Tue, Oct 01, 2019 at 11:20:39AM +0500, Andrey Borodin wrote:




On 30 Sep 2019, at 22:29, Tomas Vondra wrote:

On Mon, Sep 30, 2019 at 09:20:22PM +0500, Andrey Borodin wrote:




On 30 Sep 2019, at 20:56, Tomas Vondra wrote:

I mean this:

 /*
  * Use int64 to prevent overflow during calculation.
  */
 compressed_size = (int32) ((int64) rawsize * 9 + 8) / 8;

I'm not very familiar with pglz internals, but I'm a bit puzzled by
this. My first instinct was to compare it to this:

 #define PGLZ_MAX_OUTPUT(_dlen) ((_dlen) + 4)

but clearly that's a very different (much simpler) formula. So why
shouldn't pglz_maximum_compressed_size simply use this macro?




compressed_size accounts for possible increase of size during
compression. pglz can consume up to 1 control byte for each 8 bytes of
data in worst case.


OK, but does that actually translate in to the formula? We essentially
need to count 8-byte chunks in raw data, and multiply that by 9. Which
gives us something like

 nchunks = ((rawsize + 7) / 8) * 9;

which is not quite what the patch does.


I'm afraid neither formula is correct, but all this is hair-splitting 
differences.



Sure. I just want to be sure the formula is safe and we won't end up
using too low value in some corner case.


Your formula does not account for the fact that we may not need all bytes from 
last chunk.
Consider a desired decompressed size of 3 bytes. We may need 1 control byte and
3 literals, 4 bytes total. But nchunks = 9.



OK, so essentially this means my formula works with whole chunks, i.e.
if we happen to need just a part of a decompressed chunk, we still
request enough data to decompress it whole. This way we may request up
to 7 extra bytes, which seems fine.


Binguo's formula is appending 1 control bit per data byte and one extra
control byte.  Consider size = 8 bytes. We need 1 control byte, 8
literals, 9 total.  But compressed_size = 10.

Mathematically correct formula is compressed_size = (int32) ((int64)
rawsize * 9 + 7) / 8; Here we take one bit for each data byte, and 7
control bits for overflow.

But these equations make no big difference; each formula is safe. I'd
pick the one that is easier to understand and document (IMO, it's nchunks =
((rawsize + 7) / 8) * 9).



I'd use the *mathematically correct* formula, it doesn't seem to be any
more complex, and the "one bit per byte, complete bytes" explanation
seems quite understandable.
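
For anyone who wants to double-check the arithmetic, here is a tiny standalone
program (not PostgreSQL code; purely illustrative) that evaluates the three
bounds discussed above and reproduces the 3-byte and 8-byte examples from
earlier in the thread:

#include <stdio.h>

int main(void)
{
    int sizes[] = {3, 8, 2048};

    for (int i = 0; i < 3; i++)
    {
        int rawsize = sizes[i];
        int exact  = (int) (((long long) rawsize * 9 + 7) / 8);  /* one control bit per data byte, rounded up */
        int patch  = (int) (((long long) rawsize * 9 + 8) / 8);  /* formula in the submitted patch */
        int chunks = ((rawsize + 7) / 8) * 9;                    /* whole 8-byte chunks, 9 bytes each */

        printf("rawsize=%d  exact=%d  patch=%d  chunks=%d\n",
               rawsize, exact, patch, chunks);
    }
    return 0;
}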

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: Proposal: Make use of C99 designated initialisers for nulls/values arrays

2019-10-01 Thread Amit Kapila
On Tue, Oct 1, 2019 at 1:25 PM Smith, Peter  wrote:
>
> Dear Hackers,
>
> I have identified some OSS code which maybe can make use of C99 designated 
> initialisers for nulls/values arrays.
>
> ~
>
> Background:
> There are lots of tuple operations where arrays of values and flags are being 
> passed.
> Typically these arrays are being previously initialised 0/false by memset.
> By modifying code to use C99 designated initialiser syntax [1], most of these 
> memsets can become redundant.
> Actually, this mechanism is already being used in some of the existing OSS 
> code. This patch/proposal just propagates the same idea to all other similar 
> places I could find.
>
> ~
>
> Result:
> Less code. Removes ~200 unnecessary memsets.
> More consistent initialisation.
>

+1.  This seems like an improvement.  I can review and take this
forward unless there are objections from others.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Change atoi to strtol in same place

2019-10-01 Thread Kyotaro Horiguchi
Hello.

At Sun, 29 Sep 2019 23:51:23 -0500, Joe Nelson  wrote in 
<20190930045123.gc68...@begriffs.com>
> Alvaro Herrera wrote:
> > ... can we have a new patch?
> 
> OK, I've attached v4. It works cleanly on 55282fa20f with
> str2int-16.patch applied. My patch won't compile without the other one
> applied too.
> 
> Changed:
> [x] revert my changes in common/Makefile
> [x] rename arg_utils.[ch] to option.[ch]
> [x] update @pgfeutilsfiles in Mkvcbuild.pm
> [x] pgindent everything
> [x] get rid of atoi() in more utilities

Compiler complained as "INT_MAX undeclared" (gcc 7.3 / CentOS7.6).

> One question about how the utilities parse port numbers.  I currently
> have it check that the value can be parsed as an integer, and that its
> range is within 1 .. (1<<16)-1. I wonder if the former restriction is
> (un)desirable, because ultimately getaddrinfo() takes a "service name
> description" for the port, which can be a name such as found in
> '/etc/services' as well as the string representation of a number. If
> desired, I *could* treat only range errors as a failure for ports, and
> allow integer parse errors.

We could do that, but it is perhaps of no use for our purposes. We are not
likely to use named ports other than 'postgres', if any.
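
For illustration, a minimal sketch of the strtol-with-range-check approach
being discussed (the helper name and exact error handling here are made up,
not the patch's):

#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

static bool
parse_port(const char *str, int *port)
{
    char   *end;
    long    val;

    errno = 0;
    val = strtol(str, &end, 10);
    if (errno != 0 || end == str || *end != '\0')
        return false;           /* not a plain base-10 integer */
    if (val < 1 || val > 65535)
        return false;           /* out of range for a TCP port */
    *port = (int) val;
    return true;
}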

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Change atoi to strtol in same place

2019-10-01 Thread Kyotaro Horiguchi
At Tue, 01 Oct 2019 19:32:08 +0900 (Tokyo Standard Time), Kyotaro Horiguchi 
 wrote in 
<20191001.193208.264851337.horikyota@gmail.com>
> Hello.
> 
> At Sun, 29 Sep 2019 23:51:23 -0500, Joe Nelson  wrote in 
> <20190930045123.gc68...@begriffs.com>
> > Alvaro Herrera wrote:
> > > ... can we have a new patch?
> > 
> > OK, I've attached v4. It works cleanly on 55282fa20f with
> > str2int-16.patch applied. My patch won't compile without the other one
> > applied too.
> > 
> > Changed:
> > [x] revert my changes in common/Makefile
> > [x] rename arg_utils.[ch] to option.[ch]
> > [x] update @pgfeutilsfiles in Mkvcbuild.pm
> > [x] pgindent everything
> > [x] get rid of atoi() in more utilities

I didn't check closely, but pg_standby's message for -k looks
somewhat strange. Does it need a separator?

> pg_standby: -k keepfiles could not parse 'hoge' as integer

Building a sentence by just concatenating multiple non-independent
(or incomplete) subphrases makes translation harder.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Libpq support to connect to standby server as priority

2019-10-01 Thread Greg Nancarrow
On Wed, Sep 11, 2019 at 10:17 AM Alvaro Herrera from 2ndQuadrant
 wrote:
>
> Oh, oops. Here they are then.
>

With the permission of the original patch author, Haribabu Kommi, I’ve
rationalized the existing 8 patches into 3 patches, merging patches
1-5 and 6-7, and tidying up some documentation and code comments. I
also rebased them to the latest PG12 source code (as of October 1,
2019). The patch code itself is the same, except for some version
checks that I have updated to target the features for PG13 instead of
PG12.
I’ve attached the updated patches.

Regards,
Greg Nancarrow
Fujitsu Australia


v14-0001-libpq-target_session_attrs-read_write-prefer_read-read_only.patch
Description: Binary data


v14-0003-Server-recovery-mode-handling.patch
Description: Binary data


v14-0002-libpq-target_session_attrs-primary-prefer_standby-standby.patch
Description: Binary data




Re: backup manifests

2019-10-01 Thread Robert Haas
On Mon, Sep 30, 2019 at 5:31 AM Jeevan Chalke
 wrote:
> Entry for directory is not added in manifest. So it might be difficult
> at client to get to know about the directories. Will it be good to add
> an entry for each directory too? May be like:
> Dir 

Well, what kind of corruption would this allow us to detect that we
can't detect as things stand? I think the only case is an empty
directory. If it's not empty, we'd have some entries for the files in
that directory, and those files won't be able to exist unless the
directory does. But, how would we end up backing up an empty
directory, anyway?

I don't really *mind* adding directories into the manifest, but I'm
not sure how much it helps.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Drop Trigger Mechanism with Detached partitions

2019-10-01 Thread Robert Haas
On Mon, Sep 30, 2019 at 5:59 AM M Beena Emerson
 wrote:
> Detach partition does not remove the partition trigger dependency as seen in 
> below scenario.

Sounds like a bug.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Optimize partial TOAST decompression

2019-10-01 Thread Tomas Vondra

On Tue, Oct 01, 2019 at 12:08:05PM +0200, Tomas Vondra wrote:

On Tue, Oct 01, 2019 at 11:20:39AM +0500, Andrey Borodin wrote:




On 30 Sep 2019, at 22:29, Tomas Vondra wrote:

On Mon, Sep 30, 2019 at 09:20:22PM +0500, Andrey Borodin wrote:




On 30 Sep 2019, at 20:56, Tomas Vondra wrote:

I mean this:

/*
 * Use int64 to prevent overflow during calculation.
 */
compressed_size = (int32) ((int64) rawsize * 9 + 8) / 8;

I'm not very familiar with pglz internals, but I'm a bit puzzled by
this. My first instinct was to compare it to this:

#define PGLZ_MAX_OUTPUT(_dlen)  ((_dlen) + 4)

but clearly that's a very different (much simpler) formula. So why
shouldn't pglz_maximum_compressed_size simply use this macro?




compressed_size accounts for possible increase of size during
compression. pglz can consume up to 1 control byte for each 8 bytes of
data in worst case.


OK, but does that actually translate in to the formula? We essentially
need to count 8-byte chunks in raw data, and multiply that by 9. Which
gives us something like

nchunks = ((rawsize + 7) / 8) * 9;

which is not quite what the patch does.


I'm afraid neither formula is correct, but all this is hair-splitting 
differences.



Sure. I just want to be sure the formula is safe and we won't end up
using too low value in some corner case.


Your formula does not account for the fact that we may not need all bytes from 
last chunk.
Consider a desired decompressed size of 3 bytes. We may need 1 control byte and
3 literals, 4 bytes total. But nchunks = 9.



OK, so essentially this means my formula works with whole chunks, i.e.
if we happen to need just a part of a decompressed chunk, we still
request enough data to decompress it whole. This way we may request up
to 7 extra bytes, which seems fine.


Binguo's formula is appending 1 control bit per data byte and one extra
control byte.  Consider size = 8 bytes. We need 1 control byte, 8
literals, 9 total.  But compressed_size = 10.

Mathematically correct formula is compressed_size = (int32) ((int64)
rawsize * 9 + 7) / 8; Here we take one bit for each data byte, and 7
control bits for overflow.

But these equations make no big difference; each formula is safe. I'd
pick the one that is easier to understand and document (IMO, it's nchunks =
((rawsize + 7) / 8) * 9).



I'd use the *mathematically correct* formula, it doesn't seem to be any
more complex, and the "one bit per byte, complete bytes" explanation
seems quite understandable.



Pushed.

I've ended up using the *mathematically correct* formula, hopefully
with sufficient explanation why it's correct. I've also polished a
couple more comments, and pushed like that.

Thanks to Binguo Bao for this improvement, and all the reviewers in this
thread.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: Proposal: Make use of C99 designated initialisers for nulls/values arrays

2019-10-01 Thread Andrew Dunstan


On 10/1/19 6:12 AM, Amit Kapila wrote:
> On Tue, Oct 1, 2019 at 1:25 PM Smith, Peter  
> wrote:
>> Dear Hackers,
>>
>> I have identified some OSS code which maybe can make use of C99 designated 
>> initialisers for nulls/values arrays.
>>
>> ~
>>
>> Background:
>> There are lots of tuple operations where arrays of values and flags are 
>> being passed.
>> Typically these arrays are being previously initialised 0/false by memset.
>> By modifying code to use C99 designated initialiser syntax [1], most of 
>> these memsets can become redundant.
>> Actually, this mechanism is already being used in some of the existing OSS 
>> code. This patch/proposal just propagates the same idea to all other similar 
>> places I could find.
>>
>> ~
>>
>> Result:
>> Less code. Removes ~200 unnecessary memsets.
>> More consistent initialisation.
>>
> +1.  This seems like an improvement.  I can review and take this
> forward unless there are objections from others.
>
>

+1.


cheers


andrew

-- 
Andrew Dunstan            https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: Optimize partial TOAST decompression

2019-10-01 Thread Tomas Vondra

On Tue, Oct 01, 2019 at 02:34:20PM +0200, Tomas Vondra wrote:

On Tue, Oct 01, 2019 at 12:08:05PM +0200, Tomas Vondra wrote:

On Tue, Oct 01, 2019 at 11:20:39AM +0500, Andrey Borodin wrote:




On 30 Sep 2019, at 22:29, Tomas Vondra wrote:

On Mon, Sep 30, 2019 at 09:20:22PM +0500, Andrey Borodin wrote:




On 30 Sep 2019, at 20:56, Tomas Vondra wrote:

I mean this:

/*
* Use int64 to prevent overflow during calculation.
*/
compressed_size = (int32) ((int64) rawsize * 9 + 8) / 8;

I'm not very familiar with pglz internals, but I'm a bit puzzled by
this. My first instinct was to compare it to this:

#define PGLZ_MAX_OUTPUT(_dlen)  ((_dlen) + 4)

but clearly that's a very different (much simpler) formula. So why
shouldn't pglz_maximum_compressed_size simply use this macro?




compressed_size accounts for possible increase of size during
compression. pglz can consume up to 1 control byte for each 8 bytes of
data in worst case.


OK, but does that actually translate in to the formula? We essentially
need to count 8-byte chunks in raw data, and multiply that by 9. Which
gives us something like

nchunks = ((rawsize + 7) / 8) * 9;

which is not quite what the patch does.


I'm afraid neither formula is correct, but all this is hair-splitting 
differences.



Sure. I just want to be sure the formula is safe and we won't end up
using too low value in some corner case.


Your formula does not account for the fact that we may not need all bytes from 
last chunk.
Consider a desired decompressed size of 3 bytes. We may need 1 control byte and
3 literals, 4 bytes total. But nchunks = 9.



OK, so essentially this means my formula works with whole chunks, i.e.
if we happen to need just a part of a decompressed chunk, we still
request enough data to decompress it whole. This way we may request up
to 7 extra bytes, which seems fine.


Binguo's formula is appending 1 control bit per data byte and one extra
control byte.  Consider size = 8 bytes. We need 1 control byte, 8
literals, 9 total.  But compressed_size = 10.

Mathematically correct formula is compressed_size = (int32) ((int64)
rawsize * 9 + 7) / 8; Here we take one bit for each data byte, and 7
control bits for overflow.

But these equations make no big difference; each formula is safe. I'd
pick the one that is easier to understand and document (IMO, it's nchunks =
((rawsize + 7) / 8) * 9).



I'd use the *mathematically correct* formula, it doesn't seem to be any
more complex, and the "one bit per byte, complete bytes" explanation
seems quite understandable.



Pushed.

I've ended up using the *mathematically correct* formula, hopefully
with sufficient explanation why it's correct. I've also polished a
couple more comments, and pushed like that.

Thanks to Binguo Bao for this improvement, and all the reviewers in this
thread.



Hmmm, this seems to trigger a failure on thorntail, which is a sparc64
machine (and it seems to pass on all x86 machines, so far). Per the
backtrace, it seems to have failed like this:

   Core was generated by `postgres: parallel worker for PID 2341
'.
   Program terminated with signal SIGUSR1, User defined signal 1.
   #0  heap_tuple_untoast_attr_slice (attr=, sliceoffset=, slicelength=) at 
/home/nm/farm/sparc64_deb10_gcc_64_ubsan/HEAD/pgsql.build/../pgsql/src/backend/access/common/detoast.c:235
   235  max_size = pglz_maximum_compressed_size(sliceoffset + slicelength,
   #0   heap_tuple_untoast_attr_slice (attr=, sliceoffset=, slicelength=) at 
/home/nm/farm/sparc64_deb10_gcc_64_ubsan/HEAD/pgsql.build/../pgsql/src/backend/access/common/detoast.c:235
   #1  0x013d4ae8 in ExecInterpExpr (state=0x1d02298, 
econtext=0x1d01510, isnull=0x7feffb2fd1f) at 
/home/nm/farm/sparc64_deb10_gcc_64_ubsan/HEAD/pgsql.build/../pgsql/src/backend/executor/execExprInterp.c:690
   ...

so likely on this line:

   max_size = pglz_maximum_compressed_size(sliceoffset + slicelength,
   TOAST_COMPRESS_SIZE(attr));

the offset+length is just integer arithmetic, so I don't see why that
would fail. So it has to be TOAST_COMPRESS_SIZE, which is defined like
this:

   #define TOAST_COMPRESS_SIZE(ptr)  ((int32) VARSIZE(ptr) - TOAST_COMPRESS_HDRSZ)

I wonder if that's wrong, somehow ... Maybe it should use VARSIZE_ANY,
but then how would it work on any platform and only fail on sparc64?


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2019-10-01 Thread Amit Kapila
On Sun, Sep 29, 2019 at 11:24 AM Amit Kapila 
wrote:
> On Sun, Sep 29, 2019 at 12:39 AM Tomas Vondra
>  wrote:
> >
>
> Yeah, it is better to deal it separately as I am also not entirely
> convinced at this stage about this parameter.  I have mentioned the
> same in the previous email as well.
>
> While glancing through the changes, I noticed a small thing:
> +#logical_decoding_work_mem = 64MB # min 1MB, or -1 to use
maintenance_work_mem
>
> I guess this need to be updated.
>

On further testing, I found that the patch seems to have problems with
toast.  Consider below scenario:
Session-1
Create table large_text(t1 text);
INSERT INTO large_text
SELECT (SELECT string_agg('x', ',')
FROM generate_series(1, 100)) FROM generate_series(1, 1000);

Session-2
SELECT * FROM pg_create_logical_replication_slot('regression_slot',
'test_decoding');
SELECT * FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL);
*--kaboom*

The second statement in Session-2 leads to a crash.

Other than that, I am not sure if the changes related to spill to disk
after logical_decoding_work_mem works for toast table as I couldn't hit
that code for toast table case, but I might be missing something.  As
mentioned previously, I feel there should be some way to test whether this
patch works for the cases it claims to work.  As of now, I have to check
via debugging.  Let me know if there is any way, I can test this.

I am reluctant to say, but I think this patch still needs some more work
(review, test, rework) before we can commit it.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Block level parallel vacuum

2019-10-01 Thread Amit Kapila
On Sat, Sep 21, 2019 at 6:01 PM Amit Kapila  wrote:

> On Fri, Jun 7, 2019 at 12:03 PM Masahiko Sawada 
> wrote:
> >
> > Since the previous version patch conflicts with current HEAD, I've
> > attached the updated version patches.
> >
>
> Review comments:
> --
>

Sawada-San, are you planning to work on the review comments?  I can take
care of this and then proceed with further review if you are tied up with
something else.


> *
> +/*
> + * DSM keys for parallel lazy vacuum. Unlike other parallel execution
> code,
> + * since we don't need to worry about DSM keys conflicting with
> plan_node_id
> + * we can use small integers.
> + */
> +#define PARALLEL_VACUUM_KEY_SHARED 1
> +#define PARALLEL_VACUUM_KEY_DEAD_TUPLES 2
> +#define PARALLEL_VACUUM_KEY_QUERY_TEXT 3
>
> I think it would be better if these keys should be assigned numbers in
> a way we do for other similar operation like create index.  See below
> defines
> in code:
> /* Magic numbers for parallel state sharing */
> #define PARALLEL_KEY_BTREE_SHARED UINT64CONST(0xA001)
>
> This will make the code consistent with other parallel operations.
>

I think we don't need to handle this comment.  Today, I read the other
emails in the thread and noticed that you have done this based on comment
by Robert and that decision seems wise to me.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: Proposal: Make use of C99 designated initialisers for nulls/values arrays

2019-10-01 Thread Isaac Morland
On Tue, 1 Oct 2019 at 03:55, Smith, Peter 
wrote:


> Typical Example:
> Before:
> Datum   values[Natts_pg_attribute];
> bool    nulls[Natts_pg_attribute];
> ...
> memset(values, 0, sizeof(values));
> memset(nulls, false, sizeof(nulls));
> After:
> Datum   values[Natts_pg_attribute] = {0};
> bool    nulls[Natts_pg_attribute] = {0};
>

I hope you'll forgive a noob question. Why does the "After" initialization
for the boolean array have {0} rather than {false}?


Re: [HACKERS] Block level parallel vacuum

2019-10-01 Thread Masahiko Sawada
On Tue, Oct 1, 2019 at 10:31 PM Amit Kapila  wrote:
>
> On Sat, Sep 21, 2019 at 6:01 PM Amit Kapila  wrote:
>>
>> On Fri, Jun 7, 2019 at 12:03 PM Masahiko Sawada  
>> wrote:
>> >
>> > Since the previous version patch conflicts with current HEAD, I've
>> > attached the updated version patches.
>> >
>>
>> Review comments:
>> --
>
>
> Sawada-San, are you planning to work on the review comments?  I can take care 
> of this and then proceed with further review if you are tied up with 
> something else.
>

Thank you for reviewing this patch.

Yes I'm addressing your comments and will submit the updated patch soon.

> I think we don't need to handle this comment.  Today, I read the other emails 
> in the thread and noticed that you have done this based on comment by Robert 
> and that decision seems wise to me.

Understood.

Regards,

--
Masahiko Sawada




Re: pgbench - allow to create partitioned tables

2019-10-01 Thread Amit Kapila
On Tue, Oct 1, 2019 at 11:51 AM Fabien COELHO  wrote:

>
> Hello Amit,
>
> > 1. ran pgindent
> > 2. As per Alvaro's suggestions move few function definitions.
> > 3. Changed one or two comments and fixed spelling at one place.
>
> Thanks for the improvements.
>
> Not sure why you put "XXX - " in front of "append_fillfactor" comment,
> though.
>
>
It is to indicate that we can do this after further consideration.


> > + fprintf(stderr,
> > + "no pgbench_accounts table found in search_path\n"
> > + "Perhaps you need to do initialization (\"pgbench -i\") in database
> > \"%s\"\n", PQdb(con));
>
> > Can anyone else think of a better error message either in wording or
> > style for above case?
>
> No better idea from me. The second part is a duplicate from a earlier
> comment, when getting the scale fails.
>

Yeah, I know that, but this doesn't look quite right.  I mean to say
whatever we want to say via this message is correct, but I am not
completely happy with the display part.  How about something like:
"pgbench_accounts is missing, you need to do initialization (\"pgbench
-i\") in database \"%s\"\n"?  Feel free to propose something else on
similar lines?  If possible, I want to convey this information in a single
sentence.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: Value of Transparent Data Encryption (TDE)

2019-10-01 Thread Tomas Vondra

On Mon, Sep 30, 2019 at 05:40:52PM -0400, Bruce Momjian wrote:

The plan for full-cluster Transparent Data Encryption (TDE) is here:


https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#TODO_for_Full-Cluster_Encryption

The values it has, I think, are:

*  encrypts data for anyone with read-access to the file system (but not
  memory)

*  I think write access would allow access to the encryption keys
   by modifying postgresql.conf or other files

* This is particularly useful if the storage is remote

*  encrypts non-logical/non-pg_dump-like backups

*  fulfills several security compliance requirements

*  encrypts storage



Maybe. I think this is approaching the problem from the wrong angle.
Encryption is more a means of achieving something. OK, for compliance
purposes it's useful to be able to tick "encryption" checkbox. But other
than that, people really care about threat models and how encryption
improves them (or does not).

So I think it'd be valuable to improve the "thread models" section on
that wiki page, with more detailed cases. We need to explain what
capabilities the attacker has (can he read files?can he interact with
the database? can he read memory? ..) and then explain how that works
with encrypted cluster.



*  perhaps easier to implement than file system encryption



Not sure. IMO filesystem encryption is fairly simple to use, to the
extent that it's hard to beat. The problem is usually that people can't use
it for various reasons - lack of support on their OS, no access to the
block device, problems with obtaining the privileges, etc.

Having it built into the database means you can sidestep most of those
issues (e.g. you can deploy it as a DBA, on an arbitrary OS, ...).

Plus it allows features you can't easily achieve with fs encryption,
because the filesystem only sees opaque data files. So having keys per
database/user/... is easier from within the database.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: Transparent Data Encryption (TDE) and encrypted files

2019-10-01 Thread Tomas Vondra

On Tue, Oct 01, 2019 at 06:30:39PM +0900, Moon, Insung wrote:

Dear  Magnus Hagander.

On Tue, Oct 1, 2019 at 5:37 PM Magnus Hagander  wrote:




On Tue, Oct 1, 2019 at 9:33 AM Tels  wrote:


Moin,

On 2019-09-30 23:26, Bruce Momjian wrote:
> For full-cluster Transparent Data Encryption (TDE), the current plan is
> to encrypt all heap and index files, WAL, and all pgsql_tmp (work_mem
> overflow).  The plan is:
>
>   
https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#TODO_for_Full-Cluster_Encryption
>
> We don't see much value to encrypting vm, fsm, pg_xact, pg_multixact,
> or
> other files.  Is that correct?  Do any other PGDATA files contain user
> data?

IMHO the general rule in crypto is: encrypt everything, or don't bother.

If you don't encrypt some things, somebody is going to find loopholes
and sidechannels
and partial-plaintext attacks. Just a silly example: If you trick the DB
into putting only one row per page,
any "bit-per-page" map suddenly reveals information about a single
encrypted row that it shouldn't reveal.

Many people with a lot of free time on their hands will sit around,
drink a nice cup of tea and come up
with all sorts of attacks on these things that you didn't (and couldn't)
anticipate now.

So IMHO it would be much better to err on the side of caution and
encrypt everything possible.



+1.

Unless we are *absolutely* certain, I bet someone will be able to find a 
side-channel that somehow leaks some data or data-about-data, if we don't 
encrypt everything. If nothing else, you can get use patterns out of it, and 
you can make a lot from that. (E.g. by whether transactions are using 
multixacts or not you can potentially determine which transaction they are, if 
you know what type of transactions are being issued by the application. In the 
simplest case, there might be a single pattern where multixacts end up actually 
being used, and in that case being able to see the multixact data tells you a 
lot about the system).

As for other things -- by default, we store the log files in text format in the 
data directory. That contains *loads* of sensitive data in a lot of cases. Will 
those also be encrypted?



Maybe... as a result of the discussion so far, we are not encrypting
the server log.

https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#What_to_encrypt.2Fdecrypt

I think encrypting the server logs would be a very difficult challenge,
and we would probably need to develop another application to view the
encrypted server logs.



IMO leaks of sensitive data into the server log (say, as part of error
messages, slow queries, ...) are a serious issue. It's one of the main
issues with pgcrypto-style encryption, because it's trivial to leak e.g.
keys into the server log. Even if proper key management prevents leaking
keys, there are still user data - say, credit card numbers, and such.

So I don't see how we could not encrypt the server log, in the end.

But yes, you're right that it's a challenging topic.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2019-10-01 Thread Tomas Vondra

On Tue, Oct 01, 2019 at 06:55:52PM +0530, Amit Kapila wrote:

On Sun, Sep 29, 2019 at 11:24 AM Amit Kapila 
wrote:

On Sun, Sep 29, 2019 at 12:39 AM Tomas Vondra
 wrote:
>

Yeah, it is better to deal it separately as I am also not entirely
convinced at this stage about this parameter.  I have mentioned the
same in the previous email as well.

While glancing through the changes, I noticed a small thing:
+#logical_decoding_work_mem = 64MB # min 1MB, or -1 to use

maintenance_work_mem


I guess this need to be updated.



On further testing, I found that the patch seems to have problems with
toast.  Consider below scenario:
Session-1
Create table large_text(t1 text);
INSERT INTO large_text
SELECT (SELECT string_agg('x', ',')
FROM generate_series(1, 100)) FROM generate_series(1, 1000);

Session-2
SELECT * FROM pg_create_logical_replication_slot('regression_slot',
'test_decoding');
SELECT * FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL);
*--kaboom*

The second statement in Session-2 leads to a crash.



OK, thanks for the report - will investigate.


Other than that, I am not sure if the changes related to spill to disk
after logical_decoding_work_mem works for toast table as I couldn't hit
that code for toast table case, but I might be missing something.  As
mentioned previously, I feel there should be some way to test whether this
patch works for the cases it claims to work.  As of now, I have to check
via debugging.  Let me know if there is any way, I can test this.



That's one of the reasons why I proposed to move the statistics (which
say how many transactions / bytes were spilled to disk) from a later
patch in the series. I don't think there's a better way.


I am reluctant to say, but I think this patch still needs some more work
(review, test, rework) before we can commit it.



I agree.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





pg_basebackup from REL_12_STABLE hangs on solaris/sparc

2019-10-01 Thread Victor Wagner
Collegues,

I've encountered the following problem on an old Sparc64 machine running
Solaris 10:

When I compile PostgreSQL 12 with --enable-tap-tests and run make check
in src/bin, the test src/bin/pg_basebackup/t/010_pg_basebackup.pl
hangs indefinitely.

I've tried to attach gdb to the hanging process, but when I attempt to
get a backtrace in it, gdb reports that the stack is corrupt:

Attaching to program 
`/home/vitus/postgrespro/src/bin/pg_basebackup/pg_basebackup', process 1467
[New process 1467]
Retry #1:
Retry #2:
Retry #3:
Retry #4:
Reading symbols from /usr/lib/sparcv9/ld.so.1...(no debugging symbols 
found)...done.
Loaded symbols for /usr/lib/sparcv9/ld.so.1
---Type  to continue, or q  to quit---
0xff2cca38 in ?? ()
(gdb) bt
#0  0xff2cca38 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)


When I afterwards kill the hung process with
kill -SEGV to get a core, I get the following stack trace from the core file:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x7d5d94ac in ___sigtimedwait () from /lib/64/libc.so.1
(gdb) bt
#0  0x7d5d94ac in ___sigtimedwait () from /lib/64/libc.so.1
#1  0x7d5c8c8c in __sigtimedwait () from /lib/64/libc.so.1
#2  0x7d5c0628 in __posix_sigwait () from /lib/64/libc.so.1
#3  0x7f3362b4 in pq_reset_sigpipe (osigset=0x7fffeb1c, sigpipe_pending=false, got_epipe=true) at fe-secure.c:529
#4  0x7f336084 in pqsecure_raw_write (conn=0x100135e90, ptr=0x10013a370, len=5) at fe-secure.c:399
#5  0x7f335e28 in pqsecure_write (conn=0x100135e90, ptr=0x10013a370, len=5) at fe-secure.c:316
#6  0x7f326c54 in pqSendSome (conn=0x100135e90, len=5) at fe-misc.c:876
#7  0x7f326e84 in pqFlush (conn=0x100135e90) at fe-misc.c:1004
#8  0x7f316584 in sendTerminateConn (conn=0x100135e90) at fe-connect.c:4031
#9  0x7f3165a4 in closePGconn (conn=0x100135e90) at fe-connect.c:4049
#10 0x7f31663c in PQfinish (conn=0x100135e90) at fe-connect.c:4083
#11 0x0001bc64 in BaseBackup () at pg_basebackup.c:2136
#12 0x0001d7ec in main (argc=4, argv=0x7808) at pg_basebackup.c:2547

This happens on random tests in this test file with a probability of about
1/10, but because there are more than 100 tests, hanging has 100%
probability. The other two test files in the src/bin/pg_basebackup directory
don't hang.

As far as I can tell, there are only two Solaris machines in the
buildfarm now, and neither of them has any record of running the
REL_12_STABLE branch (not to mention that neither runs the TAP tests).

-- 




Re: Optimize partial TOAST decompression

2019-10-01 Thread Tom Lane
Tomas Vondra  writes:
> Hmmm, this seems to trigger a failure on thorntail, which is a sparc64
> machine (and it seems to pass on all x86 machines, so far).

gharial's not happy either, and I bet if you wait a bit longer you'll
see the same on other big-endian machines.

> I wonder if that's wrong, somehow ... Maybe it should use VARSIZE_ANY,
> but then how would it work on any platform and only fail on sparc64?

Maybe it accidentally seems to work on little-endian, thanks to the
different definitions of varlena headers?

regards, tom lane




Re: pgbench - allow to create partitioned tables

2019-10-01 Thread Rafia Sabih
On Tue, 1 Oct 2019 at 15:39, Amit Kapila  wrote:

> On Tue, Oct 1, 2019 at 11:51 AM Fabien COELHO  wrote:
>
>>
>> Hello Amit,
>>
>> > 1. ran pgindent
>> > 2. As per Alvaro's suggestions move few function definitions.
>> > 3. Changed one or two comments and fixed spelling at one place.
>>
>> Thanks for the improvements.
>>
>> Not sure why you put "XXX - " in front of "append_fillfactor" comment,
>> though.
>>
>>
> It is to indicate that we can do this after further consideration.
>
>
>> > + fprintf(stderr,
>> > + "no pgbench_accounts table found in search_path\n"
>> > + "Perhaps you need to do initialization (\"pgbench -i\") in database
>> > \"%s\"\n", PQdb(con));
>>
>> > Can anyone else think of a better error message either in wording or
>> > style for above case?
>>
>> No better idea from me. The second part is a duplicate from a earlier
>> comment, when getting the scale fails.
>>
>
> Yeah, I know that, but this doesn't look quite right.  I mean to say
> whatever we want to say via this message is correct, but I am not
> completely happy with the display part.  How about something like:
> "pgbench_accounts is missing, you need to do initialization (\"pgbench
> -i\") in database \"%s\"\n"?  Feel free to propose something else on
> similar lines?  If possible, I want to convey this information in a single
> sentence.
>
> How about, "pgbench_accounts is missing, initialize (\"pgbench -i\") in
database \"%s\"\n"?

>
-- 
Regards,
Rafia Sabih


Re: pgbench - allow to create partitioned tables

2019-10-01 Thread Fabien COELHO




Yeah, I know that, but this doesn't look quite right.  I mean to say
whatever we want to say via this message is correct, but I am not
completely happy with the display part.  How about something like:
"pgbench_accounts is missing, you need to do initialization (\"pgbench
-i\") in database \"%s\"\n"?  Feel free to propose something else on
similar lines?  If possible, I want to convey this information in a single
sentence.

How about, "pgbench_accounts is missing, initialize (\"pgbench -i\") in

database \"%s\"\n"?


I think that we should not presume too much about the solution: perhaps 
the user did not specify the right database or host and it has nothing to 
do with initialization.


Maybe something like:

"pgbench_accounts is missing, perhaps you need to initialize (\"pgbench 
-i\") in database \"%s\"\n"


The two sentences approach has the logic of "error" and a separate "hint" 
which is often used.


--
Fabien.




Re: Transparent Data Encryption (TDE) and encrypted files

2019-10-01 Thread Bruce Momjian
On Tue, Oct  1, 2019 at 03:48:31PM +0200, Tomas Vondra wrote:
> IMO leaks of sensitive data into the server log (say, as part of error
> messages, slow queries, ...) are a serious issue. It's one of the main
> issues with pgcrypto-style encryption, because it's trivial to leak e.g.
> keys into the server log. Even if proper key management prevents leaking
> keys, there are still user data - say, credit card numbers, and such.

Fortunately, the full-cluster encryption keys are stored encrypted in
pg_control and are never accessible unencrypted at the SQL level.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: Optimize partial TOAST decompression

2019-10-01 Thread Tomas Vondra

On Tue, Oct 01, 2019 at 10:10:37AM -0400, Tom Lane wrote:

Tomas Vondra  writes:

Hmmm, this seems to trigger a failure on thorntail, which is a sparc64
machine (and it seems to pass on all x86 machines, so far).


gharial's not happy either, and I bet if you wait a bit longer you'll
see the same on other big-endian machines.


I wonder if that's wrong, somehow ... Maybe it should use VARSIZE_ANY,
but then how would it work on any platform and only fail on sparc64?


Maybe it accidentally seems to work on little-endian, thanks to the
different definitions of varlena headers?



Maybe. Let's see if just using VARSIZE_ANY does the trick. If not, I'll
investigate further.
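
For anyone following along, the distinction at play is roughly this (an
illustrative sketch only, not the committed code, and leaving external
TOAST pointers out for brevity): a varlena datum can carry either a 1-byte
or a 4-byte header, VARSIZE() is only defined for the 4-byte form, and the
way the length bits are packed differs between little- and big-endian
builds, so misusing VARSIZE() on a short-header datum can look plausible
on x86 and fail on sparc64.

	/* VARSIZE_ANY() encapsulates exactly this check */
	static Size
	varlena_size_sketch(struct varlena *datum)
	{
		if (VARATT_IS_SHORT(datum))
			return VARSIZE_SHORT(datum);	/* 1-byte header */
		return VARSIZE(datum);				/* 4-byte header */
	}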


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: pgbench - allow to create partitioned tables

2019-10-01 Thread Rafia Sabih
On Tue, 1 Oct 2019 at 16:48, Fabien COELHO  wrote:

>
> >> Yeah, I know that, but this doesn't look quite right.  I mean to say
> >> whatever we want to say via this message is correct, but I am not
> >> completely happy with the display part.  How about something like:
> >> "pgbench_accounts is missing, you need to do initialization (\"pgbench
> >> -i\") in database \"%s\"\n"?  Feel free to propose something else on
> >> similar lines?  If possible, I want to convey this information in a
> single
> >> sentence.
> >>
> >> How about, "pgbench_accounts is missing, initialize (\"pgbench -i\") in
> > database \"%s\"\n"?
>
> I think that we should not presume too much about the solution: perhaps
> the user did not specify the right database or host and it has nothing to
> do with initialization.
>
> Maybe something like:
>
> "pgbench_accounts is missing, perhaps you need to initialize (\"pgbench
> -i\") in database \"%s\"\n"
>
> The two sentences approach has the logic of "error" and a separate "hint"
> which is often used.
>

+1 for error and hint separation.


-- 
Regards,
Rafia Sabih


Peripatus: Can someone look?

2019-10-01 Thread Larry Rosenman
My Buildfarm animal (peripatus) has been failing check since yesterday.  
Can someone look at it?



--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106




Re: Optimize partial TOAST decompression

2019-10-01 Thread Tom Lane
Tomas Vondra  writes:
> On Tue, Oct 01, 2019 at 10:10:37AM -0400, Tom Lane wrote:
>> Maybe it accidentally seems to work on little-endian, thanks to the
>> different definitions of varlena headers?

> Maybe. Let's see if just using VARSIZE_ANY does the trick. If not, I'll
> investigate further.

FWIW, prairiedog got past that test, so whatever it is seems specific
to big-endian 64-bit.  Odd.

regards, tom lane




Re: Peripatus: Can someone look?

2019-10-01 Thread Tom Lane
Larry Rosenman  writes:
> My Buildfarm animal (peripatus) has been failing check since yesterday.  
> Can someone look at it?

It's been doing this in parallel queries, in v11 and up:

2019-09-29 19:00:15.534 CDT [49513:1] ERROR:  could not open shared memory 
segment "/PostgreSQL.1225945786": Permission denied
2019-09-29 19:00:15.535 CDT [48596:491] pg_regress/partition_prune ERROR:  
parallel worker failed to initialize
2019-09-29 19:00:15.535 CDT [48596:492] pg_regress/partition_prune HINT:  More 
details may be available in the server log.
2019-09-29 19:00:15.535 CDT [48596:493] pg_regress/partition_prune CONTEXT:  
PL/pgSQL function explain_parallel_append(text) line 5 at FOR over EXECUTE 
statement
2019-09-29 19:00:15.535 CDT [48596:494] pg_regress/partition_prune STATEMENT:  
select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on 
ab.a = a.a where a.a in(1, 0, 0)');
2019-09-29 19:00:15.535 CDT [48596:495] pg_regress/partition_prune WARNING:  
could not remove shared memory segment "/PostgreSQL.1225945786": Permission 
denied

which looks like an external problem to me --- certainly, nothing
we changed yesterday would explain it.  Did you change anything
about the system configuration, or do a software update?

regards, tom lane




Re: Peripatus: Can someone look?

2019-10-01 Thread Larry Rosenman

On 10/01/2019 10:46 am, Tom Lane wrote:

Larry Rosenman  writes:
My Buildfarm animal (peripatus) has been failing check since 
yesterday.

Can someone look at it?


It's been doing this in parallel queries, in v11 and up:

2019-09-29 19:00:15.534 CDT [49513:1] ERROR:  could not open shared
memory segment "/PostgreSQL.1225945786": Permission denied
2019-09-29 19:00:15.535 CDT [48596:491] pg_regress/partition_prune
ERROR:  parallel worker failed to initialize
2019-09-29 19:00:15.535 CDT [48596:492] pg_regress/partition_prune
HINT:  More details may be available in the server log.
2019-09-29 19:00:15.535 CDT [48596:493] pg_regress/partition_prune
CONTEXT:  PL/pgSQL function explain_parallel_append(text) line 5 at
FOR over EXECUTE statement
2019-09-29 19:00:15.535 CDT [48596:494] pg_regress/partition_prune
STATEMENT:  select explain_parallel_append('select avg(ab.a) from ab
inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 0)');
2019-09-29 19:00:15.535 CDT [48596:495] pg_regress/partition_prune
WARNING:  could not remove shared memory segment
"/PostgreSQL.1225945786": Permission denied

which looks like an external problem to me --- certainly, nothing
we changed yesterday would explain it.  Did you change anything
about the system configuration, or do a software update?

regards, tom lane


I did do an upgrade to a later SVN rev.

Let me reboot and see if that fixes anything.

(this is -CURRENT on FreeBSD, so it's always a moving target).


--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106




Re: Value of Transparent Data Encryption (TDE)

2019-10-01 Thread Bruce Momjian
On Tue, Oct  1, 2019 at 03:43:05PM +0200, Tomas Vondra wrote:
> On Mon, Sep 30, 2019 at 05:40:52PM -0400, Bruce Momjian wrote:
> Maybe. I think this is approaching the problem from the wrong angle.
> Encryption is more a means of achieving something. OK, for compliance
> purposes it's useful to be able to tick "encryption" checkbox. But other
> than that, people really care about threat models and how encryption
> improves them (or does not).

Yes, that is what I am trying to do with this email thread.

> So I think it'd be valuable to improve the "threat models" section on
> that wiki page, with more detailed cases. We need to explain what
> capabilities the attacker has (can he read files? can he interact with
> the database? can he read memory? ...) and then explain how that works
> with an encrypted cluster.
> 
> 
> > *  perhaps easier to implement than file system encryption
> > 
> 
> Not sure. IMO  filesystem encryption is fairly simple to use, to the
> extent that it's hard to beat. The problem is usually people can't use
> it for various reasons - lack of support on their OS, no access to the
> block device, problems with obtaining the privileges etc.

Right, that's the "perhaps easier" part of my text above.

> Having it built into the database means you can sidestep most of those
> issues (e.g. you can deploy it as a DBA, on an arbitrary OS, ...).
> 
> Plus it allows features you can't easily achieve with fs encryption,
> because the filesystem only sees opaque data files. So having keys per
> database/user/... is easier from within the database.

Yes, but we will not be doing that for the first release because of the
complexity related to handling this in WAL and requiring crash recovery
to be able to unlock all the keys for replay.  I personally think that
database/user/... keys are best done at the SQL level, with proper
locking.  pgcryptokey (http://momjian.us/download/pgcryptokey/) is an
example of that.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: Proposal: Make use of C99 designated initialisers for nulls/values arrays

2019-10-01 Thread Bruce Momjian
On Tue, Oct  1, 2019 at 08:40:26AM -0400, Andrew Dunstan wrote:
> 
> On 10/1/19 6:12 AM, Amit Kapila wrote:
> > On Tue, Oct 1, 2019 at 1:25 PM Smith, Peter  
> > wrote:
> >> Dear Hackers,
> >>
> >> I have identified some OSS code which maybe can make use of C99 designated 
> >> initialisers for nulls/values arrays.
> >>
> >> ~
> >>
> >> Background:
> >> There are lots of tuple operations where arrays of values and flags are 
> >> being passed.
> >> Typically these arrays are being previously initialised 0/false by memset.
> >> By modifying code to use C99 designated initialiser syntax [1], most of 
> >> these memsets can become redundant.
> >> Actually, this mechanism is already being used in some of the existing OSS 
> >> code. This patch/proposal just propagates the same idea to all other 
> >> similar places I could find.
> >>
> >> ~
> >>
> >> Result:
> >> Less code. Removes ~200 unnecessary memsets.
> >> More consistent initialisation.
> >>
> > +1.  This seems like an improvement.  I can review and take this
> > forward unless there are objections from others.
> >
> >
> 
> +1.

I like it!

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




RE: Shared Memory: How to use SYSV rather than MMAP ?

2019-10-01 Thread REIX, Tony
Hi,
I've been able to rebuild the 12rc1 on AIX 7.2 with my old patches, except the 
one dealing with shared memory for sure.
Tests are running.
I'll look at the proposed patch tomorrow.

Regards,

Tony

De : Alvaro Herrera 
Envoyé : vendredi 27 septembre 2019 14:39
À : REIX, Tony 
Cc : Thomas Munro ; Andres Freund ; 
Robert Haas ; Pg Hackers ; 
EMPEREUR-MOT, SYLVIE 
Objet : Re: Shared Memory: How to use SYSV rather than MMAP ?

Hi Tony,

On 2019-Sep-27, REIX, Tony wrote:

> Hello Thomas, Alvaro,
>
> Sorry for the late answer, I missed your message of September 10. (I'm 
> working on several different projects in parallel.)
> Let me talk with Sylvie ASAP and see when I will be able to test it, probably 
> next week, Tuesday. Is that OK for you?

Sure -- I'm inclined to push this patch in state Needs Review to the
November commitfest in this case.

--
Álvaro Herrera
https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Proposal: Make use of C99 designated initialisers for nulls/values arrays

2019-10-01 Thread Tom Lane
Bruce Momjian  writes:
>>> On Tue, Oct 1, 2019 at 1:25 PM Smith, Peter  
>>> wrote:
 There are lots of tuple operations where arrays of values and flags are 
 being passed.
 Typically these arrays are being previously initialised 0/false by memset.
 By modifying code to use C99 designated initialiser syntax [1], most of 
 these memsets can become redundant.

> I like it!

FYI, I checked into whether this would result in worse generated code.
In the one place I checked (InsertPgAttributeTuple, which hopefully
is representative), I got *exactly the same* assembly code before
and after, on both a somewhat-aging gcc and fairly modern clang.
Hadn't quite expected that, but it removes any worries about whether
we might be losing anything.

Note though that InsertPgAttributeTuple uses memset(), while some of
these other places use MemSet().  The code I see being generated for
MemSet() is also the same(!) on clang, but it is different and
probably worse on gcc.  I wonder if it isn't time to kick MemSet to
the curb.  We have not re-evaluated that macro in more than a dozen
years, and compilers have surely changed.
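
(For anyone not familiar with it: MemSet() is our hand-rolled zeroing macro
from src/include/c.h, which falls back to plain memset() unless the target
is word-aligned, the fill value is zero, and the length is small.  A
simplified sketch of the idea -- the real macro has a few more guards --
looks roughly like this:

	#define MemSetSketch(start, len) \
		do { \
			long   *_p = (long *) (start); \
			long   *_stop = (long *) ((char *) _p + (len)); \
			while (_p < _stop) \
				*_p++ = 0;		/* zero one word at a time */ \
		} while (0)

That sort of open-coded loop was a win with older compilers, but modern
compilers expand small constant-size memset() calls inline themselves,
which presumably is why the generated code now comes out the same or
better with plain memset().)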

regards, tom lane




Re: Value of Transparent Data Encryption (TDE)

2019-10-01 Thread Bruce Momjian
On Tue, Oct  1, 2019 at 11:54:26AM -0400, Bruce Momjian wrote:
> On Tue, Oct  1, 2019 at 03:43:05PM +0200, Tomas Vondra wrote:
> > Plus it allows features you can't easily achieve with fs encryption,
> > because the filesystem only sees opaque data files. So having keys per
> > database/user/... is easier from within the database.
> 
> Yes, but we will not be doing that for the first release because of the
> complexity related to handling this in WAL and requiring crash recovery
> to be able to unlock all the keys for replay.  I personally think that
> database/user/... keys are best done at the SQL level, with proper
> locking.  pgcryptokey (http://momjian.us/download/pgcryptokey/) is an
> example of that.

Just to give more detail.  Initially, there was a desire to store keys
in only one place, either in the file system or in database tables. 
However, it became clear that the needs of booting the server and crash
recovery required file system keys, and per-user/db keys were best done
at the SQL level, so that indexing can be used, and logical dumps
contain the locked keys.  SQL-level storage allows databases to be
completely independent of other databases in terms of key storage and
usage.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: Proposal: Make use of C99 designated initialisers for nulls/values arrays

2019-10-01 Thread Andres Freund
Hi,

On 2019-10-01 12:17:08 -0400, Tom Lane wrote:
> FYI, I checked into whether this would result in worse generated code.
> In the one place I checked (InsertPgAttributeTuple, which hopefully
> is representative), I got *exactly the same* assembly code before
> and after, on both a somewhat-aging gcc and fairly modern clang.
> Hadn't quite expected that, but it removes any worries about whether
> we might be losing anything.

I think the only case where it's plausible to be really worse is where
we intentionally leave part of such allocations uninitialized - which we
can't easily do in these cases because the rest of the struct will also
get zeroed out.  The compiler will probably figure it out in some cases,
but there's plenty where it can't.  But I don't think there are many
places like that in our code anyway.


> Note though that InsertPgAttributeTuple uses memset(), while some of
> these other places use MemSet().  The code I see being generated for
> MemSet() is also the same(!) on clang, but it is different and
> probably worse on gcc.  I wonder if it isn't time to kick MemSet to
> the curb.  We have not re-evaluated that macro in more than a dozen
> years, and compilers have surely changed.

Yes, we really should!

- Andres




Re: pgsql: Implement jsonpath .datetime() method

2019-10-01 Thread Alexander Korotkov
On Mon, Sep 30, 2019 at 10:56 PM Robert Haas  wrote:
> On Sun, Sep 29, 2019 at 10:30 AM Alexander Korotkov
>  wrote:
> > So, jsonpath behaves like 100 is not greater than 2020.  This
> > looks like plain false.  And user can't expect that unless she is
> > familiar with our particular issues.  Now I got opinion  that such
> > errors shouldn't be suppressed.  We can't suppress *every* error.  If
> > trying to do this, we can come to an idea to suppress OOM error and
> > return garbage then, which is obviously ridiculous.  Opinions?
>
> I don't know enough about jsonpath to have a view on specifically
> which errors ought to be suppressed, but I agree that it's probably
> not all of them. In fact, I'd go so far as to say that thinking about
> it in terms of error suppression is probably not the right approach in
> the first place. Rather, you want to ask what behavior you're trying
> to create.
>
> For example, if I'm trying to write a function that takes a string as
> input and returns JSON, where the result is formatted as a number if
> possible or a string otherwise, I might want access at the C level to
> the guts of numeric_in, with all parsing errors returned rather than
> thrown. But it would be silly to suppress an out-of-memory condition,
> because that doesn't help the caller. The caller wants to know whether
> the thing can be parsed as a number or not, and that has nothing to do
> with whether we're out of memory, so an out-of-memory error should
> still be thrown.
>
> In this case here, it seems to me that you should similarly start by
> defining the behavior you're trying to create. Unless that's clearly
> defined, deciding which errors to suppress may be difficult.

Making C functions return errors rather than throw is what we're
implementing in our patchsets.  In big picture the behavior we're
trying to create is SQL Standard 2016.  It defines error handling as
following.

> The SQL operators JSON_VALUE, JSON_QUERY, JSON_TABLE, and JSON_EXISTS provide
> the following mechanisms to handle these errors:
> 1) The SQL/JSON path language traps any errors that occur during the 
> evaluation
> of a . Depending on the precise 
> contained in the , the result may be Unknown, True, or
> False, depending on the outcome of non-error tests evaluated in the  predicate>.
> 2) The SQL/JSON path language has two modes, strict and lax, which govern
> structural errors, as follows:
>   a) In lax mode:
> i) If an operation requires an SQL/JSON array but the operand is not an 
> SQL
> JSON array, then the operand is first “wrapped” in an SQL/JSON array prior
> to performing the operation.
> ii) If an operation requires something other than an SQL/JSON array, but
> the operand is an SQL/JSON array, then the operand is “unwrapped” by
> converting its elements into an SQL/JSON sequence prior to performing the
> operation.
> iii) After applying the preceding resolutions to structural errors, if
> there is still a structural error, the result is an empty SQL/JSON
> sequence.
>   b) In strict mode, if the structural error occurs within aexpression>, then the error handling of  applies
>Otherwise, a structural error is an unhandled error.
> 3) Non-structural errors outside of a  are always
> unhandled errors, resulting in an exception condition returned from the path
> engine to the SQL/JSON query operator.
> 4) The SQL/JSON query operators provide an ON ERROR clause to specify the
> behavior in case of an input conversion error, an unhandled structural error,
> an unhandled non-structural error, or an output conversion error.

So, basically the standard requires us to suppress any error happening in
a filter expression.  But as I wrote before, suppression of errors in
datetime comparison may lead to surprising results.  That happens in
rare corner cases, but still.  This makes for an uneasy choice between
consistent behavior and standard behavior.

However, Nikita Glukhov had a good idea about that.  Instead of thinking
about whether we should suppress cast errors in datetime comparison, we
may just eliminate those errors.  If we know that casting a date to
timestamp overflows the upper bound of a finite timestamp, then we also
know that this date is greater than any finite timestamp.  So we are
still able to do a correct comparison.  I'm going to implement this and
post a patch.
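
For illustration, the idea is roughly the following (a hypothetical sketch,
not the actual patch; the function name is made up, while the helpers and
constants are the existing ones from the backend datetime code):

	static int
	date_cmp_timestamp_noerror(DateADT dateVal, Timestamp ts)
	{
		/*
		 * Dates at or beyond this point cannot be cast to a finite
		 * timestamp, so instead of raising (and then suppressing) an
		 * overflow error we can conclude the date compares greater
		 * than any finite timestamp.
		 */
		if (dateVal >= (DateADT) (TIMESTAMP_END_JULIAN - POSTGRES_EPOCH_JDATE))
			return 1;

		/* otherwise the cast is safe and we can compare normally */
		return timestamp_cmp_internal(dateVal * USECS_PER_DAY, ts);
	}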

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Proposal: Make use of C99 designated initialisers for nulls/values arrays

2019-10-01 Thread Thomas Munro
On Wed, Oct 2, 2019 at 5:49 AM Andres Freund  wrote:
> On 2019-10-01 12:17:08 -0400, Tom Lane wrote:
> > Note though that InsertPgAttributeTuple uses memset(), while some of
> > these other places use MemSet().  The code I see being generated for
> > MemSet() is also the same(!) on clang, but it is different and
> > probably worse on gcc.  I wonder if it isn't time to kick MemSet to
> > the curb.  We have not re-evaluated that macro in more than a dozen
> > years, and compilers have surely changed.
>
> Yes, we really should!

+1

FWIW I experimented with that over here:

https://www.postgresql.org/message-id/CA%2BhUKGLfa6ANa0vs7Lf0op0XBH05HE8SyX8NFhDyT7k2CHYLXw%40mail.gmail.com




Re: Modest proposal for making bpchar less inconsistent

2019-10-01 Thread Tom Lane
Kyotaro Horiguchi  writes:
> At Sat, 28 Sep 2019 08:22:22 -0400, Bruce Momjian  wrote in 
> <2019092812.ga26...@momjian.us>
>> On Fri, Sep 13, 2019 at 09:50:10PM +0200, Pavel Stehule wrote:
>>> Dne pá 13. 9. 2019 16:43 uživatel Tom Lane  napsal:
 It struck me that the real reason that we keep getting gripes about
 the weird behavior of CHAR(n) is that these functions (and, hence,
 their corresponding operators) fail to obey the "trailing blanks
 aren't significant" rule:
 ...
 We could fix this, and save some catalog space too, if we simply
 deleted these functions/operators and let such calls devolve
 into implicit casts to text.

>> Yes, I think this is a great idea!

> I totally agree.

I experimented with this, as per the attached simple patch.  If you just
apply the catalog deletions, no regression test results change, which
says more about our lack of test coverage in this area than anything else.
So I added a few simple test cases too.

However, playing with this more, I'm not sure it's the direction we want
to go.  I realized that the BPCHAR-related code paths in like_support.c
are dead code with this patch, because it's no longer possible to match
a LIKE/regex operator to a bpchar column.  For example, in existing
releases you can do

regression=# create table t(f1 char(20) unique);
CREATE TABLE
regression=# explain select * from t where f1 like 'abcdef';
   QUERY PLAN   

 Index Only Scan using t_f1_key on t  (cost=0.15..8.17 rows=1 width=24)
   Index Cond: (f1 = 'abcdef'::bpchar)
   Filter: (f1 ~~ 'abcdef'::text)
(3 rows)

regression=# explain select * from t where f1 like 'abcdef%';
 QUERY PLAN 

 Bitmap Heap Scan on t  (cost=4.23..14.39 rows=8 width=24)
   Filter: (f1 ~~ 'abcdef%'::text)
   ->  Bitmap Index Scan on t_f1_key  (cost=0.00..4.23 rows=8 width=0)
 Index Cond: ((f1 >= 'abcdef'::bpchar) AND (f1 < 'abcdeg'::bpchar))
(4 rows)

But with this patch, you just get dumb seqscan plans because the
expression trees now look like "f1::text ~~ constant" which doesn't
match to an index on the bare column f1.

If we wanted to preserve these index optimizations while still
redefining the pattern match operators as ignoring trailing whitespace,
we could keep the operators/functions but change them to point at new
C functions that strip trailing blanks before invoking the pattern
match machinery.  Some thought would need to be given as well to whether
like_fixed_prefix et al need to behave differently to agree with this
behavior.  (Offhand it seems like they might need to strip trailing
blanks from what would otherwise be the fixed prefix, but I'm not
quite sure.)

That would be much more work than this patch of course (though still
not an enormous amount), and I'm not quite sure if it's worth the
trouble.  Is this a case that anyone is using in practice?
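
For readers joining the thread, a minimal illustration of the inconsistency
being discussed (behavior as in current releases): comparison ignores
trailing blanks, but pattern matching does not.

	SELECT 'ab '::char(3) = 'ab';		-- true
	SELECT 'ab '::char(3) LIKE 'ab';	-- false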

regards, tom lane

diff --git a/src/include/catalog/pg_operator.dat b/src/include/catalog/pg_operator.dat
index fa7dc96..1b430ef 100644
--- a/src/include/catalog/pg_operator.dat
+++ b/src/include/catalog/pg_operator.dat
@@ -1166,15 +1166,6 @@
   oprnegate => '<>(bpchar,bpchar)', oprcode => 'bpchareq', oprrest => 'eqsel',
   oprjoin => 'eqjoinsel' },
 
-{ oid => '1055', oid_symbol => 'OID_BPCHAR_REGEXEQ_OP',
-  descr => 'matches regular expression, case-sensitive',
-  oprname => '~', oprleft => 'bpchar', oprright => 'text', oprresult => 'bool',
-  oprnegate => '!~(bpchar,text)', oprcode => 'bpcharregexeq',
-  oprrest => 'regexeqsel', oprjoin => 'regexeqjoinsel' },
-{ oid => '1056', descr => 'does not match regular expression, case-sensitive',
-  oprname => '!~', oprleft => 'bpchar', oprright => 'text', oprresult => 'bool',
-  oprnegate => '~(bpchar,text)', oprcode => 'bpcharregexne',
-  oprrest => 'regexnesel', oprjoin => 'regexnejoinsel' },
 { oid => '1057', descr => 'not equal',
   oprname => '<>', oprleft => 'bpchar', oprright => 'bpchar',
   oprresult => 'bool', oprcom => '<>(bpchar,bpchar)',
@@ -1444,15 +1435,6 @@
   oprname => '!~~', oprleft => 'text', oprright => 'text', oprresult => 'bool',
   oprnegate => '~~(text,text)', oprcode => 'textnlike', oprrest => 'nlikesel',
   oprjoin => 'nlikejoinsel' },
-{ oid => '1211', oid_symbol => 'OID_BPCHAR_LIKE_OP',
-  descr => 'matches LIKE expression',
-  oprname => '~~', oprleft => 'bpchar', oprright => 'text', oprresult => 'bool',
-  oprnegate => '!~~(bpchar,text)', oprcode => 'bpcharlike',
-  oprrest => 'likesel', oprjoin => 'likejoinsel' },
-{ oid => '1212', descr => 'does not match LIKE expression',
-  oprname => '!~~', oprleft => 'bpchar', oprright => 'text',
-  oprresult => 'bool', oprnegate => '~~(bpchar,text)', oprcode => 'bpcharnlike',
-  oprrest => 'nlikesel', op

Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2019-10-01 Thread Amit Kapila
On Tue, Oct 1, 2019 at 7:21 PM Tomas Vondra 
wrote:

> On Tue, Oct 01, 2019 at 06:55:52PM +0530, Amit Kapila wrote:
> >
> >On further testing, I found that the patch seems to have problems with
> >toast.  Consider below scenario:
> >Session-1
> >Create table large_text(t1 text);
> >INSERT INTO large_text
> >SELECT (SELECT string_agg('x', ',')
> >FROM generate_series(1, 100)) FROM generate_series(1, 1000);
> >
> >Session-2
> >SELECT * FROM pg_create_logical_replication_slot('regression_slot',
> >'test_decoding');
> >SELECT * FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL);
> >*--kaboom*
> >
> >The second statement in Session-2 leads to a crash.
> >
>
> OK, thanks for the report - will investigate.
>

It was an assertion failure in ReorderBufferCleanupTXN at the line below:
+ /* Check we're not mixing changes from different transactions. */
+ Assert(change->txn == txn);



> >Other than that, I am not sure if the changes related to spill to disk
> >after logical_decoding_work_mem works for toast table as I couldn't hit
> >that code for toast table case, but I might be missing something.  As
> >mentioned previously, I feel there should be some way to test whether this
> >patch works for the cases it claims to work.  As of now, I have to check
> >via debugging.  Let me know if there is any way, I can test this.
> >
>
> That's one of the reasons why I proposed to move the statistics (which
> say how many transactions / bytes were spilled to disk) from a later
> patch in the series. I don't think there's a better way.
>
>
I like that idea, but I think you need to split that patch to only get the
stats related to the spill.  It would be easier to review if you can
prepare that on top of
0001-Add-logical_decoding_work_mem-to-limit-ReorderBuffer.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


RE: Proposal: Make use of C99 designated initialisers for nulls/values arrays

2019-10-01 Thread Smith, Peter
From: Isaac Morland  Sent: Tuesday, 1 October 2019 
11:32 PM

>Typical Example:
>Before:
>        Datum           values[Natts_pg_attribute];
>        bool            nulls[Natts_pg_attribute];
>        ...
>        memset(values, 0, sizeof(values));
>        memset(nulls, false, sizeof(nulls));
>After:
>        Datum           values[Natts_pg_attribute] = {0};
>        bool            nulls[Natts_pg_attribute] = {0};
>
>I hope you'll forgive a noob question. Why does the "After" initialization for 
>the boolean array have {0} rather than {false}? 

It is a valid question. 

I found that the original memsets that this patch replaces were already using 0 
and false interchangeably. So I just picked one. 
Reasons I chose {0} over {false} are: (a) laziness, and (b) consistency with 
the values[] initialiser.

But it is no problem to change the bool initialisers to {false} if that becomes 
a committer review issue.

Kind Regards
--
Peter Smith
Fujitsu Australia


Re: pgbench - allow to create partitioned tables

2019-10-01 Thread Amit Kapila
On Tue, Oct 1, 2019 at 8:45 PM Rafia Sabih 
wrote:

> On Tue, 1 Oct 2019 at 16:48, Fabien COELHO  wrote:
>
>>
>> >> Yeah, I know that, but this doesn't look quite right.  I mean to say
>> >> whatever we want to say via this message is correct, but I am not
>> >> completely happy with the display part.  How about something like:
>> >> "pgbench_accounts is missing, you need to do initialization (\"pgbench
>> >> -i\") in database \"%s\"\n"?  Feel free to propose something else on
>> >> similar lines?  If possible, I want to convey this information in a
>> single
>> >> sentence.
>> >>
>> >> How about, "pgbench_accounts is missing, initialize (\"pgbench -i\") in
>> > database \"%s\"\n"?
>>
>> I think that we should not presume too much about the solution: perhaps
>> the user did not specify the right database or host and it has nothing to
>> do with initialization.
>>
>> Maybe something like:
>>
>> "pgbench_accounts is missing, perhaps you need to initialize (\"pgbench
>> -i\") in database \"%s\"\n"
>>
>> The two sentences approach has the logic of "error" and a separate "hint"
>> which is often used.
>>
>
> +1 for error and hint separation.
>

Okay, if you people like the approach of two sentences for the separation
of "hint" and "error", then I think the second line should end with a
period.  See the note below from the docs[1]:
"Grammar and Punctuation

The rules are different for primary error messages and for detail/hint
messages:

Primary error messages: Do not capitalize the first letter. Do not end a
message with a period. Do not even think about ending a message with an
exclamation point.

Detail and hint messages: Use complete sentences, and end each with a
period. Capitalize the first word of sentences. Put two spaces after the
period if another sentence follows (for English text; might be
inappropriate in other languages)."

Also, a similar style is used in other places in the code; see
contrib/oid2name/oid2name.c and contrib/pg_standby/pg_standby.c for similar
usage.

I shall modify this before commit unless you disagree.

[1] - https://www.postgresql.org/docs/devel/error-style-guide.html

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: Proposal: Make use of C99 designated initialisers for nulls/values arrays

2019-10-01 Thread Amit Kapila
On Wed, Oct 2, 2019 at 4:53 AM Smith, Peter 
wrote:

> From: Isaac Morland  Sent: Tuesday, 1 October
> 2019 11:32 PM
>
> >Typical Example:
> >Before:
> >Datum   values[Natts_pg_attribute];
> >boolnulls[Natts_pg_attribute];
> >...
> >memset(values, 0, sizeof(values));
> >memset(nulls, false, sizeof(nulls));
> >After:
> >Datum   values[Natts_pg_attribute] = {0};
> >boolnulls[Natts_pg_attribute] = {0};
> >
> >I hope you'll forgive a noob question. Why does the "After"
> initialization for the boolean array have {0} rather than {false}?
>
> It is a valid question.
>
> I found that the original memsets that this patch replaces were already
> using 0 and false interchangeably. So I just picked one.
> Reasons I chose {0} over {false} are: (a) laziness, and (b) consistency
> with the values[] initialiser.
>

In this case, I think it is better to be consistent in all the places.  As
of now (without patch), we are using 'false' or '0' to initialize the
boolean array.  See the two instances below from the patch:
1.
@@ -607,9 +601,9 @@ UpdateStatisticsForTypeChange(Oid statsOid, Oid
relationOid, int attnum,

  Relation rel;

- Datum values[Natts_pg_statistic_ext_data];
- bool nulls[Natts_pg_statistic_ext_data];
- bool replaces[Natts_pg_statistic_ext_data];
+ Datum values[Natts_pg_statistic_ext_data] = {0};
+ bool nulls[Natts_pg_statistic_ext_data] = {0};
+ bool replaces[Natts_pg_statistic_ext_data] = {0};

  oldtup = SearchSysCache1(STATEXTDATASTXOID, ObjectIdGetDatum(statsOid));
  if (!HeapTupleIsValid(oldtup))
@@ -630,10 +624,6 @@ UpdateStatisticsForTypeChange(Oid statsOid, Oid
relationOid, int attnum,
  * OK, we need to reset some statistics. So let's build the new tuple,
  * replacing the affected statistics types with NULL.
  */
- memset(nulls, 0, Natts_pg_statistic_ext_data * sizeof(bool));
- memset(replaces, 0, Natts_pg_statistic_ext_data * sizeof(bool));
- memset(values, 0, Natts_pg_statistic_ext_data * sizeof(Datum));

2.
@@ -69,10 +69,10 @@ CreateStatistics(CreateStatsStmt *stmt)
  Oid namespaceId;
  Oid stxowner = GetUserId();
  HeapTuple htup;
- Datum values[Natts_pg_statistic_ext];
- bool nulls[Natts_pg_statistic_ext];
- Datum datavalues[Natts_pg_statistic_ext_data];
- bool datanulls[Natts_pg_statistic_ext_data];
+ Datum values[Natts_pg_statistic_ext] = {0};
+ bool nulls[Natts_pg_statistic_ext] = {0};
+ Datum datavalues[Natts_pg_statistic_ext_data] = {0};
+ bool datanulls[Natts_pg_statistic_ext_data] = {0};
  int2vector *stxkeys;
  Relation statrel;
  Relation datarel;
@@ -330,9 +330,6 @@ CreateStatistics(CreateStatsStmt *stmt)
  /*
  * Everything seems fine, so let's build the pg_statistic_ext tuple.
  */
- memset(values, 0, sizeof(values));
- memset(nulls, false, sizeof(nulls));
-
  statoid = GetNewOidWithIndex(statrel, StatisticExtOidIndexId,
  Anum_pg_statistic_ext_oid);
  values[Anum_pg_statistic_ext_oid - 1] = ObjectIdGetDatum(statoid);
@@ -357,9 +354,6 @@ CreateStatistics(CreateStatsStmt *stmt)
  */
  datarel = table_open(StatisticExtDataRelationId, RowExclusiveLock);

- memset(datavalues, 0, sizeof(datavalues));
- memset(datanulls, false, sizeof(datanulls));

In the first usage, we are initializing the boolean array with 0 and in the
second case, we are using false.  The patch changes it to use 0 in all
places, which I think is better.

I don't have any strong opinion on this, but I would mildly prefer to
initialize the boolean arrays with false just for the sake of readability
(we generally initialize booleans with false).

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


RE: Proposal: Make use of C99 designated initialisers for nulls/values arrays

2019-10-01 Thread Smith, Peter
From: Amit Kapila  Sent: Tuesday, 1 October 2019 8:12 
PM

> +1.  This seems like an improvement.  I can review and take this forward 
> unless there are objections from others.

FYI - I created a Commitfest entry for this here: 
https://commitfest.postgresql.org/25/2290/

Kind Regards
--
Peter Smith
Fujitsu Australia



My buildfarm member now giving permission denied

2019-10-01 Thread Larry Rosenman

FreeBSD SVN rev:
r352600   -  -  1.69G 2019-09-22 13:13
r352873   NR /  43.1G 2019-09-29 16:36

I went from r352600 to r352873 and now I'm getting PostgreSQL permission
denied errors on the check phase of the build.

FreeBSD folks: Any ideas?
PostgreSQL folks: FYI.



--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106




Re: My buildfarm member now giving permission denied

2019-10-01 Thread Larry Rosenman

On 10/01/2019 8:27 pm, Larry Rosenman wrote:

FreeBSD SVN rev:
r352600   -  -  1.69G 2019-09-22 13:13
r352873   NR /  43.1G 2019-09-29 16:36

I went from r352600 to r352873 and now I'm getting PostgreSQL
permission denied errors on the check phase of the build.

FreeBSD folks: Any ideas?
PostgreSQL folks: FYI.

latest build log:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=peripatus&dt=2019-10-02%2001%3A20%3A14


--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106




Re: Peripatus: Can someone look?

2019-10-01 Thread Thomas Munro
On Wed, Oct 2, 2019 at 4:49 AM Larry Rosenman  wrote:
> On 10/01/2019 10:46 am, Tom Lane wrote:
> > Larry Rosenman  writes:
> >> My Buildfarm animal (peripatus) has been failing check since
> >> yesterday.
> >> Can someone look at it?
> >
> > It's been doing this in parallel queries, in v11 and up:
> >
> > 2019-09-29 19:00:15.534 CDT [49513:1] ERROR:  could not open shared
> > memory segment "/PostgreSQL.1225945786": Permission denied
> > 2019-09-29 19:00:15.535 CDT [48596:491] pg_regress/partition_prune
> > ERROR:  parallel worker failed to initialize
> > 2019-09-29 19:00:15.535 CDT [48596:492] pg_regress/partition_prune
> > HINT:  More details may be available in the server log.
> > 2019-09-29 19:00:15.535 CDT [48596:493] pg_regress/partition_prune
> > CONTEXT:  PL/pgSQL function explain_parallel_append(text) line 5 at
> > FOR over EXECUTE statement
> > 2019-09-29 19:00:15.535 CDT [48596:494] pg_regress/partition_prune
> > STATEMENT:  select explain_parallel_append('select avg(ab.a) from ab
> > inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 0)');
> > 2019-09-29 19:00:15.535 CDT [48596:495] pg_regress/partition_prune
> > WARNING:  could not remove shared memory segment
> > "/PostgreSQL.1225945786": Permission denied
> >
> > which looks like an external problem to me --- certainly, nothing
> > we changed yesterday would explain it.  Did you change anything
> > about the system configuration, or do a software update?
> >
> >   regards, tom lane
>
> I did do an upgrade to a later SVN rev.
>
> Let me reboot and see if that fixes anything.
>
> (this is -CURRENT on FreeBSD, so it's always a moving target).

Hi Larry,

I'm seeing this on my FreeBSD 13 bleeding edge system too (built a
couple of days ago) and will see if I can find out what's up with
that.  The most obvious culprit is the stuff that just landed in the
kernel to support Linux-style memfd_create() and thereby changed
around some shm_open() related things.  Seems to be clearly not a
PostgreSQL problem.

Thanks,
Thomas




Re: Transparent Data Encryption (TDE) and encrypted files

2019-10-01 Thread Bruce Momjian
On Mon, Sep 30, 2019 at 05:26:33PM -0400, Bruce Momjian wrote:
> For full-cluster Transparent Data Encryption (TDE), the current plan is
> to encrypt all heap and index files, WAL, and all pgsql_tmp (work_mem
> overflow).  The plan is:
> 
>   
> https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#TODO_for_Full-Cluster_Encryption
> 
> We don't see much value to encrypting vm, fsm, pg_xact, pg_multixact, or
> other files.  Is that correct?  Do any other PGDATA files contain user
> data?

Oh, there is also consideration that the pg_replslot directory might
also contain user data.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: Peripatus: Can someone look?

2019-10-01 Thread Larry Rosenman

On 10/01/2019 8:33 pm, Thomas Munro wrote:

On Wed, Oct 2, 2019 at 4:49 AM Larry Rosenman  wrote:

On 10/01/2019 10:46 am, Tom Lane wrote:
> Larry Rosenman  writes:
>> My Buildfarm animal (peripatus) has been failing check since
>> yesterday.
>> Can someone look at it?
>
> It's been doing this in parallel queries, in v11 and up:
>
> 2019-09-29 19:00:15.534 CDT [49513:1] ERROR:  could not open shared
> memory segment "/PostgreSQL.1225945786": Permission denied
> 2019-09-29 19:00:15.535 CDT [48596:491] pg_regress/partition_prune
> ERROR:  parallel worker failed to initialize
> 2019-09-29 19:00:15.535 CDT [48596:492] pg_regress/partition_prune
> HINT:  More details may be available in the server log.
> 2019-09-29 19:00:15.535 CDT [48596:493] pg_regress/partition_prune
> CONTEXT:  PL/pgSQL function explain_parallel_append(text) line 5 at
> FOR over EXECUTE statement
> 2019-09-29 19:00:15.535 CDT [48596:494] pg_regress/partition_prune
> STATEMENT:  select explain_parallel_append('select avg(ab.a) from ab
> inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 0)');
> 2019-09-29 19:00:15.535 CDT [48596:495] pg_regress/partition_prune
> WARNING:  could not remove shared memory segment
> "/PostgreSQL.1225945786": Permission denied
>
> which looks like an external problem to me --- certainly, nothing
> we changed yesterday would explain it.  Did you change anything
> about the system configuration, or do a software update?
>
>   regards, tom lane

I did do an upgrade to a later SVN rev.

Let me reboot and see if that fixes anything.

(this is -CURRENT on FreeBSD, so it's always a moving target).


Hi Larry,

I'm seeing this on my FreeBSD 13 bleeding edge system too (built a
couple of days ago) and will see if I can find out what's up with
that.  The most obvious culprit is the stuff that just landed in the
kernel to support Linux-style memfd_create() and thereby changed
around some shm_open() related things.  Seems to be clearly not a
PostgreSQL problem.

Thanks,
Thomas


Thanks, Thomas.

Here's my 2 SVN revs if that would help:
FreeBSD SVN rev:
r352600   -  -  1.69G 2019-09-22 13:13
r352873   NR /  43.1G 2019-09-29 16:36


--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106




Is querying SPITupleTable with SQL possible?

2019-10-01 Thread Tom Mercha
Dear Hackers

I am using PostgreSQL's SPI to execute a simple SQL query (SELECT * FROM
...) via SPI_exec. As a result, I get an SPITupleTable with the
results of my query.
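
Concretely, what I have now looks roughly like this (a minimal sketch; the
query and the function name are just illustrative):

	#include "executor/spi.h"

	static void
	run_query(void)
	{
		SPI_connect();

		if (SPI_exec("SELECT * FROM my_table", 0) == SPI_OK_SELECT)
		{
			SPITupleTable *tuptable = SPI_tuptable;	/* result rows */
			TupleDesc	tupdesc = tuptable->tupdesc;
			uint64		nrows = SPI_processed;

			/*
			 * Individual values can be read with SPI_getvalue(), but the
			 * tuple table itself is an in-memory C structure rather than
			 * a named relation.
			 */
			(void) tupdesc;
			(void) nrows;
		}

		SPI_finish();
	}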

Now that I have the SPITupleTable, I was wondering if it would be 
possible to later query over it further in my SQL statements using SPI, 
for example, something a bit similar to SPI_Exec ("Select * FROM 
:mySPITupleTable", 0);

My motivation is to treat and use the SPITupleTable as an 'intermediate'
or 'temporary' table which I would discard early - I want to apply a
series of manipulations to my SPITupleTable before I finally store it
in the tablespace. Therefore, minimizing any overheads is also very
important. I understand that I could introduce a CREATE TABLE to my
SQL query and reference a table in that way, but I am under the
impression that it would incur unnecessary overheads.

So, I would be grateful if anyone could help me understand how to 
manipulate the SPITupleTable further with SQL or indicate if it is at 
all possible. In the case that it is not possible, I would also be 
interested in alternatives and discussion on overheads.

Thanks in advance.

Best,
Tom


Re: dropdb --force

2019-10-01 Thread Amit Kapila
On Thu, Sep 26, 2019 at 7:18 PM Pavel Stehule 
wrote:

>
> út 24. 9. 2019 v 14:52 odesílatel Amit Kapila 
> napsal:
>
>>
>> One other minor comment:
>> +
>> +  This will also fail, if the connections do not terminate in 5
>> seconds.
>> + 
>>
>> Is there any implementation in the patch for the above note?
>>
>
> Yes, is there.
>
> The force flag ensures that SIGTERM is sent to related clients. Nothing more.
> There is still the check
>
> 	if (CountOtherDBBackends(db_id, &notherbackends, &npreparedxacts))
> 		ereport(ERROR,
> 				(errcode(ERRCODE_OBJECT_IN_USE),
> 				 errmsg("database \"%s\" is being accessed by other users",
> 						dbname),
> 				 errdetail_busy_db(notherbackends, npreparedxacts)));
>
> that can fail after 5 seconds. Sending the signal doesn't guarantee anything,
> so I am for no changes in this check.
>

I think 5 seconds is a hard-coded value that can even change in the
future.  So, it is better to write something more generic like "This will
also fail if we are not able to terminate connections" or something like
that.  This part of the documentation might change after addressing the
other comments below.

>> One more point I would like to add here is that I think it is worth
>> considering to split this patch by keeping the changes in dropdb
>> utility as a separate patch.  Even though the code is not very much
>> but I think it can be a separate patch atop the main patch which
>> contains the core server changes.
>
> I did it - last patch contains server side only. I expect so client side
> (very small patch) can be next.

I still see the code related to dropdb utility in the patch.  See,
diff --git a/src/bin/scripts/dropdb.c b/src/bin/scripts/dropdb.c
index dacd8e5f1d..1bb80fda74 100644
--- a/src/bin/scripts/dropdb.c
+++ b/src/bin/scripts/dropdb.c
@@ -34,6 +34,7 @@ main(int argc, char *argv[])
  {"interactive", no_argument, NULL, 'i'},
  {"if-exists", no_argument, &if_exists, 1},
  {"maintenance-db", required_argument, NULL, 2},
+ {"force", no_argument, NULL, 'f'},
  {NULL, 0, NULL, 0}
  };

Few other comments:

1.
+void
+TerminateOtherDBBackends(Oid databaseId)
+{
+ ProcArrayStruct *arrayP = procArray;
+ List   *pids = NIL;
+ int i;
+
+ LWLockAcquire(ProcArrayLock, LW_SHARED);
+
+ for (i = 0; i < procArray->numProcs; i++)
+ {
+ int pgprocno = arrayP->pgprocnos[i];
+ PGPROC   *proc = &allProcs[pgprocno];
+
+ if (proc->databaseId != databaseId)
+ continue;
+ if (proc == MyProc)
+ continue;
+
+ if (proc->pid != 0)
+ pids = lappend_int(pids, proc->pid);
+ }

So here we are terminating only the connections which don't have any prepared
transactions.  The behavior of this patch with prepared transactions
will be terrible. Basically, after terminating all the connections via
TerminateOtherDBBackends, we will give an error once CountOtherDBBackends is
invoked.  I have tested the same and it gives an error like below:

postgres=# drop database db1 With (Force);
ERROR:  database "db1" is being accessed by other users
DETAIL:  There is 1 prepared transaction using the database.

I think we have two options here:
(a) Document that even with force option, if there are any prepared
transactions in the same database, we won't drop the database.  Along with
this, fix the code such that we don't terminate other connections if the
prepared transactions are active.
(b) Rollback the prepared transactions.

I think (a) is a better option because if the number of prepared
transactions is large, then it might take time and I am not sure if it is
worth adding the complexity of rolling back all the prepared xacts.  OTOH,
if you or others think option (b) is good and doesn't add much complexity,
then I think it is worth exploring that option.
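
As a rough sketch of option (a) -- hypothetical, just to show the intended
ordering, not the final shape of the code -- the force path could refuse up
front when prepared transactions exist, before any backend gets SIGTERM
(CountOtherDBBackends() and errdetail_busy_db() are the existing helpers
already used by dropdb()):

	static void
	refuse_if_prepared_xacts(Oid databaseId, const char *dbname)
	{
		int		nbackends;
		int		nprepared;

		if (CountOtherDBBackends(databaseId, &nbackends, &nprepared) &&
			nprepared > 0)
			ereport(ERROR,
					(errcode(ERRCODE_OBJECT_IN_USE),
					 errmsg("database \"%s\" is being accessed by other users",
							dbname),
					 errdetail_busy_db(nbackends, nprepared)));
	}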

I think we should definitely do something to deal with this even if you
don't like the proposed options because the current behavior of the patch
seems worse than either of these options.

2.
-DROP DATABASE [ IF EXISTS ] name
+DROP DATABASE [ IF EXISTS ] name [ WITH ( option [, ...] ) ]

It is better to keep the WITH clause optional, similar to the COPY command.
This might address Peter E's concern related to the WITH clause.

3.
- * DROP DATABASE [ IF EXISTS ]
+ * DROP DATABASE [ ( options ) ] [ IF EXISTS ]

You seem to forget to change the syntax in the above comments after
changing the patch.

4.
+   If anyone else is connected to the target database, this command will
fail
+   - unless you use the FORCE option described below.

I don't understand the significance of using '-' before unless.  I think we
can remove it.

Changed the patch status as 'Waiting on Author'.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: Revert back to standard AC_STRUCT_TIMEZONE Autoconf macro

2019-10-01 Thread Peter Eisentraut
On 2019-09-30 21:36, Tom Lane wrote:
> Peter Eisentraut  writes:
>> Instead of AC_STRUCT_TIMEZONE we use our own variant called
>> PGAC_STRUCT_TIMEZONE that checks for tzname even if other variants were
>> found first.  But since 63bd0db12199c5df043e1dea0f2b574f622b3a4c we
>> don't use tzname anymore, so we don't need this anymore.
> 
> Hmm.  I wonder if we need AC_STRUCT_TIMEZONE either?  Seems like
> we should only be using our own struct pg_tm.

There are a few places that seem to need it, such as initdb/findtimezone.c.

> If we could get
> rid of that configure macro altogether, we could remove some dubious
> junk like plpython.h's "#undef HAVE_TZNAME".

We could keep just the part of AC_STRUCT_TIMEZONE that we need, namely
the check for tm_zone, and remove the part about tzname.

New patch attached.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 78597561e3f8e9119d6fd07e4106cc780e58507b Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 30 Sep 2019 20:50:16 +0200
Subject: [PATCH v2 1/2] Remove use of deprecated Autoconf define

Change from HAVE_TM_ZONE to HAVE_STRUCT_TM_TM_ZONE.
---
 src/interfaces/ecpg/pgtypeslib/dt_common.c | 4 ++--
 src/interfaces/ecpg/pgtypeslib/timestamp.c | 8 
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/src/interfaces/ecpg/pgtypeslib/dt_common.c 
b/src/interfaces/ecpg/pgtypeslib/dt_common.c
index e71defaa66..29c1117546 100644
--- a/src/interfaces/ecpg/pgtypeslib/dt_common.c
+++ b/src/interfaces/ecpg/pgtypeslib/dt_common.c
@@ -995,7 +995,7 @@ abstime2tm(AbsoluteTime _time, int *tzp, struct tm *tm, 
char **tzn)
tm->tm_sec = tx->tm_sec;
tm->tm_isdst = tx->tm_isdst;
 
-#if defined(HAVE_TM_ZONE)
+#if defined(HAVE_STRUCT_TM_TM_ZONE)
tm->tm_gmtoff = tx->tm_gmtoff;
tm->tm_zone = tx->tm_zone;
 
@@ -1041,7 +1041,7 @@ abstime2tm(AbsoluteTime _time, int *tzp, struct tm *tm, 
char **tzn)
}
else
tm->tm_isdst = -1;
-#else  /* not (HAVE_TM_ZONE || 
HAVE_INT_TIMEZONE) */
+#else  /* not 
(HAVE_STRUCT_TM_TM_ZONE || HAVE_INT_TIMEZONE) */
if (tzp != NULL)
{
/* default to UTC */
diff --git a/src/interfaces/ecpg/pgtypeslib/timestamp.c 
b/src/interfaces/ecpg/pgtypeslib/timestamp.c
index e830ee737e..2be151f7e6 100644
--- a/src/interfaces/ecpg/pgtypeslib/timestamp.c
+++ b/src/interfaces/ecpg/pgtypeslib/timestamp.c
@@ -100,7 +100,7 @@ timestamp2tm(timestamp dt, int *tzp, struct tm *tm, fsec_t 
*fsec, const char **t
int64   dDate,
date0;
int64   time;
-#if defined(HAVE_TM_ZONE) || defined(HAVE_INT_TIMEZONE)
+#if defined(HAVE_STRUCT_TM_TM_ZONE) || defined(HAVE_INT_TIMEZONE)
time_t  utime;
struct tm  *tx;
 #endif
@@ -134,7 +134,7 @@ timestamp2tm(timestamp dt, int *tzp, struct tm *tm, fsec_t 
*fsec, const char **t
 */
if (IS_VALID_UTIME(tm->tm_year, tm->tm_mon, tm->tm_mday))
{
-#if defined(HAVE_TM_ZONE) || defined(HAVE_INT_TIMEZONE)
+#if defined(HAVE_STRUCT_TM_TM_ZONE) || defined(HAVE_INT_TIMEZONE)
 
utime = dt / USECS_PER_SEC +
((date0 - date2j(1970, 1, 1)) * 
INT64CONST(86400));
@@ -147,7 +147,7 @@ timestamp2tm(timestamp dt, int *tzp, struct tm *tm, fsec_t 
*fsec, const char **t
tm->tm_min = tx->tm_min;
tm->tm_isdst = tx->tm_isdst;
 
-#if defined(HAVE_TM_ZONE)
+#if defined(HAVE_STRUCT_TM_TM_ZONE)
tm->tm_gmtoff = tx->tm_gmtoff;
tm->tm_zone = tx->tm_zone;
 
@@ -159,7 +159,7 @@ timestamp2tm(timestamp dt, int *tzp, struct tm *tm, fsec_t 
*fsec, const char **t
if (tzn != NULL)
*tzn = TZNAME_GLOBAL[(tm->tm_isdst > 0)];
 #endif
-#else  /* not (HAVE_TM_ZONE || 
HAVE_INT_TIMEZONE) */
+#else  /* not 
(HAVE_STRUCT_TM_TM_ZONE || HAVE_INT_TIMEZONE) */
*tzp = 0;
/* Mark this as *no* time zone available */
tm->tm_isdst = -1;
-- 
2.23.0

From 716fb6c09648b03ba563a61b1cd2f353ed7acfca Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 2 Oct 2019 07:26:08 +0200
Subject: [PATCH v2 2/2] Simplify PGAC_STRUCT_TIMEZONE Autoconf macro

Since 63bd0db12199c5df043e1dea0f2b574f622b3a4c we don't use tzname
anymore, so we don't need to check for it.  Instead, just keep the
part of PGAC_STRUCT_TIMEZONE that we need, which is the check for
struct tm.tm_zone.
---
 config/c-library.m4   | 31 +++---
 configure | 78 +--
 src/include/pg_config.h.in| 

Re: Keep compiler silence (clang 10, implicit conversion from 'long' to 'double' )

2019-10-01 Thread Yuya Watari
Horiguchi-san,

On Tue, Oct 1, 2019 at 3:41 PM Kyotaro Horiguchi
 wrote:
> I found a trick seems workable generically (*1). (2.0 *
> (PG_INT64_MAX/2 + 1)) will generate the value next to the
> PG_INT64_MAX based on some assumptions
> (*1). IS_DOUBLE_SAFE_IN_INT64() below would be able to check if
> the value can be converted into int64 safely or not.

Thanks for sharing a nice way of checking overflow. I tested your
IS_DOUBLE_SAFE_IN_INT64() macro in my environment by the simple code
(attached to this email) and confirmed that it appropriately handled
the overflow. However, further consideration is needed for different
architectures.

I attached the modified patch. In the patch, I placed the macro in
"src/include/c.h", but this may not be a good choice because c.h is
widely included from a lot of files. Do you have any good ideas about
its placement?

Thanks,
Yuya Watari
NTT Software Innovation Center
watari.y...@gmail.com
#include <stdio.h>
#include <stdint.h>
#include <math.h>
#include <float.h>

#define PG_INT64_MAX INT64_MAX
#define PG_INT64_MIN INT64_MIN

/*
 * Check if a double value can be casted into int64.
 *
 * This macro is assuming that FLT_RADIX == 2 so that the * 2.0 trick works,
 * PG_INT64_MAX is so below DBL_MAX that the doubled value can be represented
 * in double and DBL_MANT_DIG is equal or smaller than DBL_MAX_EXP so that
 * ceil() returns expected result.
 */
#define MIN_DOUBLE_OVER_INT64_MAX (2.0 * (double) (PG_INT64_MAX / 2 + 1))
#define MAX_DOUBLE_UNDER_INT64_MIN (2.0 * (double) (PG_INT64_MIN / 2 - 1))

#if -PG_INT64_MAX != PG_INT64_MIN
#define IS_DOUBLE_SAFE_IN_INT64(x)  \
((x) < MIN_DOUBLE_OVER_INT64_MAX && ceil(x) >= (double) PG_INT64_MIN)
#else
#define IS_DOUBLE_SAFE_IN_INT64(x)  \
((x) < MIN_DOUBLE_OVER_INT64_MAX && (x) > MAX_DOUBLE_UNDER_INT64_MIN)
#endif

int main()
{
double values[] =
{
-pow(2, 63) - 1025,
-pow(2, 63),
pow(2, 63) - 1024,
pow(2, 63),
INFINITY,
-INFINITY,
NAN
};

for (int i = 0; i < sizeof(values) / sizeof(values[0]); i++)
{
double value = values[i];
printf("value == %lf, Overflow = %s\n", value,
!IS_DOUBLE_SAFE_IN_INT64(value) ? "true" : "false");
}
}


v3-keep-compiler-silence.patch
Description: Binary data


Re: pg_wal/RECOVERYHISTORY file remains after archive recovery

2019-10-01 Thread Michael Paquier
On Mon, Sep 30, 2019 at 05:07:08PM +0900, Masahiko Sawada wrote:
> Thank you for updating! The comment in your patch is much better.

Thanks, done and back-patched down to 9.5.
--
Michael


signature.asc
Description: PGP signature