Re: proposal: plpgsql pragma statement

2018-12-12 Thread Pavel Stehule
On Thu, Dec 6, 2018 at 18:27 Pavel Stehule wrote:

>
>
> On Thu, Dec 6, 2018 at 18:17 Robert Haas wrote:
>
>> On Thu, Dec 6, 2018 at 12:13 PM Pavel Stehule wrote:
>> > My idea of a plpgsql PRAGMA is very close to the PL/SQL or Ada PRAGMA.
>> > This is not a runtime statement - the information from this command will be
>> > assigned to the related object - function, block, or command - at parse time.
>>
>> That's sensible, but the syntax you were proposing didn't look like it
>> was related to a specific statement.  I was objecting to the idea that
>> PRAGMA whatever; should be construed as an annotation of,
>> specifically, the following statement.
>>
>
> Please, can you propose something you would like?
>
> For my purposes I can imagine a PRAGMA at the function level with the same
> syntax as PL/SQL - I need to push some information somewhere that
> plpgsql_check can use to protect users against false alarms. Locality is not
> too important for me at the moment. But I prefer a solution that doesn't
> look too strange, and that is possible just by changing the plpgsql parser.
>

I looked at some documentation - and, for example, the PL/SQL PRAGMA inline
looks very similar to what I propose.

For me, a block-level pragma used in the DECLARE section is good enough:

DECLARE
  pragma plpgsql_check(OFF);
BEGIN
  .. this part will not be checked ..
END;

The syntax would also be ready for future PL/SQL-style pragmas like
AUTONOMOUS_TRANSACTION, etc.
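
For reference, this is roughly how PL/SQL places such a pragma in the
declaration section (Oracle PL/SQL, shown only to illustrate the model being
referred to; it is not PostgreSQL syntax, and the procedure and table names
are made up):

-- Oracle PL/SQL: the pragma sits among the declarations, before BEGIN.
CREATE OR REPLACE PROCEDURE log_event(msg VARCHAR2) AS
  PRAGMA AUTONOMOUS_TRANSACTION;
BEGIN
  INSERT INTO event_log(message) VALUES (msg);
  COMMIT;
END;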

Regards

Pavel

>
> Regards
>
> Pavel
>
>
>> --
>> Robert Haas
>> EnterpriseDB: http://www.enterprisedb.com
>> The Enterprise PostgreSQL Company
>>
>


Re: Making WAL receiver startup rely on GUC context for primary_conninfo and primary_slot_name

2018-12-12 Thread Simon Riggs
On Wed, 12 Dec 2018 at 05:35, Andres Freund  wrote:


> >What do you think about the attached to simplify the logic?  Even if
> >primary_conninfo and primary_slot_name are not switched to SIGHUP this
> >cleanup looks like a good thing to me.
>
> I am not convinced this is a good idea. This allows the state of walrcv
> and startup to diverge, they could e.g. have different configuration, due
> to differently time config reloads.


That sounds bad, but most parameters apply to one or the other, not both.

If there are some that apply to both, then yes, coordination would be
important.

It does seem likely that the new scheme will require us to look carefully
at when parameters are reloaded, since the timing of reloads was never
taken into account in the previous coding.

-- 
Simon Riggs                http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Undo logs

2018-12-12 Thread Amit Kapila
On Wed, Dec 12, 2018 at 11:18 AM Dilip Kumar
 wrote:
>
> On Sat, Dec 8, 2018 at 7:52 PM Amit Kapila  wrote:
>>
>>
>> I think I see the problem in the discard mechanism when the log is
>> spread across multiple logs.  Basically, if the second log contains
>> undo of some transaction prior to the transaction which has just
>> decided to spread its undo in the chosen undo log, then we might
>> discard the undo log of some transaction(s) inadvertently.  Am I
>> missing something?
>
> Actually, I don't see exactly this problem here, because we only process one 
> undo log at a time, so we will not go to the next undo log and discard some 
> transaction for which we are supposed to retain the undo.
>

How will rollbacks work for such a case?  I have more to say about
this, see below.

>>
>> If not, then I guess we need to ensure that we
>> don't immediately discard the undo in the second log when a single
>> transaction's undo is spread across two logs.
>>
>> Before choosing a new undo log to span the undo for a transaction, do
>> we ensure that it is not already linked with some other undo log for a
>> similar reason?
>>

You seem to have forgotten to answer this.

>> One more thing in this regard:
>> +UndoRecordAllocate(UnpackedUndoRecord *undorecords, int nrecords,
>> +TransactionId txid, UndoPersistence upersistence)
..
>>
>> Isn't there a hidden assumption in the above code that you will always
>> get a fresh undo log if the undo doesn't fit in the currently attached
>> log?  What guarantees that?
>
>
> Yeah, it's a problem; we might get an undo log which is not empty.  One way 
> to avoid this could be that, instead of relying on the check 
> "UndoRecPtrGetOffset(urecptr) == UndoLogBlockHeaderSize", we add a 
> flag to the undo log metadata saying whether it's the first record after 
> attach or not, and decide based on that.  But I want to think of a 
> better solution where we can identify this without adding anything extra to 
> the undo metadata.
>

I think what we need to determine here is whether we have switched the
log for some non-first record of the transaction.  If so, can't we
detect it by something like:

log = GetLatestUndoLogAmAttachedTo();
UndoLogAllocate();
if (!need_xact_hdr)
{
    current_log = GetLatestUndoLogAmAttachedTo();
    if (current_log is not same as log)
    {
        Assert(current_log->meta.prevlogno == log->logno);
        log_switched = true;
    }
}


Won't a similar problem happen when we read undo records during
rollback?  In the code below:

+UndoRecPtr
+UndoGetPrevUndoRecptr(UndoRecPtr urp, uint16 prevlen)
+{
+ UndoLogNumber logno = UndoRecPtrGetLogNo(urp);
+ UndoLogOffset offset = UndoRecPtrGetOffset(urp);
+
+ /*
+ * We have reached the first undo record of this undo log, so fetch the
+ * previous undo record of the transaction from the previous log.
+ */
+ if (offset == UndoLogBlockHeaderSize)
+ {
+ UndoLogControl *prevlog,

We seem to be assuming here that the new log starts from the
beginning.  IIUC, we can read a record of some other transaction if
the transaction's undo spans two logs and the second log
contains some other transaction's records at the beginning.

>>
>> 8.
>> +typedef struct UndoRecordTransaction
>> +{
>> + uint32 urec_progress; /* undo applying progress. */
>> + uint32 urec_xidepoch; /* epoch of the current transaction */
>>
>> Can you expand comments about how the progress is defined and used?
>
> I moved your comment from UnpackedUndoRecord to this structure, and in 
> UnpackedUndoRecord I have mentioned that the detailed comment can be found in 
> this structure.
>
>>
>> Also, write a few sentences about why epoch is captured and or used?
>
> urec_xidepoch is captured mainly for zheap visibility purposes, so isn't it 
> better to mention it there?
>

Okay, you can leave it here as it is.  One small point about this structure:

+ uint64 urec_next; /* urec pointer of the next transaction */
+} UndoRecordTransaction;
+
+#define SizeOfUrecNext (sizeof(UndoRecPtr))
+#define SizeOfUndoRecordTransaction \
+ (offsetof(UndoRecordTransaction, urec_next) + SizeOfUrecNext)

Isn't it better to define urec_next as UndoRecPtr, even though it is
technically the same as per the current code?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: New function pg_stat_statements_reset_query() to reset statistics of a specific query

2018-12-12 Thread Haribabu Kommi
On Thu, Nov 29, 2018 at 1:57 PM Amit Kapila  wrote:

> On Wed, Nov 28, 2018 at 7:13 PM Alvaro Herrera 
> wrote:
> >
> > On 2018-Nov-28, Amit Kapila wrote:
> >
> > > The problem with this idea is that if someone specifies a particular
> > > parameter using a query and the query doesn't return any rows,
> > > then it can lead to inadvertent behavior.  For example, if a user uses
> > > something like pg_stat_statements_reset(<userid>,
> > > <dbid>, SELECT s.queryid FROM pg_stat_statements AS s WHERE
> > > s.query = 'SELECT $1 AS "ONE"');  now, if the query doesn't return any
> > > row, we will remove the stats for all queries that belong to
> > > (userid, dbid).  It could be surprising for some users, which is why we
> > > came up with option-4, where we keep the default value of the parameters as
> > > 0.
> >
> > Right, I think option 4 is a clear improvement over option 1.  I can get
> > behind that one.  Since not many people care to vote, I think this tips
> > the scales enough to that side.
> >
>
> Agreed.  Hari, can you change the patch as per the requirements of
> option-4.
>

Apologies for the delay. Thanks for all your opinions.

Attached is the updated patch as per option-4; I also added a test to verify
the function's strictness.
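
For illustration, with the option-4 defaults of 0 meaning "all", usage would
look roughly like this (a sketch of the interface under discussion -- the
authoritative definition is in the attached patch, and the oid values below
are made up):

-- Reset statistics for one specific statement of a given role and database.
SELECT pg_stat_statements_reset(
         10,                          -- userid (0 = all roles)
         16384,                       -- dbid   (0 = all databases)
         (SELECT s.queryid
            FROM pg_stat_statements AS s
           WHERE s.query = 'SELECT $1 AS "ONE"'));

-- With no arguments all three parameters default to 0, so everything is reset.
SELECT pg_stat_statements_reset();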

Regards,
Haribabu Kommi
Fujitsu Australia


0001-Extend-pg_stat_statements_reset-to-reset-statistics_v14.patch


Re: New function pg_stat_statements_reset_query() to reset statistics of a specific query

2018-12-12 Thread Amit Kapila
On Wed, Dec 12, 2018 at 3:54 PM Haribabu Kommi  wrote:
>
>
> On Thu, Nov 29, 2018 at 1:57 PM Amit Kapila  wrote:
>>
>> On Wed, Nov 28, 2018 at 7:13 PM Alvaro Herrera  
>> wrote:
>> >
>> > On 2018-Nov-28, Amit Kapila wrote:
>> >
>> > > The problem with this idea is that if someone specifies a particular
>> > > parameter using a query and the query doesn't return any rows,
>> > > then it can lead to inadvertent behavior.  For example, if a user uses
>> > > something like pg_stat_statements_reset(<userid>,
>> > > <dbid>, SELECT s.queryid FROM pg_stat_statements AS s WHERE
>> > > s.query = 'SELECT $1 AS "ONE"');  now, if the query doesn't return any
>> > > row, we will remove the stats for all queries that belong to
>> > > (userid, dbid).  It could be surprising for some users, which is why we
>> > > came up with option-4, where we keep the default value of the parameters as
>> > > 0.
>> >
>> > Right, I think option 4 is a clear improvement over option 1.  I can get
>> > behind that one.  Since not many people care to vote, I think this tips
>> > the scales enough to that side.
>> >
>>
>> Agreed.  Hari, can you change the patch as per the requirements of option-4.
>
>
> Apologies for the delay. Thanks for all your opinions.
>
> Attached is the updated patch as per option-4; I also added a test to verify
> the function's strictness.
>

Thanks for the updated patch. I will look into it in the next few days.



-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Add timeline to partial WAL segments

2018-12-12 Thread David Steele
On 12/12/18 12:14 AM, Michael Paquier wrote:
> On Tue, Dec 11, 2018 at 11:27:30PM -0500, David Steele wrote:
>> And yeah, I'm not sure that I'm totally sold on this idea either, but I
>> wanted to get a conversation going.
> 
> Another idea which I think we could live with is to embed within the
> segment file name the LSN switch point, in a format consistent with
> backup history files.  And as a bonus it could be used for sanity
> checks.

The LSN switch point is often the same even when servers are going to
different timelines.  If the LSN is different enough then the problem
solves itself since the .partial will be on an entirely different segment.

But, we could at least use the .<timeline> notation and end up with something like
000100010001.0002.partial or perhaps
000100010001.T0002.partial?  Maybe
000100010001.0002.tpartial?

I can't decide whether the .partial files generated by pg_receivewal are
a problem or not.  It seems to me that only the original cluster is
going to be able to stream this file -- once the segment is renamed to
.partial it won't be visible to pg_receivexlog.  So, .partial without a
timeline extension just means partial from the original timeline.

There's another difference.  The .partial generated by pg_receivewal is
an actively-worked-on file whereas the .partial generated by a promotion
is probably headed for oblivion.  I haven't seen a single case where one
was used in an actual recovery (which doesn't mean it hasn't happened,
of course).

> (I am completely sold to the idea of prioritizing file types in the
> archiver.)

OK, I'll work up a patch for that, then.  It doesn't solve the .partial
problem, but it does reduce the likelihood that two servers will end up
on the same timeline.

-- 
-David
da...@pgmasters.net



Re: Updated backup APIs for non-exclusive backups

2018-12-12 Thread David Steele
On 12/12/18 12:58 AM, Laurenz Albe wrote:
> On Tue, 2018-12-11 at 23:43 -0500, David Steele wrote:
> [about managing backups from pre- and post-file-system-backup scripts]
>>> I have come up with some sample code here:
>>> https://github.com/cybertec-postgresql/safe-backup
>>>
>>> This just uses bash and psql.
>>> Does that look reasonably safe?
>>>
>>> It's probably too big to be introduced into the documentation, but maybe
>>> we could add it to the Wiki.
>>
>> My bash-fu is not all that great, but it seems to me you could do this
>> all in one script and forgo the table entirely.  Just do the file copy
>> at about line 60 in pgpre.sh.
>>
>> It does look workable to me, just wonder if it could be simpler.
> 
> Thanks for having a look.  Yes, it looks appallingly complicated.
> 
> Sure, if I knew there was a file to write, and I knew where to write
> it, I could do it in the "pre" script.  

Good point!

> But since I have no idea how the
> actual backup is performed and how the "backup_label" file is going to
> be saved, I thought it best to return the information to the caller and
> persist it somewhere, and only the "post" script can actually return the
> information.

Perhaps it could be done in a bash file that is included in the user's
script, rather than having separate pre/post scripts.

So, they would have start and stop backup methods they could call that
would manage the psql process in the background.  In that case it should
be possible to eliminate the table.

I do see the table as a weak spot.  It appears to work, but does add a
lot of complexity.  Limiting to one backup at a time can be a feature,
though.  In pgBackRest we use advisory locks to enforce this.
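
For context, the session such start/stop methods would have to keep open is
essentially this (a minimal sketch of the non-exclusive backup calls; the
label is arbitrary and the file-level copy in between is whatever tool the
user runs):

-- Must stay in the same session until pg_stop_backup() is called.
SELECT pg_start_backup('nightly', false, false);   -- non-exclusive mode

-- ... file-system level copy of the data directory happens here ...

-- Returns the stop LSN plus the backup_label and tablespace_map contents,
-- which the caller has to store with the backup itself.
SELECT * FROM pg_stop_backup(false);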

-- 
-David
da...@pgmasters.net



Re: Connections hang indefinitely while taking a gin index's LWLock buffer_content lock

2018-12-12 Thread Andrey Borodin
Hi!

> On Dec 12, 2018, at 3:26, Alexander Korotkov wrote:
> 
> BTW, I still can't see you're setting deleteXid field of
> ginxlogDeletePage struct anywhere.
Oops. Fixed.
> 
> Also, I note that protection against re-usage of recently deleted
> pages is implemented differently than it is in B-tree.
> 1) You check TransactionIdPrecedes(GinPageGetDeleteXid(page),
> RecentGlobalDataXmin)), while B-tree checks
> TransactionIdPrecedes(opaque->btpo.xact, RecentGlobalXmin).  Could you
> clarify why do we use RecentGlobalDataXmin instead of RecentGlobalXmin
> here?
As far as I understand, RecentGlobalDataXmin may be slightly ahead of 
RecentGlobalXmin in the case where replication_slot_catalog_xmin is holding 
back RecentGlobalXmin. And GIN is never used by catalog tables. But let's use 
RecentGlobalXmin, like B-tree does.

> 2) B-tree checks this condition both before putting page into FSM and
> after getting page from FSM.  You're checking only after getting page
> from FSM.  Approach of B-tree looks better for me.  It's seems more
> consistent when FSM pages are really free for usage.
Fixed.


Best regards, Andrey Borodin.


0001-Stamp-deleted-GIN-page-with-xid-v3.patch


Re: Remove Deprecated Exclusive Backup Mode

2018-12-12 Thread David Steele
On 12/11/18 11:31 PM, Robert Haas wrote:
> On Wed, Dec 12, 2018 at 1:23 PM David Steele  wrote:
>> I didn't get the impression that Peter was against, he just thought that
>> it needed to stand on its own, rather than be justified by the
>> recovery.conf changes, which I agree with.
>>
>> Simon rather clearly said that he thinks we should wait until the next
>> release, which I don't see as being entirely against.
> 
> Well, nobody is saying that we should NEVER remove this.  The
> discussion is about what to do in v12.

Fair enough.

> Most of the features I've been involved in removing have been
> deprecated for 5+ years.  The first release where this one was
> deprecated was only 2 years ago.  So it feels dramatically faster to
> me than what I think we have typically done.

I think this case is different because exclusive backups have known issues.

I also think that having two similar but different methods stifles
development in this area and makes the docs harder to improve.  There's
a lot of "if this then that else that" in the docs which make them hard
to maintain and harder to read.

Add to that the fact that we have *zero* tests for exclusive backups.  I only
had to refactor one exclusive backup in the tests, and since it did not
archive, exclude pg_wal or postmaster.pid, or do anything else our docs
recommend, I wouldn't say it qualifies as a real test.  Also, it wasn't
even trying to test exclusive backups -- it was a test for logical
replication following timelines.

So yeah, we'd be removing it in three years instead of five, but there
has been a safe in-core alternative since 9.1 (eight years).

Regards,
-- 
-David
da...@pgmasters.net



Re: Record last password change

2018-12-12 Thread Bear Giles
Could you add your own UPDATE trigger to the password table? It can write
an entry to a new table, e.g., (userid, current date) whenever a record in
that table is modified.
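
If the passwords live in an ordinary application table (triggers cannot be
defined on system catalogs such as pg_authid), a minimal sketch of such an
audit trigger might look like this -- all table and column names here are
illustrative:

CREATE TABLE password_change_log (
    userid     integer     NOT NULL,
    changed_at timestamptz NOT NULL DEFAULT now()
);

CREATE FUNCTION log_password_change() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- Record only actual password changes, not unrelated updates.
    IF NEW.password IS DISTINCT FROM OLD.password THEN
        INSERT INTO password_change_log (userid) VALUES (NEW.userid);
    END IF;
    RETURN NEW;
END;
$$;

CREATE TRIGGER password_changed
    AFTER UPDATE ON app_users
    FOR EACH ROW EXECUTE PROCEDURE log_password_change();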

On an earlier question - the issue isn't whether someone can crack your
password, it's possible disclosure in archive media somewhere. E.g., a
classic example is someone who accidentally commits source code that contains
a password and then reverts it. It's not in the current source code, but it
could be retrieved by anyone with access to the commit history unless a lot
of effort is spent scrubbing it (which is not always possible). If you change
your password every few months this will soon be a moot issue, even if whoever
finds it doesn't mention it to someone who can change the password immediately.

A more subtle point is backups. An attacker might have had access to
encrypted backups (or regular backups containing encrypted records) for a
very long time and held onto them against the chance discovery of the
password. Once they learn it they have access to all of that data. If you
rotate the passwords they might have access to a few months of data but no
more than that. It's bad, but a few months is far better than a few years
if your data contains information that requires notification of everyone
affected and the offer of credit monitoring, etc.

I agree that people may choose bad passwords if forced to change them too
frequently but I'm in the camp that says it's fine to use a password
manager or even to write them down on a card kept in the person's wallet.

BTW another solution is SSO, e.g., Kerberos. I still need to submit a patch
to pgsql to handle it better(*) but with postgresql itself you sign into
the system and then the database server will just know who you are. You
don't have to worry about remembering a new password for postgresql. X.509
(digital certs) are another possibility and I know you can tie them to a
smart card but again I don't know how well we could integrate it into pgsql.

(*) I haven't looked at the code recently but the last time I checked pgsql
used the username/password combo. Enterprise environments usually use
keytab files instead of (u/p). It should also be smart enough to check if
the user already has a kerberos ticket and use it if nothing else is
specified. The latter would usually work with people. (I'm not sure what
happens in a more secure environment where the database expects the service
to be specified as well - users would need the username/postgresql@REALM
identity, not the more generic username@REALM identity.)



On Tue, Dec 11, 2018 at 12:04 PM Chapman Flack 
wrote:

> On 12/11/18 9:56 AM, Tom Lane wrote:
> > I've heard that if you want to implement a password aging policy, PAM
> > authentication can manage that for you; but I don't know the details.
>
> Interesting idea ... could use pam-pgsql[1] and PAM as the
> authentication method. Might result in another connection (from PAM)
> to authenticate every connection, though. I suppose the module could
> use a daemon keeping one connection open for auth queries, but the
> README doesn't *say* it does. Could set up a pooler just for the auth
> module to connect through, I guess.
>
> It allows you to configure arbitrary auth_query, acct_query, pwd_query,
> etc., so you could conceivably join pg_authid with some other table
> where you'd keep expiration info.
>
> Looks like our PAM authentication might not support some PAM
> capabilities like conducting additional message exchanges (for
> example, to prompt for a new password on the spot if the old
> one has expired).
>
> It might be possible to shoehorn that capability into the existing
> fe-be protocol by calling it a custom SASL method, something analogous
> to ssh's "keyboard-interactive"[2].
>
> -Chap
>
>
> [1] https://github.com/pam-pgsql/pam-pgsql
> [2] https://tools.ietf.org/html/rfc4256
>
>


Re: [HACKERS] Can ICU be used for a database's default sort order?

2018-12-12 Thread Daniel Verite
Peter Eisentraut wrote:

> Another issue is that we'd need to carefully divide up the role of the
> "default" collation and the "default" provider.  The default collation
> is the collation defined for the database, the default provider means to
> use the libc non-locale_t enabled API functions.  Right now these are
> always the same, but if the database-global locale is ICU, then the
> default collation would use the ICU provider.

I think one related issue that the patch works around by using a libc locale
as a proxy is knowing what to put into libc's LC_CTYPE and LC_COLLATE.
In fact I've been wondering if that's the main reason for the interface
implemented by the patch.

Otherwise, how should these env variables be initialized for ICU
databases?
For instance in the existing FTS code, lowerstr_with_len() in
tsearch/ts_locale.c calls tolower() or towlower() to fold a string to
lower case when normalizing lexemes. This requires LC_CTYPE to be set
to something compatible with the database encoding, at the very
least. Even if that code looks like it might need to be changed for
ICU anyway (or just to be collation-aware according to the TODO marks?),
what about comparable calls in extensions?

In the case that we don't touch libc's LC_COLLATE/LC_CTYPE in backends,
extension code would have them inherited from the postmaster? Does that
sound acceptable? If not, maybe ICU databases should have these as
settable options, in addition to their ICU locale?


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: Reorganize collation lookup time and place

2018-12-12 Thread Andreas Karlsson

On 12/10/18 4:12 PM, Peter Eisentraut wrote:

Attached is a patch that reorganizes how and where collations are looked up.

Until now, a fmgr C function that makes use of collation information
(for example varstr_cmp(), str_toupper()) gets passed the collation OID,
looks up the collation with pg_newlocale_from_collation(), and then does
something with it.

With this change, the lookup is moved earlier, typically during executor
initialization.  The fmgr functions receive the locale pointer that is
the result of that lookup.


Sounds like a great plan. I too feel that having the lookups there 
makes more logical sense.


But when taking a look at your patch I got a segfault when running 
initdb. See the stack trace below. My compiler is "gcc (Debian 8.2.0-9) 
8.2.0" and the locale when running initdb is glibc's "en_US.utf8".


#0  __GI___strcoll_l (s1=0x55d64555d998 "({CONST :consttype 1009 
:consttypmod -1 :constcollid 100 :constlen -1 :constbyval false 
:constisnull false :location 160 :constvalue 16 [ 64 0 0 0 0 0 0 0 0 0 0 
0 25 0 0 0 ]})", s2=0x55d64555ddb0 "({CONST :consttype 1009 :consttypmod 
-1 :constcollid 100 :constlen -1 :constbyval false :constisnull false 
:location 161 :constvalue 16 [ 64 0 0 0 0 0 0 0 0 0 0 0 25 0 0 0 ]})", 
l=0x0) at strcoll_l.c:259
#1  0x55d644c4d607 in varstrfastcmp_locale (x=94378777346072, 
y=94378777347120, ssup=0x7ffea6a64990) at varlena.c:2185
#2  0x55d644958dad in ApplySortComparator (ssup=0x7ffea6a64990, 
isNull2=false, datum2=, isNull1=false, datum1=out>) at ../../../src/include/utils/sortsupport.h:224
#3  compare_scalars (a=, b=, 
arg=0x7ffea6a64980) at analyze.c:2784
#4  0x55d644cba494 in qsort_arg (a=a@entry=0x55d6456ee5f0, 
n=n@entry=14, es=es@entry=16, cmp=cmp@entry=0x55d644958d82 
, arg=arg@entry=0x7ffea6a64980) at qsort_arg.c:140
#5  0x55d64495b833 in compute_scalar_stats (stats=0x55d6456c6c88, 
fetchfunc=0x55d6449597f1 , samplerows=2904, 
totalrows=2904) at analyze.c:2360
#6  0x55d64495cca3 in do_analyze_rel 
(onerel=onerel@entry=0x7f3424bdbd50, options=options@entry=2, 
params=params@entry=0x7ffea6a64d90, va_cols=va_cols@entry=0x0, 
acquirefunc=0x55d64495901b , relpages=77, 
inh=false, in_outer_xact=false, elevel=13) at analyze.c:527
#7  0x55d64495d2ec in analyze_rel (relid=, 
relation=0x0, options=options@entry=2, 
params=params@entry=0x7ffea6a64d90, va_cols=0x0, 
in_outer_xact=, bstrategy=0x55d6455413a8) at analyze.c:258
#8  0x55d6449d61b1 in vacuum (options=2, relations=0x55d645541520, 
params=params@entry=0x7ffea6a64d90, bstrategy=, 
bstrategy@entry=0x0, isTopLevel=) at vacuum.c:357
#9  0x55d6449d644c in ExecVacuum 
(vacstmt=vacstmt@entry=0x55d645442e50, isTopLevel=isTopLevel@entry=true) 
at vacuum.c:141
#10 0x55d644b5ec9d in standard_ProcessUtility (pstmt=0x55d6454431a0, 
queryString=0x55d645442468 "ANALYZE;\n", 
context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, 
dest=0x55d644f50b00 , completionTag=0x7ffea6a650d0 "") at 
utility.c:670
#11 0x55d644b5f1cc in ProcessUtility 
(pstmt=pstmt@entry=0x55d6454431a0, queryString=, 
context=, params=, queryEnv=out>, dest=dest@entry=0x55d644f50b00 , 
completionTag=0x7ffea6a650d0 "") at utility.c:360
#12 0x55d644b5b582 in PortalRunUtility 
(portal=portal@entry=0x55d645434428, pstmt=pstmt@entry=0x55d6454431a0, 
isTopLevel=isTopLevel@entry=true, 
setHoldSnapshot=setHoldSnapshot@entry=false, 
dest=dest@entry=0x55d644f50b00 , 
completionTag=completionTag@entry=0x7ffea6a650d0 "") at pquery.c:1175
#13 0x55d644b5c1fd in PortalRunMulti 
(portal=portal@entry=0x55d645434428, isTopLevel=isTopLevel@entry=true, 
setHoldSnapshot=setHoldSnapshot@entry=false, 
dest=dest@entry=0x55d644f50b00 , 
altdest=altdest@entry=0x55d644f50b00 , 
completionTag=completionTag@entry=0x7ffea6a650d0 "") at pquery.c:1321
#14 0x55d644b5cfbb in PortalRun (portal=portal@entry=0x55d645434428, 
count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true, 
run_once=run_once@entry=true, dest=dest@entry=0x55d644f50b00 
, altdest=altdest@entry=0x55d644f50b00 , 
completionTag=0x7ffea6a650d0 "") at pquery.c:796
#15 0x55d644b591b8 in exec_simple_query 
(query_string=query_string@entry=0x55d645442468 "ANALYZE;\n") at 
postgres.c:1215
#16 0x55d644b5b0dd in PostgresMain (argc=, 
argv=, dbname=, username=) 
at postgres.c:4256

#17 0x55d644a3abf9 in main (argc=10, argv=0x55d6453c3490) at main.c:224





Re: Making WAL receiver startup rely on GUC context for primary_conninfo and primary_slot_name

2018-12-12 Thread Sergei Kornilov
Hello

> This allows the state of walrcv and startup to diverge, they could e.g. have 
> different configuration, due to differently time config reloads.
So we have exactly the same problem with, for example, hot_standby_feedback, right?
We change the hot_standby_feedback value, reload it, and then 'show 
hot_standby_feedback' can differ from what the currently running walreceiver uses.
But why do we have nothing about hot_standby_feedback in 
pg_stat_get_wal_receiver()?
Where is the difference?
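
For context, the walreceiver-side settings under discussion are the ones exposed
through the pg_stat_wal_receiver view (built on pg_stat_get_wal_receiver()); a
plain query against it, shown only as a reference:

SELECT pid, status, slot_name, conninfo
  FROM pg_stat_wal_receiver;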

regards, Sergei



Upgrading pg_statistic to handle collation honestly

2018-12-12 Thread Tom Lane
When we first put in collations support, we basically punted on teaching
ANALYZE, pg_statistic, and the planner selectivity functions about that.
They just use DEFAULT_COLLATION_OID independently of the actual collation
of the data.  I tripped over this while investigating making type "name"
collatable: it needs to default to C_COLLATION_OID, and the mismatch
resulted in broken statistics for name columns.  So it's time to pay down
that technical debt.

Attached is a draft patch for same.  It adds storage to pg_statistic
to record the collation of each statistics "slot".  A plausible
alternative design would be to just say "look at the collation of the
underlying column", but that would require extra catcache lookups in
the selectivity functions that need the info.  Doing it like this also
makes it theoretically feasible to track stats computed with respect
to different collations for the same column, though I'm not really
convinced that we'd ever do that.
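
For example, with the patch applied one could inspect which collation a
column's statistics slots were computed under with a query along these lines
(a sketch; 'some_table' is a placeholder, and the stacoll1..N columns are the
ones this patch adds):

SELECT a.attname, s.stakind1, c.collname
  FROM pg_statistic s
  JOIN pg_attribute a ON a.attrelid = s.starelid AND a.attnum = s.staattnum
  LEFT JOIN pg_collation c ON c.oid = s.stacoll1
 WHERE s.starelid = 'some_table'::regclass;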

Loose ends:

* I'm not sure what, if anything, needs to be done in the extended
statistics stuff.  It looks like the existing types of extended stats
aren't really collation sensitive, so maybe the answer is "nothing".

* There's a remaining use of DEFAULT_COLLATION_OID in array_selfuncs.c's
element_compare().  I'm not sure if it's important to get rid of that,
either; it doesn't seem to be used for anything that relates to
collected statistics, so it might be fine as-is.

* Probably this conflicts to some extent with Peter's "Reorganize
collation lookup" patch, but I haven't studied that yet.

* There's a kluge in get_attstatsslot() that I'd like to get rid of
later, but it's necessary for now because of the weird things that
happen when doing regex operators on "name" columns.

Comments, objections?

regards, tom lane

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 18c38e4..af4d062 100644
*** a/doc/src/sgml/catalogs.sgml
--- b/doc/src/sgml/catalogs.sgml
*** SCRAM-SHA-256$attrtypid = exprType(index_expr);
  		stats->attrtypmod = exprTypmod(index_expr);
+ 		stats->attrcollid = exprCollation(index_expr);
+ 		/* XXX should we consult indcollation instead? */
  	}
  	else
  	{
  		stats->attrtypid = attr->atttypid;
  		stats->attrtypmod = attr->atttypmod;
+ 		stats->attrcollid = attr->attcollation;
  	}
  
  	typtuple = SearchSysCacheCopy1(TYPEOID,
*** update_attstats(Oid relid, bool inh, int
*** 1553,1558 
--- 1556,1566 
  		{
  			values[i++] = ObjectIdGetDatum(stats->staop[k]);	/* staopN */
  		}
+ 		i = Anum_pg_statistic_stacoll1 - 1;
+ 		for (k = 0; k < STATISTIC_NUM_SLOTS; k++)
+ 		{
+ 			values[i++] = ObjectIdGetDatum(stats->stacoll[k]);	/* stacollN */
+ 		}
  		i = Anum_pg_statistic_stanumbers1 - 1;
  		for (k = 0; k < STATISTIC_NUM_SLOTS; k++)
  		{
*** compute_distinct_stats(VacAttrStatsP sta
*** 1993,2001 
  		firstcount1 = track_cnt;
  		for (j = 0; j < track_cnt; j++)
  		{
- 			/* We always use the default collation for statistics */
  			if (DatumGetBool(FunctionCall2Coll(&f_cmpeq,
! 			   DEFAULT_COLLATION_OID,
  			   value, track[j].value)))
  			{
  match = true;
--- 2001,2008 
  		firstcount1 = track_cnt;
  		for (j = 0; j < track_cnt; j++)
  		{
  			if (DatumGetBool(FunctionCall2Coll(&f_cmpeq,
! 			   stats->attrcollid,
  			   value, track[j].value)))
  			{
  match = true;
*** compute_distinct_stats(VacAttrStatsP sta
*** 2202,2207 
--- 2209,2215 
  
  			stats->stakind[0] = STATISTIC_KIND_MCV;
  			stats->staop[0] = mystats->eqopr;
+ 			stats->stacoll[0] = stats->attrcollid;
  			stats->stanumbers[0] = mcv_freqs;
  			stats->numnumbers[0] = num_mcv;
  			stats->stavalues[0] = mcv_values;
*** compute_scalar_stats(VacAttrStatsP stats
*** 2273,2280 
  
  	memset(&ssup, 0, sizeof(ssup));
  	ssup.ssup_cxt = CurrentMemoryContext;
! 	/* We always use the default collation for statistics */
! 	ssup.ssup_collation = DEFAULT_COLLATION_OID;
  	ssup.ssup_nulls_first = false;
  
  	/*
--- 2281,2287 
  
  	memset(&ssup, 0, sizeof(ssup));
  	ssup.ssup_cxt = CurrentMemoryContext;
! 	ssup.ssup_collation = stats->attrcollid;
  	ssup.ssu

Re: Cache lookup errors with functions manipulation object addresses

2018-12-12 Thread Alvaro Herrera
On 2018-Sep-18, Michael Paquier wrote:

> On Fri, Sep 14, 2018 at 12:07:23PM -0300, Alvaro Herrera wrote:
> > On 2018-Sep-14, Alvaro Herrera wrote:
> >> I haven't looked at 0003 yet.
> 
> Thanks for the review.  I have pushed 0002 for now.  I had one doubt
> about 0001 though.  So as to avoid redesigning the APIs for FDWs and
> servers again in the future, wouldn't we want to give them a uint32
> flags which can be in with more than one option.  There would be only
> one option for now, which is what I have done in the attached.

Agreed.

I think defining bit flags in an enum is weird; I'd use defines instead.
And (a purely stylistic point) I'd use bits32 rather than uint32.  (I'm
probably alone in this opinion, since almost every interface passing
flags uses unsigned int of some size.  But "bits" types are defined
precisely for this purpose!) Compare a61f5ab98638.

I support 0001 other than those two points.

> > It's strange that pg_identify_object returns empty type in only some
> > cases (as seen in the regression test output) ...
> 
> Are you referring to something in particular?

Yeah, these two cases:

+-- Keep those checks in the same order as getObjectTypeDescription()
+SELECT * FROM pg_identify_object_as_address('pg_class'::regclass, 0, 0); -- no relation
+ type | object_names | object_args 
+------+--------------+-------------
+      |              | 
+(1 row)
+
+SELECT * FROM pg_identify_object_as_address('pg_proc'::regclass, 0, 0); -- no function
+ type | object_names | object_args 
+------+--------------+-------------
+      | {}           | {}
+(1 row)

All the other cases you add have a non-empty value in "type".
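
For contrast, a valid object yields a populated "type" column with the
existing function (standard usage, shown only for comparison):

SELECT type, object_names, object_args
  FROM pg_identify_object_as_address('pg_class'::regclass,
                                     'pg_class'::regclass::oid, 0);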

> All the routines used in objectaddress.c have been part of core for some
> time now; in particular, the handling of operators, functions
> and types has been sort of messy.

I think this is our chance to fix the mess.  Previously (before I added
the SQL-access of the object addressing mechanism in 9.5 I mean) it
wasn't possible to get at this code at all with arbitrary input, so this
is all a "new" problem (I mean new in the last 5 years).

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Can ICU be used for a database's default sort order?

2018-12-12 Thread Peter Eisentraut
On 12/12/2018 15:57, Daniel Verite wrote:
> I think one related issue that the patch works around by using a libc locale
> as a proxy is knowing what to put into libc's LC_CTYPE and LC_COLLATE.
> In fact I've been wondering if that's the main reason for the interface
> implemented by the patch.

So it seems, but then it should be called out more clearly.

> Otherwise, how should these env variables be initialized for ICU
> databases?

I think when using ICU by default, then it should not matter because we
shouldn't be calling any libc functions that use those settings.  Maybe
there need to be some exceptions, but again we should call those out
more clearly.

We could set them to "C" for consistency perhaps.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Reorganize collation lookup time and place

2018-12-12 Thread Andres Freund
Hi,

On 2018-12-10 16:12:54 +0100, Peter Eisentraut wrote:
> Attached is a patch that reorganizes how and where collations are looked up.
> 
> Until now, a fmgr C function that makes use of collation information
> (for example varstr_cmp(), str_toupper()) gets passed the collation OID,
> looks up the collation with pg_newlocale_from_collation(), and then does
> something with it.
> 
> With this change, the lookup is moved earlier, typically during executor
> initialization.  The fmgr functions receive the locale pointer that is
> the result of that lookup.
> 
> This makes the collation lookup more similar to the function lookup.  In
> many cases they now happen right next to each other.
> pg_newlocale_from_collation() becomes analogous to fmgr_info() and
> pg_locale_t to FmgrInfo *.

Why isn't this integrated into fmgr_info()?

Greetings,

Andres Freund



Re: COPY FROM WHEN condition

2018-12-12 Thread Tomas Vondra


On 12/12/18 7:05 AM, Surafel Temesgen wrote:
> 
> 
> On Tue, Dec 4, 2018 at 12:44 PM Alvaro Herrera wrote:
> 
> After reading this thread, I think I like WHERE better than FILTER.
> Tally:
> 
> WHERE: Adam Berlin, Lim Myungkyu, Dean Rasheed, yours truly
> FILTER: Tomas Vondra, Surafel Temesgen
> 
> 
> 
> Accepting the tally result, I changed the keyword to WHERE and
> allowed volatile functions with the single-insertion method.

Can you also update the docs to mention that the functions called from
the WHERE clause do not see the effects of the COPY itself?
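
For reference, the syntax being settled on would look something like this (a
sketch based on this thread; table and file names are made up):

-- Load only rows satisfying the condition; note that rows inserted by this
-- COPY itself are not visible to the WHERE expression.
COPY measurements (city, reading, logged_at)
    FROM '/tmp/measurements.csv'
    WITH (FORMAT csv)
    WHERE reading IS NOT NULL AND logged_at >= '2018-01-01';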

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Bogus EPQ plan construction in postgres_fdw

2018-12-12 Thread Tom Lane
By chance I noticed that postgres_fdw's postgresGetForeignPlan() assumes
--- without any checking --- that the outer_plan it's given for a join
relation must have a NestLoop, MergeJoin, or HashJoin node at the top.
That's been wrong at least since commit 4bbf6edfb (which could cause
insertion of a Sort node on top) and it seems like a pretty unsafe
thing to Just Assume even without that.

There are two ways in which this faulty assumption is materialized.
One is that a new targetlist is jammed into the plan node without
any check as to whether it's really safe to do that, ie whether the
node is projection-capable or not.  Of course, a Sort node is *not*
projection-capable, meaning that we get a broken plan tree that will
not deliver the set of columns that postgres_fdw wanted.

This would be a security problem, similar to other recent problems where
the output datatypes of a plan node are misidentified, except that through
blind good fortune it seems impossible to reach the problem at execution.
The jammed-in targetlist is always identical to the subplan's original
targetlist (i.e., it's the Vars-only targetlist the planner constructed
for the joinrel) except at the topmost join level --- and if this is
happening at the topmost level, then the foreign join must encompass all
relations in the query.  That means that an EPQ recheck will never happen
(since EPQ could fire only if there's at least one non-postgres_fdw
relation involved), so the broken plan tree will never be executed.

There remains a cosmetic problem, which is that since the Sort node's
tlist doesn't match reality, EXPLAIN sometimes gets confused and prints
garbage info about the Sort's sortkeys.  This is in fact visible in
current regression test results:

   ->  Sort
 Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
 Sort Key: t1.c3 USING <, t1.c1

The "USING <" bit should only appear if the sort key is using a nondefault
sort operator, which surely isn't the case here, so why's that there?
The reason is that EXPLAIN is looking at the bogus tlist and drawing the
wrong conclusion about the sort key's datatype, which naturally leads it
to draw the wrong conclusion about whether the operator is the default.

The second way in which postgresGetForeignPlan() is broken is that it
does this without any check on the plan node's actual type:

if (join_plan->jointype == JOIN_INNER)
join_plan->joinqual = list_delete(join_plan->joinqual,
  qual);

This would certainly lead to a crash if list_delete were invoked on
some bits that weren't a pointer-to-List.  In the case of a Sort node,
we're again escaping bad consequences through blind good fortune:
Join.jointype is at the same struct offset as Sort.numCols, which will
never be zero.  Hence, if JoinType is the same size as int, or if it's
smaller but the machine is little-endian, jointype will not read as
JOIN_INNER (zero) so nothing bad happens.  On a big-endian machine where
enum types can be smaller than int, this'd likely crash in list_delete.
The fact that none of our big-endian buildfarm critters have failed on
the regression tests suggests that that combination may not exist in
reality.  (We'd also have a problem if the plan node was neither a join
nor a Sort, but I'm unsure whether such a case is actually reachable.)

Hence, I'm planning to apply the attached, which removes both of the
unwarranted assumptions.  In the qual-deletion step, we can just not
try to delete quals if it's not recognizably a join node; that's only
a minor optimization that we can skip.  The other problem can be fixed
by checking whether the node is projection-capable and inserting a
Result node if it isn't.  createplan.c has logic for exactly that,
but it wasn't exposed in a usable form, so I modified createplan.c
to provide a function for this.

regards, tom lane

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index dfa6201..666548b 100644
*** a/contrib/postgres_fdw/expected/postgres_fdw.out
--- b/contrib/postgres_fdw/expected/postgres_fdw.out
*** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2
*** 1701,1725 
 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
 Remote SQL: SELECT r1."C 1", r2."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1" ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1
!->  Sort
!  Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!  Sort Key: t1.c3 USING <, t1.c1
!   

Minimal logical decoding on standbys

2018-12-12 Thread Andres Freund
Hi,

Craig previously worked on $subject, see thread [1].  A bunch of the
prerequisite features from that and other related threads have been
integrated into PG.  What's missing is actually allowing logical
decoding on a standby.  The latest patch from that thread does that [2],
but unfortunately hasn't been updated after slipping v10.

The biggest remaining issue to allow it is that the catalog xmin on the
primary has to be above the catalog xmin horizon of all slots on the
standby. The patch in [2] does so by periodically logging a new record
that announces the current catalog xmin horizon.   Additionally it
checks that hot_standby_feedback is enabled when doing logical decoding
from a standby.

I don't like the approach of managing the catalog horizon via those
periodically logged catalog xmin announcements.  I think we instead
should build ontop of the records we already have and use to compute
snapshot conflicts.  As of HEAD we don't know whether such tables are
catalog tables, but that's just a bool that we need to include in the
records, a basically immeasurable overhead given the size of those
records.

I also don't think we should actually enforce hot_standby_feedback being
enabled - there's use-cases where that's not convenient, and it's not
bullet proof anyway (can be enabled/disabled without using logical
decoding inbetween).  I think when there's a conflict we should have the
HINT mention that hs_feedback can be used to prevent such conflicts,
that ought to be enough.

Attached is a rough draft patch. If we were to go for this approach,
we'd obviously need to improve the actual conflict handling against
slots - right now it just logs a WARNING and retries shortly after.

I think there's currently one hole in this approach. Nbtree (and other
index types, which are pretty unlikely to matter here) have this logic
to handle snapshot conflicts for single-page deletions:


/*
 * If we have any conflict processing to do, it must happen before we
 * update the page.
 *
 * Btree delete records can conflict with standby queries.  You might
 * think that vacuum records would conflict as well, but we've handled
 * that already.  XLOG_HEAP2_CLEANUP_INFO records provide the highest xid
 * cleaned by the vacuum of the heap and so we can resolve any conflicts
 * just once when that arrives.  After that we know that no conflicts
 * exist from individual btree vacuum records on that index.
 */
if (InHotStandby)
{
    TransactionId latestRemovedXid = btree_xlog_delete_get_latestRemovedXid(record);
    RelFileNode rnode;

    XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);

    ResolveRecoveryConflictWithSnapshot(latestRemovedXid,
                                        xlrec->onCatalogTable, rnode);
}

I.e. we get the latest removed xid from the heap, which has the
following logic:
/*
 * If there's nothing running on the standby we don't need to derive a
 * full latestRemovedXid value, so use a fast path out of here.  This
 * returns InvalidTransactionId, and so will conflict with all HS
 * transactions; but since we just worked out that that's zero people,
 * it's OK.
 *
 * XXX There is a race condition here, which is that a new backend might
 * start just after we look.  If so, it cannot need to conflict, but this
 * coding will result in throwing a conflict anyway.
 */
if (CountDBBackends(InvalidOid) == 0)
    return latestRemovedXid;

/*
 * In what follows, we have to examine the previous state of the index
 * page, as well as the heap page(s) it points to.  This is only valid if
 * WAL replay has reached a consistent database state; which means that
 * the preceding check is not just an optimization, but is *necessary*.  We
 * won't have let in any user sessions before we reach consistency.
 */
if (!reachedConsistency)
    elog(PANIC, "btree_xlog_delete_get_latestRemovedXid: cannot operate with inconsistent data");

so we wouldn't get a correct xid if nobody is connected to a
database (and, by implication, when not yet consistent).


I'm wondering if it's time to move the latestRemovedXid computation for
this type of record to the primary - it's likely to be cheaper there and
avoids this kind of complication. Secondarily, it'd have the advantage
of making pluggable storage integration easier - there we have the
problem that we don't know which type of relation we're dealing with
during recovery, so such lookups make pluggability harder (zheap just
adds extra flags to signal that, but that's not extensible).

Another alternative would be to just prevent such index deletions for
catalog tables when wal_level 

Re: Introducing SNI in TLS handshake for SSL connections

2018-12-12 Thread Pablo Iranzo Gómez

+++ Andreas Karlsson [11/12/18 18:18 +0100]:
> On 12/11/18 3:52 PM, Pablo Iranzo Gómez wrote:
>> I came to this old thread while trying to figure out how to set up
>> postgres replication behind OpenShift/Kubernetes behind a route
>> (which only forwards 80 or 443 traffic), but it could work if SNI is
>> supported on the client using it.
>>
>> I haven't found any further follow-up on this, but based on the
>> number of posts and questions on many sites about accessing postgres on
>> OpenShift/Kubernetes it could be something good to have supported.
>>
>> Any further information or plans?
>
> I am pretty sure nobody is working on this.
>
> It seems like it would be easy to implement (basically just call
> SSL_set_tlsext_host_name() with the right hostname) with the only
> issue being that we may need to add a new connection string
> parameter[1] because I doubt all users would want SNI enabled by
> default since PostgreSQL itself cannot do anything useful with the
> hostname, only some kind of TLS proxy can. Hopefully there won't be
> much bike shedding about the new connection parameter. :)
>
> Feel free to write a patch if you have the time and submit it to the
> next commitfest[2] for review.


Unfortunately I do not consider myself a coder, so it would be great if there
were some way to 'list' this as a 'nice to have' item so that someone can take
the task and move it forward.

Thanks,
Pablo



> Notes:
>
> 1. List of current options:
> https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-PARAMKEYWORDS
> 2. https://wiki.postgresql.org/wiki/CommitFest
>
> Andreas



--

Pablo Iranzo Gómez (pablo.ira...@redhat.com)  GnuPG: 0x5BD8E1E4
Senior Software Engineer - Solutions Engineering   iranzo @ IRC
RHC{A,SS,DS,VA,E,SA,SP,AOSP}, JBCAA#110-215-852RHCA Level V

Blog: https://iranzo.github.io https://citellus.org




Log a sample of transactions

2018-12-12 Thread Adrien Nayrat
Hello,

Following an idea from Nikolay Samokhvalov[1], I propose this patch to add the
possibility of logging all statements from a fraction of transactions.

I have several questions:
  * Do we want this feature?
  * How can I add tests? It seems hard to handle tests when there is a duration
in the output.
  * Should I handle the case where someone tries to change the setting *inside* a
transaction:

# SET client_min_messages to LOG;
SET
# SET log_transaction_sample_rate to 1;
SET
# BEGIN;
LOG:  duration: 0.364 ms  statement: BEGIN;
BEGIN
# SELECT 1;
LOG:  duration: 0.530 ms  statement: SELECT 1;
-[ RECORD 1 ]
?column? | 1

# SET log_transaction_sample_rate to 0;
LOG:  duration: 0.333 ms  statement: SET log_transaction_sample_rate to 0;
SET
# SELECT 1;
LOG:  duration: 0.586 ms  statement: SELECT 1;
-[ RECORD 1 ]
?column? | 1

Thanks

[1]:
https://www.postgresql.org/message-id/CANNMO%2BLg65EFqHb%2BZYbMLKyE2y498HJzsdFrMnW1dQ6AFJ3Mpw%40mail.gmail.com


-- 
Adrien NAYRAT
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 4a7121a51f..7312b1022f 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5771,6 +5771,24 @@ local0.*/var/log/postgresql

   
 
+ 
+  log_transaction_sample_rate (real)
+  
+   log_transaction_sample_rate configuration parameter
+  
+  
+   
+
+ Set the fraction of transactions whose statements are logged. It applies
+ to each new transaction regardless of their duration. The default is
+ 0, meaning do not log statements from this transaction.
+ Setting this to 1 logs all statements for all transactions.
+ log_transaction_sample_rate is helpful to track a
+ sample of transaction.
+
+   
+  
+
  
 
 
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index d967400384..d622023ebe 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -75,6 +75,7 @@ int			XactIsoLevel;
 
 bool		DefaultXactReadOnly = false;
 bool		XactReadOnly;
+bool		xact_is_sampled;
 
 bool		DefaultXactDeferrable = false;
 bool		XactDeferrable;
@@ -1821,6 +1822,9 @@ StartTransaction(void)
 	s->state = TRANS_START;
 	s->transactionId = InvalidTransactionId;	/* until assigned */
 
+	/* Determine if we log statements in this transaction */
+	xact_is_sampled = random() <= log_xact_sample_rate * MAX_RANDOM_VALUE;
+
 	/*
 	 * initialize current transaction state fields
 	 *
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 5ab7d3cd8d..8f1ae0a5dd 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -100,7 +100,8 @@ int			max_stack_depth = 100;
 /* wait N seconds to allow attach from a debugger */
 int			PostAuthDelay = 0;
 
-
+/* flag for logging statements in this transaction */
+bool		xact_is_sampled = false;
 
 /* 
  *		private variables
@@ -2203,6 +2204,8 @@ check_log_statement(List *stmt_list)
  * check_log_duration
  *		Determine whether current command's duration should be logged.
  *		If log_statement_sample_rate < 1.0, log only a sample.
+ *		We also check if this statement in this transaction must be logged
+ *		(regardless of its duration).
  *
  * Returns:
  *		0 if no logging is needed
@@ -2218,7 +2221,7 @@ check_log_statement(List *stmt_list)
 int
 check_log_duration(char *msec_str, bool was_logged)
 {
-	if (log_duration || log_min_duration_statement >= 0)
+	if (log_duration || log_min_duration_statement >= 0 || xact_is_sampled)
 	{
 		long		secs;
 		int			usecs;
@@ -2252,11 +2255,11 @@ check_log_duration(char *msec_str, bool was_logged)
 			(log_statement_sample_rate == 1 ||
 			 random() <= log_statement_sample_rate * MAX_RANDOM_VALUE);
 
-		if ((exceeded && in_sample) || log_duration)
+		if ((exceeded && in_sample) || log_duration || xact_is_sampled)
 		{
 			snprintf(msec_str, 32, "%ld.%03d",
 	 secs * 1000 + msecs, usecs % 1000);
-			if (exceeded && !was_logged)
+			if ((exceeded || xact_is_sampled) && !was_logged)
 return 2;
 			else
 return 1;
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 6fe1939881..160c725123 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -487,6 +487,7 @@ int			client_min_messages = NOTICE;
 int			log_min_duration_statement = -1;
 int			log_temp_files = -1;
 double		log_statement_sample_rate = 1.0;
+double		log_xact_sample_rate = 0;
 int			trace_recovery_messages = LOG;
 
 int			temp_file_limit = -1;
@@ -3325,13 +3326,25 @@ static struct config_real ConfigureNamesReal[] =
 		{"log_statement_sample_rate", PGC_SUSET, LOGGING_WHEN,
 			gettext_noop("Fraction of statements over log_min_duration_statement to log."),
 			gettext_noop("If you only want a sample, use a value between 0 (never "
-		 "log) and 1.0 (always log).")
+		 "log) and 1 (always log).")
 		},
 		&log_statement_sample_rate,
 		1.0, 0.0, 1.0,
 		NULL, NULL, NULL
 	},
 
+	{

Re: WIP: Avoid creation of the free space map for small tables

2018-12-12 Thread John Naylor
On 11/24/18, Amit Kapila  wrote:
> 4. You have mentioned above that "system catalogs created during
> bootstrap still have a FSM if they have any data." and I can also see
> this behavior, have you investigated this point further?

I found the cause of this. There is some special-case code in md.c to
create any file if it's opened in bootstrap mode. I removed this and a
similar special case (attached), and make check still passes. After
digging through the history, I'm guessing this has been useless code
since about 2001, when certain special catalogs were removed.
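
As a side note, whether a given relation currently has any free space map can
be checked from SQL with the standard size function (table name is illustrative
and this is shown for context only; it reports 0 when the FSM fork does not
exist):

SELECT pg_relation_size('my_small_table', 'fsm');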

-John Naylor
diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index 4c6a50509f..9331c8b1d1 100644
--- a/src/backend/storage/smgr/md.c
+++ b/src/backend/storage/smgr/md.c
@@ -310,13 +310,7 @@ mdcreate(SMgrRelation reln, ForkNumber forkNum, bool isRedo)
 	{
 		int			save_errno = errno;
 
-		/*
-		 * During bootstrap, there are cases where a system relation will be
-		 * accessed (by internal backend processes) before the bootstrap
-		 * script nominally creates it.  Therefore, allow the file to exist
-		 * already, even if isRedo is not set.  (See also mdopen)
-		 */
-		if (isRedo || IsBootstrapProcessingMode())
+		if (isRedo)
 			fd = PathNameOpenFile(path, O_RDWR | PG_BINARY);
 		if (fd < 0)
 		{
@@ -572,26 +566,15 @@ mdopen(SMgrRelation reln, ForkNumber forknum, int behavior)
 
 	if (fd < 0)
 	{
-		/*
-		 * During bootstrap, there are cases where a system relation will be
-		 * accessed (by internal backend processes) before the bootstrap
-		 * script nominally creates it.  Therefore, accept mdopen() as a
-		 * substitute for mdcreate() in bootstrap mode only. (See mdcreate)
-		 */
-		if (IsBootstrapProcessingMode())
-			fd = PathNameOpenFile(path, O_RDWR | O_CREAT | O_EXCL | PG_BINARY);
-		if (fd < 0)
+		if ((behavior & EXTENSION_RETURN_NULL) &&
+			FILE_POSSIBLY_DELETED(errno))
 		{
-			if ((behavior & EXTENSION_RETURN_NULL) &&
-FILE_POSSIBLY_DELETED(errno))
-			{
-pfree(path);
-return NULL;
-			}
-			ereport(ERROR,
-	(errcode_for_file_access(),
-	 errmsg("could not open file \"%s\": %m", path)));
+			pfree(path);
+			return NULL;
 		}
+		ereport(ERROR,
+(errcode_for_file_access(),
+ errmsg("could not open file \"%s\": %m", path)));
 	}
 
 	pfree(path);


gist microvacuum doesn't appear to care about hot standby?

2018-12-12 Thread Andres Freund
Hi,

Am I missing something or did

commit 013ebc0a7b7ea9c1b1ab7a3d4dd75ea121ea8ba7
Author: Teodor Sigaev 
Date:   2015-09-09 18:43:37 +0300

Microvacuum for GIST

Mark index tuple as dead if it's pointed by kill_prior_tuple during
ordinary (search) scan and remove it during insert process if there is no
enough space for new tuple to insert. This improves select performance
because index will not return tuple marked as dead and improves insert
performance because it reduces number of page split.

Anastasia Lubennikova  with
 minor editorialization by me

entirely disregard recovery conflict handling?  The index entries it
removes could very well be visible to a snapshot on the standby. That's
why the equivalent nbtree (and hash) code does:


/*
 * If we have any conflict processing to do, it must happen before we
 * update the page.
 *
 * Btree delete records can conflict with standby queries.  You might
 * think that vacuum records would conflict as well, but we've handled
 * that already.  XLOG_HEAP2_CLEANUP_INFO records provide the highest xid
 * cleaned by the vacuum of the heap and so we can resolve any conflicts
 * just once when that arrives.  After that we know that no conflicts
 * exist from individual btree vacuum records on that index.
 */
if (InHotStandby)
{
    TransactionId latestRemovedXid = btree_xlog_delete_get_latestRemovedXid(record);
    RelFileNode rnode;

    XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);

    ResolveRecoveryConflictWithSnapshot(latestRemovedXid,
                                        xlrec->onCatalogTable, rnode);
}

Is there any reason something like that isn't necessary for gist? If so,
where's that documented? If not, uh, isn't that a somewhat serious bug
in gist?
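
For illustration only, not actual or proposed code: the missing piece would
presumably mirror the nbtree pattern above, assuming gist's delete record
were extended to carry a latestRemovedXid computed on the primary; no such
field exists today, and the names below are made up.

/*
 * Hypothetical sketch, mirroring the nbtree code quoted above.  "xldata"
 * and its fields are illustrative names only.
 */
if (InHotStandby)
{
	RelFileNode rnode;

	XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);

	/* kick out standby snapshots that could still see the removed tuples */
	ResolveRecoveryConflictWithSnapshot(xldata->latestRemovedXid,
										xldata->onCatalogTable, rnode);
}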

Greetings,

Andres Freund



Re: Remove Deprecated Exclusive Backup Mode

2018-12-12 Thread Michael Paquier
On Wed, Dec 12, 2018 at 08:22:10AM -0500, David Steele wrote:
> Add the fact that we have *zero* tests for exclusive backups.  I only
> had to refactor one exclusive backup in the tests and since it did not
> archive, exclude pg_wal, postmaster.pid, or do anything else our docs
> recommend I wouldn't say it qualifies as a real test.  Also, it wasn't
> even trying to test exclusive backups -- it was a test for logical
> replication following timelines.

This point is quite right.  The TAP tests rely heavily on pg_basebackup
when taking base backups, but there is still an interface for taking
filesystem-level backups with the exclusive SQL interface.  The test
David is referring to here is backup_fs_hot in PostgresNode.pm.
--
Michael




Re: Making WAL receiver startup rely on GUC context for primary_conninfo and primary_slot_name

2018-12-12 Thread Michael Paquier
On Wed, Dec 12, 2018 at 06:55:04PM +0300, Sergei Kornilov wrote:
>> This allows the state of walrcv and startup to diverge, they could
>> e.g. have different configuration, due to differently time config
>> reloads.
>
> So we have exactly same problem with, for example, hot_standby_feedback, 
> right?
> We change hot_standby_feedback value, reload it and we can have 'show
> hot_standby_feedback' different to currently running walreceiver.  But
> why we have nothing about hot_standby_feedback in pg_stat_get_wal_receiver()? 
> Where is difference?

The difference is that a WAL receiver uses the connection string and the
slot name when establishing the connection and issuing START_STREAMING
through the replication protocol, while hot_standby_feedback gets used
depending on the context of the messages exchanged.
--
Michael




Re: Add timeline to partial WAL segments

2018-12-12 Thread Michael Paquier
On Wed, Dec 12, 2018 at 07:54:05AM -0500, David Steele wrote:
> The LSN switch point is often the same even when servers are going to
> different timelines.  If the LSN is different enough then the problem
> solves itself since the .partial will be on an entirely different
> segment.

That would mean that WAL forked exactly at the same record.  You have
likely seen more cases where that can happen in real life than I do.

> But, we could at least use the . notation and end up with something like
> 000100010001.0002.partial or perhaps
> 000100010001.T0002.partial?  Maybe
> 000100010001.0002.tpartial?
> 
> I can't decide whether the .partial files generated by pg_receivewal are
> a problem or not.  It seems to me that only the original cluster is
> going to be able to stream this file -- once the segment is renamed to
> .partial it won't be visible to pg_receivexlog.  So, .partial without a
> timeline extension just means partial from the original timeline.

Still, this does not actually solve the problem when two servers are
trying to use the same timeline?  You would get conflicts with the
history file as well in this case, but the partial segment gets archived
first...  It seems to me that it is kind of difficult to come up with a
totally bullet-proof solution.  Adding the timeline is appealing if the
history files can be added in time; the switchpoint LSN is also
appealing given the likelihood of WAL forking at a different point on a
record basis.  Perhaps another solution could be to add both, but that
looks like overkill.
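
For what it's worth, here is a rough sketch of how such a name could be
composed on top of XLogFileName(); the variable names and the width of the
appended timeline are assumptions for illustration, not a concrete proposal:

	/*
	 * Illustration only: build "<segment>.<newTLI>.partial", reusing
	 * XLogFileName() for the usual 24-character segment name.  origTLI,
	 * newTLI and segno are assumed to be known by the caller.
	 */
	char		base[MAXFNAMELEN];
	char		partialname[MAXFNAMELEN];

	XLogFileName(base, origTLI, segno, wal_segment_size);
	snprintf(partialname, sizeof(partialname), "%s.%08X.partial",
			 base, newTLI);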

> There's another difference.  The .partial generated by pg_receivewal is
> an actively-worked-on file whereas the .partial generated by a promotion
> is probably headed for oblivion.  I haven't seen a single case where one
> was used in an actual recovery (which doesn't mean it hasn't happened,
> of course).

There are many people implementing their own backup solutions, so it is
hard to say that none of those solutions are actually able to copy the
last partial file to replay up to the end of a wanted timeline for a
PITR.

>> (I am completely sold to the idea of prioritizing file types in the
>> archiver.)
> 
> OK, I'll work up a patch for that, then.  It doesn't solve the .partial
> problem, but it does reduce the likelihood that two servers will end up
> on the same timeline.

Thanks a lot, David!
--
Michael




Re: Introducing SNI in TLS handshake for SSL connections

2018-12-12 Thread Andreas Karlsson

On 12/11/18 3:52 PM, Pablo Iranzo Gómez wrote:
I came to this old thread while trying to figure out on how to setup 
postgres replication behind OpenShift/Kubernetes behind a route (which 
only forwards 80 or 443 traffic), but could work if SNI is supported on 
the client using it.


Hm ... while hacking at a patch for this I gave your specific problem 
some more thought.


I am not familiar with OpenShift or Kubernetes but I want you to be 
aware that whatever proxy you are going to use will still need to be 
aware of, at least a subset of, the PostgreSQL protocol, since similar 
to SMTP's STARTTLS command the PostgreSQL client will start out using 
the plain text PostgreSQL protocol and then request the server to switch 
over to SSL[1]. So it would be necessary to add support for this to 
whatever proxy you intend to use.
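
As a rough illustration of what "aware of a subset of the protocol" means
in practice: before any TLS handshake, libpq sends an 8-byte SSLRequest
message (length 8 plus the magic code 80877103) and waits for a single 'S'
or 'N' byte from the server; only after an 'S' does the TLS ClientHello
(and therefore SNI) appear on the wire.  A proxy would need to recognize
at least that much.  A minimal sketch of the recognition step, not working
proxy code (the function name and structure are illustrative only; the
message layout is from the protocol docs [1]):

#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

#define PG_SSLREQUEST_CODE 80877103	/* (1234 << 16) | 5679 */

static int
looks_like_pg_sslrequest(const unsigned char *buf, size_t len)
{
	uint32_t	msglen;
	uint32_t	code;

	if (len < 8)
		return 0;				/* need the whole 8-byte message first */

	memcpy(&msglen, buf, 4);
	memcpy(&code, buf + 4, 4);

	return ntohl(msglen) == 8 && ntohl(code) == PG_SSLREQUEST_CODE;
}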


Do you know if adding such custom protocol support is easy to do to the 
proxies you refer to? And do you have any links to documentation for 
these solutions?


Notes

1. https://www.postgresql.org/docs/11/protocol-flow.html#id-1.10.5.7.11

Andreas



Re: Making WAL receiver startup rely on GUC context for primary_conninfo and primary_slot_name

2018-12-12 Thread Michael Paquier
On Wed, Dec 12, 2018 at 02:55:17PM +0900, Michael Paquier wrote:
> Well, the conninfo and slot name accessible to the user are the values
> available only once the information of the WAL receiver in shared memory
> is ready to be displayed.  Relying more on the GUC context has the
> advantage to never have in shared memory the original string and only
> store the clobbered one, which actually simplifies what's stored in
> shared memory because you can entirely remove ready_to_display (I forgot
> to remove that in the patch actually).  If those parameters become
> reloadable, you actually rely only on what the param context has, and
> not on what the shared memory context may have or not.
> 
> Removing entirely ready_to_display is quite appealing actually..

I have been looking at that stuff this morning, and indeed
ready_to_display can be entirely removed.  I think that's quite an
advantage, as WalRcv->conninfo only stores the connection string
clobbered to hide security-sensitive fields and never has to save the
initial state which could contain sensitive data, simplifying how the WAL
receiver data is filled.  I am adding that to the next commit fest.
--
Michael
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index c80b14ed97..9d664559e8 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -11848,8 +11848,7 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 	 tli, curFileTLI);
 		}
 		curFileTLI = tli;
-		RequestXLogStreaming(tli, ptr, PrimaryConnInfo,
-			 PrimarySlotName);
+		RequestXLogStreaming(tli, ptr);
 		receivedUpto = 0;
 	}
 
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 9643c2ed7b..1b077c9a3d 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -188,9 +188,7 @@ DisableWalRcvImmediateExit(void)
 void
 WalReceiverMain(void)
 {
-	char		conninfo[MAXCONNINFO];
 	char	   *tmp_conninfo;
-	char		slotname[NAMEDATALEN];
 	XLogRecPtr	startpoint;
 	TimeLineID	startpointTLI;
 	TimeLineID	primaryTLI;
@@ -248,10 +246,6 @@ WalReceiverMain(void)
 	walrcv->pid = MyProcPid;
 	walrcv->walRcvState = WALRCV_STREAMING;
 
-	/* Fetch information required to start streaming */
-	walrcv->ready_to_display = false;
-	strlcpy(conninfo, (char *) walrcv->conninfo, MAXCONNINFO);
-	strlcpy(slotname, (char *) walrcv->slotname, NAMEDATALEN);
 	startpoint = walrcv->receiveStart;
 	startpointTLI = walrcv->receiveStartTLI;
 
@@ -291,9 +285,14 @@ WalReceiverMain(void)
 	/* Unblock signals (they were blocked when the postmaster forked us) */
 	PG_SETMASK(&UnBlockSig);
 
+	if (PrimaryConnInfo == NULL || PrimaryConnInfo[0] == '\0')
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("cannot connect to the primary server as \"primary_conninfo\" is not defined")));
+
 	/* Establish the connection to the primary for XLOG streaming */
 	EnableWalRcvImmediateExit();
-	wrconn = walrcv_connect(conninfo, false, "walreceiver", &err);
+	wrconn = walrcv_connect(PrimaryConnInfo, false, "walreceiver", &err);
 	if (!wrconn)
 		ereport(ERROR,
 				(errmsg("could not connect to the primary server: %s", err)));
@@ -311,12 +310,15 @@ WalReceiverMain(void)
 	if (tmp_conninfo)
 		strlcpy((char *) walrcv->conninfo, tmp_conninfo, MAXCONNINFO);
 
+	memset(walrcv->slotname, 0, NAMEDATALEN);
+	if (PrimarySlotName)
+		strlcpy((char *) walrcv->slotname, PrimarySlotName, NAMEDATALEN);
+
 	memset(walrcv->sender_host, 0, NI_MAXHOST);
 	if (sender_host)
 		strlcpy((char *) walrcv->sender_host, sender_host, NI_MAXHOST);
 
 	walrcv->sender_port = sender_port;
-	walrcv->ready_to_display = true;
 	SpinLockRelease(&walrcv->mutex);
 
 	if (tmp_conninfo)
@@ -387,7 +389,8 @@ WalReceiverMain(void)
 		 */
 		options.logical = false;
 		options.startpoint = startpoint;
-		options.slotname = slotname[0] != '\0' ? slotname : NULL;
+		options.slotname = (PrimarySlotName && PrimarySlotName[0] != '\0') ?
+			PrimarySlotName : NULL;
 		options.proto.physical.startpointTLI = startpointTLI;
 		ThisTimeLineID = startpointTLI;
 		if (walrcv_startstreaming(wrconn, &options))
@@ -776,7 +779,6 @@ WalRcvDie(int code, Datum arg)
 	Assert(walrcv->pid == MyProcPid);
 	walrcv->walRcvState = WALRCV_STOPPED;
 	walrcv->pid = 0;
-	walrcv->ready_to_display = false;
 	walrcv->latch = NULL;
 	SpinLockRelease(&walrcv->mutex);
 
@@ -1374,7 +1376,6 @@ pg_stat_get_wal_receiver(PG_FUNCTION_ARGS)
 	Datum	   *values;
 	bool	   *nulls;
 	int			pid;
-	bool		ready_to_display;
 	WalRcvState state;
 	XLogRecPtr	receive_start_lsn;
 	TimeLineID	receive_start_tli;
@@ -1392,7 +1393,6 @@ pg_stat_get_wal_receiver(PG_FUNCTION_ARGS)
 	/* Take a lock to ensure value consistency */
 	SpinLockAcquire(&WalRcv->mutex);
 	pid = (int) WalRcv->pid;
-	ready_to_display = WalRcv->ready_to_display;
 	state = WalRcv->walRcvState;
 	receive_start_lsn = WalRcv->receiveStart;
 	receive_star

Re: gist microvacuum doesn't appear to care about hot standby?

2018-12-12 Thread Alexander Korotkov
On Thu, Dec 13, 2018 at 1:45 AM Andres Freund  wrote:
> Is there any reason something like that isn't necessary for gist? If so,
> where's that documented? If not, uh, isn't that a somewhat serious bug
> in gist?

Thank you for pointing this out!  This looks like a bug to me too.  I'm going
to investigate more on this and provide a fix in the next couple of days.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Cache lookup errors with functions manipulation object addresses

2018-12-12 Thread Michael Paquier
On Wed, Dec 12, 2018 at 02:21:29PM -0300, Alvaro Herrera wrote:
> I think defining bit flags in an enum is weird; I'd use defines
> instead.

Okay, I am fine with that.

> And (a purely stylistic point) I'd use bits32 rather than uint32.  (I'm
> probably alone in this opinion, since almost every interface passing
> flags uses unsigned int of some size.  But "bits" types are defined
> precisely for this purpose!) Compare a61f5ab98638.

Fine with that as well, let's do as you suggest then.
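
For illustration, the style agreed on here would look roughly like the
following sketch; the flag name and value are placeholders, not necessarily
what the patch ends up using:

/* In foreign.h; sketch only, the flag name and value are placeholders. */
#define FDW_MISSING_OK		0x01	/* return NULL instead of raising an error */

extern ForeignDataWrapper *GetForeignDataWrapperExtended(Oid fdwid,
														  bits16 flags);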

> I support 0001 other than those two points.

Attached is an updated version for that as 0001.  Thanks for the
review.  Does that part look good to you now?

> Yeah, these two cases:
> 
> +-- Keep those checks in the same order as getObjectTypeDescription()
> +SELECT * FROM pg_identify_object_as_address('pg_class'::regclass, 0, 0); -- 
> no relation
> + type | object_names | object_args 
> +--+--+-
> +  |  | 
> +(1 row)

"type" should show "relation" here, yes.

> +SELECT * FROM pg_identify_object_as_address('pg_proc'::regclass, 0, 0); -- 
> no function
> + type | object_names | object_args 
> +--+--+-
> +  | {}   | {}
> +(1 row)
> 
> All the other cases you add have a non-empty value in "type".

On this one I would expect NULL for object_names and object_args, as
empty does not make sense for an undefined object, still "type" should
print...  "type".

+SELECT * FROM pg_identify_object_as_address('pg_constraint'::regclass, 0, 0); 
-- no constraint
+ type | object_names | object_args
+--+--+-
+  |  |
+(1 row)
Constraints also are empty.

+SELECT * FROM pg_identify_object_as_address('pg_largeobject'::regclass, 0, 0); 
-- no large object, no error
+ type | object_names | object_args
+--+--+-
+ large object | {0}  | {}
+(1 row)
Large objects should return NULL for the last two fields.

There are a couple of other inconsistent cases with the existing sub-APIs:
+SELECT * FROM pg_identify_object_as_address('pg_type'::regclass, 0, 0); -- no 
type
+ type | object_names | object_args
+--+--+-
+ type | {-}  | {}
+(1 row)
Here I would expect NULL for object_names and object_args.

> I think this is our chance to fix the mess.  Previously (before I added
> the SQL-access of the object addressing mechanism in 9.5 I mean) it
> wasn't possible to get at this code at all with arbitrary input, so this
> is all a "new" problem (I mean new in the last 5 years).

This requires a bit more work with the low-level APIs grabbing the
printed information though.  And I think that the following guidelines
make sense to work on as a base definition for undefined objects:
- pg_identify_object returns the object type name, NULL for the other fields.
- pg_describe_object returns just NULL.
- pg_identify_object_as_address returns the object type name and NULL
for the other fields.

There is some more refactoring work still needed for constraints, large
objects and functions, in a way similar to a26116c6.  I am pretty happy
with the shape of 0001, so this could be applied; 0002 still needs to be
reworked so that all undefined object types behave as described above in a
consistent manner.  Do those definitions make sense?
--
Michael
From 7d832462d64f8d8df417682cff234876ac154b41 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Thu, 13 Dec 2018 14:29:19 +0900
Subject: [PATCH 1/2] Introduce new extended routines for FDW and foreign
 server lookups
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The cache lookup routines for foreign-data wrappers and foreign servers
are extended with an extra argument to handle a set of flags.  The only
value which can be used now indicates whether a missing object should
result in an error or not, and the flags are designed to be extensible
as needed.  Those new routines are added to the existing set of
user-visible routines and documented accordingly.

Author: Michael Paquier
Reviewed-by: Álvaro Herrera
Discussion: https://postgr.es/m/CAB7nPqSZxrSmdHK-rny7z8mi=EAFXJ5J-0RbzDw6aus=wb5...@mail.gmail.com
---
 doc/src/sgml/fdwhandler.sgml  | 34 +
 src/backend/foreign/foreign.c | 36 +--
 src/include/foreign/foreign.h | 10 ++
 3 files changed, 78 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 4ce88dd77c..452b776b9e 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -1408,6 +1408,23 @@ ReparameterizeForeignPathByChild(PlannerInfo *root, List *fdw_private,
 
 
 ForeignDataWrapper *
+GetForeignDataWrapperExtended(Oid fdwid, bits16 flags);
+
+
+ This function returns a ForeignDataWrapper
+ object for the foreign-data wrapper with the given OID.  A
+ ForeignDataWrapper object contains properties
+ of the FDW

Re: Pluggable Storage - Andres's take

2018-12-12 Thread Kyotaro HORIGUCHI
Hello.

(in the next branch:)
At Tue, 27 Nov 2018 14:58:35 +0900, Amit Langote 
 wrote in 
<080ce65e-7b96-adbf-1c8c-7c88d87ea...@lab.ntt.co.jp>
> Thank you for working on this.  Really looking forward to how this shapes
> up. :)

+1.

I looked through the documentation part, as that is where I can do something.

am.html:
> 61.1. Overview of Index access methods
>  61.1.1. Basic API Structure for Indexes
>  61.1.2. Index Access Method Functions
>  61.1.3. Index Scanning
> 61.2. Overview of Table access methods
>  61.2.1. Table access method API
>  61.2.2. Table Access Method Functions
>  61.2.3. Table scanning

Aren't 61.1 and 61.2 better in the reverse order?

Is there a reason for the difference in the titles between 61.1.1
and 61.2.1? The contents are quite similar.


+ 
+  Table access method API

The member names of the index AM struct begin with "am", but they
don't have a unified prefix in the table AM struct. It seems a bit
inconsistent.  Perhaps we should rename some of the long and
internal names.
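
For comparison, the two structs read roughly like the sketch below; the
index AM members are real, while the table AM member and typedef names are
taken from the docs under review and are only illustrative:

typedef struct IndexAmRoutine
{
	/* ... */
	ambuild_function ambuild;
	aminsert_function aminsert;
	ambulkdelete_function ambulkdelete;
	/* ... all members share the "am" prefix */
} IndexAmRoutine;

typedef struct TableAmRoutine
{
	/* ... */
	slot_callbacks_function slot_callbacks;
	tuple_insert_function tuple_insert;
	scansetlimits_function scansetlimits;
	/* ... no common prefix */
} TableAmRoutine;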


+ 
+  Table Access Method Functions

Table AM functions are far finer-grained than the index AM ones. I think
that AM developers need a more concrete description of what
every API function does and an explanation of the various
previously-internal structs.

I suppose that how the functions are used in core code paths will
be written in the following sections.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Pluggable Storage - Andres's take

2018-12-12 Thread Kyotaro HORIGUCHI
Hello.

At Tue, 27 Nov 2018 14:58:35 +0900, Amit Langote 
 wrote in 
<080ce65e-7b96-adbf-1c8c-7c88d87ea...@lab.ntt.co.jp>
> +  
> +
> +TupleTableSlotOps *
> +slot_callbacks (Relation relation);
> +
> +   API to access the slot specific methods;
> +   Following methods are available;
> +   TTSOpsVirtual,
> +   TTSOpsHeapTuple,
> +   TTSOpsMinimalTuple,
> +   TTSOpsBufferTuple,
> +  
> 
> Unless I'm misunderstanding what the TupleTableSlotOps abstraction is or
> its relations to the TableAmRoutine abstraction, I think the text
> description could better be written as:
> 
> "API to get the slot operations struct for a given table access method"
> 
> It's not clear to me why various TTSOps* structs are listed here?  Is the
> point that different AMs may choose one of the listed alternatives?  For
> example, I see that heap AM implementation returns TTOpsBufferTuple, so it
> manipulates slots containing buffered tuples, right?  Other AMs are free
> to return any one of these?  For example, some AMs may never use buffer
> manager and hence not use TTOpsBufferTuple.  Is that understanding correct?

Yeah, I'm not sure why it should be a function rather than a pointer
to the struct itself. And the four structs don't seem relevant
to table AMs. Perhaps clear, getsomeattrs and so on should be
listed instead.
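
For reference, with the signature quoted above the heap AM side would
presumably boil down to little more than returning a constant pointer,
something like the sketch below (the function name is a guess, not the
actual patch):

static TupleTableSlotOps *
heapam_slot_callbacks(Relation relation)
{
	/* heap tuples live in shared buffers, so use the buffered-tuple slot ops */
	return &TTSOpsBufferTuple;
}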

> +  
> +
> +Oid
> +tuple_insert (Relation rel, TupleTableSlot *slot, CommandId cid,
> +  int options, BulkInsertState bistate);
> +
> +   API to insert the tuple and provide the ItemPointerData
> +   where the tuple is successfully inserted.
> +  
> 
> It's not clear from the signature where you get the ItemPointerData.
> Looking at heapam_tuple_insert which puts it in slot->tts_tid, I think
> this should mention it a bit differently, like:
> 
> API to insert the tuple contained in the provided slot and return its TID,
> that is, the location where the tuple is successfully inserted

It is actually an OID, not a TID, in the current code. The TID is
handled internally.

> +  
> +
> +bool
> +tuple_fetch_follow (struct IndexFetchTableData *scan,
> +  ItemPointer tid,
> +  Snapshot snapshot,
> +  TupleTableSlot *slot,
> +  bool *call_again, bool *all_dead);
> +
> +   API to get the all the tuples of the page that satisfies itempointer.
> +  
> 
> IIUC, "all the tuples of of the page" in the above sentence means all the
> tuples in the HOT chain of a given heap tuple, making this description of
> the API slightly specific to the heap AM.  Can we make the description
> more generic or is the API itself very specific that it cannot be
> expressed in generic terms?  Ignoring that for a moment, I think the
> sentence contains more "the"s than there need to be, so maybe write as:
> 
> API to get all tuples on a given page that are linked to the tuple of the
> given TID

Mmm. This is exposing MVCC matters to indexam. I suppose we
should refactor this API.

> +  
> +
> +tuple_data
> +get_tuple_data (TupleTableSlot *slot, tuple_data_flags flags);
> +
> +   API to return the internal structure members of the HeapTuple.
> +  
> 
> I think this description doesn't mention enough details of both the
> information that needs to be specified when calling the function (what's
> in flags) and the information that's returned.

(I suppose it will be described in later sections.)

> +  
> +
> +bool
> +scan_analyze_next_tuple (TableScanDesc scan, TransactionId OldestXmin,
> +  double *liverows, double *deadrows, TupleTableSlot
> *slot));
> +
> +   API to analyze the block and fill the buffered heap tuple in the slot
> and also
> +   provide the live and dead rows.
> +  
> 
> How about:
> 
> API to get the next tuple from the block being scanned, which also updates
> the number of live and dead rows encountered

"live" and "dead" are MVCC terms. I suppose that we should stash
out the deadrows somwhere else. (But analyze code would need to
be modified if we do so.)

> +void
> +scansetlimits (TableScanDesc sscan, BlockNumber startBlk, BlockNumber
> numBlks);
> +
> +   API to fix the relation scan range limits.
> +  
> 
> 
> How about:
> 
> API to set scan range endpoints

This sets the start point and the number of blocks.  Just "API to set
scan range" would be sufficient, referring to the parameter list.

> +
> +
> +bool
> +scan_bitmap_pagescan (TableScanDesc scan,
> +TBMIterateResult *tbmres);
> +
> +   API to scan the relation and fill the scan description bitmap with
> valid item pointers
> +   for the specified block.
> +  
> 
> This says "to scan the relation", but seems to be concerned with only a
> page worth of data as the name also says.  Also, it's not clear what "scan
> description bitmap" means.  Maybe write as:
> 
> API to scan the relation block specified in the scan descriptor to collect
> and return the tuples requested by the given bitmap

"API to collect the tuples in a page requested by the given
bitmpap scan resul

Re: pg_partition_tree crashes for a non-defined relation

2018-12-12 Thread Amit Langote
On 2018/12/12 9:52, Michael Paquier wrote:
> On Mon, Dec 10, 2018 at 10:52:37PM +0900, Michael Paquier wrote:
>> OK.  Sure, let's do as you suggest then.  I'll wait a couple of days
>> before committing the patch so as if someone has extra comments they
>> have the time to post.  So please feel free to comment!
> 
> And done this way.  Thanks for the input, Stephen and others!

Thank you Michael for taking care of this.  I agree with the consensus
that instead of throwing an error for unsupported relkinds,
pg_partition_tree should rather return a single row containing NULL for all
columns.

Regards,
Amit




Re: Introducing SNI in TLS handshake for SSL connections

2018-12-12 Thread Pablo Iranzo Gómez

Hi Andreas

+++ Andreas Karlsson [13/12/18 01:30 +0100]:

On 12/11/18 3:52 PM, Pablo Iranzo Gómez wrote:
I came to this old thread while trying to figure out on how to setup 
postgres replication behind OpenShift/Kubernetes behind a route 
(which only forwards 80 or 443 traffic), but could work if SNI is 
supported on the client using it.


Hm ... while hacking at a patch for this I gave your specific problem 
some more thought.


Thanks for this!



I am not familiar with OpenShift or Kubernetes but I want you to be 
aware that whatever proxy you are going to use will still need to 


haproxy is what is used behind it; the idea is that haproxy, by default when
enabled via a 'route', allows the http or https protocol ONLY, BUT
(https://docs.openshift.com/container-platform/3.9/architecture/networking/routes.html)
routers do support TLS with SNI.

As psql by default tries TLS and falls back to the plain protocol, the
idea is that we tell the OpenShift route to be 'Secure' and
'passthrough'; in this way, when psql speaks to port '443' on the
route that goes to the 'pod' running postgres using TLS and SNI, the
connection goes through without any special protocol change.

be aware of, at least a subset of, the PostgreSQL protocol, since 
similar to SMTP's STARTTLS command the PostgreSQL client will start 
out using the plain text PostgreSQL protocol and then request the 
server to switch over to SSL[1]. So it would be necessary to add 
support for this to whatever proxy you intend to use.


Do you know if adding such custom protocol support is easy to do to 
the proxies you refer to? And do you have any links to documentation 
for these solutions?


I found some diagrams and other links to SSL and HAProxy in:

https://www.haproxy.com/fr/blog/enhanced-ssl-load-balancing-with-server-name-indication-sni-tls-extension/

Probably: https://tools.ietf.org/html/rfc6066#page-6

Let me know if this is not helpful, and thanks again for your time on
this.

Pablo




Notes

1. https://www.postgresql.org/docs/11/protocol-flow.html#id-1.10.5.7.11

Andreas


--

Pablo Iranzo Gómez (pablo.ira...@redhat.com)  GnuPG: 0x5BD8E1E4
Senior Software Engineer - Solutions Engineering   iranzo @ IRC
RHC{A,SS,DS,VA,E,SA,SP,AOSP}, JBCAA#110-215-852RHCA Level V

Blog: https://iranzo.github.io https://citellus.org




Re: COPY FROM WHEN condition

2018-12-12 Thread Surafel Temesgen
On Wed, Dec 12, 2018 at 9:28 PM Tomas Vondra 
wrote:

>
> Can you also update the docs to mention that the functions called from
> the WHERE clause does not see effects of the COPY itself?
>
>

Of course. I also added the same comment to the insertion method selection.

Regards,

Surafel
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 411941ed31..4e6eb1fcdb 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -25,6 +25,7 @@ PostgreSQL documentation
 COPY table_name [ ( column_name [, ...] ) ]
 FROM { 'filename' | PROGRAM 'command' | STDIN }
 [ [ WITH ] ( option [, ...] ) ]
+[ WHERE condition ]
 
 COPY { table_name [ ( column_name [, ...] ) ] | ( query ) }
 TO { 'filename' | PROGRAM 'command' | STDOUT }
@@ -353,6 +354,31 @@ COPY { table_name [ ( 

 
+   
+WHERE
+
+   
+The optional WHERE clause has the general form
+
+WHERE condition
+
+where condition is
+any expression that evaluates to a result of type
+boolean.  Any row that does not satisfy this
+condition will not be inserted to the table.  A row satisfies the
+condition if it returns true when the actual row values are
+substituted for any variable references.
+   
+
+   
+Currently, WHERE expressions cannot contain
+subqueries, and volatile functions used in the expression will not see
+changes made by the COPY command itself.
+   
+
+
+   
+
   
  
 
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 4aa8890fe8..b0471eedab 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -40,7 +40,11 @@
 #include "miscadmin.h"
 #include "optimizer/clauses.h"
 #include "optimizer/planner.h"
+#include "optimizer/prep.h"
 #include "nodes/makefuncs.h"
+#include "parser/parse_coerce.h"
+#include "parser/parse_collate.h"
+#include "parser/parse_expr.h"
 #include "parser/parse_relation.h"
 #include "port/pg_bswap.h"
 #include "rewrite/rewriteHandler.h"
@@ -150,6 +154,7 @@ typedef struct CopyStateData
 	bool		convert_selectively;	/* do selective binary conversion? */
 	List	   *convert_select; /* list of column names (can be NIL) */
 	bool	   *convert_select_flags;	/* per-column CSV/TEXT CS flags */
+	Node	   *whereClause;	/* WHERE condition (or NULL) */
 
 	/* these are just for error messages, see CopyFromErrorCallback */
 	const char *cur_relname;	/* table name for error messages */
@@ -180,6 +185,7 @@ typedef struct CopyStateData
 	ExprState **defexprs;		/* array of default att expressions */
 	bool		volatile_defexprs;	/* is any of defexprs volatile? */
 	List	   *range_table;
+	ExprState  *qualexpr;
 
 	TransitionCaptureState *transition_capture;
 
@@ -801,6 +807,7 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
 	Relation	rel;
 	Oid			relid;
 	RawStmt*query = NULL;
+	Node*whereClause = NULL;
 
 	/*
 	 * Disallow COPY to/from file or program except to users with the
@@ -854,6 +861,25 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
 			NULL, false, false);
 		rte->requiredPerms = (is_from ? ACL_INSERT : ACL_SELECT);
 
+		if (stmt->whereClause)
+		{
+			/* add rte to column namespace  */
+			addRTEtoQuery(pstate, rte, false, true, true);
+
+			/* Transform the raw expression tree */
+			whereClause = transformExpr(pstate, stmt->whereClause, EXPR_KIND_COPY_WHERE);
+
+			/*  Make sure it yields a boolean result. */
+			whereClause = coerce_to_boolean(pstate, whereClause, "WHERE");
+			/* we have to fix its collations too */
+			assign_expr_collations(pstate, whereClause);
+
+			whereClause = eval_const_expressions(NULL, whereClause);
+
+			whereClause = (Node *) canonicalize_qual((Expr *) whereClause, false);
+			whereClause = (Node *) make_ands_implicit((Expr *) whereClause);
+		}
+
 		tupDesc = RelationGetDescr(rel);
 		attnums = CopyGetAttnums(tupDesc, rel, stmt->attlist);
 		foreach(cur, attnums)
@@ -1002,6 +1028,7 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
 
 		cstate = BeginCopyFrom(pstate, rel, stmt->filename, stmt->is_program,
 			   NULL, stmt->attlist, stmt->options);
+		cstate->whereClause = whereClause;
 		*processed = CopyFrom(cstate);	/* copy from file to database */
 		EndCopyFrom(cstate);
 	}
@@ -2531,6 +2558,10 @@ CopyFrom(CopyState cstate)
 	if (cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
 		proute = ExecSetupPartitionTupleRouting(NULL, cstate->rel);
 
+	if (cstate->whereClause)
+		cstate->qualexpr = ExecInitQual(castNode(List, cstate->whereClause),
+&mtstate->ps);
+
 	/*
 	 * It's more efficient to prepare a bunch of tuples for insertion, and
 	 * insert them in one heap_multi_insert() call, than call heap_insert()
@@ -2576,6 +2607,16 @@ CopyFrom(CopyState cstate)
 		 */
 		insertMethod = CIM_SINGLE;
 	}
+	else if (cstate->whereClause != NULL ||
+			 contain_volatile_functions(cstate->whereClause))
+	{
+		/*
+		 * Can't support multi-inserts if there are any volatile function
+		 * expressions in WHERE clause.  Similarly to the trigger case a

Re: Upgrading pg_statistic to handle collation honestly

2018-12-12 Thread Peter Eisentraut
On 12/12/2018 16:57, Tom Lane wrote:
> Attached is a draft patch for same.  It adds storage to pg_statistic
> to record the collation of each statistics "slot".  A plausible
> alternative design would be to just say "look at the collation of the
> underlying column", but that would require extra catcache lookups in
> the selectivity functions that need the info.

That looks like a good approach to me.

> Doing it like this also
> makes it theoretically feasible to track stats computed with respect
> to different collations for the same column, though I'm not really
> convinced that we'd ever do that.

It's a good option to keep around.  Maybe someday extended statistics
could be used to ask for additional statistics to be collected.

> * Probably this conflicts to some extent with Peter's "Reorganize
> collation lookup" patch, but I haven't studied that yet.

I've looked it over, and it's nothing that can't be easily fixed up.  In
fact, it simplifies a few things, so I'm in favor of moving your patch
along first.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services