Re: [HACKERS] proposal: schema variables

2019-08-10 Thread Pavel Stehule
Hi

just rebase

Regards

Pavel


schema-variables-rebase-20190810.patch.gz
Description: application/gzip


Re: [survey] New "Stable" QueryId based on normalized query text

2019-08-10 Thread Julien Rouhaud
On Sat, Aug 10, 2019 at 3:27 AM Jim Finnerty  wrote:
>
> I missed this thread.  I'd be happy to post the code for what we use as the
> stable query identifier, but we could definitely come up with a more
> efficient algorithm if we're willing to assume that the sql statements are
> the same if and only if the parse tree structure is the same.
>
> Currently what we do for the sql hash is to simply replace all the literals
> and then hash the resulting SQL string

Isn't that what pg_store_plan is already doing?  Except that it
removes extraneous whitespace and puts identifiers in uppercase so
that you get a reasonable query identifier.

> you could define a stable identifier for each node type, ignore literal
> constants, and hash fully-qualified object names instead of OIDs.  That
> should be pretty fast.

This has been discussed already, and resolving all object names and
qualifier names will add dramatic overhead for many workloads.




Re: Shrinking tuplesort.c's SortTuple struct (Was: More ideas for speeding up sorting)

2019-08-10 Thread Heikki Linnakangas

On 10/08/2019 02:14, Peter Geoghegan wrote:

The easy part was removing SortTuple.tupindex itself -- it was fairly
natural to stash that in the slab allocation for each tape. I used the
aset.c trick of having a metadata "chunk" immediately prior to the address
that represents the allocation proper -- we can just back up by a few
bytes from stup.tuple to find the place to stash the tape number
during merging. The worst thing about this change was that it makes a
tape slab allocation mandatory in cases that previously didn't have
any need for a stup.tuple allocation (e.g. datum tuplesorts of
pass-by-value types), though only during merging. Since we must always
store the tapenum when merging, we always need a slab buffer for each
tape when merging. This aspect wasn't so bad.


Hmm. Wouldn't it be more straightforward to have the extra tupindex 
field at the end of the struct? Something like:


typedef struct
{
    void       *tuple;          /* the tuple itself */
    Datum       datum1;         /* value of first key column */
    bool        isnull1;        /* is first key column NULL? */
} SortTuple;

typedef struct
{
    SortTuple   stuple;
    int         tupindex;       /* see notes above */
} MergeTuple;

The initial sorting phase would deal with SortTuples, and the merge 
phase would deal with MergeTuples. The same comparison routines work 
with both.
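
To make it concrete why the same routines can serve both: since the
SortTuple is the first member of MergeTuple, a MergeTuple pointer can be
treated as a pointer to its embedded SortTuple. A hedged sketch, assuming
the two struct definitions above (the comparator body and NULL ordering
below are placeholders, not tuplesort.c code):

static int
compare_sorttuple(const SortTuple *a, const SortTuple *b)
{
    if (a->isnull1 || b->isnull1)
        return (int) a->isnull1 - (int) b->isnull1; /* placeholder: NULLs last */
    /* placeholder comparison on the raw datum1 value */
    return (a->datum1 > b->datum1) - (a->datum1 < b->datum1);
}

static int
compare_mergetuple(const MergeTuple *a, const MergeTuple *b)
{
    /* stuple is at offset 0, so the SortTuple routine works for both */
    return compare_sorttuple(&a->stuple, &b->stuple);
}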



The hard/ugly part was getting rid of the remaining "unnecessary"
SortTuple field, isnull1. This involved squeezing an extra bit out of
the stup.tuple pointer, by stealing the least-significant bit. This
was invasive in about the way you'd expect it to be. It wasn't awful,
but it also wasn't something I'd countenance pursuing without getting
a fairly noticeable benefit for users. (Actually, the code that I
wrote so far *is* pretty awful, but I could certainly clean it up some
more if I felt like it.)
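
For readers following along, here is a rough, purely illustrative sketch of
the pointer-tagging trick described above (not the actual patch). It assumes
tuple allocations are at least 2-byte aligned, so the pointer's low bit is
always free:

#include <stdbool.h>
#include <stdint.h>

/* Pack isnull1 into the least-significant bit of the tuple pointer. */
static inline void *
tag_tuple(void *tuple, bool isnull1)
{
    return (void *) ((uintptr_t) tuple | (uintptr_t) isnull1);
}

/* Recover the real pointer by masking the stolen bit off again. */
static inline void *
tuple_pointer(void *tagged)
{
    return (void *) ((uintptr_t) tagged & ~(uintptr_t) 1);
}

/* Recover the isnull1 flag from the stolen bit. */
static inline bool
tuple_isnull1(void *tagged)
{
    return ((uintptr_t) tagged & 1) != 0;
}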

I think that the rough patch that I came up with gives us an accurate
picture of what the benefits of having SortTuples that are only 16
bytes wide are. The benefits seem kind of underwhelming at this point.
For something like a "SELECT COUNT(distinct my_int4_col) FROM tab"
query, which uses the qsort_ssup() qsort specialization, we can easily
go from getting an external sort to getting an internal sort. We can
maybe end up sorting about 20% faster if things really work out for
the patch.


If you separate the NULLs from the non-NULLs into a separate array, as was 
discussed back in 2016, instead of stealing a bit, you can squeeze some 
instructions out of the comparison routines, which might give some extra 
speedup.
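
A minimal sketch of that idea, with made-up array names (a real
implementation would also have to handle NULLS FIRST/LAST ordering and
routines like grow_memtuples()):

static void
partition_nulls(const SortTuple *in, int n,
                SortTuple *nonnulls, int *n_nonnull,
                SortTuple *nulls, int *n_null)
{
    *n_nonnull = *n_null = 0;
    for (int i = 0; i < n; i++)
    {
        if (in[i].isnull1)
            nulls[(*n_null)++] = in[i];
        else
            nonnulls[(*n_nonnull)++] = in[i];
    }
    /* qsort() only the non-NULL array; its comparator never tests isnull1 */
}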



But in cases that users really care about, such as REINDEX,
the difference is in the noise. ISTM that this is simply not worth the
trouble at this time. These days, external sorts are often slightly
faster than internal sorts in practice, due to the fact that we can do
an on-the-fly merge with external sorts, so we could easily hurt
performance by making more memory available!


Yeah, that's a bit sad.

That makes me think: even when everything fits in memory, it might make 
sense to divide the input into a few batches, qsort them individually, 
and do an on-the-fly merge of the batches. I guess I'm essentially 
suggesting that we should use merge instead of quicksort for the 
in-memory case, too.


If we had the concept of in-memory batches, you could merge together 
in-memory and external batches. That might be handy. For example, when 
doing an external sort, instead of flushing the last run to disk before 
you start merging, you could keep it in memory. That might be 
significant in the cases where the input is only slightly too big to fit 
in memory.
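
To make the batch idea concrete, here is a toy sketch (assuming the
SortTuple struct above; a linear scan over the batch heads is shown for
brevity, where tuplesort.c would presumably reuse its existing merge heap):

#include <stddef.h>

typedef struct
{
    SortTuple  *tuples;     /* one already-qsort'ed batch */
    int         ntuples;
    int         pos;        /* next unread element */
} Batch;

/* Return the next tuple in overall sorted order, or NULL when all batches
 * are exhausted. */
static SortTuple *
merge_next(Batch *batches, int nbatches,
           int (*cmp) (const SortTuple *, const SortTuple *))
{
    int         best = -1;

    for (int i = 0; i < nbatches; i++)
    {
        if (batches[i].pos >= batches[i].ntuples)
            continue;
        if (best < 0 ||
            cmp(&batches[i].tuples[batches[i].pos],
                &batches[best].tuples[batches[best].pos]) < 0)
            best = i;
    }
    return (best < 0) ? NULL : &batches[best].tuples[batches[best].pos++];
}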


- Heikki




Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-08-10 Thread Bruce Momjian
On Fri, Aug  9, 2019 at 10:54:51PM -0400, Bruce Momjian wrote:
> On Thu, Aug  8, 2019 at 10:17:53PM -0400, Sehrope Sarkuni wrote:
> > On Thu, Aug 8, 2019 at 2:16 PM Bruce Momjian  wrote:
> > 
> > On Wed, Aug  7, 2019 at 08:56:18AM -0400, Sehrope Sarkuni wrote:
> > > Simplest approach for derived keys would be to use immutable 
> > attributes
> > of the
> > > WAL files as an input to the key derivation. Something like HKDF(MDEK,
> > "WAL:" |
> > 
> > So, I am thinking we should use "WAL:" for WAL and "REL:" for heap/index
> > files.
> > 
> > 
> > Sounds good. Any unique convention is fine. Main thing to keep in mind is 
> > that
> > they're directly tied to the master key so it's not possible to rotate them
> > without changing the master key.
> 
> A recent email talked about using two different encryption keys for
> heap/index and WAL, which allows for future features, and allows for key
> rotation of the two independently.  (I already stated how hard key
> rotation would be with WAL and pg_rewind.)

So, I just had an idea: if we use separate encryption keys for
heap/index and for WAL --- we already know we will have an offline tool
that can rotate the passphrase or encryption keys.  If we allow the
encryption keys to be rotated independently, we can create a standby,
and immediately rotate its heap/index encryption key.  We can then start
streaming replication.  When we promote the standby to primary, we can
then shut it down and rotate the WAL encryption key --- the new primary
would then have no shared keys with the old primary.
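
For illustration only, here is a minimal sketch of the kind of
context-separated HKDF derivation quoted earlier in this thread, using
OpenSSL's EVP HKDF interface. The MDEK name and the "WAL:"/"REL:" labels
follow the convention discussed above; the function name, the fixed SHA-256
digest, and the omitted salt are assumptions made for the sketch, not a
proposed implementation:

#include <string.h>
#include <openssl/evp.h>
#include <openssl/kdf.h>

/* Derive a purpose-specific key from the master data encryption key (MDEK),
 * using a context label such as "WAL:..." or "REL:..." so that WAL and
 * relation keys can never collide.  Returns 1 on success, 0 on failure. */
static int
derive_key(const unsigned char *mdek, size_t mdek_len,
           const char *context,
           unsigned char *out, size_t out_len)
{
    EVP_PKEY_CTX *pctx = EVP_PKEY_CTX_new_id(EVP_PKEY_HKDF, NULL);
    int         ok = 0;

    if (pctx == NULL)
        return 0;
    if (EVP_PKEY_derive_init(pctx) > 0 &&
        EVP_PKEY_CTX_set_hkdf_md(pctx, EVP_sha256()) > 0 &&
        EVP_PKEY_CTX_set1_hkdf_key(pctx, mdek, mdek_len) > 0 &&
        EVP_PKEY_CTX_add1_hkdf_info(pctx, (const unsigned char *) context,
                                    strlen(context)) > 0 &&
        EVP_PKEY_derive(pctx, out, &out_len) > 0)
        ok = 1;

    EVP_PKEY_CTX_free(pctx);
    return ok;
}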

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: Add "password_protocol" connection parameter to libpq

2019-08-10 Thread Jonathan S. Katz
On 8/9/19 7:54 PM, Jeff Davis wrote:
> On Sat, 2019-08-10 at 00:17 +0300, Heikki Linnakangas wrote:
>> This is a multi-dimensional problem. "channel_binding=require" is
>> one 
>> way to prevent MITM attacks, but sslmode=verify-ca is another. (Does 
>> Kerberos also prevent MITM?) Or you might want to enable plaintext 
>> passwords over SSL, but not without SSL.
>>
>> I think we'll need something like the 'ssl_ciphers' GUC, where you
>> can 
>> choose from a few reasonable default rules, but also enable/disable 
>> specific methods:
> 
> ..
> 
>> auth_methods = 'MITM, -password, -md5'
> 
> Keep in mind this is client configuration, so something reasonable in
> postgresql.conf might not be so reasonable in the form:
> 
> postgresql://foo:secret@myhost/mydb?auth_methods=MITM%2C%20-
> password%2C%20-md5

Yeah, and I do agree it is a multi-dimensional problem, but the context
in which I gave my opinion was for the password authentication methods
that PostgreSQL supports natively, i.e. not requiring a 3rd party to
arbitrate via GSSAPI, LDAP etc.

That said, I dove into the code a bit more to look at the behavior
specifically with LDAP, which as described does send back a request for
"AuthenticationCleartextPassword".

If we go with the client sending up a "password_protocol" that is not
plaintext, and the server only provides LDAP authentication, does the
client close the connection? I would say yes.

(And as such, I would also consider adding "plaintext" back to the list,
just to have the explicit option).

The other question I have is whether it should occur in a hierarchical
manner, i.e. "md5 or better". I would also say yes to that; we would just
need to document that clearly. Perhaps we adopt a name similar to
"sslmode", e.g. "password_protocol_mode", but that can be debated :)
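
Purely as a hypothetical sketch of the "md5 or better" ordering (none of
these names exist in libpq today; this is only to make the ordering idea
concrete):

#include <stdbool.h>

typedef enum
{
    PWPROTO_PLAINTEXT = 0,
    PWPROTO_MD5,
    PWPROTO_SCRAM_SHA_256,
    PWPROTO_SCRAM_SHA_256_PLUS  /* SCRAM with channel binding */
} PasswordProtocol;

/* Client-side check: refuse to answer an authentication request that is
 * weaker than what the user asked for. */
static bool
protocol_acceptable(PasswordProtocol requested_by_server,
                    PasswordProtocol minimum_required)
{
    return requested_by_server >= minimum_required;
}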

> Another thing to consider is that there's less control configuring on
> the client than on the server. The server will send at most one
> authentication request based on its own rules, and all the client can
> do is either answer it, or disconnect. And the SSL stuff all happens
> before that, and won't use an authentication request message at all.

Yes. Using the LDAP example above, the client also needs some general
awareness of how it can connect to the server, e.g. "You may want
scram-sha-256 but authentication occurs over LDAP, so don't stop
requesting scram-sha-256!" That said, part of that is a human problem:
it's up to the server administrator to inform clients how they can
connect to PostgreSQL.

> Some protocols allow negotiation within them, like SASL, which gives
> the client a bit more freedom. But FE/BE doesn't allow for arbitrary
> subsets of authentication methods to be negotiated between client and
> server, so I'm worried trying to express it that way will just lead to
> clients that break when you upgrade your server.

Agreed. I see this as a way of a client saying "Hey, I really want to
authenticate with scram-sha-256 or better, so if you don't let me do
that, I'm out." In addition to ensuring it uses the client's desired
password protocol, this could be helpful for testing that the
appropriate authentication rules are set in a server, e.g. one that is
rolling out SCRAM authentication.

And as Heikki mentions, there are other protections a client can use,
e.g. verify-ca/full, to guard against eavesdropping, MITM etc.

Jonathan





Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-08-10 Thread Bruce Momjian
On Wed, Aug  7, 2019 at 08:56:18AM -0400, Sehrope Sarkuni wrote:
> On Mon, Aug 5, 2019 at 9:02 PM Bruce Momjian  wrote:
> I was thinking the WAL would use the same key since the nonce is unique
> between the two.  What value is there in using a different key?

> Never having to worry about overlap in Key + IV usage is the main advantage. While
> it's possible to structure IVs to avoid that from happening, it's much easier
> to completely avoid that situation by ensuring different parts of an
> application are using separate derived keys.

Now that we are considering different encryption keys for heap/index
files and WAL, there is no chance of overlap, so it seems we can go back
to using a non-zero IV rather than derived keys.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: Shrinking tuplesort.c's SortTuple struct (Was: More ideas for speeding up sorting)

2019-08-10 Thread Peter Geoghegan
On Sat, Aug 10, 2019 at 1:20 AM Heikki Linnakangas  wrote:
> Hmm. Wouldn't it be more straightforward to have the extra tupindex
> field at the end of the struct?

> The initial sorting phase would deal with SortTuples, and the merge
> phase would deal with MergeTuples. The same comparison routines work
> with both.

Maybe, but then you would have to use MergeTuples in the
tuplesort_heap* routines, which are not just used when merging
external sort runs. You'd probably incur a penalty for top-N heap
sorts too. Now, that could still be worth it, but it's something to
consider.

> If you separate the NULLs from non-NULLs in a separate array, as was
> discussed back in 2016, instead of stealing a bit, you can squeeze some
> instructions out of the comparison routines, which might give some extra
> speedup.

That might work well, but partitioning the memtuples array isn't
trivial. Routines like grow_memtuples() still need to work, and that
seems like it would be tricky. So again, this may well be a better way
to do it, but that isn't obvious.

> > But in cases that users really care about, such as REINDEX,
> > the difference is in the noise. ISTM that this is simply not worth the
> > trouble at this time. These days, external sorts are often slightly
> > faster than internal sorts in practice, due to the fact that we can do
> > an on-the-fly merge with external sorts, so we could easily hurt
> > performance by making more memory available!
>
> Yeah, that's a bit sad.

I think that this is likely to be the problem with any combination of
enhancements that remove fields from the SortTuple struct, to get it
down to 16 bytes: making SortTuples only 16 bytes just isn't that
compelling.

> That makes me think: even when everything fits in memory, it might make
> sense to divide the input into a few batches, qsort them individually,
> and do an on-the-fly merge of the batches. I guess I'm essentially
> suggesting that we should use merge instead of quicksort for the
> in-memory case, too.

That might make sense. The AlphaSort paper [1] recommends using
quicksort on CPU-cache-sized chunks, and merging the chunks together
as they're written out as a single on-disk run. The AlphaSort paper is
probably the first place where the abbreviated keys technique was
described, and it had a lot of good ideas.

> If we had the concept of in-memory batches, you could merge together
> in-memory and external batches. That might be handy. For example, when
> doing an external sort, instead of flushing the last run to disk before
> you start merging, you could keep it in memory. That might be
> significant in the cases where the input is only slightly too big to fit
> in memory.

The patch that I wrote to make tuplesort.c use quicksort in preference
to replacement selection sort for generating initial runs started out
with an implementation of something that I called "quicksort with
spillover". The idea was that you could spill only a few extra tuples
to disk when you almost had enough workMem, and then merge the on-disk
run with the larger, quicksorted in-memory run. It worked alright, but
it felt more important to make external sorts use quicksort in
general. Robert Haas really hated it at the time, because it relied on
various magic numbers, based on heuristics.

The easiest and least controversial way to make internal sorting
faster may be to update our Quicksort algorithm to use the same
implementation that was added to Java 7 [2]. It uses all of the same
tricks as our existing Bentley & McIlroy implementation, but is
more cache efficient. It's considered the successor to B&M, and had
input from Bentley himself. It is provably faster than B&M for a wide
variety of inputs, at least on modern hardware.
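
To give a flavour of what the dual-pivot scheme from [2] looks like, here is
a minimal, hedged sketch over a plain int array; the Java 7 implementation
additionally selects pivots from five samples, falls back to insertion sort
for tiny ranges, and handles runs of equal elements, none of which is shown
here:

static void
swap_int(int *a, int *b)
{
    int tmp = *a; *a = *b; *b = tmp;
}

static void
dual_pivot_qsort(int *a, int lo, int hi)
{
    if (lo >= hi)
        return;

    if (a[lo] > a[hi])
        swap_int(&a[lo], &a[hi]);

    int p = a[lo];          /* smaller pivot */
    int q = a[hi];          /* larger pivot */
    int lt = lo + 1;        /* a[lo+1 .. lt-1] hold values < p */
    int gt = hi - 1;        /* a[gt+1 .. hi-1] hold values > q */
    int i = lt;

    while (i <= gt)
    {
        if (a[i] < p)
            swap_int(&a[i++], &a[lt++]);
        else if (a[i] > q)
            swap_int(&a[i], &a[gt--]);  /* re-examine the swapped-in value */
        else
            i++;
    }
    swap_int(&a[lo], &a[--lt]);         /* move the pivots into place */
    swap_int(&a[hi], &a[++gt]);

    dual_pivot_qsort(a, lo, lt - 1);     /* values < p */
    dual_pivot_qsort(a, lt + 1, gt - 1); /* values between p and q */
    dual_pivot_qsort(a, gt + 1, hi);     /* values > q */
}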

[1] http://www.vldb.org/journal/VLDBJ4/P603.pdf
[2] https://codeblab.com/wp-content/uploads/2009/09/DualPivotQuicksort.pdf
-- 
Peter Geoghegan




Re: [survey] New "Stable" QueryId based on normalized query text

2019-08-10 Thread legrand legrand
Hi Jim,

It's never too late, as nothing has been concluded about that survey ;o)

For information, I thought it would be possible to get a more stable
QueryId by hashing relation names or fully qualified names.

With the support of Julien Rouhaud, I tested with this kind of code:

case RTE_RELATION:
    if (pgss_queryid_oid)
    {
        APP_JUMB(rte->relid);
    }
    else
    {
        rel = RelationIdGetRelation(rte->relid);
        APP_JUMB_STRING(RelationGetRelationName(rel));
        APP_JUMB_STRING(get_namespace_name(get_rel_namespace(rte->relid)));
        RelationClose(rel);
    }

thinking that 3 hash options would be interesting in pgss:
1- actual OID
2- relation names only (for databases WITHOUT distinct schemas containing
the same tables)
3- fully qualified names schema.relname (for databases WITH distinct schemas
containing the same tables)

but performance was quite bad (it was a few months ago, but I remember
about a 1-5% decrease).
I also remember that this was not portable between distinct pg versions
11/12, and I'm also not sure it was stable between Windows / Linux ports ...

So I stopped here ... Maybe it's time to test this alternative more deeply
(getting fully qualified name hashes in one call), knowing that such
transformations will have to be done for all object types (not only
relations)?

I'm ready to continue testing, as this seems the least intrusive solution
for keeping the current pgss behaviour ...

If this doesn't work, then trying a normalized query text (associated
with search_path) would be the other alternative, but the impact on the
current pgss would be higher ...

Regards
PAscal









Re: Global temporary tables

2019-08-10 Thread Konstantin Knizhnik



On 10.08.2019 5:12, Craig Ringer wrote:
On Fri, 9 Aug 2019 at 22:07, Konstantin Knizhnik <k.knizh...@postgrespro.ru> wrote:




Ok, here it is: global_private_temp-1.patch


Fantastic.

I'll put that high on my queue.

I'd love to see something like this get in.

Doubly so if it brings us closer to being able to use temp tables on 
physical read replicas, though I know there are plenty of other 
barriers there (not least of which being temp tables using persistent 
txns not vtxids)


Does it have a CF entry?


https://commitfest.postgresql.org/24/2233/


Also I have attached updated version of the global temp tables
with shared buffers - global_shared_temp-1.patch


Nice to see that split out. In addition to giving the first patch more 
hope of being committed this time around, it'll help with readability 
and testability too.


To be clear, I have long wanted to see PostgreSQL have the "session" 
state abstraction you have implemented. I think it's really important 
for high client count OLTP workloads, working with the endless 
collection of ORMs out there, etc. So I'm all in favour of it in 
principle so long as it can be made to work reliably with limited 
performance impact on existing workloads and without making life lots 
harder when adding new core functionality, for extension authors etc. 
The same goes for built-in pooling. I think PostgreSQL has needed some 
sort of separation of "connection", "backend", "session" and 
"executor" for a long time and I'm glad to see you working on it.


With that said: How do you intend to address the likelihood that this 
will cause performance regressions for existing workloads that use 
temp tables *without* relying on your session state and connection 
pooler? Consider workloads that use temp tables for mid-long txns 
where txn pooling is unimportant, where they also do plenty of read 
and write activity on persistent tables. Classic OLAP/DW stuff. e.g.:


* four clients, four backends, four connections, session-level 
connections that stay busy with minimal client sleeps

* All sessions run the same bench code
* transactions all read plenty of data from a medium to large 
persistent table (think fact tables, etc)
* transactions store a filtered, joined dataset with some pre-computed 
window results or something in temp tables
* benchmark workload makes big-ish temp tables to store intermediate 
data for its medium-length transactions
* transactions also write to some persistent relations, say to record 
their summarised results


How does it perform with and without your patch? I'm concerned that:

* the extra buffer locking and various IPC may degrade performance of 
temp tables
* the temp table data in shared_buffers may put pressure on 
shared_buffers space, crowding out cached pages for persistent tables 
that all sessions are sharing;
* the temp table data in shared_buffers may put pressure on 
shared_buffers space for dirty buffers, forcing writes of persistent 
tables out earlier and therefore reducing write-combining opportunities;


I agree that access to local buffers is cheaper than access to shared 
buffers because there is no locking overhead.
And the fact that access to local tables cannot affect cached data of 
persistent tables is also important.
But most Postgres tables are still normal (persistent) tables accessed 
through shared buffers, and a huge amount of effort has gone into making 
this access as efficient as possible (a clock algorithm that doesn't 
require a global lock, atomic operations, ...). Using the same replacement 
policy for all tables may also be preferable for some workloads.
So it is not obvious to me that in the described scenario a local buffer 
cache for temporary tables will really provide significant advantages.

It will be interesting to perform some benchmarking - I am going to do it.

What I have observed so far is that in a typical scenario (dumping the 
results of a huge query into a temporary table and then traversing that 
table), old (local) temporary tables provide better performance (maybe 
because of the small size of the local buffer cache and its different 
eviction policy).
But subsequent accesses to the global shared table are faster (because it 
completely fits in the large shared buffer cache).


There is one more problem with global temporary tables for which I do 
not know a good solution yet: collecting statistics.
Since each backend has its own data, different backends may generally 
need different query plans.
Right now, if you perform "analyze table" in one backend, it will 
affect plans in all backends.
This can be considered a feature rather than a bug if we assume that the 
distribution of data is similar in all backends.

But if this assumption is not true, then it can be a problem.