Re: check error messages in SSL tests

2018-02-24 Thread Michael Paquier
On Fri, Feb 23, 2018 at 01:57:44PM -0500, Peter Eisentraut wrote:
> Oh.  I actually had that file as 0600 in my checked-out tree, probably
> from earlier experiments.  Fixed that.  And I also changed it to make
> another temp file that is explicitly 0644, because we can't rely on that
> being the default either.

Thanks for fixing the first one.  I also like the second change.  This
patch looks fine to me.
--
Michael


signature.asc
Description: PGP signature


Re: Translations contributions urgently needed

2018-02-24 Thread Robert Haas
On Fri, Feb 23, 2018 at 9:15 PM, Alvaro Herrera
 wrote:
> Clearly, for a few languages the current facilities are sufficient.
> If we make the process easier, we're likely to have more translations.
> Will there be people verifying the quality also?

It seems to me that making the process easier has got to be a net
positive.  It's possible that some language which otherwise would have
had no translation at all will end up with a bad one, but it's equally
possible that some language which would otherwise have had a bad one
will end up with a good one.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] WIP: Aggregation push-down

2018-02-24 Thread Robert Haas
On Fri, Feb 23, 2018 at 11:08 AM, Antonin Houska  wrote:
> I spent some more time thinking about this. What about adding a new strategy
> number for hash index operator classes, e.g. HTBinaryEqualStrategyNumber? For
> most types both the HTEqualStrategyNumber and HTBinaryEqualStrategyNumber
> strategies would point to the same operator, but types like numeric would
> naturally have them different.
>
> Thus the pushed-down partial aggregation can only use the
> HTBinaryEqualStrategyNumber's operator to compare grouping expressions. In the
> initial version (until we have useful statistics for the binary values) we can
> avoid the aggregation push-down if the grouping expression output type has the
> two strategies implemented using different functions because, as you noted
> upthread, grouping based on binary equality can result in an excessive number
> of groups.
>
> One open question is whether the binary equality operator needs a separate
> operator class or not. If an opclass cares only about the binary equality, its
> hash function(s) can be a lot simpler.

Hmm.  How about instead adding another regproc field to pg_type which
stores the OID of a function that tests binary equality for that
datatype?  If that happens to be equal to the OID you got from the
opclass, then you're all set.
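
For context, the equality operator a type's default hash opclass exposes today
sits at strategy 1 (HTEqualStrategyNumber) and can be looked up with a catalog
query along these lines; for numeric it is the usual numeric equality, which is
not binary equality (1.0 = 1.00).  The proposed HTBinaryEqualStrategyNumber
would simply be a second strategy number next to it; nothing below reflects
committed code:

SELECT amop.amopopr::regoperator
FROM pg_opclass opc
JOIN pg_amop amop ON amop.amopfamily = opc.opcfamily
WHERE opc.opcmethod = (SELECT oid FROM pg_am WHERE amname = 'hash')
  AND opc.opcdefault
  AND opc.opcintype = 'numeric'::regtype
  AND amop.amoplefttype = 'numeric'::regtype
  AND amop.amoprighttype = 'numeric'::regtype
  AND amop.amopstrategy = 1;  -- HTEqualStrategyNumber; the proposal above would
                              -- add a second strategy number next to this one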

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: check error messages in SSL tests

2018-02-24 Thread Peter Eisentraut
On 2/24/18 07:37, Michael Paquier wrote:
> On Fri, Feb 23, 2018 at 01:57:44PM -0500, Peter Eisentraut wrote:
>> Oh.  I actually had that file as 0600 in my checked-out tree, probably
>> from earlier experiments.  Fixed that.  And I also changed it to make
>> another temp file that is explicitly 0644, because we can't rely on that
>> being the default either.
> 
> Thanks for fixing the first one.  I also like the second change.  This
> patch looks fine to me.

committed

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Disabling src/test/[ssl|ldap] when not building with SSL/LDAP support

2018-02-24 Thread Peter Eisentraut
On 2/17/18 08:48, Michael Paquier wrote:
> Attached is what I have finished with.  I have gathered the feedback
> from everybody on this thread and I think that the result can satisfy
> all the requirements mentioned:
> - 0001 is a small patch which makes the SSL and LDAP test suite fail
> immediately if the build's ./configure is not set up with necessary
> build options.  This uses TestLib::check_pg_config to do the checks.

I'm not sure why we need that.  The tests will presumably fail anyway.

> - 0002 introduces a new environment variable which can be used to decide
> if an extra test suite can be used or not.  I have named it
> PROVE_EXTRA_ALLOWED, and can be used as such:
> make -C src/test check PROVE_EXTRA_ALLOWED='ssl ldap'
> This includes also some documentation.  Note that with default settings
> the tests are skipped to keep things secure.

I think sticking this into the Perl code is too complicated and
inflexible.  We might want to use the same mechanism for non-TAP tests
as well.

What I had in mind would consist of something like this in
src/test/Makefile:

ifeq ($(with_ldap),yes)
ifneq (,$(filter ldap,$(YOUR_VARIABLE_HERE)))
SUBDIRS += ldap
endif
endif

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Online enabling of checksums

2018-02-24 Thread Michael Banck
Hi,

On Wed, Feb 21, 2018 at 09:53:31PM +0100, Magnus Hagander wrote:
> We’ve also included a small commandline tool, bin/pg_verify_checksums,
> that can be run against an offline cluster to validate all checksums.

The way it is coded in the patch will make pg_verify_checksums fail for
heap files with multiple segments, i.e. tables over 1 GB, because the
block number is consecutive across segments while the tool starts over
from 0 for each segment:

$ pgbench -i -s 80 -h /tmp
[...]
$ pg_verify_checksums -D data1
pg_verify_checksums: data1/base/12364/16396.1, block 0, invalid checksum
in file 6D61, calculated 6D5F
pg_verify_checksums: data1/base/12364/16396.1, block 1, invalid checksum
in file 7BE5, calculated 7BE7
[...]
Checksum scan completed
Data checksum version: 1
Files scanned:  943
Blocks scanned: 155925
Bad checksums:  76
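
For reference, the .1 suffix means the relation spans more than one 1 GB
segment; with the cluster running, the relfilenode from the messages above can
be mapped back to the table like this (the OID is of course specific to this
particular run):

SELECT oid::regclass AS relation,
       pg_size_pretty(pg_relation_size(oid::regclass)) AS size
FROM pg_class
WHERE relfilenode = 16396;  -- relfilenode taken from the errors above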
 

Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer



Re: prokind column (was Re: [HACKERS] SQL procedures)

2018-02-24 Thread Tom Lane
Peter Eisentraut  writes:
> Here is this patch updated.  The client changes are now complete and all
> the tests pass.  I have also rolled back the places where the code used
> prorettype to detect procedures and replaced this by the new facility.

I took a quick look through this and noted a few small problems; the
only one worth mentioning here is that you've broken psql tab completion
for functions and aggregates when talking to a pre-v11 server.
I don't think that's acceptable; however, since tab-complete.c has no
existing provisions for adjusting its queries based on server version,
it's not really clear what to do to fix it.  I'm sure that's soluble
(and I think I recall a recent thread bumping up against the same
problem for another change), but it needs a bit more sweat.
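
To illustrate the version dependency (these are only sketches, not the actual
queries tab-complete.c builds): a "plain functions only" filter has to be
spelled differently depending on the server version.

-- v11 and later, with this patch applied:
SELECT proname FROM pg_proc WHERE prokind = 'f';
-- pre-v11 servers:
SELECT proname FROM pg_proc WHERE NOT proisagg AND NOT proiswindow;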

We need a plan for when/how to apply this, along with the proposed
bootstrap data conversion patch, which obviously conflicts with it
significantly.  Looking through the stuff pending in next month's
commitfest, we are fortunate in that only a few of those patches
seem to need to touch pg_proc.h at all, and none have really large
deltas AFAICS (I might've missed something though).  I propose
proceeding as follows:

1. Get this patch in as soon as we can resolve its remaining weak
spots.  That will force rebasing of pending patches that touch
pg_proc.h, but I don't think it'll be painful, since the needed
changes are pretty small and obvious.

2. During the March commitfest, adjust the bootstrap data conversion
patch to handle this change, and review it generally.

3. At the end of the 'fest, or whenever we've dealt with all other
patches that need to touch the bootstrap source data, apply the
data conversion patch.

My thought here is that the data conversion patch is going to break
basically every pending patch that touches src/include/catalog/,
so we ought to apply it at a point where that list of patches is short
and there's lots of time for people to redo them.  Hence, end of the
dev cycle is the right time.

regards, tom lane



Query pattern that Postgres doesn't handle well

2018-02-24 Thread Greg Stark
At my day job I've been doing a fair amount of routine query and
schema optimization and I've noticed one particular query shape that
has repeatedly caused problems, and it's one we've talked about
before.

select * from table where simple-restriction OR (complex-subquery)

For example something like:

SELECT * FROM projects WHERE ispublic OR project_id IN (SELECT
project_id FROM project_members WHERE userid = ?)

Either half of this clause can easily be executed using indexes but
the combination forces Postgres to do a full sequential table scan.

The solution has been to rewrite each of these cases into:

SELECT * FROM projects WHERE ispublic
UNION ALL
SELECT * FROM projects WHERE NOT ispublic AND project_id IN (SELECT
project_id FROM project_members WHERE userid = ?)

But there are several problems with this.

1) It's often difficult to get the conditions to be exactly disjoint
such that you can use UNION ALL, and if you can't then UNION can be
slow and possibly even incorrect.

2) The resulting query is difficult to combine with other clauses.
They must either be copied into both sides of the UNION or wrapped
around the outside and hope that Postgres pushes them down into the
union branches where necessary. More complex conditions can't be
pushed down, and in particular combining two conditions that have been
rewritten to use union is very difficult.

3) It's extremely difficult to generate this kind of query in an ORM
such as ActiveRecord without breaking the abstractions and causing
surprising interactions.

I'm not sure I have much to offer here. I definitely don't know where
to start implementing it. But I know it's come up before on the list
and just thought I would mention that I've noticed it being a
recurring problem for at least this user.

-- 
greg



Re: Query pattern that Postgres doesn't handle well

2018-02-24 Thread Tom Lane
Greg Stark  writes:
> At my day job I've been doing a fair amount of routine query and
> schema optimization and I've noticed one particular query shape that
> has repeatedly caused problems, and it's one we've talked about
> before.

> select * from table where simple-restriction OR (complex-subquery)

> For example something like:

> SELECT * FROM projects WHERE ispublic OR project_id IN (SELECT
> project_id FROM project_members WHERE userid = ?)

> Either half of this clause can easily be executed using indexes but
> the combination forces Postgres to do a full sequential table scan.

Yeah.  This is at least related to, if not the exact same as,
what I was fooling with a year ago:

https://www.postgresql.org/message-id/flat/7f70bd5a-5d16-e05c-f0b4-2fdfc8873...@bluetreble.com

The single-relation-scan case is possibly a bit easier to deal with
than what we were looking at there, in that it's clear that you
can use the rel's CTID to de-duplicate, and that that will give
the right answer.
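
As a hand-written illustration of that idea (only a sketch of the transform;
it uses the table's primary key where the planner-internal version would use
CTID, project_id is assumed to be that key, and 42 stands in for the user id
parameter):

SELECT p.*
FROM projects p
WHERE p.project_id IN (
    SELECT project_id FROM projects WHERE ispublic
    UNION
    SELECT project_id FROM project_members WHERE userid = 42
);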

regards, tom lane



Re: [HACKERS] Re: Improve OR conditions on joined columns (common star schema problem)

2018-02-24 Thread Peter Geoghegan
On Fri, Feb 2, 2018 at 4:53 PM, Tom Lane  wrote:
> Tomas Vondra  writes:
>> BTW wouldn't it be possible to derive "traditional" proof in relational
>> algebra, similarly to other transforms?
>
> Perhaps.  The patch depends on CTID, but you could probably model that
> as a primary key in a proof.

I'll remind you that commit b648b703 made the TID sort performed by
CREATE INDEX CONCURRENTLY over 3 times faster in cases where the sort
completes in memory. The simplest way to get a significant portion of
that benefit for your approach with sort+unique on CTID would be to
add sortsupport with abbreviated keys to the TID btree opclass.

-- 
Peter Geoghegan



Re: Online enabling of checksums

2018-02-24 Thread Magnus Hagander
On Sat, Feb 24, 2018 at 1:34 AM, Robert Haas  wrote:

> On Thu, Feb 22, 2018 at 3:28 PM, Magnus Hagander 
> wrote:
> > I would prefer that yes. But having to re-read 9TB is still significantly
> > better than not being able to turn on checksums at all (state today). And
> > adding a catalog column for it will carry the cost of the migration
> > *forever*, both for clusters that never have checksums and those that
> had it
> > from the beginning.
> >
> > Accepting that the process will start over (but only read, not re-write,
> the
> > blocks that have already been processed) in case of a crash does
> > significantly simplify the process, and reduce the long-term cost of it
> in
> the form of entries in the catalogs. Since this is a one-time operation
> (or
> > for many people, a zero-time operation), paying that cost that one time
> is
> > probably better than paying a much smaller cost but constantly.
>
> That's not totally illogical, but to be honest I'm kinda surprised
> that you're approaching it that way.  I would have thought that
> relchecksums and datchecksums columns would have been a sort of
> automatic design choice for this feature.  The thing to keep in mind
> is that nobody's going to notice the overhead of adding those columns
> in practice, but someone will surely notice the pain that comes from
> having to restart the whole operation.  You're talking about trading
> an effectively invisible overhead for a very noticeable operational
> problem.


Is it really that invisible? Given how much we argue over adding single
counters to the stats system, I'm not sure it's quite that low.

We did consider doing it at a per-table basis as well. But this is also an
overhead that has to be paid forever, whereas the risk of having to read
the database files more than once (because it'd only have to read them on
the second pass, not write anything) is a one-off operation. And for all
those that have initialized with checksums in the first place don't have to
pay any overhead at all in the current design.

I very strongly doubt it's a "very noticeable operational problem". People
don't restart their databases very often... Let's say it takes 2-3 weeks to
complete a run in a fairly large database. How many such large databases
actually restart that frequently? I'm not sure I know of any. And the only
effect of it is you have to start the process over (but read-only for the
part you have already done). It's certainly not ideal, but I don't agree
it's in any form a "very noticeable problem".

The other point to it is that this keeps the code a lot simpler. That is
both good for having a chance at all to finish it and get it into 11 (and
it can then be improved upon to add for example incremental support in 12,
or something like that). And of course, simple code means less overhead in
the form of maintenance and effects on other parts of the system down the
road.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: Online enabling of checksums

2018-02-24 Thread Magnus Hagander
On Sat, Feb 24, 2018 at 4:29 AM, Tomas Vondra 
wrote:

> Hi,
>
> I see the patch also does throttling by calling vacuum_delay_point().
> Being able to throttle the checksum workers not to affect user activity
> definitely seems like a useful feature, no complaints here.
>
> But perhaps binding it to vacuum_cost_limit/vacuum_cost_delay is not the
> best idea? I mean, enabling checksums seems rather unrelated to vacuum,
> so it seems a bit strange to configure it by vacuum_* options.
>
> Also, how am I supposed to set the cost limit? Perhaps I'm missing
> something, but the vacuum_delay_point call happens in the bgworker, so
> setting the cost limit before running pg_enable_data_checksums() won't
> get there, right? I really don't want to be setting it in the config
> file, because then it will suddenly affect all user VACUUM commands.
>
> And if this patch gets improved to use multiple parallel workers, we'll
> need a global limit (something like we have for autovacuum workers).
>
> In any case, I suggest mentioning this in the docs.
>
>
Ah yes. I actually have it on my TODO to work on that, but I forgot to put
that in the email I sent out. Apologies for that, and thanks for pointing
it out!

Right now you have to set the limit in the configuration file. That's of
course not the way we want to have it long term (but as long as it is that
way it should at least be documented). My plan is to either pick it up from
the current session that calls pg_enable_data_checksums(), or to simply
pass it down as parameters to the function directly.  I'm thinking the
second one (pass a cost_delay and a cost_limit as optional parameters to
the function) is the best one because as you say actually overloading it on
the user visible GUCs seems a bit ugly. Once there I think the easiest is
to just pass it down to the workers through the shared memory segment.
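
In other words, something along these lines; this is only a sketch of the
second option, and neither the parameter names nor the defaults are settled:

-- Throttling parameters passed straight to the function and forwarded to the
-- background workers through shared memory.
SELECT pg_enable_data_checksums(cost_delay => 20, cost_limit => 200);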

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: Online enabling of checksums

2018-02-24 Thread Andres Freund
On 2018-02-24 22:45:09 +0100, Magnus Hagander wrote:
> Is it really that invisible? Given how much we argue over adding single
> counters to the stats system, I'm not sure it's quite that low.

That appears to be entirely unrelated. The stats stuff is expensive
because we currently have to essentially write out the stats for *all*
tables in a database, once a counter is updated. And those counters are
obviously constantly updated. Thus the overhead of adding one column is
essentially multiplied by the number of tables in the system. Whereas
here it's a single column that can be updated on a per-row basis, which
is barely ever going to be written to.

Am I missing something?


> We did consider doing it at a per-table basis as well. But this is also an
> overhead that has to be paid forever, whereas the risk of having to read
> the database files more than once (because it'd only have to read them on
> the second pass, not write anything) is a one-off operation. And for all
> those that have initialized with checksums in the first place don't have to
> pay any overhead at all in the current design.

Why does it have to be paid forever?


> I very strongly doubt it's a "very noticeable operational problem". People
> don't restart their databases very often... Let's say it takes 2-3 weeks to
> complete a run in a fairly large database. How many such large databases
> actually restart that frequently? I'm not sure I know of any. And the only
> effect of it is you have to start the process over (but read-only for the
> part you have already done). It's certainly not ideal, but I don't agree
> it's in any form a "very noticeable problem".

I definitely know large databases that fail over more frequently than
that.

Greetings,

Andres Freund



Re: Online enabling of checksums

2018-02-24 Thread Magnus Hagander
On Sat, Feb 24, 2018 at 6:14 PM, Michael Banck 
wrote:

> Hi,
>
> On Wed, Feb 21, 2018 at 09:53:31PM +0100, Magnus Hagander wrote:
> > We’ve also included a small commandline tool, bin/pg_verify_checksums,
> > that can be run against an offline cluster to validate all checksums.
>
> The way it is coded in the patch will make pg_verify_checksums fail for
> heap files with multiple segments, i.e. tables over 1 GB, because the
> block number is consecutive across segments while the tool starts over
> from 0 for each segment:
>
> $ pgbench -i -s 80 -h /tmp
> [...]
> $ pg_verify_checksums -D data1
> pg_verify_checksums: data1/base/12364/16396.1, block 0, invalid checksum
> in file 6D61, calculated 6D5F
> pg_verify_checksums: data1/base/12364/16396.1, block 1, invalid checksum
> in file 7BE5, calculated 7BE7
> [...]
> Checksum scan completed
> Data checksum version: 1
> Files scanned:  943
> Blocks scanned: 155925
> Bad checksums:  76
>

Yikes. I could've sworn I tested that, but it's pretty obvious I didn't, at
least not in this version. Thanks for the note, will fix and post a new
version!

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: Online enabling of checksums

2018-02-24 Thread Magnus Hagander
On Sat, Feb 24, 2018 at 10:49 PM, Andres Freund  wrote:

> On 2018-02-24 22:45:09 +0100, Magnus Hagander wrote:
> > Is it really that invisible? Given how much we argue over adding single
> > counters to the stats system, I'm not sure it's quite that low.
>
> That appears to be entirely unrelated. The stats stuff is expensive
> because we currently have to essentially write out the stats for *all*
> tables in a database, once a counter is updated. And those counters are
> obviously constantly updated. Thus the overhead of adding one column is
> essentially multiplied by the number of tables in the system. Whereas
> here it's a single column that can be updated on a per-row basis, which
> is barely ever going to be written to.
>
> Am I missing something?
>

It's probably at least partially unrelated, you are right. I may have
misread our reluctance to add more values there as a general reluctance to
add more values to central columns.



> > We did consider doing it at a per-table basis as well. But this is also
> an
> > overhead that has to be paid forever, whereas the risk of having to read
> > the database files more than once (because it'd only have to read them on
> > the second pass, not write anything) is a one-off operation. And for all
> > those that have initialized with checksums in the first place don't have
> to
> > pay any overhead at all in the current design.
>
> Why does it have to be paid forever?
>

The size of the pg_class row would be there forever. Granted, it's not that
big an overhead given that there are already plenty of columns there. But
the point being you can never remove that column, and it will be there for
users who never even considered running without checksums. It's certainly
not a large overhead, but it's also not zero.


> I very strongly doubt it's a "very noticeable operational problem". People
> > don't restart their databases very often... Let's say it takes 2-3 weeks
> to
> > complete a run in a fairly large database. How many such large databases
> > actually restart that frequently? I'm not sure I know of any. And the
> only
> > effect of it is you have to start the process over (but read-only for the
> > part you have already done). It's certainly not ideal, but I don't agree
> > it's in any form a "very noticeable problem".
>
> I definitely know large databases that fail over more frequently than
> that.
>

I would argue that they have bigger issues than enabling checksums... By
far.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2018-02-24 Thread Tomas Vondra
Hi,

Attached is an updated version of the patch, fixing some minor bitrot
and duplicate OIDs.

Sadly, this patch series does not seem to move forward very much, and
I'm not sure how to change that :-/ But I'd like to point out that while
those new statistics are still per-table, we might use them to improve
join estimates (which is kinda the elephant in the room, when it comes
to estimates) similarly to what eqjoinsel() does with per-column stats.
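
For anyone who hasn't followed the series: the patches extend the existing
CREATE STATISTICS machinery, so usage stays the same; only the two statistics
kinds below exist in core today, with the MCV-list and histogram kinds coming
from the attached patches:

CREATE TABLE zips (city text, zip text);
CREATE STATISTICS zips_stats (ndistinct, dependencies) ON city, zip FROM zips;
ANALYZE zips;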


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


0001-Keep-information-about-already-estimated-cl-20180224.patch.gz
Description: application/gzip


0002-multivariate-MCV-lists-20180224.patch.gz
Description: application/gzip


0003-multivariate-histograms-20180224.patch.gz
Description: application/gzip


Re: Online enabling of checksums

2018-02-24 Thread Andres Freund
Hi,

On 2018-02-24 22:56:57 +0100, Magnus Hagander wrote:
> On Sat, Feb 24, 2018 at 10:49 PM, Andres Freund  wrote:
> > > We did consider doing it at a per-table basis as well. But this is also
> > an
> > > overhead that has to be paid forever, whereas the risk of having to read
> > > the database files more than once (because it'd only have to read them on
> > > the second pass, not write anything) is a one-off operation. And for all
> > > those that have initialized with checksums in the first place don't have
> > to
> > > pay any overhead at all in the current design.
> >
> > Why does it have to be paid forever?
> >
> 
> The size of the pg_class row would be there forever. Granted, it's not that
> big an overhead given that there are already plenty of columns there. But
> the point being you can never remove that column, and it will be there for
> users who never even considered running without checksums. It's certainly
> not a large overhead, but it's also not zero.

But it can be removed in the next major version, if we decide it's a
good idea? We're not bound by compatibility for catalog layout.

FWIW, there's some padding space available where we currently could
store two booleans without any space overhead. Also, if we decide that
the boolean columns matter (they don't, much, in comparison to the rest
of the data, particularly relname), we can compress them into a
bitfield.

I don't think this is a valid reason for not supporting
interruptibility. You can make fair arguments about adding incremental
support incrementally and whatnot, but the catalog-size argument doesn't
seem like a valid one.


> > I very strongly doubt it's a "very noticeable operational problem". People
> > > don't restart their databases very often... Let's say it takes 2-3 weeks
> > to
> > > complete a run in a fairly large database. How many such large databases
> > > actually restart that frequently? I'm not sure I know of any. And the
> > only
> > > effect of it is you have to start the process over (but read-only for the
> > > part you have already done). It's certainly not ideal, but I don't agree
> > > it's in any form a "very noticeable problem".
> >
> > I definitely know large databases that fail over more frequently than
> > that.
> >
> 
> I would argue that they have bigger issues than enabling checksums... By
> far.

In one case it's intentional, to make sure the overall system copes. Not
that insane.

Greetings,

Andres Freund



Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2018-02-24 Thread Mark Dilger

> On Feb 24, 2018, at 2:01 PM, Tomas Vondra  
> wrote:

> Sadly, this patch series does not seem to move forward very much, and
> I'm not sure how to change that :-/

I'll take a look at the new patch set this evening.  I have been using your
previous version of these patches applied against postgres 10 sources
with good results. 

mark


Re: Disabling src/test/[ssl|ldap] when not building with SSL/LDAP support

2018-02-24 Thread Michael Paquier
On Sat, Feb 24, 2018 at 10:51:22AM -0500, Peter Eisentraut wrote:
> On 2/17/18 08:48, Michael Paquier wrote:
>> Attached is what I have finished with.  I have gathered the feedback
>> from everybody on this thread and I think that the result can satisfy
>> all the requirements mentioned:
>> - 0001 is a small patch which makes the SSL and LDAP test suite fail
>> immediately if the build's ./configure is not set up with necessary
>> build options.  This uses TestLib::check_pg_config to do the checks.
> 
> I'm not sure why we need that.  The tests will presumably fail anyway.

Sure.  But then I think that it would be nice to show on screen the
reason why the test failed, if possible.  As of now, if SSL is missing the
whole run shows in red without providing much useful information.
Instead of 0001 as shaped previously, why not also use diag to show
the failure on screen?

For example the following block at the top of each test:
if (!check_pg_config("#define USE_OPENSSL 1"))
{
    diag "SSL tests not supported without support in build";
    die;
}

Would result in this output:
t/001_ssltests.pl .. # SSL tests not supported without support in build
# Looks like your test exited with 255 before it could output anything.
t/001_ssltests.pl .. Dubious, test returned 255 (wstat 65280, 0xff00)
Failed 40/40 subtests

I would find that more helpful for telling the difference between an
implementation failure and a test failing because of a missing
dependency.

> I think sticking this into the Perl code is too complicated and
> inflexible.  We might want to use the same mechanism for non-TAP tests
> as well.
> 
> What I had in mind would consist of something like this in
> src/test/Makefile:
> 
> ifeq ($(with_ldap),yes)
> ifneq (,$(filter ldap,$(YOUR_VARIABLE_HERE)))
> SUBDIRS += ldap
> endif
> endif

OK.  So let's call it PG_TEST_EXTRA_INSECURE or PG_TEST_EXTRA, which can
take 'ldap', 'ssl' or 'ldap,ssl' as valid values.  Seeing that
documented is really necessary in my opinion.  Any idea for a better
name?
--
Michael


signature.asc
Description: PGP signature


Re: Online enabling of checksums

2018-02-24 Thread Greg Stark
> The change of the checksum state is WAL logged with a new xlog record. All 
> the buffers written by the background worker have full page writes forcibly
> enabled, to make sure the checksum is fully updated on the standby even if no
> actual contents of the buffer changed.

Hm. That doesn't sound necessary to me. If you generate a checkpoint
(or just wait until a new checkpoint has started) then go through and
do a normal xlog record for every page (any xlog record, a noop record
even) then the normal logic for full page writes ought to be
sufficient. If the noop record doesn't need a full page write it's
because someone else has already come in and done one and that one
will set the checksum. In fact if any page has an lsn > the checkpoint
start lsn for the checkpoint after the flag was flipped then you
wouldn't need to issue any record at all.



GSOC 2018 ideas

2018-02-24 Thread Charles Cui
Hi Aleksander,

   This is Yan from Columbia University. I saw that PostgreSQL was selected
for GSOC 2018 and I'm pretty interested in the idea of thrift data type
support that you proposed. So, I want to prepare a proposal based on this
idea. Can I have more detailed information on what documents or code I need
to understand? Also, if this idea is allocated to another student (or in
other words, you prefer another student to work on it), do let me know so
that I can pick some other project in PostgreSQL. Any comments or
suggestions are welcome!

Looking forward to your reply!


Thanks Charles!


Re: WIP: BRIN multi-range indexes

2018-02-24 Thread Tomas Vondra
Hi,

Attached is an updated patch series, fixing duplicate OIDs and removing
opclasses for reltime/abstime data types, as discussed.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


0001-Pass-all-keys-to-BRIN-consistent-function-at-once.patch.gz
Description: application/gzip


0002-Move-IS-NOT-NULL-checks-to-bringetbitmap.patch.gz
Description: application/gzip


0003-BRIN-bloom-indexes.patch.gz
Description: application/gzip


0004-BRIN-multi-range-minmax-indexes.patch.gz
Description: application/gzip


handling of heap rewrites in logical decoding

2018-02-24 Thread Peter Eisentraut
A heap rewrite during a DDL command publishes changes for relations
named like pg_temp_%u, which are not real tables, and this breaks
replication systems that are not aware of that.  We have a hack in the
pgoutput plugin to ignore those, but we knew that was a hack.  So here
is an attempt at a more proper solution: Store a relisrewrite column in
pg_class and filter on that.

I have put this into reorderbuffer.c, so that it affects all plugins.
An alternative would be to have each plugin handle it separately (the
way it is done now), but it's hard to see why a plugin would not want
this fixed behavior.

A more fancy solution would be to make this column an OID that links
back to the original table.  Then, a DDL-aware plugin could choose to
replicate the rewrite rows.  However, at the moment no plugin works that
way, so this might be overdoing it.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 7201fd3d66aa609893af5ad5eea9a3e5073bb123 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Sat, 24 Feb 2018 17:03:57 -0500
Subject: [PATCH] Handle heap rewrites even better in logical decoding

Logical decoding should not publish anything about tables created as
part of a heap rewrite during DDL.  Those tables don't exist externally,
so consumers of logical decoding cannot do anything sensible with that
information.  In ab28feae2bd3d4629bd73ae3548e671c57d785f0, we worked
around this for built-in logical replication, but that was a hack.

This is a more proper fix: We mark such transient heaps using the new
field pg_class.relisrewrite = true and ignore them in logical decoding
before they get to the output plugin.
---
 .../test_decoding/expected/concurrent_ddl_dml.out  | 82 --
 contrib/test_decoding/expected/ddl.out | 17 ++---
 .../test_decoding/specs/concurrent_ddl_dml.spec|  2 +-
 contrib/test_decoding/sql/ddl.sql  |  5 +-
 doc/src/sgml/catalogs.sgml | 11 +++
 src/backend/bootstrap/bootparse.y  |  1 +
 src/backend/catalog/heap.c |  4 ++
 src/backend/catalog/toasting.c |  1 +
 src/backend/commands/cluster.c |  1 +
 src/backend/commands/tablecmds.c   |  1 +
 src/backend/replication/logical/reorderbuffer.c|  9 +++
 src/backend/replication/pgoutput/pgoutput.c| 26 ---
 src/include/catalog/heap.h |  1 +
 src/include/catalog/pg_class.h | 22 +++---
 14 files changed, 81 insertions(+), 102 deletions(-)

diff --git a/contrib/test_decoding/expected/concurrent_ddl_dml.out 
b/contrib/test_decoding/expected/concurrent_ddl_dml.out
index a15bfa292e..1f9e7661b7 100644
--- a/contrib/test_decoding/expected/concurrent_ddl_dml.out
+++ b/contrib/test_decoding/expected/concurrent_ddl_dml.out
@@ -10,7 +10,7 @@ step s1_insert_tbl1: INSERT INTO tbl1 (val1, val2) VALUES (1, 
1);
 step s2_alter_tbl2_float: ALTER TABLE tbl2 ALTER COLUMN val2 TYPE float;
 step s1_insert_tbl2: INSERT INTO tbl2 (val1, val2) VALUES (1, 1);
 step s1_commit: COMMIT;
-step s2_get_changes: SELECT regexp_replace(data, 'temp_\d+', 'temp') AS data 
FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', 
'0', 'skip-empty-xacts', '1');
+step s2_get_changes: SELECT data FROM 
pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 
'skip-empty-xacts', '1');
 data   
 
 BEGIN  
@@ -32,16 +32,13 @@ step s2_alter_tbl1_float: ALTER TABLE tbl1 ALTER COLUMN 
val2 TYPE float; 
-step s2_get_changes: SELECT regexp_replace(data, 'temp_\d+', 'temp') AS data 
FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', 
'0', 'skip-empty-xacts', '1');
+step s2_get_changes: SELECT data FROM 
pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 
'skip-empty-xacts', '1');
 data   
 
 BEGIN  
 table public.tbl1: INSERT: val1[integer]:1 val2[integer]:1
 table public.tbl2: INSERT: val1[integer]:1 val2[integer]:1
 COMMIT 
-BEGIN  
-table public.pg_temp: INSERT: val1[integer]:1 val2[double precision]:1
-COMMIT 
 ?column?   
 
 stop   
@@ -56,7 +53,7 @@ step s1_insert_tbl1: INSERT INTO tbl1 (val1, val2) VALUES (1, 
1);
 step s2_alter_tbl2_char: ALTER TABLE tbl2 ALTER COLUMN val2 TYPE character 
varying;
 step s1_insert_tbl2: INSERT INTO tbl2 (val1, val2) VALUES (1, 1);
 step s1_commit: COMMIT;
-step s2_get_changes: SELECT regexp_replace(data, 'temp_\d+', 'temp') AS data 
FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', 
'0', 'skip-empty-xacts', '1');
+step s2_get_changes: SELECT data FROM 
pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 
'skip-empty-xacts', '1');
 data   
 
 BEGIN  
@@ -78,16 +75,13 @@ step s2_alter_tbl1_char: ALTER TABLE tbl1 ALTER CO

remove pg_class.relhaspkey

2018-02-24 Thread Peter Eisentraut
pg_class.relhaspkey doesn't seem to be used or useful for anything, so
can we remove it?  See attached patch.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 01004383b0a3d30d519b9dc219b54c8d753df8ee Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Sat, 24 Feb 2018 21:25:08 -0500
Subject: [PATCH] Remove pg_class.relhaspkey

It's not used for anything internally, and it can't be relied on for
external uses, so it can just be removed.
---
 doc/src/sgml/catalogs.sgml  |  9 -
 src/backend/catalog/heap.c  |  1 -
 src/backend/catalog/index.c | 32 ++-
 src/backend/commands/vacuum.c   | 10 --
 src/backend/rewrite/rewriteDefine.c |  1 -
 src/include/catalog/pg_class.h  | 38 ++---
 6 files changed, 20 insertions(+), 71 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 71e20f2740..b93eef4efc 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -1848,15 +1848,6 @@ pg_class Columns
   
  
 
- 
-  relhaspkey
-  bool
-  
-  
-   True if the table has (or once had) a primary key
-  
- 
-
  
   relhasrules
   bool
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index cf36ce4add..3d80ff9e5b 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -798,7 +798,6 @@ InsertPgClassTuple(Relation pg_class_desc,
values[Anum_pg_class_relnatts - 1] = Int16GetDatum(rd_rel->relnatts);
values[Anum_pg_class_relchecks - 1] = Int16GetDatum(rd_rel->relchecks);
values[Anum_pg_class_relhasoids - 1] = BoolGetDatum(rd_rel->relhasoids);
-   values[Anum_pg_class_relhaspkey - 1] = BoolGetDatum(rd_rel->relhaspkey);
values[Anum_pg_class_relhasrules - 1] = 
BoolGetDatum(rd_rel->relhasrules);
values[Anum_pg_class_relhastriggers - 1] = 
BoolGetDatum(rd_rel->relhastriggers);
values[Anum_pg_class_relrowsecurity - 1] = 
BoolGetDatum(rd_rel->relrowsecurity);
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 564f2069cf..6268c10e11 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -125,7 +125,7 @@ static void UpdateIndexRelation(Oid indexoid, Oid heapoid,
bool isvalid,
bool isready);
 static void index_update_stats(Relation rel,
-  bool hasindex, bool isprimary,
+  bool hasindex,
   double reltuples);
 static void IndexCheckExclusion(Relation heapRelation,
Relation indexRelation,
@@ -1162,7 +1162,6 @@ index_create(Relation heapRelation,
 */
index_update_stats(heapRelation,
   true,
-  isprimary,
   -1.0);
/* Make the above update visible */
CommandCounterIncrement();
@@ -1364,21 +1363,6 @@ index_constraint_create(Relation heapRelation,
 InvalidOid, conOid, 
indexRelationId, true);
}
 
-   /*
-* If needed, mark the table as having a primary key.  We assume it 
can't
-* have been so marked already, so no need to clear the flag in the 
other
-* case.
-*
-* Note: this might better be done by callers.  We do it here to avoid
-* exposing index_update_stats() globally, but that wouldn't be 
necessary
-* if relhaspkey went away.
-*/
-   if (mark_as_primary)
-   index_update_stats(heapRelation,
-  true,
-  true,
-  -1.0);
-
/*
 * If needed, mark the index as primary and/or deferred in pg_index.
 *
@@ -2041,7 +2025,6 @@ FormIndexDatum(IndexInfo *indexInfo,
  * to ensure we can do all the necessary work in just one update.
  *
  * hasindex: set relhasindex to this value
- * isprimary: if true, set relhaspkey true; else no change
  * reltuples: if >= 0, set reltuples to this value; else no change
  *
  * If reltuples >= 0, relpages and relallvisible are also updated (using
@@ -2058,7 +2041,6 @@ FormIndexDatum(IndexInfo *indexInfo,
 static void
 index_update_stats(Relation rel,
   bool hasindex,
-  bool isprimary,
   double reltuples)
 {
Oid relid = RelationGetRelid(rel);
@@ -2088,7 +2070,7 @@ index_update_stats(Relation rel,
 * It is safe to use a non-transac

Re: remove pg_class.relhaspkey

2018-02-24 Thread Tom Lane
Peter Eisentraut  writes:
> pg_class.relhaspkey doesn't seem to be used or useful for anything, so
> can we remove it?  See attached patch.

We've discussed that at least twice before, and not pulled the trigger
for fear of breaking client code.

https://www.postgresql.org/message-id/flat/CAA-aLv7sszmU%2BFz-xLo6cOiASUiX0mCRwAMF2FB%3D2j-5MKqb%2BA%40mail.gmail.com

https://www.postgresql.org/message-id/flat/20140317185255.20724.49675%40wrigleys.postgresql.org

Not sure that the situation has changed any since then ... it still
comes down to whether we want an API break for client code.
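
For what it's worth, client code that currently reads relhaspkey can get a
reliable answer from pg_index instead (relhaspkey only tells you that a
primary key exists or once existed); 'some_table' below is just a placeholder:

SELECT EXISTS (
    SELECT 1
    FROM pg_index i
    WHERE i.indrelid = 'some_table'::regclass
      AND i.indisprimary
) AS has_primary_key;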

regards, tom lane



Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2018-02-24 Thread Mark Dilger

> On Feb 24, 2018, at 2:01 PM, Tomas Vondra  
> wrote:
> 
> Hi,
> 
> Attached is an updated version of the patch, fixing some minor bitrot
> and duplicate OIDs.

The three patches apply cleanly, compile, and pass check-world.

You might consider using PointerGetDatum in compare_scalars_simple
rather than hardcoding the logic directly.

mark