Re: Recently-introduced segfault in initdb?

2018-03-18 Thread Alvaro Herrera
Isaac Morland wrote:
> OK, I must have done something wrong with the bisect the first time. Now
> I'm getting the following as the problem commit:
> 
> fd1a421fe66173fb9b85d3fe150afde8e812cbe4 is the first bad commit

Did you run "make distclean" before git-pulling?  If not, maybe what you
have is an incomplete rebuild after the code update.  Unless you have
--enable-depend in your configure line ...?


-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: MCV lists for highly skewed distributions

2018-03-18 Thread John Naylor
I wrote:

> Looks good. I'll run some tests of my own this week, trying to find
> corner cases different from yours.

Over the weekend I tried out a test to measure:
-The size of the MCV list
-The ratio between actual and estimated cardinality of the least
common value in the MCV list.
-The ratio between actual and estimated cardinality of the most common
value not in the MCV list.

The script, results, and some starter queries are attached. I ran a
bunch more queries to drill down in certain areas, but they're all
based on the ones here.

I ran against master, the RSE patch, and 3 different variations of the
V2 patch (with stddev cutoffs of 2.0, 2.5, and 3.0). For each file, I
loaded it into each DB, ran ANALYZE 10 times, and took the average for
each of the three metrics above. I only tested with the default stats
target.

For this test, I used the same distributions as Dean's original
script, but I whacked around the script to enable inserting results
into a table.

--
Summary for non-uniform distributions:
Note: There's a lot of variation in the estimation of the most common
non-MCV. About 1% of all ANALYZE runs apparently failed to sample one
of the good candidates for the MCV list, resulting in huge
underestimates. It didn't matter what patch was applied. For future
tests, I'll do more runs and measure a truncated mean. Looking at the
number of MCVs is much more stable, but unlike the uniform
distribution case, we don't have an intuitive feel for what that
number should be. That said,

-The V2 patch is somewhat better than RSE and much better than master
at estimating the most common value not in the MCV list.
-Bumping the sample stddev threshold to 3 seems to behave similarly to
the RSE patch, maybe slightly better.

--
Summary for uniform distributions:
Note 1: Using the average is not really adequate for testing
under/overestimation of values in my test setup, since I set the
estimation ratio to zero if there was either an empty MCV list or a
completely full list. In future runs, I'll drill down a bit more
precisely.
That said, there didn't seem to be a huge amount of variation between the
patches as far as either of the ratios I was testing. Looking at the
number of MCVs is much better anyway, since we know we want either
none or everything.

Note 2: All patches fully populated the list when all the values fit
in the MCV list. For the rest of the cases:

-The RSE patch is not much better than master at excluding MCVs.
-The V2 patch is much better than either of these, but still not ideal
(up to 30)
-The V2 patch with a cutoff of 2.5 severely cut down the MCVs (at most 3).
-With a cutoff of 3.0, all are empty.

--
Here's one interesting case. It's a normal distribution, but highly
dispersed (N=50, sd=1000), so it resembles a uniform one. Looking
at the number of MCVs, it jumps around a lot between patches, but the
estimates don't vary much:

Master: 100
RSE: 1
V2: 100
V2 with 2.5 cutoff: 100
V2 with 3.0 cutoff: 32


--
Thoughts and future testing:
I think the V2 patch strikes a good balance.
I'm also partial to the variant with the 2.5 cutoff in sample standard
deviation, since it's much more aggressive about excluding values for
uniform distributions. It's almost as good as plain V2 at including
values in non-uniform distributions, but possibly enough worse that
it's not the best choice.

Time is running out, but some things I'd like to look at:

-As mentioned above, better noise reduction in my data analysis.
-The continuity-corrected Wald interval Dean mentioned [1]
-I wonder if we can actually untie the max size of the MCV list from
the stats target, and use a hard-coded maximum like 1000. That might
allow us to bump the max stat target without hurting planning time.
Perhaps next cycle.
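
For context, the MCV list size is currently capped by the per-column
statistics target, so the only knob today is something like the
following (table and column names made up):

  ALTER TABLE t ALTER COLUMN c SET STATISTICS 1000;
  ANALYZE t;

Raising the target this way enlarges both the possible MCV list and the
histogram, which is the planning-time cost the idea above is trying to
avoid.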


-John Naylor

[1] 
https://www.postgresql.org/message-id/CAEZATCVHEEg%2BCrP%2B-0JUsVeNPu5rs_S23oJVeH4VF%3DfgDwhfLQ%40mail.gmail.com
#!/usr/bin/python

import datetime
import math
import numpy
import psycopg2
import random
import re
import sys

HEAD_PORT = 5430
RSE_PORT = 5431
SSD_20_PORT = 5432
SSD_25_PORT = 5425
SSD_30_PORT = 5433

TEST_DATA_FILE = '/tmp/mcv-test-data'
NUMBER_OF_ANALYSE_TESTS = 10

conn0 = psycopg2.connect('port=%d dbname=john user=john' % HEAD_PORT)
conn0.set_session(autocommit = True)
conn1 = psycopg2.connect('port=%d dbname=john user=john' % RSE_PORT)
conn1.set_session(autocommit = True)
conn20 = psycopg2.connect('port=%d dbname=john user=john' % SSD_20_PORT)
conn20.set_session(autocommit = True)
conn25 = psycopg2.connect('port=%d dbname=john user=john' % SSD_25_PORT)
conn25.set_session(autocommit = True)
conn30 = psycopg2.connect('port=%d dbname=john user=john' % SSD_30_PORT)
conn30.set_session(autocommit = True)

branches = {}
branches['master'] = conn0.cursor()
branches['RSE'] = conn1.cursor()
branches['V2_20'] = conn20.cursor()
branches['V2_25'] = conn25.cursor()
branches['V2_30'] = conn30.cursor()

# Store test results in the master DB.
result_store = branches['master']

# Create one results table

Re: Online enabling of checksums

2018-03-18 Thread Michael Banck
Hi,

On Thu, Mar 15, 2018 at 02:01:26PM +0100, Daniel Gustafsson wrote:
> > On 10 Mar 2018, at 16:09, Michael Banck  wrote:
> > I am aware that this is discussed already, but as-is the user experience
> > is pretty bad, I think pg_enable_data_checksums() should either bail
> > earlier if it cannot connect to all databases, or it should be better
> > documented.
> 
> Personally I think we should first attempt to solve the "allow-connections in
> background workers” issue which has been raised on another thread.  For now 
> I’m
> documenting this better.

I had a look at that thread and it seems stalled, I am a bit worried
that this will not be solved before the end of the CF.

So I think unless the above gets solved, pg_enable_data_checksums()
should error out with the hint.  I've had a quick look and it seems one
can partly duplicate the check from BuildDatabaseList() (or optionally
move it) to the beginning of StartChecksumHelperLauncher(), see
attached.

That results in:

postgres=# SELECT pg_enable_data_checksums();
ERROR:  Database "template0" does not allow connections.
HINT:  Allow connections using ALTER DATABASE and try again.
postgres=# 

Which I think is much nicer than what we have right now:

postgres=# SELECT pg_enable_data_checksums();
 pg_enable_data_checksums 
--------------------------
 t
(1 row)

postgres=# \q
postgres@fock:~$ tail -3 pg1.log 
2018-03-18 14:00:08.512 CET [25514] ERROR:  Database "template0" does not allow connections.
2018-03-18 14:00:08.512 CET [25514] HINT:  Allow connections using ALTER DATABASE and try again.
2018-03-18 14:00:08.513 CET [24930] LOG:  background worker "checksum helper launcher" (PID 25514) exited with exit code 1

> Attached v4 of this patch, which addresses this review, and flipping status
> back in the CF app back to Needs Review.
 
Thanks!

The following errmsg() calls capitalize the error message without the first
word being a specific term, which I believe is not project style:

+   (errmsg("Checksum helper is currently running, cannot 
disable checksums"),
+   (errmsg("Database \"%s\" 
dropped, skipping", db->dbname)));
+   (errmsg("Checksums enabled, checksumhelper launcher 
shutting down")));
+   (errmsg("Database \"%s\" does not allow 
connections.", NameStr(pgdb->datname)),
+   (errmsg("Checksum worker starting for database oid %d", 
dboid)));
+   (errmsg("Checksum worker completed in database oid %d", 
dboid)));

Also, in src/backend/postmaster/checksumhelper.c there are a few
multi-line comments (which are not function comments) that do not have a
full stop at the end, which I believe project style also calls for:

+* Failed to set means somebody else started

Could be changed to a one-line (/* ... */) comment?

+* Force a checkpoint to get everything out to disk

Should have a full stop.

+* checksummed, so skip

Should have a full stop.

+* Enable vacuum cost delay, if any

Could be changed to a one-line comment?

+* Create and set the vacuum strategy as our buffer strategy

Could be changed to a one-line comment?

Apart from that, I previously complained about the error in
pg_verify_checksums:

+   fprintf(stderr, _("%s: %s, block %d, invalid checksum in file %X, calculated %X\n"),
+   progname, fn, blockno, header->pd_checksum, csum);

I still propose something like in backend/storage/page/bufpage.c's
PageIsVerified(), e.g.:

|fprintf(stderr, _("%s: checksum verification failed in file \"%s\", block %d: calculated checksum %X but expected %X\n"),
|progname, fn, blockno, csum, header->pd_checksum);

Otherwise, I had a quick look over v4 and found no further issues.
Hopefully I will be able to test it on some bigger test databases next
week.

I'm switching the state back to 'Waiting on Author'; if you think the
above points are moot, maybe switch it back to 'Needs Review' as Andrey
Borodin also marked himself down as reviewer and might want to have
another look as well.


Cheers,

Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer
diff --git a/src/backend/postmaster/checksumhelper.c b/src/backend/postmaster/checksumhelper.c
index 9186dbea22..62d5b7b9c8 100644
--- a/src/backend/postmaster/checksumhelper.c
+++ b/src/backend/postmaster/checksumhelper.c
@@ -88,6 +88,25 @@ StartChecksumHelperLauncher(int cost_delay, int cost_limit)
 {
 	BackgroundWorker bgw;
 	BackgroundWorkerHandle *bgw_handle;
+	HeapTuple	tup;
+	Relation	rel;
+	HeapScanDesc scan;
+
+	/*
+	 * Check that all databases allow connectio

Re: Online enabling of checksums

2018-03-18 Thread Andrey Borodin
Hi!

> On 18 March 2018, at 19:02, Michael Banck  wrote:
> 
> Otherwise, I had a quick look over v4 and found no further issues.
> Hopefully I will be able to test it on some bigger test databases next
> week.
> 
> I'm switching the state back to 'Waiting on Author'; if you think the
> above points are moot, maybe switch it back to 'Needs Review' as Andrey
> Borodin also marked himself down as reviewer and might want to have
> another look as well.

Yep, I'm already doing another pass over the code. Hope to finish tomorrow.
My 2 cents: there's a typo in the word "connections":
+ template0 is by default not accepting connetions, to

Best regards, Andrey Borodin.


Re: Recently-introduced segfault in initdb?

2018-03-18 Thread Isaac Morland
On 18 March 2018 at 05:57, Alvaro Herrera  wrote:

> Isaac Morland wrote:
> > OK, I must have done something wrong with the bisect the first time. Now
> > I'm getting the following as the problem commit:
> >
> > fd1a421fe66173fb9b85d3fe150afde8e812cbe4 is the first bad commit
>
> Did you run "make distclean" before git-pulling?  If not, maybe what you
> have is an incomplete rebuild after the code update.  Unless you have
> --enable-depend in your configure line ...?
>

Thank you, and sorry everybody for the noise. Yes, I just did a "make
distclean" and then the build worked. Which still leaves the question of
why some commits work and others don't, but some mysteries just aren't
worth figuring out. I actually remember reading about distclean but didn't
think of it when I actually ran into the problem. Thanks again.


Re: [GSoC 2018] Proposal Draft

2018-03-18 Thread Andrey Borodin
Hi Kefan!

> On 18 March 2018, at 5:34, Kefan Yang  wrote:
> 
> I am Kefan Yang, a third-year Computing Science student from Simon Fraser 
> University, Canada. I am very interested in the sorting algorithm 
> benchmarking and implementation issue you mentioned on the idealist of Google 
> Summer of Code 2018. 
> 
> I am currently working on my proposal, but I am a little bit worried about 
> whether I am on the right track. I've encountered some problems in the 
> implementation of the benchmark of those sorting algorithms. I briefly 
> described my current ideas in the proposal draft, but they may change a bit 
> as I read more research papers. I sincerely hope to get some feedback from 
> you to improve my proposal, especially for the implementation part.
> 
> I've attached a proposal draft to this email. It's a brief one but I guess 
> it's enough to show my current ideas:)

Peter has already added some context; I'll comment a bit on the contents of
your proposal.

I expect implementations of both timsort (dual-pivot qsort) and introsort
(depth-limited qsort). MySQL recently switched to introsort (they just used the
STL version instead of their own qsort).
Sorting routines are an important part of many DBMS components. But, indeed,
sorting of actual user data is only distantly related to qsort - often the
DBMS deals with external sorting.

There is a built-in Postgres way to benchmark - pgbench performance. But I do
not think its output will be statistically conclusive. I think that
benchmarking will be at least as time-consuming as the implementation itself.

BTW I think it would be good to add the Postgres hash table implementation to
this project too. Currently, we use Larson's linear hashing (HTAB struct,
dynahash.c), which is somewhat seldom used nowadays. It would be good to test
cuckoo hashing and some open-addressing schemes against the HTAB interface.

Please do not forget to add reviewing others' patches to your proposal. If you
produce some useful output in patch form and you want it committed, you will
have to review someone else's patch. This is required by the Postgres dev
process.

Best regards, Andrey Borodin.

Re: Recently-introduced segfault in initdb?

2018-03-18 Thread Tom Lane
Isaac Morland  writes:
> Thank you, and sorry everybody for the noise. Yes, I just did a "make
> distclean" and then the build worked.

Yeah, this is something most of us have learned the hard way over
time ;-).  Either use --enable-depend, or religiously do "make
distclean" before any git pull, git bisect, etc.  I prefer the
latter because it avoids leaving junk files laying around, such
as .o files that should exist in one version but not another;
but if you're less annoyed by that sort of thing than I am,
--enable-depend is a good alternative.

In any case, using ccache is recommendable if your platform doesn't
already set you up to use that automatically.  Also, if you choose
the "make distclean" approach, you're going to be re-running
configure a lot, so it's good to use --cache-file with that.
These together with "make -j" will get the rebuild time down to
maybe 15 seconds on modern hardware.

regards, tom lane



[GSoC] Proposal Submission

2018-03-18 Thread Christos Maris
Hi all,

I have submitted my proposal and sent an email asking for feedback in the
last 24 hours, but I didn't get any.

Could you please give me some feedback on how to improve it?

Thanks a lot in advance!


Re: Google Summer of Code: Potential Applicant

2018-03-18 Thread Christos Maris
Has this email been sent to the list successfully?

I didn't get any replies!

On Sat, Mar 17, 2018 at 6:12 PM, Christos Maris 
wrote:

> I just submitted my draft proposal.
>
> Could you please take a look at it and give me any feedback on how to
> improve it?
>
> Thanks a lot in advance!
>
> On Wed, Mar 14, 2018 at 10:12 AM, Aleksander Alekseev <
> a.aleks...@postgrespro.ru> wrote:
>
>> Hello Stephen,
>>
>> > > Protobuf is fine too, but unfortunately we don't have any
>> > > Protobuf-related projects this time.
>> >
>> > Just to be clear, the list on the wiki is just a set of suggestions-
>> > students are welcome to propose their own projects as well.
>>
>> Oh, I didn't know that. In this case I will be happy to see any sorts of
>> projects related to Protobuf or any other binary format - Thrift, Avro,
>> Cap'n'Proto, MessagePack, etc!
>>
>> --
>> Best regards,
>> Aleksander Alekseev
>>
>
>


Re: Google Summer of Code: Potential Applicant

2018-03-18 Thread Andrey Borodin
Hi Christos!

> On 18 March 2018, at 22:09, Christos Maris  wrote:
> 
> Has this email been sent to the list successfully?
> 
> I didn't get any replies!
> 
> On Sat, Mar 17, 2018 at 6:12 PM, Christos Maris  > wrote:
> I just submitted my draft proposal.
> 
> Could you please take a look at it and give me any feedback on how to improve 
> it?
> 
> Thanks a lot in advance!
Yes, this message landed on pgsql-hackers fine.
You sent a message on Saturday evening; why do you expect an immediate
answer? Please be patient. In some cases, expecting an answer within a week
is optimistic. It is worth investing some time to make your messages clear
and precise, to shorten reply time and avoid additional clarification
roundtrips.

Did you submit your proposal to GSoC website? If so - mentors are not 
registered on that site yet.

Best regards, Andrey Borodin.

Re: Google Summer of Code: Potential Applicant

2018-03-18 Thread Christos Maris
I am very sorry I didn't know that. It's just that I really want to improve
my proposal as much as possible.

Should I send you my proposal here?

BTW it is on the project: implementation and benchmarking of sorting
algorithms

On Sun, Mar 18, 2018, 7:55 PM Andrey Borodin  wrote:

> Hi Christos!
>
> On 18 March 2018, at 22:09, Christos Maris  wrote:
>
> Has this email been sent to the list successfully?
>
> I didn't get any replies!
>
> On Sat, Mar 17, 2018 at 6:12 PM, Christos Maris <
> christos.c.ma...@gmail.com> wrote:
>
>> I just submitted my draft proposal.
>>
>> Could you please take a look at it and give me any feedback on how to
>> improve it?
>>
>> Thanks a lot in advance!
>>
> Yes, this message landed on pgsql-hackers fine.
> You sent a message on Saturday evening; why do you expect an immediate
> answer? Please be patient. In some cases, expecting an answer within a
> week is optimistic. It is worth investing some time to make your messages
> clear and precise, to shorten reply time and avoid additional clarification
> roundtrips.
>
> Did you submit your proposal to GSoC website? If so - mentors are not
> registered on that site yet.
>
> Best regards, Andrey Borodin.
>


Lexically-scoped options

2018-03-18 Thread Chapman Flack
The SQL standard overloads WITH in a query to supply not only CTEs
but also lexically-scoped option settings:

 WITH XMLBINARY BASE64, foo(bar) AS (VALUES('\xDEADBEEF'::bytea))
 SELECT XMLELEMENT(name foo, XMLATTRIBUTES(bar)) FROM foo;

PG already implements XMLBINARY and XMLOPTION using the GUC system.
Would it be easy or hard, reasonable or objectionable, to generalize
PG's WITH syntax along these lines, rewriting it into lexically-scoped
settings of GUCs?
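
For comparison, the closest thing available today is transaction-scoped
rather than lexically-scoped, e.g. (a sketch using the existing xmlbinary
GUC):

  BEGIN;
  SET LOCAL xmlbinary = 'base64';  -- scoped to the transaction, not the query
  SELECT XMLELEMENT(name foo, XMLATTRIBUTES(bar))
  FROM (VALUES ('\xDEADBEEF'::bytea)) AS foo(bar);
  COMMIT;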

Perhaps a further development in the same line would be enabling the planner
to choose among IMMUTABLE specializations of a function, as well as a
generic STABLE version that depends on a GUC.

-Chap



include/pgtar.h is missing include guards?

2018-03-18 Thread Andres Freund
Hi Magnus,

Was $subject intentional?

Greetings,

Andres Freund



Re: Implementing SQL ASSERTION

2018-03-18 Thread David Fetter
On Sun, Mar 18, 2018 at 12:29:50PM +, Joe Wildish wrote:
> > 
> >> 
> >> This patch no longer applies.  Any chance of a rebase?
> >> 
> > 
> 
> 
> Attached is a rebased version of this patch. It takes into account the ACL 
> checking changes and a few other minor amendments.

Thanks!

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: IndexJoin memory problem using spgist and boxes

2018-03-18 Thread Anton Dignös
Hi,

attached is the patch that uses two memory contexts.
One for calling the inner consistent function,
and a new one for keeping the traversal memory of the inner consistent function.

I ran some tests to compare the memory footprints. I report the total maximum
memory usage (sum of all children) of the per-query memory context that is
the parent of all the memory contexts used. The test datasets are described
below.

 TEST |  HEAD | previous patch | patch V2
------+-------+----------------+----------
 T1   | 3.4GB | 98kB   | 81kB
 T2   | 3.4GB | 17MB   | 17MB
 T3   | 3.5GB | 17MB   | 17MB
 T4   | 7GB   | 17MB   | 17MB
 T5   | 8GB   | 34MB   | 25MB

T1: 1M x 1M tuples with relatively few overlaps
T2: as T1, but with 1 tuple in the outer relation for which the entire
index reports matches
T3: as T1, but with 10 tuples in the outer relation for which the
entire index reports matches
T4: as T3, but the outer relation has double the number of tuples
T5: as T4, but the inner relation has double the number of tuples

TEST dataset creation queries (executed incrementally)

T1:
-- create table r with 1M tuples
CREATE TABLE r AS SELECT 1 i, box(point(generate_series,
generate_series), point(generate_series+10, generate_series+10)) FROM
generate_series(1, 1000000);
-- create table s with 1M tuples
CREATE TABLE s AS SELECT 1 i, box(point(generate_series,
generate_series), point(generate_series+10, generate_series+10)) FROM
generate_series(1, 1000000) ORDER BY random(); -- random sort just
speeds up index creation
CREATE INDEX s_idx ON s USING spgist(box);

T2:
-- inserts a tuple for which the entire index is a match
INSERT INTO r VALUES (2, box(point(1, 1), point(100, 100)));

T3:
-- inserts 9 more tuples as in T2.

T4:
-- doubles the outer relations
INSERT INTO r SELECT * FROM r;

T5:
-- doubles the inner relation
INSERT INTO s SELECT * FROM s;


The new patch is a bit better in terms of memory by using the two
memory contexts.
I also checked the query times using EXPLAIN ANALYZE; both patches have
approximately the same runtime, but run 5-6 times faster than HEAD.

Best regards,
Anton


spgist-idxscan-mem-fix-V2.patch
Description: Binary data


jsonb nesting level edge case

2018-03-18 Thread Dmitry Dolgov
Hi,

I've just realized that there seems to be one edge case in the current jsonb
implementation that can be quite confusing, and I couldn't find any related
discussion about it. From what I see there is no limit on how many nested
levels there can be in a jsonb field, and e.g. when a jsonb is created from a
string we're basically limited only by `check_stack_depth` (in the
following case it's the recursive `parse_object`/`parse_object_field`). So
you can create a jsonb with quite a lot of nesting levels:

=# insert into test_jsonb values((
(select '{' || string_agg('"field' || s.id || '"', ': {')
from generate_series(1, 1) as s(id))
|| ': "some_value"' ||
(select string_agg('}', '') from generate_series(1, 1)))::jsonb);

INSERT 0 1
Time: 29.129 ms

But at the same time `jsonb_set` apparently has a different recursion scheme,
and reaches max_stack_depth faster (in this case it's the recursive
`setPath`/`setPathObject`). It means that you can create a document, but you
can't update its values using the function that was designed for that (so you
probably need to overwrite the entire jsonb to actually update something):

=# update test_jsonb set data = jsonb_set(data,
(select array_agg('field' || s.id) from generate_series(1,
1) as s(id)),
'"other_value"');

ERROR:  54001: stack depth limit exceeded
HINT:  Increase the configuration parameter "max_stack_depth"
(currently 2048kB), after ensuring the platform's stack depth limit is
adequate.
LOCATION:  check_stack_depth, postgres.c:3163
Time: 17.143 ms

Is it something significant enough to worry about? Just to mention, in some
other databases there is just a limit for number of nested levels for a
document (e.g. in MongoDB Bson, MySQL binary json it's exactly 100).



ECPG installcheck tests fail if PGDATABASE is set

2018-03-18 Thread Andres Freund
Hi,

I got a bit confused running installcheck-world and seeing ecpg
failures like:
 [NO_PID]: ECPGconnect: opening database  on  port   
for user regress_ecpg_user2
 [NO_PID]: sqlca: code: 0, state: 0
-[NO_PID]: ECPGconnect: could not open database: FATAL:  database 
"regress_ecpg_user2" does not exist
-
-[NO_PID]: sqlca: code: 0, state: 0
 [NO_PID]: ecpg_finish: connection main closed
 [NO_PID]: sqlca: code: 0, state: 0
-[NO_PID]: raising sqlcode -402 on line 43: could not connect to database 
"" on line 43
-[NO_PID]: sqlca: code: -402, state: 08001
-[NO_PID]: raising sqlcode -220 on line 44: connection "main" does not exist on 
line 44
-[NO_PID]: sqlca: code: -220, state: 08003
 [NO_PID]: ECPGconnect: opening database ecpg2_regression on  port 
  for user regress_ecpg_user1

A bit of pondering pointed me towards my environment's
PGDATABASE=postgres being to blame. Unsetting that makes the test
succeed.

Do we consider that an unsupported configuration?  It seems like a
fairly reasonable thing to want to support imo.

Greetings,

Andres Freund



Re: [HACKERS] AdvanceXLInsertBuffer vs. WAL segment compressibility

2018-03-18 Thread Daniel Gustafsson
> On 18 Mar 2018, at 03:09, Tom Lane  wrote:
> 
> Chapman Flack  writes:
>> Thanks for the review. I notice that cfbot has now flagged the patch as
>> failing, and when I look into it, it appears that cfbot is building with
>> your test patch, and without the xlog.c patch, and so the test naturally
>> fails. Does the cfbot require both patches to be attached to the same
>> email, in order to include them both?
> 
> I believe so --- AFAIK it does not know anything about dependencies
> between different patches, and will just try to build whatever patch(es)
> appear in the latest email on a given thread.  Munro might be able to
> provide more detail.

Right, I should’ve realized when I didn’t include your original patch as well,
sorry about that.  Now we know that the proposed test fails without the patch
applied and clears with it, that was at least an interesting side effect =)

cheers ./daniel


Flexible permissions for REFRESH MATERIALIZED VIEW

2018-03-18 Thread Isaac Morland
This is a proposal for a Postgres feature enhancement. I've attached a
preliminary patch. However, the patch is extremely preliminary: there is no
documentation or testing change, and I think I actually want to make the
change itself in a different way from what this 2-line patch does.

Right now I'm really looking for whether anybody observes any problems with
the basic idea. If it's considered to be at least in principle a good idea
then I'll go and make a more complete patch.

The original idea was to allow access to REFRESH MATERIALIZED VIEW to be a
grantable permission, rather than being reserved to the table owner. I
found that permission checking is done in RangeVarCallbackOwnsTable(),
which is also used for CLUSTER and REINDEX.

My initial implementation was to duplicate RangeVarCallbackOwnsTable() and
make a new version just for REFRESH, but then I realized that allowing
CLUSTER and REINDEX to be grantable permissions also might make sense so
the attached patch just modifies RangeVarCallbackOwnsTable().

The patch so far uses TRUNCATE permission to control access as a
proof-of-concept. However, I am considering whether we could add a new
REFRESH permission ('R'?) on tables (including materialized views) to
control access. Of course, we would also rename RangeVarCallbackOwnsTable()
to accurately describe its new function.
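
To make that concrete, the intended usage would be something like the
following (hypothetical syntax; no REFRESH privilege exists today):

  GRANT REFRESH ON big_report_mv TO reporting_role;
  -- a member of reporting_role could then run:
  REFRESH MATERIALIZED VIEW big_report_mv;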

When I first had the idea, one concern I had was what would happen to the
security context during REFRESH, but it turns out that checking whether the
operation is allowed and actually setting up the context are completely
separate, I think so that REFRESH triggered by superuser and by the owner
(or for that matter by a role which is a member of the owner) result in the
same security context. So the same should apply if we allow other users to
do a REFRESH.

I think backward compatibility is pretty good. If the feature is ignored
and no REFRESH permissions are granted, then it should work out to the same
as what we have now: owner and superuser are considered to have all table
permissions. pg_upgrade might need to upgrade explicit owner permissions -
I'm not yet absolutely clear on how those work. Anybody who wants the new
feature would be able to use it by granting the new permission, while for
anybody who doesn't need it things should continue working as before, with
the only difference being the exact error message resulting from a
permission violation.

I think the feature is pretty well justified in terms of the overall
Postgres authorization model too. DDL can in general be changed only by
object owners: e.g., renaming, altering columns, that sort of thing.
Changing relation contents, however, is a grantable privilege...but not
when it comes to refreshing materialized views or clustering or reindexing
indexes. So this just makes it so that more non-DDL changes are grantable.
Arguably CLUSTER should require the owner to change on which index the
table is clustered, but my inclination is not to have that additional
complication.

I welcome any comments, questions, and suggestions.


matview-permissions-1.patch
Description: Binary data


Re: include/pgtar.h is missing include guards?

2018-03-18 Thread Magnus Hagander
On Sun, Mar 18, 2018 at 8:24 PM, Andres Freund  wrote:

> Hi Magnus,
>
> Was $subject intentional?
>

Not that I can recall, and I can't see why. I think that's just a mistake.


-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: strange failure in plpgsql_control tests (on fulmar, ICC 14.0.3)

2018-03-18 Thread Thomas Munro
On Sun, Mar 18, 2018 at 7:33 AM, Andres Freund  wrote:
> On 2018-03-17 14:20:26 -0400, Tom Lane wrote:
>> It might be worth studying the icc manual to see if it has an
>> equivalent of -fwrapv.
>
> Yes.
>
> A *quick* look through https://software.intel.com/en-us/node/522795
> unfortunately didn't show anything.

Apparently it does support -fno-strict-overflow.  Is that useful here?

-- 
Thomas Munro
http://www.enterprisedb.com



Re: strange failure in plpgsql_control tests (on fulmar, ICC 14.0.3)

2018-03-18 Thread Thomas Munro
On Mon, Mar 19, 2018 at 10:32 AM, Thomas Munro
 wrote:
> On Sun, Mar 18, 2018 at 7:33 AM, Andres Freund  wrote:
>> On 2018-03-17 14:20:26 -0400, Tom Lane wrote:
>>> It might be worth studying the icc manual to see if it has an
>>> equivalent of -fwrapv.
>>
>> Yes.
>>
>> A *quick* look through https://software.intel.com/en-us/node/522795
>> unfortunately didn't show anything.
>
> Apparently it does support -fno-strict-overflow.  Is that useful here?

Hmm.  This was already discussed here:

https://www.postgresql.org/message-id/51016409.2020808%40gmail.com

Noah seemed to be at least slightly in favour of considering turning
it on despite doubt about its precise meaning, but also pointed out
that even -fwrapv doesn't mean exactly the same thing in GCC and
Clang.

Curiously -fno-strict-overflow doesn't seem to appear in the
documentation that Andres posted (well I couldn't find it quickly,
anyway), but we can see that it exists from this interaction between
Xi Wang and Intel compiler engineering:

https://software.intel.com/en-us/forums/intel-c-compiler/topic/358200

And we can see a reference here:

https://www.spec.org/cpu2017/flags/Intel-ic18.0-official-linux64.html

-- 
Thomas Munro
http://www.enterprisedb.com



Re: [HACKERS] AdvanceXLInsertBuffer vs. WAL segment compressibility

2018-03-18 Thread Chapman Flack
On 03/18/18 16:56, Daniel Gustafsson wrote:
> sorry about that.  Now we know that the proposed test fails without the patch
> applied and clears with it, that was at least an interesting side effect =)

It was, and it got me looking at the test, and even though it does detect
the difference between patch-applied and patch-not-applied, I sort of wonder
if it does what it claims to. It seems to me that unpack('N8192', ...)
would want to return 8192 32-bit ints (in array context), but really only
be able to return 2048 of them (because it's only got 8192 bytes to unpack),
and then being used in scalar context, it only returns the first one anyway,
so the test only hinges on whether the first four bytes of the block are
zero or not. Which turns out to be enough to catch a non-zeroed header. :)

What would you think about replacing the last two lines with just

  ok($bytes =~ /\A\x00*+\z/, 'make sure wal segment is zeroed');

-Chap



Fixing some issues in matview.c's refresh-query construction

2018-03-18 Thread Tom Lane
While looking at the performance problem Jeff Janes reported recently[1],
I noticed several other pre-existing deficiencies in the way that
refresh_by_match_merge() generates its query for constructing the diff
table:

1. It doesn't require the selected unique index(es) to be indimmediate.
Perhaps this isn't a bug, but I don't really see why: if they're
deferred constraints, then they might not represent conditions that
actually hold right now.

2. It doesn't pay attention to the particular equality semantics enforced
by a given index, but just assumes that they must be those of the column
datatype's default btree opclass.  The datatype might not even *have* a
default btree opclass.  We must instead look up, and use, the equality
operator associated with the index's actual opclass.

3. It doesn't check that the indexes are btrees.  Even though there are no
other unique-capable index AMs today (at least not in core), I think it's
a good idea to insist on btrees.  We couldn't be sure that the planner
could implement FULL JOIN with an operator that is not btree equality, nor
do we have a convention for identifying what is the equality operator for
an index opclass if it's not btree.

4. It's insufficiently careful to ensure that the parser will pick the
intended operator when parsing the query.  It's not enough to use
OPERATOR(schema.op) notation; you have to also be careful to cast the
inputs if they're not already of the operator's input types.  The correct
way to do this is the way that the ri_triggers.c code does it; see the
sketch just after this list.  (This would have been a security bug
before CVE-2018-1058.)

5. While not actually a bug, I didn't like the fact that the conditions
for an index being usable for the match merge were written out in two
places.  That's too easy to break.  I also didn't like the fact that
refresh_by_match_merge() would segfault if it came across an index on a
system column, such as the OID column.  That's just latent, since matviews
can't have OID columns today, but we really ought to check that we are
considering only indexes on user columns.
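
To illustrate point 4 with a self-contained sketch (made-up table and
column names; the columns are varchar, so the btree opclass's equality
operator is really pg_catalog.= on text, and both inputs get cast):

  CREATE TEMP TABLE newdata (col varchar);
  CREATE TEMP TABLE mv (col varchar);
  -- Unsafe: "newdata.col = mv.col" lets the parser resolve "=" against the
  -- search path.  Safe, in the style of ri_triggers.c:
  SELECT *
  FROM newdata
  FULL JOIN mv
    ON CAST(newdata.col AS pg_catalog.text)
       OPERATOR(pg_catalog.=) CAST(mv.col AS pg_catalog.text);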

The attached patch rectifies these problems.  Rather than duplicating
ri_triggers.c's code for emitting a safe equality clause, I thought the
best way to proceed was to have ruleutils.c export it for use by the
other modules.  That's where we have most query-deconstruction code,
so just exporting it from ri_triggers.c seemed a bit too historical.

Barring objections, I propose to back-patch this as a bug fix.

regards, tom lane

[1] 
https://postgr.es/m/CAMkU=1z-jogymhneghar1cru4f1xdfhqjdzxp_ctk5cl3do...@mail.gmail.com

diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index ab6a889..7159a27 100644
*** a/src/backend/commands/matview.c
--- b/src/backend/commands/matview.c
***
*** 21,26 
--- 21,28 
  #include "catalog/catalog.h"
  #include "catalog/indexing.h"
  #include "catalog/namespace.h"
+ #include "catalog/pg_am.h"
+ #include "catalog/pg_opclass.h"
  #include "catalog/pg_operator.h"
  #include "commands/cluster.h"
  #include "commands/matview.h"
***
*** 40,46 
  #include "utils/rel.h"
  #include "utils/snapmgr.h"
  #include "utils/syscache.h"
- #include "utils/typcache.h"
  
  
  typedef struct
--- 42,47 
*** static void transientrel_shutdown(DestRe
*** 62,75 
  static void transientrel_destroy(DestReceiver *self);
  static uint64 refresh_matview_datafill(DestReceiver *dest, Query *query,
  		 const char *queryString);
- 
  static char *make_temptable_name_n(char *tempname, int n);
- static void mv_GenerateOper(StringInfo buf, Oid opoid);
- 
  static void refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
  	   int save_sec_context);
  static void refresh_by_heap_swap(Oid matviewOid, Oid OIDNewHeap, char relpersistence);
! 
  static void OpenMatViewIncrementalMaintenance(void);
  static void CloseMatViewIncrementalMaintenance(void);
  
--- 63,73 
  static void transientrel_destroy(DestReceiver *self);
  static uint64 refresh_matview_datafill(DestReceiver *dest, Query *query,
  		 const char *queryString);
  static char *make_temptable_name_n(char *tempname, int n);
  static void refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
  	   int save_sec_context);
  static void refresh_by_heap_swap(Oid matviewOid, Oid OIDNewHeap, char relpersistence);
! static bool is_usable_unique_index(Relation indexRel);
  static void OpenMatViewIncrementalMaintenance(void);
  static void CloseMatViewIncrementalMaintenance(void);
  
*** ExecRefreshMatView(RefreshMatViewStmt *s
*** 230,252 
  		{
  			Oid			indexoid = lfirst_oid(indexoidscan);
  			Relation	indexRel;
- 			Form_pg_index indexStruct;
  
  			indexRel = index_open(indexoid, AccessShareLock);
! 			indexStruct = indexRel->rd_index;
! 
! 			if (indexStruct->indisunique &&
! IndexIsValid(indexStruct) &&
! RelationGetIndexExpressions(indexRel) == NIL

Re: MCV lists for highly skewed distributions

2018-03-18 Thread Dean Rasheed
On 18 March 2018 at 12:24, John Naylor  wrote:
> Over the weekend I tried out a test to measure:
> -The size of the MCV list
> -The ratio between actual and estimated cardinality of the least
> common value in the MCV list.
> -The ratio between actual and estimated cardinality of the most common
> value not in the MCV list.

Nice tests!


> Summary for non-uniform distributions:
> Note: There's a lot of variation in the estimation of the most common
> non-MCV. About 1% of all ANALYZE runs apparently failed to sample one
> of the good candidates for the MCV list, resulting in huge
> underestimates. It didn't matter what patch was applied. For future
> tests, I'll do more runs and measure a truncated mean. Looking at the
> number of MCVs is much more stable, but unlike the uniform
> distribution case, we don't have an intuitive feel for what that
> number should be. That said,
>
> -The V2 patch is somewhat better than RSE and much better than master
> at estimating the most common value not in the MCV list.

Yes, that matches my observations too. It stems from the fact that the
V2 patch tends to produce more MCVs than the RSE patch (which in turn
tends to produce more MCVs than master) for non-uniform distributions.
More MCVs then improve the estimates for non-MCVs.


> -Bumping the sample stddev threshold to 3 seems to behave similarly to
> the RSE patch, maybe slightly better.

In a way, that's probably not surprising. The two algorithms are
closely related. If the non-MCV frequencies are quite low, the v2
patch with a 3-SD threshold behaves like the RSE patch with a 33% RSE
cutoff. The 20% RSE cutoff patch is mathematically equivalent to the
v2 patch with a 5-SD threshold and the non-MCV frequencies all set to
zero, if that makes sense.
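
Spelling that correspondence out (just the algebra, not code from either
patch): with sample count $c$ for a candidate value and standard error
$\sigma_c$ of that count, and with the non-MCV base frequency taken as
zero, the two selection rules coincide:

  $\sigma_c / c < 1/K \;\Longleftrightarrow\; c > K\,\sigma_c
   \;\Longleftrightarrow\; c > 0 + K\,\sigma_c$

so $K = 5$ matches the 20% RSE cutoff and $K = 3$ roughly matches a 33%
cutoff.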


> Summary for uniform distributions:
> Note 1: Using the average is not really adequate for testing
> under/overestimation of values in my test setup, since I set the
> estimation ratio to zero if there was either an empty MCV list or a
> completely full list. In future runs, I'll drill down a bit more
> precisely.
> That said, there didn't seem to be a huge amount of variation between the
> patches as far as either of the ratios I was testing. Looking at the
> number of MCVs is much better anyway, since we know we want either
> none or everything.
>
> Note 2: All patches fully populated the list when all the values fit
> in the MCV list. For the rest of the cases:
>
> -The RSE patch is not much better than master at excluding MCVs.

To make sense of that, you need to look at the errors in the
estimates. The RSE patch won't exclude MCVs if it thinks they will
give reasonably accurate estimates, so may produce fully populated MCV
lists for uniform distributions even when all the distinct values
don't fit. That's not ideal, but it isn't necessarily bad. In my
previous testing I found that it was good at excluding MCVs in cases
where they gave inaccurate estimates.


> -The V2 patch is much better than either of these, but still not ideal
> (up to 30)
> -The V2 patch with a cutoff of 2.5 severely cut down the MCVs (at most 3).
> -With a cutoff of 3.0, all are empty.

Makes sense, but I don't think we should necessarily put too much
emphasis on the uniform tests. Real-world datasets tend to be
non-uniform, and the consequences of too few MCVs tend to be worse
than too many.

I repeated these tests with a 2-SD continuity-corrected threshold and
the results fell somewhere between the 2-SD and 2.5-SD cases. My
inclination is to go with that, as something that strikes a reasonable
balance, and represents a distinct improvement over master in a number
of different areas.

I'll post something tomorrow, once I've finished wordsmithing the comments.

Regards,
Dean



HELP

2018-03-18 Thread Projat Banerjee
What type of proposal should I submit here? On what basis should I build my
proposal so that my chances of getting selected are high?

Sent from Mail for Windows 10



Re: Online enabling of checksums

2018-03-18 Thread Daniel Gustafsson
> On 18 Mar 2018, at 15:02, Michael Banck  wrote:
> On Thu, Mar 15, 2018 at 02:01:26PM +0100, Daniel Gustafsson wrote:
>>> On 10 Mar 2018, at 16:09, Michael Banck  wrote:
>>> I am aware that this is discussed already, but as-is the user experience
>>> is pretty bad, I think pg_enable_data_checksums() should either bail
>>> earlier if it cannot connect to all databases, or it should be better
>>> documented.
>> 
>> Personally I think we should first attempt to solve the "allow-connections in
>> background workers” issue which has been raised on another thread.  For now 
>> I’m
>> documenting this better.
> 
> I had a look at that thread and it seems stalled, I am a bit worried
> that this will not be solved before the end of the CF.
> 
> So I think unless the above gets solved, pg_enable_data_checksums()
> should error out with the hint.  I've had a quick look and it seems one
> can partly duplicate the check from BuildDatabaseList() (or optionally
> move it) to the beginning of StartChecksumHelperLauncher(), see
> attached.

I’ve incorporated a slightly massaged version of your patch.  While it is a
little ugly to duplicate the logic, it’s hopefully a short-term fix, and I
agree that silently failing is even uglier. Thanks for the proposal!

It should be noted that a database may well be altered to disallow connections
between the time the checksum helper starts (or gets the database list) and the
time it tries to checksum it, so we might still fail on this very issue with
the checks bypassed.  Still, catching the low-hanging fruit improves the UX,
of course.
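
For reference, the property in question is pg_database.datallowconn,
which an operator can flip at any time; roughly what the HINT amounts
to is:

  ALTER DATABASE template0 WITH ALLOW_CONNECTIONS true;
  SELECT pg_enable_data_checksums();
  -- ... wait for the helper to finish before flipping it back ...
  ALTER DATABASE template0 WITH ALLOW_CONNECTIONS false;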

> The following errmsg() capitalize the error message without the first
> word being a specific term, which I believe is not project style:

Fixed.

> Also, in src/backend/postmaster/checksumhelper.c there are few
> multi-line comments (which are not function comments) that do not have a
> full stop at the end, which I think is also project style:

I’ve addressed all of these, but I did leave one as a multi-line which I think
looks better as that.  I also did some spellchecking and general tidying up of
the error messages and comments in the checksum helper.

> Apart from that, I previously complained about the error in
> pg_verify_checksums:

Updated to your suggestion.  While in there I re-wrote a few other error
messages to be consistent in message and quoting.

> Otherwise, I had a quick look over v4 and found no further issues.
> Hopefully I will be able to test it on some bigger test databases next
> week.

Thanks a lot for your reviews!

cheers ./daniel



online_checksums5.patch
Description: Binary data


Re: Online enabling of checksums

2018-03-18 Thread Daniel Gustafsson
> On 18 Mar 2018, at 17:21, Andrey Borodin  wrote:
> 
>> On 18 March 2018, at 19:02, Michael Banck  wrote:
>> 
>> Otherwise, I had a quick look over v4 and found no further issues.
>> Hopefully I will be able to test it on some bigger test databases next
>> week.
>> 
>> I'm switching the state back to 'Waiting on Author'; if you think the
>> above points are moot, maybe switch it back to 'Needs Review' as Andrey
>> Borodin also marked himself down as reviewer and might want to have
>> another look as well.
> 
> Yep, I'm already doing another pass over the code. Hope to finish tomorrow.
> My 2 cents: there's a typo in the word "connections":
> + template0 is by default not accepting connetions, to

Fixed in patch just posted in 84693d0c-772f-45c2-88a1-85b4983a5...@yesql.se
(version 5). Thanks!

cheers ./daniel


Re: ECPG installcheck tests fail if PGDATABASE is set

2018-03-18 Thread Tom Lane
Andres Freund  writes:
> I got a bit confused running installcheck-world and seeing ecpg
> failures like:
> ...
> A bit of pondering pointed me towards my environment's
> PGDATABASE=postgres being to blame. Unsetting that makes the test
> succeed.

Hm ... pg_regress unsets PGDATABASE, along with the other related
environment variables, when it has a temp installation but not
when it doesn't.  So what I don't understand is why your environment
doesn't also break every other regression test besides ecpg.
Perhaps they're all connecting to "postgres", but it's hard to
believe there wouldn't be conflicts if so.

regards, tom lane



Re: ECPG installcheck tests fail if PGDATABASE is set

2018-03-18 Thread Andres Freund


On March 18, 2018 4:06:18 PM PDT, Tom Lane  wrote:
>Andres Freund  writes:
>> I got a bit confused running installcheck-world and seeing ecpg
>> failures like:
>> ...
>> A bit of pondering pointed me towards my environment's
>> PGDATABASE=postgres being to blame. Unsetting that makes the test
>> succeed.
>
>Hm ... pg_regress unsets PGDATABASE, along with the other related
>environment variables, when it has a temp installation but not
>when it doesn't.  So what I don't understand is why your environment
>doesn't also break every other regression test besides ecpg.
>Perhaps they're all connecting to "postgres", but it's hard to
>believe there wouldn't be conflicts if so.

All the others specify a database. The issue with the ecpg test is that it
doesn't for two test cases. The failure is caused by libpq guessing the db name
from the user name (hence the error about a database named after the role),
which doesn't happen if libpq sees the environment variable.

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: [HACKERS] AdvanceXLInsertBuffer vs. WAL segment compressibility

2018-03-18 Thread Daniel Gustafsson
> On 18 Mar 2018, at 22:54, Chapman Flack  wrote:
> 
> On 03/18/18 16:56, Daniel Gustafsson wrote:
>> sorry about that.  Now we know that the proposed test fails without the patch
>> applied and clears with it, that was at least an interesting side effect =)
> 
> It was, and it got me looking at the test, and even though it does detect
> the difference between patch-applied and patch-not-applied, I sort of wonder
> if it does what it claims to. It seems to me that unpack('N8192', ...)
> would want to return 8192 32-bit ints (in array context), but really only
> be able to return 2048 of them (because it's only got 8192 bytes to unpack),
> and then being used in scalar context, it only returns the first one anyway,
> so the test only hinges on whether the first four bytes of the block are
> zero or not. Which turns out to be enough to catch a non-zeroed header. :)

Good point, that's what I get for hacking without enough coffee.

> What would you think about replacing the last two lines with just
> 
>  ok($bytes =~ /\A\x00*+\z/, 'make sure wal segment is zeroed’);

It seems expensive to regex over BLCKSZ, but it’s probably the safest option
and it’s not a performance critical codepath.  Feel free to whack the test
patch over the head with the above diff.

cheers ./daniel


Re: ECPG installcheck tests fail if PGDATABASE is set

2018-03-18 Thread Tom Lane
Andres Freund  writes:
> On March 18, 2018 4:06:18 PM PDT, Tom Lane  wrote:
>> Hm ... pg_regress unsets PGDATABASE, along with the other related
>> environment variables, when it has a temp installation but not
>> when it doesn't.  So what I don't understand is why your environment
>> doesn't also break every other regression test besides ecpg.

> All the others specify a database. The issue with the ecpg test is that
> it doesn't for two test cases.

Ah.  Well, it doesn't seem unreasonable to want to test that case,
so I don't think "remove the test case" is the right answer.

Is it sane for pg_regress to unset PGDATABASE unconditionally?  Not
sure, but if we're generally always specifying a value, maybe that's
OK.

regards, tom lane



Re: ECPG installcheck tests fail if PGDATABASE is set

2018-03-18 Thread Andres Freund
Hi,

On 2018-03-18 19:30:33 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On March 18, 2018 4:06:18 PM PDT, Tom Lane  wrote:
> >> Hm ... pg_regress unsets PGDATABASE, along with the other related
> >> environment variables, when it has a temp installation but not
> >> when it doesn't.  So what I don't understand is why your environment
> >> doesn't also break every other regression test besides ecpg.
> 
> > All the others specify a database. The issue with the ecpg test is that
> > it doesn't for two test cases.
> 
> Ah.  Well, it doesn't seem unreasonable to want to test that case,
> so I don't think "remove the test case" is the right answer.

Right.


> Is it sane for pg_regress to unset PGDATABASE unconditionally?  Not
> sure, but if we're generally always specifying a value, maybe that's
> OK.

I'm not sure either.  I wonder whether we should just make ecpg's
pg_regress invocation do so?  That seems to be the way of least
resistance ;)

Greetings,

Andres Freund



Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2018-03-18 Thread Tomas Vondra
Hi,

Attached is an updated version of the patch series, addressing issues
pointed out by Alvaro. Let me go through the main changes:


1) I've updated / reworked the docs, including the XML docs. There were
some obsolete references to functions that got renamed later, and I've
also reworked some of the user-facing docs with the aim of meeting Alvaro's
suggestions. I've removed the references to READMEs etc., and at this
point I'm not sure I have a good idea how to improve this further ...


2) I got rid of the UPDATE_RESULT macro, along with counting the
matches. Initially I intended to just expand the macro and fix the match
counting (as mentioned in the FIXME), but I came to the conclusion it's
not really worth the complexity.

The idea was that by keeping the count of matching MCV items / histogram
buckets, we can terminate early in some cases. For example when
evaluating AND-clause, we can just terminate when (nmatches==0). But I
have no numbers demonstrating this actually helps, and furthermore it
was not implemented in histograms (well, we still counted the matches
but never terminated).

So I've just ripped that out and we can put it back later if needed.


3) Regarding the pg_mcv_list_items() and pg_histogram_buckets()
functions, it occurred to me that people can't really inject malicious
values because there are no casts to the custom data types used to store
MCV lists and histograms in pg_statistic_ext.

The other issue was the lack of knowledge of the data types for values
stored in the statistics. The code used the OID of the statistic to get this
information (by looking at the relation). But it occurred to me this
could be solved the same way the regular statistics solve it - by
storing the OIDs of the types. The anyarray does this automatically, but
there's no reason we can't do the same in pg_mcv_list and pg_histogram.

So I've done that, and the functions now take the custom data types
instead of the OID. I've also tweaked the documentation to use the
lateral syntax (good idea!) and added a new section into funcs.sgml.
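
So the documented usage now looks roughly like this (a sketch; the exact
catalog column and output column names are as defined in the patch and
may differ):

  SELECT m.*
  FROM pg_statistic_ext s,
       LATERAL pg_mcv_list_items(s.stxmcv) m
  WHERE s.stxname = 'stts';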


4) I've merged the 0001 and 0002 patches. The 0001 was not really a bug
fix, and it was a behavior change required by introducing the MCV list,
so merging it seems right.


5) I've moved some changes from the histogram patch to MCV. The original
patch series was structured so that it introduced some code in mcv.c and
then moved it into extended_statistic.c so that it can be shared. Now
it's introduced in mcv.c right away, which makes it easier to understand
and reduces the size of the patches.


6) I've fixed a bunch of comments, obsolete FIXMEs, etc.


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


0001-multivariate-MCV-lists.patch.gz
Description: application/gzip


0002-multivariate-histograms.patch.gz
Description: application/gzip


Re: [HACKERS] AdvanceXLInsertBuffer vs. WAL segment compressibility

2018-03-18 Thread Chapman Flack
On 03/18/18 19:28, Daniel Gustafsson wrote:
> It seems expensive to regex over BLCKSZ, but it’s probably the safest option
> and it’s not a performance critical codepath.  Feel free to whack the test
> patch over the head with the above diff.

Both patches in a single email for cfbot's enjoyment, coming right up.

-Chap
>From b2d36e3e4f4cd003cd7564a217bb15c19e1da5e2 Mon Sep 17 00:00:00 2001
From: Chapman Flack 
Date: Sun, 18 Mar 2018 20:03:04 -0400
Subject: [PATCH] Add test for ensuring WAL segment is zeroed out

Switch to a new WAL file and ensure the last block in the now
switched-away-from file is completely zeroed out.

This adds a mode to check_pg_config() where the match from the
regexp can be returned, not just the literal count of matches.
Only one match is returned (the first), but given the structure
of pg_config.h that's likely to be enough.
---
 .travis.yml  | 47 
 src/test/perl/TestLib.pm | 18 --
 src/test/recovery/t/002_archiving.pl | 19 ++-
 3 files changed, 81 insertions(+), 3 deletions(-)
 create mode 100644 .travis.yml

diff --git a/.travis.yml b/.travis.yml
new file mode 100644
index 000..386bb60
--- /dev/null
+++ b/.travis.yml
@@ -0,0 +1,47 @@
+
+sudo: required
+addons:
+  apt:
+packages:
+  - gdb
+  - lcov
+  - libipc-run-perl
+  - libperl-dev
+  - libpython-dev
+  - tcl-dev
+  - libldap2-dev
+  - libicu-dev
+  - docbook
+  - docbook-dsssl
+  - docbook-xsl
+  - libxml2-utils
+  - openjade1.3
+  - opensp
+  - xsltproc
+language: c
+cache: ccache
+before_install:
+  - echo '/tmp/%e-%s-%p.core' | sudo tee /proc/sys/kernel/core_pattern
+  - echo "deb http://archive.ubuntu.com/ubuntu xenial main" | sudo tee /etc/apt/sources.list.d/xenial.list > /dev/null
+  - |
+sudo tee -a /etc/apt/preferences.d/trusty > /dev/null /dev/null) ; do
+  binary=$(gdb -quiet -core $corefile -batch -ex 'info auxv' | grep AT_EXECFN | perl -pe "s/^.*\"(.*)\"\$/\$1/g")
+  echo dumping $corefile for $binary
+  gdb --batch --quiet -ex "thread apply all bt full" -ex "quit" $binary $corefile
+done
diff --git a/src/test/perl/TestLib.pm b/src/test/perl/TestLib.pm
index 7d017d4..8b9796c 100644
--- a/src/test/perl/TestLib.pm
+++ b/src/test/perl/TestLib.pm
@@ -227,15 +227,29 @@ sub append_to_file
 # that.
 sub check_pg_config
 {
-	my ($regexp) = @_;
+	my ($regexp, $value) = @_;
 	my ($stdout, $stderr);
+	my $match;
 	my $result = IPC::Run::run [ 'pg_config', '--includedir' ], '>',
 	  \$stdout, '2>', \$stderr
 	  or die "could not execute pg_config";
 	chomp($stdout);
 
 	open my $pg_config_h, '<', "$stdout/pg_config.h" or die "$!";
-	my $match = (grep {/^$regexp/} <$pg_config_h>);
+	if (defined $value)
+	{
+		while (<$pg_config_h>)
+		{
+			$_ =~ m/^$regexp/;
+			next unless defined $1;
+			$match = $1;
+			last;
+		}
+	}
+	else
+	{
+		$match = (grep {/^$regexp/} <$pg_config_h>);
+	}
 	close $pg_config_h;
 	return $match;
 }
diff --git a/src/test/recovery/t/002_archiving.pl b/src/test/recovery/t/002_archiving.pl
index e1bd3c9..9551db1 100644
--- a/src/test/recovery/t/002_archiving.pl
+++ b/src/test/recovery/t/002_archiving.pl
@@ -3,7 +3,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 1;
+use Test::More tests => 2;
 use File::Copy;
 
 # Initialize master node, doing archives
@@ -49,3 +49,20 @@ $node_standby->poll_query_until('postgres', $caughtup_query)
 my $result =
   $node_standby->safe_psql('postgres', "SELECT count(*) FROM tab_int");
 is($result, qq(1000), 'check content from archives');
+
+# Add some content, switch WAL file and read the last segment of the WAL
+# file which should be zeroed out
+my $current_file = $node_master->safe_psql('postgres',
+	"SELECT pg_walfile_name(pg_current_wal_lsn())");
+$node_master->safe_psql('postgres', "INSERT INTO tab_int VALUES (1)");
+$node_master->safe_psql('postgres', "SELECT pg_switch_wal()");
+
+my $xlog_blcksz = check_pg_config('#define XLOG_BLCKSZ (\d+)', 1);
+
+open my $fh, '<:raw', $node_master->data_dir . '/pg_wal/' . $current_file;
+# SEEK_END is defined as 2, but is not allowed when using strict mode
+seek $fh, ($xlog_blcksz * -1), 2;
+my $len = read($fh, my $bytes, $xlog_blcksz);
+close $fh;
+
+ok($bytes =~ /\A\x00*+\z/, 'make sure wal segment is zeroed');
-- 
2.7.3

From 35684bdaa1d27c3f4b7198541f3c92bb2b4cb2f4 Mon Sep 17 00:00:00 2001
From: Chapman Flack 
Date: Sun, 25 Feb 2018 11:44:47 -0500
Subject: [PATCH] Zero headers of unused pages after WAL switch.

When writing zeroed pages to the remainder of a WAL segment
after a WAL switch, ensure that the headers of those pages are
also zeroed, as their initialized values otherwise reduce the
compressibility of the WAL segment.

Re: Flexible permissions for REFRESH MATERIALIZED VIEW

2018-03-18 Thread Tom Lane
Isaac Morland  writes:
> The original idea was to allow access to REFRESH MATERIALIZED VIEW to be a
> grantable permission, rather than being reserved to the table owner.

I'm not really on board with making that a separately grantable
permission.  You can do what you need today by having the matview be owned
by a single-purpose group role and granting membership in that role as
needed.  We do not have an infinite supply of available privilege type
bits --- in fact, without breaking on-disk compatibility, there are only
four free bits in AclMode.  So I can't see using one of them up for
REFRESH, let alone using three of them up for REFRESH, CLUSTER and
REINDEX.
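
For context, the AclMode packing that makes those bits scarce looks roughly
like this (paraphrased from nodes/parsenodes.h as a sketch; the exact macro
list is from memory, not a quotation):

typedef uint32 AclMode;         /* a bitmask of privilege bits */

#define ACL_INSERT      (1<<0)
#define ACL_SELECT      (1<<1)
#define ACL_UPDATE      (1<<2)
#define ACL_DELETE      (1<<3)
#define ACL_TRUNCATE    (1<<4)
#define ACL_REFERENCES  (1<<5)
#define ACL_TRIGGER     (1<<6)
#define ACL_EXECUTE     (1<<7)
#define ACL_USAGE       (1<<8)
#define ACL_CREATE      (1<<9)
#define ACL_CREATE_TEMP (1<<10)
#define ACL_CONNECT     (1<<11)
#define N_ACL_RIGHTS    12      /* bits 12..15 are the only free ones */

/* The upper 16 bits hold the matching per-privilege grant-option flags,
 * which is why only four bits remain for entirely new privilege types. */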

> The patch so far uses TRUNCATE permission to control access as a
> proof-of-concept.

I can't see doing that either, TBH.  It's just ugly, and it surely doesn't
scale to cases where the conflated operations all apply to the same kind
of object.  (For instance, including CLUSTER permissions in TRUNCATE
wouldn't be very sane.)

It's conceivable that we could somehow allow new bits to get overlaid
onto bits currently used only for other object types, without that being
user-visible.  But I believe there are significant implementation issues
that'd have to be surmounted; for instance aclitemout has no idea what
sort of object the ACL it's printing is for.  Also, the ACL machinery
doesn't currently think that "matview" is a distinct type of object
from "table" anyway; it just sees 'em all as "relation".

Or, maybe I'm wrong and people feel that REFRESH-by-non-owner is important
enough to be worth consuming an AclMode bit for.  But I'm dubious.

> I think backward compatibility is pretty good.

No, actually, backwards compatibility would be bad.  If someone
had granted any privileges on their matview, which they almost certainly
would have if they cared about this scenario at all, then the owner's
privileges would have been instantiated in pg_class.relacl as
ACL_ALL_RIGHTS_RELATION, which doesn't (currently) include the
hypothetical new ACL_REFRESH bit.  So after upgrading, they'd have to
explicitly grant themselves the new REFRESH right before they could
refresh.  (I think pg_dump makes some attempt to cater for this case
by printing "GRANT ALL PRIVILEGES" in cases where all currently-extant
privileges are granted, but that's not very bulletproof; especially not
when pg_dump and server versions don't match, as they would not in an
upgrade situation.)

We could avoid that set of problems if we just redefined TRUNCATE as
meaning "TRUNCATE or REFRESH", but then you find out that people who
didn't have REFRESH ability before suddenly do.  Admittedly there was
no reason to grant such a privilege on a matview before, so maybe
people didn't; but by the same token there was no harm in granting
such a privilege, so maybe people did.  It's certainly not free from
the backwards-compatibility standpoint.

Now, none of these things are particularly the fault of this patch;
the bottom line is we don't have a good design for adding privilege
types to existing object kinds.  But nonetheless these are problems.

regards, tom lane



Re: ECPG installcheck tests fail if PGDATABASE is set

2018-03-18 Thread Tom Lane
Andres Freund  writes:
> On 2018-03-18 19:30:33 -0400, Tom Lane wrote:
>> Is it sane for pg_regress to unset PGDATABASE unconditionally?  Not
>> sure, but if we're generally always specifying a value, maybe that's
>> OK.

> I'm not sure either.  I wonder whether we should just make ecpg's
> pg_regress invocation do so?  That seems to be the way of least
> resistance ;)

Don't think I like ecpg's tests behaving differently in this respect
than the rest of them do; that seems like a recipe for unrecognized
security issues.

If nobody can think of a positive reason for pg_regress not to
unset PGDATABASE unconditionally, let's try that and see how it
goes.

regards, tom lane



Re: [PROPOSAL] Shared Ispell dictionaries

2018-03-18 Thread Tomas Vondra
On 03/17/2018 05:43 AM, Arthur Zakirov wrote:
> Hello Tomas,
> 
> Arthur, what are your plans with this patch in the current CF?
> 
> 
> I think dsm-based approach is in good shape already and works nice.
> I've planned only to improve the documentation a little. Also it seems I
> should change 0004 part, I found that extension upgrade scripts may be
> made in wrong way.
> In my opinion RELOAD and UNLOAD commands can be made in next commitfest
> (2018-09).
> Did you look it? Have you arguments about how shared memory allocation
> and releasing functions are made?
>  
> 
> 
> It does not seem to be moving towards RFC very much, and reworking the
> patch to use mmap() seems like a quite significant change late in the
> CF. Which means it's likely to cause the patch get get bumped to the
> next CF (2018-09).
> 
> 
> Agree. I have a draft version for mmap-based approach which works in
> platforms with mmap. In Windows it is necessary to use another API
> (CreateFileMapping, etc). But this approach requires more work on
> handling processed dictionary files (how name them, when remove).
>  
> 
> 
> FWIW I am not quite sure if the mmap() approach is better than what was
> implemented by the patch. I'm not sure how exactly will it behave under
> memory pressure (AFAIK it goes through page cache, which means random
> parts of dictionaries might get evicted) or how well is it supported on
> various platforms (say, Windows).
> 
> 
> Yes, as I wrote mmap-based approach requires more work. The only
> benefit I see is that you don't need to process a dictionary after
> server restart.  I'd vote for dsm-based approach.
> 

I do agree with that. We have a working well-understood dsm-based
solution, addressing the goals initially explained in this thread.

I don't see a reason to stall this patch based on a mere assumption that
the mmap-based approach might be magically better in some unknown
aspects. It might be, but we may as well leave that as a future work.

I wonder how much of this patch would be affected by the switch from dsm
to mmap? I guess the memory limit would get mostly irrelevant (mmap
would rely on the OS to page the memory in/out depending on memory
pressure), and so would the UNLOAD/RELOAD commands (because each backend
would do its own mmap).
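
To make that concrete, each backend's mapping would be little more than the
following sketch (plain POSIX calls; the helper name and file path handling
are assumptions for illustration, not code from either patch):

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map an already-preprocessed dictionary file read-only into this backend.
 * The kernel page cache is shared across backends, but every backend keeps
 * track of its own mapping. */
static void *
map_processed_dict(const char *path, size_t *size)
{
	int			fd;
	struct stat st;
	void	   *base;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return NULL;
	if (fstat(fd, &st) < 0)
	{
		close(fd);
		return NULL;
	}
	base = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
	close(fd);					/* the mapping survives the close */
	if (base == MAP_FAILED)
		return NULL;
	*size = st.st_size;
	return base;
}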

In any case, I suggest to polish the dsm-based patch, and see if we can
get that one into PG11.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: HELP

2018-03-18 Thread Craig Ringer
On 18 March 2018 at 04:08, Projat Banerjee 
wrote:

> What is the type of proposal should I submit here ? What kind or on what
> basis should I build my proposal so that I may get easily selected or
> chances for my selection is high ?
>
>

Are you asking about Google Summer of Code?

Because if so, I honestly think you need to be able to answer that question
yourself and demonstrate the required research, search and reading skills.
Finding a suitable project is the first step, and there are already
resources on the wiki to help you with that. If you can't then your chances
of successfully completing a project are quite low.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [PROPOSAL] Shared Ispell dictionaries

2018-03-18 Thread Andres Freund
Hi,

On 2018-03-19 01:52:41 +0100, Tomas Vondra wrote:
> I do agree with that. We have a working well-understood dsm-based
> solution, addressing the goals initially explained in this thread.

Well, it's also awkward and manual to use. I do think that's something
we have to pay attention to.


> I wonder how much of this patch would be affected by the switch from dsm
> to mmap? I guess the memory limit would get mostly irrelevant (mmap
> would rely on the OS to page the memory in/out depending on memory
> pressure), and so would the UNLOAD/RELOAD commands (because each backend
> would do its own mmap).

Those seem fairly major.

Greetings,

Andres Freund



Re: [HACKERS] GUC for cleanup indexes threshold.

2018-03-18 Thread Masahiko Sawada
On Wed, Mar 14, 2018 at 9:25 PM, Alexander Korotkov
 wrote:
> On Wed, Mar 14, 2018 at 7:40 AM, Masahiko Sawada 
> wrote:
>>
>> On Sat, Mar 10, 2018 at 3:40 AM, Alexander Korotkov
>>  wrote:
>> > On Fri, Mar 9, 2018 at 3:12 PM, Masahiko Sawada 
>> > wrote:
>> >>
>> >> On Fri, Mar 9, 2018 at 8:43 AM, Alexander Korotkov
>> >>  wrote:
>> >> > 2) These parameters are reset during btbulkdelete() and set during
>> >> > btvacuumcleanup().
>> >>
>> >> Can't we set these parameters even during btbulkdelete()? By keeping
>> >> them up to date, we will able to avoid an unnecessary cleanup vacuums
>> >> even after index bulk-delete.
>> >
>> >
>> > We certainly can update cleanup-related parameters during
>> > btbulkdelete().
>> > However, in this case we would update B-tree meta-page during each
>> > VACUUM cycle.  That may cause some overhead for non append-only
>> > workloads.  I don't think this overhead would be sensible, because in
>> > non append-only scenarios VACUUM typically writes much more of
>> > information.
>> > But I would like this oriented to append-only workload patch to be
>> > as harmless as possible for other workloads.
>>
>> What overhead are you referring here? I guess the overhead is only the
>> calculating the oldest btpo.xact. And I think it would be harmless.
>
>
> I meant overhead of setting last_cleanup_num_heap_tuples after every
> btbulkdelete with wal-logging of meta-page.  I bet it also would be
> harmless, but I think that needs some testing.

Agreed.

After more thought, it might be too late but we can consider the
possibility of another idea proposed by Peter. Attached patch
addresses the original issue of index cleanups by storing the epoch
number of page deletion XID into PageHeader->pd_prune_xid which is
4byte field. Comparing to the current proposed patch this patch
needs neither the page upgrade code nor extra WAL-logging. If
we also want to address cases other than append-only case we will
require the bulk-delete method of scanning whole index and of logging
WAL. But it leads some extra overhead. With this patch we no longer
need to depend on the full scan on b-tree index. This might be useful
for a future when we make the bulk-delete of b-tree index not scan
whole index.
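
To illustrate the arithmetic the stored epoch enables, a minimal sketch (the
names below are made up for illustration and are not from the patch):

#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/*
 * Is the deleted page's XID, stored as (epoch, 32-bit XID), older than the
 * oldest transaction that might still see it?  Doing the comparison on
 * 64-bit epoch-extended values makes it immune to 32-bit wraparound.  A real
 * implementation must also handle oldest_xmin belonging to the previous
 * epoch when it is numerically larger than the current XID.
 */
static bool
deleted_page_is_recyclable(uint32_t page_epoch, TransactionId page_xid,
						   uint32_t cur_epoch, TransactionId oldest_xmin)
{
	uint64_t	page_full = ((uint64_t) page_epoch << 32) | page_xid;
	uint64_t	oldest_full = ((uint64_t) cur_epoch << 32) | oldest_xmin;

	return page_full < oldest_full;
}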

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


skip-cleanup-index-using-epoch.patch
Description: Binary data


Re: User defined data types in Logical Replication

2018-03-18 Thread Masahiko Sawada
On Fri, Mar 16, 2018 at 10:24 AM, Alvaro Herrera
 wrote:
> Masahiko Sawada wrote:
>> On Thu, Mar 15, 2018 at 9:41 AM, Alvaro Herrera  
>> wrote:
>
>> > I think this is a worthwhile test, but IMO it should be improved a bit
>> > before we include it.  Also, we can come up with a better name for the
>> > test surely, not just refer to this particular bug. Maybe "typemap".
>>
>> It might be useful if we have the facility of TAP test to check the
>> log message and regexp-match the message to the expected string.
>
> Something similar to PostgresNode::issues_sql_like() perhaps?
>

Yeah, I didn't know that but I think it's a good idea. Unlike
issues_sql_like() we don't issue anything to the subscriber. So maybe
we need to truncate the logfile before the insertion and verify the logfile
for that particular period. The test code would look like the following.

$node_subscriber->safe_psql('postgres', 'CREATE SUBSCRIPTION...');
truncate $node_subscriber->logfile, 0;
$node_publisher->safe_psql('postgres', 'INSERT .. ');
my $log = TestLib::slurp_file($node_subscriber->logfile);

# Verify logs
like($log, qr/processing remote data for replication target relation
"public.test" column "b", remote type dummyint, local type dummyint/,
'callback function of datatype conversion1');
like($log, qr/processing remote data for replication target relation
"public.test" column "a", remote type dummytext, local type
dummytext/, 'callback function of datatype conversion2');

Thoughts?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] GUC for cleanup indexes threshold.

2018-03-18 Thread Kyotaro HORIGUCHI
At Mon, 19 Mar 2018 11:12:58 +0900, Masahiko Sawada  
wrote in 
> On Wed, Mar 14, 2018 at 9:25 PM, Alexander Korotkov
>  wrote:
> > On Wed, Mar 14, 2018 at 7:40 AM, Masahiko Sawada 
> > wrote:
> >>
> >> On Sat, Mar 10, 2018 at 3:40 AM, Alexander Korotkov
> >>  wrote:
> >> > On Fri, Mar 9, 2018 at 3:12 PM, Masahiko Sawada 
> >> > wrote:
> >> >>
> >> >> On Fri, Mar 9, 2018 at 8:43 AM, Alexander Korotkov
> >> >>  wrote:
> >> >> > 2) These parameters are reset during btbulkdelete() and set during
> >> >> > btvacuumcleanup().
> >> >>
> >> >> Can't we set these parameters even during btbulkdelete()? By keeping
> >> >> them up to date, we will able to avoid an unnecessary cleanup vacuums
> >> >> even after index bulk-delete.
> >> >
> >> >
> >> > We certainly can update cleanup-related parameters during
> >> > btbulkdelete().
> >> > However, in this case we would update B-tree meta-page during each
> >> > VACUUM cycle.  That may cause some overhead for non append-only
> >> > workloads.  I don't think this overhead would be sensible, because in
> >> > non append-only scenarios VACUUM typically writes much more of
> >> > information.
> >> > But I would like this oriented to append-only workload patch to be
> >> > as harmless as possible for other workloads.
> >>
> >> What overhead are you referring here? I guess the overhead is only the
> >> calculating the oldest btpo.xact. And I think it would be harmless.
> >
> >
> > I meant overhead of setting last_cleanup_num_heap_tuples after every
> > btbulkdelete with wal-logging of meta-page.  I bet it also would be
> > harmless, but I think that needs some testing.
> 
> Agreed.
> 
> After more thought, it might be too late but we can consider the
> possibility of another idea proposed by Peter. Attached patch
> addresses the original issue of index cleanups by storing the epoch
> number of page deletion XID into PageHeader->pd_prune_xid which is
> 4byte field.

Mmm. It seems to me that the story is returning to the
beginning. Could I try retelling the story?

I understand that the initial problem was that vacuum runs apparently
unnecessary full scans on indexes many times. The reason for that
is the fact that a cleanup scan may leave some (or many under
certain condition) dead pages not-recycled but we don't know
whether a cleanup is needed or not. They will be staying left
forever unless we run additional cleanup-scans at the appropriate
timing.

(If I understand it correctly,) Sawada-san's latest proposal is
(fundamentally the same as the first one,) just skipping the
cleanup scan if the vacuum scan just before found that the number
of *live* tuples are increased. If there were many deletions and
insertions but no increase of total number of tuples, we don't
have a cleanup. Consequently it had a wraparound problem and it
is addressed in this version.

(ditto.) Alexander proposed to record the oldest xid of
recyclable pages in metapage (and the number of tuples at the
last cleanup). This prevents needless cleanup scan and surely
runs cleanups to remove all recyclable pages.
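
In code form, the decision at the start of btvacuumcleanup() would then be
something like this sketch (the metapage field names and the threshold are
placeholders, not the actual patch):

#include "postgres.h"
#include "access/transam.h"

static bool
cleanup_scan_needed(TransactionId oldest_btpo_xact,
					TransactionId oldest_xmin,
					double last_cleanup_num_heap_tuples,
					double num_heap_tuples,
					double growth_threshold)
{
	/* recyclable pages whose deleting transaction is old enough must be
	 * reclaimed, so a cleanup scan is required */
	if (TransactionIdIsValid(oldest_btpo_xact) &&
		TransactionIdPrecedes(oldest_btpo_xact, oldest_xmin))
		return true;

	/* otherwise rescan only if the heap grew enough since the last cleanup */
	return num_heap_tuples >
		last_cleanup_num_heap_tuples * (1.0 + growth_threshold);
}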

I think that we can accept Sawada-san's proposal if we accept the
fact that indexes can retain recyclable pages for a long
time. (Honestly I don't think so.)

If (as I might have mentioned as the same upthread for Yura's
patch,) we accept to hold the information on index meta page,
Alexander's way would be preferable. The difference between Yura's
and Alexander's is the former runs cleanup scan if a recyclable
page is present but the latter avoids that before any recyclable
pages are known to be removed.

>   Comparing to the current proposed patch this patch
> needs neither the page upgrade code nor extra WAL-logging. If

# By the way, my proposal was storing the information as Yura
# proposed into stats collector. The information may be
# available a bit late, but it doesn't harm. This doesn't need
# extra WAL logging nor the upgrade code :p

> we also want to address cases other than append-only case we will

I'm afraid that "the problem for the other cases" is a new one
that this patch introduces, not an existing one.

> require the bulk-delete method of scanning whole index and of logging
> WAL. But it leads some extra overhead. With this patch we no longer
> need to depend on the full scan on b-tree index. This might be useful
> for a future when we make the bulk-delete of b-tree index not scan
> whole index.

Perhaps I'm taking something incorrectly, but is it just the
result of skipping 'maybe needed' scans without considering the
actual necessity?

I also don't like extra WAL logging, but it happens once (or
twice?) per vacuum cycle (for every index). On the other hand I
want to put the on-the-fly upgrade path out of the ordinary
path. (Reviving the pg_upgrade's custom module?)

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: pg_get_functiondef forgets about most GUC_LIST_INPUT GUCs

2018-03-18 Thread Kyotaro HORIGUCHI
At Fri, 16 Mar 2018 21:15:54 +0900, Michael Paquier  wrote 
in <20180316121554.ga2...@paquier.xyz>
> On Fri, Mar 16, 2018 at 09:40:18AM +0100, Pavel Stehule wrote:
> > 2018-03-16 9:34 GMT+01:00 Kyotaro HORIGUCHI :
> >> That's true, but doesn't the additional rule make it more
> >> bothersome to maintain the list?
> > 
> > Hard to say. I have not opinion. But just when I see in this context
> > variables like listen_address, shared_preload_libraries, then it looks
> > messy.
> 
> Let's be clear.  I have listed all the variables in the patch to gather
> more easily opinions, and because it is easier to review the whole stack
> this way. I personally think that the only variables where the patch
> makes sense are:
> - DateStyle
> - search_path
> - plpgsql.extra_errors
> - plpgsql.extra_warnings
> - wal_consistency_checking
> So I would be incline to drop the rest from the patch.  If there are
> authors of popular extensions willing to get this support, let's update
> the list once they argue about it and only if it makes sense.  However,
> as far as I can see, there are no real candidates.  So let's keep the
> list simple.

FWIW +1 from me. It seems reasonable as the amendment to the
current status.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] GUC for cleanup indexes threshold.

2018-03-18 Thread Kyotaro HORIGUCHI
Sorry I'd like to make a trivial but critical fix.

At Mon, 19 Mar 2018 14:45:05 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
 wrote in 
<20180319.144505.166111203.horiguchi.kyot...@lab.ntt.co.jp>
> At Mon, 19 Mar 2018 11:12:58 +0900, Masahiko Sawada  
> wrote in 
> > On Wed, Mar 14, 2018 at 9:25 PM, Alexander Korotkov
> >  wrote:
> > > On Wed, Mar 14, 2018 at 7:40 AM, Masahiko Sawada 
> > > wrote:
> > >>
> > >> On Sat, Mar 10, 2018 at 3:40 AM, Alexander Korotkov
> > >>  wrote:
> > >> > On Fri, Mar 9, 2018 at 3:12 PM, Masahiko Sawada 
> > >> > wrote:
> > >> >>
> > >> >> On Fri, Mar 9, 2018 at 8:43 AM, Alexander Korotkov
> > >> >>  wrote:
> > >> >> > 2) These parameters are reset during btbulkdelete() and set during
> > >> >> > btvacuumcleanup().
> > >> >>
> > >> >> Can't we set these parameters even during btbulkdelete()? By keeping
> > >> >> them up to date, we will able to avoid an unnecessary cleanup vacuums
> > >> >> even after index bulk-delete.
> > >> >
> > >> >
> > >> > We certainly can update cleanup-related parameters during
> > >> > btbulkdelete().
> > >> > However, in this case we would update B-tree meta-page during each
> > >> > VACUUM cycle.  That may cause some overhead for non append-only
> > >> > workloads.  I don't think this overhead would be sensible, because in
> > >> > non append-only scenarios VACUUM typically writes much more of
> > >> > information.
> > >> > But I would like this oriented to append-only workload patch to be
> > >> > as harmless as possible for other workloads.
> > >>
> > >> What overhead are you referring here? I guess the overhead is only the
> > >> calculating the oldest btpo.xact. And I think it would be harmless.
> > >
> > >
> > > I meant overhead of setting last_cleanup_num_heap_tuples after every
> > > btbulkdelete with wal-logging of meta-page.  I bet it also would be
> > > harmless, but I think that needs some testing.
> > 
> > Agreed.
> > 
> > After more thought, it might be too late but we can consider the
> > possibility of another idea proposed by Peter. Attached patch
> > addresses the original issue of index cleanups by storing the epoch
> > number of page deletion XID into PageHeader->pd_prune_xid which is
> > 4byte field.
> 
> Mmm. It seems to me that the story is returning to the
> beginning. Could I try retelling the story?
> 
> I understand that the initial problem was that vacuum runs apparently
> unnecessary full scans on indexes many times. The reason for that
> is the fact that a cleanup scan may leave some (or many under
> certain condition) dead pages not-recycled but we don't know
> whether a cleanup is needed or not. They will be staying left
> forever unless we run additional cleanup-scans at the appropriate
> timing.
> 
> (If I understand it correctly,) Sawada-san's latest proposal is
> (fundamentally the same as the first one,) just skipping the
> cleanup scan if the vacuum scan just before found that the number
> of *live* tuples are increased. If there were many deletions and
> insertions but no increase of total number of tuples, we don't
> have a cleanup. Consequently it had a wraparound problem and it
> is addressed in this version.
> 
> (ditto.) Alexander proposed to record the oldest xid of
> recyclable pages in metapage (and the number of tuples at the
> last cleanup). This prevents needless cleanup scan and surely
> runs cleanups to remove all recyclable pages.
> 
> I think that we can accept Sawada-san's proposal if we accept the
> fact that indexes can retain recyclable pages for a long
> time. (Honestly I don't think so.)
> 
> If (as I might have mentioned as the same upthread for Yura's
> patch,) we accept to hold the information on index meta page,
> Alexander's way would be preferable. The difference between Yura's
> and Alexander's is the former runs cleanup scan if a recyclable
> page is present but the latter avoids that before any recyclable

- pages are known to be removed.
+ pages are known to be actually removable

> >   Comparing to the current proposed patch this patch
> > needs neither the page upgrade code nor extra WAL-logging. If
> 
> # By the way, my proposal was storing the information as Yura
> # proposed into stats collector. The information may be
> # available a bit late, but it doesn't harm. This doesn't need
> # extra WAL logging nor the upgrade code :p
> 
> > we also want to address cases other than append-only case we will
> 
> I'm afraid that "the problem for the other cases" is a new one
> that this patch introduces, not an existing one.
> 
> > require the bulk-delete method of scanning whole index and of logging
> > WAL. But it leads some extra overhead. With this patch we no longer
> > need to depend on the full scan on b-tree index. This might be useful
> > for a future when we make the bulk-delete of b-tree index not scan
> > whole index.
> 
> Perhaps I'm taking something incorrectly, but is it just the
> result of skipping 'maybe needed' scans without considering the
> actual necessity?

Re: ON CONFLICT DO UPDATE for partitioned tables

2018-03-18 Thread Amit Langote
On 2018/03/05 18:04, Pavan Deolasee wrote:
> On Fri, Mar 2, 2018 at 9:06 PM, Alvaro Herrera 
> wrote:
>> Re. the "ugly hack" comments in adjust_inherited_tlist(), I'm confused:
>> expand_targetlist() runs *after*, not before, so how could it have
>> affected the result?
>>
> 
> If I understand correctly, planner must have called expand_targetlist()
> once for the parent relation's descriptor and added any dropped columns
> from the parent relation. So we may not find mapped attributes for those
> dropped columns in the parent. I haven't actually tested this case though.
> 
> I wonder if it makes sense to actually avoid expanding on-conflict-set
> targetlist in case the target is a partition table and deal with it during
> execution, like we are doing now.
>> I'm obviously confused about what
>> expand_targetlist call this comment is talking about.  Anyway I wanted
>> to make it use resjunk entries instead, but that broke some other case
>> that I didn't have time to research yesterday.  I'll get back to this
>> soon, but in the meantime, here's what I have.
>>
> 
> +   conv_setproj =
> +  adjust_and_expand_partition_tlist(RelationGetDescr(firstResultRel),
> + RelationGetDescr(partrel),
> + RelationGetRelationName(partrel),
> + firstVarno,
> + conv_setproj);
> 
> Aren't we are adding back Vars referring to the parent relation, after
> having converted the existing ones to use partition relation's varno? May
> be that works because missing attributes are already added during planning
> and expand_targetlist() here only adds dropped columns, which are just NULL
> constants.

I think this suggestion to defer onConflictSet target list expansion to
execution time is a good one.  So, in preprocess_targetlist(), we'd only
perform expand_targetlist on the onConflictSet list if the table is not a
partitioned table.  For partitioned tables, we don't know which
partition(s) will be affected, so it's useless to do the expansion there.
Instead it's better to expand in ExecInitPartitionInfo(), possibly after
changing original TargetEntry nodes to have correct resnos due to
attribute number differences (calling adjust_inherited_tlist(), etc.).
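
Roughly, the plan-time side would become the following sketch (the
expand_targetlist() signature and field names are approximations; only the
relkind test and the placement matter here):

	/* In preprocess_targetlist(): leave the ON CONFLICT SET list alone for
	 * partitioned tables; the per-partition expansion happens later, at
	 * executor startup in ExecInitPartitionInfo(). */
	if (parse->onConflict && parse->onConflict->onConflictSet &&
		target_relation->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
		parse->onConflict->onConflictSet =
			expand_targetlist(parse->onConflict->onConflictSet, CMD_UPDATE,
							  result_relation, target_relation);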

And then, since we're no longer expanding the target list in the planner,
we don't need the hacks the patch currently installs in
adjust_inherited_tlist() to ignore the entries for dropped parent columns
that the plan-time expansion adds.

Thanks,
Amit




Re: [HACKERS] Restrict concurrent update/delete with UPDATE of partition key

2018-03-18 Thread amul sul
On Sat, Mar 17, 2018 at 4:32 PM, Amit Kapila  wrote:
> On Mon, Mar 12, 2018 at 6:33 PM, amul sul  wrote:
>> On Mon, Mar 12, 2018 at 11:45 AM, amul sul  wrote:
>>> On Sat, Mar 10, 2018 at 5:25 PM, Amit Kapila  
>>> wrote:
 complete CTID.
>>>
>>> Sure, will do that.
>>
>> I did the aforementioned changes in the attached patch, thanks.
>>
>
> --- a/src/include/storage/itemptr.h
> +++ b/src/include/storage/itemptr.h
> @@ -23,7 +23,9 @@
>   * This is a pointer to an item within a disk page of a known file
>   * (for example, a cross-link from an index to its parent table).
>   * blkid tells us which block, posid tells us which entry in the linp
> - * (ItemIdData) array we want.
> + * (ItemIdData) array we want.  blkid is marked InvalidBlockNumber when
> + * a tuple is moved to another partition relation due to an update of
> + * the partition key.
>
> I think instead of updating this description in itemptr.h, you should
> update it in htup_details.h where we already have a description of
> t_ctid.  After this patch, the t_ctid column value in heap_page_items
> function will show it as InvalidBlockNumber and in the documentation,
> we have given the reference of htup_details.h.   Other than that the
> latest version looks good to me.
>

Okay, fixed in the attached version.
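
For readers following along, the marker can be recognized with something
like this sketch (a hypothetical helper, not code from the patch):

#include "postgres.h"
#include "access/htup_details.h"

static bool
tuple_moved_to_another_partition(HeapTupleHeader tup)
{
	/* ip_blkid is overwritten with InvalidBlockNumber by the UPDATE that
	 * moved the row into a different partition */
	return !BlockNumberIsValid(
		ItemPointerGetBlockNumberNoCheck(&tup->t_ctid));
}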

> I have marked this patch as RFC as this is a small change, hope you
> can update the patch soon.
>

Thank you, updated patch attached.

Regards,
Amul


0001-Invalidate-ip_blkid-v7.patch
Description: Binary data


0002-isolation-tests-v6.patch
Description: Binary data


Re: Online enabling of checksums

2018-03-18 Thread Heikki Linnakangas

Hi,

The patch looks good to me at a high level. Some notes below. I didn't 
read through the whole thread, so sorry if some of these have been 
discussed already.



+void
+ShutdownChecksumHelperIfRunning(void)
+{
+   if (pg_atomic_unlocked_test_flag(&ChecksumHelperShmem->launcher_started))
+   {
+   /* Launcher not started, so nothing to shut down */
+   return;
+   }
+
+   ereport(ERROR,
+   (errmsg("checksum helper is currently running, cannot disable checksums"),
+errhint("Restart the cluster or wait for the worker to finish.")));
+}


Is there no way to stop the checksum helper once it's started? That 
seems rather user-unfriendly. I can imagine it being a pretty common 
mistake to call pg_enable_data_checksums() on a 10 TB cluster, only to 
realize that you forgot to set the cost limit, and that it's hurting 
queries too much. At that point, you want to abort.



+ * This is intended to create the worklist for the workers to go through, and
+ * as we are only concerned with already existing databases we need to ever
+ * rebuild this list, which simplifies the coding.


I can't parse this sentence.


+extern bool DataChecksumsEnabledOrInProgress(void);
+extern bool DataChecksumsInProgress(void);
+extern bool DataChecksumsDisabled(void);


I find the name of the DataChecksumsEnabledOrInProgress() function a bit 
long. And doing this in PageIsVerified looks a bit weird:


> if (DataChecksumsEnabledOrInProgress() && !DataChecksumsInProgress())

I think I'd prefer functions like:

/* checksums should be computed on write? */
bool DataChecksumNeedWrite()
/* checksum should be verified on read? */
bool DataChecksumNeedVerify()
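
Fleshed out, those could be as small as the sketch below (the state variable
and enum are invented here purely to show the intended semantics):

#include <stdbool.h>			/* in the backend this comes via postgres.h */

typedef enum
{
	DATA_CHECKSUMS_OFF = 0,
	DATA_CHECKSUMS_INPROGRESS,	/* being added: write but don't verify */
	DATA_CHECKSUMS_ON
} DataChecksumState;

static DataChecksumState LocalDataChecksumState = DATA_CHECKSUMS_OFF;

/* checksums should be computed on write? */
bool
DataChecksumNeedWrite(void)
{
	return LocalDataChecksumState != DATA_CHECKSUMS_OFF;
}

/* checksums should be verified on read? */
bool
DataChecksumNeedVerify(void)
{
	return LocalDataChecksumState == DATA_CHECKSUMS_ON;
}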



+ template0 is by default not accepting connections, to
+ enable checksums you'll need to temporarily make it accept connections.


This was already discussed, and I agree with the other comments that 
it's bad.



+CREATE OR REPLACE FUNCTION pg_enable_data_checksums (
+cost_delay int DEFAULT 0, cost_limit int DEFAULT 100)
+  RETURNS boolean STRICT VOLATILE LANGUAGE internal AS 'enable_data_checksums'
+  PARALLEL RESTRICTED;


pg_[enable|disable]_checksums() functions return a bool. It's not clear 
to me when they would return what. I'd suggest marking them as 'void' 
instead.



--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -194,6 +194,7 @@ typedef PageHeaderData *PageHeader;
  */
 #define PG_PAGE_LAYOUT_VERSION 4
 #define PG_DATA_CHECKSUM_VERSION   1
+#define PG_DATA_CHECKSUM_INPROGRESS_VERSION2
 


This seems like a weird place for these PG_DATA_CHECKSUM_* constants. 
They're not actually stored in the page header, as you might think. I 
think the idea was to keep PG_DATA_CHECKSUM_VERSION close to 
PG_PAGE_LAYOUT_VERSION, because the checksums affected the on-disk 
format. I think it was a bit weird even before this patch, but now it's 
worse. At least a better comment would be in order, or maybe move these 
elsewhere.


Feature request: it'd be nice to update the 'ps status' with 
set_ps_display, to show a rudimentary progress indicator. Like the name 
and block number of the relation being processed. It won't tell you how 
much is left to go, but at least it will give a warm fuzzy feeling to 
the DBA that something is happening.
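
Something along these lines would do, assuming it is called from the
worker's per-relation/per-block loop (the helper name and buffer size are
arbitrary; the two-argument set_ps_display() matches the current tree):

#include "postgres.h"
#include "utils/ps_status.h"

static void
checksumhelper_set_ps(const char *relname, uint32 blkno)
{
	char		activity[128];

	snprintf(activity, sizeof(activity), "processing %s, block %u",
			 relname, blkno);
	set_ps_display(activity, false);
}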


I didn't review the offline tool, pg_verify_checksums.

- Heikki




Re: Hash join in SELECT target list expression keeps consuming memory

2018-03-18 Thread Amit Khandekar
On 17 March 2018 at 00:47, Tom Lane  wrote:
> Amit Khandekar  writes:
>> If the SELECT target list expression is a join subquery, and if the
>> subquery does a hash join, then the query keeps on consuming more and
>> more memory. Below is such a query :
>
> Thanks for the report!
>
> I dug into this with valgrind, and found that the problem is that
> ExecHashTableCreate allocates some memory that isn't freed by
> ExecHashTableDestroy, specifically the per-hash-key function
> information.  This is just dumb.  We can keep that stuff in the
> hashtable's hashCxt instead, where it will get freed at the right time.
> The attached patch seems to fix it just by reordering the code.
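
In outline, the reordering amounts to the sketch below (simplified; nkeys
stands for the number of hash key expressions, and the field names follow
nodeHash.c, but this is an illustration rather than the committed patch):

	MemoryContext oldcxt;

	hashtable->hashCxt = AllocSetContextCreate(CurrentMemoryContext,
											   "HashTableContext",
											   ALLOCSET_DEFAULT_SIZES);
	oldcxt = MemoryContextSwitchTo(hashtable->hashCxt);

	/* per-hash-key function info now lives, and dies, with the hash table */
	hashtable->outer_hashfunctions =
		(FmgrInfo *) palloc(nkeys * sizeof(FmgrInfo));
	hashtable->inner_hashfunctions =
		(FmgrInfo *) palloc(nkeys * sizeof(FmgrInfo));
	hashtable->hashStrict = (bool *) palloc(nkeys * sizeof(bool));

	MemoryContextSwitchTo(oldcxt);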

I saw that you have now committed the fix and also backported it to
all supported branches.

Thanks !

-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company