Re: mcvstats serialization code is still shy of a load

2019-06-26 Thread Tomas Vondra

On Tue, Jun 25, 2019 at 11:52:28PM -0400, Tom Lane wrote:

I'm seeing a reproducible bus error here:

#0  0x00417420 in statext_mcv_serialize (mcvlist=0x62223450, stats=Variable "stats" is not available.)
    at mcv.c:785
785		memcpy(ITEM_BASE_FREQUENCY(item, ndims), &mcvitem->base_frequency, sizeof(double));

What appears to be happening is that since ITEM_BASE_FREQUENCY is defined as

#define ITEM_BASE_FREQUENCY(item,ndims) ((double *) (ITEM_FREQUENCY(item, ndims) + 1))

the compiler is assuming that the first argument to memcpy is
double-aligned, and it is generating code that depends on that being
true, and of course it isn't true and kaboom.

You can *not* cast something to an aligned pointer type if it's not
actually certain to be aligned suitably for that type.  In this example,
even if you wrote "(char *)" in front of this, it wouldn't save you;
the compiler would still be entitled to believe that the intermediate
cast value meant something.  The casts in the underlying macros
ITEM_FREQUENCY and so on are equally unsafe.
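
For illustration, a minimal standalone sketch of the hazard (the buffer layout
here is hypothetical; this is not code from the patch):

#include <string.h>

void
example(char *buf)				/* assume buf + 2 is NOT double-aligned */
{
	double		d = 1.0;

	/*
	 * Unsafe: the cast asserts double alignment, so the compiler may emit
	 * an aligned store and fault on strict-alignment hardware.
	 */
	*(double *) (buf + 2) = d;

	/*
	 * Still unsafe: even as a memcpy argument, the intermediate cast
	 * entitles the compiler to assume the pointer is double-aligned.
	 */
	memcpy((double *) (buf + 2), &d, sizeof(double));

	/* Safe: keep the pointer as char * and let memcpy copy the bytes. */
	memcpy(buf + 2, &d, sizeof(double));
}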



OK. So the solution is to ditch the casts altogether, and then do plain
pointer arithmetic like this:

#define ITEM_INDEXES(item)  (item)
#define ITEM_NULLS(item,ndims)  (ITEM_INDEXES(item) + (ndims))
#define ITEM_FREQUENCY(item,ndims)  (ITEM_NULLS(item, ndims) + (ndims))
#define ITEM_BASE_FREQUENCY(item,ndims) (ITEM_FREQUENCY(item, ndims) + sizeof(double))

Or is that still relying on alignment, somehow?


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





inheritance of connection parameters when using \c

2019-06-26 Thread nagaura.ryo...@fujitsu.com
Hello!

\c is followed by [-reuse-previous=on/off] positional syntax | conninfo.
With \c -reuse-previous=on positional syntax, or \c positional syntax without
the -reuse-previous option, parameters which were omitted or written as "-"
are reused in the new connection.
In the current implementation the reused parameters are only "dbname", "user",
"host", and "port".
# details in [1]
This topic came up during the discussion in [2].
Although I'm not a heavy psql user, I want \c to inherit connection parameters,
and Fabien-san does too.

My new specification:
\c inherits all the unspecified connection parameters in PQconninfoOptions in
the cases below.
a) \c -reuse-previous=on positional syntax
b) \c positional syntax
This simply expands the set of parameters inherited compared with the current
specification.

I have one item to discuss:
whether the message that reports connection information to users should carry
more detail, such as
You are now connected to database "TomANDJelly" as user "nyannyan" with some
parameters [reused | defaults].
# i.e., the additional text after "nyannyan"

Do you have any thoughts?

[1] https://www.postgresql.org/docs/12/app-psql.html
[2] 
https://www.postgresql.org/message-id/flat/EDA4195584F5064680D8130B1CA91C45367328@G01JPEXMBYT04
Best regards,
-
Ryohei Nagaura





[patch] socket_timeout in interfaces/libpq

2019-06-26 Thread nagaura.ryo...@fujitsu.com
Hello all.

First, I'd like to thank you all for the reviews and discussion in the last
CommitFest [1].

I don't think the remaining one of my proposals has been rejected completely,
so I want to restart the discussion.

It is a timeout parameter in interfaces/libpq.

Consider situations where something happens on a server and it becomes
extremely busy, e.g., the cases Tsunakawa-san and I have illustrated [2][3].
Such a bad server condition (i.e., a non-functional server) can leave clients
waiting forever, because the existing backend-side timeout parameters are not
guaranteed to fire.
Under such a condition, control should be passed back to the client after a
certain period of time, and a timeout-based disconnection is exactly that.
Also, in such situations the backend parameters may not work at all, so the
timeout parameters need to be implemented on the client side.

It is preferable to implement this parameter in PQwait(), where clients can
wait endlessly.
However, this can cause unintended timeouts when socket_timeout is set smaller
than statement_timeout (or some other timeout parameters), so the
documentation warns about this.
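
For a rough idea of what such a client-side timeout amounts to, here is a
minimal sketch using PQsocket() and select(); the helper name and the way it
would be wired into libpq's wait logic are hypothetical, not the patch itself:

#include <sys/select.h>
#include <libpq-fe.h>

/*
 * Wait up to 'seconds' for the server socket to become readable.
 * Returns 0 on timeout, in which case the client can drop the
 * connection instead of waiting forever on a non-functional server.
 */
static int
wait_with_timeout(PGconn *conn, int seconds)
{
	struct timeval tv = {seconds, 0};
	int			sock = PQsocket(conn);
	fd_set		readfds;

	FD_ZERO(&readfds);
	FD_SET(sock, &readfds);
	return select(sock + 1, &readfds, NULL, NULL, &tv);
}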

FYI, a similar parameter, socketTimeout, exists in pgJDBC [4].
Do you have any thoughts?

P.S.
Fabien-san,
I'll start another thread, and let's discuss there the \c handling of
connection parameters that you pointed out!

[1] 
https://www.postgresql.org/message-id/flat/EDA4195584F5064680D8130B1CA91C45367328@G01JPEXMBYT04
[2] 
https://www.postgresql.org/message-id/EDA4195584F5064680D8130B1CA91C45367328%40G01JPEXMBYT04
[3] 
https://www.postgresql.org/message-id/0A3221C70F24FB45833433255569204D1FBC7561%40G01JPEXMBYT05
[4] 
https://jdbc.postgresql.org/documentation/head/connect.html#connection-parameters

Best regards,
-
Ryohei Nagaura


socket_timeout.patch
Description: socket_timeout.patch


Re: [PATCH] Speedup truncates of relation forks

2019-06-26 Thread Adrien Nayrat
On 6/12/19 10:29 AM, Jamison, Kirk wrote:
> 
>> From a user POV, the main issue with relation truncation is that it can block
>> queries on the standby server during truncation replay.
>>
>> It could be interesting if you can test this case and give results of your
>> patch.
>> Maybe by performing read queries on the standby server and counting wait_event
>> with pg_wait_sampling?
> 
> Thanks for the suggestion. I tried using the extension pg_wait_sampling,
> But I wasn't sure that I could replicate the problem of blocked queries on 
> standby server.
> Could you advise?
> Here's what I did for now, similar to my previous test with hot standby setup,
> but with additional read queries of wait events on standby server.
> 
> 128MB shared_buffers
> SELECT create_tables(1);
> SELECT insert_tables(1);
> SELECT delfrom_tables(1);
> 
> [Before VACUUM]
> Standby: SELECT the following view from pg_stat_waitaccum
> 
> wait_event_type |   wait_event| calls | microsec
> -+-+---+--
>  Client  | ClientRead  | 2 | 20887759
>  IO  | DataFileRead|   175 | 2788
>  IO  | RelationMapRead | 4 |   26
>  IO  | SLRURead| 2 |   38
> 
> Primary: Execute VACUUM (induces relation truncates)
> 
> [After VACUUM]
> Standby:
>  wait_event_type |   wait_event| calls | microsec
> -+-+---+--
>  Client  | ClientRead  | 7 | 77662067
>  IO  | DataFileRead|   284 | 4523
>  IO  | RelationMapRead |10 |   51
>  IO  | SLRURead| 3 |   57
> 

(Sorry for the delay, I forgot to answer you)

As far as I remember, you should see "relation" wait events (type lock) on the
standby server. This is due to the startup process acquiring AccessExclusiveLock
for the truncation while other backends wait to acquire a lock to read the table.

On the primary server, vacuum is able to cancel the truncation:

    /*
     * We need full exclusive lock on the relation in order to do
     * truncation. If we can't get it, give up rather than waiting --- we
     * don't want to block other backends, and we don't want to deadlock
     * (which is quite possible considering we already hold a lower-grade
     * lock).
     */
    vacrelstats->lock_waiter_detected = false;
    lock_retry = 0;
    while (true)
    {
        if (ConditionalLockRelation(onerel, AccessExclusiveLock))
            break;

        /*
         * Check for interrupts while trying to (re-)acquire the exclusive
         * lock.
         */
        CHECK_FOR_INTERRUPTS();

        if (++lock_retry > (VACUUM_TRUNCATE_LOCK_TIMEOUT /
                            VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL))
        {
            /*
             * We failed to establish the lock in the specified number of
             * retries. This means we give up truncating.
             */
            vacrelstats->lock_waiter_detected = true;
            ereport(elevel,
                    (errmsg("\"%s\": stopping truncate due to conflicting lock request",
                            RelationGetRelationName(onerel))));
            return;
        }

        pg_usleep(VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL * 1000L);
    }


To maximize the chances of reproducing it, we can use big shared_buffers. But I
am afraid it is not easy to perform reproducible tests to compare results.
Unfortunately, I don't have servers to perform tests on.

Regards,





Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

2019-06-26 Thread Thomas Munro
On Wed, Jun 26, 2019 at 6:32 PM Andrew Gierth wrote:
> Pacific/Auckland -> NZ

Right.  On a FreeBSD system here in New Zealand you get "NZ" with
default configure options (ie using PostgreSQL's tzdata).  But if you
build with --with-system-tzdata=/usr/share/zoneinfo you get
"Pacific/Auckland", and that's because the FreeBSD zoneinfo directory
doesn't include the old non-city names like "NZ", "GB", "Japan",
"US/Eastern" etc.  (Unfortunately the FreeBSD packages for PostgreSQL
are not being built with that option so initdb chooses the old names.
Something to take up with the maintainers.)

-- 
Thomas Munro
https://enterprisedb.com




GSoD Introductory Tutorial

2019-06-26 Thread Elif Ak
Dear GSoD PostgreSQL Organization Administrators,

I am a PhD student at Istanbul Technical University, Computer Engineering.
Before becoming a full-time researcher in the laboratory, I was a full-stack
software developer, so I am strongly familiar with SQL queries. In particular,
I used PostgreSQL in my bachelor's degree project for the "Database Systems"
course (in 2017). I am willing to contribute to the Introductory Tutorial page
over the whole summer (and, if necessary, I can keep enhancing the tutorial
after the summer). I am proposing a tutorial like the one on the Kubernetes
web page (https://kubernetes.io/docs/home/), which is compact and in some ways
interactive, so I would like to discuss making the tutorial interactive. I am
also strong in HTML and CSS, so changing the web style without corrupting the
existing architecture will not be a problem.

If you still need this enhancement, we can keep in touch according to your
instructions. Otherwise, we can change the project to suit your needs. I am
free for the whole summer and I want to participate in the PostgreSQL
documentation.

The project idea was suggested by Pavan Agrawal and James Chanco, as I have
seen on the PostgreSQL wiki page. However, I could not be sure about the
correct email addresses, so I am writing this email to the mailing list given
on the GSoD web page and the PostgreSQL wiki.

You can find my CV:
https://drive.google.com/file/d/1WJcW7E7gK-FWEXSlFF56Rd4KcogWrre_/view?usp=sharing

Web page: http://bcrg.itu.edu.tr/MemberDetail.aspx?id=9

Best regards,

Elif Ak, PhD Student
ak...@itu.edu.tr
ISTANBUL TECHNICAL UNIVERSITY, COMPUTER ENGINEERING
Campus of ITU Ayazaga, Department of Computer Engineering 34469, Maslak
Sarıyer / İstanbul
T. +905387062245



Index Skip Scan - attempting to evaluate patch

2019-06-26 Thread pguser
Hello

This is my first posting to hackers so sorry if I'm taking up valuable time.

I'm currently migrating a packaged application which supported Oracle and SQL
Server to PostgreSQL.

Something that I've identified as hurting performance a lot is the lack of
loose index scanning. I don't have access to the application SQL, so all I can
try to do is mitigate through indexes. There are ~4000 tables in the
application schema, and ~6000 indices.

Some plans are clearly worse than I would expect, because there are lots of
indexes on (a,b,c) and selects with WHERE a = ... AND c = ....

In an attempt to see if the putative skip scan changes will be beneficial on 
our real world data sets, I've been attempting to build and run pgsql from 
github with the v20- patch applied.

If I build without the patch, I get a running server, and can execute whatever 
queries I want.

If I apply the latest patch (which says 1 of 2 - maybe I'm missing a part of
the patch?), I apply it with:

$ patch -p1 <../v20-0001-Index-skip-scan.patch
patching file contrib/bloom/blutils.c
patching file doc/src/sgml/config.sgml
patching file doc/src/sgml/indexam.sgml
patching file doc/src/sgml/indices.sgml
patching file src/backend/access/brin/brin.c
patching file src/backend/access/gin/ginutil.c
patching file src/backend/access/gist/gist.c
patching file src/backend/access/hash/hash.c
patching file src/backend/access/index/indexam.c
patching file src/backend/access/nbtree/nbtree.c
patching file src/backend/access/nbtree/nbtsearch.c
patching file src/backend/access/spgist/spgutils.c
patching file src/backend/commands/explain.c
patching file src/backend/executor/nodeIndexonlyscan.c
patching file src/backend/executor/nodeIndexscan.c
patching file src/backend/nodes/copyfuncs.c
patching file src/backend/nodes/outfuncs.c
patching file src/backend/nodes/readfuncs.c
patching file src/backend/optimizer/path/costsize.c
patching file src/backend/optimizer/path/pathkeys.c
patching file src/backend/optimizer/plan/createplan.c
patching file src/backend/optimizer/plan/planagg.c
patching file src/backend/optimizer/plan/planner.c
patching file src/backend/optimizer/util/pathnode.c
patching file src/backend/optimizer/util/plancat.c
patching file src/backend/utils/misc/guc.c
patching file src/backend/utils/misc/postgresql.conf.sample
patching file src/include/access/amapi.h
patching file src/include/access/genam.h
patching file src/include/access/nbtree.h
patching file src/include/nodes/execnodes.h
patching file src/include/nodes/pathnodes.h
patching file src/include/nodes/plannodes.h
patching file src/include/optimizer/cost.h
patching file src/include/optimizer/pathnode.h
patching file src/include/optimizer/paths.h
patching file src/test/regress/expected/create_index.out
patching file src/test/regress/expected/select_distinct.out
patching file src/test/regress/expected/sysviews.out
patching file src/test/regress/sql/create_index.sql
patching file src/test/regress/sql/select_distinct.sql

This will 'make' and 'make install' cleanly.

When I run the server, I can log in, but the postgres process associated with
my psql session crashes with SIGSEGV in many cases, for example when using \d:

psql (12beta2)
Type "help" for help.

db1=> show enable_indexskipscan;
 enable_indexskipscan
----------------------
 on
(1 row)

db1=> \d
psql: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!> \q

I got a backtrace out of the process:

(gdb) backtrace
#0  MemoryContextAllocZeroAligned (context=0x0, size=size@entry=80) at 
mcxt.c:864
#1  0x0067d2d4 in get_eclass_for_sort_expr (root=root@entry=0x22ecb10, 
expr=expr@entry=0x22ee280, nullable_relids=nullable_relids@entry=0x0, 
opfamilies=0x22ff530,
opcintype=opcintype@entry=19, collation=collation@entry=950, 
sortref=<optimized out>, rel=0x0, create_it=true) at equivclass.c:704
#2  0x00686d9e in make_pathkey_from_sortinfo 
(root=root@entry=0x22ecb10, expr=expr@entry=0x22ee280, 
nullable_relids=nullable_relids@entry=0x0, opfamily=1994, opcintype=19,
collation=950, reverse_sort=false, nulls_first=false, sortref=1, rel=0x0, 
create_it=true) at pathkeys.c:228
#3  0x00686eb7 in make_pathkey_from_sortop (root=root@entry=0x22ecb10, 
expr=0x22ee280, nullable_relids=0x0, ordering_op=660, nulls_first=<optimized out>, sortref=1,
create_it=true) at pathkeys.c:271
#4  0x00687a4a in make_pathkeys_for_sortclauses 
(root=root@entry=0x22ecb10, sortclauses=<optimized out>, 
tlist=tlist@entry=0x22ee2f0) at pathkeys.c:1099
#5  0x00694588 in standard_qp_callback (root=0x22ecb10, 
extra=<optimized out>) at planner.c:3635
#6  0x00693024 in query_planner (root=root@entry=0x22ecb10, 
qp_callback=qp_callback@entry=0x6944e0 , 
qp_extra=qp_extra@entry=0x7ffe6fe2b8e0)
at planmain.c:207
#7  0x006970e0 in grouping_planner (root=root@entry=0x22ecb10, 
inheritance_update=inheritance_update@entry=false, tuple_frac

Re: Index Skip Scan - attempting to evaluate patch

2019-06-26 Thread Dmitry Dolgov
> On Wed, Jun 26, 2019 at 1:53 PM pguser  wrote:
>
> If I apply the latest patch (which says 1 of 2? - maybe I'm missing a part of 
> the patch?), I apply with

Hi,

First of all, thanks for evaluation!

> psql (12beta2)
> Type "help" for help.
>
> db1=> show enable_indexskipscan;
>  enable_indexskipscan
> ----------------------
>  on
> (1 row)
>
> db1=> \d
> psql: server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
> !> \q
>
>
> I got a backtrace out of the process:
>
> (gdb) backtrace
> #0  MemoryContextAllocZeroAligned (context=0x0, size=size@entry=80) at 
> mcxt.c:864
> #1  0x0067d2d4 in get_eclass_for_sort_expr 
> (root=root@entry=0x22ecb10, expr=expr@entry=0x22ee280, 
> nullable_relids=nullable_relids@entry=0x0, opfamilies=0x22ff530,
> opcintype=opcintype@entry=19, collation=collation@entry=950, 
> sortref=<optimized out>, rel=0x0, create_it=true) at equivclass.c:704
> #2  0x00686d9e in make_pathkey_from_sortinfo 
> (root=root@entry=0x22ecb10, expr=expr@entry=0x22ee280, 
> nullable_relids=nullable_relids@entry=0x0, opfamily=1994, opcintype=19,
> collation=950, reverse_sort=false, nulls_first=false, sortref=1, rel=0x0, 
> create_it=true) at pathkeys.c:228
> #3  0x00686eb7 in make_pathkey_from_sortop 
> (root=root@entry=0x22ecb10, expr=0x22ee280, nullable_relids=0x0, 
> ordering_op=660, nulls_first=, sortref=1,
> create_it=true) at pathkeys.c:271
> #4  0x00687a4a in make_pathkeys_for_sortclauses 
> (root=root@entry=0x22ecb10, sortclauses=<optimized out>, 
> tlist=tlist@entry=0x22ee2f0) at pathkeys.c:1099
> #5  0x00694588 in standard_qp_callback (root=0x22ecb10, 
> extra=<optimized out>) at planner.c:3635
> #6  0x00693024 in query_planner (root=root@entry=0x22ecb10, 
> qp_callback=qp_callback@entry=0x6944e0 , 
> qp_extra=qp_extra@entry=0x7ffe6fe2b8e0)
> at planmain.c:207
> #7  0x006970e0 in grouping_planner (root=root@entry=0x22ecb10, 
> inheritance_update=inheritance_update@entry=false, tuple_fraction=<optimized out>, tuple_fraction@entry=0)
> at planner.c:2048
> #8  0x0069978d in subquery_planner (glob=glob@entry=0x22e43c0, 
> parse=parse@entry=0x22e3f30, parent_root=parent_root@entry=0x0, 
> hasRecursion=hasRecursion@entry=false,
> tuple_fraction=tuple_fraction@entry=0) at planner.c:1012
> #9  0x0069a7b6 in standard_planner (parse=0x22e3f30, 
> cursorOptions=256, boundParams=<optimized out>) at planner.c:406
> #10 0x0073ceac in pg_plan_query (querytree=querytree@entry=0x22e3f30, 
> cursorOptions=cursorOptions@entry=256, boundParams=boundParams@entry=0x0) at 
> postgres.c:878
> #11 0x0073cf86 in pg_plan_queries (querytrees=<optimized out>, 
> cursorOptions=cursorOptions@entry=256, boundParams=boundParams@entry=0x0) at 
> postgres.c:968
> #12 0x0073d399 in exec_simple_query (
> query_string=0x222a9a0 "SELECT n.nspname as \"Schema\",\n  c.relname as 
> \"Name\",\n  CASE c.relkind WHEN 'r' THEN 'table' WHEN 'v' THEN 'view' WHEN 
> 'm' THEN 'materialized view' WHEN 'i' THEN 'index' WHEN 'S' THEN 'sequence' 
> WHEN '"...) at postgres.c:1143
> #13 0x0073ef5a in PostgresMain (argc=<optimized out>, 
> argv=argv@entry=0x2255440, dbname=<optimized out>, username=<optimized out>) 
> at postgres.c:4249
> #14 0x006cfaf6 in BackendRun (port=0x224e220, port=0x224e220) at 
> postmaster.c:4431
> #15 BackendStartup (port=0x224e220) at postmaster.c:4122
> #16 ServerLoop () at postmaster.c:1704
> #17 0x006d09d0 in PostmasterMain (argc=argc@entry=3, 
> argv=argv@entry=0x2224c50) at postmaster.c:1377
> #18 0x004820c4 in main (argc=3, argv=0x2224c50) at main.c:228

Could you by any chance also provide the schema of the relations that were
supposed to be described by this command?




Re: MSVC Build support with visual studio 2019

2019-06-26 Thread Haribabu Kommi
On Wed, 5 Jun 2019 at 17:22, Juan José Santamaría Flecha <juanjo.santama...@gmail.com> wrote:

>
> On Wed, May 29, 2019 at 10:30 AM Haribabu Kommi wrote:
>
>>
>> Updated patches are attached.
>>
>>
> All patches apply, build and pass tests. The patch
> '0001-support-building-with-visual-studio-2019_v10_to_v9.6_v3.patch'
> applies on version 9.5.
>
> Not sure if more review is needed before moving to 'ready for committer',
> but I have no more comments to make on current patches.
>

Thanks for the review. Yes, that patch applies down to 9.5; it was my mistake
in naming the patch.

Regards,
Haribabu Kommi


Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

2019-06-26 Thread Andrew Gierth
> "Thomas" == Thomas Munro  writes:

 >> Pacific/Auckland -> NZ

 Thomas> Right. On a FreeBSD system here in New Zealand you get "NZ"
 Thomas> with default configure options (ie using PostgreSQL's tzdata).
 Thomas> But if you build with --with-system-tzdata=/usr/share/zoneinfo
 Thomas> you get "Pacific/Auckland", and that's because the FreeBSD
 Thomas> zoneinfo directory doesn't include the old non-city names like
 Thomas> "NZ", "GB", "Japan", "US/Eastern" etc. (Unfortunately the
 Thomas> FreeBSD packages for PostgreSQL are not being built with that
 Thomas> option so initdb chooses the old names. Something to take up
 Thomas> with the maintainers.)

Same issue here with Europe/London getting "GB".

-- 
Andrew (irc:RhodiumToad)




Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3

2019-06-26 Thread Peter Eisentraut
On 2019-06-25 15:39, Robert Haas wrote:
> On Tue, Jun 25, 2019 at 8:28 AM Peter Eisentraut
>  wrote:
>> How are the requirements here different from ssl_passphrase_command?
>> Why do we need a new mechanism?
> 
> I don't think that the requirements are different, and I don't think
> we need a new mechanism.
> 
> I am curious exactly how you would set up ssl_passphrase_command to
> prompt interactively.

For example like this:
https://www.2ndquadrant.com/en/blog/postgresql-passphrase-protected-ssl-keys-systemd/

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: mcvstats serialization code is still shy of a load

2019-06-26 Thread Tom Lane
Tomas Vondra  writes:
> On Tue, Jun 25, 2019 at 11:52:28PM -0400, Tom Lane wrote:
>> You can *not* cast something to an aligned pointer type if it's not
>> actually certain to be aligned suitably for that type.

> OK. So the solution is to ditch the casts altogether, and then do plain
> pointer arithmetics like this:

> #define ITEM_INDEXES(item)  (item)
> #define ITEM_NULLS(item,ndims)  (ITEM_INDEXES(item) + (ndims))
> #define ITEM_FREQUENCY(item,ndims)  (ITEM_NULLS(item, ndims) + (ndims))
> #define ITEM_BASE_FREQUENCY(item,ndims) (ITEM_FREQUENCY(item, ndims) + sizeof(double))

> Or is that still relying on alignment, somehow?

No, constructs like a char* pointer plus n times sizeof(something) should
be safe.
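
In other words, the macros now yield plain char * addresses, and any
multi-byte field has to be read or written through memcpy() rather than
dereferenced. A sketch of the consuming pattern (an illustration, not code
from the patch):

static double
read_frequency(char *item, int ndims)
{
	double		freq;

	/*
	 * ITEM_FREQUENCY yields a char *, so copy the bytes out instead of
	 * dereferencing a (double *).
	 */
	memcpy(&freq, ITEM_FREQUENCY(item, ndims), sizeof(double));
	return freq;
}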

regards, tom lane




Re: mcvstats serialization code is still shy of a load

2019-06-26 Thread Tomas Vondra

On Wed, Jun 26, 2019 at 09:49:46AM +0200, Tomas Vondra wrote:

On Tue, Jun 25, 2019 at 11:52:28PM -0400, Tom Lane wrote:

I'm seeing a reproducible bus error here:

#0  0x00417420 in statext_mcv_serialize (mcvlist=0x62223450, stats=Variable "stats" is not available.)
    at mcv.c:785
785		memcpy(ITEM_BASE_FREQUENCY(item, ndims), &mcvitem->base_frequency, sizeof(double));

What appears to be happening is that since ITEM_BASE_FREQUENCY is defined as

#define ITEM_BASE_FREQUENCY(item,ndims) ((double *) (ITEM_FREQUENCY(item, ndims) + 1))

the compiler is assuming that the first argument to memcpy is
double-aligned, and it is generating code that depends on that being
true, and of course it isn't true and kaboom.

You can *not* cast something to an aligned pointer type if it's not
actually certain to be aligned suitably for that type.  In this example,
even if you wrote "(char *)" in front of this, it wouldn't save you;
the compiler would still be entitled to believe that the intermediate
cast value meant something.  The casts in the underlying macros
ITEM_FREQUENCY and so on are equally unsafe.



OK. So the solution is to ditch the casts altogether, and then do plain
pointer arithmetic like this:

#define ITEM_INDEXES(item)  (item)
#define ITEM_NULLS(item,ndims)  (ITEM_INDEXES(item) + (ndims))
#define ITEM_FREQUENCY(item,ndims)  (ITEM_NULLS(item, ndims) + (ndims))
#define ITEM_BASE_FREQUENCY(item,ndims) (ITEM_FREQUENCY(item, ndims) + sizeof(double))

Or is that still relying on alignment, somehow?



Attached is a patch that should (hopefully) fix this. It essentially
treats the item as (char *) and does all pointer arithmetic without any
additional casts, so there are no intermediate casts.

I have no way to test this, so I may either wait for you to test this
first, or push and wait. It seems to fail only on a very small number of
buildfarm animals, so having a confirmation would be nice.

The fix keeps the binary format as is, so the serialized MCV items are
max-aligned. That means we can access the uint16 indexes directly, but we
need to copy the rest of the fields (because those may not be aligned). In
hindsight that seems a bit silly, we might as well copy everything, not
care about the alignment and maybe save a few more bytes. But that would
require a catversion bump. OTOH we may need to do that anyway, to fix the
pg_mcv_list_items() signature (as discussed in the other MCV thread).

regards 


--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

diff --git a/src/backend/statistics/mcv.c b/src/backend/statistics/mcv.c
index 5fe61ea0a4..01ea05b486 100644
--- a/src/backend/statistics/mcv.c
+++ b/src/backend/statistics/mcv.c
@@ -57,10 +57,10 @@
 /*
  * Macros for convenient access to parts of a serialized MCV item.
  */
-#define ITEM_INDEXES(item)			((uint16 *) (item))
-#define ITEM_NULLS(item,ndims)		((bool *) (ITEM_INDEXES(item) + (ndims)))
-#define ITEM_FREQUENCY(item,ndims)	((double *) (ITEM_NULLS(item, ndims) + (ndims)))
-#define ITEM_BASE_FREQUENCY(item,ndims)	((double *) (ITEM_FREQUENCY(item, ndims) + 1))
+#define ITEM_INDEXES(item)			((char *) item)
+#define ITEM_NULLS(item,ndims)		(ITEM_INDEXES(item) + (ndims) * sizeof(uint16))
+#define ITEM_FREQUENCY(item,ndims)	(ITEM_NULLS(item, ndims) + (ndims) * sizeof(bool))
+#define ITEM_BASE_FREQUENCY(item,ndims)	(ITEM_FREQUENCY(item, ndims) + sizeof(double))
 
 /*
  * Used to compute size of serialized MCV list representation.
@@ -751,6 +751,7 @@ statext_mcv_serialize(MCVList *mcvlist, VacAttrStats **stats)
 	for (i = 0; i < mcvlist->nitems; i++)
 	{
 		MCVItem    *mcvitem = &mcvlist->items[i];
+		uint16	   *indexes = (uint16 *) ITEM_INDEXES(item);
 
 		/* don't write beyond the allocated space */
 		Assert(ptr <= raw + total_length - itemsize);
@@ -773,10 +774,10 @@ statext_mcv_serialize(MCVList *mcvlist, VacAttrStats **stats)
 		Assert(value != NULL);	/* serialization or deduplication error */
 
/* compute index within the array */
-			ITEM_INDEXES(item)[dim] = (uint16) (value - values[dim]);
+			indexes[dim] = (uint16) (value - values[dim]);
 
/* check the index is within expected bounds */
-			Assert(ITEM_INDEXES(item)[dim] < info[dim].nvalues);
+			Assert(indexes[dim] < info[dim].nvalues);
}
 
/* copy NULL and frequency flags into the item */
@@ -1087,10 +1088,13 @@ statext_mcv_deserialize(bytea *data)
item->isnull = (bool *) isnullptr;
isnullptr += MAXALIGN(sizeof(bool) * ndims);
 
+   /*
+* To access indexes, we can just point to the r

Re: Index Skip Scan - attempting to evaluate patch

2019-06-26 Thread pguser


‐‐‐ Original Message ‐‐‐
On Wednesday, June 26, 2019 1:07 PM, Dmitry Dolgov <9erthali...@gmail.com> 
wrote:

> > On Wed, Jun 26, 2019 at 1:53 PM pguser pgu...@diorite.uk wrote:
> > If I apply the latest patch (which says 1 of 2? - maybe I'm missing a part 
> > of the patch?), I apply with
>
> Hi,
>
> First of all, thanks for evaluation!
>

No problem. I'd like to get involved in helping this patch mature as I think 
that we're suffering in a few areas of performance due to this.

> Could you by any chance also provide the schema of the relations that were
> supposed to be described by this command?

Okay, for now it's not much. I get the SIGSEGV issue on a brand new
database with only one relation:

This is with the 12beta2 as compiled from git sources by me:

psql (12beta2)
Type "help" for help.


db2=> \d
        List of relations
 Schema | Name | Type  | Owner
--------+------+-------+-------
 e5     | t1   | table | e5
(1 row)

db2=> \d t1
Table "e5.t1"
 Column |   Type| Collation | Nullable | Default
+---+---+--+-
 n1 | smallint  |   |  |
 n2 | smallint  |   |  |
 c1 | character varying |   |  |
 c2 | character varying |   |  |
Indexes:
"i1" btree (n1, n2, c1)


And with patch 20 applied:

psql (12beta2)
Type "help" for help.

db2=> \d
psql: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!> \q
[postgres@ip-172-31-33-89 ~]$ . sql2
psql (12beta2)
Type "help" for help.

db2=> \d t1
psql: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!> \q


In fact, if I do:

createdb db3
psql -d db3
db3=# \d
psql: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

I get this on empty database with no relations yet defined.

I feel I have done something silly or missed something when applying the patch.




Re: Index Skip Scan - attempting to evaluate patch

2019-06-26 Thread pguser


‐‐‐ Original Message ‐‐‐
On Wednesday, June 26, 2019 2:55 PM, pguser  wrote:

> ‐‐‐ Original Message ‐‐‐
> On Wednesday, June 26, 2019 1:07 PM, Dmitry Dolgov 9erthali...@gmail.com 
> wrote:
>
> > > On Wed, Jun 26, 2019 at 1:53 PM pguser pgu...@diorite.uk wrote:
> > > If I apply the latest patch (which says 1 of 2? - maybe I'm missing a 
> > > part of the patch?), I apply with
> >
> > Hi,
> > First of all, thanks for evaluation!
>
> No problem. I'd like to get involved in helping this patch mature as I think 
> that we're suffering in a few areas of performance due to this.
>
> > Could you by any chance also provide the schema of the relations that were
> > supposed to be described by this command?
>
> Okay for now, it's not much. I get the issue of the SIGSEGV on a brand new 
> database with only one relation:
>
> This is with the 12beta2 as compiled from git sources by me:
>
> psql (12beta2)
> Type "help" for help.
>
> db2=> \d
>
>List of relations
>
>
> Schema | Name | Type | Owner
> +--+---+---
> e5 | t1 | table | e5
> (1 row)
>
> db2=> \d t1
>
> Table "e5.t1"
>
>
> Column | Type | Collation | Nullable | Default
> +---+---+--+-
> n1 | smallint | | |
> n2 | smallint | | |
> c1 | character varying | | |
> c2 | character varying | | |
> Indexes:
> "i1" btree (n1, n2, c1)
>
> And with patch 20 applied:
>
> psql (12beta2)
> Type "help" for help.
>
> db2=> \d
> psql: server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
> !> \q
> [postgres@ip-172-31-33-89 ~]$ . sql2
> psql (12beta2)
> Type "help" for help.
>
> db2=> \d t1
> psql: server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
> !> \q
>
> In fact, if I do:
>
> createdb db3
> psql -d db3
> db3=# \d
> psql: server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
>
> I get this on empty database with no relations yet defined.
>
> I feel I have done something silly or missed something when applying patch


I find that my patched installation can't even initdb its own cluster:

initdb -D /pgd2
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory /pgd2 ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default timezone ... UTC
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... 2019-06-26 14:05:47.807 UTC [8120] 
FATAL:  could not open file "base/1/2663.1" (target block 17353008): previous 
segment is only 4 blocks at character 65
2019-06-26 14:05:47.807 UTC [8120] STATEMENT:  INSERT INTO pg_shdepend SELECT 
0,0,0,0, tableoid,oid, 'p'  FROM pg_authid;

child process exited with exit code 1
initdb: removing contents of data directory "/pgd2"


I was hoping to share the pgdata between 12beta2 without patch, and 12beta2 
with patch, for ease of side by side comparison.

More and more I feel that I'm missing something beyond just this v20 patch
from the Index Skip Scan thread.




Useless configure checks for RAND_OpenSSL (HAVE_RAND_OPENSSL)

2019-06-26 Thread Michael Paquier
Hi all,

We had been using RAND_OpenSSL(), a function new as of OpenSSL 1.1.0, in
pgcrypto until fe0a0b5, which removed the last traces of the function from
the tree.  We still have a configure check for it and the related compilation
flag in pg_config.h.in, and both are now useless.

Any objections to the cleanup done in the attached patch?

Thanks,
--
Michael
diff --git a/configure b/configure
index 8d47071e4a..9860b0b95d 100755
--- a/configure
+++ b/configure
@@ -12146,7 +12146,7 @@ done
   # defines OPENSSL_VERSION_NUMBER to claim version 2.0.0, even though it
   # doesn't have these OpenSSL 1.1.0 functions. So check for individual
   # functions.
-  for ac_func in OPENSSL_init_ssl BIO_get_data BIO_meth_new ASN1_STRING_get0_data RAND_OpenSSL
+  for ac_func in OPENSSL_init_ssl BIO_get_data BIO_meth_new ASN1_STRING_get0_data
 do :
   as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
 ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
diff --git a/configure.in b/configure.in
index 74938d4190..efc33d4775 100644
--- a/configure.in
+++ b/configure.in
@@ -1212,7 +1212,7 @@ if test "$with_openssl" = yes ; then
   # defines OPENSSL_VERSION_NUMBER to claim version 2.0.0, even though it
   # doesn't have these OpenSSL 1.1.0 functions. So check for individual
   # functions.
-  AC_CHECK_FUNCS([OPENSSL_init_ssl BIO_get_data BIO_meth_new ASN1_STRING_get0_data RAND_OpenSSL])
+  AC_CHECK_FUNCS([OPENSSL_init_ssl BIO_get_data BIO_meth_new ASN1_STRING_get0_data])
   # OpenSSL versions before 1.1.0 required setting callback functions, for
   # thread-safety. In 1.1.0, it's no longer required, and CRYPTO_lock()
   # function was removed.
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index 6cd4cfed0a..512213aa32 100644
--- a/src/include/pg_config.h.in
+++ b/src/include/pg_config.h.in
@@ -464,9 +464,6 @@
 /* Define to 1 if you have the `random' function. */
 #undef HAVE_RANDOM
 
-/* Define to 1 if you have the `RAND_OpenSSL' function. */
-#undef HAVE_RAND_OPENSSL
-
 /* Define to 1 if you have the  header file. */
 #undef HAVE_READLINE_H
 




Re: Useless configure checks for RAND_OpenSSL (HAVE_RAND_OPENSSL)

2019-06-26 Thread Daniel Gustafsson
> On 26 Jun 2019, at 16:25, Michael Paquier  wrote:

> Any objections to the cleanup done in the attached patch?

None, LGTM.

cheers ./daniel




Re: Useless configure checks for RAND_OpenSSL (HAVE_RAND_OPENSSL)

2019-06-26 Thread Tom Lane
Michael Paquier  writes:
> We have been using RAND_OpenSSL(), a function new as of OpenSSL 1.1.0
> in pgcrypto until fe0a0b5 which has removed the last traces of the
> function in the tree.  We still have a configure check for it and the
> related compilation flag in pg_config.h.in, and both are now useless.

> Any objections to the cleanup done in the attached patch?

+1, fewer configure checks always better.  I don't see any other
references to RAND_OpenSSL either.

regards, tom lane




Re: Index Skip Scan - attempting to evaluate patch

2019-06-26 Thread Tomas Vondra

On Wed, Jun 26, 2019 at 02:12:55PM +, pguser wrote:


‐‐‐ Original Message ‐‐‐
On Wednesday, June 26, 2019 2:55 PM, pguser  wrote:


‐‐‐ Original Message ‐‐‐
On Wednesday, June 26, 2019 1:07 PM, Dmitry Dolgov 9erthali...@gmail.com wrote:

> > On Wed, Jun 26, 2019 at 1:53 PM pguser pgu...@diorite.uk wrote:
> > If I apply the latest patch (which says 1 of 2? - maybe I'm missing a part 
of the patch?), I apply with
>
> Hi,
> First of all, thanks for evaluation!

No problem. I'd like to get involved in helping this patch mature as I think 
that we're suffering in a few areas of performance due to this.

> Could you by any chance also provide the schema of the relations that were
> supposed to be described by this command?

Okay for now, it's not much. I get the issue of the SIGSEGV on a brand new 
database with only one relation:

This is with the 12beta2 as compiled from git sources by me:

psql (12beta2)
Type "help" for help.

db2=> \d

   List of relations


Schema | Name | Type | Owner
+--+---+---
e5 | t1 | table | e5
(1 row)

db2=> \d t1

Table "e5.t1"


Column | Type | Collation | Nullable | Default
+---+---+--+-
n1 | smallint | | |
n2 | smallint | | |
c1 | character varying | | |
c2 | character varying | | |
Indexes:
"i1" btree (n1, n2, c1)

And with patch 20 applied:

psql (12beta2)
Type "help" for help.

db2=> \d
psql: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!> \q
[postgres@ip-172-31-33-89 ~]$ . sql2
psql (12beta2)
Type "help" for help.

db2=> \d t1
psql: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!> \q

In fact, if I do:

createdb db3
psql -d db3
db3=# \d
psql: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

I get this on empty database with no relations yet defined.

I feel I have done something silly or missed something when applying patch



I find that my patched installation can't create its own initdb either:

initdb -D /pgd2
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory /pgd2 ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default timezone ... UTC
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... 2019-06-26 14:05:47.807 UTC [8120] FATAL:  
could not open file "base/1/2663.1" (target block 17353008): previous segment 
is only 4 blocks at character 65
2019-06-26 14:05:47.807 UTC [8120] STATEMENT:  INSERT INTO pg_shdepend SELECT 
0,0,0,0, tableoid,oid, 'p'  FROM pg_authid;

child process exited with exit code 1
initdb: removing contents of data directory "/pgd2"



Well, there's something seriously wrong with your build or environment,
then. I've tried reproducing the issue, but it works just fine for me
(initdb, psql, ...).



I was hoping to share the pgdata between 12beta2 without patch, and
12beta2 with patch, for ease of side by side comparison.



That might be dangerous if there are differences in the contents of the
catalogs. I don't think the patch does that, though, and for me it works
just fine. I can initdb a database using current master, create table +
indexes, do \d. And I can do that with the patch applied too.


Even more I feel that I'm missing something more than just this 20 patch
from the Index Skip Scan thread.



Are you sure this is not some sort of OOM issue? That might also
manifest as a segfault in various cases.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: mcvstats serialization code is still shy of a load

2019-06-26 Thread Tom Lane
Tomas Vondra  writes:
> Attached is a patch that should (hopefully) fix this. It essentially
> treats the item as (char *) and does all pointer arithmetics without any
> additional casts. So there are no intermediate casts.

This passes the eyeball test, and it also allows my OpenBSD/hppa
installation to get through the core regression tests, so I think
it's good as far as it goes.  Please push.

However ... nosing around in mcv.c, I noticed that the next macro:

/*
 * Used to compute size of serialized MCV list representation.
 */
#define MinSizeOfMCVList\
(VARHDRSZ + sizeof(uint32) * 3 + sizeof(AttrNumber))

#define SizeOfMCVList(ndims,nitems) \
(MAXALIGN(MinSizeOfMCVList + sizeof(Oid) * (ndims)) + \
 MAXALIGN((ndims) * sizeof(DimensionInfo)) + \
 MAXALIGN((nitems) * ITEM_SIZE(ndims)))

is both woefully underdocumented and completely at variance with
reality.  It doesn't seem to be accounting for the actual data values.
No doubt this is why it's not used in the places where it'd matter;
the tests that do use it are testing much weaker conditions than they
should.

> The fix keeps the binary format as is, so the serialized MCV items are
> max-aligned. That means we can access the uint16 indexes directly, but we
> need to copy the rest of the fields (because those may not be aligned). In
> hindsight that seems a bit silly, we might as well copy everything, not
> care about the alignment and maybe save a few more bytes.

I think that part of the problem here is that the way this code is
written, "maxaligned" is no such thing.  What you're actually maxaligning
seems to be the offset from the start of the data area of a varlena value,
which is generally going to be a maxaligned palloc result plus 4 bytes.
So "aligned" double values are actually guaranteed to be on odd word
boundaries not even ones.

What's more, it's difficult to convince oneself that the maxaligns done
in different parts of the code are all enforcing the same choices about
which substructures get pseudo-maxaligned and which don't, because the
logic doesn't line up very well.

If we do need another catversion bump before v12, I'd vote for ripping
out Every Single One of the "maxalign" operations in this code, just
on the grounds of code simplicity and bug reduction.

regards, tom lane




Re: Index Skip Scan - attempting to evaluate patch

2019-06-26 Thread pguser


‐‐‐ Original Message ‐‐‐
On Wednesday, June 26, 2019 4:07 PM, Tomas Vondra wrote:

> That might be dangerous, if there may be differences in contents of
> catalogs. I don't think the patch does that though, and for me it works
> just fine. I can initdb database using current master, create table +
> indexes, do \d. And I can do that with the patch applied too.
>


Well, this is embarrassing.

I repeated all my steps again on my development laptop (Fedora 30, GCC 9.1.1, 
glibc 2.29.15) and it all works (doesn't segfault, can initdb).

On my Amazon Linux EC2 (gcc 7.3.1, glibc 2.6.32), it exhibits the fault on the
patched version.

Same steps, same sources.

Got to be build tools/version related on my EC2 instance.

Darn it. Sorry for wasting your time. I will continue to evaluate the patch,
and be mindful that something, somewhere is sensitive to build tool or library
versions.

Many regards






Re: mcvstats serialization code is still shy of a load

2019-06-26 Thread Tomas Vondra

On Wed, Jun 26, 2019 at 11:26:21AM -0400, Tom Lane wrote:

Tomas Vondra  writes:

Attached is a patch that should (hopefully) fix this. It essentially
treats the item as (char *) and does all pointer arithmetics without any
additional casts. So there are no intermediate casts.


This passes the eyeball test, and it also allows my OpenBSD/hppa
installation to get through the core regression tests, so I think
it's good as far as it goes.  Please push.

However ... nosing around in mcv.c, I noticed that the next macro:

/*
* Used to compute size of serialized MCV list representation.
*/
#define MinSizeOfMCVList\
(VARHDRSZ + sizeof(uint32) * 3 + sizeof(AttrNumber))

#define SizeOfMCVList(ndims,nitems) \
(MAXALIGN(MinSizeOfMCVList + sizeof(Oid) * (ndims)) + \
 MAXALIGN((ndims) * sizeof(DimensionInfo)) + \
 MAXALIGN((nitems) * ITEM_SIZE(ndims)))

is both woefully underdocumented and completely at variance with
reality.  It doesn't seem to be accounting for the actual data values.
No doubt this is why it's not used in the places where it'd matter;
the tests that do use it are testing much weaker conditions than they
should.



I agree about the macro being underdocumented, but AFAICS it's used
correctly to check the expected length. It can't include the data values
directly, because that's a variable amount of data - and it's encoded in a
not-yet-verified part of the data.

So this only includes parts with known lengths, and then the code does
this:

   for (dim = 0; dim < ndims; dim++)
   {
   ...
   expected_size += MAXALIGN(info[dim].nbytes);
   }

and uses that to check the actual length.

   if (VARSIZE_ANY(data) != expected_size)
   elog(ERROR, ...);

That being said, maybe this is unnecessarily defensive and we should just
trust the values not to be corrupted. So if we get a pg_mcv_list value, we'd
simply assume it's OK.


The fix keeps the binary format as is, so the serialized MCV items are
max-aligned. That means we can access the uint16 indexes directly, but we
need to copy the rest of the fields (because those may not be aligned). In
hindsight that seems a bit silly, we might as well copy everything, not
care about the alignment and maybe save a few more bytes.


I think that part of the problem here is that the way this code is
written, "maxaligned" is no such thing.  What you're actually maxaligning
seems to be the offset from the start of the data area of a varlena value,
which is generally going to be a maxaligned palloc result plus 4 bytes.
So "aligned" double values are actually guaranteed to be on odd word
boundaries not even ones.



I don't think so. The pointers should be maxaligned with respect to the
whole varlena value, which is what 'raw' points to. At least that was the
intent of code like this:

   raw = palloc0(total_length);

   ...

   /* the header may not be exactly aligned, so make sure it is */
   ptr = raw + MAXALIGN(ptr - raw);

If it's not like that in some place, it's a bug.


What's more, it's difficult to convince oneself that the maxaligns done
in different parts of the code are all enforcing the same choices about
which substructures get pseudo-maxaligned and which don't, because the
logic doesn't line up very well.



Not sure. If there's a way to make it clearer, I'm ready to do the work.
Unfortunately it's hard for me to judge that, because I've spent so much
time on that code that it seems fairly clear to me.


If we do need another catversion bump before v12, I'd vote for ripping
out Every Single One of the "maxalign" operations in this code, just
on the grounds of code simplicity and bug reduction.



Hmmm, OK. The original reason to keep the parts aligned was to be able to
reference the parts directly during processing. If we get rid of the
alignment, we'll have to memcpy everything during deserialization. But
if it makes the code simpler, it might be worth it - this part of the
code was clearly the weakest part of the patch.
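
Something like the following, assuming the fields were simply packed back to
back with no alignment padding (a sketch of the idea, not the committed
format):

   /* no alignment guarantees, so copy each field out byte by byte */
   memcpy(&item->frequency, ptr, sizeof(double));
   ptr += sizeof(double);
   memcpy(&item->base_frequency, ptr, sizeof(double));
   ptr += sizeof(double);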


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: mcvstats serialization code is still shy of a load

2019-06-26 Thread Tom Lane
Tomas Vondra  writes:
> On Wed, Jun 26, 2019 at 11:26:21AM -0400, Tom Lane wrote:
>> #define SizeOfMCVList(ndims,nitems)  \
>> is both woefully underdocumented and completely at variance with
>> reality.  It doesn't seem to be accounting for the actual data values.

> I agree about the macro being underdocumented, but AFAICS it's used
> correctly to check the expected length. It can't include the data values
> directly, because that's variable amount of data - and it's encoded in not
> yet verified part of the data.

Well, it should have some other name then.  Or *at least* a comment.
It's unbelievably misleading as it stands.

> That being said, maybe this is unnecessarily defensive and we should just
> trust the values not being corrupted.

No, I'm on board with checking the lengths.  I just don't like how
hard it is to discern what's being checked.

>> I think that part of the problem here is that the way this code is
>> written, "maxaligned" is no such thing.  What you're actually maxaligning
>> seems to be the offset from the start of the data area of a varlena value,

> I don't think so. The pointers should be maxaligned with respect to the
> whole varlena value, which is what 'raw' points to.

[ squint ... ]  OK, I think I misread this:

statext_mcv_deserialize(bytea *data)
{
...
/* pointer to the data part (skip the varlena header) */
ptr = VARDATA_ANY(data);
raw = (char *) data;

I think this is confusing in itself --- I read it as "raw = (char *) ptr"
and I think most other people would assume that too based on the order
of operations.  It'd read better as

/* remember start of datum for maxalign reference */
raw = (char *) data;

/* pointer to the data part (skip the varlena header) */
ptr = VARDATA_ANY(data);

Another problem with this code is that it flat doesn't work for
non-4-byte-header varlenas: it'd do the alignment differently than the
serialization side did.  That's okay given that the two extant call sites
are guaranteed to pass detoasted datums.  But using VARDATA_ANY gives a
completely misleading appearance of being ready to deal with short-header
varlenas, and heaven forbid there should be any comment to discourage
future coders from trying.  So really what I'd like to see here is

/* remember start of datum for maxalign reference */
raw = (char *) data;

/* alignment logic assumes full-size datum header */
Assert(VARATT_IS_4B(data));

/* pointer to the data part (skip the varlena header) */
ptr = VARDATA_ANY(data);

Or, of course, this could all go away if we got rid of the
bogus maxaligning...

regards, tom lane




Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

2019-06-26 Thread Tom Lane
Andrew Gierth  writes:
> "Thomas" == Thomas Munro  writes:
>  Thomas> Right. On a FreeBSD system here in New Zealand you get "NZ"
>  Thomas> with default configure options (ie using PostgreSQL's tzdata).
>  Thomas> But if you build with --with-system-tzdata=/usr/share/zoneinfo
>  Thomas> you get "Pacific/Auckland", and that's because the FreeBSD
>  Thomas> zoneinfo directory doesn't include the old non-city names like
>  Thomas> "NZ", "GB", "Japan", "US/Eastern" etc.

> Same issue here with Europe/London getting "GB".

FreeBSD offers yet another obstacle to Andrew's proposal:

$ uname -a
FreeBSD rpi3.sss.pgh.pa.us 12.0-RELEASE FreeBSD 12.0-RELEASE r341666 GENERIC arm64
$ ls /usr/share/zoneinfo/
Africa/      Australia/   Etc/         MST          WET
America/     CET          Europe/      MST7MDT      posixrules
Antarctica/  CST6CDT      Factory      PST8PDT      zone.tab
Arctic/      EET          HST          Pacific/
Asia/        EST          Indian/      SystemV/
Atlantic/    EST5EDT      MET          UTC

No zone1970.tab.  I do not think we can rely on that file being there,
since zic itself doesn't install it; it's up to packagers whether or
where to install the "*.tab" files.

In general, the point I'm trying to make is that our policy should be
"Ties are broken arbitrarily, and if you don't like the choice that initdb
makes, here's how to fix it".  As soon as we try to break some ties in
favor of somebody's idea of what is "right", we are in for neverending
problems with different people disagreeing about what is "right", and
insisting that their preference should be the one the code enforces.
Let's *please* not go there, or even within hailing distance of it.

(By this light, even preferring UTC over UCT is a dangerous precedent.
I won't argue for reverting that, but I don't want to go further.)

regards, tom lane




Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

2019-06-26 Thread Tom Lane
Further on this --- I now remember that the reason we used to want to
reject the "Factory" timezone is that it used to report this as the
zone abbreviation:

Local time zone must be set--see zic manual page

which (a) resulted in syntactically invalid timestamp output from the
timeofday() function and (b) completely screwed up the column width
in the pg_timezone_names view.

But since 2016g, it's reported the much-less-insane string "-00".
I propose therefore that it's time to just drop the discrimination
against "Factory", as per attached.  There doesn't seem to be any
reason anymore to forbid people from seeing it in pg_timezone_names
or selecting it as the timezone if they're so inclined.  We would
only have a problem if somebody is using --with-system-tzdata in
a machine where they've not updated the system tzdata since 2016,
and I'm no longer willing to consider that a valid use-case.

regards, tom lane

diff --git a/src/backend/utils/adt/datetime.c b/src/backend/utils/adt/datetime.c
index 9def318..91b1847 100644
--- a/src/backend/utils/adt/datetime.c
+++ b/src/backend/utils/adt/datetime.c
@@ -4845,19 +4845,6 @@ pg_timezone_names(PG_FUNCTION_ARGS)
 		 &tzoff, &tm, &fsec, &tzn, tz) != 0)
 			continue;			/* ignore if conversion fails */
 
-		/*
-		 * Ignore zic's rather silly "Factory" time zone.  The long string
-		 * about "see zic manual page" is used in tzdata versions before
-		 * 2016g; we can drop it someday when we're pretty sure no such data
-		 * exists in the wild on platforms using --with-system-tzdata.  In
-		 * 2016g and later, the time zone abbreviation "-00" is used for
-		 * "Factory" as well as some invalid cases, all of which we can
-		 * reasonably omit from the pg_timezone_names view.
-		 */
-		if (tzn && (strcmp(tzn, "-00") == 0 ||
-	strcmp(tzn, "Local time zone must be set--see zic manual page") == 0))
-			continue;
-
 		/* Found a displayable zone */
 		break;
 	}
diff --git a/src/bin/initdb/findtimezone.c b/src/bin/initdb/findtimezone.c
index f91fd31..fc86ff0 100644
--- a/src/bin/initdb/findtimezone.c
+++ b/src/bin/initdb/findtimezone.c
@@ -413,12 +413,7 @@ identify_system_timezone(void)
 			 &tt,
 			 &bestscore, resultbuf);
 	if (bestscore > 0)
-	{
-		/* Ignore IANA's rather silly "Factory" zone; use GMT instead */
-		if (strcmp(resultbuf, "Factory") == 0)
-			return NULL;
 		return resultbuf;
-	}
 
 	/*
 	 * Couldn't find a match in the database, so next we try constructed zone


Re: dropdb --force

2019-06-26 Thread Pavel Stehule
Hi

On Mon, Jun 24, 2019 at 10:28, Anthony Nowocien wrote:

> Hi,
> patch no longer applies (as of 12beta2).
>
> postgres@ubuntudev:~/pg_testing/source/postgresql-12beta2$ patch -p1 <
> drop-database-force-20190310_01.patch
> patching file doc/src/sgml/ref/drop_database.sgml
> patching file doc/src/sgml/ref/dropdb.sgml
> patching file src/backend/commands/dbcommands.c
> Hunk #1 succeeded at 489 (offset 2 lines).
> Hunk #2 succeeded at 779 (offset 2 lines).
> Hunk #3 succeeded at 787 (offset 2 lines).
> Hunk #4 succeeded at 871 (offset 2 lines).
> Hunk #5 succeeded at 1056 (offset 2 lines).
> Hunk #6 succeeded at 1186 (offset 2 lines).
> patching file src/backend/nodes/copyfuncs.c
> Hunk #1 succeeded at 3852 (offset 10 lines).
> patching file src/backend/nodes/equalfuncs.c
> Hunk #1 succeeded at 1666 (offset 3 lines).
> patching file src/backend/parser/gram.y
> Hunk #1 succeeded at 10194 (offset 45 lines).
> Hunk #2 succeeded at 10202 (offset 45 lines).
> patching file src/backend/storage/ipc/procarray.c
> Hunk #1 succeeded at 2906 (offset -14 lines).
> Hunk #2 succeeded at 2948 (offset -14 lines).
> patching file src/backend/tcop/utility.c
> patching file src/bin/scripts/dropdb.c
> Hunk #1 succeeded at 34 (offset 1 line).
> Hunk #2 succeeded at 50 (offset 1 line).
> Hunk #3 succeeded at 63 (offset 2 lines).
> Hunk #4 succeeded at 88 (offset 2 lines).
> Hunk #5 succeeded at 128 (offset 2 lines).
> Hunk #6 succeeded at 136 (offset 2 lines).
> Hunk #7 succeeded at 164 (offset 1 line).
> patching file src/bin/scripts/t/050_dropdb.pl
> patching file src/include/commands/dbcommands.h
> patching file src/include/nodes/parsenodes.h
> Hunk #1 succeeded at 3133 (offset 16 lines).
> patching file src/include/storage/procarray.h
> Hunk #1 FAILED at 112.
> 1 out of 1 hunk FAILED -- saving rejects to file
> src/include/storage/procarray.h.rej
>
> Could you please update it ? Thanks.
>

fixed

Regards

Pavel

Anthony
>
> The new status of this patch is: Waiting on Author
>
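For reference, the user-visible syntax the patch adds looks like this
(per the docs changes below):

DROP DATABASE mydb FORCE;
-- or, from the shell wrapper: dropdb --force mydb
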
diff --git a/doc/src/sgml/ref/drop_database.sgml b/doc/src/sgml/ref/drop_database.sgml
index 3ac06c984a..84df485e11 100644
--- a/doc/src/sgml/ref/drop_database.sgml
+++ b/doc/src/sgml/ref/drop_database.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
 
  
 
-DROP DATABASE [ IF EXISTS ] name
+DROP DATABASE [ IF EXISTS ] name [ FORCE ]
 
  
 
@@ -32,9 +32,11 @@ DROP DATABASE [ IF EXISTS ] name
DROP DATABASE drops a database. It removes the
catalog entries for the database and deletes the directory
containing the data.  It can only be executed by the database owner.
-   Also, it cannot be executed while you or anyone else are connected
-   to the target database.  (Connect to postgres or any
-   other database to issue this command.)
+   Also, it cannot be executed while you are connected to the target database.
+   (Connect to postgres or any other database to issue this
+   command.)
+   If anyone else is connected to the target database, this command will fail
+   - unless you use the FORCE option described below.
   
 
   
@@ -64,6 +66,24 @@ DROP DATABASE [ IF EXISTS ] name
  
 

+
+   
+FORCE
+
+ 
+  Attempt to terminate all existing connections to the target database.
+ 
+ 
+  This will fail if the current user has no permission to terminate other
+  connections. The required permissions are the same as for
+  pg_terminate_backend, described
+  in .
+
+  This will also fail if the connections do not terminate within 60 seconds.
+ 
+
+   
+
   
  
 
diff --git a/doc/src/sgml/ref/dropdb.sgml b/doc/src/sgml/ref/dropdb.sgml
index 3fbdb33716..5d10ad1c92 100644
--- a/doc/src/sgml/ref/dropdb.sgml
+++ b/doc/src/sgml/ref/dropdb.sgml
@@ -86,6 +86,20 @@ PostgreSQL documentation
   
  
 
+ 
+  -f
+  --force
+  
+   
+Force termination of connected backends before removing the database.
+   
+   
+This will add the FORCE option to the DROP
+DATABASE command sent to the server.
+   
+  
+ 
+
  
-V
--version
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 15207bf75a..760b43de8c 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -489,7 +489,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	 * potential waiting; we may as well throw an error first if we're gonna
 	 * throw one.
 	 */
-	if (CountOtherDBBackends(src_dboid, &notherbackends, &npreparedxacts))
+	if (CountOtherDBBackends(src_dboid, &notherbackends, &npreparedxacts, false))
 		ereport(ERROR,
 (errcode(ERRCODE_OBJECT_IN_USE),
  errmsg("source database \"%s\" is being accessed by other users",
@@ -779,7 +779,7 @@ createdb_failure_callback(int code, Datum arg)
  * DROP DATABASE
  */
 void
-dropdb(const char *dbname, bool missing_ok)
+dropdb(const char *dbname, bool missing_ok, bool force)
 {
 	Oid			db_id;
 	bool		db_istemplate;
@@ -787,6 +787,7 @@ dropdb

Re: Improve handling of pg_stat_statements handling of bind "IN" variables

2019-06-26 Thread Tom Lane
Greg Stark  writes:
> Actually thinking about this for two more seconds the question is what it
> would do with a query like
> WHERE col = ANY '1,2,3'::integer[]
> Or
> WHERE col = ANY ARRAY[1,2,3]
> Whichever the language binding that is failing to do parameterized queries
> is doing to emulate them.

Yeah, one interesting question is whether this is actually modeling
what happens with real-world applications --- are they sending Consts,
or Params?

I resolutely dislike the idea of marking arrays that came from IN
differently from other ones; that's just a hack and it's going to give
rise to unexplainable behavioral differences for logically-equivalent
queries.

One idea that comes to me after looking at the code involved is that
it might be an improvement across-the-board if transformAExprIn were to
reduce the generated ArrayExpr to an array Const immediately, in cases
where all the inputs are Consts.  That is going to happen anyway come
plan time, so it'd have zero impact on semantics or query performance.
Doing it earlier would cost nothing, and could even be a net win, by
reducing per-parse-node overhead in places like the rewriter.

The advantage for the problem at hand is that a Const that's an array
of 2 elements is going to look the same as a Const that's any other
number of elements so far as pg_stat_statements is concerned.
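For illustration, with a hypothetical table t, both of these would then
carry a single array Const, and so present the same query shape to
pg_stat_statements:

SELECT * FROM t WHERE col IN (1, 2);
SELECT * FROM t WHERE col IN (1, 2, 3, 4);
-- i.e. both effectively reduce to  col = ANY ('{...}'::integer[])
-- with only the Const's value differing, not the parse-tree shape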

This doesn't help of course in cases where the values aren't all
Consts.  Since we eliminated Vars already, the main practical case
would be that they're Params, leading us back to the previous
question of whether apps are binding queries with different numbers
of parameter markers in an IN, and how hard pg_stat_statements should
try to fuzz that if they are.

Then, to Greg's point, there's a question of whether transformArrayExpr
should do likewise, ie try to produce an array Const immediately.
I'm a bit less excited about that, but consistency suggests that
we should have it act the same as the IN case.

regards, tom lane




RE: psql - add SHOW_ALL_RESULTS option

2019-06-26 Thread Fabien COELHO



Here is a v3 which fixes \errverbose, hopefully.


V5 is a rebase and an slightly improved documentation.


It was really v4. v5 is a rebase.
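As a sketch of the behavior the new variable controls:

SELECT 1\; SELECT 2\; SELECT 3;   -- prints only 3 by default
\set SHOW_ALL_RESULTS on
SELECT 1\; SELECT 2\; SELECT 3;   -- now prints 1, 2 and 3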

--
Fabien.

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index c6c20de243..a7557af0fe 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -3344,7 +3344,7 @@ select 1\; select 2\; select 3;
 psql prints only the last query result
 it receives for each request; in this example, although all
 three SELECTs are indeed executed, psql
-only prints the 3.
+only prints the 3, unless SHOW_ALL_RESULTS is set.
 
 
   
@@ -3921,6 +3921,17 @@ bar
 
   
 
+  
+SHOW_ALL_RESULTS
+
+
+When this variable is set to on, all results of a combined
+(\;) query are shown instead of just the last one.
+Default is off.
+
+
+  
+
   
 SHOW_CONTEXT
 
diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index 44a782478d..dc575911ce 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -472,6 +472,16 @@ ResetCancelConn(void)
 #endif
 }
 
+static void
+ShowErrorMessage(const PGresult *result)
+{
+	const char *error = PQerrorMessage(pset.db);
+
+	if (strlen(error))
+		pg_log_info("%s", error);
+
+	CheckConnection();
+}
 
 /*
  * AcceptResult
@@ -482,7 +492,7 @@ ResetCancelConn(void)
  * Returns true for valid result, false for error state.
  */
 static bool
-AcceptResult(const PGresult *result)
+AcceptResult(const PGresult *result, bool show_error)
 {
 	bool		OK;
 
@@ -513,15 +523,8 @@ AcceptResult(const PGresult *result)
 break;
 		}
 
-	if (!OK)
-	{
-		const char *error = PQerrorMessage(pset.db);
-
-		if (strlen(error))
-			pg_log_info("%s", error);
-
-		CheckConnection();
-	}
+	if (!OK && show_error)
+		ShowErrorMessage(result);
 
 	return OK;
 }
@@ -701,7 +704,7 @@ PSQLexec(const char *query)
 
 	ResetCancelConn();
 
-	if (!AcceptResult(res))
+	if (!AcceptResult(res, true))
 	{
 		ClearOrSaveResult(res);
 		res = NULL;
@@ -743,7 +746,7 @@ PSQLexecWatch(const char *query, const printQueryOpt *opt)
 
 	ResetCancelConn();
 
-	if (!AcceptResult(res))
+	if (!AcceptResult(res, true))
 	{
 		ClearOrSaveResult(res);
 		return 0;
@@ -999,199 +1002,113 @@ loop_exit:
 	return success;
 }
 
-
 /*
- * ProcessResult: utility function for use by SendQuery() only
- *
- * When our command string contained a COPY FROM STDIN or COPY TO STDOUT,
- * PQexec() has stopped at the PGresult associated with the first such
- * command.  In that event, we'll marshal data for the COPY and then cycle
- * through any subsequent PGresult objects.
- *
- * When the command string contained no such COPY command, this function
- * degenerates to an AcceptResult() call.
- *
- * Changes its argument to point to the last PGresult of the command string,
- * or NULL if that result was for a COPY TO STDOUT.  (Returning NULL prevents
- * the command status from being printed, which we want in that case so that
- * the status line doesn't get taken as part of the COPY data.)
- *
- * Returns true on complete success, false otherwise.  Possible failure modes
- * include purely client-side problems; check the transaction status for the
- * server-side opinion.
+ * Marshal the COPY data.  Either subroutine will get the
+ * connection out of its COPY state, then call PQresultStatus()
+ * once and report any error.
+ *
+ * For COPY OUT, direct the output to pset.copyStream if it's set,
+ * otherwise to pset.gfname if it's set, otherwise to queryFout.
+ * For COPY IN, use pset.copyStream as data source if it's set,
+ * otherwise cur_cmd_source.
+ *
+ * Update result if further processing is necessary.  (Returning NULL
+ * prevents the command status from being printed, which we want in that
+ * case so that the status line doesn't get taken as part of the COPY data.)
  */
 static bool
-ProcessResult(PGresult **results)
+HandleCopyResult(PGresult **result)
 {
 	bool		success = true;
-	bool		first_cycle = true;
+	FILE	   *copystream;
+	PGresult   *copy_result;
+	ExecStatusType result_status = PQresultStatus(*result);
 
-	for (;;)
+	Assert(result_status == PGRES_COPY_OUT ||
+		   result_status == PGRES_COPY_IN);
+
+	SetCancelConn();
+	if (result_status == PGRES_COPY_OUT)
 	{
-		ExecStatusType result_status;
-		bool		is_copy;
-		PGresult   *next_result;
+		bool		need_close = false;
+		bool		is_pipe = false;
 
-		if (!AcceptResult(*results))
+		if (pset.copyStream)
 		{
-			/*
-			 * Failure at this point is always a server-side failure or a
-			 * failure to submit the command string.  Either way, we're
-			 * finished with this command string.
-			 */
-			success = false;
-			break;
+			/* invoked by \copy */
+			copystream = pset.copyStream;
 		}
-
-		result_status = PQresultStatus(*results);
-		switch (result_status)
+		else if (pset.gfname)
 		{
-			case PGRES_EMPTY_QUERY:
-			case PGRES_COMMAND_OK

Re: [PATCH] src/test/modules/dummy_index -- way to test reloptions from inside of access method

2019-06-26 Thread Dent John
Hi Nikolay,

> On 3 Apr 2019, at 20:54, Nikolay Shaplov  wrote:
> 
> In a message of Tuesday, 19 March 2019 16:09:13 MSK, Michael
> Paquier wrote:
> 
>> Thanks for doing the effort to split that stuff.  This looks like an
>> interesting base template for anybody willing to look after some
>> basics with index AMs, like what's done for FDWs with blackhole_fdw.
> I am not sure it is a good template. Most methods are empty, and do not show 
> any example of how it should work.

I think it would probably not be a good template — not for a solid starting 
point.

There is value in having something that has all the relevant method signatures, 
just to save someone the bother of crawling docs, or scraping other contrib/ 
examples for copy/paste snippets. But I think it should really be a different 
thing. It would be a distraction to litter such a template with custom 
reloptions clutter.

I guess that assumes it is possible to create a realistic AM without 
configurable options. I’m guessing it should be. But perhaps such situations 
are rarer than I imagine…?

Better than an empty template, though, would be a concrete, but minimal, 
implementation of an INDEX/AM. I find it difficult to see how you get something 
clear and concise, while trying to simultaneously serve both INDEX/AM template 
and reloptions testing needs.

>> Please note that these
>> are visible directly via pg_class.reloptions.  So we could shave quite
>> some code.
> Values from pg_class are well tested in regression test. My point here is to 
> check that they reach index internal as expected. And there is a long way 
> between pg_class.reloptions and index internals.

I had the same thought. But on quick inspection — and perhaps I have missed 
something — I don’t see that /custom/ reloptions are really tested at all by 
the regression tests.

So I do think verifying an extension’s custom reloptions exposure would be 
valuable.
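For example (a sketch, with a hypothetical reloption and table name):

CREATE INDEX di_idx ON tbl USING dummy_index (a) WITH (custom_opt = 42);
SELECT reloptions FROM pg_class WHERE relname = 'di_idx';
-- pg_class can show {custom_opt=42}, but only code running inside the
-- AM can confirm that the value actually reached its parsed-options struct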

I guess you might argue that it’s the regression test suite that should 
properly test that exposure mechanism. I kind of agree. :-) But I think that 
argument falls for similar reasons you cite for your initiative — i.e., it’s 
basically pretty hard to set up the situation where any kind of custom 
reloption would ever be reported.

Hope that is useful feedback.

denty.



Re: mcvstats serialization code is still shy of a load

2019-06-26 Thread Tomas Vondra

On Wed, Jun 26, 2019 at 12:31:13PM -0400, Tom Lane wrote:

Tomas Vondra  writes:

On Wed, Jun 26, 2019 at 11:26:21AM -0400, Tom Lane wrote:

#define SizeOfMCVList(ndims,nitems) \
is both woefully underdocumented and completely at variance with
reality.  It doesn't seem to be accounting for the actual data values.



I agree about the macro being underdocumented, but AFAICS it's used
correctly to check the expected length. It can't include the data values
directly, because that's a variable amount of data - and it's encoded in a not
yet verified part of the data.


Well, it should have some other name then.  Or *at least* a comment.
It's unbelievably misleading as it stands.



True.


That being said, maybe this is unnecessarily defensive and we should just
trust the values not being corrupted.


No, I'm on board with checking the lengths.  I just don't like how
hard it is to discern what's being checked.



Understood.


I think that part of the problem here is that the way this code is
written, "maxaligned" is no such thing.  What you're actually maxaligning
seems to be the offset from the start of the data area of a varlena value,



I don't think so. The pointers should be maxaligned with respect to the
whole varlena value, which is what 'raw' points to.


[ squint ... ]  OK, I think I misread this:

statext_mcv_deserialize(bytea *data)
{
...
/* pointer to the data part (skip the varlena header) */
ptr = VARDATA_ANY(data);
raw = (char *) data;

I think this is confusing in itself --- I read it as "raw = (char *) ptr"
and I think most other people would assume that too based on the order
of operations.  It'd read better as

/* remember start of datum for maxalign reference */
raw = (char *) data;

/* pointer to the data part (skip the varlena header) */
ptr = VARDATA_ANY(data);



Yeah, that'd have been better.


Another problem with this code is that it flat doesn't work for
non-4-byte-header varlenas: it'd do the alignment differently than the
serialization side did.  That's okay given that the two extant call sites
are guaranteed to pass detoasted datums.  But using VARDATA_ANY gives a
completely misleading appearance of being ready to deal with short-header
varlenas, and heaven forbid there should be any comment to discourage
future coders from trying.  So really what I'd like to see here is

/* remember start of datum for maxalign reference */
raw = (char *) data;

/* alignment logic assumes full-size datum header */
Assert(VARATT_IS_4B(data));

/* pointer to the data part (skip the varlena header) */
ptr = VARDATA_ANY(data);

Or, of course, this could all go away if we got rid of the
bogus maxaligning...



OK. Attached is a patch ditching the alignment in serialized data. I've
ditched the macros to access parts of serialized data, and everything
gets copied.

The main complication is with varlena values, which may or may not have
4B headers (for now there's the PG_DETOAST_DATUM call, but as you
mentioned we may want to remove it in the future). So I've stored the
length as uint32 separately, followed by the full varlena value (thanks
to that the deserialization is simpler). Not sure if that's the best
solution, though, because this way we store the length twice.

I've kept the alignment in the deserialization code, because there it
allows us to allocate the whole value as a single chunk, which I think
is useful (I admit I don't have any measurements to demonstrate that).
But if we decide to rework this later, we can - it's just in-memory
representation, not on-disk.

Is this roughly what you had in mind?

FWIW I'm sure some of the comments are stale and/or need clarification,
but it's a bit too late over here, so I'll look into that tomorrow.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 
diff --git a/src/backend/statistics/mcv.c b/src/backend/statistics/mcv.c
index 5fe61ea0a4..93c774ea1c 100644
--- a/src/backend/statistics/mcv.c
+++ b/src/backend/statistics/mcv.c
@@ -52,15 +52,7 @@
  *  ndim * (sizeof(uint16) + sizeof(bool)) + 2 * sizeof(double)
  */
 #define ITEM_SIZE(ndims)   \
-   MAXALIGN((ndims) * (sizeof(uint16) + sizeof(bool)) + 2 * sizeof(double))
-
-/*
- * Macros for convenient access to parts of a serialized MCV item.
- */
-#define ITEM_INDEXES(item)			((uint16 *) (item))
-#define ITEM_NULLS(item,ndims)		((bool *) (ITEM_INDEXES(item) + (ndims)))
-#define ITEM_FREQUENCY(item,ndims)	((double *) (ITEM_NULLS(item, ndims) + (ndims)))
-#define ITEM_BASE_FREQUENCY(item,ndims)	((double *) (ITEM_FREQUENCY(item, ndims) + 1))
+   ((ndims) * (sizeof(uint16) + sizeof(bool)) + 2 * sizeof(double))
 
 /*
  * Used to compute size of serialized MCV list representation.
@@ -68,10 +60,15 @@
 #define MinSizeOfMCVList   \
(VARHDRSZ + siz

Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

2019-06-26 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 Tom> No zone1970.tab.

zone.tab is an adequate substitute - a fact which I thought was
sufficiently obvious as to not be worth mentioning.

(also see https://reviews.freebsd.org/D20646 )

 Tom> I do not think we can rely on that file being there, since zic
 Tom> itself doesn't install it; it's up to packagers whether or where
 Tom> to install the "*.tab" files.

The proposed rules I suggested do work almost as well if zone[1970].tab
is absent, though obviously that's not the optimal situation. But are
there any systems which lack it? It's next to impossible to implement a
sane "ask the user what timezone to use" procedure without it.

 Tom> In general, the point I'm trying to make is that our policy should
 Tom> be "Ties are broken arbitrarily, and if you don't like the choice
 Tom> that initdb makes, here's how to fix it".

Yes, you've repeated that point at some length, and I am not convinced.
Is anyone else?

-- 
Andrew (irc:RhodiumToad)




Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

2019-06-26 Thread Daniel Gustafsson
> On 27 Jun 2019, at 00:48, Andrew Gierth  wrote:

> Tom> In general, the point I'm trying to make is that our policy should
> Tom> be "Ties are broken arbitrarily, and if you don't like the choice
> Tom> that initdb makes, here's how to fix it".
> 
> Yes, you've repeated that point at some length, and I am not convinced.
> Is anyone else?

I don’t have any insights into the patches committed or proposed.  However,
having been lurking on the tz mailing list for a long time, I totally see where
Tom is coming from with this.

cheers ./daniel



Re: Speed up transaction completion faster after many relations are accessed in a transaction

2019-06-26 Thread David Rowley
On Mon, 17 Jun 2019 at 15:05, Tsunakawa, Takayuki
 wrote:
> (1)
> +#define LOCKMETHODLOCALHASH_SHRINK_SIZE 64
>
> How about LOCKMETHODLOCALHASH_SHRINK_THRESHOLD, because this determines the 
> threshold value to trigger shrinkage?  Code in PostgreSQL seems to use the 
> term threshold.

That's probably better. I've renamed it to that.

> (2)
> +/* Complain if the above are not set to something sane */
> +#if LOCKMETHODLOCALHASH_SHRINK_SIZE < LOCKMETHODLOCALHASH_INIT_SIZE
> +#error "invalid LOCKMETHODLOCALHASH_SHRINK_SIZE"
> +#endif
>
> I don't think these are necessary, because these are fixed and not 
> configurable.  FYI, src/include/utils/memutils.h doesn't have #error to test 
> these macros.

Yeah. I was thinking it was overkill when I wrote it, but somehow
couldn't bring myself to remove it. Done now.

> (3)
> +   if (hash_get_num_entries(LockMethodLocalHash) == 0 &&
> +   hash_get_max_bucket(LockMethodLocalHash) >
> +   LOCKMETHODLOCALHASH_SHRINK_SIZE)
> +   CreateLocalLockHash();
>
> I get an impression that Create just creates something where there's nothing. 
>  How about Init or Recreate?

Renamed to InitLocalLockHash()

v5 is attached.

Thank you for the review.
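(For context, a sketch of the case being optimized, with a hypothetical
many-partition table:)

SELECT count(*) FROM hashp;   -- takes one local lock per partition
-- the backend-local lock hash table stays enlarged afterwards, so
-- LockReleaseAll() in later small transactions keeps paying for the
-- bloat until the table is rebuilt once it's empty and past the threshold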

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


shrink_bloated_locallocktable_v5.patch
Description: Binary data


Re: JOIN_SEMI planning question

2019-06-26 Thread Tom Lane
Thomas Munro  writes:
> While looking at bug #15857[1], I wondered why the following two
> queries get different plans, given the schema and data from the bug
> report:
> ...
> Here's my question: how could it ever be OK to sort/unique something
> and put it in a hash table, but not OK to put exactly the same thing
> in the hash table directly, with JOIN_SEMI logic to prevent multiple
> matches?  And likewise for the inner side of other join strategies.

I think you're thinking about it wrong.  Or maybe there's some additional
work we could put in to extend the logic.

If we're considering the case where the semijoin is c.base_id = a.id,
then we definitely can do "a SEMIJOIN c WHERE a.id = c.base_id".
If we want to join b to c first, we can do so, but we have to unique-ify c
before that join, and then both the b/c join and the later join with a
can become plain inner joins.  We *don't* get to skip unique-ifying c
before joining to b and then apply a semijoin with a later, because in
general that's going to result in the wrong number of output rows.
(Example: if a is unique but b has multiple copies of a particular join
key, and the key does appear in c, we should end with multiple output rows
having that key, and we wouldn't.)
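To make that concrete, with hypothetical tables:

CREATE TABLE a (id int PRIMARY KEY);   -- unique
CREATE TABLE b (id int);               -- duplicates allowed
CREATE TABLE c (base_id int);
INSERT INTO a VALUES (1);
INSERT INTO b VALUES (1), (1);
INSERT INTO c VALUES (1);

SELECT b.id
FROM a JOIN b ON a.id = b.id
WHERE EXISTS (SELECT 1 FROM c WHERE c.base_id = a.id);
-- correct answer: two rows.  A plan that joined b to c without
-- unique-ifying c, and only semijoined against a at the end, would
-- emit just one.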

It's possible that in some situations we could prove that semijoining
later would work, but it seems like that'd require a great deal more
analysis than the code does now.  There'd also have to be tracking of
whether the final a join still has to be a semijoin or not, which'd
now depend on which path for b/c was being considered.

> Or to put it in the language of the comment, how could you ever have
> enough rels to do a join between B and unique(C), but not enough rels
> to do a semi-join between B and C?

If you're not joining C to B at all, but to some other rel A, you can't do
it as a semijoin because you can't execute the semijoin qual correctly.

In the particular case we're looking at here, it may be possible to prove
that equivalence-class substitution from the original B/C semijoin qual
gives rise to an A/C join qual that will work as an equivalent semijoin
qual, and then it's OK to do A/C as a semijoin with that.  But I'm not
very sure what the proof conditions need to be, and I'm 100% sure that
the code isn't making any such proof now.

regards, tom lane




Re: Useless configure checks for RAND_OpenSSL (HAVE_RAND_OPENSSL)

2019-06-26 Thread Michael Paquier
On Wed, Jun 26, 2019 at 04:35:43PM +0200, Daniel Gustafsson wrote:
> None, LGTM.

Thanks, committed.
--
Michael


signature.asc
Description: PGP signature


Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

2019-06-26 Thread Stephen Frost
Greetings,

* Daniel Gustafsson (dan...@yesql.se) wrote:
> > On 27 Jun 2019, at 00:48, Andrew Gierth  wrote:
> 
> > Tom> In general, the point I'm trying to make is that our policy should
> > Tom> be "Ties are broken arbitrarily, and if you don't like the choice
> > Tom> that initdb makes, here's how to fix it".
> > 
> > Yes, you've repeated that point at some length, and I am not convinced.
> > Is anyone else?
> 
> I don’t have any insights into the patches committed or proposed.  However,
> having been lurking on the tz mailing list for a long time, I totally see where
> Tom is coming from with this.

I understand this concern, but I have to admit that I'm not entirely
thrilled with having the way we pick defaults be based on the concern
that people will complain.  If anything, this community, at least in my
experience, has thankfully been relatively reasonable and I have some
pretty serious doubts that a change like this will suddenly invite the
masses to argue with us or that, should someone try, they'd end up
getting much traction.

On the other hand, picking deprecated spellings is clearly a poor
choice, and we don't prevent people from picking whatever they want to.
I also don't see what Andrew's suggesting as being terribly
controversial, though that's likely because I'm looking through
rose-colored glasses, as the saying goes.  Even with that understanding
though, I tend to side with Andrew on this.

Thanks,

Stephen


signature.asc
Description: PGP signature


RE: Speed up transaction completion faster after many relations are accessed in a transaction

2019-06-26 Thread Tsunakawa, Takayuki
From: David Rowley [mailto:david.row...@2ndquadrant.com]
v5 is attached.


Thank you, looks good.  I find it ready for committer (I noticed the status is 
already set accordingly.)


Regards
Takayuki Tsunakawa




Re: subscriptionCheck failures on nightjar

2019-06-26 Thread Tom Lane
Andres Freund  writes:
> On 2019-02-14 09:52:33 +1300, Thomas Munro wrote:
>> Just to make sure I understand: it's OK for the file not to be there
>> when we try to fsync it by name, because a concurrent checkpoint can
>> remove it, having determined that we don't need it anymore?  In other
>> words, we really needed either missing_ok=true semantics, or to use
>> the fd we already had instead of the name?

> I'm not yet sure that that's actually something that's supposed to
> happen, I got to spend some time analysing how this actually
> happens. Normally the contents of the slot should actually prevent it
> from being removed (as they're newer than
> ReplicationSlotsComputeLogicalRestartLSN()). I kind of wonder if that's
> a bug in the drop logic in newer releases.

My animal dromedary just reproduced this failure, which we've previously
only seen on nightjar.

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dromedary&dt=2019-06-26%2023%3A57%3A45

regards, tom lane




Re: Problem with default partition pruning

2019-06-26 Thread yuzuko
Hello,

On Tue, Jun 25, 2019 at 1:45 PM yuzuko  wrote:
>
> Hello Shawn, Alvaro,
>
> Thank you for testing patches and comments.
> Yes, there are two patches:
> (1) v4_default_partition_pruning.patch fixes problems with default
> partition pruning
> and (2) v3_ignore_contradictory_where_clauses_at_partprune_step.patch 
> determines
> if a given clause contradicts a sub-partitioned table's partition constraint.
> I'll post two patches together next time.
>
> Anyway, I'll rebase two patches to apply on master and fix space.
>

Attached are the latest patches discussed in this thread.  I rebased the second
patch (v5_ignore_contradictory_where_clauses_at_partprune_step.patch)
on the current master.  Could you please check them again?

--
Best regards,
Yuzuko Hosoya
NTT Open Source Software Center


v4_default_partition_pruning.patch
Description: Binary data


v5_ignore_contradictory_where_clauses_at_partprune_step.patch
Description: Binary data


Re: GiST "choose subtree" support function to inline penalty

2019-06-26 Thread Tom Lane
Darafei "Komяpa" Praliaskouski  writes:
> I'm looking at PostGIS geometry GiST index build times and try to optimize
> withing the current GiST framework. The function that shows a lot on my
> flame graphs is penalty.

> I spent weekend rewriting PostGIS penalty to be as fast as possible.
> (FYI https://github.com/postgis/postgis/pull/425/files)

> However I cannot get any meaningfully faster build time. Even when I strip
> it to "just return edge extension" index build time is the same.

TBH this makes me wonder whether the real problem isn't so much "penalty
function is too slow" as "penalty function is resulting in really bad
index splits".

It might be that giving the opclass higher-level control over the split
decision can help with both aspects.  But never start micro-optimizing
an algorithm until you're sure it's the right algorithm.

regards, tom lane




Cost and execution plan

2019-06-26 Thread AminPG Jaffer
Hi

We are facing an issue where the cost shows up "cost=10000000000.00" so
trying to find where that is set.

Could anyone point me to the code where the cost "cost=10000000000.00" is
set?

It doesn't show up when searching through find.

find . -type f -print | xargs grep 10000000000 | grep -v "/test/" | grep -v
"/btree_gist/"
./postgresql-9.6.12/contrib/pgcrypto/sql/blowfish.sql:decode('1000000000000001',
'hex'),
./postgresql-9.6.12/contrib/pgcrypto/expected/blowfish.out:decode('1000000000000001',
'hex'),
./postgresql-9.6.12/doc/src/sgml/html/functions-admin.html:
00000001000000000000000D | 4039624
./postgresql-9.6.12/doc/src/sgml/html/wal-internals.html:>000000010000000000000001.  The numbers do not wrap,
./postgresql-9.6.12/src/bin/pg_archivecleanup/pg_archivecleanup.c:   *
000000010000000000000010.partial and
./postgresql-9.6.12/src/bin/pg_archivecleanup/pg_archivecleanup.c:   *
000000010000000000000010.00000020.backup are after
./postgresql-9.6.12/src/bin/pg_archivecleanup/pg_archivecleanup.c:   *
000000010000000000000010.
./postgresql-9.6.12/src/bin/pg_archivecleanup/pg_archivecleanup.c:
"  pg_archivecleanup /mnt/server/archiverdir
000000010000000000000010.00000020.backup\n");
./postgresql-9.6.12/src/backend/utils/adt/numeric.c: * For input like
10000000000, we must treat stripped digits as real. So
./postgresql-9.6.12/src/backend/utils/adt/numeric.c: * For input like
10000000000, we must treat stripped digits as real. So
./postgresql-9.6.12/src/backend/utils/adt/cash.c:   m4 = (val /
INT64CONST(100000000000)) % 1000;   /* billions */
./postgresql-9.6.12/src/backend/utils/adt/cash.c:   m5 = (val /
INT64CONST(100000000000000)) % 1000;/* trillions */
./postgresql-9.6.12/src/backend/utils/adt/cash.c:   m6 = (val /
INT64CONST(100000000000000000)) % 1000; /* quadrillions */
./postgresql-9.6.12/src/backend/utils/adt/date.c:
10000000000.0
./postgresql-9.6.12/src/include/utils/date.h:#define TIME_PREC_INV
10000000000.0


Thanks


Re: Cost and execution plan

2019-06-26 Thread Tom Lane
AminPG Jaffer  writes:
> We are facing an issue where the cost shows up "cost=10000000000.00" so
> trying to find where that is set.

That means you've turned off enable_seqscan (or one of its siblings)
but the planner is choosing a seqscan plan (or other plan type you
tried to disable) anyway because it has no other alternative.
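For instance (sketch; any table with no usable index will do):

SET enable_seqscan = off;
CREATE TEMP TABLE no_idx (x int);   -- hypothetical; nothing else to scan
EXPLAIN SELECT * FROM no_idx;
-- still a Seq Scan, but with disable_cost (1.0e10) folded into the
-- estimate; that is where the 10000000000.00 comes from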

> Could anyone point me to the code where the cost "cost=10000000000.00" is
> set?

Look for disable_cost.

regards, tom lane




Re: mcvstats serialization code is still shy of a load

2019-06-26 Thread Tom Lane
Tomas Vondra  writes:
> OK. Attached is a patch ditching the alignment in serialized data. I've
> ditched the macros to access parts of serialized data, and everything
> gets copied.

I lack energy to actually read this patch right now, and I don't currently
have an opinion about whether it's worth another catversion bump to fix
this stuff in v12.  But I did test the patch, and I can confirm it gets
through the core regression tests on hppa (both gaur's host environment
with gcc 3.4.6, and the OpenBSD installation with gcc 4.2.1).

regards, tom lane




Run-time pruning for ModifyTable

2019-06-26 Thread David Rowley
Deja vu from this time last year when despite everyone's best efforts
(mostly Alvaro) we missed getting run-time pruning in for MergeAppend
into the PG11 release.   This year it was ModifyTable, which is now
possible thanks to Amit and Tom's modifications to the inheritance
planner.
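For context, a sketch of the kind of statement that benefits, with a
hypothetical partitioned table:

CREATE TABLE parted (id int, val text) PARTITION BY LIST (id);
CREATE TABLE parted_1 PARTITION OF parted FOR VALUES IN (1);
CREATE TABLE parted_2 PARTITION OF parted FOR VALUES IN (2);
PREPARE up (int) AS UPDATE parted SET val = 'x' WHERE id = $1;
-- with run-time pruning, a generic plan for EXECUTE up(1) can skip
-- parted_2's subplan at executor startup instead of initializing
-- ResultRelInfos for every partition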

I've attached what I have so far for this.  I think it's mostly okay,
but my brain was overheating a bit at the inheritance_planner changes.
I'm not entirely certain that what I've got is correct there. My brain
struggled a bit with the code that Tom wrote to share the data
structures from the SELECT invocation of the grouping_planner() in
inheritance_planner() regarding subquery RTEs. I had to pull out some
more structures from the other PlannerInfo structure in order to get
the base quals from the target rel. I don't quite see a reason why
it's particularly wrong to tag those onto the final_rel, but I'll
prepare myself to be told that I'm wrong about that.

I'm not particularly happy about having to have written the
IS_DUMMY_MODIFYTABLE macro. I just didn't see a more simple way to
determine if the ModifyTable just contains a single dummy Append path.

I also had to change the ModifyTable resultRelInfo pointer to an array
of pointers. This seems to be required since we need to somehow ignore
ResultRelInfos which were pruned.  I didn't do any performance testing
for the added level of indirection, I just imagined that it's
unmeasurable.

I'll include this in for July 'fest.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


runtime_pruning_for_modifytable.patch
Description: Binary data


Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits

2019-06-26 Thread Pavan Deolasee
Hi Andres,

On Wed, May 29, 2019 at 1:50 PM Pavan Deolasee 
wrote:

>
>
> On Tue, 28 May 2019 at 4:36 PM, Andres Freund  wrote:
>
>> Hi,
>>
>> On 2019-04-07 18:04:27 -0700, Andres Freund wrote:
>> > Here's a *prototype* patch for this.  It only implements what I
>> > described for heap_multi_insert, not for plain inserts. I wanted to see
>> > what others think before investing additional time.
>>
>> Pavan, are you planning to work on this for v13 CF1? Or have you lost
>> interest on the topic?
>>
>
> Yes, I plan to work on it.
>
>
I am sorry, but I am not able to find time to get back to this because of
other high priority items. If it still remains unaddressed in the next few
weeks, I will pick it up again. But for now, I am happy if someone wants to
pick and finish the work.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: benchmarking Flex practices

2019-06-26 Thread John Naylor
I wrote:

> > I found a possible other way to bring the size of the transition table
> > under 32k entries while keeping the existing no-backup rules in place:
> > Replace the "quotecontinue" rule with a new state. In the attached
> > draft patch, when Flex encounters a quote while inside any kind of
> > quoted string, it saves the current state and enters %xqs (think
> > 'quotestop'). If it then sees {whitespace_with_newline}{quote}, it
> > reenters the previous state and continues to slurp the string,
> > otherwise, it throws back everything and returns the string it just
> > exited. Doing it this way is a bit uglier, but with some extra
> > commentary it might not be too bad.
>
> I had an epiphany and managed to get rid of the backup states.
> Regression tests pass. The array is down to 30367 entries and the
> binary is smaller by 172kB on Linux x86-64. Performance is identical
> to master on both tests mentioned upthread. I'll clean this up and add
> it to the commitfest.

For the commitfest:

0001 is a small patch to remove some unneeded generality from the
current rules. This lowers the number of elements in the yy_transition
array from 37045 to 36201.

0002 is a cleaned up version of the above, bring the size down to 29521.

I haven't changed psqlscan.l or pgc.l, in case this approach is
changed or rejected.

With the two together, the binary is about 175kB smaller than on HEAD.

I also couldn't resist playing around with the idea upthread to handle
unicode escapes in parser.c, which further reduces the number of
states down to 21068, which allows some headroom for future additions
without going back to 32-bit types in the transition array. It mostly
works, but it's quite ugly and breaks the token position handling for
unicode escape syntax errors, so it's not in a state to share.

-- 
John Naylorhttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


v3-0001-Remove-some-unneeded-generality-from-the-core-Fle.patch
Description: Binary data


v3-0002-Replace-the-Flex-quotestop-rules-with-a-new-exclu.patch
Description: Binary data


Re: Removing unneeded self joins

2019-06-26 Thread Andrey Lepikhov

Alexander Lakhin detected the bug in the 'remove self joins' patch:

test:
=

CREATE TABLE t (a INT UNIQUE);
INSERT INTO t VALUES (1);
SELECT 1 FROM (SELECT x.* FROM t AS x, t AS y WHERE x.a = y.a) AS q, 
LATERAL generate_series(1, q.a) gs(i);


Description:


FUNCTIONS, TABLEFUNCS and VALUES plan nodes use a direct link to the RTE 
table. We need to change varno references to the relid which will be kept.


Version v.17 of the patch, which fixes the bug, is attached.

--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company
>From 78f043e777dec93dca9a41f16f6197e78afa44a1 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Thu, 27 Jun 2019 11:28:01 +0500
Subject: [PATCH] Remove Self Joins v.17

---
 src/backend/nodes/outfuncs.c  |   19 +-
 src/backend/optimizer/path/indxpath.c |   28 +-
 src/backend/optimizer/path/joinpath.c |6 +-
 src/backend/optimizer/plan/analyzejoins.c | 1004 +++--
 src/backend/optimizer/plan/planmain.c |7 +-
 src/backend/optimizer/util/pathnode.c |3 +-
 src/backend/optimizer/util/relnode.c  |   26 +-
 src/include/nodes/nodes.h |1 +
 src/include/nodes/pathnodes.h |   22 +-
 src/include/optimizer/pathnode.h  |4 +
 src/include/optimizer/paths.h |3 +-
 src/include/optimizer/planmain.h  |7 +-
 src/test/regress/expected/equivclass.out  |   32 +
 src/test/regress/expected/join.out|  256 +-
 src/test/regress/sql/equivclass.sql   |   17 +
 src/test/regress/sql/join.sql |  127 +++
 16 files changed, 1463 insertions(+), 99 deletions(-)

diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 8400dd319e..4ac7e53eb9 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2268,7 +2268,8 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
 	WRITE_OID_FIELD(userid);
 	WRITE_BOOL_FIELD(useridiscurrent);
 	/* we don't try to print fdwroutine or fdw_private */
-	/* can't print unique_for_rels/non_unique_for_rels; BMSes aren't Nodes */
+	WRITE_NODE_FIELD(unique_for_rels);
+	/* can't print non_unique_for_rels; BMSes aren't Nodes */
 	WRITE_NODE_FIELD(baserestrictinfo);
 	WRITE_UINT_FIELD(baserestrict_min_security);
 	WRITE_NODE_FIELD(joininfo);
@@ -2340,6 +2341,19 @@ _outStatisticExtInfo(StringInfo str, const StatisticExtInfo *node)
 	WRITE_BITMAPSET_FIELD(keys);
 }
 
+static void
+_outUniqueRelInfo(StringInfo str, const UniqueRelInfo *node)
+{
+	WRITE_NODE_TYPE("UNIQUERELINFO");
+
+	WRITE_BITMAPSET_FIELD(outerrelids);
+	if (node->index)
+	{
+		WRITE_OID_FIELD(index->indexoid);
+		WRITE_NODE_FIELD(column_values);
+	}
+}
+
 static void
 _outEquivalenceClass(StringInfo str, const EquivalenceClass *node)
 {
@@ -4106,6 +4120,9 @@ outNode(StringInfo str, const void *obj)
 			case T_StatisticExtInfo:
 _outStatisticExtInfo(str, obj);
 break;
+			case T_UniqueRelInfo:
+_outUniqueRelInfo(str, obj);
+break;
 			case T_ExtensibleNode:
 _outExtensibleNode(str, obj);
 break;
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index c208e9bfb0..ad66394dbd 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3583,7 +3583,8 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
  * relation_has_unique_index_for
  *	  Determine whether the relation provably has at most one row satisfying
  *	  a set of equality conditions, because the conditions constrain all
- *	  columns of some unique index.
+ *	  columns of some unique index. If index_info is not null, it is set to
+ *	  point to a new UniqueRelInfo containing the index and conditions.
  *
  * The conditions can be represented in either or both of two ways:
  * 1. A list of RestrictInfo nodes, where the caller has already determined
@@ -3604,12 +3605,16 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
 bool
 relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
 			  List *restrictlist,
-			  List *exprlist, List *oprlist)
+			  List *exprlist, List *oprlist,
+			  UniqueRelInfo **unique_info)
 {
 	ListCell   *ic;
 
 	Assert(list_length(exprlist) == list_length(oprlist));
 
+	if (unique_info)
+		*unique_info = NULL;
+
 	/* Short-circuit if no indexes... */
 	if (rel->indexlist == NIL)
 		return false;
@@ -3660,6 +3665,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
 	{
 		IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
 		int			c;
+		List *column_values = NIL;
 
 		/*
 		 * If the index is not unique, or not immediately enforced, or if it's
@@ -3708,6 +3714,9 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
 if (match_index_to_operand(rexpr, c, ind))
 {
 	matched = true; /* column is unique */
+	column_values = lappend(column_values, rinfo->outer_is_left
+			? get_leftop(rin