Re: segmentation fault in pg head with SQL function.

2018-03-16 Thread Michael Paquier
On Fri, Mar 16, 2018 at 11:35:13AM +0530, Prabhat Sahu wrote:
> I found a segmentation fault  on pg master Head with below steps and
> stacktrace.
> 
> postgres=# CREATE OR REPLACE FUNCTION func1() RETURNS VOID
> LANGUAGE SQL
> AS $$
> select 10;
> $$;
> CREATE FUNCTION
> 
> postgres=# select func1();
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
> !> \q

(Adding Peter and Tom in CC)

The problem is reproducible here, and the bug was introduced by fd1a421f.
It seems to me that the function should not be allowed to be created in the
first place, as it returns an integer in its last query.  I think that
check_sql_fn_retval gets this wrong when called from inline_function():
there we know that it is handling a function and not a procedure, thanks to
the sanity checks at the top of that function.

Here is what happens before this commit:
=# CREATE OR REPLACE FUNCTION func1() RETURNS VOID
LANGUAGE SQL
AS $$
select 10;
$$;
ERROR:  42P13: return type mismatch in function declared to return void
DETAIL:  Actual return type is integer.
CONTEXT:  SQL function "func1"
LOCATION:  check_sql_fn_retval, functions.c:1655

As far as I can see, here is the faulty code:
-   if (!rettype)
+   /*
+* If it's declared to return VOID, we don't care what's in the function.
+* (This takes care of the procedure case, as well.)
+*/
+   if (rettype == VOIDOID)
return false;

While this can be bypassed for a procedure, we should still fall through and
fail for a SQL function returning void, depending on the tlist generated by
the last SQL query.

@@ -1624,8 +1622,7 @@ check_sql_fn_retval(Oid func_id, Oid rettype, List *queryTreeList,
if (fn_typtype == TYPTYPE_BASE ||
fn_typtype == TYPTYPE_DOMAIN ||
fn_typtype == TYPTYPE_ENUM ||
-   fn_typtype == TYPTYPE_RANGE ||
-   rettype == VOIDOID)
+   fn_typtype == TYPTYPE_RANGE
This also causes the error message generated to change.

Peter, what's the intention here?  Shouldn't void still be treated as a
scalar type?
--
Michael


signature.asc
Description: PGP signature


Re: [HACKERS] plpgsql - additional extra checks

2018-03-16 Thread Pavel Stehule
2018-03-16 2:46 GMT+01:00 Tomas Vondra :

> On 03/04/2018 07:07 PM, Pavel Stehule wrote:
> >
> > ...
> >
> > I am sending updated patch with Tomas changes
> >
>
> Seems 2cf8c7aa48 broke this patch, as it tweaked a number of regression
> tests. Other than that, I think the patch is pretty much ready.
>
> One minor detail is that this bit from exec_stmt_execsql doesn't seem
> particularly readable:
>
> errlevel = stmt->strict || stmt->mod_stmt ? ERROR : too_many_rows_level;
> use_errhint = !(stmt->strict || stmt->mod_stmt);
>
> I had to think for a while if "||" takes precedence over "?", so I'd
> suggest adding some () around the first condition. But that makes it
> pretty much just (!use_errhint) so perhaps something like
>
> bool force_error;
>
> force_error = (stmt->strict || stmt->mod_stmt);
> errlevel = force_error ? ERROR : too_many_rows_level;
> use_errhint = !force_error;
>
> would be better?
>

good idea


>
> Actually, now that I'm looking at this part of the code again, I see the
> change from
>
> errmsg("query returned more than one row"),
>
> to
>
> errmsg("SELECT INTO query returned more than one row"),
>
> is probably bogus, because this also deals with stmt->mod_stmt, not just
> strict SELECT INTO queries. So let's just revert to the old wording.
>

fixed


>
> regards
>
> --
> Tomas Vondra  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
diff --git a/doc/src/sgml/plpgsql.sgml b/doc/src/sgml/plpgsql.sgml
index 6c25116538..c95f168250 100644
--- a/doc/src/sgml/plpgsql.sgml
+++ b/doc/src/sgml/plpgsql.sgml
@@ -5003,7 +5003,7 @@ a_output := a_output || $$ if v_$$ || referrer_keys.kind || $$ like '$$
 
   
   
-   Additional Compile-time Checks
+   Additional Compile-time and Run-time Checks
 

 To aid the user in finding instances of simple but common problems before
@@ -5015,26 +5015,62 @@ a_output := a_output || $$ if v_$$ || referrer_keys.kind || $$ like '$$
 so you are advised to test in a separate development environment.

 
- 
-  These additional checks are enabled through the configuration variables
-  plpgsql.extra_warnings for warnings and
-  plpgsql.extra_errors for errors. Both can be set either to
-  a comma-separated list of checks, "none" or "all".
-  The default is "none". Currently the list of available checks
-  includes only one:
-  
-   
-shadowed_variables
-
- 
-  Checks if a declaration shadows a previously defined variable.
- 
-
-   
-  
+   
+Setting plpgsql.extra_warnings, or
+plpgsql.extra_errors, as appropriate, to all
+is encouraged in development and/or testing environments.
+   
+
+   
+These additional checks are enabled through the configuration variables
+plpgsql.extra_warnings for warnings and
+plpgsql.extra_errors for errors. Both can be set either to
+a comma-separated list of checks, "none" or
+"all". The default is "none". Currently
+the list of available checks includes only one:
+
+ 
+  shadowed_variables
+  
+   
+Checks if a declaration shadows a previously defined variable.
+   
+  
+ 
 
-  The following example shows the effect of plpgsql.extra_warnings
-  set to shadowed_variables:
+ 
+  strict_multi_assignment
+  
+   
+Some PL/PgSQL commands allow assigning values
+to more than one variable at a time, such as SELECT INTO.  Typically,
+the number of target variables and the number of source variables should
+match, though PL/PgSQL will use NULL for
+missing values and extra variables are ignored.  Enabling this check
+will cause PL/PgSQL to throw a WARNING or
+ERROR whenever the number of target variables and the number of source
+variables are different.
+   
+  
+ 
+
+ 
+  too_many_rows
+  
+   
+Enabling this check will cause PL/PgSQL to
+check if a given query returns more than one row when an
+INTO clause is used.  As an INTO statement will only
+ever use one row, having a query return multiple rows is generally
+either inefficient and/or nondeterministic and therefore is likely an
+error.
+   
+  
+ 
+
+
+The following example shows the effect of plpgsql.extra_warnings
+set to shadowed_variables:
 
 SET plpgsql.extra_warnings TO 'shadowed_variables';
 
@@ -5050,8 +5086,38 @@ LINE 3: f1 int;
 ^
 CREATE FUNCTION
 
- 
- 
+The below example shows the effects of setting
+plpgsql.extra_warnings to
+strict_multi_assignment:
+
+SET plpgsql.extra_warnings TO 'strict_multi_assignment';
+
+CREATE OR REPLACE FUNCTION public.foo()
+ RETURNS void
+ LANGUAGE plpgsql
+AS $$
+DECLARE
+  x int;
+  y int;
+BEGIN
+  SELECT 1 INTO x, y;
+  SELECT 1, 2 INTO x, y;
+  SELECT 1, 2, 3 INTO x, y;
+END;
+$$
+
+SELECT foo();
+WARNING:  Number of evaluated fields does not ma

Re: missing support of named convention for procedures

2018-03-16 Thread Pavel Stehule
2018-03-15 22:13 GMT+01:00 Pavel Stehule :

> Hi
>
> create or replace procedure proc2(in a int, in b int)
> as $$
> begin
>   a := a * 10;
>   b := b * 10;
> end;
> $$ language plpgsql;
>
> postgres=# call proc2(a => 10,b => 20);
> ERROR:  XX000: unrecognized node type: 107
> LOCATION:  ExecInitExprRec, execExpr.c:2114
>

Defaults are not supported either:

 postgres=# create or replace procedure foo1(a int, b int, c int default 10)
as $$
begin
  raise notice 'a: %, b: %, c: %', a, b, c;
end;
$$ language plpgsql;
CREATE PROCEDURE
postgres=# call foo1(10,20);
NOTICE:  0: a: 10, b: 20, c: -778600432
LOCATION:  exec_stmt_raise, pl_exec.c:3643
CALL


> Regards
>
> Pavel
>


Re: WaitLatchOrSocket optimization

2018-03-16 Thread Konstantin Knizhnik

Hi,

On 15.03.2018 20:25, Andres Freund wrote:

Hi,

On 2018-03-15 19:01:40 +0300, Konstantin Knizhnik wrote:

Right now the function WaitLatchOrSocket is implemented in a very inefficient
way: for each invocation it creates an epoll instance, registers events, and
then closes the instance.

Right, everything performance critical should be migrated to using a
persistent wait event set.



But there are still a lot of places in Postgres where WaitLatchOrSocket or
WaitLatch are used.

Most don't matter performancewise though.
Yes, I didn't see a significant difference in performance on workloads not
involving postgres_fdw.
But grepping for all occurrences of WaitLatch, I found some places which are
called quite frequently, for example waiting for a latch in ProcSleep,
waiting for more data in a shared memory message queue, waiting for parallel
workers to attach, waiting for a synchronous replica, ...
There are about a hundred usages of WaitLatch, and I am not sure that none
of them has an impact on performance.



There are two possible ways of fixing this issue:
1. Patch postgres_fdw to store WaitEventSet in connection.
2. Patch WaitLatchOrSocket to cache created wait event sets.

I'm strongly opposed to 2).  The lifetime of sockets will make this an
absolute mess, it'll potentially explode the number of open file
handles, and some OSs don't handle a large number of sets at the same
time.

I understand your concern, but you are not completely right here.
First of all, the lifetime of the socket doesn't actually matter: if a socket
is closed, it is automatically removed from the event set.

To quote the epoll manual:

  Q6  Will closing a file descriptor cause it to be removed from all
      epoll sets automatically?

  A6  Yes, but be aware of the following point.  A file descriptor is a
      reference to an open file description (see open(2)).  Whenever a
      file descriptor is duplicated via dup(2), dup2(2), fcntl(2) F_DUPFD,
      or fork(2), a new file descriptor referring to the same open file
      description is created.  An open file description continues to
      exist until all file descriptors referring to it have been closed.
      A file descriptor is removed from an epoll set only after all the
      file descriptors referring to the underlying open file description
      have been closed (or before if the file descriptor is explicitly
      removed using epoll_ctl(2) EPOLL_CTL_DEL).  This means that even
      after a file descriptor that is part of an epoll set has been
      closed, events may be reported for that file descriptor if other
      file descriptors referring to the same underlying file description
      remain open.

So only the file descriptors created by epoll_create() add to the number of
descriptors opened by a process, and that number is bounded by the cache
size.  Right now I hardcoded a cache size of 113, but in most cases even a
much smaller cache will be enough.  I do not see how an extra, say, 13 open
file descriptors could exhaust the open file descriptor limit.


From my point of view we should either rewrite WaitLatchOrSocket to not use
epoll at all (use poll instead), or cache the created wait event sets.
With poll instead of epoll, pgbench gives 140k TPS, which is certainly much
better than the 38k TPS of the current master, but much worse than the
225k TPS with a cached epoll set.
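
To make option 1) more concrete, here is a minimal illustrative sketch of
keeping one WaitEventSet per postgres_fdw connection.  The ConnCacheEntry
layout and the function name below are made up for the example; they are
not the actual postgres_fdw code.

#include "postgres.h"

#include "libpq-fe.h"
#include "miscadmin.h"
#include "pgstat.h"
#include "storage/latch.h"
#include "utils/memutils.h"

typedef struct ConnCacheEntry
{
	PGconn	   *conn;
	WaitEventSet *wes;			/* built once, reused for every wait */
} ConnCacheEntry;

static int
pgfdw_wait_for_socket(ConnCacheEntry *entry, long timeout_ms)
{
	WaitEvent	ev;

	if (entry->wes == NULL)
	{
		/* Build the set once; it lives until the connection is dropped. */
		entry->wes = CreateWaitEventSet(CacheMemoryContext, 3);
		AddWaitEventToSet(entry->wes, WL_LATCH_SET, PGINVALID_SOCKET,
						  MyLatch, NULL);
		AddWaitEventToSet(entry->wes, WL_POSTMASTER_DEATH, PGINVALID_SOCKET,
						  NULL, NULL);
		AddWaitEventToSet(entry->wes, WL_SOCKET_READABLE,
						  PQsocket(entry->conn), NULL, NULL);
	}

	/* One epoll_wait()/poll() per call, no epoll_create()/close() churn. */
	return WaitEventSetWait(entry->wes, timeout_ms, &ev, 1,
							PG_WAIT_EXTENSION);
}

On disconnect the cached set would have to be released with
FreeWaitEventSet(), so the number of extra descriptors stays bounded by the
number of cached connections.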




Greetings,

Andres Freund


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: pg_get_functiondef forgets about most GUC_LIST_INPUT GUCs

2018-03-16 Thread Pavel Stehule
2018-03-16 5:46 GMT+01:00 Michael Paquier :

> On Fri, Mar 16, 2018 at 10:25:35AM +0900, Michael Paquier wrote:
> >> But, I suppose it is a bit too big.
> >
> > That's of course not backpatchable.
>
> So in this jungle attached is my counter-proposal.  As the same code
> pattern is repeated in three places, we could as well refactor the whole
> into a common routine, say in src/common/guc_options.c or similar.
> Perhaps just on HEAD and not back-branches as this is always annoying
> for packagers on Windows using custom scripts.  Per the lack of
> complains only doing something on HEAD, with only a subset of the
> parameters which make sense for CREATE FUNCTION is what makes the most
> sense in my opinion.
>

The last patch is a good enough solution; I tested it and it is working.

It is a little strange to support GUCs that cannot be attached to any
function.  Probably only PGC_USERSET, and maybe PGC_SUSET, make sense there.

+if (pg_strcasecmp(configitem, "DateStyle") == 0 ||
+    pg_strcasecmp(configitem, "listen_addresses") == 0 ||
+    pg_strcasecmp(configitem, "local_preload_libraries") == 0 ||
+    pg_strcasecmp(configitem, "log_destination") == 0 ||
+    pg_strcasecmp(configitem, "plpgsql.extra_errors") == 0 ||
+    pg_strcasecmp(configitem, "plpgsql.extra_warnings") == 0 ||
+    pg_strcasecmp(configitem, "search_path") == 0 ||
+    pg_strcasecmp(configitem, "session_preload_libraries") == 0 ||
+    pg_strcasecmp(configitem, "shared_preload_libraries") == 0 ||
+    pg_strcasecmp(configitem, "synchronous_standby_names") == 0 ||
+    pg_strcasecmp(configitem, "temp_tablespaces") == 0 ||
+    pg_strcasecmp(configitem, "wal_consistency_checking") == 0)


Another idea for how to solve this issue for extensions: the information
about the format of extension GUCs could be stored in a new pg_extension
column - maybe just an array listing the extension's LIST GUCs.

CREATE EXTENSION plpgsql ...
   GUC_PREFIX 'plpgsql'
   LIST_GUC ('extra_errors','extra_warnings')
   ...

Regards

Pavel



>
> As mentioned in this bug, the handling of empty values gets kind of
> tricky as in this case proconfig stores a set of quotes and not an
> actual value:
> https://www.postgresql.org/message-id/152049236165.23137.
> 5241258464332317...@wrigleys.postgresql.org
> --
> Michael
>


Re: pg_get_functiondef forgets about most GUC_LIST_INPUT GUCs

2018-03-16 Thread Kyotaro HORIGUCHI
At Fri, 16 Mar 2018 09:16:54 +0100, Pavel Stehule  
wrote in 
> 2018-03-16 5:46 GMT+01:00 Michael Paquier :
> 
> > On Fri, Mar 16, 2018 at 10:25:35AM +0900, Michael Paquier wrote:
> > >> But, I suppose it is a bit too big.
> > >
> > > That's of course not backpatchable.

I meant that it is just too big for the objective, regardless of
back-patching :p

So, automatic generation is not acceptable, and dynamic checking (including
checking at function creation time) would be unacceptable.  So I agree that
what we can do for now is just maintain the list of pg_strcasecmp calls.

> > So in this jungle attached is my counter-proposal.  As the same code
> > pattern is repeated in three places, we could as well refactor the whole
> > into a common routine, say in src/common/guc_options.c or similar.
> > Perhaps just on HEAD and not back-branches as this is always annoying
> > for packagers on Windows using custom scripts.  Per the lack of
> > complains only doing something on HEAD, with only a subset of the
> > parameters which make sense for CREATE FUNCTION is what makes the most
> > sense in my opinion.
> >
> 
> Last patch is good enough solution, I tested it and it is working
> 
> Little bit strange is support of GUC, that cannot be atteched to any fx.
> Probably only PGC_USERSET, maybe PGC_SUSET has sense there

That's true, but doesn't the additional rule make it more
bothersome to maintain the list?

> +if (pg_strcasecmp(configitem, "DateStyle") == 0 ||
> +    pg_strcasecmp(configitem, "listen_addresses") == 0 ||
> +    pg_strcasecmp(configitem, "local_preload_libraries") == 0 ||
> +    pg_strcasecmp(configitem, "log_destination") == 0 ||
> +    pg_strcasecmp(configitem, "plpgsql.extra_errors") == 0 ||
> +    pg_strcasecmp(configitem, "plpgsql.extra_warnings") == 0 ||
> +    pg_strcasecmp(configitem, "search_path") == 0 ||
> +    pg_strcasecmp(configitem, "session_preload_libraries") == 0 ||
> +    pg_strcasecmp(configitem, "shared_preload_libraries") == 0 ||
> +    pg_strcasecmp(configitem, "synchronous_standby_names") == 0 ||
> +    pg_strcasecmp(configitem, "temp_tablespaces") == 0 ||
> +    pg_strcasecmp(configitem, "wal_consistency_checking") == 0)
> 
> 
> another idea, how to solve this issue for extensions. The information about
> format of extensions GUC can be stored in new column pg_extension - maybe
> just a array of list of LIST GUC.
> 
> CREATE EXTENSION plpgsql ...
>GUC_PREFIX 'plpgsql'
>LIST_GUC ('extra_errors','extra_warnings')
>...
> 
> Regards
> 
> Pavel
> 
> 
> 
> >
> > As mentioned in this bug, the handling of empty values gets kind of
> > tricky as in this case proconfig stores a set of quotes and not an
> > actual value:
> > https://www.postgresql.org/message-id/152049236165.23137.
> > 5241258464332317...@wrigleys.postgresql.org
> > --
> > Michael

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: "SELECT ... FROM DUAL" is not quite as silly as it appears

2018-03-16 Thread Ashutosh Bapat
On Thu, Mar 15, 2018 at 8:57 PM, Tom Lane  wrote:
> We've long made fun of Oracle(TM) for the fact that if you just want
> to evaluate some expressions, you have to write "select ... from dual"
> rather than just "select ...".  But I've realized recently that there's
> a bit of method in that madness after all.  Specifically, having to
> cope with FromExprs that contain no base relation is fairly problematic
> in the planner.  prepjointree.c is basically unable to cope with
> flattening a subquery that looks that way, although we've inserted a
> lot of overly-baroque logic to handle some subsets of the case (cf
> is_simple_subquery(), around line 1500).  If memory serves, there are
> other places that are complicated by the case.
>
> Suppose that, either in the rewriter or early in the planner, we were
> to replace such cases with nonempty FromExprs, by adding a dummy RTE
> representing a table with no columns and one row.  This would in turn
> give rise to an ordinary Path that converts to a Result plan, so that
> the case is handled without any special contortions later.  Then there
> is no case where we don't have a nonempty relids set identifying a
> subquery, so that all that special-case hackery in prepjointree.c
> goes away, and we can simplify whatever else is having a hard time
> with it.
>

The idea looks neat.

Since the table in the dummy FROM clause returns one row without any
columns, I guess there will be at least one row in the output.  I am
curious how we would handle cases which do not return any rows,
like

create function set_ret_func() returns setof record as $$select * from
pg_class where oid = 0;$$ language sql;
select set_ret_func();
 set_ret_func
--
(0 rows)

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: pg_get_functiondef forgets about most GUC_LIST_INPUT GUCs

2018-03-16 Thread Pavel Stehule
2018-03-16 9:34 GMT+01:00 Kyotaro HORIGUCHI :

> At Fri, 16 Mar 2018 09:16:54 +0100, Pavel Stehule 
> wrote in  zyujawdnu...@mail.gmail.com>
> > 2018-03-16 5:46 GMT+01:00 Michael Paquier :
> >
> > > On Fri, Mar 16, 2018 at 10:25:35AM +0900, Michael Paquier wrote:
> > > >> But, I suppose it is a bit too big.
> > > >
> > > > That's of course not backpatchable.
>
> I meant that it is just too big for the objective regardless of
> back-patching:p
>
> So, automatic generation is not acceptable, dynamic checking
> (including that at the function creation) would be
> unacceptable. So I agree that what we can do now is just maintian
> the list of pg_strcasecmp.
>
> > > So in this jungle attached is my counter-proposal.  As the same code
> > > pattern is repeated in three places, we could as well refactor the
> whole
> > > into a common routine, say in src/common/guc_options.c or similar.
> > > Perhaps just on HEAD and not back-branches as this is always annoying
> > > for packagers on Windows using custom scripts.  Per the lack of
> > > complains only doing something on HEAD, with only a subset of the
> > > parameters which make sense for CREATE FUNCTION is what makes the most
> > > sense in my opinion.
> > >
> >
> > Last patch is good enough solution, I tested it and it is working
> >
> > Little bit strange is support of GUC, that cannot be atteched to any fx.
> > Probably only PGC_USERSET, maybe PGC_SUSET has sense there
>
> That's true, but doesn't the additional rule make it more
> bothersome to maintain the list?
>

Hard to say.  I have no opinion.  But when I see variables like
listen_addresses or shared_preload_libraries in this context, it looks
messy.




>
> > +if (pg_strcasecmp(configitem, "DateStyle") == 0 ||
> > +    pg_strcasecmp(configitem, "listen_addresses") == 0 ||
> > +    pg_strcasecmp(configitem, "local_preload_libraries") == 0 ||
> > +    pg_strcasecmp(configitem, "log_destination") == 0 ||
> > +    pg_strcasecmp(configitem, "plpgsql.extra_errors") == 0 ||
> > +    pg_strcasecmp(configitem, "plpgsql.extra_warnings") == 0 ||
> > +    pg_strcasecmp(configitem, "search_path") == 0 ||
> > +    pg_strcasecmp(configitem, "session_preload_libraries") == 0 ||
> > +    pg_strcasecmp(configitem, "shared_preload_libraries") == 0 ||
> > +    pg_strcasecmp(configitem, "synchronous_standby_names") == 0 ||
> > +    pg_strcasecmp(configitem, "temp_tablespaces") == 0 ||
> > +    pg_strcasecmp(configitem, "wal_consistency_checking") == 0)
> >
> >
> > another idea, how to solve this issue for extensions. The information
> about
> > format of extensions GUC can be stored in new column pg_extension - maybe
> > just a array of list of LIST GUC.
> >
> > CREATE EXTENSION plpgsql ...
> >GUC_PREFIX 'plpgsql'
> >LIST_GUC ('extra_errors','extra_warnings')
> >...
> >
> > Regards
> >
> > Pavel
> >
> >
> >
> > >
> > > As mentioned in this bug, the handling of empty values gets kind of
> > > tricky as in this case proconfig stores a set of quotes and not an
> > > actual value:
> > > https://www.postgresql.org/message-id/152049236165.23137.
> > > 5241258464332317...@wrigleys.postgresql.org
> > > --
> > > Michael
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center
>
>


Re: [HACKERS] [PATCH] Incremental sort

2018-03-16 Thread Alexander Korotkov
On Fri, Mar 16, 2018 at 5:12 AM, Tomas Vondra 
wrote:

> I agree those don't seem like an issue in the Incremental Sort patch,
> but like a more generic costing problems.
>

Yes, I think so too.
Do you think we can mark this patch RFC, given that it has already received
a fair amount of review previously?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Hash join in SELECT target list expression keeps consuming memory

2018-03-16 Thread Amit Khandekar
Hi,

If the SELECT target list expression is a join subquery, and if the
subquery does a hash join, then the query keeps on consuming more and
more memory. Below is such a query :

SELECT
   (SELECT id FROM
  unnest((pg_catalog.acldefault('L',l.col1))) WITH ORDINALITY AS
perm(acl,id),
  generate_series(1, l.col1::int % 2) AS gen2(gen2_id)
  where id = gen2_id
)
FROM largetable l ;

where the table largetable is created using :
create table largetable as select * from generate_series(1, 3200)
as t(col1);

Here is the plan :
QUERY PLAN
---
 Seq Scan on largetable l  (cost=0.00..672781971.36 rows=3218 width=8)
   SubPlan 1
 ->  Hash Join  (cost=2.26..21.01 rows=500 width=8)
   Hash Cond: (gen2.gen2_id = perm.id)
   ->  Function Scan on generate_series gen2
(cost=0.01..10.01 rows=1000 width=4)
   ->  Hash  (cost=1.00..1.00 rows=100 width=8)
 ->  Function Scan on unnest perm  (cost=0.01..1.00
rows=100 width=8)

Now, if we disable hash join, the planner chooses merge join, and that
does not keep on consuming memory:

   QUERY PLAN
-
 Seq Scan on largetable l  (cost=0.00..2275308358.33 rows=3218 width=8)
   SubPlan 1
 ->  Merge Join  (cost=59.84..71.09 rows=500 width=8)
   Merge Cond: (perm.id = gen2.gen2_id)
   ->  Function Scan on unnest perm  (cost=0.01..1.00 rows=100 width=8)
   ->  Sort  (cost=59.83..62.33 rows=1000 width=4)
 Sort Key: gen2.gen2_id
 ->  Function Scan on generate_series gen2
(cost=0.01..10.01 rows=1000 width=4)

Either with merge join or nested loop join, the postgres process
memory remains constant all the time. (It chooses hash join due to
incorrect row estimates of unnest() and generate_series() functions)

I am yet to analyze the root cause of this behaviour, but meanwhile, I
am posting it here, in case this turns out to be a known
behaviour/issue for hash joins, or there is already some work being
done on it.

I suspected that the set returning functions might be leaking memory,
but then we would have seen the behaviour on other types of joins as
well.

--

Actually I encountered this issue when I tried to run pg_dump with 32M
blobs.  The corresponding backend process consumed memory until it was
killed by the OOM killer.

Steps to reproduce the issue with pg_dump :
1. create table lo_table (id numeric, lo oid) ;
2. -- Create 32M rows
   insert into lo_table select
a.i,lo_from_bytea(0,E'\\xff00') from
generate_series(1,32775000) as a(i);
3. -- Then run pg_dump.

The pg_dump backend gets killed, and pg_dump aborts with this message,
which includes the query that consumed the memory:

pg_dump: [archiver (db)] query failed: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_dump: [archiver (db)] query was:

SELECT l.oid,

(SELECT rolname FROM pg_catalog.pg_roles WHERE oid = l.lomowner) AS rolname,

(SELECT pg_catalog.array_agg(acl ORDER BY row_n) FROM (SELECT acl,
row_n FROM 
pg_catalog.unnest(coalesce(l.lomacl,pg_catalog.acldefault('L',l.lomowner)))
WITH ORDINALITY AS perm(acl,row_n) WHERE NOT EXISTS ( SELECT 1 FROM
pg_catalog.unnest(coalesce(pip.initprivs,pg_catalog.acldefault('L',l.lomowner)))
AS init(init_acl) WHERE acl = init_acl)) as foo) AS lomacl,

(SELECT pg_catalog.array_agg(acl ORDER BY row_n) FROM (SELECT acl,
row_n FROM 
pg_catalog.unnest(coalesce(pip.initprivs,pg_catalog.acldefault('L',l.lomowner)))
WITH ORDINALITY AS initp(acl,row_n) WHERE NOT EXISTS ( SELECT 1 FROM
pg_catalog.unnest(coalesce(l.lomacl,pg_catalog.acldefault('L',l.lomowner)))
AS permp(orig_acl) WHERE acl = orig_acl)) as foo) AS rlomacl,

NULL AS initlomacl, NULL AS initrlomacl

FROM pg_largeobject_metadata l LEFT JOIN pg_init_privs pip ON (l.oid =
pip.objoid AND pip.classoid = 'pg_largeobject'::regclass AND
pip.objsubid = 0) ;

As you can see, the subplan expressions use the pg_init_privs table in
order to collect any changed extension ACLs for the large objects.  So
this won't reproduce before commit
23f34fa4ba358671adab16773e79c17c92cbc870.

Note: the pg_dump client process itself also consumes more and more
memory, although to a lesser extent. But this is a different thing,
and is already a known/expected behaviour :
https://www.postgresql.org/message-id/29613.1476969807%40sss.pgh.pa.us


-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company



Re: [HACKERS] why not parallel seq scan for slow functions

2018-03-16 Thread Amit Kapila
On Wed, Mar 14, 2018 at 12:01 AM, Robert Haas  wrote:
>
> 0003 introduces a new upper relation to represent the result of
> applying the scan/join target to the topmost scan/join relation.  I'll
> explain further down why this seems to be needed.  Since each path
> must belong to only one relation, this involves using
> create_projection_path() for the non-partial pathlist as we already do
> for the partial pathlist, rather than apply_projection_to_path().
> This is probably marginally more expensive but I'm hoping it doesn't
> matter.  (However, I have not tested.)
>

I think in the patch series this is the questionable patch wherein it
will always add an additional projection path (whether or not it is
required) to all Paths (partial and non-partial) for the scanjoin rel
and then later remove it (if not required) in create_projection_plan.
 As you are saying, I also think it might not matter much in the grand
scheme of things and if required we can test it as well, however, I
think it is better if some other people can also share their opinion
on this matter.

Tom, do you have anything to say?


  Each non-partial path in the
> topmost scan/join rel produces a corresponding path in the new upper
> rel.  The same is done for partial paths if the scan/join target is
> parallel-safe; otherwise we can't.
>
> 0004 causes the main part of the planner to skip calling
> generate_gather_paths() from the topmost scan/join rel.  This logic is
> mostly stolen from your patch.  If the scan/join target is NOT
> parallel-safe, we perform generate_gather_paths() on the topmost
> scan/join rel.  If the scan/join target IS parallel-safe, we do it on
> the post-projection rel introduced by 0003 instead.  This is the patch
> that actually fixes Jeff's original complaint.
>

Looks good, I feel you can include the test from my patch as well.

> 0005 goes a bit further and removes a bunch of logic from
> create_ordered_paths().  The removed logic tries to satisfy the
> query's ordering requirement via cheapest_partial_path + Sort + Gather
> Merge.  Instead, it adds an additional path to the new upper rel added
> in 0003.  This again avoids a call to apply_projection_to_path() which
> could cause projection to be pushed down after costing has already
> been fixed.  Therefore, it gains the same advantages as 0004 for
> queries that would use this sort of plan.
>

After this patch, part of the sort-related work will be done in
create_ordered_paths and the rest in grouping_planner, which looks a bit
odd; otherwise, I don't see any problem with it.

>  Currently, this loses the
> ability to set limit_tuples for the Sort path; that doesn't look too
> hard to fix but I haven't done it.  If we decide to proceed with this
> overall approach I'll see about getting it sorted out.
>

Okay, that makes sense.

> Unfortunately, when I initially tried this approach, a number of
> things broke due to the fact that create_projection_path() was not
> exactly equivalent to apply_projection_to_path().  This initially
> surprised me, because create_projection_plan() contains logic to
> eliminate the Result node that is very similar to the logic in
> apply_projection_to_path().  If apply_projection_path() sees that the
> subordinate node is projection-capable, it applies the revised target
> list to the child; if not, it adds a Result node.
> create_projection_plan() does the same thing.  However,
> create_projection_plan() can lose the physical tlist optimization for
> the subordinate node; it forces an exact projection even if the parent
> node doesn't require this.  0001 fixes this, so that we get the same
> plans with create_projection_path() that we would with
> apply_projection_to_path().  I think this patch is a good idea
> regardless of what we decide to do about the rest of this, because it
> can potentially avoid losing the physical-tlist optimization in any
> place where create_projection_path() is used.
>
> It turns out that 0001 doesn't manage to preserve the physical-tlist
> optimization when the projection path is attached to an upper
> relation.  0002 fixes this.
>

I have done some basic verification of 0001 and 0002, will do further
review/tests, if I don't see any objection from anyone else about the
overall approach.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: missing support of named convention for procedures

2018-03-16 Thread Pavel Stehule
2018-03-16 8:43 GMT+01:00 Pavel Stehule :

>
>
> 2018-03-15 22:13 GMT+01:00 Pavel Stehule :
>
>> Hi
>>
>> create or replace procedure proc2(in a int, in b int)
>> as $$
>> begin
>>   a := a * 10;
>>   b := b * 10;
>> end;
>> $$ language plpgsql;
>>
>> postgres=# call proc2(a => 10,b => 20);
>> ERROR:  XX000: unrecognized node type: 107
>> LOCATION:  ExecInitExprRec, execExpr.c:2114
>>
>
> Defaults are not supported too:
>
>  postgres=# create or replace procedure foo1(a int, b int, c int default
> 10)
> as $$
> begin
>   raise notice 'a: %, b: %, c: %', a, b, c;
> end;
> $$ language plpgsql;
> CREATE PROCEDURE
> postgres=# call foo1(10,20);
> NOTICE:  0: a: 10, b: 20, c: -778600432
> LOCATION:  exec_stmt_raise, pl_exec.c:3643
> CALL
>

attached patch fixes it

Regards

Pavel


>
>
>> Regards
>>
>> Pavel
>>
>
>
diff --git a/src/backend/commands/functioncmds.c b/src/backend/commands/functioncmds.c
index 86fa8c0dd7..c7a44d858b 100644
--- a/src/backend/commands/functioncmds.c
+++ b/src/backend/commands/functioncmds.c
@@ -55,6 +55,7 @@
 #include "executor/executor.h"
 #include "miscadmin.h"
 #include "optimizer/var.h"
+#include "optimizer/clauses.h"
 #include "parser/parse_coerce.h"
 #include "parser/parse_collate.h"
 #include "parser/parse_expr.h"
@@ -2254,6 +2255,9 @@ ExecuteCallStmt(CallStmt *stmt, ParamListInfo params, bool atomic, DestReceiver
 		elog(ERROR, "cache lookup failed for function %u", fexpr->funcid);
 	if (!heap_attisnull(tp, Anum_pg_proc_proconfig))
 		callcontext->atomic = true;
+
+	fexpr->args = expand_function_arguments(fexpr->args, fexpr->funcresulttype, tp);
+
 	ReleaseSysCache(tp);
 
 	/* Initialize function call structure */
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index a9a09afd2b..40eae3a835 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -130,8 +130,6 @@ static Expr *simplify_function(Oid funcid,
   Oid result_collid, Oid input_collid, List **args_p,
   bool funcvariadic, bool process_args, bool allow_non_const,
   eval_const_expressions_context *context);
-static List *expand_function_arguments(List *args, Oid result_type,
-		  HeapTuple func_tuple);
 static List *reorder_function_arguments(List *args, HeapTuple func_tuple);
 static List *add_function_defaults(List *args, HeapTuple func_tuple);
 static List *fetch_function_defaults(HeapTuple func_tuple);
@@ -4112,7 +4110,7 @@ simplify_function(Oid funcid, Oid result_type, int32 result_typmod,
  * cases it handles should never occur there.  This should be OK since it
  * will fall through very quickly if there's nothing to do.
  */
-static List *
+List *
 expand_function_arguments(List *args, Oid result_type, HeapTuple func_tuple)
 {
 	Form_pg_proc funcform = (Form_pg_proc) GETSTRUCT(func_tuple);
diff --git a/src/include/optimizer/clauses.h b/src/include/optimizer/clauses.h
index ba4fa4b68b..ed854fdd40 100644
--- a/src/include/optimizer/clauses.h
+++ b/src/include/optimizer/clauses.h
@@ -14,9 +14,9 @@
 #ifndef CLAUSES_H
 #define CLAUSES_H
 
+#include "access/htup.h"
 #include "nodes/relation.h"
 
-
 #define is_opclause(clause)		((clause) != NULL && IsA(clause, OpExpr))
 #define is_funcclause(clause)	((clause) != NULL && IsA(clause, FuncExpr))
 
@@ -85,4 +85,7 @@ extern Node *estimate_expression_value(PlannerInfo *root, Node *node);
 extern Query *inline_set_returning_function(PlannerInfo *root,
 			  RangeTblEntry *rte);
 
+extern List *expand_function_arguments(List *args, Oid result_type,
+		  HeapTuple func_tuple);
+
 #endif			/* CLAUSES_H */
diff --git a/src/pl/plpgsql/src/expected/plpgsql_call.out b/src/pl/plpgsql/src/expected/plpgsql_call.out
index 1e94a44f2b..9d7d7da5b0 100644
--- a/src/pl/plpgsql/src/expected/plpgsql_call.out
+++ b/src/pl/plpgsql/src/expected/plpgsql_call.out
@@ -169,3 +169,80 @@ DROP PROCEDURE test_proc1;
 DROP PROCEDURE test_proc3;
 DROP PROCEDURE test_proc4;
 DROP TABLE test1;
+-- named parameters and defaults
+CREATE PROCEDURE test_proc(a int, b int, c int DEFAULT -1)
+AS $$
+BEGIN
+  RAISE NOTICE 'a: %, b: %, c: %', a, b, c;
+END;
+$$ LANGUAGE plpgsql;
+CALL test_proc(10,20,30);
+NOTICE:  a: 10, b: 20, c: 30
+CALL test_proc(10,20);
+NOTICE:  a: 10, b: 20, c: -1
+CALL test_proc(c=>1, a=>3, b=>2);
+NOTICE:  a: 3, b: 2, c: 1
+DROP PROCEDURE test_proc;
+CREATE PROCEDURE test_proc1(INOUT _a int, INOUT _b int)
+AS $$
+BEGIN
+  RAISE NOTICE 'test_proc1: a: %, b: %', _a, _b;
+  _a := _a * 10;
+  _b := _b + 10;
+END;
+$$ LANGUAGE plpgsql;
+CALL test_proc1(10,20);
+NOTICE:  test_proc1: a: 10, b: 20
+ _a  | _b 
+-+
+ 100 | 30
+(1 row)
+
+CALL test_proc1(_b=>20, _a=>10);
+NOTICE:  test_proc1: a: 10, b: 20
+ _a  | _b 
+-+
+ 100 | 30
+(1 row)
+
+DO $$
+DECLARE a int; b int;
+BEGIN
+  a := 10; b := 30;
+  CALL test_proc1(a, b);
+  RAISE NOTICE 'a: %, b: %', a, b;
+  a := 10; b := 30;
+  CALL test_proc1(_b=>b, _a=>a);
+  RAISE NOTICE 'a: %, b: %', a, b;
+END
+$$;
+NOTICE:  test_

Re: missing support of named convention for procedures

2018-03-16 Thread Pavel Stehule
2018-03-16 11:29 GMT+01:00 Pavel Stehule :

>
>
> 2018-03-16 8:43 GMT+01:00 Pavel Stehule :
>
>>
>>
>> 2018-03-15 22:13 GMT+01:00 Pavel Stehule :
>>
>>> Hi
>>>
>>> create or replace procedure proc2(in a int, in b int)
>>> as $$
>>> begin
>>>   a := a * 10;
>>>   b := b * 10;
>>> end;
>>> $$ language plpgsql;
>>>
>>> postgres=# call proc2(a => 10,b => 20);
>>> ERROR:  XX000: unrecognized node type: 107
>>> LOCATION:  ExecInitExprRec, execExpr.c:2114
>>>
>>
>> Defaults are not supported too:
>>
>>  postgres=# create or replace procedure foo1(a int, b int, c int default
>> 10)
>> as $$
>> begin
>>   raise notice 'a: %, b: %, c: %', a, b, c;
>> end;
>> $$ language plpgsql;
>> CREATE PROCEDURE
>> postgres=# call foo1(10,20);
>> NOTICE:  0: a: 10, b: 20, c: -778600432
>> LOCATION:  exec_stmt_raise, pl_exec.c:3643
>> CALL
>>
>
> attached patch fixes it
>
> Regards
>
> Pavel
>

variadic parameters are supported too.



>
>
>>
>>
>>> Regards
>>>
>>> Pavel
>>>
>>
>>
>


Re: pg_get_functiondef forgets about most GUC_LIST_INPUT GUCs

2018-03-16 Thread Arthur Zakirov
On Fri, Mar 16, 2018 at 10:21:39AM +0900, Michael Paquier wrote:
> On Thu, Mar 15, 2018 at 01:33:51PM +0300, Arthur Zakirov wrote:
> > I think your approach has a vulnerability too. I believe that a
> > non GUC_LIST_INPUT extension GUC which was used to create a function may
> > become GUC_LIST_INPUT variable. If I'm not mistaken nothing stops from
> > that. In this case values in proconfigislist won't be valide anymore.
> 
> I don't understand what you mean here.  Are you referring to a custom
> GUC which was initially declared as not being a list, but became a list
> after a plugin upgrade with the same name?

Yes exactly. Sorry for the unclear message.

> Isn't the author to blame in this case?

Maybe he is.  It may be better to rename a variable if it becomes a list.
I don't have a strong opinion here though.  I wanted to point out the case
where the proconfigislist column won't work.

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



Re: ON CONFLICT DO UPDATE for partitioned tables

2018-03-16 Thread Pavan Deolasee
On Fri, Mar 2, 2018 at 9:06 PM, Alvaro Herrera 
wrote:

>
>
@@ -106,6 +120,9 @@ typedef struct PartitionTupleRouting
int num_subplan_partition_offsets;
TupleTableSlot *partition_tuple_slot;
TupleTableSlot *root_tuple_slot;
+   List  **partition_arbiter_indexes;
+   TupleTableSlot **partition_conflproj_slots;
+   TupleTableSlot **partition_existing_slots;
 } PartitionTupleRouting;


I am curious why you decided to add these members to PartitionTupleRouting
structure. Wouldn't ResultRelationInfo be a better place to track these or
is there some rule that we follow?

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Another oddity in handling of WCO constraints in postgres_fdw

2018-03-16 Thread Ashutosh Bapat
On Tue, Mar 13, 2018 at 8:34 PM, Stephen Frost  wrote:
>
> It would really help to have some examples of exactly what is being
> proposed here wrt solutions.
>
> WCO is defined at a view level, so I'm not following the notion that
> we're going to depend on something remote to enforce the WCO when the
> remote object is just a regular table that you can't define a WCO on top
> of.  That's not the case when we're talking about foreign tables vs.
> local tables, so it's not the same.  I certainly don't think we should
> require a remote view to exist to perform the WCO check.  If the remote
> WCO is a view itself then I would expect it to operate in the same
> manner as WCO on local views does- you can have them defined as being
> cascaded or not.
>
> In other words, there is no case where we have a "foreign view."  Views
> are always local.  A "foreign table" could actually be a view, in which
> case everything we treat it as a table in the local database, but WCO
> doesn't come up in that case at all- there's no way to define WCO on a
> table, foreign or not.  If WCO is defined on the view on the remote
> server, then it should operate properly and not require anything from the
> local side.

I agree with this analysis. I have no objection about the patch anymore.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: parallel append vs. simple UNION ALL

2018-03-16 Thread Ashutosh Bapat
On Wed, Mar 14, 2018 at 2:09 AM, Robert Haas  wrote:
> On Tue, Mar 13, 2018 at 12:35 AM, Ashutosh Bapat
>  wrote:
>> It looks like it was not changed in all the places. make falied. I
>> have fixed all the instances of these two functions in the attached
>> patchset (only 0003 changes). Please check.
>
> Oops.  Thanks.
>
> I'm going to go ahead and commit 0001 here.  Any more thoughts on the rest?

Nope. I am good with the patchset.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: [HACKERS] why not parallel seq scan for slow functions

2018-03-16 Thread Robert Haas
On Fri, Mar 16, 2018 at 6:06 AM, Amit Kapila  wrote:
> I think in the patch series this is the questionable patch wherein it
> will always add an additional projection path (whether or not it is
> required) to all Paths (partial and non-partial) for the scanjoin rel
> and then later remove it (if not required) in create_projection_plan.
>  As you are saying, I also think it might not matter much in the grand
> scheme of things and if required we can test it as well, however, I
> think it is better if some other people can also share their opinion
> on this matter.
>
> Tom, do you have anything to say?

I forgot to include part of the explanation in my previous email.  The
reason it has to work this way is that, of course, you can't include
the same path in the path list of relation B that you put into the path
list of relation A; if you do, then you will be in trouble if a later
addition to the path list of relation B kicks that path out, because
it will get pfree'd, leaving a garbage pointer in the list for A,
which you may subsequently reference.

At one point, I tried to solve the problem by sorting the cheapest
partial path from the scan/join rel and putting it back into the same
pathlist, but that also fails: in some cases, the new path dominates
all of the existing paths because it is better-sorted and only very
slightly more expensive.  So, when you call add_partial_path() for the
sort path, all of the previous paths -- including the one the sort
path is pointing to as its subpath! -- get pfree'd.

So without another relation, and the projection paths, I couldn't get
it to where I didn't have to modify paths after creating them.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: parallel append vs. simple UNION ALL

2018-03-16 Thread Rajkumar Raghuwanshi
On Fri, Mar 9, 2018 at 1:04 AM, Robert Haas  wrote:

> Great.  Committed 0001.  Are you planning any further testing of this
> patch series?


Sorry, I missed the mail.
Yes, I have tested the patches further and found no more issues.

Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation


Re: inserts into partitioned table may cause crash

2018-03-16 Thread Etsuro Fujita

(2018/03/14 17:25), Etsuro Fujita wrote:

(2018/03/14 14:54), Amit Langote wrote:

Btw, I noticed that the patches place ExecPrepareTupleRouting (both the
declaration and the definition) at different relative locations within
nodeModifyTable.c in the HEAD branch vs. in the 10 branch. It might be a
good idea to bring them to the same relative location to avoid hassle
when
back-patching relevant patches in the future. I did that in the attached
updated version (v4) of the patch for HEAD, which does not make any other
changes. Although, the patch for PG-10 is unchanged, I still changed its
file name to contain v4.


That's a good idea! Thanks for the updated patches!


Sorry, I didn't look at those patches carefully, but I noticed that
while the patch for PG10 puts the definition of that function after
the definitions of functions such as fireBSTriggers, fireASTriggers and
ExecSetupTransitionCaptureState, the patch for HEAD puts it *before*
the definitions of those functions (note: these functions don't have
declarations at the file header), which seems a bit inconsistent.
ISTM that the v3 patches for PG10 and HEAD had already put that
function in the right place in terms of that relative ordering.


Best regards,
Etsuro Fujita



Re: ON CONFLICT DO UPDATE for partitioned tables

2018-03-16 Thread Etsuro Fujita

(2018/03/16 19:43), Pavan Deolasee wrote:

On Fri, Mar 2, 2018 at 9:06 PM, Alvaro Herrera wrote:



@@ -106,6 +120,9 @@ typedef struct PartitionTupleRouting
 int num_subplan_partition_offsets;
 TupleTableSlot *partition_tuple_slot;
 TupleTableSlot *root_tuple_slot;
+   List  **partition_arbiter_indexes;
+   TupleTableSlot **partition_conflproj_slots;
+   TupleTableSlot **partition_existing_slots;
  } PartitionTupleRouting;



I am curious why you decided to add these members to
PartitionTupleRouting structure. Wouldn't ResultRelationInfo be a better
place to track these or is there some rule that we follow?


I just started reviewing the patch, so maybe I'm missing something, but 
I think it would be a good idea to have these in that structure, not in 
ResultRelInfo, because these would be required only for partitions 
chosen via tuple routing.


Best regards,
Etsuro Fujita



Re: ON CONFLICT DO UPDATE for partitioned tables

2018-03-16 Thread Pavan Deolasee
On Fri, Mar 16, 2018 at 5:13 PM, Etsuro Fujita 
wrote:

> (2018/03/16 19:43), Pavan Deolasee wrote:
>
>> On Fri, Mar 2, 2018 at 9:06 PM, Alvaro Herrera > > wrote:
>>
>
> @@ -106,6 +120,9 @@ typedef struct PartitionTupleRouting
>>  int num_subplan_partition_offsets;
>>  TupleTableSlot *partition_tuple_slot;
>>  TupleTableSlot *root_tuple_slot;
>> +   List  **partition_arbiter_indexes;
>> +   TupleTableSlot **partition_conflproj_slots;
>> +   TupleTableSlot **partition_existing_slots;
>>   } PartitionTupleRouting;
>>
>
> I am curious why you decided to add these members to
>> PartitionTupleRouting structure. Wouldn't ResultRelationInfo be a better
>> place to track these or is there some rule that we follow?
>>
>
> I just started reviewing the patch, so maybe I'm missing something, but I
> think it would be a good idea to have these in that structure, not in
> ResultRelInfo, because these would be required only for partitions chosen
> via tuple routing.
>

Hmm, yeah, you're probably right.  The reason I got confused is that the
patch uses ri_onConflictSetProj/ri_onConflictSetWhere to store the
converted projection info/where clause for a given partition in its
ResultRelationInfo.  So I was wondering if we could move mt_arbiterindexes,
mt_existing and mt_conflproj to ResultRelInfo and then just use those
per-partition structures too.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)

2018-03-16 Thread Shubham Barai
On 16 March 2018 at 03:57, Alexander Korotkov 
wrote:

> On Tue, Mar 13, 2018 at 3:25 PM, Alvaro Herrera 
> wrote:
>
>> Alexander Korotkov wrote:
>>
>> > And what happen if somebody concurrently set (fastupdate = on)?
>> > Can we miss conflicts because of that?
>>
>> I think it'd be better to have that option require AccessExclusive lock,
>> so that it can never be changed concurrently with readers.  Seems to me
>> that penalizing every single read to cope with this case would be a bad
>> trade-off.
>
>
> As Andrey Borodin mentioned, we already do.  Sorry for buzz :)
>
>
>
I have updated the patch based on suggestions.

Regards,
Shubham


Predicate-Locking-in-gin-index_v6.patch
Description: Binary data


Re: pg_get_functiondef forgets about most GUC_LIST_INPUT GUCs

2018-03-16 Thread Michael Paquier
On Fri, Mar 16, 2018 at 09:40:18AM +0100, Pavel Stehule wrote:
> 2018-03-16 9:34 GMT+01:00 Kyotaro HORIGUCHI:
>> That's true, but doesn't the additional rule make it more
>> bothersome to maintain the list?
> 
> Hard to say. I have not opinion. But just when I see in this context
> variables like listen_address, shared_preload_libraries, then it looks
> messy.

Let's be clear.  I have listed all the variables in the patch to gather
opinions more easily, and because it is easier to review the whole stack
this way.  I personally think that the only variables where the patch
makes sense are:
- DateStyle
- search_path
- plpgsql.extra_errors
- plpgsql.extra_warnings
- wal_consistency_checking
So I would be inclined to drop the rest from the patch.  If there are
authors of popular extensions willing to get this support, let's update
the list once they argue for it, and only if it makes sense.  However,
as far as I can see, there are no real candidates.  So let's keep the
list simple.
--
Michael


signature.asc
Description: PGP signature


Re: [HACKERS] [PATCH] Incremental sort

2018-03-16 Thread Tomas Vondra
On 03/16/2018 09:47 AM, Alexander Korotkov wrote:
> On Fri, Mar 16, 2018 at 5:12 AM, Tomas Vondra wrote:
> 
> I agree those don't seem like an issue in the Incremental Sort patch,
> but like a more generic costing problems.
> 
> 
> Yes, I think so too.

I wonder if we could make the costing a bit more pessimistic, to make
these losses less likely, while still keeping the main wins (particularly
for the LIMIT queries).  But that seems a bit like a lost cause, I guess.

> Do you think we can mark this patch RFC assuming that it have
> already got pretty much of review previously.
> 

Actually, I was going to propose to switch it to RFC, so I've just done
that. I think the patch is clearly ready for a committer to take a
closer look. I really like this improvement.

I'm going to rerun the tests, but that's mostly because I'm interested
if the change from i++ to i-- in cmpSortPresortedCols makes a measurable
difference. I don't expect to find any issues, so why wait with the RFC?

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: PATCH: Configurable file mode mask

2018-03-16 Thread David Steele
On 3/15/18 3:17 AM, Michael Paquier wrote:
> On Wed, Mar 14, 2018 at 02:08:19PM -0400, David Steele wrote:
> 
> When taking a base backup from a data folder which has group access,
> then the tar data, as well as the untar'ed data, are still using
> 0600 as umask for files and 0700 for folders.  Is that an expected
> behavior?  I would have imagined that sendFileWithContent() and
> _tarWriteDir() should enforce the file mode to have group access if the
> cluster has been initialized to work as such.  

We can certainly make base backup understand the group access mode.
Should we continue hard-coding the mode, or use the actual dir/file mode?

> Still as this is a
> feature aimed at being used for custom backups, that's not really a
> blocker I guess.  

Seems like a good thing to do, though, so I'll have a look for the next
patch.

> Visibly there would be no need for a -g switch in
> pg_basebackup as it is possible to guess from the received untar'ed
> files what should be the permissions of the data based on what is
> received in pg_basebackup.c.  It would also be necessary to change the
> permissions of pg_wal as this is created before receiving any files.

This part might be trickier.

> Speaking of which, we may want to switch the values used for st_mode to
> what file_perm.h is giving in basebackup.c?

Will do.

> We should also replace the hardcoded 0700 value in pg_backup_directory.c
> by what file_perm.h offers?  I would recommend to not touch at mkdtemp.c
> as this comes from NetBSD.

Will do.

> +=item $node->group_access()
> +
> +Does the data dir allow group access?
> +
> Nit: s/dir/directory/.
> 
> Indentation is weird in PostgresNode.pm for some of the chmod calls
> (tabs not spaces please).

I'll fix these in the next patch as well.

Thanks,
-- 
-David
da...@pgmasters.net



signature.asc
Description: OpenPGP digital signature


Re: SQL/JSON: functions

2018-03-16 Thread Oleg Bartunov
On 14 Mar 2018 17:11, "Alexander Korotkov" 
wrote:

On Wed, Mar 14, 2018 at 2:10 AM, Andres Freund  wrote:

> On 2018-03-14 07:54:35 +0900, Michael Paquier wrote:
> > On Tue, Mar 13, 2018 at 04:08:01PM +0300, Oleg Bartunov wrote:
> > > The docs are here
> > > https://github.com/obartunov/sqljsondoc/blob/master/README.jsonpath.md
> > >
> > > It's not easy to write docs for SQL/JSON in xml, so I decided to write
> in more
> > > friendly way. We'll have time to convert it to postgres format.
> >
> > If you aim at getting a feature committed first without its
> > documentation, and getting the docs written after the feature freeze
> > using a dedicated open item or such, this is much acceptable in my
> > opinion and the CF is running short in time.
>
> Given that this patch still uses PG_TRY/CATCH around as wide paths of
> code as a whole ExecEvalExpr() invocation,


I agree that we should either use PG_TRY/CATCH over some small and safe
codepaths or surround PG_TRY/CATCH with subtransactions.  PG_TRY/CATCH over
ExecEvalExpr() looks really unacceptable.

basically has gotten no
> review, I don't see this going anywhere for v11.
>

I wouldn't be so categorical at this point.  The patchset has been around
for about a year.
Some parts of the code received more review, while other parts received
less.  We can surround all dangerous PG_TRY/CATCH pairs with
subtransactions, tolerate the performance penalty, and leave further
optimizations for future releases.


Agree it's not difficult.
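
For reference, the usual way to trap errors without a bare PG_TRY/CATCH
around ExecEvalExpr() is to wrap the evaluation in a subtransaction, as
PL/pgSQL and the other PLs do.  A rough sketch follows; the function name
and the "return NULL on error" policy are placeholders, not the patch's
actual code:

#include "postgres.h"

#include "access/xact.h"
#include "executor/executor.h"
#include "utils/memutils.h"
#include "utils/resowner.h"

static Datum
eval_expr_safely(ExprState *expr, ExprContext *econtext, bool *isnull)
{
	MemoryContext oldcontext = CurrentMemoryContext;
	ResourceOwner oldowner = CurrentResourceOwner;
	Datum		result = (Datum) 0;

	BeginInternalSubTransaction(NULL);
	/* Keep allocating in the caller's context, not the subxact's. */
	MemoryContextSwitchTo(oldcontext);

	PG_TRY();
	{
		result = ExecEvalExpr(expr, econtext, isnull);

		/* Success: commit the subtransaction and restore state. */
		ReleaseCurrentSubTransaction();
		MemoryContextSwitchTo(oldcontext);
		CurrentResourceOwner = oldowner;
	}
	PG_CATCH();
	{
		/* Failure: discard the error and roll the subtransaction back. */
		MemoryContextSwitchTo(oldcontext);
		FlushErrorState();
		RollbackAndReleaseCurrentSubTransaction();
		MemoryContextSwitchTo(oldcontext);
		CurrentResourceOwner = oldowner;

		*isnull = true;			/* placeholder ON ERROR policy */
	}
	PG_END_TRY();

	return result;
}

The subtransaction is what makes it safe to keep using the backend after an
arbitrary error, which a bare PG_TRY/CATCH cannot guarantee; that is where
the performance penalty mentioned above comes from.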

In the worst case, we can remove the codepaths which use PG_TRY/CATCH and
leave only the ERROR ON ERROR behavior of SQL/JSON.


No-no, JSON users will be really upset by this.  Our goal is to be the
first relational database with strong standard compliance.


--
Alexander Korotkov

Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


GSOC :Thrift datatype support (2018)

2018-03-16 Thread Pranav Negi

Sir

I am interested in your GSoC project idea.  I would like to know whether I
can get more information regarding this project, where to start, and what
to study, so that I can submit the proposal as fast as possible.

Thank you 

Pranav Negi



Re:Re: [submit code] I develop a tool for pgsql, how can I submit it

2018-03-16 Thread leap
Thank you for getting back to me.


I wanted to know what is stored in every data file and in what format, so I
developed pgcheck.
I also hope pgcheck can make checking for data errors in PostgreSQL easier,
but there is a lot of work to do.
It cannot be finished by myself, so I hope more people will develop it
together.


I only just learned about pg_filedump from you; it's quite cool and many
people develop it. ^~^
I'm very interested in moving some of pgcheck's functionality to
pg_filedump.  I will try when I have time. ^~^


I don't know how to "submit a link to Reddit and/or Hacker News".  If some
people also like pgcheck after learning about pg_filedump, I hope they will
help me let more people know about it.


There are also some problems like the ones "Euler Taveira" mentioned; I
will try my best to solve them.
For some reasons I have to use some core functions of PostgreSQL, and some
of those functions are static.
Maybe I cannot solve that problem very well, so I also hope for suggestions
or patches pushed to pgcheck.


I hope checking for data errors in PostgreSQL becomes easier!


best wishes!

At 2018-03-14 16:33:49, "Aleksander Alekseev"  wrote:
>Hello leap,
>
>> hello!
>> I develop a tool for pgsql, I want to submit it on pgsql.
>> how can I submit it?
>> address: https://github.com/leapking/pgcheck
>
>It's always nice to see another great tool! Thanks a lot for sharing it.
>I believe you should write a blog post about it for PostgreSQL Planet or
>at least submit a link to Reddit and/or Hacker News. Also consider
>submitting a talk about it to PostgreSQL conferences.
>
>I couldn't not notice that actually it looks quite similar to
>pg_filedump [1] which is currently maintained by the community. Perhaps
>you would be interested in porting the part of pgcheck's functionality
>that is missing in pg_filedump to pg_filedump? Or maybe vice versa? :)
>
>Fun fact! pg_filedump is designed for reading, checking and recovering
>data from database files. Since these files are usually recovered from a
>backup or corrupted hard drive pg_filedump will be most likely executed
>on the system which doesn't has a PostgreSQL package installed. And it's
>distributed separately just for this reason I believe.
>
>[1]: https://git.postgresql.org/gitweb/?p=pg_filedump.git;a=summary
>
>-- 
>Best regards,
>Aleksander Alekseev


Re:Re: Re: [submit code] I develop a tool for pgsql, how can I submit it

2018-03-16 Thread leap
so cool, thank you so much!

-- 
Best regards,
leapking

At 2018-03-15 20:10:08, "Aleksander Alekseev"  wrote:
>Hello leap,
>
>> I don't know how to "submit a link to Reddit and/or Hacker News". if
>> some people also like pgcheck after know pg_filedump, hope to help me
>> let more people to know it.
>
>These are sites where people involved in IT share links they find
>interesting and other people vote for the links:
>
>https://news.ycombinator.com/
>https://www.reddit.com/r/PostgreSQL/
>
>-- 
>Best regards,
>Aleksander Alekseev


Re: Re: pgbench randomness initialization

2018-03-16 Thread Fabien COELHO



But now the documentation is back to its original state of silence on
what base or how many bases might be allowed. Could it just say
"or an unsigned decimal integer value"? Then no one will wonder.


Done in the attached.

Thanks for the reviews.

--
Fabien.

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 5f28023..86a91ba 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -680,6 +680,43 @@ pgbench  options 
 d
  
 
  
+  
--random-seed=SEED
+  
+   
+Set random generator seed.  Seeds the system random number generator,
+which then produces a sequence of initial generator states, one for
+each thread.
+Values for SEED may be:
+time (the default, the seed is based on the current 
time),
+rand (use a strong random source, failing if none
+is available), or an unsigned decimal integer value.
+The random generator is invoked explicitly from a pgbench script
+(random... functions) or implicitly (for instance 
option
+--rate uses it to schedule transactions).
+When explicitly set, the value used for seeding is shown on the 
terminal.
+Any value allowed for SEED may also be
+provided through the environment variable
+PGBENCH_RANDOM_SEED.
+To ensure that the provided seed impacts all possible uses, put this 
option
+first or use the environment variable.
+  
+  
+Setting the seed explicitly allows to reproduce a 
pgbench
+run exactly, as far as random numbers are concerned.
+As the random state is managed per thread, this means the exact same
+pgbench run for an identical invocation if there is 
one
+client per thread and there are no external or data dependencies.
+From a statistical viewpoint reproducing runs exactly is a bad idea 
because
+it can hide the performance variability or improve performance unduly,
+e.g. by hitting the same pages as a previous run.
+However, it may also be of great help for debugging, for instance
+re-running a tricky case which leads to an error.
+Use wisely.
+   
+  
+ 
+
+ 
   
--sampling-rate=rate
   

@@ -874,14 +911,19 @@ pgbench  options 
 d
 
  
   
-scale 
-   current scale factor
-  
-
-  
 client_id 
unique number identifying the client session (starts from 
zero)
   
+
+  
+random_seed 
+   random generator seed (unless overwritten with 
-D)
+  
+
+  
+scale 
+   current scale factor
+  
  
 

diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 29d69de..a4c6c7b 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -146,6 +146,9 @@ int64   latency_limit = 0;
 char  *tablespace = NULL;
 char  *index_tablespace = NULL;
 
+/* random seed used when calling srandom() */
+int64 random_seed = -1;
+
 /*
  * end of configurable parameters
  */
@@ -561,6 +564,7 @@ usage(void)
   "  --log-prefix=PREFIX  prefix for transaction time log 
file\n"
   "   (default: \"pgbench_log\")\n"
   "  --progress-timestamp use Unix epoch timestamps for 
progress\n"
+  "  --random-seed=SEED   set random seed (\"time\", 
\"rand\", integer)\n"
   "  --sampling-rate=NUM  fraction of transactions to log 
(e.g., 0.01 for 1%%)\n"
   "\nCommon options:\n"
   "  -d, --debug  print debugging output\n"
@@ -4353,6 +4357,49 @@ printResults(TState *threads, StatsData *total, 
instr_time total_time,
}
 }
 
+/* call srandom based on some seed. NULL triggers the default behavior. */
+static void
+set_random_seed(const char *seed, const char *origin)
+{
+   /* srandom expects an unsigned int */
+   unsigned int iseed;
+
+   if (seed == NULL || strcmp(seed, "time") == 0)
+   {
+   /* rely on current time */
+   instr_time  now;
+   INSTR_TIME_SET_CURRENT(now);
+   iseed = (unsigned int) INSTR_TIME_GET_MICROSEC(now);
+   }
+   else if (strcmp(seed, "rand") == 0)
+   {
+   /* use some "strong" random source */
+   if (!pg_strong_random(&iseed, sizeof(iseed)))
+   {
+   fprintf(stderr, "cannot seed random from a strong 
source\n");
+   exit(1);
+   }
+   }
+   else
+   {
+   /* parse seed unsigned int value */
+   char garbage;
+   if (sscanf(seed, "%u%c", &iseed, &garbage) != 1)
+   {
+   fprintf(stderr,
+   "erro

Re: "SELECT ... FROM DUAL" is not quite as silly as it appears

2018-03-16 Thread Tom Lane
Ashutosh Bapat  writes:
> On Thu, Mar 15, 2018 at 8:57 PM, Tom Lane  wrote:
>> Suppose that, either in the rewriter or early in the planner, we were
>> to replace such cases with nonempty FromExprs, by adding a dummy RTE
>> representing a table with no columns and one row.  This would in turn
>> give rise to an ordinary Path that converts to a Result plan, so that
>> the case is handled without any special contortions later.

> Since table in the dummy FROM clause returns one row without any
> column, I guess, there will be at least one row in the output. I am
> curious how would we handle cases which do not return any row
> like

> create function set_ret_func() returns setof record as $$select * from
> pg_class where oid = 0;$$ language sql;
> select set_ret_func();
>  set_ret_func
> --
> (0 rows)

It'd be the same as now, so far as the executor is concerned:

regression=# explain select set_ret_func();
QUERY PLAN
--
 ProjectSet  (cost=0.00..5.27 rows=1000 width=32)
   ->  Result  (cost=0.00..0.01 rows=1 width=0)
(2 rows)

The difference is that, within the planner, the ResultPath would be
associated with a "real" base relation instead of needing its very
own code path in query_planner().

regards, tom lane



Re: PATCH: Configurable file mode mask

2018-03-16 Thread Stephen Frost
Greetings David, Michael, all,

* David Steele (da...@pgmasters.net) wrote:
> On 3/15/18 3:17 AM, Michael Paquier wrote:
> > On Wed, Mar 14, 2018 at 02:08:19PM -0400, David Steele wrote:
> > 
> > When taking a base backup from a data folder which has group access,
> > then the tar data, as well as the untar'ed data, are still using
> > 0600 as umask for files and 0700 for folders.  Is that an expected
> > behavior?  I would have imagined that sendFileWithContent() and
> > _tarWriteDir() should enforce the file mode to have group access if the
> > cluster has been initialized to work as such.  
> 
> We can certainly make base backup understand the group access mode.
> Should we continue hard-coding the mode, or use the actual dir/file mode?

Given that we're already, as I recall, including the owner/group of the
file being backed up through pg_basebackup in the tarball, it seems like
we should be including whatever the current dir/file mode is too.  Those
files may not have anything to do with PostgreSQL, after all, and a
'natural' tar of the directory would capture that information too.

> > Still as this is a
> > feature aimed at being used for custom backups, that's not really a
> > blocker I guess.  
> 
> Seems like a good thing to do, though, so I'll have a look for the next
> patch.

+1.

> > Visibly there would be no need for a -g switch in
> > pg_basebackup as it is possible to guess from the received untar'ed
> > files what should be the permissions of the data based on what is
> > received in pg_basebackup.c.  It would also be necessary to change the
> > permissions of pg_wal as this is created before receiving any files.
> 
> This part might be trickier.

This seems like another case where what we should be doing, and what
people will be expecting, I'd think, is just what they're used to tar
doing in these cases- which would be setting the dir/file mode for each
file based on what's in the tarball.  Again, the files which are in the
data dir are, sadly, not always just those that PG is familiar with.
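
To illustrate (this is just a sketch, not something from David's patch; the
helper is hypothetical), the mode recorded in each tar member header could
simply be whatever stat() reports for the file, the same way plain tar would
record it:

#include <sys/stat.h>

/* hypothetical helper: choose the mode to record for a tar member */
static mode_t
backup_member_mode(const char *path)
{
    struct stat st;

    if (stat(path, &st) < 0)
        return S_IRUSR | S_IWUSR;   /* fall back to owner-only on error */

    /* record the file's actual permission bits, as plain tar would */
    return st.st_mode & (S_IRWXU | S_IRWXG | S_IRWXO);
}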

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: ON CONFLICT DO UPDATE for partitioned tables

2018-03-16 Thread Alvaro Herrera
Pavan Deolasee wrote:

> Hmm, yeah, probably you're right. The reason I got confused is because the
> patch uses ri_onConflictSetProj/ri_onConflictSetWhere to store the
> converted projection info/where clause for a given partition in its
> ResultRelationInfo. So I was wondering if we can
> move mt_arbiterindexes, mt_existing and mt_conflproj to ResultRelInfo and
> then just use that per-partition structures too.

I wonder if the proposed structure is good actually.

Some notes as I go along.

1. ModifyTableState->mt_arbiterindexes is just a copy of
ModifyTable->arbiterIndexes.  So why do we need it?  For an
unpartitioned relation we can just use
ModifyTableState.ps->arbiterIndexes.  Now, for each partition we need to
map these indexes onto the partition indexes.  Not sure where to put
these; I'm tempted to say ResultRelInfo is the place.  Maybe the list
should always be in ResultRelInfo instead of the state/plan node?  Not
sure.

2. We don't need mt_existing to be per-relation; a single tuple slot can
do for all partitions; we just need to ExecSlotSetDescriptor to the
partition's descriptor whenever the slot is going to be used.  (This
means the slot no longer has a fixed tupdesc.  That seems OK).

3. ModifyTableState->mt_conflproj is more or less the same thing; the
same TTS can be reused by all the different projections, as long as we
set the descriptor before projecting.  So we don't
need PartitionTupleRouting->partition_conflproj_slots, but we do need a
descriptor to be used when needed.  Let's say
  PartitionTupleRouting->partition_confl_tupdesc 
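
To illustrate points (2) and (3), here is a minimal sketch of the slot reuse
I have in mind.  ExecSetSlotDescriptor(), ExecStoreTuple() and ExecProject()
are the existing executor calls; the surrounding variable names are only
illustrative:

/* one shared slot for the existing tuple, repointed per partition */
TupleTableSlot *existing = mtstate->mt_existing;

if (existing->tts_tupleDescriptor != RelationGetDescr(partrel))
    ExecSetSlotDescriptor(existing, RelationGetDescr(partrel));
ExecStoreTuple(conflictTuple, existing, InvalidBuffer, false);

/* likewise one shared slot for the SET projection result */
ExecSetSlotDescriptor(mtstate->mt_conflproj,
                      partitionTupleRouting->partition_confl_tupdesc);
ExecProject(resultRelInfo->ri_onConflictSetProj);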

I'll experiment with these ideas and see where that leads me.

-- 
Álvaro Herrera        https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: MCV lists for highly skewed distributions

2018-03-16 Thread Tomas Vondra
On 03/11/2018 10:23 AM, Dean Rasheed wrote:
> ...
>
> I'm moving this back to a status of "Needs review" partly because the
> code has changed significantly, but also because I want to do more
> testing, particularly with larger datasets.
> 

Thanks for working on this. The code seems fine to me, although it might
be a good idea to add comments explaining why analyze_mcv_list() starts
with full MCV lists and then removes items (essentially the explanation
why Jeff's original idea would not work so well).

Actually, one question - when deciding whether to keep the item in the
MCV list, analyze_mcv_list only compares its frequency with an average
of the rest. But as we're removing items from the MCV list, the average
frequency of the non-MCV items is growing (we're removing items with
higher and higher frequencies). That means the estimates for the least
common items will get higher and higher - essentially overestimates. So,
couldn't/shouldn't analyze_mcv_list consider this too?
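
For context, the estimate for a value outside the MCV list is essentially the
leftover frequency mass spread evenly over the remaining distinct values, so
every item removed from the list inflates that estimate.  A simplified sketch
(not the actual var_eq_const code):

/* simplified: selectivity assumed for any value not in the MCV list */
static double
nonmcv_selectivity(double sumcommon, double ndistinct, int nmcv)
{
    /*
     * sumcommon is the total frequency of the MCV items; whatever mass is
     * left is assumed to be spread evenly over the remaining distinct
     * values.
     */
    return (1.0 - sumcommon) / (ndistinct - nmcv);
}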

I've also done a bunch of testing regarding join cardinality estimates,
because eqjoinsel_inner() is one of the places using MCV lists too. So
I've generated tables with different sizes and data distributions, and
observed how the patch affects the join estimates.

The scripts and results are available here:

   https://github.com/tvondra/join-estimates-tests

The tables were not particularly huge at this point (1000 to 100k rows).
I've also varied the number of distinct values (100 - 10k), statistics
target (10 - 10k) and distribution (for each table independently):

1) uniform

insert into t1 select random() * $ndistinct1, i
  from generate_series(1,$nrows1) s(i)

2) skewed

insert into t2 select (1 - pow(random(),2)) * $ndistinct2, i
  from generate_series(1,$nrows2) s(i)

3) skewed-inverse

insert into t2 select (1 - pow(random(),2)) * $ndistinct2, i
  from generate_series(1,$nrows2) s(i)

4) skewed

insert into t2 select (1 - pow(random(),4)) * $ndistinct2, i
  from generate_series(1,$nrows2) s(i)

5) skewed-strong-inverse

insert into t2 select (1 - pow(random(),4)) * $ndistinct2, i
  from generate_series(1,$nrows2) s(i)

The results are a bit too large to attach them here - see for example
https://github.com/tvondra/join-estimates-tests/blob/master/bench/summary.ods.

Looking at Mean Relative Error, i.e. essentially

MRE = AVERAGE(MAX(estimate/actual,actual/estimate))

over the 20 runs for each combination of parameters, the "average" sheet
shows

 (MRE for patched) / (MRE for master)

and the patch seems to clearly improve accuracy in this case. There are
a few cases where the estimate gets slightly worse (say, by 10%)
compared to current master. So I think that's nice.

I'm open to running additional tests with other distributions, table
sizes etc. if needed.


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: segmentation fault in pg head with SQL function.

2018-03-16 Thread Tom Lane
Michael Paquier  writes:
> On Fri, Mar 16, 2018 at 11:35:13AM +0530, Prabhat Sahu wrote:
>> postgres=# CREATE OR REPLACE FUNCTION func1() RETURNS VOID
>> LANGUAGE SQL
>> AS $$
>> select 10;
>> $$;

> Problem reproducible here, and the bug has been introduced by fd1a421f.
> It seems to me that the function should not be authorized to be created
> to begin with, as it returns an integer in its last query, where I think
> that check_sql_fn_retval is doing it wrong when called in
> inline_function() as we know that it handles a function, and not a
> procedure thanks to the first sanity checks at the top of the function.

Hm.  Actually, I think this is my fault.  It is true that previous PG
versions would have rejected this function definition, but my intention
while revising Peter's prokind patch was that we'd start allowing a
VOID-returning SQL function to contain anything, and just ignore whatever
the last statement within it might be.  The documentation doesn't say
much about VOID-returning SQL functions, but I certainly don't see
anything saying that they can't end with a SELECT, so arguably the old
behavior is a POLA violation.  In any case, this is the behavior we
need for VOID-returning procedures, and there seems little reason not
to make functions act similarly.

So apparently I missed something with that.  Will look more closely.

regards, tom lane



Re: GSOC :Thrift datatype support (2018)

2018-03-16 Thread Aleksander Alekseev
Hello Pranav,

> I am interested in your GSoC project idea. So I want to know, can I get
> more information regarding this project and from where to start and
> what to study.
> 
> So that I can submit the proposal as fast as possible.

A few people asked about it recently. Please see these discussions:

* 
https://www.postgresql.org/message-id/flat/20180226122111.GC24055%40e733.localdomain#20180226122111.GC24055@e733.localdomain
* 
https://www.postgresql.org/message-id/flat/CA%2BSXE9u2dXaZf5qtKHE0WXJ3%3Dj6y4ETqOd9q%2Bgp%3DpTuZKPYNrA%40mail.gmail.com#CA+SXE9u2dXaZf5qtKHE0WXJ3=j6y4ETqOd9q+gp=ptuzkpy...@mail.gmail.com
* 
https://www.postgresql.org/message-id/flat/CAKNy7Umcpg69Ltnos1LbmZF9pw0Zf7ZENHvV%2BOj1vm0jSNg%2BOw%40mail.gmail.com#cakny7umcpg69ltnos1lbmzf9pw0zf7zenhvv+oj1vm0jsng...@mail.gmail.com

If you have any questions which these threads don't answer, please
don't hesitate to ask!

-- 
Best regards,
Aleksander Alekseev


signature.asc
Description: PGP signature


Re: SSL passphrase prompt external command

2018-03-16 Thread Peter Eisentraut
On 3/15/18 12:13, Daniel Gustafsson wrote:
> * In src/backend/libpq/be-secure-common.c:
> 
> +int
> +run_ssl_passphrase_command(const char *prompt, bool is_server_start, char 
> *buf, int size)
> +{
> [..]
> +   size_t  len = 0;
> [..]
> +   return len;
> +}
> 
> run_ssl_passphrase_command() is defined to return a signed int but returns
> size_t which is unsigned.  It’s obviously a remote cornercase to hit, but
> wouldn’t it be better to return the same signedness?

The OpenSSL (and GnuTLS) callbacks return an int, so we need to convert
it to int at some point anyway.  Might as well make it here.

> * The documentation for the passphrase command format states:
> 
> "If the setting starts with -, then it will be ignored during a reload and
> the SSL configuration will not be reloaded if a passphrase is needed."
> 
> In run_ssl_passphrase_command() the leading ‘-‘ is however just shifted away
> regardless of the state of is_server_start.
> 
> +   p = ssl_passphrase_command;
> +
> +   if (p[0] == '-')
> +   p++;
> +
> 
> The OpenSSL code is instead performing this in be_tls_init() in the below 
> hunk:
> 
> +   if (ssl_passphrase_command[0] && ssl_passphrase_command[0] != '-')
> 
> In order to make this generic, shouldn’t we in run_ssl_passphrase_command() do
> something along the lines of the below to ensure we aren’t executing the
> passphrase callback during reloads?
> 
> if (p[0] == '-')
> {
> /*
>  * passphrase commands with a leading '-' are only invoked on server
>  * start, and are ignored on reload.
>  */
> if (!is_server_start)
> goto error;
> p++;
> }

We need the OpenSSL code to know whether the callback supports reloads,
so it can call the dummy callback if not.

Maybe this is a bit too cute.  We could instead add another setting
"ssl_passphrase_command_support_reload".

> * In src/tools/msvc/Mkvcbuild.pm
> 
> # if building without OpenSSL
> if (!$solution->{options}->{openssl})
> {
> +   $postgres->RemoveFile('src/backend/libpq/be-secure-common.c');
> $postgres->RemoveFile('src/backend/libpq/be-secure-openssl.c');
> }
> 
> Is this because run_ssl_passphrase_command() would otherwise generate a 
> warning
> due to not being used?

I have no idea. ;-)  It's the same thing I was told to do for
fe-secure-common.c a while back.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Question about WalSndWriteData

2018-03-16 Thread Konstantin Knizhnik

Hi hackers,

I am confused by the following code in WalSndWriteData:

    /* output previously gathered data in a CopyData packet */
    pq_putmessage_noblock('d', ctx->out->data, ctx->out->len);

    /*
     * Fill the send timestamp last, so that it is taken as late as possible.
     * This is somewhat ugly, but the protocol is set as it's already used for
     * several releases by streaming physical replication.
     */
    resetStringInfo(&tmpbuf);
    now = GetCurrentTimestamp();
    pq_sendint64(&tmpbuf, now);
    memcpy(&ctx->out->data[1 + sizeof(int64) + sizeof(int64)],
           tmpbuf.data, sizeof(int64));


pq_putmessage_noblock copies the data from the ctx->out buffer to the libpq
buffers.  After that we write the timestamp into the ctx->out buffer.
And the comment says that we should do it "as late as possible".
But this timestamp is not included in the CopyData packet which has already
been copied to the libpq connection buffer.

And the next WalSndPrepareWrite call will reset the ctx->out buffer.

So I wonder: what is the reason for writing the timestamp into the ctx->out
buffer after it has already been processed by pq_putmessage_noblock?


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: fixing more format truncation issues

2018-03-16 Thread Peter Eisentraut
On 3/15/18 12:12, Tom Lane wrote:
> FWIW, I noticed while fooling with pre-release Fedora 28 that gcc 8.0.1
> emits a whole boatload of these warnings by default.

I have some patches for that.  I will send them in once gcc 8 is a bit
closer to release.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: SSL passphrase prompt external command

2018-03-16 Thread Daniel Gustafsson
> On 16 Mar 2018, at 17:07, Peter Eisentraut  
> wrote:
> 
> On 3/15/18 12:13, Daniel Gustafsson wrote:

>> * The documentation for the passphrase command format states:
>> 
>>"If the setting starts with -, then it will be ignored during a reload and
>>the SSL configuration will not be reloaded if a passphrase is needed."
>> 
>> In run_ssl_passphrase_command() the leading ‘-‘ is however just shifted away
>> regardless of the state of is_server_start.
>> 
>> +   p = ssl_passphrase_command;
>> +
>> +   if (p[0] == '-')
>> +   p++;
>> +
>> 
>> The OpenSSL code is instead performing this in be_tls_init() in the below 
>> hunk:
>> 
>> +   if (ssl_passphrase_command[0] && ssl_passphrase_command[0] != '-')
>> 
>> In order to make this generic, shouldn’t we in run_ssl_passphrase_command() 
>> do
>> something along the lines of the below to ensure we aren’t executing the
>> passphrase callback during reloads?
>> 
>>if (p[0] == '-')
>>{
>>/*
>> * passphrase commands with a leading '-' are only invoked on server
>> * start, and are ignored on reload.
>> */
>>if (!is_server_start)
>>goto error;
>>p++;
>>}
> 
> We need the OpenSSL code to know whether the callback supports reloads,
> so it can call the dummy callback if not.

It makes sense for the OpenSSL code to know; I’m mostly wondering why
run_ssl_passphrase_command() isn’t doing the same thing wrt the leading ‘-‘.
ISTM that it should consider is_server_start as well to match the
documentation.

> Maybe this is a bit too cute.  We could instead add another setting
> "ssl_passphrase_command_support_reload”.

I think that's a good idea; this feels like an easy thing to be confused about
and get wrong (which I might have done with the above).

Either way, there is a clear path forward on this (the new setting or just
ignoring me due to being confused) so I am going to mark this ready for
committer as the remaining work is known and small.

cheers ./daniel


Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-16 Thread Ashutosh Bapat
On Thu, Mar 15, 2018 at 7:46 PM, Robert Haas  wrote:
> On Thu, Mar 15, 2018 at 6:08 AM, Ashutosh Bapat
>  wrote:
>> In current create_grouping_paths() (without any of your patches
>> applied) we first create partial paths in partially grouped rel and
>> then add parallel path to grouped rel using those partial paths. Then
>> we hand over this to FDW and extension hooks, which may add partial
>> paths, which might throw away a partial path used to create a parallel
>> path in grouped rel causing a segfault. I think this bug exists since
>> we introduced parallel aggregation or upper relation refactoring
>> whichever happened later. Introduction of partially grouped rel has
>> just made it visible.
>
> I don't think there's really a problem here; or at least if there is,
> I'm not seeing it.  If an FDW wants to introduce partially grouped
> paths, it should do so when it is called for
> UPPERREL_PARTIAL_GROUP_AGG from within
> add_paths_to_partial_grouping_rel.  If it wants to introduce fully
> grouped paths, it should do so when it is called for
> UPPERREL_GROUP_AGG from within create_grouping_paths.  By the time the
> latter call is made, it's too late to add partially grouped paths; if
> the FDW does, that's a bug in the FDW.

Right.

>
> Admittedly, this means that commit
> 3bf05e096b9f8375e640c5d7996aa57efd7f240c subtly changed the API
> contract for FDWs.  Before that, an FDW that wanted to support partial
> aggregation would have needed to add partially grouped paths to
> UPPERREL_GROUP_AGG when called for that relation; whereas now it would
> need to add them to UPPERREL_PARTIAL_GROUP_AGG when called for that
> relation.

And when an FDW added partial paths to the grouped rel
(UPPERREL_GROUP_AGG) it might throw away the partial paths gathered by
the core code. 3bf05e096b9f8375e640c5d7996aa57efd7f240c has fixed that
bug by allowing FDW to add partial paths to UPPERREL_PARTIAL_GROUP_AGG
before adding gather paths to the UPPERREL_GROUP_AGG. But I agree that
it has subtly changed that API and we need to add this to the
documentation (maybe we should have added that in the commit which
introduced partially grouped relation).

> This doesn't actually falsify any documentation, though,
> because this oddity wasn't documented before.  Possibly more
> documentation could stand to be written in this area, but that's not
> the topic of this thread.

+1.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-16 Thread Ashutosh Bapat
On Thu, Mar 15, 2018 at 9:19 PM, Robert Haas  wrote:
> On Thu, Mar 15, 2018 at 9:46 AM, Jeevan Chalke
>  wrote:
>> Hmm.. you are right. Done.
>
> I don't see a reason to hold off on committing 0002 and 0003, so I've
> done that now; since they are closely related changes, I pushed them
> as a single commit.  It probably could've just been included in the
> main patch, but it's fine.

Thanks.

>
> I don't much like the code that 0001 refactors and am not keen to
> propagate it into more places.  I've separately proposed patches to
> restructure that code in
> http://postgr.es/m/ca+tgmoakt5gmahbpwgqrr2nadfomaonoxyowhrdvfgws34t...@mail.gmail.com
> and if we end up deciding to adopt that approach then I think this
> patch will also need to create rels for UPPERREL_TLIST.  I suspect
> that approach would also remove the need for 0004, as that case would
> also end up being handled in a different way.  However, the jury is
> still out on whether or not the approach I've proposed there is any
> good.  Feel free to opine over on that thread.

Will take a look at that patch next week.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-16 Thread Ashutosh Bapat
On Fri, Mar 16, 2018 at 12:16 AM, Robert Haas  wrote:
>
> - partial_costs_set.  The comment in compute_group_path_extra_data
> doesn't look good.  It says "Set partial aggregation costs if we are
> going to calculate partial aggregates in make_grouping_rels()", but
> what it means is something more like "agg_partial_costs and
> agg_final_costs are not valid yet".  I also wonder if there's a way
> that we can figure out in advance whether we're going to need to do
> this and just do it at the appropriate place in the code, as opposed
> to doing it lazily.  Even if there were rare cases where we did it
> unnecessarily I'm not sure that would be any big deal.

I struggled with this one. In case of multi-level partitioning we will
compute fully aggregated results per partition in the levels for which
the partition keys are covered by GROUP BY clause. But beyond those
levels we will compute partial aggregates. When none of the levels can
use parallelism, only those levels which compute the partial
aggregates need these costs to be calculated. We will need to traverse
partitioning hierarchy before even starting aggregation to decide
whether we will need partial aggregation downwards. Instead of adding
that walker, I thought it better to use this flag. But more on this in
reply to your next mail.

>
> I wonder if we could simplify things by copying more information from
> the parent grouping rel to the child grouping rels.  It seems to me
> for example that the way you're computing consider_parallel for the
> child relations is kind of pointless.  The parallel-safety of the
> grouping_target can't vary across children, nor can that of the
> havingQual; the only thing that can change is whether the input rel
> has consider_parallel set.  You could cater to that by having
> GroupPathExtraData do something like extra.grouping_is_parallel_safe =
> target_parallel_safe && is_parallel_safe(root, havingQual) and then
> set each child's consider parallel flag to
> input_rel->consider_parallel && extra.grouping_is_parallel_safe.

I am actually confused by the current code itself. What parallel
workers compute is partial target which is different from the full
target. The partial target would only contain expressions in the
GROUP BY clause and partial aggregate nodes. It will not contain any
expressions comprising full aggregates. When partial aggregate
nodes are parallel safe but the expressions using the full aggregates
are not parallel safe, the current code will not allow parallel
aggregation to take place whereas it should. That looks like an
optimization we are missing today.

That bug aside, the point is that the target of the grouped relation and that
of the partially grouped relation are different, giving rise to the
possibility that one is parallel safe and the other is not. In fact, for the
partially grouped relation, we shouldn't check the parallel safety of
havingQual since havingQual is only applicable over the fully
aggregated result. That seems to be another missing optimization. Or I
am missing something here.

In case of partition-wise aggregates some levels may be performing
only partial aggregation and if partial aggregation is parallel safe,
we should allow those levels to run parallel partial aggregation.
Checking parallel safety of only grouped relation doesn't help here.
But since the optimization is already missing right now, I think this
patch shouldn't bother about it.

>
> Similarly, right now the way the patch sets the reltargets for
> grouping rels and partially grouping rels is a bit complex.
> make_grouping_rels() calls make_partial_grouping_target() separately
> for each partial grouping rel, but for non-partial grouping rels it
> gets the translated tlist as an argument.  Could we instead consider
> always building the tlist by translation from the parent, that is, a
> child grouped rel's tlist is the translation of the parent
> grouped_rel's tlist, and the child partially grouped rel's tlist is
> the translation of the parent partially_grouped_rel's tlist?  If you
> could both make that work and find a different place to compute the
> partial agg costs, make_grouping_rels() would get a lot simpler or
> perhaps go away entirely.

Hmm that's a thought. While we are translating, we allocate new nodes,
whereas make_partial_grouping_target() uses the same nodes from the full
target. For a partially grouped child relation, this means that we
will allocate nodes to create partial target as well and then setrefs
will spend cycles matching node trees instead of matching pointers.
But I think we can take that hit if it saves us some complexity in the
code.

>
> I don't like this condition which appears in that function:
>
> if (extra->try_parallel_aggregation || force_partial_agg ||
> (extra->partitionwise_grouping &&
>  extra->partial_partitionwise_grouping))
>
> The problem with that is that it's got to exactly match the criteria
> for whether we're going to need the partial_grouping_rel.  If it's
>

Re: PostgreSQL opens all the indexes of a relation for every Query during Planning Time?

2018-03-16 Thread Jason Petersen
> On Mar 15, 2018, at 4:18 AM, Meenatchi Sandanam  wrote:
> 
> On checking PostgreSQL code, we see it opens all indexes of a relation for 
> every query planning time as in the attachment. If that is the case, it might 
> lead to Performance degradation for relations with many indexes. Is there any 
> other better way to handle this like Scanning/Opening only the indexes 
> related to columns specified in the query?


I’d think you’d need a pretty clear profiling run that points to this as a hot 
spot before deciding to optimize it. Do you have a query where this is the case?

--
Jason Petersen
Software Engineer | Citus Data
303.736.9255
ja...@citusdata.com



Re: chained transactions

2018-03-16 Thread Peter Eisentraut
On 3/15/18 12:39, Alvaro Herrera wrote:
>> Subject: [PATCH v1 1/8] Update function comments
>> Subject: [PATCH v1 2/8] Rename TransactionChain functions
>> Subject: [PATCH v1 3/8] Simplify parse representation of savepoint commands
>> Subject: [PATCH v1 4/8] Improve savepoint error messages
>> Subject: [PATCH v1 5/8] Change transaction state debug strings to match enum
>>  symbols

I have committed these with your suggested edits.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: PL/pgSQL nested CALL with transactions

2018-03-16 Thread Peter Eisentraut
On 3/16/18 00:24, Pavel Stehule wrote:
> What is benefit of DO command in PLpgSQL? Looks bizarre for me.

Not very typical, but we apply the same execution context handling to
CALL and DO at the top level, so it would be weird not to propagate that.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: PL/pgSQL nested CALL with transactions

2018-03-16 Thread Pavel Stehule
2018-03-16 18:35 GMT+01:00 Peter Eisentraut <
peter.eisentr...@2ndquadrant.com>:

> On 3/16/18 00:24, Pavel Stehule wrote:
> > What is benefit of DO command in PLpgSQL? Looks bizarre for me.
>
> Not very typical, but we apply the same execution context handling to
> CALL and DO at the top level, so it would be weird not to propagate that.
>

Although it is possible, I don't see any sense in introducing DO into
plpgsql. It looks like a duplicate of EXECUTE.



>
> --
> Peter Eisentraut  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: PL/pgSQL nested CALL with transactions

2018-03-16 Thread Tom Lane
Pavel Stehule  writes:
> 2018-03-16 18:35 GMT+01:00 Peter Eisentraut <
> peter.eisentr...@2ndquadrant.com>:
>> Not very typical, but we apply the same execution context handling to
>> CALL and DO at the top level, so it would be weird not to propagate that.

> Although it is possible, I don't see any sense in introducing DO into
> plpgsql. It looks like a duplicate of EXECUTE.

Not sure what you think is being "introduced" here.  It already works just
like any other random SQL command:

regression=# do $$
regression$# begin
regression$#   raise notice 'outer';
regression$#   do $i$ begin raise notice 'inner'; end $i$;
regression$# end $$;
NOTICE:  outer
NOTICE:  inner
DO

While certainly that's a bit silly as-is, I think it could have practical
use if the inner DO invokes a different PL.

regards, tom lane



Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-16 Thread Ashutosh Bapat
On Fri, Mar 16, 2018 at 3:19 AM, Robert Haas  wrote:
> On Thu, Mar 15, 2018 at 2:46 PM, Robert Haas  wrote:
>> I wonder if we could simplify things by copying more information from
>> the parent grouping rel to the child grouping rels.
>
> On further review, it seems like a better idea is to generate the
> partial grouping relations from the grouping relations to which they
> correspond.  Attached is a series of proposed further refactoring
> patches.

Ok. That looks good.

>
> 0001 moves the creation of partially_grouped_rel downwards.  Instead
> of happening in create_grouping_paths(), it gets moved downward to
> add_paths_to_partial_grouping_rel(), which is renamed
> create_partial_grouping_paths() and now returns a pointer to new
> RelOptInfo.  This seems like a better design than what we've got now:
> it avoids creating the partially grouped relation if we don't need it,
> and it looks more like the other upper planner functions
> (create_grouping_paths, create_ordered_paths, etc.) which all create
> and return a new relation.

I liked that.

>
> 0002 moves the determination of which grouping strategies are possible
> upwards.  It represents them as a 'flags' variable with bits for
> GROUPING_CAN_USE_SORT, GROUPING_CAN_USE_HASH, and
> GROUPING_CAN_PARTIAL_AGG.  These are set in create_grouping_paths()
> and passed down to create_ordinary_grouping_paths().  The idea is that
> the flags value would be passed down to the partition-wise aggregate
> code which in turn would call create_ordinary_grouping_paths() for the
> child grouping relations, so that the relevant determinations are made
> only at the top level.

+1.

> This patch also renames can_parallel_agg to
> can_partial_agg and removes the parallelism-specific bits from it.

I think we need to update the comments in this function to use the phrase
"partial aggregation" instead of "parallel aggregation". And I think
we need to change the conditions as well. For example, if
parse->groupClause == NIL, why can't we do partial aggregation? This
is the classical case when we will need partial aggregation. Probably
we should test this with Jeevan's patches for partition-wise aggregate
to see if it considers partition-wise aggregate or not.

Or when parse->groupingSets is true, I can see why we can't use
parallel query, but we can still compute partial aggregates. This
condition doesn't hurt since partition-wise aggregation bails out when
there are grouping sets, so it's not that harmful here.

> To
> compensate for this, create_ordinary_grouping_paths() now tests the
> removed conditions instead.  This is all good stuff for partition-wise
> aggregate, since the grouped_rel->consider_parallel &&
> input_rel->partial_pathlist != NIL conditions can vary on a per-child
> basis but the rest of the stuff can't.  In some subsequent patch, the
> test should be pushed down inside create_partial_grouping_paths()
> itself, so that this function can handle both partial and non-partial
> paths as mentioned in the preceding paragraph.

I think can_parallel_agg() combines two conditions: whether partial
aggregation is possible and whether parallel aggregation is possible.
can_partial_agg() should have the first set and we should retain
can_parallel_agg() for the second set. We may then split
can_parallel_agg() into variant and invariant conditions i.e. the
conditions which change with input_rel and grouped_rel and those
don't.



>
> - create_partial_grouping_paths() is still doing
> get_agg_clause_costs() for the partial grouping target, which (I
> think) only needs to be done once.  Possibly we could handle that by
> having create_grouping_paths() do that work whenever it sets
> GROUPING_CAN_PARTIAL_AGG and pass the value downward.  You might
> complain that it won't get used unless either there are partial paths
> available for the input rel OR partition-wise aggregate is used --
> there's no point in partially aggregating a non-partial path at the
> top level.  We could just accept that as not a big deal, or maybe we
> can figure out how to make it conditional so that we only do it when
> either the input_rel has a partial path list or we have child rels.
> Or we could do as you did in your patches and save it when we compute
> it first, reusing it on each subsequent call.  Or maybe there's some
> other idea.

I am good with anything as long as we avoid repeated computation.

>
> I am sort of unclear whether we need/want GroupPathExtraData at all.
> What determines whether something gets passed via GroupPathExtraData
> or just as a separate argument?  If we have a rule that stuff that is
> common to all child grouped rels goes in there and other stuff
> doesn't, or stuff postgres_fdw needs goes in there and other stuff
> doesn't, then that might be OK.  But I'm not sure that there is such a
> rule in the v20 patches.

We have a single FDW hook for all the upper relations and that hook
can not accept grouping specific arguments. Either we need a separate
FDW hook for gro

Re: PL/pgSQL nested CALL with transactions

2018-03-16 Thread Pavel Stehule
2018-03-16 18:49 GMT+01:00 Tom Lane :

> Pavel Stehule  writes:
> > 2018-03-16 18:35 GMT+01:00 Peter Eisentraut <
> > peter.eisentr...@2ndquadrant.com>:
> >> Not very typical, but we apply the same execution context handling to
> >> CALL and DO at the top level, so it would be weird not to propagate
> that.
>
> > Although it is possible, I don't see any sense in introducing DO into
> > plpgsql. It looks like a duplicate of EXECUTE.
>
> Not sure what you think is being "introduced" here.  It already works just
> like any other random SQL command:
>
> regression=# do $$
> regression$# begin
> regression$#   raise notice 'outer';
> regression$#   do $i$ begin raise notice 'inner'; end $i$;
> regression$# end $$;
> NOTICE:  outer
> NOTICE:  inner
> DO
>
> While certainly that's a bit silly as-is, I think it could have practical
> use if the inner DO invokes a different PL.
>

ok, make sense

Pavel

>
> regards, tom lane
>


Re: ON CONFLICT DO UPDATE for partitioned tables

2018-03-16 Thread Alvaro Herrera
So ExecInsert receives the ModifyTableState, and separately it receives
arbiterIndexes and the OnConflictAction, both of which are members of
the passed ModifyTableState.  I wonder why it does that; wouldn't it
be simpler to extract those members from the node?

With the patch proposed upthread, we receive arbiterIndexes as a
parameter and if the table is a partition we summarily ignore those and
use the list as extracted from the PartitionRoutingInfo.  This is
confusing and pointless.  It seems to me that the logic ought to be "if
partition then use the list in PartitionRoutingInfo; if not partition
use it from ModifyTableState".  This requires changing as per above,
i.e. make the arbiter index list not part of the ExecInsert's API.

The OnConflictAction doesn't matter much; not passing it is merely a
matter of cleanliness.

Or is there another reason to pass the index list?

-- 
Álvaro Herrera        https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: ON CONFLICT DO UPDATE for partitioned tables

2018-03-16 Thread Peter Geoghegan
On Fri, Mar 16, 2018 at 11:21 AM, Alvaro Herrera
 wrote:
> So ExecInsert receives the ModifyTableState, and separately it receives
> arbiterIndexes and the OnConflictAction, both of which are members of
> the passed ModifyTableState.  I wonder why does it do that; wouldn't it
> be simpler to extract those members from the node?

> Or is there another reason to pass the index list?

It works that way pretty much by accident, as far as I can tell.
Removing the two extra arguments sounds like a good idea.

-- 
Peter Geoghegan



Update doc links to https where appropriate?

2018-03-16 Thread Andres Freund
Hi,

Writing up docs for JITing I noticed that nearly all wikipedia links on
the acronyms page are http:// rather than https://. Seems like we should
update links in the docs when alternatives are available?

Any counterarguments?

Greetings,

Andres Freund



Re: Hash join in SELECT target list expression keeps consuming memory

2018-03-16 Thread Tom Lane
Amit Khandekar  writes:
> If the SELECT target list expression is a join subquery, and if the
> subquery does a hash join, then the query keeps on consuming more and
> more memory. Below is such a query :

Thanks for the report!

I dug into this with valgrind, and found that the problem is that
ExecHashTableCreate allocates some memory that isn't freed by
ExecHashTableDestroy, specifically the per-hash-key function
information.  This is just dumb.  We can keep that stuff in the
hashtable's hashCxt instead, where it will get freed at the right time.
The attached patch seems to fix it just by reordering the code.

I'm surprised nobody's noticed this before; maybe the problem is
of relatively recent vintage?  Haven't checked the back branches yet.

regards, tom lane

diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index 06bb44b..4f069d1 100644
*** a/src/backend/executor/nodeHash.c
--- b/src/backend/executor/nodeHash.c
*** ExecHashTableCreate(HashState *state, Li
*** 472,478 
  	 * Initialize the hash table control block.
  	 *
  	 * The hashtable control block is just palloc'd from the executor's
! 	 * per-query memory context.
  	 */
  	hashtable = (HashJoinTable) palloc(sizeof(HashJoinTableData));
  	hashtable->nbuckets = nbuckets;
--- 472,479 
  	 * Initialize the hash table control block.
  	 *
  	 * The hashtable control block is just palloc'd from the executor's
! 	 * per-query memory context.  Everything else should be kept inside the
! 	 * subsidiary hashCxt or batchCxt.
  	 */
  	hashtable = (HashJoinTable) palloc(sizeof(HashJoinTableData));
  	hashtable->nbuckets = nbuckets;
*** ExecHashTableCreate(HashState *state, Li
*** 515,520 
--- 516,537 
  #endif
  
  	/*
+ 	 * Create temporary memory contexts in which to keep the hashtable working
+ 	 * storage.  See notes in executor/hashjoin.h.
+ 	 */
+ 	hashtable->hashCxt = AllocSetContextCreate(CurrentMemoryContext,
+ 			   "HashTableContext",
+ 			   ALLOCSET_DEFAULT_SIZES);
+ 
+ 	hashtable->batchCxt = AllocSetContextCreate(hashtable->hashCxt,
+ "HashBatchContext",
+ ALLOCSET_DEFAULT_SIZES);
+ 
+ 	/* Allocate data that will live for the life of the hashjoin */
+ 
+ 	oldcxt = MemoryContextSwitchTo(hashtable->hashCxt);
+ 
+ 	/*
  	 * Get info about the hash functions to be used for each hash key. Also
  	 * remember whether the join operators are strict.
  	 */
*** ExecHashTableCreate(HashState *state, Li
*** 540,561 
  		i++;
  	}
  
- 	/*
- 	 * Create temporary memory contexts in which to keep the hashtable working
- 	 * storage.  See notes in executor/hashjoin.h.
- 	 */
- 	hashtable->hashCxt = AllocSetContextCreate(CurrentMemoryContext,
- 			   "HashTableContext",
- 			   ALLOCSET_DEFAULT_SIZES);
- 
- 	hashtable->batchCxt = AllocSetContextCreate(hashtable->hashCxt,
- "HashBatchContext",
- ALLOCSET_DEFAULT_SIZES);
- 
- 	/* Allocate data that will live for the life of the hashjoin */
- 
- 	oldcxt = MemoryContextSwitchTo(hashtable->hashCxt);
- 
  	if (nbatch > 1 && hashtable->parallel_state == NULL)
  	{
  		/*
--- 557,562 


Re: Update doc links to https where appropriate?

2018-03-16 Thread Daniel Gustafsson
> On 16 Mar 2018, at 19:54, Andres Freund  wrote:

> Writing up docs for JITing I noticed that nearlyall wikipedia links on
> the acronyms page are http:// rather than https://. Seems like we should
> update links in the docs when alternatives are available?

I don’t see a reason why not, since we just incur a 307 redirect on all visits
otherwise. +1 for bumping them to https.

While in there, perhaps we should update our links to the expanded page on
Wikipedia to avoid an internal 304 redirect as well?  The attached diff changes
the ones I found on a quick scan, but there might be others.

cheers ./daniel



acronyms.diff
Description: Binary data


Re: ON CONFLICT DO UPDATE for partitioned tables

2018-03-16 Thread Alvaro Herrera
Peter Geoghegan wrote:
> On Fri, Mar 16, 2018 at 11:21 AM, Alvaro Herrera
>  wrote:
> > So ExecInsert receives the ModifyTableState, and separately it receives
> > arbiterIndexes and the OnConflictAction, both of which are members of
> > the passed ModifyTableState.  I wonder why does it do that; wouldn't it
> > be simpler to extract those members from the node?
> 
> > Or is there another reason to pass the index list?
> 
> It works that way pretty much by accident, as far as I can tell.
> Removing the two extra arguments sounds like a good idea.

Great, thanks.

-- 
Álvaro Herrera        https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: CURRENT OF causes an error when IndexOnlyScan is used

2018-03-16 Thread Tom Lane
Yugo Nagata  writes:
> On Mon, 12 Mar 2018 13:56:24 -0400
> Tom Lane  wrote:
>> I took a quick look at this, but I'm concerned about a couple of points:

> In addition, this patch fixes only nbtree indexes, but the simillar problem
> will occur also on GIST indexes which support index-only scan. If we resolve
> this bug by fixing index access methods, I think we need to fix all existing
> indexes that support index-only scan and also describe the specification in
> the documents, comments, or README, etc. for future indexes.

Yeah, that's a pretty good point.
 
>> At this point Yugo-san's original patch is starting to look more
>> attractive.  It's still ugly, but at least it's not imposing a performance
>> cost on unrelated queries.

> I would like to elaborate my patch if needed and possible. Any suggestion
> would be appriciated.

I don't think there's much else to be done so far as the IndexOnlyScan
case is concerned.  However, I notice that somebody's made
search_plan_tree accept ForeignScanState and CustomScanState as possible
matches for WHERE CURRENT OF, and I find that rather troubling.  It seems
likely to me that both of those would have the same problem as
IndexOnlyScans, ie whatever they're returning is probably a virtual tuple
without any ctid field.  So we'd get the same unfriendly failure as you
complained of originally.

I wonder whether it wouldn't be a good idea to provide a way for an FDW
or CustomScan provider to return a TID, or at least give a more polite
FEATURE_NOT_SUPPORTED error than what happens now.  However, that seems
more like a new feature than a bug fix; certainly any extension of those
APIs is something we'd not back-patch.

In the meantime, we could fix execCurrent.c so that it throws
FEATURE_NOT_SUPPORTED rather than the current failure if the slot it's
looking at doesn't contain a physical tuple.

regards, tom lane



Re: [HACKERS] AdvanceXLInsertBuffer vs. WAL segment compressibility

2018-03-16 Thread Daniel Gustafsson
> On 25 Feb 2018, at 18:22, Chapman Flack  wrote:

> Here is a patch implementing the simpler approach Heikki suggested
> (though my original leaning had been to wrench on AdvanceXLInsertBuffer
> as Michael suggests). The sheer simplicity of the smaller change
> eventually won me over, unless there's a strong objection.

I had a look at this patch today.  The patch applies on current HEAD, and has
ample documentation in the included comment.  All existing tests pass, but
there are no new tests added (more on that below).  While somewhat outside my
wheelhouse, the implementation is the simple solution with no apparent traps
that I can think of.

Regarding the problem statement of the patch: the headers on the zeroed
segments are likely an unintended side effect, and serve no real purpose.
While they aren’t a problem for postgres, they do reduce the compressibility of
the resulting files, as evidenced by the patch author.

> As noted before, the cost of the extra small MemSet is proportional
> to the number of unused pages in the segment, and that is an indication
> of how busy the system *isn't*. I don't have a time benchmark of the
> patch's impact; if I should, what would be a good methodology?

This codepath doesn’t seem performance critical enough to warrant holding off
the patch awaiting a benchmark IMO.

> I considered adding a regression test for unfilled-segment compressibility,
> but wasn't sure where it would most appropriately go. I assume a TAP test
> would be the way to do it.

Adding a test that actually compresses files seems hard to make stable, and adds
a dependency on external tools which is best to avoid when possible.  I took a
stab at this and added a test that ensures the last segment in the switched-
away file is completely zeroed out, which in a sense tests compressibility.

The attached patch adds the test, and a necessary extension to check_pg_config
to allow for extracting values from pg_config.h as opposed to just returning
the number of regex matches. (needed for XLOG_BLCKSZ.)

That being said, I am not entirely convinced that the test is adding much
value.  It’s however easier to reason about a written patch than an idea so I
figured I’d add it here either way.

Setting this to Ready for Committer and offering my +1 on getting this in.

cheers ./daniel



wal_zeroed_test.patch
Description: Binary data


Re: ON CONFLICT DO UPDATE for partitioned tables

2018-03-16 Thread Alvaro Herrera
Another thing I noticed is that the split of the ON CONFLICT slot and
its corresponding projection is pretty weird.  The projection is in
ResultRelInfo, but the slot itself is in ModifyTableState.  You can't
make the projection work without a corresponding slot initialized with
the correct descriptor, so splitting it this way doesn't make a lot of
sense to me.

(Now, TBH the split between resultRelInfo->ri_projectReturning and
ModifyTableState->ps.ps_ResultTupleSlot, which is the slot that the
returning project uses, doesn't make a lot of sense to me either; so
maybe there some reason that I'm just not seeing.  But I digress.)

So I want to propose that we move the slot to be together with the
projection node that it serves, ie. we put the slot in ResultRelInfo:

typedef struct ResultRelInfo
{
...
/* for computing ON CONFLICT DO UPDATE SET */
TupleTableSlot *ri_onConflictProjSlot;
ProjectionInfo *ri_onConflictSetProj;

and with this the structure makes more sense.  So ExecInitModifyTable
does this

/* create target slot for UPDATE SET projection */
tupDesc = ExecTypeFromTL((List *) node->onConflictSet,
 relationDesc->tdhasoid);
resultRelInfo->ri_onConflictProjSlot =
ExecInitExtraTupleSlot(mtstate->ps.state, tupDesc);

/* build UPDATE SET projection state */
resultRelInfo->ri_onConflictSetProj =
ExecBuildProjectionInfo(node->onConflictSet, econtext,
resultRelInfo->ri_onConflictProjSlot,
&mtstate->ps, relationDesc);

and then ExecOnConflictUpdate can simply do this:

/* Project the new tuple version */
ExecProject(resultRelInfo->ri_onConflictSetProj);

/* Execute UPDATE with projection */
*returning = ExecUpdate(mtstate, &tuple.t_self, NULL,
resultRelInfo->ri_onConflictProjSlot, planSlot,
&mtstate->mt_epqstate, mtstate->ps.state,
canSetTag);

Now, maybe there is some reason I'm missing for the on conflict slot for
the projection to be in ModifyTableState rather than resultRelInfo.  But
this code passes all current tests, so I don't know what that reason
would be.


Overall, the resulting code looks simpler to me than the previous
arrangements.

-- 
Álvaro Herrera        https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: ExplainProperty* and units

2018-03-16 Thread Andres Freund
On 2018-03-14 13:32:10 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > Only thing I wonder is if we shouldn't just *remove*
> > ExplainPropertyLong and make ExplainPropertyInteger accept 64bits of
> > input - the effort of passing and printing a 64bit integer will never be
> > relevant for explain.
> 
> -0.5 ... everywhere else, we mean "int32" when we say "int", and I don't
> think it's worth the potential confusion to do it differently here.

Since I'm +1 on this (I don't think Integer is the same as int, there's
no benefit in having two functions, and I've already written the patch
this way), I'm inclined to go with one function.  Does anybody else
have an opinion?

- Andres



Re: ON CONFLICT DO UPDATE for partitioned tables

2018-03-16 Thread Andres Freund
Hi,

On 2018-03-16 18:23:44 -0300, Alvaro Herrera wrote:
> Another thing I noticed is that the split of the ON CONFLICT slot and
> its corresponding projection is pretty weird.  The projection is in
> ResultRelInfo, but the slot itself is in ModifyTableState.  You can't
> make the projection work without a corresponding slot initialized with
> the correct descriptor, so splitting it this way doesn't make a lot of
> sense to me.
> 
> (Now, TBH the split between resultRelInfo->ri_projectReturning and
> ModifyTableState->ps.ps_ResultTupleSlot, which is the slot that the
> returning project uses, doesn't make a lot of sense to me either; so
> maybe there some reason that I'm just not seeing.  But I digress.)

The projections for different child tables / child plans can look
different, therefore it can't be stored in ModifyTableState itself. No?

The slot's descriptor is changed to be "appropriate" when necessary:
if (slot->tts_tupleDescriptor != 
RelationGetDescr(resultRelationDesc))
ExecSetSlotDescriptor(slot, 
RelationGetDescr(resultRelationDesc));


> So I want to propose that we move the slot to be together with the
> projection node that it serves, ie. we put the slot in ResultRelInfo:

That'll mean we have a good number of additional slots in some cases. I
don't think the overhead of that is going to break the bank, but it's
worth considering.

Greetings,

Andres Freund



Re: CURRENT OF causes an error when IndexOnlyScan is used

2018-03-16 Thread Tom Lane
I wrote:
> In the meantime, we could fix execCurrent.c so that it throws
> FEATURE_NOT_SUPPORTED rather than the current failure if the slot it's
> looking at doesn't contain a physical tuple.

Concretely, I think we should do the attached, so as to cover any other
situations where the scan type doesn't return a physical tuple.  I kept
it separate from your patch so it's easy to test (the original case you
gave now throws the appropriate error).

The existing error when execCurrentOf can't figure out what to do with
the plan is ERRCODE_INVALID_CURSOR_STATE with message
"cursor \"%s\" is not a simply updatable scan of table \"%s\""
so for this draft patch I just used that.  I'm not sure if it would be
a better idea to throw a different SQLSTATE or different error text
for this case.  Any thoughts on that?

regards, tom lane

diff --git a/src/backend/access/common/heaptuple.c b/src/backend/access/common/heaptuple.c
index a2f67f2..c45a488 100644
*** a/src/backend/access/common/heaptuple.c
--- b/src/backend/access/common/heaptuple.c
*** slot_attisnull(TupleTableSlot *slot, int
*** 1367,1372 
--- 1367,1398 
  }
  
  /*
+  * slot_getsysattr
+  *		This function fetches a system attribute of the slot's current tuple.
+  *		Unlike slot_getattr, if the slot does not contain system attributes,
+  *		this will return false (with a NULL attribute value) instead of
+  *		throwing an error.
+  */
+ bool
+ slot_getsysattr(TupleTableSlot *slot, int attnum,
+ Datum *value, bool *isnull)
+ {
+ 	HeapTuple	tuple = slot->tts_tuple;
+ 
+ 	Assert(attnum < 0);			/* else caller error */
+ 	if (tuple == NULL ||
+ 		tuple == &(slot->tts_minhdr))
+ 	{
+ 		/* No physical tuple, or minimal tuple, so fail */
+ 		*value = (Datum) 0;
+ 		*isnull = true;
+ 		return false;
+ 	}
+ 	*value = heap_getsysattr(tuple, attnum, slot->tts_tupleDescriptor, isnull);
+ 	return true;
+ }
+ 
+ /*
   * heap_freetuple
   */
  void
diff --git a/src/backend/executor/execCurrent.c b/src/backend/executor/execCurrent.c
index ce7d4ac..2296c9b 100644
*** a/src/backend/executor/execCurrent.c
--- b/src/backend/executor/execCurrent.c
*** execCurrentOf(CurrentOfExpr *cexpr,
*** 150,157 
  	else
  	{
  		ScanState  *scanstate;
  		bool		lisnull;
- 		Oid			tuple_tableoid PG_USED_FOR_ASSERTS_ONLY;
  		ItemPointer tuple_tid;
  
  		/*
--- 150,157 
  	else
  	{
  		ScanState  *scanstate;
+ 		Datum		ldatum;
  		bool		lisnull;
  		ItemPointer tuple_tid;
  
  		/*
*** execCurrentOf(CurrentOfExpr *cexpr,
*** 183,201 
  		if (TupIsNull(scanstate->ss_ScanTupleSlot))
  			return false;
  
! 		/* Use slot_getattr to catch any possible mistakes */
! 		tuple_tableoid =
! 			DatumGetObjectId(slot_getattr(scanstate->ss_ScanTupleSlot,
! 		  TableOidAttributeNumber,
! 		  &lisnull));
! 		Assert(!lisnull);
! 		tuple_tid = (ItemPointer)
! 			DatumGetPointer(slot_getattr(scanstate->ss_ScanTupleSlot,
! 		 SelfItemPointerAttributeNumber,
! 		 &lisnull));
  		Assert(!lisnull);
  
! 		Assert(tuple_tableoid == table_oid);
  
  		*current_tid = *tuple_tid;
  
--- 183,213 
  		if (TupIsNull(scanstate->ss_ScanTupleSlot))
  			return false;
  
! 		/*
! 		 * Try to fetch tableoid and CTID from the scan node's current tuple.
! 		 * If the scan type hasn't provided a physical tuple, we have to fail.
! 		 */
! 		if (!slot_getsysattr(scanstate->ss_ScanTupleSlot,
! 			 TableOidAttributeNumber,
! 			 &ldatum,
! 			 &lisnull))
! 			ereport(ERROR,
! 	(errcode(ERRCODE_INVALID_CURSOR_STATE),
! 	 errmsg("cursor \"%s\" is not a simply updatable scan of table \"%s\"",
! 			cursor_name, table_name)));
  		Assert(!lisnull);
+ 		Assert(DatumGetObjectId(ldatum) == table_oid);
  
! 		if (!slot_getsysattr(scanstate->ss_ScanTupleSlot,
! 			 SelfItemPointerAttributeNumber,
! 			 &ldatum,
! 			 &lisnull))
! 			ereport(ERROR,
! 	(errcode(ERRCODE_INVALID_CURSOR_STATE),
! 	 errmsg("cursor \"%s\" is not a simply updatable scan of table \"%s\"",
! 			cursor_name, table_name)));
! 		Assert(!lisnull);
! 		tuple_tid = (ItemPointer) DatumGetPointer(ldatum);
  
  		*current_tid = *tuple_tid;
  
diff --git a/src/include/executor/tuptable.h b/src/include/executor/tuptable.h
index 0642a3a..a5779b1 100644
*** a/src/include/executor/tuptable.h
--- b/src/include/executor/tuptable.h
*** extern Datum slot_getattr(TupleTableSlot
*** 170,174 
--- 170,176 
  extern void slot_getallattrs(TupleTableSlot *slot);
  extern void slot_getsomeattrs(TupleTableSlot *slot, int attnum);
  extern bool slot_attisnull(TupleTableSlot *slot, int attnum);
+ extern bool slot_getsysattr(TupleTableSlot *slot, int attnum,
+ Datum *value, bool *isnull);
  
  #endif			/* TUPTABLE_H */


Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2018-03-16 Thread Claudio Freire
On Fri, Feb 9, 2018 at 1:05 PM, Claudio Freire  wrote:
> Turns out that it was a tad oversized. 300k tuples seems enough.
>
> Attached is a new patch version that:
>
> - Uses an unlogged table to make the large mwm test faster
> - Uses a wait_barrier helper that waits for concurrent transactions
>   to finish before vacuuming tables, to make sure deleted tuples
>   actually are vacuumable
> - Tweaks the size of the large mwm test to be as small as possible
> - Optimizes the delete to avoid expensive operations yet attain
>   the same end result

Attached are rebased versions of the patches (they weren't applying to
current master).
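
For orientation: the patches below keep the dead-TID list in exponentially
growing segments, and look a TID up with a binary search over the segments
followed by a binary search inside the chosen segment.  A minimal sketch of
that lookup follows; the names are made up for illustration and are not the
ones the patch actually uses.

    #include "postgres.h"
    #include "storage/itemptr.h"

    typedef struct TidSegment
    {
        ItemPointerData last_tid;   /* highest TID stored in this segment */
        int             ntids;      /* number of TIDs in the segment */
        ItemPointer     tids;       /* sorted array of dead TIDs */
    } TidSegment;

    static int
    tid_cmp(const void *a, const void *b)
    {
        return ItemPointerCompare((ItemPointer) a, (ItemPointer) b);
    }

    /* Is 'tid' recorded as dead?  Segments and their contents are sorted. */
    static bool
    tid_is_dead(TidSegment *segs, int nsegs, ItemPointer tid)
    {
        int     lo = 0,
                hi = nsegs - 1;

        if (nsegs == 0)
            return false;

        /* binary search for the first segment whose last_tid >= tid */
        while (lo < hi)
        {
            int     mid = (lo + hi) / 2;

            if (ItemPointerCompare(&segs[mid].last_tid, tid) < 0)
                lo = mid + 1;
            else
                hi = mid;
        }
        if (ItemPointerCompare(&segs[lo].last_tid, tid) < 0)
            return false;

        /* then binary search within that segment */
        return bsearch(tid, segs[lo].tids, segs[lo].ntids,
                       sizeof(ItemPointerData), tid_cmp) != NULL;
    }
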
From d8241ba70bc70c5dd989d2dccdee03a71151c814 Mon Sep 17 00:00:00 2001
From: Claudio Freire 
Date: Mon, 12 Sep 2016 23:36:42 -0300
Subject: [PATCH 1/2] Vacuum: allow using more than 1GB work mem

Turn the dead_tuples array into a structure composed of several
exponentially bigger arrays, to enable usage of more than 1GB
of work mem during vacuum and thus reduce the number of full
index scans necessary to remove all dead tids when the memory is
available.

Improve test ability to spot vacuum errors
---
 src/backend/commands/vacuumlazy.c| 402 ---
 src/test/regress/expected/vacuum.out |  72 ++-
 src/test/regress/sql/vacuum.sql  |  40 +++-
 3 files changed, 438 insertions(+), 76 deletions(-)

diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 9ac84e8..793ce74 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -11,11 +11,24 @@
  *
  * We are willing to use at most maintenance_work_mem (or perhaps
  * autovacuum_work_mem) memory space to keep track of dead tuples.  We
- * initially allocate an array of TIDs of that size, with an upper limit that
+ * initially allocate an array of TIDs of 128MB, or an upper limit that
  * depends on table size (this limit ensures we don't allocate a huge area
- * uselessly for vacuuming small tables).  If the array threatens to overflow,
+ * uselessly for vacuuming small tables). Additional arrays of increasingly
+ * large sizes are allocated as they become necessary.
+ *
+ * The TID array is thus represented as a list of multiple segments of
+ * varying size, beginning with the initial size of up to 128MB, and growing
+ * exponentially until the whole budget of (autovacuum_)maintenance_work_mem
+ * is used up.
+ *
+ * Lookup in that structure happens with a binary search of segments, and then
+ * with a binary search within each segment. Since segment's size grows
+ * exponentially, this retains O(log N) lookup complexity.
+ *
+ * If the multiarray's total size threatens to grow beyond maintenance_work_mem,
  * we suspend the heap scan phase and perform a pass of index cleanup and page
- * compaction, then resume the heap scan with an empty TID array.
+ * compaction, then resume the heap scan with an array of logically empty but
+ * already preallocated TID segments to be refilled with more dead tuple TIDs.
  *
  * If we're processing a table with no indexes, we can just vacuum each page
  * as we go; there's no need to save up multiple tuples to minimize the number
@@ -92,6 +105,14 @@
 #define LAZY_ALLOC_TUPLES		MaxHeapTuplesPerPage
 
 /*
+ * Minimum (starting) size of the dead_tuples array segments. Will allocate
+ * space for 128MB worth of tid pointers in the first segment, further segments
+ * will grow in size exponentially. Don't make it too small or the segment list
+ * will grow bigger than the sweetspot for search efficiency on big vacuums.
+ */
+#define LAZY_INIT_TUPLES		Max(MaxHeapTuplesPerPage, (128<<20) / sizeof(ItemPointerData))
+
+/*
  * Before we consider skipping a page that's marked as clean in
  * visibility map, we must've seen at least this many clean pages.
  */
@@ -103,6 +124,27 @@
  */
 #define PREFETCH_SIZE			((BlockNumber) 32)
 
+typedef struct DeadTuplesSegment
+{
+	ItemPointerData last_dead_tuple;	/* Copy of the last dead tuple (unset
+		 * until the segment is fully
+		 * populated). Keep it first to simplify
+		 * binary searches */
+	int			num_dead_tuples;	/* # of entries in the segment */
+	int			max_dead_tuples;	/* # of entries allocated in the segment */
+	ItemPointer dt_tids;		/* Array of dead tuples */
+}	DeadTuplesSegment;
+
+typedef struct DeadTuplesMultiArray
+{
+	int			num_entries;	/* current # of entries */
+	int			max_entries;	/* total # of slots that can be allocated in
+ * array */
+	int			num_segs;		/* number of dead tuple segments allocated */
+	int			last_seg;		/* last dead tuple segment with data (or 0) */
+	DeadTuplesSegment *dt_segments;		/* array of num_segs segments */
+}	DeadTuplesMultiArray;
+
 typedef struct LVRelStats
 {
 	/* hasindex = true means two-pass strategy; false means one-pass */
@@ -123,14 +165,20 @@ typedef struct LVRelStats
 	BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */
 	/* List of TIDs of tuples we intend to delet

Re: ON CONFLICT DO UPDATE for partitioned tables

2018-03-16 Thread Alvaro Herrera
Andres Freund wrote:
> Hi,
> 
> On 2018-03-16 18:23:44 -0300, Alvaro Herrera wrote:
> > Another thing I noticed is that the split of the ON CONFLICT slot and
> > its corresponding projection is pretty weird.  The projection is in
> > ResultRelInfo, but the slot itself is in ModifyTableState.  You can't
> > make the projection work without a corresponding slot initialized with
> > the correct descriptor, so splitting it this way doesn't make a lot of
> > sense to me.
> > 
> > (Now, TBH the split between resultRelInfo->ri_projectReturning and
> > ModifyTableState->ps.ps_ResultTupleSlot, which is the slot that the
> > returning project uses, doesn't make a lot of sense to me either; so
> > maybe there some reason that I'm just not seeing.  But I digress.)
> 
> The projections for different child tables / child plans can look
> different, therefore it can't be stored in ModifyTableState itself. No?
> 
> The slot's descriptor is changed to be "appropriate" when necessary:
>     if (slot->tts_tupleDescriptor !=
>         RelationGetDescr(resultRelationDesc))
>         ExecSetSlotDescriptor(slot,
>                               RelationGetDescr(resultRelationDesc));

Grumble.  This stuff looks like it's full of cheats to me, but I won't
touch it anyway.

> > So I want to propose that we move the slot to be together with the
> > projection node that it serves, ie. we put the slot in ResultRelInfo:
> 
> That'll mean we have a good number of additional slots in some cases. I
> don't think the overhead of that is going to break the bank, but it's
> worth considering.

Good point.

I think what I should be doing is the same as the returning stuff: keep
a tupdesc around, and use a single slot, whose descriptor is changed
just before the projection.
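
To illustrate the shape of that (the field names here are hypothetical, not
what the final patch uses):

    /* one reusable slot; swap in this relation's descriptor before projecting */
    TupleDesc   projdesc = resultRelInfo->ri_onConflictProjTupdesc; /* hypothetical */
    TupleTableSlot *slot = mtstate->mt_conflproj;                   /* hypothetical */

    if (slot->tts_tupleDescriptor != projdesc)
        ExecSetSlotDescriptor(slot, projdesc);

    /* the projection was built to target this slot, so ExecProject() fills it */
    slot = ExecProject(resultRelInfo->ri_onConflictSetProj);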

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: PATCH: Configurable file mode mask

2018-03-16 Thread Michael Paquier
On Fri, Mar 16, 2018 at 11:12:44AM -0400, Stephen Frost wrote:
> Given that we're already, as I recall, including the owner/group of the
> file being backed up through pg_basebackup in the tarball, it seems like
> we should be including whatever the current dir/file mode is too.  Those
> files may not have anything to do with PostgreSQL, after all, and a
> 'natural' tar of the directory would capture that information too.

Yes.  tar does include this information in the header associated with each
file.  By the way, do we want an additional switch for pg_receivexlog as
well, one that generates WAL segments with group access?  Some people take
base backups without slots and rely on an archive to look for remaining
segments up to the end-of-backup record.
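
For reference, the mode travels in the ustar header as an octal ASCII field,
so emitting it alongside owner/group is cheap.  A sketch only, not the actual
pg_basebackup code:

    #include <stdio.h>
    #include <sys/stat.h>

    /* ustar: the mode field is 8 bytes of octal ASCII at offset 100 */
    static void
    tar_set_mode(char *header, mode_t mode)
    {
        snprintf(header + 100, 8, "%07o", (unsigned int) (mode & 07777));
    }
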
--
Michael


signature.asc
Description: PGP signature


Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian a bit)

2018-03-16 Thread Yura Sokolov
16.03.2018 04:23, Tomas Vondra пишет:
> 
> 
> On 03/10/2018 03:11 AM, Yura Sokolov wrote:
>> 08.03.2018 03:42, Tomas Vondra wrote:
>>> On 03/06/2018 06:23 AM, Yura Sokolov wrote:
 05.03.2018 18:00, Tom Lane wrote:
> Tomas Vondra  writes:
>> Snapshots are static (we don't really add new XIDs into existing ones,
>> right?), so why don't we simply sort the XIDs and then use bsearch to
>> lookup values? That should fix the linear search, without need for any
>> local hash table.
>
> +1 for looking into that, since it would avoid adding any complication
> to snapshot copying.  In principle, with enough XIDs in the snap, an
> O(1) hash probe would be quicker than an O(log N) bsearch ... but I'm
> dubious that we are often in the range where that would matter.
> We do need to worry about the cost of snapshot copying, too.

 Sorting is the first thing I've tried. Unfortunately, sorting
 itself eats too much CPU. Filling a hash table is much faster.

>>>
>>> I've been interested in the sort-based approach, so I've spent a bit of
>>> time hacking on it (patch attached). It's much less invasive compared to
>>> the hash-table, but Yura is right the hashtable gives better results.
>>>
>>> I've tried to measure the benefits using the same script I shared on
>>> Tuesday, but I kept getting really strange numbers. In fact, I've been
>>> unable to even reproduce the results I shared back then. And after a bit
>>> of head-scratching I think the script is useless - it can't possibly
>>> generate snapshots with many XIDs because all the update threads sleep
>>> for exactly the same time. And first one to sleep is first one to wake
>>> up and commit, so most of the other XIDs are above xmax (and thus not
>>> included in the snapshot). I have no idea why I got the numbers :-/
>>>
>>> But with this insight, I quickly modified the script to also consume
>>> XIDs by another thread (which simply calls txid_current). With that I'm
>>> getting snapshots with as many XIDs as there are UPDATE clients (this
>>> time I actually checked that using gdb).
>>>
>>> And for a 60-second run the tps results look like this (see the attached
>>> chart, showing the same data):
>>>
>>> writers   master    hash    sort
>>> --------------------------------
>>>      16      1068    1057    1070
>>>      32      1005     995    1033
>>>      64      1064    1042    1117
>>>     128      1058    1021    1051
>>>     256       977    1004     928
>>>     512       773     935     808
>>>     768       576     815     670
>>>    1000       413     752     573
>>>
>>> The sort certainly does improve performance compared to master, but it's
>>> only about half of the hashtable improvement.
>>>
>>> I don't know how much this simple workload resembles the YCSB benchmark,
>>> but I seem to recall it's touching individual rows. In which case it's
>>> likely worse due to the pg_qsort being more expensive than building the
>>> hash table.
>>>
>>> So I agree with Yura the sort is not a viable alternative to the hash
>>> table, in this case.
>>>
 Can I rely on snapshots being static? Maybe then there is no need
 for a separate raw representation and hash table. I may try to fill the
 hash table directly in GetSnapshotData instead of filling it lazily.
 Though it will increase time under the lock, so it is debatable and
 should be carefully measured.

>>>
>>> Yes, I think you can rely on snapshots not being modified later. That's
>>> pretty much the definition of a snapshot.
>>>
>>> You may do that in GetSnapshotData, but you certainly can't do that in
>>> the for loop there. Not only we don't want to add more work there, but
>>> you don't know the number of XIDs in that loop. And we don't want to
>>> build hash tables for small number of XIDs.
>>>
>>> One reason against building the hash table in GetSnapshotData is that
>>> we'd build it even when the snapshot is never queried. Or when it is
>>> queried, but we only need to check xmin/xmax.
>>
>> Thank you for the analysis, Tomas.
>>
>> Stephen is right about the bug in snapmgr.c.
>> The attached version fixes the bug and also simplifies XidInXip a bit.
>>
> 
> OK, a few more comments.
> 
> 1) The code in ExtendXipSizeForHash seems somewhat redundant with
> my_log2 (that is, we could just call the existing function).

Yes, I could call my_log2 from ExtendXipSizeForHash. But wouldn't one
more call be more expensive than the loop itself?

> 2) Apparently xhlog/subxhlog fields are used for two purposes - to store
> log2 of the two XID counts, and to remember if the hash table is built.
> That's a bit confusing (at least it should be explained in a comment)
> but most importantly it seems a bit unnecessary.

Ok, I'll add comment.

> I assume it was done to save space, but I very much doubt that makes any
> difference. So we could easily keep a separate flag. I'm prett

Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian a bit)

2018-03-16 Thread Tomas Vondra

On 03/17/2018 12:03 AM, Yura Sokolov wrote:
> 16.03.2018 04:23, Tomas Vondra wrote:
>>
>> ...
>>
>> OK, a few more comments.
>>
>> 1) The code in ExtendXipSizeForHash seems somewhat redundant with
>> my_log2 (that is, we could just call the existing function).
> 
> Yes, I could call my_log2 from ExtendXipSizeForHash. But wouldn't one
> more call be more expensive than the loop itself?
> 

I very much doubt there would be a measurable difference. Firstly,
function calls are not that expensive, otherwise we'd be running around
and removing function calls from the hot paths. Secondly, the call only
happens with many XIDs, and in that case the cost should be outweighed
by faster lookups. And finally, the function does stuff that seems far
more expensive than a single function call (for example allocation,
which may easily trigger a malloc).

In fact, in the interesting cases it's pretty much guaranteed to hit a
malloc, because the number of backend processes needs to be high enough
(say, 256 or more), which means

GetMaxSnapshotSubxidCount()

will translate to something like

((PGPROC_MAX_CACHED_SUBXIDS + 1) * PROCARRAY_MAXPROCS)
= (64 + 1) * 256
= 16640

and because XIDs are 4B each, that's ~65kB of memory (even ignoring the
ExtendXipSizeForHash business). And aset.c treats chunks larger than 8kB
as separate blocks, that are always malloc-ed independently.

But I may be missing something, and the extra call actually makes a
difference. But I think the onus of proving that is on you, and the
default should be not to duplicate code.
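
(For reference, the existing helper -- my_log2() in dynahash.c -- is roughly
just this, so an extra call is hard to beat by open-coding the loop:

    /* smallest i such that (1 << i) >= num */
    static int
    ceil_log2(long num)
    {
        int     i;
        long    limit;

        for (i = 0, limit = 1; limit < num; i++, limit <<= 1)
            ;
        return i;
    }

modulo an overflow guard in the real thing.)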

>> 2) Apparently xhlog/subxhlog fields are used for two purposes - to
>> store log2 of the two XID counts, and to remember if the hash table
>> is built. That's a bit confusing (at least it should be explained
>> in a comment) but most importantly it seems a bit unnecessary.
> 
> Ok, I'll add comment.
> 
>> I assume it was done to save space, but I very much doubt that
>> makes any difference. So we could easily keep a separate flag. I'm
>> pretty sure there are holes in the SnapshotData struct, so we could
>> squeeze it the flags in one of those.
> 
> There's a hole right after xhlog. But it will be a bit strange to
> move fields around.
> 

Is SnapshotData really that sensitive to size increase? I have my doubts
about that, TBH. The default stance should be to make the code easy to
understand. That is, we should only move fields around if it actually
makes a difference.

>> But do we even need a separate flag? We could just as easily keep 
>> the log2 fields set to 0 and only set them after actually building 
>> the hash table.
> 
> There is a need to signal that there is space for the hash. It is not
> always allocated. IIRC, I didn't cover the case where the snapshot was
> restored from a file, and some other place or two.
> Only if all places where a snapshot is allocated are properly modified
> to allocate space for the hash could the flag be omitted and log2
> itself be used as a flag.
> 

Hmmm, that makes it a bit inconsistent, I guess ... why not do the
same thing in all those places?

>> Or even better, why not to store the mask so that XidInXip does not
>> need to compute it over and over (granted, that's uint32 instead
>> of just uint8, but I don't think SnapshotData is particularly
>> sensitive to this, especially considering how much larger the xid
>> hash table is).
> 
> I don't like an unnecessary increase of the struct size. And I doubt
> that the computation matters. I could be mistaken though, because it is
> a hot place. Do you think it will significantly improve readability?
> 

IMHO the primary goal is to make the code easy to read and understand,
and only optimize struct size if it actually makes a difference. We have
no such proof here, and I very much doubt you'll be able to show any
difference because even with separate flags pahole says this:

struct SnapshotData {
SnapshotSatisfiesFunc  satisfies;/* 0 8 */
TransactionId  xmin; /* 8 4 */
TransactionId  xmax; /*12 4 */
TransactionId *xip;  /*16 8 */
uint32 xcnt; /*24 4 */
uint8  xhlog;/*28 1 */

/* XXX 3 bytes hole, try to pack */

TransactionId *subxip;   /*32 8 */
int32  subxcnt;  /*40 4 */
uint8  subxhlog; /*44 1 */
bool   suboverflowed;/*45 1 */
bool   takenDuringRecovery;  /*46 1 */
bool   copied;   /*47 1 */
CommandId  curcid;   /*48 4 */
uint32 speculativeToken; /*52 4 */
uint32 active_count; /*56 4 */
uin

Re: [PROPOSAL] Shared Ispell dictionaries

2018-03-16 Thread Tomas Vondra
On 03/07/2018 02:18 PM, Arthur Zakirov wrote:
> On Wed, Mar 07, 2018 at 02:12:32PM +0100, Pavel Stehule wrote:
>> 2018-03-07 14:10 GMT+01:00 Pavel Stehule :
>>> 2018-03-07 13:58 GMT+01:00 Arthur Zakirov :
 Oh, understood. Tomas suggested those commands earlier too. I'll
 implement them. But I think it is better to track file modification times
 too. Right now, without the patch, users don't have to call additional
 commands to refresh their dictionaries, so without such tracking we'd
 make dictionary maintenance harder.

>>>
>>> Postgres doesn't have any subsystem based on modification time, so I
>>> don't see introducing this kind of sensitivity as practical.
>>>
>>
>> Usually the shared dictionaries are used for complex language-based
>> full-text search. These dictionaries are updated less frequently than
>> PostgreSQL itself. The Czech dictionary has been the same for 10
>> years.
> 
> Agreed. In this case auto-reloading isn't an important feature here.
> 

Arthur, what are your plans with this patch in the current CF?

It does not seem to be moving towards RFC very much, and reworking the
patch to use mmap() seems like a quite significant change late in the
CF. Which means it's likely to cause the patch to get bumped to the
next CF (2018-09).

FWIW I am not quite sure if the mmap() approach is better than what was
implemented by the patch. I'm not sure how exactly it will behave under
memory pressure (AFAIK it goes through the page cache, which means random
parts of dictionaries might get evicted) or how well it is supported on
various platforms (say, Windows).

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



RE: User defined data types in Logical Replication

2018-03-16 Thread Huong Dangminh
> Masahiko Sawada wrote:
> 
> > Regarding to regression test, I added a new test module
> > test_subscription that creates a new user-defined data type. In a
> > subscription regression test, using test_subscription we make
> > subscriber call slot_store_error_callback and check if the subscriber
> > can correctly look up both remote and local data type strings. One
> > downside of this regression test is that in a failure case, the
> > duration of the test will be long (up to 180sec) because it has to
> > wait for the polling timeout.
> > Attached an updated patch with a regression test.
> 
> Pushed the fix to pg10 and master.  Thanks to all involved for the report,
> patches and review.
> 

Thank you all for fixing it.


---
Dang Minh Huong
NEC Solution Innovators, Ltd.
http://www.nec-solutioninnovators.co.jp/en/




[PATCH] add apps directory; add model classes; add db_tools directory;

2018-03-16 Thread Hongyuan Ma
---
 .gitignore  |   2 ++
 web/apps/__init__.py|   0
 web/apps/test_item/__init__.py  |   0
 web/apps/test_item/admin.py |   3 ++
 web/apps/test_item/models.py|  35 +++
 web/apps/test_item/tests.py |   3 ++
 web/apps/test_item/views.py |   3 ++
 web/apps/user_operation/__init__.py |   0
 web/apps/user_operation/admin.py|   3 ++
 web/apps/user_operation/models.py   |  58 
 web/apps/user_operation/tests.py|   3 ++
 web/apps/user_operation/views.py|   3 ++
 web/db.sqlite3  | Bin 0 -> 50176 bytes
 web/db_tools/data/__init__.py   |   0
 web/db_tools/data/test_branch_date.py   |  11 ++
 web/db_tools/import_test_branch_data.py |  22 
 web/pgperffarm/settings.py  |   8 -
 web/pgperffarm/urls.py  |   2 +-
 18 files changed, 154 insertions(+), 2 deletions(-)
 create mode 100644 web/apps/__init__.py
 create mode 100644 web/apps/test_item/__init__.py
 create mode 100644 web/apps/test_item/admin.py
 create mode 100644 web/apps/test_item/models.py
 create mode 100644 web/apps/test_item/tests.py
 create mode 100644 web/apps/test_item/views.py
 create mode 100644 web/apps/user_operation/__init__.py
 create mode 100644 web/apps/user_operation/admin.py
 create mode 100644 web/apps/user_operation/models.py
 create mode 100644 web/apps/user_operation/tests.py
 create mode 100644 web/apps/user_operation/views.py
 create mode 100644 web/db.sqlite3
 create mode 100644 web/db_tools/data/__init__.py
 create mode 100644 web/db_tools/data/test_branch_date.py
 create mode 100644 web/db_tools/import_test_branch_data.py

diff --git a/.gitignore b/.gitignore
index 239c691..f6dc301 100644
--- a/.gitignore
+++ b/.gitignore
@@ -5,3 +5,5 @@ migrations
 *.psp
 .DS_Store
 tags
+
+.idea
\ No newline at end of file
diff --git a/web/apps/__init__.py b/web/apps/__init__.py
new file mode 100644
index 000..e69de29
diff --git a/web/apps/test_item/__init__.py b/web/apps/test_item/__init__.py
new file mode 100644
index 000..e69de29
diff --git a/web/apps/test_item/admin.py b/web/apps/test_item/admin.py
new file mode 100644
index 000..8c38f3f
--- /dev/null
+++ b/web/apps/test_item/admin.py
@@ -0,0 +1,3 @@
+from django.contrib import admin
+
+# Register your models here.
diff --git a/web/apps/test_item/models.py b/web/apps/test_item/models.py
new file mode 100644
index 000..428dc0b
--- /dev/null
+++ b/web/apps/test_item/models.py
@@ -0,0 +1,35 @@
+from datetime import datetime
+
+from django.db import models
+
+# Create your models here.
+class TestBranch(models.Model):
+    """
+    test brand
+    """
+    branch_name = models.CharField(max_length=128, verbose_name="branch name", help_text="branch name")
+    add_time = models.DateTimeField(default=datetime.now, verbose_name="branch added time", help_text="branch added time")
+
+    class Meta:
+        verbose_name = "test branch"
+        verbose_name_plural = "test branch"
+
+    def __str__(self):
+        return self.branch_name
+
+class TestItem(models.Model):
+    """
+    test_item
+    """
+    test_name = models.CharField(max_length=128, verbose_name="test name", help_text="test name")
+    test_desc = models.TextField(verbose_name="test desc", help_text="test desc")
+    test_metric = models.DecimalField(max_digits=10, decimal_places=2, verbose_name="test metric", help_text="score>metric means improving;score<=metric means declining;")
+    test_branch = models.ForeignKey(TestBranch ,verbose_name="test branch", help_text="test branch")
+    add_time = models.DateTimeField(default=datetime.now, verbose_name="test added time")
+
+    class Meta:
+        verbose_name = "test_item"
+        verbose_name_plural = "test_item"
+
+    def __str__(self):
+        return self.test_name
diff --git a/web/apps/test_item/tests.py b/web/apps/test_item/tests.py
new file mode 100644
index 000..a6019d6
--- /dev/null
+++ b/web/apps/test_item/tests.py
@@ -0,0 +1,3 @@
+from django.test import TestCase
+
+# Create your test_item here.
diff --git a/web/apps/test_item/views.py b/web/apps/test_item/views.py
new file mode 100644
index 000..91ea44a
--- /dev/null
+++ b/web/apps/test_item/views.py
@@ -0,0 +1,3 @@
+from django.shortcuts import render
+
+# Create your views here.
diff --git a/web/apps/user_operation/__init__.py b/web/apps/user_operation/__init__.py
new file mode 100644
index 000..e69de29
diff --git a/web/apps/user_operation/admin.py b/web/apps/user_operation/admin.py
new file mode 100644
index 000..8c38f3f
--- /dev/null
+++ b/web/apps/user_operation/admin.py
@@ -0,0 +1,3 @@
+from django.contrib import admin
+
+# Register your models here.
diff --git a/web/apps/user_operation/models.py b/web/apps/user_operation/models.py
new file mode 100644
index 000..ca426f1
--- /dev/null
+++ b/web/apps/user_operati

Re: [PROPOSAL] Shared Ispell dictionaries

2018-03-16 Thread Arthur Zakirov
Hello Tomas,

Arthur, what are your plans with this patch in the current CF?


I think the dsm-based approach is already in good shape and works nicely.
I've planned only to improve the documentation a little. Also it seems I
should change the 0004 part; I found that the extension upgrade scripts may
be built in the wrong way.
In my opinion the RELOAD and UNLOAD commands can be added in the next
commitfest (2018-09).
Did you look at it? Do you have any comments on how the shared memory
allocation and release functions are implemented?


>
> It does not seem to be moving towards RFC very much, and reworking the
> patch to use mmap() seems like a quite significant change late in the
> CF. Which means it's likely to cause the patch to get bumped to the
> next CF (2018-09).


Agreed. I have a draft version of the mmap-based approach which works on
platforms with mmap. On Windows it is necessary to use another API
(CreateFileMapping, etc). But this approach requires more work on handling
the preprocessed dictionary files (how to name them, when to remove them).
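
For illustration, the POSIX side of it is roughly the sketch below (error
handling trimmed, and the Windows CreateFileMapping path not shown):

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    /* map a preprocessed dictionary file read-only; returns NULL on failure */
    static void *
    map_dict_file(const char *path, size_t *size)
    {
        int         fd = open(path, O_RDONLY);
        struct stat st;
        void       *addr;

        if (fd < 0)
            return NULL;
        if (fstat(fd, &st) < 0)
        {
            close(fd);
            return NULL;
        }
        addr = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
        close(fd);              /* the mapping survives the close */
        if (addr == MAP_FAILED)
            return NULL;
        *size = st.st_size;
        return addr;
    }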


>
> FWIW I am not quite sure if the mmap() approach is better than what was
> implemented by the patch. I'm not sure how exactly will it behave under
> memory pressure (AFAIK it goes through page cache, which means random
> parts of dictionaries might get evicted) or how well is it supported on
> various platforms (say, Windows).
>

Yes, as I wrote, the mmap-based approach requires more work. The only benefit
I see is that you don't need to reprocess a dictionary after a server restart.
I'd vote for the dsm-based approach.


-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


Re: [HACKERS] why not parallel seq scan for slow functions

2018-03-16 Thread Amit Kapila
On Fri, Mar 16, 2018 at 3:36 PM, Amit Kapila  wrote:
> On Wed, Mar 14, 2018 at 12:01 AM, Robert Haas  wrote:
>>
>> 0003 introduces a new upper relation to represent the result of
>> applying the scan/join target to the topmost scan/join relation.  I'll
>> explain further down why this seems to be needed.  Since each path
>> must belong to only one relation, this involves using
>> create_projection_path() for the non-partial pathlist as we already do
>> for the partial pathlist, rather than apply_projection_to_path().
>> This is probably marginally more expensive but I'm hoping it doesn't
>> matter.  (However, I have not tested.)
>>
>
> I think in the patch series this is the questionable patch wherein it
> will always add an additional projection path (whether or not it is
> required) to all Paths (partial and non-partial) for the scanjoin rel
> and then later remove it (if not required) in create_projection_plan.
>  As you are saying, I also think it might not matter much in the grand
> scheme of things and if required we can test it as well,
>

I have done some tests to see the impact of this patch on planning
time.  I took some simple statements and tried to compute the time
they took in planning.

Test-1
--
DO $$
DECLARE count integer;
BEGIN
For count In 1..100 Loop
Execute 'explain Select ten from tenk1';
END LOOP;
END;
$$;

In the above block, I am explaining a simple statement which will
have just one path, so there will be one additional projection-path
creation and removal cycle for this statement.  I executed the above
block in psql with the \timing option on; the average timing for
ten runs on HEAD is 21292.388 ms, with patches (0001.* ~ 0003) it is
22405.2466 ms, and with patches (0001.* ~ 0005.*) it is 22537.1362 ms.
These results indicate approximately a 5~6% increase in
planning time.
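
(For the record, that is 22405.2466 / 21292.388 ~= 1.052 and
22537.1362 / 21292.388 ~= 1.058, which is where the 5~6% figure comes from.)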

Test-2
--
DO $$
DECLARE count integer;
BEGIN
For count In 1..100 Loop
Execute 'explain select hundred,ten from tenk1 order by hundred';
END LOOP;
END;
$$;

In the above block, I am explaining a statement which will have two
paths, so there will be two additional projection paths and one
removal cycle for the path that is ultimately selected.  The
average timing for ten runs on HEAD is 32869.8343 ms, with patches
(0001.* ~ 0003) it is 34068.0608 ms, and with patches (0001.* ~ 0005.*)
it is 34097.4899 ms.  These results indicate approximately a
3~4% increase in optimizer time.  Now, ideally, this test
should have shown more impact, as we are adding an additional projection
path for each of the two paths, but I think because the overall planning
time is higher, the impact of the additional work is less visible.

I have done these tests on a CentOS VM, so there is some variation
in the test results.  Please find attached the detailed results of all
the tests.  I have not changed any configuration for these tests.  I
think before reaching any conclusion, it would be better if someone
repeated these tests and checked whether they see a similar pattern.
The reason for running the tests separately for the first three patches
(0001.* ~ 0003.*) is to see the impact of the changes without anything
related to parallelism.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Test-1
-
DO $$
DECLARE count integer;
BEGIN
For count In 1..100 Loop
Execute 'explain Select ten from tenk1';
END LOOP;
END;
$$;

Head

Time: 21228.829 ms (00:21.229)
Time: 21353.244 ms (00:21.353)
Time: 21758.958 ms (00:21.759)
Time: 20999.708 ms (00:21.000)
Time: 21804.451 ms (00:21.804)
Time: 20851.715 ms (00:20.852)
Time: 21514.198 ms (00:21.514)
Time: 20677.121 ms (00:20.677)
Time: 21685.576 ms (00:21.686)
Time: 21050.080 ms (00:21.050)

Patches (up to 0003)
--
Time: 23425.923 ms (00:23.426)
Time: 21609.624 ms (00:21.610)
Time: 21865.652 ms (00:21.866)
Time: 22006.988 ms (00:22.007)
Time: 22427.953 ms (00:22.428)
Time: 23195.211 ms (00:23.195)
Time: 22692.508 ms (00:22.693)
Time: 21757.227 ms (00:21.757)
Time: 22574.941 ms (00:22.575)
Time: 22496.439 ms (00:22.496)
Time: 21648.699 ms (00:21.649)

Patches (up to 0005)
---
Time: 23003.406 ms (00:23.003)
Time: 22624.973 ms (00:22.625)
Time: 22426.394 ms (00:22.426)
Time: 22700.590 ms (00:22.701)
Time: 22452.472 ms (00:22.452)
Time: 23048.367 ms (00:23.048)
Time: 21845.491 ms (00:21.845)
Time: 22832.643 ms (00:22.833)
Time: 21995.769 ms (00:21.996)
Time: 22441.257 ms (00:22.441)

Test-2
---
DO $$
DECLARE count integer;
BEGIN
For count In 1..100 Loop
Execute 'explain select hundred,ten from tenk1 order by hundred';
END LOOP;
END;
$$;

Head
-
Time: 32229.360 ms (00:32.229)
Time: 33310.348 ms (00:33.310)
Time: 32712.002 ms (00:32.712)
Time: 32630.473 ms (00:32.630)
Time: 33097.111 ms (00:33.097)
Time: 32487.799 ms (00:32.488)
Time: 33195.718 ms (00:33.196)
Time: 33166.606 ms (00:33.167)
Time: 32533.331 ms (00:32.533)
Time: 33335.595 ms (00:33.336)

Patches (up to 0003)

Re: JIT compiling with LLVM v12.1

2018-03-16 Thread Andres Freund
Hi,

On 2018-03-13 16:40:32 -0700, Andres Freund wrote:
> I've pushed a revised and rebased version of my JIT patchset.
> The git tree is at
>   https://git.postgresql.org/git/users/andresfreund/postgres.git
> in the jit branch
>   
> https://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=shortlog;h=refs/heads/jit


The biggest change is that this contains docbook docs. Please check it
out; I'm not entirely sure the structure is perfect.  I'll make a
language-prettying pass tomorrow, too tired for that now.


> Todo:
> - some build issues with old clang versions pointed out by Thomas Munro

I've added the configure magic to properly detect capabilities of
different clang versions. This doesn't resolve the 3.4 issues Thomas had
reported however, because we still rely on -flto=thin.

If necessary we could support it by adding an 'opt -module-summary $@ -o
$@' step to the %.bc rules, but that'd require some version-specific
handling.  Given that it doesn't yet look necessary, I'm loath to go
there.


> - when to take jit_expressions into account (both exec and plan or just
>   latter)

It's just plan time now. There's a new 'jit' GUC that works *both* at
execution time and plan time, and is documented as such.


> - EXPLAIN for queries that are JITed should display units. Starting
>   thread about effort to not duplicate code for that

done.


> - GUC docs (including postgresql.conf.sample)

done.


> - more explanations of type & function signature syncing

Still WIP.


Greetings,

Andres Freund