plan cache overhead on plpgsql expression

2020-02-16 Thread Pavel Stehule
Hi

When I do some profiling of plpgsql, I am usually surprised by how significant
the overhead of expression execution is. Any calculations are very slow.

This is not a typical example of plpgsql, but it shows clearly where the
overhead is

CREATE OR REPLACE FUNCTION public.foo()
 RETURNS void
 LANGUAGE plpgsql
 IMMUTABLE
AS $function$
declare i bigint = 0;
begin
  while i < 1
  loop
i := i + 1;
  end loop;
end;
$function$

Profile of development version

  10,04%  plpgsql.so  [.] exec_eval_simple_expr
   9,17%  postgres    [.] AcquireExecutorLocks
   7,01%  postgres    [.] ExecInterpExpr
   5,86%  postgres    [.] OverrideSearchPathMatchesCurrent
   4,71%  postgres    [.] GetCachedPlan
   4,14%  postgres    [.] AcquirePlannerLocks
   3,72%  postgres    [.] RevalidateCachedQuery
   3,56%  postgres    [.] MemoryContextReset
   3,43%  plpgsql.so  [.] plpgsql_param_eval_var
   3,33%  postgres    [.] SPI_plan_get_cached_plan
   3,28%  plpgsql.so  [.] exec_stmt
   3,18%  postgres    [.] ReleaseCachedPlan
   2,92%  postgres    [.] ResourceArrayRemove
   2,81%  plpgsql.so  [.] exec_assign_value
   2,74%  plpgsql.so  [.] exec_cast_value
   2,70%  plpgsql.so  [.] exec_eval_expr
   1,96%  postgres    [.] recomputeNamespacePath
   1,90%  plpgsql.so  [.] exec_eval_boolean
   1,82%  plpgsql.so  [.] exec_eval_cleanup
   1,72%  postgres    [.] ScanQueryForLocks
   1,68%  postgres    [.] CheckCachedPlan
   1,49%  postgres    [.] ResourceArrayAdd
   1,48%  plpgsql.so  [.] exec_assign_expr
   1,42%  postgres    [.] ResourceOwnerForgetPlanCacheRef
   1,24%  plpgsql.so  [.] exec_stmts
   1,23%  plpgsql.so  [.] exec_stmt_while
   1,03%  plpgsql.so  [.] assign_simple_var
   0,73%  postgres    [.] int84lt
   0,62%  postgres    [.] ResourceOwnerEnlargePlanCacheRefs
   0,54%  postgres    [.] int84pl
   0,49%  plpgsql.so  [.] setup_param_list
   0,45%  postgres    [.] ResourceArrayEnlarge
   0,44%  postgres    [.] choose_custom_plan
   0,39%  postgres    [.] ResourceOwnerRememberPlanCacheRef
   0,30%  plpgsql.so  [.] exec_stmt_assign
   0,26%  postgres    [.] GetUserId
   0,22%  plpgsql.so  [.] SPI_plan_get_cached_plan@plt

and profile of PostgreSQL 8.2

  13,63%  plpgsql.so  [.] exec_eval_simple_expr
   9,72%  postgres    [.] AllocSetAlloc
   7,84%  postgres    [.] ExecMakeFunctionResultNoSets
   6,20%  plpgsql.so  [.] exec_assign_value
   5,46%  postgres    [.] AllocSetReset
   4,79%  postgres    [.] ExecEvalParam
   4,53%  plpgsql.so  [.] exec_eval_datum
   4,40%  postgres    [.] MemoryContextAlloc
   3,51%  plpgsql.so  [.] exec_stmt
   3,01%  plpgsql.so  [.] exec_eval_expr
   2,76%  postgres    [.] int84pl
   2,11%  plpgsql.so  [.] exec_eval_cleanup
   1,77%  postgres    [.] datumCopy
   1,76%  postgres    [.] MemoryContextReset
   1,75%  libc-2.30.so    [.] __sigsetjmp
   1,64%  postgres    [.] int84lt
   1,47%  postgres    [.] pfree
   1,43%  plpgsql.so  [.] exec_simple_cast_value
   1,36%  plpgsql.so  [.] MemoryContextReset@plt
   1,28%  plpgsql.so  [.] exec_stmt_while
   1,25%  plpgsql.so  [.] exec_assign_expr
   1,22%  postgres    [.] check_stack_depth
   1,09%  plpgsql.so  [.] exec_eval_boolean
   1,06%  postgres    [.] AllocSetFree
   0,99%  plpgsql.so  [.] free_var
   0,93%  plpgsql.so  [.] exec_cast_value
   0,93%  plpgsql.so  [.] exec_stmts
   0,78%  libc-2.30.so    [.] __memmove_sse2_unaligned_erms
   0,72%  postgres    [.] datumGetSize
   0,62%  postgres

Re: [Proposal] Global temporary tables

2020-02-16 Thread 曾文旌(义从)


> On February 15, 2020 at 6:06 PM, Pavel Stehule wrote:
> 
> 
>> postgres=# insert into foo select generate_series(1,1);
>> INSERT 0 1
>> postgres=# \dt+ foo
>>   List of relations
>> ┌────────┬──────┬───────┬───────┬─────────────┬────────┬─────────────┐
>> │ Schema │ Name │ Type  │ Owner │ Persistence │  Size  │ Description │
>> ╞════════╪══════╪═══════╪═══════╪═════════════╪════════╪═════════════╡
>> │ public │ foo  │ table │ pavel │ session     │ 384 kB │             │
>> └────────┴──────┴───────┴───────┴─────────────┴────────┴─────────────┘
>> (1 row)
>> 
>> postgres=# truncate foo;
>> TRUNCATE TABLE
>> postgres=# \dt+ foo
>>   List of relations
>> ┌────────┬──────┬───────┬───────┬─────────────┬───────┬─────────────┐
>> │ Schema │ Name │ Type  │ Owner │ Persistence │ Size  │ Description │
>> ╞════════╪══════╪═══════╪═══════╪═════════════╪═══════╪═════════════╡
>> │ public │ foo  │ table │ pavel │ session     │ 16 kB │             │
>> └────────┴──────┴───────┴───────┴─────────────┴───────┴─────────────┘
>> (1 row)
>> 
>> I expect zero size after truncate.
> Thanks for review.
> 
> I can explain, I don't think it's a bug.
> The current implementation of the truncated GTT retains two blocks of FSM 
> pages.
> The same is true for truncating regular tables in subtransactions.
> This is an implementation that truncates the table without changing the 
> relfilenode of the table.
> 
> 
> This is not an extra important feature - but now it is a little bit of a surprise,
> because I was not inside a transaction.
> 
> Changing relfilenode, I think, is necessary, minimally for future VACUUM FULL 
> support.
Not allowing relfilenode changes is a current limitation.
I think it can be improved, but this is a bit complicated,
so I'd like to know the necessity of this improvement.
Could you give me more details?

> 
> Regards
> 
> Pavel Stehule
>  
> 
> Wenjing
> 
>> 
>> Regards
>> 
>> Pavel
>> 
>> 
>> 
>> Wenjing
>> 
>> 
>> 
>> 
>> > 
>> > -- 
>> > Robert Haas
>> > EnterpriseDB: http://www.enterprisedb.com 
>> > The Enterprise PostgreSQL Company
>> 
> 



Re: [Proposal] Global temporary tables

2020-02-16 Thread Pavel Stehule
On Sun, Feb 16, 2020 at 16:15, 曾文旌(义从) wrote:

>
>
> On February 15, 2020 at 6:06 PM, Pavel Stehule wrote:
>
>
> postgres=# insert into foo select generate_series(1,1);
>> INSERT 0 1
>> postgres=# \dt+ foo
>>   List of relations
>> ┌────────┬──────┬───────┬───────┬─────────────┬────────┬─────────────┐
>> │ Schema │ Name │ Type  │ Owner │ Persistence │  Size  │ Description │
>> ╞════════╪══════╪═══════╪═══════╪═════════════╪════════╪═════════════╡
>> │ public │ foo  │ table │ pavel │ session     │ 384 kB │             │
>> └────────┴──────┴───────┴───────┴─────────────┴────────┴─────────────┘
>> (1 row)
>>
>> postgres=# truncate foo;
>> TRUNCATE TABLE
>> postgres=# \dt+ foo
>>   List of relations
>> ┌────────┬──────┬───────┬───────┬─────────────┬───────┬─────────────┐
>> │ Schema │ Name │ Type  │ Owner │ Persistence │ Size  │ Description │
>> ╞════════╪══════╪═══════╪═══════╪═════════════╪═══════╪═════════════╡
>> │ public │ foo  │ table │ pavel │ session     │ 16 kB │             │
>> └────────┴──────┴───────┴───────┴─────────────┴───────┴─────────────┘
>> (1 row)
>>
>> I expect zero size after truncate.
>>
>> Thanks for review.
>>
>> I can explain, I don't think it's a bug.
>> The current implementation of the truncated GTT retains two blocks of FSM
>> pages.
>> The same is true for truncating regular tables in subtransactions.
>> This is an implementation that truncates the table without changing the
>> relfilenode of the table.
>>
>>
> This is not an extra important feature - but now it is a little bit of a surprise,
> because I was not inside a transaction.
>
> Changing relfilenode, I think, is necessary, minimally for future VACUUM
> FULL support.
>
> Not allowing relfilenode changes is a current limitation.
> I think it can be improved, but this is a bit complicated,
> so I'd like to know the necessity of this improvement.
> Could you give me more details?
>

I don't think GTT without support for VACUUM FULL can be accepted, just
for consistency's sake.
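
(For illustration, a sketch -- not run against the GTT patch -- of how the
relfilenode behaviour under discussion can be observed; pg_relation_filenode()
is a core function:)

SELECT pg_relation_filenode('foo');   -- note the current filenode
TRUNCATE foo;
SELECT pg_relation_filenode('foo');   -- the GTT implementation described above keeps it;
                                      -- a regular table would get a new filenode here
VACUUM FULL foo;
SELECT pg_relation_filenode('foo');   -- VACUUM FULL always rewrites into a new filenode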

Regards

Pavel


>
> Regards
>
> Pavel Stehule
>
>
>>
>> Wenjing
>>
>>
>> Regards
>>
>> Pavel
>>
>>

 Wenjing




 >
 > --
 > Robert Haas
 > EnterpriseDB: http://www.enterprisedb.com
 > The Enterprise PostgreSQL Company


>>
>


Re: plan cache overhead on plpgsql expression

2020-02-16 Thread Pavel Stehule
On Sun, Feb 16, 2020 at 15:12, Pavel Stehule wrote:

> Hi
>
> When I do some profiling of plpgsql, I am usually surprised by how significant
> the overhead of expression execution is. Any calculations are very slow.
>
> This is not a typical example of plpgsql, but it shows clearly where the
> overhead is
>
> CREATE OR REPLACE FUNCTION public.foo()
>  RETURNS void
>  LANGUAGE plpgsql
>  IMMUTABLE
> AS $function$
> declare i bigint = 0;
> begin
>   while i < 1
>   loop
> i := i + 1;
>   end loop;
> end;
> $function$
>
>
> It is interesting that the plan cache overhead is about 15%.
>
> The execution needs 32 sec on Postgres 13 and 27 sec on Postgres 8.2.
>

On the same computer, the same example in Perl needs only 7 sec.
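
(As a rough illustration of how such timings are taken -- this assumes the foo()
function from the first message, and that its loop bound, which is garbled
above, is large enough to run for tens of seconds:)

\timing on
SELECT public.foo();
-- psql then reports the elapsed time, e.g. roughly 32 seconds on the hardware
-- described above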

Regards

Pavel


> Regards
>
> Pavel
>
>


Re: explain HashAggregate to report bucket and memory stats

2020-02-16 Thread Justin Pryzby
Updated:

 . remove from explain analyze those tests which would display sort
   Memory/Disk.  Oops.
 . fix issue with the first patch showing zero "tuples" memory for some
   grouping sets.
 . reclassify memory as "tuples" if it has to do with "members".  So hashtable
   size is now redundant with nbuckets (if you know
   sizeof(TupleHashEntryData));

-- 
Justin
>From c989b75f820dbda0540b3d2cd092eaf1f8629baa Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sat, 15 Feb 2020 12:03:11 -0600
Subject: [PATCH v3 1/7] Run some existing tests with explain (ANALYZE)..

..in a separate, earlier patch, to better show what bits are added by later
patches for hashtable instrumentation.
---
 src/test/regress/expected/groupingsets.out| 87 ++-
 src/test/regress/expected/select_parallel.out | 20 +++---
 src/test/regress/expected/subselect.out   | 69 +
 src/test/regress/expected/union.out   | 43 ++---
 src/test/regress/sql/groupingsets.sql | 16 ++---
 src/test/regress/sql/select_parallel.sql  |  4 +-
 src/test/regress/sql/subselect.sql| 25 
 src/test/regress/sql/union.sql|  4 +-
 8 files changed, 184 insertions(+), 84 deletions(-)

diff --git a/src/test/regress/expected/groupingsets.out b/src/test/regress/expected/groupingsets.out
index c1f802c..c052f7e 100644
--- a/src/test/regress/expected/groupingsets.out
+++ b/src/test/regress/expected/groupingsets.out
@@ -458,16 +458,17 @@ ERROR:  aggregate functions are not allowed in FROM clause of their own query le
 LINE 3:lateral (select a, b, sum(v.x) from gstest_data(v.x) ...
  ^
 -- min max optimization should still work with GROUP BY ()
-explain (costs off)
+explain (costs off, timing off, summary off, analyze)
   select min(unique1) from tenk1 GROUP BY ();
- QUERY PLAN 
-
- Result
+ QUERY PLAN 
+
+ Result (actual rows=1 loops=1)
InitPlan 1 (returns $0)
- ->  Limit
-   ->  Index Only Scan using tenk1_unique1 on tenk1
+ ->  Limit (actual rows=1 loops=1)
+   ->  Index Only Scan using tenk1_unique1 on tenk1 (actual rows=1 loops=1)
  Index Cond: (unique1 IS NOT NULL)
-(5 rows)
+ Heap Fetches: 0
+(6 rows)
 
 -- Views with GROUPING SET queries
 CREATE VIEW gstest_view AS select a, b, grouping(a,b), sum(c), count(*), max(c)
@@ -1126,14 +1127,14 @@ select a, b, sum(v), count(*) from gstest_empty group by grouping sets ((a,b),a)
 ---+---+-+---
 (0 rows)
 
-explain (costs off)
+explain (costs off, timing off, summary off, analyze)
   select a, b, sum(v), count(*) from gstest_empty group by grouping sets ((a,b),a);
-   QUERY PLAN   
-
- HashAggregate
+   QUERY PLAN   
+
+ HashAggregate (actual rows=0 loops=1)
Hash Key: a, b
Hash Key: a
-   ->  Seq Scan on gstest_empty
+   ->  Seq Scan on gstest_empty (actual rows=0 loops=1)
 (4 rows)
 
 select a, b, sum(v), count(*) from gstest_empty group by grouping sets ((a,b),());
@@ -1150,16 +1151,16 @@ select a, b, sum(v), count(*) from gstest_empty group by grouping sets ((a,b),()
|   | | 0
 (3 rows)
 
-explain (costs off)
+explain (costs off, timing off, summary off, analyze)
   select a, b, sum(v), count(*) from gstest_empty group by grouping sets ((a,b),(),(),());
-   QUERY PLAN   
-
- MixedAggregate
+   QUERY PLAN   
+
+ MixedAggregate (actual rows=3 loops=1)
Hash Key: a, b
Group Key: ()
Group Key: ()
Group Key: ()
-   ->  Seq Scan on gstest_empty
+   ->  Seq Scan on gstest_empty (actual rows=0 loops=1)
 (6 rows)
 
 select sum(v), count(*) from gstest_empty group by grouping sets ((),(),());
@@ -1170,15 +1171,15 @@ select sum(v), count(*) from gstest_empty group by grouping sets ((),(),());
  | 0
 (3 rows)
 
-explain (costs off)
+explain (costs off, timing off, summary off, analyze)
   select sum(v), count(*) from gstest_empty group by grouping sets ((),(),());
-   QUERY PLAN   
-
- Aggregate
+   QUERY PLAN   
+
+ Aggregate (actual rows=3 loops=1)
Group Key: ()
Group Key: ()
Group Key: ()
-   ->  Seq Scan on gstest_empty
+   ->  Seq Scan on gstest_empty (actual rows=0 loops=1)
 (5 rows)
 
 -- check that functionally dependent cols are not nulled
@@ -1193,16 +1194,16 @@ select a,

Re: jsonb_object() seems to be buggy. jsonb_build_object() is good.

2020-02-16 Thread Bryn Llewellyn
Bryn Llewellyn wrote:

> ...I wrote my own wrapper for jsonb_build_array()
> and jsonb_build_object():
> 
> create function my_jsonb_build(
>   kind in varchar,
>   variadic_elements in varchar)
>   returns jsonb
>   immutable
>   language plpgsql
> as $body$
> declare
>   stmt varchar :=
> case kind
>  when 'array' then
>'select jsonb_build_array('||variadic_elements||')'
>  when 'object' then
>'select jsonb_build_object('||variadic_elements||')'
> end;
>   j jsonb;
> begin
>   execute stmt into j;
>   return j;
> end;
> $body$;
> 

Andrew replied

Please don't top-post on PostgreSQL lists.  See


The function above has many deficiencies, including lack of error
checking and use of 'execute' which will significantly affect
performance. Still, if it works for you, that's your affair.

These functions were written to accommodate PostgreSQL limitations. We
don't have a heterogenous array type.  So json_object() will return an
object where all the values are strings, even if they look like numbers,
booleans etc. And indeed, this is shown in the documented examples.
jsonb_build_object and jsonb_build_array overcome that issue, but there
the PostgreSQL limitation is that you can't pass in an actual array as
the variadic element, again because we don't have heterogenous arrays.
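
(As an aside, a short illustration of the difference described above; both
functions are in core PostgreSQL:)

SELECT jsonb_object('{a, 17, b, true}'::text[]);
-- {"a": "17", "b": "true"}        every value comes back as a string

SELECT jsonb_build_object('a', 17, 'b', true);
-- {"a": 17, "b": true}            number and boolean types are preserved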

Bryn replies:

Ah… I didn’t know about the bottom-posting rule.

Of course I didn’t show error handling. Doing so would have increased the 
source text size and made it harder to appreciate the point.

I used dynamic SQL because I was modeling the use case where on-the-fly 
analysis determines what JSON object or array must be built—i.e. the number of 
components and the datatype of each. It’s nice that jsonb_build_object() and 
jsonb_build_array() accommodate this dynamic need by being variadic. But I 
can’t see a way to write the invocation using only static code.

What am I missing?



reindex concurrently and two toast indexes

2020-02-16 Thread Justin Pryzby
Forking old, long thread:
https://www.postgresql.org/message-id/36712441546604286%40sas1-890ba5c2334a.qloud-c.yandex.net
On Fri, Jan 04, 2019 at 03:18:06PM +0300, Sergei Kornilov wrote:
> About reindex invalid indexes - i found one good question in archives [1]: 
> how about toast indexes?
> I checked it now; I am able to drop an invalid toast index, but I can not drop
> the redundant valid index.
> Reproduce:
> session 1: begin; select from test_toast ... for update;
> session 2: reindex table CONCURRENTLY test_toast ;
> session 2: interrupt by ctrl+C
> session 1: commit
> session 2: reindex table test_toast ;
> and now we have two toast indexes. DROP INDEX is able to remove only invalid 
> ones. Valid index gives "ERROR:  permission denied: 
> "pg_toast_16426_index_ccnew" is a system catalog"
> [1]: 
> https://www.postgresql.org/message-id/CAB7nPqT%2B6igqbUb59y04NEgHoBeUGYteuUr89AKnLTFNdB8Hyw%40mail.gmail.com

It looks like this was never addressed.

I noticed a ccnew toast index sitting around since October - what do I do with 
it ?

ts=# DROP INDEX pg_toast.pg_toast_463881620_index_ccnew;
ERROR:  permission denied: "pg_toast_463881620_index_ccnew" is a system catalog
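
(One possible workaround, sketched here but untested, assuming superuser access
and PostgreSQL 12 or later, where allow_system_table_mods can be set per
session; whether this is actually safe for a leftover ccnew toast index is
exactly the open question:)

SET allow_system_table_mods = on;
DROP INDEX pg_toast.pg_toast_463881620_index_ccnew;
RESET allow_system_table_mods;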

-- 
Justin




Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2020-02-16 Thread Andres Freund
Hi,

On 2020-02-14 13:34:03 -0500, Robert Haas wrote:
> On Fri, Feb 14, 2020 at 1:07 PM Andres Freund  wrote:
> > Yea, that seems possible.  I'm not really sure it's needed however? As
> > long as you're not teaching the locking mechanism new tricks that
> > influence the wait graph, why would the deadlock detector care? That's
> > quite different from the group locking case, where you explicitly needed
> > to teach it something fairly fundamental.
> 
> Well, you have to teach it that locks of certain types conflict even
> if they are in the same group, and that bleeds over pretty quickly
> into the whole area of deadlock detection, because lock waits are the
> edges in the graph that the deadlock detector processes.

Shouldn't this *theoretically* be doable with changes mostly localized to
lock.c, by not using proc->lockGroupLeader but proc for lock types that
don't support group locking? I do see that deadlock.c largely looks at
->lockGroupLeader, but that kind of doesn't seem right to me.


> > It might still be a good idea independently to add the rule & enforce
> > that acquiring heavyweight locks while holding certain classes of locks is
> > not allowed.
> 
> I think that's absolutely essential, if we're going to continue using
> the main lock manager for this. I remain somewhat unconvinced that
> doing so is the best way forward, but it is *a* way forward.

Seems like we should build this part independently of the lock.c/new
infra piece.


> > Right. For me that's *the* fundamental service that lock.c delivers. And
> > it's the fundamental bit this thread so far largely has been focusing
> > on.
> 
> For me, the deadlock detection is the far more complicated and problematic 
> bit.
> 
> > Isn't that mostly true to varying degrees for the majority of lock types
> > in lock.c? Sure, perhaps historically that's a misuse of lock.c, but
> > it's been pretty ingrained by now.  I just don't see where leaving out
> > any of these features is going to give us fundamental advantages
> > justifying a different locking infrastructure.
> 
> I think the group locking + deadlock detection things are more
> fundamental than you might be crediting, but I agree that having
> parallel mechanisms has its own set of pitfalls.

It's possible. But I'm also hesitant to believe that we'll not need
other lock types that conflict between leader/worker, but that still
need deadlock detection.  The more work we want to parallelize, the more
likely that imo will become.

Greetings,

Andres Freund




Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2020-02-16 Thread Tom Lane
Andres Freund  writes:
> On 2020-02-14 13:34:03 -0500, Robert Haas wrote:
>> I think the group locking + deadlock detection things are more
>> fundamental than you might be crediting, but I agree that having
>> parallel mechanisms has its own set of pitfalls.

> It's possible. But I'm also hesitant to believe that we'll not need
> other lock types that conflict between leader/worker, but that still
> need deadlock detection.  The more work we want to parallelize, the more
> likely that imo will become.

Yeah.  The concept that leader and workers can't conflict seems to me
to be dependent, in a very fundamental way, on the assumption that
we only need to parallelize read-only workloads.  I don't think that's
going to have a long half-life.

regards, tom lane




Re: jsonb_object() seems to be buggy. jsonb_build_object() is good.

2020-02-16 Thread Andrew Dunstan


On 2/16/20 1:40 PM, Bryn Llewellyn wrote:
>
> Andrew replied
>
> The function above has many deficiencies, including lack of error
> checking and use of 'execute' which will significantly affect
> performance. Still, if it works for you, that's your affair.
>
> These functions were written to accommodate PostgreSQL limitations. We
> don't have a heterogenous array type.  So json_object() will return an
> object where all the values are strings, even if they look like numbers,
> booleans etc. And indeed, this is shown in the documented examples.
> jsonb_build_object and jsonb_build_array overcome that issue, but there
> the PostgreSQL limitation is that you can't pass in an actual array as
> the variadic element, again because we don't have heterogenous arrays.
>
> Bryn replies:
>
>
> Of course I didn’t show error handling. Doing so would have increased the 
> source text size and made it harder to appreciate the point.
>
> I used dynamic SQL because I was modeling the use case where on-the-fly 
> analysis determines what JSON object or array must be built—i.e. the number 
> of components and the datatype of each. It’s nice that jsonb_build_object() 
> and jsonb_build_array() accommodate this dynamic need by being variadic. But 
> I can’t see a way to write the invocation using only static code.
>
> What am I missing?



Probably not much. These functions work best from application code which
builds up the query. But if you do that and then call a function which
in turn calls execute, you get a double whammy of interpreter overhead.
I'm also not a fan of functions that in effect take bits of SQL text and
interpolate them into a query in plpgsql, like your query does.


json_object() is meant to be an analog of the hstore() function that
takes one or two text arrays and returns an hstore. Of course, it doesn't
have the issue you complained about, since all values in an hstore are
strings.


cheers


andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





1 Status of vertical clustered index - 2 Join using (fk_constraint) suggestion - 3 Status of pgsql's parser autonomization

2020-02-16 Thread maxzor

Hello,

1. I was told that M$ SQLServer provides huge performance deltas over 
PostgreSQL when dealing with index-unaligned queries :

create index i on t (a,b, c);
select * from t where b=... and c=...;
Columnar storage has been tried by various companies, CitusData, 
EnterpriseDB, 2ndQuadrant, Fujitsu Enterprise Postgres. It has been 
discussed quite a lot, last thread that I was able to find being in 
2017, 
https://www.postgresql.org/message-id/CAJrrPGfaC7WC9NK6PTTy6YN-NN%2BhCy8xOLAh2doYhVg5d6HsAA%40mail.gmail.com 
where Fujitsu's patch made it quite far.

What is the status on such a storage manager extension interface ?

2. What do you think of adding a new syntax : 'from t join t2 using 
(fk_constraint)' ? And further graph algorithms to make automatic joins ?
Both 'natural join' and 'using (column_name)' are useless when the 
columns are not the same in source and destination.
Plus it is often the case that the fk_constraints are over numerous
columns, even though this is usually advised against. When that is the case,
such a syntax would make queries significantly quicker to write.
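
(For concreteness, a sketch of the idea; the tables and the constraint name are
hypothetical, and the proposed USING (fk_constraint) form does not exist in
PostgreSQL today:)

-- today, a multi-column foreign key has to be spelled out in the join:
SELECT *
  FROM order_line ol
  JOIN orders o ON o.order_id = ol.order_id
               AND o.company_id = ol.company_id;

-- proposed:
SELECT *
  FROM order_line
  JOIN orders USING (order_line_orders_fkey);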
I have been bothered by this to the point that I developed a 
graphical-query-builder plugin for pgModeler,
https://github.com/maxzor/plugins/tree/master/graphicalquerybuilder#automatic-join-mode 
,

but I believe such a syntax would be much better in the core!

3. What is the status of making the internal parser of PostgreSQL less 
coupled to the core, and easier to cherry-pick from outside?
It would be great to incorporate it into companion projects : pgAdmin4, 
pgModeler, pgFormatter...


BR, Maxime Chambonnet



Re: pgindent && weirdness

2020-02-16 Thread Thomas Munro
On Fri, Jan 17, 2020 at 3:58 PM Tom Lane  wrote:
> Alvaro Herrera  writes:
> > On 2020-Jan-16, Tom Lane wrote:
> >> ... But I was hoping to
> >> hear Piotr's opinion before moving forward.

Me too.

Thinking about this again: It's obviously not true that everything
that looks like a function call is not a cast.  You could have
"my_cast(Type)" that expands to "(Type)" or some slightly more useful
variant of that, and then "my_cast(Type) -1" would, with this patch
applied, be reformatted as "my_cast(Type) - 1" because it'd err on the
side of thinking that the expression produces a value and therefore
the minus sign must be a binary operator that needs whitespace on both
sides, and that'd be wrong.  However, it seems to me that macros that
expand to raw cast syntax (and I mean just "(Type)", not a complete
cast including the value to be cast, like "((Type) (x))") must be rare
and unusual things, and I think it's better to err on the side of
thinking that function-like macros are values, not casts.  That's all
the change does, and fortunately the authors of indent showed how to
do that with their existing special cases for offsetof and sizeof; I'm
just extending that treatment to any identifier.

Is there some other case I'm not thinking of that is confused by the
change?  I'm sure you could contrive something it screws up, but my
question is about real code that people would actually write.  Piotr,
is there an easy way to reindent some large non-PostgreSQL body of
code that uses a cousin of this code to see if it gets better or worse
with the change?




Re: jsonb_object() seems to be buggy. jsonb_build_object() is good.

2020-02-16 Thread Bryn Llewellyn
Andrew Dunstan  wrote:

Bryn Llewellyn wrote:
> 
> Andrew replied
> 
> The function above has many deficiencies, including lack of error
> checking and use of 'execute' which will significantly affect
> performance. Still, if it works for you, that's your affair.
> 
> These functions were written to accommodate PostgreSQL limitations. We
> don't have a heterogenous array type.  So json_object() will return an
> object where all the values are strings, even if they look like numbers,
> booleans etc. And indeed, this is shown in the documented examples.
> jsonb_build_object and jsonb_build_array overcome that issue, but there
> the PostgreSQL limitation is that you can't pass in an actual array as
> the variadic element, again because we don't have heterogenous arrays.
> 
> Bryn replies:
> 
> 
> Of course I didn’t show error handling. Doing so would have increased the 
> source text size and made it harder to appreciate the point.
> 
> I used dynamic SQL because I was modeling the use case where on-the-fly 
> analysis determines what JSON object or array must be built—i.e. the number 
> of components and the datatype of each. It’s nice that jsonb_build_object() 
> and jsonb_build_array() accommodate this dynamic need by being variadic. But 
> I can’t see a way to write the invocation using only static code.
> 
> What am I missing?



Probably not much. These functions work best from application code which
builds up the query. But if you do that and then call a function which
in turn calls execute, you get a double whammy of interpreter overhead.
I'm also not a fan of functions that in effect take bits of SQL text and
interpolate them into a query in plpgsql, like your query does.


json_object() is meant to be an analog of the hstore() function that
takes one or two text arrays and returns an hstore. Of course, it doesn't
have the issue you complained about, since all values in an hstore are
strings.

Bryn replied:

We don’t yet support the hstore() function in YugabyteDB. So, meanwhile, I see 
no alternative to the approach that I illustrated—whatever that implies for 
doing things of which you’re not a fan. That’s why I asked “What am I 
missing?”. But your “Probably not much” seems, then, to force my hand.

B.t.w., you earlier said “The double quotes  [around “dog”] serve a specific 
purpose, to allow values containing commas to be treated as a single value (see 
syntax details for the exact rules) in the resulting array of text values.” But 
this test shows that they are not needed for that purpose:

select jsonb_pretty(jsonb_object(
  '{a, 17, b, dog house, c, true}'::varchar[]
  ))

This is the result:

 {                    +
     "a": "17",       +
     "b": "dog house",+
     "c": "true"      +
 }

The commas are sufficient separators.

It seems to me, therefore, that writing the double quotes gives the wrong 
message: they make it look like you are indeed specifying a text value rather 
than a numeric or integer value. But we know that the double quotes do *not* 
achieve this.







Re: jsonb_object() seems to be buggy. jsonb_build_object() is good.

2020-02-16 Thread Andrew Dunstan


On 2/16/20 7:25 PM, Bryn Llewellyn wrote:
>
> B.t.w., you earlier said “The double quotes  [around “dog”] serve a specific 
> purpose, to allow values containing commas to be treated as a single value 
> (see syntax details for the exact rules) in the resulting array of text 
> values.” But this test shows that they are not needed for that purpose:


I didn't say that. Someone else did.


>
> select jsonb_pretty(jsonb_object(
>   '{a, 17, b, dog house, c, true}'::varchar[]
>   ))
>
> This is the result:
>
>  {                    +
>      "a": "17",       +
>      "b": "dog house",+
>      "c": "true"      +
>  }
>
> The commas are sufficient separators.
>
> It seems to me, therefore, that writing the double quotes gives the wrong 
> message: they make it look like you are indeed specifying a text value rather 
> than a numeric or integer value. But we know that the double quotes do *not* 
> achieve this.
>


No, you haven't understood what they said. If the field value contains a
comma it needs to be quoted. But none of the fields in your example do.
If your field were "dog,house" instead of "dog house" it would need to
be quoted. This had nothing to do with json, BTW, it's simply from the
rules for array literals.
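
(A small illustration of that rule:)

SELECT ('{dog house}'::text[])[1];    -- 'dog house'  (no comma, so no quotes needed)
SELECT ('{"dog,house"}'::text[])[1];  -- 'dog,house'  (comma inside, so quotes are required)
SELECT ('{dog,house}'::text[])[1];    -- 'dog'        (an unquoted comma splits the element)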


cheers


andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: 1 Status of vertical clustered index - 2 Join using (fk_constraint) suggestion - 3 Status of pgsql's parser autonomization

2020-02-16 Thread Tomas Vondra

On Sun, Feb 16, 2020 at 10:38:29PM +0100, maxzor wrote:

Hello,

1. I was told that M$ SQLServer provides huge performance deltas over 
PostgreSQL when dealing with index-unaligned queries :

create index i on t (a,b, c);
select * from t where b=... and c=...;


Perhaps index-only scans might help here, but that generally does not
work for "SELECT *" queries.

Columnar storage has been tried by various companies, CitusData, 
EnterpriseDB, 2ndQuadrant, Fujitsu Enterprise Postgres. It has been 
discussed quite a lot, last thread that I was able to find being in 
2017, https://www.postgresql.org/message-id/CAJrrPGfaC7WC9NK6PTTy6YN-NN%2BhCy8xOLAh2doYhVg5d6HsAA%40mail.gmail.com 
where Fujitsu's patch made it quite far.

What is the status on such a storage manager extension interface ?



I think you're looking for threads about zheap and (especially)
zedstore. Those are two "storage manager" implementations various people
are currently working on. Neither of those is likely to make it into
pg13, though :-(

2. What do you think of adding a new syntax : 'from t join t2 using 
(fk_constraint)' ? And further graph algorithms to make automatic 
joins ?
Both 'natural join' and 'using (column_name)' are useless when the 
columns are not the same in source and destination.
Plus it is often the case that the fk_constraints are over numerous 
columns, even though this is usually advised against. But when this 
case happens there will be a significant writing speedup.


I'm not really sure what's the point / benefit here. Initially it seemed
you simply propose a syntax saying "do a join using the columns in the
FK constraint" but it's unclear to me how this implies any writing
speedup? 

I have been bothered by this to the point that I developed a 
graphical-query-builder plugin for pgModeler,
https://github.com/maxzor/plugins/tree/master/graphicalquerybuilder#automatic-join-mode 
,

but I believe such a syntax would be much better in the core!



Hm, maybe.

3. What is the status of making the internal parser of PostgreSQL less 
coupled to the core, and easier to cherry-pick from outside?
It would be great to incorporate it into companion projects : 
pgAdmin4, pgModeler, pgFormatter...




I have no idea what you mean by "less coupled" here. What are the
requirements / use cases you're thinking about?


FWIW I think it's pretty bad idea to post questions about three very
different topics into a single pgsql-hackers thread. That'll just lead
to a lot of confusion.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: error context for vacuum to include block number

2020-02-16 Thread Masahiko Sawada
On Sat, 15 Feb 2020 at 00:34, Justin Pryzby  wrote:
>
> On Fri, Feb 14, 2020 at 12:30:25PM +0900, Masahiko Sawada wrote:
> > * I think the function name is too generic. init_vacuum_error_callback
> > or init_vacuum_errcallback is better.
>
> > * The comment of this function is not accurate since this function is
> > not only for heap vacuum but also index vacuum. How about just
> > "Initialize vacuum error callback"?
>
> > * I think it's easier to read the code if we set the relname and
> > indname in the same order.
>
> > * The comment I wrote in the previous mail seems better, because in
> > this function the reader might get confused that 'rel' is a relation
> > or an index depending on the phase but that comment helps it.
>
> Fixed these
>
> > * rel->rd_index->indexrelid should be rel->rd_index->indrelid.
>
> Ack.  I think that's been wrong since I first wrote it two weeks ago :(
> The error is probably more obvious due to the switch statement you proposed.
>
> Thanks for continued reviews.

Thank you for updating the patch!

1.
+   /* Setup error traceback support for ereport() */
+   init_vacuum_error_callback(&errcallback, &errcbarg, onerel,
PROGRESS_VACUUM_PHASE_SCAN_HEAP);

+   /*
+* Setup error traceback support for ereport()
+*/
+   init_vacuum_error_callback(&errcallback, &errcbarg, onerel,
PROGRESS_VACUUM_PHASE_VACUUM_HEAP);

+   /* Setup error traceback support for ereport() */
+   init_vacuum_error_callback(&errcallback, &errcbarg, indrel,
PROGRESS_VACUUM_PHASE_VACUUM_INDEX);

+   /* Setup error traceback support for ereport() */
+   init_vacuum_error_callback(&errcallback, &errcbarg, indrel,
PROGRESS_VACUUM_PHASE_INDEX_CLEANUP);

+/* Initialize vacuum error callback */
+static void
+init_vacuum_error_callback(ErrorContextCallback *errcallback,
vacuum_error_callback_arg *errcbarg, Relation rel, int phase)

The above lines need a new line.

2.
static void
lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats)
{
    int         tupindex;
    int         npages;
    PGRUsage    ru0;
    Buffer      vmbuffer = InvalidBuffer;
    ErrorContextCallback errcallback;
    vacuum_error_callback_arg errcbarg;

    /* Report that we are now vacuuming the heap */
    pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
                                 PROGRESS_VACUUM_PHASE_VACUUM_HEAP);

    /*
     * Setup error traceback support for ereport()
     */
    init_vacuum_error_callback(&errcallback, &errcbarg, onerel,
                               PROGRESS_VACUUM_PHASE_VACUUM_HEAP);
    error_context_stack = &errcallback;

    pg_rusage_init(&ru0);
    npages = 0;
    :

In lazy_vacuum_heap, we set the error context and then call
pg_rusage_init, whereas lazy_vacuum_index and lazy_cleanup_index do
the opposite. And lazy_scan_heap also calls pg_rusage_init first. I think
lazy_vacuum_heap should follow them for consistency. That is, we can
set the error context after npages = 0.

3.
We have 2 other phases: PROGRESS_VACUUM_PHASE_TRUNCATE and
PROGRESS_VACUUM_PHASE_FINAL_CLEANUP. I think it's better to set the
error context in lazy_truncate_heap as well. What do you think?

I'm not sure it's worth setting the error context for FINAL_CLEANUP, but
we should add the case of FINAL_CLEANUP to vacuum_error_callback as a
no-op, or explain it in a comment even if we don't.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: 1 Status of vertical clustered index - 2 Join using (fk_constraint) suggestion - 3 Status of pgsql's parser autonomization

2020-02-16 Thread maxzor




...

Thank you will look into it!

I'm not really sure what's the point / benefit here. Initially it seemed
you simply propose a syntax saying "do a join using the columns in the
FK constraint" but it's unclear to me how this implies any writing
speedup?
This is exactly what I mean. If you know the fk_constraint (usually
there are simple patterns) you are all set; otherwise you could use a
function fk(t, t2) to look up pg_constraint, or even better (or more bloat),
have psql do autocompletion for you. A corner case is multiple FKs between t
and t2.

'from t join t2 using(fk(t,t2))'

I have no idea what you mean by "less coupled" here. What are the
requirements / use cases you're thinking about?
A lot of external tools do query parsing or validation; I wish they
could use the official parser as a dependency. AFAIK this is currently not
the case, and everyone is re-implementing their own subpar parser.

FWIW I think it's pretty bad idea to post questions about three very
different topics into a single pgsql-hackers thread. That'll just lead
to a lot of confusion.


Right... I figured as a newcomer I would not spam the mailing list.
Best regards





Flexible pglz_stategy values and delete const.

2020-02-16 Thread Moon, Insung
Dear Hackers.

In current PostgreSQL, TOAST data and some WAL data (page
data) are compressed using an LZ-based algorithm.
Two strategies (PGLZ_strategy_default and
PGLZ_strategy_always) are provided by default,
and only PGLZ_strategy_default is used by the PostgreSQL core.

With the PGLZ_strategy_default values, compression:
1. must reduce the size of the data by 25% or more
2. is attempted only for original data between 32 bytes and INT_MAX bytes
3. gives up if there is no compression success within the first 1 kB
   (matches added to the history and reused)
PGLZ compression will only succeed if these conditions are met.
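
(A rough illustration of what these defaults mean in practice, using
pg_column_size(); the exact numbers will vary:)

CREATE TEMP TABLE pglz_demo (v text);
INSERT INTO pglz_demo VALUES (repeat('x', 10000));   -- highly compressible
INSERT INTO pglz_demo
  SELECT string_agg(md5(i::text), '') FROM generate_series(1, 300) i;  -- hash text, barely compressible
SELECT length(v) AS chars, pg_column_size(v) AS stored_bytes FROM pglz_demo;
-- the repetitive value is stored in a few hundred bytes, while the hash text
-- may fail the 25% saving rule and be stored (nearly) uncompressed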

However, some users may want compression under slightly looser conditions.
So how about making the compression strategy configurable rather than a
fixed setting, so that the user can adjust it?

Perhaps the configurable values would be:
min_input_size
min_comp_rate
first_success_by
I want to add these as GUCs so that the minimum compressible size and
the required compression rate can be set.

The compression strategy is applied only when compressing.
Decompression does not use the strategy, so old compressed data is not
affected by the new patch.

What do you think of this proposal?
If anyone agrees with it, I want to write patches.




Re: Conflict handling for COPY FROM

2020-02-16 Thread Tatsuo Ishii
> In your patch for copy.sgml:
> 
> ERROR_LIMIT 'limit_number'
> 
> I think this should be:
> 
> ERROR_LIMIT limit_number
> 
> (no single quote)

More comments:

- I think the document should state that if limit_number = 0, all
  errors are immediately raised (this behaves the same as the current behavior
  without the patch).

- "constraint violating rows will be returned back to the caller."
  This does explains the current implementation. I am not sure if it's
  intended or not though:

cat /tmp/a
1   1
2   2
3   3
3   4

psql test
$ psql test
psql (13devel)
Type "help" for help.

test=# select * from t1;
 i | j 
---+---
 1 | 1
 2 | 2
 3 | 3
(3 rows)

test=# copy t1 from '/tmp/a' with (error_limit 1);
ERROR:  duplicate key value violates unique constraint "t1_pkey"
DETAIL:  Key (i)=(2) already exists.
CONTEXT:  COPY t1, line 2: "2   2"

So if the number of errors raised exceeds error_limit, no constraint-violating
rows (in this case i=1, j=1) are returned.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: jsonb_object() seems to be buggy. jsonb_build_object() is good.

2020-02-16 Thread Bryn Llewellyn
On 16-Feb-2020, at 16:40, Andrew Dunstan  wrote:

On 2/16/20 7:25 PM, Bryn Llewellyn wrote:
> 
> B.t.w., you earlier said “The double quotes  [around “dog”] serve a specific 
> purpose, to allow values containing commas to be treated as a single value 
> (see syntax details for the exact rules) in the resulting array of text 
> values.” But this test shows that they are not needed for that purpose:


I didn't say that. Someone else did.


> 
> select jsonb_pretty(jsonb_object(
>  '{a, 17, b, dog house, c, true}'::varchar[]
>  ))
> 
> This is the result:
> 
> {                    +
>     "a": "17",       +
>     "b": "dog house",+
>     "c": "true"      +
> }
> 
> The commas are sufficient separators.
> 
> It seems to me, therefore, that writing the double quotes gives the wrong 
> message: they make it look like you are indeed specifying a text value rather 
> than a numeric or integer value. But we know that the double quotes do *not* 
> achieve this.
> 


No, you haven't understood what they said. If the field value contains a
comma it needs to be quoted. But none of the fields in your example do.
If your field were "dog,house" instead of "dog house" it would need to
be quoted. This had nothing to do with json, BTW, it's simply from the
rules for array literals.

Bryn replied:

Got it! Thanks for helping me out, Andrew.



Re: small improvement of the elapsed time for truncating heap in vacuum

2020-02-16 Thread Kasahara Tatsuhito
Hi,

On Fri, Feb 14, 2020 at 4:50 PM Fujii Masao  wrote:
> Regarding the patch, isn't it better to put pg_rusage_init() at the
> top of do loop block? If we do this, as a side-effect, we can get
> rid of pg_rusage_init() at the top of lazy_truncate_heap().
Thanks for your reply.
Yeah, it makes sense.

Attached patch moves pg_rusage_init() to the top of do-loop-block.

Best regards,
-- 
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com


reset_usage_v2.patch
Description: Binary data


Re: error context for vacuum to include block number

2020-02-16 Thread Justin Pryzby
On Mon, Feb 17, 2020 at 10:47:47AM +0900, Masahiko Sawada wrote:
> Thank you for updating the patch!
> 
> 1.
> The above lines need a new line.

Done, thanks.

> 2.
> In lazy_vacuum_heap, we set the error context and then call
> pg_rusage_init, whereas lazy_vacuum_index and lazy_cleanup_index do
> the opposite. And lazy_scan_heap also calls pg_rusage_init first. I think
> lazy_vacuum_heap should follow them for consistency. That is, we can
> set the error context after npages = 0.

Right. Maybe I did it the other way because the two uses of
PROGRESS_VACUUM_PHASE_VACUUM_HEAP were right next to each other.

> 3.
> We have 2 other phases: PROGRESS_VACUUM_PHASE_TRUNCATE and
> PROGRESS_VACUUM_PHASE_FINAL_CLEANUP. I think it's better to set the
> error context in lazy_truncate_heap as well. What do you think?
> 
> I'm not sure it's worth setting the error context for FINAL_CLEANUP, but
> we should add the case of FINAL_CLEANUP to vacuum_error_callback as a
> no-op, or explain it in a comment even if we don't.

I don't have strong feelings either way.

I looked a bit at the truncation phase.  It also truncates the FSM and VM
forks, which could be misleading if the error was in one of those files and not
the main filenode.

I'd have to find a way to test it... 
...and was pleasantly surprised to see that earlier phases don't choke if I
(re)create a fake 4GB table like:

postgres=# CREATE TABLE trunc(i int);
CREATE TABLE
postgres=# \x\t
Expanded display is on.
Tuples only is on.
postgres=# SELECT relfilenode FROM pg_class WHERE oid='trunc'::regclass;
relfilenode | 59068

postgres=# \! bash -xc 'truncate -s 1G 
./pgsql13.dat111/base/12689/59068{,.{1..3}}'
+ truncate -s 1G ./pgsql13.dat111/base/12689/59074 
./pgsql13.dat111/base/12689/59074.1 ./pgsql13.dat111/base/12689/59074.2 
./pgsql13.dat111/base/12689/59074.3

postgres=# \timing 
Timing is on.
postgres=# SET client_min_messages=debug; SET statement_timeout='13s'; VACUUM 
VERBOSE trunc;
INFO:  vacuuming "public.trunc"
INFO:  "trunc": found 0 removable, 0 nonremovable row versions in 524288 out of 
524288 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 2098
There were 0 unused item identifiers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
524288 pages are entirely empty.
CPU: user: 5.00 s, system: 1.50 s, elapsed: 6.52 s.
ERROR:  canceling statement due to statement timeout
CONTEXT:  while truncating relation "public.trunc" to 0 blocks

The callback surrounding RelationTruncate() seems hard to hit unless you add
CHECK_FOR_INTERRUPTS(); the same was true for index cleanup.

The truncation uses a prefetch, which is more likely to hit any low-level error,
so I added a callback there, too.

BTW, for the index cases, I didn't like repeating the namespace here, but WDYT ?
|CONTEXT:  while vacuuming index "public.t_i_idx3" of relation "t"

Thanks for rerere-reviewing.

-- 
Justin
>From 8295224470e0ce236025dfbff50de052978fae1d Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 12 Dec 2019 20:54:37 -0600
Subject: [PATCH v19 1/2] vacuum errcontext to show block being processed

Discussion:
https://www.postgresql.org/message-id/20191120210600.gc30...@telsasoft.com
---
 src/backend/access/heap/vacuumlazy.c | 130 +++
 1 file changed, 130 insertions(+)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index a23cdef..5e734ee 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -292,6 +292,14 @@ typedef struct LVRelStats
 	bool		lock_waiter_detected;
 } LVRelStats;
 
+typedef struct
+{
+	char 		*relnamespace;
+	char		*relname;
+	char 		*indname;
+	BlockNumber blkno;	/* used only for heap operations */
+	int			phase;	/* Reusing same enums as for progress reporting */
+} vacuum_error_callback_arg;
 
 /* A few variables that don't seem worth passing around as parameters */
 static int	elevel = -1;
@@ -361,6 +369,9 @@ static void end_parallel_vacuum(Relation *Irel, IndexBulkDeleteResult **stats,
 LVParallelState *lps, int nindexes);
 static LVSharedIndStats *get_indstats(LVShared *lvshared, int n);
 static bool skip_parallel_vacuum_index(Relation indrel, LVShared *lvshared);
+static void vacuum_error_callback(void *arg);
+static void init_vacuum_error_callback(ErrorContextCallback *errcallback,
+		vacuum_error_callback_arg *errcbarg, Relation onerel, int phase);
 
 
 /*
@@ -724,6 +735,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 		PROGRESS_VACUUM_MAX_DEAD_TUPLES
 	};
 	int64		initprog_val[3];
+	ErrorContextCallback errcallback;
+	vacuum_error_callback_arg errcbarg;
 
 	pg_rusage_init(&ru0);
 
@@ -870,6 +883,11 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 	else
 		skipping_blocks = false;
 
+	/* Setup error traceback support for ereport() */
+	init_vacuum_error_callback(&errcallback, &errcbarg, onerel,
+	PROGRESS_VACUUM_PHASE_SCAN_HEAP);
+	error_context_stack = &err

Re: small improvement of the elapsed time for truncating heap in vacuum

2020-02-16 Thread Masahiko Sawada
On Mon, 17 Feb 2020 at 12:44, Kasahara Tatsuhito
 wrote:
>
> Hi,
>
> On Fri, Feb 14, 2020 at 4:50 PM Fujii Masao  wrote:
> > Regarding the patch, isn't it better to put pg_rusage_init() at the
> > top of do loop block? If we do this, as a side-effect, we can get
> > rid of pg_rusage_init() at the top of lazy_truncate_heap().
> Thanks for your reply.
> Yeah, it makes sense.
>
> Attached patch moves pg_rusage_init() to the top of do-loop-block.

+1 to resetting it for each truncation loop.

For the patch, we can also put the declaration of ru0 into the loop.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




RE: [PoC] Non-volatile WAL buffer

2020-02-16 Thread Takashi Menjo
Dear hackers,

I applied my patchset that mmap()-s WAL segments as WAL buffers to 
refs/tags/REL_12_0, and measured and analyzed its performance with pgbench.  
Roughly speaking, When I used *SSD and ext4* to store WAL, it was "obviously 
worse" than the original REL_12_0.  VTune told me that the CPU time of memcpy() 
called by CopyXLogRecordToWAL() got larger than before.  When I used *NVDIMM-N 
and ext4 with filesystem DAX* to store WAL, however, it achieved "not bad" 
performance compared with our previous patchset and non-volatile WAL buffer.  
Each CPU time of XLogInsert() and XLogFlush() was reduced like as non-volatile 
WAL buffer.

So I think mmap()-ing WAL segments as WAL buffers is not such a bad idea as 
long as we use PMEM, at least NVDIMM-N.

Excuse me, but for now I'd rather not talk about how much the performance was,
because the mmap()-ing patchset is WIP, so there might be bugs which wrongly
"improve" or "degrade" performance.  Also we need to know
persistent memory programming and related features such as filesystem DAX, huge 
page faults, and WAL persistence with cache flush and memory barrier 
instructions to explain why the performance improved.  I'd talk about all the 
details at the appropriate time and place. (The conference, or here later...)

Best regards,
Takashi

-- 
Takashi Menjo 
NTT Software Innovation Center

> -Original Message-
> From: Takashi Menjo 
> Sent: Monday, February 10, 2020 6:30 PM
> To: 'Robert Haas' ; 'Heikki Linnakangas' 
> 
> Cc: 'pgsql-hack...@postgresql.org' 
> Subject: RE: [PoC] Non-volatile WAL buffer
> 
> Dear hackers,
> 
> I made another WIP patchset to mmap WAL segments as WAL buffers.  Note that 
> this is not a non-volatile WAL
> buffer patchset but its competitor.  I am measuring and analyzing the 
> performance of this patchset to compare
> with my N.V.WAL buffer.
> 
> Please wait for a several more days for the result report...
> 
> Best regards,
> Takashi
> 
> --
> Takashi Menjo  NTT Software Innovation Center
> 
> > -Original Message-
> > From: Robert Haas 
> > Sent: Wednesday, January 29, 2020 6:00 AM
> > To: Takashi Menjo 
> > Cc: Heikki Linnakangas ; pgsql-hack...@postgresql.org
> > Subject: Re: [PoC] Non-volatile WAL buffer
> >
> > On Tue, Jan 28, 2020 at 3:28 AM Takashi Menjo 
> >  wrote:
> > > I think our concerns are roughly classified into two:
> > >
> > >  (1) Performance
> > >  (2) Consistency
> > >
> > > And your "different concern" is rather into (2), I think.
> >
> > Actually, I think it was mostly a performance concern (writes
> > triggering lots of reading) but there might be a consistency issue as well.
> >
> > --
> > Robert Haas
> > EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL
> > Company






Re: Standards compliance of SET ROLE / SET SESSION AUTHORIZATION

2020-02-16 Thread Craig Ringer
On Sat, 15 Feb 2020 at 05:36, Tom Lane  wrote:
>
> Chapman Flack  writes:
> > On 2/14/20 4:01 PM, Tom Lane wrote:
> >> ...  A protocol-level message
> >> to set session auth could also be possible, of course.
>
> > I'll once again whimper softly and perhaps ineffectually that an
> > SQL-exposed equivalent like
>
> >  SET SESSION AUTHORIZATION foo WITH RESET COOKIE 'lkjhikuhoihkihlj';
>
> > would seem to suit the same purpose, with the advantage of being
> > immediately usable by any kind of front- or middle-end code the
> > instant there is a server version that supports it, without having
> > to wait for something new at the protocol level to trickle through
> > to n different driver implementations.
>
> Yeah, I'm not that thrilled with the idea of a protocol message
> that's not equivalent to any SQL-level functionality, either.
>
> But the immediate point here is that I think we could get away with
> playing around with SET SESSION AUTHORIZATION's semantics.  Or,
> seeing that that's just syntactic sugar for "SET session_authorization",
> we could invent some new GUCs that allow control over this, rather
> than new syntax.

Based on the argument given here I tend to agree. And I've advocated
strongly for this in the past because poolers really need it.

My main issue with using SET SESSION AUTHORIZATION is that it requires
the pooler-user to be a superuser and gives the pooler total trust to
become any and all roles on the Pg instance. That's a significant
downside, as it'd be preferable for the pooler to have no way to
become superuser and to confine its role access.

SET ROLE on the other hand offers a nice way to constrain the
available roles that a session user can ever attain. But as noted
above, has standards compliance constraints.

Because S-S-A isn't currently allowed as non-superuser, we can extend
without breaking BC since we're free to define totally new semantics
for non-superuser invocation of S-S-A. So long as we don't restrict
the currently-allowed S-S-A to self anyway.

I think the truly ideal semantics are somewhere between S-S-A and SET
ROLE, and rely on the separation of *authorization* from
*authentication*, something Pg doesn't offer much of at the moment.

I suggest something like:

* A new GRANT ROLE AUTHORIZATION FOR <> TO <>. This
grants the right for a non-superuser <> to SET SESSION
AUTHORIZATION to <>, much like our GRANT <> TO <>
works for granting SET ROLE and inheritance. But granting SESSION
AUTHORIZATION would not allow SET ROLE and would not inherit rights,
it'd be a separate catalog with separate membership query functions
etc.
* (Some more detail is needed to handle granting, and granting to,
roles that have member-roles, since we'd want to control ).
* SET SESSION AUTHORIZATION is extended to allow a non-superuser to
S-S-A to any role it been granted appropriate rights for.
* Pooler *authenticates* as a non-superuser pooler user, establishing
a normal session as the pooler login user.
* Pooler authenticates clients using appropriate pooler-defined
methods then does a protocol-level SET SESSION AUTHORIZATION to the
client's authenticated role. If a non-empty reset cookie is provided
in the S-S-A protocol message then a matching reset cookie must be
sent in any subsequent S-S-A or R-S-A messages or queries, otherwise
they fail with permission-denied.
* Pooler proxies client access to the session as usual, with no need to
filter it specially.
* When the client releases the session, pooler does a protocol-level
RESET SESSION AUTHORIZATION to the pooler user, supplying the reset
cookie it gave at S-S-A time.
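
(Very roughly, and purely as an SQL sketch of the flow above -- none of this
syntax exists today, the cookie form borrows Chapman's earlier suggestion, and
the names are illustrative only:)

-- superuser: let the pooler role become specific client roles
GRANT ROLE AUTHORIZATION FOR client_role TO pooler;          -- proposed syntax

-- pooler session, after authenticating a client:
SET SESSION AUTHORIZATION client_role WITH RESET COOKIE 'lkjhikuhoihkihlj';
-- ... client work runs as client_role ...
RESET SESSION AUTHORIZATION COOKIE 'lkjhikuhoihkihlj';       -- wrong cookie => permission denied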



>
> regards, tom lane
>
>


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise




Re: assert pg_class.relnatts is consistent

2020-02-16 Thread Amit Langote
On Sun, Feb 16, 2020 at 5:25 AM Tom Lane  wrote:
> I wrote:
> > So that leads me to the attached.
> > ...
> > (I agree with Alvaro's thought of shortening AddDefaultValues,
> > but didn't do that here.)
>
> Pushed both of those.

Thank you.

It's amazing to see how simple bootstrapping has now become thanks to
the work you guys have done recently.

Thanks,
Amit




Re: [PoC] Non-volatile WAL buffer

2020-02-16 Thread Amit Langote
Menjo-san,

On Mon, Feb 17, 2020 at 1:13 PM Takashi Menjo
 wrote:
> I applied my patchset that mmap()-s WAL segments as WAL buffers to 
> refs/tags/REL_12_0, and measured and analyzed its performance with pgbench.  
> Roughly speaking, When I used *SSD and ext4* to store WAL, it was "obviously 
> worse" than the original REL_12_0.

I apologize for not having any opinion on the patches themselves, but
let me point out that it's better to base these patches on HEAD
(master branch) than REL_12_0, because all new code is committed to
the master branch, whereas stable branches such as REL_12_0 only
receive bug fixes.  Do you have any specific reason to be working on
REL_12_0?

Thanks,
Amit




Re: assert pg_class.relnatts is consistent

2020-02-16 Thread Justin Pryzby
On Mon, Feb 17, 2020 at 01:25:05PM +0900, Amit Langote wrote:
> > Pushed both of those.
> 
> Thank you.
> 
> It's amazing to see how simple bootstrapping has now become thanks to
> the work you guys have done recently.

On Fri, Feb 14, 2020 at 06:00:05PM +0900, Amit Langote wrote:
> > I can't write Perl myself (maybe Justin), but +1 to this idea.
> 
> I tried and think it works but not sure if that's good Perl
> programming.  See the attached.

And thanks for picking up perl so I didn't have to remember what I ever knew.

-- 
Justin




Re: Collation versioning

2020-02-16 Thread Thomas Munro
On Thu, Feb 13, 2020 at 8:13 AM Julien Rouhaud  wrote:
> Hearing no complaints on the suggestions, I'm attaching v8 to address that:
>
> - pg_dump is now using a binary_upgrade_set_index_coll_version() function
>   rather than plain DDL
> - the additional DDL is now of the form:
>   ALTER INDEX name ALTER COLLATION name REFRESH VERSION

+1

A couple of thoughts:

@@ -1115,21 +1117,44 @@ index_create(Relation heapRelation,
...
+   /*
+    * Get required distinct dependencies on collations for all index keys.
+    * Collations of directly referenced columns in hash indexes can be
+    * skipped if they're deterministic.
+    */
for (i = 0; i < indexInfo->ii_NumIndexKeyAttrs; i++)
{
-   if (OidIsValid(collationObjectId[i]) &&
-   collationObjectId[i] != DEFAULT_COLLATION_OID)
+   Oid colloid = collationObjectId[i];
+
+   if (OidIsValid(colloid))
{
-   referenced.classId = CollationRelationId;
-   referenced.objectId = collationObjectId[i];
-   referenced.objectSubId = 0;
+   if ((indexInfo->ii_Am != HASH_AM_OID) ||
+
!get_collation_isdeterministic(colloid))

I still don't like the way catalog/index.c has hard-coded knowledge of
HASH_AM_OID here.  Although it errs on the side of the assuming that
there *is* a version dependency (good), there is already another AM in
the tree that could safely skip it for deterministic collations AFAIK:
Bloom indexes.  I suppose that any extension AM that is doing some
kind of hashing would also like to be able to be able to opt out of
collation version checking, when that is safe.  The question is how to
model that in our system...

One way would be for each AM to declare whether it is affected by
collations; the answer could be yes/maybe (default), no,
only-non-deterministic-ones.  But that still feels like the wrong
level, not taking advantage of knowledge about operators.

A better way might be to make declarations about that sort of thing in
the catalog, somewhere in the vicinity of the operator classes, or
maybe just to have hard coded knowledge about operator classes (ie
making declarations in the manual about what eg hash functions are
allowed to consult and when), and then check which of those an index
depends on.  I am not sure what would be best, I'd need to spend some
time studying the am operator system.

Perhaps for the first version of this feature, we should just add a
new local function
index_can_skip_collation_version_dependency(indexInfo, colloid) to
encapsulate your existing logic, but add a comment that in future we
might be able to support skipping in more cases through analysis of
the catalogs.
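
To make that concrete, such a helper might look roughly like the sketch
below.  It only wraps the existing check quoted above; the function name
comes from the suggestion, and everything else (placement, exact signature)
is an assumption rather than something from the patch:

/*
 * Can we skip recording a collation version dependency for this index
 * column?  For now only hash indexes on deterministic collations qualify;
 * a future version might decide this from the operator classes instead.
 */
static bool
index_can_skip_collation_version_dependency(IndexInfo *indexInfo, Oid colloid)
{
    return indexInfo->ii_Am == HASH_AM_OID &&
        get_collation_isdeterministic(colloid);
}

The loop in index_create() would then call this instead of testing
HASH_AM_OID directly.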

+   
+ALTER COLLATION
+
+ 
+  This command declares that the index is compatible with the currently
+  installed version of the collations that determine its order.  It is used
+  to silence warnings caused by collation version incompatibilities and
+  should be called after rebuilding the index or otherwise verifying its
+  consistency.  Be aware that incorrect use of this command can hide index
+  corruption.
+ 
+
+   

This sounds like something that you need to do after you reindex, but
that's not true, is it?  This is something you can do *instead* of
reindex, to make it shut up about versions.  Shouldn't it be something
like "... should be issued only if the ordering is known not to have
changed since the index was built"?

+-- Test ALTER INDEX name ALTER COLLATION name REFRESH VERSION
+UPDATE pg_depend SET refobjversion = 'not a version'
+WHERE refclassid = 'pg_collation'::regclass
+AND objid::regclass::text = 'icuidx17_part'
+AND refobjversion IS NOT NULL;
+SELECT objid::regclass FROM pg_depend WHERE refobjversion = 'not a version';
+ objid
+---
+ icuidx17_part
+(1 row)
+
+ALTER INDEX icuidx17_part ALTER COLLATION "en-x-icu" REFRESH VERSION;
+SELECT objid::regclass FROM pg_depend WHERE refobjversion = 'not a version';
+ objid
+---
+(0 rows)
+

Would it be better to put refobjversion = 'not a version' in the
SELECT list, instead of the WHERE clause, with a WHERE clause that
hits that one row, so that we can see that the row still exists after
the REFRESH VERSION (while still hiding the unstable version string)?




Re: error context for vacuum to include block number

2020-02-16 Thread Masahiko Sawada
On Mon, 17 Feb 2020 at 12:57, Justin Pryzby  wrote:
>
> On Mon, Feb 17, 2020 at 10:47:47AM +0900, Masahiko Sawada wrote:
> > Thank you for updating the patch!
> >

Thank you for updating the patch.

> > 1.
> > The above lines need a new line.
>
> Done, thanks.
>
> > 2.
> > In lazy_vacuum_heap, we set the error context and then call
> > pg_rusage_init whereas lazy_vacuum_index and lazy_cleanup_index does
> > the opposite. And lazy_scan_heap also call pg_rusage first. I think
> > lazy_vacuum_heap should follow them for consistency. That is, we can
> > set error context after pages = 0.
>
> Right. Maybe I did it the other way because the two uses of
> PROGRESS_VACUUM_PHASE_VACUUM_HEAP were right next to each other.
>
> > 3.
> > We have 2 other phases: PROGRESS_VACUUM_PHASE_TRUNCATE and
> > PROGRESS_VACUUM_PHASE_FINAL_CLEANUP. I think it's better to set the
> > error context in lazy_truncate_heap as well. What do you think?
> >
> > I'm not sure it's worth setting the error context for FINAL_CLEANUP but
> > we should add the case of FINAL_CLEANUP to vacuum_error_callback as a
> > no-op or explain it in a comment even if we don't.
>
> I don't have strong feelings either way.
>
> I looked a bit at the truncation phase.  It also truncates the FSM and VM
> forks, which could be misleading if the error was in one of those files and
> not the main filenode.
>
> I'd have to find a way to test it...
> ...and was pleasantly surprised to see that earlier phases don't choke if I
> (re)create a fake 4GB table like:
>
> postgres=# CREATE TABLE trunc(i int);
> CREATE TABLE
> postgres=# \x\t
> Expanded display is on.
> Tuples only is on.
> postgres=# SELECT relfilenode FROM pg_class WHERE oid='trunc'::regclass;
> relfilenode | 59068
>
> postgres=# \! bash -xc 'truncate -s 1G 
> ./pgsql13.dat111/base/12689/59068{,.{1..3}}'
> + truncate -s 1G ./pgsql13.dat111/base/12689/59074 
> ./pgsql13.dat111/base/12689/59074.1 ./pgsql13.dat111/base/12689/59074.2 
> ./pgsql13.dat111/base/12689/59074.3
>
> postgres=# \timing
> Timing is on.
> postgres=# SET client_min_messages=debug; SET statement_timeout='13s'; VACUUM 
> VERBOSE trunc;
> INFO:  vacuuming "public.trunc"
> INFO:  "trunc": found 0 removable, 0 nonremovable row versions in 524288 out 
> of 524288 pages
> DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 2098
> There were 0 unused item identifiers.
> Skipped 0 pages due to buffer pins, 0 frozen pages.
> 524288 pages are entirely empty.
> CPU: user: 5.00 s, system: 1.50 s, elapsed: 6.52 s.
> ERROR:  canceling statement due to statement timeout
> CONTEXT:  while truncating relation "public.trunc" to 0 blocks
>

Yeah, lazy_scan_heap treats all the dummy files as new empty pages.

> The callback surrounding RelationTruncate() seems hard to hit unless you add
> CHECK_FOR_INTERRUPTS(); the same was true for index cleanup.
>
> The truncation uses a prefetch, which is more likely to hit any low-level
> error, so I added a callback there, too.
>
> BTW, for the index cases, I didn't like repeating the namespace here, but 
> WDYT ?
> |CONTEXT:  while vacuuming index "public.t_i_idx3" of relation "t"

The current message looks good to me because we cannot have a table
and its index in different schemas.

1.
pg_rusage_init(&ru0);
npages = 0;

/*
 * Setup error traceback support for ereport()
 */
init_vacuum_error_callback(&errcallback, &errcbarg, onerel,
PROGRESS_VACUUM_PHASE_VACUUM_HEAP);
error_context_stack = &errcallback;

tupindex = 0;

Oops, it seems to me that it's better to set the error context after
tupindex = 0. Sorry, my bad.

And the above comment can be written on a single line, like the other
similar comments.

2.
@@ -2568,6 +2643,12 @@ count_nondeletable_pages(Relation onerel,
LVRelStats *vacrelstats)
BlockNumber blkno;
BlockNumber prefetchedUntil;
instr_time  starttime;
+   ErrorContextCallback errcallback;
+   vacuum_error_callback_arg errcbarg;
+
+   /* Setup error traceback support for ereport() */
+   init_vacuum_error_callback(&errcallback, &errcbarg, onerel,
+   PROGRESS_VACUUM_PHASE_TRUNCATE);

Hmm, I don't think it's a good idea to have count_nondeletable_pages
set the PHASE_TRUNCATE error context, because the patch sets the
error context around RelationTruncate, which actually truncates the heap,
whereas count_nondeletable_pages doesn't truncate anything. I think
setting the error context only during RelationTruncate would be a good
start. We can hear other opinions from other hackers; some may want to
set the error context for the whole of lazy_truncate_heap.
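
For illustration, wrapping only the truncation call might look roughly
like this minimal sketch, which reuses the patch's helper; the surrounding
lazy_truncate_heap variables (new_rel_pages, errcallback, errcbarg) are
assumed here rather than shown:

    /* Set up the error context only around the call that truncates the heap */
    init_vacuum_error_callback(&errcallback, &errcbarg, onerel,
                               PROGRESS_VACUUM_PHASE_TRUNCATE);
    error_context_stack = &errcallback;

    RelationTruncate(onerel, new_rel_pages);

    /* Pop the error context stack */
    error_context_stack = errcallback.previous;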

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: small improvement of the elapsed time for truncating heap in vacuum

2020-02-16 Thread Kasahara Tatsuhito
Hi,

On Mon, Feb 17, 2020 at 1:07 PM Masahiko Sawada
 wrote:
> For the patch, we can also put the declaration of ru0 into the loop.
Thanks for your reply.
Hmm, certainly it may be better.

I fixed the v2 patch and attached the new version.

Best regards,

-- 
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com


reset_usage_v3.patch
Description: Binary data


Re: ALTER tbl rewrite loses CLUSTER ON index (consider moving indisclustered to pg_class)

2020-02-16 Thread Amit Langote
Hi Justin,

On Fri, Feb 7, 2020 at 11:39 PM Justin Pryzby  wrote:
> On Thu, Feb 06, 2020 at 02:24:47PM -0300, Alvaro Herrera wrote:
> > On 2020-Feb-06, Justin Pryzby wrote:
> >
> > > I wondered if it wouldn't be better if CLUSTER ON was stored in pg_class 
> > > as the
> > > Oid of a clustered index, rather than a boolean in pg_index.
> >
> > Maybe.  Do you want to try a patch?
>
> I think the attached is 80% complete (I didn't touch pg_dump).
>
> One objection to this change would be that all relations (including indices)
> end up with relclustered fields, and pg_index already has a number of bools, 
> so
> it's not like this one bool is wasting a byte.
>
> I think relisclustered was a's clever way of avoiding that overhead 
> (c0ad5953).
> So I would be -0.5 on moving it to pg_class..

Are you still in favor of fixing ALTER TABLE losing relisclustered with
the patch we were working on earlier [1], even if we're no longer moving
relisclustered to pg_class?

I have read elsewhere [2] that forcing ALTER TABLE to rewrite in
clustered order might not be a good option, but maybe that one is a
more radical proposal than this.

Thanks,
Amit

[1] 
https://postgr.es/m/CA%2BHiwqEt1HnXYckCdaO8%2BpOoFs7NNS5byoZ6Xg2B7epKbhS85w%40mail.gmail.com
[2] https://postgr.es/m/10984.1581181029%40sss.pgh.pa.us




Re: ALTER tbl rewrite loses CLUSTER ON index (consider moving indisclustered to pg_class)

2020-02-16 Thread Justin Pryzby
On Mon, Feb 17, 2020 at 02:31:42PM +0900, Amit Langote wrote:
> Hi Justin,
> 
> On Fri, Feb 7, 2020 at 11:39 PM Justin Pryzby  wrote:
> > On Thu, Feb 06, 2020 at 02:24:47PM -0300, Alvaro Herrera wrote:
> > > On 2020-Feb-06, Justin Pryzby wrote:
> > >
> > > > I wondered if it wouldn't be better if CLUSTER ON was stored in 
> > > > pg_class as the
> > > > Oid of a clustered index, rather than a boolean in pg_index.
> > >
> > > Maybe.  Do you want to try a patch?
> >
> > I think the attached is 80% complete (I didn't touch pg_dump).
> >
> > One objection to this change would be that all relations (including indices)
> > end up with relclustered fields, and pg_index already has a number of 
> > bools, so
> > it's not like this one bool is wasting a byte.
> >
> > I think relisclustered was a's clever way of avoiding that overhead 
> > (c0ad5953).
> > So I would be -0.5 on moving it to pg_class..

In case there's any confusion: "a's" was probably me halfway changing
"someone's" to "a".

> Are you still in favor of fixing ALTER TABLE losing relisclustered with
> the patch we were working on earlier [1], even if we're no longer moving
> relisclustered to pg_class?

Thanks for remembering this one.

I think your patch is the correct fix.

I forgot to say it, but moving relisclustered to pg_class doesn't help avoid
losing indisclustered: it still needs a fix just like this.  Anyway, I
withdrew my suggestion of moving it to pg_class, since it has more overhead,
even for pg_class rows of relations which can't have indexes.

> I have read elsewhere [2] that forcing ALTER TABLE to rewrite in
> clustered order might not be a good option, but maybe that one is a
> more radical proposal than this.

Right; your fix seems uncontroversial.  I ran into this (indisclustered) bug
while starting to write that patch for "ALTER rewrite in clustered order".

-- 
Justin




Re: error context for vacuum to include block number

2020-02-16 Thread Justin Pryzby
On Mon, Feb 17, 2020 at 02:18:11PM +0900, Masahiko Sawada wrote:
> Oops it seems to me that it's better to set error context after
> tupindex = 0. Sorry for my bad.

I take your point but did it differently - see what you think

> And the above comment can be written in a single line as other same comments.

Thanks :)

> Hmm I don't think it's a good idea to have count_nondeletable_pages
> set the error context of PHASE_TRUNCATE.

I think if we don't do it there, then we shouldn't bother handling
PHASE_TRUNCATE at all, since that's where we're likely to hit filesystem or
other low-level errors, before lazy_truncate_heap() hits them.

> Because the patch sets the
> error context during RelationTruncate that actually truncates the heap
> but count_nondeletable_pages doesn't truncate it.

I would say that the ReadBuffer called by the prefetch in
count_nondeletable_pages() happens during the course of truncation, just as
ReadBuffer calls made during the course of vacuuming can be attributed to
vacuuming.

> I think setting the error context only during RelationTruncate would be a
> good start. We can hear other opinions from other hackers. Some hackers may
> want to set the error context for whole lazy_truncate_heap.

I avoided doing that since it has several "return" statements, each of which
would need to "Pop the error context stack", which is at risk of being
forgotten and left unpopped by anyone who adds or changes flow control.

Also, I just added this to the TRUNCATE case, even though that should never
happen: if (BlockNumberIsValid(cbarg->blkno))...
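
For reference, the TRUNCATE branch of vacuum_error_callback()'s switch on
cbarg->phase would then look roughly like this; a hypothetical sketch, with
the message wording modeled on the CONTEXT line shown earlier rather than
taken verbatim from the patch:

    case PROGRESS_VACUUM_PHASE_TRUNCATE:
        /* blkno should always be valid here, so this check is belt and braces */
        if (BlockNumberIsValid(cbarg->blkno))
            errcontext("while truncating relation \"%s.%s\" to %u blocks",
                       cbarg->relnamespace, cbarg->relname, cbarg->blkno);
        break;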

-- 
Justin
>From 977b1b5e00ce522bd775cf91f7a9c7a9345d3171 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 12 Dec 2019 20:54:37 -0600
Subject: [PATCH v20 1/2] vacuum errcontext to show block being processed

Discussion:
https://www.postgresql.org/message-id/20191120210600.gc30...@telsasoft.com
---
 src/backend/access/heap/vacuumlazy.c | 130 ++-
 1 file changed, 129 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index a23cdef..ce3efd7 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -292,6 +292,14 @@ typedef struct LVRelStats
 	bool		lock_waiter_detected;
 } LVRelStats;
 
+typedef struct
+{
+	char 		*relnamespace;
+	char		*relname;
+	char 		*indname;
+	BlockNumber blkno;	/* used only for heap operations */
+	int			phase;	/* Reusing same enums as for progress reporting */
+} vacuum_error_callback_arg;
 
 /* A few variables that don't seem worth passing around as parameters */
 static int	elevel = -1;
@@ -361,6 +369,9 @@ static void end_parallel_vacuum(Relation *Irel, IndexBulkDeleteResult **stats,
 LVParallelState *lps, int nindexes);
 static LVSharedIndStats *get_indstats(LVShared *lvshared, int n);
 static bool skip_parallel_vacuum_index(Relation indrel, LVShared *lvshared);
+static void vacuum_error_callback(void *arg);
+static void init_vacuum_error_callback(ErrorContextCallback *errcallback,
+		vacuum_error_callback_arg *errcbarg, Relation onerel, int phase);
 
 
 /*
@@ -724,6 +735,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 		PROGRESS_VACUUM_MAX_DEAD_TUPLES
 	};
 	int64		initprog_val[3];
+	ErrorContextCallback errcallback;
+	vacuum_error_callback_arg errcbarg;
 
 	pg_rusage_init(&ru0);
 
@@ -870,6 +883,11 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 	else
 		skipping_blocks = false;
 
+	/* Setup error traceback support for ereport() */
+	init_vacuum_error_callback(&errcallback, &errcbarg, onerel,
+	PROGRESS_VACUUM_PHASE_SCAN_HEAP);
+	error_context_stack = &errcallback;
+
 	for (blkno = 0; blkno < nblocks; blkno++)
 	{
 		Buffer		buf;
@@ -891,6 +909,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 #define FORCE_CHECK_PAGE() \
 		(blkno == nblocks - 1 && should_attempt_truncation(params, vacrelstats))
 
+		errcbarg.blkno = blkno;
+
 		pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
 
 		if (blkno == next_unskippable_block)
@@ -987,6 +1007,9 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 vmbuffer = InvalidBuffer;
 			}
 
+			/* Pop the error context stack while calling vacuum */
+			error_context_stack = errcallback.previous;
+
 			/* Work on all the indexes, then the heap */
 			lazy_vacuum_all_indexes(onerel, Irel, indstats,
 	vacrelstats, lps, nindexes);
@@ -1011,6 +1034,9 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 			/* Report that we are once again scanning the heap */
 			pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
 		 PROGRESS_VACUUM_PHASE_SCAN_HEAP);
+
+			/* Set the error context while continuing heap scan */
+			error_context_stack = &errcallback;
 		}
 
 		/*
@@ -1597,6 +1623,9 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *v

Re: Conflict handling for COPY FROM

2020-02-16 Thread Surafel Temesgen
Hi,

> > ERROR_LIMIT 'limit_number'
> >
> > I think this should be:
> >
> > ERROR_LIMIT limit_number
> >
> > (no single quote)
>
>
Thank you. Fixed.

> More comments:
>
> - I think the document should state that if limit_number = 0, all
>   errors are immediately raised (behaves the same as the current behavior
>   without the patch).
>
>
If we want all errors to be raised, error_limit need not be specified at
all. But if it is specified as limit_number = 0, I think it is
self-explanatory.


> - "constraint violating rows will be returned back to the caller."
>   This does explain the current implementation. I am not sure if it's
>   intended or not though:
>
> cat /tmp/a
> 1   1
> 2   2
> 3   3
> 3   4
>
> psql test
> $ psql test
> psql (13devel)
> Type "help" for help.
>
> test=# select * from t1;
>  i | j
> ---+---
>  1 | 1
>  2 | 2
>  3 | 3
> (3 rows)
>
> test=# copy t1 from '/tmp/a' with (error_limit 1);
> ERROR:  duplicate key value violates unique constraint "t1_pkey"
> DETAIL:  Key (i)=(2) already exists.
> CONTEXT:  COPY t1, line 2: "2   2"
>
> So if the number of errors raised exceeds error_limit, no constraint
> violating rows (in this case i=1, j=1) are returned.
>

error_limit dictates the number of errors allowed for the copy operation
to proceed. If that number is exceeded, the operation is stopped. There may
be more conflicts afterward, and returning a limited number of conflicting
rows would not be of much use.

regards
Surafel
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index a99f8155e4..c53e5f6d92 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -44,6 +44,7 @@ COPY { table_name [ ( column_name [, ...] )
 FORCE_NULL ( column_name [, ...] )
 ENCODING 'encoding_name'
+ERROR_LIMIT limit_number
 
  
 
@@ -355,6 +356,26 @@ COPY { table_name [ ( 

 
+   
+ERROR_LIMIT
+
+ 
+  Enables ignoring of errored-out rows, up to limit_number. If limit_number is set
+  to -1, then all errors will be ignored.
+ 
+
+ 
+  Currently, only unique or exclusion constraint violations
+  and row formatting errors are ignored. Malformed
+  rows will raise warnings, while constraint-violating rows
+  will be returned back to the caller.
+ 
+
+
+   
+

 WHERE
 
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 40a8ec1abd..72225a85a0 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -24,6 +24,7 @@
 #include "access/tableam.h"
 #include "access/xact.h"
 #include "access/xlog.h"
+#include "access/printtup.h"
 #include "catalog/dependency.h"
 #include "catalog/pg_authid.h"
 #include "catalog/pg_type.h"
@@ -48,7 +49,9 @@
 #include "port/pg_bswap.h"
 #include "rewrite/rewriteHandler.h"
 #include "storage/fd.h"
+#include "storage/lmgr.h"
 #include "tcop/tcopprot.h"
+#include "tcop/pquery.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
@@ -153,6 +156,7 @@ typedef struct CopyStateData
 	List	   *convert_select; /* list of column names (can be NIL) */
 	bool	   *convert_select_flags;	/* per-column CSV/TEXT CS flags */
 	Node	   *whereClause;	/* WHERE condition (or NULL) */
+	int			error_limit;	/* total number of error to ignore */
 
 	/* these are just for error messages, see CopyFromErrorCallback */
 	const char *cur_relname;	/* table name for error messages */
@@ -182,6 +186,9 @@ typedef struct CopyStateData
 	bool		volatile_defexprs;	/* is any of defexprs volatile? */
 	List	   *range_table;
 	ExprState  *qualexpr;
+	bool		ignore_error;	/* is ignore error specified? */
+	bool		ignore_all_error;	/* is error_limit -1 (ignore all error)
+	 * specified? */
 
 	TransitionCaptureState *transition_capture;
 
@@ -836,7 +843,7 @@ CopyLoadRawBuf(CopyState cstate)
 void
 DoCopy(ParseState *pstate, const CopyStmt *stmt,
 	   int stmt_location, int stmt_len,
-	   uint64 *processed)
+	   uint64 *processed, DestReceiver *dest)
 {
 	CopyState	cstate;
 	bool		is_from = stmt->is_from;
@@ -1068,7 +1075,7 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
 		cstate = BeginCopyFrom(pstate, rel, stmt->filename, stmt->is_program,
 			   NULL, stmt->attlist, stmt->options);
 		cstate->whereClause = whereClause;
-		*processed = CopyFrom(cstate);	/* copy from file to database */
+		*processed = CopyFrom(cstate, dest);	/* copy from file to database */
 		EndCopyFrom(cstate);
 	}
 	else
@@ -1290,6 +1297,18 @@ ProcessCopyOptions(ParseState *pstate,
 defel->defname),
 		 parser_errposition(pstate, defel->location)));
 		}
+		else if (strcmp(defel->defname, "error_limit") == 0)
+		{
+			if (cstate->ignore_error)
+ereport(ERROR,
+		(errcode(ERRCODE_SYNTAX_ERROR),
+		 errmsg("conflicting or redundant options"),
+		 parser_errposition(pstate, defel->location)));
+			cstate->error_limit = defGetInt64(defel);
+			cstate->ignore_error = true;
+			if (cstate->error_limit 

tiny documentation fix

2020-02-16 Thread Amit Langote
Hi,

I propose this small fix for 27.4. Progress Reporting:

-all of its partitions are also recursively analyzed as also mentioned on
+all of its partitions are also recursively analyzed as also mentioned in
 .

Note the last word: "in" sounds more correct.

Thanks,
Amit
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 9129f79bbf..30334a6d5f 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3698,7 +3698,7 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
   

 Note that when ANALYZE is run on a partitioned table,
-all of its partitions are also recursively analyzed as also mentioned on
+all of its partitions are also recursively analyzed as also mentioned in
 .  In that case, ANALYZE
 progress is reported first for the parent table, whereby its inheritance
 statistics are collected, followed by that for each partition.


Re: Conflict handling for COPY FROM

2020-02-16 Thread Tatsuo Ishii
>> test=# copy t1 from '/tmp/a' with (error_limit 1);
>> ERROR:  duplicate key value violates unique constraint "t1_pkey"
>> DETAIL:  Key (i)=(2) already exists.
>> CONTEXT:  COPY t1, line 2: "2   2"
>>
>> So if the number of errors raised exceeds error_limit, no constraint
>> violating rows (in this case i=1, j=1) are returned.
>>
> 
> error_limit dictates the number of errors allowed for the copy operation
> to proceed. If that number is exceeded, the operation is stopped. There may
> be more conflicts afterward, and returning a limited number of conflicting
> rows would not be of much use.

Still, I see your explanation differs from what the documentation patch says.

+  Currently, only unique or exclusion constraint violations
+  and row formatting errors are ignored. Malformed
+  rows will raise warnings, while constraint-violating rows
+  will be returned back to the caller.

I am afraid that once this patch is part of the next version of PostgreSQL,
we will get many complaints/inquiries from users. What about changing it
like this:

  Currently, only unique or exclusion constraint violations and
  row formatting errors are ignored. Malformed rows will raise
  warnings, while constraint-violating rows will be returned back
  to the caller unless an error is raised; i.e. if an error is
  raised because error_limit is exceeded, no rows will be returned
  back to the caller.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




RE: [PoC] Non-volatile WAL buffer

2020-02-16 Thread Takashi Menjo
Hello Amit,

> I apologize for not having any opinion on the patches themselves, but let me 
> point out that it's better to base these
> patches on HEAD (master branch) than REL_12_0, because all new code is 
> committed to the master branch,
> whereas stable branches such as REL_12_0 only receive bug fixes.  Do you have 
> any specific reason to be working
> on REL_12_0?

Yes, because I think it makes it easier for people to reproduce and discuss
the performance measurements.  Of course I know that all newly accepted
patches are merged into master's HEAD, not stable branches and not even
release tags, so I'm aware that I'll have to rebase my patchset onto master
sooner or later.  However, if someone, including me, says that s/he applied
my patchset to "master" and measured its performance, we have to pay
attention to which commit "master" really points to.  Although we have SHA-1
hashes to specify the exact commit, we would have to check whether that
particular commit on master contains patches affecting performance, because
master's HEAD gets new patches day by day.  On the other hand, a release tag
clearly points to a commit we all know.  Also, we can more easily check the
relevant features and improvements by using the release notes and user
manuals.

Best regards,
Takashi

-- 
Takashi Menjo 
NTT Software Innovation Center
> -Original Message-
> From: Amit Langote 
> Sent: Monday, February 17, 2020 1:39 PM
> To: Takashi Menjo 
> Cc: Robert Haas ; Heikki Linnakangas 
> ; PostgreSQL-development
> 
> Subject: Re: [PoC] Non-volatile WAL buffer
> 
> Menjo-san,
> 
> On Mon, Feb 17, 2020 at 1:13 PM Takashi Menjo 
>  wrote:
> > I applied my patchset that mmap()-s WAL segments as WAL buffers to 
> > refs/tags/REL_12_0, and measured and
> analyzed its performance with pgbench.  Roughly speaking, when I used *SSD 
> and ext4* to store WAL, it was
> "obviously worse" than the original REL_12_0.
> 
> I apologize for not having any opinion on the patches themselves, but let me 
> point out that it's better to base these
> patches on HEAD (master branch) than REL_12_0, because all new code is 
> committed to the master branch,
> whereas stable branches such as REL_12_0 only receive bug fixes.  Do you have 
> any specific reason to be working
> on REL_12_0?
> 
> Thanks,
> Amit






Re: Wait event that should be reported while waiting for WAL archiving to finish

2020-02-16 Thread Fujii Masao




On 2020/02/14 15:45, Michael Paquier wrote:

On Fri, Feb 14, 2020 at 12:47:19PM +0900, Fujii Masao wrote:

logical_rewrite_truncate_v1.patch adds the description of
LogicalRewriteTruncate into the doc. This needs to be
back-patched to v10 where commit 249cf070e3 introduced
LogicalRewriteTruncate event.


Indeed.  You just need to be careful about the number of fields for morerows,
as that's not the same across branches.


gss_open_server_v1.patch adds the description of GSSOpenServer
into the doc and updates the code in pgstat_get_wait_client().
This needs to be applied in v12 where commit b0b39f72b9 introduced
GSSOpenServer event.

gss_open_server_for_master_v1.patch does not only what the above
patch does but also puts the wait event enum into alphabetical order.
This needs to be applied to master.


Thanks for splitting things.  All that looks correct to me.


Thanks for the review! Pushed the patches for
LogicalRewriteTruncate and GSSOpenServer.

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters




Re: Wait event that should be reported while waiting for WAL archiving to finish

2020-02-16 Thread Fujii Masao



On 2020/02/14 23:43, Robert Haas wrote:

On Thu, Feb 13, 2020 at 10:47 PM Fujii Masao
 wrote:

Fixed. Thanks for the review!


I think it would be safer to just report the wait event during
pg_usleep(100L) rather than putting those calls around the whole
loop. It does not seem impossible that ereport() or
CHECK_FOR_INTERRUPTS() could do something that reports a wait event
internally.


OK, so I attached the updated version of the patch.
Thanks for the review!

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index a9f6ee6e32..2f88b41529 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1335,7 +1335,11 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   
11:34   0:00 postgres: ser
  Waiting in an extension.
 
 
- IPC
+ IPC
+ BackupWaitWalArchive
+ Waiting for WAL files required for the backup to be successfully archived.
+
+
  BgWorkerShutdown
  Waiting for background worker to shut down.
 
diff --git a/src/backend/access/transam/xlog.c 
b/src/backend/access/transam/xlog.c
index 3813eadfb4..dce0f3b041 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -11107,7 +11107,9 @@ do_pg_stop_backup(char *labelfile, bool waitforarchive, 
TimeLineID *stoptli_p)
reported_waiting = true;
}
 
+   pgstat_report_wait_start(WAIT_EVENT_BACKUP_WAIT_WAL_ARCHIVE);
pg_usleep(100L);
+   pgstat_report_wait_end();
 
if (++waits >= seconds_before_warning)
{
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 59dc4f31ab..6e7a57b1b7 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3737,6 +3737,9 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 
switch (w)
{
+   case WAIT_EVENT_BACKUP_WAIT_WAL_ARCHIVE:
+   event_name = "BackupWaitWalArchive";
+   break;
case WAIT_EVENT_BGWORKER_SHUTDOWN:
event_name = "BgWorkerShutdown";
break;
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 3a65a51696..20313f1e32 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -817,7 +817,8 @@ typedef enum
  */
 typedef enum
 {
-   WAIT_EVENT_BGWORKER_SHUTDOWN = PG_WAIT_IPC,
+   WAIT_EVENT_BACKUP_WAIT_WAL_ARCHIVE = PG_WAIT_IPC,
+   WAIT_EVENT_BGWORKER_SHUTDOWN,
WAIT_EVENT_BGWORKER_STARTUP,
WAIT_EVENT_BTREE_PAGE,
WAIT_EVENT_CLOG_GROUP_UPDATE,