Re: [HACKERS] Block level parallel vacuum WIP

2016-08-23 Thread Alex Ignatov


On 23.08.2016 15:41, Michael Paquier wrote:

On Tue, Aug 23, 2016 at 8:02 PM, Masahiko Sawada  wrote:

As for the PoC, I implemented parallel vacuum so that each worker
processes both phase 1 and phase 2 for a particular block range.
Suppose we vacuum a 1000-block table with 4 workers: each worker
processes 250 consecutive blocks in phase 1 and then reclaims dead
tuples from the heap and indexes (phase 2).

So each worker is assigned a range of blocks, and processes them in
parallel? This does not sound good performance-wise. I recall emails
from Robert and Amit on the matter for sequential scan saying that this
would hurt performance, particularly on rotating disks.


Rotating disks are not the problem - you can always put them in RAID and
so on. The problem is an 8k allocation per relation once per half an hour;
a seq scan done this way turns into a random scan...



Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





[HACKERS] Parallel seq scan in plpgsql

2016-09-15 Thread Alex Ignatov

Hello!
Does parallel seq scan work in plpgsql?

--
Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] Parallel seq scan in plpgsql

2016-09-16 Thread Alex Ignatov
 Aggregate  (cost=289035.43..289035.44 rows=1 width=8)
  ->  HashAggregate  (cost=288697.59..288847.74 rows=15015 width=28)
Group Key: test.a, test.b, test.c, test.d, test.e
->  Seq Scan on test  (cost=0.00..163696.15 rows=1115 
width=20)



So as we can see, parallel seq scan doesn't work in plpgsql or sql functions.
Can somebody explain to me where I went wrong?


Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


On 16.09.2016 07:27, Ashutosh Bapat wrote:

On Thu, Sep 15, 2016 at 9:15 PM, Alex Ignatov  wrote:

Hello!
Does parallel secscan works in plpgsql?



Parallel seq scan is a query optimization that will work independent
of the source of the query - i.e. whether it comes directly from a
client or a procedural language like plpgsql. So, I guess, the answer to
your question is yes. If you are expecting something else, more
context will help.






Re: [HACKERS] Parallel seq scan in plpgsql

2016-09-16 Thread Alex Ignatov


On 16.09.2016 16:50, Amit Kapila wrote:

On Fri, Sep 16, 2016 at 6:57 PM, Alex Ignatov  wrote:

No, it doesn't.
It parallelizes neither sql functions nor plpgsql. Here is an example:

ipdr=> show max_worker_processes ;
 max_worker_processes
----------------------
 128
(1 row)
ipdr=> set max_parallel_workers_per_gather to 128;
SET
ipdr=> set force_parallel_mode=on;
SET
ipdr=> set min_parallel_relation_size =0;
SET
ipdr=> set parallel_tuple_cost=0;
SET



Can you try by setting force_parallel_mode = off?  I think it is
sending the whole function execution to a worker due to
force_parallel_mode.




No changes:

ipdr=> set max_parallel_workers_per_gather to 128;
SET
ipdr=> set min_parallel_relation_size =0;
SET
ipdr=> set parallel_tuple_cost=0;
SET
ipdr=> set force_parallel_mode = off;
SET
ipdr=> select name,setting from pg_settings where name in
ipdr-> ('max_parallel_workers_per_gather', 'min_parallel_relation_size',
ipdr(>  'parallel_tuple_cost', 'force_parallel_mode');
              name               | setting
---------------------------------+---------
 force_parallel_mode             | off
 max_parallel_workers_per_gather | 128
 min_parallel_relation_size      | 0
 parallel_tuple_cost             | 0
(4 rows)

ipdr=> explain (analyze,buffers) select count(*) from (select 
a,b,c,d,e,sum(bytes) from test group by a,b,c,d,e)t;
   QUERY PLAN
-
 Aggregate  (cost=87702.33..87702.34 rows=1 width=8) (actual 
time=709.643..709.643 rows=1 loops=1)
   Buffers: shared hit=65015
   ->  Finalize HashAggregate  (cost=87364.49..87514.64 rows=15015 width=28) 
(actual time=706.382..708.456 rows=15015 loops=1)
 Group Key: test.a, test.b, test.c, test.d, test.e
 Buffers: shared hit=65015
 ->  Gather  (cost=85149.78..85299.93 rows=165165 width=20) (actual 
time=478.626..645.209 rows=180180 loops=1)
   Workers Planned: 11
   Workers Launched: 11
   Buffers: shared hit=65015
   ->  Partial HashAggregate  (cost=84149.78..84299.93 rows=15015 
width=20) (actual time=473.890..478.309 rows=15015 loops=12)
 Group Key: test.a, test.b, test.c, test.d, test.e
 Buffers: shared hit=63695
 ->  Parallel Seq Scan on test  (cost=0.00..72786.01 
rows=909101 width=20) (actual time=0.021..163.120 rows=83 loops=12)
   Buffers: shared hit=63695
 Planning time: 0.318 ms
 Execution time: 710.600 ms
(16 rows)
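
(For reference, the definition of parallel_test_plpgsql() is not shown in the
thread; a definition consistent with the auto_explain output below would be
roughly the following - a sketch only, the exact body is an assumption:)

create or replace function parallel_test_plpgsql() returns bigint as $$
declare
    result bigint;
begin
    -- the statement auto_explain reports below as running without parallelism
    select count(*) into result
      from (select a,b,c,d,e,sum(bytes) from test group by a,b,c,d,e) t;
    return result;
end;
$$ language plpgsql PARALLEL SAFE STRICT STABLE;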


ipdr=> explain (analyze,buffers) select  parallel_test_plpgsql();
QUERY PLAN
--
 Result  (cost=0.00..0.26 rows=1 width=8) (actual time=4003.719..4003.720 
rows=1 loops=1)
   Buffers: shared hit=63869
 Planning time: 0.021 ms
 Execution time: 4003.769 ms
(4 rows)

auto_explain:
2016-09-16 18:02:29 MSK [29353]: [53-1] 
user=ipdr,db=ipdr,app=psql,client=[local] LOG:  duration: 4001.275 ms  plan:
Query Text: select count(*)  from (select a,b,c,d,e,sum(bytes) 
from test group by a,b,c,d,e)t
Aggregate  (cost=289035.43..289035.44 rows=1 width=8)
  ->  HashAggregate  (cost=288697.59..288847.74 rows=15015 width=28)
Group Key: test.a, test.b, test.c, test.d, test.e
->  Seq Scan on test  (cost=0.00..163696.15 rows=1115 
width=20)
2016-09-16 18:02:29 MSK [29353]: [54-1] user=ipdr,db=ipdr,app=psql,client=[local] 
CONTEXT:  SQL statement "select count(*)  from (select a,b,c,d,e,sum(bytes) 
from test group by a,b,c,d,e)t"
PL/pgSQL function parallel_test_plpgsql() line 5 at SQL statement


ipdr=> explain (analyze,buffers) select  parallel_test_plpgsql();
QUERY PLAN
--
 Result  (cost=0.00..0.26 rows=1 width=8) (actual time=4497.820..4497.822 
rows=1 loops=1)
   Buffers: shared hit=63695
 Planning time: 0.023 ms
 Execution time: 4497.872 ms
(4 rows)

auto_explain:
2016-09-16 18:03:23 MSK [29353]: [57-1] 
user=ipdr,db=ipdr,app=psql,client=[local] LOG:  duration: 4497.050 ms  plan:
Query Text: select count(*)  from (select a,b,c,d,e,sum(bytes) 
from test group by a,b,c,d,e)t
Aggregate  (cost=289035.43..289035.44 rows=1 width=8)
  ->  HashAggregate  (cost=288697.59..288847.74 rows=15015 width=28)
Group Key: test.a, test.b, test.c, test.d, test.e

Re: [HACKERS] Parallel seq scan in plpgsql

2016-09-20 Thread Alex Ignatov


On 18.09.2016 06:54, Amit Kapila wrote:

On Fri, Sep 16, 2016 at 8:48 PM, Alex Ignatov  wrote:


On 16.09.2016 16:50, Amit Kapila wrote:



Can you try by setting force_parallel_mode = off?  I think it is
sending the whole function execution to a worker due to
force_parallel_mode.




No changes:



Okay, it just slipped my mind that we don't support parallel
queries for SQL statement execution (or statements executed via
exec_stmt_execsql) from plpgsql.  For a detailed explanation of why that
is not feasible you can refer to one of my earlier e-mails [1] on a similar
topic.  I think if we can somehow get the results via a PERFORM
statement, then it could be possible to use parallelism via plpgsql.

However, you can use it via SQL functions, an example is below:

set min_parallel_relation_size =0;
set parallel_tuple_cost=0;
set parallel_setup_cost=0;

Load 'auto_explain';
set auto_explain.log_min_duration = 0;
set auto_explain.log_analyze = true;
set auto_explain.log_nested_statements = true;

create table test_plpgsql(c1 int, c2 char(1000));
insert into test_plpgsql values(generate_series(1,10),'aaa');

create or replace function parallel_test_set_sql() returns
setof bigint as $$
select count(*) from test_plpgsql;
$$language sql PARALLEL SAFE STRICT STABLE;

Then execute the function as: select * from parallel_test_set_sql();  You
can see the plan below if the auto_explain module is loaded.

Finalize Aggregate  (cost=14806.85..14806.86 rows=1 width=8) (actual tim
e=1094.966..1094.967 rows=1 loops=1)
  ->  Gather  (cost=14806.83..14806.84 rows=2 width=8) (actual time=472.
216..1094.943 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
->  Partial Aggregate  (cost=14806.83..14806.84 rows=1 width=8)
(actual time=177.867..177.868 rows=1 loops=3)
  ->  Parallel Seq Scan on test_plpgsql  (cost=0.00..14702.6
7 rows=41667 width=0) (actual time=0.384..142.565 rows=3 loops=3)
CONTEXT:  SQL function "parallel_test_set_sql" statement 1
LOG:  duration: 2965.040 ms  plan:
Query Text: select * from parallel_test_set_sql();
Function Scan on parallel_test_set_sql  (cost=0.25..10.25 rows=1000 widt
h=8) (actual time=2538.620..2776.955 rows=1 loops=1)


In general, I think we should support the cases as required (or
written) by you from plpgsql or sql functions.  We need more work to
support such cases. There are probably two ways of supporting such
cases: we can build some intelligence into plpgsql execution such that
it can recognise such queries and allow them to use parallelism, or we
need to think of enabling parallelism for cases where we don't run the
plan to completion.  Most of the use cases from plpgsql or sql functions
fall into the latter category, as they don't generally run the plan to
completion.


[1] - 
https://www.postgresql.org/message-id/CAA4eK1K8kaO_jRk42-o2rmhSRbKV-3mR%2BiNVcONLdbcSXW5TfQ%40mail.gmail.com



Thank you for your suggestion! That works.

But what can we do with this function:
create or replace function parallel_test_sql(t int) returns setof bigint as
$$
   select count(*) from (select a,b,c,d,e,sum(bytes) from test where a>= $1 
group by a,b,c,d,e)t;
$$ language sql PARALLEL SAFE STRICT STABLE;

explain (analyze,buffers) select  * from  parallel_test_sql(2);

"Function Scan on parallel_test_sql  (cost=0.25..10.25 rows=1000 width=8) (actual 
time=2410.789..2410.790 rows=1 loops=1)"
"  Buffers: shared hit=63696"
"Planning time: 0.082 ms"
"Execution time: 2410.841 ms"

2016-09-20 14:09:04 MSK [13037]: [75-1] user=ipdr,db=ipdr,app=pgAdmin III - 
Query Tool,client=127.0.0.1 LOG:  duration: 2410.135 ms  plan:
Query Text:
   select count(*) from (select a,b,c,d,e,sum(bytes) from test where 
a>= $1 group by a,b,c,d,e)t;

Aggregate  (cost=230701.42..230701.43 rows=1 width=8)
  ->  HashAggregate  (cost=230363.59..230513.74 rows=15015 width=28)
Group Key: test.a, test.b, test.c, test.d, test.e
->  Seq Scan on test  (cost=0.00..188696.44 rows=372 
width=20)
      Filter: (a >= $1)


No parallelism again. It looks like the Filter: (a >= $1) is what breaks parallelism.
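
(A quick way to isolate the cause - not part of the original exchange - is to run
the same aggregate with a literal constant in place of $1. If this plan shows a
Gather node while the function version does not, the parameterized filter is what
keeps the planner from choosing a parallel plan:)

explain (analyze, buffers)
  select count(*)
    from (select a,b,c,d,e,sum(bytes)
            from test
           where a >= 2          -- literal instead of the $1 parameter
           group by a,b,c,d,e) t;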



Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept

2017-09-04 Thread Alex Ignatov


-Original Message-
From: pgsql-hackers-ow...@postgresql.org 
[mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Amit Kapila
Sent: Monday, September 4, 2017 3:32 PM
To: i.kartys...@postgrespro.ru
Cc: pgsql-hackers 
Subject: Re: [HACKERS] WIP: long transactions on hot standby feedback replica / 
proof of concept

On Mon, Sep 4, 2017 at 4:34 PM,   wrote:
> Our clients complain about this issue and therefore I want to raise 
> the discussion and suggest several solutions to this problem:
>
> I. Why does PG use FATAL when ERROR is enough to release the lock that
> raises the lock conflict?
> "If (RecoveryConflictPending && DoingCommandRead)"
>
> II. Do we really need to truncate the table on hot standby exactly at 
> the same time when truncate on master occurs?
>
> In my case conflict happens when the autovacuum truncates table tbl1 
> on master while backend on replica is performing a long transaction 
> involving the same table tbl1. This happens because truncate takes an 
> AccessExclusiveLock. To tackle this issue we have several options:
>
> 1. We can postpone the truncate on the master until all the replicas 
> have finished their transactions (in this case, feedback requests to 
> the master should be sent frequently) Patch 1 
> vacuum_lazy_truncate.patch
>
> 2. Maybe there is an option somehow not to send AccessExclusiveLock 
> and not to truncate table on the replica right away. We could try to 
> wait a little and truncate tbl1 on replica again.
>

Can max_standby_streaming_delay help in this situation (point number - 2)?


--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Hello!
In this situation this parameter (max_standby_streaming_delay) won't help if
you have subsequent statements on the standby (the following is from the
documentation and from our experience): "Thus, if one query has resulted in
significant delay, subsequent conflicting queries will have much less grace
time until the standby server has caught up again." And you never know how to
set this parameter except to -1, which means a standby that can be delayed
indefinitely.

In our experience it was only autovacuum on the master taking an
AccessExclusiveLock that raised this FATAL message on the standby. Once that
AccessExclusiveLock reaches the standby and max_standby_streaming_delay > -1,
sooner or later you will definitely get this FATAL during recovery.
With this patch we try to get rid of the AccessExclusiveLock being applied on
the standby while it has an active statement.
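
(For reference, these are the standby-side settings involved; the values below
are illustrative only, not recommendations from this thread:)

-- Run on the standby; both settings can be reloaded without a restart.
ALTER SYSTEM SET hot_standby_feedback = on;            -- report standby xmin to the primary
ALTER SYSTEM SET max_standby_streaming_delay = '30s';  -- -1 waits forever; any finite
                                                       -- value can still cancel long queries
SELECT pg_reload_conf();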



--
Alex Ignatov 
Postgres Professional: http://www.postgrespro.com 
The Russian Postgres Company



Re: [HACKERS] Remove Windows crash dump support?

2015-12-22 Thread Alex Ignatov

On 22.12.2015 18:28, Magnus Hagander wrote:



On Tue, Dec 22, 2015 at 3:53 PM, Craig Ringer wrote:


On 22 December 2015 at 22:50, Craig Ringer wrote:

Hi all

Back in 2010 I submitted a small feature to allow the creation
of minidumps when backends crashed; see
commit dcb09b595f88a3bca6097a6acc17bf2ec935d55f .

At the time Windows lacked useful support for postmortem
debugging and crash-dump management in the operating system
itself, especially for applications running as services.
That has since improved considerably.

The feature was also included in 9.4


Ahem. 9.1. This is what I get for multi-tasking between writing
this and packaging an extension for 9.4.


In which version(s) of Windows was this improvement added? I think 
that's really the part that matters here, not necessarily which 
version of PostgreSQL.



--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Hi all!
I think you have been able to debug crash dumps for as long as windbg has
existed.
Also, I think the number of Postgres installations on Windows is so tiny
because even today people think it is not as solid as the unix version;
that's why you think that nobody uses your code ;).


Today, if my memory serves me right, this code cannot deal with buffer
overflows. Am I right?
Maybe we need to add that functionality instead of dropping support
entirely?


--
Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



[HACKERS] Is pg_control file crashsafe?

2016-04-28 Thread Alex Ignatov
Hello everyone!
We have an issue with a truncated pg_control file on Windows after a power
failure. My questions are:
1) Is pg_control protected from, say, a power crash or a partial write?
2) How does PG update pg_control? By writing into it directly, or by writing
into some temp file and then renaming it to pg_control to be atomic?
3) Can PG keep multiple copies of pg_control to be more fault tolerant?

PS During some experiments we found that at present there is no method to do
crash recovery with a "restored" version of pg_control (based on some
manipulations with pg_resetxlog). Only by using pg_resetxlog and setting its
parameters to values taken from the wal file (pg_xlogdump) could we at least
start PG and see that PG's state is at the moment of the last checkpoint. But
we have no real confidence that PG is in a consistent state (the docs on
pg_resetxlog tell us about this too).

Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] Is pg_control file crashsafe?

2016-05-02 Thread Alex Ignatov



On 01.05.2016 0:55, Bruce Momjian wrote:

On Thu, Apr 28, 2016 at 09:58:00PM +, Alex Ignatov wrote:

Hello everyone!
We have some issue with truncated pg_control file on Windows after power
failure.
My questions are:
1) Is pg_control protected from, say, a power crash or a partial write?
2) How does PG update pg_control? By writing into it directly, or by writing into
some temp file and then renaming it to pg_control to be atomic?

We write pg_controldata in one write() OS call:

 if (write(fd, buffer, PG_CONTROL_SIZE) != PG_CONTROL_SIZE)


3) Can PG keep multiple copies of pg_control to be more fault tolerant?

PS During some experiments we found that at present time there is no any method
to do crash recovery with "restored" version of pg_control (based on some
manipulations with pg_resetxlog ).
  Only by using pg_resetxlog and setting it parameters to values taken from wal
file (pg_xlogdump)we can at least start PG and saw that PG state is at the
moment of last check point. But we have no real confidence that PG is in
consistent state(also docs on pg_resetxlogs told us about it too)

We have talked about improving the reliability of pg_control, but
failures are so rare we have never done anything to improve it.  I know
Tatsuo has talked about making pg_control more reliable, so I am CC'ing
him.


Oh! Good. Thank you!
It is rare, but as we have now seen, it is our reality too. One of our
customers had this issue just last week =)


I think that rename can help a little bit. At least on some filesystems it is
an atomic operation.


--
Alex Ignatov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company





Re: [HACKERS] Is pg_control file crashsafe?

2016-05-04 Thread Alex Ignatov





On 03.05.2016 2:21, Andres Freund wrote:

Hi,

On 2016-04-28 21:58:00 +, Alex Ignatov wrote:

We have some issue with truncated pg_control file on Windows after
power failure.My questions is : 1) Is pg_control protected from say ,
power crash or partial write?


It should be. I think to make progress on this thread we're going to
need a bit more detail about the exact corruption. Did the length of
the file change? Did the checksum fail? Did you just observe contents
that were too old?

Greetings,

Andres Freund




The length was 0 bytes after the crash. It was Windows with NTFS + SSD in
RAID 1. The file was zeroed after the power loss.


Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] Is pg_control file crashsafe?

2016-05-04 Thread Alex Ignatov



On 03.05.2016 2:17, Tom Lane wrote:

Alex Ignatov  writes:

I think that rename can help a little bit. At least on some FS it is
atomic operation.


Writing a single sector ought to be atomic too.  I'm very skeptical that
it'll be an improvement to just move the risk from one filesystem
operation to another; especially not to one where there's not even a
terribly portable way to request fsync.

regards, tom lane


pg_control is 8k long (I think that is the length of one page with the default
PG compile settings).
I am also not sure that an 8k write can be atomic. Even if writing one sector
is atomic, nobody can say which sector of the 8k pg_control record will be
written first. It could be the last sector, or say sector number 10 of 16.
That is why I mentioned renaming from a tmp file to pg_control. Renaming in a
filesystem is usually an atomic operation, so after a power loss we have
either the old version of pg_control or the new one, but not a torn pg_control
file.



Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] Is pg_control file crashsafe?

2016-05-06 Thread Alex Ignatov


On 06.05.2016 0:42, Greg Stark wrote:

On 5 May 2016 12:32 am, "Tom Lane" wrote:
 >
 > To repeat, I'm pretty hesitant to change this logic.  While this is not
 > the first report we've ever heard of loss of pg_control, I believe I
could
 > count those reports without running out of fingers on one hand --- and
 > that's counting since the last century. It will take quite a lot of
 > evidence to convince me that some other implementation will be more
 > reliable.  If you just come and present a patch to use direct write, or
 > rename, or anything else for that matter, I'm going to reject it out of
 > hand unless you provide very strong evidence that it's going to be more
 > reliable than the current code across all the systems we support.

One thing we could do without much worry of being less reliable would be
to keep two copies of pg_control. Write one, fsync, then write to the
other and fsync that one.

Oracle keeps a copy of the old control file so that you can always go
back to an older version if a hardware or software bug corrupts it. But
they keep a lot more data in their control file and they can be quite large.

Oracle can create more than one copy of the control file. They are identical
copies, not an old copy and a current one. And their advice is simply to store
these copies on separate storage to be more fault tolerant.


PS By the way, in my initial post about "is pg_control safe" I wrote, in point
3, some thoughts about multiple copies of the pg_control file. Glad to see we
share the same view on this issue.





Re: [HACKERS] Is pg_control file crashsafe?

2016-05-06 Thread Alex Ignatov


On 05.05.2016 7:16, Amit Kapila wrote:

On Wed, May 4, 2016 at 8:03 PM, Tom Lane wrote:
 >
 > Amit Kapila writes:
 > > On Wed, May 4, 2016 at 4:02 PM, Alex Ignatov wrote:
 > >> On 03.05.2016 2:17, Tom Lane wrote:
 > >>> Writing a single sector ought to be atomic too.
 >
 > >> pg_control is 8k long(i think it is legth of one page in default PG
 > >> compile settings).
 >
 > > The actual data written is always sizeof(ControlFileData) which
should be
 > > less than one sector.
 >
 > Yes.  We don't care what happens to the rest of the file as long as the
 > first sector's worth is updated atomically.  See the comments for
 > PG_CONTROL_SIZE and the code in ReadControlFile/WriteControlFile.
 >
 > We could change to a different PG_CONTROL_SIZE pretty easily, and there's
 > certainly room to argue that reducing it to 512 or 1024 would be more
 > efficient.  I think the motivation for setting it at 8K was basically
 > "we're already assuming that 8K writes are efficient, so let's assume
 > it here too".  But since the file is only written once per checkpoint,
 > efficiency is not really a key selling point anyway.  If you could make
 > an argument that some other size would reduce the risk of failures,
 > it would be interesting --- but I suspect any such argument would be
 > very dependent on the quirks of a specific file system.
 >

How about using 512 bytes as the write size and performing direct writes
rather than going via the OS buffer cache for the control file?  Alex, is the
issue reproducible (to ensure that if we try to solve it in some way, we
have a way to test it as well)?

 >
 > One point worth considering is that on most file systems, rewriting
 > a fraction of a page is *less* efficient than rewriting a full page,
 > because the kernel first has to read in the old contents to fill
 > the disk buffer it's going to partially overwrite with new data.
 > This motivates against trying to reduce the write size too much.
 >

Yes, you are very much right and I have observed that recently during my
work on WAL Re-Writes [1].  However, I think that won't be the issue if
we use direct writes for control file.


[1] -
http://www.postgresql.org/message-id/CAA4eK1+=O33dZZ=jbtjxbfyd67r5dlcqfyomj4f-qmfxbp1...@mail.gmail.com

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Hi!
No, the issue happened only once. Also, all attempts to reproduce it have been
unsuccessful so far.





Re: [HACKERS] pg_basebackup, pg_receivexlog and data durability (was: silent data loss with ext4 / all current versions)

2016-05-13 Thread Alex Ignatov


On 13.05.2016 9:39, Michael Paquier wrote:

Hi all,

Beginning a new thread because the ext4 issues are closed, and because
pg_basebackup data durability merits a new thread. In short, the
problem: pg_basebackup makes no effort to ensure that
the data it backs up is on disk, which is bad... One possible
recommendation is to use initdb -S after running pg_basebackup, but
making sure that data is on disk should be done before pg_basebackup
ends.

On Thu, May 12, 2016 at 8:09 PM, I wrote:

And actually this won't fly high if there is no equivalent of
walkdir() or if the fsync()'s are not applied recursively. On master
at least the refactoring had better be done cleanly first... For the
back branches, we could just have some recursive call like
fsync_recursively and keep that in src/bin/pg_basebackup. Andres, do
you think that this should be part of fe_utils or src/common/? I'd
tend to think the latter is more adapted as there is an equivalent in
the backend. On back-branches, we could just have something like
fsync_recursively that walks though the paths. An even more simple
approach would be to fsync() individually things that have been
written, but that would suck in performance.


So, attached are two patches that apply on HEAD to address the problem
of pg_basebackup that does not sync the data it writes. As
pg_basebackup cannot use directly initdb -S because, as a client-side
utility, it may be installed while initdb is not (see Fedora and
RHEL), I have refactored the code so as the routines in initdb.c doing
the fsync of PGDATA and other fsync stuff are in src/fe_utils/, and
this is 0001.

Patch 0002 is a set of fixes for pg_basebackup:
- In plain mode, fsync_pgdata is used so as all the tablespaces are
fsync'd at once. This takes care as well of the case where pg_xlog is
a symlink.
- In tar mode (no stdout), each tar file is synced individually, and
the base directory is synced once at the end.
In both cases, failures are not considered fatal.

With pg_basebackup -X and pg_receivexlog, the manipulation of WAL
files is made durable by using fsync and durable_rename where needed
(credits to Andres mainly for this part).

This set of patches is aimed only at HEAD. Back-patchable versions of
this patch would need to copy fsync_pgdata and friends into
streamutil.c for example.

I am adding that to the next CF for review as a bug fix.
Regards,





Hi!
Do we have any confidence that a data file is not corrupted, i.e. that it does
not contain some corrupted page? Can pg_basebackup check page checksums (for a
db initialized with initdb -k) while backing up the files?


Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




[HACKERS] Why we don't have checksums on clog files

2016-06-06 Thread Alex Ignatov

Hello!

Why don't we have checksums on clog files?

We have a checksum on pg_control, optional checksums on data files, and
some form of checksumming on the WAL. But why don't we have any
checksumming on clog? Corruption in clog leads to transaction
visibility problems and database consistency violations.


Can anybody explain this situation with clog?


--
Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] Bug in to_timestamp().

2016-06-20 Thread Alex Ignatov


On 20.06.2016 16:36, Tom Lane wrote:

Robert Haas  writes:

On Mon, Jun 13, 2016 at 12:25 PM, Robert Haas  wrote:

I think a space in the format string should skip a whitespace
character in the input string, but not a non-whitespace character.
It's my understanding that these functions exist in no small part for
compatibility with Oracle, and Oracle declines to skip the digit '1'
on the basis of an extra space in the format string, which IMHO is the
behavior any reasonable user would expect.

So Amul and I are of one opinion and Tom is of another.  Anyone else
have an opinion?

I don't necessarily have an opinion yet.  I would like to see more than
just an unsupported assertion about what Oracle's behavior is.  Also,
how should FM mode affect this?

regards, tom lane




Oracle:
SQL> SELECT TO_TIMESTAMP('2016-06-13 99:99:99', 'YYYYMMDD HH24:MI:SS')
from dual;

SELECT TO_TIMESTAMP('2016-06-13 99:99:99', 'YYYYMMDD HH24:MI:SS') from dual
*
ERROR at line 1:
ORA-01843: not a valid month

PG:

postgres=# SELECT TO_TIMESTAMP('2016-06-13 99:99:99', 'YYYYMMDD HH24:MI:SS');

      to_timestamp
------------------------
 2016-01-06 14:40:39+03
(1 row)


I know about:
"These functions interpret input liberally, with minimal error checking.
While they produce valid output, the conversion can yield unexpected
results" from the docs, but when we provide illegal input parameters we get
no exceptions or errors about it.
I think that to_timestamp() needs to do more format checking than it does
now.
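
(For comparison, the plain timestamp cast - which, as the quoted docs note,
does do range checking - rejects the same input; the output below is
approximate, reproduced from memory:)

postgres=# select '2016-06-13 99:99:99'::timestamp;
ERROR:  date/time field value out of range: "2016-06-13 99:99:99"
LINE 1: select '2016-06-13 99:99:99'::timestamp;
               ^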


Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company







Re: [HACKERS] Bug in to_timestamp().

2016-06-20 Thread Alex Ignatov


On 13.06.2016 18:52, amul sul wrote:

Hi,

It looks like a bug in the to_timestamp() function when the format string has
more whitespace than the input string; see below:

Ex.1: Two whitespaces before HH24 whereas one before the input time string

postgres=# SELECT TO_TIMESTAMP('2016-06-13 15:43:36', 'YYYY/MM/DD  HH24:MI:SS');
      to_timestamp
------------------------
 2016-06-13 05:43:36-07   <— incorrect time
(1 row)



Ex.2: One whitespace before the format string

postgres=# SELECT TO_TIMESTAMP('2016/06/13 15:43:36', ' YYYY/MM/DD HH24:MI:SS');
         to_timestamp
------------------------------
 0016-06-13 15:43:36-07:52:58  <— incorrect year
(1 row)



If there are one or more consecutive whitespace characters in the format, we
should skip them as long as we can still get an actual field.
Thoughts?
Thanks & Regards,
Amul Sul




From the docs about to_timestamp()
(https://www.postgresql.org/docs/9.5/static/functions-formatting.html):
"These functions interpret input liberally, with minimal error checking.
While they produce valid output, the conversion can yield unexpected
results. For example, input to these functions is not restricted by
normal ranges, thus to_date('20096040','YYYYMMDD') returns 2014-01-17
rather than causing an error. Casting does not have this behavior."


And it won't stop at some simple whitespace. By using to_timestamp you
can get arbitrary output by providing illegal input parameter values:


postgres=# SELECT TO_TIMESTAMP('2016-06-13 99:99:99', 'YYYYMMDD HH24:MI:SS');

      to_timestamp
------------------------
 2016-01-06 14:40:39+03
(1 row)


Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company






Re: [HACKERS] Bug in to_timestamp().

2016-06-23 Thread Alex Ignatov


On 23.06.2016 16:30, Bruce Momjian wrote:

On Thu, Jun 23, 2016 at 07:41:26AM +, amul sul wrote:

On Monday, 20 June 2016 8:53 PM, Alex Ignatov  wrote:



On 13.06.2016 18:52, amul sul wrote:

And it won't stop at some simple whitespace. By using to_timestamp you
can get arbitrary output by providing illegal input parameter values:
postgres=# SELECT TO_TIMESTAMP('2016-06-13 99:99:99', 'YYYYMMDD HH24:MI:SS');
      to_timestamp
------------------------
 2016-01-06 14:40:39+03
(1 row)

We do consume extra space from the input string, but not if it is in the
format string; see below:

postgres=# SELECT TO_TIMESTAMP('2016-06-13  15:43:36', 'YYYY/MM/DD HH24:MI:SS');
      to_timestamp
------------------------
 2016-06-13 15:43:36-07
(1 row)

We should have the same treatment for the format string too.

Thoughts? Comments?

Well, the user specifies the format string, while the input string comes
from the data, so I don't see having them behave the same as necessary.



To be honest, they don't just behave differently: to_timestamp simply handles
input data incorrectly, nothing else. There is no excuse for such behavior:


postgres=# SELECT TO_TIMESTAMP('20:-16-06:13: 15_43:!36', 'YYYY/MM/DD HH24:MI:SS');

         to_timestamp
------------------------------
 0018-08-05 13:15:43+02:30:17
(1 row)



Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] Bug in to_timestamp().

2016-06-23 Thread Alex Ignatov


On 23.06.2016 19:37, David G. Johnston wrote:
On Thu, Jun 23, 2016 at 12:16 PM, Alex Ignatov wrote:



On 23.06.2016 16:30, Bruce Momjian wrote:

On Thu, Jun 23, 2016 at 07:41:26AM +, amul sul wrote:

On Monday, 20 June 2016 8:53 PM, Alex Ignatov wrote:


On 13.06.2016 18:52, amul sul wrote:

And it won't stop at some simple whitespace. By using to_timestamp you
can get arbitrary output by providing illegal input parameter values:
postgres=# SELECT TO_TIMESTAMP('2016-06-13 99:99:99', 'YYYYMMDD HH24:MI:SS');
      to_timestamp
------------------------
 2016-01-06 14:40:39+03
(1 row)

We do consume extra space from the input string, but not if it
is in the format string; see below:

postgres=# SELECT TO_TIMESTAMP('2016-06-13  15:43:36', 'YYYY/MM/DD HH24:MI:SS');
      to_timestamp
------------------------
 2016-06-13 15:43:36-07
(1 row)

We should have same treatment for format string too.

Thoughts? Comments?

Well, the user specifies the format string, while the input
string comes
from the data, so I don't see having them behave the same as
necessary.


To be honest, they don't just behave differently: to_timestamp simply
handles input data incorrectly, nothing else. There is no
excuse for such behavior:

postgres=# SELECT TO_TIMESTAMP('20:-16-06:13: 15_43:!36', 'YYYY/MM/DD HH24:MI:SS');
         to_timestamp
------------------------------
 0018-08-05 13:15:43+02:30:17
(1 row)


To be honest I don't see how this is relevant to the quoted content.  And
you've already made this point quite clearly - repeating it isn't
constructive.  This behavior has existed for a long time and I don't
see that changing it is a worthwhile endeavor.  I believe a new
function is required that has saner behavior. Otherwise, given good
input and a well-formed parse string, the function does exactly what it
is designed to do.  Avoid giving it garbage and you will be fine.
Maybe wrap the call in a function that also checks for the
expected layout and RAISEs an EXCEPTION if it doesn't match.


David J.

Arguing like that, one could say that we don't even need an exception like
"division by zero" - just use well-formed numbers in the denominator...
Input data can sometimes be generated automatically. Without an exception
being thrown, debugging a stored function containing to_timestamp can be
painful.
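
(For illustration, a minimal sketch of the kind of wrapper David suggests; the
function name and the fixed layout are assumptions, not something from this
thread:)

create or replace function to_timestamp_strict(val text)
returns timestamptz language plpgsql stable strict as
$$
declare
    ts timestamptz;
begin
    ts := to_timestamp(val, 'YYYY-MM-DD HH24:MI:SS');
    -- reject anything that does not round-trip through the expected layout
    if to_char(ts, 'YYYY-MM-DD HH24:MI:SS') <> val then
        raise exception 'input "%" does not match layout YYYY-MM-DD HH24:MI:SS', val;
    end if;
    return ts;
end;
$$;

-- select to_timestamp_strict('2016-06-13 15:43:36');  -- returns the timestamp
-- select to_timestamp_strict('2016-06-13 99:99:99');  -- raises an exception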




Re: [HACKERS] Bug in to_timestamp().

2016-06-24 Thread Alex Ignatov


On 23.06.2016 20:40, Tom Lane wrote:

Robert Haas  writes:

On Thu, Jun 23, 2016 at 1:12 PM, David G. Johnston
 wrote:

My understanding is that is not going to change for 9.6.

That's exactly what is under discussion here.

I would definitely agree with David on that point.  Making to_timestamp
noticeably better on this score seems like a nontrivial project, and
post-beta is not the time for that sort of thing, even if we had full
consensus on what to do.  I'd suggest somebody work on a patch and put
it up for review in the next cycle.

Now, if you were to narrowly define the problem as "whether to skip
non-spaces for a space in the format", maybe that could be fixed
post-beta, but I think that's a wrongheaded approach.  to_timestamp's
issues with input that doesn't match the format are far wider than that.
IMO we should try to resolve the whole problem with one coherent change,
not make incremental incompatible changes at the margins.

At the very least I'd want to see a thought-through proposal that
addresses all three of these interrelated points:

* what should a space in the format match
* what should a non-space, non-format-code character in the format match
* how should we handle fields that are not exactly the width suggested
by the format

regards, tom lane


Totally agree that we need more discussion about error handling in this 
function!


Also, this behavior is observed in the to_date() and to_number() functions:

postgres=# SELECT TO_DATE('2!0!1!6!0!6-/-/-/-/-/-/-1!/-/-/-/-/-/-/-3!', 'YYYY-MM-DD');

  to_date
------------
 0002-01-01
(1 row)

postgres=# select to_number('1$#@!!,2,%,%4,5,@%5@4..8-', '999G999D9S');

 to_number
-----------
        12
(1 row)

On our side we have had some discussions about writing a patch that
will change this incorrect behavior. So stay tuned.


--

Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] Bug in to_timestamp().

2016-06-24 Thread Alex Ignatov


Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

On 20.06.2016 17:09, Albe Laurenz wrote:

Tom Lane wrote:

I don't necessarily have an opinion yet.  I would like to see more than
just an unsupported assertion about what Oracle's behavior is.  Also,
how should FM mode affect this?

I can supply what Oracle 12.1 does:

SQL> SELECT to_timestamp('2016-06-13 15:43:36', ' YYYY/MM/DD HH24:MI:SS') AS ts
FROM dual;

TS
-------------------------------
2016-06-13 15:43:36.0 AD

SQL> SELECT to_timestamp('2016-06-13 15:43:36', 'YYYY/MM/DD  HH24:MI:SS') AS ts
FROM dual;

TS
-------------------------------
2016-06-13 15:43:36.0 AD

SQL> SELECT to_timestamp('2016-06-1315:43:36', 'YYYY/MM/DD  HH24:MI:SS') AS
ts FROM dual;

TS
-------------------------------
2016-06-13 15:43:36.0 AD

(to_timestamp_tz behaves the same way.)

So Oracle seems to make no difference between one or more spaces.

Yours,
Laurenz Albe

Guys, do we need to change this behavior, or maybe you can tell me that it
is normal, given this:


postgres=# SELECT TO_TIMESTAMP('2016-02-30 15:43:36', 'YYYY-MM-DD HH24:MI:SS');

      to_timestamp
------------------------
 2016-03-01 15:43:36+03
(1 row)

but on the other side we have:

postgres=# select '2016-02-30 15:43:36'::timestamp;
ERROR:  date/time field value out of range: "2016-02-30 15:43:36"
LINE 1: select '2016-02-30 15:43:36'::timestamp;

Another bug in to_timestamp()/to_date()?




[HACKERS] Strange behavior of some volatile function like random(), nextval()

2016-06-29 Thread Alex Ignatov

Hello!

Got some strange behavior of random() function:

postgres=# select (select random() ) from generate_series(1,10) as i;
  random
---
 0.831577288918197
 0.831577288918197
 0.831577288918197
 0.831577288918197
 0.831577288918197
 0.831577288918197
 0.831577288918197
 0.831577288918197
 0.831577288918197
 0.831577288918197
(10 rows)

postgres=# select (select random()+i*0 ) from generate_series(1,10) as i;
  ?column?

   0.97471913928166
 0.0532126761972904
  0.331358563620597
 0.0573496259748936
  0.321165383327752
   0.48836630070582
  0.444201893173158
 0.0729857799597085
  0.661443184129894
  0.706566562876105
(10 rows)

postgres=# explain select (select random() ) from generate_series(1,10) 
as i;

QUERY PLAN
--
 Function Scan on generate_series i  (cost=0.02..10.01 rows=1000 width=0)
   InitPlan 1 (returns $0)
 ->  Result  (cost=0.00..0.01 rows=1 width=0)
(3 rows)

postgres=# explain select (select random()+i*0 ) from 
generate_series(1,10) as i;

QUERY PLAN
--
 Function Scan on generate_series i  (cost=0.00..30.00 rows=1000 width=4)
   SubPlan 1
 ->  Result  (cost=0.00..0.02 rows=1 width=0)
(3 rows)

postgres=# \df+ random();
List of functions
   Schema   |  Name  | Result data type | Argument data types | Type  | 
Security | Volatility |  Owner   | Language | Source code | Description

++--+-++--++--+--+-+--
 pg_catalog | random | double precision | | normal 
| invoker  | volatile   | postgres | internal | drandom | random value

(1 row)


Also:

postgres=# create sequence test;
CREATE SEQUENCE
postgres=# SELECT (SELECT nextval('test')) FROM generate_series(1,10) as i;
 nextval
-
   1
   1
   1
   1
   1
   1
   1
   1
   1
   1
(10 rows)

postgres=# SELECT (SELECT nextval('test')+i*0) FROM 
generate_series(1,10) as i;

 ?column?
--
2
3
4
5
6
7
8
9
   10
   11
(10 rows)


postgres=# \df+ nextval() ;

List of functions
   Schema   |  Name   | Result data type | Argument data types | Type  
| Security | Volatility |  Owner   | Language | Source code | 
Description

+-+--+-++--++--+--+-+-
 pg_catalog | nextval | bigint   | regclass| normal 
| invoker  | volatile   | postgres | internal | nextval_oid | sequence 
next value

(1 row)


Both functions are volatile, so from the docs:

"A VOLATILE function can do anything, including modifying the database. 
It can return different results on successive calls with the same 
arguments. The optimizer makes no assumptions about the behavior of such 
functions. A query using a volatile function will re-evaluate the 
function at every row where its value is needed."


Something wrong with executor? Is it bug or executor feature related 
with  subquery?


--
Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] Strange behavior of some volatile function like random(), nextval()

2016-06-29 Thread Alex Ignatov


On 29.06.2016 15:30, David G. Johnston wrote:

More specifically...
On Wed, Jun 29, 2016 at 7:34 AM, Michael Paquier wrote:


On Wed, Jun 29, 2016 at 7:43 PM, Alex Ignatov wrote:
> Hello!
>
> Got some strange behavior of random() function:
>
> postgres=# select (select random() ) from generate_series(1,10)
as i;
>   random
> ---
>  0.831577288918197
> [...]
> (10 rows)

I recall that this is treated as an implicit LATERAL, meaning that
random() is calculated only once.


A non-correlated (i.e., does not refer to outer variables) subquery 
placed into the target-list need only have its value computed once - 
so that is what happens.  The fact that a volatile function can return 
different values given the same arguments doesn't mean much when the 
function is only ever called a single time.



> postgres=# select (select random()+i*0 ) from
generate_series(1,10) as i;
>   ?column?
> 
>0.97471913928166
> [...]
> (10 rows)

But not that. So those results do not surprise me.


A correlated subquery, on the other hand, has to be called once for
every row and is evaluated within the context supplied by said row.
Each time random() is called it returns a new value.


Section 4.2.11 (9.6 docs)
https://www.postgresql.org/docs/9.6/static/sql-expressions.html#SQL-SYNTAX-SCALAR-SUBQUERIES

Maybe this could be worded better but the first part talks about a 
single execution while "any one execution" is mentioned in reference 
to "the surrounding query".


I do think that defining "correlated" and "non-correlated" subqueries
within this section would be worthwhile.


David J.


In this subquery (below) we have a reference to outer variables, but it is
not working as it should (or I don't understand something):


postgres=# select id, ( select string_agg('a','') from 
generate_series(1,trunc(10*random()+1)::int) where id=id) from 
generate_series(1,10) as id;

 id | string_agg
+
  1 | aaa
  2 | aaa
...
but this query (with a reference to the outer var) works perfectly:
postgres=# select id,(select random() where id=id) from 
generate_series(1,10) as id;

 id |   random
+
  1 |  0.974509597290307
  2 |  0.219822214450687
...

Also this query works fine ((id-id) does the job):
postgres=# select id, ( select string_agg('a','') from 
generate_series(1,trunc(10*random()+1)::int+(id-id)) ) from 
generate_series(1,10) as id;

 id | string_agg
+
  1 | aaa
  2 | a
...

It means that even a reference to outer variables doesn't guarantee that the
executor executes the volatile function in the subquery every time. Or is
there something else I should know?



Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] Why we lost Uber as a user

2016-07-28 Thread Alex Ignatov


On 28.07.2016 17:53, Vladimir Sitnikov wrote:



>> That's a recipe for runaway table bloat; VACUUM can't do much
because
>> there's always some minutes-old transaction hanging around (and
SNAPSHOT
>> TOO OLD doesn't really help, we're talking about minutes here), and
>> because of all of the indexes HOT isn't effective.


Just curious: what if PostgreSQL supported an index that stores the "primary
key" (or unique key) instead of tids?
Am I right that that kind of index would not suffer from this bloat? I'm
assuming the primary key is not updated, thus secondary indices built
that way should be much less prone to bloat when updates land on
other columns (even if the tid moves, the row's PK does not change, thus the
secondary index row could be reused).


If that works, it could reduce index bloat and reduce the amount of WAL
(fewer indices will need to be updated). Of course it will make index scans
a bit slower; however, it looks like at least Uber is fine with that
extra index scan cost.


Does it make sense to implement that kind of index as an access method?

Vladimir


You mean an IOT (index-organized table), like Oracle has?

Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company