Re: [HACKERS] Block level parallel vacuum WIP
On 23.08.2016 15:41, Michael Paquier wrote:
> On Tue, Aug 23, 2016 at 8:02 PM, Masahiko Sawada wrote:
>> As for the PoC, I implemented parallel vacuum so that each worker processes both phases 1 and 2 for a particular block range. Suppose we vacuum a 1000-block table with 4 workers: each worker processes 250 consecutive blocks in phase 1 and then reclaims dead tuples from the heap and indexes (phase 2).
> So each worker is assigned a range of blocks, and processes them in parallel? This does not sound good performance-wise. I recall emails from Robert and Amit on this matter for sequential scan, saying that this would suck performance out, particularly for rotating disks.

Rotating disks are not a problem: you can always put them in RAID, etc. An 8K allocation per relation once per half an hour, that is the problem. A seq scan done this way = a random scan...

Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
[HACKERS] Parallel sec scan in plpgsql
Hello!
Does parallel seq scan work in plpgsql?

--
Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] Parallel sec scan in plpgsql
Aggregate  (cost=289035.43..289035.44 rows=1 width=8)
  ->  HashAggregate  (cost=288697.59..288847.74 rows=15015 width=28)
        Group Key: test.a, test.b, test.c, test.d, test.e
        ->  Seq Scan on test  (cost=0.00..163696.15 rows=1115 width=20)

So, as we can see, parallel seq scan doesn't work in plpgsql and SQL functions. Can somebody explain to me where I went wrong?

Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

On 16.09.2016 07:27, Ashutosh Bapat wrote:
> On Thu, Sep 15, 2016 at 9:15 PM, Alex Ignatov wrote:
>> Hello! Does parallel seq scan work in plpgsql?
> Parallel seq scan is a query optimization that will work independently of the source of the query - i.e. whether it comes directly from a client or from a procedural language like plpgsql. So, I guess, the answer to your question is yes. If you are expecting something else, more context will help.
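The test table used throughout this thread is never shown. A minimal sketch that reproduces a similar setup (hypothetical DDL; column types and row count are guesses, not from the thread):

create table test(a int, b int, c int, d int, e int, bytes bigint);
insert into test(a, b, c, d, e, bytes)
select i % 5, i % 7, i % 11, i % 13, i % 15, (random() * 1000)::bigint
from generate_series(1, 1000000) as i;   -- row count chosen only for illustration
analyze test;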
Re: [HACKERS] Parallel sec scan in plpgsql
On 16.09.2016 16:50, Amit Kapila wrote:
> On Fri, Sep 16, 2016 at 6:57 PM, Alex Ignatov wrote:
>> No, it doesn't. Neither SQL functions nor plpgsql get parallelized. Here is an example:
>>
>> ipdr=> show max_worker_processes ;
>>  max_worker_processes
>> ----------------------
>>  128
>> (1 row)
>>
>> ipdr=> set max_parallel_workers_per_gather to 128;
>> SET
>> ipdr=> set force_parallel_mode=on;
>> SET
>> ipdr=> set min_parallel_relation_size =0;
>> SET
>> ipdr=> set parallel_tuple_cost=0;
>> SET
>
> Can you try by setting force_parallel_mode = off;? I think it is sending the whole function execution to the worker due to force_parallel_mode.

No changes:

ipdr=> set max_parallel_workers_per_gather to 128;
SET
ipdr=> set min_parallel_relation_size =0;
SET
ipdr=> set parallel_tuple_cost=0;
SET
ipdr=> set force_parallel_mode = off;
SET
ipdr=> select name,setting from pg_settings where name in('max_parallel_workers_per_gather',
ipdr(>                                                    'min_parallel_relation_size',
ipdr(>                                                    'parallel_tuple_cost',
ipdr(>                                                    'force_parallel_mode');
              name               | setting
---------------------------------+---------
 force_parallel_mode             | off
 max_parallel_workers_per_gather | 128
 min_parallel_relation_size      | 0
 parallel_tuple_cost             | 0
(4 rows)

ipdr=> explain (analyze,buffers) select count(*) from (select a,b,c,d,e,sum(bytes) from test group by a,b,c,d,e)t;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Aggregate  (cost=87702.33..87702.34 rows=1 width=8) (actual time=709.643..709.643 rows=1 loops=1)
   Buffers: shared hit=65015
   ->  Finalize HashAggregate  (cost=87364.49..87514.64 rows=15015 width=28) (actual time=706.382..708.456 rows=15015 loops=1)
         Group Key: test.a, test.b, test.c, test.d, test.e
         Buffers: shared hit=65015
         ->  Gather  (cost=85149.78..85299.93 rows=165165 width=20) (actual time=478.626..645.209 rows=180180 loops=1)
               Workers Planned: 11
               Workers Launched: 11
               Buffers: shared hit=65015
               ->  Partial HashAggregate  (cost=84149.78..84299.93 rows=15015 width=20) (actual time=473.890..478.309 rows=15015 loops=12)
                     Group Key: test.a, test.b, test.c, test.d, test.e
                     Buffers: shared hit=63695
                     ->  Parallel Seq Scan on test  (cost=0.00..72786.01 rows=909101 width=20) (actual time=0.021..163.120 rows=83 loops=12)
                           Buffers: shared hit=63695
 Planning time: 0.318 ms
 Execution time: 710.600 ms
(16 rows)

ipdr=> explain (analyze,buffers) select parallel_test_plpgsql();
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Result  (cost=0.00..0.26 rows=1 width=8) (actual time=4003.719..4003.720 rows=1 loops=1)
   Buffers: shared hit=63869
 Planning time: 0.021 ms
 Execution time: 4003.769 ms
(4 rows)

auto_explain:
2016-09-16 18:02:29 MSK [29353]: [53-1] user=ipdr,db=ipdr,app=psql,client=[local] LOG:  duration: 4001.275 ms  plan:
        Query Text: select count(*) from (select a,b,c,d,e,sum(bytes) from test group by a,b,c,d,e)t
        Aggregate  (cost=289035.43..289035.44 rows=1 width=8)
          ->  HashAggregate  (cost=288697.59..288847.74 rows=15015 width=28)
                Group Key: test.a, test.b, test.c, test.d, test.e
                ->  Seq Scan on test  (cost=0.00..163696.15 rows=1115 width=20)
2016-09-16 18:02:29 MSK [29353]: [54-1] user=ipdr,db=ipdr,app=psql,client=[local] CONTEXT:  SQL statement "select count(*) from (select a,b,c,d,e,sum(bytes) from test group by a,b,c,d,e)t"
        PL/pgSQL function parallel_test_plpgsql() line 5 at SQL statement

ipdr=> explain (analyze,buffers) select parallel_test_plpgsql();
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Result  (cost=0.00..0.26 rows=1 width=8) (actual time=4497.820..4497.822 rows=1 loops=1)
   Buffers: shared hit=63695
 Planning time: 0.023 ms
 Execution time: 4497.872 ms
(4 rows)

auto_explain:
2016-09-16 18:03:23 MSK [29353]: [57-1] user=ipdr,db=ipdr,app=psql,client=[local] LOG:  duration: 4497.050 ms  plan:
        Query Text: select count(*) from (select a,b,c,d,e,sum(bytes) from test group by a,b,c,d,e)t
        Aggregate  (cost=289035.43..289035.44 rows=1 width=8)
          ->  HashAggregate  (cost=288697.59..288847.74 rows=15015 width=28)
                Group Key: test.a, test.b, test.c, test.d, test.e
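The body of parallel_test_plpgsql() is not shown in the thread. Based on the query text and function name reported by auto_explain above, a minimal sketch of what it presumably looked like (hypothetical, for reproduction only):

create or replace function parallel_test_plpgsql() returns bigint as $$
declare
    result bigint;
begin
    -- the statement reported in the auto_explain CONTEXT above
    select count(*) into result
      from (select a,b,c,d,e,sum(bytes) from test group by a,b,c,d,e) t;
    return result;
end;
$$ language plpgsql PARALLEL SAFE STRICT STABLE;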
Re: [HACKERS] Parallel sec scan in plpgsql
On 18.09.2016 06:54, Amit Kapila wrote:
> On Fri, Sep 16, 2016 at 8:48 PM, Alex Ignatov wrote:
>> On 16.09.2016 16:50, Amit Kapila wrote:
>>> Can you try by setting force_parallel_mode = off;? I think it is sending the whole function execution to the worker due to force_parallel_mode.
>> No changes:
>
> Okay, it just skipped from my mind that we don't support parallel queries for SQL statement execution (or statements executed via exec_stmt_execsql) from plpgsql. For a detailed explanation of why that is not feasible you can refer to one of my earlier e-mails [1] on a similar topic. I think if we can somehow get the results via a Perform statement, then it could be possible to use parallelism via plpgsql. However, you can use it via SQL functions; an example is below:
>
> set min_parallel_relation_size = 0;
> set parallel_tuple_cost = 0;
> set parallel_setup_cost = 0;
> Load 'auto_explain';
> set auto_explain.log_min_duration = 0;
> set auto_explain.log_analyze = true;
> set auto_explain.log_nested_statements = true;
>
> create table test_plpgsql(c1 int, c2 char(1000));
> insert into test_plpgsql values(generate_series(1,10),'aaa');
>
> create or replace function parallel_test_set_sql() returns setof bigint as $$
>     select count(*) from test_plpgsql;
> $$ language sql PARALLEL SAFE STRICT STABLE;
>
> Then execute the function as: select * from parallel_test_set_sql();
>
> You can see the below plan if the auto_explain module is loaded.
>
> Finalize Aggregate  (cost=14806.85..14806.86 rows=1 width=8) (actual time=1094.966..1094.967 rows=1 loops=1)
>   ->  Gather  (cost=14806.83..14806.84 rows=2 width=8) (actual time=472.216..1094.943 rows=3 loops=1)
>         Workers Planned: 2
>         Workers Launched: 2
>         ->  Partial Aggregate  (cost=14806.83..14806.84 rows=1 width=8) (actual time=177.867..177.868 rows=1 loops=3)
>               ->  Parallel Seq Scan on test_plpgsql  (cost=0.00..14702.67 rows=41667 width=0) (actual time=0.384..142.565 rows=3 loops=3)
> CONTEXT:  SQL function "parallel_test_set_sql" statement 1
> LOG:  duration: 2965.040 ms  plan:
>         Query Text: select * from parallel_test_set_sql();
>         Function Scan on parallel_test_set_sql  (cost=0.25..10.25 rows=1000 width=8) (actual time=2538.620..2776.955 rows=1 loops=1)
>
> In general, I think we should support the cases as required (or written) by you from plpgsql or SQL functions. We need more work to support such cases. There are probably two ways of supporting them: we can build some intelligence into plpgsql execution so that it can recognise such queries and allow them to use parallelism, or we need to think about enabling parallelism for cases where we don't run the plan to completion. Most of the use cases from plpgsql or SQL functions fall into the latter category, as they don't generally run the plan to completion.
>
> [1] - https://www.postgresql.org/message-id/CAA4eK1K8kaO_jRk42-o2rmhSRbKV-3mR%2BiNVcONLdbcSXW5TfQ%40mail.gmail.com

Thank you for your suggestion! That works.
But what can we do with this function:

create or replace function parallel_test_sql(t int) returns setof bigint as $$
    select count(*) from (select a,b,c,d,e,sum(bytes) from test where a >= $1 group by a,b,c,d,e) t;
$$ language sql PARALLEL SAFE STRICT STABLE;

explain (analyze,buffers) select * from parallel_test_sql(2);

"Function Scan on parallel_test_sql  (cost=0.25..10.25 rows=1000 width=8) (actual time=2410.789..2410.790 rows=1 loops=1)"
"  Buffers: shared hit=63696"
"Planning time: 0.082 ms"
"Execution time: 2410.841 ms"

2016-09-20 14:09:04 MSK [13037]: [75-1] user=ipdr,db=ipdr,app=pgAdmin III - Query Tool,client=127.0.0.1 LOG:  duration: 2410.135 ms  plan:
        Query Text: select count(*) from (select a,b,c,d,e,sum(bytes) from test where a >= $1 group by a,b,c,d,e) t;
        Aggregate  (cost=230701.42..230701.43 rows=1 width=8)
          ->  HashAggregate  (cost=230363.59..230513.74 rows=15015 width=28)
                Group Key: test.a, test.b, test.c, test.d, test.e
                ->  Seq Scan on test  (cost=0.00..188696.44 rows=372 width=20)
                      Filter: (a >= $1)

No parallelism again. It looks like the Filter: (a >= $1) breaks parallelism.

Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
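A quick diagnostic (not from the thread) is to run the same statement directly with a literal in place of $1; if that produces a parallel plan like the earlier direct EXPLAIN did, then it is the parameter reference itself that blocks parallelism here:

explain (analyze, buffers)
select count(*)
  from (select a,b,c,d,e,sum(bytes) from test where a >= 2 group by a,b,c,d,e) t;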
Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept
-----Original Message-----
From: pgsql-hackers-ow...@postgresql.org On Behalf Of Amit Kapila
Sent: Monday, September 4, 2017 3:32 PM
To: i.kartys...@postgrespro.ru
Cc: pgsql-hackers
Subject: Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept

On Mon, Sep 4, 2017 at 4:34 PM, wrote:
> Our clients complain about this issue and therefore I want to raise
> the discussion and suggest several solutions to this problem:
>
> I. Why does PG use Fatal when Error is enough to release the lock that
> raises the lock conflict?
> "If (RecoveryConflictPending && DoingCommandRead)"
>
> II. Do we really need to truncate the table on the hot standby at exactly
> the same time as the truncate on the master occurs?
>
> In my case the conflict happens when autovacuum truncates table tbl1
> on the master while a backend on the replica is performing a long transaction
> involving the same table tbl1. This happens because truncate takes an
> AccessExclusiveLock. To tackle this issue we have several options:
>
> 1. We can postpone the truncate on the master until all the replicas
> have finished their transactions (in this case, feedback requests to
> the master should be sent frequently) Patch 1
> vacuum_lazy_truncate.patch
>
> 2. Maybe there is an option somehow not to send the AccessExclusiveLock
> and not to truncate the table on the replica right away. We could try to
> wait a little and truncate tbl1 on the replica again.

Can max_standby_streaming_delay help in this situation (point number 2)?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Hello!
In this situation this parameter (max_standby_streaming_delay) won't help, because if you have subsequent statements on the standby (the following is from the documentation and from our experience): "Thus, if one query has resulted in significant delay, subsequent conflicting queries will have much less grace time until the standby server has caught up again." And you never know how to set this parameter except to -1, which means a standby delayed up to infinity.
In our experience it was only autovacuum on the master that took an AccessExclusiveLock and raised this FATAL message on the standby. Once that AccessExclusiveLock has reached the standby and max_standby_streaming_delay > -1, you will definitely, sooner or later, get this FATAL during recovery.
With this patch we try to get rid of the AccessExclusiveLock applied on the standby while we have an active statement on it.

--
Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
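For context, a hedged sketch of the standby-side settings under discussion (the values are purely illustrative, not recommendations):

-- on the standby: how long conflicting queries may delay WAL replay (-1 = wait forever)
ALTER SYSTEM SET max_standby_streaming_delay = '30s';
-- report the standby's oldest snapshot to the primary so vacuum keeps rows it still needs
ALTER SYSTEM SET hot_standby_feedback = on;
SELECT pg_reload_conf();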
Re: [HACKERS] Remove Windows crash dump support?
On 22.12.2015 18:28, Magnus Hagander wrote:
> On Tue, Dec 22, 2015 at 3:53 PM, Craig Ringer wrote:
>> On 22 December 2015 at 22:50, Craig Ringer wrote:
>>> Hi all
>>> Back in 2010 I submitted a small feature to allow the creation of minidumps when backends crashed; see commit dcb09b595f88a3bca6097a6acc17bf2ec935d55f . At the time Windows lacked useful support for postmortem debugging and crash-dump management in the operating system itself, especially for applications running as services. That has since improved considerably. The feature was also included in 9.4
>> Ahem. 9.1. This is what I get for multi-tasking between writing this and packaging an extension for 9.4.
>
> In which version(s) of Windows was this improvement added? I think that's really the part that matters here, not necessarily which version of PostgreSQL.
>
> --
> Magnus Hagander
> Me: http://www.hagander.net/
> Work: http://www.redpill-linpro.com/

Hi all!
I think you have been able to debug crash dumps for as long as windbg has existed. Also, I think the number of Postgres installations on Windows is so tiny because even today people think it is not as solid as the Unix version; that's why you think that nobody uses your code ;). Today, if my memory serves me right, this code cannot deal with a buffer overflow. Am I right? Maybe we need to add this functionality instead of dropping support for it entirely?

--
Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
[HACKERS] Is pg_control file crashsafe?
Hello everyone!

We have an issue with a truncated pg_control file on Windows after a power failure. My questions are:
1) Is pg_control protected from, say, a power crash or a partial write?
2) How does PG update pg_control? By writing into it directly, or by writing into some temp file and then renaming it to pg_control to be atomic?
3) Can PG keep multiple copies of pg_control to be more fault tolerant?

PS During some experiments we found that at present there is no method to do crash recovery with a "restored" version of pg_control (based on some manipulations with pg_resetxlog). Only by using pg_resetxlog and setting its parameters to values taken from the WAL file (pg_xlogdump) can we at least start PG and see that PG's state is at the moment of the last checkpoint. But we have no real confidence that PG is in a consistent state (the docs on pg_resetxlog tell us about that too).

Alex Ignatov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
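As a side note (not from the thread): on 9.6 and later, the contents of pg_control can also be inspected from SQL, which gives a quick sanity check after a recovery attempt:

select * from pg_control_system();      -- pg_control version, system identifier, last modification time
select * from pg_control_checkpoint();  -- last checkpoint and REDO locations, timeline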
Re: [HACKERS] Is pg_control file crashsafe?
On 01.05.2016 0:55, Bruce Momjian wrote:
> On Thu, Apr 28, 2016 at 09:58:00PM +, Alex Ignatov wrote:
>> Hello everyone!
>> We have an issue with a truncated pg_control file on Windows after a power failure. My questions are:
>> 1) Is pg_control protected from, say, a power crash or a partial write?
>> 2) How does PG update pg_control? By writing into it directly, or by writing into some temp file and then renaming it to pg_control to be atomic?
>
> We write pg_controldata in one write() OS call:
>
>     if (write(fd, buffer, PG_CONTROL_SIZE) != PG_CONTROL_SIZE)
>
>> 3) Can PG keep multiple copies of pg_control to be more fault tolerant?
>>
>> PS During some experiments we found that at present there is no method to do crash recovery with a "restored" version of pg_control (based on some manipulations with pg_resetxlog). Only by using pg_resetxlog and setting its parameters to values taken from the WAL file (pg_xlogdump) can we at least start PG and see that PG's state is at the moment of the last checkpoint. But we have no real confidence that PG is in a consistent state (the docs on pg_resetxlog tell us about that too).
>
> We have talked about improving the reliability of pg_control, but failures are so rare we have never done anything to improve it. I know Tatsuo has talked about making pg_control more reliable, so I am CC'ing him.

Oh! Good. Thank you!
It is rare, but as we have now seen, it is our reality too. One of our customers hit this issue just last week =)
I think that a rename could help a little bit. At least on some filesystems it is an atomic operation.

--
Alex Ignatov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
Re: [HACKERS] Is pg_control file crashsafe?
On 03.05.2016 2:21, Andres Freund wrote:
> Hi,
> On 2016-04-28 21:58:00 +, Alex Ignatov wrote:
>> We have an issue with a truncated pg_control file on Windows after a power failure. My questions are:
>> 1) Is pg_control protected from, say, a power crash or partial write?
> It should be. I think to make progress on this thread we're going to need a bit more detail about the exact corruption. Was the length of the file changed? Did the checksum fail? Did you just observe too-old contents?
> Greetings,
> Andres Freund

The length was 0 bytes after the crash. It was Windows with NTFS + SSD in RAID 1. The file was zeroed after the power loss.

Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] Is pg_control file crashsafe?
On 03.05.2016 2:17, Tom Lane wrote:
> Alex Ignatov writes:
>> I think that rename can help a little bit. At least on some FS it is atomic operation.
> Writing a single sector ought to be atomic too. I'm very skeptical that it'll be an improvement to just move the risk from one filesystem operation to another; especially not to one where there's not even a terribly portable way to request fsync.
> regards, tom lane

pg_control is 8K long (I think that is the length of one page with the default PG compile settings). I also doubt that an 8K write can be atomic. Even if writing one sector is atomic, nobody can say which sector of the 8K pg_control record will be written first. It could be the last sector, or, say, sector number 10 of 16. That is why I mentioned renaming a temp file to pg_control. Renaming in a filesystem is usually an atomic operation, so after a power loss we would have either the old version of pg_control or the new one, but not a torn pg_control file.

Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] Is pg_control file crashsafe?
On 06.05.2016 0:42, Greg Stark wrote:
> On 5 May 2016 12:32 am, "Tom Lane" wrote:
>> To repeat, I'm pretty hesitant to change this logic. While this is not
>> the first report we've ever heard of loss of pg_control, I believe I could
>> count those reports without running out of fingers on one hand --- and
>> that's counting since the last century. It will take quite a lot of
>> evidence to convince me that some other implementation will be more
>> reliable. If you just come and present a patch to use direct write, or
>> rename, or anything else for that matter, I'm going to reject it out of
>> hand unless you provide very strong evidence that it's going to be more
>> reliable than the current code across all the systems we support.
>
> One thing we could do without much worry of being less reliable would be to keep two copies of pg_control. Write one, fsync, then write to the other and fsync that one.
> Oracle keeps a copy of the old control file so that you can always go back to an older version if a hardware or software bug corrupts it. But they keep a lot more data in their control file and they can be quite large.

Oracle can create more than one copy of the control file. The copies are identical; it is not an old copy plus the current one. Their advice is simply to store these copies on separate storage to be more fault tolerant.

PS By the way, in point 3 of my initial post about "is pg_control safe" I wrote some thoughts about multiple copies of the pg_control file. Glad to see that our views on this issue coincide.
Re: [HACKERS] Is pg_control file crashsafe?
On 05.05.2016 7:16, Amit Kapila wrote:
> On Wed, May 4, 2016 at 8:03 PM, Tom Lane wrote:
>> Amit Kapila writes:
>>> On Wed, May 4, 2016 at 4:02 PM, Alex Ignatov wrote:
>>>> On 03.05.2016 2:17, Tom Lane wrote:
>>>>> Writing a single sector ought to be atomic too.
>>>> pg_control is 8K long (I think that is the length of one page with the default PG compile settings).
>>> The actual data written is always sizeof(ControlFileData), which should be
>>> less than one sector.
>> Yes. We don't care what happens to the rest of the file as long as the
>> first sector's worth is updated atomically. See the comments for
>> PG_CONTROL_SIZE and the code in ReadControlFile/WriteControlFile.
>> We could change to a different PG_CONTROL_SIZE pretty easily, and there's
>> certainly room to argue that reducing it to 512 or 1024 would be more
>> efficient. I think the motivation for setting it at 8K was basically
>> "we're already assuming that 8K writes are efficient, so let's assume
>> it here too". But since the file is only written once per checkpoint,
>> efficiency is not really a key selling point anyway. If you could make
>> an argument that some other size would reduce the risk of failures,
>> it would be interesting --- but I suspect any such argument would be
>> very dependent on the quirks of a specific file system.
>
> How about using 512 bytes as a write size and performing direct writes rather than going via the OS buffer cache for the control file? Alex, is the issue reproducible (to ensure that if we try to solve it in some way, we have a way to test it as well)?
>
>> One point worth considering is that on most file systems, rewriting
>> a fraction of a page is *less* efficient than rewriting a full page,
>> because the kernel first has to read in the old contents to fill
>> the disk buffer it's going to partially overwrite with new data.
>> This motivates against trying to reduce the write size too much.
>
> Yes, you are very much right, and I have observed that recently during my work on WAL Re-Writes [1]. However, I think that won't be an issue if we use direct writes for the control file.
>
> [1] - http://www.postgresql.org/message-id/CAA4eK1+=O33dZZ=jbtjxbfyd67r5dlcqfyomj4f-qmfxbp1...@mail.gmail.com
>
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com

Hi!
No, the issue happened only once, and so far our attempts to reproduce it have not been successful.
Re: [HACKERS] pg_basebackup, pg_receivexlog and data durability (was: silent data loss with ext4 / all current versions)
On 13.05.2016 9:39, Michael Paquier wrote:
> Hi all,
>
> Beginning a new thread because the ext4 issues are closed, and because pg_basebackup data durability merits a new thread. And in short about the problem: pg_basebackup makes no effort in being sure that the data it backs up is on disk, which is bad... One possible recommendation is to use initdb -S after running pg_basebackup, but making sure that data is on disk should be done before pg_basebackup ends.
>
> On Thu, May 12, 2016 at 8:09 PM, I wrote:
>> And actually this won't fly high if there is no equivalent of walkdir() or if the fsync()'s are not applied recursively. On master at least the refactoring had better be done cleanly first... For the back branches, we could just have some recursive call like fsync_recursively and keep that in src/bin/pg_basebackup. Andres, do you think that this should be part of fe_utils or src/common/? I'd tend to think the latter is more adapted as there is an equivalent in the backend. On back-branches, we could just have something like fsync_recursively that walks through the paths. An even simpler approach would be to fsync() individually things that have been written, but that would suck in performance.
>
> So, attached are two patches that apply on HEAD to address the problem of pg_basebackup not syncing the data it writes. As pg_basebackup cannot use initdb -S directly because, as a client-side utility, it may be installed while initdb is not (see Fedora and RHEL), I have refactored the code so that the routines in initdb.c doing the fsync of PGDATA and other fsync stuff are in src/fe_utils/, and this is 0001. Patch 0002 is a set of fixes for pg_basebackup:
> - In plain mode, fsync_pgdata is used so that all the tablespaces are fsync'd at once. This takes care as well of the case where pg_xlog is a symlink.
> - In tar mode (no stdout), each tar file is synced individually, and the base directory is synced once at the end.
> In both cases, failures are not considered fatal.
> With pg_basebackup -X and pg_receivexlog, the manipulation of WAL files is made durable by using fsync and durable_rename where needed (credits to Andres mainly for this part).
> This set of patches is aimed only at HEAD. Back-patchable versions of this patch would need to copy fsync_pgdata and friends into streamutil.c for example. I am adding that to the next CF for review as a bug fix.
>
> Regards,

Hi!
Do we have any confidence that the backed-up data files are not corrupted, i.e., that they do not contain corrupted pages? Can pg_basebackup check page checksums (for a cluster initialized with initdb -k) while backing up files?

Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
[HACKERS] Why we don't have checksums on clog files
Hello!
Why don't we have checksums on clog files? We have a checksum on pg_control, optional checksums on data files, and some form of checksumming on the WAL. But why don't we have any checksumming on the clog? Corruption in the clog leads to transaction visibility problems and database consistency violations.
Can anybody explain this situation with the clog?

--
Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] Bug in to_timestamp().
On 20.06.2016 16:36, Tom Lane wrote:
> Robert Haas writes:
>> On Mon, Jun 13, 2016 at 12:25 PM, Robert Haas wrote:
>>> I think a space in the format string should skip a whitespace character in the input string, but not a non-whitespace character. It's my understanding that these functions exist in no small part for compatibility with Oracle, and Oracle declines to skip the digit '1' on the basis of an extra space in the format string, which IMHO is the behavior any reasonable user would expect.
>> So Amul and I are of one opinion and Tom is of another. Anyone else have an opinion?
> I don't necessarily have an opinion yet. I would like to see more than just an unsupported assertion about what Oracle's behavior is. Also, how should FM mode affect this?
> regards, tom lane

Oracle:

SQL> SELECT TO_TIMESTAMP('2016-06-13 99:99:99', 'YYYYMMDD HH24:MI:SS') from dual;
SELECT TO_TIMESTAMP('2016-06-13 99:99:99', 'YYYYMMDD HH24:MI:SS') from dual
*
ERROR at line 1:
ORA-01843: not a valid month

PG:

postgres=# SELECT TO_TIMESTAMP('2016-06-13 99:99:99', 'YYYYMMDD HH24:MI:SS');
      to_timestamp
------------------------
 2016-01-06 14:40:39+03
(1 row)

I know about this from the docs: "These functions interpret input liberally, with minimal error checking. While they produce valid output, the conversion can yield unexpected results." But when we provide illegal input parameters, we get no exception or error at all about it. I think that to_timestamp() needs to do more format checking than it does now.

Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] Bug in to_timestamp().
On 13.06.2016 18:52, amul sul wrote:
> Hi,
> It looks like a bug in the to_timestamp() function when the format string has more whitespace than the input string, see below:
>
> Ex.1: Two whitespaces before HH24 whereas one before the input time string
>
> postgres=# SELECT TO_TIMESTAMP('2016-06-13 15:43:36', 'YYYY/MM/DD  HH24:MI:SS');
>       to_timestamp
> ------------------------
>  2016-06-13 05:43:36-07   <— incorrect time
> (1 row)
>
> Ex.2: One whitespace before the format string
>
> postgres=# SELECT TO_TIMESTAMP('2016/06/13 15:43:36', ' YYYY/MM/DD HH24:MI:SS');
>          to_timestamp
> ------------------------------
>  0016-06-13 15:43:36-07:52:58   <— incorrect year
> (1 row)
>
> If there are one or more consecutive whitespaces in the format, we should skip those as long as we can still get an actual field. Thoughts?
>
> Thanks & Regards,
> Amul Sul

From the docs about to_timestamp() (https://www.postgresql.org/docs/9.5/static/functions-formatting.html):

"These functions interpret input liberally, with minimal error checking. While they produce valid output, the conversion can yield unexpected results. For example, input to these functions is not restricted by normal ranges, thus to_date('20096040','YYYYMMDD') returns 2014-01-17 rather than causing an error. Casting does not have this behavior."

And it won't stop at some simple whitespace. Using to_timestamp you can get any output result by providing illegal input parameter values:

postgres=# SELECT TO_TIMESTAMP('2016-06-13 99:99:99', 'YYYYMMDD HH24:MI:SS');
      to_timestamp
------------------------
 2016-01-06 14:40:39+03
(1 row)

Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
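This is the contrast the quoted documentation draws: a plain cast does range-check the fields. A quick illustration (psql output trimmed):

postgres=# SELECT '2016-06-13 99:99:99'::timestamp;
ERROR:  date/time field value out of range: "2016-06-13 99:99:99"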
Re: [HACKERS] Bug in to_timestamp().
On 23.06.2016 16:30, Bruce Momjian wrote:
> On Thu, Jun 23, 2016 at 07:41:26AM +, amul sul wrote:
>> On Monday, 20 June 2016 8:53 PM, Alex Ignatov wrote:
>>> On 13.06.2016 18:52, amul sul wrote:
>>> And it won't stop at some simple whitespace. Using to_timestamp you can get any output result by providing illegal input parameter values:
>>>
>>> postgres=# SELECT TO_TIMESTAMP('2016-06-13 99:99:99', 'YYYYMMDD HH24:MI:SS');
>>>  to_timestamp
>>>  2016-01-06 14:40:39+03
>>> (1 row)
>>
>> We do consume extra space from the input string, but not if it is in the format string, see below:
>>
>> postgres=# SELECT TO_TIMESTAMP('2016-06-13  15:43:36', 'YYYY/MM/DD HH24:MI:SS');
>>  to_timestamp
>>  2016-06-13 15:43:36-07
>> (1 row)
>>
>> We should have the same treatment for the format string too. Thoughts? Comments?
>
> Well, the user specifies the format string, while the input string comes from the data, so I don't see having them behave the same as necessary.

To be honest, they don't just behave differently: to_timestamp simply handles input data incorrectly, nothing else. There is no excuse for behavior like this:

postgres=# SELECT TO_TIMESTAMP('20:-16-06:13: 15_43:!36', 'YYYY/MM/DD HH24:MI:SS');
         to_timestamp
------------------------------
 0018-08-05 13:15:43+02:30:17
(1 row)

Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] Bug in to_timestamp().
On 23.06.2016 19:37, David G. Johnston wrote:
> On Thu, Jun 23, 2016 at 12:16 PM, Alex Ignatov wrote:
>> On 23.06.2016 16:30, Bruce Momjian wrote:
>>> On Thu, Jun 23, 2016 at 07:41:26AM +, amul sul wrote:
>>>> On Monday, 20 June 2016 8:53 PM, Alex Ignatov wrote:
>>>>> On 13.06.2016 18:52, amul sul wrote:
>>>>> And it won't stop at some simple whitespace. Using to_timestamp you can get any output result by providing illegal input parameter values:
>>>>> postgres=# SELECT TO_TIMESTAMP('2016-06-13 99:99:99', 'YYYYMMDD HH24:MI:SS');
>>>>>  2016-01-06 14:40:39+03
>>>>> (1 row)
>>>> We do consume extra space from the input string, but not if it is in the format string, see below:
>>>> postgres=# SELECT TO_TIMESTAMP('2016-06-13 15:43:36', 'YYYY/MM/DD HH24:MI:SS');
>>>>  2016-06-13 15:43:36-07
>>>> (1 row)
>>>> We should have the same treatment for the format string too. Thoughts? Comments?
>>> Well, the user specifies the format string, while the input string comes from the data, so I don't see having them behave the same as necessary.
>> To be honest, they don't just behave differently: to_timestamp simply handles input data incorrectly, nothing else. There is no excuse for behavior like this:
>> postgres=# SELECT TO_TIMESTAMP('20:-16-06:13: 15_43:!36', 'YYYY/MM/DD HH24:MI:SS');
>>  0018-08-05 13:15:43+02:30:17
>> (1 row)
>
> To be honest, I don't see how this is relevant to the quoted content. And you've already made this point quite clearly - repeating it isn't constructive. This behavior has existed for a long time and I don't see that changing it is a worthwhile endeavor. I believe a new function is required that has saner behavior. Otherwise, given good input and a well-formed parse string, the function does exactly what it is designed to do. Avoid giving it garbage and you will be fine. Maybe wrap the call in a function that also checks for the expected layout and RAISE EXCEPTION if it doesn't match.
>
> David J.

Arguing like that, one could say that we don't even need an exception like "division by zero" - just use well-formed numbers in the denominator... Input data can sometimes be generated automagically. Without an exception being thrown, debugging a stored function that contains to_timestamp can be painful.
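A minimal sketch of the kind of wrapper David suggests (hypothetical function name; it assumes one fixed layout and relies on the cast to do the range checking that to_timestamp() skips):

create or replace function strict_to_timestamp(val text) returns timestamptz as $$
begin
    -- assumption: callers always supply the YYYY-MM-DD HH24:MI:SS layout
    if val !~ '^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}$' then
        raise exception 'input "%" does not match YYYY-MM-DD HH24:MI:SS', val;
    end if;
    return val::timestamptz;  -- the cast rejects out-of-range fields such as hour 99
end;
$$ language plpgsql stable strict;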
Re: [HACKERS] Bug in to_timestamp().
On 23.06.2016 20:40, Tom Lane wrote:
> Robert Haas writes:
>> On Thu, Jun 23, 2016 at 1:12 PM, David G. Johnston wrote:
>>> My understanding is that is not going to change for 9.6.
>> That's exactly what is under discussion here.
> I would definitely agree with David on that point. Making to_timestamp noticeably better on this score seems like a nontrivial project, and post-beta is not the time for that sort of thing, even if we had full consensus on what to do. I'd suggest somebody work on a patch and put it up for review in the next cycle.
> Now, if you were to narrowly define the problem as "whether to skip non-spaces for a space in the format", maybe that could be fixed post-beta, but I think that's a wrongheaded approach. to_timestamp's issues with input that doesn't match the format are far wider than that. IMO we should try to resolve the whole problem with one coherent change, not make incremental incompatible changes at the margins.
> At the very least I'd want to see a thought-through proposal that addresses all three of these interrelated points:
> * what should a space in the format match
> * what should a non-space, non-format-code character in the format match
> * how should we handle fields that are not exactly the width suggested by the format
> regards, tom lane

I totally agree that we need more discussion about error handling in this function! This behavior is also observed in the to_date() and to_number() functions:

postgres=# SELECT TO_DATE('2!0!1!6!0!6-/-/-/-/-/-/-1!/-/-/-/-/-/-/-3!', 'YYYY-MM-DD');
  to_date
------------
 0002-01-01
(1 row)

postgres=# select to_number('1$#@!!,2,%,%4,5,@%5@4..8-', '999G999D9S');
 to_number
-----------
        12
(1 row)

On our side we have been discussing writing a patch that would change this incorrect behavior. So stay tuned.

--
Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] Bug in to_timestamp().
Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

On 20.06.2016 17:09, Albe Laurenz wrote:
> Tom Lane wrote:
>> I don't necessarily have an opinion yet. I would like to see more than just an unsupported assertion about what Oracle's behavior is. Also, how should FM mode affect this?
>
> I can supply what Oracle 12.1 does:
>
> SQL> SELECT to_timestamp('2016-06-13 15:43:36', ' YYYY/MM/DD HH24:MI:SS') AS ts FROM dual;
> TS
> 2016-06-13 15:43:36.0 AD
>
> SQL> SELECT to_timestamp('2016-06-13 15:43:36', 'YYYY/MM/DD HH24:MI:SS') AS ts FROM dual;
> TS
> 2016-06-13 15:43:36.0 AD
>
> SQL> SELECT to_timestamp('2016-06-1315:43:36', 'YYYY/MM/DD HH24:MI:SS') AS ts FROM dual;
> TS
> 2016-06-13 15:43:36.0 AD
>
> (to_timestamp_tz behaves the same way.)
> So Oracle seems to make no difference between one or more spaces.
>
> Yours,
> Laurenz Albe

Guys, do we need to change this behavior, or can you tell me that it is normal? Because we have this:

postgres=# SELECT TO_TIMESTAMP('2016-02-30 15:43:36', 'YYYY-MM-DD HH24:MI:SS');
      to_timestamp
------------------------
 2016-03-01 15:43:36+03
(1 row)

but on the other hand we have:

postgres=# select '2016-02-30 15:43:36'::timestamp;
ERROR:  date/time field value out of range: "2016-02-30 15:43:36"
LINE 1: select '2016-02-30 15:43:36'::timestamp;

Another bug in to_timestamp/date()?
[HACKERS] Strange behavior of some volatile function like random(), nextval()
Hello!
I got some strange behavior from the random() function:

postgres=# select (select random() ) from generate_series(1,10) as i;
      random
-------------------
 0.831577288918197
 0.831577288918197
 0.831577288918197
 0.831577288918197
 0.831577288918197
 0.831577288918197
 0.831577288918197
 0.831577288918197
 0.831577288918197
 0.831577288918197
(10 rows)

postgres=# select (select random()+i*0 ) from generate_series(1,10) as i;
      ?column?
--------------------
 0.97471913928166
 0.0532126761972904
 0.331358563620597
 0.0573496259748936
 0.321165383327752
 0.48836630070582
 0.444201893173158
 0.0729857799597085
 0.661443184129894
 0.706566562876105
(10 rows)

postgres=# explain select (select random() ) from generate_series(1,10) as i;
                                QUERY PLAN
--------------------------------------------------------------------------
 Function Scan on generate_series i  (cost=0.02..10.01 rows=1000 width=0)
   InitPlan 1 (returns $0)
     ->  Result  (cost=0.00..0.01 rows=1 width=0)
(3 rows)

postgres=# explain select (select random()+i*0 ) from generate_series(1,10) as i;
                                QUERY PLAN
--------------------------------------------------------------------------
 Function Scan on generate_series i  (cost=0.00..30.00 rows=1000 width=4)
   SubPlan 1
     ->  Result  (cost=0.00..0.02 rows=1 width=0)
(3 rows)

postgres=# \df+ random();
   Schema   |  Name  | Result data type | Argument data types |  Type  | Security | Volatility |  Owner   | Language | Source code | Description
------------+--------+------------------+---------------------+--------+----------+------------+----------+----------+-------------+--------------
 pg_catalog | random | double precision |                     | normal | invoker  | volatile   | postgres | internal | drandom     | random value
(1 row)

Also:

postgres=# create sequence test;
CREATE SEQUENCE
postgres=# SELECT (SELECT nextval('test')) FROM generate_series(1,10) as i;
 nextval
---------
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
(10 rows)

postgres=# SELECT (SELECT nextval('test')+i*0) FROM generate_series(1,10) as i;
 ?column?
----------
        2
        3
        4
        5
        6
        7
        8
        9
       10
       11
(10 rows)

postgres=# \df+ nextval() ;
   Schema   |  Name   | Result data type | Argument data types |  Type  | Security | Volatility |  Owner   | Language | Source code | Description
------------+---------+------------------+---------------------+--------+----------+------------+----------+----------+-------------+---------------------
 pg_catalog | nextval | bigint           | regclass            | normal | invoker  | volatile   | postgres | internal | nextval_oid | sequence next value
(1 row)

Both functions are volatile, so from the docs: "A VOLATILE function can do anything, including modifying the database. It can return different results on successive calls with the same arguments. The optimizer makes no assumptions about the behavior of such functions. A query using a volatile function will re-evaluate the function at every row where its value is needed."

Is something wrong with the executor? Is this a bug, or an executor feature related to subqueries?

--
Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] Strange behavior of some volatile function like random(), nextval()
On 29.06.2016 15:30, David G. Johnston wrote:
> More specifically...
> On Wed, Jun 29, 2016 at 7:34 AM, Michael Paquier wrote:
>> On Wed, Jun 29, 2016 at 7:43 PM, Alex Ignatov wrote:
>>> Hello!
>>> Got some strange behavior of random() function:
>>> postgres=# select (select random() ) from generate_series(1,10) as i;
>>> [...]
>>> (10 rows)
>> I recall that this is treated as an implicit LATERAL, meaning that random() is calculated only once.
>
> A non-correlated (i.e., does not refer to outer variables) subquery placed into the target-list need only have its value computed once - so that is what happens. The fact that a volatile function can return different values given the same arguments doesn't mean much when the function is only ever called a single time.
>
>>> postgres=# select (select random()+i*0 ) from generate_series(1,10) as i;
>>> [...]
>>> (10 rows)
>> But not that.
>
> So those results do not surprise me. A correlated subquery, on the other hand, has to be called once for every row and is evaluated within the context supplied by said row. Each time random is called it returns a new value.
>
> Section 4.2.11 (9.6 docs) https://www.postgresql.org/docs/9.6/static/sql-expressions.html#SQL-SYNTAX-SCALAR-SUBQUERIES
>
> Maybe this could be worded better, but the first part talks about a single execution while "any one execution" is mentioned in reference to "the surrounding query". I do think that defining "correlated" and "non-correlated" subqueries within this section would be worthwhile.
>
> David J.

In this subquery (below) we have a reference to outer variables, but it does not work as it should (or I don't understand something):

postgres=# select id, ( select string_agg('a','') from generate_series(1,trunc(10*random()+1)::int) where id=id) from generate_series(1,10) as id;
 id | string_agg
----+------------
  1 | aaa
  2 | aaa
...

But this query (with a reference to an outer var) works perfectly:

postgres=# select id,(select random() where id=id) from generate_series(1,10) as id;
 id |      random
----+-------------------
  1 | 0.974509597290307
  2 | 0.219822214450687
...

Also, this query works well (the (id-id) does the job):

postgres=# select id, ( select string_agg('a','') from generate_series(1,trunc(10*random()+1)::int+(id-id)) ) from generate_series(1,10) as id;
 id | string_agg
----+------------
  1 | aaa
  2 | a
...

So even a reference to outer variables doesn't guarantee that the executor will execute a volatile function in a subquery every time. Or is there something else I should know?

Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
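Following the plans shown earlier in the thread, one quick way to see which behavior a given scalar subquery will get is EXPLAIN: an uncorrelated subquery should show up as an InitPlan (evaluated once), a correlated one as a SubPlan (evaluated per row):

explain select (select random()) from generate_series(1,10) as i;              -- InitPlan: one value for all rows
explain select (select random() where i = i) from generate_series(1,10) as i;  -- SubPlan: evaluated per row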
Re: [HACKERS] Why we lost Uber as a user
On 28.07.2016 17:53, Vladimir Sitnikov wrote:
>>> That's a recipe for runaway table bloat; VACUUM can't do much because
>>> there's always some minutes-old transaction hanging around (and SNAPSHOT
>>> TOO OLD doesn't really help, we're talking about minutes here), and
>>> because of all of the indexes HOT isn't effective.
>
> Just curious: what if PostgreSQL supported an index that stores the "primary key" (or unique key) instead of tids?
> Am I right that that kind of index would not suffer from such bloat? I'm assuming the primary key is not updated, so secondary indexes built that way should be much less prone to bloat when updates land on other columns (even if the tid moves, the PK does not change, so the secondary index row could be reused).
> If that works, it could reduce index bloat and reduce the amount of WAL (fewer indexes would need to be updated). Of course it would make index scans a bit slower; however, it looks like at least Uber is fine with that extra cost of an index scan.
> Does it make sense to implement that kind of index as an access method?
>
> Vladimir

You mean an IOT (index-organized table), like Oracle has?

Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company