On 4 June 2013 01:54, Noah Misch <n...@leadboat.com> wrote:
> On Sun, Jun 02, 2013 at 10:45:21AM +0100, Simon Riggs wrote:
>> For clarity the 4 problems are
>> 1. SQL execution overhead
>> 2. Memory usage
>> 3. Memory scrolling
>> 4. Locking overhead, specifically FPWs and WAL records from FK checks
>> probably in that order or thereabouts.
>>
>> The above is why I went for a technique that avoided SQL execution
>> entirely, as well as conserving memory by de-duplicating the values in
>> a hash table as we go, which avoids all three problems. The fourth was
>> solved by the more extreme approach to locking.
>
> That nicely frames the benefits of your proposals.  Makes sense.
>
>> I think it might be worth considering joining the after trigger queue
>> directly to the referenced table(s), something like this...
>>
>> CREATE OR REPLACE FUNCTION after_trigger_queue() RETURNS SETOF tid AS $$
>> ...
>> $$ LANGUAGE SQL;
>>
>> EXPLAIN
>> SELECT 1 FROM ONLY "order"
>> WHERE orderid IN
>>   (SELECT orderid FROM ONLY order_line WHERE ctid IN
>>     (SELECT after_trigger_queue FROM after_trigger_queue() ))
>> FOR KEY SHARE;
>
> Agreed.
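Spelled out with literal key values in place of the queue function (purely
illustrative -- the values are made up and the table/column names are just
the ones from the example above; the real statement would be generated by
the RI trigger code), the batched check boils down to something like:

  SELECT 1
    FROM ONLY "order"
   WHERE orderid IN (VALUES (1001), (1002), (1003))
     FOR KEY SHARE;

i.e. one statement locking a whole batch of referenced rows, rather than
one SELECT ... FOR KEY SHARE per inserted row.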
Hmm, although the approach quoted above is cute, it still falls down by
requiring lots of memory (problem 2) and churning memory again (problem 3).
Let's rethink things to put a few options on the table and see what we get...

1. Store all the after trigger events in a big queue, then execute later.
That requires us to scroll it to disk to solve problem 2, but it still falls
foul of problem 3, which becomes severe on big data.

Implementation: implement scrolling to disk for the after trigger queue,
then implement event batching in the RI handler code, similar to what is
suggested above or directly as Noah suggested in his previous post.

2. Don't store FK events in the after trigger queue at all, but apply them
as we go. That solves problems 2 and 3. It could be considered a violation
of the SQL standard, which requires us to apply the checks at the end of
the statement; but we already violate the standard with regard to
uniqueness checks, so doing it here doesn't seem unacceptable.

Implementation: given that COPY uses a ring buffer, and given your earlier
thoughts on using batched SQL, I have a new suggestion. Every time the ring
buffer fills, we go through the last buffers accessed, pull out all the PKs
and issue them as a single SQL statement (as above). We can do that
manually, or using the block scan mentioned previously. This uses batched
SQL to solve problem 1; it doesn't build up a large data structure in
memory (problem 2); and it solves problem 3 by accessing data blocks before
they fall out of the ring buffer. If there are no duplicates in the
referenced table, then this behaviour will do as much as possible to make
accesses to the referenced table also use a small working set. (We may even
wish to consider making the batched SQL use a separate ring buffer for RI
accesses.) That approach doesn't make any assumptions about duplicates.

3. Strip the PK values from the rows at access time and store them in a
hash table, de-duplicating as we go. If that gets too big, write
seldom-accessed values to a temporary table, which will automatically
scroll to disk when it exceeds temp_buffers. We don't create the temp table
until we hit work_mem, so we only pay that cost for larger statements.
Then, at the end of the statement, copy the hash table into the temp table
and join to the referenced table directly (see the sketch at the end of
this mail). We don't guarantee the list of PK values is completely unique,
but we will have significantly reduced the number of duplicates to check.

Comparison: approach 1 suffers from memory scrolling (problem 3), but it
follows the SQL standard. Approach 2 solves all the performance problems
but doesn't follow the letter of the standard (AIUI - anyone see it
differently?). Approach 3 solves all the performance problems and follows
the SQL standard.

Perhaps another way would be to avoid very large COPY statements
altogether, breaking loads down into smaller pieces. For example, pg_dump
could issue COPY in 10000-row chunks rather than one big COPY. That would
mostly avoid problems 2 and 3 and is more realistic with "wild" data.

Based on that thought, implementation option 1 looks most reasonable. In
short summary that is a) scroll the after trigger queue to disk, and
b) batch SQL values together as Noah originally suggested, or as in the
SQL above.

Let me know what you think of that analysis.
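To make the temp table idea in approach 3 a little more concrete, the
end-of-statement check might look roughly like this (an illustrative sketch
only -- the temp table name and column are invented for the example, and
the real thing would be generated internally by the RI code rather than
written by hand):

  CREATE TEMPORARY TABLE ri_pending_pks (orderid integer) ON COMMIT DROP;
  -- ... the de-duplicated PK values from the hash table get copied in here ...
  SELECT 1
    FROM ONLY "order" o
   WHERE o.orderid IN (SELECT orderid FROM ri_pending_pks)
     FOR KEY SHARE OF o;

so the final join only needs to visit the referenced table once per
(mostly de-duplicated) key value, however many rows the COPY loaded.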
--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services