Hi,

On 01/29/2018 11:23 AM, Simon Riggs wrote:
> On 29 January 2018 at 07:15, Nikhil Sontakke <nikh...@2ndquadrant.com> wrote:
>
>>> Having this as responsibility of plugin sounds interesting. It
>>> certainly narrows the scope for which we need to solve the abort
>>> issue. For 2PC that may be okay as we need to somehow interact
>>> with transaction manager as Simon noted. I am not sure if this
>>> helps streaming use-case though as there is not going to be any
>>> external transaction management involved there.
>
> I think we should recognize that the two cases are different.
>
> 2PC decoding patch is looking at what happens AFTER a PREPARE has
> occurred on a transaction.
>
> Streaming looks to have streaming occur right in the middle of a
> transaction, so it has other concerns and likely other solutions as a
> result.
>
I don't quite see how these cases are so different, or why they would
require different handling. Can you explain?

We need to deal with decoding a transaction that aborts while we're
decoding it - we must not continue decoding it (in the sense of passing
the changes to the plugin). I don't see any major difference between
ROLLBACK and ROLLBACK PREPARED.

So I think the pre-abort hook solution should work for the streaming
case too. At least, I don't see why it wouldn't.

While discussing this with Peter Eisentraut off-list, I think we came
up with an alternative idea for the streaming case that does not
require the pre-abort hook.

The important detail is that we only really care about aborts in
transactions that modified catalogs in some way (e.g. by doing DDL).
But we can safely decode (and stream) changes up to the point when the
catalogs get modified, so we can do one of two things at that point:

(1) Stop streaming changes from that transaction, and instead start
spilling it to disk (and then continue with the replay only when it
actually commits). A trivial sketch of this decision point is appended
at the end of this mail.

(2) Note the transaction ID somewhere and restart the decoding (that
is, notify the downstream to throw away the data and go back in WAL to
read all the data from scratch), but spill that one transaction to
disk instead of streaming it.

Neither of these solutions is currently implemented. Both would
require changes to ReorderBuffer (which currently does not support
mixing spill-to-disk and streaming), and possibly to the logical
decoding infrastructure (e.g. to stash the XIDs somewhere and allow
restarting from a previous LSN).

The good thing is that this approach does not need to know about
aborted transactions, and so does not require communication with the
transaction manager using pre-abort hooks etc. I think that would be a
massive advantage.

The main question (at least for me) is whether it's actually cheaper
compared to the pre-abort hook. In my experience, aborts tend to be
fairly rare in practice - maybe 1:1000 to commits. On the other hand,
temporary tables are a fairly common thing, and they count as catalog
changes, of course.

Maybe this is not that bad, though, as it only really matters for
large transactions, which is when we start to spill to disk or stream.
And that should not happen very often - if it does, you probably need
a higher limit (similarly to work_mem). The cases where this would
matter are large ETL jobs, upgrade scripts and so on - these tend to
be large and mix in DDL (temporary tables, ALTER TABLE, ...). That's
unfortunate, as it's one of the cases streaming was supposed to help
with.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
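
PS: As promised, here is a trivial standalone sketch of the decision
point in (1). This is plain C, not actual ReorderBuffer code - all of
the struct and function names are made up - but it shows the idea: the
first catalog change flips the transaction from streaming to
spill-to-disk, and it stays there until commit, so a later abort just
means discarding the spill file.

/*
 * Toy model only - none of this is PostgreSQL code.
 */
#include <stdbool.h>
#include <stdio.h>

typedef enum
{
	TXN_STREAMING,				/* changes are sent downstream immediately */
	TXN_SPILLING				/* changes go to disk until commit */
} TxnMode;

typedef struct ToyTxn
{
	unsigned int xid;
	TxnMode		mode;
	bool		has_catalog_changes;
} ToyTxn;

/* Called for every decoded change belonging to the transaction. */
static void
apply_change(ToyTxn *txn, bool change_touches_catalog)
{
	if (change_touches_catalog)
		txn->has_catalog_changes = true;

	/*
	 * The moment the transaction modifies the catalog, stop streaming it
	 * and start spilling to disk. Replay then happens only at commit, so
	 * a later abort just throws the spilled data away.
	 */
	if (txn->mode == TXN_STREAMING && txn->has_catalog_changes)
	{
		printf("xid %u: catalog change seen, switching to spill-to-disk\n",
			   txn->xid);
		txn->mode = TXN_SPILLING;
	}

	if (txn->mode == TXN_STREAMING)
		printf("xid %u: streaming change downstream\n", txn->xid);
	else
		printf("xid %u: spilling change to disk\n", txn->xid);
}

int
main(void)
{
	ToyTxn		txn = {1234, TXN_STREAMING, false};

	apply_change(&txn, false);	/* plain DML - streamed */
	apply_change(&txn, true);	/* e.g. CREATE TEMP TABLE - triggers switch */
	apply_change(&txn, false);	/* subsequent changes go to the spill file */

	return 0;
}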