gsst...@mit.edu (Greg Stark) writes:
> On Wed, Jun 2, 2010 at 6:45 PM, Chris Browne <cbbro...@acm.org> wrote:
>> It would make it easy to conclude:
>>
>>   "This next transaction did 8328194 updates.  Maybe we should do
>>   some kind of checkpoint (e.g. - commit transaction or such) before
>>   working on it."
>>
>> versus
>>
>>   "This transaction we're thinking of working on had 7 updates.  No
>>   big deal..."
>
> I'm puzzled how you would define this value. How do you add 7 inserts,
> 7 deletes, and 7 updates? Is that 21 rows modified? Why are the 7
> inserts and 7 deletes worth twice as much as the 7 updates when
> they're basically the same thing? What if the inserts fired triggers
> which inserted 7 more rows, is that 14? What if the 7 updates modified
> 2 TB of TOAST data but the 8238194 updates were all to the same record
> and they were all HOT updates so all it did was change 8kB?
The presence of those questions (and their ambiguity) is the reason
there's a little squirming as to whether this is super-useful and
super-necessary.  What it offers is *SOME* idea of how much updating
work a particular transaction did.

It's a bit worse than you suggest:
 - If replication triggers have captured tuples, those would get
   counted too.
 - TOAST updates might lead to extra updates being counted.

But back to where you started: I'd anticipate 7 inserts, 7 deletes,
and 7 updates being counted as something around 21 updates.  If that
included 5 TOAST changes, it might bump up to 26.  If there were
replication triggers in place, that might bump the count up to 45
(a number I chose arbitrarily).

> In any case you'll have all the actual data from your triggers or
> hooks or whatever so what value does having the system keep track of
> this add?

It means that when we pull the list of transactions to consider, we'd
get something like:

  select * from next_transactions('4218:23', 50);

with a list of 50 transactions returned, each carrying:
   -> txid
   -> START timestamp
   -> COMMIT timestamp
   -> approximate # of updates

Then, for each of the 50, I'd pull replication log data for the
corresponding transaction.  Having the approximate # of updates in
hand might lead me to stop short, and say: "That next transaction
looks like a doozy!  I'm going to stop and commit what I've got
before working on that one."

It's not strictly necessary, but it would surely be useful for flow
control.
-- 
select 'cbbrowne' || '@' || 'cbbrowne.com';
http://cbbrowne.com/info/internet.html
"MS apparently now has a team dedicated to tracking problems with
Linux and publicizing them.  I guess eventually they'll figure out
this back fires... ;)" -- William Burrow <aa...@delete.fan.nb.ca>
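[To make the stop-short idea concrete, here is a rough Python sketch of the
consumer loop described above.  `apply_txn`, `commit`, and the 10,000-update
threshold are all placeholders of my own choosing, not anything specified by
the proposal; the tuple shape mirrors the four fields listed for
next_transactions().]

```python
BATCH_THRESHOLD = 10_000  # arbitrary cutoff for "that one's a doozy"

def replicate_batch(transactions, apply_txn, commit):
    """Apply transactions in commit order, committing the current batch
    early whenever the next transaction's approximate update count would
    push the batch past BATCH_THRESHOLD."""
    applied = 0  # approximate updates applied since the last commit
    for txid, start_ts, commit_ts, approx_updates in transactions:
        if applied and applied + approx_updates > BATCH_THRESHOLD:
            commit()      # checkpoint before taking on the big one
            applied = 0
        apply_txn(txid)   # replay this txn's replication log data
        applied += approx_updates
    if applied:
        commit()          # flush whatever remains
```

The approximate count never has to be exact for this to work; it only has to
be good enough to distinguish "7 updates, no big deal" from "millions of
updates, commit first".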