[GENERAL] atomic commit;begin for long-running transactions, in combination with savepoint
I was wondering whether there could be an atomic commit;begin command for PostgreSQL, in order to open a transaction at the beginning of a unit of work in a client session, so that client applications don't have to duplicate work by implementing optimistic locking and version-id fields in their table rows.

SAVEPOINT doesn't actually commit the writes made in a transaction up to the time it is called, but it is useful for detecting conflicts while allowing work to continue (e.g. with a timeout set). The atomic commit;begin wouldn't be necessary except that a client could crash before the end of the unit of work, and the work done up to that point would be lost with the open transaction.

The atomic commit;begin is so that clients can use PostgreSQL's own mechanisms for detecting concurrent read/write conflicts by issuing savepoints before each write, instead of the current need for: begin; select for update xxx, client_versionid (or xmin); (client checks the version id hasn't changed against the version id stored when the row was last selected for read); update; commit.

SET AUTOCOMMIT TO ON wouldn't cut it, would it, because between writes there is no transaction, so savepoints couldn't be used for conflict checking.

Also, if the transaction is in read committed mode, then if a write failed and a rollback to savepoint was done, you could do the select again, get the new value, inform the user, and if the user elected to go ahead, overwrite with their new value. It would work the second time, because one has now read the committed value.
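(For concreteness, the "current need" above looks roughly like the following sketch. The table t1, the column client_versionid, and the :new_x placeholder are hypothetical names, not anything from the thread:)

    begin;
    select x, client_versionid from t1 where id = 2 for update;
    -- client compares client_versionid against the value it stored
    -- at the original read; if it differs, another session has written
    update t1 set x = :new_x, client_versionid = client_versionid + 1
     where id = 2;
    commit;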
Re: [GENERAL] atomic commit;begin for long-running transactions, in combination with savepoint
I meant commit and continue the current transaction. The transaction is opened when the user application caches composite data from many tables regarding a root object. Because almost all applications cache data, there is apparently a pattern, "optimistic offline lock", where ORB middleware basically adds a version field to rows, because transactions are usually begun just when the user has made a modification to a displayed value and now wants to change an old cached value which he believes is the current value. The middleware starts a transaction and reads the version number, and if it has been incremented since the initial transaction that read the value and the version number, it informs the user that a newer value exists and asks whether he wants to overwrite it.

This is basically a duplication of MVCC, and it has to occur in all applications that can't hold a long-running transaction open from the very beginning of reading a complex object: there are a lot of updates per work unit, and if there is a crash during the work unit, then a lot of updates would be lost, unless the client app also does its own WAL, which is another duplication.

On Sun Oct 14 1:56, "Trevor Talbot" sent:

>On 10/13/07, syan tan wrote:
>> I was wondering whether there could be an atomic commit;begin command
>> for postgresql, in order to open up a transaction at the beginning of
>> a unit of work in a client session, so that client applications don't
>> have to duplicate work with having optimistic locking and version id
>> fields in their table rows. savepoint doesn't actually commit writes
>> in a transaction up to the time savepoint is called, but it's useful
>> for detecting conflicts, whilst allowing work to continue (e.g.
>> with timeout set). the atomic commit;begin wouldn't be necessary
>> except a client could crash before the end of the unit of work,
>> and work done up to that point would be lost in the transaction.
>> the atomic commit;begin is so that clients can use postgresql's
>> mechanisms for detecting concurrency read/write conflicts by
>> issuing savepoints before each write, instead of the current need
>> to begin; select for update xxx, client_versionid (or xmin); (client
>> checks version id hasn't changed against version id stored when last
>> selected for read); update; commit.
>
>I'm not following your train of thought. It sounds as though you want
>to commit data without actually leaving your current transaction, but
>what do you need the transaction for?
>
>I don't understand how an atomic COMMIT;BEGIN would help. Consider a
>transaction with serializable isolation: your snapshot view of the
>data exists exactly as long as your transaction does. A COMMIT
>followed by a BEGIN, whether atomic or not, is going to change your
>view of the data.
>
>If you want it to do something else, what is that exactly?
>
>> Also, if the transaction is in read committed mode, then if
>> a write failed, and a rollback to savepoint was done, you could
>> do the select again, get the new value, inform the client, and if
>> the user elected to go ahead, overwrite with their new value,
>> it would work the second time, because one has read the committed
>> select.
>
>What is preventing you from doing that now?
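(The version check in the "optimistic offline lock" pattern can also be folded into the UPDATE itself, so the compare-and-write is atomic in the database; a sketch, using the same hypothetical t1/client_versionid names as before:)

    -- at read time: remember the version
    select x, client_versionid from t1 where id = 2;   -- remember as v0

    -- at write time: only write if nobody has bumped the version
    update t1 set x = :new_x, client_versionid = client_versionid + 1
     where id = 2 and client_versionid = :v0;
    -- zero rows updated => another session committed first;
    -- re-read, show the user the newer value, and ask before retrying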
Re: [GENERAL] atomic commit;begin for long-running transactions, in combination with savepoint
On Sun Oct 14 7:41, "Trevor Talbot" sent:

>On 10/14/07, Syan Tan wrote:
>> I meant commit and continue current transaction. The transaction is opened
>> on the user application caching composite data from many tables regarding
>> a root object. Because almost all applications cache data, there is
>> apparently a pattern "optimistic offline lock" where orb middleware
>> basically adds a version field to rows, because transactions are usually
>> begun just when the user has done a modification to a displayed value, and
>> now wants to change an old cached value which he believes is the current
>> value.
>
>Well, transactional semantics won't help you here.
>
>In order to detect a change occurred, what you want is a SERIALIZABLE
>transaction: you want the update to fail if the row it matches is no
>longer the same version as your snapshot. However, in order to read
>the new value to decide if you want to update it anyway, you need to
>leave your current snapshot. As soon as you do that, ALL previously
>read values lose the update checks that snapshot provided you.

You read the old value at the beginning of the transaction, and you don't re-read it, assuming it is infrequently changed; so if someone updates it concurrently, you detect the conflict when you try to write, and roll back to the savepoint. You DON'T want a serializable transaction, because then you can't read the other committed value after rolling back to the savepoint.

>If you use a READ COMMITTED transaction, there is no check, since the
>update can see the new value itself.
>
>Nested transactions or the ability to commit some data without leaving
>the current transaction won't get you what you want either, since
>you're really looking for per-row behavior, not per-transaction.
>
>> The middleware starts a transaction, and reads the version number, and
>> if it has been incremented since the initial transaction that read
>> the value and the version number, it then informs the user that
>> a newer value exists, and whether he wants to overwrite it.
>
>A way to do this using PostgreSQL's own row version data came up
>recently on this list. Have a look at this post and the one following
>it:
>http://archives.postgresql.org/pgsql-general/2007-10/msg00503.php

This is the same as using your own version ids and incrementing them within the application, which would keep xmin within PostgreSQL's domain and would also mean the application's SQL is not tied to PostgreSQL.

The question was: if there were a commit-and-continue transaction command, would it work? E.g.

    begin;
    select x from t1 where id = 2;   -- store in variable x0, display to user
    ... (much later)
    -- user changes the stored x at the client; the new value is x1
    savepoint a;

    answ = 'n'
    do:
        try:
            update t1 set x = x1 where id = 2;
            commit-and-continue;     -- the hypothetical command
        catch:
            rollback to savepoint a;
            select x from t1 where id = 2;   -- store as x2
            if x2 <> x0:
                notify user "x has changed from x0 to x2, continue to write x1?"
                input answ
    while answ == 'y'

In the first pass of the loop, the transaction hasn't read x a second time, so the transaction state for x is at x0. If x has been changed by another transaction's commit, then the catch will execute, and there x is selected again, so now the transaction state is that x is at x2. If the user elects to repeat the while loop, the second update will get through, because the second select has advanced the version id for x, so now it can be changed.
If the transaction were serializable and another transaction had committed, this would never work: a commit would nullify the other transaction's write without this transaction ever having seen it, so this transaction would always be forced to roll back.
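(A two-session sketch of that serializable behaviour, reusing the hypothetical table t1 from above; in PostgreSQL the conflict surfaces as a serialization error:)

    -- session A                         -- session B
    begin isolation level serializable;
    select x from t1 where id = 2;
                                         begin;
                                         update t1 set x = 5 where id = 2;
                                         commit;
    update t1 set x = 9 where id = 2;
    -- ERROR: could not serialize access due to concurrent update
    -- rolling back to a savepoint doesn't help: the snapshot still can't
    -- see session B's committed value, so a retry fails the same way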
Re: [GENERAL] atomic commit;begin for long-running transactions, in combination with savepoint
Thanks. I'll try it out, but it sounds true enough. So there is no isolation level where the transaction state keeps track of all the read versions of any item read so far by the transaction?

The main question is whether there's a way of avoiding implementing either of those patterns, offline optimistic locking (version checking) or offline pessimistic locking (lock-table checking), when you've got interactive client applications. If there were an isolation level that kept track of the versions of the items a transaction has read, and there were a "commit with continue" command, then you could avoid that extra work in clients - or maybe warping transactions for this purpose would make it more difficult to write transactions for the usual batch-processing purposes.

On Mon Oct 15 5:28, "Trevor Talbot" sent:

>On 10/15/07, Syan Tan wrote:
>
>> >In order to detect a change occurred, what you want is a SERIALIZABLE
>> >transaction: you want the update to fail if the row it matches is no
>> >longer the same version as your snapshot. However, in order to read
>> >the new value to decide if you want to update it anyway, you need to
>> >leave your current snapshot. As soon as you do that, ALL previously
>> >read values lose the update checks that snapshot provided you.
>>
>> you read the old value at the beginning of the transaction, and
>> you don't re-read it, assuming it is infrequently changed, so
>> if someone updates it concurrently, when you try to write, then
>> you detect the conflict, and rollback to the savepoint.
>
>Transactions don't operate based on what you've read. "UPDATE ...
>WHERE ..." finds the row(s) to update using the WHERE clause right
>now, not based on any previously-read values. It does not know what
>you've read before.
>
>The only difference is in the data you _can_ read. For the
>SERIALIZABLE isolation level, that data was decided at the beginning
>of the transaction. A row that was updated by another transaction
>will make the version that you can see effectively "read only", so
>when the UPDATE tries to change it, there will be a transactional
>conflict due to the isolation level.
>
>> You DON'T want a serializable transaction, because then you can't read
>> the other committed value after rolling back to the savepoint.
>
>Correct. But the READ COMMITTED isolation level does not limit what
>data you can see at the beginning of the transaction, so an UPDATE
>will always find the latest version of a row. There is no conflict as
>far as the transaction is concerned.
>
>> >A way to do this using PostgreSQL's own row version data came up
>> >recently on this list. Have a look at this post and the one following
>> >it:
>> >http://archives.postgresql.org/pgsql-general/2007-10/msg00503.php
>>
>> this is the same as using your own version ids and incrementing them
>> within the application, which would keep xmin within postgresql's
>> domain, and would also mean the application's sql is not tied
>> to postgresql.
>
>Yes.
>You were asking for a feature in PostgreSQL that doesn't match
>standard transaction semantics though, so I figured you wouldn't mind
>a PostgreSQL-specific option :)
>
>Assuming READ COMMITTED isolation level:
>
>> begin;
>> select x from t1 where id = 2;
>> (store in variable x0, display to user)
>> ...(much later)
>> (user changes stored x, at client, now x1)
>> savepoint a;
>>
>> answ = n;
>>
>> do:
>>   try:
>>     update t1 set x = x1 where id = 2;
>>     commit-and-continue;
>>   catch:
>>     rollback to savepoint a;
>>     select x from t1 where id = 2 (store as x2)
>>     if (x2 <> x0) notify user "x has changed from x0 to x2, continue
>>     to write x1?"
>>     input answ;
>>
>> while answ == y;
>>
>> In the first pass of the loop, the transaction hasn't read x a second
>> time so the transaction state for x is at x0,
>
>Transaction state is not based on what you've read; it doesn't know.
>
>> if x has been changed by another transaction's commit, then the catch will
>> execute,
>
>The UPDATE will find the latest version of the row. The change made
>by the other transaction is not a problem at this isolation level (you
>can see it), so the UPDATE will simply proceed and change it anyway.
>The catch block will never execute.
>
>> If the transaction was serializable, and another transaction has committed,
>> this would never work, because a commit would nullify the other
>> transactions write without this transaction ever having seen the other
>> transactions write, so this transaction would always be forced to rollback.
>
>Exactly. But without SERIALIZABLE isolation, any operations you
>perform always see the other transaction's write, so there is never
>any conflict.
>
>Also keep in mind that MVCC is not the only way to implement
>transactions; pure locking is more common in other databases. In the
>locking model, most transactions prevent others from writing until
>after they are finished. Rows simply can't have different versions
>(and of course concurrent performance is awful).
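(For reference, the xmin approach from the linked post amounts to something like the following sketch. It is PostgreSQL-specific, reuses the hypothetical t1 from earlier, and the direct comparison of xmin against a remembered value is an assumption; the supported comparison syntax varies across PostgreSQL versions:)

    -- at read time: also select the row's system version column
    select x, xmin from t1 where id = 2;   -- client remembers xmin as x0_xmin

    -- at write time: update only if the row hasn't been rewritten
    update t1 set x = :x1
     where id = 2 and xmin = :x0_xmin;     -- assumes the xid comparison works
    -- zero rows updated => another transaction has committed a new
    -- version of the row since the original read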
Re: [GENERAL] atomic commit;begin for long-running transactions, in combination with savepoint
On Mon Oct 15 5:28, "Trevor Talbot" sent:

>The only difference is in the data you _can_ read. For the
>SERIALIZABLE isolation level, that data was decided at the beginning
>of the transaction. A row that was updated by another transaction
>will make the version that you can see effectively "read only", so
>when the UPDATE tries to change it, there will be a transactional
>conflict due to the isolation level.
>
>> You DON'T want a serializable transaction, because then you can't read
>> the other committed value after rolling back to the savepoint.
>
>Correct. But the READ COMMITTED isolation level does not limit what
>data you can see at the beginning of the transaction, so an UPDATE
>will always find the latest version of a row. There is no conflict as
>far as the transaction is concerned.
>
>Transaction state is not based on what you've read; it doesn't know.
>
>> if x has been changed by another transaction's commit, then the catch will
>> execute,
>
>The UPDATE will find the latest version of the row. The change made
>by the other transaction is not a problem at this isolation level (you
>can see it), so the UPDATE will simply proceed and change it anyway.
>The catch block will never execute.
>
>> If the transaction was serializable, and another transaction has committed,
>> this would never work, because a commit would nullify the other
>> transactions write without this transaction ever having seen the other
>> transactions write, so this transaction would always be forced to rollback.
>
>Exactly. But without SERIALIZABLE isolation, any operations you
>perform always see the other transaction's write, so there is never
>any conflict.
>
>Also keep in mind that MVCC is not the only way to implement
>transactions; pure locking is more common in other databases. In the
>locking model, most transactions prevent others from writing until
>after they are finished. Rows simply can't have different versions
>(and of course concurrent performance is awful).

What about PostgreSQL doing something like a snapshot isolation level, as per the enemy M$?
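(Trevor's point about READ COMMITTED, shown in the same two-session shape as the earlier sketch with the hypothetical table t1: the second writer never sees an error, it simply overwrites, which is why the catch block can never fire:)

    -- session A                         -- session B
    begin;  -- read committed (the default)
    select x from t1 where id = 2;
                                         begin;
                                         update t1 set x = 5 where id = 2;
                                         commit;
    update t1 set x = 9 where id = 2;
    -- succeeds: the update finds and overwrites the latest committed
    -- row version, so no conflict is ever reported to session A
    commit;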
Re: [GENERAL] atomic commit;begin for long-running transactions, in combination with savepoint
My understanding was that it uses more resources than read committed because it keeps track of the version ids of rows selected so far in a transaction ("transaction-level consistency"), so it would have the potential to do the xmin co-selecting and checking if it were a transaction isolation level in Postgres. Google found my reference, and the reference mentioned it was different from serializable.

On Mon Oct 15 9:09, "Trevor Talbot" sent:

>On 10/15/07, Syan Tan wrote:
>
>> >Also keep in mind that MVCC is not the only way to implement
>> >transactions; pure locking is more common in other databases. In the
>> >locking model, most transactions prevent others from writing until
>> >after they are finished. Rows simply can't have different versions
>> >(and of course concurrent performance is awful).
>>
>> what about postgresql doing something like snapshot isolation level as per
>> the enemy M$ ?
>
>SQL Server is normally a pure locking database; from what I can tell,
>its snapshot isolation level adds a limited form of MVCC above that,
>making its concurrent behavior closer to PostgreSQL's:
>http://msdn2.microsoft.com/en-us/library/ms345124(d=printer).aspx