In the last exciting episode, [EMAIL PROTECTED] (Jan Wieck) wrote:
> I look forward to your comments.
It is not evident from the paper what approach is taken to dealing with the duplicate key conflicts.
The example:
UPDATE table SET col1 = 'temp' where col1 = 'A';
UPDATE table SET col1 = 'A' where col1 = 'B';
UPDATE table SET col1 = 'B' where col1 = 'temp';
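For concreteness, a minimal sketch of the situation (table and column names invented for illustration):

    -- two rows whose col1 values are to be swapped, with col1 UNIQUE:
    CREATE TABLE swap_demo (
        otherkey integer PRIMARY KEY,
        col1     text UNIQUE
    );
    INSERT INTO swap_demo VALUES (123, 'A');
    INSERT INTO swap_demo VALUES (456, 'B');

    -- Swapping the values directly fails, because a non-deferrable
    -- UNIQUE constraint is checked immediately, not at COMMIT:
    --   UPDATE swap_demo SET col1 = 'B' WHERE otherkey = 123;
    --   ERROR: duplicate key value violates unique constraint
    -- hence the three-step dance through the 'temp' value above.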
I can think of several approaches to this (see the numbered list further down):
One fundamental flaw in eRServer is that it tries to "combine" multiple updates into one update at snapshot time in the first place. The application can do these three steps in one single transaction; how do you split that?
You can develop an automatic recovery for that, along the lines of the sketch below. When you get a dupkey error, you roll back, but remember the _rserv_ts and table_id that caused it. In the next sync attempt, you fetch the row with that _rserv_ts, delete all rows from the slave table with that primary key, and fake INSERT log rows on the master for the same. Then you prepare and apply and cross your fingers that nobody has touched the same row again between your last attempt and now ... which was how many hours ago? And since you can only find one dupkey per round, you might have to do this a few times, with larger and larger lists of (_rserv_ts, table_id).
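Sketched in SQL, one such recovery round might look like the following; the table and column names here are guesses for illustration, not the real eRServer schema:

    -- 1. From the failed sync attempt, remember the _rserv_ts (say 4711)
    --    and table_id (say 42) that raised the dupkey error, then look
    --    up the primary key of the offending row on the master:
    SELECT pkey FROM master_table WHERE _rserv_ts = 4711;

    -- 2. On the slave, throw away every row carrying that primary key:
    DELETE FROM slave_table WHERE pkey = '<value from step 1>';

    -- 3. On the master, fake an INSERT log entry so the next sync
    --    round re-sends the row's current contents:
    UPDATE _rserv_log SET op = 'INS'
     WHERE tab_id = 42 AND _rserv_ts = 4711;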
The idea of not accumulating log forever, but just holding this status table (the name "log" is misleading in eRServer; it holds flags saying "the row with _rserv_ts=nnnn got INS|UPD|DEL'd"), has one big advantage: however long your slave fails to sync, your master will not run out of space.
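To make that concrete, such a status table might look like this (column names invented for illustration):

    CREATE TABLE _rserv_status (
        tab_id    integer NOT NULL,  -- which replicated table
        _rserv_ts bigint  NOT NULL,  -- identifies one row in that table
        op        char(3) NOT NULL,  -- 'INS', 'UPD' or 'DEL'
        PRIMARY KEY (tab_id, _rserv_ts)
    );
    -- A later change to an already-flagged row just overwrites its
    -- flag, so the table is bounded by the size of the database, not
    -- by how long the slave has been down.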
But I don't think there is value in letting a slave catch up on the last 4 days in one go anyway. Drop the slave's contents and use COPY; if the slave did not come back before half the database had been modified, it will be faster this way anyway.
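A minimal sketch of such a resync, assuming the slave table can simply be truncated and /tmp/master_dump holds a fresh dump of the master's data:

    BEGIN;
    TRUNCATE slave_table;                      -- drop the stale contents
    COPY slave_table FROM '/tmp/master_dump';  -- bulk-load the master's rows
    COMMIT;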
Jan
1. The present eRServer code reads what is in the table at the time of the 'snapshot', and so tries to pass on:
update table set col1 = 'B' where otherkey = 123;
update table set col1 = 'A' where otherkey = 456;
which breaks because, at some point, col1 is not unique, regardless of the order in which we apply the changes.
2. If the contents as of the time of the COMMIT are stored in the log table, then we would do all three updates in the destination DB, in order, as shown above.
Either we have to:
a) store the updated fields in the replication tables somewhere, or
b) make the third UPDATE wait for the earlier updates to be stored in a file somewhere.
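A sketch of option (a), with invented names: instead of just a flag, the log row would carry the values as of COMMIT, so the slave can replay the three UPDATEs verbatim:

    CREATE TABLE _rserv_value_log (
        tab_id    integer NOT NULL,
        _rserv_ts bigint  NOT NULL,
        op        char(3) NOT NULL,  -- 'INS', 'UPD' or 'DEL'
        new_row   text               -- the row's contents at COMMIT time
    );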
3. The replication code requires that any given key only be updated once in a 'snapshot', so that the updates may be unambiguously partitioned:
UPDATE table SET col1 = 'temp' where col1 = 'A';  -- and otherkey = 123
UPDATE table SET col1 = 'A' where col1 = 'B';     -- and otherkey = 456
-- Must partition here before hitting #123 again --
UPDATE table SET col1 = 'B' where col1 = 'temp';  -- and otherkey = 123
The third UPDATE may have to be held up until the "partition" is set up, right?
4. I seem to recall a recent discussion about the possibility of deferring the UNIQUE constraint 'til the END of a commit, with the result that we could simplify to
update table set col1 = 'B' where otherkey = 123;
update table set col1 = 'A' where otherkey = 456;
and discover that the UNIQUE constraint was relaxed just long enough for us to make the TWO changes that, in the end, combine to be unique.
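For reference, PostgreSQL has since grown exactly this feature: from 9.0 on, a UNIQUE constraint can be declared DEFERRABLE, so it is checked at COMMIT rather than per statement. A minimal sketch, with invented names (the same table as above, but with a deferrable constraint):

    CREATE TABLE swap_demo2 (
        otherkey integer PRIMARY KEY,
        col1     text,
        UNIQUE (col1) DEFERRABLE INITIALLY DEFERRED
    );
    INSERT INTO swap_demo2 VALUES (123, 'A'), (456, 'B');

    BEGIN;
    UPDATE swap_demo2 SET col1 = 'B' WHERE otherkey = 123;  -- 'B' duplicated here
    UPDATE swap_demo2 SET col1 = 'A' WHERE otherkey = 456;
    COMMIT;  -- the uniqueness check runs here, and passes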
None of these look like they turn out totally happily, or am I missing an approach?
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#==================================================[EMAIL PROTECTED]==#