Dan Ports wrote:
> On Thu, Jan 27, 2011 at 09:18:23AM -0800, Jeff Davis wrote:
>> Is there a reason we can't check for a real cycle, which would let
>> T1 succeed?
>
> I talked today with someone who experimented with doing exactly
> that in MySQL as part of a research project (Perfectly Serializable
> Snapshot Isolation, paper forthcoming in ICDE)

I'm wondering how this differs from what is discussed in Section 2.7
("Serialization Graph Testing") of Cahill's doctoral thesis. That
section discusses a technique for avoiding false positives by testing
the full graph for cycles, with the apparent conclusion that the
costs of doing so are prohibitive. The failure of attempts to
implement that technique seems to have been part of the impetus to
develop the SSI technique, in which a particular structure involving
two or three transactions has been proven to exist in every graph
which forms such a cycle.

I've been able to identify four causes of false positives in the
current SSI implementation:

(1) Collateral reads. In particular, data skew caused by inserts past
the end of an index between an ANALYZE and subsequent data access was
a recurring source of performance complaints. This was fixed by
having the planner probe the ends of an index to correct the costs in
such situations. That has been effective at correcting the target
problem, but we haven't yet excluded such index probes from predicate
locking.

(2) Lock granularity. This includes vertical granularity issues, such
as granularity promotion to conserve shared memory and page locking
versus next-key locking, and it also includes the possibility of
column locking. Many improvements are still possible in this area,
although each should be treated as a performance enhancement patch
and subject to the appropriate performance testing to justify the
extra code and complexity. In some cases the work needed to effect
the reduction in serialization failures may cost more than retrying
the failed transactions.

(3) Dependencies other than read-write. SSI relies on current
PostgreSQL MVCC handling of write-write conflicts -- when one of
these occurs between two transactions running at this level, one of
the transactions must fail with a serialization failure; since only
one runs to successful completion, there is no question of which ran
*first*. Write-read dependencies (where one transaction *can* see the
work of another because they *don't* overlap) are handled in SSI by
assuming that if T1 commits before T2 acquires its snapshot, T1 will
appear to have run before T2. I won't go into it at length on this
thread, but one has to be very careful about assuming anything else;
trying to explicitly track these conflicts and to allow that T2 may
have appeared to run before T1 can fall apart completely in the face
of some common and reasonable programming practices.
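To make the assumption in (3) concrete, here is a minimal sketch in C
of that commit-order test, using hypothetical type and field names
rather than the actual structures in predicate.c:

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-transaction SSI bookkeeping. */
typedef struct XactSketch
{
    uint64_t snapshotSeqNo;  /* order in which the snapshot was taken */
    uint64_t commitSeqNo;    /* commit order; 0 while still running */
} XactSketch;

/*
 * The wr-dependency assumption from (3): if t1 committed before t2
 * acquired its snapshot, t2 can see t1's work, and t1 is simply
 * assumed to appear earlier in any serial order.  No explicit
 * wr-conflict edge needs to be tracked for such a pair.
 */
static bool
assumed_to_run_first(const XactSketch *t1, const XactSketch *t2)
{
    return t1->commitSeqNo != 0
        && t1->commitSeqNo <= t2->snapshotSeqNo;
}

Overlapping transactions are precisely the ones for which the
rw-conflicts have to be tracked explicitly.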
(4) Length of read-write dependency (a/k/a rw-conflict) chains.
Basically, this is a further extension from the Cahill papers in the
direction of the work which apparently failed because of the overhead
of full cycle-checking. The 2008 paper (which was "best paper" at the
2008 ACM SIGMOD) just used inConflict and outConflict flag bits. The
2009 paper extended these to pointers of the same names, with NULL
meaning "no conflict" and a self-reference meaning "couldn't track
the detail; consider it to conflict with *all* concurrent
transactions". The latter brought the false positive rate down
without adding too much overhead. In the PostgreSQL effort, we
replaced those pointers with *lists* of conflicts.

We only get to "conflict with all concurrent transactions" in certain
circumstances, after summarizing transaction data to avoid blowing
out shared memory. The lists have allowed us to avoid many false
positives, facilitated earlier cleanup of much information from
shared memory, and led to a cleaner implementation of the
summarization logic. They also, as it happens, provide enough data to
fully trace the read-write dependencies and avoid some false
positives in cases where the "dangerous structure" SSI looks for
exists, but there is neither a complete rw-conflict cycle nor any
transaction in the graph which committed early enough to make a
write-read conflict possible to any transaction in the graph. Whether
such rigorous tracing would prevent enough false positives to justify
the programming effort, code complexity, and run-time cost is
anybody's guess.
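To illustrate the difference, here is a rough sketch of the current
test next to the more rigorous walk the conflict lists would permit --
again with hypothetical names, and with the commit-order condition
simplified relative to what predicate.c actually checks:

#include <stdbool.h>
#include <stdint.h>

typedef struct Xact Xact;
struct Xact
{
    uint64_t  commitSeqNo;   /* 0 while still running */
    Xact    **outConflicts;  /* rw-conflict out edges from this xact */
    int       nOutConflicts;
    bool      visited;       /* scratch flag for the graph walk */
};

/*
 * The dangerous structure SSI aborts on today: a pivot with both an
 * rw-conflict in (t_in -> pivot) and an rw-conflict out
 * (pivot -> t_out), where t_out committed first.
 */
static bool
dangerous_structure(const Xact *t_in, const Xact *t_out)
{
    return t_out->commitSeqNo != 0
        && (t_in->commitSeqNo == 0
            || t_out->commitSeqNo < t_in->commitSeqNo);
}

/*
 * What full tracing would add: walk rw-conflict edges from t_out
 * looking for a path back to t_in, which would close a real cycle.
 * (A complete version would also have to account for transactions
 * which committed early enough to create wr-conflicts.)
 */
static bool
rw_path_exists(Xact *from, Xact *to)
{
    if (from == to)
        return true;
    if (from->visited)
        return false;
    from->visited = true;
    for (int i = 0; i < from->nOutConflicts; i++)
    {
        if (rw_path_exists(from->outConflicts[i], to))
            return true;
    }
    return false;
}

The open question is whether running something like rw_path_exists()
every time a dangerous structure is detected would pay for itself.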
I only raise these to clarify the issue for Jeff (who is reviewing
the patch), since he asked. I strongly feel that none of them are
issues which need to be addressed for 9.1, nor do I think they can be
properly addressed within the time frame of this CF. On the other
hand, perhaps an addition to the README-SSI file is warranted?

-Kevin