On Tue, Feb 22, 2011 at 1:59 AM, Heikki Linnakangas <heikki.linnakan...@enterprisedb.com> wrote: > Really? The idea of the hash is to prevent you from importing arbitrary > snapshots into the system, allowing only copying snapshots that are in use > by another backend. The purpose of the hash is to make sure the snapshot the > client passes in is identical to one used by another backend.
If that's the purpose of the hash, it is utterly useless. The computation performed by the backend can easily be replicated by an adversary. > If you don't use a cryptographically secure hash, it's easy to construct a > snapshot with the same hash as an existing snapshot, with more or less > arbitrary contents. And if you did use a cryptographically secure hash, it would still be easy, unless there is some plausible way of keeping the key a secret, which I doubt. The original reason why we went with this design (send the snapshot to the client and then have the client send it back to the server) was because some people thought it could be useful to take a snapshot from the master and recreate it on a standby, or perhaps the other way around. If that's the goal, checking whether the snapshot being imported is one that's already in use is missing the point. We have to rely on our ability to detect, when importing the snapshot, any anomalies that could lead to system instability; and allow people to import an arbitrary snapshot that meets those constraints, even one that the user just created out of whole cloth. Now, as I said before, I think that's a bad idea. I think we should NOT be designing the system to allow publication of arbitrary snapshots; instead, the backend that wishes to export its snapshot should so request, getting back a token (like "1") that other backends can then pass to START TRANSACTION (SNAPSHOT '1'), or whatever syntax we end up using. In that design, there's no need to validate snapshots because they never leave the server's memory space, which IMHO is as it should be. If someone can develop a convincing argument that transmitting snapshots between the master and standby is a good idea, we can still accommodate that: the master-standby connection is now full-duplex. In fact, we may want to do that anyway for other reasons (see Kevin Grittner's musings on this point vs. true serializability). Another reason I don't like this approach is because it's possible that the representation of snapshots might change in the future. Tom mused a while back about using an LSN as a snapshot (snapshots sees all transactions committed prior to that LSN) and there are other plausible changes, too. If we expose the internal details of the snapshot mechanism to users, then (1) it becomes a user-visible feature with backward compatibility implications if we choose to break it down the road and (2) all future snapshot representations must be able to be fully sanity checked on import. Right now it's fairly easy to verify that the snapshot's xmin isn't too old and that the remaining XIDs are in a reasonable range and that's probably all we really need to be certain of, but it's not clear that some future representation would be equally easy to validate. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers