On 22.02.2011 14:24, Robert Haas wrote:
> On Tue, Feb 22, 2011 at 1:59 AM, Heikki Linnakangas
> <heikki.linnakan...@enterprisedb.com> wrote:
>> If you don't use a cryptographically secure hash, it's easy to construct a
>> snapshot with the same hash as an existing snapshot, with more or less
>> arbitrary contents.
>
> And if you did use a cryptographically secure hash, it would still be
> easy, unless there is some plausible way of keeping the key a secret,
> which I doubt.

This is hashing, not encryption; there is no key. The point is that even if the attacker has the hash value and knows the algorithm, he cannot construct *another* snapshot that has the same hash. A cryptographically strong hash function has that property, called second preimage resistance, whereas for a regular CRC or similar it's easy to create an arbitrary input that has any given checksum.
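To make the CRC point concrete, here's a small sketch (illustrative only, not from this thread, and the snapshot strings are made up): CRC-32 can be run backwards, so given any tampered message an attacker can compute a 4-byte suffix that forces its checksum to any chosen value, such as the checksum of the legitimate message.

```python
import zlib

# Standard reflected CRC-32 lookup table (polynomial 0xEDB88320),
# matching what zlib.crc32 computes internally.
POLY = 0xEDB88320
TABLE = []
for n in range(256):
    c = n
    for _ in range(8):
        c = (c >> 1) ^ POLY if c & 1 else c >> 1
    TABLE.append(c)

def forge_crc32(data: bytes, target: int) -> bytes:
    """Return a 4-byte suffix s such that zlib.crc32(data + s) == target."""
    reg = zlib.crc32(data) ^ 0xFFFFFFFF   # internal CRC register after `data`
    want = target ^ 0xFFFFFFFF            # register value we need to end on
    # Backward pass: peel off the four table lookups, most recent first.
    # The top byte of each table entry is unique, so each step is invertible.
    indices = []
    for _ in range(4):
        top = want >> 24
        i = next(j for j in range(256) if TABLE[j] >> 24 == top)
        indices.append(i)
        want = ((want ^ TABLE[i]) << 8) & 0xFFFFFFFF
    indices.reverse()                     # order in which bytes are consumed
    # Forward pass: turn the chosen table indices into concrete suffix bytes.
    out = bytearray()
    for i in indices:
        out.append((reg ^ i) & 0xFF)
        reg = (reg >> 8) ^ TABLE[i]
    return bytes(out)
```

With this, an attacker who alters a CRC-protected snapshot string just appends four computed bytes and the checksum matches the original again; a second-preimage-resistant hash rules exactly this out.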

> The original reason why we went with this design (send the snapshot to
> the client and then have the client send it back to the server) was
> because some people thought it could be useful to take a snapshot from
> the master and recreate it on a standby, or perhaps the other way
> around.  If that's the goal, checking whether the snapshot being
> imported is one that's already in use is missing the point.  We have
> to rely on our ability to detect, when importing the snapshot, any
> anomalies that could lead to system instability; and allow people to
> import an arbitrary snapshot that meets those constraints, even one
> that the user just created out of whole cloth.

Yes. It would be good to perform those sanity checks anyway.

> Now, as I said before, I think that's a bad idea.  I think we should
> NOT be designing the system to allow publication of arbitrary
> snapshots; instead, the backend that wishes to export its snapshot
> should so request, getting back a token (like "1") that other backends
> can then pass to START TRANSACTION (SNAPSHOT '1'), or whatever syntax
> we end up using.

Well, I'm not sure whether we should allow importing arbitrary (as long as they're sane) snapshots or not; there are compelling arguments both for and against.

But even if we don't allow it, there's no harm in sending the whole snapshot to the client anyway, i.e. instead of "1" as the identifier, use the snapshot itself. That leaves the door open for allowing it in the future, should we choose to do so.

> In that design, there's no need to validate
> snapshots because they never leave the server's memory space, which
> IMHO is as it should be.

I agree that if we don't allow importing arbitrary snapshots, we should use some out-of-band communication to get the snapshot to the backend. IOW, not rely on a hash.

> Another reason I don't like this approach is because it's possible
> that the representation of snapshots might change in the future.  Tom
> mused a while back about using an LSN as a snapshot (snapshots sees
> all transactions committed prior to that LSN) and there are other
> plausible changes, too.  If we expose the internal details of the
> snapshot mechanism to users, then (1) it becomes a user-visible
> feature with backward compatibility implications if we choose to break
> it down the road

Clients should treat the snapshot as an opaque block.

> and (2) all future snapshot representations must be
> able to be fully sanity checked on import.  Right now it's fairly easy
> to verify that the snapshot's xmin isn't too old and that the
> remaining XIDs are in a reasonable range and that's probably all we
> really need to be certain of, but it's not clear that some future
> representation would be equally easy to validate.

Perhaps, although I can't imagine a representation that wouldn't be equally easy to validate.
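For what it's worth, the checks mentioned above are simple to express. A hypothetical sketch (names and structure invented for illustration; the real checks would live in PostgreSQL's C code and would use wraparound-aware XID comparisons, which this deliberately ignores):

```python
# Hypothetical sketch of import-time snapshot sanity checks (illustration
# only, not PostgreSQL source). Assumes a parsed snapshot with xmin, xmax,
# and a list of in-progress XIDs, plus the importing server's current
# oldest-xmin horizon and next-to-assign XID. Real code would need
# wraparound-aware (circular) XID comparisons instead of plain < and >.

def validate_snapshot(xmin, xmax, xip, oldest_xmin, next_xid):
    if xmin < oldest_xmin:
        # Rows the snapshot needs to see may already have been vacuumed away.
        raise ValueError("snapshot xmin is older than the global xmin horizon")
    if xmax > next_xid:
        raise ValueError("snapshot xmax lies in the future")
    if xmin > xmax:
        raise ValueError("xmin must not exceed xmax")
    for xid in xip:
        if not (xmin <= xid < xmax):
            raise ValueError("in-progress xid outside [xmin, xmax)")
    return True
```

Whatever future representation replaces this (an LSN, say) would need an equivalent "is this still within the horizons the server can honor" test.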

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers