Re: [Libguestfs] Checksums and other verification

Nir Soffer Tue, 28 Feb 2023 12:59:26 -0800

On Tue, Feb 28, 2023 at 4:13 PM Laszlo Ersek <ler...@redhat.com> wrote:
>
> On 2/28/23 12:39, Richard W.M. Jones wrote:
> > On Tue, Feb 28, 2023 at 12:24:04PM +0100, Laszlo Ersek wrote:
> >> On 2/27/23 17:44, Richard W.M. Jones wrote:
> >>> On Mon, Feb 27, 2023 at 08:42:23AM -0600, Eric Blake wrote:
> >>>> Or intentionally choose a hash that can be computed out-of-order,
> >>>> such as a Merkle Tree.  But we'd need a standard setup for all
> >>>> parties to agree on how the hash is to be computed and checked, if
> >>>> it is going to be anything more than just a linear hash of the
> >>>> entire guest-visible contents.
> >>>
> >>> Unfortunately I suspect that by far the easiest way for people who
> >>> host images to compute checksums is to run 'shaXXXsum' on them or
> >>> sign them with a GPG signature, rather than engaging in a novel hash
> >>> function.  Indeed that's what is happening now:
> >>>
> >>> https://alt.fedoraproject.org/en/verify.html
> >>
> >> If the output is produced with unordered writes, but the complete
> >> output needs to be verified with a hash *chain*, that still allows
> >> for some level of asynchrony. The start of the hashing need not be
> >> delayed until after the end of output, only after the start of
> >> output.
> >>
> >> For example, nbdcopy could maintain the highest offset up to which
> >> the output is contiguous, and on a separate thread, it could be
> >> hashing the output up to that offset.
> >>
> >> Considering a gigantic output, as yet unassembled blocks could likely
> >> not be buffered in memory (that's why the writes are unordered in the
> >> first place!), so the hashing thread would have to re-read the output
> >> via NBD. Whether that would cause performance to improve or to
> >> deteriorate is undecided IMO. If the far end of the output network
> >> block device can accommodate a reader that is independent of the
> >> writers, then this level of overlap is beneficial. Otherwise, this
> >> extra reader thread would just add more thrashing, and we'd be better
> >> off with a separate read-through once writing is complete.
> >
> > In my mind I'm wondering if there's any mathematical result that lets
> > you combine each hash(block_i) into the final hash(block[1..N])
> > without needing to compute the hash of each block in order.
>
> I've now checked:
>
> https://en.wikipedia.org/wiki/SHA-2
> https://en.wikipedia.org/wiki/Merkle%E2%80%93Damg%C3%A5rd_construction
> https://en.wikipedia.org/wiki/One-way_compression_function#Construction_from_block_ciphers
> https://en.wikipedia.org/wiki/One-way_compression_function#Davies%E2%80%93Meyer
>
> Consider the following order of steps:
>
> - precompute hash(block[n]), with some initial IV
> - throw away block[n]
> - wait until block[n-1] is processed, providing the actual IV for
>   hashing block[n]
> - mix the new IV into hash(block[n]) without having access to block[n]
>
> If such a method existed, it would break the security (i.e., the
> original design) of the hash, IMO, as it would separate the IV from
> block[n]. In a way, it would make the "mix" and "concat" operators (of
> the underlying block cipher's chaining method) distributive. I believe
> then you could generate a bunch of *valid* hash(block[n]) values as a
> mere function of the IV, without having access to block[n]. You could
> perhaps use that for probing against other hash(block[m]) values, and
> maybe determine repeating patterns in the plaintext. I'm not a
> cryptographer so I can't exactly show what security property is broken
> by separating the IV from block[n].
>
> > (This is what blkhash solves, but unfortunately the output isn't
> > compatible with standard hashes.)
>
> Assuming blkhash is a Merkle Tree implementation, blkhash solves a
> different problem IMO.


blkhash uses a flat Merkle tree, described here:
https://www.researchgate.net/publication/323243320_Foundations_of_Applied_Cryptography_and_Cybersecurity
seection 3.9.4 2lMT: the Flat Merkle Tree construction

To support parallel hashing it uses more complex construction, but this
can be simplified to a single flat Merkle tree.

Nir

_______________________________________________
Libguestfs mailing list
Libguestfs@redhat.com
https://listman.redhat.com/mailman/listinfo/libguestfs

Re: [Libguestfs] Checksums and other verification

Reply via email to