Re: [Libguestfs] Checksums and other verification

Eric Blake Mon, 27 Feb 2023 06:42:33 -0800

On Mon, Feb 27, 2023 at 01:56:26PM +0000, Richard W.M. Jones wrote:
> 
> https://github.com/kubevirt/containerized-data-importer/issues/1520
> 
> Hi Eric,
> 
> We had a question from the Kubevirt team related to the above issue.
> The question is roughly if it's possible to calculate the checksum of
> an image as an nbdkit filter and/or in the qemu block layer.


In the qemu block layer - yes: see Nir's https://gitlab.com/nirs/blkhash

Note that there is a huge difference between a block-based checksum (a
checksum of the block data the guest will see) and a checksum of the
original file (the bytes as stored on the source).  With non-raw
formats, more than one image file may present the same guest-visible
contents despite having different host checksums.

Also, it may prove to be more efficient to generate a Merkle Tree hash
of an image (an image is divided into smaller portions in a
binary-tree fanout, where the hash of the entire image is computed by
combining hashes of child nodes up to the root of the tree - which
allows downloading blocks out of order).  [You may be more familiar
with Merkle Trees than you realize - every git commit id is ultimately
a Merkle Tree hash of all prior commits]
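
As a rough illustration of the Merkle Tree idea, here is a minimal
Python sketch.  The block size, the use of SHA-256, and the rule for
odd-sized levels (duplicate the last node) are all assumptions made
for the example, not anything blkhash, qemu or nbdkit defines.  It
only shows that the root does not depend on the order in which blocks
arrive.

```python
import hashlib

BLOCK_SIZE = 64 * 1024  # assumed; all parties would have to agree on this


def merkle_root(leaf_hashes):
    """Combine per-block hashes pairwise until a single root remains."""
    if not leaf_hashes:
        return hashlib.sha256(b"").digest()
    level = list(leaf_hashes)
    while len(level) > 1:
        if len(level) % 2:
            # Odd number of nodes: duplicate the last one (one convention
            # among several; chosen here only for simplicity).
            level.append(level[-1])
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]


def hash_blocks_in_any_order(image, order):
    """Hash each block in the given arrival order, then build the root."""
    nblocks = (len(image) + BLOCK_SIZE - 1) // BLOCK_SIZE
    leaves = [None] * nblocks
    for i in order:
        block = image[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE]
        leaves[i] = hashlib.sha256(block).digest()
    return merkle_root(leaves)
```

Calling hash_blocks_in_any_order() on the same image with two
different permutations of the block indexes yields the same root,
which is the property that makes out-of-order downloads verifiable.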

As for nbdkit being able to do hashing as a filter, we don't have such
a filter now, but I think it would be technically possible to
implement one.  The trickiest part would be figuring out a way to
expose the checksum to the client once the client has finally read
through the entire image.  It would be easy to have nbdkit output the
resulting hash in a secondary file for consumption by the end client.
Harder, but potentially more useful, would be extending the NBD protocol
itself to allow the NBD client to issue a query to the server to
provide the hash directly (or an indication that the hash is not yet
known because not all blocks have been hashed yet).
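
To make the "hash not yet known" state concrete, here is a small
Python sketch of the bookkeeping such a filter might do.  It is not
nbdkit code: the class name ReadHashTracker, its methods, the
block-aligned reads and the hash-of-block-hashes scheme are all
hypothetical choices for illustration.

```python
import hashlib


class ReadHashTracker:
    """Hypothetical helper: remembers one hash per block as it is read."""

    def __init__(self, image_size, block_size=65536):
        self.block_size = block_size
        self.nblocks = (image_size + block_size - 1) // block_size
        self.block_hashes = {}

    def on_read(self, index, data):
        # Assumes the client reads whole, aligned blocks.
        self.block_hashes[index] = hashlib.sha256(data).digest()

    def result(self):
        """Return the hex digest, or None while blocks are still unread."""
        if len(self.block_hashes) < self.nblocks:
            return None
        outer = hashlib.sha256()
        for i in range(self.nblocks):
            # Combine in block order, regardless of the order of the reads.
            outer.update(self.block_hashes[i])
        return outer.hexdigest()
```

A server could write result() to a secondary file once it stops
returning None, or answer a (so far nonexistent) protocol query with
either the digest or "not yet known".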

> 
> Supplemental #1: could qemu-img convert calculate a checksum as it goes
> along?

Nir's work on blkhash suggests that this is doable.

> 
> Supplemental #2: could we detect various sorts of common errors, such
> as a webserver that is incorrectly configured and serves up an error page
> containing "<html>"; or something which is supposed to be a disk image
> but does not "look like" (in some ill-defined sense) a disk image,
> eg. it has no partition table.
> 
> I'm not sure if qemu has any existing features covering the above (and
> I know for sure that nbdkit doesn't).

Indeed.  But adding a filter that does a pre-read of the plugin's
first 1M during .prepare to look for an expected signature (what is
sufficient, seeing if there is a partition table?) and refuses to let
the client connect if the plugin is serving wrong data seems fairly
straightforward.
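
A minimal Python sketch of the kind of check such a pre-read could
perform is below.  The function name and the particular set of
signatures (an HTML prefix, the qcow2 magic, a GPT header at offset
512 assuming 512-byte sectors, and the MBR boot signature at offset
510) are assumptions for illustration; what counts as "sufficient" is
still an open question, and a real nbdkit filter would be written in
C.

```python
def looks_like_disk_image(head):
    """Return (ok, reason) for the first bytes of a candidate image."""
    stripped = head.lstrip().lower()
    if stripped.startswith(b"<html") or stripped.startswith(b"<!doctype html"):
        return False, "looks like an HTML page"
    if head[:4] == b"QFI\xfb":
        return True, "qcow2 header"
    if len(head) >= 520 and head[512:520] == b"EFI PART":
        return True, "GPT header"
    if len(head) >= 512 and head[510:512] == b"\x55\xaa":
        return True, "MBR boot signature"
    return False, "no recognised signature"
```

A filter could run a check like this against the data it pre-reads
and fail the connection when the first element of the result is
False.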

> 
> One issue is that calculating a checksum involves a linear scan of the
> image, although we can at least skip holes.

Or intentionally choose a hash that can be computed out-of-order, such
as a Merkle Tree.  But we'd need a standard setup for all parties to
agree on how the hash is to be computed and checked, if it is going to
be anything more than just a linear hash of the entire guest-visible
contents.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
_______________________________________________
Libguestfs mailing list
Libguestfs@redhat.com
https://listman.redhat.com/mailman/listinfo/libguestfs
