Nir Soffer <nir...@gmail.com> writes:

> When the `read-zeroes` is set, reads produce zeroes, and block status
> return BDRV_BLOCK_ZERO, emulating a sparse image.
>
> If we don't set `read-zeros` we report BDRV_BLOCK_DATA, but image data
> is undefined; posix_memalign, _aligned_malloc, valloc, or memalign do
> not promise to zero allocated memory.
>
> When computing a blkhash of an image via qemu-nbd, we want to test 3
> cases:
>
> 1. Sparse image: skip reading the entire image based on block status
>    result, and use a pre-computed zero block hash.
> 2. Image full of zeroes: read the entire image, detect block full of
>    zeroes and skip block hash computation.
> 3. Image full of data: read the entire image and compute a hash of all
>    blocks.
>
> This change adds `read-pattern` option. If the option is set, reads
> produce the specified pattern. With this option we can emulate an image
> full of zeroes or full of non-zeroes.
>
> The following examples shows how the new option can be used with blksum
> (or nbdcopy --blkhash) to compute a blkhash of an image using the
> null-co driver.
>
> Sparse image - the very fast path:
>
>     % ./qemu-nbd -r -t -e 0 -f raw -k /tmp/sparse.sock \
>         "json:{'driver': 'raw', 'file': {'driver': 'null-co', 'size': '100g', 
> 'read-zeroes': true}}" &
>
>     % time blksum 'nbd+unix:///?socket=/tmp/sparse.sock'
>     300ad1efddb063822fea65ae3174cd35320939d4d0b050613628c6e1e876f8f6  
> nbd+unix:///?socket=/tmp/sparse.sock
>     blksum 'nbd+unix:///?socket=/tmp/sparse.sock'  0.05s user 0.01s system 
> 92% cpu 0.061 total
>
> Image full of zeros - same hash, 268 times slower:
>
>     % ./qemu-nbd -r -t -e 0 -f raw -k /tmp/zero.sock \
>         "json:{'driver': 'raw', 'file': {'driver': 'null-co', 'size': '100g', 
> 'read-pattern': 0}}" &
>
>     % time blksum 'nbd+unix:///?socket=/tmp/zero.sock'
>     300ad1efddb063822fea65ae3174cd35320939d4d0b050613628c6e1e876f8f6  
> nbd+unix:///?socket=/tmp/zero.sock
>     blksum 'nbd+unix:///?socket=/tmp/zero.sock'  7.45s user 22.57s system 
> 183% cpu 16.347 total
>
> Image full of data - difference hash, heavy cpu usage:
>
>     % ./qemu-nbd -r -t -e 0 -f raw -k /tmp/data.sock \
>         "json:{'driver': 'raw', 'file': {'driver': 'null-co', 'size': '100g', 
> 'read-pattern': 255}}" &
>
>     % time blksum 'nbd+unix:///?socket=/tmp/data.sock'
>     2c122b3ed28c83ede3c08485659fa9b56ee54ba1751db74d8ba9aa13d9866432  
> nbd+unix:///?socket=/tmp/data.sock
>     blksum 'nbd+unix:///?socket=/tmp/data.sock'  46.05s user 14.15s system 
> 448% cpu 13.414 total
>
> Specifying both `read-zeroes` and `read-pattern` is an error since
> `read-zeroes` implies a sparse image. Example errors:
>
>     % ./qemu-img map --output json \
>         "json:{'driver': 'raw', 'file': {'driver': 'null-co', 'read-pattern': 
> -1}}"
>     qemu-img: Could not open 'json:{...}': read_pattern is out of range 
> (0-255)
>
>     % ./qemu-img map --output json \
>         "json:{'driver': 'raw', 'file': {'driver': 'null-co', 'read-pattern': 
> 0, 'read-zeroes': true}}"
>     qemu-img: Could not open 'json:{...}': The parameters read-zeroes and 
> read-pattern are in conflict
>
> Tested on top of
> https://lists.gnu.org/archive/html/qemu-devel/2025-04/msg05096.html.
>
> Signed-off-by: Nir Soffer <nir...@gmail.com>

[...]

> diff --git a/docs/devel/secure-coding-practices.rst 
> b/docs/devel/secure-coding-practices.rst
> index 0454cc527e..73830684ea 100644
> --- a/docs/devel/secure-coding-practices.rst
> +++ b/docs/devel/secure-coding-practices.rst
> @@ -111,5 +111,6 @@ Use of null-co block drivers
>  The ``null-co`` block driver is designed for performance: its read accesses 
> are
>  not initialized by default. In case this driver has to be used for security
>  research, it must be used with the ``read-zeroes=on`` option which fills read
> -buffers with zeroes. Security issues reported with the default
> +buffers with zeroes, or with the ``read-pattern=N`` option which fills read
> +buffers with pattern. Security issues reported with the default
>  (``read-zeroes=off``) will be discarded.
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 7c95c9e36a..2205ac9758 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -3295,13 +3295,23 @@
>  #
>  # @read-zeroes: if true, emulate a sparse image, and reads from the
>  #     device produce zeroes; if false, emulate an allocated image but
> -#     reads from the device leave the buffer unchanged.
> +#     reads from the device leave the buffer unchanged. Mutually
> +#     exclusive with @read-pattern.
>  #     (default: false; since: 4.1)

Correct before the patch: absent behaves just like present and false.

That's no longer the case afterwards.  Hmm.

>  #
> +# @read-pattern: if set, emulate an allocated image, and reads from the
> +#     device produce the specified byte value; if unset, reads from the
> +#     device leave the buffer unchanged. Mutually exclusive with
> +#     @read-zeroes.
> +#     (since: 10.1)

Default?  The usual way to document it would be something like (default:
false), but that's again problematic.

> +#
>  # Since: 2.9
>  ##
>  { 'struct': 'BlockdevOptionsNull',
> -  'data': { '*size': 'int', '*latency-ns': 'uint64', '*read-zeroes': 'bool' 
> } }
> +  'data': { '*size': 'int',
> +            '*latency-ns': 'uint64',
> +            '*read-zeroes': 'bool',
> +            '*read-pattern': 'uint8' } }
>  
>  ##
>  # @BlockdevOptionsNVMe:

Let's take a step back from the concrete interface and ponder what we're
trying to do.  We want three cases:

* Allocated, reads return unspecified crap (security hazard)

* Allocated, reads return specified data

* Sparse, reads return zeroes

How would we do this if we started on a green field?

Here's my try:

    bool sparse
    uint8 contents

so that

* Allocated, reads return unspecified crap (security hazard)

  @sparse is false, and @contents is absent

* Allocated, reads return specified data

  @sparse is false, and @contents is present

* Sparse, reads return zeroes

  @sparse is true, and @contents must absent or zero

I'd make @sparse either mandatory or default to true, so that we don't
default to security hazard.

Now compare this to your patch: same configuration data (bool × uint8),
better names, cleaner semantics, better defaults.

Unless we want to break compatibility, we're stuck with the name
@read-zeroes for the bool, and its trap-for-the-unwary default value,
but cleaner semantics seem possible.

Thoughts?


Reply via email to