> On 8 May 2025, at 8:28, Markus Armbruster <arm...@redhat.com> wrote:
> 
> Nir Soffer <nir...@gmail.com> writes:
> 
>> When the `read-zeroes` is set, reads produce zeroes, and block status
>> return BDRV_BLOCK_ZERO, emulating a sparse image.
>> 
>> If we don't set `read-zeros` we report BDRV_BLOCK_DATA, but image data
>> is undefined; posix_memalign, _aligned_malloc, valloc, or memalign do
>> not promise to zero allocated memory.
>> 
>> When computing a blkhash of an image via qemu-nbd, we want to test 3
>> cases:
>> 
>> 1. Sparse image: skip reading the entire image based on block status
>>   result, and use a pre-computed zero block hash.
>> 2. Image full of zeroes: read the entire image, detect block full of
>>   zeroes and skip block hash computation.
>> 3. Image full of data: read the entire image and compute a hash of all
>>   blocks.
>> 
>> This change adds `read-pattern` option. If the option is set, reads
>> produce the specified pattern. With this option we can emulate an image
>> full of zeroes or full of non-zeroes.
>> 
>> The following examples shows how the new option can be used with blksum
>> (or nbdcopy --blkhash) to compute a blkhash of an image using the
>> null-co driver.
>> 
>> Sparse image - the very fast path:
>> 
>>    % ./qemu-nbd -r -t -e 0 -f raw -k /tmp/sparse.sock \
>>        "json:{'driver': 'raw', 'file': {'driver': 'null-co', 'size': '100g', 
>> 'read-zeroes': true}}" &
>> 
>>    % time blksum 'nbd+unix:///?socket=/tmp/sparse.sock'
>>    300ad1efddb063822fea65ae3174cd35320939d4d0b050613628c6e1e876f8f6  
>> nbd+unix:///?socket=/tmp/sparse.sock
>>    blksum 'nbd+unix:///?socket=/tmp/sparse.sock'  0.05s user 0.01s system 
>> 92% cpu 0.061 total
>> 
>> Image full of zeros - same hash, 268 times slower:
>> 
>>    % ./qemu-nbd -r -t -e 0 -f raw -k /tmp/zero.sock \
>>        "json:{'driver': 'raw', 'file': {'driver': 'null-co', 'size': '100g', 
>> 'read-pattern': 0}}" &
>> 
>>    % time blksum 'nbd+unix:///?socket=/tmp/zero.sock'
>>    300ad1efddb063822fea65ae3174cd35320939d4d0b050613628c6e1e876f8f6  
>> nbd+unix:///?socket=/tmp/zero.sock
>>    blksum 'nbd+unix:///?socket=/tmp/zero.sock'  7.45s user 22.57s system 
>> 183% cpu 16.347 total
>> 
>> Image full of data - difference hash, heavy cpu usage:
>> 
>>    % ./qemu-nbd -r -t -e 0 -f raw -k /tmp/data.sock \
>>        "json:{'driver': 'raw', 'file': {'driver': 'null-co', 'size': '100g', 
>> 'read-pattern': 255}}" &
>> 
>>    % time blksum 'nbd+unix:///?socket=/tmp/data.sock'
>>    2c122b3ed28c83ede3c08485659fa9b56ee54ba1751db74d8ba9aa13d9866432  
>> nbd+unix:///?socket=/tmp/data.sock
>>    blksum 'nbd+unix:///?socket=/tmp/data.sock'  46.05s user 14.15s system 
>> 448% cpu 13.414 total
>> 
>> Specifying both `read-zeroes` and `read-pattern` is an error since
>> `read-zeroes` implies a sparse image. Example errors:
>> 
>>    % ./qemu-img map --output json \
>>        "json:{'driver': 'raw', 'file': {'driver': 'null-co', 'read-pattern': 
>> -1}}"
>>    qemu-img: Could not open 'json:{...}': read_pattern is out of range 
>> (0-255)
>> 
>>    % ./qemu-img map --output json \
>>        "json:{'driver': 'raw', 'file': {'driver': 'null-co', 'read-pattern': 
>> 0, 'read-zeroes': true}}"
>>    qemu-img: Could not open 'json:{...}': The parameters read-zeroes and 
>> read-pattern are in conflict
>> 
>> Tested on top of
>> https://lists.gnu.org/archive/html/qemu-devel/2025-04/msg05096.html.
>> 
>> Signed-off-by: Nir Soffer <nir...@gmail.com>
> 
> [...]
> 
>> diff --git a/docs/devel/secure-coding-practices.rst 
>> b/docs/devel/secure-coding-practices.rst
>> index 0454cc527e..73830684ea 100644
>> --- a/docs/devel/secure-coding-practices.rst
>> +++ b/docs/devel/secure-coding-practices.rst
>> @@ -111,5 +111,6 @@ Use of null-co block drivers
>> The ``null-co`` block driver is designed for performance: its read accesses 
>> are
>> not initialized by default. In case this driver has to be used for security
>> research, it must be used with the ``read-zeroes=on`` option which fills read
>> -buffers with zeroes. Security issues reported with the default
>> +buffers with zeroes, or with the ``read-pattern=N`` option which fills read
>> +buffers with pattern. Security issues reported with the default
>> (``read-zeroes=off``) will be discarded.
>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>> index 7c95c9e36a..2205ac9758 100644
>> --- a/qapi/block-core.json
>> +++ b/qapi/block-core.json
>> @@ -3295,13 +3295,23 @@
>> #
>> # @read-zeroes: if true, emulate a sparse image, and reads from the
>> #     device produce zeroes; if false, emulate an allocated image but
>> -#     reads from the device leave the buffer unchanged.
>> +#     reads from the device leave the buffer unchanged. Mutually
>> +#     exclusive with @read-pattern.
>> #     (default: false; since: 4.1)
> 
> Correct before the patch: absent behaves just like present and false.
> 
> That's no longer the case afterwards.  Hmm.

I don’t get the issue

> 
>> #
>> +# @read-pattern: if set, emulate an allocated image, and reads from the
>> +#     device produce the specified byte value; if unset, reads from the
>> +#     device leave the buffer unchanged. Mutually exclusive with
>> +#     @read-zeroes.
>> +#     (since: 10.1)
> 
> Default?  The usual way to document it would be something like (default:
> false), but that's again problematic.

This is optional, so we cannot have a default. Maybe (default absent)?

> 
>> +#
>> # Since: 2.9
>> ##
>> { 'struct': 'BlockdevOptionsNull',
>> -  'data': { '*size': 'int', '*latency-ns': 'uint64', '*read-zeroes': 'bool' 
>> } }
>> +  'data': { '*size': 'int',
>> +            '*latency-ns': 'uint64',
>> +            '*read-zeroes': 'bool',
>> +            '*read-pattern': 'uint8' } }
>> 
>> ##
>> # @BlockdevOptionsNVMe:
> 
> Let's take a step back from the concrete interface and ponder what we're
> trying to do.  We want three cases:
> 
> * Allocated, reads return unspecified crap (security hazard)
> 
> * Allocated, reads return specified data
> 
> * Sparse, reads return zeroes
> 
> How would we do this if we started on a green field?
> 
> Here's my try:
> 
>    bool sparse
>    uint8 contents

The names are short and nice, but we need to consider the entire API. 
"read-zeroes" works well with "detect-zeroes". "read-pattern" works well with 
"read-zeroes” and the term “pattern” works well with qemu-io -c:

    read [-abCqrv] [-P pattern [-s off] [-l len]] off len -- reads a number of 
bytes at a specified offset
    write [-bcCfnqruz] [-P pattern | -s source_file] off len -- writes a number 
of bytes at a specified offset

There is also nbdkit pattern plugin in this space:
https://libguestfs.org/nbdkit-pattern-plugin.1.html

> 
> so that
> 
> * Allocated, reads return unspecified crap (security hazard)
> 
>  @sparse is false, and @contents is absent
> 
> * Allocated, reads return specified data
> 
>  @sparse is false, and @contents is present
> 
> * Sparse, reads return zeroes
> 
>  @sparse is true, and @contents must absent or zero

“sparse" conflicts with “contents”. I would not make both possible even if 
contents is 0. 

> 
> I'd make @sparse either mandatory

Do you mean must be always specified? Why?

> or default to true, so that we don't
> default to security hazard.

Default sparse sounds good, but it means that we must use:

    “sparse": false, “contents”: 42

So we must use the conflicting options together.

“sparse”: false is a better default, expect the unspecified bytes, but it 
should not depend on spareness.

> 
> Unless we want to break compatibility, we're stuck with the name
> @read-zeroes for the bool, and its trap-for-the-unwary default value,
> but cleaner semantics seem possible.
> 
> Thoughts?

Having "sparse": true default is little better than returning unspecified data 
by default, but sing “sparse”: false will return unspecified bytes and is not 
expected - spareness should not be related to security.

If we want to make this secure by default, maybe:

   bool sparse: emulate sparse image (default false)
   uint8 contents: read return specified bytes (default 0)
   bool insecure: read return unspecified bytes, for performance testing 
(default false)

Another way is to accept invalid value for unspecified content:

   bool sparse: emulate sparse image (default false)
   uint8 contents: read return specified bytes. -1 to return unspecified bytes 
(default 0)

This may be confusing, users may assume that -1 and 255 will give the same 
result.

Another way is to use lower level terms:

    uint8 block-status: flags to return in block_status: (default 0x0, zero 
0x1, data 0x2)
    uint8 read-pattern: return specified bytes. -1 to return unspecified bytes 
(default 0)

This is harder to use but allows more combinations that can be useful or 
testing NBD. And the default is secure.

But the important questions are:
- Do we want to keep read-zeros?
- Do we want to keep the insecure default or make insecure behavior explicit?


Reply via email to