When the new option 'irq-eventfd' is turned on, the IO emulation code
signals an eventfd when it wants to (de)assert an irq. The main loop
eventfd handler does the actual irq (de)assertion. This paves the way
for iothread support since QEMU's interrupt emulation is not thread
safe.
Asserting and d
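
The hand-off described above can be sketched with a plain Linux eventfd. This is a minimal illustration, not the patch's code: the irq_notifier_* names are hypothetical, and the sketch only shows the signalling side (an emulation thread writes, the main loop drains and would then perform the actual irq change).

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/eventfd.h>

/* Hypothetical helper: create a non-blocking eventfd used as an irq doorbell. */
static int irq_notifier_create(void)
{
    return eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
}

/* Emulation thread side: request an irq state change. Returns 0 on success. */
static int irq_notifier_signal(int fd)
{
    uint64_t one = 1;

    assert(fd >= 0);
    return write(fd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/*
 * Main loop side: fetch and reset the eventfd counter. Returns the number of
 * signals received since the last drain, or 0 if nothing is pending.
 */
static uint64_t irq_notifier_drain(int fd)
{
    uint64_t count = 0;

    assert(fd >= 0);
    if (read(fd, &count, sizeof(count)) != (ssize_t)sizeof(count)) {
        return 0; /* EAGAIN: no pending signal */
    }
    return count;
}
```

Because the eventfd counter accumulates, several signals sent before the main loop runs are coalesced into a single wake-up; the handler would then read the device's current interrupt state rather than count events.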
Use KVM's irqfd to send interrupts when possible. This approach is
thread safe. Moreover, it does not have the inter-thread communication
overhead of plain event notifiers since handler callbacks are called
in the same system call as irqfd write.
Signed-off-by: Jinhao Fan
Signed-off-by: Klaus Jens
Add an option "iothread=x" to do emulation in a separate iothread.
This improves the performance because QEMU's main loop is responsible
for a lot of other work while iothread is dedicated to NVMe emulation.
Moreover, emulating in iothread brings the potential of polling on
SQ/CQ doorbells, which I
Damien Le Moal wrote on Wed, Aug 17, 2022 at 01:50:
>
> On 2022/08/15 23:25, Sam Li wrote:
> > By adding zone management operations in BlockDriver, storage controller
> > emulation can use the new block layer APIs including Report Zone and
> > four zone management operations (open, close, finish, reset).
> >
>
On 25.08.2022 16:31, Alexander Ivanov wrote:
data_end field in BDRVParallelsState is set to the biggest offset present
in BAT. If this offset is outside of the image, any further write will create
the cluster at this offset and/or the image will be truncated to this
offset on close. This is defin
On 25.08.2022 16:31, Alexander Ivanov wrote:
Don't let high_off be more than the file size even if we don't fix the image.
Signed-off-by: Alexander Ivanov
---
block/parallels.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/block/parallels.c b/block/parallels.c
index
On 25.08.2022 16:31, Alexander Ivanov wrote:
Set data_end to the end of the last cluster inside the image.
In such a way we can be shure that corrupted offsets in the BAT
s/shure/sure/
can't affect on the image size.
Signed-off-by: Alexander Ivanov
---
block/parallels.c | 2 ++
1 file chan
On 25.08.2022 16:31, Alexander Ivanov wrote:
We will add more and more checks so we need a better code structure
in parallels_co_check. Let each check be performed in a separate loop
in a separate helper.
Signed-off-by: Alexander Ivanov
---
block/parallels.c | 59 +
job mutex will be used to protect the job struct elements and list,
replacing AioContext locks.
Right now use a shared lock for all jobs, in order to keep things
simple. Once the AioContext lock is gone, we can introduce per-job
locks.
To simplify the switch from aiocontext to job lock, introduce
In this series, we want to remove the AioContext lock and instead
use the already existent job_mutex to protect the job structures
and list. This is part of the work to get rid of AioContext lock
usage in favour of smaller granularity locks.
In order to simplify reviewer's job, job lock/unlock fun
Categorize the fields in struct Job to understand which ones
need to be protected by the job mutex and which don't.
Signed-off-by: Emanuele Giuseppe Esposito
Reviewed-by: Vladimir Sementsov-Ogievskiy
Reviewed-by: Kevin Wolf
Reviewed-by: Stefan Hajnoczi
---
include/qemu/job.h | 61
Now that the API also offers _locked() functions, take advantage
of them and give the caller control over taking the lock and calling
the _locked functions.
This makes sense especially when we have for loops, because it
makes no sense to have:
for(job = job_next(); ...)
where each job_next() takes the l
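
A minimal sketch of the point being made, using pthreads and hypothetical demo_* names rather than the real job API: iterating with a function that locks internally takes the lock once per step, while iterating with the _locked variant takes it once for the whole loop.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

typedef struct DemoJob {
    struct DemoJob *next;
    int id;
} DemoJob;

static pthread_mutex_t demo_mutex = PTHREAD_MUTEX_INITIALIZER;
static DemoJob *demo_jobs;        /* list head, protected by demo_mutex */
static unsigned demo_lock_count;  /* number of lock acquisitions, for illustration */

static void demo_lock(void)
{
    int rc = pthread_mutex_lock(&demo_mutex);

    assert(rc == 0);
    (void)rc;
    demo_lock_count++;
}

static void demo_unlock(void)
{
    pthread_mutex_unlock(&demo_mutex);
}

/* Caller must hold demo_mutex. Pass NULL to get the first job. */
static DemoJob *demo_job_next_locked(DemoJob *job)
{
    return job ? job->next : demo_jobs;
}

/* Takes and releases the lock on every call. */
static DemoJob *demo_job_next(DemoJob *job)
{
    DemoJob *next;

    demo_lock();
    next = demo_job_next_locked(job);
    demo_unlock();
    return next;
}

/* The whole traversal runs in a single critical section. */
static int demo_count_jobs(void)
{
    int n = 0;

    demo_lock();
    for (DemoJob *job = demo_job_next_locked(NULL); job;
         job = demo_job_next_locked(job)) {
        n++;
    }
    demo_unlock();
    return n;
}
```

Besides the extra lock traffic, the per-call variant lets the list change between two steps of the loop, which the single critical section rules out.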
They all are called with job_lock held, in job_event_*_locked()
Signed-off-by: Emanuele Giuseppe Esposito
Reviewed-by: Vladimir Sementsov-Ogievskiy
Reviewed-by: Stefan Hajnoczi
Reviewed-by: Kevin Wolf
---
blockjob.c | 25 +++--
1 file changed, 15 insertions(+), 10 deletion
Add missing job synchronization in the unit tests, with
explicit locks.
We are deliberately using _locked functions wrapped by a guard
instead of a normal call because the normal call will be removed
in future, as the only usage is limited to the tests.
In other words, if a function like job_paus
Same as AIO_WAIT_WHILE macro, but if we are in the Main loop
do not release and then reacquire ctx_'s AioContext.
Once all AioContext locks go away, this macro will replace
AIO_WAIT_WHILE.
Signed-off-by: Emanuele Giuseppe Esposito
Reviewed-by: Stefan Hajnoczi
Reviewed-by: Vladimir Sementsov-Ogie
Both blockdev.c and job-qmp.c have TOCTOU (time-of-check to time-of-use) conditions, because
they first search for the job and then perform an action on it.
Therefore, we need to do the search + action under the same
job mutex critical section.
Note: at this stage, job_{lock/unlock} and job lock guard macros
are *nop*.
Signed
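
The "search + action under the same critical section" idea can be sketched as follows. The sketch_* names and the pthread mutex are assumptions made for illustration; they are not the functions touched by the patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct SketchJob {
    const char *id;
    bool cancelled;
    struct SketchJob *next;
} SketchJob;

static pthread_mutex_t sketch_mutex = PTHREAD_MUTEX_INITIALIZER;
static SketchJob *sketch_jobs;  /* list head, protected by sketch_mutex */

/* Caller must hold sketch_mutex. */
static SketchJob *sketch_find_locked(const char *id)
{
    for (SketchJob *j = sketch_jobs; j; j = j->next) {
        if (strcmp(j->id, id) == 0) {
            return j;
        }
    }
    return NULL;
}

/*
 * Look the job up and act on it without dropping the lock in between, so the
 * job cannot be removed or changed between the check and the use.
 */
static bool sketch_cancel_by_id(const char *id)
{
    bool found = false;
    SketchJob *job;

    assert(id != NULL);
    pthread_mutex_lock(&sketch_mutex);
    job = sketch_find_locked(id);
    if (job) {
        job->cancelled = true;
        found = true;
    }
    pthread_mutex_unlock(&sketch_mutex);
    return found;
}
```

If the lookup released the lock before returning the pointer, another thread could free or modify the job before the caller acted on it; that window is the race the description refers to.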
Just as done with job.h, create _locked() functions in blockjob.h
These functions will be useful later when the caller has already taken
the lock. All blockjob _locked functions call job _locked functions.
Note: at this stage, job_{lock/unlock} and job lock guard macros
are *nop*.
Signed-off-by: Ema
Once job lock is used and aiocontext is removed, mirror has
to perform job operations under the same critical section,
Note: at this stage, job_{lock/unlock} and job lock guard macros
are *nop*.
Signed-off-by: Emanuele Giuseppe Esposito
---
block/mirror.c | 13 +
1 file changed, 9 in
Not sure what the atomic here was supposed to do, since job.busy
is protected by the job lock. Since the whole function
is called under job_mutex, just remove the atomic.
Signed-off-by: Emanuele Giuseppe Esposito
Reviewed-by: Vladimir Sementsov-Ogievskiy
Reviewed-by: Stefan Hajnoczi
Reviewed-by
With "intact" we mean that all job.h functions implicitly
take the lock. Therefore API callers are unmodified.
This means that:
- many static functions that will always be called with job lock held
become _locked, and call _locked functions
- all public functions take the lock internally if need
job_event_* functions can all be static, as they are not used
outside job.c.
Same applies for job_txn_add_job().
Signed-off-by: Emanuele Giuseppe Esposito
Reviewed-by: Stefan Hajnoczi
Reviewed-by: Vladimir Sementsov-Ogievskiy
Reviewed-by: Kevin Wolf
---
include/qemu/job.h | 18 --
In order to make it thread safe, implement a "fake rwlock",
where we allow reads under BQL *or* job_mutex held, but
writes only under BQL *and* job_mutex.
The only write we have is in child_job_set_aio_ctx, which always
happens under drain (so the job is paused).
For this reason, introduce job_set
iostatus is the only field (together with .job) that needs
protection using the job mutex.
It is set in the main loop (GLOBAL_STATE functions) but read
in I/O code (block_job_error_action).
In order to protect it, change block_job_iostatus_set_err
to block_job_iostatus_set_err_locked(), always ca
Some callback implementations use bdrv_* APIs that assume the
AioContext lock is held. Make sure this invariant is documented.
Signed-off-by: Emanuele Giuseppe Esposito
---
include/qemu/job.h | 27 +--
1 file changed, 25 insertions(+), 2 deletions(-)
diff --git a/include
The same job lock is also used to protect some of the blockjob fields.
Categorize them just as done in job.h.
Reviewed-by: Vladimir Sementsov-Ogievskiy
Signed-off-by: Emanuele Giuseppe Esposito
---
include/block/blockjob.h | 32 ++--
1 file changed, 26 insertions(+
These public functions are not used anywhere, thus can be dropped.
Signed-off-by: Emanuele Giuseppe Esposito
Reviewed-by: Stefan Hajnoczi
Reviewed-by: Kevin Wolf
---
blockjob.c | 16 ++--
include/block/blockjob.h | 31 ---
2 files changed,
From: Paolo Bonzini
We want to make sure access of job->aio_context is always done
under either BQL or job_mutex. The problem is that using
aio_co_enter(job->aiocontext, job->co) in job_start and job_enter_cond
makes the coroutine immediately resume, so we can't hold the job lock.
And caching it
On 25.08.2022 16:31, Alexander Ivanov wrote:
We will add more and more checks so we need a better code structure
in parallels_co_check. Let each check be performed in a separate loop
in a separate helper.
Signed-off-by: Alexander Ivanov
---
block/parallels.c | 84 +
Change the job_{lock/unlock} and macros to use job_mutex.
Now that they are not nop anymore, remove the aiocontext
to avoid deadlocks.
Therefore:
- when possible, remove completely the aiocontext lock/unlock pair
- if it is used by some other function too, reduce the locking
section as much as
This comment applies more to job; it was left in blockjob because in the past
the whole job logic was implemented there.
Note: at this stage, job_{lock/unlock} and job lock guard macros
are *nop*.
No functional change intended.
Signed-off-by: Emanuele Giuseppe Esposito
Reviewed-by: Vladimir Sementso
On 26.08.2022 15:08, Denis V. Lunev wrote:
On 25.08.2022 16:31, Alexander Ivanov wrote:
We will add more and more checks so we need a better code structure
in parallels_co_check. Let each check be performed in a separate loop
in a separate helper.
Signed-off-by: Alexander Ivanov
---
block/para
These public functions are not used anywhere, thus can be dropped.
Also, since this is the final job API that doesn't use AioContext
lock and replaces it with job_lock, adjust all remaining function
documentation to clearly specify if the job lock is taken or not.
Also document the locking require
On 26.08.2022 15:23, Alexander Ivanov wrote:
On 26.08.2022 15:08, Denis V. Lunev wrote:
On 25.08.2022 16:31, Alexander Ivanov wrote:
We will add more and more checks so we need a better code structure
in parallels_co_check. Let each check be performed in a separate loop
in a separate helper.
Sign
It is useful to have the ability to disable these features for
compatibility with older VMs that don't have these implemented.
Signed-off-by: Daniil Tatianin
---
hw/block/vhost-user-blk.c | 6 --
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/hw/block/vhost-user-blk.c b/hw/blo
No reason to have this be a separate field. This also makes it more akin
to what the virtio-blk device does.
Signed-off-by: Daniil Tatianin
---
hw/block/vhost-user-blk.c | 6 ++
include/hw/virtio/vhost-user-blk.h | 1 -
2 files changed, 2 insertions(+), 5 deletions(-)
diff --git a/
All the offsets in the BAT must be lower than the file size.
Fix the check condition so that the check is correct.
Signed-off-by: Alexander Ivanov
---
block/parallels.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/block/parallels.c b/block/parallels.c
index 8943eccbf5..e6e8b9e369 100
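
The invariant stated in the commit message (every BAT offset must be lower than the file size) can be sketched as a standalone check. This is an illustration only: the function name is hypothetical, offsets are assumed to be byte offsets, and an entry of 0 is assumed to mean "unallocated"; the real driver's units and field names are not shown in this excerpt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Count BAT entries that point at or beyond the end of the file.
 * Assumption for this sketch: offsets are in bytes and 0 means unallocated.
 */
static size_t bat_count_out_of_image(const uint64_t *bat_offsets,
                                     size_t entries, uint64_t file_size)
{
    size_t bad = 0;

    assert(bat_offsets != NULL || entries == 0);
    for (size_t i = 0; i < entries; i++) {
        uint64_t off = bat_offsets[i];

        if (off == 0) {
            continue; /* unallocated cluster */
        }
        if (off >= file_size) {
            bad++; /* an offset equal to the file size is already outside */
        }
    }
    return bad;
}
```

Note the boundary case: an offset exactly equal to the file size does not satisfy "lower than the file size", so the comparison has to be inclusive.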
This patch set attempts to align vhost-user-blk with virtio-blk in
terms of backward compatibility and flexibility. It also improves
the virtio core by introducing new common code that can be used by
a virtio device to calculate its config space size.
In particular it adds the following things:
-
This is the first step towards moving all device config size calculation
logic into the virtio core code. In particular, this adds a struct that
contains all the necessary information for common virtio code to be able
to calculate the final config size for a device. This is expected to be
used with
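
As a rough sketch of what such a parameter struct and size calculation could look like: the Sketch* type names, field names, and the zero-terminated table are assumptions for illustration and are not taken from the patch. The idea is that each feature bit implies a minimum end offset for the config space, and the result is the largest end offset among the enabled features, bounded by a maximum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: one entry per feature that extends the config space. */
typedef struct SketchFeatureSize {
    uint64_t feature_bit; /* feature mask; 0 terminates the table */
    size_t end;           /* config space must reach this offset if enabled */
} SketchFeatureSize;

/* Hypothetical: everything needed to compute a device's config size. */
typedef struct SketchConfigSizeParams {
    size_t min_size;
    size_t max_size;
    const SketchFeatureSize *feature_sizes;
} SketchConfigSizeParams;

static size_t sketch_config_size(const SketchConfigSizeParams *p,
                                 uint64_t features)
{
    size_t size = p->min_size;

    for (const SketchFeatureSize *f = p->feature_sizes; f->feature_bit; f++) {
        if ((features & f->feature_bit) && f->end > size) {
            size = f->end;
        }
    }
    assert(size <= p->max_size); /* sanity check against the declared maximum */
    return size;
}
```

With a table like this, turning a feature off shrinks the reported config space, which is what makes the size compatible with older guests that do not know about the later fields.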
Use the new common helper. As an added bonus this also makes use of
config size sanity checking via the 'max_size' field.
Signed-off-by: Daniil Tatianin
---
hw/net/virtio-net.c | 8 ++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
i
Use the common helper instead of duplicating the same logic.
Signed-off-by: Daniil Tatianin
---
hw/block/virtio-blk.c | 16 +++-
1 file changed, 7 insertions(+), 9 deletions(-)
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index e9ba752f6b..10c47c2934 100644
--- a/hw/bl
This way we can reuse it for other virtio-blk devices, e.g.
vhost-user-blk, which currently does not control its config space size
dynamically.
Signed-off-by: Daniil Tatianin
---
MAINTAINERS | 4 +++
hw/block/meson.build | 4 +--
hw/block/virtio-blk-co
Make vhost-user-blk backwards compatible when migrating from older VMs
running with modern features turned off, the same way it was done for
virtio-blk in 20764be0421c ("virtio-blk: set config size depending on the
features enabled")
It's currently impossible to migrate from an older VM with
vhos
This has no more users and is superseded by virtio_get_config_size.
Signed-off-by: Daniil Tatianin
---
hw/virtio/virtio.c | 15 ---
include/hw/virtio/virtio.h | 3 ---
2 files changed, 18 deletions(-)
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 8518382025..c0
Add an option "iothread=x" to do emulation in a separate iothread.
This improves the performance because QEMU's main loop is responsible
for a lot of other work while iothread is dedicated to NVMe emulation.
Moreover, emulating in iothread brings the potential of polling on
SQ/CQ doorbells, which I
When the new option 'irq-eventfd' is turned on, the IO emulation code
signals an eventfd when it wants to (de)assert an irq. The main loop
eventfd handler does the actual irq (de)assertion. This paves the way
for iothread support since QEMU's interrupt emulation is not thread
safe.
Asserting and d
Zoned Block Devices (ZBDs) divide the LBA space into block regions called zones
that are larger than the LBA size. They allow only sequential writes, which
reduces write amplification in SSDs, leading to higher throughput and increased
capacity. More details about ZBDs can be found at:
https://zone
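
The sequential-write constraint can be illustrated with a small write-pointer model. This is a simplified sketch with hypothetical names, not the block layer API added by the series: each zone tracks a write pointer, a write is accepted only if it starts exactly at that pointer and fits in the zone, and a reset rewinds the pointer to the zone start.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct SketchZone {
    uint64_t start; /* first LBA of the zone */
    uint64_t len;   /* zone length in LBAs */
    uint64_t wp;    /* write pointer: next LBA that may be written */
} SketchZone;

/* Accept a write only if it is sequential and stays inside the zone. */
static bool sketch_zone_write(SketchZone *z, uint64_t lba, uint64_t nb)
{
    assert(z->wp >= z->start && z->wp <= z->start + z->len);
    if (lba != z->wp) {
        return false; /* not at the write pointer */
    }
    if (nb > z->start + z->len - z->wp) {
        return false; /* would cross the end of the zone */
    }
    z->wp += nb;
    return true;
}

/* Rewind the write pointer so the zone can be rewritten from its start. */
static void sketch_zone_reset(SketchZone *z)
{
    z->wp = z->start;
}
```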
On Aug 26 09:34, Keith Busch wrote:
> On Fri, Aug 26, 2022 at 11:12:04PM +0800, Jinhao Fan wrote:
> > Use KVM's irqfd to send interrupts when possible. This approach is
> > thread safe. Moreover, it does not have the inter-thread communication
> > overhead of plain event notifiers since handler cal
On 26.08.2022 16:27, Alexander Ivanov wrote:
All the offsets in the BAT must be lower than the file size.
Fix the check condition so that the check is correct.
Signed-off-by: Alexander Ivanov
---
block/parallels.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/block/parallels.c b/b
at 7:18 PM, Jinhao Fan wrote:
> @@ -4979,7 +5007,13 @@ static void nvme_init_cq(NvmeCQueue *cq, NvmeCtrl *n,
> uint64_t dma_addr,
> }
> }
> n->cq[cqid] = cq;
> -cq->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, nvme_post_cqes, cq);
> +
> +if (cq->cqid) {
> +cq->timer =
Use KVM's irqfd to send interrupts when possible. This approach is
thread safe. Moreover, it does not have the inter-thread communication
overhead of plain event notifiers since handler callbacks are called
in the same system call as irqfd write.
Signed-off-by: Jinhao Fan
Signed-off-by: Klaus Jens
On Aug 26 09:54, Keith Busch wrote:
> On Fri, Aug 26, 2022 at 05:45:21PM +0200, Klaus Jensen wrote:
> > On Aug 26 09:34, Keith Busch wrote:
> > > On Fri, Aug 26, 2022 at 11:12:04PM +0800, Jinhao Fan wrote:
> > > > Use KVM's irqfd to send interrupts when possible. This approach is
> > > > thread saf
On Fri, Aug 26, 2022 at 11:12:04PM +0800, Jinhao Fan wrote:
> Use KVM's irqfd to send interrupts when possible. This approach is
> thread safe. Moreover, it does not have the inter-thread communication
> overhead of plain event notifiers since handler callback are called
> in the same system call a
Signed-off-by: Sam Li
Reviewed-by: Stefan Hajnoczi
Reviewed-by: Damien Le Moal
---
include/block/block-common.h | 43
1 file changed, 43 insertions(+)
diff --git a/include/block/block-common.h b/include/block/block-common.h
index fdb7306e78..36bd0e480e 1006
at 11:34 PM, Keith Busch wrote:
> On Fri, Aug 26, 2022 at 11:12:04PM +0800, Jinhao Fan wrote:
>> Use KVM's irqfd to send interrupts when possible. This approach is
>> thread safe. Moreover, it does not have the inter-thread communication
>> overhead of plain event notifiers since handler callback
Use get_sysfs_str_val() to get the string value of device
zoned model. Then get_sysfs_zoned_model() can convert it to
BlockZoneModel type in QEMU.
Use get_sysfs_long_val() to get the long value of zoned device
information.
Signed-off-by: Sam Li
Reviewed-by: Hannes Reinecke
Reviewed-by: Stefan H
On Fri, Aug 26, 2022 at 05:45:21PM +0200, Klaus Jensen wrote:
> On Aug 26 09:34, Keith Busch wrote:
> > On Fri, Aug 26, 2022 at 11:12:04PM +0800, Jinhao Fan wrote:
> > > Use KVM's irqfd to send interrupts when possible. This approach is
> > > thread safe. Moreover, it does not have the inter-thread
By adding zone management operations in BlockDriver, storage controller
emulation can use the new block layer APIs including Report Zone and
four zone management operations (open, close, finish, reset).
Add zoned storage commands of the device: zone_report(zrp), zone_open(zo),
zone_close(zc), zone
Putting zoned/non-zoned BlockDrivers on top of each other is not
allowed.
Signed-off-by: Sam Li
Reviewed-by: Stefan Hajnoczi
---
block.c | 14 ++
block/file-posix.c | 13 +
block/raw-format.c | 1 +
include/block/bloc
raw-format driver usually sits on top of file-posix driver. It needs to
pass through requests of zone commands.
Signed-off-by: Sam Li
Reviewed-by: Stefan Hajnoczi
---
block/raw-format.c | 13 +
1 file changed, 13 insertions(+)
diff --git a/block/raw-format.c b/block/raw-format.c
in
Add the documentation about the zoned device support to virtio-blk
emulation.
Signed-off-by: Sam Li
Reviewed-by: Stefan Hajnoczi
---
docs/devel/zoned-storage.rst | 41 ++
docs/system/qemu-block-drivers.rst.inc | 6
2 files changed, 47 insertions(+)
creat
We have added new block layer APIs for zoned block devices. Test them as follows:
create a null_blk device, run each zone operation on it, and check
whether the right zone information is reported.
Signed-off-by: Sam Li
Reviewed-by: Stefan Hajnoczi
---
tests/qemu-iotests/tests/zoned.out | 53 ++
t
61 matches