[PATCH] linux-user: report ENOTTY for unknown ioctls
The correct error number for unknown ioctls is ENOTTY. ENOSYS would mean
that the ioctl() syscall itself is not implemented, which is very
improbable and unexpected for userspace. ENOTTY means "Inappropriate
ioctl for device". This is what the kernel returns for unknown ioctls,
what QEMU is trying to express, and what userspace is prepared to handle.

Signed-off-by: Thomas Weißschuh
---
 linux-user/syscall.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 69f740ff98c8..c5955313a063 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -5747,7 +5747,7 @@ static abi_long do_ioctl(int fd, int cmd, abi_long arg)
         if (ie->target_cmd == 0) {
             qemu_log_mask(
                 LOG_UNIMP, "Unsupported ioctl: cmd=0x%04lx\n", (long)cmd);
-            return -TARGET_ENOSYS;
+            return -TARGET_ENOTTY;
         }
         if (ie->target_cmd == cmd)
             break;
@@ -5759,7 +5759,7 @@ static abi_long do_ioctl(int fd, int cmd, abi_long arg)
     } else if (!ie->host_cmd) {
         /* Some architectures define BSD ioctls in their headers
            that are not implemented in Linux. */
-        return -TARGET_ENOSYS;
+        return -TARGET_ENOTTY;
     }

     switch(arg_type[0]) {
@@ -5817,7 +5817,7 @@ static abi_long do_ioctl(int fd, int cmd, abi_long arg)
         qemu_log_mask(LOG_UNIMP,
                       "Unsupported ioctl type: cmd=0x%04lx type=%d\n",
                       (long)cmd, arg_type[0]);
-        ret = -TARGET_ENOSYS;
+        ret = -TARGET_ENOTTY;
         break;
     }
     return ret;

base-commit: a14b8206c5edcbbad1c71256ea9b44c3b382a9f5
-- 
2.40.1
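The userspace-visible behavior the patch matches is easy to demonstrate against a native Linux kernel: an ioctl request that a device does not implement fails with ENOTTY, not ENOSYS. A minimal probe, assuming a Linux host and that 0xdeadbeef is not a valid request for /dev/null:

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Issue an (almost certainly) unknown ioctl request on the given
 * device node and return the resulting errno, or 0 on success,
 * -1 if the file could not be opened.
 */
int probe_unknown_ioctl(const char *path)
{
    int fd = open(path, O_RDWR);
    if (fd < 0) {
        return -1;
    }
    int err = 0;
    if (ioctl(fd, 0xdeadbeef, 0) < 0) {
        err = errno;  /* native kernels report ENOTTY here */
    }
    close(fd);
    return err;
}
```

A guest program running under qemu-user does exactly this kind of probe, which is why returning ENOSYS (syscall not implemented) instead of ENOTTY (inappropriate ioctl for device) confuses userspace error handling.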
Re: [PATCH v2 1/3] hw/i2c: add mctp core
On Apr 25 10:19, Corey Minyard wrote:
> On Tue, Apr 25, 2023 at 08:35:38AM +0200, Klaus Jensen wrote:
> > From: Klaus Jensen
> >
> > Add an abstract MCTP over I2C endpoint model. This implements MCTP
> > control message handling as well as handling the actual I2C transport
> > (packetization).
> >
> > Devices are intended to derive from this and implement the class
> > methods.
> >
> > Parts of this implementation are inspired by code[1] previously posted
> > by Jonathan Cameron.
>
> All in all this looks good. Two comments:
>
> I would like to see the buffer handling consolidated into one function
> and the length checked, even for (especially for) the outside users of
> this code, like the nvme code. Best to avoid future issues with buffer
> overruns. This will require reworking the get_message_types function,
> unfortunately.

Right now the implementations (i.e. hw/nvme/nmi-i2c.c) write directly
into the mctp core buffer for get_message_bytes(). The contract is that
they must not write more than the `maxlen` parameter.

Is that bad? Would it be better if get_message_bytes() returned a
pointer to its own buffer that hw/mctp can then copy from?

> You have one trace function on a bad receive message check, but lots of
> other bad receive message checks with no trace. Just a suggestion, but
> it might be nice for tracking down issues to trace all the reasons a
> message is dropped.

Sounds reasonable! :)

Thanks for the review!
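To illustrate the kind of consolidation Corey is asking for, here is a sketch of a single length-checked copy helper that the MCTP core could own, so device implementations hand over a pointer and a length and can never write past the core buffer. The names here are illustrative assumptions, not the actual MCTP API:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: the MCTP core performs the bounds check in one
 * place instead of trusting every implementation to honor `maxlen`.
 * Returns the number of bytes actually copied into the core buffer.
 */
size_t mctp_copy_message(uint8_t *dst, size_t maxlen,
                         const uint8_t *src, size_t len)
{
    if (len > maxlen) {
        /* Clamp; a real implementation might trace and drop instead. */
        len = maxlen;
    }
    memcpy(dst, src, len);
    return len;
}
```

With this shape, the core does the copy from a device-owned buffer, which corresponds to the alternative floated above (get_message_bytes() returning a pointer to its own buffer for hw/mctp to copy from).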
Re: [PULL 61/73] hw/pci/aer: Implement PCI_ERR_UNCOR_MASK register
"Michael S. Tsirkin" wrote: > On Tue, Apr 25, 2023 at 08:42:17PM -0400, Peter Xu wrote: >> Hi, Michael, Jonathan, >> >> On Tue, Mar 07, 2023 at 08:13:53PM -0500, Michael S. Tsirkin wrote: >> This breaks the simplest migration from QEMU 8.0->7.2 binaries on all >> machine types I think as long as the cap is present, e.g. the default >> e1000e provided by the default q35 machine can already hit it with all >> default cmdline: >> >> ./qemu-system-x86_64 -M pc-q35-7.2 [-incoming XXX] >> >> 7.2 binary will have empty wmask for PCI_ERR_UNCOR_MASK, meanwhile I think >> it can also see a non-zero value, then the migration will fail at: >> >> vmstate_load :00:02.0/e1000e, e1000e >> >> qemu-7.2: get_pci_config_device: Bad config data: i=0x10a read: 40 device: 0 >> cmask: ff wmask: 0 w1cmask:0 >> qemu-7.2: Failed to load PCIDevice:config >> qemu-7.2: Failed to load e1000e:parent_obj >> >> qemu-7.2: error while loading state for instance 0x0 of device >> ':00:02.0/e1000e' >> qemu-7.2: load of migration failed: Invalid argument >> >> We probably at least want to have the default value to be still zero, and >> we'd need to make sure it'll not be modified by the guest, iiuc. >> >> Below oneliner works for me and makes the migration work again: >> >> ===8<=== >> diff --git a/hw/pci/pcie_aer.c b/hw/pci/pcie_aer.c >> index 103667c368..563a37b79c 100644 >> --- a/hw/pci/pcie_aer.c >> +++ b/hw/pci/pcie_aer.c >> @@ -113,7 +113,7 @@ int pcie_aer_init(PCIDevice *dev, uint8_t cap_ver, >> uint16_t offset, >> pci_set_long(dev->w1cmask + offset + PCI_ERR_UNCOR_STATUS, >> PCI_ERR_UNC_SUPPORTED); >> pci_set_long(dev->config + offset + PCI_ERR_UNCOR_MASK, >> - PCI_ERR_UNC_MASK_DEFAULT); >> + 0/*PCI_ERR_UNC_MASK_DEFAULT*/); >> pci_set_long(dev->wmask + offset + PCI_ERR_UNCOR_MASK, >> PCI_ERR_UNC_SUPPORTED); >> ===8<=== >> >> Anyone could have a look on a solid solution from PCI side? >> >> Copy Juan and Leonardo. >> >> Thanks, > > My bad, I forgot about this 🤦. 
> So we need a property and tweak it with compat machinery depending on
> machine type. Jonathan, can you work on this pls?
> Or I can revert for now to relieve the time pressure,
> redo the patch at your leisure.

I agree with Michael here; the best option is adding a new property.

Later, Juan.
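The failure Peter quotes comes from the per-byte config-space check in get_pci_config_device(): an incoming byte is rejected if it differs under cmask while the destination considers it neither guest-writable (wmask) nor write-1-to-clear (w1cmask). A simplified standalone model of that check (not the actual QEMU code) shows why a non-zero PCI_ERR_UNCOR_MASK default breaks migration to a 7.2 binary whose wmask for that register is empty:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Simplified model of the per-byte migration compatibility test:
 * a byte fails to load if it differs under cmask and the destination
 * allows no writes to it at all.
 */
bool config_byte_ok(uint8_t incoming, uint8_t local,
                    uint8_t cmask, uint8_t wmask, uint8_t w1cmask)
{
    bool differs = (incoming & cmask) != (local & cmask);
    bool writable = (wmask | w1cmask) != 0;
    return !differs || writable;
}
```

The reported case maps directly onto this model: i=0x10a reads 0x40 from the 8.0 source while the 7.2 destination resets the byte to 0 with cmask=0xff, wmask=0, w1cmask=0, so the load fails. Keeping the default at zero on older machine types via the compat-property machinery, as suggested above, makes both sides agree on the reset value.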
patch:avoid-to-create-a-null-watch-in-flush_buf
Hi,

I am testing qga pushing messages to libvirt: when the VM is saved and
then restored from a memory file, qga hangs in a write. In flush_buf:

(gdb) f 4
#4  0x55ffc716894a in flush_buf (port=0x55ffc87bf800, buf=, len=) at ../hw/char/virtio-console.c:100
100             vcon->watch = qemu_chr_fe_add_watch(&vcon->chr,
(gdb) l
95           * use throttling on host side.
96           */
97          if (!k->is_console) {
98              virtio_serial_throttle_port(port, true);
99              if (!vcon->watch) {
100                 vcon->watch = qemu_chr_fe_add_watch(&vcon->chr,
101                                                     G_IO_OUT|G_IO_HUP,
102                                                     chr_write_unblocked, vcon);
103             }
104         }

vcon->watch is 0, but the port is throttled, so it will not process the
fe event until qga or libvirtd is restarted. This issue happens when the
VM is migrating or is being restored from a saved memory file.

So I think it can be fixed as follows:

 97          if (!k->is_console) {
 98              virtio_serial_throttle_port(port, true);
 99              if (!vcon->watch) {
100                 vcon->watch = qemu_chr_fe_add_watch(&vcon->chr,
101                                                     G_IO_OUT|G_IO_HUP,
102                                                     chr_write_unblocked, vcon);
+                   if (!vcon->watch)
+                       virtio_serial_throttle_port(port, false);
103             }
104         }

I have tested the code, and it works.

wang...@chinatelecom.cn
Re: [RFC PATCH v2 1/9] riscv: implement Ssqosid extension and sqoscfg CSR
On 2023/4/26 04:38, Drew Fustini wrote: From: Kornel Dulęba Implement the sqoscfg CSR defined by the Ssqosid ISA extension (Supervisor-mode Quality of Service ID). The CSR contains two fields: - Resource Control ID (RCID) used determine resource allocation - Monitoring Counter ID (MCID) used to track resource usage The CSR is defined for S-mode but accessing it when V=1 shall cause a virtual instruction exception. Implement this behavior by calling the hmode predicate. Link: https://github.com/riscv-non-isa/riscv-cbqri/blob/main/riscv-cbqri.pdf Signed-off-by: Kornel Dulęba [dfustini: rebase on v8.0.50, reword commit message] Signed-off-by: Drew Fustini --- Changes since v1: - rebase on current master (v8.0.50) instead of 8.0.0-rc4 disas/riscv.c | 1 + target/riscv/cpu.c | 2 ++ target/riscv/cpu.h | 3 +++ target/riscv/cpu_bits.h | 5 + target/riscv/csr.c | 34 ++ 5 files changed, 45 insertions(+) diff --git a/disas/riscv.c b/disas/riscv.c index d6b0fbe5e877..94336f54637b 100644 --- a/disas/riscv.c +++ b/disas/riscv.c @@ -2100,6 +2100,7 @@ static const char *csr_name(int csrno) case 0x0143: return "stval"; case 0x0144: return "sip"; case 0x0180: return "satp"; +case 0x0181: return "sqoscfg"; case 0x0200: return "hstatus"; case 0x0202: return "hedeleg"; case 0x0203: return "hideleg"; diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c index 1e97473af27b..fb3f8c43a32d 100644 --- a/target/riscv/cpu.c +++ b/target/riscv/cpu.c @@ -114,6 +114,7 @@ static const struct isa_ext_data isa_edata_arr[] = { ISA_EXT_DATA_ENTRY(smaia, true, PRIV_VERSION_1_12_0, ext_smaia), ISA_EXT_DATA_ENTRY(ssaia, true, PRIV_VERSION_1_12_0, ext_ssaia), ISA_EXT_DATA_ENTRY(sscofpmf, true, PRIV_VERSION_1_12_0, ext_sscofpmf), +ISA_EXT_DATA_ENTRY(ssqosid, true, PRIV_VERSION_1_12_0, ext_ssqosid), ISA_EXT_DATA_ENTRY(sstc, true, PRIV_VERSION_1_12_0, ext_sstc), ISA_EXT_DATA_ENTRY(svadu, true, PRIV_VERSION_1_12_0, ext_svadu), ISA_EXT_DATA_ENTRY(svinval, true, PRIV_VERSION_1_12_0, ext_svinval), @@ -1397,6 
+1398,7 @@ static Property riscv_cpu_extensions[] = { DEFINE_PROP_BOOL("svadu", RISCVCPU, cfg.ext_svadu, true), +DEFINE_PROP_BOOL("ssqosid", RISCVCPU, cfg.ext_ssqosid, true), DEFINE_PROP_BOOL("svinval", RISCVCPU, cfg.ext_svinval, false), DEFINE_PROP_BOOL("svnapot", RISCVCPU, cfg.ext_svnapot, false), DEFINE_PROP_BOOL("svpbmt", RISCVCPU, cfg.ext_svpbmt, false), diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h index 638e47c75a57..ffc1b5009d15 100644 --- a/target/riscv/cpu.h +++ b/target/riscv/cpu.h @@ -222,6 +222,8 @@ struct CPUArchState { target_ulong mcause; target_ulong mtval; /* since: priv-1.10.0 */ +target_ulong sqoscfg; + /* Machine and Supervisor interrupt priorities */ uint8_t miprio[64]; uint8_t siprio[64]; @@ -454,6 +456,7 @@ struct RISCVCPUConfig { bool ext_icboz; bool ext_zicond; bool ext_zihintpause; +bool ext_ssqosid; bool ext_smstateen; bool ext_sstc; bool ext_svadu; diff --git a/target/riscv/cpu_bits.h b/target/riscv/cpu_bits.h index fca7ef0cef91..d11a3928735e 100644 --- a/target/riscv/cpu_bits.h +++ b/target/riscv/cpu_bits.h @@ -217,6 +217,7 @@ /* Supervisor Protection and Translation */ #define CSR_SPTBR 0x180 #define CSR_SATP0x180 +#define CSR_SQOSCFG 0x181 /* Supervisor-Level Window to Indirectly Accessed Registers (AIA) */ #define CSR_SISELECT0x150 @@ -898,4 +899,8 @@ typedef enum RISCVException { #define MHPMEVENT_IDX_MASK 0xF #define MHPMEVENT_SSCOF_RESVD 16 +/* SQOSCFG BITS (QOSID) */ +#define SQOSCFG_RCID 0x0FFF +#define SQOSCFG_MCID 0x0FFF + #endif diff --git a/target/riscv/csr.c b/target/riscv/csr.c index d522efc0b63a..5769b3545704 100644 --- a/target/riscv/csr.c +++ b/target/riscv/csr.c @@ -2700,6 +2700,37 @@ static RISCVException write_satp(CPURISCVState *env, int csrno, return RISCV_EXCP_NONE; } +static RISCVException check_sqoscfg(CPURISCVState *env, int csrno) +{ +RISCVCPU *cpu = env_archcpu(env); + +if (!cpu->cfg.ext_ssqosid) { +return RISCV_EXCP_ILLEGAL_INST; +} + +/* + * Even though this is an S-mode CSR the spec says that we 
need to throw
+ * a virt instruction fault if a guest tries to access it.
+ */
+return hmode(env, csrno);

If the above comment is true, using hmode() here is not right, since it
only checks whether the H extension is supported. Another check for
guest-mode access is needed. And we should use smode() instead of
hmode() here.

+}
+
+static RISCVException read_sqoscfg(CPURISCVState *env, int csrno,
+                                   target_ulong *val)

'target_ulong' should be aligned with 'CPURISCVState'.

Regards,
Wei
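Wei's point about the predicate can be modeled standalone: the extension check must come first, an access with V=1 must then raise a virtual instruction exception, and only after that does the plain S-mode privilege check apply. A toy model of that control flow (illustrative only; the field and exception names stand in for the real QEMU definitions):

```c
#include <stdbool.h>

enum toy_excp {
    TOY_EXCP_NONE,
    TOY_EXCP_ILLEGAL_INST,
    TOY_EXCP_VIRT_INSTRUCTION_FAULT,
};

struct toy_env {
    bool ext_ssqosid;   /* Ssqosid extension enabled? */
    bool virt_enabled;  /* V=1: executing as a guest */
    bool smode_ok;      /* would the plain S-mode check pass? */
};

/* Toy version of the suggested check_sqoscfg() ordering. */
enum toy_excp toy_check_sqoscfg(const struct toy_env *env)
{
    if (!env->ext_ssqosid) {
        return TOY_EXCP_ILLEGAL_INST;
    }
    if (env->virt_enabled) {
        /* S-mode CSR, but access with V=1 must raise a
         * virtual instruction exception per the spec. */
        return TOY_EXCP_VIRT_INSTRUCTION_FAULT;
    }
    return env->smode_ok ? TOY_EXCP_NONE : TOY_EXCP_ILLEGAL_INST;
}
```

Note how a bare hmode()-style check (H extension supported?) would not distinguish the V=1 case, which is the gap the review points out.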
Re: test-blockjob: intermittent CI failures in msys2-64bit job
Am 25/04/2023 um 18:48 schrieb Hanna Czenczek: > On 24.04.23 20:32, Vladimir Sementsov-Ogievskiy wrote: >> On 24.04.23 16:36, Emanuele Giuseppe Esposito wrote: >>> >>> >>> Am 21/04/2023 um 12:13 schrieb Vladimir Sementsov-Ogievskiy: On 17.03.23 15:35, Thomas Huth wrote: > On 17/03/2023 11.17, Peter Maydell wrote: >> On Mon, 6 Mar 2023 at 11:16, Peter Maydell >> wrote: >>> >>> On Fri, 3 Mar 2023 at 18:36, Peter Maydell >>> wrote: I've noticed that test-blockjob seems to fail intermittently on the msys2-64bit job: https://gitlab.com/qemu-project/qemu/-/jobs/3872508803 https://gitlab.com/qemu-project/qemu/-/jobs/3871061024 https://gitlab.com/qemu-project/qemu/-/jobs/3865312440 Sample output: | 53/89 ERROR:../tests/unit/test-blockjob.c:499:test_complete_in_standby: assertion failed: (job->status == JOB_STATUS_STANDBY) ERROR 53/89 qemu:unit / test-blockjob ERROR 0.08s exit status 3 >> >>> Here's an intermittent failure from my macos x86 machine: >>> >>> 172/621 qemu:unit / test-blockjob >>> ERROR 0.26s killed by signal 6 SIGABRT >> >> And an intermittent on the freebsd 13 CI job: >> https://gitlab.com/qemu-project/qemu/-/jobs/3950667240 >> > MALLOC_PERTURB_=197 > G_TEST_BUILDDIR=/tmp/cirrus-ci-build/build/tests/unit > G_TEST_SRCDIR=/tmp/cirrus-ci-build/tests/unit > /tmp/cirrus-ci-build/build/tests/unit/test-blockjob --tap -k >> ▶ 178/650 /blockjob/ids >> OK >> 178/650 qemu:unit / test-blockjob >> ERROR 0.31s killed by signal 6 SIGABRT >> ― ✀ >> ― >> stderr: >> Assertion failed: (job->status == JOB_STATUS_STANDBY), function >> test_complete_in_standby, file ../tests/unit/test-blockjob.c, line >> 499. >> >> >> TAP parsing error: Too few tests run (expected 9, got 1) >> (test program exited with status code -6) >> ―― >> >> Anybody in the block team looking at these, or shall we just >> disable this test entirely ? > > I ran into this issue today, too: > > https://gitlab.com/thuth/qemu/-/jobs/3954367101 > > ... 
if nobody is interested in fixing this test, I think we should > disable it... > > Thomas > I'm looking at this now, and seems that it's broken since 6f592e5aca1a27fe1c1f6 "job.c: enable job lock/unlock and remove Aiocontext locks", as it stops critical section by new aio_context_release() call exactly after bdrv_drain_all_and(), so it's not a surprise that job may start at that moment and the following assertion fires. Emanuele could please look at it? >>> Well if I understood correctly, the only thing that was preventing the >>> job from continuing was the aiocontext lock held. >>> >>> The failing assertion even mentions that: >>> /* Lock the IO thread to prevent the job from being run */ >>> [...] >>> /* But the job cannot run, so it will remain on standby */ >>> assert(job->status == JOB_STATUS_STANDBY); >>> >>> Essentially bdrv_drain_all_end() would wake up the job coroutine, but it >>> would anyways block somewhere after since the aiocontext lock was taken >>> by the test. >>> >>> Now that we got rid of aiocontext lock in job code, nothing prevents the >>> test from resuming. >>> Mixing job lock and aiocontext acquire/release is not a good idea, and >>> it would anyways block job_resume() called by bdrv_drain_all_end(), so >>> the test itself would deadlock. >>> >>> So unless @Kevin has a better idea, this seems to be just an "hack" to >>> test stuff that is not possible to do now anymore. What I would suggest >>> is to get rid of that test, or at least of that assert function. I >>> unfortunately cannot reproduce the failure, but I think the remaining >>> functions called by the test should run as expected. >>> >> >> Thanks! I agree. 
Probably, alternatively we could just expand the >> drained section, like >> >> @@ -488,12 +488,6 @@ static void test_complete_in_standby(void) >> bdrv_drain_all_begin(); >> assert_job_status_is(job, JOB_STATUS_STANDBY); >> >> - /* Lock the IO thread to prevent the job from being run */ >> - aio_context_acquire(ctx); >> - /* This will schedule the job to resume it */ >> - bdrv_drain_all_end(); >> - aio_context_release(ctx); >> - >> WITH_JOB_LOCK_GUARD() { >> /* But the job cannot run, so it will remain on standby */ >> assert(job->status == JOB_STATUS_STANDBY); >> @@ -511,6 +505,7 @@ static void test_complete_in_standby(void) >> job_dismiss_locked(&job, &error_abort); >> } >> >> + bdrv_drain_all_end(); >> aio_context
Re: test-blockjob: intermittent CI failures in msys2-64bit job
On 26.04.23 09:38, Emanuele Giuseppe Esposito wrote: Am 25/04/2023 um 18:48 schrieb Hanna Czenczek: On 24.04.23 20:32, Vladimir Sementsov-Ogievskiy wrote: On 24.04.23 16:36, Emanuele Giuseppe Esposito wrote: Am 21/04/2023 um 12:13 schrieb Vladimir Sementsov-Ogievskiy: On 17.03.23 15:35, Thomas Huth wrote: On 17/03/2023 11.17, Peter Maydell wrote: On Mon, 6 Mar 2023 at 11:16, Peter Maydell wrote: On Fri, 3 Mar 2023 at 18:36, Peter Maydell wrote: I've noticed that test-blockjob seems to fail intermittently on the msys2-64bit job: https://gitlab.com/qemu-project/qemu/-/jobs/3872508803 https://gitlab.com/qemu-project/qemu/-/jobs/3871061024 https://gitlab.com/qemu-project/qemu/-/jobs/3865312440 Sample output: | 53/89 ERROR:../tests/unit/test-blockjob.c:499:test_complete_in_standby: assertion failed: (job->status == JOB_STATUS_STANDBY) ERROR 53/89 qemu:unit / test-blockjob ERROR 0.08s exit status 3 Here's an intermittent failure from my macos x86 machine: 172/621 qemu:unit / test-blockjob ERROR 0.26s killed by signal 6 SIGABRT And an intermittent on the freebsd 13 CI job: https://gitlab.com/qemu-project/qemu/-/jobs/3950667240 MALLOC_PERTURB_=197 G_TEST_BUILDDIR=/tmp/cirrus-ci-build/build/tests/unit G_TEST_SRCDIR=/tmp/cirrus-ci-build/tests/unit /tmp/cirrus-ci-build/build/tests/unit/test-blockjob --tap -k ▶ 178/650 /blockjob/ids OK 178/650 qemu:unit / test-blockjob ERROR 0.31s killed by signal 6 SIGABRT ― ✀ ― stderr: Assertion failed: (job->status == JOB_STATUS_STANDBY), function test_complete_in_standby, file ../tests/unit/test-blockjob.c, line 499. TAP parsing error: Too few tests run (expected 9, got 1) (test program exited with status code -6) ―― Anybody in the block team looking at these, or shall we just disable this test entirely ? I ran into this issue today, too: https://gitlab.com/thuth/qemu/-/jobs/3954367101 ... if nobody is interested in fixing this test, I think we should disable it... 
Thomas I'm looking at this now, and seems that it's broken since 6f592e5aca1a27fe1c1f6 "job.c: enable job lock/unlock and remove Aiocontext locks", as it stops critical section by new aio_context_release() call exactly after bdrv_drain_all_and(), so it's not a surprise that job may start at that moment and the following assertion fires. Emanuele could please look at it? Well if I understood correctly, the only thing that was preventing the job from continuing was the aiocontext lock held. The failing assertion even mentions that: /* Lock the IO thread to prevent the job from being run */ [...] /* But the job cannot run, so it will remain on standby */ assert(job->status == JOB_STATUS_STANDBY); Essentially bdrv_drain_all_end() would wake up the job coroutine, but it would anyways block somewhere after since the aiocontext lock was taken by the test. Now that we got rid of aiocontext lock in job code, nothing prevents the test from resuming. Mixing job lock and aiocontext acquire/release is not a good idea, and it would anyways block job_resume() called by bdrv_drain_all_end(), so the test itself would deadlock. So unless @Kevin has a better idea, this seems to be just an "hack" to test stuff that is not possible to do now anymore. What I would suggest is to get rid of that test, or at least of that assert function. I unfortunately cannot reproduce the failure, but I think the remaining functions called by the test should run as expected. Thanks! I agree. 
Probably, alternatively we could just expand the drained section, like @@ -488,12 +488,6 @@ static void test_complete_in_standby(void) bdrv_drain_all_begin(); assert_job_status_is(job, JOB_STATUS_STANDBY); - /* Lock the IO thread to prevent the job from being run */ - aio_context_acquire(ctx); - /* This will schedule the job to resume it */ - bdrv_drain_all_end(); - aio_context_release(ctx); - WITH_JOB_LOCK_GUARD() { /* But the job cannot run, so it will remain on standby */ assert(job->status == JOB_STATUS_STANDBY); @@ -511,6 +505,7 @@ static void test_complete_in_standby(void) job_dismiss_locked(&job, &error_abort); } + bdrv_drain_all_end(); aio_context_acquire(ctx); destroy_blk(blk); aio_context_release(ctx); But, seems that test wanted to specifically test job, that still in STANDBY exactly after drained section... [cc: Hanna] Hanna, it was your test (added in c2c731a4d35062295cd3260e66b3754588a2fad4, two years ago). Don't you remember was important to catch STANDBY job *after* a drained section? I’m not quite sure, but I think the idea was that we basically try to get as close to something that might come in over QMP. Over QMP, you can’t issue a job-complete while keeping everything drained, so I wouldn’t just e
Re: [RFC PATCH v3 00/20] configure: create a python venv and ensure meson, sphinx
On 4/24/23 22:02, John Snow wrote:
> Some important points as a pre-emptive "FAQ":
>
> - This venv is unconditionally created and lives at {build_dir}/pyvenv.
> - The python interpreter used by this venv is always the one identified
>   by configure. (Which in turn is always the one specified by --python
>   or $PYTHON)
> - *almost* all python scripts in qemu.git executed as part of the build
>   system, meson, sphinx, avocado tests, vm tests or CI are always
>   executed within this venv. (iotests are not yet integrated; I plan to
>   tackle this separately as a follow-up in order to have a more tightly
>   focused scope on that series.)
> - It remains possible to build and test fully offline. (In most cases,
>   you just need meson and sphinx from your distro's repo.)
> - Distribution packaged 'meson' and 'sphinx' are still utilized
>   whenever possible as the highest preference.
> - Vendored versions of e.g. 'meson' are always preferred to PyPI
>   versions for speed, repeatability and ensuring tarball builds work
>   as-is offline. (Sphinx will not be vendored, just like it already
>   isn't.)
> - Missing dependencies, when possible, are fetched and installed
>   on-demand automatically to make developer environments "just work".
> - Works for Python 3.7 and up, on Fedora, OpenSuSE, Red Hat, CentOS,
>   Alpine, Debian, Ubuntu, NetBSD, OpenBSD, and hopefully everywhere
> - No new dependencies (...for most platforms. Debian and NetBSD get an
>   asterisk.)
> - The meson git submodule is unused after this series and can be
>   removed.

Thanks, this looks pretty good.
Some changes I'd make for the non-RFC version:

- I think we should just check in the meson wheel (which also removes
  the need for patch 12, so it can be done in its stead) and remove the
  submodule
- The verbosity of mkvenv.py can be tuned down and most prints replaced
  with logger.info() or logger.debug()
- While I agree with keeping patch 18 separate, I would move it earlier
  so that patch 19 can be squashed into patch 14
- I am ambivalent about keeping --enable/--disable-pypi in the first
  committed patchset, but in any case I would move patches 16 and 20
  before patch 15

Paolo
[PATCH] tests/unit/test-blockjob: Re-enable complete_in_standby test
Pause the job while draining so that pause_count will be increased and
bdrv_drain_all_end() won't reset it to 0, so the job will not continue.
With this fix, the test is not flaky anymore.

Additionally, remove the useless aiocontext lock around
bdrv_drain_all_end() in test_complete_in_standby().

Fixes: b6903cbe3a2 "tests/unit/test-blockjob: Disable complete_in_standby test"
Suggested-by: Hanna Czenczek
Signed-off-by: Emanuele Giuseppe Esposito
---
 tests/unit/test-blockjob.c | 17 +++++++----------
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/tests/unit/test-blockjob.c b/tests/unit/test-blockjob.c
index a130f6fefb..46d720aeee 100644
--- a/tests/unit/test-blockjob.c
+++ b/tests/unit/test-blockjob.c
@@ -488,11 +488,15 @@ static void test_complete_in_standby(void)
     bdrv_drain_all_begin();
     assert_job_status_is(job, JOB_STATUS_STANDBY);

+    /*
+     * Increase pause_count so that the counter is
+     * unbalanced and job won't resume
+     */
+    job_pause(job);
+
     /* Lock the IO thread to prevent the job from being run */
-    aio_context_acquire(ctx);
     /* This will schedule the job to resume it */
     bdrv_drain_all_end();
-    aio_context_release(ctx);

     WITH_JOB_LOCK_GUARD() {
         /* But the job cannot run, so it will remain on standby */
@@ -531,13 +535,6 @@ int main(int argc, char **argv)
     g_test_add_func("/blockjob/cancel/standby", test_cancel_standby);
     g_test_add_func("/blockjob/cancel/pending", test_cancel_pending);
     g_test_add_func("/blockjob/cancel/concluded", test_cancel_concluded);
-
-    /*
-     * This test is flaky and sometimes fails in CI and otherwise:
-     * don't run unless user opts in via environment variable.
-     */
-    if (getenv("QEMU_TEST_FLAKY_TESTS")) {
-        g_test_add_func("/blockjob/complete_in_standby", test_complete_in_standby);
-    }
+    g_test_add_func("/blockjob/complete_in_standby", test_complete_in_standby);
     return g_test_run();
 }
-- 
2.39.1
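The invariant this patch relies on can be modeled with a simple counter: drain begin/end pause and resume the job symmetrically, so an extra explicit job_pause() keeps pause_count above zero across the end of the drained section and the job stays in STANDBY. A toy model of that counting (not the real job.c state machine):

```c
#include <assert.h>
#include <stdbool.h>

struct toy_job {
    int pause_count;
    bool standby;
};

void toy_pause(struct toy_job *j)
{
    j->pause_count++;
    j->standby = true;
}

void toy_resume(struct toy_job *j)
{
    assert(j->pause_count > 0);
    if (--j->pause_count == 0) {
        j->standby = false;  /* counter balanced: job runs again */
    }
}

/* drain begin/end pause and resume the job once each */
void toy_drain_begin(struct toy_job *j) { toy_pause(j); }
void toy_drain_end(struct toy_job *j)   { toy_resume(j); }
```

Without the extra pause, toy_drain_end() drops the count to zero and the job leaves STANDBY, which is exactly the race the flaky assertion was hitting.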
Re: test-blockjob: intermittent CI failures in msys2-64bit job
Am 26/04/2023 um 10:03 schrieb Hanna Czenczek: > On 26.04.23 09:38, Emanuele Giuseppe Esposito wrote: >> >> Am 25/04/2023 um 18:48 schrieb Hanna Czenczek: >>> On 24.04.23 20:32, Vladimir Sementsov-Ogievskiy wrote: On 24.04.23 16:36, Emanuele Giuseppe Esposito wrote: > > Am 21/04/2023 um 12:13 schrieb Vladimir Sementsov-Ogievskiy: >> On 17.03.23 15:35, Thomas Huth wrote: >>> On 17/03/2023 11.17, Peter Maydell wrote: On Mon, 6 Mar 2023 at 11:16, Peter Maydell wrote: > On Fri, 3 Mar 2023 at 18:36, Peter Maydell > wrote: >> I've noticed that test-blockjob seems to fail intermittently >> on the msys2-64bit job: >> >> https://gitlab.com/qemu-project/qemu/-/jobs/3872508803 >> https://gitlab.com/qemu-project/qemu/-/jobs/3871061024 >> https://gitlab.com/qemu-project/qemu/-/jobs/3865312440 >> >> Sample output: >> | 53/89 >> ERROR:../tests/unit/test-blockjob.c:499:test_complete_in_standby: >> assertion failed: (job->status == JOB_STATUS_STANDBY) ERROR >> 53/89 qemu:unit / test-blockjob ERROR 0.08s exit status 3 > Here's an intermittent failure from my macos x86 machine: > > 172/621 qemu:unit / test-blockjob > ERROR 0.26s killed by signal 6 SIGABRT And an intermittent on the freebsd 13 CI job: https://gitlab.com/qemu-project/qemu/-/jobs/3950667240 >>> MALLOC_PERTURB_=197 >>> G_TEST_BUILDDIR=/tmp/cirrus-ci-build/build/tests/unit >>> G_TEST_SRCDIR=/tmp/cirrus-ci-build/tests/unit >>> /tmp/cirrus-ci-build/build/tests/unit/test-blockjob --tap -k ▶ 178/650 /blockjob/ids OK 178/650 qemu:unit / test-blockjob ERROR 0.31s killed by signal 6 SIGABRT ― ✀ ― stderr: Assertion failed: (job->status == JOB_STATUS_STANDBY), function test_complete_in_standby, file ../tests/unit/test-blockjob.c, line 499. TAP parsing error: Too few tests run (expected 9, got 1) (test program exited with status code -6) ―― Anybody in the block team looking at these, or shall we just disable this test entirely ? 
>>> I ran into this issue today, too: >>> >>> https://gitlab.com/thuth/qemu/-/jobs/3954367101 >>> >>> ... if nobody is interested in fixing this test, I think we should >>> disable it... >>> >>> Thomas >>> >> I'm looking at this now, and seems that it's broken since >> 6f592e5aca1a27fe1c1f6 "job.c: enable job lock/unlock and remove >> Aiocontext locks", as it stops critical section by new >> aio_context_release() call exactly after bdrv_drain_all_and(), so >> it's >> not a surprise that job may start at that moment and the following >> assertion fires. >> >> Emanuele could please look at it? >> > Well if I understood correctly, the only thing that was preventing the > job from continuing was the aiocontext lock held. > > The failing assertion even mentions that: > /* Lock the IO thread to prevent the job from being run */ > [...] > /* But the job cannot run, so it will remain on standby */ > assert(job->status == JOB_STATUS_STANDBY); > > Essentially bdrv_drain_all_end() would wake up the job coroutine, > but it > would anyways block somewhere after since the aiocontext lock was > taken > by the test. > > Now that we got rid of aiocontext lock in job code, nothing > prevents the > test from resuming. > Mixing job lock and aiocontext acquire/release is not a good idea, and > it would anyways block job_resume() called by bdrv_drain_all_end(), so > the test itself would deadlock. > > So unless @Kevin has a better idea, this seems to be just an "hack" to > test stuff that is not possible to do now anymore. What I would > suggest > is to get rid of that test, or at least of that assert function. I > unfortunately cannot reproduce the failure, but I think the remaining > functions called by the test should run as expected. > Thanks! I agree. 
Probably, alternatively we could just expand the drained section, like @@ -488,12 +488,6 @@ static void test_complete_in_standby(void) bdrv_drain_all_begin(); assert_job_status_is(job, JOB_STATUS_STANDBY); - /* Lock the IO thread to prevent the job from being run */ - aio_context_acquire(ctx); - /* This will schedule the job to resume it */ - bdrv_drain_all_end(); - aio_context_release(ctx); - WITH_JO
Re: [RFC PATCH v3 00/20] configure: create a python venv and ensure meson, sphinx
On Tue, Apr 25, 2023 at 02:58:53PM -0400, John Snow wrote: > On Tue, Apr 25, 2023 at 2:10 PM Daniel P. Berrangé > wrote: > > How about having --enable-pypi never|auto|force with the following > > semantics for --enable-docs + --enable-pypi > > > > > > * docs=no - pypi never used > > > > * docs=auto + pypi=never => docs only enable if sphinx is already > >installed locally, otherwise disabled > > > > * docs=auto + pypi=auto => docs enable if sphinx is already > > installed locally, or can download from > > pypi as fallback > > > > As long as this doesn't cause needless trouble for existing configure > invocations baked into scripts and the like. It's a bit more > aggressive about enabling docs than we have been in the past, but > maybe that's not a bad thing. > > > * docs=auto + pypi=force => always download sphinx from pypi > > > > So if you already have Sphinx, this should perform an upgrade to the > latest version? Essentially I meant 'force' to mean *never* use the host python installation packages. Always install all the deps in the venv, even if they exist in the host with sufficient version met. > > * docs=yes + pypi=never => ERROR if sphinx is not already > > installed locally > > > > * docs=yes + pypi=auto => docs enable if sphinx is already > > installed locally, or can download from > > pypi as fallback > > > > * docs=yes + pypi=force => always download sphinx from pypi > > Yeah, there's room for settings like these, and the above mostly makes > sense to me. > > Paolo and I had written up a more elaborate set of flags at one point > under the premise of integrating testing, too -- for instance, > pre-loading the testing dependencies for iotests (local qemu package, > qemu.qmp, etc) or for avocado tests (avocado-framework and other deps > needed for those tests). 
I wanted to try and check in a "minimum > viable" version of this patch set first close to the 8.1 tree opening > to iron out any bugs in the core feature and then I'd work to expand > the flags and capability of the system as differential patches. > > So, I think changing --enable-pypi too much is beyond the scope of the > current patchset, but it's on my mind for what will follow almost > immediately after it. With that in mind, we can brainstorm a little > here and throw some darts at the wall: Yep, I don't consider it a pre-requisite for this series. > > The version of flags we originally came up with was this: > > --python=... # runtime used to create venv > --enable-pip-groups=testing,devel,avocado,meson,sphinx > --enable-pip=now # install all python deps now > --enable-pip=on-demand # install qemu.git/meson/sphinx, delay the rest > --enable-pip=no# offline > --{enable,disable}-isolated-venv # let venv use system/distro if disable This feels like a bit of overkill to me, and would create a hell of a lot of combinations to test if you expand the matrix of options. > I may have pulled us in a slightly different direction with the > version of the patches I actually sent here today, but the key ideas > here were: > > - The ability to opt into or out of different package groups at > configure-time. meson is always required, docs/sphinx is optional, and > the other three groups are testing related. (testing: iotests and > qemu.qmp, devel: python self-tests, avocado: avocado tests.) I think this is especially overkill. While you can probably come up with some scenarios where you could use this fine grained selection, I don't think it would be very compelling. > - The ability to decide when packages get installed to the venv; > either at configure time ("now") or as-needed ("on-demand") or > functionally never ("no") The distinction between now vs on-demand is really just about avoiding the overhead of downloading bits that you might not end up using. 
eg not downloading avocado unless you will be running 'make check-avocado'. That's saving a few 10s or 100s of KB of download, but doesn't feel like it'd make a noticeable difference in the overall QEMU build time. The 'now' and 'no' options look sufficient (corresponding to the 'auto' and 'never' options I suggested above) > - The ability to enable or disable venv isolation (turning on or off > system packages). Corresponds to the 'force' option I suggested above. > > I think I abandoned the idea of configuring the venv isolation and > have it set to "always off". We may revisit this later, but I think > for this series it's fine to have glossed over it. > I also skipped over the idea of having package installation being > "now", "on-demand" or "no" -- this patch-set more or less is using a > hybrid of "now" and "on-demand" where meson and sphinx are "now" but > avocado is "on-demand". Unifying that discrepancy can occur in > subsequent patches as it closely resem
Re: [PATCH v2 RESEND] xen: Fix SEGV on domain disconnect
On Mon, Apr 24, 2023 at 2:51 PM Paul Durrant wrote: > > So if you drop the ring drain then this patch should still stop the > SEGVs, right? > I think that's worth a few test runs. I recall some coredumps in that condition when I was investigating early on, but I don't have them in my collection so maybe I'm misremembering. Tim
Re: [PATCH v3 2/7] target/i386: Add new EPYC CPU versions with updated cache_info
On 4/25/23 18:35, Moger, Babu wrote: Hi Maksim, On 4/25/23 07:51, Maksim Davydov wrote: On 4/24/23 19:33, Babu Moger wrote: From: Michael Roth Introduce new EPYC cpu versions: EPYC-v4 and EPYC-Rome-v3. The only difference vs. older models is an updated cache_info with the 'complex_indexing' bit unset, since this bit is not currently defined for AMD and may cause problems should it be used for something else in the future. Setting this bit will also cause CPUID validation failures when running SEV-SNP guests. Signed-off-by: Michael Roth Signed-off-by: Babu Moger Acked-by: Michael S. Tsirkin --- target/i386/cpu.c | 118 ++ 1 file changed, 118 insertions(+) diff --git a/target/i386/cpu.c b/target/i386/cpu.c index e3d9eaa307..c1bc47661d 100644 --- a/target/i386/cpu.c +++ b/target/i386/cpu.c @@ -1707,6 +1707,56 @@ static const CPUCaches epyc_cache_info = { }, }; +static CPUCaches epyc_v4_cache_info = { + .l1d_cache = &(CPUCacheInfo) { + .type = DATA_CACHE, + .level = 1, + .size = 32 * KiB, + .line_size = 64, + .associativity = 8, + .partitions = 1, + .sets = 64, + .lines_per_tag = 1, + .self_init = 1, + .no_invd_sharing = true, + }, + .l1i_cache = &(CPUCacheInfo) { + .type = INSTRUCTION_CACHE, + .level = 1, + .size = 64 * KiB, + .line_size = 64, + .associativity = 4, + .partitions = 1, + .sets = 256, + .lines_per_tag = 1, + .self_init = 1, + .no_invd_sharing = true, + }, + .l2_cache = &(CPUCacheInfo) { + .type = UNIFIED_CACHE, + .level = 2, + .size = 512 * KiB, + .line_size = 64, + .associativity = 8, + .partitions = 1, + .sets = 1024, + .lines_per_tag = 1, + }, + .l3_cache = &(CPUCacheInfo) { + .type = UNIFIED_CACHE, + .level = 3, + .size = 8 * MiB, + .line_size = 64, + .associativity = 16, + .partitions = 1, + .sets = 8192, + .lines_per_tag = 1, + .self_init = true, + .inclusive = true, + .complex_indexing = false, + }, +}; + static const CPUCaches epyc_rome_cache_info = { .l1d_cache = &(CPUCacheInfo) { .type = DATA_CACHE, @@ -1757,6 +1807,56 @@ static const CPUCaches 
epyc_rome_cache_info = { }, }; +static const CPUCaches epyc_rome_v3_cache_info = { + .l1d_cache = &(CPUCacheInfo) { + .type = DATA_CACHE, + .level = 1, + .size = 32 * KiB, + .line_size = 64, + .associativity = 8, + .partitions = 1, + .sets = 64, + .lines_per_tag = 1, + .self_init = 1, + .no_invd_sharing = true, + }, + .l1i_cache = &(CPUCacheInfo) { + .type = INSTRUCTION_CACHE, + .level = 1, + .size = 32 * KiB, + .line_size = 64, + .associativity = 8, + .partitions = 1, + .sets = 64, + .lines_per_tag = 1, + .self_init = 1, + .no_invd_sharing = true, + }, + .l2_cache = &(CPUCacheInfo) { + .type = UNIFIED_CACHE, + .level = 2, + .size = 512 * KiB, + .line_size = 64, + .associativity = 8, + .partitions = 1, + .sets = 1024, + .lines_per_tag = 1, + }, + .l3_cache = &(CPUCacheInfo) { + .type = UNIFIED_CACHE, + .level = 3, + .size = 16 * MiB, + .line_size = 64, + .associativity = 16, + .partitions = 1, + .sets = 16384, + .lines_per_tag = 1, + .self_init = true, + .inclusive = true, + .complex_indexing = false, + }, +}; + static const CPUCaches epyc_milan_cache_info = { .l1d_cache = &(CPUCacheInfo) { .type = DATA_CACHE, @@ -4091,6 +4191,15 @@ static const X86CPUDefinition builtin_x86_defs[] = { { /* end of list */ } } }, + { + .version = 4, + .props = (PropValue[]) { + { "model-id", + "AMD EPYC-v4 Processor" }, + { /* end of list */ } + }, + .cache_info = &epyc_v4_cache_info + }, { /* end of list */ } } }, @@ -4210,6 +4319,15 @@ static const X86CPUDefinition builtin_x86_defs[] = { { /* end of list */ } } }, + { + .version = 3, + .props = (PropValue[]) { + { "model-id", + "AMD EPYC-Rome-v3 Processor" }, What do you think about adding more information to the model name to reveal its key feature? For instance, model-id can be "EPYC-Rome-v3 (NO INDEXING)", because only cache info was affected. Or alias can be used to achieve the same effect. It works well in Actually, we already thoug
Re: [RFC PATCH v3 00/20] configure: create a python venv and ensure meson, sphinx
On 4/26/23 10:21, Daniel P. Berrangé wrote: So if you already have Sphinx, this should perform an upgrade to the latest version? Essentially I meant 'force' to mean*never* use the host python installation packages. Always install all the deps in the venv, even if they exist in the host with sufficient version met. I think this is essentially --enable-isolated-venv. I don't think there is a usecase for "let the venv use system packages, but override them with pip right away". --python=... # runtime used to create venv --enable-pip-groups=testing,devel,avocado,meson,sphinx --enable-pip=now # install all python deps now --enable-pip=on-demand # install qemu.git/meson/sphinx, delay the rest --enable-pip=no# offline --{enable,disable}-isolated-venv # let venv use system/distro if disable This feels like a bit of overkill to me, and would create a hell of a lot of combinations to test if you expand the matrix of options. Yeah, this is a bit overkill. I think we can reduce it to three cases, corresponding to: - --enable-pypi --enable-isolated-venv - use pip to install everything, including for options in "auto" state (e.g. would install sphinx without --enable-docs) - --enable-pypi --disable-isolated-venv - use pip to install missing packages. TBD whether to do so for options in "auto" state or only for "enabled" (i.e., TBD whether to install sphinx without --enable-docs). - --disable-pypi (only meaningful for --disable-isolated-venv) - apart from vendored wheels, just use system site packages (same as QEMU <= 8.0) I think we want to hash out this detail first, and thus we should leave online mode out of the non-RFC version. It can be implemented together with isolated mode. Paolo
Re: [PATCH] tests/unit/test-blockjob: Re-enable complete_in_standby test
On 26/04/2023 10.16, Emanuele Giuseppe Esposito wrote: Pause the job while draining so that pause_count will be increased and bdrv_drain_all_end() won't reset it to 0, so the job will not continue. With this fix, the test is not flaky anymore. Additionally remove useless aiocontext lock around bdrv_drain_all_end() in test_complete_in_standby(). Fixes: b6903cbe3a2 "tests/unit/test-blockjob: Disable complete_in_standby test" Suggested-by: Hanna Czenczek Signed-off-by: Emanuele Giuseppe Esposito --- tests/unit/test-blockjob.c | 17 +++-- 1 file changed, 7 insertions(+), 10 deletions(-) diff --git a/tests/unit/test-blockjob.c b/tests/unit/test-blockjob.c index a130f6fefb..46d720aeee 100644 --- a/tests/unit/test-blockjob.c +++ b/tests/unit/test-blockjob.c @@ -488,11 +488,15 @@ static void test_complete_in_standby(void) bdrv_drain_all_begin(); assert_job_status_is(job, JOB_STATUS_STANDBY); +/* + * Increase pause_count so that the counter is + * unbalanced and job won't resume + */ +job_pause(job); + /* Lock the IO thread to prevent the job from being run */ I guess the above comment should now be removed, too? -aio_context_acquire(ctx); /* This will schedule the job to resume it */ bdrv_drain_all_end(); -aio_context_release(ctx); Thomas
Re: [PATCH 8/9] docs/devel: mention the spacing requirement for QOM
On 21/04/2023 07:15, Philippe Mathieu-Daudé wrote: On 20/4/23 21:32, Mark Cave-Ayland wrote: On 20/04/2023 16:57, Alex Bennée wrote: We have a more complete document on QOM but we should at least mention the style requirements in the style guide. Signed-off-by: Alex Bennée Cc: Mark Cave-Ayland --- docs/devel/qom.rst | 2 ++ docs/devel/style.rst | 29 + 2 files changed, 31 insertions(+) A couple of points: 1) It is probably worth removing the typedefs given that they are handled by the various QOM macros 2) There should be mention of the fixed names "parent_obj" and "parent_class" for the first declaration. How about something like this: QEMU Object Model Declarations == The QEMU Object Model (QOM) provides a framework for handling objects in the base C language. The first declaration of a storage or class structure should always be the parent and leave a visual space between s/should/must/ that declaration and the new code. For a storage structure the first declaration should always be called "parent_obj" and for a class structure the first member should always be called "parent_class" as below: .. code-block:: c struct MyDeviceState { DeviceState parent_obj; /* Properties */ int prop_a; char *prop_b; Should we mention "We recommend placing instance/class properties fields just after the parent field"? I don't think I've ever seen any recommendations on placing instance/class property fields other than for the parent DeviceState/DeviceClass? IMO it doesn't seem worth committing to anything else for now, especially as which fields get exposed as properties can be quite fluid these days ;) /* Other stuff */ int internal_state; }; struct MyDeviceClass { DeviceClass parent_class; void (*new_fn1)(void); bool (*new_fn2)(CPUState *); }; Note that there is no need to provide typedefs for QOM structures since these are generated automatically by the QOM declaration macros. See :ref:`qom` for more details. ATB, Mark.
Re: [RFC PATCH v3 00/20] configure: create a python venv and ensure meson, sphinx
On 4/26/23 10:05, Paolo Bonzini wrote: Thanks, this looks pretty good. Some changes I'd make for the non-RFC version: - I think we should just check in the meson wheel (which also removes the need for patch 12, so it can be done in its stead) and remove the submodule - The verbosity of mkvenv.py can be tuned down and most prints replaced with logger.info() or logger.debug() - While I agree with keeping patch 18 separate, I would move it earlier so that patch 19 can be squashed into patch 14 - I am ambivalent about keeping --enable/--disable-pypi in the first committed patchset, but in any case I would move patches 16 and 20 before patch 15 Just one extra thing, since we're changing so much of Python handling and since the code is written, I would keep the Debian 10 workarounds for now, and only drop them after we drop support for 3.6. Paolo
Re: [PATCH 17/18] docs/devel: mention the spacing requirement for QOM
On 24/04/2023 10:22, Alex Bennée wrote: We have a more complete document on QOM but we should at least mention the style requirements in the style guide. Signed-off-by: Alex Bennée Cc: Mark Cave-Ayland Reviewed-by: Juan Quintela Message-Id: <20230420155723.1711048-9-alex.ben...@linaro.org> --- vppr: - use Mark's formulation, briefly mention property separation. --- docs/devel/qom.rst | 2 ++ docs/devel/style.rst | 40 2 files changed, 42 insertions(+) diff --git a/docs/devel/qom.rst b/docs/devel/qom.rst index 3e34b07c98..c9237950d0 100644 --- a/docs/devel/qom.rst +++ b/docs/devel/qom.rst @@ -1,3 +1,5 @@ +.. _qom: + === The QEMU Object Model (QOM) === diff --git a/docs/devel/style.rst b/docs/devel/style.rst index 5bc6f2f095..e9fce0fc69 100644 --- a/docs/devel/style.rst +++ b/docs/devel/style.rst @@ -628,6 +628,46 @@ are still some caveats to beware of QEMU Specific Idioms +QEMU Object Model Declarations +== + +QEMU Object Model Declarations +== Accidental repetition here :) +The QEMU Object Model (QOM) provides a framework for handling objects +in the base C language. The first declaration of a storage or class +structure should always be the parent and leave a visual space between +that declaration and the new code. It is also useful to separate +backing for properties (options driven by the user) and internal state +to make navigation easier. + +For a storage structure the first declaration should always be called +"parent_obj" and for a class structure the first member should always +be called "parent_class" as below: + +.. code-block:: c + +struct MyDeviceState { +DeviceState parent_obj; + +/* Properties */ +int prop_a; +char *prop_b; +/* Other stuff */ +int internal_state; +}; + +struct MyDeviceClass { +DeviceClass parent_class; + +void (*new_fn1)(void); +bool (*new_fn2)(CPUState *); +}; + +Note that there is no need to provide typedefs for QOM structures +since these are generated automatically by the QOM declaration macros. +See :ref:`qom` for more details. 
+ Error handling and reporting Otherwise: Reviewed-by: Mark Cave-Ayland ATB, Mark.
Re: [RFC PATCH v3 00/20] configure: create a python venv and ensure meson, sphinx
On Mon, Apr 24, 2023 at 04:02:28PM -0400, John Snow wrote: > GitLab CI: https://gitlab.com/jsnow/qemu/-/pipelines/846869409 >(All green, except Python self-tests, see below) > > This patch series creates a mandatory python virtual environment > ("venv") during configure time and uses it to ensure the availability of > meson and sphinx. > > See https://www.qemu.org/2023/03/24/python/ for details. The summary is > that the goal of this series is to ensure that the `python` used to run > meson is the same `python` used to run Sphinx, tests, and any build-time > python scripting we have. As it stands, meson and sphinx (and their > extensions) *may* run in a different python environment than the one > configured and chosen by the user at configure/build time. I mentioned this when we were chatting on IRC, but to repeat for the general audience.. I think it'd be useful for support purposes if the configure script summary printed details of how we set up the venv. eg perhaps a summary for each python module of whether we added it to the venv, or relied on the site packages: python venv: meson (site), sphinx (venv), avocado (venv) With regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
Re: [RFC PATCH v3 00/20] configure: create a python venv and ensure meson, sphinx
On 4/26/23 10:53, Daniel P. Berrangé wrote: I think it'd be useful for support purposes if the configure script summary printed details of how we set up the venv. eg perhaps a summary for each python module of whether we added it to the venv, or relied on the site packages: python venv: meson (site), sphinx (venv), avocado (venv) Yes, I agree John did a great job with the errors, but a little more work is needed to clean up mkvenv.py's logging of the "good" case. The installation case is already covered by "pip install"'s output, but we need to have something like: mkvenv: Creating {isolated|non-isolated} virtual environment [based on /home/pbonzini/myvenv] and when creating the console-scripts shims: mkvenv: Found avocado v90.0 Paolo
Re: [PATCH v2 2/6] tests/qtests: remove migration test iterations config
On Fri, Apr 21, 2023 at 11:54:55PM +0200, Juan Quintela wrote: > Daniel P. Berrangé wrote: > > The 'unsigned int interations' config for migration is somewhat > > overkill. Most tests don't set it, and a value of '0' is treated > > as equivalent to '1'. The only test that does set it, xbzrle, > > used a value of '2'. > > > > This setting, however, only relates to the migration iterations > > that take place prior to allowing convergence. IOW, on top of > > this iteration count, there is always at least 1 further migration > > iteration done to deal with pages that are dirtied during the > > previous iteration(s). > > > > IOW, even with iterations==1, the xbzrle test will be running for > > a minimum of 2 iterations. With this in mind we can simplify the > > code and just get rid of the special case. > > Perhaps the old code was already wrong, but we need at least three > iterations for the xbzrle test: > - 1st iteration: xbzrle is not used, nothing is on cache. Are you sure about this ? I see ram_save_page() calling save_xbzrle_page() and unless I'm mis-understanding the code, it doesn't appear to skip anything on the 1st iteration. IIUC save_xbzrle_page will add pages into the cache on the first iteration, so the second iteration will get cache hits > - 2nd iteration: pages are put into cache, no xbzrle is used because > there is no previous page. > - 3rd iteration: We really use xbzrle now against the copy of the > previous iterations. > > And yes, this should be commented somewhere. With regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
Re: [PATCH v2 0/3] ROM migration
Hi, > As I understand, netcard ROM file is needed only for network boot. So, > it's absolutely correct to use romfile="" option: network boot will > not work, but everything else will work correctly. Is that right? In most cases yes. The exception to the rule is virtio-net with UEFI firmware. OVMF and ArmVirt images ship with a virtio-net driver included, so network booting works without a nic rom. take care, Gerd
Re: [PATCH v2 2/6] tests/qtests: remove migration test iterations config
Daniel P. Berrangé wrote: > On Fri, Apr 21, 2023 at 11:54:55PM +0200, Juan Quintela wrote: >> Daniel P. Berrangé wrote: >> > The 'unsigned int interations' config for migration is somewhat >> > overkill. Most tests don't set it, and a value of '0' is treated >> > as equivalent to '1'. The only test that does set it, xbzrle, >> > used a value of '2'. >> > >> > This setting, however, only relates to the migration iterations >> > that take place prior to allowing convergence. IOW, on top of >> > this iteration count, there is always at least 1 further migration >> > iteration done to deal with pages that are dirtied during the >> > previous iteration(s). >> > >> > IOW, even with iterations==1, the xbzrle test will be running for >> > a minimum of 2 iterations. With this in mind we can simplify the >> > code and just get rid of the special case. >> >> Perhaps the old code was already wrong, but we need at least three >> iterations for the xbzrle test: >> - 1st iteration: xbzrle is not used, nothing is on cache. > > Are you sure about this ? I see ram_save_page() calling > save_xbzrle_page() and unless I'm mis-understanding the > code, it doesn't appear to skip anything on the 1st > iteration. I will admit that code is convoluted as hell. And I confuse myself a lot here O:-) struct RAM_STATE { ... /* Start using XBZRLE (e.g., after the first round). */ bool xbzrle_enabled; } I.e. xbzrle_enabled() and m->xbzrle_enabled are two completely different things. static int ram_save_page(RAMState *rs, PageSearchStatus *pss) { ... if (rs->xbzrle_enabled && !migration_in_postcopy()) { pages = save_xbzrle_page(rs, pss, &p, current_addr, block, offset); } } and static int find_dirty_block(RAMState *rs, PageSearchStatus *pss) { /* Update pss->page for the next dirty bit in ramblock */ pss_find_next_dirty(pss); if (pss->complete_round && pss->block == rs->last_seen_block && ... 
return PAGE_ALL_CLEAN; } if (!offset_in_ramblock(pss->block, ((ram_addr_t)pss->page) << TARGET_PAGE_BITS)) { if (!pss->block) { if (migrate_use_xbzrle()) { rs->xbzrle_enabled = true; } } ... } else { /* We've found something */ return PAGE_DIRTY_FOUND; } } > IIUC save_xbzrle_page will add pages into the cache on > the first iteration, so the second iteration will get > cache hits > >> - 2nd iteration: pages are put into cache, no xbzrle is used because >> there is no previous page. >> - 3rd iteration: We really use xbzrle now against the copy of the >> previous iterations. >> >> And yes, this should be commented somewhere. Seeing that it has been able to confuse you, a single comment will not make the trick O:-) Later, Juan.
Re: [PATCH v2 2/6] tests/qtests: remove migration test iterations config
On Wed, Apr 26, 2023 at 11:42:51AM +0200, Juan Quintela wrote: > Daniel P. Berrangé wrote: > > On Fri, Apr 21, 2023 at 11:54:55PM +0200, Juan Quintela wrote: > >> Daniel P. Berrangé wrote: > >> > The 'unsigned int interations' config for migration is somewhat > >> > overkill. Most tests don't set it, and a value of '0' is treated > >> > as equivalent to '1'. The only test that does set it, xbzrle, > >> > used a value of '2'. > >> > > >> > This setting, however, only relates to the migration iterations > >> > that take place prior to allowing convergence. IOW, on top of > >> > this iteration count, there is always at least 1 further migration > >> > iteration done to deal with pages that are dirtied during the > >> > previous iteration(s). > >> > > >> > IOW, even with iterations==1, the xbzrle test will be running for > >> > a minimum of 2 iterations. With this in mind we can simplify the > >> > code and just get rid of the special case. > >> > >> Perhaps the old code was already wrong, but we need at least three > >> iterations for the xbzrle test: > >> - 1st iteration: xbzrle is not used, nothing is on cache. > > > > Are you sure about this ? I see ram_save_page() calling > > save_xbzrle_page() and unless I'm mis-understanding the > > code, it doesn't appear to skip anything on the 1st > > iteration. > > I will admit that code is convoluted as hell. > And I confuse myself a lot here O:-) > > struct RAM_STATE { > ... > /* Start using XBZRLE (e.g., after the first round). */ > bool xbzrle_enabled; > } > > I.e. xbzrle_enabled() and m->xbzrle_enabled are two completely different > things. Aie ! That's confusing indeed :-) Lets rename that struct field to 'xbzrle_started', to better distinguish active state from enabled state. > static int ram_save_page(RAMState *rs, PageSearchStatus *pss) > { > ... 
> if (rs->xbzrle_enabled && !migration_in_postcopy()) { > pages = save_xbzrle_page(rs, pss, &p, current_addr, > block, offset); > > } > > } > > and > > static int find_dirty_block(RAMState *rs, PageSearchStatus *pss) > { > /* Update pss->page for the next dirty bit in ramblock */ > pss_find_next_dirty(pss); > > if (pss->complete_round && pss->block == rs->last_seen_block && > ... > return PAGE_ALL_CLEAN; > } > if (!offset_in_ramblock(pss->block, > ((ram_addr_t)pss->page) << TARGET_PAGE_BITS)) { > > if (!pss->block) { > > if (migrate_use_xbzrle()) { > rs->xbzrle_enabled = true; > } > } > ... > } else { > /* We've found something */ > return PAGE_DIRTY_FOUND; > } > } > > > > > IIUC save_xbzrle_page will add pages into the cache on > > the first iteration, so the second iteration will get > > cache hits > > > >> - 2nd iteration: pages are put into cache, no xbzrle is used because > >> there is no previous page. > >> - 3rd iteration: We really use xbzrle now against the copy of the > >> previous iterations. > >> > >> And yes, this should be commented somewhere. > > Seeing that it has been able to confuse you, a single comment will not > make the trick O:-) > > Later, Juan. > With regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
Re: [PATCH] multifd: Avoid busy-wait in multifd_send_pages()
"manish.mishra" wrote: > multifd_send_sync_main() posts request on the multifd channel > but does not call sem_wait() on channels_ready semaphore, making > the channels_ready semaphore count keep increasing. > As a result, sem_wait() on channels_ready in multifd_send_pages() > is always non-blocking hence multifd_send_pages() keeps searching > for a free channel in a busy loop until a channel is freed. > > Signed-off-by: manish.mishra > --- > migration/multifd.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/migration/multifd.c b/migration/multifd.c > index cce3ad6988..43d26e7012 100644 > --- a/migration/multifd.c > +++ b/migration/multifd.c > @@ -615,6 +615,7 @@ int multifd_send_sync_main(QEMUFile *f) > > trace_multifd_send_sync_main_signal(p->id); > > +qemu_sem_wait(&multifd_send_state->channels_ready); > qemu_mutex_lock(&p->mutex); > > if (p->quit) { We need this, but I think it is better to put it on the second loop. > @@ -919,7 +920,7 @@ int multifd_save_setup(Error **errp) > multifd_send_state = g_malloc0(sizeof(*multifd_send_state)); > multifd_send_state->params = g_new0(MultiFDSendParams, thread_count); > multifd_send_state->pages = multifd_pages_init(page_count); > -qemu_sem_init(&multifd_send_state->channels_ready, 0); > +qemu_sem_init(&multifd_send_state->channels_ready, thread_count); > qatomic_set(&multifd_send_state->exiting, 0); > multifd_send_state->ops = multifd_ops[migrate_multifd_compression()]; I think this bit is wrong. We should not set the channels ready until the thread is ready and channel is created. What do you think about this patch: >From bcb0ef9b97b858458c403d2e4dc9e0dbd96721b3 Mon Sep 17 00:00:00 2001 From: Juan Quintela Date: Wed, 26 Apr 2023 12:20:36 +0200 Subject: [PATCH] multifd: Fix the number of channels ready We don't wait in the sem when we are doing a sync_main. Make it wait there. To make things clearer, we mark the channel ready at the beginning of the thread loop.
This causes a busy loop in multifd_send_page(). Found-by: manish.mishra Signed-off-by: Juan Quintela --- migration/multifd.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/migration/multifd.c b/migration/multifd.c index 903df2117b..e625e8725e 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -635,6 +635,7 @@ int multifd_send_sync_main(QEMUFile *f) for (i = 0; i < migrate_multifd_channels(); i++) { MultiFDSendParams *p = &multifd_send_state->params[i]; +qemu_sem_wait(&multifd_send_state->channels_ready); trace_multifd_send_sync_main_wait(p->id); qemu_sem_wait(&p->sem_sync); @@ -668,6 +669,7 @@ static void *multifd_send_thread(void *opaque) p->num_packets = 1; while (true) { +qemu_sem_post(&multifd_send_state->channels_ready); qemu_sem_wait(&p->sem); if (qatomic_read(&multifd_send_state->exiting)) { @@ -736,7 +738,6 @@ static void *multifd_send_thread(void *opaque) if (flags & MULTIFD_FLAG_SYNC) { qemu_sem_post(&p->sem_sync); } -qemu_sem_post(&multifd_send_state->channels_ready); } else if (p->quit) { qemu_mutex_unlock(&p->mutex); break; -- 2.40.0
Re: [PATCH v3 01/18] hw/ide/piix: Expose output IRQ as properties for late object population
On 02/03/2023 22:40, Philippe Mathieu-Daudé wrote: Signed-off-by: Philippe Mathieu-Daudé --- hw/ide/piix.c | 14 -- include/hw/ide/piix.h | 4 2 files changed, 16 insertions(+), 2 deletions(-) diff --git a/hw/ide/piix.c b/hw/ide/piix.c index 41d60921e3..a36dac8469 100644 --- a/hw/ide/piix.c +++ b/hw/ide/piix.c @@ -121,6 +121,13 @@ static void piix_ide_reset(DeviceState *dev) pci_set_byte(pci_conf + 0x20, 0x01); /* BMIBA: 20-23h */ } +static void piix_ide_initfn(Object *obj) +{ +PCIIDEState *dev = PCI_IDE(obj); + +qdev_init_gpio_out_named(DEVICE(obj), dev->isa_irq, "ide-irq", 2); +} + static bool pci_piix_init_bus(PCIIDEState *d, unsigned i, Error **errp) { static const struct { @@ -133,6 +140,7 @@ static bool pci_piix_init_bus(PCIIDEState *d, unsigned i, Error **errp) }; int ret; +qemu_irq irq_out = d->isa_irq[i] ? : isa_get_irq(NULL, port_info[i].isairq); ide_bus_init(&d->bus[i], sizeof(d->bus[i]), DEVICE(d), i, 2); ret = ide_init_ioport(&d->bus[i], NULL, port_info[i].iobase, port_info[i].iobase2); @@ -141,7 +149,7 @@ static bool pci_piix_init_bus(PCIIDEState *d, unsigned i, Error **errp) object_get_typename(OBJECT(d)), i); return false; } -ide_bus_init_output_irq(&d->bus[i], isa_get_irq(NULL, port_info[i].isairq)); +ide_bus_init_output_irq(&d->bus[i], irq_out); bmdma_init(&d->bus[i], &d->bmdma[i], d); d->bmdma[i].bus = &d->bus[i]; @@ -162,7 +170,7 @@ static void pci_piix_ide_realize(PCIDevice *dev, Error **errp) vmstate_register(VMSTATE_IF(dev), 0, &vmstate_ide_pci, d); -for (unsigned i = 0; i < 2; i++) { +for (unsigned i = 0; i < ARRAY_SIZE(d->isa_irq); i++) { if (!pci_piix_init_bus(d, i, errp)) { return; } @@ -199,6 +207,7 @@ static void piix3_ide_class_init(ObjectClass *klass, void *data) static const TypeInfo piix3_ide_info = { .name = TYPE_PIIX3_IDE, .parent= TYPE_PCI_IDE, +.instance_init = piix_ide_initfn, .class_init= piix3_ide_class_init, }; @@ -221,6 +230,7 @@ static void piix4_ide_class_init(ObjectClass *klass, void *data) static const TypeInfo 
piix4_ide_info = { .name = TYPE_PIIX4_IDE, .parent= TYPE_PCI_IDE, +.instance_init = piix_ide_initfn, .class_init= piix4_ide_class_init, }; diff --git a/include/hw/ide/piix.h b/include/hw/ide/piix.h index ef3ef3d62d..533d24d408 100644 --- a/include/hw/ide/piix.h +++ b/include/hw/ide/piix.h @@ -1,6 +1,10 @@ #ifndef HW_IDE_PIIX_H #define HW_IDE_PIIX_H +/* + * QEMU interface: + * + named GPIO outputs "ide-irq": asserted by each IDE channel + */ #define TYPE_PIIX3_IDE "piix3-ide" #define TYPE_PIIX4_IDE "piix4-ide" Comparing this with Bernhard's latest series, Bernhard's patch at https://patchew.org/QEMU/20230422150728.176512-1-shen...@gmail.com/20230422150728.176512-2-shen...@gmail.com/ (with a small change) is the version we should use, since legacy IRQs are a feature of all PCI IDE controllers and not just the PIIX controllers. If we do it this way then it is possible for all PCI IDE controllers to share the same logic for BDMA and legacy/native mode switching moving forward: if a PCI IDE controller doesn't implement legacy IRQ routing then the board can leave the IRQs disconnected. ATB, Mark.
[PATCH v4 02/48] hw/net/net_tx_pkt: Decouple interface from PCI
This allows to use the network packet abstractions even if PCI is not used. Signed-off-by: Akihiko Odaki --- hw/net/net_tx_pkt.h | 31 --- hw/net/e1000e_core.c | 13 - hw/net/igb_core.c| 13 ++--- hw/net/net_tx_pkt.c | 36 +--- hw/net/vmxnet3.c | 14 +++--- 5 files changed, 54 insertions(+), 53 deletions(-) diff --git a/hw/net/net_tx_pkt.h b/hw/net/net_tx_pkt.h index 5eb123ef90..4d7233e975 100644 --- a/hw/net/net_tx_pkt.h +++ b/hw/net/net_tx_pkt.h @@ -26,17 +26,16 @@ struct NetTxPkt; -typedef void (* NetTxPktCallback)(void *, const struct iovec *, int, const struct iovec *, int); +typedef void (*NetTxPktFreeFrag)(void *, void *, size_t); +typedef void (*NetTxPktSend)(void *, const struct iovec *, int, const struct iovec *, int); /** * Init function for tx packet functionality * * @pkt:packet pointer - * @pci_dev:PCI device processing this packet * @max_frags: max tx ip fragments */ -void net_tx_pkt_init(struct NetTxPkt **pkt, PCIDevice *pci_dev, -uint32_t max_frags); +void net_tx_pkt_init(struct NetTxPkt **pkt, uint32_t max_frags); /** * Clean all tx packet resources. @@ -95,12 +94,11 @@ net_tx_pkt_setup_vlan_header(struct NetTxPkt *pkt, uint16_t vlan) * populate data fragment into pkt context. * * @pkt:packet - * @pa: physical address of fragment + * @base: pointer to fragment * @len:length of fragment * */ -bool net_tx_pkt_add_raw_fragment(struct NetTxPkt *pkt, hwaddr pa, -size_t len); +bool net_tx_pkt_add_raw_fragment(struct NetTxPkt *pkt, void *base, size_t len); /** * Fix ip header fields and calculate IP header and pseudo header checksums. 
@@ -148,10 +146,11 @@ void net_tx_pkt_dump(struct NetTxPkt *pkt); * reset tx packet private context (needed to be called between packets) * * @pkt:packet - * @dev:PCI device processing the next packet - * + * @callback: function to free the fragments + * @context:pointer to be passed to the callback */ -void net_tx_pkt_reset(struct NetTxPkt *pkt, PCIDevice *dev); +void net_tx_pkt_reset(struct NetTxPkt *pkt, + NetTxPktFreeFrag callback, void *context); /** * Unmap a fragment mapped from a PCI device. @@ -162,6 +161,16 @@ void net_tx_pkt_reset(struct NetTxPkt *pkt, PCIDevice *dev); */ void net_tx_pkt_unmap_frag_pci(void *context, void *base, size_t len); +/** + * map data fragment from PCI device and populate it into pkt context. + * + * @pci_dev:PCI device owning fragment + * @pa: physical address of fragment + * @len:length of fragment + */ +bool net_tx_pkt_add_raw_fragment_pci(struct NetTxPkt *pkt, PCIDevice *pci_dev, + dma_addr_t pa, size_t len); + /** * Send packet to qemu. handles sw offloads if vhdr is not supported. * @@ -182,7 +191,7 @@ bool net_tx_pkt_send(struct NetTxPkt *pkt, NetClientState *nc); * @ret:operation result */ bool net_tx_pkt_send_custom(struct NetTxPkt *pkt, bool offload, -NetTxPktCallback callback, void *context); +NetTxPktSend callback, void *context); /** * parse raw packet data and analyze offload requirements. 
diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c index cfa3f55e96..15821a75e0 100644 --- a/hw/net/e1000e_core.c +++ b/hw/net/e1000e_core.c @@ -746,7 +746,8 @@ e1000e_process_tx_desc(E1000ECore *core, addr = le64_to_cpu(dp->buffer_addr); if (!tx->skip_cp) { -if (!net_tx_pkt_add_raw_fragment(tx->tx_pkt, addr, split_size)) { +if (!net_tx_pkt_add_raw_fragment_pci(tx->tx_pkt, core->owner, + addr, split_size)) { tx->skip_cp = true; } } @@ -764,7 +765,7 @@ e1000e_process_tx_desc(E1000ECore *core, } tx->skip_cp = false; -net_tx_pkt_reset(tx->tx_pkt, core->owner); +net_tx_pkt_reset(tx->tx_pkt, net_tx_pkt_unmap_frag_pci, core->owner); tx->sum_needed = 0; tx->cptse = 0; @@ -3421,7 +3422,7 @@ e1000e_core_pci_realize(E1000ECore *core, qemu_add_vm_change_state_handler(e1000e_vm_state_change, core); for (i = 0; i < E1000E_NUM_QUEUES; i++) { -net_tx_pkt_init(&core->tx[i].tx_pkt, core->owner, E1000E_MAX_TX_FRAGS); +net_tx_pkt_init(&core->tx[i].tx_pkt, E1000E_MAX_TX_FRAGS); } net_rx_pkt_init(&core->rx_pkt); @@ -3446,7 +3447,8 @@ e1000e_core_pci_uninit(E1000ECore *core) qemu_del_vm_change_state_handler(core->vmstate); for (i = 0; i < E1000E_NUM_QUEUES; i++) { -net_tx_pkt_reset(core->tx[i].tx_pkt, core->owner); +net_tx_pkt_reset(core->tx[i].tx_pkt, + net_tx_pkt_unmap_frag_pci, core->owner); net_tx_pkt_uninit(core->tx[i].tx_pkt); }
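The core of patches 01 and 02 is replacing the hardwired PCIDevice dependency with caller-supplied callbacks. Below is a minimal sketch of that callback pattern with invented names (Pkt, FreeFrag, pkt_reset), not QEMU's actual API:

```c
#include <stddef.h>
#include <stdlib.h>

/* The packet keeps opaque fragments plus a callback that knows how to
 * release them, so the packet core never needs PCI headers. */
typedef void (*FreeFrag)(void *context, void *base, size_t len);

typedef struct {
    void *base;
    size_t len;
} Frag;

typedef struct {
    Frag frags[8];
    size_t nfrags;
} Pkt;

static int pkt_add_frag(Pkt *pkt, void *base, size_t len)
{
    if (pkt->nfrags >= 8) {
        return 0;
    }
    pkt->frags[pkt->nfrags].base = base;
    pkt->frags[pkt->nfrags].len = len;
    pkt->nfrags++;
    return 1;
}

/* Reset releases every fragment through the caller-supplied callback,
 * mirroring the shape of net_tx_pkt_reset(pkt, unmap_cb, owner). */
static void pkt_reset(Pkt *pkt, FreeFrag cb, void *context)
{
    for (size_t i = 0; i < pkt->nfrags; i++) {
        cb(context, pkt->frags[i].base, pkt->frags[i].len);
    }
    pkt->nfrags = 0;
}

/* One possible backend: plain heap fragments, with frees counted via the
 * opaque context pointer. */
static void free_heap_frag(void *context, void *base, size_t len)
{
    (void)len;
    (*(int *)context)++;
    free(base);
}

static int demo_pkt_roundtrip(void)
{
    Pkt pkt = {0};
    int freed = 0;
    pkt_add_frag(&pkt, malloc(16), 16);
    pkt_add_frag(&pkt, malloc(32), 32);
    pkt_reset(&pkt, free_heap_frag, &freed);
    return freed;   /* both fragments were released through the callback */
}
```

With this shape a non-PCI user only has to supply a different FreeFrag implementation; the packet core itself stays transport-agnostic.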
[PATCH v4 01/48] hw/net/net_tx_pkt: Decouple implementation from PCI
This is intended to be followed by another change for the interface. It also fixes a leak of the memory mapping that occurred when the specified memory was only partially mapped.

Fixes: e263cd49c7 ("Packet abstraction for VMWARE network devices")
Signed-off-by: Akihiko Odaki
---
hw/net/net_tx_pkt.h | 9 hw/net/net_tx_pkt.c | 53 - 2 files changed, 42 insertions(+), 20 deletions(-)

diff --git a/hw/net/net_tx_pkt.h b/hw/net/net_tx_pkt.h index e5ce6f20bc..5eb123ef90 100644 --- a/hw/net/net_tx_pkt.h +++ b/hw/net/net_tx_pkt.h @@ -153,6 +153,15 @@ void net_tx_pkt_dump(struct NetTxPkt *pkt); */ void net_tx_pkt_reset(struct NetTxPkt *pkt, PCIDevice *dev); +/** + * Unmap a fragment mapped from a PCI device. + * + * @context:PCI device owning fragment + * @base: pointer to fragment + * @len:length of fragment + */ +void net_tx_pkt_unmap_frag_pci(void *context, void *base, size_t len); + /** * Send packet to qemu. handles sw offloads if vhdr is not supported. * diff --git a/hw/net/net_tx_pkt.c b/hw/net/net_tx_pkt.c index 8dc8568ba2..aca12ff035 100644 --- a/hw/net/net_tx_pkt.c +++ b/hw/net/net_tx_pkt.c @@ -384,10 +384,9 @@ void net_tx_pkt_setup_vlan_header_ex(struct NetTxPkt *pkt, } } -bool net_tx_pkt_add_raw_fragment(struct NetTxPkt *pkt, hwaddr pa, -size_t len) +static bool net_tx_pkt_add_raw_fragment_common(struct NetTxPkt *pkt, + void *base, size_t len) { -hwaddr mapped_len = 0; struct iovec *ventry; assert(pkt); @@ -395,23 +394,12 @@ bool net_tx_pkt_add_raw_fragment(struct NetTxPkt *pkt, hwaddr pa, return false; } -if (!len) { -return true; - } - ventry = &pkt->raw[pkt->raw_frags]; -mapped_len = len; +ventry->iov_base = base; +ventry->iov_len = len; +pkt->raw_frags++; -ventry->iov_base = pci_dma_map(pkt->pci_dev, pa, - &mapped_len, DMA_DIRECTION_TO_DEVICE); - -if ((ventry->iov_base != NULL) && (len == mapped_len)) { -ventry->iov_len = mapped_len; -pkt->raw_frags++; -return true; -} else { -return false; -} +return true; } bool net_tx_pkt_has_fragments(struct NetTxPkt *pkt) @@ -465,8 +453,9 @@ 
void net_tx_pkt_reset(struct NetTxPkt *pkt, PCIDevice *pci_dev) assert(pkt->raw); for (i = 0; i < pkt->raw_frags; i++) { assert(pkt->raw[i].iov_base); -pci_dma_unmap(pkt->pci_dev, pkt->raw[i].iov_base, - pkt->raw[i].iov_len, DMA_DIRECTION_TO_DEVICE, 0); +net_tx_pkt_unmap_frag_pci(pkt->pci_dev, + pkt->raw[i].iov_base, + pkt->raw[i].iov_len); } } pkt->pci_dev = pci_dev; @@ -476,6 +465,30 @@ void net_tx_pkt_reset(struct NetTxPkt *pkt, PCIDevice *pci_dev) pkt->l4proto = 0; } +void net_tx_pkt_unmap_frag_pci(void *context, void *base, size_t len) +{ +pci_dma_unmap(context, base, len, DMA_DIRECTION_TO_DEVICE, 0); +} + +bool net_tx_pkt_add_raw_fragment(struct NetTxPkt *pkt, hwaddr pa, +size_t len) +{ +dma_addr_t mapped_len = len; +void *base = pci_dma_map(pkt->pci_dev, pa, &mapped_len, + DMA_DIRECTION_TO_DEVICE); +if (!base) { +return false; +} + +if (mapped_len != len || +!net_tx_pkt_add_raw_fragment_common(pkt, base, len)) { +net_tx_pkt_unmap_frag_pci(pkt->pci_dev, base, mapped_len); +return false; +} + +return true; +} + static void net_tx_pkt_do_sw_csum(struct NetTxPkt *pkt, struct iovec *iov, uint32_t iov_len, uint16_t csl) -- 2.40.0
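The leak this patch fixes is a subtle failure path: a DMA-map primitive may return a mapping shorter than requested, and the old code reported failure without releasing the short mapping. A hedged sketch of the fixed pattern, using stand-in fake_dma_map/fake_dma_unmap helpers invented for this example (not QEMU's pci_dma_map API):

```c
#include <stddef.h>

/* A toy 64-byte "guest memory" window; mapping past its end yields a
 * partial mapping, like a DMA map crossing a region boundary can. */
static unsigned char fake_memory[64];
static int mappings_live;

static void *fake_dma_map(size_t addr, size_t *len)
{
    if (addr >= sizeof(fake_memory)) {
        return NULL;
    }
    if (addr + *len > sizeof(fake_memory)) {
        *len = sizeof(fake_memory) - addr;   /* partial mapping */
    }
    mappings_live++;
    return &fake_memory[addr];
}

static void fake_dma_unmap(void *base, size_t len)
{
    (void)base;
    (void)len;
    mappings_live--;
}

/* Mirrors the fixed net_tx_pkt_add_raw_fragment: on a partial map,
 * release what was mapped before reporting failure.  The old code
 * skipped the unmap on this path, leaking the mapping. */
static int add_fragment(size_t addr, size_t len)
{
    size_t mapped_len = len;
    void *base = fake_dma_map(addr, &mapped_len);
    if (!base) {
        return 0;
    }
    if (mapped_len != len) {
        fake_dma_unmap(base, mapped_len);    /* the previously leaked path */
        return 0;
    }
    return 1;   /* success: mapping stays live until reset/unmap time */
}
```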
[PATCH v4 00/48] igb: Fix for DPDK
Based-on: <366bbcafdb6e0373f0deb105153768a8c0bded87.ca...@gmail.com> ("[PATCH 0/1] e1000e: Fix tx/rx counters")

This series has fixes and feature additions to pass the DPDK Test Suite with igb. It also includes a few minor changes related to networking.

Patches [01, 10] are bug fixes. Patches [11, 14] delete code which is unnecessary. Patches [15, 33] are minor changes. Patches [34, 46] implement new features. Patches [47, 48] update documentation. While this series includes many patches, it is not necessary to land them all at once. Only the bug fix patches may be applied first, for example.

V3 -> V4:
- Renamed "min_buf" variable to "buf". (Sriram Yagnaraman)
- Added patch "igb: Clear-on-read ICR when ICR.INTA is set". (Sriram Yagnaraman)

V2 -> V3:
- Fixed parameter name in hw/net/net_tx_pkt. (Philippe Mathieu-Daudé)
- Added patch "igb: Clear IMS bits when committing ICR access".
- Added patch "igb: Clear EICR bits for delayed MSI-X interrupts".
- Added patch "e1000e: Rename a variable in e1000e_receive_internal()".
- Added patch "igb: Rename a variable in igb_receive_internal()".
- Added patch "e1000e: Notify only new interrupts".
- Added patch "igb: Notify only new interrupts".

V1 -> V2:
- Dropped patch "Include the second VLAN tag in the buffer". The second VLAN tag is not used at that point and is unnecessary.
- Added patch "e1000x: Rename TcpIpv6 into TcpIpv6Ex".
- Split patch "hw/net/net_tx_pkt: Decouple from PCI". (Philippe Mathieu-Daudé)
- Added advanced Rx descriptor packet encoding definitions. (Sriram Yagnaraman)
- Added some constants to eth.h to derive packet oversize thresholds.
- Added IGB_TX_FLAGS_VLAN_SHIFT usage.
- Renamed patch "igb: Fix igb_mac_reg_init alignment". (Philippe Mathieu-Daudé)
- Fixed size check for packets with double VLAN. (Sriram Yagnaraman)
- Fixed timing to timestamp Tx packet. 
Akihiko Odaki (48): hw/net/net_tx_pkt: Decouple implementation from PCI hw/net/net_tx_pkt: Decouple interface from PCI e1000x: Fix BPRC and MPRC igb: Fix Rx packet type encoding igb: Do not require CTRL.VME for tx VLAN tagging igb: Clear IMS bits when committing ICR access net/net_rx_pkt: Use iovec for net_rx_pkt_set_protocols() e1000e: Always copy ethernet header igb: Always copy ethernet header Fix references to igb Avocado test tests/avocado: Remove unused imports tests/avocado: Remove test_igb_nomsi_kvm hw/net/net_tx_pkt: Remove net_rx_pkt_get_l4_info net/eth: Rename eth_setup_vlan_headers_ex e1000x: Share more Rx filtering logic e1000x: Take CRC into consideration for size check e1000x: Rename TcpIpv6 into TcpIpv6Ex e1000e: Always log status after building rx metadata igb: Always log status after building rx metadata igb: Remove goto igb: Read DCMD.VLE of the first Tx descriptor e1000e: Reset packet state after emptying Tx queue vmxnet3: Reset packet state after emptying Tx queue igb: Add more definitions for Tx descriptor igb: Share common VF constants igb: Fix igb_mac_reg_init coding style alignment igb: Clear EICR bits for delayed MSI-X interrupts e1000e: Rename a variable in e1000e_receive_internal() igb: Rename a variable in igb_receive_internal() net/eth: Use void pointers net/eth: Always add VLAN tag hw/net/net_rx_pkt: Enforce alignment for eth_header tests/qtest/libqos/igb: Set GPIE.Multiple_MSIX igb: Implement MSI-X single vector mode igb: Use UDP for RSS hash igb: Implement Rx SCTP CSO igb: Implement Tx SCTP CSO igb: Strip the second VLAN tag for extended VLAN igb: Filter with the second VLAN tag for extended VLAN igb: Implement igb-specific oversize check igb: Implement Rx PTP2 timestamp igb: Implement Tx timestamp e1000e: Notify only new interrupts igb: Notify only new interrupts igb: Clear-on-read ICR when ICR.INTA is set vmxnet3: Do not depend on PC MAINTAINERS: Add a reviewer for network packet abstractions docs/system/devices/igb: Note igb is 
tested for DPDK MAINTAINERS | 3 +- docs/system/devices/igb.rst | 14 +- hw/net/e1000e_core.h | 2 - hw/net/e1000x_common.h| 9 +- hw/net/e1000x_regs.h | 24 +- hw/net/igb_common.h | 24 +- hw/net/igb_regs.h | 67 +- hw/net/net_rx_pkt.h | 38 +- hw/net/net_tx_pkt.h | 46 +- include/net/eth.h | 29 +- include/qemu/crc32c.h | 1 + hw/net/e1000.c| 41 +- hw/net/e1000e_core.c | 292 +++ hw/net/e1000x_common.c| 79 +- hw/net/igb.c | 10 +- hw/net/igb_core.c | 717 ++ hw/net/igbvf.c| 7 - hw/net/net_rx_pkt.c | 107 ++- hw/net/net_tx_pkt.c
[PATCH v4 07/48] net/net_rx_pkt: Use iovec for net_rx_pkt_set_protocols()
igb does not properly ensure the buffer passed to net_rx_pkt_set_protocols() is contiguous for the entire L2/L3/L4 header. Allow it to pass scattered data to net_rx_pkt_set_protocols(). Fixes: 3a977deebe ("Intrdocue igb device emulation") Signed-off-by: Akihiko Odaki Reviewed-by: Sriram Yagnaraman --- hw/net/net_rx_pkt.h | 10 ++ include/net/eth.h | 6 +++--- hw/net/igb_core.c | 2 +- hw/net/net_rx_pkt.c | 14 +- hw/net/virtio-net.c | 7 +-- hw/net/vmxnet3.c| 7 ++- net/eth.c | 18 -- 7 files changed, 34 insertions(+), 30 deletions(-) diff --git a/hw/net/net_rx_pkt.h b/hw/net/net_rx_pkt.h index d00b484900..a06f5c2675 100644 --- a/hw/net/net_rx_pkt.h +++ b/hw/net/net_rx_pkt.h @@ -55,12 +55,14 @@ size_t net_rx_pkt_get_total_len(struct NetRxPkt *pkt); * parse and set packet analysis results * * @pkt:packet - * @data: pointer to the data buffer to be parsed - * @len:data length + * @iov:received data scatter-gather list + * @iovcnt: number of elements in iov + * @iovoff: data start offset in the iov * */ -void net_rx_pkt_set_protocols(struct NetRxPkt *pkt, const void *data, - size_t len); +void net_rx_pkt_set_protocols(struct NetRxPkt *pkt, + const struct iovec *iov, size_t iovcnt, + size_t iovoff); /** * fetches packet analysis results diff --git a/include/net/eth.h b/include/net/eth.h index c5ae4493b4..9f19c3a695 100644 --- a/include/net/eth.h +++ b/include/net/eth.h @@ -312,10 +312,10 @@ eth_get_l2_hdr_length(const void *p) } static inline uint32_t -eth_get_l2_hdr_length_iov(const struct iovec *iov, int iovcnt) +eth_get_l2_hdr_length_iov(const struct iovec *iov, size_t iovcnt, size_t iovoff) { uint8_t p[sizeof(struct eth_header) + sizeof(struct vlan_header)]; -size_t copied = iov_to_buf(iov, iovcnt, 0, p, ARRAY_SIZE(p)); +size_t copied = iov_to_buf(iov, iovcnt, iovoff, p, ARRAY_SIZE(p)); if (copied < ARRAY_SIZE(p)) { return copied; @@ -397,7 +397,7 @@ typedef struct eth_l4_hdr_info_st { bool has_tcp_data; } eth_l4_hdr_info; -void eth_get_protocols(const struct iovec *iov, 
int iovcnt, +void eth_get_protocols(const struct iovec *iov, size_t iovcnt, size_t iovoff, bool *hasip4, bool *hasip6, size_t *l3hdr_off, size_t *l4hdr_off, diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c index eaca5bd2b6..21a8d9ada4 100644 --- a/hw/net/igb_core.c +++ b/hw/net/igb_core.c @@ -1650,7 +1650,7 @@ igb_receive_internal(IGBCore *core, const struct iovec *iov, int iovcnt, ehdr = PKT_GET_ETH_HDR(filter_buf); net_rx_pkt_set_packet_type(core->rx_pkt, get_eth_packet_type(ehdr)); -net_rx_pkt_set_protocols(core->rx_pkt, filter_buf, size); +net_rx_pkt_set_protocols(core->rx_pkt, iov, iovcnt, iov_ofs); queues = igb_receive_assign(core, ehdr, size, &rss_info, external_tx); if (!queues) { diff --git a/hw/net/net_rx_pkt.c b/hw/net/net_rx_pkt.c index 39cdea06de..63be6e05ad 100644 --- a/hw/net/net_rx_pkt.c +++ b/hw/net/net_rx_pkt.c @@ -103,7 +103,7 @@ net_rx_pkt_pull_data(struct NetRxPkt *pkt, iov, iovcnt, ploff, pkt->tot_len); } -eth_get_protocols(pkt->vec, pkt->vec_len, &pkt->hasip4, &pkt->hasip6, +eth_get_protocols(pkt->vec, pkt->vec_len, 0, &pkt->hasip4, &pkt->hasip6, &pkt->l3hdr_off, &pkt->l4hdr_off, &pkt->l5hdr_off, &pkt->ip6hdr_info, &pkt->ip4hdr_info, &pkt->l4hdr_info); @@ -186,17 +186,13 @@ size_t net_rx_pkt_get_total_len(struct NetRxPkt *pkt) return pkt->tot_len; } -void net_rx_pkt_set_protocols(struct NetRxPkt *pkt, const void *data, - size_t len) +void net_rx_pkt_set_protocols(struct NetRxPkt *pkt, + const struct iovec *iov, size_t iovcnt, + size_t iovoff) { -const struct iovec iov = { -.iov_base = (void *)data, -.iov_len = len -}; - assert(pkt); -eth_get_protocols(&iov, 1, &pkt->hasip4, &pkt->hasip6, +eth_get_protocols(iov, iovcnt, iovoff, &pkt->hasip4, &pkt->hasip6, &pkt->l3hdr_off, &pkt->l4hdr_off, &pkt->l5hdr_off, &pkt->ip6hdr_info, &pkt->ip4hdr_info, &pkt->l4hdr_info); } diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index 53e1c32643..37551fd854 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -1835,9 +1835,12 @@ static int 
virtio_net_process_rss(NetClientState *nc, const uint8_t *buf, VIRTIO_NET_HASH_REPORT_UDPv6, VIRTIO_NET_HASH_REPORT_UDPv6_EX }; +struct iovec iov = { +.iov_base = (void *)buf, +.iov_len = size +}; -net_rx_pkt_set_protocols(pkt, buf + n->host_hdr_len, - si
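The reason net_rx_pkt_set_protocols() now takes an iovec is that a header field may straddle fragment boundaries, so parsing must gather from the scatter list instead of assuming iov[0] is contiguous. The sketch below uses an invented gather() helper that approximates QEMU's iov_to_buf(), and shows a 2-byte EtherType being read across a fragment split:

```c
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/* Copy `len` bytes starting at logical offset `off` out of a
 * scatter-gather list into a contiguous buffer. */
static size_t gather(const struct iovec *iov, size_t iovcnt, size_t off,
                     void *buf, size_t len)
{
    size_t done = 0;
    for (size_t i = 0; i < iovcnt && done < len; i++) {
        if (off >= iov[i].iov_len) {
            off -= iov[i].iov_len;   /* skip whole fragments before off */
            continue;
        }
        size_t n = iov[i].iov_len - off;
        if (n > len - done) {
            n = len - done;
        }
        memcpy((char *)buf + done, (char *)iov[i].iov_base + off, n);
        done += n;
        off = 0;
    }
    return done;
}

/* Reads the big-endian EtherType even when it spans two fragments. */
static unsigned ethertype_from_iov(const struct iovec *iov, size_t iovcnt)
{
    unsigned char b[2] = {0, 0};
    gather(iov, iovcnt, 12, b, 2);   /* EtherType sits at offset 12 */
    return (unsigned)b[0] << 8 | b[1];
}

static unsigned demo_split_ethertype(void)
{
    /* Fragment boundary falls between the two EtherType bytes. */
    unsigned char part1[13] = {0};
    unsigned char part2[2] = {0};
    part1[12] = 0x08;                /* 0x0800 = IPv4 */
    part2[0] = 0x00;
    struct iovec iov[2] = {
        { part1, sizeof(part1) },
        { part2, sizeof(part2) },
    };
    return ethertype_from_iov(iov, 2);
}
```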
[PATCH v4 06/48] igb: Clear IMS bits when committing ICR access
The datasheet makes contradictory statements regarding ICR accesses, so it cannot be relied on to determine the behavior of ICR accesses. However, e1000e does clear IMS bits on ICR reads, and Linux also expects ICR accesses to clear IMS bits according to: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/intel/igb/igb_main.c?h=v6.2#n8048

Fixes: 3a977deebe ("Intrdocue igb device emulation")
Signed-off-by: Akihiko Odaki
Reviewed-by: Sriram Yagnaraman
---
hw/net/igb_core.c | 8 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c index 96a118b6c1..eaca5bd2b6 100644 --- a/hw/net/igb_core.c +++ b/hw/net/igb_core.c @@ -2452,16 +2452,16 @@ igb_set_ims(IGBCore *core, int index, uint32_t val) static void igb_commit_icr(IGBCore *core) { /* - * If GPIE.NSICR = 0, then the copy of IAM to IMS will occur only if at + * If GPIE.NSICR = 0, then the clear of IMS will occur only if at * least one bit is set in the IMS and there is a true interrupt as * reflected in ICR.INTA. */ if ((core->mac[GPIE] & E1000_GPIE_NSICR) || (core->mac[IMS] && (core->mac[ICR] & E1000_ICR_INT_ASSERTED))) { -igb_set_ims(core, IMS, core->mac[IAM]); -} else { -igb_update_interrupt_state(core); +igb_clear_ims_bits(core, core->mac[IAM]); } + +igb_update_interrupt_state(core); } static void igb_set_icr(IGBCore *core, int index, uint32_t val) -- 2.40.0
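The behavioral change in igb_commit_icr() reduces to a few lines of register logic: the IAM bits are now cleared from IMS (masking those interrupt sources) instead of being copied into it. This is an illustrative sketch only; the register struct, bit values, and helpers are invented, and just the `ims &= ~iam` step reflects the fix:

```c
#include <stdint.h>

/* Illustrative bit positions, not the hardware's actual layout. */
#define ICR_INT_ASSERTED (1u << 31)
#define GPIE_NSICR       (1u << 7)

typedef struct {
    uint32_t icr, ims, iam, gpie;
} IntrRegs;

/* On committing an ICR access: if NSICR is set, or IMS is non-empty and
 * a true interrupt is asserted, mask the IAM sources out of IMS. */
static void commit_icr(IntrRegs *r)
{
    if ((r->gpie & GPIE_NSICR) ||
        (r->ims && (r->icr & ICR_INT_ASSERTED))) {
        r->ims &= ~r->iam;   /* clear, not copy: the behavior this patch adopts */
    }
}

static uint32_t demo_commit(void)
{
    IntrRegs r = {
        .icr = ICR_INT_ASSERTED,
        .ims = 0x0f,   /* four sources enabled */
        .iam = 0x03,   /* auto-mask covers the low two */
        .gpie = 0,
    };
    commit_icr(&r);
    return r.ims;      /* low two sources are now masked */
}
```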
[PATCH v4 17/48] e1000x: Rename TcpIpv6 into TcpIpv6Ex
e1000e and igb employ NetPktRssIpV6TcpEx for the RSS hash if the TcpIpv6 MRQC bit is set. Moreover, igb also has an MRQC bit for NetPktRssIpV6Tcp, though it is not implemented yet. Rename it to TcpIpv6Ex to avoid confusion.

Signed-off-by: Akihiko Odaki
Reviewed-by: Sriram Yagnaraman
---
hw/net/e1000x_regs.h | 24 hw/net/e1000e_core.c | 8 hw/net/igb_core.c| 8 hw/net/trace-events | 2 +- 4 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/hw/net/e1000x_regs.h b/hw/net/e1000x_regs.h index 6d3c4c6d3a..13760c66d3 100644 --- a/hw/net/e1000x_regs.h +++ b/hw/net/e1000x_regs.h @@ -290,18 +290,18 @@ #define E1000_RETA_IDX(hash)((hash) & (BIT(7) - 1)) #define E1000_RETA_VAL(reta, hash) (((uint8_t *)(reta))[E1000_RETA_IDX(hash)]) -#define E1000_MRQC_EN_TCPIPV4(mrqc) ((mrqc) & BIT(16)) -#define E1000_MRQC_EN_IPV4(mrqc)((mrqc) & BIT(17)) -#define E1000_MRQC_EN_TCPIPV6(mrqc) ((mrqc) & BIT(18)) -#define E1000_MRQC_EN_IPV6EX(mrqc) ((mrqc) & BIT(19)) -#define E1000_MRQC_EN_IPV6(mrqc)((mrqc) & BIT(20)) - -#define E1000_MRQ_RSS_TYPE_NONE (0) -#define E1000_MRQ_RSS_TYPE_IPV4TCP (1) -#define E1000_MRQ_RSS_TYPE_IPV4 (2) -#define E1000_MRQ_RSS_TYPE_IPV6TCP (3) -#define E1000_MRQ_RSS_TYPE_IPV6EX (4) -#define E1000_MRQ_RSS_TYPE_IPV6 (5) +#define E1000_MRQC_EN_TCPIPV4(mrqc) ((mrqc) & BIT(16)) +#define E1000_MRQC_EN_IPV4(mrqc) ((mrqc) & BIT(17)) +#define E1000_MRQC_EN_TCPIPV6EX(mrqc) ((mrqc) & BIT(18)) +#define E1000_MRQC_EN_IPV6EX(mrqc)((mrqc) & BIT(19)) +#define E1000_MRQC_EN_IPV6(mrqc) ((mrqc) & BIT(20)) + +#define E1000_MRQ_RSS_TYPE_NONE (0) +#define E1000_MRQ_RSS_TYPE_IPV4TCP(1) +#define E1000_MRQ_RSS_TYPE_IPV4 (2) +#define E1000_MRQ_RSS_TYPE_IPV6TCPEX (3) +#define E1000_MRQ_RSS_TYPE_IPV6EX (4) +#define E1000_MRQ_RSS_TYPE_IPV6 (5) #define E1000_ICR_ASSERTED BIT(31) #define E1000_EIAC_MASK0x01F0 diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c index 41d2435074..38d465a203 100644 --- a/hw/net/e1000e_core.c +++ b/hw/net/e1000e_core.c @@ -537,7 +537,7 @@ 
e1000e_rss_get_hash_type(E1000ECore *core, struct NetRxPkt *pkt) ip6info->rss_ex_dst_valid, ip6info->rss_ex_src_valid, core->mac[MRQC], -E1000_MRQC_EN_TCPIPV6(core->mac[MRQC]), +E1000_MRQC_EN_TCPIPV6EX(core->mac[MRQC]), E1000_MRQC_EN_IPV6EX(core->mac[MRQC]), E1000_MRQC_EN_IPV6(core->mac[MRQC])); @@ -546,8 +546,8 @@ e1000e_rss_get_hash_type(E1000ECore *core, struct NetRxPkt *pkt) ip6info->rss_ex_src_valid))) { if (l4hdr_proto == ETH_L4_HDR_PROTO_TCP && -E1000_MRQC_EN_TCPIPV6(core->mac[MRQC])) { -return E1000_MRQ_RSS_TYPE_IPV6TCP; +E1000_MRQC_EN_TCPIPV6EX(core->mac[MRQC])) { +return E1000_MRQ_RSS_TYPE_IPV6TCPEX; } if (E1000_MRQC_EN_IPV6EX(core->mac[MRQC])) { @@ -581,7 +581,7 @@ e1000e_rss_calc_hash(E1000ECore *core, case E1000_MRQ_RSS_TYPE_IPV4TCP: type = NetPktRssIpV4Tcp; break; -case E1000_MRQ_RSS_TYPE_IPV6TCP: +case E1000_MRQ_RSS_TYPE_IPV6TCPEX: type = NetPktRssIpV6TcpEx; break; case E1000_MRQ_RSS_TYPE_IPV6: diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c index 934db3c3e5..209fdad862 100644 --- a/hw/net/igb_core.c +++ b/hw/net/igb_core.c @@ -301,7 +301,7 @@ igb_rss_get_hash_type(IGBCore *core, struct NetRxPkt *pkt) ip6info->rss_ex_dst_valid, ip6info->rss_ex_src_valid, core->mac[MRQC], -E1000_MRQC_EN_TCPIPV6(core->mac[MRQC]), +E1000_MRQC_EN_TCPIPV6EX(core->mac[MRQC]), E1000_MRQC_EN_IPV6EX(core->mac[MRQC]), E1000_MRQC_EN_IPV6(core->mac[MRQC])); @@ -310,8 +310,8 @@ igb_rss_get_hash_type(IGBCore *core, struct NetRxPkt *pkt) ip6info->rss_ex_src_valid))) { if (l4hdr_proto == ETH_L4_HDR_PROTO_TCP && -E1000_MRQC_EN_TCPIPV6(core->mac[MRQC])) { -return E1000_MRQ_RSS_TYPE_IPV6TCP; +E1000_MRQC_EN_TCPIPV6EX(core->mac[MRQC])) { +return E1000_MRQ_RSS_TYPE_IPV6TCPEX; } if (E1000_MRQC_EN_IPV6EX(core->mac[MRQC])) { @@ -343,7 +343,7 @@ igb_rss_calc_hash(IGBCore *core, struct NetRxPkt *pkt, E1000E_RSSInfo *info) case E1000_MRQ_RSS_TYPE_IPV4TCP: type = NetPktRssIpV4Tcp; break; -case E1000_MRQ_RSS_TYPE_IPV6TCP: +case E1000_MRQ_RSS_TYPE_IPV6TCPEX: type = NetPktRssIpV6TcpEx; break; 
case E1000_MRQ_RSS_TYPE_IPV6:
[PATCH v4 21/48] igb: Read DCMD.VLE of the first Tx descriptor
Section 7.2.2.3 Advanced Transmit Data Descriptor says: > For frames that spans multiple descriptors, all fields apart from > DCMD.EOP, DCMD.RS, DCMD.DEXT, DTALEN, Address and DTYP are valid only > in the first descriptors and are ignored in the subsequent ones. Signed-off-by: Akihiko Odaki Reviewed-by: Sriram Yagnaraman --- hw/net/igb_core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c index bae51cbb63..162ef26789 100644 --- a/hw/net/igb_core.c +++ b/hw/net/igb_core.c @@ -613,7 +613,7 @@ igb_process_tx_desc(IGBCore *core, idx = (tx->first_olinfo_status >> 4) & 1; igb_tx_insert_vlan(core, queue_index, tx, tx->ctx[idx].vlan_macip_lens >> 16, -!!(cmd_type_len & E1000_TXD_CMD_VLE)); +!!(tx->first_cmd_type_len & E1000_TXD_CMD_VLE)); if (igb_tx_pkt_send(core, tx, queue_index)) { igb_on_tx_done_update_stats(core, tx->tx_pkt, queue_index); -- 2.40.0
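The fix latches the first descriptor's command field and consults it once the frame completes, instead of reading DCMD.VLE from whichever descriptor happens to be last. A hedged sketch with invented names (TxState, process_desc) and illustrative bit values:

```c
#include <stdint.h>

/* Illustrative command-field bits, not the hardware encoding. */
#define CMD_EOP (1u << 0)   /* end of packet */
#define CMD_VLE (1u << 6)   /* insert VLAN tag */

typedef struct {
    uint32_t first_cmd_type_len;   /* latched from the first descriptor */
    int first;                     /* next descriptor starts a new frame */
} TxState;

/* Returns 1/0 for "tag/don't tag" when the frame completes, -1 while
 * more descriptors are still expected. */
static int process_desc(TxState *tx, uint32_t cmd_type_len)
{
    if (tx->first) {
        tx->first_cmd_type_len = cmd_type_len;   /* latch first descriptor */
        tx->first = 0;
    }
    if (cmd_type_len & CMD_EOP) {
        tx->first = 1;
        /* Consult the latched field, per section 7.2.2.3: DCMD.VLE is
         * only valid in the first descriptor of a frame. */
        return !!(tx->first_cmd_type_len & CMD_VLE);
    }
    return -1;
}

static int demo_vle_from_first(void)
{
    TxState tx = { 0, 1 };
    process_desc(&tx, CMD_VLE);          /* first descriptor: VLE set */
    return process_desc(&tx, CMD_EOP);   /* last descriptor: VLE clear */
}
```

Even though the last descriptor does not carry VLE, the frame is still tagged, because only the first descriptor's flag counts.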
[PATCH v4 09/48] igb: Always copy ethernet header
igb_receive_internal() used to check the iov length to determine whether to copy the iovs to a contiguous buffer, but the check is flawed in two ways:
- It does not ensure that iovcnt > 0.
- It does not take the virtio-net header into consideration.

The size of this copy is just 22 octets, which can be even less than the code size required for the checks. This (wrong) optimization is probably not worth it, so just remove it. Removing this also allows igb to assume aligned accesses for the ethernet header.

Fixes: 3a977deebe ("Intrdocue igb device emulation")
Signed-off-by: Akihiko Odaki
Reviewed-by: Sriram Yagnaraman
---
hw/net/igb_core.c | 43 +++ 1 file changed, 23 insertions(+), 20 deletions(-)

diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c index 21a8d9ada4..1123df9e77 100644 --- a/hw/net/igb_core.c +++ b/hw/net/igb_core.c @@ -67,6 +67,11 @@ typedef struct IGBTxPktVmdqCallbackContext { NetClientState *nc; } IGBTxPktVmdqCallbackContext; +typedef struct L2Header { +struct eth_header eth; +struct vlan_header vlan; +} L2Header; + static ssize_t igb_receive_internal(IGBCore *core, const struct iovec *iov, int iovcnt, bool has_vnet, bool *external_tx); @@ -961,15 +966,16 @@ igb_rx_is_oversized(IGBCore *core, uint16_t qn, size_t size) return size > (lpe ? 
max_ethernet_lpe_size : max_ethernet_vlan_size); } -static uint16_t igb_receive_assign(IGBCore *core, const struct eth_header *ehdr, +static uint16_t igb_receive_assign(IGBCore *core, const L2Header *l2_header, size_t size, E1000E_RSSInfo *rss_info, bool *external_tx) { static const int ta_shift[] = { 4, 3, 2, 0 }; +const struct eth_header *ehdr = &l2_header->eth; uint32_t f, ra[2], *macp, rctl = core->mac[RCTL]; uint16_t queues = 0; uint16_t oversized = 0; -uint16_t vid = lduw_be_p(&PKT_GET_VLAN_HDR(ehdr)->h_tci) & VLAN_VID_MASK; +uint16_t vid = be16_to_cpu(l2_header->vlan.h_tci) & VLAN_VID_MASK; bool accepted = false; int i; @@ -1590,14 +1596,13 @@ static ssize_t igb_receive_internal(IGBCore *core, const struct iovec *iov, int iovcnt, bool has_vnet, bool *external_tx) { -static const int maximum_ethernet_hdr_len = (ETH_HLEN + 4); - uint16_t queues = 0; uint32_t n = 0; -uint8_t min_buf[ETH_ZLEN]; +union { +L2Header l2_header; +uint8_t octets[ETH_ZLEN]; +} buf; struct iovec min_iov; -struct eth_header *ehdr; -uint8_t *filter_buf; size_t size, orig_size; size_t iov_ofs = 0; E1000E_RxRing rxr; @@ -1623,24 +1628,21 @@ igb_receive_internal(IGBCore *core, const struct iovec *iov, int iovcnt, net_rx_pkt_unset_vhdr(core->rx_pkt); } -filter_buf = iov->iov_base + iov_ofs; orig_size = iov_size(iov, iovcnt); size = orig_size - iov_ofs; /* Pad to minimum Ethernet frame length */ -if (size < sizeof(min_buf)) { -iov_to_buf(iov, iovcnt, iov_ofs, min_buf, size); -memset(&min_buf[size], 0, sizeof(min_buf) - size); +if (size < sizeof(buf)) { +iov_to_buf(iov, iovcnt, iov_ofs, &buf, size); +memset(&buf.octets[size], 0, sizeof(buf) - size); e1000x_inc_reg_if_not_full(core->mac, RUC); -min_iov.iov_base = filter_buf = min_buf; -min_iov.iov_len = size = sizeof(min_buf); +min_iov.iov_base = &buf; +min_iov.iov_len = size = sizeof(buf); iovcnt = 1; iov = &min_iov; iov_ofs = 0; -} else if (iov->iov_len < maximum_ethernet_hdr_len) { -/* This is very unlikely, but may happen. 
*/ -iov_to_buf(iov, iovcnt, iov_ofs, min_buf, maximum_ethernet_hdr_len); -filter_buf = min_buf; +} else { +iov_to_buf(iov, iovcnt, iov_ofs, &buf, sizeof(buf.l2_header)); } /* Discard oversized packets if !LPE and !SBP. */ @@ -1648,11 +1650,12 @@ igb_receive_internal(IGBCore *core, const struct iovec *iov, int iovcnt, return orig_size; } -ehdr = PKT_GET_ETH_HDR(filter_buf); -net_rx_pkt_set_packet_type(core->rx_pkt, get_eth_packet_type(ehdr)); +net_rx_pkt_set_packet_type(core->rx_pkt, + get_eth_packet_type(&buf.l2_header.eth)); net_rx_pkt_set_protocols(core->rx_pkt, iov, iovcnt, iov_ofs); -queues = igb_receive_assign(core, ehdr, size, &rss_info, external_tx); +queues = igb_receive_assign(core, &buf.l2_header, size, +&rss_info, external_tx); if (!queues) { trace_e1000e_rx_flt_dropped(); return orig_size; -- 2.40.0
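The union introduced by this patch lets the same storage act both as a zero-padded minimum-length frame buffer and as an aligned L2 header view. A simplified, self-contained sketch (field layout reduced; these are not QEMU's actual structs):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ZLEN 60   /* minimum Ethernet frame length without FCS */

struct eth_header { uint8_t dst[6], src[6]; uint16_t ethertype; };
struct vlan_header { uint16_t tci, inner_type; };

typedef struct L2Header {
    struct eth_header eth;
    struct vlan_header vlan;
} L2Header;

/* The two views share storage: octets[] for copy/pad, l2 for aligned
 * access to the header fields once the copy is done. */
union FrameBuf {
    L2Header l2;
    uint8_t octets[ETH_ZLEN];
};

/* Pads a runt frame to ETH_ZLEN in place, returning the resulting size;
 * frames already at or above the minimum pass through untouched. */
static size_t pad_runt(union FrameBuf *buf, const uint8_t *data, size_t size)
{
    if (size < sizeof(*buf)) {
        memcpy(buf->octets, data, size);
        memset(&buf->octets[size], 0, sizeof(*buf) - size);
        return sizeof(*buf);
    }
    return size;
}

static size_t demo_pad(void)
{
    union FrameBuf buf;
    uint8_t runt[20] = {0};   /* shorter than the 60-byte minimum */
    return pad_runt(&buf, runt, sizeof(runt));
}
```

After pad_runt(), code can read buf.l2.eth or buf.l2.vlan directly with natural alignment, which is what lets igb drop the unaligned filter_buf pointer.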
[PATCH v4 13/48] hw/net/net_tx_pkt: Remove net_rx_pkt_get_l4_info
This function is not used. Signed-off-by: Akihiko Odaki --- hw/net/net_rx_pkt.h | 9 - hw/net/net_rx_pkt.c | 5 - 2 files changed, 14 deletions(-) diff --git a/hw/net/net_rx_pkt.h b/hw/net/net_rx_pkt.h index a06f5c2675..ce8dbdb284 100644 --- a/hw/net/net_rx_pkt.h +++ b/hw/net/net_rx_pkt.h @@ -119,15 +119,6 @@ eth_ip6_hdr_info *net_rx_pkt_get_ip6_info(struct NetRxPkt *pkt); */ eth_ip4_hdr_info *net_rx_pkt_get_ip4_info(struct NetRxPkt *pkt); -/** - * fetches L4 header analysis results - * - * Return: pointer to analysis results structure which is stored in internal - * packet area. - * - */ -eth_l4_hdr_info *net_rx_pkt_get_l4_info(struct NetRxPkt *pkt); - typedef enum { NetPktRssIpV4, NetPktRssIpV4Tcp, diff --git a/hw/net/net_rx_pkt.c b/hw/net/net_rx_pkt.c index 63be6e05ad..6125a063d7 100644 --- a/hw/net/net_rx_pkt.c +++ b/hw/net/net_rx_pkt.c @@ -236,11 +236,6 @@ eth_ip4_hdr_info *net_rx_pkt_get_ip4_info(struct NetRxPkt *pkt) return &pkt->ip4hdr_info; } -eth_l4_hdr_info *net_rx_pkt_get_l4_info(struct NetRxPkt *pkt) -{ -return &pkt->l4hdr_info; -} - static inline void _net_rx_rss_add_chunk(uint8_t *rss_input, size_t *bytes_written, void *ptr, size_t size) -- 2.40.0
[PATCH v4 11/48] tests/avocado: Remove unused imports
Signed-off-by: Akihiko Odaki --- tests/avocado/netdev-ethtool.py | 1 - 1 file changed, 1 deletion(-) diff --git a/tests/avocado/netdev-ethtool.py b/tests/avocado/netdev-ethtool.py index f7e9464184..8de118e313 100644 --- a/tests/avocado/netdev-ethtool.py +++ b/tests/avocado/netdev-ethtool.py @@ -7,7 +7,6 @@ from avocado import skip from avocado_qemu import QemuSystemTest -from avocado_qemu import exec_command, exec_command_and_wait_for_pattern from avocado_qemu import wait_for_console_pattern class NetDevEthtool(QemuSystemTest): -- 2.40.0
[PATCH v4 08/48] e1000e: Always copy ethernet header
e1000e_receive_internal() used to check the iov length to determine whether to copy the iovs to a contiguous buffer, but the check is flawed in two ways:
- It does not ensure that iovcnt > 0.
- It does not take the virtio-net header into consideration.

The size of this copy is just 18 octets, which can be even less than the code size required for the checks. This (wrong) optimization is probably not worth it, so just remove it.

Fixes: 6f3fbe4ed0 ("net: Introduce e1000e device emulation")
Signed-off-by: Akihiko Odaki
---
hw/net/e1000e_core.c | 26 ++ 1 file changed, 10 insertions(+), 16 deletions(-)

diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c index c2d864a504..14b94db59c 100644 --- a/hw/net/e1000e_core.c +++ b/hw/net/e1000e_core.c @@ -1686,12 +1686,9 @@ static ssize_t e1000e_receive_internal(E1000ECore *core, const struct iovec *iov, int iovcnt, bool has_vnet) { -static const int maximum_ethernet_hdr_len = (ETH_HLEN + 4); - uint32_t n = 0; -uint8_t min_buf[ETH_ZLEN]; +uint8_t buf[ETH_ZLEN]; struct iovec min_iov; -uint8_t *filter_buf; size_t size, orig_size; size_t iov_ofs = 0; E1000E_RxRing rxr; @@ -1714,24 +1711,21 @@ e1000e_receive_internal(E1000ECore *core, const struct iovec *iov, int iovcnt, net_rx_pkt_unset_vhdr(core->rx_pkt); } -filter_buf = iov->iov_base + iov_ofs; orig_size = iov_size(iov, iovcnt); size = orig_size - iov_ofs; /* Pad to minimum Ethernet frame length */ -if (size < sizeof(min_buf)) { -iov_to_buf(iov, iovcnt, iov_ofs, min_buf, size); -memset(&min_buf[size], 0, sizeof(min_buf) - size); +if (size < sizeof(buf)) { +iov_to_buf(iov, iovcnt, iov_ofs, buf, size); +memset(&buf[size], 0, sizeof(buf) - size); e1000x_inc_reg_if_not_full(core->mac, RUC); -min_iov.iov_base = filter_buf = min_buf; -min_iov.iov_len = size = sizeof(min_buf); +min_iov.iov_base = buf; +min_iov.iov_len = size = sizeof(buf); iovcnt = 1; iov = &min_iov; iov_ofs = 0; -} else if (iov->iov_len < maximum_ethernet_hdr_len) { -/* This is very unlikely, but may happen. 
*/ -iov_to_buf(iov, iovcnt, iov_ofs, min_buf, maximum_ethernet_hdr_len); -filter_buf = min_buf; +} else { +iov_to_buf(iov, iovcnt, iov_ofs, buf, ETH_HLEN + 4); } /* Discard oversized packets if !LPE and !SBP. */ @@ -1740,9 +1734,9 @@ e1000e_receive_internal(E1000ECore *core, const struct iovec *iov, int iovcnt, } net_rx_pkt_set_packet_type(core->rx_pkt, -get_eth_packet_type(PKT_GET_ETH_HDR(filter_buf))); +get_eth_packet_type(PKT_GET_ETH_HDR(buf))); -if (!e1000e_receive_filter(core, filter_buf, size)) { +if (!e1000e_receive_filter(core, buf, size)) { trace_e1000e_rx_flt_dropped(); return orig_size; } -- 2.40.0
[PATCH v4 18/48] e1000e: Always log status after building rx metadata
Without this change, the status flags may not be traced e.g. if checksum offloading is disabled. Signed-off-by: Akihiko Odaki Reviewed-by: Philippe Mathieu-Daudé --- hw/net/e1000e_core.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c index 38d465a203..6a213c0224 100644 --- a/hw/net/e1000e_core.c +++ b/hw/net/e1000e_core.c @@ -1244,9 +1244,8 @@ e1000e_build_rx_metadata(E1000ECore *core, trace_e1000e_rx_metadata_l4_cso_disabled(); } -trace_e1000e_rx_metadata_status_flags(*status_flags); - func_exit: +trace_e1000e_rx_metadata_status_flags(*status_flags); *status_flags = cpu_to_le32(*status_flags); } -- 2.40.0
[PATCH v4 25/48] igb: Share common VF constants
The constants need to be consistent between the PF and VF. Signed-off-by: Akihiko Odaki Reviewed-by: Philippe Mathieu-Daudé Reviewed-by: Sriram Yagnaraman --- hw/net/igb_common.h | 8 hw/net/igb.c| 10 +- hw/net/igbvf.c | 7 --- 3 files changed, 13 insertions(+), 12 deletions(-) diff --git a/hw/net/igb_common.h b/hw/net/igb_common.h index 69ac490f75..f2a9065791 100644 --- a/hw/net/igb_common.h +++ b/hw/net/igb_common.h @@ -28,6 +28,14 @@ #include "igb_regs.h" +#define TYPE_IGBVF "igbvf" + +#define IGBVF_MMIO_BAR_IDX (0) +#define IGBVF_MSIX_BAR_IDX (3) + +#define IGBVF_MMIO_SIZE (16 * 1024) +#define IGBVF_MSIX_SIZE (16 * 1024) + #define defreg(x) x = (E1000_##x >> 2) #define defreg_indexed(x, i) x##i = (E1000_##x(i) >> 2) #define defreg_indexeda(x, i) x##i##_A = (E1000_##x##_A(i) >> 2) diff --git a/hw/net/igb.c b/hw/net/igb.c index 51a7e9133e..1c989d7677 100644 --- a/hw/net/igb.c +++ b/hw/net/igb.c @@ -433,16 +433,16 @@ static void igb_pci_realize(PCIDevice *pci_dev, Error **errp) pcie_ari_init(pci_dev, 0x150, 1); -pcie_sriov_pf_init(pci_dev, IGB_CAP_SRIOV_OFFSET, "igbvf", +pcie_sriov_pf_init(pci_dev, IGB_CAP_SRIOV_OFFSET, TYPE_IGBVF, IGB_82576_VF_DEV_ID, IGB_MAX_VF_FUNCTIONS, IGB_MAX_VF_FUNCTIONS, IGB_VF_OFFSET, IGB_VF_STRIDE); -pcie_sriov_pf_init_vf_bar(pci_dev, 0, +pcie_sriov_pf_init_vf_bar(pci_dev, IGBVF_MMIO_BAR_IDX, PCI_BASE_ADDRESS_MEM_TYPE_64 | PCI_BASE_ADDRESS_MEM_PREFETCH, -16 * KiB); -pcie_sriov_pf_init_vf_bar(pci_dev, 3, +IGBVF_MMIO_SIZE); +pcie_sriov_pf_init_vf_bar(pci_dev, IGBVF_MSIX_BAR_IDX, PCI_BASE_ADDRESS_MEM_TYPE_64 | PCI_BASE_ADDRESS_MEM_PREFETCH, -16 * KiB); +IGBVF_MSIX_SIZE); igb_init_net_peer(s, pci_dev, macaddr); diff --git a/hw/net/igbvf.c b/hw/net/igbvf.c index 70beb7af50..284ea61184 100644 --- a/hw/net/igbvf.c +++ b/hw/net/igbvf.c @@ -50,15 +50,8 @@ #include "trace.h" #include "qapi/error.h" -#define TYPE_IGBVF "igbvf" OBJECT_DECLARE_SIMPLE_TYPE(IgbVfState, IGBVF) -#define IGBVF_MMIO_BAR_IDX (0) -#define IGBVF_MSIX_BAR_IDX (3) - -#define 
IGBVF_MMIO_SIZE (16 * 1024) -#define IGBVF_MSIX_SIZE (16 * 1024) - struct IgbVfState { PCIDevice parent_obj; -- 2.40.0
[PATCH v4 28/48] e1000e: Rename a variable in e1000e_receive_internal()
Rename variable "n" to "causes", which properly represents the content of the variable. Signed-off-by: Akihiko Odaki --- hw/net/e1000e_core.c | 18 +- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c index 7dce448657..aea70b74d9 100644 --- a/hw/net/e1000e_core.c +++ b/hw/net/e1000e_core.c @@ -1650,7 +1650,7 @@ static ssize_t e1000e_receive_internal(E1000ECore *core, const struct iovec *iov, int iovcnt, bool has_vnet) { -uint32_t n = 0; +uint32_t causes = 0; uint8_t buf[ETH_ZLEN]; struct iovec min_iov; size_t size, orig_size; @@ -1723,32 +1723,32 @@ e1000e_receive_internal(E1000ECore *core, const struct iovec *iov, int iovcnt, /* Perform small receive detection (RSRPD) */ if (total_size < core->mac[RSRPD]) { -n |= E1000_ICS_SRPD; +causes |= E1000_ICS_SRPD; } /* Perform ACK receive detection */ if (!(core->mac[RFCTL] & E1000_RFCTL_ACK_DIS) && (e1000e_is_tcp_ack(core, core->rx_pkt))) { -n |= E1000_ICS_ACK; +causes |= E1000_ICS_ACK; } /* Check if receive descriptor minimum threshold hit */ rdmts_hit = e1000e_rx_descr_threshold_hit(core, rxr.i); -n |= e1000e_rx_wb_interrupt_cause(core, rxr.i->idx, rdmts_hit); +causes |= e1000e_rx_wb_interrupt_cause(core, rxr.i->idx, rdmts_hit); trace_e1000e_rx_written_to_guest(rxr.i->idx); } else { -n |= E1000_ICS_RXO; +causes |= E1000_ICS_RXO; retval = 0; trace_e1000e_rx_not_written_to_guest(rxr.i->idx); } -if (!e1000e_intrmgr_delay_rx_causes(core, &n)) { -trace_e1000e_rx_interrupt_set(n); -e1000e_set_interrupt_cause(core, n); +if (!e1000e_intrmgr_delay_rx_causes(core, &causes)) { +trace_e1000e_rx_interrupt_set(causes); +e1000e_set_interrupt_cause(core, causes); } else { -trace_e1000e_rx_interrupt_delayed(n); +trace_e1000e_rx_interrupt_delayed(causes); } return retval; -- 2.40.0
[PATCH v4 27/48] igb: Clear EICR bits for delayed MSI-X interrupts
Section 7.3.4.1 says: > When auto-clear is enabled for an interrupt cause, the EICR bit is > set when a cause event mapped to this vector occurs. When the EITR > Counter reaches zero, the MSI-X message is sent on PCIe. Then the > EICR bit is cleared and enabled to be set by a new cause event Signed-off-by: Akihiko Odaki --- hw/net/igb_core.c | 21 - 1 file changed, 12 insertions(+), 9 deletions(-) diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c index 20645c4764..edda07e564 100644 --- a/hw/net/igb_core.c +++ b/hw/net/igb_core.c @@ -97,23 +97,31 @@ igb_lower_legacy_irq(IGBCore *core) pci_set_irq(core->owner, 0); } -static void igb_msix_notify(IGBCore *core, unsigned int vector) +static void igb_msix_notify(IGBCore *core, unsigned int cause) { PCIDevice *dev = core->owner; uint16_t vfn; +uint32_t effective_eiac; +unsigned int vector; -vfn = 8 - (vector + 2) / IGBVF_MSIX_VEC_NUM; +vfn = 8 - (cause + 2) / IGBVF_MSIX_VEC_NUM; if (vfn < pcie_sriov_num_vfs(core->owner)) { dev = pcie_sriov_get_vf_at_index(core->owner, vfn); assert(dev); -vector = (vector + 2) % IGBVF_MSIX_VEC_NUM; -} else if (vector >= IGB_MSIX_VEC_NUM) { +vector = (cause + 2) % IGBVF_MSIX_VEC_NUM; +} else if (cause >= IGB_MSIX_VEC_NUM) { qemu_log_mask(LOG_GUEST_ERROR, "igb: Tried to use vector unavailable for PF"); return; +} else { +vector = cause; } msix_notify(dev, vector); + +trace_e1000e_irq_icr_clear_eiac(core->mac[EICR], core->mac[EIAC]); +effective_eiac = core->mac[EIAC] & BIT(cause); +core->mac[EICR] &= ~effective_eiac; } static inline void @@ -1834,7 +1842,6 @@ igb_eitr_should_postpone(IGBCore *core, int idx) static void igb_send_msix(IGBCore *core) { uint32_t causes = core->mac[EICR] & core->mac[EIMS]; -uint32_t effective_eiac; int vector; for (vector = 0; vector < IGB_INTR_NUM; ++vector) { @@ -1842,10 +1849,6 @@ static void igb_send_msix(IGBCore *core) trace_e1000e_irq_msix_notify_vec(vector); igb_msix_notify(core, vector); - -trace_e1000e_irq_icr_clear_eiac(core->mac[EICR], core->mac[EIAC]); 
-effective_eiac = core->mac[EIAC] & BIT(vector); -core->mac[EICR] &= ~effective_eiac; } } } -- 2.40.0
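The auto-clear behaviour described in section 7.3.4.1 can be modelled in a few lines. This is a standalone sketch (a plain dict standing in for the register file, not QEMU code): on MSI-X notification, a cause's EICR bit is cleared only if its EIAC auto-clear bit is set.

```python
def msix_notify_cause(regs, cause):
    """Deliver the MSI-X message for `cause`, then auto-clear its EICR
    bit if EIAC enables auto-clear for that cause (illustrative model)."""
    # ... the MSI-X message would be sent here ...
    regs["EICR"] &= ~(regs["EIAC"] & (1 << cause))

regs = {"EICR": 0b101, "EIAC": 0b001}  # auto-clear enabled for cause 0 only
msix_notify_cause(regs, 0)
assert regs["EICR"] == 0b100           # cause 0 cleared on notification
msix_notify_cause(regs, 2)
assert regs["EICR"] == 0b100           # cause 2 stays set: no auto-clear
```

Moving this clear into igb_msix_notify(), as the patch does, ensures delayed (EITR-postponed) interrupts also clear EICR when the message is finally sent, not only the immediate ones.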
[PATCH v4 37/48] igb: Implement Tx SCTP CSO
Signed-off-by: Akihiko Odaki Reviewed-by: Sriram Yagnaraman --- hw/net/net_tx_pkt.h | 8 hw/net/igb_core.c | 12 +++- hw/net/net_tx_pkt.c | 18 ++ 3 files changed, 33 insertions(+), 5 deletions(-) diff --git a/hw/net/net_tx_pkt.h b/hw/net/net_tx_pkt.h index 4d7233e975..0a716e74a5 100644 --- a/hw/net/net_tx_pkt.h +++ b/hw/net/net_tx_pkt.h @@ -116,6 +116,14 @@ void net_tx_pkt_update_ip_checksums(struct NetTxPkt *pkt); */ void net_tx_pkt_update_ip_hdr_checksum(struct NetTxPkt *pkt); +/** + * Calculate the SCTP checksum. + * + * @pkt:packet + * + */ +bool net_tx_pkt_update_sctp_checksum(struct NetTxPkt *pkt); + /** * get length of all populated data. * diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c index 95d46d6e6d..5eacf1cd8c 100644 --- a/hw/net/igb_core.c +++ b/hw/net/igb_core.c @@ -440,8 +440,9 @@ igb_tx_insert_vlan(IGBCore *core, uint16_t qn, struct igb_tx *tx, static bool igb_setup_tx_offloads(IGBCore *core, struct igb_tx *tx) { +uint32_t idx = (tx->first_olinfo_status >> 4) & 1; + if (tx->first_cmd_type_len & E1000_ADVTXD_DCMD_TSE) { -uint32_t idx = (tx->first_olinfo_status >> 4) & 1; uint32_t mss = tx->ctx[idx].mss_l4len_idx >> E1000_ADVTXD_MSS_SHIFT; if (!net_tx_pkt_build_vheader(tx->tx_pkt, true, true, mss)) { return false; @@ -452,10 +453,11 @@ igb_setup_tx_offloads(IGBCore *core, struct igb_tx *tx) return true; } -if (tx->first_olinfo_status & E1000_ADVTXD_POTS_TXSM) { -if (!net_tx_pkt_build_vheader(tx->tx_pkt, false, true, 0)) { -return false; -} +if ((tx->first_olinfo_status & E1000_ADVTXD_POTS_TXSM) && +!((tx->ctx[idx].type_tucmd_mlhl & E1000_ADVTXD_TUCMD_L4T_SCTP) ? 
+ net_tx_pkt_update_sctp_checksum(tx->tx_pkt) : + net_tx_pkt_build_vheader(tx->tx_pkt, false, true, 0))) { +return false; } if (tx->first_olinfo_status & E1000_ADVTXD_POTS_IXSM) { diff --git a/hw/net/net_tx_pkt.c b/hw/net/net_tx_pkt.c index af8f77a3f0..2e5f58b3c9 100644 --- a/hw/net/net_tx_pkt.c +++ b/hw/net/net_tx_pkt.c @@ -16,6 +16,7 @@ */ #include "qemu/osdep.h" +#include "qemu/crc32c.h" #include "net/eth.h" #include "net/checksum.h" #include "net/tap.h" @@ -135,6 +136,23 @@ void net_tx_pkt_update_ip_checksums(struct NetTxPkt *pkt) pkt->virt_hdr.csum_offset, &csum, sizeof(csum)); } +bool net_tx_pkt_update_sctp_checksum(struct NetTxPkt *pkt) +{ +uint32_t csum = 0; +struct iovec *pl_start_frag = pkt->vec + NET_TX_PKT_PL_START_FRAG; + +if (iov_from_buf(pl_start_frag, pkt->payload_frags, 8, &csum, sizeof(csum)) < sizeof(csum)) { +return false; +} + +csum = cpu_to_le32(iov_crc32c(0x, pl_start_frag, pkt->payload_frags)); +if (iov_from_buf(pl_start_frag, pkt->payload_frags, 8, &csum, sizeof(csum)) < sizeof(csum)) { +return false; +} + +return true; +} + static void net_tx_pkt_calculate_hdr_len(struct NetTxPkt *pkt) { pkt->hdr_len = pkt->vec[NET_TX_PKT_L2HDR_FRAG].iov_len + -- 2.40.0
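net_tx_pkt_update_sctp_checksum() builds on QEMU's iov_crc32c(); the underlying computation can be sketched in isolation. This is an illustrative model, not the QEMU implementation: the SCTP checksum is CRC-32C over the packet with the 4-byte checksum field at offset 8 zeroed, and the result is stored back at offset 8 in little-endian byte order.

```python
def crc32c(data: bytes) -> int:
    # Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78.
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

def sctp_fill_checksum(packet: bytearray) -> None:
    # Zero the 4-byte checksum at offset 8, CRC-32C the whole SCTP
    # packet, then store the result little-endian back at offset 8.
    packet[8:12] = b"\x00" * 4
    packet[8:12] = crc32c(bytes(packet)).to_bytes(4, "little")

assert crc32c(b"123456789") == 0xE3069283  # standard CRC-32C check value
```

A receiver can verify by zeroing the field again, recomputing, and comparing against the stored value.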
[PATCH v4 20/48] igb: Remove goto
The goto is a bit confusing as it changes the control flow only if the L4 protocol is not recognized. It also differs from e1000e, which makes comparing e1000e and igb noisier. Signed-off-by: Akihiko Odaki Reviewed-by: Sriram Yagnaraman --- hw/net/igb_core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c index 946b917f91..bae51cbb63 100644 --- a/hw/net/igb_core.c +++ b/hw/net/igb_core.c @@ -1297,7 +1297,7 @@ igb_build_rx_metadata(IGBCore *core, break; default: -goto func_exit; +break; } } else { trace_e1000e_rx_metadata_l4_cso_disabled(); -- 2.40.0
[PATCH v4 04/48] igb: Fix Rx packet type encoding
igb's advanced descriptor uses a packet type encoding different from one used in e1000e's extended descriptor. Fix the logic to encode Rx packet type accordingly. Fixes: 3a977deebe ("Intrdocue igb device emulation") Signed-off-by: Akihiko Odaki Reviewed-by: Sriram Yagnaraman --- hw/net/igb_regs.h | 5 + hw/net/igb_core.c | 38 +++--- 2 files changed, 24 insertions(+), 19 deletions(-) diff --git a/hw/net/igb_regs.h b/hw/net/igb_regs.h index c5c5b3c3b8..21ee9a3b2d 100644 --- a/hw/net/igb_regs.h +++ b/hw/net/igb_regs.h @@ -641,6 +641,11 @@ union e1000_adv_rx_desc { #define E1000_STATUS_NUM_VFS_SHIFT 14 +#define E1000_ADVRXD_PKT_IP4 BIT(4) +#define E1000_ADVRXD_PKT_IP6 BIT(6) +#define E1000_ADVRXD_PKT_TCP BIT(8) +#define E1000_ADVRXD_PKT_UDP BIT(9) + static inline uint8_t igb_ivar_entry_rx(uint8_t i) { return i < 8 ? i * 4 : (i - 8) * 4 + 2; diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c index 464a41d0aa..dbd1192a8e 100644 --- a/hw/net/igb_core.c +++ b/hw/net/igb_core.c @@ -1227,7 +1227,6 @@ igb_build_rx_metadata(IGBCore *core, struct virtio_net_hdr *vhdr; bool hasip4, hasip6; EthL4HdrProto l4hdr_proto; -uint32_t pkt_type; *status_flags = E1000_RXD_STAT_DD; @@ -1266,28 +1265,29 @@ igb_build_rx_metadata(IGBCore *core, trace_e1000e_rx_metadata_ack(); } -if (hasip6 && (core->mac[RFCTL] & E1000_RFCTL_IPV6_DIS)) { -trace_e1000e_rx_metadata_ipv6_filtering_disabled(); -pkt_type = E1000_RXD_PKT_MAC; -} else if (l4hdr_proto == ETH_L4_HDR_PROTO_TCP || - l4hdr_proto == ETH_L4_HDR_PROTO_UDP) { -pkt_type = hasip4 ? E1000_RXD_PKT_IP4_XDP : E1000_RXD_PKT_IP6_XDP; -} else if (hasip4 || hasip6) { -pkt_type = hasip4 ? E1000_RXD_PKT_IP4 : E1000_RXD_PKT_IP6; -} else { -pkt_type = E1000_RXD_PKT_MAC; -} +if (pkt_info) { +*pkt_info = rss_info->enabled ? 
rss_info->type : 0; -trace_e1000e_rx_metadata_pkt_type(pkt_type); +if (hasip4) { +*pkt_info |= E1000_ADVRXD_PKT_IP4; +} -if (pkt_info) { -if (rss_info->enabled) { -*pkt_info = rss_info->type; +if (hasip6) { +*pkt_info |= E1000_ADVRXD_PKT_IP6; } -*pkt_info |= (pkt_type << 4); -} else { -*status_flags |= E1000_RXD_PKT_TYPE(pkt_type); +switch (l4hdr_proto) { +case ETH_L4_HDR_PROTO_TCP: +*pkt_info |= E1000_ADVRXD_PKT_TCP; +break; + +case ETH_L4_HDR_PROTO_UDP: +*pkt_info |= E1000_ADVRXD_PKT_UDP; +break; + +default: +break; +} } if (hdr_info) { -- 2.40.0
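The key difference the patch fixes: e1000e's extended descriptor encodes the packet type as a single enumerated value, while igb's advanced descriptor ORs together independent flag bits. A sketch of the new bit assembly, using the E1000_ADVRXD_PKT_* positions the patch defines (`rss_type` and `l4proto` are hypothetical inputs for illustration):

```python
# Bit positions from the patch's E1000_ADVRXD_PKT_* definitions.
PKT_IP4, PKT_IP6, PKT_TCP, PKT_UDP = 1 << 4, 1 << 6, 1 << 8, 1 << 9

def adv_pkt_info(rss_type, hasip4, hasip6, l4proto):
    # Advanced descriptors OR independent flag bits together,
    # unlike e1000e's single enumerated packet-type field.
    info = rss_type
    if hasip4:
        info |= PKT_IP4
    if hasip6:
        info |= PKT_IP6
    if l4proto == "tcp":
        info |= PKT_TCP
    elif l4proto == "udp":
        info |= PKT_UDP
    return info

assert adv_pkt_info(0, True, False, "tcp") == PKT_IP4 | PKT_TCP
assert adv_pkt_info(0, False, False, None) == 0  # L2-only frame
```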
[PATCH v4 30/48] net/eth: Use void pointers
The uses of uint8_t pointers were misleading as the buffers are never accessed as arrays of octets, and accessing them as struct eth_header even requires stricter alignment than uint8_t implies. Signed-off-by: Akihiko Odaki Reviewed-by: Philippe Mathieu-Daudé --- include/net/eth.h | 4 ++-- net/eth.c | 6 +++--- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/include/net/eth.h b/include/net/eth.h index 05f56931e7..95ff24d6b8 100644 --- a/include/net/eth.h +++ b/include/net/eth.h @@ -342,12 +342,12 @@ eth_get_pkt_tci(const void *p) size_t eth_strip_vlan(const struct iovec *iov, int iovcnt, size_t iovoff, - uint8_t *new_ehdr_buf, + void *new_ehdr_buf, uint16_t *payload_offset, uint16_t *tci); size_t eth_strip_vlan_ex(const struct iovec *iov, int iovcnt, size_t iovoff, - uint16_t vet, uint8_t *new_ehdr_buf, + uint16_t vet, void *new_ehdr_buf, uint16_t *payload_offset, uint16_t *tci); uint16_t diff --git a/net/eth.c b/net/eth.c index b6ff89c460..f7ffbda600 100644 --- a/net/eth.c +++ b/net/eth.c @@ -226,11 +226,11 @@ void eth_get_protocols(const struct iovec *iov, size_t iovcnt, size_t iovoff, size_t eth_strip_vlan(const struct iovec *iov, int iovcnt, size_t iovoff, - uint8_t *new_ehdr_buf, + void *new_ehdr_buf, uint16_t *payload_offset, uint16_t *tci) { struct vlan_header vlan_hdr; -struct eth_header *new_ehdr = (struct eth_header *) new_ehdr_buf; +struct eth_header *new_ehdr = new_ehdr_buf; size_t copied = iov_to_buf(iov, iovcnt, iovoff, new_ehdr, sizeof(*new_ehdr)); @@ -276,7 +276,7 @@ eth_strip_vlan(const struct iovec *iov, int iovcnt, size_t iovoff, size_t eth_strip_vlan_ex(const struct iovec *iov, int iovcnt, size_t iovoff, - uint16_t vet, uint8_t *new_ehdr_buf, + uint16_t vet, void *new_ehdr_buf, uint16_t *payload_offset, uint16_t *tci) { struct vlan_header vlan_hdr; -- 2.40.0
[PATCH v4 12/48] tests/avocado: Remove test_igb_nomsi_kvm
Testing with KVM is unlikely to find more bugs, so remove test_igb_nomsi_kvm to save test run time. Signed-off-by: Akihiko Odaki Reviewed-by: Thomas Huth Acked-by: Alex Bennée --- tests/avocado/netdev-ethtool.py | 12 +--- 1 file changed, 1 insertion(+), 11 deletions(-) diff --git a/tests/avocado/netdev-ethtool.py b/tests/avocado/netdev-ethtool.py index 8de118e313..6da800f62b 100644 --- a/tests/avocado/netdev-ethtool.py +++ b/tests/avocado/netdev-ethtool.py @@ -29,7 +29,7 @@ def get_asset(self, name, sha1): # URL into a unique one return self.fetch_asset(name=name, locations=(url), asset_hash=sha1) -def common_test_code(self, netdev, extra_args=None, kvm=False): +def common_test_code(self, netdev, extra_args=None): # This custom kernel has drivers for all the supported network # devices we can emulate in QEMU @@ -57,9 +57,6 @@ def common_test_code(self, netdev, extra_args=None, kvm=False): '-drive', drive, '-device', netdev) -if kvm: -self.vm.add_args('-accel', 'kvm') - self.vm.set_console(console_index=0) self.vm.launch() @@ -86,13 +83,6 @@ def test_igb_nomsi(self): """ self.common_test_code("igb", "pci=nomsi") -def test_igb_nomsi_kvm(self): -""" -:avocado: tags=device:igb -""" -self.require_accelerator('kvm') -self.common_test_code("igb", "pci=nomsi", True) - # It seems the other popular cards we model in QEMU currently fail # the pattern test with: # -- 2.40.0
[PATCH v4 05/48] igb: Do not require CTRL.VME for tx VLAN tagging
While the datasheet of e1000e says it checks CTRL.VME for tx VLAN tagging, igb's datasheet has no such statement. It also says for "CTRL.VLE": > This register only affects the VLAN Strip in Rx it does not have any > influence in the Tx path in the 82576. (Appendix A. Changes from the 82575) There is no "CTRL.VLE", so it is more likely a typo for CTRL.VME. Fixes: fba7c3b788 ("igb: respect VMVIR and VMOLR for VLAN") Signed-off-by: Akihiko Odaki Reviewed-by: Sriram Yagnaraman --- hw/net/igb_core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c index dbd1192a8e..96a118b6c1 100644 --- a/hw/net/igb_core.c +++ b/hw/net/igb_core.c @@ -402,7 +402,7 @@ igb_tx_insert_vlan(IGBCore *core, uint16_t qn, struct igb_tx *tx, } } -if (insert_vlan && e1000x_vlan_enabled(core->mac)) { +if (insert_vlan) { net_tx_pkt_setup_vlan_header_ex(tx->tx_pkt, vlan, core->mac[VET] & 0x); } -- 2.40.0
[PATCH v4 26/48] igb: Fix igb_mac_reg_init coding style alignment
Signed-off-by: Akihiko Odaki Reviewed-by: Philippe Mathieu-Daudé --- hw/net/igb_core.c | 96 +++ 1 file changed, 48 insertions(+), 48 deletions(-) diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c index 56a53872cf..20645c4764 100644 --- a/hw/net/igb_core.c +++ b/hw/net/igb_core.c @@ -4027,54 +4027,54 @@ static const uint32_t igb_mac_reg_init[] = { [VMOLR0 ... VMOLR0 + 7] = 0x2600 | E1000_VMOLR_STRCRC, [RPLOLR]= E1000_RPLOLR_STRCRC, [RLPML] = 0x2600, -[TXCTL0] = E1000_DCA_TXCTRL_DATA_RRO_EN | - E1000_DCA_TXCTRL_TX_WB_RO_EN | - E1000_DCA_TXCTRL_DESC_RRO_EN, -[TXCTL1] = E1000_DCA_TXCTRL_DATA_RRO_EN | - E1000_DCA_TXCTRL_TX_WB_RO_EN | - E1000_DCA_TXCTRL_DESC_RRO_EN, -[TXCTL2] = E1000_DCA_TXCTRL_DATA_RRO_EN | - E1000_DCA_TXCTRL_TX_WB_RO_EN | - E1000_DCA_TXCTRL_DESC_RRO_EN, -[TXCTL3] = E1000_DCA_TXCTRL_DATA_RRO_EN | - E1000_DCA_TXCTRL_TX_WB_RO_EN | - E1000_DCA_TXCTRL_DESC_RRO_EN, -[TXCTL4] = E1000_DCA_TXCTRL_DATA_RRO_EN | - E1000_DCA_TXCTRL_TX_WB_RO_EN | - E1000_DCA_TXCTRL_DESC_RRO_EN, -[TXCTL5] = E1000_DCA_TXCTRL_DATA_RRO_EN | - E1000_DCA_TXCTRL_TX_WB_RO_EN | - E1000_DCA_TXCTRL_DESC_RRO_EN, -[TXCTL6] = E1000_DCA_TXCTRL_DATA_RRO_EN | - E1000_DCA_TXCTRL_TX_WB_RO_EN | - E1000_DCA_TXCTRL_DESC_RRO_EN, -[TXCTL7] = E1000_DCA_TXCTRL_DATA_RRO_EN | - E1000_DCA_TXCTRL_TX_WB_RO_EN | - E1000_DCA_TXCTRL_DESC_RRO_EN, -[TXCTL8] = E1000_DCA_TXCTRL_DATA_RRO_EN | - E1000_DCA_TXCTRL_TX_WB_RO_EN | - E1000_DCA_TXCTRL_DESC_RRO_EN, -[TXCTL9] = E1000_DCA_TXCTRL_DATA_RRO_EN | - E1000_DCA_TXCTRL_TX_WB_RO_EN | - E1000_DCA_TXCTRL_DESC_RRO_EN, -[TXCTL10] = E1000_DCA_TXCTRL_DATA_RRO_EN | - E1000_DCA_TXCTRL_TX_WB_RO_EN | - E1000_DCA_TXCTRL_DESC_RRO_EN, -[TXCTL11] = E1000_DCA_TXCTRL_DATA_RRO_EN | - E1000_DCA_TXCTRL_TX_WB_RO_EN | - E1000_DCA_TXCTRL_DESC_RRO_EN, -[TXCTL12] = E1000_DCA_TXCTRL_DATA_RRO_EN | - E1000_DCA_TXCTRL_TX_WB_RO_EN | - E1000_DCA_TXCTRL_DESC_RRO_EN, -[TXCTL13] = E1000_DCA_TXCTRL_DATA_RRO_EN | - E1000_DCA_TXCTRL_TX_WB_RO_EN | - E1000_DCA_TXCTRL_DESC_RRO_EN, -[TXCTL14] = 
E1000_DCA_TXCTRL_DATA_RRO_EN | - E1000_DCA_TXCTRL_TX_WB_RO_EN | - E1000_DCA_TXCTRL_DESC_RRO_EN, -[TXCTL15] = E1000_DCA_TXCTRL_DATA_RRO_EN | - E1000_DCA_TXCTRL_TX_WB_RO_EN | - E1000_DCA_TXCTRL_DESC_RRO_EN, +[TXCTL0]= E1000_DCA_TXCTRL_DATA_RRO_EN | + E1000_DCA_TXCTRL_TX_WB_RO_EN | + E1000_DCA_TXCTRL_DESC_RRO_EN, +[TXCTL1]= E1000_DCA_TXCTRL_DATA_RRO_EN | + E1000_DCA_TXCTRL_TX_WB_RO_EN | + E1000_DCA_TXCTRL_DESC_RRO_EN, +[TXCTL2]= E1000_DCA_TXCTRL_DATA_RRO_EN | + E1000_DCA_TXCTRL_TX_WB_RO_EN | + E1000_DCA_TXCTRL_DESC_RRO_EN, +[TXCTL3]= E1000_DCA_TXCTRL_DATA_RRO_EN | + E1000_DCA_TXCTRL_TX_WB_RO_EN | + E1000_DCA_TXCTRL_DESC_RRO_EN, +[TXCTL4]= E1000_DCA_TXCTRL_DATA_RRO_EN | + E1000_DCA_TXCTRL_TX_WB_RO_EN | + E1000_DCA_TXCTRL_DESC_RRO_EN, +[TXCTL5]= E1000_DCA_TXCTRL_DATA_RRO_EN | + E1000_DCA_TXCTRL_TX_WB_RO_EN | + E1000_DCA_TXCTRL_DESC_RRO_EN, +[TXCTL6]= E1000_DCA_TXCTRL_DATA_RRO_EN | + E1000_DCA_TXCTRL_TX_WB_RO_EN | + E1000_DCA_TXCTRL_DESC_RRO_EN, +[TXCTL7]= E1000_DCA_TXCTRL_DATA_RRO_EN | + E1000_DCA_TXCTRL_TX_WB_RO_EN | + E1000_DCA_TXCTRL_DESC_RRO_EN, +[TXCTL8]= E1000_DCA_TXCTRL_DATA_RRO_EN | + E1000_DCA_TXCTRL_TX_WB_RO_EN | + E1000_DCA_TXCTRL_DESC_RRO_EN, +[TXCTL9]= E1000_DCA_TXCTRL_DATA_RRO_EN | + E1000_DCA_TXCTRL_TX_WB_RO_EN | + E1000_DCA_TXCTRL_DESC_RRO_EN, +[TXCTL10] = E1000_DCA_TXCTRL_DATA_RRO_EN | + E1000_DCA_TXCTRL_TX_WB_RO_EN | + E1000_DCA_TXCTRL_DESC_RRO_EN, +[TXCTL11] = E1000_DCA_TXCTRL_DATA_RRO_EN | + E1000_DCA_TXCTRL_TX_WB_RO_EN | + E1000_DCA_TXCTRL_DESC_RRO_EN, +
[PATCH v4 16/48] e1000x: Take CRC into consideration for size check
Section 13.7.15 Receive Length Error Count says: > Packets over 1522 bytes are oversized if LongPacketEnable is 0b > (RCTL.LPE). If LongPacketEnable (LPE) is 1b, then an incoming packet > is considered oversized if it exceeds 16384 bytes. > These lengths are based on bytes in the received packet from > through , inclusively. As QEMU processes packets without CRC, the number of bytes for CRC needs to be subtracted. This change adds size definitions to eth.h, used to derive the new size thresholds. Signed-off-by: Akihiko Odaki --- include/net/eth.h | 2 ++ hw/net/e1000x_common.c | 10 +- 2 files changed, 7 insertions(+), 5 deletions(-) diff --git a/include/net/eth.h b/include/net/eth.h index e8af5742be..05f56931e7 100644 --- a/include/net/eth.h +++ b/include/net/eth.h @@ -32,6 +32,8 @@ #define ETH_ALEN 6 #define ETH_HLEN 14 #define ETH_ZLEN 60 /* Min. octets in frame without FCS */ +#define ETH_FCS_LEN 4 +#define ETH_MTU 1500 struct eth_header { uint8_t h_dest[ETH_ALEN]; /* destination eth addr */ diff --git a/hw/net/e1000x_common.c b/hw/net/e1000x_common.c index 6cc23138a8..212873fd77 100644 --- a/hw/net/e1000x_common.c +++ b/hw/net/e1000x_common.c @@ -140,16 +140,16 @@ bool e1000x_hw_rx_enabled(uint32_t *mac) bool e1000x_is_oversized(uint32_t *mac, size_t size) { +size_t header_size = sizeof(struct eth_header) + sizeof(struct vlan_header); /* this is the size past which hardware will drop packets when setting LPE=0 */ -static const int maximum_ethernet_vlan_size = 1522; +size_t maximum_short_size = header_size + ETH_MTU; /* this is the size past which hardware will drop packets when setting LPE=1 */ -static const int maximum_ethernet_lpe_size = 16 * KiB; +size_t maximum_large_size = 16 * KiB - ETH_FCS_LEN; -if ((size > maximum_ethernet_lpe_size || -(size > maximum_ethernet_vlan_size -&& !(mac[RCTL] & E1000_RCTL_LPE))) +if ((size > maximum_large_size || +(size > maximum_short_size && !(mac[RCTL] & E1000_RCTL_LPE))) && !(mac[RCTL] & E1000_RCTL_SBP)) {
e1000x_inc_reg_if_not_full(mac, ROC); trace_e1000x_rx_oversized(size); -- 2.40.0
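The arithmetic behind the new thresholds can be checked directly: since QEMU handles frames without the 4-byte FCS, the on-wire limits (1522 and 16384 bytes) each shrink by ETH_FCS_LEN, and 1522 decomposes as a 14-byte Ethernet header plus a 4-byte VLAN header plus the 1500-byte MTU plus the FCS. A sketch mirroring the patched e1000x_is_oversized() logic:

```python
ETH_HLEN, VLAN_HLEN, ETH_MTU, ETH_FCS_LEN = 14, 4, 1500, 4

header_size = ETH_HLEN + VLAN_HLEN
maximum_short_size = header_size + ETH_MTU    # 1522 on the wire, minus FCS
maximum_large_size = 16 * 1024 - ETH_FCS_LEN  # 16384 on the wire, minus FCS

def is_oversized(size, lpe, sbp):
    # LPE selects the large limit; SBP (store bad packets) overrides.
    limit = maximum_large_size if lpe else maximum_short_size
    return size > limit and not sbp

assert maximum_short_size == 1518
assert is_oversized(1519, lpe=False, sbp=False)
assert not is_oversized(1519, lpe=True, sbp=False)
```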
Re: [PATCH] multifd: Avoid busy-wait in multifd_send_pages()
On 26/04/23 3:58 pm, Juan Quintela wrote: "manish.mishra" wrote: multifd_send_sync_main() posts a request on the multifd channel but does not call sem_wait() on the channels_ready semaphore, making the channels_ready semaphore count keep increasing. As a result, sem_wait() on channels_ready in multifd_send_pages() is always non-blocking, hence multifd_send_pages() keeps searching for a free channel in a busy loop until a channel is freed. Signed-off-by: manish.mishra --- migration/multifd.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/migration/multifd.c b/migration/multifd.c index cce3ad6988..43d26e7012 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -615,6 +615,7 @@ int multifd_send_sync_main(QEMUFile *f) trace_multifd_send_sync_main_signal(p->id); +qemu_sem_wait(&multifd_send_state->channels_ready); qemu_mutex_lock(&p->mutex); if (p->quit) { We need this, but I think it is better to put it on the second loop. @@ -919,7 +920,7 @@ int multifd_save_setup(Error **errp) multifd_send_state = g_malloc0(sizeof(*multifd_send_state)); multifd_send_state->params = g_new0(MultiFDSendParams, thread_count); multifd_send_state->pages = multifd_pages_init(page_count); -qemu_sem_init(&multifd_send_state->channels_ready, 0); +qemu_sem_init(&multifd_send_state->channels_ready, thread_count); qatomic_set(&multifd_send_state->exiting, 0); multifd_send_state->ops = multifd_ops[migrate_multifd_compression()]; I think this bit is wrong. We should not set the channels ready until the thread is ready and channel is created. What do you think about this patch: From bcb0ef9b97b858458c403d2e4dc9e0dbd96721b3 Mon Sep 17 00:00:00 2001 From: Juan Quintela Date: Wed, 26 Apr 2023 12:20:36 +0200 Subject: [PATCH] multifd: Fix the number of channels ready We don't wait in the sem when we are doing a sync_main. Make it wait there. To make things clearer, we mark the channel ready at the beginning of the thread loop. This causes a busy loop in multifd_send_page().
Found-by: manish.mishra Signed-off-by: Juan Quintela --- migration/multifd.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/migration/multifd.c b/migration/multifd.c index 903df2117b..e625e8725e 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -635,6 +635,7 @@ int multifd_send_sync_main(QEMUFile *f) for (i = 0; i < migrate_multifd_channels(); i++) { MultiFDSendParams *p = &multifd_send_state->params[i]; +qemu_sem_wait(&multifd_send_state->channels_ready); trace_multifd_send_sync_main_wait(p->id); qemu_sem_wait(&p->sem_sync); @@ -668,6 +669,7 @@ static void *multifd_send_thread(void *opaque) p->num_packets = 1; while (true) { +qemu_sem_post(&multifd_send_state->channels_ready); This has one issue though: if we mark channels_ready here, the channel is not actually ready yet, so we can still busy loop? Maybe we can keep this sem_post in the while loop at its current position, but add another post just before the start of the while loop; since that runs only once, it would initialize the count to the number of multifd channels? qemu_sem_wait(&p->sem); if (qatomic_read(&multifd_send_state->exiting)) { @@ -736,7 +738,6 @@ static void *multifd_send_thread(void *opaque) if (flags & MULTIFD_FLAG_SYNC) { qemu_sem_post(&p->sem_sync); } -qemu_sem_post(&multifd_send_state->channels_ready); } else if (p->quit) { qemu_mutex_unlock(&p->mutex); break;
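The busy-wait mechanism under discussion can be demonstrated with a toy counting model (not the QEMU code): if the sync path posts channels_ready without a matching wait, the semaphore count grows without bound and the wait in the send path becomes permanently non-blocking, degenerating into a busy channel search.

```python
import threading

sem = threading.Semaphore(0)

posts_per_iteration = 2   # e.g. the thread loop and the sync path both post
waits_per_iteration = 1   # but only multifd_send_pages() waits

count = 0
for _ in range(100):
    for _ in range(posts_per_iteration):
        sem.release()
        count += 1
    for _ in range(waits_per_iteration):
        # Non-blocking acquire always succeeds: permits leak each round.
        assert sem.acquire(blocking=False)
        count -= 1

assert count == 100  # leaked permits after 100 iterations
```

With posts and waits balanced (one post per channel-ready event, one wait per dispatched page batch plus one per sync), the count stays bounded and the wait actually blocks when no channel is free.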
[PATCH v4 44/48] igb: Notify only new interrupts
This follows the corresponding change for e1000e. This fixes: tests/avocado/netdev-ethtool.py:NetDevEthtool.test_igb Signed-off-by: Akihiko Odaki --- hw/net/igb_core.c | 201 -- hw/net/trace-events | 11 +- .../org.centos/stream/8/x86_64/test-avocado | 1 + tests/avocado/netdev-ethtool.py | 4 - 4 files changed, 87 insertions(+), 130 deletions(-) diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c index 57973c3ae4..b68e24c9ee 100644 --- a/hw/net/igb_core.c +++ b/hw/net/igb_core.c @@ -94,10 +94,7 @@ static ssize_t igb_receive_internal(IGBCore *core, const struct iovec *iov, int iovcnt, bool has_vnet, bool *external_tx); -static inline void -igb_set_interrupt_cause(IGBCore *core, uint32_t val); - -static void igb_update_interrupt_state(IGBCore *core); +static void igb_raise_interrupts(IGBCore *core, size_t index, uint32_t causes); static void igb_reset(IGBCore *core, bool sw); static inline void @@ -913,8 +910,8 @@ igb_start_xmit(IGBCore *core, const IGB_TxRing *txr) } if (eic) { -core->mac[EICR] |= eic; -igb_set_interrupt_cause(core, E1000_ICR_TXDW); +igb_raise_interrupts(core, EICR, eic); +igb_raise_interrupts(core, ICR, E1000_ICR_TXDW); } net_tx_pkt_reset(txr->tx->tx_pkt, net_tx_pkt_unmap_frag_pci, d); @@ -1686,6 +1683,7 @@ igb_receive_internal(IGBCore *core, const struct iovec *iov, int iovcnt, { uint16_t queues = 0; uint32_t causes = 0; +uint32_t ecauses = 0; union { L2Header l2_header; uint8_t octets[ETH_ZLEN]; @@ -1788,13 +1786,14 @@ igb_receive_internal(IGBCore *core, const struct iovec *iov, int iovcnt, causes |= E1000_ICS_RXDMT0; } -core->mac[EICR] |= igb_rx_wb_eic(core, rxr.i->idx); +ecauses |= igb_rx_wb_eic(core, rxr.i->idx); trace_e1000e_rx_written_to_guest(rxr.i->idx); } trace_e1000e_rx_interrupt_set(causes); -igb_set_interrupt_cause(core, causes); +igb_raise_interrupts(core, EICR, ecauses); +igb_raise_interrupts(core, ICR, causes); return orig_size; } @@ -1854,7 +1853,7 @@ void igb_core_set_link_status(IGBCore *core) } if (core->mac[STATUS] != old_status) { 
-igb_set_interrupt_cause(core, E1000_ICR_LSC); +igb_raise_interrupts(core, ICR, E1000_ICR_LSC); } } @@ -1934,13 +1933,6 @@ igb_set_rx_control(IGBCore *core, int index, uint32_t val) } } -static inline void -igb_clear_ims_bits(IGBCore *core, uint32_t bits) -{ -trace_e1000e_irq_clear_ims(bits, core->mac[IMS], core->mac[IMS] & ~bits); -core->mac[IMS] &= ~bits; -} - static inline bool igb_postpone_interrupt(IGBIntrDelayTimer *timer) { @@ -1963,9 +1955,8 @@ igb_eitr_should_postpone(IGBCore *core, int idx) return igb_postpone_interrupt(&core->eitr[idx]); } -static void igb_send_msix(IGBCore *core) +static void igb_send_msix(IGBCore *core, uint32_t causes) { -uint32_t causes = core->mac[EICR] & core->mac[EIMS]; int vector; for (vector = 0; vector < IGB_INTR_NUM; ++vector) { @@ -1988,124 +1979,116 @@ igb_fix_icr_asserted(IGBCore *core) trace_e1000e_irq_fix_icr_asserted(core->mac[ICR]); } -static void -igb_update_interrupt_state(IGBCore *core) +static void igb_raise_interrupts(IGBCore *core, size_t index, uint32_t causes) { -uint32_t icr; -uint32_t causes; +uint32_t old_causes = core->mac[ICR] & core->mac[IMS]; +uint32_t old_ecauses = core->mac[EICR] & core->mac[EIMS]; +uint32_t raised_causes; +uint32_t raised_ecauses; uint32_t int_alloc; -icr = core->mac[ICR] & core->mac[IMS]; +trace_e1000e_irq_set(index << 2, + core->mac[index], core->mac[index] | causes); + +core->mac[index] |= causes; if (core->mac[GPIE] & E1000_GPIE_MSIX_MODE) { -if (icr) { -causes = 0; -if (icr & E1000_ICR_DRSTA) { -int_alloc = core->mac[IVAR_MISC] & 0xff; -if (int_alloc & E1000_IVAR_VALID) { -causes |= BIT(int_alloc & 0x1f); -} +raised_causes = core->mac[ICR] & core->mac[IMS] & ~old_causes; + +if (raised_causes & E1000_ICR_DRSTA) { +int_alloc = core->mac[IVAR_MISC] & 0xff; +if (int_alloc & E1000_IVAR_VALID) { +core->mac[EICR] |= BIT(int_alloc & 0x1f); } -/* Check if other bits (excluding the TCP Timer) are enabled. 
*/ -if (icr & ~E1000_ICR_DRSTA) { -int_alloc = (core->mac[IVAR_MISC] >> 8) & 0xff; -if (int_alloc & E1000_IVAR_VALID) { -causes |= BIT(int_alloc & 0x1f); -} -trace_e1000e_irq_add_msi_other(core->mac[EICR]); +} +/* Check if other bits (excluding the TCP Timer) are enabled. */ +if (raised_causes & ~E1000_ICR_DRSTA)
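The core of igb_raise_interrupts() is edge detection: record which causes were already pending and unmasked, set the new cause bits, and notify only for causes that newly became both pending and unmasked. A minimal sketch of that computation (dict in place of the register file):

```python
def raise_interrupts(regs, reg, causes):
    """Set cause bits in `reg` and return only the causes that newly
    became both pending and unmasked (edge detection sketch)."""
    old = regs[reg] & regs["IMS"]
    regs[reg] |= causes
    return regs[reg] & regs["IMS"] & ~old

regs = {"ICR": 0, "IMS": 0b11}
assert raise_interrupts(regs, "ICR", 0b01) == 0b01  # new cause: notify
assert raise_interrupts(regs, "ICR", 0b01) == 0     # already pending: silent
assert raise_interrupts(regs, "ICR", 0b10) == 0b10  # another new cause
```

An already-pending cause thus never re-fires a notification until it is cleared, which is the behaviour the referenced avocado test depends on.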
[PATCH v4 15/48] e1000x: Share more Rx filtering logic
This saves some code and enables tracepoint for e1000's VLAN filtering. Signed-off-by: Akihiko Odaki Reviewed-by: Sriram Yagnaraman --- hw/net/e1000x_common.h | 4 +++- hw/net/e1000.c | 35 +-- hw/net/e1000e_core.c | 47 +- hw/net/e1000x_common.c | 44 +-- hw/net/igb_core.c | 41 +++- hw/net/trace-events| 4 ++-- 6 files changed, 56 insertions(+), 119 deletions(-) diff --git a/hw/net/e1000x_common.h b/hw/net/e1000x_common.h index 0298e06283..be291684de 100644 --- a/hw/net/e1000x_common.h +++ b/hw/net/e1000x_common.h @@ -107,7 +107,9 @@ bool e1000x_rx_ready(PCIDevice *d, uint32_t *mac); bool e1000x_is_vlan_packet(const void *buf, uint16_t vet); -bool e1000x_rx_group_filter(uint32_t *mac, const uint8_t *buf); +bool e1000x_rx_vlan_filter(uint32_t *mac, const struct vlan_header *vhdr); + +bool e1000x_rx_group_filter(uint32_t *mac, const struct eth_header *ehdr); bool e1000x_hw_rx_enabled(uint32_t *mac); diff --git a/hw/net/e1000.c b/hw/net/e1000.c index 18eb6d8876..aae5f0bdc0 100644 --- a/hw/net/e1000.c +++ b/hw/net/e1000.c @@ -804,36 +804,11 @@ start_xmit(E1000State *s) } static int -receive_filter(E1000State *s, const uint8_t *buf, int size) +receive_filter(E1000State *s, const void *buf) { -uint32_t rctl = s->mac_reg[RCTL]; -int isbcast = is_broadcast_ether_addr(buf); -int ismcast = is_multicast_ether_addr(buf); - -if (e1000x_is_vlan_packet(buf, le16_to_cpu(s->mac_reg[VET])) && -e1000x_vlan_rx_filter_enabled(s->mac_reg)) { -uint16_t vid = lduw_be_p(&PKT_GET_VLAN_HDR(buf)->h_tci); -uint32_t vfta = -ldl_le_p((uint32_t *)(s->mac_reg + VFTA) + - ((vid >> E1000_VFTA_ENTRY_SHIFT) & E1000_VFTA_ENTRY_MASK)); -if ((vfta & (1 << (vid & E1000_VFTA_ENTRY_BIT_SHIFT_MASK))) == 0) { -return 0; -} -} - -if (!isbcast && !ismcast && (rctl & E1000_RCTL_UPE)) { /* promiscuous ucast */ -return 1; -} - -if (ismcast && (rctl & E1000_RCTL_MPE)) { /* promiscuous mcast */ -return 1; -} - -if (isbcast && (rctl & E1000_RCTL_BAM)) { /* broadcast enabled */ -return 1; -} - -return 
e1000x_rx_group_filter(s->mac_reg, buf); +return (!e1000x_is_vlan_packet(buf, s->mac_reg[VET]) || +e1000x_rx_vlan_filter(s->mac_reg, PKT_GET_VLAN_HDR(buf))) && + e1000x_rx_group_filter(s->mac_reg, buf); } static void @@ -949,7 +924,7 @@ e1000_receive_iov(NetClientState *nc, const struct iovec *iov, int iovcnt) return size; } -if (!receive_filter(s, filter_buf, size)) { +if (!receive_filter(s, filter_buf)) { return size; } diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c index 14b94db59c..41d2435074 100644 --- a/hw/net/e1000e_core.c +++ b/hw/net/e1000e_core.c @@ -1034,48 +1034,11 @@ e1000e_rx_l4_cso_enabled(E1000ECore *core) } static bool -e1000e_receive_filter(E1000ECore *core, const uint8_t *buf, int size) +e1000e_receive_filter(E1000ECore *core, const void *buf) { -uint32_t rctl = core->mac[RCTL]; - -if (e1000x_is_vlan_packet(buf, core->mac[VET]) && -e1000x_vlan_rx_filter_enabled(core->mac)) { -uint16_t vid = lduw_be_p(&PKT_GET_VLAN_HDR(buf)->h_tci); -uint32_t vfta = -ldl_le_p((uint32_t *)(core->mac + VFTA) + - ((vid >> E1000_VFTA_ENTRY_SHIFT) & E1000_VFTA_ENTRY_MASK)); -if ((vfta & (1 << (vid & E1000_VFTA_ENTRY_BIT_SHIFT_MASK))) == 0) { -trace_e1000e_rx_flt_vlan_mismatch(vid); -return false; -} else { -trace_e1000e_rx_flt_vlan_match(vid); -} -} - -switch (net_rx_pkt_get_packet_type(core->rx_pkt)) { -case ETH_PKT_UCAST: -if (rctl & E1000_RCTL_UPE) { -return true; /* promiscuous ucast */ -} -break; - -case ETH_PKT_BCAST: -if (rctl & E1000_RCTL_BAM) { -return true; /* broadcast enabled */ -} -break; - -case ETH_PKT_MCAST: -if (rctl & E1000_RCTL_MPE) { -return true; /* promiscuous mcast */ -} -break; - -default: -g_assert_not_reached(); -} - -return e1000x_rx_group_filter(core->mac, buf); +return (!e1000x_is_vlan_packet(buf, core->mac[VET]) || +e1000x_rx_vlan_filter(core->mac, PKT_GET_VLAN_HDR(buf))) && + e1000x_rx_group_filter(core->mac, buf); } static inline void @@ -1736,7 +1699,7 @@ e1000e_receive_internal(E1000ECore *core, const struct iovec *iov, int 
iovcnt, net_rx_pkt_set_packet_type(core->rx_pkt, get_eth_packet_type(PKT_GET_ETH_HDR(buf))); -if (!e1000e_receive_filter(core, buf, size)) { +if (!e1000e_receive_filter(core, buf)) { trace_e1000e_rx_flt_dropped(); return orig_size; } di
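The VFTA lookup that the consolidated filter performs can be sketched on its own. Assumptions labelled here: the constants mirror what E1000_VFTA_ENTRY_SHIFT (5), E1000_VFTA_ENTRY_MASK (0x7f), and E1000_VFTA_ENTRY_BIT_SHIFT_MASK (0x1f) would give, so 128 32-bit entries cover all 4096 VLAN IDs.

```python
def vfta_accepts(vfta, vid):
    # Entry index from the upper VID bits, bit position from the lower 5.
    entry = vfta[(vid >> 5) & 0x7F]
    return bool(entry & (1 << (vid & 0x1F)))

vfta = [0] * 128
vid = 100
vfta[(vid >> 5) & 0x7F] |= 1 << (vid & 0x1F)  # admit VLAN 100
assert vfta_accepts(vfta, 100)
assert not vfta_accepts(vfta, 101)
```

Sharing this logic in e1000x_rx_vlan_filter() also means the existing match/mismatch tracepoints now fire for e1000, not just e1000e and igb.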
[PATCH v4 29/48] igb: Rename a variable in igb_receive_internal()
Rename variable "n" to "causes", which properly represents the content of the variable. Signed-off-by: Akihiko Odaki Reviewed-by: Sriram Yagnaraman --- hw/net/igb_core.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c index edda07e564..c954369964 100644 --- a/hw/net/igb_core.c +++ b/hw/net/igb_core.c @@ -1569,7 +1569,7 @@ igb_receive_internal(IGBCore *core, const struct iovec *iov, int iovcnt, bool has_vnet, bool *external_tx) { uint16_t queues = 0; -uint32_t n = 0; +uint32_t causes = 0; union { L2Header l2_header; uint8_t octets[ETH_ZLEN]; @@ -1649,19 +1649,19 @@ igb_receive_internal(IGBCore *core, const struct iovec *iov, int iovcnt, e1000x_fcs_len(core->mac); if (!igb_has_rxbufs(core, rxr.i, total_size)) { -n |= E1000_ICS_RXO; +causes |= E1000_ICS_RXO; trace_e1000e_rx_not_written_to_guest(rxr.i->idx); continue; } -n |= E1000_ICR_RXDW; +causes |= E1000_ICR_RXDW; igb_rx_fix_l4_csum(core, core->rx_pkt); igb_write_packet_to_guest(core, core->rx_pkt, &rxr, &rss_info); /* Check if receive descriptor minimum threshold hit */ if (igb_rx_descr_threshold_hit(core, rxr.i)) { -n |= E1000_ICS_RXDMT0; +causes |= E1000_ICS_RXDMT0; } core->mac[EICR] |= igb_rx_wb_eic(core, rxr.i->idx); @@ -1669,8 +1669,8 @@ igb_receive_internal(IGBCore *core, const struct iovec *iov, int iovcnt, trace_e1000e_rx_written_to_guest(rxr.i->idx); } -trace_e1000e_rx_interrupt_set(n); -igb_set_interrupt_cause(core, n); +trace_e1000e_rx_interrupt_set(causes); +igb_set_interrupt_cause(core, causes); return orig_size; } -- 2.40.0
[PATCH v4 24/48] igb: Add more definitions for Tx descriptor
Signed-off-by: Akihiko Odaki Reviewed-by: Sriram Yagnaraman --- hw/net/igb_regs.h | 32 +++- hw/net/igb_core.c | 4 ++-- 2 files changed, 29 insertions(+), 7 deletions(-) diff --git a/hw/net/igb_regs.h b/hw/net/igb_regs.h index 21ee9a3b2d..eb995d8b2e 100644 --- a/hw/net/igb_regs.h +++ b/hw/net/igb_regs.h @@ -42,11 +42,6 @@ union e1000_adv_tx_desc { } wb; }; -#define E1000_ADVTXD_DTYP_CTXT 0x0020 /* Advanced Context Descriptor */ -#define E1000_ADVTXD_DTYP_DATA 0x0030 /* Advanced Data Descriptor */ -#define E1000_ADVTXD_DCMD_DEXT 0x2000 /* Descriptor Extension (1=Adv) */ -#define E1000_ADVTXD_DCMD_TSE 0x8000 /* TCP/UDP Segmentation Enable */ - #define E1000_ADVTXD_POTS_IXSM 0x0100 /* Insert TCP/UDP Checksum */ #define E1000_ADVTXD_POTS_TXSM 0x0200 /* Insert TCP/UDP Checksum */ @@ -151,6 +146,10 @@ union e1000_adv_rx_desc { #define IGB_82576_VF_DEV_ID0x10CA #define IGB_I350_VF_DEV_ID 0x1520 +/* VLAN info */ +#define IGB_TX_FLAGS_VLAN_MASK 0x +#define IGB_TX_FLAGS_VLAN_SHIFT16 + /* from igb/e1000_82575.h */ #define E1000_MRQC_ENABLE_RSS_MQ0x0002 @@ -160,6 +159,29 @@ union e1000_adv_rx_desc { #define E1000_MRQC_RSS_FIELD_IPV6_UDP 0x0080 #define E1000_MRQC_RSS_FIELD_IPV6_UDP_EX0x0100 +/* Adv Transmit Descriptor Config Masks */ +#define E1000_ADVTXD_MAC_TSTAMP 0x0008 /* IEEE1588 Timestamp packet */ +#define E1000_ADVTXD_DTYP_CTXT0x0020 /* Advanced Context Descriptor */ +#define E1000_ADVTXD_DTYP_DATA0x0030 /* Advanced Data Descriptor */ +#define E1000_ADVTXD_DCMD_EOP 0x0100 /* End of Packet */ +#define E1000_ADVTXD_DCMD_IFCS0x0200 /* Insert FCS (Ethernet CRC) */ +#define E1000_ADVTXD_DCMD_RS 0x0800 /* Report Status */ +#define E1000_ADVTXD_DCMD_DEXT0x2000 /* Descriptor extension (1=Adv) */ +#define E1000_ADVTXD_DCMD_VLE 0x4000 /* VLAN pkt enable */ +#define E1000_ADVTXD_DCMD_TSE 0x8000 /* TCP Seg enable */ +#define E1000_ADVTXD_PAYLEN_SHIFT14 /* Adv desc PAYLEN shift */ + +#define E1000_ADVTXD_MACLEN_SHIFT9 /* Adv ctxt desc mac len shift */ +#define 
E1000_ADVTXD_TUCMD_L4T_UDP 0x /* L4 Packet TYPE of UDP */ +#define E1000_ADVTXD_TUCMD_IPV40x0400 /* IP Packet Type: 1=IPv4 */ +#define E1000_ADVTXD_TUCMD_L4T_TCP 0x0800 /* L4 Packet TYPE of TCP */ +#define E1000_ADVTXD_TUCMD_L4T_SCTP 0x1000 /* L4 packet TYPE of SCTP */ +/* IPSec Encrypt Enable for ESP */ +#define E1000_ADVTXD_L4LEN_SHIFT 8 /* Adv ctxt L4LEN shift */ +#define E1000_ADVTXD_MSS_SHIFT 16 /* Adv ctxt MSS shift */ +/* Adv ctxt IPSec SA IDX mask */ +/* Adv ctxt IPSec ESP len mask */ + /* Additional Transmit Descriptor Control definitions */ #define E1000_TXDCTL_QUEUE_ENABLE 0x0200 /* Enable specific Tx Queue */ diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c index 162ef26789..56a53872cf 100644 --- a/hw/net/igb_core.c +++ b/hw/net/igb_core.c @@ -418,7 +418,7 @@ igb_setup_tx_offloads(IGBCore *core, struct igb_tx *tx) { if (tx->first_cmd_type_len & E1000_ADVTXD_DCMD_TSE) { uint32_t idx = (tx->first_olinfo_status >> 4) & 1; -uint32_t mss = tx->ctx[idx].mss_l4len_idx >> 16; +uint32_t mss = tx->ctx[idx].mss_l4len_idx >> E1000_ADVTXD_MSS_SHIFT; if (!net_tx_pkt_build_vheader(tx->tx_pkt, true, true, mss)) { return false; } @@ -612,7 +612,7 @@ igb_process_tx_desc(IGBCore *core, if (!tx->skip_cp && net_tx_pkt_parse(tx->tx_pkt)) { idx = (tx->first_olinfo_status >> 4) & 1; igb_tx_insert_vlan(core, queue_index, tx, -tx->ctx[idx].vlan_macip_lens >> 16, +tx->ctx[idx].vlan_macip_lens >> IGB_TX_FLAGS_VLAN_SHIFT, !!(tx->first_cmd_type_len & E1000_TXD_CMD_VLE)); if (igb_tx_pkt_send(core, tx, queue_index)) { -- 2.40.0
[PATCH v4 10/48] Fix references to igb Avocado test
Fixes: 9f95111474 ("tests/avocado: re-factor igb test to avoid timeouts") Signed-off-by: Akihiko Odaki Reviewed-by: Philippe Mathieu-Daudé --- MAINTAINERS| 2 +- docs/system/devices/igb.rst| 2 +- scripts/ci/org.centos/stream/8/x86_64/test-avocado | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/MAINTAINERS b/MAINTAINERS index ef45b5e71e..c31d2279ab 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -2256,7 +2256,7 @@ R: Sriram Yagnaraman S: Maintained F: docs/system/devices/igb.rst F: hw/net/igb* -F: tests/avocado/igb.py +F: tests/avocado/netdev-ethtool.py F: tests/qtest/igb-test.c F: tests/qtest/libqos/igb.c diff --git a/docs/system/devices/igb.rst b/docs/system/devices/igb.rst index 70edadd574..afe036dad2 100644 --- a/docs/system/devices/igb.rst +++ b/docs/system/devices/igb.rst @@ -60,7 +60,7 @@ Avocado test and can be ran with the following command: .. code:: shell - make check-avocado AVOCADO_TESTS=tests/avocado/igb.py + make check-avocado AVOCADO_TESTS=tests/avocado/netdev-ethtool.py References == diff --git a/scripts/ci/org.centos/stream/8/x86_64/test-avocado b/scripts/ci/org.centos/stream/8/x86_64/test-avocado index d2c0e5fb4c..a1aa601ee3 100755 --- a/scripts/ci/org.centos/stream/8/x86_64/test-avocado +++ b/scripts/ci/org.centos/stream/8/x86_64/test-avocado @@ -30,7 +30,7 @@ make get-vm-images tests/avocado/cpu_queries.py:QueryCPUModelExpansion.test \ tests/avocado/empty_cpu_model.py:EmptyCPUModel.test \ tests/avocado/hotplug_cpu.py:HotPlugCPU.test \ -tests/avocado/igb.py:IGB.test \ +tests/avocado/netdev-ethtool.py:NetDevEthtool.test_igb_nomsi \ tests/avocado/info_usernet.py:InfoUsernet.test_hostfwd \ tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu \ tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu_pt \ -- 2.40.0
[PATCH v4 46/48] vmxnet3: Do not depend on PC
vmxnet3 has no dependency on PC, and VMware Fusion actually makes it available on Apple Silicon according to: https://kb.vmware.com/s/article/90364 Signed-off-by: Akihiko Odaki Reviewed-by: Philippe Mathieu-Daudé --- hw/net/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hw/net/Kconfig b/hw/net/Kconfig index 18c7851efe..98e00be4f9 100644 --- a/hw/net/Kconfig +++ b/hw/net/Kconfig @@ -56,7 +56,7 @@ config RTL8139_PCI config VMXNET3_PCI bool -default y if PCI_DEVICES && PC_PCI +default y if PCI_DEVICES depends on PCI config SMC91C111 -- 2.40.0
[PATCH v4 31/48] net/eth: Always add VLAN tag
It is possible to have another VLAN tag even if the packet is already tagged. Signed-off-by: Akihiko Odaki --- include/net/eth.h | 4 ++-- hw/net/net_tx_pkt.c | 16 +++- net/eth.c | 22 ++ 3 files changed, 15 insertions(+), 27 deletions(-) diff --git a/include/net/eth.h b/include/net/eth.h index 95ff24d6b8..048e434685 100644 --- a/include/net/eth.h +++ b/include/net/eth.h @@ -353,8 +353,8 @@ eth_strip_vlan_ex(const struct iovec *iov, int iovcnt, size_t iovoff, uint16_t eth_get_l3_proto(const struct iovec *l2hdr_iov, int iovcnt, size_t l2hdr_len); -void eth_setup_vlan_headers(struct eth_header *ehdr, uint16_t vlan_tag, -uint16_t vlan_ethtype, bool *is_new); +void eth_setup_vlan_headers(struct eth_header *ehdr, size_t *ehdr_size, +uint16_t vlan_tag, uint16_t vlan_ethtype); uint8_t eth_get_gso_type(uint16_t l3_proto, uint8_t *l3_hdr, uint8_t l4proto); diff --git a/hw/net/net_tx_pkt.c b/hw/net/net_tx_pkt.c index ce6b102391..af8f77a3f0 100644 --- a/hw/net/net_tx_pkt.c +++ b/hw/net/net_tx_pkt.c @@ -40,7 +40,10 @@ struct NetTxPkt { struct iovec *vec; -uint8_t l2_hdr[ETH_MAX_L2_HDR_LEN]; +struct { +struct eth_header eth; +struct vlan_header vlan[3]; +} l2_hdr; union { struct ip_header ip; struct ip6_header ip6; @@ -365,18 +368,13 @@ bool net_tx_pkt_build_vheader(struct NetTxPkt *pkt, bool tso_enable, void net_tx_pkt_setup_vlan_header_ex(struct NetTxPkt *pkt, uint16_t vlan, uint16_t vlan_ethtype) { -bool is_new; assert(pkt); eth_setup_vlan_headers(pkt->vec[NET_TX_PKT_L2HDR_FRAG].iov_base, -vlan, vlan_ethtype, &is_new); + &pkt->vec[NET_TX_PKT_L2HDR_FRAG].iov_len, + vlan, vlan_ethtype); -/* update l2hdrlen */ -if (is_new) { -pkt->hdr_len += sizeof(struct vlan_header); -pkt->vec[NET_TX_PKT_L2HDR_FRAG].iov_len += -sizeof(struct vlan_header); -} +pkt->hdr_len += sizeof(struct vlan_header); } bool net_tx_pkt_add_raw_fragment(struct NetTxPkt *pkt, void *base, size_t len) diff --git a/net/eth.c b/net/eth.c index f7ffbda600..5307978486 100644 --- a/net/eth.c +++ b/net/eth.c @@ -21,26 
+21,16 @@ #include "net/checksum.h" #include "net/tap.h" -void eth_setup_vlan_headers(struct eth_header *ehdr, uint16_t vlan_tag, -uint16_t vlan_ethtype, bool *is_new) +void eth_setup_vlan_headers(struct eth_header *ehdr, size_t *ehdr_size, +uint16_t vlan_tag, uint16_t vlan_ethtype) { struct vlan_header *vhdr = PKT_GET_VLAN_HDR(ehdr); -switch (be16_to_cpu(ehdr->h_proto)) { -case ETH_P_VLAN: -case ETH_P_DVLAN: -/* vlan hdr exists */ -*is_new = false; -break; - -default: -/* No VLAN header, put a new one */ -vhdr->h_proto = ehdr->h_proto; -ehdr->h_proto = cpu_to_be16(vlan_ethtype); -*is_new = true; -break; -} +memmove(vhdr + 1, vhdr, *ehdr_size - ETH_HLEN); vhdr->h_tci = cpu_to_be16(vlan_tag); +vhdr->h_proto = ehdr->h_proto; +ehdr->h_proto = cpu_to_be16(vlan_ethtype); +*ehdr_size += sizeof(*vhdr); } uint8_t -- 2.40.0
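The new `eth_setup_vlan_headers()` unconditionally shifts the bytes after the Ethernet addresses and splices in a fresh tag, which is what makes double tagging possible. The same byte-level operation can be sketched as follows (this works on a raw frame buffer instead of qemu's `struct eth_header`/`struct vlan_header`, and assumes the caller's buffer has 4 spare bytes):

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define ETH_ADDRS_LEN 12 /* destination + source MAC */

/*
 * Insert a fresh 802.1Q tag right after the MAC addresses, even if the
 * frame already carries one; the old EtherType (or previous TPID) is
 * shifted down behind the new tag. Returns the new header size.
 */
static size_t insert_vlan_tag(uint8_t *buf, size_t hdr_size,
                              uint16_t tci, uint16_t tpid)
{
    memmove(buf + ETH_ADDRS_LEN + 4, buf + ETH_ADDRS_LEN,
            hdr_size - ETH_ADDRS_LEN);
    buf[ETH_ADDRS_LEN]     = tpid >> 8;   /* TPID, big endian */
    buf[ETH_ADDRS_LEN + 1] = tpid & 0xff;
    buf[ETH_ADDRS_LEN + 2] = tci >> 8;    /* TCI, big endian */
    buf[ETH_ADDRS_LEN + 3] = tci & 0xff;
    return hdr_size + 4;
}
```

Calling this twice produces a QinQ-style double tag, matching the commit message's point that an already-tagged packet may receive another tag.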
[PATCH v4 14/48] net/eth: Rename eth_setup_vlan_headers_ex
The old eth_setup_vlan_headers has no user so remove it and rename eth_setup_vlan_headers_ex. Signed-off-by: Akihiko Odaki Reviewed-by: Philippe Mathieu-Daudé --- include/net/eth.h | 9 + hw/net/net_tx_pkt.c | 2 +- net/eth.c | 2 +- 3 files changed, 3 insertions(+), 10 deletions(-) diff --git a/include/net/eth.h b/include/net/eth.h index 9f19c3a695..e8af5742be 100644 --- a/include/net/eth.h +++ b/include/net/eth.h @@ -351,16 +351,9 @@ eth_strip_vlan_ex(const struct iovec *iov, int iovcnt, size_t iovoff, uint16_t eth_get_l3_proto(const struct iovec *l2hdr_iov, int iovcnt, size_t l2hdr_len); -void eth_setup_vlan_headers_ex(struct eth_header *ehdr, uint16_t vlan_tag, +void eth_setup_vlan_headers(struct eth_header *ehdr, uint16_t vlan_tag, uint16_t vlan_ethtype, bool *is_new); -static inline void -eth_setup_vlan_headers(struct eth_header *ehdr, uint16_t vlan_tag, -bool *is_new) -{ -eth_setup_vlan_headers_ex(ehdr, vlan_tag, ETH_P_VLAN, is_new); -} - uint8_t eth_get_gso_type(uint16_t l3_proto, uint8_t *l3_hdr, uint8_t l4proto); diff --git a/hw/net/net_tx_pkt.c b/hw/net/net_tx_pkt.c index cc36750c9b..ce6b102391 100644 --- a/hw/net/net_tx_pkt.c +++ b/hw/net/net_tx_pkt.c @@ -368,7 +368,7 @@ void net_tx_pkt_setup_vlan_header_ex(struct NetTxPkt *pkt, bool is_new; assert(pkt); -eth_setup_vlan_headers_ex(pkt->vec[NET_TX_PKT_L2HDR_FRAG].iov_base, +eth_setup_vlan_headers(pkt->vec[NET_TX_PKT_L2HDR_FRAG].iov_base, vlan, vlan_ethtype, &is_new); /* update l2hdrlen */ diff --git a/net/eth.c b/net/eth.c index d7b30df79f..b6ff89c460 100644 --- a/net/eth.c +++ b/net/eth.c @@ -21,7 +21,7 @@ #include "net/checksum.h" #include "net/tap.h" -void eth_setup_vlan_headers_ex(struct eth_header *ehdr, uint16_t vlan_tag, +void eth_setup_vlan_headers(struct eth_header *ehdr, uint16_t vlan_tag, uint16_t vlan_ethtype, bool *is_new) { struct vlan_header *vhdr = PKT_GET_VLAN_HDR(ehdr); -- 2.40.0
[PATCH v4 33/48] tests/qtest/libqos/igb: Set GPIE.Multiple_MSIX
GPIE.Multiple_MSIX is not set by default, and needs to be set to get interrupts from multiple MSI-X vectors. Signed-off-by: Akihiko Odaki Reviewed-by: Sriram Yagnaraman --- tests/qtest/libqos/igb.c | 1 + 1 file changed, 1 insertion(+) diff --git a/tests/qtest/libqos/igb.c b/tests/qtest/libqos/igb.c index 12fb531bf0..a603468beb 100644 --- a/tests/qtest/libqos/igb.c +++ b/tests/qtest/libqos/igb.c @@ -114,6 +114,7 @@ static void igb_pci_start_hw(QOSGraphObject *obj) e1000e_macreg_write(&d->e1000e, E1000_RCTL, E1000_RCTL_EN); /* Enable all interrupts */ +e1000e_macreg_write(&d->e1000e, E1000_GPIE, E1000_GPIE_MSIX_MODE); e1000e_macreg_write(&d->e1000e, E1000_IMS, 0x); e1000e_macreg_write(&d->e1000e, E1000_EIMS, 0x); -- 2.40.0
[PATCH v4 36/48] igb: Implement Rx SCTP CSO
Signed-off-by: Akihiko Odaki Reviewed-by: Sriram Yagnaraman --- hw/net/igb_regs.h | 1 + include/net/eth.h | 4 ++- include/qemu/crc32c.h | 1 + hw/net/e1000e_core.c | 5 hw/net/igb_core.c | 15 +- hw/net/net_rx_pkt.c | 64 +++ net/eth.c | 4 +++ util/crc32c.c | 8 ++ 8 files changed, 89 insertions(+), 13 deletions(-) diff --git a/hw/net/igb_regs.h b/hw/net/igb_regs.h index e6ac26dc0e..4b4ebd3369 100644 --- a/hw/net/igb_regs.h +++ b/hw/net/igb_regs.h @@ -670,6 +670,7 @@ union e1000_adv_rx_desc { #define E1000_ADVRXD_PKT_IP6 BIT(6) #define E1000_ADVRXD_PKT_TCP BIT(8) #define E1000_ADVRXD_PKT_UDP BIT(9) +#define E1000_ADVRXD_PKT_SCTP BIT(10) static inline uint8_t igb_ivar_entry_rx(uint8_t i) { diff --git a/include/net/eth.h b/include/net/eth.h index 048e434685..75e7f1551c 100644 --- a/include/net/eth.h +++ b/include/net/eth.h @@ -224,6 +224,7 @@ struct tcp_hdr { #define IP_HEADER_VERSION_6 (6) #define IP_PROTO_TCP (6) #define IP_PROTO_UDP (17) +#define IP_PROTO_SCTP (132) #define IPTOS_ECN_MASK0x03 #define IPTOS_ECN(x) ((x) & IPTOS_ECN_MASK) #define IPTOS_ECN_CE 0x03 @@ -379,7 +380,8 @@ typedef struct eth_ip4_hdr_info_st { typedef enum EthL4HdrProto { ETH_L4_HDR_PROTO_INVALID, ETH_L4_HDR_PROTO_TCP, -ETH_L4_HDR_PROTO_UDP +ETH_L4_HDR_PROTO_UDP, +ETH_L4_HDR_PROTO_SCTP } EthL4HdrProto; typedef struct eth_l4_hdr_info_st { diff --git a/include/qemu/crc32c.h b/include/qemu/crc32c.h index 5b78884c38..88b4d2b3b3 100644 --- a/include/qemu/crc32c.h +++ b/include/qemu/crc32c.h @@ -30,5 +30,6 @@ uint32_t crc32c(uint32_t crc, const uint8_t *data, unsigned int length); +uint32_t iov_crc32c(uint32_t crc, const struct iovec *iov, size_t iov_cnt); #endif diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c index aea70b74d9..0b939ff5a3 100644 --- a/hw/net/e1000e_core.c +++ b/hw/net/e1000e_core.c @@ -1114,6 +1114,11 @@ e1000e_verify_csum_in_sw(E1000ECore *core, return; } +if (l4hdr_proto != ETH_L4_HDR_PROTO_TCP && +l4hdr_proto != ETH_L4_HDR_PROTO_UDP) { +return; +} + if 
(!net_rx_pkt_validate_l4_csum(pkt, &csum_valid)) { trace_e1000e_rx_metadata_l4_csum_validation_failed(); return; diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c index 41a2e5bf7b..95d46d6e6d 100644 --- a/hw/net/igb_core.c +++ b/hw/net/igb_core.c @@ -1220,7 +1220,7 @@ igb_build_rx_metadata(IGBCore *core, uint16_t *vlan_tag) { struct virtio_net_hdr *vhdr; -bool hasip4, hasip6; +bool hasip4, hasip6, csum_valid; EthL4HdrProto l4hdr_proto; *status_flags = E1000_RXD_STAT_DD; @@ -1280,6 +1280,10 @@ igb_build_rx_metadata(IGBCore *core, *pkt_info |= E1000_ADVRXD_PKT_UDP; break; +case ETH_L4_HDR_PROTO_SCTP: +*pkt_info |= E1000_ADVRXD_PKT_SCTP; +break; + default: break; } @@ -1312,6 +1316,15 @@ igb_build_rx_metadata(IGBCore *core, if (igb_rx_l4_cso_enabled(core)) { switch (l4hdr_proto) { +case ETH_L4_HDR_PROTO_SCTP: +if (!net_rx_pkt_validate_l4_csum(pkt, &csum_valid)) { +trace_e1000e_rx_metadata_l4_csum_validation_failed(); +goto func_exit; +} +if (!csum_valid) { +*status_flags |= E1000_RXDEXT_STATERR_TCPE; +} +/* fall through */ case ETH_L4_HDR_PROTO_TCP: *status_flags |= E1000_RXD_STAT_TCPCS; break; diff --git a/hw/net/net_rx_pkt.c b/hw/net/net_rx_pkt.c index 1de42b4f51..3575c8b9f9 100644 --- a/hw/net/net_rx_pkt.c +++ b/hw/net/net_rx_pkt.c @@ -16,6 +16,7 @@ */ #include "qemu/osdep.h" +#include "qemu/crc32c.h" #include "trace.h" #include "net_rx_pkt.h" #include "net/checksum.h" @@ -554,32 +555,73 @@ _net_rx_pkt_calc_l4_csum(struct NetRxPkt *pkt) return csum; } -bool net_rx_pkt_validate_l4_csum(struct NetRxPkt *pkt, bool *csum_valid) +static bool +_net_rx_pkt_validate_sctp_sum(struct NetRxPkt *pkt) { -uint16_t csum; +size_t csum_off; +size_t off = pkt->l4hdr_off; +size_t vec_len = pkt->vec_len; +struct iovec *vec; +uint32_t calculated = 0; +uint32_t original; +bool valid; -trace_net_rx_pkt_l4_csum_validate_entry(); +for (vec = pkt->vec; vec->iov_len < off; vec++) { +off -= vec->iov_len; +vec_len--; +} -if (pkt->l4hdr_info.proto != ETH_L4_HDR_PROTO_TCP && 
-pkt->l4hdr_info.proto != ETH_L4_HDR_PROTO_UDP) { -trace_net_rx_pkt_l4_csum_validate_not_xxp(); +csum_off = off + 8; + +if (!iov_to_buf(vec, vec_len, csum_off, &original, sizeof(original))) { return false; } -if (pkt->l4hdr_info.proto == ETH_L4_HDR_PROTO_UDP && -pkt->l4hdr_info.hdr.udp.uh_sum
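The SCTP checksum is a CRC32c over the whole packet, and the new `iov_crc32c()` helper lets `net_rx_pkt` validate it across a scattered buffer. A minimal self-contained sketch (bit-by-bit; qemu's `util/crc32c.c` is table-driven — only the API shape is mirrored here):

```c
#include <stdint.h>
#include <stddef.h>
#include <sys/uio.h>

/* Bit-by-bit CRC32c (Castagnoli), reflected polynomial 0x82F63B78. */
static uint32_t crc32c(uint32_t crc, const uint8_t *data, size_t len)
{
    while (len--) {
        crc ^= *data++;
        for (int i = 0; i < 8; i++) {
            crc = (crc & 1) ? (crc >> 1) ^ 0x82F63B78u : crc >> 1;
        }
    }
    return crc;
}

/* Continue the running CRC over every element of an I/O vector. */
static uint32_t iov_crc32c(uint32_t crc, const struct iovec *iov,
                           size_t iov_cnt)
{
    for (size_t i = 0; i < iov_cnt; i++) {
        crc = crc32c(crc, iov[i].iov_base, iov[i].iov_len);
    }
    return crc;
}
```

For SCTP validation the receiver computes the CRC over the packet with the checksum field treated as zero, starting from `0xffffffff` and bit-inverting the result, then compares it with the stored value.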
[PATCH v4 34/48] igb: Implement MSI-X single vector mode
Signed-off-by: Akihiko Odaki Reviewed-by: Sriram Yagnaraman --- hw/net/igb_core.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c index c954369964..6d55b43fb4 100644 --- a/hw/net/igb_core.c +++ b/hw/net/igb_core.c @@ -1873,7 +1873,7 @@ igb_update_interrupt_state(IGBCore *core) icr = core->mac[ICR] & core->mac[IMS]; -if (msix_enabled(core->owner)) { +if (core->mac[GPIE] & E1000_GPIE_MSIX_MODE) { if (icr) { causes = 0; if (icr & E1000_ICR_DRSTA) { @@ -1908,7 +1908,12 @@ igb_update_interrupt_state(IGBCore *core) trace_e1000e_irq_pending_interrupts(core->mac[ICR] & core->mac[IMS], core->mac[ICR], core->mac[IMS]); -if (msi_enabled(core->owner)) { +if (msix_enabled(core->owner)) { +if (icr) { +trace_e1000e_irq_msix_notify_vec(0); +msix_notify(core->owner, 0); +} +} else if (msi_enabled(core->owner)) { if (icr) { msi_notify(core->owner, 0); } -- 2.40.0
[PATCH v4 23/48] vmxnet3: Reset packet state after emptying Tx queue
Keeping Tx packet state after the transmit queue is emptied is unreliable, as the state can be reset whenever migration happens. Always reset the Tx packet state after the queue is emptied. Signed-off-by: Akihiko Odaki --- hw/net/vmxnet3.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/hw/net/vmxnet3.c b/hw/net/vmxnet3.c index 05f41b6dfa..18b9edfdb2 100644 --- a/hw/net/vmxnet3.c +++ b/hw/net/vmxnet3.c @@ -681,6 +681,8 @@ static void vmxnet3_process_tx_queue(VMXNET3State *s, int qidx) net_tx_pkt_unmap_frag_pci, PCI_DEVICE(s)); } } + +net_tx_pkt_reset(s->tx_pkt, net_tx_pkt_unmap_frag_pci, PCI_DEVICE(s)); } static inline void @@ -1159,7 +1161,6 @@ static void vmxnet3_deactivate_device(VMXNET3State *s) { if (s->device_active) { VMW_CBPRN("Deactivating vmxnet3..."); -net_tx_pkt_reset(s->tx_pkt, net_tx_pkt_unmap_frag_pci, PCI_DEVICE(s)); net_tx_pkt_uninit(s->tx_pkt); net_rx_pkt_uninit(s->rx_pkt); s->device_active = false; -- 2.40.0

Re: [PATCH] tests/unit/test-blockjob: Re-enable complete_in_standby test
Am 26/04/2023 um 10:45 schrieb Thomas Huth: > On 26/04/2023 10.16, Emanuele Giuseppe Esposito wrote: >> Pause the job while draining so that pause_count will be >> increased and bdrv_drain_all_end() won't reset it to 0, so the >> job will not continue. >> >> With this fix, the test is not flaky anymore. >> >> Additionally remove useless aiocontext lock around bdrv_drain_all_end() >> in test_complete_in_standby(). >> >> Fixes: b6903cbe3a2 "tests/unit/test-blockjob: Disable >> complete_in_standby test" >> Suggested-by: Hanna Czenczek >> Signed-off-by: Emanuele Giuseppe Esposito >> --- >> tests/unit/test-blockjob.c | 17 +++-- >> 1 file changed, 7 insertions(+), 10 deletions(-) >> >> diff --git a/tests/unit/test-blockjob.c b/tests/unit/test-blockjob.c >> index a130f6fefb..46d720aeee 100644 >> --- a/tests/unit/test-blockjob.c >> +++ b/tests/unit/test-blockjob.c >> @@ -488,11 +488,15 @@ static void test_complete_in_standby(void) >> bdrv_drain_all_begin(); >> assert_job_status_is(job, JOB_STATUS_STANDBY); >> + /* >> + * Increase pause_count so that the counter is >> + * unbalanced and job won't resume >> + */ >> + job_pause(job); >> + >> /* Lock the IO thread to prevent the job from being run */ > > I guess the above comment should now be removed, too? > >> - aio_context_acquire(ctx); >> /* This will schedule the job to resume it */ >> bdrv_drain_all_end(); >> - aio_context_release(ctx); > > Thomas > Makes sense, resending Emanuele
[PATCH v4 45/48] igb: Clear-on-read ICR when ICR.INTA is set
For GPIE.NSICR, Section 7.3.2.1.2 says: > ICR bits are cleared on register read. If GPIE.NSICR = 0b, then the > clear on read occurs only if no bit is set in the IMS or at least one > bit is set in the IMS and there is a true interrupt as reflected in > ICR.INTA. e1000e does similar though it checks for CTRL_EXT.IAME, which does not exist on igb. Suggested-by: Sriram Yagnaraman Signed-off-by: Akihiko Odaki --- hw/net/igb_core.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c index b68e24c9ee..29190054c6 100644 --- a/hw/net/igb_core.c +++ b/hw/net/igb_core.c @@ -2598,6 +2598,8 @@ igb_mac_icr_read(IGBCore *core, int index) } else if (core->mac[IMS] == 0) { trace_e1000e_irq_icr_clear_zero_ims(); igb_lower_interrupts(core, ICR, 0x); +} else if (core->mac[ICR] & E1000_ICR_INT_ASSERTED) { +igb_lower_interrupts(core, ICR, 0x); } else if (!msix_enabled(core->owner)) { trace_e1000e_irq_icr_clear_nonmsix_icr_read(); igb_lower_interrupts(core, ICR, 0x); -- 2.40.0
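The resulting decision — whether a read of ICR also clears it — condenses into one predicate. A sketch, with illustrative bit positions rather than the real register layout:

```c
#include <stdbool.h>
#include <stdint.h>

#define GPIE_NSICR       (1u << 0)   /* illustrative bit positions */
#define ICR_INT_ASSERTED (1u << 31)

/*
 * Per the quoted 7.3.2.1.2 wording: with NSICR set, every read clears;
 * otherwise a read clears only if no IMS bit is set, or at least one IMS
 * bit is set and ICR.INTA reflects a true interrupt.
 */
static bool icr_clears_on_read(uint32_t gpie, uint32_t ims, uint32_t icr)
{
    if (gpie & GPIE_NSICR) {
        return true;
    }
    return ims == 0 || (icr & ICR_INT_ASSERTED) != 0;
}
```

The patch adds exactly the missing `ICR.INTA` branch to `igb_mac_icr_read()`.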
[PATCH v4 32/48] hw/net/net_rx_pkt: Enforce alignment for eth_header
eth_strip_vlan and eth_strip_vlan_ex refer to ehdr_buf as struct eth_header. Enforce alignment for the structure. Signed-off-by: Akihiko Odaki Reviewed-by: Sriram Yagnaraman --- hw/net/net_rx_pkt.c | 11 +++ 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/hw/net/net_rx_pkt.c b/hw/net/net_rx_pkt.c index 6125a063d7..1de42b4f51 100644 --- a/hw/net/net_rx_pkt.c +++ b/hw/net/net_rx_pkt.c @@ -23,7 +23,10 @@ struct NetRxPkt { struct virtio_net_hdr virt_hdr; -uint8_t ehdr_buf[sizeof(struct eth_header) + sizeof(struct vlan_header)]; +struct { +struct eth_header eth; +struct vlan_header vlan; +} ehdr_buf; struct iovec *vec; uint16_t vec_len_total; uint16_t vec_len; @@ -89,7 +92,7 @@ net_rx_pkt_pull_data(struct NetRxPkt *pkt, if (pkt->ehdr_buf_len) { net_rx_pkt_iovec_realloc(pkt, iovcnt + 1); -pkt->vec[0].iov_base = pkt->ehdr_buf; +pkt->vec[0].iov_base = &pkt->ehdr_buf; pkt->vec[0].iov_len = pkt->ehdr_buf_len; pkt->tot_len = pllen + pkt->ehdr_buf_len; @@ -120,7 +123,7 @@ void net_rx_pkt_attach_iovec(struct NetRxPkt *pkt, assert(pkt); if (strip_vlan) { -pkt->ehdr_buf_len = eth_strip_vlan(iov, iovcnt, iovoff, pkt->ehdr_buf, +pkt->ehdr_buf_len = eth_strip_vlan(iov, iovcnt, iovoff, &pkt->ehdr_buf, &ploff, &tci); } else { pkt->ehdr_buf_len = 0; @@ -142,7 +145,7 @@ void net_rx_pkt_attach_iovec_ex(struct NetRxPkt *pkt, if (strip_vlan) { pkt->ehdr_buf_len = eth_strip_vlan_ex(iov, iovcnt, iovoff, vet, - pkt->ehdr_buf, + &pkt->ehdr_buf, &ploff, &tci); } else { pkt->ehdr_buf_len = 0; -- 2.40.0
[PATCH v4 35/48] igb: Use UDP for RSS hash
e1000e does not support using UDP for RSS hash, but igb does. Signed-off-by: Akihiko Odaki Reviewed-by: Sriram Yagnaraman --- hw/net/igb_regs.h | 3 +++ hw/net/igb_core.c | 16 2 files changed, 19 insertions(+) diff --git a/hw/net/igb_regs.h b/hw/net/igb_regs.h index eb995d8b2e..e6ac26dc0e 100644 --- a/hw/net/igb_regs.h +++ b/hw/net/igb_regs.h @@ -659,6 +659,9 @@ union e1000_adv_rx_desc { #define E1000_RSS_QUEUE(reta, hash) (E1000_RETA_VAL(reta, hash) & 0x0F) +#define E1000_MRQ_RSS_TYPE_IPV4UDP 7 +#define E1000_MRQ_RSS_TYPE_IPV6UDP 8 + #define E1000_STATUS_IOV_MODE 0x0004 #define E1000_STATUS_NUM_VFS_SHIFT 14 diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c index 6d55b43fb4..41a2e5bf7b 100644 --- a/hw/net/igb_core.c +++ b/hw/net/igb_core.c @@ -287,6 +287,11 @@ igb_rss_get_hash_type(IGBCore *core, struct NetRxPkt *pkt) return E1000_MRQ_RSS_TYPE_IPV4TCP; } +if (l4hdr_proto == ETH_L4_HDR_PROTO_UDP && +(core->mac[MRQC] & E1000_MRQC_RSS_FIELD_IPV4_UDP)) { +return E1000_MRQ_RSS_TYPE_IPV4UDP; +} + if (E1000_MRQC_EN_IPV4(core->mac[MRQC])) { return E1000_MRQ_RSS_TYPE_IPV4; } @@ -322,6 +327,11 @@ igb_rss_get_hash_type(IGBCore *core, struct NetRxPkt *pkt) return E1000_MRQ_RSS_TYPE_IPV6TCPEX; } +if (l4hdr_proto == ETH_L4_HDR_PROTO_UDP && +(core->mac[MRQC] & E1000_MRQC_RSS_FIELD_IPV6_UDP)) { +return E1000_MRQ_RSS_TYPE_IPV6UDP; +} + if (E1000_MRQC_EN_IPV6EX(core->mac[MRQC])) { return E1000_MRQ_RSS_TYPE_IPV6EX; } @@ -360,6 +370,12 @@ igb_rss_calc_hash(IGBCore *core, struct NetRxPkt *pkt, E1000E_RSSInfo *info) case E1000_MRQ_RSS_TYPE_IPV6EX: type = NetPktRssIpV6Ex; break; +case E1000_MRQ_RSS_TYPE_IPV4UDP: +type = NetPktRssIpV4Udp; +break; +case E1000_MRQ_RSS_TYPE_IPV6UDP: +type = NetPktRssIpV6Udp; +break; default: assert(false); return 0; -- 2.40.0
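The order of the checks in `igb_rss_get_hash_type()` matters: a TCP match takes precedence over UDP, and UDP over the plain IP hash. The IPv4 half of that priority chain can be sketched as follows (the MRQC enable bits here are illustrative placeholders, not the hardware layout):

```c
#include <stdbool.h>
#include <stdint.h>

enum rss_type { RSS_NONE, RSS_IPV4, RSS_IPV4TCP, RSS_IPV4UDP };

/* Illustrative MRQC enable bits; not the real register encoding. */
#define MRQC_EN_TCP_IPV4 (1u << 0)
#define MRQC_EN_UDP_IPV4 (1u << 1)
#define MRQC_EN_IPV4     (1u << 2)

/* Mirrors the IPv4 selection order: TCP, then UDP, then plain IPv4. */
static enum rss_type rss_type_ipv4(bool is_tcp, bool is_udp, uint32_t mrqc)
{
    if (is_tcp && (mrqc & MRQC_EN_TCP_IPV4)) {
        return RSS_IPV4TCP;
    }
    if (is_udp && (mrqc & MRQC_EN_UDP_IPV4)) {
        return RSS_IPV4UDP;
    }
    if (mrqc & MRQC_EN_IPV4) {
        return RSS_IPV4;
    }
    return RSS_NONE;
}
```

With the UDP branch disabled, a UDP packet still falls through to the plain IPv4 hash — which is why the patch only inserts new cases rather than restructuring the chain.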
[PATCH v4 43/48] e1000e: Notify only new interrupts
In MSI-X mode, if there are interrupts already notified but not cleared and a new interrupt arrives, e1000e incorrectly notifies the notified ones again along with the new one. To fix this issue, replace e1000e_update_interrupt_state() with two new functions: e1000e_raise_interrupts() and e1000e_lower_interrupts(). These functions do not only raise or lower interrupts; they also perform the register writes that update the interrupt state. Before performing a register write, these functions determine the interrupts already raised, and compare them with the interrupts raised after the register write to determine which interrupts to notify. The introduction of these functions made the tracepoints that assumed the caller of e1000e_update_interrupt_state() performs register writes obsolete. These tracepoints are now removed, and alternative ones are added to the new functions. Signed-off-by: Akihiko Odaki --- hw/net/e1000e_core.h | 2 - hw/net/e1000e_core.c | 153 +++ hw/net/trace-events | 2 + 3 files changed, 69 insertions(+), 88 deletions(-) diff --git a/hw/net/e1000e_core.h b/hw/net/e1000e_core.h index 213a70530d..66b025cc43 100644 --- a/hw/net/e1000e_core.h +++ b/hw/net/e1000e_core.h @@ -111,8 +111,6 @@ struct E1000Core { PCIDevice *owner; void (*owner_start_recv)(PCIDevice *d); -uint32_t msi_causes_pending; - int64_t timadj; }; diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c index d601386992..9f185d099c 100644 --- a/hw/net/e1000e_core.c +++ b/hw/net/e1000e_core.c @@ -165,14 +165,14 @@ e1000e_intrmgr_on_throttling_timer(void *opaque) timer->running = false; -if (msi_enabled(timer->core->owner)) { -trace_e1000e_irq_msi_notify_postponed(); -/* Clear msi_causes_pending to fire MSI eventually */ -timer->core->msi_causes_pending = 0; -e1000e_set_interrupt_cause(timer->core, 0); -} else { -trace_e1000e_irq_legacy_notify_postponed(); -e1000e_set_interrupt_cause(timer->core, 0); +if (timer->core->mac[IMS] & timer->core->mac[ICR]) { +if (msi_enabled(timer->core->owner)) 
{ +trace_e1000e_irq_msi_notify_postponed(); +msi_notify(timer->core->owner, 0); +} else { +trace_e1000e_irq_legacy_notify_postponed(); +e1000e_raise_legacy_irq(timer->core); +} } } @@ -366,10 +366,6 @@ static void e1000e_intrmgr_fire_all_timers(E1000ECore *core) { int i; -uint32_t val = e1000e_intmgr_collect_delayed_causes(core); - -trace_e1000e_irq_adding_delayed_causes(val, core->mac[ICR]); -core->mac[ICR] |= val; if (core->itr.running) { timer_del(core->itr.timer); @@ -1974,13 +1970,6 @@ void(*e1000e_phyreg_writeops[E1000E_PHY_PAGES][E1000E_PHY_PAGE_SIZE]) } }; -static inline void -e1000e_clear_ims_bits(E1000ECore *core, uint32_t bits) -{ -trace_e1000e_irq_clear_ims(bits, core->mac[IMS], core->mac[IMS] & ~bits); -core->mac[IMS] &= ~bits; -} - static inline bool e1000e_postpone_interrupt(E1000IntrDelayTimer *timer) { @@ -2038,7 +2027,6 @@ e1000e_msix_notify_one(E1000ECore *core, uint32_t cause, uint32_t int_cfg) effective_eiac = core->mac[EIAC] & cause; core->mac[ICR] &= ~effective_eiac; -core->msi_causes_pending &= ~effective_eiac; if (!(core->mac[CTRL_EXT] & E1000_CTRL_EXT_IAME)) { core->mac[IMS] &= ~effective_eiac; @@ -2130,33 +2118,17 @@ e1000e_fix_icr_asserted(E1000ECore *core) trace_e1000e_irq_fix_icr_asserted(core->mac[ICR]); } -static void -e1000e_send_msi(E1000ECore *core, bool msix) +static void e1000e_raise_interrupts(E1000ECore *core, +size_t index, uint32_t causes) { -uint32_t causes = core->mac[ICR] & core->mac[IMS] & ~E1000_ICR_ASSERTED; - -core->msi_causes_pending &= causes; -causes ^= core->msi_causes_pending; -if (causes == 0) { -return; -} -core->msi_causes_pending |= causes; +bool is_msix = msix_enabled(core->owner); +uint32_t old_causes = core->mac[IMS] & core->mac[ICR]; +uint32_t raised_causes; -if (msix) { -e1000e_msix_notify(core, causes); -} else { -if (!e1000e_itr_should_postpone(core)) { -trace_e1000e_irq_msi_notify(causes); -msi_notify(core->owner, 0); -} -} -} +trace_e1000e_irq_set(index << 2, + core->mac[index], core->mac[index] | 
causes); -static void -e1000e_update_interrupt_state(E1000ECore *core) -{ -bool interrupts_pending; -bool is_msix = msix_enabled(core->owner); +core->mac[index] |= causes; /* Set ICR[OTHER] for MSI-X */ if (is_msix) { @@ -2178,40 +2150,58 @@ e1000e_update_interrupt_state(E1000ECore *core) */ core->mac[ICS] = core->mac[ICR]; -interrupts_pending = (core->mac[IMS] & core->mac[ICR]) ? true : false; -if (!interrupts_pending) { -core->msi_causes_pending =
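The core of the fix is the set computation in `e1000e_raise_interrupts()`: capture the pending set (`IMS & ICR`) before the register write and notify only the causes that became pending afterwards. As a sketch:

```c
#include <stdint.h>

/*
 * Given the IMS/ICR state before and after a register write, only causes
 * that newly became pending should trigger an MSI/MSI-X notification;
 * causes that were already pending have been notified before.
 */
static uint32_t newly_raised(uint32_t ims_before, uint32_t icr_before,
                             uint32_t ims_after, uint32_t icr_after)
{
    uint32_t old_pending = ims_before & icr_before;
    uint32_t new_pending = ims_after & icr_after;
    return new_pending & ~old_pending;
}
```

This also covers the IMS-write case: unmasking a cause that is already set in ICR makes it newly pending and thus triggers a notification.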
[PATCH v2] tests/unit/test-blockjob: Re-enable complete_in_standby test
Pause the job while draining so that pause_count will be increased and bdrv_drain_all_end() won't reset it to 0, so the job will not continue. With this fix, the test is not flaky anymore. Additionally remove useless aiocontext lock around bdrv_drain_all_end() in test_complete_in_standby(). Fixes: b6903cbe3a2 "tests/unit/test-blockjob: Disable complete_in_standby test" Suggested-by: Hanna Czenczek Signed-off-by: Emanuele Giuseppe Esposito --- tests/unit/test-blockjob.c | 18 +++--- 1 file changed, 7 insertions(+), 11 deletions(-) diff --git a/tests/unit/test-blockjob.c b/tests/unit/test-blockjob.c index a130f6fefb..7c03958feb 100644 --- a/tests/unit/test-blockjob.c +++ b/tests/unit/test-blockjob.c @@ -488,11 +488,14 @@ static void test_complete_in_standby(void) bdrv_drain_all_begin(); assert_job_status_is(job, JOB_STATUS_STANDBY); -/* Lock the IO thread to prevent the job from being run */ -aio_context_acquire(ctx); +/* + * Increase pause_count so that the counter is + * unbalanced and job won't resume + */ +job_pause(job); + /* This will schedule the job to resume it */ bdrv_drain_all_end(); -aio_context_release(ctx); WITH_JOB_LOCK_GUARD() { /* But the job cannot run, so it will remain on standby */ @@ -531,13 +534,6 @@ int main(int argc, char **argv) g_test_add_func("/blockjob/cancel/standby", test_cancel_standby); g_test_add_func("/blockjob/cancel/pending", test_cancel_pending); g_test_add_func("/blockjob/cancel/concluded", test_cancel_concluded); - -/* - * This test is flaky and sometimes fails in CI and otherwise: - * don't run unless user opts in via environment variable. - */ -if (getenv("QEMU_TEST_FLAKY_TESTS")) { -g_test_add_func("/blockjob/complete_in_standby", test_complete_in_standby); -} +g_test_add_func("/blockjob/complete_in_standby", test_complete_in_standby); return g_test_run(); } -- 2.39.1
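The reasoning relies on the job pause counter staying unbalanced: drain begin/end pair up, so the extra job_pause() leaves pause_count at 1 when bdrv_drain_all_end() returns and the job remains on standby. A toy model of that counter (illustrative only, not the real QEMU Job API):

```c
#include <stdbool.h>

/* Toy model of the job pause counter; illustrative only. */
typedef struct { int pause_count; } Job;

static void drain_begin(Job *j) { j->pause_count++; } /* drain pauses jobs */
static void drain_end(Job *j)   { j->pause_count--; } /* ...and resumes them */
static void job_pause(Job *j)   { j->pause_count++; }
static bool job_can_run(Job *j) { return j->pause_count == 0; }
```

With only the drain pair, the counter returns to 0 and the job resumes — the race the test previously hit; the extra pause keeps it non-zero deterministically.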
[PATCH v4 42/48] igb: Implement Tx timestamp
Signed-off-by: Akihiko Odaki Reviewed-by: Sriram Yagnaraman --- hw/net/igb_regs.h | 3 +++ hw/net/igb_core.c | 7 +++ 2 files changed, 10 insertions(+) diff --git a/hw/net/igb_regs.h b/hw/net/igb_regs.h index 894705599d..82ff195dfc 100644 --- a/hw/net/igb_regs.h +++ b/hw/net/igb_regs.h @@ -322,6 +322,9 @@ union e1000_adv_rx_desc { /* E1000_EITR_CNT_IGNR is only for 82576 and newer */ #define E1000_EITR_CNT_IGNR 0x8000 /* Don't reset counters on write */ +#define E1000_TSYNCTXCTL_VALID0x0001 /* tx timestamp valid */ +#define E1000_TSYNCTXCTL_ENABLED 0x0010 /* enable tx timestampping */ + /* PCI Express Control */ #define E1000_GCR_CMPL_TMOUT_MASK 0xF000 #define E1000_GCR_CMPL_TMOUT_10ms 0x1000 diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c index e0433ae7cf..57973c3ae4 100644 --- a/hw/net/igb_core.c +++ b/hw/net/igb_core.c @@ -659,6 +659,13 @@ igb_process_tx_desc(IGBCore *core, tx->ctx[idx].vlan_macip_lens >> IGB_TX_FLAGS_VLAN_SHIFT, !!(tx->first_cmd_type_len & E1000_TXD_CMD_VLE)); +if ((tx->first_cmd_type_len & E1000_ADVTXD_MAC_TSTAMP) && +(core->mac[TSYNCTXCTL] & E1000_TSYNCTXCTL_ENABLED) && +!(core->mac[TSYNCTXCTL] & E1000_TSYNCTXCTL_VALID)) { +core->mac[TSYNCTXCTL] |= E1000_TSYNCTXCTL_VALID; +e1000x_timestamp(core->mac, core->timadj, TXSTMPL, TXSTMPH); +} + if (igb_tx_pkt_send(core, tx, queue_index)) { igb_on_tx_done_update_stats(core, tx->tx_pkt, queue_index); } -- 2.40.0
[PATCH v4 41/48] igb: Implement Rx PTP2 timestamp
Signed-off-by: Akihiko Odaki --- hw/net/igb_common.h | 16 +++--- hw/net/igb_regs.h | 23 hw/net/igb_core.c | 129 3 files changed, 127 insertions(+), 41 deletions(-) diff --git a/hw/net/igb_common.h b/hw/net/igb_common.h index f2a9065791..5c261ba9d3 100644 --- a/hw/net/igb_common.h +++ b/hw/net/igb_common.h @@ -51,7 +51,7 @@ defreg_indexeda(x, 0), defreg_indexeda(x, 1), \ defreg_indexeda(x, 2), defreg_indexeda(x, 3) -#define defregv(x) defreg_indexed(x, 0), defreg_indexed(x, 1), \ +#define defreg8(x) defreg_indexed(x, 0), defreg_indexed(x, 1), \ defreg_indexed(x, 2), defreg_indexed(x, 3), \ defreg_indexed(x, 4), defreg_indexed(x, 5), \ defreg_indexed(x, 6), defreg_indexed(x, 7) @@ -122,6 +122,8 @@ enum { defreg(EICS),defreg(EIMS),defreg(EIMC), defreg(EIAM), defreg(EICR),defreg(IVAR_MISC), defreg(GPIE), +defreg(TSYNCRXCFG), defreg8(ETQF), + defreg(RXPBS), defregd(RDBAL), defregd(RDBAH), defregd(RDLEN), defregd(SRRCTL),defregd(RDH), defregd(RDT), defregd(RXDCTL),defregd(RXCTL), defregd(RQDPC), defreg(RA2), @@ -133,15 +135,15 @@ enum { defreg(VT_CTL), -defregv(P2VMAILBOX), defregv(V2PMAILBOX), defreg(MBVFICR), defreg(MBVFIMR), +defreg8(P2VMAILBOX), defreg8(V2PMAILBOX), defreg(MBVFICR), defreg(MBVFIMR), defreg(VFLRE), defreg(VFRE),defreg(VFTE), defreg(WVBR), defreg(QDE), defreg(DTXSWC), defreg_indexed(VLVF, 0), -defregv(VMOLR), defreg(RPLOLR), defregv(VMBMEM), defregv(VMVIR), +defreg8(VMOLR), defreg(RPLOLR), defreg8(VMBMEM), defreg8(VMVIR), -defregv(PVTCTRL),defregv(PVTEICS),defregv(PVTEIMS), defregv(PVTEIMC), -defregv(PVTEIAC),defregv(PVTEIAM),defregv(PVTEICR), defregv(PVFGPRC), -defregv(PVFGPTC),defregv(PVFGORC),defregv(PVFGOTC), defregv(PVFMPRC), -defregv(PVFGPRLBC), defregv(PVFGPTLBC), defregv(PVFGORLBC), defregv(PVFGOTLBC), +defreg8(PVTCTRL),defreg8(PVTEICS),defreg8(PVTEIMS), defreg8(PVTEIMC), +defreg8(PVTEIAC),defreg8(PVTEIAM),defreg8(PVTEICR), defreg8(PVFGPRC), +defreg8(PVFGPTC),defreg8(PVFGORC),defreg8(PVFGOTC), defreg8(PVFMPRC), +defreg8(PVFGPRLBC), 
defreg8(PVFGPTLBC), defreg8(PVFGORLBC), defreg8(PVFGOTLBC), defreg(MTA_A), diff --git a/hw/net/igb_regs.h b/hw/net/igb_regs.h index 4b4ebd3369..894705599d 100644 --- a/hw/net/igb_regs.h +++ b/hw/net/igb_regs.h @@ -210,6 +210,15 @@ union e1000_adv_rx_desc { #define E1000_DCA_TXCTRL_CPUID_SHIFT 24 /* Tx CPUID now in the last byte */ #define E1000_DCA_RXCTRL_CPUID_SHIFT 24 /* Rx CPUID now in the last byte */ +/* ETQF register bit definitions */ +#define E1000_ETQF_FILTER_ENABLE BIT(26) +#define E1000_ETQF_1588BIT(30) +#define E1000_ETQF_IMM_INT BIT(29) +#define E1000_ETQF_QUEUE_ENABLEBIT(31) +#define E1000_ETQF_QUEUE_SHIFT 16 +#define E1000_ETQF_QUEUE_MASK 0x0007 +#define E1000_ETQF_ETYPE_MASK 0x + #define E1000_DTXSWC_MAC_SPOOF_MASK 0x00FF /* Per VF MAC spoof control */ #define E1000_DTXSWC_VLAN_SPOOF_MASK 0xFF00 /* Per VF VLAN spoof control */ #define E1000_DTXSWC_LLE_MASK 0x00FF /* Per VF Local LB enables */ @@ -384,6 +393,20 @@ union e1000_adv_rx_desc { #define E1000_FRTIMER 0x01048 /* Free Running Timer - RW */ #define E1000_FCRTV 0x02460 /* Flow Control Refresh Timer Value - RW */ +#define E1000_TSYNCRXCFG 0x05F50 /* Time Sync Rx Configuration - RW */ + +/* Filtering Registers */ +#define E1000_SAQF(_n) (0x5980 + 4 * (_n)) +#define E1000_DAQF(_n) (0x59A0 + 4 * (_n)) +#define E1000_SPQF(_n) (0x59C0 + 4 * (_n)) +#define E1000_FTQF(_n) (0x59E0 + 4 * (_n)) +#define E1000_SAQF0 E1000_SAQF(0) +#define E1000_DAQF0 E1000_DAQF(0) +#define E1000_SPQF0 E1000_SPQF(0) +#define E1000_FTQF0 E1000_FTQF(0) +#define E1000_SYNQF(_n) (0x055FC + (4 * (_n))) /* SYN Packet Queue Fltr */ +#define E1000_ETQF(_n) (0x05CB0 + (4 * (_n))) /* EType Queue Fltr */ + #define E1000_RQDPC(_n) (0x0C030 + ((_n) * 0x40)) #define E1000_RXPBS 0x02404 /* Rx Packet Buffer Size - RW */ diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c index c04ec01117..e0433ae7cf 100644 --- a/hw/net/igb_core.c +++ b/hw/net/igb_core.c @@ -72,6 +72,24 @@ typedef struct L2Header { struct vlan_header vlan[2]; } L2Header; 
+typedef struct PTP2 { +uint8_t message_id_transport_specific; +uint8_t version_ptp; +uint16_t message_length; +uint8_t subdomain_number; +uint8_t reserved0; +uint16_t flags; +uint64_t correction; +uint8_t reserved1[5]; +uint8_t source_communication_technology; +uint32_t source_uuid_lo; +uint16_t source_uuid_hi; +uint16_t source_port_id; +uint16_t sequ
[PATCH v4 19/48] igb: Always log status after building rx metadata
Without this change, the status flags may not be traced e.g. if checksum offloading is disabled. Signed-off-by: Akihiko Odaki Reviewed-by: Philippe Mathieu-Daudé --- hw/net/igb_core.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c index 209fdad862..946b917f91 100644 --- a/hw/net/igb_core.c +++ b/hw/net/igb_core.c @@ -1303,9 +1303,8 @@ igb_build_rx_metadata(IGBCore *core, trace_e1000e_rx_metadata_l4_cso_disabled(); } -trace_e1000e_rx_metadata_status_flags(*status_flags); - func_exit: +trace_e1000e_rx_metadata_status_flags(*status_flags); *status_flags = cpu_to_le32(*status_flags); } -- 2.40.0
[PATCH v4 48/48] docs/system/devices/igb: Note igb is tested for DPDK
Signed-off-by: Akihiko Odaki --- docs/system/devices/igb.rst | 12 +++- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/docs/system/devices/igb.rst b/docs/system/devices/igb.rst index afe036dad2..60c10bf7c7 100644 --- a/docs/system/devices/igb.rst +++ b/docs/system/devices/igb.rst @@ -14,7 +14,8 @@ Limitations === This igb implementation was tested with Linux Test Project [2]_ and Windows HLK -[3]_ during the initial development. The command used when testing with LTP is: +[3]_ during the initial development. Later it was also tested with DPDK Test +Suite [4]_. The command used when testing with LTP is: .. code-block:: shell @@ -22,8 +23,8 @@ This igb implementation was tested with Linux Test Project [2]_ and Windows HLK Be aware that this implementation lacks many functionalities available with the actual hardware, and you may experience various failures if you try to use it -with a different operating system other than Linux and Windows or if you try -functionalities not covered by the tests. +with a different operating system other than DPDK, Linux, and Windows or if you +try functionalities not covered by the tests. Using igb = @@ -32,7 +33,7 @@ Using igb should be nothing different from using another network device. See :ref:`pcsys_005fnetwork` in general. However, you may also need to perform additional steps to activate SR-IOV -feature on your guest. For Linux, refer to [4]_. +feature on your guest. For Linux, refer to [5]_. Developing igb == @@ -68,4 +69,5 @@ References .. [1] https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/82576eb-gigabit-ethernet-controller-datasheet.pdf .. [2] https://github.com/linux-test-project/ltp .. [3] https://learn.microsoft.com/en-us/windows-hardware/test/hlk/ -.. [4] https://docs.kernel.org/PCI/pci-iov-howto.html +.. [4] https://doc.dpdk.org/dts/gsg/ +.. [5] https://docs.kernel.org/PCI/pci-iov-howto.html -- 2.40.0
Re: [PATCH 01/13] hw/ide/pci: Expose legacy interrupts as GPIOs
On 22/04/2023 16:07, Bernhard Beschow wrote: Exposing the legacy IDE interrupts as GPIOs allows them to be connected in the parent device through qdev_connect_gpio_out(), i.e. without accessing private data of TYPE_PCI_IDE. Signed-off-by: Bernhard Beschow --- hw/ide/pci.c | 8 1 file changed, 8 insertions(+) diff --git a/hw/ide/pci.c b/hw/ide/pci.c index fc9224bbc9..942e216b9b 100644 --- a/hw/ide/pci.c +++ b/hw/ide/pci.c @@ -522,10 +522,18 @@ void bmdma_init(IDEBus *bus, BMDMAState *bm, PCIIDEState *d) bm->pci_dev = d; } +static void pci_ide_init(Object *obj) +{ +PCIIDEState *d = PCI_IDE(obj); + +qdev_init_gpio_out(DEVICE(d), d->isa_irq, ARRAY_SIZE(d->isa_irq)); Just one minor nit: can we make this qdev_init_gpio_out_named() and call it "isa-irq" to match? This is for 2 reasons: firstly these are PCI devices and so an unnamed IRQ/gpio could be considered to belong to PCI, and secondly it gives the gpio the same name as the struct field. From my previous email I think this should supercede Phil's patch at https://patchew.org/QEMU/20230302224058.43315-1-phi...@linaro.org/20230302224058.43315-2-phi...@linaro.org/. +} + static const TypeInfo pci_ide_type_info = { .name = TYPE_PCI_IDE, .parent = TYPE_PCI_DEVICE, .instance_size = sizeof(PCIIDEState), +.instance_init = pci_ide_init, .abstract = true, .interfaces = (InterfaceInfo[]) { { INTERFACE_CONVENTIONAL_PCI_DEVICE }, Otherwise: Reviewed-by: Mark Cave-Ayland ATB, Mark.
[PATCH v4 38/48] igb: Strip the second VLAN tag for extended VLAN
Signed-off-by: Akihiko Odaki --- hw/net/net_rx_pkt.h | 19 include/net/eth.h| 4 ++-- hw/net/e1000e_core.c | 3 ++- hw/net/igb_core.c| 14 ++-- hw/net/net_rx_pkt.c | 15 + net/eth.c| 52 6 files changed, 65 insertions(+), 42 deletions(-) diff --git a/hw/net/net_rx_pkt.h b/hw/net/net_rx_pkt.h index ce8dbdb284..55ec67a1a7 100644 --- a/hw/net/net_rx_pkt.h +++ b/hw/net/net_rx_pkt.h @@ -223,18 +223,19 @@ void net_rx_pkt_attach_iovec(struct NetRxPkt *pkt, /** * attach scatter-gather data to rx packet * -* @pkt:packet -* @iov:received data scatter-gather list -* @iovcnt number of elements in iov -* @iovoff data start offset in the iov -* @strip_vlan: should the module strip vlan from data -* @vet:VLAN tag Ethernet type +* @pkt: packet +* @iov: received data scatter-gather list +* @iovcnt: number of elements in iov +* @iovoff: data start offset in the iov +* @strip_vlan_index: index of Q tag if it is to be stripped. negative otherwise. +* @vet: VLAN tag Ethernet type +* @vet_ext: outer VLAN tag Ethernet type * */ void net_rx_pkt_attach_iovec_ex(struct NetRxPkt *pkt, - const struct iovec *iov, int iovcnt, - size_t iovoff, bool strip_vlan, - uint16_t vet); +const struct iovec *iov, int iovcnt, +size_t iovoff, int strip_vlan_index, +uint16_t vet, uint16_t vet_ext); /** * attach data to rx packet diff --git a/include/net/eth.h b/include/net/eth.h index 75e7f1551c..3b80b6e07f 100644 --- a/include/net/eth.h +++ b/include/net/eth.h @@ -347,8 +347,8 @@ eth_strip_vlan(const struct iovec *iov, int iovcnt, size_t iovoff, uint16_t *payload_offset, uint16_t *tci); size_t -eth_strip_vlan_ex(const struct iovec *iov, int iovcnt, size_t iovoff, - uint16_t vet, void *new_ehdr_buf, +eth_strip_vlan_ex(const struct iovec *iov, int iovcnt, size_t iovoff, int index, + uint16_t vet, uint16_t vet_ext, void *new_ehdr_buf, uint16_t *payload_offset, uint16_t *tci); uint16_t diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c index 0b939ff5a3..d601386992 100644 --- a/hw/net/e1000e_core.c +++ 
b/hw/net/e1000e_core.c @@ -1711,7 +1711,8 @@ e1000e_receive_internal(E1000ECore *core, const struct iovec *iov, int iovcnt, } net_rx_pkt_attach_iovec_ex(core->rx_pkt, iov, iovcnt, iov_ofs, - e1000x_vlan_enabled(core->mac), core->mac[VET]); + e1000x_vlan_enabled(core->mac) ? 0 : -1, + core->mac[VET], 0); e1000e_rss_parse_packet(core, core->rx_pkt, &rss_info); e1000e_rx_ring_init(core, &rxr, rss_info.queue); diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c index 5eacf1cd8c..688eaf7319 100644 --- a/hw/net/igb_core.c +++ b/hw/net/igb_core.c @@ -1611,6 +1611,7 @@ igb_receive_internal(IGBCore *core, const struct iovec *iov, int iovcnt, E1000E_RxRing rxr; E1000E_RSSInfo rss_info; size_t total_size; +int strip_vlan_index; int i; trace_e1000e_rx_receive_iov(iovcnt); @@ -1672,9 +1673,18 @@ igb_receive_internal(IGBCore *core, const struct iovec *iov, int iovcnt, igb_rx_ring_init(core, &rxr, i); +if (!igb_rx_strip_vlan(core, rxr.i)) { +strip_vlan_index = -1; +} else if (core->mac[CTRL_EXT] & BIT(26)) { +strip_vlan_index = 1; +} else { +strip_vlan_index = 0; +} + net_rx_pkt_attach_iovec_ex(core->rx_pkt, iov, iovcnt, iov_ofs, - igb_rx_strip_vlan(core, rxr.i), - core->mac[VET] & 0x); + strip_vlan_index, + core->mac[VET] & 0x, + core->mac[VET] >> 16); total_size = net_rx_pkt_get_total_len(core->rx_pkt) + e1000x_fcs_len(core->mac); diff --git a/hw/net/net_rx_pkt.c b/hw/net/net_rx_pkt.c index 3575c8b9f9..32e5f3f9cf 100644 --- a/hw/net/net_rx_pkt.c +++ b/hw/net/net_rx_pkt.c @@ -137,20 +137,17 @@ void net_rx_pkt_attach_iovec(struct NetRxPkt *pkt, void net_rx_pkt_attach_iovec_ex(struct NetRxPkt *pkt, const struct iovec *iov, int iovcnt, -size_t iovoff, bool strip_vlan, -uint16_t vet) +size_t iovoff, int strip_vlan_index, +uint16_t vet, uint16_t vet_ext) { uint16_t tci = 0; uint16_t ploff = iovoff; assert(pkt); -if (strip_vlan) { -pkt->
[PATCH v4 22/48] e1000e: Reset packet state after emptying Tx queue
Keeping Tx packet state after the transmit queue is emptied has some problems: - The datasheet says the descriptors can be reused after the transmit queue is emptied, but the Tx packet state may keep references to them. - The Tx packet state cannot be migrated so it can be reset anytime the migration happens. Always reset the Tx packet state after the queue is emptied. Signed-off-by: Akihiko Odaki --- hw/net/e1000e_core.c | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c index 6a213c0224..7dce448657 100644 --- a/hw/net/e1000e_core.c +++ b/hw/net/e1000e_core.c @@ -959,6 +959,8 @@ e1000e_start_xmit(E1000ECore *core, const E1000E_TxRing *txr) if (!ide || !e1000e_intrmgr_delay_tx_causes(core, &cause)) { e1000e_set_interrupt_cause(core, cause); } + +net_tx_pkt_reset(txr->tx->tx_pkt, net_tx_pkt_unmap_frag_pci, core->owner); } static bool @@ -3389,8 +3391,6 @@ e1000e_core_pci_uninit(E1000ECore *core) qemu_del_vm_change_state_handler(core->vmstate); for (i = 0; i < E1000E_NUM_QUEUES; i++) { -net_tx_pkt_reset(core->tx[i].tx_pkt, - net_tx_pkt_unmap_frag_pci, core->owner); net_tx_pkt_uninit(core->tx[i].tx_pkt); } @@ -3515,8 +3515,6 @@ static void e1000e_reset(E1000ECore *core, bool sw) e1000x_reset_mac_addr(core->owner_nic, core->mac, core->permanent_mac); for (i = 0; i < ARRAY_SIZE(core->tx); i++) { -net_tx_pkt_reset(core->tx[i].tx_pkt, - net_tx_pkt_unmap_frag_pci, core->owner); memset(&core->tx[i].props, 0, sizeof(core->tx[i].props)); core->tx[i].skip_cp = false; } -- 2.40.0
[PATCH v4 47/48] MAINTAINERS: Add a reviewer for network packet abstractions
I have made significant changes to the network packet abstractions, so add me as a reviewer. Signed-off-by: Akihiko Odaki Reviewed-by: Philippe Mathieu-Daudé --- MAINTAINERS | 1 + 1 file changed, 1 insertion(+) diff --git a/MAINTAINERS b/MAINTAINERS index c31d2279ab..8b2ef5943c 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -2214,6 +2214,7 @@ F: tests/qtest/fuzz-megasas-test.c Network packet abstractions M: Dmitry Fleytman +R: Akihiko Odaki S: Maintained F: include/net/eth.h F: net/eth.c -- 2.40.0
[PATCH v4 40/48] igb: Implement igb-specific oversize check
igb has a configurable size limit for LPE, and uses different limits depending on whether the packet is treated as a VLAN packet. Signed-off-by: Akihiko Odaki Reviewed-by: Sriram Yagnaraman --- hw/net/igb_core.c | 36 +--- 1 file changed, 21 insertions(+), 15 deletions(-) diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c index 5345f57031..c04ec01117 100644 --- a/hw/net/igb_core.c +++ b/hw/net/igb_core.c @@ -980,16 +980,13 @@ igb_rx_l4_cso_enabled(IGBCore *core) return !!(core->mac[RXCSUM] & E1000_RXCSUM_TUOFLD); } -static bool -igb_rx_is_oversized(IGBCore *core, uint16_t qn, size_t size) +static bool igb_rx_is_oversized(IGBCore *core, const struct eth_header *ehdr, +size_t size, size_t vlan_num, +bool lpe, uint16_t rlpml) { -uint16_t pool = qn % IGB_NUM_VM_POOLS; -bool lpe = !!(core->mac[VMOLR0 + pool] & E1000_VMOLR_LPE); -int max_ethernet_lpe_size = -core->mac[VMOLR0 + pool] & E1000_VMOLR_RLPML_MASK; -int max_ethernet_vlan_size = 1522; - -return size > (lpe ? max_ethernet_lpe_size : max_ethernet_vlan_size); +size_t vlan_header_size = sizeof(struct vlan_header) * vlan_num; +size_t header_size = sizeof(struct eth_header) + vlan_header_size; +return lpe ? 
size + ETH_FCS_LEN > rlpml : size > header_size + ETH_MTU; } static uint16_t igb_receive_assign(IGBCore *core, const L2Header *l2_header, @@ -1002,6 +999,8 @@ static uint16_t igb_receive_assign(IGBCore *core, const L2Header *l2_header, uint16_t queues = 0; uint16_t oversized = 0; size_t vlan_num = 0; +bool lpe; +uint16_t rlpml; int i; memset(rss_info, 0, sizeof(E1000E_RSSInfo)); @@ -1021,6 +1020,14 @@ static uint16_t igb_receive_assign(IGBCore *core, const L2Header *l2_header, } } +lpe = !!(core->mac[RCTL] & E1000_RCTL_LPE); +rlpml = core->mac[RLPML]; +if (!(core->mac[RCTL] & E1000_RCTL_SBP) && +igb_rx_is_oversized(core, ehdr, size, vlan_num, lpe, rlpml)) { +trace_e1000x_rx_oversized(size); +return queues; +} + if (vlan_num && !e1000x_rx_vlan_filter(core->mac, l2_header->vlan + vlan_num - 1)) { return queues; @@ -1106,7 +1113,11 @@ static uint16_t igb_receive_assign(IGBCore *core, const L2Header *l2_header, queues &= core->mac[VFRE]; if (queues) { for (i = 0; i < IGB_NUM_VM_POOLS; i++) { -if ((queues & BIT(i)) && igb_rx_is_oversized(core, i, size)) { +lpe = !!(core->mac[VMOLR0 + i] & E1000_VMOLR_LPE); +rlpml = core->mac[VMOLR0 + i] & E1000_VMOLR_RLPML_MASK; +if ((queues & BIT(i)) && +igb_rx_is_oversized(core, ehdr, size, vlan_num, +lpe, rlpml)) { oversized |= BIT(i); } } @@ -1662,11 +1673,6 @@ igb_receive_internal(IGBCore *core, const struct iovec *iov, int iovcnt, iov_to_buf(iov, iovcnt, iov_ofs, &buf, sizeof(buf.l2_header)); } -/* Discard oversized packets if !LPE and !SBP. */ -if (e1000x_is_oversized(core->mac, size)) { -return orig_size; -} - net_rx_pkt_set_packet_type(core->rx_pkt, get_eth_packet_type(&buf.l2_header.eth)); net_rx_pkt_set_protocols(core->rx_pkt, iov, iovcnt, iov_ofs); -- 2.40.0
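The new oversize rule in the patch above boils down to one comparison. A standalone sketch (hypothetical helper; the constants are the usual Ethernet sizes, defined locally rather than taken from QEMU headers):

```c
/*
 * Illustrative sketch of the oversize rule in the patch above; a
 * hypothetical standalone helper with the usual Ethernet constants
 * defined locally rather than taken from QEMU headers.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_HLEN    14   /* Ethernet header */
#define VLAN_HLEN    4   /* one 802.1Q tag */
#define ETH_FCS_LEN  4   /* frame check sequence */
#define ETH_MTU   1500

/* With LPE, the configurable RLPML limit covers frame + FCS; without it,
 * the limit is the standard MTU plus the L2 header including VLAN tags. */
static bool rx_is_oversized(size_t size, size_t vlan_num, bool lpe,
                            uint16_t rlpml)
{
    size_t header_size = ETH_HLEN + vlan_num * VLAN_HLEN;
    return lpe ? size + ETH_FCS_LEN > rlpml : size > header_size + ETH_MTU;
}
```

With one VLAN tag the non-LPE limit grows from 1514 to 1518 bytes on the wire (1522 including FCS), matching the `max_ethernet_vlan_size = 1522` figure the old code hard-coded.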
[PATCH v4 39/48] igb: Filter with the second VLAN tag for extended VLAN
Signed-off-by: Akihiko Odaki --- hw/net/igb_core.c | 23 ++- 1 file changed, 18 insertions(+), 5 deletions(-) diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c index 688eaf7319..5345f57031 100644 --- a/hw/net/igb_core.c +++ b/hw/net/igb_core.c @@ -69,7 +69,7 @@ typedef struct IGBTxPktVmdqCallbackContext { typedef struct L2Header { struct eth_header eth; -struct vlan_header vlan; +struct vlan_header vlan[2]; } L2Header; static ssize_t @@ -1001,7 +1001,7 @@ static uint16_t igb_receive_assign(IGBCore *core, const L2Header *l2_header, uint32_t f, ra[2], *macp, rctl = core->mac[RCTL]; uint16_t queues = 0; uint16_t oversized = 0; -uint16_t vid = be16_to_cpu(l2_header->vlan.h_tci) & VLAN_VID_MASK; +size_t vlan_num = 0; int i; memset(rss_info, 0, sizeof(E1000E_RSSInfo)); @@ -1010,8 +1010,19 @@ static uint16_t igb_receive_assign(IGBCore *core, const L2Header *l2_header, *external_tx = true; } -if (e1000x_is_vlan_packet(ehdr, core->mac[VET] & 0x) && -!e1000x_rx_vlan_filter(core->mac, PKT_GET_VLAN_HDR(ehdr))) { +if (core->mac[CTRL_EXT] & BIT(26)) { +if (be16_to_cpu(ehdr->h_proto) == core->mac[VET] >> 16 && +be16_to_cpu(l2_header->vlan[0].h_proto) == (core->mac[VET] & 0x)) { +vlan_num = 2; +} +} else { +if (be16_to_cpu(ehdr->h_proto) == (core->mac[VET] & 0x)) { +vlan_num = 1; +} +} + +if (vlan_num && +!e1000x_rx_vlan_filter(core->mac, l2_header->vlan + vlan_num - 1)) { return queues; } @@ -1065,7 +1076,9 @@ static uint16_t igb_receive_assign(IGBCore *core, const L2Header *l2_header, if (e1000x_vlan_rx_filter_enabled(core->mac)) { uint16_t mask = 0; -if (e1000x_is_vlan_packet(ehdr, core->mac[VET] & 0x)) { +if (vlan_num) { +uint16_t vid = be16_to_cpu(l2_header->vlan[vlan_num - 1].h_tci) & VLAN_VID_MASK; + for (i = 0; i < E1000_VLVF_ARRAY_SIZE; i++) { if ((core->mac[VLVF0 + i] & E1000_VLVF_VLANID_MASK) == vid && (core->mac[VLVF0 + i] & E1000_VLVF_VLANID_ENABLE)) { -- 2.40.0
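The tag-counting logic in the diff above can be summarized as a single rule — a sketch under the assumption, taken from the diff, that the low 16 bits of VET hold the inner VLAN EtherType and the high 16 bits the outer one (hypothetical standalone helper):

```c
/*
 * Sketch of the tag-counting rule in the diff above, assuming (as the
 * patch does) that the low 16 bits of VET hold the inner VLAN EtherType
 * and the high 16 bits the outer one; hypothetical standalone helper.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static size_t vlan_tag_count(bool extended, uint16_t outer_proto,
                             uint16_t inner_proto, uint32_t vet)
{
    uint16_t vet_inner = vet & 0xffff;
    uint16_t vet_outer = vet >> 16;

    if (extended) {
        /* Extended VLAN (CTRL_EXT bit 26): both tags must match. */
        return (outer_proto == vet_outer && inner_proto == vet_inner) ? 2 : 0;
    }
    /* Normal mode: a single tag matching the inner EtherType. */
    return outer_proto == vet_inner ? 1 : 0;
}
```

The filter then checks `vlan[vlan_num - 1]`, i.e. the innermost matched tag, which is exactly what `e1000x_rx_vlan_filter()` receives in the diff.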
[PATCH v4 03/48] e1000x: Fix BPRC and MPRC
Before this change, e1000 and the common code updated BPRC and MPRC depending on the matched filter, but e1000e and igb decided to update those counters by deriving the packet type independently. This inconsistency caused a multicast packet to be counted twice. Updating BPRC and MPRC depending on the matched filter is fundamentally flawed anyway, as a filter can be used for different types of packets. For example, it is possible to filter broadcast packets with MTA. Always determine what counters to update by inspecting the packets. Fixes: 3b27430177 ("e1000: Implementing various counters") Signed-off-by: Akihiko Odaki Reviewed-by: Sriram Yagnaraman --- hw/net/e1000x_common.h | 5 +++-- hw/net/e1000.c | 6 +++--- hw/net/e1000e_core.c | 20 +++- hw/net/e1000x_common.c | 25 +++-- hw/net/igb_core.c | 22 +- 5 files changed, 33 insertions(+), 45 deletions(-) diff --git a/hw/net/e1000x_common.h b/hw/net/e1000x_common.h index 911abd8a90..0298e06283 100644 --- a/hw/net/e1000x_common.h +++ b/hw/net/e1000x_common.h @@ -91,8 +91,9 @@ e1000x_update_regs_on_link_up(uint32_t *mac, uint16_t *phy) } void e1000x_update_rx_total_stats(uint32_t *mac, - size_t data_size, - size_t data_fcs_size); + eth_pkt_types_e pkt_type, + size_t pkt_size, + size_t pkt_fcs_size); void e1000x_core_prepare_eeprom(uint16_t *eeprom, const uint16_t *templ, diff --git a/hw/net/e1000.c b/hw/net/e1000.c index 59bacb5d3b..18eb6d8876 100644 --- a/hw/net/e1000.c +++ b/hw/net/e1000.c @@ -826,12 +826,10 @@ receive_filter(E1000State *s, const uint8_t *buf, int size) } if (ismcast && (rctl & E1000_RCTL_MPE)) { /* promiscuous mcast */ -e1000x_inc_reg_if_not_full(s->mac_reg, MPRC); return 1; } if (isbcast && (rctl & E1000_RCTL_BAM)) { /* broadcast enabled */ -e1000x_inc_reg_if_not_full(s->mac_reg, BPRC); return 1; } @@ -922,6 +920,7 @@ e1000_receive_iov(NetClientState *nc, const struct iovec *iov, int iovcnt) size_t desc_offset; size_t desc_size; size_t total_size; +eth_pkt_types_e pkt_type; if (!e1000x_hw_rx_enabled(s->mac_reg)) {
return -1; @@ -971,6 +970,7 @@ e1000_receive_iov(NetClientState *nc, const struct iovec *iov, int iovcnt) size -= 4; } +pkt_type = get_eth_packet_type(PKT_GET_ETH_HDR(filter_buf)); rdh_start = s->mac_reg[RDH]; desc_offset = 0; total_size = size + e1000x_fcs_len(s->mac_reg); @@ -1036,7 +1036,7 @@ e1000_receive_iov(NetClientState *nc, const struct iovec *iov, int iovcnt) } } while (desc_offset < total_size); -e1000x_update_rx_total_stats(s->mac_reg, size, total_size); +e1000x_update_rx_total_stats(s->mac_reg, pkt_type, size, total_size); n = E1000_ICS_RXT0; if ((rdt = s->mac_reg[RDT]) < s->mac_reg[RDH]) diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c index 15821a75e0..c2d864a504 100644 --- a/hw/net/e1000e_core.c +++ b/hw/net/e1000e_core.c @@ -1488,24 +1488,10 @@ e1000e_write_to_rx_buffers(E1000ECore *core, } static void -e1000e_update_rx_stats(E1000ECore *core, - size_t data_size, - size_t data_fcs_size) +e1000e_update_rx_stats(E1000ECore *core, size_t pkt_size, size_t pkt_fcs_size) { -e1000x_update_rx_total_stats(core->mac, data_size, data_fcs_size); - -switch (net_rx_pkt_get_packet_type(core->rx_pkt)) { -case ETH_PKT_BCAST: -e1000x_inc_reg_if_not_full(core->mac, BPRC); -break; - -case ETH_PKT_MCAST: -e1000x_inc_reg_if_not_full(core->mac, MPRC); -break; - -default: -break; -} +eth_pkt_types_e pkt_type = net_rx_pkt_get_packet_type(core->rx_pkt); +e1000x_update_rx_total_stats(core->mac, pkt_type, pkt_size, pkt_fcs_size); } static inline bool diff --git a/hw/net/e1000x_common.c b/hw/net/e1000x_common.c index 4c8e7dcf70..7694673bcc 100644 --- a/hw/net/e1000x_common.c +++ b/hw/net/e1000x_common.c @@ -80,7 +80,6 @@ bool e1000x_rx_group_filter(uint32_t *mac, const uint8_t *buf) f = mta_shift[(rctl >> E1000_RCTL_MO_SHIFT) & 3]; f = (((buf[5] << 8) | buf[4]) >> f) & 0xfff; if (mac[MTA + (f >> 5)] & (1 << (f & 0x1f))) { -e1000x_inc_reg_if_not_full(mac, MPRC); return true; } @@ -212,13 +211,14 @@ e1000x_rxbufsize(uint32_t rctl) void 
e1000x_update_rx_total_stats(uint32_t *mac, - size_t data_size, - size_t data_fcs_size) + eth_pkt_types_e pkt_type, + size_t pkt_size, + size_t pkt_fcs_size) { static const int PRCregs[6] = { PRC64, PRC127, PRC255, PRC511,
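The rule the patch above adopts — classify by inspecting the packet rather than by which receive filter matched — can be sketched as follows (hypothetical helper; the QEMU code uses `get_eth_packet_type()` / `net_rx_pkt_get_packet_type()` for this):

```c
/*
 * Sketch of the rule the patch adopts: derive the counter to bump by
 * inspecting the destination MAC of the packet itself, never from which
 * receive filter happened to match. Hypothetical helper; QEMU's actual
 * code uses get_eth_packet_type()/net_rx_pkt_get_packet_type().
 */
#include <assert.h>
#include <stdint.h>

typedef enum { PKT_UCAST, PKT_MCAST, PKT_BCAST } pkt_type_e;

static pkt_type_e classify_eth_dst(const uint8_t dst[6])
{
    /* Broadcast is the all-ones address... */
    if ((dst[0] & dst[1] & dst[2] & dst[3] & dst[4] & dst[5]) == 0xff) {
        return PKT_BCAST;
    }
    /* ...and any other address with the I/G bit set is multicast. */
    return (dst[0] & 1) ? PKT_MCAST : PKT_UCAST;
}
```

This avoids the double counting the commit message describes: a multicast frame is classified once, regardless of whether it was accepted by MTA, promiscuous mode, or an exact-match filter.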
Re: [PATCH 02/13] hw/ide/via: Implement ISA IRQ routing
On 22/04/2023 16:07, Bernhard Beschow wrote: The VIA south bridge allows the legacy IDE interrupts to be routed to four different ISA interrupts. This can be configured through the 0x4a register in the PCI configuration space of the ISA function. The default routing matches the legacy ISA IRQs, that is 14 and 15. Implement this missing piece of the VIA south bridge. Signed-off-by: Bernhard Beschow --- hw/ide/via.c | 6 -- hw/isa/vt82c686.c | 17 + 2 files changed, 21 insertions(+), 2 deletions(-) diff --git a/hw/ide/via.c b/hw/ide/via.c index 177baea9a7..0caae52276 100644 --- a/hw/ide/via.c +++ b/hw/ide/via.c @@ -31,6 +31,7 @@ #include "sysemu/dma.h" #include "hw/isa/vt82c686.h" #include "hw/ide/pci.h" +#include "hw/irq.h" #include "trace.h" static uint64_t bmdma_read(void *opaque, hwaddr addr, @@ -104,7 +105,8 @@ static void bmdma_setup_bar(PCIIDEState *d) static void via_ide_set_irq(void *opaque, int n, int level) { -PCIDevice *d = PCI_DEVICE(opaque); +PCIIDEState *s = opaque; +PCIDevice *d = PCI_DEVICE(s); if (level) { d->config[0x70 + n * 8] |= 0x80; @@ -112,7 +114,7 @@ static void via_ide_set_irq(void *opaque, int n, int level) d->config[0x70 + n * 8] &= ~0x80; } -via_isa_set_irq(pci_get_function_0(d), 14 + n, level); +qemu_set_irq(s->isa_irq[n], level); } static void via_ide_reset(DeviceState *dev) diff --git a/hw/isa/vt82c686.c b/hw/isa/vt82c686.c index ca89119ce0..c7e29bb46a 100644 --- a/hw/isa/vt82c686.c +++ b/hw/isa/vt82c686.c @@ -568,9 +568,19 @@ static const VMStateDescription vmstate_via = { } }; +static void via_isa_set_ide_irq(void *opaque, int n, int level) +{ +static const uint8_t irqs[] = { 14, 15, 10, 11 }; +ViaISAState *s = opaque; +uint8_t irq = irqs[(s->dev.config[0x4a] >> (n * 2)) & 0x3]; + +qemu_set_irq(s->isa_irqs_in[irq], level); +} + static void via_isa_init(Object *obj) { ViaISAState *s = VIA_ISA(obj); +DeviceState *dev = DEVICE(s); object_initialize_child(obj, "rtc", &s->rtc, TYPE_MC146818_RTC); object_initialize_child(obj, "ide", 
&s->ide, TYPE_VIA_IDE); @@ -578,6 +588,8 @@ static void via_isa_init(Object *obj) object_initialize_child(obj, "uhci2", &s->uhci[1], TYPE_VT82C686B_USB_UHCI); object_initialize_child(obj, "ac97", &s->ac97, TYPE_VIA_AC97); object_initialize_child(obj, "mc97", &s->mc97, TYPE_VIA_MC97); + +qdev_init_gpio_in_named(dev, via_isa_set_ide_irq, "ide", ARRAY_SIZE(s->ide.isa_irq)); } static const TypeInfo via_isa_info = { @@ -692,6 +704,10 @@ static void via_isa_realize(PCIDevice *d, Error **errp) if (!qdev_realize(DEVICE(&s->ide), BUS(pci_bus), errp)) { return; } +for (i = 0; i < 2; i++) { +qdev_connect_gpio_out(DEVICE(&s->ide), i, + qdev_get_gpio_in_named(DEVICE(s), "ide", i)); +} /* Functions 2-3: USB Ports */ for (i = 0; i < ARRAY_SIZE(s->uhci); i++) { @@ -814,6 +830,7 @@ static void vt8231_isa_reset(DeviceState *dev) PCI_COMMAND_MASTER | PCI_COMMAND_SPECIAL); pci_set_word(pci_conf + PCI_STATUS, PCI_STATUS_DEVSEL_MEDIUM); +pci_conf[0x4a] = 0x04; /* IDE interrupt Routing */ pci_conf[0x58] = 0x40; /* Miscellaneous Control 0 */ pci_conf[0x67] = 0x08; /* Fast IR Config */ pci_conf[0x6b] = 0x01; /* Fast IR I/O Base */ I see there is still some further discussion on the exact datasheet being used, however the basic mechanism of wiring up the IDE IRQs using qdev_connect_gpio_out{_named}() in via_isa_realize(): Reviewed-by: Mark Cave-Ayland ATB, Mark.
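The routing scheme reviewed above — a 2-bit field per IDE channel in ISA-function config register 0x4a selecting one of four ISA IRQs — can be sketched in isolation (hypothetical helper mirroring the lookup in `via_isa_set_ide_irq()`):

```c
/*
 * Standalone sketch of the routing described above: each IDE channel has
 * a 2-bit field in ISA-function config register 0x4a selecting one of
 * four ISA IRQs. Hypothetical helper mirroring via_isa_set_ide_irq().
 */
#include <assert.h>
#include <stdint.h>

static uint8_t via_ide_isa_irq(uint8_t reg4a, int channel)
{
    static const uint8_t irqs[] = { 14, 15, 10, 11 };

    return irqs[(reg4a >> (channel * 2)) & 0x3];
}
```

With the VT8231 reset value 0x04 from the diff, channel 0 routes to IRQ 14 and channel 1 to IRQ 15 — the legacy defaults the commit message mentions.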
Re: [PATCH 03/13] hw/isa/vt82c686: Remove via_isa_set_irq()
On 22/04/2023 16:07, Bernhard Beschow wrote: Now that via_isa_set_irq() is unused it can be removed. Signed-off-by: Bernhard Beschow --- include/hw/isa/vt82c686.h | 2 -- hw/isa/vt82c686.c | 6 -- 2 files changed, 8 deletions(-) diff --git a/include/hw/isa/vt82c686.h b/include/hw/isa/vt82c686.h index da1722daf2..b6e95b2851 100644 --- a/include/hw/isa/vt82c686.h +++ b/include/hw/isa/vt82c686.h @@ -34,6 +34,4 @@ struct ViaAC97State { uint32_t ac97_cmd; }; -void via_isa_set_irq(PCIDevice *d, int n, int level); - #endif diff --git a/hw/isa/vt82c686.c b/hw/isa/vt82c686.c index c7e29bb46a..a69888a396 100644 --- a/hw/isa/vt82c686.c +++ b/hw/isa/vt82c686.c @@ -604,12 +604,6 @@ static const TypeInfo via_isa_info = { }, }; -void via_isa_set_irq(PCIDevice *d, int n, int level) -{ -ViaISAState *s = VIA_ISA(d); -qemu_set_irq(s->isa_irqs_in[n], level); -} - static void via_isa_request_i8259_irq(void *opaque, int irq, int level) { ViaISAState *s = opaque; Reviewed-by: Mark Cave-Ayland ATB, Mark.
Re: [PATCH 04/13] hw/ide: Extract IDEBus assignment into bmdma_init()
On 22/04/2023 16:07, Bernhard Beschow wrote: Every invocation of bmdma_init() is followed by `d->bmdma[i].bus = &d->bus[i]`. Resolve this redundancy by extracting it into bmdma_init(). Signed-off-by: Bernhard Beschow --- hw/ide/cmd646.c | 1 - hw/ide/pci.c | 1 + hw/ide/piix.c| 1 - hw/ide/sii3112.c | 1 - hw/ide/via.c | 1 - 5 files changed, 1 insertion(+), 4 deletions(-) diff --git a/hw/ide/cmd646.c b/hw/ide/cmd646.c index a68357c1c5..a094a6e12a 100644 --- a/hw/ide/cmd646.c +++ b/hw/ide/cmd646.c @@ -297,7 +297,6 @@ static void pci_cmd646_ide_realize(PCIDevice *dev, Error **errp) ide_bus_init_output_irq(&d->bus[i], qdev_get_gpio_in(ds, i)); bmdma_init(&d->bus[i], &d->bmdma[i], d); -d->bmdma[i].bus = &d->bus[i]; ide_bus_register_restart_cb(&d->bus[i]); } } diff --git a/hw/ide/pci.c b/hw/ide/pci.c index 942e216b9b..67e0998ff0 100644 --- a/hw/ide/pci.c +++ b/hw/ide/pci.c @@ -519,6 +519,7 @@ void bmdma_init(IDEBus *bus, BMDMAState *bm, PCIIDEState *d) bus->dma = &bm->dma; bm->irq = bus->irq; bus->irq = qemu_allocate_irq(bmdma_irq, bm, 0); +bm->bus = bus; bm->pci_dev = d; } diff --git a/hw/ide/piix.c b/hw/ide/piix.c index 41d60921e3..a32f7ccece 100644 --- a/hw/ide/piix.c +++ b/hw/ide/piix.c @@ -144,7 +144,6 @@ static bool pci_piix_init_bus(PCIIDEState *d, unsigned i, Error **errp) ide_bus_init_output_irq(&d->bus[i], isa_get_irq(NULL, port_info[i].isairq)); bmdma_init(&d->bus[i], &d->bmdma[i], d); -d->bmdma[i].bus = &d->bus[i]; ide_bus_register_restart_cb(&d->bus[i]); return true; diff --git a/hw/ide/sii3112.c b/hw/ide/sii3112.c index f9becdff8e..5dd3d03c29 100644 --- a/hw/ide/sii3112.c +++ b/hw/ide/sii3112.c @@ -287,7 +287,6 @@ static void sii3112_pci_realize(PCIDevice *dev, Error **errp) ide_bus_init_output_irq(&s->bus[i], qdev_get_gpio_in(ds, i)); bmdma_init(&s->bus[i], &s->bmdma[i], s); -s->bmdma[i].bus = &s->bus[i]; ide_bus_register_restart_cb(&s->bus[i]); } } diff --git a/hw/ide/via.c b/hw/ide/via.c index 0caae52276..91253fa4ef 100644 --- a/hw/ide/via.c +++ 
b/hw/ide/via.c @@ -196,7 +196,6 @@ static void via_ide_realize(PCIDevice *dev, Error **errp) ide_bus_init_output_irq(&d->bus[i], qdev_get_gpio_in(ds, i)); bmdma_init(&d->bus[i], &d->bmdma[i], d); -d->bmdma[i].bus = &d->bus[i]; ide_bus_register_restart_cb(&d->bus[i]); } } Reviewed-by: Mark Cave-Ayland ATB, Mark.
Re: [RFC PATCH 0/3] Deprecate the qemu-system-i386 binary
On 4/25/23 15:38, Thomas Huth wrote: - CPU types have different suffixes between the -x86_64 and -i386 variant (see TYPE_X86_CPU in cpu-qom.h) ... do we need to care about this in the new qemu-system-i386 symlink run mode? - The code in target/i386/tcg/sysemu/smm_helper.c looks like it maybe needs a runtime switch, too ... or is it ok to leave this hard-coded to the x86_64 version? Yes, it would have to switch based on the CPU's LM feature. Anyway, I'd like to get some feedback on this idea here... What do you think of the idea of getting rid of the qemu-system-i386 binary this way in the future? I wonder if we should take this a step further and rename qemu-system-x86_64 to qemu-system-x86! Distros can, if they wish, create symlinks to both qemu-system-i386 and qemu-system-x86_64. Then we would name the CPUs "foo-x86" and alias them to foo-x86_64 and, if they don't have LM set, to foo-i386 as well. Paolo
Re: [PATCH] multifd: Avoid busy-wait in multifd_send_pages()
"manish.mishra" wrote: > On 26/04/23 3:58 pm, Juan Quintela wrote: >> "manish.mishra" wrote: >>> multifd_send_sync_main() posts request on the multifd channel >>> but does not call sem_wait() on channels_ready semaphore, making >>> the channels_ready semaphore count keep increasing. >>> As a result, sem_wait() on channels_ready in multifd_send_pages() >>> is always non-blocking hence multifd_send_pages() keeps searching >>> for a free channel in a busy loop until a channel is freed. >>> >>> Signed-off-by: manish.mishra >>> --- >>> migration/multifd.c | 3 ++- >>> 1 file changed, 2 insertions(+), 1 deletion(-) >>> >>> diff --git a/migration/multifd.c b/migration/multifd.c >>> index cce3ad6988..43d26e7012 100644 >>> --- a/migration/multifd.c >>> +++ b/migration/multifd.c >>> @@ -615,6 +615,7 @@ int multifd_send_sync_main(QEMUFile *f) >>> trace_multifd_send_sync_main_signal(p->id); >>> +qemu_sem_wait(&multifd_send_state->channels_ready); >>> qemu_mutex_lock(&p->mutex); >>> if (p->quit) { >> We need this, but I think it is better to put it on the second loop. >> >>> @@ -919,7 +920,7 @@ int multifd_save_setup(Error **errp) >>> multifd_send_state = g_malloc0(sizeof(*multifd_send_state)); >>> multifd_send_state->params = g_new0(MultiFDSendParams, thread_count); >>> multifd_send_state->pages = multifd_pages_init(page_count); >>> -qemu_sem_init(&multifd_send_state->channels_ready, 0); >>> +qemu_sem_init(&multifd_send_state->channels_ready, thread_count); >>> qatomic_set(&multifd_send_state->exiting, 0); >>> multifd_send_state->ops = multifd_ops[migrate_multifd_compression()]; >> I think this bit is wrong. >> We should not set the channels ready until the thread is ready and >> channel is created. 
>> >> What do you think about this patch: >> >> From bcb0ef9b97b858458c403d2e4dc9e0dbd96721b3 Mon Sep 17 00:00:00 2001 >> From: Juan Quintela >> Date: Wed, 26 Apr 2023 12:20:36 +0200 >> Subject: [PATCH] multifd: Fix the number of channels ready >> >> We don't wait in the sem when we are doing a sync_main. Make it wait >> there. To make things clearer, we mark the channel ready at the >> beginning of the thread loop. >> >> This causes a busy loop in multifd_send_page(). >> Found-by: manish.mishra >> >> Signed-off-by: Juan Quintela >> --- >> migration/multifd.c | 3 ++- >> 1 file changed, 2 insertions(+), 1 deletion(-) >> >> diff --git a/migration/multifd.c b/migration/multifd.c >> index 903df2117b..e625e8725e 100644 >> --- a/migration/multifd.c >> +++ b/migration/multifd.c >> @@ -635,6 +635,7 @@ int multifd_send_sync_main(QEMUFile *f) >> for (i = 0; i < migrate_multifd_channels(); i++) { >> MultiFDSendParams *p = &multifd_send_state->params[i]; >> +qemu_sem_wait(&multifd_send_state->channels_ready); >> trace_multifd_send_sync_main_wait(p->id); >> qemu_sem_wait(&p->sem_sync); >> @@ -668,6 +669,7 @@ static void *multifd_send_thread(void *opaque) >> p->num_packets = 1; >> while (true) { >> +qemu_sem_post(&multifd_send_state->channels_ready); > > > This has one issue though, if we mark channel_ready here itself, channel is > actually not ready so we can still busy loop? Before: while (true) { sem_post(channels_ready) } And you want to add to the initialization a counter equal to the number of channels. Now: while (true) { sem_post(channels_ready) } It is semantically the same, but when we set it up this way it means that when we set it to 1, we know that the channel and thread are ready for action. > Maybe we can do one thing let the sem_post in while loop at same > position itself. But we can do another post just before start I can't see how this can make any difference. 
> of this while loop, as that will be called only once it should do work > of initialising count equal to multiFD channels? Yeap. But I can't see what difference we have here. Later, Juan.
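The busy-wait mechanism at the heart of this thread is easy to demonstrate with a plain POSIX semaphore: when posts are not matched by waits, the count inflates, and subsequent waits never block. A minimal standalone sketch (not QEMU code):

```c
#include <semaphore.h>

/*
 * Post `posts` times, then count how many sem_trywait() calls succeed.
 * If a producer keeps posting without a matching consumer wait (as
 * multifd_send_sync_main() did with channels_ready), the count keeps
 * growing and sem_wait() degenerates into a non-blocking check --
 * hence the busy loop in multifd_send_pages().
 */
int drainable_count(unsigned posts)
{
    sem_t sem;
    int drained = 0;

    sem_init(&sem, 0, 0);            /* start at 0, like channels_ready */
    for (unsigned i = 0; i < posts; i++) {
        sem_post(&sem);
    }
    while (sem_trywait(&sem) == 0) { /* succeeds once per unmatched post */
        drained++;
    }
    sem_destroy(&sem);
    return drained;
}
```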
Re: [PATCH 05/13] hw/ide: Extract pci_ide_class_init()
On 22/04/2023 16:07, Bernhard Beschow wrote: Resolves redundant code in every PCI IDE device model. I think this needs to mention that it's moving the PCIDeviceClass::exit() function from all of the PCI IDE controller implementations to a common implementation in the parent PCI_IDE type. --- include/hw/ide/pci.h | 1 - hw/ide/cmd646.c | 15 --- hw/ide/pci.c | 25 - hw/ide/piix.c| 19 --- hw/ide/sii3112.c | 3 ++- hw/ide/via.c | 15 --- 6 files changed, 26 insertions(+), 52 deletions(-) diff --git a/include/hw/ide/pci.h b/include/hw/ide/pci.h index 74c127e32f..7bc4e53d02 100644 --- a/include/hw/ide/pci.h +++ b/include/hw/ide/pci.h @@ -61,7 +61,6 @@ void bmdma_cmd_writeb(BMDMAState *bm, uint32_t val); extern MemoryRegionOps bmdma_addr_ioport_ops; void pci_ide_create_devs(PCIDevice *dev); -extern const VMStateDescription vmstate_ide_pci; extern const MemoryRegionOps pci_ide_cmd_le_ops; extern const MemoryRegionOps pci_ide_data_le_ops; #endif diff --git a/hw/ide/cmd646.c b/hw/ide/cmd646.c index a094a6e12a..9aabf80e52 100644 --- a/hw/ide/cmd646.c +++ b/hw/ide/cmd646.c @@ -301,17 +301,6 @@ static void pci_cmd646_ide_realize(PCIDevice *dev, Error **errp) } } -static void pci_cmd646_ide_exitfn(PCIDevice *dev) -{ -PCIIDEState *d = PCI_IDE(dev); -unsigned i; - -for (i = 0; i < 2; ++i) { -memory_region_del_subregion(&d->bmdma_bar, &d->bmdma[i].extra_io); -memory_region_del_subregion(&d->bmdma_bar, &d->bmdma[i].addr_ioport); -} -} - static Property cmd646_ide_properties[] = { DEFINE_PROP_UINT32("secondary", PCIIDEState, secondary, 0), DEFINE_PROP_END_OF_LIST(), @@ -323,17 +312,13 @@ static void cmd646_ide_class_init(ObjectClass *klass, void *data) PCIDeviceClass *k = PCI_DEVICE_CLASS(klass); dc->reset = cmd646_reset; -dc->vmsd = &vmstate_ide_pci; k->realize = pci_cmd646_ide_realize; -k->exit = pci_cmd646_ide_exitfn; k->vendor_id = PCI_VENDOR_ID_CMD; k->device_id = PCI_DEVICE_ID_CMD_646; k->revision = 0x07; -k->class_id = PCI_CLASS_STORAGE_IDE; k->config_read = 
cmd646_pci_config_read; k->config_write = cmd646_pci_config_write; device_class_set_props(dc, cmd646_ide_properties); -set_bit(DEVICE_CATEGORY_STORAGE, dc->categories); } static const TypeInfo cmd646_ide_info = { diff --git a/hw/ide/pci.c b/hw/ide/pci.c index 67e0998ff0..8bea92e394 100644 --- a/hw/ide/pci.c +++ b/hw/ide/pci.c @@ -467,7 +467,7 @@ static int ide_pci_post_load(void *opaque, int version_id) return 0; } -const VMStateDescription vmstate_ide_pci = { +static const VMStateDescription vmstate_ide_pci = { .name = "ide", .version_id = 3, .minimum_version_id = 0, @@ -530,11 +530,34 @@ static void pci_ide_init(Object *obj) qdev_init_gpio_out(DEVICE(d), d->isa_irq, ARRAY_SIZE(d->isa_irq)); } +static void pci_ide_exitfn(PCIDevice *dev) +{ +PCIIDEState *d = PCI_IDE(dev); +unsigned i; + +for (i = 0; i < ARRAY_SIZE(d->bmdma); ++i) { +memory_region_del_subregion(&d->bmdma_bar, &d->bmdma[i].extra_io); +memory_region_del_subregion(&d->bmdma_bar, &d->bmdma[i].addr_ioport); +} +} + +static void pci_ide_class_init(ObjectClass *klass, void *data) +{ +DeviceClass *dc = DEVICE_CLASS(klass); +PCIDeviceClass *k = PCI_DEVICE_CLASS(klass); + +dc->vmsd = &vmstate_ide_pci; +k->exit = pci_ide_exitfn; +k->class_id = PCI_CLASS_STORAGE_IDE; +set_bit(DEVICE_CATEGORY_STORAGE, dc->categories); +} + static const TypeInfo pci_ide_type_info = { .name = TYPE_PCI_IDE, .parent = TYPE_PCI_DEVICE, .instance_size = sizeof(PCIIDEState), .instance_init = pci_ide_init, +.class_init = pci_ide_class_init, .abstract = true, .interfaces = (InterfaceInfo[]) { { INTERFACE_CONVENTIONAL_PCI_DEVICE }, diff --git a/hw/ide/piix.c b/hw/ide/piix.c index a32f7ccece..4e6ca99123 100644 --- a/hw/ide/piix.c +++ b/hw/ide/piix.c @@ -159,8 +159,6 @@ static void pci_piix_ide_realize(PCIDevice *dev, Error **errp) bmdma_setup_bar(d); pci_register_bar(dev, 4, PCI_BASE_ADDRESS_SPACE_IO, &d->bmdma_bar); -vmstate_register(VMSTATE_IF(dev), 0, &vmstate_ide_pci, d); - Presumably this still survives migration between a pre-series 
and post-series QEMU using the PIIX IDE controller? for (unsigned i = 0; i < 2; i++) { if (!pci_piix_init_bus(d, i, errp)) { return; @@ -168,17 +166,6 @@ static void pci_piix_ide_realize(PCIDevice *dev, Error **errp) } } -static void pci_piix_ide_exitfn(PCIDevice *dev) -{ -PCIIDEState *d = PCI_IDE(dev); -unsigned i; - -for (i = 0; i < 2; ++i) { -memory_region_del_subregion(&d->bmdma_bar, &d->bmdma[i].extra_io); -memory_region_del_subregion(&d->bmdma_bar, &d->bmdma[i].addr_ioport); -} -}
Re: [PATCH] linux-user: report ENOTTY for unknown ioctls
On 26/4/23 09:06, Thomas Weißschuh wrote: The correct error number for unknown ioctls is ENOTTY. ENOSYS would mean that the ioctl() syscall itself is not implemented, which is very improbable and unexpected for userspace. ENOTTY means "Inappropriate ioctl for device". This is what the kernel returns on unknown ioctls, what qemu is trying to express and what userspace is prepared to handle. Signed-off-by: Thomas Weißschuh --- linux-user/syscall.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) Reviewed-by: Philippe Mathieu-Daudé
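The kernel behavior the patch aligns with can be checked directly: an ioctl request number the target device does not implement fails with ENOTTY, not ENOSYS. A small Linux-only probe (the request value 0xdeadbeef is simply an arbitrary command no driver defines):

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Issue an ioctl the device cannot know about and report the errno. */
int unknown_ioctl_errno(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        return -1;
    }
    errno = 0;
    ioctl(fd, 0xdeadbeef, 0);   /* no driver implements this request */
    int err = errno;            /* ENOTTY: "Inappropriate ioctl for device" */
    close(fd);
    return err;
}
```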
Re: [PULL 00/17] QAPI patches patches for 2023-04-26
On 4/26/23 06:57, Markus Armbruster wrote: The following changes since commit 327ec8d6c2a2223b78d311153a471036e474c5c5: Merge tag 'pull-tcg-20230423' ofhttps://gitlab.com/rth7680/qemu into staging (2023-04-23 11:20:37 +0100) are available in the Git repository at: https://repo.or.cz/qemu/armbru.git tags/pull-qapi-2023-04-26 for you to fetch changes up to a17dbc4b79a28ffb9511f192474ffefd88214cde: qapi: allow unions to contain further unions (2023-04-26 07:52:45 +0200) QAPI patches patches for 2023-04-26 Daniel P. Berrangé (2): qapi: support updating expected test output via make qapi: allow unions to contain further unions Markus Armbruster (15): qapi: Fix error message format regression qapi/schema: Use super() qapi: Clean up after removal of simple unions qapi: Split up check_type() qapi: Improve error message for unexpected array types qapi: Simplify code a bit after previous commits qapi: Fix error message when type name or array is expected qapi: Fix to reject 'data': 'mumble' in struct tests/qapi-schema: Improve union discriminator coverage tests/qapi-schema: Rename a few conditionals tests/qapi-schema: Clean up positive test for conditionals tests/qapi-schema: Cover optional conditional struct member qapi: Fix code generated for optional conditional struct member qapi: Require boxed for conditional command and event arguments qapi: Improve specificity of type/member descriptions Applied, thanks. Please update https://wiki.qemu.org/ChangeLog/8.1 as appropriate. r~
Re: [PULL 00/25] Block layer patches
On 4/25/23 14:13, Kevin Wolf wrote: The following changes since commit ac5f7bf8e208cd7893dbb1a9520559e569a4677c: Merge tag 'migration-20230424-pull-request' ofhttps://gitlab.com/juan.quintela/qemu into staging (2023-04-24 15:00:39 +0100) are available in the Git repository at: https://repo.or.cz/qemu/kevin.git tags/for-upstream for you to fetch changes up to 8c1e8fb2e7fc2cbeb57703e143965a4cd3ad301a: block/monitor: Fix crash when executing HMP commit (2023-04-25 15:11:57 +0200) Block layer patches - Protect BlockBackend.queued_requests with its own lock - Switch to AIO_WAIT_WHILE_UNLOCKED() where possible - AioContext removal: LinuxAioState/LuringState/ThreadPool - Add more coroutine_fn annotations, use bdrv/blk_co_* - Fix crash when execute hmp_commit Applied, thanks. Please update https://wiki.qemu.org/ChangeLog/8.1 as appropriate. r~
Re: [PATCH 06/13] hw/ide: Extract bmdma_init_ops()
On 22/04/2023 16:07, Bernhard Beschow wrote: There are three private copies of bmdma_setup_bar() with small adaptions. Consolidate them into one public implementation. While at it rename the function to bmdma_init_ops() to reflect that the memory regions being initialized represent BMDMA operations. The actual mapping as a PCI BAR is still performed separately in each device. Note that the bmdma_bar attribute will be renamed in a separate commit. Signed-off-by: Bernhard Beschow --- include/hw/ide/pci.h | 1 + hw/ide/cmd646.c | 20 +--- hw/ide/pci.c | 16 hw/ide/piix.c| 19 +-- hw/ide/via.c | 19 +-- 5 files changed, 20 insertions(+), 55 deletions(-) diff --git a/include/hw/ide/pci.h b/include/hw/ide/pci.h index 7bc4e53d02..597c77c7ad 100644 --- a/include/hw/ide/pci.h +++ b/include/hw/ide/pci.h @@ -57,6 +57,7 @@ struct PCIIDEState { }; void bmdma_init(IDEBus *bus, BMDMAState *bm, PCIIDEState *d); +void bmdma_init_ops(PCIIDEState *d, const MemoryRegionOps *bmdma_ops); void bmdma_cmd_writeb(BMDMAState *bm, uint32_t val); extern MemoryRegionOps bmdma_addr_ioport_ops; void pci_ide_create_devs(PCIDevice *dev); diff --git a/hw/ide/cmd646.c b/hw/ide/cmd646.c index 9aabf80e52..6fd09fe74e 100644 --- a/hw/ide/cmd646.c +++ b/hw/ide/cmd646.c @@ -161,24 +161,6 @@ static const MemoryRegionOps cmd646_bmdma_ops = { .write = bmdma_write, }; -static void bmdma_setup_bar(PCIIDEState *d) -{ -BMDMAState *bm; -int i; - -memory_region_init(&d->bmdma_bar, OBJECT(d), "cmd646-bmdma", 16); -for(i = 0;i < 2; i++) { -bm = &d->bmdma[i]; -memory_region_init_io(&bm->extra_io, OBJECT(d), &cmd646_bmdma_ops, bm, - "cmd646-bmdma-bus", 4); -memory_region_add_subregion(&d->bmdma_bar, i * 8, &bm->extra_io); -memory_region_init_io(&bm->addr_ioport, OBJECT(d), - &bmdma_addr_ioport_ops, bm, - "cmd646-bmdma-ioport", 4); -memory_region_add_subregion(&d->bmdma_bar, i * 8 + 4, &bm->addr_ioport); -} -} - static void cmd646_update_irq(PCIDevice *pd) { int pci_level; @@ -285,7 +267,7 @@ static void 
pci_cmd646_ide_realize(PCIDevice *dev, Error **errp) &d->bus[1], "cmd646-cmd1", 4); pci_register_bar(dev, 3, PCI_BASE_ADDRESS_SPACE_IO, &d->cmd_bar[1]); -bmdma_setup_bar(d); +bmdma_init_ops(d, &cmd646_bmdma_ops); pci_register_bar(dev, 4, PCI_BASE_ADDRESS_SPACE_IO, &d->bmdma_bar); /* TODO: RST# value should be 0 */ diff --git a/hw/ide/pci.c b/hw/ide/pci.c index 8bea92e394..65ed6f7f72 100644 --- a/hw/ide/pci.c +++ b/hw/ide/pci.c @@ -523,6 +523,22 @@ void bmdma_init(IDEBus *bus, BMDMAState *bm, PCIIDEState *d) bm->pci_dev = d; } +void bmdma_init_ops(PCIIDEState *d, const MemoryRegionOps *bmdma_ops) +{ +size_t i; + +memory_region_init(&d->bmdma_bar, OBJECT(d), "bmdma-container", 16); +for (i = 0; i < ARRAY_SIZE(d->bmdma); i++) { +BMDMAState *bm = &d->bmdma[i]; + +memory_region_init_io(&bm->extra_io, OBJECT(d), bmdma_ops, bm, "bmdma-ops", 4); +memory_region_add_subregion(&d->bmdma_bar, i * 8, &bm->extra_io); +memory_region_init_io(&bm->addr_ioport, OBJECT(d), &bmdma_addr_ioport_ops, bm, + "bmdma-ioport-ops", 4); +memory_region_add_subregion(&d->bmdma_bar, i * 8 + 4, &bm->addr_ioport); +} +} + static void pci_ide_init(Object *obj) { PCIIDEState *d = PCI_IDE(obj); diff --git a/hw/ide/piix.c b/hw/ide/piix.c index 4e6ca99123..5611473d37 100644 --- a/hw/ide/piix.c +++ b/hw/ide/piix.c @@ -86,23 +86,6 @@ static const MemoryRegionOps piix_bmdma_ops = { .write = bmdma_write, }; -static void bmdma_setup_bar(PCIIDEState *d) -{ -int i; - -memory_region_init(&d->bmdma_bar, OBJECT(d), "piix-bmdma-container", 16); -for(i = 0;i < 2; i++) { -BMDMAState *bm = &d->bmdma[i]; - -memory_region_init_io(&bm->extra_io, OBJECT(d), &piix_bmdma_ops, bm, - "piix-bmdma", 4); -memory_region_add_subregion(&d->bmdma_bar, i * 8, &bm->extra_io); -memory_region_init_io(&bm->addr_ioport, OBJECT(d), - &bmdma_addr_ioport_ops, bm, "bmdma", 4); -memory_region_add_subregion(&d->bmdma_bar, i * 8 + 4, &bm->addr_ioport); -} -} - static void piix_ide_reset(DeviceState *dev) { PCIIDEState *d = PCI_IDE(dev); @@ 
-156,7 +139,7 @@ static void pci_piix_ide_realize(PCIDevice *dev, Error **errp) pci_conf[PCI_CLASS_PROG] = 0x80; // legacy ATA mode -bmdma_setup_bar(d); +bmdma_init_ops(d, &piix_bmdma_ops); pci_register_bar(dev, 4, PCI_BASE_ADDRESS_SPACE_IO, &d->bmdma_bar); for (unsigned i = 0; i < 2; i++) { diff --git a/hw/ide/via.c b/hw/ide/via.c index 287143a005..40704e2857 100644 --- a/hw
Re: [PATCH 07/13] hw/ide: Extract pci_ide_{cmd, data}_le_ops initialization into base class constructor
On 22/04/2023 16:07, Bernhard Beschow wrote: There is redundant code in cmd646 and via which can be extracted into the base class. In case of piix and sii3112 this is currently unneccessary but shouldn't interfere since the memory regions aren't mapped by those devices. In few commits later this will be changed, i.e. those device models will also make use of these memory regions. Signed-off-by: Bernhard Beschow --- hw/ide/cmd646.c | 11 --- hw/ide/pci.c| 10 ++ hw/ide/via.c| 11 --- 3 files changed, 10 insertions(+), 22 deletions(-) diff --git a/hw/ide/cmd646.c b/hw/ide/cmd646.c index 6fd09fe74e..85716aaf17 100644 --- a/hw/ide/cmd646.c +++ b/hw/ide/cmd646.c @@ -251,20 +251,9 @@ static void pci_cmd646_ide_realize(PCIDevice *dev, Error **errp) dev->wmask[MRDMODE] = 0x0; dev->w1cmask[MRDMODE] = MRDMODE_INTR_CH0 | MRDMODE_INTR_CH1; -memory_region_init_io(&d->data_bar[0], OBJECT(d), &pci_ide_data_le_ops, - &d->bus[0], "cmd646-data0", 8); pci_register_bar(dev, 0, PCI_BASE_ADDRESS_SPACE_IO, &d->data_bar[0]); - -memory_region_init_io(&d->cmd_bar[0], OBJECT(d), &pci_ide_cmd_le_ops, - &d->bus[0], "cmd646-cmd0", 4); pci_register_bar(dev, 1, PCI_BASE_ADDRESS_SPACE_IO, &d->cmd_bar[0]); - -memory_region_init_io(&d->data_bar[1], OBJECT(d), &pci_ide_data_le_ops, - &d->bus[1], "cmd646-data1", 8); pci_register_bar(dev, 2, PCI_BASE_ADDRESS_SPACE_IO, &d->data_bar[1]); - -memory_region_init_io(&d->cmd_bar[1], OBJECT(d), &pci_ide_cmd_le_ops, - &d->bus[1], "cmd646-cmd1", 4); pci_register_bar(dev, 3, PCI_BASE_ADDRESS_SPACE_IO, &d->cmd_bar[1]); bmdma_init_ops(d, &cmd646_bmdma_ops); diff --git a/hw/ide/pci.c b/hw/ide/pci.c index 65ed6f7f72..a9194313bd 100644 --- a/hw/ide/pci.c +++ b/hw/ide/pci.c @@ -543,6 +543,16 @@ static void pci_ide_init(Object *obj) { PCIIDEState *d = PCI_IDE(obj); +memory_region_init_io(&d->data_bar[0], OBJECT(d), &pci_ide_data_le_ops, + &d->bus[0], "pci-ide0-data-ops", 8); +memory_region_init_io(&d->cmd_bar[0], OBJECT(d), &pci_ide_cmd_le_ops, + &d->bus[0], 
"pci-ide0-cmd-ops", 4); + +memory_region_init_io(&d->data_bar[1], OBJECT(d), &pci_ide_data_le_ops, + &d->bus[1], "pci-ide1-data-ops", 8); +memory_region_init_io(&d->cmd_bar[1], OBJECT(d), &pci_ide_cmd_le_ops, + &d->bus[1], "pci-ide1-cmd-ops", 4); + qdev_init_gpio_out(DEVICE(d), d->isa_irq, ARRAY_SIZE(d->isa_irq)); } diff --git a/hw/ide/via.c b/hw/ide/via.c index 40704e2857..704a8024cb 100644 --- a/hw/ide/via.c +++ b/hw/ide/via.c @@ -154,20 +154,9 @@ static void via_ide_realize(PCIDevice *dev, Error **errp) dev->wmask[PCI_INTERRUPT_LINE] = 0; dev->wmask[PCI_CLASS_PROG] = 5; -memory_region_init_io(&d->data_bar[0], OBJECT(d), &pci_ide_data_le_ops, - &d->bus[0], "via-ide0-data", 8); pci_register_bar(dev, 0, PCI_BASE_ADDRESS_SPACE_IO, &d->data_bar[0]); - -memory_region_init_io(&d->cmd_bar[0], OBJECT(d), &pci_ide_cmd_le_ops, - &d->bus[0], "via-ide0-cmd", 4); pci_register_bar(dev, 1, PCI_BASE_ADDRESS_SPACE_IO, &d->cmd_bar[0]); - -memory_region_init_io(&d->data_bar[1], OBJECT(d), &pci_ide_data_le_ops, - &d->bus[1], "via-ide1-data", 8); pci_register_bar(dev, 2, PCI_BASE_ADDRESS_SPACE_IO, &d->data_bar[1]); - -memory_region_init_io(&d->cmd_bar[1], OBJECT(d), &pci_ide_cmd_le_ops, - &d->bus[1], "via-ide1-cmd", 4); pci_register_bar(dev, 3, PCI_BASE_ADDRESS_SPACE_IO, &d->cmd_bar[1]); bmdma_init_ops(d, &via_bmdma_ops); I'd also be inclined to agree with Phil/Zoltan re: dropping the trailing "-ops" in the name, otherwise: Reviewed-by: Mark Cave-Ayland ATB, Mark.
Re: [PATCH 08/13] hw/ide: Rename PCIIDEState::*_bar attributes
On 22/04/2023 16:07, Bernhard Beschow wrote: The attributes represent memory regions containing operations which are mapped by the device models into PCI BARs. Reflect this by changing the suffic into "_ops". Note that in a few commits piix will also use the {cmd,data}_ops but won't map them into BARs. This further suggests that the "_bar" suffix doesn't match very well. Signed-off-by: Bernhard Beschow --- include/hw/ide/pci.h | 6 +++--- hw/ide/cmd646.c | 10 +- hw/ide/pci.c | 18 +- hw/ide/piix.c| 2 +- hw/ide/via.c | 10 +- 5 files changed, 23 insertions(+), 23 deletions(-) diff --git a/include/hw/ide/pci.h b/include/hw/ide/pci.h index 597c77c7ad..5025df5b82 100644 --- a/include/hw/ide/pci.h +++ b/include/hw/ide/pci.h @@ -51,9 +51,9 @@ struct PCIIDEState { BMDMAState bmdma[2]; qemu_irq isa_irq[2]; uint32_t secondary; /* used only for cmd646 */ -MemoryRegion bmdma_bar; -MemoryRegion cmd_bar[2]; -MemoryRegion data_bar[2]; +MemoryRegion bmdma_ops; +MemoryRegion cmd_ops[2]; +MemoryRegion data_ops[2]; }; void bmdma_init(IDEBus *bus, BMDMAState *bm, PCIIDEState *d); diff --git a/hw/ide/cmd646.c b/hw/ide/cmd646.c index 85716aaf17..b9d005a357 100644 --- a/hw/ide/cmd646.c +++ b/hw/ide/cmd646.c @@ -251,13 +251,13 @@ static void pci_cmd646_ide_realize(PCIDevice *dev, Error **errp) dev->wmask[MRDMODE] = 0x0; dev->w1cmask[MRDMODE] = MRDMODE_INTR_CH0 | MRDMODE_INTR_CH1; -pci_register_bar(dev, 0, PCI_BASE_ADDRESS_SPACE_IO, &d->data_bar[0]); -pci_register_bar(dev, 1, PCI_BASE_ADDRESS_SPACE_IO, &d->cmd_bar[0]); -pci_register_bar(dev, 2, PCI_BASE_ADDRESS_SPACE_IO, &d->data_bar[1]); -pci_register_bar(dev, 3, PCI_BASE_ADDRESS_SPACE_IO, &d->cmd_bar[1]); +pci_register_bar(dev, 0, PCI_BASE_ADDRESS_SPACE_IO, &d->data_ops[0]); +pci_register_bar(dev, 1, PCI_BASE_ADDRESS_SPACE_IO, &d->cmd_ops[0]); +pci_register_bar(dev, 2, PCI_BASE_ADDRESS_SPACE_IO, &d->data_ops[1]); +pci_register_bar(dev, 3, PCI_BASE_ADDRESS_SPACE_IO, &d->cmd_ops[1]); bmdma_init_ops(d, &cmd646_bmdma_ops); 
-pci_register_bar(dev, 4, PCI_BASE_ADDRESS_SPACE_IO, &d->bmdma_bar); +pci_register_bar(dev, 4, PCI_BASE_ADDRESS_SPACE_IO, &d->bmdma_ops); /* TODO: RST# value should be 0 */ pci_conf[PCI_INTERRUPT_PIN] = 0x01; // interrupt on pin 1 diff --git a/hw/ide/pci.c b/hw/ide/pci.c index a9194313bd..b2fcc00a64 100644 --- a/hw/ide/pci.c +++ b/hw/ide/pci.c @@ -527,15 +527,15 @@ void bmdma_init_ops(PCIIDEState *d, const MemoryRegionOps *bmdma_ops) { size_t i; -memory_region_init(&d->bmdma_bar, OBJECT(d), "bmdma-container", 16); +memory_region_init(&d->bmdma_ops, OBJECT(d), "bmdma-container", 16); for (i = 0; i < ARRAY_SIZE(d->bmdma); i++) { BMDMAState *bm = &d->bmdma[i]; memory_region_init_io(&bm->extra_io, OBJECT(d), bmdma_ops, bm, "bmdma-ops", 4); -memory_region_add_subregion(&d->bmdma_bar, i * 8, &bm->extra_io); +memory_region_add_subregion(&d->bmdma_ops, i * 8, &bm->extra_io); memory_region_init_io(&bm->addr_ioport, OBJECT(d), &bmdma_addr_ioport_ops, bm, "bmdma-ioport-ops", 4); -memory_region_add_subregion(&d->bmdma_bar, i * 8 + 4, &bm->addr_ioport); +memory_region_add_subregion(&d->bmdma_ops, i * 8 + 4, &bm->addr_ioport); } } @@ -543,14 +543,14 @@ static void pci_ide_init(Object *obj) { PCIIDEState *d = PCI_IDE(obj); -memory_region_init_io(&d->data_bar[0], OBJECT(d), &pci_ide_data_le_ops, +memory_region_init_io(&d->data_ops[0], OBJECT(d), &pci_ide_data_le_ops, &d->bus[0], "pci-ide0-data-ops", 8); -memory_region_init_io(&d->cmd_bar[0], OBJECT(d), &pci_ide_cmd_le_ops, +memory_region_init_io(&d->cmd_ops[0], OBJECT(d), &pci_ide_cmd_le_ops, &d->bus[0], "pci-ide0-cmd-ops", 4); -memory_region_init_io(&d->data_bar[1], OBJECT(d), &pci_ide_data_le_ops, +memory_region_init_io(&d->data_ops[1], OBJECT(d), &pci_ide_data_le_ops, &d->bus[1], "pci-ide1-data-ops", 8); -memory_region_init_io(&d->cmd_bar[1], OBJECT(d), &pci_ide_cmd_le_ops, +memory_region_init_io(&d->cmd_ops[1], OBJECT(d), &pci_ide_cmd_le_ops, &d->bus[1], "pci-ide1-cmd-ops", 4); qdev_init_gpio_out(DEVICE(d), d->isa_irq, 
ARRAY_SIZE(d->isa_irq)); @@ -562,8 +562,8 @@ static void pci_ide_exitfn(PCIDevice *dev) unsigned i; for (i = 0; i < ARRAY_SIZE(d->bmdma); ++i) { -memory_region_del_subregion(&d->bmdma_bar, &d->bmdma[i].extra_io); -memory_region_del_subregion(&d->bmdma_bar, &d->bmdma[i].addr_ioport); +memory_region_del_subregion(&d->bmdma_ops, &d->bmdma[i].extra_io); +memory_region_del_subregion(&d->bmdma_ops, &d->bmdma[i].addr_ioport); } } diff --git a/hw/i
Re: [PATCH v3 18/57] tcg/loongarch64: Rationalize args to tcg_out_qemu_{ld, st}
On 24/4/23 07:40, Richard Henderson wrote: Interpret the variable argument placement in the caller. Shift some code around slightly to share more between softmmu and user-only. Signed-off-by: Richard Henderson --- tcg/loongarch64/tcg-target.c.inc | 100 +-- 1 file changed, 42 insertions(+), 58 deletions(-) Reviewed-by: Philippe Mathieu-Daudé
Re: [PATCH] multifd: Avoid busy-wait in multifd_send_pages()
On 26/04/23 4:35 pm, Juan Quintela wrote: "manish.mishra" wrote: On 26/04/23 3:58 pm, Juan Quintela wrote: "manish.mishra" wrote: multifd_send_sync_main() posts request on the multifd channel but does not call sem_wait() on channels_ready semaphore, making the channels_ready semaphore count keep increasing. As a result, sem_wait() on channels_ready in multifd_send_pages() is always non-blocking hence multifd_send_pages() keeps searching for a free channel in a busy loop until a channel is freed. Signed-off-by: manish.mishra --- migration/multifd.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/migration/multifd.c b/migration/multifd.c index cce3ad6988..43d26e7012 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -615,6 +615,7 @@ int multifd_send_sync_main(QEMUFile *f) trace_multifd_send_sync_main_signal(p->id); +qemu_sem_wait(&multifd_send_state->channels_ready); qemu_mutex_lock(&p->mutex); if (p->quit) { We need this, but I think it is better to put it on the second loop. @@ -919,7 +920,7 @@ int multifd_save_setup(Error **errp) multifd_send_state = g_malloc0(sizeof(*multifd_send_state)); multifd_send_state->params = g_new0(MultiFDSendParams, thread_count); multifd_send_state->pages = multifd_pages_init(page_count); -qemu_sem_init(&multifd_send_state->channels_ready, 0); +qemu_sem_init(&multifd_send_state->channels_ready, thread_count); qatomic_set(&multifd_send_state->exiting, 0); multifd_send_state->ops = multifd_ops[migrate_multifd_compression()]; I think this bit is wrong. We should not set the channels ready until the thread is ready and channel is created. What do you think about this patch: From bcb0ef9b97b858458c403d2e4dc9e0dbd96721b3 Mon Sep 17 00:00:00 2001 From: Juan Quintela Date: Wed, 26 Apr 2023 12:20:36 +0200 Subject: [PATCH] multifd: Fix the number of channels ready We don't wait in the sem when we are doing a sync_main. Make it wait there. 
To make things clearer, we mark the channel ready at the begining of the thread loop. This causes a busy loop in multifd_send_page(). Found-by: manish.mishra Signed-off-by: Juan Quintela --- migration/multifd.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/migration/multifd.c b/migration/multifd.c index 903df2117b..e625e8725e 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -635,6 +635,7 @@ int multifd_send_sync_main(QEMUFile *f) for (i = 0; i < migrate_multifd_channels(); i++) { MultiFDSendParams *p = &multifd_send_state->params[i]; +qemu_sem_wait(&multifd_send_state->channels_ready); trace_multifd_send_sync_main_wait(p->id); qemu_sem_wait(&p->sem_sync); @@ -668,6 +669,7 @@ static void *multifd_send_thread(void *opaque) p->num_packets = 1; while (true) { +qemu_sem_post(&multifd_send_state->channels_ready); This has one issue though, if we mark channel_ready here itself, channel is actually not ready so we can still busy loop? Before: while (true) { sem_post(channels_ready) } And you want to add to the initialization a counter equal to the number of channels. Now: while (true) { sem_post(channels_ready) } It is semantically the same, but when we setup it ready it means that when we set it to 1, we now that the channel and thread are ready for action. May be we can do one thing let the sem_post in while loop at same position itself. But we can do another post just before start I can see how this can make any difference. of this while loop, as that will be called only once it should do work of initialising count equal to multiFD channels? Yeap. But I can see what difference do we have here. Later, Juan. Thanks Juan, Just confirming if i misunderstood something :) I meant your approach makes sense, i was just suggesting a small change. To do something like this. qemu_sem_init(&multifd_send_state->channels_ready, 0); static void *multifd_send_thread(void *opaque) { ... 
sem_post(channels_ready); // Post once at start of thread and let one in loop as it is. while (true) { sem_post(channels_ready) } } Something like below has the issue that we are marking channel_ready even before the channel is actually ready, I meant if the network is slow it may take some time to update pending_job and hence we can busy loop in send_multifd_page(). static void *multifd_send_thread(void *opaque) { ... while (true) { sem_post(channels_ready); } } Not sure if we are already in agreement :) Just confirming. Thanks Manish Mishra
Re: [PATCH 09/13] hw/ide/piix: Disuse isa_get_irq()
On 22/04/2023 16:07, Bernhard Beschow wrote: isa_get_irq() asks for an ISADevice which piix-ide doesn't provide. Passing a NULL pointer works but causes the isabus global to be used then. By fishing out TYPE_ISA_BUS from the QOM tree it is possible to achieve the same as using isa_get_irq(). This is an alternative solution to commit 9405d87be25d 'hw/ide: Fix crash when plugging a piix3-ide device into the x-remote machine' which allows for cleaning up the ISA API while keeping PIIX IDE functions user-createable. Signed-off-by: Bernhard Beschow --- hw/ide/piix.c | 23 --- 1 file changed, 20 insertions(+), 3 deletions(-) diff --git a/hw/ide/piix.c b/hw/ide/piix.c index 6942b484f9..a3a15dc7db 100644 --- a/hw/ide/piix.c +++ b/hw/ide/piix.c @@ -104,7 +104,8 @@ static void piix_ide_reset(DeviceState *dev) pci_set_byte(pci_conf + 0x20, 0x01); /* BMIBA: 20-23h */ } -static bool pci_piix_init_bus(PCIIDEState *d, unsigned i, Error **errp) +static bool pci_piix_init_bus(PCIIDEState *d, unsigned i, ISABus *isa_bus, + Error **errp) { static const struct { int iobase; @@ -124,7 +125,8 @@ static bool pci_piix_init_bus(PCIIDEState *d, unsigned i, Error **errp) object_get_typename(OBJECT(d)), i); return false; } -ide_bus_init_output_irq(&d->bus[i], isa_get_irq(NULL, port_info[i].isairq)); +ide_bus_init_output_irq(&d->bus[i], +isa_bus_get_irq(isa_bus, port_info[i].isairq)); I don't think is the right solution here, since ultimately we want to move the IRQ routing out of the device itself and into the PCI-ISA bridge. I'd go for the same solution as you've done for VIA IDE in patch 2, i.e. 
update the PIIX interrupt handler to set the legacy_irqs in PCIIDEState, and then wire them up to the ISA IRQs 14 and 15 similar to as Phil as done in his patches: https://patchew.org/QEMU/20230302224058.43315-1-phi...@linaro.org/20230302224058.43315-4-phi...@linaro.org/ https://patchew.org/QEMU/20230302224058.43315-1-phi...@linaro.org/20230302224058.43315-5-phi...@linaro.org/ This also reminds me, given that the first patch above is doing wiring in pc_init1() then we are still missing part of your tidy-up series :/ bmdma_init(&d->bus[i], &d->bmdma[i], d); ide_bus_register_restart_cb(&d->bus[i]); @@ -136,14 +138,29 @@ static void pci_piix_ide_realize(PCIDevice *dev, Error **errp) { PCIIDEState *d = PCI_IDE(dev); uint8_t *pci_conf = dev->config; +ISABus *isa_bus; +bool ambiguous; pci_conf[PCI_CLASS_PROG] = 0x80; // legacy ATA mode bmdma_init_ops(d, &piix_bmdma_ops); pci_register_bar(dev, 4, PCI_BASE_ADDRESS_SPACE_IO, &d->bmdma_ops); +isa_bus = ISA_BUS(object_resolve_path_type("", TYPE_ISA_BUS, &ambiguous)); +if (ambiguous) { +error_setg(errp, + "More than one ISA bus found while %s supports only one", + object_get_typename(OBJECT(d))); +return; +} +if (!isa_bus) { +error_setg(errp, "No ISA bus found while %s requires one", + object_get_typename(OBJECT(d))); +return; +} Again I think this should go away with using PCIIDEState's legacy_irqs, since you simply let the board wire them up to the ISABus (or not) as required. for (unsigned i = 0; i < 2; i++) { -if (!pci_piix_init_bus(d, i, errp)) { +if (!pci_piix_init_bus(d, i, isa_bus, errp)) { return; } } ATB, Mark.
Re: [PATCH 10/13] hw/ide/piix: Reuse PCIIDEState::{cmd,data}_ops
On 22/04/2023 16:07, Bernhard Beschow wrote: Now that PCIIDEState::{cmd,data}_ops are initialized in the base class constructor there is an opportunity for PIIX to reuse these attributes. This resolves usage of ide_init_ioport() which would fall back internally to using the isabus global due to NULL being passed as ISADevice by PIIX. Signed-off-by: Bernhard Beschow --- hw/ide/piix.c | 30 +- 1 file changed, 13 insertions(+), 17 deletions(-) diff --git a/hw/ide/piix.c b/hw/ide/piix.c index a3a15dc7db..406a67fa0f 100644 --- a/hw/ide/piix.c +++ b/hw/ide/piix.c @@ -104,34 +104,32 @@ static void piix_ide_reset(DeviceState *dev) pci_set_byte(pci_conf + 0x20, 0x01); /* BMIBA: 20-23h */ } -static bool pci_piix_init_bus(PCIIDEState *d, unsigned i, ISABus *isa_bus, - Error **errp) +static void pci_piix_init_bus(PCIIDEState *d, unsigned i, ISABus *isa_bus) { static const struct { int iobase; int iobase2; int isairq; } port_info[] = { -{0x1f0, 0x3f6, 14}, -{0x170, 0x376, 15}, +{0x1f0, 0x3f4, 14}, +{0x170, 0x374, 15}, }; -int ret; +MemoryRegion *address_space_io = pci_address_space_io(PCI_DEVICE(d)); ide_bus_init(&d->bus[i], sizeof(d->bus[i]), DEVICE(d), i, 2); -ret = ide_init_ioport(&d->bus[i], NULL, port_info[i].iobase, - port_info[i].iobase2); -if (ret) { -error_setg_errno(errp, -ret, "Failed to realize %s port %u", - object_get_typename(OBJECT(d)), i); -return false; -} +memory_region_add_subregion(address_space_io, port_info[i].iobase, +&d->data_ops[i]); +/* + * PIIX forwards the last byte of cmd_ops to ISA. Model this using a low + * prio so competing memory regions take precedence. + */ +memory_region_add_subregion_overlap(address_space_io, port_info[i].iobase2, +&d->cmd_ops[i], -1); Interesting. Is this behaviour documented somewhere and/or used in one of your test images at all? If I'd have seen this myself, I probably thought that the addresses were a typo... 
 ide_bus_init_output_irq(&d->bus[i], isa_bus_get_irq(isa_bus, port_info[i].isairq));
 bmdma_init(&d->bus[i], &d->bmdma[i], d);
 ide_bus_register_restart_cb(&d->bus[i]);
-
-return true;
 }
 static void pci_piix_ide_realize(PCIDevice *dev, Error **errp)
@@ -160,9 +158,7 @@ static void pci_piix_ide_realize(PCIDevice *dev, Error **errp)
 }
 for (unsigned i = 0; i < 2; i++) {
-if (!pci_piix_init_bus(d, i, isa_bus, errp)) {
-return;
-}
+pci_piix_init_bus(d, i, isa_bus);
 }
 }

ATB,

Mark.
Re: [PATCH v2] tests/unit/test-blockjob: Re-enable complete_in_standby test
On 26.04.23 13:40, Emanuele Giuseppe Esposito wrote:

Pause the job while draining so that pause_count will be increased and bdrv_drain_all_end() won't reset it to 0, so the job will not continue. With this fix, the test is not flaky anymore.

Additionally remove the useless aiocontext lock around bdrv_drain_all_end() in test_complete_in_standby().

Fixes: b6903cbe3a2 ("tests/unit/test-blockjob: Disable complete_in_standby test")
Suggested-by: Hanna Czenczek
Signed-off-by: Emanuele Giuseppe Esposito

Reviewed-by: Vladimir Sementsov-Ogievskiy

--
Best regards,
Vladimir
Re: [PATCH 11/13] hw/ide/sii3112: Reuse PCIIDEState::{cmd,data}_ops
On 22/04/2023 16:07, Bernhard Beschow wrote:

Allows to unexport pci_ide_{cmd,data}_le_ops and models TYPE_SII3112_PCI as a standard-compliant PCI IDE device.

Signed-off-by: Bernhard Beschow
---
 include/hw/ide/pci.h |  2 --
 hw/ide/pci.c         |  4 ++--
 hw/ide/sii3112.c     | 50
 3 files changed, 20 insertions(+), 36 deletions(-)

diff --git a/include/hw/ide/pci.h b/include/hw/ide/pci.h
index 5025df5b82..dbb4b13161 100644
--- a/include/hw/ide/pci.h
+++ b/include/hw/ide/pci.h
@@ -62,6 +62,4 @@ void bmdma_cmd_writeb(BMDMAState *bm, uint32_t val);
 extern MemoryRegionOps bmdma_addr_ioport_ops;
 void pci_ide_create_devs(PCIDevice *dev);
-extern const MemoryRegionOps pci_ide_cmd_le_ops;
-extern const MemoryRegionOps pci_ide_data_le_ops;
 #endif

diff --git a/hw/ide/pci.c b/hw/ide/pci.c
index b2fcc00a64..97ccc75aa6 100644
--- a/hw/ide/pci.c
+++ b/hw/ide/pci.c
@@ -60,7 +60,7 @@ static void pci_ide_ctrl_write(void *opaque, hwaddr addr,
 ide_ctrl_write(bus, addr + 2, data);
 }
-const MemoryRegionOps pci_ide_cmd_le_ops = {
+static const MemoryRegionOps pci_ide_cmd_le_ops = {
 .read = pci_ide_status_read,
 .write = pci_ide_ctrl_write,
 .endianness = DEVICE_LITTLE_ENDIAN,
@@ -98,7 +98,7 @@ static void pci_ide_data_write(void *opaque, hwaddr addr,
 }
 }
-const MemoryRegionOps pci_ide_data_le_ops = {
+static const MemoryRegionOps pci_ide_data_le_ops = {
 .read = pci_ide_data_read,
 .write = pci_ide_data_write,
 .endianness = DEVICE_LITTLE_ENDIAN,

diff --git a/hw/ide/sii3112.c b/hw/ide/sii3112.c
index 0af897a9ef..9cf920369f 100644
--- a/hw/ide/sii3112.c
+++ b/hw/ide/sii3112.c
@@ -88,21 +88,9 @@ static uint64_t sii3112_reg_read(void *opaque, hwaddr addr,
 val |= (d->regs[1].confstat & (1UL << 11) ? (1 << 4) : 0);
 val |= (uint32_t)d->i.bmdma[1].status << 16;
 break;
-case 0x80 ... 0x87:
-val = pci_ide_data_le_ops.read(&d->i.bus[0], addr - 0x80, size);
-break;
-case 0x8a:
-val = pci_ide_cmd_le_ops.read(&d->i.bus[0], 2, size);
-break;
 case 0xa0:
 val = d->regs[0].confstat;
 break;
-case 0xc0 ... 0xc7:
-val = pci_ide_data_le_ops.read(&d->i.bus[1], addr - 0xc0, size);
-break;
-case 0xca:
-val = pci_ide_cmd_le_ops.read(&d->i.bus[1], 2, size);
-break;
 case 0xe0:
 val = d->regs[1].confstat;
 break;
@@ -171,18 +159,6 @@ static void sii3112_reg_write(void *opaque, hwaddr addr,
 case 0x0c ... 0x0f:
 bmdma_addr_ioport_ops.write(&d->i.bmdma[1], addr - 12, val, size);
 break;
-case 0x80 ... 0x87:
-pci_ide_data_le_ops.write(&d->i.bus[0], addr - 0x80, val, size);
-break;
-case 0x8a:
-pci_ide_cmd_le_ops.write(&d->i.bus[0], 2, val, size);
-break;
-case 0xc0 ... 0xc7:
-pci_ide_data_le_ops.write(&d->i.bus[1], addr - 0xc0, val, size);
-break;
-case 0xca:
-pci_ide_cmd_le_ops.write(&d->i.bus[1], 2, val, size);
-break;
 case 0x100:
 d->regs[0].scontrol = val & 0xfff;
 if (val & 1) {
@@ -259,6 +235,11 @@ static void sii3112_pci_realize(PCIDevice *dev, Error **errp)
 pci_config_set_interrupt_pin(dev->config, 1);
 pci_set_byte(dev->config + PCI_CACHE_LINE_SIZE, 8);
+pci_register_bar(dev, 0, PCI_BASE_ADDRESS_SPACE_IO, &s->data_ops[0]);
+pci_register_bar(dev, 1, PCI_BASE_ADDRESS_SPACE_IO, &s->cmd_ops[0]);
+pci_register_bar(dev, 2, PCI_BASE_ADDRESS_SPACE_IO, &s->data_ops[1]);
+pci_register_bar(dev, 3, PCI_BASE_ADDRESS_SPACE_IO, &s->cmd_ops[1]);
+
 /* BAR5 is in PCI memory space */
 memory_region_init_io(&d->mmio, OBJECT(d), &sii3112_reg_ops, d,
                       "sii3112.bar5", 0x200);
@@ -266,17 +247,22 @@ static void sii3112_pci_realize(PCIDevice *dev, Error **errp)
 /* BAR0-BAR4 are PCI I/O space aliases into BAR5 */
 mr = g_new(MemoryRegion, 1);
-memory_region_init_alias(mr, OBJECT(d), "sii3112.bar0", &d->mmio, 0x80, 8);
-pci_register_bar(dev, 0, PCI_BASE_ADDRESS_SPACE_IO, mr);
+memory_region_init_alias(mr, OBJECT(d), "sii3112.bar0", &s->data_ops[0], 0,
+                         memory_region_size(&s->data_ops[0]));
+memory_region_add_subregion_overlap(&d->mmio, 0x80, mr, 1);
 mr = g_new(MemoryRegion, 1);
-memory_region_init_alias(mr, OBJECT(d), "sii3112.bar1", &d->mmio, 0x88, 4);
-pci_register_bar(dev, 1, PCI_BASE_ADDRESS_SPACE_IO, mr);
+memory_region_init_alias(mr, OBJECT(d), "sii3112.bar1", &s->cmd_ops[0], 0,
+                         memory_region_size(&s->cmd_ops[0]));
+memory_region_add_subregion_overlap(&d->mmio, 0x88, mr, 1);
 mr = g_new(MemoryRegion, 1);
-memory_region_init_alias(mr, OBJECT(d), "sii3112.bar2", &d->mmio, 0xc0, 8);
-pci_register_bar(dev, 2, PCI_BASE_ADDRESS_SPACE_IO, mr);
+memory_region_init_alias(mr, OBJECT(d),
[PATCH] hw/remote: Fix vfu_cfg trace offset format
The printed offset value is prefixed with 0x, but was actually printed in decimal. To spare others the confusion, adjust the format specifier to hexadecimal.

Signed-off-by: Mattias Nissler
---
 hw/remote/trace-events | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/remote/trace-events b/hw/remote/trace-events
index c167b3c7a5..0d1b7d56a5 100644
--- a/hw/remote/trace-events
+++ b/hw/remote/trace-events
@@ -5,8 +5,8 @@ mpqemu_recv_io_error(int cmd, int size, int nfds) "failed to receive %d size %d,
 # vfio-user-obj.c
 vfu_prop(const char *prop, const char *val) "vfu: setting %s as %s"
-vfu_cfg_read(uint32_t offset, uint32_t val) "vfu: cfg: 0x%u -> 0x%x"
-vfu_cfg_write(uint32_t offset, uint32_t val) "vfu: cfg: 0x%u <- 0x%x"
+vfu_cfg_read(uint32_t offset, uint32_t val) "vfu: cfg: 0x%x -> 0x%x"
+vfu_cfg_write(uint32_t offset, uint32_t val) "vfu: cfg: 0x%x <- 0x%x"
 vfu_dma_register(uint64_t gpa, size_t len) "vfu: registering GPA 0x%"PRIx64", %zu bytes"
 vfu_dma_unregister(uint64_t gpa) "vfu: unregistering GPA 0x%"PRIx64""
 vfu_bar_register(int i, uint64_t addr, uint64_t size) "vfu: BAR %d: addr 0x%"PRIx64" size 0x%"PRIx64""
-- 
2.25.1
Test-qga: qga write is blocked when the client is not connected and the virtio-serial port is throttled
Hi,

I am testing qga pushing messages to libvirt. When the VM is saved to a memory file and restored from it, qga hangs in a write. In flush_buf:

(gdb) f 4
#4  0x55ffc716894a in flush_buf (port=0x55ffc87bf800, buf=, len=) at ../hw/char/virtio-console.c:100
100         vcon->watch = qemu_chr_fe_add_watch(&vcon->chr,
(gdb) l
95           * use throttling on host side.
96           */
97          if (!k->is_console) {
98              virtio_serial_throttle_port(port, true);
99              if (!vcon->watch) {
100                 vcon->watch = qemu_chr_fe_add_watch(&vcon->chr,
101                                                     G_IO_OUT|G_IO_HUP,
102                                                     chr_write_unblocked, vcon);
103             }
104         }

vcon->watch is 0, but the port is throttled, so the frontend event is never processed until qga or libvirtd is restarted. This issue happens when the VM is migrated in or restored from a saved memory file. So I think it can be fixed as follows:

97          if (!k->is_console) {
98              virtio_serial_throttle_port(port, true);
99              if (!vcon->watch) {
100                 vcon->watch = qemu_chr_fe_add_watch(&vcon->chr,
101                                                     G_IO_OUT|G_IO_HUP,
102                                                     chr_write_unblocked, vcon);
+                   if (!vcon->watch)
+                       virtio_serial_throttle_port(port, false);
103             }
104         }

I have tested this change, and it works.
Re: [PATCH 00/21] Hexagon (target/hexagon) short-circuit and move to DisasContext
On 4/26/23 02:39, Taylor Simpson wrote:

This patch series achieves two major goals.

Goal 1: Short-circuit packet semantics
In certain cases, we can avoid the overhead of writing to hex_new_value and write directly to hex_gpr. Here's a simple example of the TCG generated for

    0x004000b4:  0x7800c020 {   R0 = #0x1 }

BEFORE:
    004000b4
    movi_i32 new_r0,$0x1
    mov_i32 r0,new_r0

AFTER:
    004000b4
    movi_i32 r0,$0x1

Goal 2: Move bookkeeping items from CPUHexagonState to DisasContext
Suggested-by: Richard Henderson
Several fields in CPUHexagonState are only used for bookkeeping within the translation of a packet. With recent changes to eliminate the need to free TCGv variables, these make more sense to be transient and kept in DisasContext.

This patch series can be divided into 3 main parts:

Part 1: Patches 1-9
    Cleanup in preparation for parts 2 and 3. The main goal is to move functionality out of generated helpers.
Part 2: Patches 10-15
    Short-circuit packet semantics.
Part 3: Patches 16-21
    Move bookkeeping items from CPUHexagonState to DisasContext.

Taylor Simpson (21):
  meson.build Add CONFIG_HEXAGON_IDEF_PARSER
  Hexagon (target/hexagon) Add DisasContext arg to gen_log_reg_write
  Hexagon (target/hexagon) Add overrides for loop setup instructions
  Hexagon (target/hexagon) Add overrides for allocframe/deallocframe
  Hexagon (target/hexagon) Add overrides for clr[tf]new
  Hexagon (target/hexagon) Remove log_reg_write from op_helper.[ch]
  Hexagon (target/hexagon) Eliminate uses of log_pred_write function
  Hexagon (target/hexagon) Clean up pred_written usage
  Hexagon (target/hexagon) Don't overlap dest writes with source reads
  Hexagon (target/hexagon) Mark registers as read during packet analysis
  Hexagon (target/hexagon) Short-circuit packet register writes
  Hexagon (target/hexagon) Short-circuit packet predicate writes
  Hexagon (target/hexagon) Short-circuit packet HVX writes
  Hexagon (target/hexagon) Short-circuit more HVX single instruction packets
  Hexagon (target/hexagon) Add overrides for disabled idef-parser insns
  Hexagon (target/hexagon) Make special new_value for USR
  Hexagon (target/hexagon) Move new_value to DisasContext
  Hexagon (target/hexagon) Move new_pred_value to DisasContext
  Hexagon (target/hexagon) Move pred_written to DisasContext
  Hexagon (target/hexagon) Move pkt_has_store_s1 to DisasContext
  Hexagon (target/hexagon) Move items to DisasContext

 meson.build                                 |   1 +
 target/hexagon/cpu.h                        |  10 +-
 target/hexagon/gen_tcg.h                    | 118 ++-
 target/hexagon/gen_tcg_hvx.h                |  23 ++
 target/hexagon/genptr.h                     |   6 +-
 target/hexagon/helper.h                     |   6 +-
 target/hexagon/macros.h                     |  57 ++--
 target/hexagon/op_helper.h                  |  16 +-
 target/hexagon/translate.h                  |  52 ++-
 target/hexagon/attribs_def.h.inc            |   6 +-
 target/hexagon/arch.c                       |   3 +-
 target/hexagon/cpu.c                        |   5 +-
 target/hexagon/genptr.c                     | 347
 target/hexagon/idef-parser/parser-helpers.c |   4 +-
 target/hexagon/op_helper.c                  | 154 ++---
 target/hexagon/translate.c                  | 274 +++-
 tests/tcg/hexagon/hvx_misc.c                |  21 ++
 tests/tcg/hexagon/read_write_overlap.c      | 136
 target/hexagon/README                       |   6 +-
 target/hexagon/gen_analyze_funcs.py         |  51 ++-
 target/hexagon/gen_helper_funcs.py          |   9 +-
 target/hexagon/gen_helper_protos.py         |  10 +-
 target/hexagon/gen_idef_parser_funcs.py     |   7 +
 target/hexagon/gen_tcg_funcs.py             |  21 +-
 target/hexagon/hex_common.py                |  16 +-
 tests/tcg/hexagon/Makefile.target           |   1 +
 26 files changed, 1066 insertions(+), 294 deletions(-)
 create mode 100644 tests/tcg/hexagon/read_write_overlap.c

For some reason this patchset didn't arrive in one piece; see patchew:
https://patchew.org/QEMU/20230426003945.1318446-1-tsimp...@quicinc.com/

--
Anton Johansson,
rev.ng Labs Srl.