[PATCH] linux-user: report ENOTTY for unknown ioctls

2023-04-26 Thread Thomas Weißschuh
The correct error number for unknown ioctls is ENOTTY.

ENOSYS would mean that the ioctl() syscall itself is not implemented,
which is very improbable and unexpected for userspace.

ENOTTY means "Inappropriate ioctl for device". This is what the kernel
returns on unknown ioctls, what qemu is trying to express and what
userspace is prepared to handle.

Signed-off-by: Thomas Weißschuh 
---
 linux-user/syscall.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 69f740ff98c8..c5955313a063 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -5747,7 +5747,7 @@ static abi_long do_ioctl(int fd, int cmd, abi_long arg)
 if (ie->target_cmd == 0) {
 qemu_log_mask(
 LOG_UNIMP, "Unsupported ioctl: cmd=0x%04lx\n", (long)cmd);
-return -TARGET_ENOSYS;
+return -TARGET_ENOTTY;
 }
 if (ie->target_cmd == cmd)
 break;
@@ -5759,7 +5759,7 @@ static abi_long do_ioctl(int fd, int cmd, abi_long arg)
 } else if (!ie->host_cmd) {
 /* Some architectures define BSD ioctls in their headers
that are not implemented in Linux.  */
-return -TARGET_ENOSYS;
+return -TARGET_ENOTTY;
 }
 
 switch(arg_type[0]) {
@@ -5817,7 +5817,7 @@ static abi_long do_ioctl(int fd, int cmd, abi_long arg)
 qemu_log_mask(LOG_UNIMP,
   "Unsupported ioctl type: cmd=0x%04lx type=%d\n",
   (long)cmd, arg_type[0]);
-ret = -TARGET_ENOSYS;
+ret = -TARGET_ENOTTY;
 break;
 }
 return ret;

base-commit: a14b8206c5edcbbad1c71256ea9b44c3b382a9f5
-- 
2.40.1




Re: [PATCH v2 1/3] hw/i2c: add mctp core

2023-04-26 Thread Klaus Jensen
On Apr 25 10:19, Corey Minyard wrote:
> On Tue, Apr 25, 2023 at 08:35:38AM +0200, Klaus Jensen wrote:
> > From: Klaus Jensen 
> > 
> > Add an abstract MCTP over I2C endpoint model. This implements MCTP
> > control message handling as well as handling the actual I2C transport
> > (packetization).
> > 
> > Devices are intended to derive from this and implement the class
> > methods.
> > 
> > Parts of this implementation is inspired by code[1] previously posted by
> > Jonathan Cameron.
> 
> All in all this looks good.  Two comments:
> 
> I would like to see the buffer handling consolidated into one function
> and the length checked, even for (especially for) the outside users of
> this code, like the nvme code.  Best to avoid future issues with buffer
> overruns.  This will require reworking the get_message_types function,
> unfortunately.
> 

Right now the implementations (e.g. hw/nvme/nmi-i2c.c) write directly
into the mctp core buffer in get_message_bytes(). The contract is that
they must not write more than the `maxlen` parameter. Is that bad? Would
it be better if get_message_bytes() returned a pointer to its own
buffer that hw/mctp could then copy from?

> You have one trace function on a bad receive message check, but lots of
> other bad receive message checks with no trace.  Just a suggestion, but
> it might be nice for tracking down issues to trace all the reasons a
> message is dropped.
> 

Sounds reasonable! :)

Thanks for the review!




Re: [PULL 61/73] hw/pci/aer: Implement PCI_ERR_UNCOR_MASK register

2023-04-26 Thread Juan Quintela
"Michael S. Tsirkin"  wrote:
> On Tue, Apr 25, 2023 at 08:42:17PM -0400, Peter Xu wrote:
>> Hi, Michael, Jonathan,
>> 
>> On Tue, Mar 07, 2023 at 08:13:53PM -0500, Michael S. Tsirkin wrote:
>> This breaks the simplest migration from QEMU 8.0 -> 7.2 binaries on all
>> machine types, I think, as long as the cap is present; e.g. the default
>> e1000e provided by the default q35 machine can already hit it with an
>> all-default cmdline:
>> 
>>   ./qemu-system-x86_64 -M pc-q35-7.2 [-incoming XXX]
>> 
>> The 7.2 binary will have an empty wmask for PCI_ERR_UNCOR_MASK, but I think
>> it can also see a non-zero value, so the migration will fail at:
>> 
>> vmstate_load :00:02.0/e1000e, e1000e
>> qemu-7.2: get_pci_config_device: Bad config data: i=0x10a read: 40 device: 0 cmask: ff wmask: 0 w1cmask:0
>> qemu-7.2: Failed to load PCIDevice:config
>> qemu-7.2: Failed to load e1000e:parent_obj
>> qemu-7.2: error while loading state for instance 0x0 of device ':00:02.0/e1000e'
>> qemu-7.2: load of migration failed: Invalid argument
>> 
>> We probably at least want to have the default value to be still zero, and
>> we'd need to make sure it'll not be modified by the guest, iiuc.
>> 
>> Below oneliner works for me and makes the migration work again:
>> 
>> ===8<===
>> diff --git a/hw/pci/pcie_aer.c b/hw/pci/pcie_aer.c
>> index 103667c368..563a37b79c 100644
>> --- a/hw/pci/pcie_aer.c
>> +++ b/hw/pci/pcie_aer.c
>> @@ -113,7 +113,7 @@ int pcie_aer_init(PCIDevice *dev, uint8_t cap_ver, 
>> uint16_t offset,
>>  pci_set_long(dev->w1cmask + offset + PCI_ERR_UNCOR_STATUS,
>>   PCI_ERR_UNC_SUPPORTED);
>>  pci_set_long(dev->config + offset + PCI_ERR_UNCOR_MASK,
>> - PCI_ERR_UNC_MASK_DEFAULT);
>> + 0/*PCI_ERR_UNC_MASK_DEFAULT*/);
>>  pci_set_long(dev->wmask + offset + PCI_ERR_UNCOR_MASK,
>>   PCI_ERR_UNC_SUPPORTED);
>> ===8<===
>> 
>> Anyone could have a look on a solid solution from PCI side?
>> 
>> Copy Juan and Leonardo.
>> 
>> Thanks,
>
> My bad, I forgot about this 🤦.
> So we need a property and tweak it with compat machinery depending on
> machine type. Jonathan, can you work on this pls?
> Or I can revert for now to relieve the time pressure,
> redo the patch at your leisure.

I agree with Michael here, the best option is adding a new property.

Later, Juan.




patch:avoid-to-create-a-null-watch-in-flush_buf

2023-04-26 Thread wang...@chinatelecom.cn
Hi, I am testing qga pushing messages to libvirt. When the VM is saved and
restored from a memory file, qga hangs while writing.

In flush_buf:
(gdb) f 4
#4  0x55ffc716894a in flush_buf (port=0x55ffc87bf800, buf=, 
len=) at ../hw/char/virtio-console.c:100
100 vcon->watch = qemu_chr_fe_add_watch(&vcon->chr,
(gdb) l
95           * use throttling on host side.
96           */
97          if (!k->is_console) {
98              virtio_serial_throttle_port(port, true);
99              if (!vcon->watch) {
100                 vcon->watch = qemu_chr_fe_add_watch(&vcon->chr,
101                                                     G_IO_OUT|G_IO_HUP,
102                                                     chr_write_unblocked, vcon);
103             }
104         }
vcon->watch is 0, but the port is throttled, so the front-end event will not
be processed until qga or libvirtd is restarted.
This issue happens when the VM is migrating or is restored from a saved
memory file.
So I think it can be fixed by the following change:

97          if (!k->is_console) {
98              virtio_serial_throttle_port(port, true);
99              if (!vcon->watch) {
100                 vcon->watch = qemu_chr_fe_add_watch(&vcon->chr,
101                                                     G_IO_OUT|G_IO_HUP,
102                                                     chr_write_unblocked, vcon);
+                   if (!vcon->watch)
+                       virtio_serial_throttle_port(port, false);
103             }
104         }
I have tested the code, and it works.



wang...@chinatelecom.cn


Re: [RFC PATCH v2 1/9] riscv: implement Ssqosid extension and sqoscfg CSR

2023-04-26 Thread Weiwei Li



On 2023/4/26 04:38, Drew Fustini wrote:

From: Kornel Dulęba 

Implement the sqoscfg CSR defined by the Ssqosid ISA extension
(Supervisor-mode Quality of Service ID). The CSR contains two fields:

   - Resource Control ID (RCID) used to determine resource allocation
   - Monitoring Counter ID (MCID) used to track resource usage

The CSR is defined for S-mode but accessing it when V=1 shall cause a
virtual instruction exception. Implement this behavior by calling the
hmode predicate.

Link: https://github.com/riscv-non-isa/riscv-cbqri/blob/main/riscv-cbqri.pdf
Signed-off-by: Kornel Dulęba 
[dfustini: rebase on v8.0.50, reword commit message]
Signed-off-by: Drew Fustini 
---
Changes since v1:
- rebase on current master (v8.0.50) instead of 8.0.0-rc4

  disas/riscv.c   |  1 +
  target/riscv/cpu.c  |  2 ++
  target/riscv/cpu.h  |  3 +++
  target/riscv/cpu_bits.h |  5 +
  target/riscv/csr.c  | 34 ++
  5 files changed, 45 insertions(+)

diff --git a/disas/riscv.c b/disas/riscv.c
index d6b0fbe5e877..94336f54637b 100644
--- a/disas/riscv.c
+++ b/disas/riscv.c
@@ -2100,6 +2100,7 @@ static const char *csr_name(int csrno)
  case 0x0143: return "stval";
  case 0x0144: return "sip";
  case 0x0180: return "satp";
+case 0x0181: return "sqoscfg";
  case 0x0200: return "hstatus";
  case 0x0202: return "hedeleg";
  case 0x0203: return "hideleg";
diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 1e97473af27b..fb3f8c43a32d 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -114,6 +114,7 @@ static const struct isa_ext_data isa_edata_arr[] = {
  ISA_EXT_DATA_ENTRY(smaia, true, PRIV_VERSION_1_12_0, ext_smaia),
  ISA_EXT_DATA_ENTRY(ssaia, true, PRIV_VERSION_1_12_0, ext_ssaia),
  ISA_EXT_DATA_ENTRY(sscofpmf, true, PRIV_VERSION_1_12_0, ext_sscofpmf),
+ISA_EXT_DATA_ENTRY(ssqosid, true, PRIV_VERSION_1_12_0, ext_ssqosid),
  ISA_EXT_DATA_ENTRY(sstc, true, PRIV_VERSION_1_12_0, ext_sstc),
  ISA_EXT_DATA_ENTRY(svadu, true, PRIV_VERSION_1_12_0, ext_svadu),
  ISA_EXT_DATA_ENTRY(svinval, true, PRIV_VERSION_1_12_0, ext_svinval),
@@ -1397,6 +1398,7 @@ static Property riscv_cpu_extensions[] = {
  
  DEFINE_PROP_BOOL("svadu", RISCVCPU, cfg.ext_svadu, true),
  
+DEFINE_PROP_BOOL("ssqosid", RISCVCPU, cfg.ext_ssqosid, true),

  DEFINE_PROP_BOOL("svinval", RISCVCPU, cfg.ext_svinval, false),
  DEFINE_PROP_BOOL("svnapot", RISCVCPU, cfg.ext_svnapot, false),
  DEFINE_PROP_BOOL("svpbmt", RISCVCPU, cfg.ext_svpbmt, false),
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 638e47c75a57..ffc1b5009d15 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -222,6 +222,8 @@ struct CPUArchState {
  target_ulong mcause;
  target_ulong mtval;  /* since: priv-1.10.0 */
  
+target_ulong sqoscfg;

+
  /* Machine and Supervisor interrupt priorities */
  uint8_t miprio[64];
  uint8_t siprio[64];
@@ -454,6 +456,7 @@ struct RISCVCPUConfig {
  bool ext_icboz;
  bool ext_zicond;
  bool ext_zihintpause;
+bool ext_ssqosid;
  bool ext_smstateen;
  bool ext_sstc;
  bool ext_svadu;
diff --git a/target/riscv/cpu_bits.h b/target/riscv/cpu_bits.h
index fca7ef0cef91..d11a3928735e 100644
--- a/target/riscv/cpu_bits.h
+++ b/target/riscv/cpu_bits.h
@@ -217,6 +217,7 @@
  /* Supervisor Protection and Translation */
  #define CSR_SPTBR   0x180
  #define CSR_SATP0x180
+#define CSR_SQOSCFG 0x181
  
  /* Supervisor-Level Window to Indirectly Accessed Registers (AIA) */

  #define CSR_SISELECT0x150
@@ -898,4 +899,8 @@ typedef enum RISCVException {
  #define MHPMEVENT_IDX_MASK 0xF
  #define MHPMEVENT_SSCOF_RESVD  16
  
+/* SQOSCFG BITS (QOSID) */

+#define SQOSCFG_RCID  0x0FFF
+#define SQOSCFG_MCID  0x0FFF
+
  #endif
diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index d522efc0b63a..5769b3545704 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -2700,6 +2700,37 @@ static RISCVException write_satp(CPURISCVState *env, int 
csrno,
  return RISCV_EXCP_NONE;
  }
  
+static RISCVException check_sqoscfg(CPURISCVState *env, int csrno)

+{
+RISCVCPU *cpu = env_archcpu(env);
+
+if (!cpu->cfg.ext_ssqosid) {
+return RISCV_EXCP_ILLEGAL_INST;
+}
+
+/*
+ * Even though this is an S-mode CSR, the spec says that we need to throw
+ * a virtual instruction fault if a guest tries to access it.
+ */
+return hmode(env, csrno);


If the above comment is true, then using hmode() here is not right, since
it only checks whether the H extension is supported.


It needs an additional check for guest-mode access. And we should use
smode() instead of hmode() here.



+}
+
+static RISCVException read_sqoscfg(CPURISCVState *env, int csrno,
+target_ulong *val)


The 'target_ulong *val' on the continuation line should be aligned with 'CPURISCVState *env' above.

Regards,

Wei

Re: test-blockjob: intermittent CI failures in msys2-64bit job

2023-04-26 Thread Emanuele Giuseppe Esposito



Am 25/04/2023 um 18:48 schrieb Hanna Czenczek:
> On 24.04.23 20:32, Vladimir Sementsov-Ogievskiy wrote:
>> On 24.04.23 16:36, Emanuele Giuseppe Esposito wrote:
>>>
>>>
>>> Am 21/04/2023 um 12:13 schrieb Vladimir Sementsov-Ogievskiy:
 On 17.03.23 15:35, Thomas Huth wrote:
> On 17/03/2023 11.17, Peter Maydell wrote:
>> On Mon, 6 Mar 2023 at 11:16, Peter Maydell 
>> wrote:
>>>
>>> On Fri, 3 Mar 2023 at 18:36, Peter Maydell
>>>  wrote:

 I've noticed that test-blockjob seems to fail intermittently
 on the msys2-64bit job:

 https://gitlab.com/qemu-project/qemu/-/jobs/3872508803
 https://gitlab.com/qemu-project/qemu/-/jobs/3871061024
 https://gitlab.com/qemu-project/qemu/-/jobs/3865312440

 Sample output:
 | 53/89
 ERROR:../tests/unit/test-blockjob.c:499:test_complete_in_standby:
 assertion failed: (job->status == JOB_STATUS_STANDBY) ERROR
 53/89 qemu:unit / test-blockjob ERROR 0.08s exit status 3
>>
>>> Here's an intermittent failure from my macos x86 machine:
>>>
>>> 172/621 qemu:unit / test-blockjob
>>>  ERROR   0.26s   killed by signal 6 SIGABRT
>>
>> And an intermittent on the freebsd 13 CI job:
>> https://gitlab.com/qemu-project/qemu/-/jobs/3950667240
>>
> MALLOC_PERTURB_=197
> G_TEST_BUILDDIR=/tmp/cirrus-ci-build/build/tests/unit
> G_TEST_SRCDIR=/tmp/cirrus-ci-build/tests/unit
> /tmp/cirrus-ci-build/build/tests/unit/test-blockjob --tap -k
>> ▶ 178/650 /blockjob/ids
>>  OK
>> 178/650 qemu:unit / test-blockjob
>>  ERROR   0.31s   killed by signal 6 SIGABRT
>> ― ✀
>> ―
>> stderr:
>> Assertion failed: (job->status == JOB_STATUS_STANDBY), function
>> test_complete_in_standby, file ../tests/unit/test-blockjob.c, line
>> 499.
>>
>>
>> TAP parsing error: Too few tests run (expected 9, got 1)
>> (test program exited with status code -6)
>> ――
>>
>> Anybody in the block team looking at these, or shall we just
>> disable this test entirely ?
>
> I ran into this issue today, too:
>
>    https://gitlab.com/thuth/qemu/-/jobs/3954367101
>
> ... if nobody is interested in fixing this test, I think we should
> disable it...
>
>    Thomas
>

 I'm looking at this now, and it seems that it's broken since
 6f592e5aca1a27fe1c1f6 "job.c: enable job lock/unlock and remove
 Aiocontext locks", as it ends the critical section with a new
 aio_context_release() call right after bdrv_drain_all_end(), so it's
 no surprise that the job may start at that moment and the following
 assertion fires.

 Emanuele, could you please look at it?

>>> Well if I understood correctly, the only thing that was preventing the
>>> job from continuing was the aiocontext lock held.
>>>
>>> The failing assertion even mentions that:
>>> /* Lock the IO thread to prevent the job from being run */
>>> [...]
>>> /* But the job cannot run, so it will remain on standby */
>>> assert(job->status == JOB_STATUS_STANDBY);
>>>
>>> Essentially bdrv_drain_all_end() would wake up the job coroutine, but it
>>> would anyways block somewhere after since the aiocontext lock was taken
>>> by the test.
>>>
>>> Now that we got rid of aiocontext lock in job code, nothing prevents the
>>> test from resuming.
>>> Mixing job lock and aiocontext acquire/release is not a good idea, and
>>> it would anyways block job_resume() called by bdrv_drain_all_end(), so
>>> the test itself would deadlock.
>>>
>>> So unless @Kevin has a better idea, this seems to be just a "hack" to
>>> test stuff that is not possible to do now anymore. What I would suggest
>>> is to get rid of that test, or at least of that assert function. I
>>> unfortunately cannot reproduce the failure, but I think the remaining
>>> functions called by the test should run as expected.
>>>
>>
>> Thanks! I agree. Probably, alternatively we could just expand the
>> drained section, like
>>
>> @@ -488,12 +488,6 @@ static void test_complete_in_standby(void)
>>  bdrv_drain_all_begin();
>>  assert_job_status_is(job, JOB_STATUS_STANDBY);
>>
>> -    /* Lock the IO thread to prevent the job from being run */
>> -    aio_context_acquire(ctx);
>> -    /* This will schedule the job to resume it */
>> -    bdrv_drain_all_end();
>> -    aio_context_release(ctx);
>> -
>>  WITH_JOB_LOCK_GUARD() {
>>  /* But the job cannot run, so it will remain on standby */
>>  assert(job->status == JOB_STATUS_STANDBY);
>> @@ -511,6 +505,7 @@ static void test_complete_in_standby(void)
>>  job_dismiss_locked(&job, &error_abort);
>>  }
>>
>> +    bdrv_drain_all_end();
>>  aio_context

Re: test-blockjob: intermittent CI failures in msys2-64bit job

2023-04-26 Thread Hanna Czenczek

On 26.04.23 09:38, Emanuele Giuseppe Esposito wrote:


Am 25/04/2023 um 18:48 schrieb Hanna Czenczek:

On 24.04.23 20:32, Vladimir Sementsov-Ogievskiy wrote:

On 24.04.23 16:36, Emanuele Giuseppe Esposito wrote:


Am 21/04/2023 um 12:13 schrieb Vladimir Sementsov-Ogievskiy:

On 17.03.23 15:35, Thomas Huth wrote:

On 17/03/2023 11.17, Peter Maydell wrote:

On Mon, 6 Mar 2023 at 11:16, Peter Maydell 
wrote:

On Fri, 3 Mar 2023 at 18:36, Peter Maydell
 wrote:

I've noticed that test-blockjob seems to fail intermittently
on the msys2-64bit job:

https://gitlab.com/qemu-project/qemu/-/jobs/3872508803
https://gitlab.com/qemu-project/qemu/-/jobs/3871061024
https://gitlab.com/qemu-project/qemu/-/jobs/3865312440

Sample output:
| 53/89
ERROR:../tests/unit/test-blockjob.c:499:test_complete_in_standby:
assertion failed: (job->status == JOB_STATUS_STANDBY) ERROR
53/89 qemu:unit / test-blockjob ERROR 0.08s exit status 3

Here's an intermittent failure from my macos x86 machine:

172/621 qemu:unit / test-blockjob
  ERROR   0.26s   killed by signal 6 SIGABRT

And an intermittent on the freebsd 13 CI job:
https://gitlab.com/qemu-project/qemu/-/jobs/3950667240


MALLOC_PERTURB_=197
G_TEST_BUILDDIR=/tmp/cirrus-ci-build/build/tests/unit
G_TEST_SRCDIR=/tmp/cirrus-ci-build/tests/unit
/tmp/cirrus-ci-build/build/tests/unit/test-blockjob --tap -k

▶ 178/650 /blockjob/ids
  OK
178/650 qemu:unit / test-blockjob
  ERROR   0.31s   killed by signal 6 SIGABRT
― ✀
―
stderr:
Assertion failed: (job->status == JOB_STATUS_STANDBY), function
test_complete_in_standby, file ../tests/unit/test-blockjob.c, line
499.


TAP parsing error: Too few tests run (expected 9, got 1)
(test program exited with status code -6)
――

Anybody in the block team looking at these, or shall we just
disable this test entirely ?

I ran into this issue today, too:

    https://gitlab.com/thuth/qemu/-/jobs/3954367101

... if nobody is interested in fixing this test, I think we should
disable it...

    Thomas


I'm looking at this now, and it seems that it's broken since
6f592e5aca1a27fe1c1f6 "job.c: enable job lock/unlock and remove
Aiocontext locks", as it ends the critical section with a new
aio_context_release() call right after bdrv_drain_all_end(), so it's
no surprise that the job may start at that moment and the following
assertion fires.

Emanuele, could you please look at it?


Well if I understood correctly, the only thing that was preventing the
job from continuing was the aiocontext lock held.

The failing assertion even mentions that:
/* Lock the IO thread to prevent the job from being run */
[...]
/* But the job cannot run, so it will remain on standby */
assert(job->status == JOB_STATUS_STANDBY);

Essentially bdrv_drain_all_end() would wake up the job coroutine, but it
would anyways block somewhere after since the aiocontext lock was taken
by the test.

Now that we got rid of aiocontext lock in job code, nothing prevents the
test from resuming.
Mixing job lock and aiocontext acquire/release is not a good idea, and
it would anyways block job_resume() called by bdrv_drain_all_end(), so
the test itself would deadlock.

So unless @Kevin has a better idea, this seems to be just a "hack" to
test stuff that is not possible to do now anymore. What I would suggest
is to get rid of that test, or at least of that assert function. I
unfortunately cannot reproduce the failure, but I think the remaining
functions called by the test should run as expected.


Thanks! I agree. Probably, alternatively we could just expand the
drained section, like

@@ -488,12 +488,6 @@ static void test_complete_in_standby(void)
  bdrv_drain_all_begin();
  assert_job_status_is(job, JOB_STATUS_STANDBY);

-    /* Lock the IO thread to prevent the job from being run */
-    aio_context_acquire(ctx);
-    /* This will schedule the job to resume it */
-    bdrv_drain_all_end();
-    aio_context_release(ctx);
-
  WITH_JOB_LOCK_GUARD() {
  /* But the job cannot run, so it will remain on standby */
  assert(job->status == JOB_STATUS_STANDBY);
@@ -511,6 +505,7 @@ static void test_complete_in_standby(void)
  job_dismiss_locked(&job, &error_abort);
  }

+    bdrv_drain_all_end();
  aio_context_acquire(ctx);
  destroy_blk(blk);
  aio_context_release(ctx);


But, seems that test wanted to specifically test job, that still in
STANDBY exactly after drained section...

[cc: Hanna]

Hanna, it was your test (added in
c2c731a4d35062295cd3260e66b3754588a2fad4, two years ago). Don't you
remember was important to catch STANDBY job *after* a drained section?

I’m not quite sure, but I think the idea was that we basically try to
get as close to something that might come in over QMP.  Over QMP, you
can’t issue a job-complete while keeping everything drained, so I
wouldn’t just e

Re: [RFC PATCH v3 00/20] configure: create a python venv and ensure meson, sphinx

2023-04-26 Thread Paolo Bonzini

On 4/24/23 22:02, John Snow wrote:

Some important points as a pre-emptive "FAQ":

- This venv is unconditionally created and lives at {build_dir}/pyvenv.

- The python interpreter used by this venv is always the one identified
   by configure. (Which in turn is always the one specified by --python
   or $PYTHON)

- *almost* all python scripts in qemu.git executed as part of the build
   system, meson, sphinx, avocado tests, vm tests or CI are always
   executed within this venv.

   (iotests are not yet integrated; I plan to tackle this separately as a
   follow-up in order to have a more tightly focused scope on that
   series.)

- It remains possible to build and test fully offline.
   (In most cases, you just need meson and sphinx from your distro's repo.)

- Distribution packaged 'meson' and 'sphinx' are still utilized whenever
   possible as the highest preference.

- Vendored versions of e.g. 'meson' are always preferred to PyPI
   versions for speed, repeatability and ensuring tarball builds work
   as-is offline.

   (Sphinx will not be vendored, just like it already isn't.)

- Missing dependencies, when possible, are fetched and installed
   on-demand automatically to make developer environments "just work".

- Works for Python 3.7 and up, on Fedora, OpenSuSE, Red Hat, CentOS,
   Alpine, Debian, Ubuntu, NetBSD, OpenBSD, and hopefully everywhere

- No new dependencies (...for most platforms. Debian and NetBSD get an
   asterisk.)

- The meson git submodule is unused after this series and can be removed.


Thanks, this looks pretty good.  Some changes I'd make for the non-RFC 
version:


- I think we should just check in the meson wheel (which also removes 
the need for patch 12, so it can be done in its stead) and remove the 
submodule


- The verbosity of mkvenv.py can be tuned down and most prints replaced 
with logger.info() or logger.debug()


- While I agree with keeping patch 18 separate, I would move it earlier 
so that patch 19 can be squashed into patch 14


- I am ambivalent about keeping --enable/--disable-pypi in the first 
committed patchset, but in any case I would move patches 16 and 20 
before patch 15


Paolo




[PATCH] tests/unit/test-blockjob: Re-enable complete_in_standby test

2023-04-26 Thread Emanuele Giuseppe Esposito
Pause the job while draining so that pause_count will be
increased and bdrv_drain_all_end() won't reset it to 0, so the
job will not continue.

With this fix, the test is not flaky anymore.

Additionally remove useless aiocontext lock around bdrv_drain_all_end()
in test_complete_in_standby().

Fixes: b6903cbe3a2 "tests/unit/test-blockjob: Disable
complete_in_standby test"
Suggested-by: Hanna Czenczek 
Signed-off-by: Emanuele Giuseppe Esposito 
---
 tests/unit/test-blockjob.c | 17 +++--
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/tests/unit/test-blockjob.c b/tests/unit/test-blockjob.c
index a130f6fefb..46d720aeee 100644
--- a/tests/unit/test-blockjob.c
+++ b/tests/unit/test-blockjob.c
@@ -488,11 +488,15 @@ static void test_complete_in_standby(void)
 bdrv_drain_all_begin();
 assert_job_status_is(job, JOB_STATUS_STANDBY);
 
+/*
+ * Increase pause_count so that the counter is
+ * unbalanced and job won't resume
+ */
+job_pause(job);
+
 /* Lock the IO thread to prevent the job from being run */
-aio_context_acquire(ctx);
 /* This will schedule the job to resume it */
 bdrv_drain_all_end();
-aio_context_release(ctx);
 
 WITH_JOB_LOCK_GUARD() {
 /* But the job cannot run, so it will remain on standby */
@@ -531,13 +535,6 @@ int main(int argc, char **argv)
 g_test_add_func("/blockjob/cancel/standby", test_cancel_standby);
 g_test_add_func("/blockjob/cancel/pending", test_cancel_pending);
 g_test_add_func("/blockjob/cancel/concluded", test_cancel_concluded);
-
-/*
- * This test is flaky and sometimes fails in CI and otherwise:
- * don't run unless user opts in via environment variable.
- */
-if (getenv("QEMU_TEST_FLAKY_TESTS")) {
-g_test_add_func("/blockjob/complete_in_standby", 
test_complete_in_standby);
-}
+g_test_add_func("/blockjob/complete_in_standby", test_complete_in_standby);
 return g_test_run();
 }
-- 
2.39.1




Re: test-blockjob: intermittent CI failures in msys2-64bit job

2023-04-26 Thread Emanuele Giuseppe Esposito



Am 26/04/2023 um 10:03 schrieb Hanna Czenczek:
> On 26.04.23 09:38, Emanuele Giuseppe Esposito wrote:
>>
>> Am 25/04/2023 um 18:48 schrieb Hanna Czenczek:
>>> On 24.04.23 20:32, Vladimir Sementsov-Ogievskiy wrote:
 On 24.04.23 16:36, Emanuele Giuseppe Esposito wrote:
>
> Am 21/04/2023 um 12:13 schrieb Vladimir Sementsov-Ogievskiy:
>> On 17.03.23 15:35, Thomas Huth wrote:
>>> On 17/03/2023 11.17, Peter Maydell wrote:
 On Mon, 6 Mar 2023 at 11:16, Peter Maydell
 
 wrote:
> On Fri, 3 Mar 2023 at 18:36, Peter Maydell
>  wrote:
>> I've noticed that test-blockjob seems to fail intermittently
>> on the msys2-64bit job:
>>
>> https://gitlab.com/qemu-project/qemu/-/jobs/3872508803
>> https://gitlab.com/qemu-project/qemu/-/jobs/3871061024
>> https://gitlab.com/qemu-project/qemu/-/jobs/3865312440
>>
>> Sample output:
>> | 53/89
>> ERROR:../tests/unit/test-blockjob.c:499:test_complete_in_standby:
>> assertion failed: (job->status == JOB_STATUS_STANDBY) ERROR
>> 53/89 qemu:unit / test-blockjob ERROR 0.08s exit status 3
> Here's an intermittent failure from my macos x86 machine:
>
> 172/621 qemu:unit / test-blockjob
>   ERROR   0.26s   killed by signal 6 SIGABRT
 And an intermittent on the freebsd 13 CI job:
 https://gitlab.com/qemu-project/qemu/-/jobs/3950667240

>>> MALLOC_PERTURB_=197
>>> G_TEST_BUILDDIR=/tmp/cirrus-ci-build/build/tests/unit
>>> G_TEST_SRCDIR=/tmp/cirrus-ci-build/tests/unit
>>> /tmp/cirrus-ci-build/build/tests/unit/test-blockjob --tap -k
 ▶ 178/650 /blockjob/ids
   OK
 178/650 qemu:unit / test-blockjob
   ERROR   0.31s   killed by signal 6 SIGABRT
 ― ✀
 ―
 stderr:
 Assertion failed: (job->status == JOB_STATUS_STANDBY), function
 test_complete_in_standby, file ../tests/unit/test-blockjob.c, line
 499.


 TAP parsing error: Too few tests run (expected 9, got 1)
 (test program exited with status code -6)
 ――

 Anybody in the block team looking at these, or shall we just
 disable this test entirely ?
>>> I ran into this issue today, too:
>>>
>>> https://gitlab.com/thuth/qemu/-/jobs/3954367101
>>>
>>> ... if nobody is interested in fixing this test, I think we should
>>> disable it...
>>>
>>> Thomas
>>>
>> I'm looking at this now, and it seems that it's broken since
>> 6f592e5aca1a27fe1c1f6 "job.c: enable job lock/unlock and remove
>> Aiocontext locks", as it ends the critical section with a new
>> aio_context_release() call right after bdrv_drain_all_end(), so
>> it's no surprise that the job may start at that moment and the
>> following assertion fires.
>>
>> Emanuele, could you please look at it?
>>
> Well if I understood correctly, the only thing that was preventing the
> job from continuing was the aiocontext lock held.
>
> The failing assertion even mentions that:
> /* Lock the IO thread to prevent the job from being run */
> [...]
> /* But the job cannot run, so it will remain on standby */
> assert(job->status == JOB_STATUS_STANDBY);
>
> Essentially bdrv_drain_all_end() would wake up the job coroutine,
> but it
> would anyways block somewhere after since the aiocontext lock was
> taken
> by the test.
>
> Now that we got rid of aiocontext lock in job code, nothing
> prevents the
> test from resuming.
> Mixing job lock and aiocontext acquire/release is not a good idea, and
> it would anyways block job_resume() called by bdrv_drain_all_end(), so
> the test itself would deadlock.
>
> So unless @Kevin has a better idea, this seems to be just a "hack" to
> test stuff that is not possible to do now anymore. What I would
> suggest
> is to get rid of that test, or at least of that assert function. I
> unfortunately cannot reproduce the failure, but I think the remaining
> functions called by the test should run as expected.
>
 Thanks! I agree. Probably, alternatively we could just expand the
 drained section, like

 @@ -488,12 +488,6 @@ static void test_complete_in_standby(void)
   bdrv_drain_all_begin();
   assert_job_status_is(job, JOB_STATUS_STANDBY);

 -    /* Lock the IO thread to prevent the job from being run */
 -    aio_context_acquire(ctx);
 -    /* This will schedule the job to resume it */
 -    bdrv_drain_all_end();
 -    aio_context_release(ctx);
 -
   WITH_JO

Re: [RFC PATCH v3 00/20] configure: create a python venv and ensure meson, sphinx

2023-04-26 Thread Daniel P . Berrangé
On Tue, Apr 25, 2023 at 02:58:53PM -0400, John Snow wrote:
> On Tue, Apr 25, 2023 at 2:10 PM Daniel P. Berrangé  
> wrote:
> > How about having --enable-pypi never|auto|force with the following
> > semantics for --enable-docs + --enable-pypi
> >
> >
> >   * docs=no   - pypi never used
> >
> >   * docs=auto + pypi=never  => docs only enable if sphinx is already
> >installed locally, otherwise disabled
> >
> >   * docs=auto + pypi=auto  => docs enable if sphinx is already
> >   installed locally, or can download from
> >   pypi as fallback
> >
> 
> As long as this doesn't cause needless trouble for existing configure
> invocations baked into scripts and the like. It's a bit more
> aggressive about enabling docs than we have been in the past, but
> maybe that's not a bad thing.
> 
> >   * docs=auto + pypi=force  => always download sphinx from pypi
> >
> 
> So if you already have Sphinx, this should perform an upgrade to the
> latest version?

Essentially I meant 'force' to mean *never* use the host python
installation packages. Always install all the deps in the venv,
even if they exist on the host with a sufficient version.

> >   * docs=yes + pypi=never  => ERROR if sphinx is not already
> >   installed locally
> >
> >   * docs=yes + pypi=auto  => docs enable if sphinx is already
> >  installed locally, or can download from
> >  pypi as fallback
> >
> >   * docs=yes + pypi=force  => always download sphinx from pypi
> 
> Yeah, there's room for settings like these, and the above mostly makes
> sense to me.
> 
> Paolo and I had written up a more elaborate set of flags at one point
> under the premise of integrating testing, too -- for instance,
> pre-loading the testing dependencies for iotests (local qemu package,
> qemu.qmp, etc) or for avocado tests (avocado-framework and other deps
> needed for those tests). I wanted to try and check in a "minimum
> viable" version of this patch set first close to the 8.1 tree opening
> to iron out any bugs in the core feature and then I'd work to expand
> the flags and capability of the system as differential patches.
> 
> So, I think changing --enable-pypi too much is beyond the scope of the
> current patchset, but it's on my mind for what will follow almost
> immediately after it. With that in mind, we can brainstorm a little
> here and throw some darts at the wall:

Yep, I don't consider it a pre-requisite for this series.

> 
> The version of flags we originally came up with was this:
> 
> --python=... # runtime used to create venv
> --enable-pip-groups=testing,devel,avocado,meson,sphinx
> --enable-pip=now  # install all python deps now
> --enable-pip=on-demand  # install qemu.git/meson/sphinx, delay the rest
> --enable-pip=no# offline
> --{enable,disable}-isolated-venv # let venv use system/distro if disable

This feels like a bit of overkill to me, and would create a hell
of a lot of combinations to test if you expand the matrix of
options.

> I may have pulled us in a slightly different direction with the
> version of the patches I actually sent here today, but the key ideas
> here were:
> 
> - The ability to opt into or out of different package groups at
> configure-time. meson is always required, docs/sphinx is optional, and
> the other three groups are testing related. (testing: iotests and
> qemu.qmp, devel: python self-tests, avocado: avocado tests.)

I think this is especially overkill. While you can probably come up
with some scenarios where you could use this fine grained selection,
I don't think it would be very compelling.

> - The ability to decide when packages get installed to the venv;
> either at configure time ("now") or as-needed ("on-demand") or
> functionally never ("no")

The distinction between now vs on-demand is really just about
avoiding the overhead of downloading bits that you might not
end up using. eg not downloading avocado unless you will be
running 'make check-avocado'. That's saving a few 10s or 100s
of KB of download, but doesn't feel like it'd make a noticeable
difference in the overall QEMU build time.

The 'now' and 'no' options look sufficient (corresponding to
the 'auto' and 'never' options I suggested above)

> - The ability to enable or disable venv isolation (turning on or off
> system packages).

Corresponds to the 'force' option I suggested above.
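The option matrix discussed above boils down to a small decision table. A purely illustrative sketch follows; the function and flag names are assumptions for the sake of the example, not QEMU's actual configure logic:

```python
# Hypothetical sketch of the docs/pypi decision table discussed above.
# "docs" is the --docs setting (auto/yes/no) and "pypi" the --pypi
# setting (auto/never/force); names and behavior are assumptions.

def resolve_docs(docs: str, pypi: str, sphinx_installed: bool) -> str:
    """Return 'enabled', 'disabled' or 'download', or raise on conflict."""
    if docs == "no":
        return "disabled"
    if pypi == "force":
        # never use host packages: always install sphinx into the venv
        return "download"
    if sphinx_installed:
        return "enabled"
    # sphinx is missing locally
    if pypi == "auto":
        return "download"
    # pypi == "never": there is no way to get sphinx
    if docs == "yes":
        raise RuntimeError("docs requested but sphinx unavailable")
    return "disabled"
```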

> 
> I think I abandoned the idea of configuring the venv isolation and
> have it set to "always off". We may revisit this later, but I think
> for this series it's fine to have glossed over it.
> I also skipped over the idea of having package installation being
> "now", "on-demand" or "no" -- this patch-set more or less is using a
> hybrid of "now" and "on-demand" where meson and sphinx are "now" but
> avocado is "on-demand". Unifying that discrepancy can occur in
> subsequent patches as it closely resem

Re: [PATCH v2 RESEND] xen: Fix SEGV on domain disconnect

2023-04-26 Thread Tim Smith
On Mon, Apr 24, 2023 at 2:51 PM Paul Durrant  wrote:
>
> So if you drop the ring drain then this patch should still stop the
> SEGVs, right?
>

I think that's worth a few test runs. I recall some coredumps in that
condition when I was investigating early on, but I don't have them in
my collection so maybe I'm misremembering.

Tim



Re: [PATCH v3 2/7] target/i386: Add new EPYC CPU versions with updated cache_info

2023-04-26 Thread Maksim Davydov



On 4/25/23 18:35, Moger, Babu wrote:

Hi Maksim,

On 4/25/23 07:51, Maksim Davydov wrote:

On 4/24/23 19:33, Babu Moger wrote:

From: Michael Roth 

Introduce new EPYC cpu versions: EPYC-v4 and EPYC-Rome-v3.
The only difference vs. older models is an updated cache_info with
the 'complex_indexing' bit unset, since this bit is not currently
defined for AMD and may cause problems should it be used for
something else in the future. Setting this bit will also cause
CPUID validation failures when running SEV-SNP guests.

Signed-off-by: Michael Roth 
Signed-off-by: Babu Moger 
Acked-by: Michael S. Tsirkin 
---
   target/i386/cpu.c | 118 ++
   1 file changed, 118 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index e3d9eaa307..c1bc47661d 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -1707,6 +1707,56 @@ static const CPUCaches epyc_cache_info = {
   },
   };
   +static CPUCaches epyc_v4_cache_info = {
+    .l1d_cache = &(CPUCacheInfo) {
+    .type = DATA_CACHE,
+    .level = 1,
+    .size = 32 * KiB,
+    .line_size = 64,
+    .associativity = 8,
+    .partitions = 1,
+    .sets = 64,
+    .lines_per_tag = 1,
+    .self_init = 1,
+    .no_invd_sharing = true,
+    },
+    .l1i_cache = &(CPUCacheInfo) {
+    .type = INSTRUCTION_CACHE,
+    .level = 1,
+    .size = 64 * KiB,
+    .line_size = 64,
+    .associativity = 4,
+    .partitions = 1,
+    .sets = 256,
+    .lines_per_tag = 1,
+    .self_init = 1,
+    .no_invd_sharing = true,
+    },
+    .l2_cache = &(CPUCacheInfo) {
+    .type = UNIFIED_CACHE,
+    .level = 2,
+    .size = 512 * KiB,
+    .line_size = 64,
+    .associativity = 8,
+    .partitions = 1,
+    .sets = 1024,
+    .lines_per_tag = 1,
+    },
+    .l3_cache = &(CPUCacheInfo) {
+    .type = UNIFIED_CACHE,
+    .level = 3,
+    .size = 8 * MiB,
+    .line_size = 64,
+    .associativity = 16,
+    .partitions = 1,
+    .sets = 8192,
+    .lines_per_tag = 1,
+    .self_init = true,
+    .inclusive = true,
+    .complex_indexing = false,
+    },
+};
+
   static const CPUCaches epyc_rome_cache_info = {
   .l1d_cache = &(CPUCacheInfo) {
   .type = DATA_CACHE,
@@ -1757,6 +1807,56 @@ static const CPUCaches epyc_rome_cache_info = {
   },
   };
   +static const CPUCaches epyc_rome_v3_cache_info = {
+    .l1d_cache = &(CPUCacheInfo) {
+    .type = DATA_CACHE,
+    .level = 1,
+    .size = 32 * KiB,
+    .line_size = 64,
+    .associativity = 8,
+    .partitions = 1,
+    .sets = 64,
+    .lines_per_tag = 1,
+    .self_init = 1,
+    .no_invd_sharing = true,
+    },
+    .l1i_cache = &(CPUCacheInfo) {
+    .type = INSTRUCTION_CACHE,
+    .level = 1,
+    .size = 32 * KiB,
+    .line_size = 64,
+    .associativity = 8,
+    .partitions = 1,
+    .sets = 64,
+    .lines_per_tag = 1,
+    .self_init = 1,
+    .no_invd_sharing = true,
+    },
+    .l2_cache = &(CPUCacheInfo) {
+    .type = UNIFIED_CACHE,
+    .level = 2,
+    .size = 512 * KiB,
+    .line_size = 64,
+    .associativity = 8,
+    .partitions = 1,
+    .sets = 1024,
+    .lines_per_tag = 1,
+    },
+    .l3_cache = &(CPUCacheInfo) {
+    .type = UNIFIED_CACHE,
+    .level = 3,
+    .size = 16 * MiB,
+    .line_size = 64,
+    .associativity = 16,
+    .partitions = 1,
+    .sets = 16384,
+    .lines_per_tag = 1,
+    .self_init = true,
+    .inclusive = true,
+    .complex_indexing = false,
+    },
+};
+
   static const CPUCaches epyc_milan_cache_info = {
   .l1d_cache = &(CPUCacheInfo) {
   .type = DATA_CACHE,
@@ -4091,6 +4191,15 @@ static const X86CPUDefinition builtin_x86_defs[] = {
   { /* end of list */ }
   }
   },
+    {
+    .version = 4,
+    .props = (PropValue[]) {
+    { "model-id",
+  "AMD EPYC-v4 Processor" },
+    { /* end of list */ }
+    },
+    .cache_info = &epyc_v4_cache_info
+    },
   { /* end of list */ }
   }
   },
@@ -4210,6 +4319,15 @@ static const X86CPUDefinition builtin_x86_defs[] = {
   { /* end of list */ }
   }
   },
+    {
+    .version = 3,
+    .props = (PropValue[]) {
+    { "model-id",
+  "AMD EPYC-Rome-v3 Processor" },

What do you think about adding more information to the model name to reveal
its key feature? For instance, model-id can be "EPYC-Rome-v3 (NO INDEXING)",
because only cache info was affected. Or alias can be used to achieve
the same effect. It works well in

Actually, we already thoug

Re: [RFC PATCH v3 00/20] configure: create a python venv and ensure meson, sphinx

2023-04-26 Thread Paolo Bonzini

On 4/26/23 10:21, Daniel P. Berrangé wrote:

So if you already have Sphinx, this should perform an upgrade to the
latest version?


Essentially I meant 'force' to mean*never*  use the host python
installation packages. Always install all the deps in the venv,
even if they exist in the host with sufficient version met.


I think this is essentially --enable-isolated-venv.  I don't think there 
is a use case for "let the venv use system packages, but override them 
with pip right away".




--python=... # runtime used to create venv
--enable-pip-groups=testing,devel,avocado,meson,sphinx
--enable-pip=now  # install all python deps now
--enable-pip=on-demand  # install qemu.git/meson/sphinx, delay the rest
--enable-pip=no# offline
--{enable,disable}-isolated-venv # let venv use system/distro if disable


This feels like a bit of overkill to me, and would create a hell
of a lot of combinations to test if you expand the matrix of
options.


Yeah, this is a bit overkill.  I think we can reduce it to three cases, 
corresponding to:


- --enable-pypi --enable-isolated-venv - use pip to install everything, 
including for options in "auto" state (e.g. would install sphinx without 
--enable-docs)


- --enable-pypi --disable-isolated-venv - use pip to install missing 
packages.  TBD whether to do so for options in "auto" state or only for 
"enabled" (i.e., TBD whether to install sphinx without --enable-docs).


- --disable-pypi (only meaningful for --disable-isolated-venv) - apart 
from vendored wheels, just use system site packages (same as QEMU <= 8.0)


I think we want to hash out this detail first, and thus we should leave 
online mode out of the non-RFC version.  It can be implemented together 
with isolated mode.


Paolo




Re: [PATCH] tests/unit/test-blockjob: Re-enable complete_in_standby test

2023-04-26 Thread Thomas Huth

On 26/04/2023 10.16, Emanuele Giuseppe Esposito wrote:

Pause the job while draining so that pause_count will be
increased and bdrv_drain_all_end() won't reset it to 0, so the
job will not continue.

With this fix, the test is not flaky anymore.

Additionally remove useless aiocontext lock around bdrv_drain_all_end()
in test_complete_in_standby().

Fixes: b6903cbe3a2 "tests/unit/test-blockjob: Disable
complete_in_standby test"
Suggested-by: Hanna Czenczek 
Signed-off-by: Emanuele Giuseppe Esposito 
---
  tests/unit/test-blockjob.c | 17 +++--
  1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/tests/unit/test-blockjob.c b/tests/unit/test-blockjob.c
index a130f6fefb..46d720aeee 100644
--- a/tests/unit/test-blockjob.c
+++ b/tests/unit/test-blockjob.c
@@ -488,11 +488,15 @@ static void test_complete_in_standby(void)
  bdrv_drain_all_begin();
  assert_job_status_is(job, JOB_STATUS_STANDBY);
  
+/*

+ * Increase pause_count so that the counter is
+ * unbalanced and job won't resume
+ */
+job_pause(job);
+
  /* Lock the IO thread to prevent the job from being run */


I guess the above comment should now be removed, too?


-aio_context_acquire(ctx);
  /* This will schedule the job to resume it */
  bdrv_drain_all_end();
-aio_context_release(ctx);


 Thomas




Re: [PATCH 8/9] docs/devel: mention the spacing requirement for QOM

2023-04-26 Thread Mark Cave-Ayland

On 21/04/2023 07:15, Philippe Mathieu-Daudé wrote:


On 20/4/23 21:32, Mark Cave-Ayland wrote:

On 20/04/2023 16:57, Alex Bennée wrote:


We have a more complete document on QOM but we should at least mention
the style requirements in the style guide.

Signed-off-by: Alex Bennée 
Cc: Mark Cave-Ayland 
---
  docs/devel/qom.rst   |  2 ++
  docs/devel/style.rst | 29 +
  2 files changed, 31 insertions(+)




A couple of points:

1) It is probably worth removing the typedefs given that they are handled by the 
various QOM macros


2) There should be mention of the fixed names "parent_obj" and "parent_class" 
for
the first declaration.

How about something like this:


QEMU Object Model Declarations
==

The QEMU Object Model (QOM) provides a framework for handling objects
in the base C language. The first declaration of a storage or class
structure should always be the parent and leave a visual space between


s/should/must/


that declaration and the new code.

For a storage structure the first declaration should always be called
"parent_obj" and for a class structure the first member should always
be called "parent_class" as below:

.. code-block:: c

 struct MyDeviceState {
 DeviceState parent_obj;

 /* Properties */
 int prop_a;
 char *prop_b;


Should we mention "We recommend placing instance/class properties fields
just after the parent field"?


I don't think I've ever seen any recommendations on placing instance/class property 
fields other than for the parent DeviceState/DeviceClass?


IMO it doesn't seem worth committing to anything else for now, especially as which 
fields get exposed as properties can be quite fluid these days ;)



 /* Other stuff */
 int internal_state;
 };

 struct MyDeviceClass {
 DeviceClass parent_class;

 void (*new_fn1)(void);
 bool (*new_fn2)(CPUState *);
 };

Note that there is no need to provide typedefs for QOM structures since these are 
generated automatically by the QOM declaration macros. See :ref:`qom` for more 
details.



ATB,

Mark.



Re: [RFC PATCH v3 00/20] configure: create a python venv and ensure meson, sphinx

2023-04-26 Thread Paolo Bonzini

On 4/26/23 10:05, Paolo Bonzini wrote:




Thanks, this looks pretty good.  Some changes I'd make for the non-RFC 
version:


- I think we should just check in the meson wheel (which also removes 
the need for patch 12, so it can be done in its stead) and remove the 
submodule


- The verbosity of mkvenv.py can be tuned down and most prints replaced 
with logger.info() or logger.debug()


- While I agree with keeping patch 18 separate, I would move it earlier 
so that patch 19 can be squashed into patch 14


- I am ambivalent about keeping --enable/--disable-pypi in the first 
committed patchset, but in any case I would move patches 16 and 20 
before patch 15


Just one extra thing, since we're changing so much of Python handling 
and since the code is written, I would keep the Debian 10 workarounds 
for now, and only drop them after we drop support for 3.6.


Paolo




Re: [PATCH 17/18] docs/devel: mention the spacing requirement for QOM

2023-04-26 Thread Mark Cave-Ayland

On 24/04/2023 10:22, Alex Bennée wrote:


We have a more complete document on QOM but we should at least mention
the style requirements in the style guide.

Signed-off-by: Alex Bennée 
Cc: Mark Cave-Ayland 
Reviewed-by: Juan Quintela 
Message-Id: <20230420155723.1711048-9-alex.ben...@linaro.org>

---
vppr:
   - use Mark's formulation, briefly mention property separation.
---
  docs/devel/qom.rst   |  2 ++
  docs/devel/style.rst | 40 
  2 files changed, 42 insertions(+)

diff --git a/docs/devel/qom.rst b/docs/devel/qom.rst
index 3e34b07c98..c9237950d0 100644
--- a/docs/devel/qom.rst
+++ b/docs/devel/qom.rst
@@ -1,3 +1,5 @@
+.. _qom:
+
  ===
  The QEMU Object Model (QOM)
  ===
diff --git a/docs/devel/style.rst b/docs/devel/style.rst
index 5bc6f2f095..e9fce0fc69 100644
--- a/docs/devel/style.rst
+++ b/docs/devel/style.rst
@@ -628,6 +628,46 @@ are still some caveats to beware of
  QEMU Specific Idioms
  
  
+QEMU Object Model Declarations

+==
+
+QEMU Object Model Declarations
+==


Accidental repetition here :)


+The QEMU Object Model (QOM) provides a framework for handling objects
+in the base C language. The first declaration of a storage or class
+structure should always be the parent and leave a visual space between
+that declaration and the new code. It is also useful to separate
+backing for properties (options driven by the user) and internal state
+to make navigation easier.
+
+For a storage structure the first declaration should always be called
+"parent_obj" and for a class structure the first member should always
+be called "parent_class" as below:
+
+.. code-block:: c
+
+struct MyDeviceState {
+DeviceState parent_obj;
+
+/* Properties */
+int prop_a;
+char *prop_b;
+/* Other stuff */
+int internal_state;
+};
+
+struct MyDeviceClass {
+DeviceClass parent_class;
+
+void (*new_fn1)(void);
+bool (*new_fn2)(CPUState *);
+};
+
+Note that there is no need to provide typedefs for QOM structures
+since these are generated automatically by the QOM declaration macros.
+See :ref:`qom` for more details.
+
  Error handling and reporting
  


Otherwise:

Reviewed-by: Mark Cave-Ayland 


ATB,

Mark.



Re: [RFC PATCH v3 00/20] configure: create a python venv and ensure meson, sphinx

2023-04-26 Thread Daniel P . Berrangé
On Mon, Apr 24, 2023 at 04:02:28PM -0400, John Snow wrote:
> GitLab CI: https://gitlab.com/jsnow/qemu/-/pipelines/846869409
>(All green, except Python self-tests, see below)
> 
> This patch series creates a mandatory python virtual environment
> ("venv") during configure time and uses it to ensure the availability of
> meson and sphinx.
> 
> See https://www.qemu.org/2023/03/24/python/ for details. The summary is
> that the goal of this series is to ensure that the `python` used to run
> meson is the same `python` used to run Sphinx, tests, and any build-time
> python scripting we have. As it stands, meson and sphinx (and their
> extensions) *may* run in a different python environment than the one
> configured and chosen by the user at configure/build time.

I mentioned this when we were chatting on IRC, but to repeat for
the general audience..

I think it'd be useful for support purposes if the configure
script summary printed details of how we setup the venv.
eg perhaps a summary for each python module of whether we
added it to the venv, or relied on the site packages:

   python venv: meson (site), sphinx (venv), avocado (venv)

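One way such a per-module summary could be assembled — purely illustrative, not what configure actually does — is to check each module's origin against the interpreter's `sys.prefix` (inside a venv, `sys.prefix` points at the venv, while host packages live under `sys.base_prefix`):

```python
# Illustrative sketch of printing a per-module venv/site summary as
# suggested above. The stdlib calls (importlib.util.find_spec,
# sys.prefix) are real; the output format is an assumption.
import importlib.util
import sys

def module_origin(name: str) -> str:
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return "missing"
    # Inside a venv, sys.prefix points at the venv; modules resolved
    # from the host interpreter live outside it.
    return "venv" if spec.origin.startswith(sys.prefix) else "site"

def summary(modules):
    return "python venv: " + ", ".join(
        f"{m} ({module_origin(m)})" for m in modules)

print(summary(["json"]))
```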

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [RFC PATCH v3 00/20] configure: create a python venv and ensure meson, sphinx

2023-04-26 Thread Paolo Bonzini

On 4/26/23 10:53, Daniel P. Berrangé wrote:

I think it'd be useful for support purposes if the configure
script summary printed details of how we setup the venv.
eg perhaps a summary for each python module of whether we
added it to the venv, or relied on the site packages:

python venv: meson (site), sphinx (venv), avocado (venv)


Yes, I agree John did a great job with the errors but a little more work 
is needed to clean up mkvenv.py's logging of the "good" case.  The 
installation case is already covered by "pip install"'s output, but we 
need to have something like:


mkvenv: Creating {isolated|non-isolated} virtual environment [based on 
/home/pbonzini/myvenv]


and when creating the console-scripts shims:

mkvenv: Found avocado v90.0

Paolo




Re: [PATCH v2 2/6] tests/qtests: remove migration test iterations config

2023-04-26 Thread Daniel P . Berrangé
On Fri, Apr 21, 2023 at 11:54:55PM +0200, Juan Quintela wrote:
> Daniel P. Berrangé  wrote:
> > The 'unsigned int interations' config for migration is somewhat
> > overkill. Most tests don't set it, and a value of '0' is treated
> > as equivalent to '1'. The only test that does set it, xbzrle,
> > used a value of '2'.
> >
> > This setting, however, only relates to the migration iterations
> > that take place prior to allowing convergence. IOW, on top of
> > this iteration count, there is always at least 1 further migration
> > iteration done to deal with pages that are dirtied during the
> > previous iteration(s).
> >
> > IOW, even with iterations==1, the xbzrle test will be running for
> > a minimum of 2 iterations. With this in mind we can simplify the
> > code and just get rid of the special case.
> 
> Perhaps the old code was already wrong, but we need at least three
> iterations for the xbzrle test:
> - 1st iteration: xbzrle is not used, nothing is on cache.

Are you sure about this ?  I see ram_save_page() calling
save_xbzrle_page() and unless I'm mis-understanding the
code, it doesn't appear to skip anything on the 1st
iteration.

IIUC save_xbzrle_page will add pages into the cache on
the first iteration, so the second iteration will get
cache hits

> - 2nd iteration: pages are put into cache, no xbzrle is used because
>   there is no previous page.
> - 3rd iteration: We really use xbzrle now against the copy of the
>   previous iterations.
> 
> And yes, this should be commented somewhere.
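The iteration argument above can be made concrete with a toy model of XBZRLE-style page caching: on the first pass over a page the cache is cold so only a full copy can be sent (and cached), and only from the next pass on can a delta against the cached copy be produced. This is a deliberate simplification, not QEMU's implementation:

```python
# Toy model of XBZRLE-style page caching. Real XBZRLE encodes the XOR
# of old/new page contents with run-length encoding; here we only
# report whether a delta against the cache was possible.

cache = {}

def save_page(addr: int, data: bytes) -> str:
    old = cache.get(addr)
    cache[addr] = data
    if old is None:
        return "miss: full page sent, cached"
    delta = bytes(a ^ b for a, b in zip(old, data))
    return "hit: delta of %d changed bytes" % sum(b != 0 for b in delta)

print(save_page(0x1000, b"aaaa"))   # 1st iteration: cache fill only
print(save_page(0x1000, b"aaab"))   # 2nd iteration: delta vs cached copy
```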

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH v2 0/3] ROM migration

2023-04-26 Thread Gerd Hoffmann
  Hi,

> As I understand, netcard ROM file is needed only for network boot. So,
> it's absolutely correct to use romfile="" option: network boot will
> not work, but everything else will work correctly. Is that right?

In most cases yes.

The exception to the rule is virtio-net with UEFI firmware.  OVMF and
ArmVirt images ship with a virtio-net driver included, so network
booting works without a nic rom.

take care,
  Gerd




Re: [PATCH v2 2/6] tests/qtests: remove migration test iterations config

2023-04-26 Thread Juan Quintela
Daniel P. Berrangé  wrote:
> On Fri, Apr 21, 2023 at 11:54:55PM +0200, Juan Quintela wrote:
>> Daniel P. Berrangé  wrote:
>> > The 'unsigned int interations' config for migration is somewhat
>> > overkill. Most tests don't set it, and a value of '0' is treated
>> > as equivalent to '1'. The only test that does set it, xbzrle,
>> > used a value of '2'.
>> >
>> > This setting, however, only relates to the migration iterations
>> > that take place prior to allowing convergence. IOW, on top of
>> > this iteration count, there is always at least 1 further migration
>> > iteration done to deal with pages that are dirtied during the
>> > previous iteration(s).
>> >
>> > IOW, even with iterations==1, the xbzrle test will be running for
>> > a minimum of 2 iterations. With this in mind we can simplify the
>> > code and just get rid of the special case.
>> 
>> Perhaps the old code was already wrong, but we need at least three
>> iterations for the xbzrle test:
>> - 1st iteration: xbzrle is not used, nothing is on cache.
>
> Are you sure about this ?  I see ram_save_page() calling
> save_xbzrle_page() and unless I'm mis-understanding the
> code, it doesn't appear to skip anything on the 1st
> iteration.

I will admit that code is convoluted as hell.
And I confuse myself a lot here O:-)

struct RAM_STATE {
...
/* Start using XBZRLE (e.g., after the first round). */
bool xbzrle_enabled;
}

I.e. xbzrle_enabled() and m->xbzrle_enabled are two completely different things.

static int ram_save_page(RAMState *rs, PageSearchStatus *pss)
{
...
if (rs->xbzrle_enabled && !migration_in_postcopy()) {
pages = save_xbzrle_page(rs, pss, &p, current_addr,
 block, offset);

}

}

and

static int find_dirty_block(RAMState *rs, PageSearchStatus *pss)
{
/* Update pss->page for the next dirty bit in ramblock */
pss_find_next_dirty(pss);

if (pss->complete_round && pss->block == rs->last_seen_block &&
...
return PAGE_ALL_CLEAN;
}
if (!offset_in_ramblock(pss->block,
((ram_addr_t)pss->page) << TARGET_PAGE_BITS)) {

if (!pss->block) {

if (migrate_use_xbzrle()) {
rs->xbzrle_enabled = true;
}
}
...
} else {
/* We've found something */
return PAGE_DIRTY_FOUND;
}
}



> IIUC save_xbzrle_page will add pages into the cache on
> the first iteration, so the second iteration will get
> cache hits
>
>> - 2nd iteration: pages are put into cache, no xbzrle is used because
>>   there is no previous page.
>> - 3rd iteration: We really use xbzrle now against the copy of the
>>   previous iterations.
>> 
>> And yes, this should be commented somewhere.

Seeing that it has been able to confuse you, a single comment will not
make the trick O:-)

Later, Juan.




Re: [PATCH v2 2/6] tests/qtests: remove migration test iterations config

2023-04-26 Thread Daniel P . Berrangé
On Wed, Apr 26, 2023 at 11:42:51AM +0200, Juan Quintela wrote:
> Daniel P. Berrangé  wrote:
> > On Fri, Apr 21, 2023 at 11:54:55PM +0200, Juan Quintela wrote:
> >> Daniel P. Berrangé  wrote:
> >> > The 'unsigned int interations' config for migration is somewhat
> >> > overkill. Most tests don't set it, and a value of '0' is treated
> >> > as equivalent to '1'. The only test that does set it, xbzrle,
> >> > used a value of '2'.
> >> >
> >> > This setting, however, only relates to the migration iterations
> >> > that take place prior to allowing convergence. IOW, on top of
> >> > this iteration count, there is always at least 1 further migration
> >> > iteration done to deal with pages that are dirtied during the
> >> > previous iteration(s).
> >> >
> >> > IOW, even with iterations==1, the xbzrle test will be running for
> >> > a minimum of 2 iterations. With this in mind we can simplify the
> >> > code and just get rid of the special case.
> >> 
> >> Perhaps the old code was already wrong, but we need at least three
> >> iterations for the xbzrle test:
> >> - 1st iteration: xbzrle is not used, nothing is on cache.
> >
> > Are you sure about this ?  I see ram_save_page() calling
> > save_xbzrle_page() and unless I'm mis-understanding the
> > code, it doesn't appear to skip anything on the 1st
> > iteration.
> 
> I will admit that code is convoluted as hell.
> And I confuse myself a lot here O:-)
> 
> struct RAM_STATE {
> ...
> /* Start using XBZRLE (e.g., after the first round). */
> bool xbzrle_enabled;
> }
> 
> I.e. xbzrle_enabled() and m->xbzrle_enabled are two completely different 
> things.

Aie !  That's confusing indeed :-)

Let's rename that struct field to 'xbzrle_started', to better
distinguish active state from enabled state.


> static int ram_save_page(RAMState *rs, PageSearchStatus *pss)
> {
> ...
> if (rs->xbzrle_enabled && !migration_in_postcopy()) {
> pages = save_xbzrle_page(rs, pss, &p, current_addr,
>  block, offset);
> 
> }
> 
> }
> 
> and
> 
> static int find_dirty_block(RAMState *rs, PageSearchStatus *pss)
> {
> /* Update pss->page for the next dirty bit in ramblock */
> pss_find_next_dirty(pss);
> 
> if (pss->complete_round && pss->block == rs->last_seen_block &&
> ...
> return PAGE_ALL_CLEAN;
> }
> if (!offset_in_ramblock(pss->block,
> ((ram_addr_t)pss->page) << TARGET_PAGE_BITS)) {
> 
> if (!pss->block) {
> 
> if (migrate_use_xbzrle()) {
> rs->xbzrle_enabled = true;
> }
> }
> ...
> } else {
> /* We've found something */
> return PAGE_DIRTY_FOUND;
> }
> }
> 
> 
> 
> > IIUC save_xbzrle_page will add pages into the cache on
> > the first iteration, so the second iteration will get
> > cache hits
> >
> >> - 2nd iteration: pages are put into cache, no xbzrle is used because
> >>   there is no previous page.
> >> - 3rd iteration: We really use xbzrle now against the copy of the
> >>   previous iterations.
> >> 
> >> And yes, this should be commented somewhere.
> 
> Seeing that it has been able to confuse you, a single comment will not
> make the trick O:-)
> 
> Later, Juan.
> 

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH] multifd: Avoid busy-wait in multifd_send_pages()

2023-04-26 Thread Juan Quintela
"manish.mishra"  wrote:
> multifd_send_sync_main() posts request on the multifd channel
> but does not call sem_wait() on channels_ready semaphore, making
> the channels_ready semaphore count keep increasing.
> As a result, sem_wait() on channels_ready in multifd_send_pages()
> is always non-blocking hence multifd_send_pages() keeps searching
> for a free channel in a busy loop until a channel is freed.
>
> Signed-off-by: manish.mishra 
> ---
>  migration/multifd.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/migration/multifd.c b/migration/multifd.c
> index cce3ad6988..43d26e7012 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -615,6 +615,7 @@ int multifd_send_sync_main(QEMUFile *f)
>  
>  trace_multifd_send_sync_main_signal(p->id);
>  
> +qemu_sem_wait(&multifd_send_state->channels_ready);
>  qemu_mutex_lock(&p->mutex);
>  
>  if (p->quit) {

We need this, but I think it is better to put it on the second loop.

> @@ -919,7 +920,7 @@ int multifd_save_setup(Error **errp)
>  multifd_send_state = g_malloc0(sizeof(*multifd_send_state));
>  multifd_send_state->params = g_new0(MultiFDSendParams, thread_count);
>  multifd_send_state->pages = multifd_pages_init(page_count);
> -qemu_sem_init(&multifd_send_state->channels_ready, 0);
> +qemu_sem_init(&multifd_send_state->channels_ready, thread_count);
>  qatomic_set(&multifd_send_state->exiting, 0);
>  multifd_send_state->ops = multifd_ops[migrate_multifd_compression()];

I think this bit is wrong.
We should not set the channels ready until the thread is ready and
channel is created.

What do you think about this patch:

From bcb0ef9b97b858458c403d2e4dc9e0dbd96721b3 Mon Sep 17 00:00:00 2001
From: Juan Quintela 
Date: Wed, 26 Apr 2023 12:20:36 +0200
Subject: [PATCH] multifd: Fix the number of channels ready

We don't wait in the sem when we are doing a sync_main.  Make it wait
there.  To make things clearer, we mark the channel ready at the
beginning of the thread loop.

This causes a busy loop in multifd_send_pages().
Found-by: manish.mishra 

Signed-off-by: Juan Quintela 
---
 migration/multifd.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index 903df2117b..e625e8725e 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -635,6 +635,7 @@ int multifd_send_sync_main(QEMUFile *f)
 for (i = 0; i < migrate_multifd_channels(); i++) {
 MultiFDSendParams *p = &multifd_send_state->params[i];
 
+qemu_sem_wait(&multifd_send_state->channels_ready);
 trace_multifd_send_sync_main_wait(p->id);
 qemu_sem_wait(&p->sem_sync);
 
@@ -668,6 +669,7 @@ static void *multifd_send_thread(void *opaque)
 p->num_packets = 1;
 
 while (true) {
+qemu_sem_post(&multifd_send_state->channels_ready);
 qemu_sem_wait(&p->sem);
 
 if (qatomic_read(&multifd_send_state->exiting)) {
@@ -736,7 +738,6 @@ static void *multifd_send_thread(void *opaque)
 if (flags & MULTIFD_FLAG_SYNC) {
 qemu_sem_post(&p->sem_sync);
 }
-qemu_sem_post(&multifd_send_state->channels_ready);
 } else if (p->quit) {
 qemu_mutex_unlock(&p->mutex);
 break;
-- 
2.40.0
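The bookkeeping problem can be reproduced outside QEMU with a few lines of semaphore code: if a counting semaphore is posted (or pre-initialized) without matching waits, it accumulates surplus count, so the "wait" keeps returning immediately and never blocks — exactly the busy loop described above. This is a standalone illustration, not QEMU code:

```python
# Standalone illustration (not QEMU code) of the channels_ready
# accounting issue: posting/pre-loading a semaphore without matching
# waits leaves surplus count, so acquiring it never blocks.
import threading

# wrong: pre-loading the count as if all channels were already ready
channels_ready = threading.Semaphore(3)

free_passes = 0
while channels_ready.acquire(blocking=False):
    free_passes += 1  # "found a free channel" without any release()

# correct: start at 0; every release() must match one acquire()
balanced = threading.Semaphore(0)
balanced.release()                           # channel signals ready
assert balanced.acquire(blocking=False)      # consumer takes the slot
assert not balanced.acquire(blocking=False)  # count back to 0: would block
```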





Re: [PATCH v3 01/18] hw/ide/piix: Expose output IRQ as properties for late object population

2023-04-26 Thread Mark Cave-Ayland

On 02/03/2023 22:40, Philippe Mathieu-Daudé wrote:


Signed-off-by: Philippe Mathieu-Daudé 
---
  hw/ide/piix.c | 14 --
  include/hw/ide/piix.h |  4 
  2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/hw/ide/piix.c b/hw/ide/piix.c
index 41d60921e3..a36dac8469 100644
--- a/hw/ide/piix.c
+++ b/hw/ide/piix.c
@@ -121,6 +121,13 @@ static void piix_ide_reset(DeviceState *dev)
  pci_set_byte(pci_conf + 0x20, 0x01);  /* BMIBA: 20-23h */
  }
  
+static void piix_ide_initfn(Object *obj)

+{
+PCIIDEState *dev = PCI_IDE(obj);
+
+qdev_init_gpio_out_named(DEVICE(obj), dev->isa_irq, "ide-irq", 2);
+}
+
  static bool pci_piix_init_bus(PCIIDEState *d, unsigned i, Error **errp)
  {
  static const struct {
@@ -133,6 +140,7 @@ static bool pci_piix_init_bus(PCIIDEState *d, unsigned i, 
Error **errp)
  };
  int ret;
  
+qemu_irq irq_out = d->isa_irq[i] ? : isa_get_irq(NULL, port_info[i].isairq);

  ide_bus_init(&d->bus[i], sizeof(d->bus[i]), DEVICE(d), i, 2);
  ret = ide_init_ioport(&d->bus[i], NULL, port_info[i].iobase,
port_info[i].iobase2);
@@ -141,7 +149,7 @@ static bool pci_piix_init_bus(PCIIDEState *d, unsigned i, 
Error **errp)
   object_get_typename(OBJECT(d)), i);
  return false;
  }
-ide_bus_init_output_irq(&d->bus[i], isa_get_irq(NULL, 
port_info[i].isairq));
+ide_bus_init_output_irq(&d->bus[i], irq_out);
  
  bmdma_init(&d->bus[i], &d->bmdma[i], d);

  d->bmdma[i].bus = &d->bus[i];
@@ -162,7 +170,7 @@ static void pci_piix_ide_realize(PCIDevice *dev, Error 
**errp)
  
  vmstate_register(VMSTATE_IF(dev), 0, &vmstate_ide_pci, d);
  
-for (unsigned i = 0; i < 2; i++) {

+for (unsigned i = 0; i < ARRAY_SIZE(d->isa_irq); i++) {
  if (!pci_piix_init_bus(d, i, errp)) {
  return;
  }
@@ -199,6 +207,7 @@ static void piix3_ide_class_init(ObjectClass *klass, void 
*data)
  static const TypeInfo piix3_ide_info = {
  .name  = TYPE_PIIX3_IDE,
  .parent= TYPE_PCI_IDE,
+.instance_init = piix_ide_initfn,
  .class_init= piix3_ide_class_init,
  };
  
@@ -221,6 +230,7 @@ static void piix4_ide_class_init(ObjectClass *klass, void *data)

  static const TypeInfo piix4_ide_info = {
  .name  = TYPE_PIIX4_IDE,
  .parent= TYPE_PCI_IDE,
+.instance_init = piix_ide_initfn,
  .class_init= piix4_ide_class_init,
  };
  
diff --git a/include/hw/ide/piix.h b/include/hw/ide/piix.h

index ef3ef3d62d..533d24d408 100644
--- a/include/hw/ide/piix.h
+++ b/include/hw/ide/piix.h
@@ -1,6 +1,10 @@
  #ifndef HW_IDE_PIIX_H
  #define HW_IDE_PIIX_H
  
+/*

+ * QEMU interface:
+ *  + named GPIO outputs "ide-irq": asserted by each IDE channel
+ */
  #define TYPE_PIIX3_IDE "piix3-ide"
  #define TYPE_PIIX4_IDE "piix4-ide"


Comparing this with Bernhard's latest series, Bernhard's patch at 
https://patchew.org/QEMU/20230422150728.176512-1-shen...@gmail.com/20230422150728.176512-2-shen...@gmail.com/ 
(with a small change) is the version we should use, since legacy IRQs are a feature 
of all PCI IDE controllers and not just the PIIX controllers.


If we do it this way then it is possible for all PCI IDE controllers to share the 
same logic for BMDMA and legacy/native mode switching moving forward: if a PCI IDE 
controller doesn't implement legacy IRQ routing then the board can leave the IRQs 
disconnected.



ATB,

Mark.



[PATCH v4 02/48] hw/net/net_tx_pkt: Decouple interface from PCI

2023-04-26 Thread Akihiko Odaki
This allows the network packet abstractions to be used even if PCI is not
used.

Signed-off-by: Akihiko Odaki 
---
 hw/net/net_tx_pkt.h  | 31 ---
 hw/net/e1000e_core.c | 13 -
 hw/net/igb_core.c| 13 ++---
 hw/net/net_tx_pkt.c  | 36 +---
 hw/net/vmxnet3.c | 14 +++---
 5 files changed, 54 insertions(+), 53 deletions(-)

diff --git a/hw/net/net_tx_pkt.h b/hw/net/net_tx_pkt.h
index 5eb123ef90..4d7233e975 100644
--- a/hw/net/net_tx_pkt.h
+++ b/hw/net/net_tx_pkt.h
@@ -26,17 +26,16 @@
 
 struct NetTxPkt;
 
-typedef void (* NetTxPktCallback)(void *, const struct iovec *, int, const 
struct iovec *, int);
+typedef void (*NetTxPktFreeFrag)(void *, void *, size_t);
+typedef void (*NetTxPktSend)(void *, const struct iovec *, int, const struct 
iovec *, int);
 
 /**
  * Init function for tx packet functionality
  *
  * @pkt:packet pointer
- * @pci_dev:PCI device processing this packet
  * @max_frags:  max tx ip fragments
  */
-void net_tx_pkt_init(struct NetTxPkt **pkt, PCIDevice *pci_dev,
-uint32_t max_frags);
+void net_tx_pkt_init(struct NetTxPkt **pkt, uint32_t max_frags);
 
 /**
  * Clean all tx packet resources.
@@ -95,12 +94,11 @@ net_tx_pkt_setup_vlan_header(struct NetTxPkt *pkt, uint16_t 
vlan)
  * populate data fragment into pkt context.
  *
  * @pkt:packet
- * @pa: physical address of fragment
+ * @base:   pointer to fragment
  * @len:length of fragment
  *
  */
-bool net_tx_pkt_add_raw_fragment(struct NetTxPkt *pkt, hwaddr pa,
-size_t len);
+bool net_tx_pkt_add_raw_fragment(struct NetTxPkt *pkt, void *base, size_t len);
 
 /**
  * Fix ip header fields and calculate IP header and pseudo header checksums.
@@ -148,10 +146,11 @@ void net_tx_pkt_dump(struct NetTxPkt *pkt);
  * reset tx packet private context (needed to be called between packets)
  *
  * @pkt:packet
- * @dev:PCI device processing the next packet
- *
+ * @callback:   function to free the fragments
+ * @context:pointer to be passed to the callback
  */
-void net_tx_pkt_reset(struct NetTxPkt *pkt, PCIDevice *dev);
+void net_tx_pkt_reset(struct NetTxPkt *pkt,
+  NetTxPktFreeFrag callback, void *context);
 
 /**
  * Unmap a fragment mapped from a PCI device.
@@ -162,6 +161,16 @@ void net_tx_pkt_reset(struct NetTxPkt *pkt, PCIDevice 
*dev);
  */
 void net_tx_pkt_unmap_frag_pci(void *context, void *base, size_t len);
 
+/**
+ * map data fragment from PCI device and populate it into pkt context.
+ *
+ * @pci_dev:PCI device owning fragment
+ * @pa: physical address of fragment
+ * @len:length of fragment
+ */
+bool net_tx_pkt_add_raw_fragment_pci(struct NetTxPkt *pkt, PCIDevice *pci_dev,
+ dma_addr_t pa, size_t len);
+
 /**
  * Send packet to qemu. handles sw offloads if vhdr is not supported.
  *
@@ -182,7 +191,7 @@ bool net_tx_pkt_send(struct NetTxPkt *pkt, NetClientState 
*nc);
  * @ret:operation result
  */
 bool net_tx_pkt_send_custom(struct NetTxPkt *pkt, bool offload,
-NetTxPktCallback callback, void *context);
+NetTxPktSend callback, void *context);
 
 /**
  * parse raw packet data and analyze offload requirements.
diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
index cfa3f55e96..15821a75e0 100644
--- a/hw/net/e1000e_core.c
+++ b/hw/net/e1000e_core.c
@@ -746,7 +746,8 @@ e1000e_process_tx_desc(E1000ECore *core,
 addr = le64_to_cpu(dp->buffer_addr);
 
 if (!tx->skip_cp) {
-if (!net_tx_pkt_add_raw_fragment(tx->tx_pkt, addr, split_size)) {
+if (!net_tx_pkt_add_raw_fragment_pci(tx->tx_pkt, core->owner,
+ addr, split_size)) {
 tx->skip_cp = true;
 }
 }
@@ -764,7 +765,7 @@ e1000e_process_tx_desc(E1000ECore *core,
 }
 
 tx->skip_cp = false;
-net_tx_pkt_reset(tx->tx_pkt, core->owner);
+net_tx_pkt_reset(tx->tx_pkt, net_tx_pkt_unmap_frag_pci, core->owner);
 
 tx->sum_needed = 0;
 tx->cptse = 0;
@@ -3421,7 +3422,7 @@ e1000e_core_pci_realize(E1000ECore *core,
 qemu_add_vm_change_state_handler(e1000e_vm_state_change, core);
 
 for (i = 0; i < E1000E_NUM_QUEUES; i++) {
-net_tx_pkt_init(&core->tx[i].tx_pkt, core->owner, E1000E_MAX_TX_FRAGS);
+net_tx_pkt_init(&core->tx[i].tx_pkt, E1000E_MAX_TX_FRAGS);
 }
 
 net_rx_pkt_init(&core->rx_pkt);
@@ -3446,7 +3447,8 @@ e1000e_core_pci_uninit(E1000ECore *core)
 qemu_del_vm_change_state_handler(core->vmstate);
 
 for (i = 0; i < E1000E_NUM_QUEUES; i++) {
-net_tx_pkt_reset(core->tx[i].tx_pkt, core->owner);
+net_tx_pkt_reset(core->tx[i].tx_pkt,
+ net_tx_pkt_unmap_frag_pci, core->owner);
 net_tx_pkt_uninit(core->tx[i].tx_pkt);
 }
 

[PATCH v4 01/48] hw/net/net_tx_pkt: Decouple implementation from PCI

2023-04-26 Thread Akihiko Odaki
This is intended to be followed by another change to the interface.
It also fixes a leak of the memory mapping when the specified memory is
only partially mapped.

Fixes: e263cd49c7 ("Packet abstraction for VMWARE network devices")
Signed-off-by: Akihiko Odaki 
---
 hw/net/net_tx_pkt.h |  9 
 hw/net/net_tx_pkt.c | 53 -
 2 files changed, 42 insertions(+), 20 deletions(-)

diff --git a/hw/net/net_tx_pkt.h b/hw/net/net_tx_pkt.h
index e5ce6f20bc..5eb123ef90 100644
--- a/hw/net/net_tx_pkt.h
+++ b/hw/net/net_tx_pkt.h
@@ -153,6 +153,15 @@ void net_tx_pkt_dump(struct NetTxPkt *pkt);
  */
 void net_tx_pkt_reset(struct NetTxPkt *pkt, PCIDevice *dev);
 
+/**
+ * Unmap a fragment mapped from a PCI device.
+ *
+ * @context:PCI device owning fragment
+ * @base:   pointer to fragment
+ * @len:length of fragment
+ */
+void net_tx_pkt_unmap_frag_pci(void *context, void *base, size_t len);
+
 /**
  * Send packet to qemu. handles sw offloads if vhdr is not supported.
  *
diff --git a/hw/net/net_tx_pkt.c b/hw/net/net_tx_pkt.c
index 8dc8568ba2..aca12ff035 100644
--- a/hw/net/net_tx_pkt.c
+++ b/hw/net/net_tx_pkt.c
@@ -384,10 +384,9 @@ void net_tx_pkt_setup_vlan_header_ex(struct NetTxPkt *pkt,
 }
 }
 
-bool net_tx_pkt_add_raw_fragment(struct NetTxPkt *pkt, hwaddr pa,
-size_t len)
+static bool net_tx_pkt_add_raw_fragment_common(struct NetTxPkt *pkt,
+   void *base, size_t len)
 {
-hwaddr mapped_len = 0;
 struct iovec *ventry;
 assert(pkt);
 
@@ -395,23 +394,12 @@ bool net_tx_pkt_add_raw_fragment(struct NetTxPkt *pkt, 
hwaddr pa,
 return false;
 }
 
-if (!len) {
-return true;
- }
-
 ventry = &pkt->raw[pkt->raw_frags];
-mapped_len = len;
+ventry->iov_base = base;
+ventry->iov_len = len;
+pkt->raw_frags++;
 
-ventry->iov_base = pci_dma_map(pkt->pci_dev, pa,
-   &mapped_len, DMA_DIRECTION_TO_DEVICE);
-
-if ((ventry->iov_base != NULL) && (len == mapped_len)) {
-ventry->iov_len = mapped_len;
-pkt->raw_frags++;
-return true;
-} else {
-return false;
-}
+return true;
 }
 
 bool net_tx_pkt_has_fragments(struct NetTxPkt *pkt)
@@ -465,8 +453,9 @@ void net_tx_pkt_reset(struct NetTxPkt *pkt, PCIDevice 
*pci_dev)
 assert(pkt->raw);
 for (i = 0; i < pkt->raw_frags; i++) {
 assert(pkt->raw[i].iov_base);
-pci_dma_unmap(pkt->pci_dev, pkt->raw[i].iov_base,
-  pkt->raw[i].iov_len, DMA_DIRECTION_TO_DEVICE, 0);
+net_tx_pkt_unmap_frag_pci(pkt->pci_dev,
+  pkt->raw[i].iov_base,
+  pkt->raw[i].iov_len);
 }
 }
 pkt->pci_dev = pci_dev;
@@ -476,6 +465,30 @@ void net_tx_pkt_reset(struct NetTxPkt *pkt, PCIDevice 
*pci_dev)
 pkt->l4proto = 0;
 }
 
+void net_tx_pkt_unmap_frag_pci(void *context, void *base, size_t len)
+{
+pci_dma_unmap(context, base, len, DMA_DIRECTION_TO_DEVICE, 0);
+}
+
+bool net_tx_pkt_add_raw_fragment(struct NetTxPkt *pkt, hwaddr pa,
+size_t len)
+{
+dma_addr_t mapped_len = len;
+void *base = pci_dma_map(pkt->pci_dev, pa, &mapped_len,
+ DMA_DIRECTION_TO_DEVICE);
+if (!base) {
+return false;
+}
+
+if (mapped_len != len ||
+!net_tx_pkt_add_raw_fragment_common(pkt, base, len)) {
+net_tx_pkt_unmap_frag_pci(pkt->pci_dev, base, mapped_len);
+return false;
+}
+
+return true;
+}
+
 static void net_tx_pkt_do_sw_csum(struct NetTxPkt *pkt,
   struct iovec *iov, uint32_t iov_len,
   uint16_t csl)
-- 
2.40.0




[PATCH v4 00/48] igb: Fix for DPDK

2023-04-26 Thread Akihiko Odaki
Based-on: <366bbcafdb6e0373f0deb105153768a8c0bded87.ca...@gmail.com>
("[PATCH 0/1] e1000e: Fix tx/rx counters")

This series has fixes and feature additions to pass DPDK Test Suite with
igb. It also includes a few minor changes related to networking.

Patch [01, 10] are bug fixes.
Patch [11, 14] delete code which is unnecessary.
Patch [15, 33] are minor changes.
Patch [34, 46] implement new features.
Patch [47, 48] update documentations.

While this series includes many patches, it is not necessary to land them
all at once. Only the bug fix patches may be applied first, for example.

V3 -> V4:
- Renamed "min_buf" variable to "buf". (Sriram Yagnaraman)
- Added patch "igb: Clear-on-read ICR when ICR.INTA is set".
  (Sriram Yagnaraman)

V2 -> V3:
- Fixed parameter name in hw/net/net_tx_pkt. (Philippe Mathieu-Daudé)
- Added patch "igb: Clear IMS bits when committing ICR access".
- Added patch "igb: Clear EICR bits for delayed MSI-X interrupts".
- Added patch "e1000e: Rename a variable in e1000e_receive_internal()".
- Added patch "igb: Rename a variable in igb_receive_internal()".
- Added patch "e1000e: Notify only new interrupts".
- Added patch "igb: Notify only new interrupts".

V1 -> V2:
- Dropped patch "Include the second VLAN tag in the buffer". The second
  VLAN tag is not used at that point and is unnecessary.
- Added patch "e1000x: Rename TcpIpv6 into TcpIpv6Ex".
- Split patch "hw/net/net_tx_pkt: Decouple from PCI".
  (Philippe Mathieu-Daudé)
- Added advanced Rx descriptor packet encoding definitions.
  (Sriram Yagnaraman)
- Added some constants to eth.h to derive packet oversize thresholds.
- Added IGB_TX_FLAGS_VLAN_SHIFT usage.
- Renamed patch "igb: Fix igb_mac_reg_init alignment".
  (Philippe Mathieu-Daudé)
- Fixed size check for packets with double VLAN. (Sriram Yagnaraman)
- Fixed timing to timestamp Tx packet.

Akihiko Odaki (48):
  hw/net/net_tx_pkt: Decouple implementation from PCI
  hw/net/net_tx_pkt: Decouple interface from PCI
  e1000x: Fix BPRC and MPRC
  igb: Fix Rx packet type encoding
  igb: Do not require CTRL.VME for tx VLAN tagging
  igb: Clear IMS bits when committing ICR access
  net/net_rx_pkt: Use iovec for net_rx_pkt_set_protocols()
  e1000e: Always copy ethernet header
  igb: Always copy ethernet header
  Fix references to igb Avocado test
  tests/avocado: Remove unused imports
  tests/avocado: Remove test_igb_nomsi_kvm
  hw/net/net_tx_pkt: Remove net_rx_pkt_get_l4_info
  net/eth: Rename eth_setup_vlan_headers_ex
  e1000x: Share more Rx filtering logic
  e1000x: Take CRC into consideration for size check
  e1000x: Rename TcpIpv6 into TcpIpv6Ex
  e1000e: Always log status after building rx metadata
  igb: Always log status after building rx metadata
  igb: Remove goto
  igb: Read DCMD.VLE of the first Tx descriptor
  e1000e: Reset packet state after emptying Tx queue
  vmxnet3: Reset packet state after emptying Tx queue
  igb: Add more definitions for Tx descriptor
  igb: Share common VF constants
  igb: Fix igb_mac_reg_init coding style alignment
  igb: Clear EICR bits for delayed MSI-X interrupts
  e1000e: Rename a variable in e1000e_receive_internal()
  igb: Rename a variable in igb_receive_internal()
  net/eth: Use void pointers
  net/eth: Always add VLAN tag
  hw/net/net_rx_pkt: Enforce alignment for eth_header
  tests/qtest/libqos/igb: Set GPIE.Multiple_MSIX
  igb: Implement MSI-X single vector mode
  igb: Use UDP for RSS hash
  igb: Implement Rx SCTP CSO
  igb: Implement Tx SCTP CSO
  igb: Strip the second VLAN tag for extended VLAN
  igb: Filter with the second VLAN tag for extended VLAN
  igb: Implement igb-specific oversize check
  igb: Implement Rx PTP2 timestamp
  igb: Implement Tx timestamp
  e1000e: Notify only new interrupts
  igb: Notify only new interrupts
  igb: Clear-on-read ICR when ICR.INTA is set
  vmxnet3: Do not depend on PC
  MAINTAINERS: Add a reviewer for network packet abstractions
  docs/system/devices/igb: Note igb is tested for DPDK

 MAINTAINERS   |   3 +-
 docs/system/devices/igb.rst   |  14 +-
 hw/net/e1000e_core.h  |   2 -
 hw/net/e1000x_common.h|   9 +-
 hw/net/e1000x_regs.h  |  24 +-
 hw/net/igb_common.h   |  24 +-
 hw/net/igb_regs.h |  67 +-
 hw/net/net_rx_pkt.h   |  38 +-
 hw/net/net_tx_pkt.h   |  46 +-
 include/net/eth.h |  29 +-
 include/qemu/crc32c.h |   1 +
 hw/net/e1000.c|  41 +-
 hw/net/e1000e_core.c  | 292 +++
 hw/net/e1000x_common.c|  79 +-
 hw/net/igb.c  |  10 +-
 hw/net/igb_core.c | 717 ++
 hw/net/igbvf.c|   7 -
 hw/net/net_rx_pkt.c   | 107 ++-
 hw/net/net_tx_pkt.c   

[PATCH v4 07/48] net/net_rx_pkt: Use iovec for net_rx_pkt_set_protocols()

2023-04-26 Thread Akihiko Odaki
igb does not properly ensure the buffer passed to
net_rx_pkt_set_protocols() is contiguous for the entire L2/L3/L4 header.
Allow it to pass scattered data to net_rx_pkt_set_protocols().

Fixes: 3a977deebe ("Intrdocue igb device emulation")
Signed-off-by: Akihiko Odaki 
Reviewed-by: Sriram Yagnaraman 
---
 hw/net/net_rx_pkt.h | 10 ++
 include/net/eth.h   |  6 +++---
 hw/net/igb_core.c   |  2 +-
 hw/net/net_rx_pkt.c | 14 +-
 hw/net/virtio-net.c |  7 +--
 hw/net/vmxnet3.c|  7 ++-
 net/eth.c   | 18 --
 7 files changed, 34 insertions(+), 30 deletions(-)

diff --git a/hw/net/net_rx_pkt.h b/hw/net/net_rx_pkt.h
index d00b484900..a06f5c2675 100644
--- a/hw/net/net_rx_pkt.h
+++ b/hw/net/net_rx_pkt.h
@@ -55,12 +55,14 @@ size_t net_rx_pkt_get_total_len(struct NetRxPkt *pkt);
  * parse and set packet analysis results
  *
  * @pkt:packet
- * @data:   pointer to the data buffer to be parsed
- * @len:data length
+ * @iov:received data scatter-gather list
+ * @iovcnt: number of elements in iov
+ * @iovoff: data start offset in the iov
  *
  */
-void net_rx_pkt_set_protocols(struct NetRxPkt *pkt, const void *data,
-  size_t len);
+void net_rx_pkt_set_protocols(struct NetRxPkt *pkt,
+  const struct iovec *iov, size_t iovcnt,
+  size_t iovoff);
 
 /**
  * fetches packet analysis results
diff --git a/include/net/eth.h b/include/net/eth.h
index c5ae4493b4..9f19c3a695 100644
--- a/include/net/eth.h
+++ b/include/net/eth.h
@@ -312,10 +312,10 @@ eth_get_l2_hdr_length(const void *p)
 }
 
 static inline uint32_t
-eth_get_l2_hdr_length_iov(const struct iovec *iov, int iovcnt)
+eth_get_l2_hdr_length_iov(const struct iovec *iov, size_t iovcnt, size_t 
iovoff)
 {
 uint8_t p[sizeof(struct eth_header) + sizeof(struct vlan_header)];
-size_t copied = iov_to_buf(iov, iovcnt, 0, p, ARRAY_SIZE(p));
+size_t copied = iov_to_buf(iov, iovcnt, iovoff, p, ARRAY_SIZE(p));
 
 if (copied < ARRAY_SIZE(p)) {
 return copied;
@@ -397,7 +397,7 @@ typedef struct eth_l4_hdr_info_st {
 bool has_tcp_data;
 } eth_l4_hdr_info;
 
-void eth_get_protocols(const struct iovec *iov, int iovcnt,
+void eth_get_protocols(const struct iovec *iov, size_t iovcnt, size_t iovoff,
bool *hasip4, bool *hasip6,
size_t *l3hdr_off,
size_t *l4hdr_off,
diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c
index eaca5bd2b6..21a8d9ada4 100644
--- a/hw/net/igb_core.c
+++ b/hw/net/igb_core.c
@@ -1650,7 +1650,7 @@ igb_receive_internal(IGBCore *core, const struct iovec 
*iov, int iovcnt,
 
 ehdr = PKT_GET_ETH_HDR(filter_buf);
 net_rx_pkt_set_packet_type(core->rx_pkt, get_eth_packet_type(ehdr));
-net_rx_pkt_set_protocols(core->rx_pkt, filter_buf, size);
+net_rx_pkt_set_protocols(core->rx_pkt, iov, iovcnt, iov_ofs);
 
 queues = igb_receive_assign(core, ehdr, size, &rss_info, external_tx);
 if (!queues) {
diff --git a/hw/net/net_rx_pkt.c b/hw/net/net_rx_pkt.c
index 39cdea06de..63be6e05ad 100644
--- a/hw/net/net_rx_pkt.c
+++ b/hw/net/net_rx_pkt.c
@@ -103,7 +103,7 @@ net_rx_pkt_pull_data(struct NetRxPkt *pkt,
 iov, iovcnt, ploff, pkt->tot_len);
 }
 
-eth_get_protocols(pkt->vec, pkt->vec_len, &pkt->hasip4, &pkt->hasip6,
+eth_get_protocols(pkt->vec, pkt->vec_len, 0, &pkt->hasip4, &pkt->hasip6,
   &pkt->l3hdr_off, &pkt->l4hdr_off, &pkt->l5hdr_off,
   &pkt->ip6hdr_info, &pkt->ip4hdr_info, &pkt->l4hdr_info);
 
@@ -186,17 +186,13 @@ size_t net_rx_pkt_get_total_len(struct NetRxPkt *pkt)
 return pkt->tot_len;
 }
 
-void net_rx_pkt_set_protocols(struct NetRxPkt *pkt, const void *data,
-  size_t len)
+void net_rx_pkt_set_protocols(struct NetRxPkt *pkt,
+  const struct iovec *iov, size_t iovcnt,
+  size_t iovoff)
 {
-const struct iovec iov = {
-.iov_base = (void *)data,
-.iov_len = len
-};
-
 assert(pkt);
 
-eth_get_protocols(&iov, 1, &pkt->hasip4, &pkt->hasip6,
+eth_get_protocols(iov, iovcnt, iovoff, &pkt->hasip4, &pkt->hasip6,
   &pkt->l3hdr_off, &pkt->l4hdr_off, &pkt->l5hdr_off,
   &pkt->ip6hdr_info, &pkt->ip4hdr_info, &pkt->l4hdr_info);
 }
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 53e1c32643..37551fd854 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -1835,9 +1835,12 @@ static int virtio_net_process_rss(NetClientState *nc, 
const uint8_t *buf,
 VIRTIO_NET_HASH_REPORT_UDPv6,
 VIRTIO_NET_HASH_REPORT_UDPv6_EX
 };
+struct iovec iov = {
+.iov_base = (void *)buf,
+.iov_len = size
+};
 
-net_rx_pkt_set_protocols(pkt, buf + n->host_hdr_len,
- si

[PATCH v4 06/48] igb: Clear IMS bits when committing ICR access

2023-04-26 Thread Akihiko Odaki
The datasheet makes contradictory statements regarding ICR accesses, so it
is not a reliable way to determine the behavior of ICR accesses. However,
e1000e does clear IMS bits when ICR is read, and Linux also expects ICR
reads to clear IMS bits according to:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/intel/igb/igb_main.c?h=v6.2#n8048

Fixes: 3a977deebe ("Intrdocue igb device emulation")
Signed-off-by: Akihiko Odaki 
Reviewed-by: Sriram Yagnaraman 
---
 hw/net/igb_core.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c
index 96a118b6c1..eaca5bd2b6 100644
--- a/hw/net/igb_core.c
+++ b/hw/net/igb_core.c
@@ -2452,16 +2452,16 @@ igb_set_ims(IGBCore *core, int index, uint32_t val)
 static void igb_commit_icr(IGBCore *core)
 {
 /*
- * If GPIE.NSICR = 0, then the copy of IAM to IMS will occur only if at
+ * If GPIE.NSICR = 0, then the clear of IMS will occur only if at
  * least one bit is set in the IMS and there is a true interrupt as
  * reflected in ICR.INTA.
  */
 if ((core->mac[GPIE] & E1000_GPIE_NSICR) ||
 (core->mac[IMS] && (core->mac[ICR] & E1000_ICR_INT_ASSERTED))) {
-igb_set_ims(core, IMS, core->mac[IAM]);
-} else {
-igb_update_interrupt_state(core);
+igb_clear_ims_bits(core, core->mac[IAM]);
 }
+
+igb_update_interrupt_state(core);
 }
 
 static void igb_set_icr(IGBCore *core, int index, uint32_t val)
-- 
2.40.0




[PATCH v4 17/48] e1000x: Rename TcpIpv6 into TcpIpv6Ex

2023-04-26 Thread Akihiko Odaki
e1000e and igb employ NetPktRssIpV6TcpEx for the RSS hash if the TcpIpv6
MRQC bit is set. Moreover, igb also has an MRQC bit for NetPktRssIpV6Tcp,
though it is not implemented yet. Rename it to TcpIpv6Ex to avoid
confusion.

Signed-off-by: Akihiko Odaki 
Reviewed-by: Sriram Yagnaraman 
---
 hw/net/e1000x_regs.h | 24 
 hw/net/e1000e_core.c |  8 
 hw/net/igb_core.c|  8 
 hw/net/trace-events  |  2 +-
 4 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/hw/net/e1000x_regs.h b/hw/net/e1000x_regs.h
index 6d3c4c6d3a..13760c66d3 100644
--- a/hw/net/e1000x_regs.h
+++ b/hw/net/e1000x_regs.h
@@ -290,18 +290,18 @@
 #define E1000_RETA_IDX(hash)((hash) & (BIT(7) - 1))
 #define E1000_RETA_VAL(reta, hash)  (((uint8_t *)(reta))[E1000_RETA_IDX(hash)])
 
-#define E1000_MRQC_EN_TCPIPV4(mrqc) ((mrqc) & BIT(16))
-#define E1000_MRQC_EN_IPV4(mrqc)((mrqc) & BIT(17))
-#define E1000_MRQC_EN_TCPIPV6(mrqc) ((mrqc) & BIT(18))
-#define E1000_MRQC_EN_IPV6EX(mrqc)  ((mrqc) & BIT(19))
-#define E1000_MRQC_EN_IPV6(mrqc)((mrqc) & BIT(20))
-
-#define E1000_MRQ_RSS_TYPE_NONE (0)
-#define E1000_MRQ_RSS_TYPE_IPV4TCP  (1)
-#define E1000_MRQ_RSS_TYPE_IPV4 (2)
-#define E1000_MRQ_RSS_TYPE_IPV6TCP  (3)
-#define E1000_MRQ_RSS_TYPE_IPV6EX   (4)
-#define E1000_MRQ_RSS_TYPE_IPV6 (5)
+#define E1000_MRQC_EN_TCPIPV4(mrqc)   ((mrqc) & BIT(16))
+#define E1000_MRQC_EN_IPV4(mrqc)  ((mrqc) & BIT(17))
+#define E1000_MRQC_EN_TCPIPV6EX(mrqc) ((mrqc) & BIT(18))
+#define E1000_MRQC_EN_IPV6EX(mrqc)((mrqc) & BIT(19))
+#define E1000_MRQC_EN_IPV6(mrqc)  ((mrqc) & BIT(20))
+
+#define E1000_MRQ_RSS_TYPE_NONE   (0)
+#define E1000_MRQ_RSS_TYPE_IPV4TCP(1)
+#define E1000_MRQ_RSS_TYPE_IPV4   (2)
+#define E1000_MRQ_RSS_TYPE_IPV6TCPEX  (3)
+#define E1000_MRQ_RSS_TYPE_IPV6EX (4)
+#define E1000_MRQ_RSS_TYPE_IPV6   (5)
 
 #define E1000_ICR_ASSERTED BIT(31)
 #define E1000_EIAC_MASK0x01F0
diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
index 41d2435074..38d465a203 100644
--- a/hw/net/e1000e_core.c
+++ b/hw/net/e1000e_core.c
@@ -537,7 +537,7 @@ e1000e_rss_get_hash_type(E1000ECore *core, struct NetRxPkt 
*pkt)
 ip6info->rss_ex_dst_valid,
 ip6info->rss_ex_src_valid,
 core->mac[MRQC],
-E1000_MRQC_EN_TCPIPV6(core->mac[MRQC]),
+E1000_MRQC_EN_TCPIPV6EX(core->mac[MRQC]),
 E1000_MRQC_EN_IPV6EX(core->mac[MRQC]),
 E1000_MRQC_EN_IPV6(core->mac[MRQC]));
 
@@ -546,8 +546,8 @@ e1000e_rss_get_hash_type(E1000ECore *core, struct NetRxPkt 
*pkt)
   ip6info->rss_ex_src_valid))) {
 
 if (l4hdr_proto == ETH_L4_HDR_PROTO_TCP &&
-E1000_MRQC_EN_TCPIPV6(core->mac[MRQC])) {
-return E1000_MRQ_RSS_TYPE_IPV6TCP;
+E1000_MRQC_EN_TCPIPV6EX(core->mac[MRQC])) {
+return E1000_MRQ_RSS_TYPE_IPV6TCPEX;
 }
 
 if (E1000_MRQC_EN_IPV6EX(core->mac[MRQC])) {
@@ -581,7 +581,7 @@ e1000e_rss_calc_hash(E1000ECore *core,
 case E1000_MRQ_RSS_TYPE_IPV4TCP:
 type = NetPktRssIpV4Tcp;
 break;
-case E1000_MRQ_RSS_TYPE_IPV6TCP:
+case E1000_MRQ_RSS_TYPE_IPV6TCPEX:
 type = NetPktRssIpV6TcpEx;
 break;
 case E1000_MRQ_RSS_TYPE_IPV6:
diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c
index 934db3c3e5..209fdad862 100644
--- a/hw/net/igb_core.c
+++ b/hw/net/igb_core.c
@@ -301,7 +301,7 @@ igb_rss_get_hash_type(IGBCore *core, struct NetRxPkt *pkt)
 ip6info->rss_ex_dst_valid,
 ip6info->rss_ex_src_valid,
 core->mac[MRQC],
-E1000_MRQC_EN_TCPIPV6(core->mac[MRQC]),
+E1000_MRQC_EN_TCPIPV6EX(core->mac[MRQC]),
 E1000_MRQC_EN_IPV6EX(core->mac[MRQC]),
 E1000_MRQC_EN_IPV6(core->mac[MRQC]));
 
@@ -310,8 +310,8 @@ igb_rss_get_hash_type(IGBCore *core, struct NetRxPkt *pkt)
   ip6info->rss_ex_src_valid))) {
 
 if (l4hdr_proto == ETH_L4_HDR_PROTO_TCP &&
-E1000_MRQC_EN_TCPIPV6(core->mac[MRQC])) {
-return E1000_MRQ_RSS_TYPE_IPV6TCP;
+E1000_MRQC_EN_TCPIPV6EX(core->mac[MRQC])) {
+return E1000_MRQ_RSS_TYPE_IPV6TCPEX;
 }
 
 if (E1000_MRQC_EN_IPV6EX(core->mac[MRQC])) {
@@ -343,7 +343,7 @@ igb_rss_calc_hash(IGBCore *core, struct NetRxPkt *pkt, 
E1000E_RSSInfo *info)
 case E1000_MRQ_RSS_TYPE_IPV4TCP:
 type = NetPktRssIpV4Tcp;
 break;
-case E1000_MRQ_RSS_TYPE_IPV6TCP:
+case E1000_MRQ_RSS_TYPE_IPV6TCPEX:
 type = NetPktRssIpV6TcpEx;
 break;
 case E1000_MRQ_RSS_TYPE_IPV6:

[PATCH v4 21/48] igb: Read DCMD.VLE of the first Tx descriptor

2023-04-26 Thread Akihiko Odaki
Section 7.2.2.3 Advanced Transmit Data Descriptor says:
> For frames that spans multiple descriptors, all fields apart from
> DCMD.EOP, DCMD.RS, DCMD.DEXT, DTALEN, Address and DTYP are valid only
> in the first descriptors and are ignored in the subsequent ones.

Signed-off-by: Akihiko Odaki 
Reviewed-by: Sriram Yagnaraman 
---
 hw/net/igb_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c
index bae51cbb63..162ef26789 100644
--- a/hw/net/igb_core.c
+++ b/hw/net/igb_core.c
@@ -613,7 +613,7 @@ igb_process_tx_desc(IGBCore *core,
 idx = (tx->first_olinfo_status >> 4) & 1;
 igb_tx_insert_vlan(core, queue_index, tx,
 tx->ctx[idx].vlan_macip_lens >> 16,
-!!(cmd_type_len & E1000_TXD_CMD_VLE));
+!!(tx->first_cmd_type_len & E1000_TXD_CMD_VLE));
 
 if (igb_tx_pkt_send(core, tx, queue_index)) {
 igb_on_tx_done_update_stats(core, tx->tx_pkt, queue_index);
-- 
2.40.0




[PATCH v4 09/48] igb: Always copy ethernet header

2023-04-26 Thread Akihiko Odaki
igb_receive_internal() used to check the iov length to determine whether to
copy the iovs to a contiguous buffer, but the check is flawed in two
ways:
- It does not ensure that iovcnt > 0.
- It does not take virtio-net header into consideration.

The size of this copy is just 22 octets, which can be even less than
the code size required for the checks. This (wrong) optimization is probably
not worth it, so just remove it. Removing it also allows igb to assume
aligned accesses for the ethernet header.

Fixes: 3a977deebe ("Intrdocue igb device emulation")
Signed-off-by: Akihiko Odaki 
Reviewed-by: Sriram Yagnaraman 
---
 hw/net/igb_core.c | 43 +++
 1 file changed, 23 insertions(+), 20 deletions(-)

diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c
index 21a8d9ada4..1123df9e77 100644
--- a/hw/net/igb_core.c
+++ b/hw/net/igb_core.c
@@ -67,6 +67,11 @@ typedef struct IGBTxPktVmdqCallbackContext {
 NetClientState *nc;
 } IGBTxPktVmdqCallbackContext;
 
+typedef struct L2Header {
+struct eth_header eth;
+struct vlan_header vlan;
+} L2Header;
+
 static ssize_t
 igb_receive_internal(IGBCore *core, const struct iovec *iov, int iovcnt,
  bool has_vnet, bool *external_tx);
@@ -961,15 +966,16 @@ igb_rx_is_oversized(IGBCore *core, uint16_t qn, size_t 
size)
 return size > (lpe ? max_ethernet_lpe_size : max_ethernet_vlan_size);
 }
 
-static uint16_t igb_receive_assign(IGBCore *core, const struct eth_header 
*ehdr,
+static uint16_t igb_receive_assign(IGBCore *core, const L2Header *l2_header,
size_t size, E1000E_RSSInfo *rss_info,
bool *external_tx)
 {
 static const int ta_shift[] = { 4, 3, 2, 0 };
+const struct eth_header *ehdr = &l2_header->eth;
 uint32_t f, ra[2], *macp, rctl = core->mac[RCTL];
 uint16_t queues = 0;
 uint16_t oversized = 0;
-uint16_t vid = lduw_be_p(&PKT_GET_VLAN_HDR(ehdr)->h_tci) & VLAN_VID_MASK;
+uint16_t vid = be16_to_cpu(l2_header->vlan.h_tci) & VLAN_VID_MASK;
 bool accepted = false;
 int i;
 
@@ -1590,14 +1596,13 @@ static ssize_t
 igb_receive_internal(IGBCore *core, const struct iovec *iov, int iovcnt,
  bool has_vnet, bool *external_tx)
 {
-static const int maximum_ethernet_hdr_len = (ETH_HLEN + 4);
-
 uint16_t queues = 0;
 uint32_t n = 0;
-uint8_t min_buf[ETH_ZLEN];
+union {
+L2Header l2_header;
+uint8_t octets[ETH_ZLEN];
+} buf;
 struct iovec min_iov;
-struct eth_header *ehdr;
-uint8_t *filter_buf;
 size_t size, orig_size;
 size_t iov_ofs = 0;
 E1000E_RxRing rxr;
@@ -1623,24 +1628,21 @@ igb_receive_internal(IGBCore *core, const struct iovec 
*iov, int iovcnt,
 net_rx_pkt_unset_vhdr(core->rx_pkt);
 }
 
-filter_buf = iov->iov_base + iov_ofs;
 orig_size = iov_size(iov, iovcnt);
 size = orig_size - iov_ofs;
 
 /* Pad to minimum Ethernet frame length */
-if (size < sizeof(min_buf)) {
-iov_to_buf(iov, iovcnt, iov_ofs, min_buf, size);
-memset(&min_buf[size], 0, sizeof(min_buf) - size);
+if (size < sizeof(buf)) {
+iov_to_buf(iov, iovcnt, iov_ofs, &buf, size);
+memset(&buf.octets[size], 0, sizeof(buf) - size);
 e1000x_inc_reg_if_not_full(core->mac, RUC);
-min_iov.iov_base = filter_buf = min_buf;
-min_iov.iov_len = size = sizeof(min_buf);
+min_iov.iov_base = &buf;
+min_iov.iov_len = size = sizeof(buf);
 iovcnt = 1;
 iov = &min_iov;
 iov_ofs = 0;
-} else if (iov->iov_len < maximum_ethernet_hdr_len) {
-/* This is very unlikely, but may happen. */
-iov_to_buf(iov, iovcnt, iov_ofs, min_buf, maximum_ethernet_hdr_len);
-filter_buf = min_buf;
+} else {
+iov_to_buf(iov, iovcnt, iov_ofs, &buf, sizeof(buf.l2_header));
 }
 
 /* Discard oversized packets if !LPE and !SBP. */
@@ -1648,11 +1650,12 @@ igb_receive_internal(IGBCore *core, const struct iovec 
*iov, int iovcnt,
 return orig_size;
 }
 
-ehdr = PKT_GET_ETH_HDR(filter_buf);
-net_rx_pkt_set_packet_type(core->rx_pkt, get_eth_packet_type(ehdr));
+net_rx_pkt_set_packet_type(core->rx_pkt,
+   get_eth_packet_type(&buf.l2_header.eth));
 net_rx_pkt_set_protocols(core->rx_pkt, iov, iovcnt, iov_ofs);
 
-queues = igb_receive_assign(core, ehdr, size, &rss_info, external_tx);
+queues = igb_receive_assign(core, &buf.l2_header, size,
+&rss_info, external_tx);
 if (!queues) {
 trace_e1000e_rx_flt_dropped();
 return orig_size;
-- 
2.40.0




[PATCH v4 13/48] hw/net/net_tx_pkt: Remove net_rx_pkt_get_l4_info

2023-04-26 Thread Akihiko Odaki
This function is not used.

Signed-off-by: Akihiko Odaki 
---
 hw/net/net_rx_pkt.h | 9 -
 hw/net/net_rx_pkt.c | 5 -
 2 files changed, 14 deletions(-)

diff --git a/hw/net/net_rx_pkt.h b/hw/net/net_rx_pkt.h
index a06f5c2675..ce8dbdb284 100644
--- a/hw/net/net_rx_pkt.h
+++ b/hw/net/net_rx_pkt.h
@@ -119,15 +119,6 @@ eth_ip6_hdr_info *net_rx_pkt_get_ip6_info(struct NetRxPkt 
*pkt);
  */
 eth_ip4_hdr_info *net_rx_pkt_get_ip4_info(struct NetRxPkt *pkt);
 
-/**
- * fetches L4 header analysis results
- *
- * Return:  pointer to analysis results structure which is stored in internal
- *  packet area.
- *
- */
-eth_l4_hdr_info *net_rx_pkt_get_l4_info(struct NetRxPkt *pkt);
-
 typedef enum {
 NetPktRssIpV4,
 NetPktRssIpV4Tcp,
diff --git a/hw/net/net_rx_pkt.c b/hw/net/net_rx_pkt.c
index 63be6e05ad..6125a063d7 100644
--- a/hw/net/net_rx_pkt.c
+++ b/hw/net/net_rx_pkt.c
@@ -236,11 +236,6 @@ eth_ip4_hdr_info *net_rx_pkt_get_ip4_info(struct NetRxPkt 
*pkt)
 return &pkt->ip4hdr_info;
 }
 
-eth_l4_hdr_info *net_rx_pkt_get_l4_info(struct NetRxPkt *pkt)
-{
-return &pkt->l4hdr_info;
-}
-
 static inline void
 _net_rx_rss_add_chunk(uint8_t *rss_input, size_t *bytes_written,
   void *ptr, size_t size)
-- 
2.40.0




[PATCH v4 11/48] tests/avocado: Remove unused imports

2023-04-26 Thread Akihiko Odaki
Signed-off-by: Akihiko Odaki 
---
 tests/avocado/netdev-ethtool.py | 1 -
 1 file changed, 1 deletion(-)

diff --git a/tests/avocado/netdev-ethtool.py b/tests/avocado/netdev-ethtool.py
index f7e9464184..8de118e313 100644
--- a/tests/avocado/netdev-ethtool.py
+++ b/tests/avocado/netdev-ethtool.py
@@ -7,7 +7,6 @@
 
 from avocado import skip
 from avocado_qemu import QemuSystemTest
-from avocado_qemu import exec_command, exec_command_and_wait_for_pattern
 from avocado_qemu import wait_for_console_pattern
 
 class NetDevEthtool(QemuSystemTest):
-- 
2.40.0




[PATCH v4 08/48] e1000e: Always copy ethernet header

2023-04-26 Thread Akihiko Odaki
e1000e_receive_internal() used to check the iov length to determine whether
to copy the iovs to a contiguous buffer, but the check is flawed in two
ways:
- It does not ensure that iovcnt > 0.
- It does not take virtio-net header into consideration.

The size of this copy is just 18 octets, which can be even less than
the code size required for the checks. This (wrong) optimization is probably
not worth it, so just remove it.

Fixes: 6f3fbe4ed0 ("net: Introduce e1000e device emulation")
Signed-off-by: Akihiko Odaki 
---
 hw/net/e1000e_core.c | 26 ++
 1 file changed, 10 insertions(+), 16 deletions(-)

diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
index c2d864a504..14b94db59c 100644
--- a/hw/net/e1000e_core.c
+++ b/hw/net/e1000e_core.c
@@ -1686,12 +1686,9 @@ static ssize_t
 e1000e_receive_internal(E1000ECore *core, const struct iovec *iov, int iovcnt,
 bool has_vnet)
 {
-static const int maximum_ethernet_hdr_len = (ETH_HLEN + 4);
-
 uint32_t n = 0;
-uint8_t min_buf[ETH_ZLEN];
+uint8_t buf[ETH_ZLEN];
 struct iovec min_iov;
-uint8_t *filter_buf;
 size_t size, orig_size;
 size_t iov_ofs = 0;
 E1000E_RxRing rxr;
@@ -1714,24 +1711,21 @@ e1000e_receive_internal(E1000ECore *core, const struct iovec *iov, int iovcnt,
 net_rx_pkt_unset_vhdr(core->rx_pkt);
 }
 
-filter_buf = iov->iov_base + iov_ofs;
 orig_size = iov_size(iov, iovcnt);
 size = orig_size - iov_ofs;
 
 /* Pad to minimum Ethernet frame length */
-if (size < sizeof(min_buf)) {
-iov_to_buf(iov, iovcnt, iov_ofs, min_buf, size);
-memset(&min_buf[size], 0, sizeof(min_buf) - size);
+if (size < sizeof(buf)) {
+iov_to_buf(iov, iovcnt, iov_ofs, buf, size);
+memset(&buf[size], 0, sizeof(buf) - size);
 e1000x_inc_reg_if_not_full(core->mac, RUC);
-min_iov.iov_base = filter_buf = min_buf;
-min_iov.iov_len = size = sizeof(min_buf);
+min_iov.iov_base = buf;
+min_iov.iov_len = size = sizeof(buf);
 iovcnt = 1;
 iov = &min_iov;
 iov_ofs = 0;
-} else if (iov->iov_len < maximum_ethernet_hdr_len) {
-/* This is very unlikely, but may happen. */
-iov_to_buf(iov, iovcnt, iov_ofs, min_buf, maximum_ethernet_hdr_len);
-filter_buf = min_buf;
+} else {
+iov_to_buf(iov, iovcnt, iov_ofs, buf, ETH_HLEN + 4);
 }
 
 /* Discard oversized packets if !LPE and !SBP. */
@@ -1740,9 +1734,9 @@ e1000e_receive_internal(E1000ECore *core, const struct iovec *iov, int iovcnt,
 }
 
 net_rx_pkt_set_packet_type(core->rx_pkt,
-get_eth_packet_type(PKT_GET_ETH_HDR(filter_buf)));
+get_eth_packet_type(PKT_GET_ETH_HDR(buf)));
 
-if (!e1000e_receive_filter(core, filter_buf, size)) {
+if (!e1000e_receive_filter(core, buf, size)) {
 trace_e1000e_rx_flt_dropped();
 return orig_size;
 }
-- 
2.40.0
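As an aside, the copy-or-pad decision the patch above settles on can be sketched outside of QEMU. This is a minimal illustration only: `gather_header` is a made-up helper (the real code copies into a stack buffer with iov_to_buf() and rewrites the iov in the padding case), and the constants mirror those in net/eth.h.

```python
ETH_ZLEN = 60       # minimum Ethernet frame length, FCS excluded
ETH_HLEN = 14       # destination MAC + source MAC + EtherType
VLAN_TAG_LEN = 4

def gather_header(iov: list) -> bytes:
    # Flatten the scattered buffers, then either pad short frames to
    # the minimum frame length, or copy just the bytes needed for
    # filtering (Ethernet header plus one VLAN tag) -- which the
    # patched code now does unconditionally, with no iovcnt caveats.
    data = b"".join(iov)
    if len(data) < ETH_ZLEN:
        return data + bytes(ETH_ZLEN - len(data))
    return data[:ETH_HLEN + VLAN_TAG_LEN]
```

Either way the filter sees a contiguous, fully populated header, regardless of how the guest split the frame across iovs.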




[PATCH v4 18/48] e1000e: Always log status after building rx metadata

2023-04-26 Thread Akihiko Odaki
Without this change, the status flags may not be traced, e.g. if
checksum offloading is disabled.

Signed-off-by: Akihiko Odaki 
Reviewed-by: Philippe Mathieu-Daudé 
---
 hw/net/e1000e_core.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
index 38d465a203..6a213c0224 100644
--- a/hw/net/e1000e_core.c
+++ b/hw/net/e1000e_core.c
@@ -1244,9 +1244,8 @@ e1000e_build_rx_metadata(E1000ECore *core,
 trace_e1000e_rx_metadata_l4_cso_disabled();
 }
 
-trace_e1000e_rx_metadata_status_flags(*status_flags);
-
 func_exit:
+trace_e1000e_rx_metadata_status_flags(*status_flags);
 *status_flags = cpu_to_le32(*status_flags);
 }
 
-- 
2.40.0




[PATCH v4 25/48] igb: Share common VF constants

2023-04-26 Thread Akihiko Odaki
The constants need to be consistent between the PF and VF.

Signed-off-by: Akihiko Odaki 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Sriram Yagnaraman 
---
 hw/net/igb_common.h |  8 
 hw/net/igb.c| 10 +-
 hw/net/igbvf.c  |  7 ---
 3 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/hw/net/igb_common.h b/hw/net/igb_common.h
index 69ac490f75..f2a9065791 100644
--- a/hw/net/igb_common.h
+++ b/hw/net/igb_common.h
@@ -28,6 +28,14 @@
 
 #include "igb_regs.h"
 
+#define TYPE_IGBVF "igbvf"
+
+#define IGBVF_MMIO_BAR_IDX  (0)
+#define IGBVF_MSIX_BAR_IDX  (3)
+
+#define IGBVF_MMIO_SIZE (16 * 1024)
+#define IGBVF_MSIX_SIZE (16 * 1024)
+
 #define defreg(x) x = (E1000_##x >> 2)
 #define defreg_indexed(x, i) x##i = (E1000_##x(i) >> 2)
 #define defreg_indexeda(x, i) x##i##_A = (E1000_##x##_A(i) >> 2)
diff --git a/hw/net/igb.c b/hw/net/igb.c
index 51a7e9133e..1c989d7677 100644
--- a/hw/net/igb.c
+++ b/hw/net/igb.c
@@ -433,16 +433,16 @@ static void igb_pci_realize(PCIDevice *pci_dev, Error **errp)
 
 pcie_ari_init(pci_dev, 0x150, 1);
 
-pcie_sriov_pf_init(pci_dev, IGB_CAP_SRIOV_OFFSET, "igbvf",
+pcie_sriov_pf_init(pci_dev, IGB_CAP_SRIOV_OFFSET, TYPE_IGBVF,
 IGB_82576_VF_DEV_ID, IGB_MAX_VF_FUNCTIONS, IGB_MAX_VF_FUNCTIONS,
 IGB_VF_OFFSET, IGB_VF_STRIDE);
 
-pcie_sriov_pf_init_vf_bar(pci_dev, 0,
+pcie_sriov_pf_init_vf_bar(pci_dev, IGBVF_MMIO_BAR_IDX,
 PCI_BASE_ADDRESS_MEM_TYPE_64 | PCI_BASE_ADDRESS_MEM_PREFETCH,
-16 * KiB);
-pcie_sriov_pf_init_vf_bar(pci_dev, 3,
+IGBVF_MMIO_SIZE);
+pcie_sriov_pf_init_vf_bar(pci_dev, IGBVF_MSIX_BAR_IDX,
 PCI_BASE_ADDRESS_MEM_TYPE_64 | PCI_BASE_ADDRESS_MEM_PREFETCH,
-16 * KiB);
+IGBVF_MSIX_SIZE);
 
 igb_init_net_peer(s, pci_dev, macaddr);
 
diff --git a/hw/net/igbvf.c b/hw/net/igbvf.c
index 70beb7af50..284ea61184 100644
--- a/hw/net/igbvf.c
+++ b/hw/net/igbvf.c
@@ -50,15 +50,8 @@
 #include "trace.h"
 #include "qapi/error.h"
 
-#define TYPE_IGBVF "igbvf"
 OBJECT_DECLARE_SIMPLE_TYPE(IgbVfState, IGBVF)
 
-#define IGBVF_MMIO_BAR_IDX  (0)
-#define IGBVF_MSIX_BAR_IDX  (3)
-
-#define IGBVF_MMIO_SIZE (16 * 1024)
-#define IGBVF_MSIX_SIZE (16 * 1024)
-
 struct IgbVfState {
 PCIDevice parent_obj;
 
-- 
2.40.0




[PATCH v4 28/48] e1000e: Rename a variable in e1000e_receive_internal()

2023-04-26 Thread Akihiko Odaki
Rename variable "n" to "causes", which properly represents the content
of the variable.

Signed-off-by: Akihiko Odaki 
---
 hw/net/e1000e_core.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
index 7dce448657..aea70b74d9 100644
--- a/hw/net/e1000e_core.c
+++ b/hw/net/e1000e_core.c
@@ -1650,7 +1650,7 @@ static ssize_t
 e1000e_receive_internal(E1000ECore *core, const struct iovec *iov, int iovcnt,
 bool has_vnet)
 {
-uint32_t n = 0;
+uint32_t causes = 0;
 uint8_t buf[ETH_ZLEN];
 struct iovec min_iov;
 size_t size, orig_size;
@@ -1723,32 +1723,32 @@ e1000e_receive_internal(E1000ECore *core, const struct iovec *iov, int iovcnt,
 
 /* Perform small receive detection (RSRPD) */
 if (total_size < core->mac[RSRPD]) {
-n |= E1000_ICS_SRPD;
+causes |= E1000_ICS_SRPD;
 }
 
 /* Perform ACK receive detection */
 if  (!(core->mac[RFCTL] & E1000_RFCTL_ACK_DIS) &&
  (e1000e_is_tcp_ack(core, core->rx_pkt))) {
-n |= E1000_ICS_ACK;
+causes |= E1000_ICS_ACK;
 }
 
 /* Check if receive descriptor minimum threshold hit */
 rdmts_hit = e1000e_rx_descr_threshold_hit(core, rxr.i);
-n |= e1000e_rx_wb_interrupt_cause(core, rxr.i->idx, rdmts_hit);
+causes |= e1000e_rx_wb_interrupt_cause(core, rxr.i->idx, rdmts_hit);
 
 trace_e1000e_rx_written_to_guest(rxr.i->idx);
 } else {
-n |= E1000_ICS_RXO;
+causes |= E1000_ICS_RXO;
 retval = 0;
 
 trace_e1000e_rx_not_written_to_guest(rxr.i->idx);
 }
 
-if (!e1000e_intrmgr_delay_rx_causes(core, &n)) {
-trace_e1000e_rx_interrupt_set(n);
-e1000e_set_interrupt_cause(core, n);
+if (!e1000e_intrmgr_delay_rx_causes(core, &causes)) {
+trace_e1000e_rx_interrupt_set(causes);
+e1000e_set_interrupt_cause(core, causes);
 } else {
-trace_e1000e_rx_interrupt_delayed(n);
+trace_e1000e_rx_interrupt_delayed(causes);
 }
 
 return retval;
-- 
2.40.0




[PATCH v4 27/48] igb: Clear EICR bits for delayed MSI-X interrupts

2023-04-26 Thread Akihiko Odaki
Section 7.3.4.1 says:
> When auto-clear is enabled for an interrupt cause, the EICR bit is
> set when a cause event mapped to this vector occurs. When the EITR
> Counter reaches zero, the MSI-X message is sent on PCIe. Then the
> EICR bit is cleared and enabled to be set by a new cause event

Signed-off-by: Akihiko Odaki 
---
 hw/net/igb_core.c | 21 -
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c
index 20645c4764..edda07e564 100644
--- a/hw/net/igb_core.c
+++ b/hw/net/igb_core.c
@@ -97,23 +97,31 @@ igb_lower_legacy_irq(IGBCore *core)
 pci_set_irq(core->owner, 0);
 }
 
-static void igb_msix_notify(IGBCore *core, unsigned int vector)
+static void igb_msix_notify(IGBCore *core, unsigned int cause)
 {
 PCIDevice *dev = core->owner;
 uint16_t vfn;
+uint32_t effective_eiac;
+unsigned int vector;
 
-vfn = 8 - (vector + 2) / IGBVF_MSIX_VEC_NUM;
+vfn = 8 - (cause + 2) / IGBVF_MSIX_VEC_NUM;
 if (vfn < pcie_sriov_num_vfs(core->owner)) {
 dev = pcie_sriov_get_vf_at_index(core->owner, vfn);
 assert(dev);
-vector = (vector + 2) % IGBVF_MSIX_VEC_NUM;
-} else if (vector >= IGB_MSIX_VEC_NUM) {
+vector = (cause + 2) % IGBVF_MSIX_VEC_NUM;
+} else if (cause >= IGB_MSIX_VEC_NUM) {
 qemu_log_mask(LOG_GUEST_ERROR,
   "igb: Tried to use vector unavailable for PF");
 return;
+} else {
+vector = cause;
 }
 
 msix_notify(dev, vector);
+
+trace_e1000e_irq_icr_clear_eiac(core->mac[EICR], core->mac[EIAC]);
+effective_eiac = core->mac[EIAC] & BIT(cause);
+core->mac[EICR] &= ~effective_eiac;
 }
 
 static inline void
@@ -1834,7 +1842,6 @@ igb_eitr_should_postpone(IGBCore *core, int idx)
 static void igb_send_msix(IGBCore *core)
 {
 uint32_t causes = core->mac[EICR] & core->mac[EIMS];
-uint32_t effective_eiac;
 int vector;
 
 for (vector = 0; vector < IGB_INTR_NUM; ++vector) {
@@ -1842,10 +1849,6 @@ static void igb_send_msix(IGBCore *core)
 
 trace_e1000e_irq_msix_notify_vec(vector);
 igb_msix_notify(core, vector);
-
-trace_e1000e_irq_icr_clear_eiac(core->mac[EICR], core->mac[EIAC]);
-effective_eiac = core->mac[EIAC] & BIT(vector);
-core->mac[EICR] &= ~effective_eiac;
 }
 }
 }
-- 
2.40.0
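The cause-to-vector arithmetic that igb_msix_notify() now receives as a raw cause index can be illustrated with a small sketch. The helper name and return convention here are invented for illustration; the vector counts assume igb's layout of 10 PF MSI-X vectors and 3 per VF.

```python
IGB_MSIX_VEC_NUM = 10    # MSI-X vectors on the PF
IGBVF_MSIX_VEC_NUM = 3   # MSI-X vectors on each VF

def route_cause(cause: int, num_vfs: int):
    # Map an EICR cause bit index to (function, vector), following the
    # arithmetic in igb_msix_notify(). Causes count down from the top:
    # the highest ones land on VF0, then VF1, ..., and the remainder
    # fall through to the PF. Returns None when the cause names a
    # vector the PF does not have.
    vfn = 8 - (cause + 2) // IGBVF_MSIX_VEC_NUM
    if vfn < num_vfs:
        return (f"vf{vfn}", (cause + 2) % IGBVF_MSIX_VEC_NUM)
    if cause >= IGB_MSIX_VEC_NUM:
        return None
    return ("pf", cause)
```

Passing the cause (rather than the resolved vector) into the notify path is what lets the function clear the matching EICR bit via `EIAC & BIT(cause)` only once the message is actually sent, as the datasheet excerpt describes.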




[PATCH v4 37/48] igb: Implement Tx SCTP CSO

2023-04-26 Thread Akihiko Odaki
Signed-off-by: Akihiko Odaki 
Reviewed-by: Sriram Yagnaraman 
---
 hw/net/net_tx_pkt.h |  8 
 hw/net/igb_core.c   | 12 +++-
 hw/net/net_tx_pkt.c | 18 ++
 3 files changed, 33 insertions(+), 5 deletions(-)

diff --git a/hw/net/net_tx_pkt.h b/hw/net/net_tx_pkt.h
index 4d7233e975..0a716e74a5 100644
--- a/hw/net/net_tx_pkt.h
+++ b/hw/net/net_tx_pkt.h
@@ -116,6 +116,14 @@ void net_tx_pkt_update_ip_checksums(struct NetTxPkt *pkt);
  */
 void net_tx_pkt_update_ip_hdr_checksum(struct NetTxPkt *pkt);
 
+/**
+ * Calculate the SCTP checksum.
+ *
+ * @pkt:packet
+ *
+ */
+bool net_tx_pkt_update_sctp_checksum(struct NetTxPkt *pkt);
+
 /**
  * get length of all populated data.
  *
diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c
index 95d46d6e6d..5eacf1cd8c 100644
--- a/hw/net/igb_core.c
+++ b/hw/net/igb_core.c
@@ -440,8 +440,9 @@ igb_tx_insert_vlan(IGBCore *core, uint16_t qn, struct igb_tx *tx,
 static bool
 igb_setup_tx_offloads(IGBCore *core, struct igb_tx *tx)
 {
+uint32_t idx = (tx->first_olinfo_status >> 4) & 1;
+
 if (tx->first_cmd_type_len & E1000_ADVTXD_DCMD_TSE) {
-uint32_t idx = (tx->first_olinfo_status >> 4) & 1;
 uint32_t mss = tx->ctx[idx].mss_l4len_idx >> E1000_ADVTXD_MSS_SHIFT;
 if (!net_tx_pkt_build_vheader(tx->tx_pkt, true, true, mss)) {
 return false;
@@ -452,10 +453,11 @@ igb_setup_tx_offloads(IGBCore *core, struct igb_tx *tx)
 return true;
 }
 
-if (tx->first_olinfo_status & E1000_ADVTXD_POTS_TXSM) {
-if (!net_tx_pkt_build_vheader(tx->tx_pkt, false, true, 0)) {
-return false;
-}
+if ((tx->first_olinfo_status & E1000_ADVTXD_POTS_TXSM) &&
+!((tx->ctx[idx].type_tucmd_mlhl & E1000_ADVTXD_TUCMD_L4T_SCTP) ?
+  net_tx_pkt_update_sctp_checksum(tx->tx_pkt) :
+  net_tx_pkt_build_vheader(tx->tx_pkt, false, true, 0))) {
+return false;
 }
 
 if (tx->first_olinfo_status & E1000_ADVTXD_POTS_IXSM) {
diff --git a/hw/net/net_tx_pkt.c b/hw/net/net_tx_pkt.c
index af8f77a3f0..2e5f58b3c9 100644
--- a/hw/net/net_tx_pkt.c
+++ b/hw/net/net_tx_pkt.c
@@ -16,6 +16,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/crc32c.h"
 #include "net/eth.h"
 #include "net/checksum.h"
 #include "net/tap.h"
@@ -135,6 +136,23 @@ void net_tx_pkt_update_ip_checksums(struct NetTxPkt *pkt)
  pkt->virt_hdr.csum_offset, &csum, sizeof(csum));
 }
 
+bool net_tx_pkt_update_sctp_checksum(struct NetTxPkt *pkt)
+{
+uint32_t csum = 0;
+struct iovec *pl_start_frag = pkt->vec + NET_TX_PKT_PL_START_FRAG;
+
+if (iov_from_buf(pl_start_frag, pkt->payload_frags, 8, &csum, sizeof(csum)) < sizeof(csum)) {
+return false;
+}
+
+csum = cpu_to_le32(iov_crc32c(0x, pl_start_frag, pkt->payload_frags));
+if (iov_from_buf(pl_start_frag, pkt->payload_frags, 8, &csum, sizeof(csum)) < sizeof(csum)) {
+return false;
+}
+
+return true;
+}
+
 static void net_tx_pkt_calculate_hdr_len(struct NetTxPkt *pkt)
 {
 pkt->hdr_len = pkt->vec[NET_TX_PKT_L2HDR_FRAG].iov_len +
-- 
2.40.0
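For readers unfamiliar with SCTP checksum offload: the procedure the new net_tx_pkt_update_sctp_checksum() performs is the RFC 4960 one — zero the 4-byte checksum field at offset 8 of the SCTP common header, compute CRC-32C over the whole packet, and store the result little-endian. A minimal sketch (a slow bitwise CRC, purely illustrative; the helper name is mine):

```python
import struct

def crc32c(data: bytes) -> int:
    # Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78,
    # init and final XOR of all-ones, as RFC 4960 requires for SCTP.
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

def update_sctp_checksum(packet: bytearray) -> None:
    # Zero the checksum field at offset 8 of the SCTP common header,
    # CRC the whole packet, write the result back little-endian.
    packet[8:12] = b"\x00" * 4
    packet[8:12] = struct.pack("<I", crc32c(bytes(packet)))
```

The two iov_from_buf() calls in the patch correspond to the two slice assignments here: one to clear the field before hashing, one to store the final value.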




[PATCH v4 20/48] igb: Remove goto

2023-04-26 Thread Akihiko Odaki
The goto is a bit confusing as it changes the control flow only if the
L4 protocol is not recognized. It also differs from e1000e, and is
noisy when comparing e1000e and igb.

Signed-off-by: Akihiko Odaki 
Reviewed-by: Sriram Yagnaraman 
---
 hw/net/igb_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c
index 946b917f91..bae51cbb63 100644
--- a/hw/net/igb_core.c
+++ b/hw/net/igb_core.c
@@ -1297,7 +1297,7 @@ igb_build_rx_metadata(IGBCore *core,
 break;
 
 default:
-goto func_exit;
+break;
 }
 } else {
 trace_e1000e_rx_metadata_l4_cso_disabled();
-- 
2.40.0




[PATCH v4 04/48] igb: Fix Rx packet type encoding

2023-04-26 Thread Akihiko Odaki
igb's advanced descriptor uses a packet type encoding different from
the one used in e1000e's extended descriptor. Fix the logic to encode
the Rx packet type accordingly.

Fixes: 3a977deebe ("Intrdocue igb device emulation")
Signed-off-by: Akihiko Odaki 
Reviewed-by: Sriram Yagnaraman 
---
 hw/net/igb_regs.h |  5 +
 hw/net/igb_core.c | 38 +++---
 2 files changed, 24 insertions(+), 19 deletions(-)

diff --git a/hw/net/igb_regs.h b/hw/net/igb_regs.h
index c5c5b3c3b8..21ee9a3b2d 100644
--- a/hw/net/igb_regs.h
+++ b/hw/net/igb_regs.h
@@ -641,6 +641,11 @@ union e1000_adv_rx_desc {
 
 #define E1000_STATUS_NUM_VFS_SHIFT 14
 
+#define E1000_ADVRXD_PKT_IP4 BIT(4)
+#define E1000_ADVRXD_PKT_IP6 BIT(6)
+#define E1000_ADVRXD_PKT_TCP BIT(8)
+#define E1000_ADVRXD_PKT_UDP BIT(9)
+
 static inline uint8_t igb_ivar_entry_rx(uint8_t i)
 {
 return i < 8 ? i * 4 : (i - 8) * 4 + 2;
diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c
index 464a41d0aa..dbd1192a8e 100644
--- a/hw/net/igb_core.c
+++ b/hw/net/igb_core.c
@@ -1227,7 +1227,6 @@ igb_build_rx_metadata(IGBCore *core,
 struct virtio_net_hdr *vhdr;
 bool hasip4, hasip6;
 EthL4HdrProto l4hdr_proto;
-uint32_t pkt_type;
 
 *status_flags = E1000_RXD_STAT_DD;
 
@@ -1266,28 +1265,29 @@ igb_build_rx_metadata(IGBCore *core,
 trace_e1000e_rx_metadata_ack();
 }
 
-if (hasip6 && (core->mac[RFCTL] & E1000_RFCTL_IPV6_DIS)) {
-trace_e1000e_rx_metadata_ipv6_filtering_disabled();
-pkt_type = E1000_RXD_PKT_MAC;
-} else if (l4hdr_proto == ETH_L4_HDR_PROTO_TCP ||
-   l4hdr_proto == ETH_L4_HDR_PROTO_UDP) {
-pkt_type = hasip4 ? E1000_RXD_PKT_IP4_XDP : E1000_RXD_PKT_IP6_XDP;
-} else if (hasip4 || hasip6) {
-pkt_type = hasip4 ? E1000_RXD_PKT_IP4 : E1000_RXD_PKT_IP6;
-} else {
-pkt_type = E1000_RXD_PKT_MAC;
-}
+if (pkt_info) {
+*pkt_info = rss_info->enabled ? rss_info->type : 0;
 
-trace_e1000e_rx_metadata_pkt_type(pkt_type);
+if (hasip4) {
+*pkt_info |= E1000_ADVRXD_PKT_IP4;
+}
 
-if (pkt_info) {
-if (rss_info->enabled) {
-*pkt_info = rss_info->type;
+if (hasip6) {
+*pkt_info |= E1000_ADVRXD_PKT_IP6;
 }
 
-*pkt_info |= (pkt_type << 4);
-} else {
-*status_flags |= E1000_RXD_PKT_TYPE(pkt_type);
+switch (l4hdr_proto) {
+case ETH_L4_HDR_PROTO_TCP:
+*pkt_info |= E1000_ADVRXD_PKT_TCP;
+break;
+
+case ETH_L4_HDR_PROTO_UDP:
+*pkt_info |= E1000_ADVRXD_PKT_UDP;
+break;
+
+default:
+break;
+}
 }
 
 if (hdr_info) {
-- 
2.40.0
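The resulting encoding — RSS type in the low bits, one independent flag per detected protocol, rather than e1000e's enumerated packet types — can be sketched as follows. `encode_pkt_info` is an illustrative stand-in for the pkt_info branch of igb_build_rx_metadata():

```python
# Advanced Rx descriptor "Packet Type" bits from the patch above
ADVRXD_PKT_IP4 = 1 << 4
ADVRXD_PKT_IP6 = 1 << 6
ADVRXD_PKT_TCP = 1 << 8
ADVRXD_PKT_UDP = 1 << 9

def encode_pkt_info(rss_type: int, hasip4: bool, hasip6: bool,
                    l4_proto: str) -> int:
    # Low bits carry the RSS type (0 when RSS is disabled); the upper
    # bits independently flag each protocol found in the frame.
    info = rss_type
    if hasip4:
        info |= ADVRXD_PKT_IP4
    if hasip6:
        info |= ADVRXD_PKT_IP6
    if l4_proto == "tcp":
        info |= ADVRXD_PKT_TCP
    elif l4_proto == "udp":
        info |= ADVRXD_PKT_UDP
    return info
```

Because the flags are independent bits, a frame can legitimately report both IPv4 and a TCP header at once, which the old shift-based e1000e encoding could not express.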




[PATCH v4 30/48] net/eth: Use void pointers

2023-04-26 Thread Akihiko Odaki
The uses of uint8_t pointers were misleading as they are never accessed
as an array of octets, and it even requires stricter alignment to
access them as struct eth_header.

Signed-off-by: Akihiko Odaki 
Reviewed-by: Philippe Mathieu-Daudé 
---
 include/net/eth.h | 4 ++--
 net/eth.c | 6 +++---
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/net/eth.h b/include/net/eth.h
index 05f56931e7..95ff24d6b8 100644
--- a/include/net/eth.h
+++ b/include/net/eth.h
@@ -342,12 +342,12 @@ eth_get_pkt_tci(const void *p)
 
 size_t
 eth_strip_vlan(const struct iovec *iov, int iovcnt, size_t iovoff,
-   uint8_t *new_ehdr_buf,
+   void *new_ehdr_buf,
uint16_t *payload_offset, uint16_t *tci);
 
 size_t
 eth_strip_vlan_ex(const struct iovec *iov, int iovcnt, size_t iovoff,
-  uint16_t vet, uint8_t *new_ehdr_buf,
+  uint16_t vet, void *new_ehdr_buf,
   uint16_t *payload_offset, uint16_t *tci);
 
 uint16_t
diff --git a/net/eth.c b/net/eth.c
index b6ff89c460..f7ffbda600 100644
--- a/net/eth.c
+++ b/net/eth.c
@@ -226,11 +226,11 @@ void eth_get_protocols(const struct iovec *iov, size_t iovcnt, size_t iovoff,
 
 size_t
 eth_strip_vlan(const struct iovec *iov, int iovcnt, size_t iovoff,
-   uint8_t *new_ehdr_buf,
+   void *new_ehdr_buf,
uint16_t *payload_offset, uint16_t *tci)
 {
 struct vlan_header vlan_hdr;
-struct eth_header *new_ehdr = (struct eth_header *) new_ehdr_buf;
+struct eth_header *new_ehdr = new_ehdr_buf;
 
 size_t copied = iov_to_buf(iov, iovcnt, iovoff,
new_ehdr, sizeof(*new_ehdr));
@@ -276,7 +276,7 @@ eth_strip_vlan(const struct iovec *iov, int iovcnt, size_t iovoff,
 
 size_t
 eth_strip_vlan_ex(const struct iovec *iov, int iovcnt, size_t iovoff,
-  uint16_t vet, uint8_t *new_ehdr_buf,
+  uint16_t vet, void *new_ehdr_buf,
   uint16_t *payload_offset, uint16_t *tci)
 {
 struct vlan_header vlan_hdr;
-- 
2.40.0




[PATCH v4 12/48] tests/avocado: Remove test_igb_nomsi_kvm

2023-04-26 Thread Akihiko Odaki
It is unlikely that more bugs will be found with KVM, so remove
test_igb_nomsi_kvm to save the time needed to run it.

Signed-off-by: Akihiko Odaki 
Reviewed-by: Thomas Huth 
Acked-by: Alex Bennée 
---
 tests/avocado/netdev-ethtool.py | 12 +---
 1 file changed, 1 insertion(+), 11 deletions(-)

diff --git a/tests/avocado/netdev-ethtool.py b/tests/avocado/netdev-ethtool.py
index 8de118e313..6da800f62b 100644
--- a/tests/avocado/netdev-ethtool.py
+++ b/tests/avocado/netdev-ethtool.py
@@ -29,7 +29,7 @@ def get_asset(self, name, sha1):
 # URL into a unique one
 return self.fetch_asset(name=name, locations=(url), asset_hash=sha1)
 
-def common_test_code(self, netdev, extra_args=None, kvm=False):
+def common_test_code(self, netdev, extra_args=None):
 
 # This custom kernel has drivers for all the supported network
 # devices we can emulate in QEMU
@@ -57,9 +57,6 @@ def common_test_code(self, netdev, extra_args=None, kvm=False):
  '-drive', drive,
  '-device', netdev)
 
-if kvm:
-self.vm.add_args('-accel', 'kvm')
-
 self.vm.set_console(console_index=0)
 self.vm.launch()
 
@@ -86,13 +83,6 @@ def test_igb_nomsi(self):
 """
 self.common_test_code("igb", "pci=nomsi")
 
-def test_igb_nomsi_kvm(self):
-"""
-:avocado: tags=device:igb
-"""
-self.require_accelerator('kvm')
-self.common_test_code("igb", "pci=nomsi", True)
-
 # It seems the other popular cards we model in QEMU currently fail
 # the pattern test with:
 #
-- 
2.40.0




[PATCH v4 05/48] igb: Do not require CTRL.VME for tx VLAN tagging

2023-04-26 Thread Akihiko Odaki
While the datasheet of e1000e says it checks CTRL.VME for tx VLAN
tagging, igb's datasheet has no such statements. It also says for
"CTRL.VLE":
> This register only affects the VLAN Strip in Rx it does not have any
> influence in the Tx path in the 82576.
(Appendix A. Changes from the 82575)

There is no "CTRL.VLE", so it is more likely a mistake for CTRL.VME.

Fixes: fba7c3b788 ("igb: respect VMVIR and VMOLR for VLAN")
Signed-off-by: Akihiko Odaki 
Reviewed-by: Sriram Yagnaraman 
---
 hw/net/igb_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c
index dbd1192a8e..96a118b6c1 100644
--- a/hw/net/igb_core.c
+++ b/hw/net/igb_core.c
@@ -402,7 +402,7 @@ igb_tx_insert_vlan(IGBCore *core, uint16_t qn, struct igb_tx *tx,
 }
 }
 
-if (insert_vlan && e1000x_vlan_enabled(core->mac)) {
+if (insert_vlan) {
 net_tx_pkt_setup_vlan_header_ex(tx->tx_pkt, vlan,
 core->mac[VET] & 0x);
 }
-- 
2.40.0




[PATCH v4 26/48] igb: Fix igb_mac_reg_init coding style alignment

2023-04-26 Thread Akihiko Odaki
Signed-off-by: Akihiko Odaki 
Reviewed-by: Philippe Mathieu-Daudé 
---
 hw/net/igb_core.c | 96 +++
 1 file changed, 48 insertions(+), 48 deletions(-)

diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c
index 56a53872cf..20645c4764 100644
--- a/hw/net/igb_core.c
+++ b/hw/net/igb_core.c
@@ -4027,54 +4027,54 @@ static const uint32_t igb_mac_reg_init[] = {
 [VMOLR0 ... VMOLR0 + 7] = 0x2600 | E1000_VMOLR_STRCRC,
 [RPLOLR]= E1000_RPLOLR_STRCRC,
 [RLPML] = 0x2600,
-[TXCTL0]   = E1000_DCA_TXCTRL_DATA_RRO_EN |
- E1000_DCA_TXCTRL_TX_WB_RO_EN |
- E1000_DCA_TXCTRL_DESC_RRO_EN,
-[TXCTL1]   = E1000_DCA_TXCTRL_DATA_RRO_EN |
- E1000_DCA_TXCTRL_TX_WB_RO_EN |
- E1000_DCA_TXCTRL_DESC_RRO_EN,
-[TXCTL2]   = E1000_DCA_TXCTRL_DATA_RRO_EN |
- E1000_DCA_TXCTRL_TX_WB_RO_EN |
- E1000_DCA_TXCTRL_DESC_RRO_EN,
-[TXCTL3]   = E1000_DCA_TXCTRL_DATA_RRO_EN |
- E1000_DCA_TXCTRL_TX_WB_RO_EN |
- E1000_DCA_TXCTRL_DESC_RRO_EN,
-[TXCTL4]   = E1000_DCA_TXCTRL_DATA_RRO_EN |
- E1000_DCA_TXCTRL_TX_WB_RO_EN |
- E1000_DCA_TXCTRL_DESC_RRO_EN,
-[TXCTL5]   = E1000_DCA_TXCTRL_DATA_RRO_EN |
- E1000_DCA_TXCTRL_TX_WB_RO_EN |
- E1000_DCA_TXCTRL_DESC_RRO_EN,
-[TXCTL6]   = E1000_DCA_TXCTRL_DATA_RRO_EN |
- E1000_DCA_TXCTRL_TX_WB_RO_EN |
- E1000_DCA_TXCTRL_DESC_RRO_EN,
-[TXCTL7]   = E1000_DCA_TXCTRL_DATA_RRO_EN |
- E1000_DCA_TXCTRL_TX_WB_RO_EN |
- E1000_DCA_TXCTRL_DESC_RRO_EN,
-[TXCTL8]   = E1000_DCA_TXCTRL_DATA_RRO_EN |
- E1000_DCA_TXCTRL_TX_WB_RO_EN |
- E1000_DCA_TXCTRL_DESC_RRO_EN,
-[TXCTL9]   = E1000_DCA_TXCTRL_DATA_RRO_EN |
- E1000_DCA_TXCTRL_TX_WB_RO_EN |
- E1000_DCA_TXCTRL_DESC_RRO_EN,
-[TXCTL10]  = E1000_DCA_TXCTRL_DATA_RRO_EN |
- E1000_DCA_TXCTRL_TX_WB_RO_EN |
- E1000_DCA_TXCTRL_DESC_RRO_EN,
-[TXCTL11]  = E1000_DCA_TXCTRL_DATA_RRO_EN |
- E1000_DCA_TXCTRL_TX_WB_RO_EN |
- E1000_DCA_TXCTRL_DESC_RRO_EN,
-[TXCTL12]  = E1000_DCA_TXCTRL_DATA_RRO_EN |
- E1000_DCA_TXCTRL_TX_WB_RO_EN |
- E1000_DCA_TXCTRL_DESC_RRO_EN,
-[TXCTL13]  = E1000_DCA_TXCTRL_DATA_RRO_EN |
- E1000_DCA_TXCTRL_TX_WB_RO_EN |
- E1000_DCA_TXCTRL_DESC_RRO_EN,
-[TXCTL14]  = E1000_DCA_TXCTRL_DATA_RRO_EN |
- E1000_DCA_TXCTRL_TX_WB_RO_EN |
- E1000_DCA_TXCTRL_DESC_RRO_EN,
-[TXCTL15]  = E1000_DCA_TXCTRL_DATA_RRO_EN |
- E1000_DCA_TXCTRL_TX_WB_RO_EN |
- E1000_DCA_TXCTRL_DESC_RRO_EN,
+[TXCTL0]= E1000_DCA_TXCTRL_DATA_RRO_EN |
+  E1000_DCA_TXCTRL_TX_WB_RO_EN |
+  E1000_DCA_TXCTRL_DESC_RRO_EN,
+[TXCTL1]= E1000_DCA_TXCTRL_DATA_RRO_EN |
+  E1000_DCA_TXCTRL_TX_WB_RO_EN |
+  E1000_DCA_TXCTRL_DESC_RRO_EN,
+[TXCTL2]= E1000_DCA_TXCTRL_DATA_RRO_EN |
+  E1000_DCA_TXCTRL_TX_WB_RO_EN |
+  E1000_DCA_TXCTRL_DESC_RRO_EN,
+[TXCTL3]= E1000_DCA_TXCTRL_DATA_RRO_EN |
+  E1000_DCA_TXCTRL_TX_WB_RO_EN |
+  E1000_DCA_TXCTRL_DESC_RRO_EN,
+[TXCTL4]= E1000_DCA_TXCTRL_DATA_RRO_EN |
+  E1000_DCA_TXCTRL_TX_WB_RO_EN |
+  E1000_DCA_TXCTRL_DESC_RRO_EN,
+[TXCTL5]= E1000_DCA_TXCTRL_DATA_RRO_EN |
+  E1000_DCA_TXCTRL_TX_WB_RO_EN |
+  E1000_DCA_TXCTRL_DESC_RRO_EN,
+[TXCTL6]= E1000_DCA_TXCTRL_DATA_RRO_EN |
+  E1000_DCA_TXCTRL_TX_WB_RO_EN |
+  E1000_DCA_TXCTRL_DESC_RRO_EN,
+[TXCTL7]= E1000_DCA_TXCTRL_DATA_RRO_EN |
+  E1000_DCA_TXCTRL_TX_WB_RO_EN |
+  E1000_DCA_TXCTRL_DESC_RRO_EN,
+[TXCTL8]= E1000_DCA_TXCTRL_DATA_RRO_EN |
+  E1000_DCA_TXCTRL_TX_WB_RO_EN |
+  E1000_DCA_TXCTRL_DESC_RRO_EN,
+[TXCTL9]= E1000_DCA_TXCTRL_DATA_RRO_EN |
+  E1000_DCA_TXCTRL_TX_WB_RO_EN |
+  E1000_DCA_TXCTRL_DESC_RRO_EN,
+[TXCTL10]   = E1000_DCA_TXCTRL_DATA_RRO_EN |
+  E1000_DCA_TXCTRL_TX_WB_RO_EN |
+  E1000_DCA_TXCTRL_DESC_RRO_EN,
+[TXCTL11]   = E1000_DCA_TXCTRL_DATA_RRO_EN |
+  E1000_DCA_TXCTRL_TX_WB_RO_EN |
+  E1000_DCA_TXCTRL_DESC_RRO_EN,
+

[PATCH v4 16/48] e1000x: Take CRC into consideration for size check

2023-04-26 Thread Akihiko Odaki
Section 13.7.15 Receive Length Error Count says:
>  Packets over 1522 bytes are oversized if LongPacketEnable is 0b
> (RCTL.LPE). If LongPacketEnable (LPE) is 1b, then an incoming packet
> is considered oversized if it exceeds 16384 bytes.

> These lengths are based on bytes in the received packet from
>  through , inclusively.

As QEMU processes packets without CRC, the number of bytes for the CRC
needs to be subtracted. This change adds to eth.h some size definitions
used to derive the new size thresholds.

Signed-off-by: Akihiko Odaki 
---
 include/net/eth.h  |  2 ++
 hw/net/e1000x_common.c | 10 +-
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/include/net/eth.h b/include/net/eth.h
index e8af5742be..05f56931e7 100644
--- a/include/net/eth.h
+++ b/include/net/eth.h
@@ -32,6 +32,8 @@
 #define ETH_ALEN 6
 #define ETH_HLEN 14
 #define ETH_ZLEN 60 /* Min. octets in frame without FCS */
+#define ETH_FCS_LEN 4
+#define ETH_MTU 1500
 
 struct eth_header {
 uint8_t  h_dest[ETH_ALEN];   /* destination eth addr */
diff --git a/hw/net/e1000x_common.c b/hw/net/e1000x_common.c
index 6cc23138a8..212873fd77 100644
--- a/hw/net/e1000x_common.c
+++ b/hw/net/e1000x_common.c
@@ -140,16 +140,16 @@ bool e1000x_hw_rx_enabled(uint32_t *mac)
 
 bool e1000x_is_oversized(uint32_t *mac, size_t size)
 {
+size_t header_size = sizeof(struct eth_header) + sizeof(struct vlan_header);
 /* this is the size past which hardware will
drop packets when setting LPE=0 */
-static const int maximum_ethernet_vlan_size = 1522;
+size_t maximum_short_size = header_size + ETH_MTU;
 /* this is the size past which hardware will
drop packets when setting LPE=1 */
-static const int maximum_ethernet_lpe_size = 16 * KiB;
+size_t maximum_large_size = 16 * KiB - ETH_FCS_LEN;
 
-if ((size > maximum_ethernet_lpe_size ||
-(size > maximum_ethernet_vlan_size
-&& !(mac[RCTL] & E1000_RCTL_LPE)))
+if ((size > maximum_large_size ||
+(size > maximum_short_size && !(mac[RCTL] & E1000_RCTL_LPE)))
 && !(mac[RCTL] & E1000_RCTL_SBP)) {
 e1000x_inc_reg_if_not_full(mac, ROC);
 trace_e1000x_rx_oversized(size);
-- 
2.40.0
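The shifted thresholds can be made concrete with a short sketch of the patched check. `is_oversized` is an illustrative stand-in for e1000x_is_oversized(); the constants match the patch.

```python
ETH_HLEN = 14
VLAN_HLEN = 4
ETH_FCS_LEN = 4
ETH_MTU = 1500
KIB = 1024

def is_oversized(size: int, lpe: bool, sbp: bool) -> bool:
    # QEMU sees frames without the 4-byte FCS, so both hardware limits
    # (1522 with LPE=0, 16384 with LPE=1) are shifted down by the FCS.
    maximum_short_size = ETH_HLEN + VLAN_HLEN + ETH_MTU   # 1518 = 1522 - 4
    maximum_large_size = 16 * KIB - ETH_FCS_LEN           # 16380
    if sbp:   # "store bad packets" bypasses the size check entirely
        return False
    return size > maximum_large_size or (size > maximum_short_size and not lpe)
```

With the old code, a CRC-less 1519..1522 byte frame slipped past the LPE=0 limit even though the wire frame (with FCS) would have exceeded 1522 bytes.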




Re: [PATCH] multifd: Avoid busy-wait in multifd_send_pages()

2023-04-26 Thread manish.mishra



On 26/04/23 3:58 pm, Juan Quintela wrote:

"manish.mishra"  wrote:

multifd_send_sync_main() posts a request on the multifd channel
but does not call sem_wait() on the channels_ready semaphore, making
the channels_ready semaphore count keep increasing.
As a result, sem_wait() on channels_ready in multifd_send_pages()
is always non-blocking hence multifd_send_pages() keeps searching
for a free channel in a busy loop until a channel is freed.

Signed-off-by: manish.mishra 
---
  migration/multifd.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index cce3ad6988..43d26e7012 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -615,6 +615,7 @@ int multifd_send_sync_main(QEMUFile *f)
  
  trace_multifd_send_sync_main_signal(p->id);
  
+qemu_sem_wait(&multifd_send_state->channels_ready);

  qemu_mutex_lock(&p->mutex);
  
  if (p->quit) {

We need this, but I think it is better to put it on the second loop.


@@ -919,7 +920,7 @@ int multifd_save_setup(Error **errp)
  multifd_send_state = g_malloc0(sizeof(*multifd_send_state));
  multifd_send_state->params = g_new0(MultiFDSendParams, thread_count);
  multifd_send_state->pages = multifd_pages_init(page_count);
-qemu_sem_init(&multifd_send_state->channels_ready, 0);
+qemu_sem_init(&multifd_send_state->channels_ready, thread_count);
  qatomic_set(&multifd_send_state->exiting, 0);
  multifd_send_state->ops = multifd_ops[migrate_multifd_compression()];

I think this bit is wrong.
We should not set the channels ready until the thread is ready and the
channel is created.

What do you think about this patch:

 From bcb0ef9b97b858458c403d2e4dc9e0dbd96721b3 Mon Sep 17 00:00:00 2001
From: Juan Quintela 
Date: Wed, 26 Apr 2023 12:20:36 +0200
Subject: [PATCH] multifd: Fix the number of channels ready

We don't wait in the sem when we are doing a sync_main.  Make it wait
there.  To make things clearer, we mark the channel ready at the
beginning of the thread loop.

This causes a busy loop in multifd_send_page().
Found-by: manish.mishra 

Signed-off-by: Juan Quintela 
---
  migration/multifd.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index 903df2117b..e625e8725e 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -635,6 +635,7 @@ int multifd_send_sync_main(QEMUFile *f)
  for (i = 0; i < migrate_multifd_channels(); i++) {
  MultiFDSendParams *p = &multifd_send_state->params[i];
  
+qemu_sem_wait(&multifd_send_state->channels_ready);

  trace_multifd_send_sync_main_wait(p->id);
  qemu_sem_wait(&p->sem_sync);
  
@@ -668,6 +669,7 @@ static void *multifd_send_thread(void *opaque)

  p->num_packets = 1;
  
  while (true) {

+qemu_sem_post(&multifd_send_state->channels_ready);



This has one issue though: if we mark channel_ready here, the channel is
actually not ready yet, so we can still busy loop?

Maybe we can keep the sem_post in the while loop at its current position,
but add another post just before the start of the while loop. As that one
will be called only once, it should do the work of initialising the count
to the number of multifd channels?



  qemu_sem_wait(&p->sem);
  
  if (qatomic_read(&multifd_send_state->exiting)) {

@@ -736,7 +738,6 @@ static void *multifd_send_thread(void *opaque)
  if (flags & MULTIFD_FLAG_SYNC) {
  qemu_sem_post(&p->sem_sync);
  }
-qemu_sem_post(&multifd_send_state->channels_ready);
  } else if (p->quit) {
  qemu_mutex_unlock(&p->mutex);
  break;


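The invariant both proposals are circling — channels_ready counting exactly the channels able to accept work, so the submitter blocks instead of busy-looping — can be sketched with Python threading primitives. All names here are illustrative, not QEMU code:

```python
import threading

def worker(sem_work: threading.Semaphore,
           channels_ready: threading.Semaphore,
           done: threading.Event) -> None:
    # Sketch of the proposed multifd_send_thread() flow: announce
    # readiness at the top of every iteration, then block for work,
    # so channels_ready only ever counts channels able to take a job.
    while not done.is_set():
        channels_ready.release()   # "this channel can accept work now"
        sem_work.acquire()         # wait for a job (or for shutdown)

def submit(sem_work: threading.Semaphore,
           channels_ready: threading.Semaphore) -> None:
    # multifd_send_pages()/sync side: blocks until some channel is
    # idle instead of spinning over the channel list.
    channels_ready.acquire()
    sem_work.release()
```

The subtlety manish.mishra raises maps directly onto this sketch: posting channels_ready before the work is fully consumed means the count can momentarily overstate readiness, so where the release sits relative to the loop body matters.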


[PATCH v4 44/48] igb: Notify only new interrupts

2023-04-26 Thread Akihiko Odaki
This follows the corresponding change for e1000e. This fixes:
tests/avocado/netdev-ethtool.py:NetDevEthtool.test_igb

Signed-off-by: Akihiko Odaki 
---
 hw/net/igb_core.c | 201 --
 hw/net/trace-events   |  11 +-
 .../org.centos/stream/8/x86_64/test-avocado   |   1 +
 tests/avocado/netdev-ethtool.py   |   4 -
 4 files changed, 87 insertions(+), 130 deletions(-)

diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c
index 57973c3ae4..b68e24c9ee 100644
--- a/hw/net/igb_core.c
+++ b/hw/net/igb_core.c
@@ -94,10 +94,7 @@ static ssize_t
 igb_receive_internal(IGBCore *core, const struct iovec *iov, int iovcnt,
  bool has_vnet, bool *external_tx);
 
-static inline void
-igb_set_interrupt_cause(IGBCore *core, uint32_t val);
-
-static void igb_update_interrupt_state(IGBCore *core);
+static void igb_raise_interrupts(IGBCore *core, size_t index, uint32_t causes);
 static void igb_reset(IGBCore *core, bool sw);
 
 static inline void
@@ -913,8 +910,8 @@ igb_start_xmit(IGBCore *core, const IGB_TxRing *txr)
 }
 
 if (eic) {
-core->mac[EICR] |= eic;
-igb_set_interrupt_cause(core, E1000_ICR_TXDW);
+igb_raise_interrupts(core, EICR, eic);
+igb_raise_interrupts(core, ICR, E1000_ICR_TXDW);
 }
 
 net_tx_pkt_reset(txr->tx->tx_pkt, net_tx_pkt_unmap_frag_pci, d);
@@ -1686,6 +1683,7 @@ igb_receive_internal(IGBCore *core, const struct iovec *iov, int iovcnt,
 {
 uint16_t queues = 0;
 uint32_t causes = 0;
+uint32_t ecauses = 0;
 union {
 L2Header l2_header;
 uint8_t octets[ETH_ZLEN];
@@ -1788,13 +1786,14 @@ igb_receive_internal(IGBCore *core, const struct iovec *iov, int iovcnt,
 causes |= E1000_ICS_RXDMT0;
 }
 
-core->mac[EICR] |= igb_rx_wb_eic(core, rxr.i->idx);
+ecauses |= igb_rx_wb_eic(core, rxr.i->idx);
 
 trace_e1000e_rx_written_to_guest(rxr.i->idx);
 }
 
 trace_e1000e_rx_interrupt_set(causes);
-igb_set_interrupt_cause(core, causes);
+igb_raise_interrupts(core, EICR, ecauses);
+igb_raise_interrupts(core, ICR, causes);
 
 return orig_size;
 }
@@ -1854,7 +1853,7 @@ void igb_core_set_link_status(IGBCore *core)
 }
 
 if (core->mac[STATUS] != old_status) {
-igb_set_interrupt_cause(core, E1000_ICR_LSC);
+igb_raise_interrupts(core, ICR, E1000_ICR_LSC);
 }
 }
 
@@ -1934,13 +1933,6 @@ igb_set_rx_control(IGBCore *core, int index, uint32_t val)
 }
 }
 
-static inline void
-igb_clear_ims_bits(IGBCore *core, uint32_t bits)
-{
-trace_e1000e_irq_clear_ims(bits, core->mac[IMS], core->mac[IMS] & ~bits);
-core->mac[IMS] &= ~bits;
-}
-
 static inline bool
 igb_postpone_interrupt(IGBIntrDelayTimer *timer)
 {
@@ -1963,9 +1955,8 @@ igb_eitr_should_postpone(IGBCore *core, int idx)
 return igb_postpone_interrupt(&core->eitr[idx]);
 }
 
-static void igb_send_msix(IGBCore *core)
+static void igb_send_msix(IGBCore *core, uint32_t causes)
 {
-uint32_t causes = core->mac[EICR] & core->mac[EIMS];
 int vector;
 
 for (vector = 0; vector < IGB_INTR_NUM; ++vector) {
@@ -1988,124 +1979,116 @@ igb_fix_icr_asserted(IGBCore *core)
 trace_e1000e_irq_fix_icr_asserted(core->mac[ICR]);
 }
 
-static void
-igb_update_interrupt_state(IGBCore *core)
+static void igb_raise_interrupts(IGBCore *core, size_t index, uint32_t causes)
 {
-uint32_t icr;
-uint32_t causes;
+uint32_t old_causes = core->mac[ICR] & core->mac[IMS];
+uint32_t old_ecauses = core->mac[EICR] & core->mac[EIMS];
+uint32_t raised_causes;
+uint32_t raised_ecauses;
 uint32_t int_alloc;
 
-icr = core->mac[ICR] & core->mac[IMS];
+trace_e1000e_irq_set(index << 2,
+ core->mac[index], core->mac[index] | causes);
+
+core->mac[index] |= causes;
 
 if (core->mac[GPIE] & E1000_GPIE_MSIX_MODE) {
-if (icr) {
-causes = 0;
-if (icr & E1000_ICR_DRSTA) {
-int_alloc = core->mac[IVAR_MISC] & 0xff;
-if (int_alloc & E1000_IVAR_VALID) {
-causes |= BIT(int_alloc & 0x1f);
-}
+raised_causes = core->mac[ICR] & core->mac[IMS] & ~old_causes;
+
+if (raised_causes & E1000_ICR_DRSTA) {
+int_alloc = core->mac[IVAR_MISC] & 0xff;
+if (int_alloc & E1000_IVAR_VALID) {
+core->mac[EICR] |= BIT(int_alloc & 0x1f);
 }
-/* Check if other bits (excluding the TCP Timer) are enabled. */
-if (icr & ~E1000_ICR_DRSTA) {
-int_alloc = (core->mac[IVAR_MISC] >> 8) & 0xff;
-if (int_alloc & E1000_IVAR_VALID) {
-causes |= BIT(int_alloc & 0x1f);
-}
-trace_e1000e_irq_add_msi_other(core->mac[EICR]);
+}
+/* Check if other bits (excluding the TCP Timer) are enabled. */
+if (raised_causes & ~E1000_ICR_DRSTA)

[PATCH v4 15/48] e1000x: Share more Rx filtering logic

2023-04-26 Thread Akihiko Odaki
This saves some code and enables tracepoints for e1000's VLAN filtering.

Signed-off-by: Akihiko Odaki 
Reviewed-by: Sriram Yagnaraman 
---
 hw/net/e1000x_common.h |  4 +++-
 hw/net/e1000.c | 35 +--
 hw/net/e1000e_core.c   | 47 +-
 hw/net/e1000x_common.c | 44 +--
 hw/net/igb_core.c  | 41 +++-
 hw/net/trace-events|  4 ++--
 6 files changed, 56 insertions(+), 119 deletions(-)

diff --git a/hw/net/e1000x_common.h b/hw/net/e1000x_common.h
index 0298e06283..be291684de 100644
--- a/hw/net/e1000x_common.h
+++ b/hw/net/e1000x_common.h
@@ -107,7 +107,9 @@ bool e1000x_rx_ready(PCIDevice *d, uint32_t *mac);
 
 bool e1000x_is_vlan_packet(const void *buf, uint16_t vet);
 
-bool e1000x_rx_group_filter(uint32_t *mac, const uint8_t *buf);
+bool e1000x_rx_vlan_filter(uint32_t *mac, const struct vlan_header *vhdr);
+
+bool e1000x_rx_group_filter(uint32_t *mac, const struct eth_header *ehdr);
 
 bool e1000x_hw_rx_enabled(uint32_t *mac);
 
diff --git a/hw/net/e1000.c b/hw/net/e1000.c
index 18eb6d8876..aae5f0bdc0 100644
--- a/hw/net/e1000.c
+++ b/hw/net/e1000.c
@@ -804,36 +804,11 @@ start_xmit(E1000State *s)
 }
 
 static int
-receive_filter(E1000State *s, const uint8_t *buf, int size)
+receive_filter(E1000State *s, const void *buf)
 {
-uint32_t rctl = s->mac_reg[RCTL];
-int isbcast = is_broadcast_ether_addr(buf);
-int ismcast = is_multicast_ether_addr(buf);
-
-if (e1000x_is_vlan_packet(buf, le16_to_cpu(s->mac_reg[VET])) &&
-e1000x_vlan_rx_filter_enabled(s->mac_reg)) {
-uint16_t vid = lduw_be_p(&PKT_GET_VLAN_HDR(buf)->h_tci);
-uint32_t vfta =
-ldl_le_p((uint32_t *)(s->mac_reg + VFTA) +
- ((vid >> E1000_VFTA_ENTRY_SHIFT) & 
E1000_VFTA_ENTRY_MASK));
-if ((vfta & (1 << (vid & E1000_VFTA_ENTRY_BIT_SHIFT_MASK))) == 0) {
-return 0;
-}
-}
-
-if (!isbcast && !ismcast && (rctl & E1000_RCTL_UPE)) { /* promiscuous 
ucast */
-return 1;
-}
-
-if (ismcast && (rctl & E1000_RCTL_MPE)) {  /* promiscuous mcast */
-return 1;
-}
-
-if (isbcast && (rctl & E1000_RCTL_BAM)) {  /* broadcast enabled */
-return 1;
-}
-
-return e1000x_rx_group_filter(s->mac_reg, buf);
+return (!e1000x_is_vlan_packet(buf, s->mac_reg[VET]) ||
+e1000x_rx_vlan_filter(s->mac_reg, PKT_GET_VLAN_HDR(buf))) &&
+   e1000x_rx_group_filter(s->mac_reg, buf);
 }
 
 static void
@@ -949,7 +924,7 @@ e1000_receive_iov(NetClientState *nc, const struct iovec 
*iov, int iovcnt)
 return size;
 }
 
-if (!receive_filter(s, filter_buf, size)) {
+if (!receive_filter(s, filter_buf)) {
 return size;
 }
 
diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
index 14b94db59c..41d2435074 100644
--- a/hw/net/e1000e_core.c
+++ b/hw/net/e1000e_core.c
@@ -1034,48 +1034,11 @@ e1000e_rx_l4_cso_enabled(E1000ECore *core)
 }
 
 static bool
-e1000e_receive_filter(E1000ECore *core, const uint8_t *buf, int size)
+e1000e_receive_filter(E1000ECore *core, const void *buf)
 {
-uint32_t rctl = core->mac[RCTL];
-
-if (e1000x_is_vlan_packet(buf, core->mac[VET]) &&
-e1000x_vlan_rx_filter_enabled(core->mac)) {
-uint16_t vid = lduw_be_p(&PKT_GET_VLAN_HDR(buf)->h_tci);
-uint32_t vfta =
-ldl_le_p((uint32_t *)(core->mac + VFTA) +
- ((vid >> E1000_VFTA_ENTRY_SHIFT) & 
E1000_VFTA_ENTRY_MASK));
-if ((vfta & (1 << (vid & E1000_VFTA_ENTRY_BIT_SHIFT_MASK))) == 0) {
-trace_e1000e_rx_flt_vlan_mismatch(vid);
-return false;
-} else {
-trace_e1000e_rx_flt_vlan_match(vid);
-}
-}
-
-switch (net_rx_pkt_get_packet_type(core->rx_pkt)) {
-case ETH_PKT_UCAST:
-if (rctl & E1000_RCTL_UPE) {
-return true; /* promiscuous ucast */
-}
-break;
-
-case ETH_PKT_BCAST:
-if (rctl & E1000_RCTL_BAM) {
-return true; /* broadcast enabled */
-}
-break;
-
-case ETH_PKT_MCAST:
-if (rctl & E1000_RCTL_MPE) {
-return true; /* promiscuous mcast */
-}
-break;
-
-default:
-g_assert_not_reached();
-}
-
-return e1000x_rx_group_filter(core->mac, buf);
+return (!e1000x_is_vlan_packet(buf, core->mac[VET]) ||
+e1000x_rx_vlan_filter(core->mac, PKT_GET_VLAN_HDR(buf))) &&
+   e1000x_rx_group_filter(core->mac, buf);
 }
 
 static inline void
@@ -1736,7 +1699,7 @@ e1000e_receive_internal(E1000ECore *core, const struct 
iovec *iov, int iovcnt,
 net_rx_pkt_set_packet_type(core->rx_pkt,
 get_eth_packet_type(PKT_GET_ETH_HDR(buf)));
 
-if (!e1000e_receive_filter(core, buf, size)) {
+if (!e1000e_receive_filter(core, buf)) {
 trace_e1000e_rx_flt_dropped();
 return orig_size;
 }
di

[PATCH v4 29/48] igb: Rename a variable in igb_receive_internal()

2023-04-26 Thread Akihiko Odaki
Rename the variable "n" to "causes", which better describes what the
variable holds.

Signed-off-by: Akihiko Odaki 
Reviewed-by: Sriram Yagnaraman 
---
 hw/net/igb_core.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c
index edda07e564..c954369964 100644
--- a/hw/net/igb_core.c
+++ b/hw/net/igb_core.c
@@ -1569,7 +1569,7 @@ igb_receive_internal(IGBCore *core, const struct iovec 
*iov, int iovcnt,
  bool has_vnet, bool *external_tx)
 {
 uint16_t queues = 0;
-uint32_t n = 0;
+uint32_t causes = 0;
 union {
 L2Header l2_header;
 uint8_t octets[ETH_ZLEN];
@@ -1649,19 +1649,19 @@ igb_receive_internal(IGBCore *core, const struct iovec 
*iov, int iovcnt,
 e1000x_fcs_len(core->mac);
 
 if (!igb_has_rxbufs(core, rxr.i, total_size)) {
-n |= E1000_ICS_RXO;
+causes |= E1000_ICS_RXO;
 trace_e1000e_rx_not_written_to_guest(rxr.i->idx);
 continue;
 }
 
-n |= E1000_ICR_RXDW;
+causes |= E1000_ICR_RXDW;
 
 igb_rx_fix_l4_csum(core, core->rx_pkt);
 igb_write_packet_to_guest(core, core->rx_pkt, &rxr, &rss_info);
 
 /* Check if receive descriptor minimum threshold hit */
 if (igb_rx_descr_threshold_hit(core, rxr.i)) {
-n |= E1000_ICS_RXDMT0;
+causes |= E1000_ICS_RXDMT0;
 }
 
 core->mac[EICR] |= igb_rx_wb_eic(core, rxr.i->idx);
@@ -1669,8 +1669,8 @@ igb_receive_internal(IGBCore *core, const struct iovec 
*iov, int iovcnt,
 trace_e1000e_rx_written_to_guest(rxr.i->idx);
 }
 
-trace_e1000e_rx_interrupt_set(n);
-igb_set_interrupt_cause(core, n);
+trace_e1000e_rx_interrupt_set(causes);
+igb_set_interrupt_cause(core, causes);
 
 return orig_size;
 }
-- 
2.40.0




[PATCH v4 24/48] igb: Add more definitions for Tx descriptor

2023-04-26 Thread Akihiko Odaki
Signed-off-by: Akihiko Odaki 
Reviewed-by: Sriram Yagnaraman 
---
 hw/net/igb_regs.h | 32 +++-
 hw/net/igb_core.c |  4 ++--
 2 files changed, 29 insertions(+), 7 deletions(-)

diff --git a/hw/net/igb_regs.h b/hw/net/igb_regs.h
index 21ee9a3b2d..eb995d8b2e 100644
--- a/hw/net/igb_regs.h
+++ b/hw/net/igb_regs.h
@@ -42,11 +42,6 @@ union e1000_adv_tx_desc {
 } wb;
 };
 
-#define E1000_ADVTXD_DTYP_CTXT  0x0020 /* Advanced Context Descriptor */
-#define E1000_ADVTXD_DTYP_DATA  0x0030 /* Advanced Data Descriptor */
-#define E1000_ADVTXD_DCMD_DEXT  0x2000 /* Descriptor Extension (1=Adv) */
-#define E1000_ADVTXD_DCMD_TSE   0x8000 /* TCP/UDP Segmentation Enable */
-
 #define E1000_ADVTXD_POTS_IXSM  0x0100 /* Insert TCP/UDP Checksum */
 #define E1000_ADVTXD_POTS_TXSM  0x0200 /* Insert TCP/UDP Checksum */
 
@@ -151,6 +146,10 @@ union e1000_adv_rx_desc {
 #define IGB_82576_VF_DEV_ID0x10CA
 #define IGB_I350_VF_DEV_ID 0x1520
 
+/* VLAN info */
+#define IGB_TX_FLAGS_VLAN_MASK 0x
+#define IGB_TX_FLAGS_VLAN_SHIFT16
+
 /* from igb/e1000_82575.h */
 
 #define E1000_MRQC_ENABLE_RSS_MQ0x0002
@@ -160,6 +159,29 @@ union e1000_adv_rx_desc {
 #define E1000_MRQC_RSS_FIELD_IPV6_UDP   0x0080
 #define E1000_MRQC_RSS_FIELD_IPV6_UDP_EX0x0100
 
+/* Adv Transmit Descriptor Config Masks */
+#define E1000_ADVTXD_MAC_TSTAMP   0x0008 /* IEEE1588 Timestamp packet */
+#define E1000_ADVTXD_DTYP_CTXT0x0020 /* Advanced Context Descriptor */
+#define E1000_ADVTXD_DTYP_DATA0x0030 /* Advanced Data Descriptor */
+#define E1000_ADVTXD_DCMD_EOP 0x0100 /* End of Packet */
+#define E1000_ADVTXD_DCMD_IFCS0x0200 /* Insert FCS (Ethernet CRC) */
+#define E1000_ADVTXD_DCMD_RS  0x0800 /* Report Status */
+#define E1000_ADVTXD_DCMD_DEXT0x2000 /* Descriptor extension (1=Adv) */
+#define E1000_ADVTXD_DCMD_VLE 0x4000 /* VLAN pkt enable */
+#define E1000_ADVTXD_DCMD_TSE 0x8000 /* TCP Seg enable */
+#define E1000_ADVTXD_PAYLEN_SHIFT14 /* Adv desc PAYLEN shift */
+
+#define E1000_ADVTXD_MACLEN_SHIFT9  /* Adv ctxt desc mac len shift */
+#define E1000_ADVTXD_TUCMD_L4T_UDP 0x  /* L4 Packet TYPE of UDP */
+#define E1000_ADVTXD_TUCMD_IPV40x0400  /* IP Packet Type: 1=IPv4 */
+#define E1000_ADVTXD_TUCMD_L4T_TCP 0x0800  /* L4 Packet TYPE of TCP */
+#define E1000_ADVTXD_TUCMD_L4T_SCTP 0x1000 /* L4 packet TYPE of SCTP */
+/* IPSec Encrypt Enable for ESP */
+#define E1000_ADVTXD_L4LEN_SHIFT 8  /* Adv ctxt L4LEN shift */
+#define E1000_ADVTXD_MSS_SHIFT  16  /* Adv ctxt MSS shift */
+/* Adv ctxt IPSec SA IDX mask */
+/* Adv ctxt IPSec ESP len mask */
+
 /* Additional Transmit Descriptor Control definitions */
 #define E1000_TXDCTL_QUEUE_ENABLE  0x0200 /* Enable specific Tx Queue */
 
diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c
index 162ef26789..56a53872cf 100644
--- a/hw/net/igb_core.c
+++ b/hw/net/igb_core.c
@@ -418,7 +418,7 @@ igb_setup_tx_offloads(IGBCore *core, struct igb_tx *tx)
 {
 if (tx->first_cmd_type_len & E1000_ADVTXD_DCMD_TSE) {
 uint32_t idx = (tx->first_olinfo_status >> 4) & 1;
-uint32_t mss = tx->ctx[idx].mss_l4len_idx >> 16;
+uint32_t mss = tx->ctx[idx].mss_l4len_idx >> E1000_ADVTXD_MSS_SHIFT;
 if (!net_tx_pkt_build_vheader(tx->tx_pkt, true, true, mss)) {
 return false;
 }
@@ -612,7 +612,7 @@ igb_process_tx_desc(IGBCore *core,
 if (!tx->skip_cp && net_tx_pkt_parse(tx->tx_pkt)) {
 idx = (tx->first_olinfo_status >> 4) & 1;
 igb_tx_insert_vlan(core, queue_index, tx,
-tx->ctx[idx].vlan_macip_lens >> 16,
+tx->ctx[idx].vlan_macip_lens >> IGB_TX_FLAGS_VLAN_SHIFT,
 !!(tx->first_cmd_type_len & E1000_TXD_CMD_VLE));
 
 if (igb_tx_pkt_send(core, tx, queue_index)) {
-- 
2.40.0
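The named shifts added above replace bare `>> 16` constants in the Tx path. A small sketch of the decoding they are used for — toy helpers, not the QEMU source, with the macro values copied from the definitions in the patch:

```c
#include <stdint.h>

#define E1000_ADVTXD_MSS_SHIFT  16  /* Adv ctxt MSS shift */
#define IGB_TX_FLAGS_VLAN_SHIFT 16  /* VLAN ID lives in the high half */

/* Decode the MSS from a context descriptor's mss_l4len_idx word, as
 * igb_setup_tx_offloads() now does with the named shift. */
static uint32_t ctx_desc_mss(uint32_t mss_l4len_idx)
{
    return mss_l4len_idx >> E1000_ADVTXD_MSS_SHIFT;
}

/* Decode the VLAN ID from vlan_macip_lens, as igb_process_tx_desc()
 * does with IGB_TX_FLAGS_VLAN_SHIFT. */
static uint16_t ctx_desc_vlan(uint32_t vlan_macip_lens)
{
    return (uint16_t)(vlan_macip_lens >> IGB_TX_FLAGS_VLAN_SHIFT);
}
```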




[PATCH v4 10/48] Fix references to igb Avocado test

2023-04-26 Thread Akihiko Odaki
Fixes: 9f95111474 ("tests/avocado: re-factor igb test to avoid timeouts")
Signed-off-by: Akihiko Odaki 
Reviewed-by: Philippe Mathieu-Daudé 
---
 MAINTAINERS| 2 +-
 docs/system/devices/igb.rst| 2 +-
 scripts/ci/org.centos/stream/8/x86_64/test-avocado | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index ef45b5e71e..c31d2279ab 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2256,7 +2256,7 @@ R: Sriram Yagnaraman 
 S: Maintained
 F: docs/system/devices/igb.rst
 F: hw/net/igb*
-F: tests/avocado/igb.py
+F: tests/avocado/netdev-ethtool.py
 F: tests/qtest/igb-test.c
 F: tests/qtest/libqos/igb.c
 
diff --git a/docs/system/devices/igb.rst b/docs/system/devices/igb.rst
index 70edadd574..afe036dad2 100644
--- a/docs/system/devices/igb.rst
+++ b/docs/system/devices/igb.rst
@@ -60,7 +60,7 @@ Avocado test and can be ran with the following command:
 
 .. code:: shell
 
-  make check-avocado AVOCADO_TESTS=tests/avocado/igb.py
+  make check-avocado AVOCADO_TESTS=tests/avocado/netdev-ethtool.py
 
 References
 ==
diff --git a/scripts/ci/org.centos/stream/8/x86_64/test-avocado 
b/scripts/ci/org.centos/stream/8/x86_64/test-avocado
index d2c0e5fb4c..a1aa601ee3 100755
--- a/scripts/ci/org.centos/stream/8/x86_64/test-avocado
+++ b/scripts/ci/org.centos/stream/8/x86_64/test-avocado
@@ -30,7 +30,7 @@ make get-vm-images
 tests/avocado/cpu_queries.py:QueryCPUModelExpansion.test \
 tests/avocado/empty_cpu_model.py:EmptyCPUModel.test \
 tests/avocado/hotplug_cpu.py:HotPlugCPU.test \
-tests/avocado/igb.py:IGB.test \
+tests/avocado/netdev-ethtool.py:NetDevEthtool.test_igb_nomsi \
 tests/avocado/info_usernet.py:InfoUsernet.test_hostfwd \
 tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu \
 tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu_pt \
-- 
2.40.0




[PATCH v4 46/48] vmxnet3: Do not depend on PC

2023-04-26 Thread Akihiko Odaki
vmxnet3 has no dependency on PC, and VMware Fusion actually makes it
available on Apple Silicon according to:
https://kb.vmware.com/s/article/90364

Signed-off-by: Akihiko Odaki 
Reviewed-by: Philippe Mathieu-Daudé 
---
 hw/net/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/net/Kconfig b/hw/net/Kconfig
index 18c7851efe..98e00be4f9 100644
--- a/hw/net/Kconfig
+++ b/hw/net/Kconfig
@@ -56,7 +56,7 @@ config RTL8139_PCI
 
 config VMXNET3_PCI
 bool
-default y if PCI_DEVICES && PC_PCI
+default y if PCI_DEVICES
 depends on PCI
 
 config SMC91C111
-- 
2.40.0




[PATCH v4 31/48] net/eth: Always add VLAN tag

2023-04-26 Thread Akihiko Odaki
It is possible to have another VLAN tag even if the packet is already
tagged.

Signed-off-by: Akihiko Odaki 
---
 include/net/eth.h   |  4 ++--
 hw/net/net_tx_pkt.c | 16 +++-
 net/eth.c   | 22 ++
 3 files changed, 15 insertions(+), 27 deletions(-)

diff --git a/include/net/eth.h b/include/net/eth.h
index 95ff24d6b8..048e434685 100644
--- a/include/net/eth.h
+++ b/include/net/eth.h
@@ -353,8 +353,8 @@ eth_strip_vlan_ex(const struct iovec *iov, int iovcnt, 
size_t iovoff,
 uint16_t
 eth_get_l3_proto(const struct iovec *l2hdr_iov, int iovcnt, size_t l2hdr_len);
 
-void eth_setup_vlan_headers(struct eth_header *ehdr, uint16_t vlan_tag,
-uint16_t vlan_ethtype, bool *is_new);
+void eth_setup_vlan_headers(struct eth_header *ehdr, size_t *ehdr_size,
+uint16_t vlan_tag, uint16_t vlan_ethtype);
 
 
 uint8_t eth_get_gso_type(uint16_t l3_proto, uint8_t *l3_hdr, uint8_t l4proto);
diff --git a/hw/net/net_tx_pkt.c b/hw/net/net_tx_pkt.c
index ce6b102391..af8f77a3f0 100644
--- a/hw/net/net_tx_pkt.c
+++ b/hw/net/net_tx_pkt.c
@@ -40,7 +40,10 @@ struct NetTxPkt {
 
 struct iovec *vec;
 
-uint8_t l2_hdr[ETH_MAX_L2_HDR_LEN];
+struct {
+struct eth_header eth;
+struct vlan_header vlan[3];
+} l2_hdr;
 union {
 struct ip_header ip;
 struct ip6_header ip6;
@@ -365,18 +368,13 @@ bool net_tx_pkt_build_vheader(struct NetTxPkt *pkt, bool 
tso_enable,
 void net_tx_pkt_setup_vlan_header_ex(struct NetTxPkt *pkt,
 uint16_t vlan, uint16_t vlan_ethtype)
 {
-bool is_new;
 assert(pkt);
 
 eth_setup_vlan_headers(pkt->vec[NET_TX_PKT_L2HDR_FRAG].iov_base,
-vlan, vlan_ethtype, &is_new);
+   &pkt->vec[NET_TX_PKT_L2HDR_FRAG].iov_len,
+   vlan, vlan_ethtype);
 
-/* update l2hdrlen */
-if (is_new) {
-pkt->hdr_len += sizeof(struct vlan_header);
-pkt->vec[NET_TX_PKT_L2HDR_FRAG].iov_len +=
-sizeof(struct vlan_header);
-}
+pkt->hdr_len += sizeof(struct vlan_header);
 }
 
 bool net_tx_pkt_add_raw_fragment(struct NetTxPkt *pkt, void *base, size_t len)
diff --git a/net/eth.c b/net/eth.c
index f7ffbda600..5307978486 100644
--- a/net/eth.c
+++ b/net/eth.c
@@ -21,26 +21,16 @@
 #include "net/checksum.h"
 #include "net/tap.h"
 
-void eth_setup_vlan_headers(struct eth_header *ehdr, uint16_t vlan_tag,
-uint16_t vlan_ethtype, bool *is_new)
+void eth_setup_vlan_headers(struct eth_header *ehdr, size_t *ehdr_size,
+uint16_t vlan_tag, uint16_t vlan_ethtype)
 {
 struct vlan_header *vhdr = PKT_GET_VLAN_HDR(ehdr);
 
-switch (be16_to_cpu(ehdr->h_proto)) {
-case ETH_P_VLAN:
-case ETH_P_DVLAN:
-/* vlan hdr exists */
-*is_new = false;
-break;
-
-default:
-/* No VLAN header, put a new one */
-vhdr->h_proto = ehdr->h_proto;
-ehdr->h_proto = cpu_to_be16(vlan_ethtype);
-*is_new = true;
-break;
-}
+memmove(vhdr + 1, vhdr, *ehdr_size - ETH_HLEN);
 vhdr->h_tci = cpu_to_be16(vlan_tag);
+vhdr->h_proto = ehdr->h_proto;
+ehdr->h_proto = cpu_to_be16(vlan_ethtype);
+*ehdr_size += sizeof(*vhdr);
 }
 
 uint8_t
-- 
2.40.0
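The rewritten eth_setup_vlan_headers() opens a gap with memmove() and always inserts a tag, so an already-tagged frame gains an outer tag. A standalone sketch of the same byte-level operation on a local header buffer — a hypothetical helper, not the QEMU function; note that in QEMU the payload sits in separate iovec fragments, so only the L2 header bytes move:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN   6
#define ETH_P_VLAN 0x8100

/* Insert an 802.1Q tag after the two MAC addresses of the L2 header in
 * buf (which must have room for 4 more bytes). hdr_len is the current
 * header length; the new length is returned. Any existing tags and the
 * EtherType are shifted down by the memmove(), matching the
 * always-insert behavior this patch introduces. */
static size_t vlan_insert(uint8_t *buf, size_t hdr_len, uint16_t tci)
{
    size_t off = 2 * ETH_ALEN;

    memmove(buf + off + 4, buf + off, hdr_len - off);
    buf[off]     = ETH_P_VLAN >> 8;   /* TPID, network byte order */
    buf[off + 1] = ETH_P_VLAN & 0xff;
    buf[off + 2] = tci >> 8;          /* TCI, network byte order */
    buf[off + 3] = tci & 0xff;
    return hdr_len + 4;
}
```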




[PATCH v4 14/48] net/eth: Rename eth_setup_vlan_headers_ex

2023-04-26 Thread Akihiko Odaki
The old eth_setup_vlan_headers has no users, so remove it and rename
eth_setup_vlan_headers_ex to eth_setup_vlan_headers.

Signed-off-by: Akihiko Odaki 
Reviewed-by: Philippe Mathieu-Daudé 
---
 include/net/eth.h   | 9 +
 hw/net/net_tx_pkt.c | 2 +-
 net/eth.c   | 2 +-
 3 files changed, 3 insertions(+), 10 deletions(-)

diff --git a/include/net/eth.h b/include/net/eth.h
index 9f19c3a695..e8af5742be 100644
--- a/include/net/eth.h
+++ b/include/net/eth.h
@@ -351,16 +351,9 @@ eth_strip_vlan_ex(const struct iovec *iov, int iovcnt, 
size_t iovoff,
 uint16_t
 eth_get_l3_proto(const struct iovec *l2hdr_iov, int iovcnt, size_t l2hdr_len);
 
-void eth_setup_vlan_headers_ex(struct eth_header *ehdr, uint16_t vlan_tag,
+void eth_setup_vlan_headers(struct eth_header *ehdr, uint16_t vlan_tag,
 uint16_t vlan_ethtype, bool *is_new);
 
-static inline void
-eth_setup_vlan_headers(struct eth_header *ehdr, uint16_t vlan_tag,
-bool *is_new)
-{
-eth_setup_vlan_headers_ex(ehdr, vlan_tag, ETH_P_VLAN, is_new);
-}
-
 
 uint8_t eth_get_gso_type(uint16_t l3_proto, uint8_t *l3_hdr, uint8_t l4proto);
 
diff --git a/hw/net/net_tx_pkt.c b/hw/net/net_tx_pkt.c
index cc36750c9b..ce6b102391 100644
--- a/hw/net/net_tx_pkt.c
+++ b/hw/net/net_tx_pkt.c
@@ -368,7 +368,7 @@ void net_tx_pkt_setup_vlan_header_ex(struct NetTxPkt *pkt,
 bool is_new;
 assert(pkt);
 
-eth_setup_vlan_headers_ex(pkt->vec[NET_TX_PKT_L2HDR_FRAG].iov_base,
+eth_setup_vlan_headers(pkt->vec[NET_TX_PKT_L2HDR_FRAG].iov_base,
 vlan, vlan_ethtype, &is_new);
 
 /* update l2hdrlen */
diff --git a/net/eth.c b/net/eth.c
index d7b30df79f..b6ff89c460 100644
--- a/net/eth.c
+++ b/net/eth.c
@@ -21,7 +21,7 @@
 #include "net/checksum.h"
 #include "net/tap.h"
 
-void eth_setup_vlan_headers_ex(struct eth_header *ehdr, uint16_t vlan_tag,
+void eth_setup_vlan_headers(struct eth_header *ehdr, uint16_t vlan_tag,
 uint16_t vlan_ethtype, bool *is_new)
 {
 struct vlan_header *vhdr = PKT_GET_VLAN_HDR(ehdr);
-- 
2.40.0




[PATCH v4 33/48] tests/qtest/libqos/igb: Set GPIE.Multiple_MSIX

2023-04-26 Thread Akihiko Odaki
GPIE.Multiple_MSIX is not set by default, and needs to be set to get
interrupts from multiple MSI-X vectors.

Signed-off-by: Akihiko Odaki 
Reviewed-by: Sriram Yagnaraman 
---
 tests/qtest/libqos/igb.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tests/qtest/libqos/igb.c b/tests/qtest/libqos/igb.c
index 12fb531bf0..a603468beb 100644
--- a/tests/qtest/libqos/igb.c
+++ b/tests/qtest/libqos/igb.c
@@ -114,6 +114,7 @@ static void igb_pci_start_hw(QOSGraphObject *obj)
 e1000e_macreg_write(&d->e1000e, E1000_RCTL, E1000_RCTL_EN);
 
 /* Enable all interrupts */
+e1000e_macreg_write(&d->e1000e, E1000_GPIE,  E1000_GPIE_MSIX_MODE);
 e1000e_macreg_write(&d->e1000e, E1000_IMS,  0x);
 e1000e_macreg_write(&d->e1000e, E1000_EIMS, 0x);
 
-- 
2.40.0




[PATCH v4 36/48] igb: Implement Rx SCTP CSO

2023-04-26 Thread Akihiko Odaki
Signed-off-by: Akihiko Odaki 
Reviewed-by: Sriram Yagnaraman 
---
 hw/net/igb_regs.h |  1 +
 include/net/eth.h |  4 ++-
 include/qemu/crc32c.h |  1 +
 hw/net/e1000e_core.c  |  5 
 hw/net/igb_core.c | 15 +-
 hw/net/net_rx_pkt.c   | 64 +++
 net/eth.c |  4 +++
 util/crc32c.c |  8 ++
 8 files changed, 89 insertions(+), 13 deletions(-)

diff --git a/hw/net/igb_regs.h b/hw/net/igb_regs.h
index e6ac26dc0e..4b4ebd3369 100644
--- a/hw/net/igb_regs.h
+++ b/hw/net/igb_regs.h
@@ -670,6 +670,7 @@ union e1000_adv_rx_desc {
 #define E1000_ADVRXD_PKT_IP6 BIT(6)
 #define E1000_ADVRXD_PKT_TCP BIT(8)
 #define E1000_ADVRXD_PKT_UDP BIT(9)
+#define E1000_ADVRXD_PKT_SCTP BIT(10)
 
 static inline uint8_t igb_ivar_entry_rx(uint8_t i)
 {
diff --git a/include/net/eth.h b/include/net/eth.h
index 048e434685..75e7f1551c 100644
--- a/include/net/eth.h
+++ b/include/net/eth.h
@@ -224,6 +224,7 @@ struct tcp_hdr {
 #define IP_HEADER_VERSION_6   (6)
 #define IP_PROTO_TCP  (6)
 #define IP_PROTO_UDP  (17)
+#define IP_PROTO_SCTP (132)
 #define IPTOS_ECN_MASK0x03
 #define IPTOS_ECN(x)  ((x) & IPTOS_ECN_MASK)
 #define IPTOS_ECN_CE  0x03
@@ -379,7 +380,8 @@ typedef struct eth_ip4_hdr_info_st {
 typedef enum EthL4HdrProto {
 ETH_L4_HDR_PROTO_INVALID,
 ETH_L4_HDR_PROTO_TCP,
-ETH_L4_HDR_PROTO_UDP
+ETH_L4_HDR_PROTO_UDP,
+ETH_L4_HDR_PROTO_SCTP
 } EthL4HdrProto;
 
 typedef struct eth_l4_hdr_info_st {
diff --git a/include/qemu/crc32c.h b/include/qemu/crc32c.h
index 5b78884c38..88b4d2b3b3 100644
--- a/include/qemu/crc32c.h
+++ b/include/qemu/crc32c.h
@@ -30,5 +30,6 @@
 
 
 uint32_t crc32c(uint32_t crc, const uint8_t *data, unsigned int length);
+uint32_t iov_crc32c(uint32_t crc, const struct iovec *iov, size_t iov_cnt);
 
 #endif
diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
index aea70b74d9..0b939ff5a3 100644
--- a/hw/net/e1000e_core.c
+++ b/hw/net/e1000e_core.c
@@ -1114,6 +1114,11 @@ e1000e_verify_csum_in_sw(E1000ECore *core,
 return;
 }
 
+if (l4hdr_proto != ETH_L4_HDR_PROTO_TCP &&
+l4hdr_proto != ETH_L4_HDR_PROTO_UDP) {
+return;
+}
+
 if (!net_rx_pkt_validate_l4_csum(pkt, &csum_valid)) {
 trace_e1000e_rx_metadata_l4_csum_validation_failed();
 return;
diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c
index 41a2e5bf7b..95d46d6e6d 100644
--- a/hw/net/igb_core.c
+++ b/hw/net/igb_core.c
@@ -1220,7 +1220,7 @@ igb_build_rx_metadata(IGBCore *core,
   uint16_t *vlan_tag)
 {
 struct virtio_net_hdr *vhdr;
-bool hasip4, hasip6;
+bool hasip4, hasip6, csum_valid;
 EthL4HdrProto l4hdr_proto;
 
 *status_flags = E1000_RXD_STAT_DD;
@@ -1280,6 +1280,10 @@ igb_build_rx_metadata(IGBCore *core,
 *pkt_info |= E1000_ADVRXD_PKT_UDP;
 break;
 
+case ETH_L4_HDR_PROTO_SCTP:
+*pkt_info |= E1000_ADVRXD_PKT_SCTP;
+break;
+
 default:
 break;
 }
@@ -1312,6 +1316,15 @@ igb_build_rx_metadata(IGBCore *core,
 
 if (igb_rx_l4_cso_enabled(core)) {
 switch (l4hdr_proto) {
+case ETH_L4_HDR_PROTO_SCTP:
+if (!net_rx_pkt_validate_l4_csum(pkt, &csum_valid)) {
+trace_e1000e_rx_metadata_l4_csum_validation_failed();
+goto func_exit;
+}
+if (!csum_valid) {
+*status_flags |= E1000_RXDEXT_STATERR_TCPE;
+}
+/* fall through */
 case ETH_L4_HDR_PROTO_TCP:
 *status_flags |= E1000_RXD_STAT_TCPCS;
 break;
diff --git a/hw/net/net_rx_pkt.c b/hw/net/net_rx_pkt.c
index 1de42b4f51..3575c8b9f9 100644
--- a/hw/net/net_rx_pkt.c
+++ b/hw/net/net_rx_pkt.c
@@ -16,6 +16,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/crc32c.h"
 #include "trace.h"
 #include "net_rx_pkt.h"
 #include "net/checksum.h"
@@ -554,32 +555,73 @@ _net_rx_pkt_calc_l4_csum(struct NetRxPkt *pkt)
 return csum;
 }
 
-bool net_rx_pkt_validate_l4_csum(struct NetRxPkt *pkt, bool *csum_valid)
+static bool
+_net_rx_pkt_validate_sctp_sum(struct NetRxPkt *pkt)
 {
-uint16_t csum;
+size_t csum_off;
+size_t off = pkt->l4hdr_off;
+size_t vec_len = pkt->vec_len;
+struct iovec *vec;
+uint32_t calculated = 0;
+uint32_t original;
+bool valid;
 
-trace_net_rx_pkt_l4_csum_validate_entry();
+for (vec = pkt->vec; vec->iov_len < off; vec++) {
+off -= vec->iov_len;
+vec_len--;
+}
 
-if (pkt->l4hdr_info.proto != ETH_L4_HDR_PROTO_TCP &&
-pkt->l4hdr_info.proto != ETH_L4_HDR_PROTO_UDP) {
-trace_net_rx_pkt_l4_csum_validate_not_xxp();
+csum_off = off + 8;
+
+if (!iov_to_buf(vec, vec_len, csum_off, &original, sizeof(original))) {
 return false;
 }
 
-if (pkt->l4hdr_info.proto == ETH_L4_HDR_PROTO_UDP &&
-pkt->l4hdr_info.hdr.udp.uh_sum

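The iov_crc32c() helper declared above lets the SCTP validation walk scattered packet fragments. A self-contained sketch of CRC-32C (Castagnoli polynomial, bitwise, no lookup table) chained across an iovec; the signature is modeled on the declaration in the patch, while the real QEMU implementation may be table-driven and faster:

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Bitwise CRC-32C over one buffer, reflected polynomial 0x82F63B78.
 * The caller supplies the running CRC (0xFFFFFFFF to start) and
 * inverts the final result. */
static uint32_t crc32c(uint32_t crc, const uint8_t *data, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int k = 0; k < 8; k++) {
            crc = (crc >> 1) ^ (0x82F63B78u & -(crc & 1));
        }
    }
    return crc;
}

/* Chain the CRC across every fragment of an iovec, as the SCTP
 * checksum validation needs for non-contiguous packets. */
static uint32_t iov_crc32c(uint32_t crc, const struct iovec *iov,
                           size_t iov_cnt)
{
    for (size_t i = 0; i < iov_cnt; i++) {
        crc = crc32c(crc, iov[i].iov_base, iov[i].iov_len);
    }
    return crc;
}
```

Splitting "123456789" across two fragments still yields the standard CRC-32C check value 0xE3069283 after the final inversion, which is the property that makes the iovec-chained form usable on fragmented packets.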
[PATCH v4 34/48] igb: Implement MSI-X single vector mode

2023-04-26 Thread Akihiko Odaki
Signed-off-by: Akihiko Odaki 
Reviewed-by: Sriram Yagnaraman 
---
 hw/net/igb_core.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c
index c954369964..6d55b43fb4 100644
--- a/hw/net/igb_core.c
+++ b/hw/net/igb_core.c
@@ -1873,7 +1873,7 @@ igb_update_interrupt_state(IGBCore *core)
 
 icr = core->mac[ICR] & core->mac[IMS];
 
-if (msix_enabled(core->owner)) {
+if (core->mac[GPIE] & E1000_GPIE_MSIX_MODE) {
 if (icr) {
 causes = 0;
 if (icr & E1000_ICR_DRSTA) {
@@ -1908,7 +1908,12 @@ igb_update_interrupt_state(IGBCore *core)
 trace_e1000e_irq_pending_interrupts(core->mac[ICR] & core->mac[IMS],
 core->mac[ICR], core->mac[IMS]);
 
-if (msi_enabled(core->owner)) {
+if (msix_enabled(core->owner)) {
+if (icr) {
+trace_e1000e_irq_msix_notify_vec(0);
+msix_notify(core->owner, 0);
+}
+} else if (msi_enabled(core->owner)) {
 if (icr) {
 msi_notify(core->owner, 0);
 }
-- 
2.40.0




[PATCH v4 23/48] vmxnet3: Reset packet state after emptying Tx queue

2023-04-26 Thread Akihiko Odaki
Tx packet state was previously kept after the transmit queue was
emptied, but this behavior is unreliable as the state can be reset
whenever migration happens.

Always reset the Tx packet state after the queue is emptied.

Signed-off-by: Akihiko Odaki 
---
 hw/net/vmxnet3.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/net/vmxnet3.c b/hw/net/vmxnet3.c
index 05f41b6dfa..18b9edfdb2 100644
--- a/hw/net/vmxnet3.c
+++ b/hw/net/vmxnet3.c
@@ -681,6 +681,8 @@ static void vmxnet3_process_tx_queue(VMXNET3State *s, int 
qidx)
  net_tx_pkt_unmap_frag_pci, PCI_DEVICE(s));
 }
 }
+
+net_tx_pkt_reset(s->tx_pkt, net_tx_pkt_unmap_frag_pci, PCI_DEVICE(s));
 }
 
 static inline void
@@ -1159,7 +1161,6 @@ static void vmxnet3_deactivate_device(VMXNET3State *s)
 {
 if (s->device_active) {
 VMW_CBPRN("Deactivating vmxnet3...");
-net_tx_pkt_reset(s->tx_pkt, net_tx_pkt_unmap_frag_pci, PCI_DEVICE(s));
 net_tx_pkt_uninit(s->tx_pkt);
 net_rx_pkt_uninit(s->rx_pkt);
 s->device_active = false;
-- 
2.40.0




Re: [PATCH] tests/unit/test-blockjob: Re-enable complete_in_standby test

2023-04-26 Thread Emanuele Giuseppe Esposito



Am 26/04/2023 um 10:45 schrieb Thomas Huth:
> On 26/04/2023 10.16, Emanuele Giuseppe Esposito wrote:
>> Pause the job while draining so that pause_count will be
>> increased and bdrv_drain_all_end() won't reset it to 0, so the
>> job will not continue.
>>
>> With this fix, the test is not flaky anymore.
>>
>> Additionally remove useless aiocontext lock around bdrv_drain_all_end()
>> in test_complete_in_standby().
>>
>> Fixes: b6903cbe3a2 "tests/unit/test-blockjob: Disable
>> complete_in_standby test"
>> Suggested-by: Hanna Czenczek 
>> Signed-off-by: Emanuele Giuseppe Esposito 
>> ---
>>   tests/unit/test-blockjob.c | 17 +++--
>>   1 file changed, 7 insertions(+), 10 deletions(-)
>>
>> diff --git a/tests/unit/test-blockjob.c b/tests/unit/test-blockjob.c
>> index a130f6fefb..46d720aeee 100644
>> --- a/tests/unit/test-blockjob.c
>> +++ b/tests/unit/test-blockjob.c
>> @@ -488,11 +488,15 @@ static void test_complete_in_standby(void)
>>   bdrv_drain_all_begin();
>>   assert_job_status_is(job, JOB_STATUS_STANDBY);
>>   +    /*
>> + * Increase pause_count so that the counter is
>> + * unbalanced and job won't resume
>> + */
>> +    job_pause(job);
>> +
>>   /* Lock the IO thread to prevent the job from being run */
> 
> I guess the above comment should now be removed, too?
> 
>> -    aio_context_acquire(ctx);
>>   /* This will schedule the job to resume it */
>>   bdrv_drain_all_end();
>> -    aio_context_release(ctx);
> 
>  Thomas
> 
Makes sense, resending

Emanuele




[PATCH v4 45/48] igb: Clear-on-read ICR when ICR.INTA is set

2023-04-26 Thread Akihiko Odaki
For GPIE.NSICR, Section 7.3.2.1.2 says:
> ICR bits are cleared on register read. If GPIE.NSICR = 0b, then the
> clear on read occurs only if no bit is set in the IMS or at least one
> bit is set in the IMS and there is a true interrupt as reflected in
> ICR.INTA.

e1000e does something similar, though it checks for CTRL_EXT.IAME, which
does not exist on igb.

Suggested-by: Sriram Yagnaraman 
Signed-off-by: Akihiko Odaki 
---
 hw/net/igb_core.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c
index b68e24c9ee..29190054c6 100644
--- a/hw/net/igb_core.c
+++ b/hw/net/igb_core.c
@@ -2598,6 +2598,8 @@ igb_mac_icr_read(IGBCore *core, int index)
 } else if (core->mac[IMS] == 0) {
 trace_e1000e_irq_icr_clear_zero_ims();
 igb_lower_interrupts(core, ICR, 0x);
+} else if (core->mac[ICR] & E1000_ICR_INT_ASSERTED) {
+igb_lower_interrupts(core, ICR, 0x);
 } else if (!msix_enabled(core->owner)) {
 trace_e1000e_irq_icr_clear_nonmsix_icr_read();
 igb_lower_interrupts(core, ICR, 0x);
-- 
2.40.0
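The quoted datasheet rule condenses to a small predicate over GPIE, ICR and IMS. A simplified model of the clear-on-read decision — the register bit values are assumed for illustration, and the real code additionally distinguishes MSI-X mode and IAM handling:

```c
#include <stdbool.h>
#include <stdint.h>

#define E1000_ICR_INT_ASSERTED 0x80000000u /* ICR.INTA, assumed value */
#define E1000_GPIE_NSICR       0x00000001u /* GPIE.NSICR, assumed value */

/* Return whether a read of ICR clears it: always with NSICR set;
 * otherwise only if no IMS bit is set, or at least one is set and a
 * true interrupt is reflected in ICR.INTA. */
static bool icr_read_clears(uint32_t gpie, uint32_t icr, uint32_t ims)
{
    if (gpie & E1000_GPIE_NSICR) {
        return true;
    }
    return ims == 0 || (icr & E1000_ICR_INT_ASSERTED);
}
```

The new `else if` branch in igb_mac_icr_read() corresponds to the `icr & E1000_ICR_INT_ASSERTED` case of this predicate.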




[PATCH v4 32/48] hw/net/net_rx_pkt: Enforce alignment for eth_header

2023-04-26 Thread Akihiko Odaki
eth_strip_vlan and eth_strip_vlan_ex refer to ehdr_buf as a struct
eth_header. Enforce alignment for the structure.

Signed-off-by: Akihiko Odaki 
Reviewed-by: Sriram Yagnaraman 
---
 hw/net/net_rx_pkt.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/hw/net/net_rx_pkt.c b/hw/net/net_rx_pkt.c
index 6125a063d7..1de42b4f51 100644
--- a/hw/net/net_rx_pkt.c
+++ b/hw/net/net_rx_pkt.c
@@ -23,7 +23,10 @@
 
 struct NetRxPkt {
 struct virtio_net_hdr virt_hdr;
-uint8_t ehdr_buf[sizeof(struct eth_header) + sizeof(struct vlan_header)];
+struct {
+struct eth_header eth;
+struct vlan_header vlan;
+} ehdr_buf;
 struct iovec *vec;
 uint16_t vec_len_total;
 uint16_t vec_len;
@@ -89,7 +92,7 @@ net_rx_pkt_pull_data(struct NetRxPkt *pkt,
 if (pkt->ehdr_buf_len) {
 net_rx_pkt_iovec_realloc(pkt, iovcnt + 1);
 
-pkt->vec[0].iov_base = pkt->ehdr_buf;
+pkt->vec[0].iov_base = &pkt->ehdr_buf;
 pkt->vec[0].iov_len = pkt->ehdr_buf_len;
 
 pkt->tot_len = pllen + pkt->ehdr_buf_len;
@@ -120,7 +123,7 @@ void net_rx_pkt_attach_iovec(struct NetRxPkt *pkt,
 assert(pkt);
 
 if (strip_vlan) {
-pkt->ehdr_buf_len = eth_strip_vlan(iov, iovcnt, iovoff, pkt->ehdr_buf,
+pkt->ehdr_buf_len = eth_strip_vlan(iov, iovcnt, iovoff, &pkt->ehdr_buf,
&ploff, &tci);
 } else {
 pkt->ehdr_buf_len = 0;
@@ -142,7 +145,7 @@ void net_rx_pkt_attach_iovec_ex(struct NetRxPkt *pkt,
 
 if (strip_vlan) {
 pkt->ehdr_buf_len = eth_strip_vlan_ex(iov, iovcnt, iovoff, vet,
-  pkt->ehdr_buf,
+  &pkt->ehdr_buf,
   &ploff, &tci);
 } else {
 pkt->ehdr_buf_len = 0;
-- 
2.40.0
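A bare `uint8_t` array only guarantees alignment 1, while a struct is aligned for its most strictly aligned member, so pointer casts to the member types become well-defined. A toy illustration of the change with stand-in header types (the real definitions live in include/net/eth.h and may differ in packing):

```c
#include <stdalign.h>
#include <stdint.h>

/* Stand-in header types for illustration only. */
struct eth_header  { uint8_t  h_dest[6], h_source[6]; uint16_t h_proto; };
struct vlan_header { uint16_t h_tci, h_proto; };

/* The struct wrapper replaces the old
 * uint8_t ehdr_buf[sizeof(eth) + sizeof(vlan)], carrying the alignment
 * of its members with it. */
struct ehdr_buf {
    struct eth_header  eth;
    struct vlan_header vlan;
};

_Static_assert(_Alignof(struct ehdr_buf) >= _Alignof(struct eth_header),
               "buffer aligned for eth_header access");
```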




[PATCH v4 35/48] igb: Use UDP for RSS hash

2023-04-26 Thread Akihiko Odaki
e1000e does not support using UDP for RSS hash, but igb does.

Signed-off-by: Akihiko Odaki 
Reviewed-by: Sriram Yagnaraman 
---
 hw/net/igb_regs.h |  3 +++
 hw/net/igb_core.c | 16 
 2 files changed, 19 insertions(+)

diff --git a/hw/net/igb_regs.h b/hw/net/igb_regs.h
index eb995d8b2e..e6ac26dc0e 100644
--- a/hw/net/igb_regs.h
+++ b/hw/net/igb_regs.h
@@ -659,6 +659,9 @@ union e1000_adv_rx_desc {
 
 #define E1000_RSS_QUEUE(reta, hash) (E1000_RETA_VAL(reta, hash) & 0x0F)
 
+#define E1000_MRQ_RSS_TYPE_IPV4UDP 7
+#define E1000_MRQ_RSS_TYPE_IPV6UDP 8
+
 #define E1000_STATUS_IOV_MODE 0x0004
 
 #define E1000_STATUS_NUM_VFS_SHIFT 14
diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c
index 6d55b43fb4..41a2e5bf7b 100644
--- a/hw/net/igb_core.c
+++ b/hw/net/igb_core.c
@@ -287,6 +287,11 @@ igb_rss_get_hash_type(IGBCore *core, struct NetRxPkt *pkt)
 return E1000_MRQ_RSS_TYPE_IPV4TCP;
 }
 
+if (l4hdr_proto == ETH_L4_HDR_PROTO_UDP &&
+(core->mac[MRQC] & E1000_MRQC_RSS_FIELD_IPV4_UDP)) {
+return E1000_MRQ_RSS_TYPE_IPV4UDP;
+}
+
 if (E1000_MRQC_EN_IPV4(core->mac[MRQC])) {
 return E1000_MRQ_RSS_TYPE_IPV4;
 }
@@ -322,6 +327,11 @@ igb_rss_get_hash_type(IGBCore *core, struct NetRxPkt *pkt)
 return E1000_MRQ_RSS_TYPE_IPV6TCPEX;
 }
 
+if (l4hdr_proto == ETH_L4_HDR_PROTO_UDP &&
+(core->mac[MRQC] & E1000_MRQC_RSS_FIELD_IPV6_UDP)) {
+return E1000_MRQ_RSS_TYPE_IPV6UDP;
+}
+
 if (E1000_MRQC_EN_IPV6EX(core->mac[MRQC])) {
 return E1000_MRQ_RSS_TYPE_IPV6EX;
 }
@@ -360,6 +370,12 @@ igb_rss_calc_hash(IGBCore *core, struct NetRxPkt *pkt, 
E1000E_RSSInfo *info)
 case E1000_MRQ_RSS_TYPE_IPV6EX:
 type = NetPktRssIpV6Ex;
 break;
+case E1000_MRQ_RSS_TYPE_IPV4UDP:
+type = NetPktRssIpV4Udp;
+break;
+case E1000_MRQ_RSS_TYPE_IPV6UDP:
+type = NetPktRssIpV6Udp;
+break;
 default:
 assert(false);
 return 0;
-- 
2.40.0




[PATCH v4 43/48] e1000e: Notify only new interrupts

2023-04-26 Thread Akihiko Odaki
In MSI-X mode, if there are interrupts already notified but not cleared
and a new interrupt arrives, e1000e incorrectly notifies the notified
ones again along with the new one.

To fix this issue, replace e1000e_update_interrupt_state() with
two new functions: e1000e_raise_interrupts() and
e1000e_lower_interrupts(). These functions do not only raise or lower
interrupts; they also perform the register writes that update the
interrupt state. Before performing a register write, they determine
the interrupts already raised, and compare them with the interrupts
raised after the write to determine which interrupts to notify.

The introduction of these functions renders obsolete the tracepoints
that assumed the caller of e1000e_update_interrupt_state() performs the
register writes. These tracepoints are now removed, and alternative
ones are added to the new functions.

Signed-off-by: Akihiko Odaki 
---
 hw/net/e1000e_core.h |   2 -
 hw/net/e1000e_core.c | 153 +++
 hw/net/trace-events  |   2 +
 3 files changed, 69 insertions(+), 88 deletions(-)

diff --git a/hw/net/e1000e_core.h b/hw/net/e1000e_core.h
index 213a70530d..66b025cc43 100644
--- a/hw/net/e1000e_core.h
+++ b/hw/net/e1000e_core.h
@@ -111,8 +111,6 @@ struct E1000Core {
 PCIDevice *owner;
 void (*owner_start_recv)(PCIDevice *d);
 
-uint32_t msi_causes_pending;
-
 int64_t timadj;
 };
 
diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
index d601386992..9f185d099c 100644
--- a/hw/net/e1000e_core.c
+++ b/hw/net/e1000e_core.c
@@ -165,14 +165,14 @@ e1000e_intrmgr_on_throttling_timer(void *opaque)
 
 timer->running = false;
 
-if (msi_enabled(timer->core->owner)) {
-trace_e1000e_irq_msi_notify_postponed();
-/* Clear msi_causes_pending to fire MSI eventually */
-timer->core->msi_causes_pending = 0;
-e1000e_set_interrupt_cause(timer->core, 0);
-} else {
-trace_e1000e_irq_legacy_notify_postponed();
-e1000e_set_interrupt_cause(timer->core, 0);
+if (timer->core->mac[IMS] & timer->core->mac[ICR]) {
+if (msi_enabled(timer->core->owner)) {
+trace_e1000e_irq_msi_notify_postponed();
+msi_notify(timer->core->owner, 0);
+} else {
+trace_e1000e_irq_legacy_notify_postponed();
+e1000e_raise_legacy_irq(timer->core);
+}
 }
 }
 
@@ -366,10 +366,6 @@ static void
 e1000e_intrmgr_fire_all_timers(E1000ECore *core)
 {
 int i;
-uint32_t val = e1000e_intmgr_collect_delayed_causes(core);
-
-trace_e1000e_irq_adding_delayed_causes(val, core->mac[ICR]);
-core->mac[ICR] |= val;
 
 if (core->itr.running) {
 timer_del(core->itr.timer);
@@ -1974,13 +1970,6 @@ 
void(*e1000e_phyreg_writeops[E1000E_PHY_PAGES][E1000E_PHY_PAGE_SIZE])
 }
 };
 
-static inline void
-e1000e_clear_ims_bits(E1000ECore *core, uint32_t bits)
-{
-trace_e1000e_irq_clear_ims(bits, core->mac[IMS], core->mac[IMS] & ~bits);
-core->mac[IMS] &= ~bits;
-}
-
 static inline bool
 e1000e_postpone_interrupt(E1000IntrDelayTimer *timer)
 {
@@ -2038,7 +2027,6 @@ e1000e_msix_notify_one(E1000ECore *core, uint32_t cause, 
uint32_t int_cfg)
 effective_eiac = core->mac[EIAC] & cause;
 
 core->mac[ICR] &= ~effective_eiac;
-core->msi_causes_pending &= ~effective_eiac;
 
 if (!(core->mac[CTRL_EXT] & E1000_CTRL_EXT_IAME)) {
 core->mac[IMS] &= ~effective_eiac;
@@ -2130,33 +2118,17 @@ e1000e_fix_icr_asserted(E1000ECore *core)
 trace_e1000e_irq_fix_icr_asserted(core->mac[ICR]);
 }
 
-static void
-e1000e_send_msi(E1000ECore *core, bool msix)
+static void e1000e_raise_interrupts(E1000ECore *core,
+size_t index, uint32_t causes)
 {
-uint32_t causes = core->mac[ICR] & core->mac[IMS] & ~E1000_ICR_ASSERTED;
-
-core->msi_causes_pending &= causes;
-causes ^= core->msi_causes_pending;
-if (causes == 0) {
-return;
-}
-core->msi_causes_pending |= causes;
+bool is_msix = msix_enabled(core->owner);
+uint32_t old_causes = core->mac[IMS] & core->mac[ICR];
+uint32_t raised_causes;
 
-if (msix) {
-e1000e_msix_notify(core, causes);
-} else {
-if (!e1000e_itr_should_postpone(core)) {
-trace_e1000e_irq_msi_notify(causes);
-msi_notify(core->owner, 0);
-}
-}
-}
+trace_e1000e_irq_set(index << 2,
+ core->mac[index], core->mac[index] | causes);
 
-static void
-e1000e_update_interrupt_state(E1000ECore *core)
-{
-bool interrupts_pending;
-bool is_msix = msix_enabled(core->owner);
+core->mac[index] |= causes;
 
 /* Set ICR[OTHER] for MSI-X */
 if (is_msix) {
@@ -2178,40 +2150,58 @@ e1000e_update_interrupt_state(E1000ECore *core)
  */
 core->mac[ICS] = core->mac[ICR];
 
-interrupts_pending = (core->mac[IMS] & core->mac[ICR]) ? true : false;
-if (!interrupts_pending) {
-core->msi_causes_pending = 

[PATCH v2] tests/unit/test-blockjob: Re-enable complete_in_standby test

2023-04-26 Thread Emanuele Giuseppe Esposito
Pause the job while draining so that pause_count is increased and
bdrv_drain_all_end() will not reset it to 0; the job therefore will
not continue.

With this fix, the test is not flaky anymore.

Additionally, remove the useless AioContext lock around
bdrv_drain_all_end() in test_complete_in_standby().

Fixes: b6903cbe3a2 "tests/unit/test-blockjob: Disable
complete_in_standby test"
Suggested-by: Hanna Czenczek 
Signed-off-by: Emanuele Giuseppe Esposito 
---
 tests/unit/test-blockjob.c | 18 +++---
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/tests/unit/test-blockjob.c b/tests/unit/test-blockjob.c
index a130f6fefb..7c03958feb 100644
--- a/tests/unit/test-blockjob.c
+++ b/tests/unit/test-blockjob.c
@@ -488,11 +488,14 @@ static void test_complete_in_standby(void)
 bdrv_drain_all_begin();
 assert_job_status_is(job, JOB_STATUS_STANDBY);
 
-/* Lock the IO thread to prevent the job from being run */
-aio_context_acquire(ctx);
+/*
+ * Increase pause_count so that the counter is
+ * unbalanced and job won't resume
+ */
+job_pause(job);
+
 /* This will schedule the job to resume it */
 bdrv_drain_all_end();
-aio_context_release(ctx);
 
 WITH_JOB_LOCK_GUARD() {
 /* But the job cannot run, so it will remain on standby */
@@ -531,13 +534,6 @@ int main(int argc, char **argv)
 g_test_add_func("/blockjob/cancel/standby", test_cancel_standby);
 g_test_add_func("/blockjob/cancel/pending", test_cancel_pending);
 g_test_add_func("/blockjob/cancel/concluded", test_cancel_concluded);
-
-/*
- * This test is flaky and sometimes fails in CI and otherwise:
- * don't run unless user opts in via environment variable.
- */
-if (getenv("QEMU_TEST_FLAKY_TESTS")) {
-g_test_add_func("/blockjob/complete_in_standby", 
test_complete_in_standby);
-}
+g_test_add_func("/blockjob/complete_in_standby", test_complete_in_standby);
 return g_test_run();
 }
-- 
2.39.1




[PATCH v4 42/48] igb: Implement Tx timestamp

2023-04-26 Thread Akihiko Odaki
Signed-off-by: Akihiko Odaki 
Reviewed-by: Sriram Yagnaraman 
---
 hw/net/igb_regs.h | 3 +++
 hw/net/igb_core.c | 7 +++
 2 files changed, 10 insertions(+)

diff --git a/hw/net/igb_regs.h b/hw/net/igb_regs.h
index 894705599d..82ff195dfc 100644
--- a/hw/net/igb_regs.h
+++ b/hw/net/igb_regs.h
@@ -322,6 +322,9 @@ union e1000_adv_rx_desc {
 /* E1000_EITR_CNT_IGNR is only for 82576 and newer */
 #define E1000_EITR_CNT_IGNR 0x8000 /* Don't reset counters on write */
 
+#define E1000_TSYNCTXCTL_VALID0x0001 /* tx timestamp valid */
+#define E1000_TSYNCTXCTL_ENABLED  0x0010 /* enable tx timestampping */
+
 /* PCI Express Control */
 #define E1000_GCR_CMPL_TMOUT_MASK   0xF000
 #define E1000_GCR_CMPL_TMOUT_10ms   0x1000
diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c
index e0433ae7cf..57973c3ae4 100644
--- a/hw/net/igb_core.c
+++ b/hw/net/igb_core.c
@@ -659,6 +659,13 @@ igb_process_tx_desc(IGBCore *core,
 tx->ctx[idx].vlan_macip_lens >> IGB_TX_FLAGS_VLAN_SHIFT,
 !!(tx->first_cmd_type_len & E1000_TXD_CMD_VLE));
 
+if ((tx->first_cmd_type_len & E1000_ADVTXD_MAC_TSTAMP) &&
+(core->mac[TSYNCTXCTL] & E1000_TSYNCTXCTL_ENABLED) &&
+!(core->mac[TSYNCTXCTL] & E1000_TSYNCTXCTL_VALID)) {
+core->mac[TSYNCTXCTL] |= E1000_TSYNCTXCTL_VALID;
+e1000x_timestamp(core->mac, core->timadj, TXSTMPL, TXSTMPH);
+}
+
 if (igb_tx_pkt_send(core, tx, queue_index)) {
 igb_on_tx_done_update_stats(core, tx->tx_pkt, queue_index);
 }
-- 
2.40.0




[PATCH v4 41/48] igb: Implement Rx PTP2 timestamp

2023-04-26 Thread Akihiko Odaki
Signed-off-by: Akihiko Odaki 
---
 hw/net/igb_common.h |  16 +++---
 hw/net/igb_regs.h   |  23 
 hw/net/igb_core.c   | 129 
 3 files changed, 127 insertions(+), 41 deletions(-)

diff --git a/hw/net/igb_common.h b/hw/net/igb_common.h
index f2a9065791..5c261ba9d3 100644
--- a/hw/net/igb_common.h
+++ b/hw/net/igb_common.h
@@ -51,7 +51,7 @@
defreg_indexeda(x, 0), defreg_indexeda(x, 1), \
defreg_indexeda(x, 2), defreg_indexeda(x, 3)
 
-#define defregv(x) defreg_indexed(x, 0), defreg_indexed(x, 1),   \
+#define defreg8(x) defreg_indexed(x, 0), defreg_indexed(x, 1),   \
defreg_indexed(x, 2), defreg_indexed(x, 3),   \
defreg_indexed(x, 4), defreg_indexed(x, 5),   \
defreg_indexed(x, 6), defreg_indexed(x, 7)
@@ -122,6 +122,8 @@ enum {
 defreg(EICS),defreg(EIMS),defreg(EIMC),   defreg(EIAM),
 defreg(EICR),defreg(IVAR_MISC),   defreg(GPIE),
 
+defreg(TSYNCRXCFG), defreg8(ETQF),
+
 defreg(RXPBS),  defregd(RDBAL),   defregd(RDBAH), 
defregd(RDLEN),
 defregd(SRRCTL),defregd(RDH), defregd(RDT),
 defregd(RXDCTL),defregd(RXCTL),   defregd(RQDPC), defreg(RA2),
@@ -133,15 +135,15 @@ enum {
 
 defreg(VT_CTL),
 
-defregv(P2VMAILBOX), defregv(V2PMAILBOX), defreg(MBVFICR),
defreg(MBVFIMR),
+defreg8(P2VMAILBOX), defreg8(V2PMAILBOX), defreg(MBVFICR),
defreg(MBVFIMR),
 defreg(VFLRE),   defreg(VFRE),defreg(VFTE),   defreg(WVBR),
 defreg(QDE), defreg(DTXSWC),  defreg_indexed(VLVF, 0),
-defregv(VMOLR),  defreg(RPLOLR),  defregv(VMBMEM),
defregv(VMVIR),
+defreg8(VMOLR),  defreg(RPLOLR),  defreg8(VMBMEM),
defreg8(VMVIR),
 
-defregv(PVTCTRL),defregv(PVTEICS),defregv(PVTEIMS),   
defregv(PVTEIMC),
-defregv(PVTEIAC),defregv(PVTEIAM),defregv(PVTEICR),   
defregv(PVFGPRC),
-defregv(PVFGPTC),defregv(PVFGORC),defregv(PVFGOTC),   
defregv(PVFMPRC),
-defregv(PVFGPRLBC),  defregv(PVFGPTLBC),  defregv(PVFGORLBC), 
defregv(PVFGOTLBC),
+defreg8(PVTCTRL),defreg8(PVTEICS),defreg8(PVTEIMS),   
defreg8(PVTEIMC),
+defreg8(PVTEIAC),defreg8(PVTEIAM),defreg8(PVTEICR),   
defreg8(PVFGPRC),
+defreg8(PVFGPTC),defreg8(PVFGORC),defreg8(PVFGOTC),   
defreg8(PVFMPRC),
+defreg8(PVFGPRLBC),  defreg8(PVFGPTLBC),  defreg8(PVFGORLBC), 
defreg8(PVFGOTLBC),
 
 defreg(MTA_A),
 
diff --git a/hw/net/igb_regs.h b/hw/net/igb_regs.h
index 4b4ebd3369..894705599d 100644
--- a/hw/net/igb_regs.h
+++ b/hw/net/igb_regs.h
@@ -210,6 +210,15 @@ union e1000_adv_rx_desc {
 #define E1000_DCA_TXCTRL_CPUID_SHIFT 24 /* Tx CPUID now in the last byte */
 #define E1000_DCA_RXCTRL_CPUID_SHIFT 24 /* Rx CPUID now in the last byte */
 
+/* ETQF register bit definitions */
+#define E1000_ETQF_FILTER_ENABLE   BIT(26)
+#define E1000_ETQF_1588BIT(30)
+#define E1000_ETQF_IMM_INT BIT(29)
+#define E1000_ETQF_QUEUE_ENABLEBIT(31)
+#define E1000_ETQF_QUEUE_SHIFT 16
+#define E1000_ETQF_QUEUE_MASK  0x0007
+#define E1000_ETQF_ETYPE_MASK  0x
+
 #define E1000_DTXSWC_MAC_SPOOF_MASK   0x00FF /* Per VF MAC spoof control */
 #define E1000_DTXSWC_VLAN_SPOOF_MASK  0xFF00 /* Per VF VLAN spoof control 
*/
 #define E1000_DTXSWC_LLE_MASK 0x00FF /* Per VF Local LB enables */
@@ -384,6 +393,20 @@ union e1000_adv_rx_desc {
 #define E1000_FRTIMER   0x01048  /* Free Running Timer - RW */
 #define E1000_FCRTV 0x02460  /* Flow Control Refresh Timer Value - RW */
 
+#define E1000_TSYNCRXCFG 0x05F50 /* Time Sync Rx Configuration - RW */
+
+/* Filtering Registers */
+#define E1000_SAQF(_n) (0x5980 + 4 * (_n))
+#define E1000_DAQF(_n) (0x59A0 + 4 * (_n))
+#define E1000_SPQF(_n) (0x59C0 + 4 * (_n))
+#define E1000_FTQF(_n) (0x59E0 + 4 * (_n))
+#define E1000_SAQF0 E1000_SAQF(0)
+#define E1000_DAQF0 E1000_DAQF(0)
+#define E1000_SPQF0 E1000_SPQF(0)
+#define E1000_FTQF0 E1000_FTQF(0)
+#define E1000_SYNQF(_n) (0x055FC + (4 * (_n))) /* SYN Packet Queue Fltr */
+#define E1000_ETQF(_n)  (0x05CB0 + (4 * (_n))) /* EType Queue Fltr */
+
 #define E1000_RQDPC(_n) (0x0C030 + ((_n) * 0x40))
 
 #define E1000_RXPBS 0x02404  /* Rx Packet Buffer Size - RW */
diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c
index c04ec01117..e0433ae7cf 100644
--- a/hw/net/igb_core.c
+++ b/hw/net/igb_core.c
@@ -72,6 +72,24 @@ typedef struct L2Header {
 struct vlan_header vlan[2];
 } L2Header;
 
+typedef struct PTP2 {
+uint8_t message_id_transport_specific;
+uint8_t version_ptp;
+uint16_t message_length;
+uint8_t subdomain_number;
+uint8_t reserved0;
+uint16_t flags;
+uint64_t correction;
+uint8_t reserved1[5];
+uint8_t source_communication_technology;
+uint32_t source_uuid_lo;
+uint16_t source_uuid_hi;
+uint16_t source_port_id;
+uint16_t sequ

[PATCH v4 19/48] igb: Always log status after building rx metadata

2023-04-26 Thread Akihiko Odaki
Without this change, the status flags may not be traced e.g. if checksum
offloading is disabled.

Signed-off-by: Akihiko Odaki 
Reviewed-by: Philippe Mathieu-Daudé 
---
 hw/net/igb_core.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c
index 209fdad862..946b917f91 100644
--- a/hw/net/igb_core.c
+++ b/hw/net/igb_core.c
@@ -1303,9 +1303,8 @@ igb_build_rx_metadata(IGBCore *core,
 trace_e1000e_rx_metadata_l4_cso_disabled();
 }
 
-trace_e1000e_rx_metadata_status_flags(*status_flags);
-
 func_exit:
+trace_e1000e_rx_metadata_status_flags(*status_flags);
 *status_flags = cpu_to_le32(*status_flags);
 }
 
-- 
2.40.0




[PATCH v4 48/48] docs/system/devices/igb: Note igb is tested for DPDK

2023-04-26 Thread Akihiko Odaki
Signed-off-by: Akihiko Odaki 
---
 docs/system/devices/igb.rst | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/docs/system/devices/igb.rst b/docs/system/devices/igb.rst
index afe036dad2..60c10bf7c7 100644
--- a/docs/system/devices/igb.rst
+++ b/docs/system/devices/igb.rst
@@ -14,7 +14,8 @@ Limitations
 ===
 
 This igb implementation was tested with Linux Test Project [2]_ and Windows HLK
-[3]_ during the initial development. The command used when testing with LTP is:
+[3]_ during the initial development. Later it was also tested with DPDK Test
+Suite [4]_. The command used when testing with LTP is:
 
 .. code-block:: shell
 
@@ -22,8 +23,8 @@ This igb implementation was tested with Linux Test Project 
[2]_ and Windows HLK
 
 Be aware that this implementation lacks many functionalities available with the
 actual hardware, and you may experience various failures if you try to use it
-with a different operating system other than Linux and Windows or if you try
-functionalities not covered by the tests.
+with a different operating system other than DPDK, Linux, and Windows or if you
+try functionalities not covered by the tests.
 
 Using igb
 =
@@ -32,7 +33,7 @@ Using igb should be nothing different from using another 
network device. See
 :ref:`pcsys_005fnetwork` in general.
 
 However, you may also need to perform additional steps to activate SR-IOV
-feature on your guest. For Linux, refer to [4]_.
+feature on your guest. For Linux, refer to [5]_.
 
 Developing igb
 ==
@@ -68,4 +69,5 @@ References
 .. [1] 
https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/82576eb-gigabit-ethernet-controller-datasheet.pdf
 .. [2] https://github.com/linux-test-project/ltp
 .. [3] https://learn.microsoft.com/en-us/windows-hardware/test/hlk/
-.. [4] https://docs.kernel.org/PCI/pci-iov-howto.html
+.. [4] https://doc.dpdk.org/dts/gsg/
+.. [5] https://docs.kernel.org/PCI/pci-iov-howto.html
-- 
2.40.0




Re: [PATCH 01/13] hw/ide/pci: Expose legacy interrupts as GPIOs

2023-04-26 Thread Mark Cave-Ayland

On 22/04/2023 16:07, Bernhard Beschow wrote:


Exposing the legacy IDE interrupts as GPIOs allows them to be connected in the
parent device through qdev_connect_gpio_out(), i.e. without accessing private
data of TYPE_PCI_IDE.

Signed-off-by: Bernhard Beschow 
---
  hw/ide/pci.c | 8 
  1 file changed, 8 insertions(+)

diff --git a/hw/ide/pci.c b/hw/ide/pci.c
index fc9224bbc9..942e216b9b 100644
--- a/hw/ide/pci.c
+++ b/hw/ide/pci.c
@@ -522,10 +522,18 @@ void bmdma_init(IDEBus *bus, BMDMAState *bm, PCIIDEState 
*d)
  bm->pci_dev = d;
  }
  
+static void pci_ide_init(Object *obj)

+{
+PCIIDEState *d = PCI_IDE(obj);
+
+qdev_init_gpio_out(DEVICE(d), d->isa_irq, ARRAY_SIZE(d->isa_irq));


Just one minor nit: can we make this qdev_init_gpio_out_named() and call it "isa-irq" 
to match? This is for 2 reasons: firstly these are PCI devices and so an unnamed 
IRQ/gpio could be considered to belong to PCI, and secondly it gives the gpio the 
same name as the struct field.


From my previous email I think this should supercede Phil's patch at 
https://patchew.org/QEMU/20230302224058.43315-1-phi...@linaro.org/20230302224058.43315-2-phi...@linaro.org/.



+}
+
  static const TypeInfo pci_ide_type_info = {
  .name = TYPE_PCI_IDE,
  .parent = TYPE_PCI_DEVICE,
  .instance_size = sizeof(PCIIDEState),
+.instance_init = pci_ide_init,
  .abstract = true,
  .interfaces = (InterfaceInfo[]) {
  { INTERFACE_CONVENTIONAL_PCI_DEVICE },


Otherwise:

Reviewed-by: Mark Cave-Ayland 


ATB,

Mark.



[PATCH v4 38/48] igb: Strip the second VLAN tag for extended VLAN

2023-04-26 Thread Akihiko Odaki
Signed-off-by: Akihiko Odaki 
---
 hw/net/net_rx_pkt.h  | 19 
 include/net/eth.h|  4 ++--
 hw/net/e1000e_core.c |  3 ++-
 hw/net/igb_core.c| 14 ++--
 hw/net/net_rx_pkt.c  | 15 +
 net/eth.c| 52 
 6 files changed, 65 insertions(+), 42 deletions(-)

diff --git a/hw/net/net_rx_pkt.h b/hw/net/net_rx_pkt.h
index ce8dbdb284..55ec67a1a7 100644
--- a/hw/net/net_rx_pkt.h
+++ b/hw/net/net_rx_pkt.h
@@ -223,18 +223,19 @@ void net_rx_pkt_attach_iovec(struct NetRxPkt *pkt,
 /**
 * attach scatter-gather data to rx packet
 *
-* @pkt:packet
-* @iov:received data scatter-gather list
-* @iovcnt  number of elements in iov
-* @iovoff  data start offset in the iov
-* @strip_vlan: should the module strip vlan from data
-* @vet:VLAN tag Ethernet type
+* @pkt:  packet
+* @iov:  received data scatter-gather list
+* @iovcnt:   number of elements in iov
+* @iovoff:   data start offset in the iov
+* @strip_vlan_index: index of Q tag if it is to be stripped. negative 
otherwise.
+* @vet:  VLAN tag Ethernet type
+* @vet_ext:  outer VLAN tag Ethernet type
 *
 */
 void net_rx_pkt_attach_iovec_ex(struct NetRxPkt *pkt,
-   const struct iovec *iov, int iovcnt,
-   size_t iovoff, bool strip_vlan,
-   uint16_t vet);
+const struct iovec *iov, int iovcnt,
+size_t iovoff, int strip_vlan_index,
+uint16_t vet, uint16_t vet_ext);
 
 /**
  * attach data to rx packet
diff --git a/include/net/eth.h b/include/net/eth.h
index 75e7f1551c..3b80b6e07f 100644
--- a/include/net/eth.h
+++ b/include/net/eth.h
@@ -347,8 +347,8 @@ eth_strip_vlan(const struct iovec *iov, int iovcnt, size_t 
iovoff,
uint16_t *payload_offset, uint16_t *tci);
 
 size_t
-eth_strip_vlan_ex(const struct iovec *iov, int iovcnt, size_t iovoff,
-  uint16_t vet, void *new_ehdr_buf,
+eth_strip_vlan_ex(const struct iovec *iov, int iovcnt, size_t iovoff, int 
index,
+  uint16_t vet, uint16_t vet_ext, void *new_ehdr_buf,
   uint16_t *payload_offset, uint16_t *tci);
 
 uint16_t
diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
index 0b939ff5a3..d601386992 100644
--- a/hw/net/e1000e_core.c
+++ b/hw/net/e1000e_core.c
@@ -1711,7 +1711,8 @@ e1000e_receive_internal(E1000ECore *core, const struct 
iovec *iov, int iovcnt,
 }
 
 net_rx_pkt_attach_iovec_ex(core->rx_pkt, iov, iovcnt, iov_ofs,
-   e1000x_vlan_enabled(core->mac), core->mac[VET]);
+   e1000x_vlan_enabled(core->mac) ? 0 : -1,
+   core->mac[VET], 0);
 
 e1000e_rss_parse_packet(core, core->rx_pkt, &rss_info);
 e1000e_rx_ring_init(core, &rxr, rss_info.queue);
diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c
index 5eacf1cd8c..688eaf7319 100644
--- a/hw/net/igb_core.c
+++ b/hw/net/igb_core.c
@@ -1611,6 +1611,7 @@ igb_receive_internal(IGBCore *core, const struct iovec 
*iov, int iovcnt,
 E1000E_RxRing rxr;
 E1000E_RSSInfo rss_info;
 size_t total_size;
+int strip_vlan_index;
 int i;
 
 trace_e1000e_rx_receive_iov(iovcnt);
@@ -1672,9 +1673,18 @@ igb_receive_internal(IGBCore *core, const struct iovec 
*iov, int iovcnt,
 
 igb_rx_ring_init(core, &rxr, i);
 
+if (!igb_rx_strip_vlan(core, rxr.i)) {
+strip_vlan_index = -1;
+} else if (core->mac[CTRL_EXT] & BIT(26)) {
+strip_vlan_index = 1;
+} else {
+strip_vlan_index = 0;
+}
+
 net_rx_pkt_attach_iovec_ex(core->rx_pkt, iov, iovcnt, iov_ofs,
-   igb_rx_strip_vlan(core, rxr.i),
-   core->mac[VET] & 0x);
+   strip_vlan_index,
+   core->mac[VET] & 0x,
+   core->mac[VET] >> 16);
 
 total_size = net_rx_pkt_get_total_len(core->rx_pkt) +
 e1000x_fcs_len(core->mac);
diff --git a/hw/net/net_rx_pkt.c b/hw/net/net_rx_pkt.c
index 3575c8b9f9..32e5f3f9cf 100644
--- a/hw/net/net_rx_pkt.c
+++ b/hw/net/net_rx_pkt.c
@@ -137,20 +137,17 @@ void net_rx_pkt_attach_iovec(struct NetRxPkt *pkt,
 
 void net_rx_pkt_attach_iovec_ex(struct NetRxPkt *pkt,
 const struct iovec *iov, int iovcnt,
-size_t iovoff, bool strip_vlan,
-uint16_t vet)
+size_t iovoff, int strip_vlan_index,
+uint16_t vet, uint16_t vet_ext)
 {
 uint16_t tci = 0;
 uint16_t ploff = iovoff;
 assert(pkt);
 
-if (strip_vlan) {
-pkt->

[PATCH v4 22/48] e1000e: Reset packet state after emptying Tx queue

2023-04-26 Thread Akihiko Odaki
Keeping Tx packet state after the transmit queue is emptied has some
problems:
- The datasheet says the descriptors can be reused after the transmit
  queue is emptied, but the Tx packet state may keep references to them.
- The Tx packet state cannot be migrated, so it is reset whenever a
  migration happens.

Always reset the Tx packet state after the queue is emptied.

Signed-off-by: Akihiko Odaki 
---
 hw/net/e1000e_core.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
index 6a213c0224..7dce448657 100644
--- a/hw/net/e1000e_core.c
+++ b/hw/net/e1000e_core.c
@@ -959,6 +959,8 @@ e1000e_start_xmit(E1000ECore *core, const E1000E_TxRing 
*txr)
 if (!ide || !e1000e_intrmgr_delay_tx_causes(core, &cause)) {
 e1000e_set_interrupt_cause(core, cause);
 }
+
+net_tx_pkt_reset(txr->tx->tx_pkt, net_tx_pkt_unmap_frag_pci, core->owner);
 }
 
 static bool
@@ -3389,8 +3391,6 @@ e1000e_core_pci_uninit(E1000ECore *core)
 qemu_del_vm_change_state_handler(core->vmstate);
 
 for (i = 0; i < E1000E_NUM_QUEUES; i++) {
-net_tx_pkt_reset(core->tx[i].tx_pkt,
- net_tx_pkt_unmap_frag_pci, core->owner);
 net_tx_pkt_uninit(core->tx[i].tx_pkt);
 }
 
@@ -3515,8 +3515,6 @@ static void e1000e_reset(E1000ECore *core, bool sw)
 e1000x_reset_mac_addr(core->owner_nic, core->mac, core->permanent_mac);
 
 for (i = 0; i < ARRAY_SIZE(core->tx); i++) {
-net_tx_pkt_reset(core->tx[i].tx_pkt,
- net_tx_pkt_unmap_frag_pci, core->owner);
 memset(&core->tx[i].props, 0, sizeof(core->tx[i].props));
 core->tx[i].skip_cp = false;
 }
-- 
2.40.0




[PATCH v4 47/48] MAINTAINERS: Add a reviewer for network packet abstractions

2023-04-26 Thread Akihiko Odaki
I have made significant changes to the network packet abstractions, so
add myself as a reviewer.

Signed-off-by: Akihiko Odaki 
Reviewed-by: Philippe Mathieu-Daudé 
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index c31d2279ab..8b2ef5943c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2214,6 +2214,7 @@ F: tests/qtest/fuzz-megasas-test.c
 
 Network packet abstractions
 M: Dmitry Fleytman 
+R: Akihiko Odaki 
 S: Maintained
 F: include/net/eth.h
 F: net/eth.c
-- 
2.40.0




[PATCH v4 40/48] igb: Implement igb-specific oversize check

2023-04-26 Thread Akihiko Odaki
igb has a configurable size limit for LPE, and uses different limits
depending on whether the packet is treated as a VLAN packet.

Signed-off-by: Akihiko Odaki 
Reviewed-by: Sriram Yagnaraman 
---
 hw/net/igb_core.c | 36 +---
 1 file changed, 21 insertions(+), 15 deletions(-)

diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c
index 5345f57031..c04ec01117 100644
--- a/hw/net/igb_core.c
+++ b/hw/net/igb_core.c
@@ -980,16 +980,13 @@ igb_rx_l4_cso_enabled(IGBCore *core)
 return !!(core->mac[RXCSUM] & E1000_RXCSUM_TUOFLD);
 }
 
-static bool
-igb_rx_is_oversized(IGBCore *core, uint16_t qn, size_t size)
+static bool igb_rx_is_oversized(IGBCore *core, const struct eth_header *ehdr,
+size_t size, size_t vlan_num,
+bool lpe, uint16_t rlpml)
 {
-uint16_t pool = qn % IGB_NUM_VM_POOLS;
-bool lpe = !!(core->mac[VMOLR0 + pool] & E1000_VMOLR_LPE);
-int max_ethernet_lpe_size =
-core->mac[VMOLR0 + pool] & E1000_VMOLR_RLPML_MASK;
-int max_ethernet_vlan_size = 1522;
-
-return size > (lpe ? max_ethernet_lpe_size : max_ethernet_vlan_size);
+size_t vlan_header_size = sizeof(struct vlan_header) * vlan_num;
+size_t header_size = sizeof(struct eth_header) + vlan_header_size;
+return lpe ? size + ETH_FCS_LEN > rlpml : size > header_size + ETH_MTU;
 }
 
 static uint16_t igb_receive_assign(IGBCore *core, const L2Header *l2_header,
@@ -1002,6 +999,8 @@ static uint16_t igb_receive_assign(IGBCore *core, const 
L2Header *l2_header,
 uint16_t queues = 0;
 uint16_t oversized = 0;
 size_t vlan_num = 0;
+bool lpe;
+uint16_t rlpml;
 int i;
 
 memset(rss_info, 0, sizeof(E1000E_RSSInfo));
@@ -1021,6 +1020,14 @@ static uint16_t igb_receive_assign(IGBCore *core, const 
L2Header *l2_header,
 }
 }
 
+lpe = !!(core->mac[RCTL] & E1000_RCTL_LPE);
+rlpml = core->mac[RLPML];
+if (!(core->mac[RCTL] & E1000_RCTL_SBP) &&
+igb_rx_is_oversized(core, ehdr, size, vlan_num, lpe, rlpml)) {
+trace_e1000x_rx_oversized(size);
+return queues;
+}
+
 if (vlan_num &&
 !e1000x_rx_vlan_filter(core->mac, l2_header->vlan + vlan_num - 1)) {
 return queues;
@@ -1106,7 +1113,11 @@ static uint16_t igb_receive_assign(IGBCore *core, const 
L2Header *l2_header,
 queues &= core->mac[VFRE];
 if (queues) {
 for (i = 0; i < IGB_NUM_VM_POOLS; i++) {
-if ((queues & BIT(i)) && igb_rx_is_oversized(core, i, size)) {
+lpe = !!(core->mac[VMOLR0 + i] & E1000_VMOLR_LPE);
+rlpml = core->mac[VMOLR0 + i] & E1000_VMOLR_RLPML_MASK;
+if ((queues & BIT(i)) &&
+igb_rx_is_oversized(core, ehdr, size, vlan_num,
+lpe, rlpml)) {
 oversized |= BIT(i);
 }
 }
@@ -1662,11 +1673,6 @@ igb_receive_internal(IGBCore *core, const struct iovec 
*iov, int iovcnt,
 iov_to_buf(iov, iovcnt, iov_ofs, &buf, sizeof(buf.l2_header));
 }
 
-/* Discard oversized packets if !LPE and !SBP. */
-if (e1000x_is_oversized(core->mac, size)) {
-return orig_size;
-}
-
 net_rx_pkt_set_packet_type(core->rx_pkt,
get_eth_packet_type(&buf.l2_header.eth));
 net_rx_pkt_set_protocols(core->rx_pkt, iov, iovcnt, iov_ofs);
-- 
2.40.0




[PATCH v4 39/48] igb: Filter with the second VLAN tag for extended VLAN

2023-04-26 Thread Akihiko Odaki
Signed-off-by: Akihiko Odaki 
---
 hw/net/igb_core.c | 23 ++-
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c
index 688eaf7319..5345f57031 100644
--- a/hw/net/igb_core.c
+++ b/hw/net/igb_core.c
@@ -69,7 +69,7 @@ typedef struct IGBTxPktVmdqCallbackContext {
 
 typedef struct L2Header {
 struct eth_header eth;
-struct vlan_header vlan;
+struct vlan_header vlan[2];
 } L2Header;
 
 static ssize_t
@@ -1001,7 +1001,7 @@ static uint16_t igb_receive_assign(IGBCore *core, const 
L2Header *l2_header,
 uint32_t f, ra[2], *macp, rctl = core->mac[RCTL];
 uint16_t queues = 0;
 uint16_t oversized = 0;
-uint16_t vid = be16_to_cpu(l2_header->vlan.h_tci) & VLAN_VID_MASK;
+size_t vlan_num = 0;
 int i;
 
 memset(rss_info, 0, sizeof(E1000E_RSSInfo));
@@ -1010,8 +1010,19 @@ static uint16_t igb_receive_assign(IGBCore *core, const 
L2Header *l2_header,
 *external_tx = true;
 }
 
-if (e1000x_is_vlan_packet(ehdr, core->mac[VET] & 0x) &&
-!e1000x_rx_vlan_filter(core->mac, PKT_GET_VLAN_HDR(ehdr))) {
+if (core->mac[CTRL_EXT] & BIT(26)) {
+if (be16_to_cpu(ehdr->h_proto) == core->mac[VET] >> 16 &&
+be16_to_cpu(l2_header->vlan[0].h_proto) == (core->mac[VET] & 
0x)) {
+vlan_num = 2;
+}
+} else {
+if (be16_to_cpu(ehdr->h_proto) == (core->mac[VET] & 0x)) {
+vlan_num = 1;
+}
+}
+
+if (vlan_num &&
+!e1000x_rx_vlan_filter(core->mac, l2_header->vlan + vlan_num - 1)) {
 return queues;
 }
 
@@ -1065,7 +1076,9 @@ static uint16_t igb_receive_assign(IGBCore *core, const 
L2Header *l2_header,
 if (e1000x_vlan_rx_filter_enabled(core->mac)) {
 uint16_t mask = 0;
 
-if (e1000x_is_vlan_packet(ehdr, core->mac[VET] & 0x)) {
+if (vlan_num) {
+uint16_t vid = be16_to_cpu(l2_header->vlan[vlan_num - 
1].h_tci) & VLAN_VID_MASK;
+
 for (i = 0; i < E1000_VLVF_ARRAY_SIZE; i++) {
 if ((core->mac[VLVF0 + i] & E1000_VLVF_VLANID_MASK) == vid 
&&
 (core->mac[VLVF0 + i] & E1000_VLVF_VLANID_ENABLE)) {
-- 
2.40.0




[PATCH v4 03/48] e1000x: Fix BPRC and MPRC

2023-04-26 Thread Akihiko Odaki
Before this change, e1000 and the common code updated BPRC and MPRC
depending on the matched filter, but e1000e and igb decided to update
those counters by deriving the packet type independently. This
inconsistency caused a multicast packet to be counted twice.

Updating BPRC and MPRC depending on the matched filter is fundamentally
flawed anyway, as
a filter can be used for different types of packets. For example, it is
possible to filter broadcast packets with MTA.

Always determine what counters to update by inspecting the packets.

Fixes: 3b27430177 ("e1000: Implementing various counters")
Signed-off-by: Akihiko Odaki 
Reviewed-by: Sriram Yagnaraman 
---
 hw/net/e1000x_common.h |  5 +++--
 hw/net/e1000.c |  6 +++---
 hw/net/e1000e_core.c   | 20 +++-
 hw/net/e1000x_common.c | 25 +++--
 hw/net/igb_core.c  | 22 +-
 5 files changed, 33 insertions(+), 45 deletions(-)

diff --git a/hw/net/e1000x_common.h b/hw/net/e1000x_common.h
index 911abd8a90..0298e06283 100644
--- a/hw/net/e1000x_common.h
+++ b/hw/net/e1000x_common.h
@@ -91,8 +91,9 @@ e1000x_update_regs_on_link_up(uint32_t *mac, uint16_t *phy)
 }
 
 void e1000x_update_rx_total_stats(uint32_t *mac,
-  size_t data_size,
-  size_t data_fcs_size);
+  eth_pkt_types_e pkt_type,
+  size_t pkt_size,
+  size_t pkt_fcs_size);
 
 void e1000x_core_prepare_eeprom(uint16_t   *eeprom,
 const uint16_t *templ,
diff --git a/hw/net/e1000.c b/hw/net/e1000.c
index 59bacb5d3b..18eb6d8876 100644
--- a/hw/net/e1000.c
+++ b/hw/net/e1000.c
@@ -826,12 +826,10 @@ receive_filter(E1000State *s, const uint8_t *buf, int 
size)
 }
 
 if (ismcast && (rctl & E1000_RCTL_MPE)) {  /* promiscuous mcast */
-e1000x_inc_reg_if_not_full(s->mac_reg, MPRC);
 return 1;
 }
 
 if (isbcast && (rctl & E1000_RCTL_BAM)) {  /* broadcast enabled */
-e1000x_inc_reg_if_not_full(s->mac_reg, BPRC);
 return 1;
 }
 
@@ -922,6 +920,7 @@ e1000_receive_iov(NetClientState *nc, const struct iovec 
*iov, int iovcnt)
 size_t desc_offset;
 size_t desc_size;
 size_t total_size;
+eth_pkt_types_e pkt_type;
 
 if (!e1000x_hw_rx_enabled(s->mac_reg)) {
 return -1;
@@ -971,6 +970,7 @@ e1000_receive_iov(NetClientState *nc, const struct iovec 
*iov, int iovcnt)
 size -= 4;
 }
 
+pkt_type = get_eth_packet_type(PKT_GET_ETH_HDR(filter_buf));
 rdh_start = s->mac_reg[RDH];
 desc_offset = 0;
 total_size = size + e1000x_fcs_len(s->mac_reg);
@@ -1036,7 +1036,7 @@ e1000_receive_iov(NetClientState *nc, const struct iovec 
*iov, int iovcnt)
 }
 } while (desc_offset < total_size);
 
-e1000x_update_rx_total_stats(s->mac_reg, size, total_size);
+e1000x_update_rx_total_stats(s->mac_reg, pkt_type, size, total_size);
 
 n = E1000_ICS_RXT0;
 if ((rdt = s->mac_reg[RDT]) < s->mac_reg[RDH])
diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
index 15821a75e0..c2d864a504 100644
--- a/hw/net/e1000e_core.c
+++ b/hw/net/e1000e_core.c
@@ -1488,24 +1488,10 @@ e1000e_write_to_rx_buffers(E1000ECore *core,
 }
 
 static void
-e1000e_update_rx_stats(E1000ECore *core,
-   size_t data_size,
-   size_t data_fcs_size)
+e1000e_update_rx_stats(E1000ECore *core, size_t pkt_size, size_t pkt_fcs_size)
 {
-e1000x_update_rx_total_stats(core->mac, data_size, data_fcs_size);
-
-switch (net_rx_pkt_get_packet_type(core->rx_pkt)) {
-case ETH_PKT_BCAST:
-e1000x_inc_reg_if_not_full(core->mac, BPRC);
-break;
-
-case ETH_PKT_MCAST:
-e1000x_inc_reg_if_not_full(core->mac, MPRC);
-break;
-
-default:
-break;
-}
+eth_pkt_types_e pkt_type = net_rx_pkt_get_packet_type(core->rx_pkt);
+e1000x_update_rx_total_stats(core->mac, pkt_type, pkt_size, pkt_fcs_size);
 }
 
 static inline bool
diff --git a/hw/net/e1000x_common.c b/hw/net/e1000x_common.c
index 4c8e7dcf70..7694673bcc 100644
--- a/hw/net/e1000x_common.c
+++ b/hw/net/e1000x_common.c
@@ -80,7 +80,6 @@ bool e1000x_rx_group_filter(uint32_t *mac, const uint8_t *buf)
 f = mta_shift[(rctl >> E1000_RCTL_MO_SHIFT) & 3];
 f = (((buf[5] << 8) | buf[4]) >> f) & 0xfff;
 if (mac[MTA + (f >> 5)] & (1 << (f & 0x1f))) {
-e1000x_inc_reg_if_not_full(mac, MPRC);
 return true;
 }
 
@@ -212,13 +211,14 @@ e1000x_rxbufsize(uint32_t rctl)
 
 void
 e1000x_update_rx_total_stats(uint32_t *mac,
- size_t data_size,
- size_t data_fcs_size)
+ eth_pkt_types_e pkt_type,
+ size_t pkt_size,
+ size_t pkt_fcs_size)
 {
 static const int PRCregs[6] = { PRC64, PRC127, PRC255, PRC511,
   

Re: [PATCH 02/13] hw/ide/via: Implement ISA IRQ routing

2023-04-26 Thread Mark Cave-Ayland

On 22/04/2023 16:07, Bernhard Beschow wrote:


The VIA south bridge allows the legacy IDE interrupts to be routed to four
different ISA interrupts. This can be configured through the 0x4a register in
the PCI configuration space of the ISA function. The default routing matches
the legacy ISA IRQs, that is 14 and 15.

Implement this missing piece of the VIA south bridge.

Signed-off-by: Bernhard Beschow 
---
  hw/ide/via.c  |  6 --
  hw/isa/vt82c686.c | 17 +
  2 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/hw/ide/via.c b/hw/ide/via.c
index 177baea9a7..0caae52276 100644
--- a/hw/ide/via.c
+++ b/hw/ide/via.c
@@ -31,6 +31,7 @@
  #include "sysemu/dma.h"
  #include "hw/isa/vt82c686.h"
  #include "hw/ide/pci.h"
+#include "hw/irq.h"
  #include "trace.h"
  
  static uint64_t bmdma_read(void *opaque, hwaddr addr,

@@ -104,7 +105,8 @@ static void bmdma_setup_bar(PCIIDEState *d)
  
  static void via_ide_set_irq(void *opaque, int n, int level)

  {
-PCIDevice *d = PCI_DEVICE(opaque);
+PCIIDEState *s = opaque;
+PCIDevice *d = PCI_DEVICE(s);
  
  if (level) {

  d->config[0x70 + n * 8] |= 0x80;
@@ -112,7 +114,7 @@ static void via_ide_set_irq(void *opaque, int n, int level)
  d->config[0x70 + n * 8] &= ~0x80;
  }
  
-via_isa_set_irq(pci_get_function_0(d), 14 + n, level);

+qemu_set_irq(s->isa_irq[n], level);
  }
  
  static void via_ide_reset(DeviceState *dev)

diff --git a/hw/isa/vt82c686.c b/hw/isa/vt82c686.c
index ca89119ce0..c7e29bb46a 100644
--- a/hw/isa/vt82c686.c
+++ b/hw/isa/vt82c686.c
@@ -568,9 +568,19 @@ static const VMStateDescription vmstate_via = {
  }
  };
  
+static void via_isa_set_ide_irq(void *opaque, int n, int level)

+{
+static const uint8_t irqs[] = { 14, 15, 10, 11 };
+ViaISAState *s = opaque;
+uint8_t irq = irqs[(s->dev.config[0x4a] >> (n * 2)) & 0x3];
+
+qemu_set_irq(s->isa_irqs_in[irq], level);
+}
+
  static void via_isa_init(Object *obj)
  {
  ViaISAState *s = VIA_ISA(obj);
+DeviceState *dev = DEVICE(s);
  
  object_initialize_child(obj, "rtc", &s->rtc, TYPE_MC146818_RTC);

  object_initialize_child(obj, "ide", &s->ide, TYPE_VIA_IDE);
@@ -578,6 +588,8 @@ static void via_isa_init(Object *obj)
   object_initialize_child(obj, "uhci2", &s->uhci[1], TYPE_VT82C686B_USB_UHCI);
  object_initialize_child(obj, "ac97", &s->ac97, TYPE_VIA_AC97);
  object_initialize_child(obj, "mc97", &s->mc97, TYPE_VIA_MC97);
+
+qdev_init_gpio_in_named(dev, via_isa_set_ide_irq, "ide", ARRAY_SIZE(s->ide.isa_irq));
  }
  
  static const TypeInfo via_isa_info = {

@@ -692,6 +704,10 @@ static void via_isa_realize(PCIDevice *d, Error **errp)
  if (!qdev_realize(DEVICE(&s->ide), BUS(pci_bus), errp)) {
  return;
  }
+for (i = 0; i < 2; i++) {
+qdev_connect_gpio_out(DEVICE(&s->ide), i,
+  qdev_get_gpio_in_named(DEVICE(s), "ide", i));
+}
  
  /* Functions 2-3: USB Ports */

  for (i = 0; i < ARRAY_SIZE(s->uhci); i++) {
@@ -814,6 +830,7 @@ static void vt8231_isa_reset(DeviceState *dev)
   PCI_COMMAND_MASTER | PCI_COMMAND_SPECIAL);
  pci_set_word(pci_conf + PCI_STATUS, PCI_STATUS_DEVSEL_MEDIUM);
  
+pci_conf[0x4a] = 0x04; /* IDE interrupt Routing */

  pci_conf[0x58] = 0x40; /* Miscellaneous Control 0 */
  pci_conf[0x67] = 0x08; /* Fast IR Config */
  pci_conf[0x6b] = 0x01; /* Fast IR I/O Base */


I see there is still some further discussion on the exact datasheet being used,
however the basic mechanism of wiring up the IDE IRQs using
qdev_connect_gpio_out{_named}() in via_isa_realize() looks good to me:


Reviewed-by: Mark Cave-Ayland 


ATB,

Mark.



Re: [PATCH 03/13] hw/isa/vt82c686: Remove via_isa_set_irq()

2023-04-26 Thread Mark Cave-Ayland

On 22/04/2023 16:07, Bernhard Beschow wrote:


Now that via_isa_set_irq() is unused it can be removed.

Signed-off-by: Bernhard Beschow 
---
  include/hw/isa/vt82c686.h | 2 --
  hw/isa/vt82c686.c | 6 --
  2 files changed, 8 deletions(-)

diff --git a/include/hw/isa/vt82c686.h b/include/hw/isa/vt82c686.h
index da1722daf2..b6e95b2851 100644
--- a/include/hw/isa/vt82c686.h
+++ b/include/hw/isa/vt82c686.h
@@ -34,6 +34,4 @@ struct ViaAC97State {
  uint32_t ac97_cmd;
  };
  
-void via_isa_set_irq(PCIDevice *d, int n, int level);

-
  #endif
diff --git a/hw/isa/vt82c686.c b/hw/isa/vt82c686.c
index c7e29bb46a..a69888a396 100644
--- a/hw/isa/vt82c686.c
+++ b/hw/isa/vt82c686.c
@@ -604,12 +604,6 @@ static const TypeInfo via_isa_info = {
  },
  };
  
-void via_isa_set_irq(PCIDevice *d, int n, int level)

-{
-ViaISAState *s = VIA_ISA(d);
-qemu_set_irq(s->isa_irqs_in[n], level);
-}
-
  static void via_isa_request_i8259_irq(void *opaque, int irq, int level)
  {
  ViaISAState *s = opaque;


Reviewed-by: Mark Cave-Ayland 


ATB,

Mark.



Re: [PATCH 04/13] hw/ide: Extract IDEBus assignment into bmdma_init()

2023-04-26 Thread Mark Cave-Ayland

On 22/04/2023 16:07, Bernhard Beschow wrote:


Every invocation of bmdma_init() is followed by `d->bmdma[i].bus = &d->bus[i]`.
Resolve this redundancy by extracting it into bmdma_init().

Signed-off-by: Bernhard Beschow 
---
  hw/ide/cmd646.c  | 1 -
  hw/ide/pci.c | 1 +
  hw/ide/piix.c| 1 -
  hw/ide/sii3112.c | 1 -
  hw/ide/via.c | 1 -
  5 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/hw/ide/cmd646.c b/hw/ide/cmd646.c
index a68357c1c5..a094a6e12a 100644
--- a/hw/ide/cmd646.c
+++ b/hw/ide/cmd646.c
@@ -297,7 +297,6 @@ static void pci_cmd646_ide_realize(PCIDevice *dev, Error **errp)
  ide_bus_init_output_irq(&d->bus[i], qdev_get_gpio_in(ds, i));
  
  bmdma_init(&d->bus[i], &d->bmdma[i], d);

-d->bmdma[i].bus = &d->bus[i];
  ide_bus_register_restart_cb(&d->bus[i]);
  }
  }
diff --git a/hw/ide/pci.c b/hw/ide/pci.c
index 942e216b9b..67e0998ff0 100644
--- a/hw/ide/pci.c
+++ b/hw/ide/pci.c
@@ -519,6 +519,7 @@ void bmdma_init(IDEBus *bus, BMDMAState *bm, PCIIDEState *d)
  bus->dma = &bm->dma;
  bm->irq = bus->irq;
  bus->irq = qemu_allocate_irq(bmdma_irq, bm, 0);
+bm->bus = bus;
  bm->pci_dev = d;
  }
  
diff --git a/hw/ide/piix.c b/hw/ide/piix.c

index 41d60921e3..a32f7ccece 100644
--- a/hw/ide/piix.c
+++ b/hw/ide/piix.c
@@ -144,7 +144,6 @@ static bool pci_piix_init_bus(PCIIDEState *d, unsigned i, Error **errp)
   ide_bus_init_output_irq(&d->bus[i], isa_get_irq(NULL, port_info[i].isairq));
  
  bmdma_init(&d->bus[i], &d->bmdma[i], d);

-d->bmdma[i].bus = &d->bus[i];
  ide_bus_register_restart_cb(&d->bus[i]);
  
  return true;

diff --git a/hw/ide/sii3112.c b/hw/ide/sii3112.c
index f9becdff8e..5dd3d03c29 100644
--- a/hw/ide/sii3112.c
+++ b/hw/ide/sii3112.c
@@ -287,7 +287,6 @@ static void sii3112_pci_realize(PCIDevice *dev, Error **errp)
  ide_bus_init_output_irq(&s->bus[i], qdev_get_gpio_in(ds, i));
  
  bmdma_init(&s->bus[i], &s->bmdma[i], s);

-s->bmdma[i].bus = &s->bus[i];
  ide_bus_register_restart_cb(&s->bus[i]);
  }
  }
diff --git a/hw/ide/via.c b/hw/ide/via.c
index 0caae52276..91253fa4ef 100644
--- a/hw/ide/via.c
+++ b/hw/ide/via.c
@@ -196,7 +196,6 @@ static void via_ide_realize(PCIDevice *dev, Error **errp)
  ide_bus_init_output_irq(&d->bus[i], qdev_get_gpio_in(ds, i));
  
  bmdma_init(&d->bus[i], &d->bmdma[i], d);

-d->bmdma[i].bus = &d->bus[i];
  ide_bus_register_restart_cb(&d->bus[i]);
  }
  }


Reviewed-by: Mark Cave-Ayland 


ATB,

Mark.



Re: [RFC PATCH 0/3] Deprecate the qemu-system-i386 binary

2023-04-26 Thread Paolo Bonzini

On 4/25/23 15:38, Thomas Huth wrote:

- CPU types have different suffixes between the -x86_64 and -i386
   variant (see TYPE_X86_CPU in cpu-qom.h) ... do we need to care
   about this in the new qemu-system-i386 symlink run mode?

- The code in target/i386/tcg/sysemu/smm_helper.c looks like it
   maybe needs a runtime switch, too ... or is it ok to leave this
   hard-coded to the x86_64 version?


Yes, it would have to switch based on the CPU's LM feature.


Anyway, I'd like to get some feedback on this idea here... What
do you think of the idea of getting rid of the qemu-system-i386
binary this way in the future?


I wonder if we should take this a step further and rename 
qemu-system-x86_64 to qemu-system-x86!  Distros can if they wish create 
symlinks to both qemu-system-i386 and qemu-system-x86_64.


Then we would name the CPUs "foo-x86" and alias them to foo-x86_64 and, 
if they don't have LM set, to foo-i386 as well.


Paolo




Re: [PATCH] multifd: Avoid busy-wait in multifd_send_pages()

2023-04-26 Thread Juan Quintela
"manish.mishra"  wrote:
> On 26/04/23 3:58 pm, Juan Quintela wrote:
>> "manish.mishra"  wrote:
>>> multifd_send_sync_main() posts request on the multifd channel
>>> but does not call sem_wait() on channels_ready semaphore, making
>>> the channels_ready semaphore count keep increasing.
>>> As a result, sem_wait() on channels_ready in multifd_send_pages()
>>> is always non-blocking hence multifd_send_pages() keeps searching
>>> for a free channel in a busy loop until a channel is freed.
>>>
>>> Signed-off-by: manish.mishra 
>>> ---
>>>   migration/multifd.c | 3 ++-
>>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/migration/multifd.c b/migration/multifd.c
>>> index cce3ad6988..43d26e7012 100644
>>> --- a/migration/multifd.c
>>> +++ b/migration/multifd.c
>>> @@ -615,6 +615,7 @@ int multifd_send_sync_main(QEMUFile *f)
>>> trace_multifd_send_sync_main_signal(p->id);
>>>   +qemu_sem_wait(&multifd_send_state->channels_ready);
>>>   qemu_mutex_lock(&p->mutex);
>>> if (p->quit) {
>> We need this, but I think it is better to put it on the second loop.
>>
>>> @@ -919,7 +920,7 @@ int multifd_save_setup(Error **errp)
>>>   multifd_send_state = g_malloc0(sizeof(*multifd_send_state));
>>>   multifd_send_state->params = g_new0(MultiFDSendParams, thread_count);
>>>   multifd_send_state->pages = multifd_pages_init(page_count);
>>> -qemu_sem_init(&multifd_send_state->channels_ready, 0);
>>> +qemu_sem_init(&multifd_send_state->channels_ready, thread_count);
>>>   qatomic_set(&multifd_send_state->exiting, 0);
>>>   multifd_send_state->ops = multifd_ops[migrate_multifd_compression()];
>> I think this bit is wrong.
>> We should not set the channels ready until the thread is ready and
>> channel is created.
>>
>> What do you think about this patch:
>>
>>  From bcb0ef9b97b858458c403d2e4dc9e0dbd96721b3 Mon Sep 17 00:00:00 2001
>> From: Juan Quintela 
>> Date: Wed, 26 Apr 2023 12:20:36 +0200
>> Subject: [PATCH] multifd: Fix the number of channels ready
>>
>> We don't wait in the sem when we are doing a sync_main.  Make it wait
>> there.  To make things clearer, we mark the channel ready at the
>> begining of the thread loop.
>>
>> This causes a busy loop in multifd_send_page().
>> Found-by: manish.mishra 
>>
>> Signed-off-by: Juan Quintela 
>> ---
>>   migration/multifd.c | 3 ++-
>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/migration/multifd.c b/migration/multifd.c
>> index 903df2117b..e625e8725e 100644
>> --- a/migration/multifd.c
>> +++ b/migration/multifd.c
>> @@ -635,6 +635,7 @@ int multifd_send_sync_main(QEMUFile *f)
>>   for (i = 0; i < migrate_multifd_channels(); i++) {
>>   MultiFDSendParams *p = &multifd_send_state->params[i];
>>   +qemu_sem_wait(&multifd_send_state->channels_ready);
>>   trace_multifd_send_sync_main_wait(p->id);
>>   qemu_sem_wait(&p->sem_sync);
>>   @@ -668,6 +669,7 @@ static void *multifd_send_thread(void *opaque)
>>   p->num_packets = 1;
>> while (true) {
>> +qemu_sem_post(&multifd_send_state->channels_ready);
>
>
This has one issue though: if we mark channels_ready here, the channel is
not actually ready yet, so we can still busy-loop?

Before:

while (true) {

sem_post(channels_ready)
}

And you want to add to the initialization a counter equal to the number
of channels.

Now:

while (true) {
sem_post(channels_ready)

}

It is semantically the same, but when we setup it ready it means that
when we set it to 1, we now that the channel and thread are ready for
action.

> Maybe we can do one thing: leave the sem_post in the while loop at the
> same position. But we can do another post just before the start

I can't see how this would make any difference.


> of this while loop; as that will be called only once, it should do the
> work of initialising the count equal to the number of multifd channels?

Yeap.  But I can't see what difference that makes here.

Later, Juan.




Re: [PATCH 05/13] hw/ide: Extract pci_ide_class_init()

2023-04-26 Thread Mark Cave-Ayland

On 22/04/2023 16:07, Bernhard Beschow wrote:


Resolves redundant code in every PCI IDE device model.


I think this needs to mention that it's moving the PCIDeviceClass::exit() function 
from all of the PCI IDE controller implementations to a common implementation in the 
parent PCI_IDE type.



---
  include/hw/ide/pci.h |  1 -
  hw/ide/cmd646.c  | 15 ---
  hw/ide/pci.c | 25 -
  hw/ide/piix.c| 19 ---
  hw/ide/sii3112.c |  3 ++-
  hw/ide/via.c | 15 ---
  6 files changed, 26 insertions(+), 52 deletions(-)

diff --git a/include/hw/ide/pci.h b/include/hw/ide/pci.h
index 74c127e32f..7bc4e53d02 100644
--- a/include/hw/ide/pci.h
+++ b/include/hw/ide/pci.h
@@ -61,7 +61,6 @@ void bmdma_cmd_writeb(BMDMAState *bm, uint32_t val);
  extern MemoryRegionOps bmdma_addr_ioport_ops;
  void pci_ide_create_devs(PCIDevice *dev);
  
-extern const VMStateDescription vmstate_ide_pci;

  extern const MemoryRegionOps pci_ide_cmd_le_ops;
  extern const MemoryRegionOps pci_ide_data_le_ops;
  #endif
diff --git a/hw/ide/cmd646.c b/hw/ide/cmd646.c
index a094a6e12a..9aabf80e52 100644
--- a/hw/ide/cmd646.c
+++ b/hw/ide/cmd646.c
@@ -301,17 +301,6 @@ static void pci_cmd646_ide_realize(PCIDevice *dev, Error **errp)
  }
  }
  
-static void pci_cmd646_ide_exitfn(PCIDevice *dev)

-{
-PCIIDEState *d = PCI_IDE(dev);
-unsigned i;
-
-for (i = 0; i < 2; ++i) {
-memory_region_del_subregion(&d->bmdma_bar, &d->bmdma[i].extra_io);
-memory_region_del_subregion(&d->bmdma_bar, &d->bmdma[i].addr_ioport);
-}
-}
-
  static Property cmd646_ide_properties[] = {
  DEFINE_PROP_UINT32("secondary", PCIIDEState, secondary, 0),
  DEFINE_PROP_END_OF_LIST(),
@@ -323,17 +312,13 @@ static void cmd646_ide_class_init(ObjectClass *klass, void *data)
  PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
  
  dc->reset = cmd646_reset;

-dc->vmsd = &vmstate_ide_pci;
  k->realize = pci_cmd646_ide_realize;
-k->exit = pci_cmd646_ide_exitfn;
  k->vendor_id = PCI_VENDOR_ID_CMD;
  k->device_id = PCI_DEVICE_ID_CMD_646;
  k->revision = 0x07;
-k->class_id = PCI_CLASS_STORAGE_IDE;
  k->config_read = cmd646_pci_config_read;
  k->config_write = cmd646_pci_config_write;
  device_class_set_props(dc, cmd646_ide_properties);
-set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
  }
  
  static const TypeInfo cmd646_ide_info = {

diff --git a/hw/ide/pci.c b/hw/ide/pci.c
index 67e0998ff0..8bea92e394 100644
--- a/hw/ide/pci.c
+++ b/hw/ide/pci.c
@@ -467,7 +467,7 @@ static int ide_pci_post_load(void *opaque, int version_id)
  return 0;
  }
  
-const VMStateDescription vmstate_ide_pci = {

+static const VMStateDescription vmstate_ide_pci = {
  .name = "ide",
  .version_id = 3,
  .minimum_version_id = 0,
@@ -530,11 +530,34 @@ static void pci_ide_init(Object *obj)
  qdev_init_gpio_out(DEVICE(d), d->isa_irq, ARRAY_SIZE(d->isa_irq));
  }
  
+static void pci_ide_exitfn(PCIDevice *dev)

+{
+PCIIDEState *d = PCI_IDE(dev);
+unsigned i;
+
+for (i = 0; i < ARRAY_SIZE(d->bmdma); ++i) {
+memory_region_del_subregion(&d->bmdma_bar, &d->bmdma[i].extra_io);
+memory_region_del_subregion(&d->bmdma_bar, &d->bmdma[i].addr_ioport);
+}
+}
+
+static void pci_ide_class_init(ObjectClass *klass, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(klass);
+PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
+
+dc->vmsd = &vmstate_ide_pci;
+k->exit = pci_ide_exitfn;
+k->class_id = PCI_CLASS_STORAGE_IDE;
+set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
+}
+
  static const TypeInfo pci_ide_type_info = {
  .name = TYPE_PCI_IDE,
  .parent = TYPE_PCI_DEVICE,
  .instance_size = sizeof(PCIIDEState),
  .instance_init = pci_ide_init,
+.class_init = pci_ide_class_init,
  .abstract = true,
  .interfaces = (InterfaceInfo[]) {
  { INTERFACE_CONVENTIONAL_PCI_DEVICE },
diff --git a/hw/ide/piix.c b/hw/ide/piix.c
index a32f7ccece..4e6ca99123 100644
--- a/hw/ide/piix.c
+++ b/hw/ide/piix.c
@@ -159,8 +157,6 @@ static void pci_piix_ide_realize(PCIDevice *dev, Error **errp)
  bmdma_setup_bar(d);
  pci_register_bar(dev, 4, PCI_BASE_ADDRESS_SPACE_IO, &d->bmdma_bar);
  
-vmstate_register(VMSTATE_IF(dev), 0, &vmstate_ide_pci, d);

-


Presumably this still survives migration between a pre-series and post-series QEMU 
using the PIIX IDE controller?



  for (unsigned i = 0; i < 2; i++) {
  if (!pci_piix_init_bus(d, i, errp)) {
  return;
@@ -168,17 +166,6 @@ static void pci_piix_ide_realize(PCIDevice *dev, Error **errp)
  }
  }
  
-static void pci_piix_ide_exitfn(PCIDevice *dev)

-{
-PCIIDEState *d = PCI_IDE(dev);
-unsigned i;
-
-for (i = 0; i < 2; ++i) {
-memory_region_del_subregion(&d->bmdma_bar, &d->bmdma[i].extra_io);
-memory_region_del_subregion(&d->bmdma_bar, &d->bmdma[i].addr_ioport);
-}
-}

Re: [PATCH] linux-user: report ENOTTY for unknown ioctls

2023-04-26 Thread Philippe Mathieu-Daudé

On 26/4/23 09:06, Thomas Weißschuh wrote:

The correct error number for unknown ioctls is ENOTTY.

ENOSYS would mean that the ioctl() syscall itself is not implemented,
which is very improbable and unexpected for userspace.

ENOTTY means "Inappropriate ioctl for device". This is what the kernel
returns on unknown ioctls, what qemu is trying to express and what
userspace is prepared to handle.

Signed-off-by: Thomas Weißschuh 
---
  linux-user/syscall.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PULL 00/17] QAPI patches for 2023-04-26

2023-04-26 Thread Richard Henderson

On 4/26/23 06:57, Markus Armbruster wrote:

The following changes since commit 327ec8d6c2a2223b78d311153a471036e474c5c5:

   Merge tag 'pull-tcg-20230423' of https://gitlab.com/rth7680/qemu into staging (2023-04-23 11:20:37 +0100)

are available in the Git repository at:

   https://repo.or.cz/qemu/armbru.git  tags/pull-qapi-2023-04-26

for you to fetch changes up to a17dbc4b79a28ffb9511f192474ffefd88214cde:

   qapi: allow unions to contain further unions (2023-04-26 07:52:45 +0200)


QAPI patches for 2023-04-26


Daniel P. Berrangé (2):
   qapi: support updating expected test output via make
   qapi: allow unions to contain further unions

Markus Armbruster (15):
   qapi: Fix error message format regression
   qapi/schema: Use super()
   qapi: Clean up after removal of simple unions
   qapi: Split up check_type()
   qapi: Improve error message for unexpected array types
   qapi: Simplify code a bit after previous commits
   qapi: Fix error message when type name or array is expected
   qapi: Fix to reject 'data': 'mumble' in struct
   tests/qapi-schema: Improve union discriminator coverage
   tests/qapi-schema: Rename a few conditionals
   tests/qapi-schema: Clean up positive test for conditionals
   tests/qapi-schema: Cover optional conditional struct member
   qapi: Fix code generated for optional conditional struct member
   qapi: Require boxed for conditional command and event arguments
   qapi: Improve specificity of type/member descriptions


Applied, thanks.  Please update https://wiki.qemu.org/ChangeLog/8.1 as 
appropriate.


r~




Re: [PULL 00/25] Block layer patches

2023-04-26 Thread Richard Henderson

On 4/25/23 14:13, Kevin Wolf wrote:

The following changes since commit ac5f7bf8e208cd7893dbb1a9520559e569a4677c:

   Merge tag 'migration-20230424-pull-request' of https://gitlab.com/juan.quintela/qemu into staging (2023-04-24 15:00:39 +0100)

are available in the Git repository at:

   https://repo.or.cz/qemu/kevin.git  tags/for-upstream

for you to fetch changes up to 8c1e8fb2e7fc2cbeb57703e143965a4cd3ad301a:

   block/monitor: Fix crash when executing HMP commit (2023-04-25 15:11:57 
+0200)


Block layer patches

- Protect BlockBackend.queued_requests with its own lock
- Switch to AIO_WAIT_WHILE_UNLOCKED() where possible
- AioContext removal: LinuxAioState/LuringState/ThreadPool
- Add more coroutine_fn annotations, use bdrv/blk_co_*
- Fix crash when execute hmp_commit


Applied, thanks.  Please update https://wiki.qemu.org/ChangeLog/8.1 as 
appropriate.


r~




Re: [PATCH 06/13] hw/ide: Extract bmdma_init_ops()

2023-04-26 Thread Mark Cave-Ayland

On 22/04/2023 16:07, Bernhard Beschow wrote:


There are three private copies of bmdma_setup_bar() with small adaptions.
Consolidate them into one public implementation.

While at it rename the function to bmdma_init_ops() to reflect that the memory
regions being initialized represent BMDMA operations. The actual mapping as a
PCI BAR is still performed separately in each device.

Note that the bmdma_bar attribute will be renamed in a separate commit.

Signed-off-by: Bernhard Beschow 
---
  include/hw/ide/pci.h |  1 +
  hw/ide/cmd646.c  | 20 +---
  hw/ide/pci.c | 16 
  hw/ide/piix.c| 19 +--
  hw/ide/via.c | 19 +--
  5 files changed, 20 insertions(+), 55 deletions(-)

diff --git a/include/hw/ide/pci.h b/include/hw/ide/pci.h
index 7bc4e53d02..597c77c7ad 100644
--- a/include/hw/ide/pci.h
+++ b/include/hw/ide/pci.h
@@ -57,6 +57,7 @@ struct PCIIDEState {
  };
  
  void bmdma_init(IDEBus *bus, BMDMAState *bm, PCIIDEState *d);

+void bmdma_init_ops(PCIIDEState *d, const MemoryRegionOps *bmdma_ops);
  void bmdma_cmd_writeb(BMDMAState *bm, uint32_t val);
  extern MemoryRegionOps bmdma_addr_ioport_ops;
  void pci_ide_create_devs(PCIDevice *dev);
diff --git a/hw/ide/cmd646.c b/hw/ide/cmd646.c
index 9aabf80e52..6fd09fe74e 100644
--- a/hw/ide/cmd646.c
+++ b/hw/ide/cmd646.c
@@ -161,24 +161,6 @@ static const MemoryRegionOps cmd646_bmdma_ops = {
  .write = bmdma_write,
  };
  
-static void bmdma_setup_bar(PCIIDEState *d)

-{
-BMDMAState *bm;
-int i;
-
-memory_region_init(&d->bmdma_bar, OBJECT(d), "cmd646-bmdma", 16);
-for(i = 0;i < 2; i++) {
-bm = &d->bmdma[i];
-memory_region_init_io(&bm->extra_io, OBJECT(d), &cmd646_bmdma_ops, bm,
-  "cmd646-bmdma-bus", 4);
-memory_region_add_subregion(&d->bmdma_bar, i * 8, &bm->extra_io);
-memory_region_init_io(&bm->addr_ioport, OBJECT(d),
-  &bmdma_addr_ioport_ops, bm,
-  "cmd646-bmdma-ioport", 4);
-memory_region_add_subregion(&d->bmdma_bar, i * 8 + 4, &bm->addr_ioport);
-}
-}
-
  static void cmd646_update_irq(PCIDevice *pd)
  {
  int pci_level;
@@ -285,7 +267,7 @@ static void pci_cmd646_ide_realize(PCIDevice *dev, Error **errp)
&d->bus[1], "cmd646-cmd1", 4);
  pci_register_bar(dev, 3, PCI_BASE_ADDRESS_SPACE_IO, &d->cmd_bar[1]);
  
-bmdma_setup_bar(d);

+bmdma_init_ops(d, &cmd646_bmdma_ops);
  pci_register_bar(dev, 4, PCI_BASE_ADDRESS_SPACE_IO, &d->bmdma_bar);
  
  /* TODO: RST# value should be 0 */

diff --git a/hw/ide/pci.c b/hw/ide/pci.c
index 8bea92e394..65ed6f7f72 100644
--- a/hw/ide/pci.c
+++ b/hw/ide/pci.c
@@ -523,6 +523,22 @@ void bmdma_init(IDEBus *bus, BMDMAState *bm, PCIIDEState 
*d)
  bm->pci_dev = d;
  }
  
+void bmdma_init_ops(PCIIDEState *d, const MemoryRegionOps *bmdma_ops)

+{
+size_t i;
+
+memory_region_init(&d->bmdma_bar, OBJECT(d), "bmdma-container", 16);
+for (i = 0; i < ARRAY_SIZE(d->bmdma); i++) {
+BMDMAState *bm = &d->bmdma[i];
+
+memory_region_init_io(&bm->extra_io, OBJECT(d), bmdma_ops, bm, "bmdma-ops", 4);
+memory_region_add_subregion(&d->bmdma_bar, i * 8, &bm->extra_io);
+memory_region_init_io(&bm->addr_ioport, OBJECT(d), &bmdma_addr_ioport_ops, bm,
+  "bmdma-ioport-ops", 4);
+memory_region_add_subregion(&d->bmdma_bar, i * 8 + 4, &bm->addr_ioport);
+}
+}
+
  static void pci_ide_init(Object *obj)
  {
  PCIIDEState *d = PCI_IDE(obj);
diff --git a/hw/ide/piix.c b/hw/ide/piix.c
index 4e6ca99123..5611473d37 100644
--- a/hw/ide/piix.c
+++ b/hw/ide/piix.c
@@ -86,23 +86,6 @@ static const MemoryRegionOps piix_bmdma_ops = {
  .write = bmdma_write,
  };
  
-static void bmdma_setup_bar(PCIIDEState *d)

-{
-int i;
-
-memory_region_init(&d->bmdma_bar, OBJECT(d), "piix-bmdma-container", 16);
-for(i = 0;i < 2; i++) {
-BMDMAState *bm = &d->bmdma[i];
-
-memory_region_init_io(&bm->extra_io, OBJECT(d), &piix_bmdma_ops, bm,
-  "piix-bmdma", 4);
-memory_region_add_subregion(&d->bmdma_bar, i * 8, &bm->extra_io);
-memory_region_init_io(&bm->addr_ioport, OBJECT(d),
-  &bmdma_addr_ioport_ops, bm, "bmdma", 4);
-memory_region_add_subregion(&d->bmdma_bar, i * 8 + 4, &bm->addr_ioport);
-}
-}
-
  static void piix_ide_reset(DeviceState *dev)
  {
  PCIIDEState *d = PCI_IDE(dev);
@@ -156,7 +139,7 @@ static void pci_piix_ide_realize(PCIDevice *dev, Error **errp)
  
  pci_conf[PCI_CLASS_PROG] = 0x80; // legacy ATA mode
  
-bmdma_setup_bar(d);

+bmdma_init_ops(d, &piix_bmdma_ops);
  pci_register_bar(dev, 4, PCI_BASE_ADDRESS_SPACE_IO, &d->bmdma_bar);
  
  for (unsigned i = 0; i < 2; i++) {

diff --git a/hw/ide/via.c b/hw/ide/via.c
index 287143a005..40704e2857 100644
--- a/hw

Re: [PATCH 07/13] hw/ide: Extract pci_ide_{cmd, data}_le_ops initialization into base class constructor

2023-04-26 Thread Mark Cave-Ayland

On 22/04/2023 16:07, Bernhard Beschow wrote:


There is redundant code in cmd646 and via which can be extracted into the base
class. In the case of piix and sii3112 this is currently unnecessary but
shouldn't interfere, since the memory regions aren't mapped by those devices.
A few commits later this will change, i.e. those device models will also make
use of these memory regions.

Signed-off-by: Bernhard Beschow 
---
  hw/ide/cmd646.c | 11 ---
  hw/ide/pci.c| 10 ++
  hw/ide/via.c| 11 ---
  3 files changed, 10 insertions(+), 22 deletions(-)

diff --git a/hw/ide/cmd646.c b/hw/ide/cmd646.c
index 6fd09fe74e..85716aaf17 100644
--- a/hw/ide/cmd646.c
+++ b/hw/ide/cmd646.c
@@ -251,20 +251,9 @@ static void pci_cmd646_ide_realize(PCIDevice *dev, Error **errp)
  dev->wmask[MRDMODE] = 0x0;
  dev->w1cmask[MRDMODE] = MRDMODE_INTR_CH0 | MRDMODE_INTR_CH1;
  
-memory_region_init_io(&d->data_bar[0], OBJECT(d), &pci_ide_data_le_ops,
-  &d->bus[0], "cmd646-data0", 8);
  pci_register_bar(dev, 0, PCI_BASE_ADDRESS_SPACE_IO, &d->data_bar[0]);
-
-memory_region_init_io(&d->cmd_bar[0], OBJECT(d), &pci_ide_cmd_le_ops,
-  &d->bus[0], "cmd646-cmd0", 4);
  pci_register_bar(dev, 1, PCI_BASE_ADDRESS_SPACE_IO, &d->cmd_bar[0]);
-
-memory_region_init_io(&d->data_bar[1], OBJECT(d), &pci_ide_data_le_ops,
-  &d->bus[1], "cmd646-data1", 8);
  pci_register_bar(dev, 2, PCI_BASE_ADDRESS_SPACE_IO, &d->data_bar[1]);
-
-memory_region_init_io(&d->cmd_bar[1], OBJECT(d), &pci_ide_cmd_le_ops,
-  &d->bus[1], "cmd646-cmd1", 4);
  pci_register_bar(dev, 3, PCI_BASE_ADDRESS_SPACE_IO, &d->cmd_bar[1]);
  
  bmdma_init_ops(d, &cmd646_bmdma_ops);

diff --git a/hw/ide/pci.c b/hw/ide/pci.c
index 65ed6f7f72..a9194313bd 100644
--- a/hw/ide/pci.c
+++ b/hw/ide/pci.c
@@ -543,6 +543,16 @@ static void pci_ide_init(Object *obj)
  {
  PCIIDEState *d = PCI_IDE(obj);
  
+memory_region_init_io(&d->data_bar[0], OBJECT(d), &pci_ide_data_le_ops,
+  &d->bus[0], "pci-ide0-data-ops", 8);
+memory_region_init_io(&d->cmd_bar[0], OBJECT(d), &pci_ide_cmd_le_ops,
+  &d->bus[0], "pci-ide0-cmd-ops", 4);
+
+memory_region_init_io(&d->data_bar[1], OBJECT(d), &pci_ide_data_le_ops,
+  &d->bus[1], "pci-ide1-data-ops", 8);
+memory_region_init_io(&d->cmd_bar[1], OBJECT(d), &pci_ide_cmd_le_ops,
+  &d->bus[1], "pci-ide1-cmd-ops", 4);
+
  qdev_init_gpio_out(DEVICE(d), d->isa_irq, ARRAY_SIZE(d->isa_irq));
  }
  
diff --git a/hw/ide/via.c b/hw/ide/via.c

index 40704e2857..704a8024cb 100644
--- a/hw/ide/via.c
+++ b/hw/ide/via.c
@@ -154,20 +154,9 @@ static void via_ide_realize(PCIDevice *dev, Error **errp)
  dev->wmask[PCI_INTERRUPT_LINE] = 0;
  dev->wmask[PCI_CLASS_PROG] = 5;
  
-memory_region_init_io(&d->data_bar[0], OBJECT(d), &pci_ide_data_le_ops,
-  &d->bus[0], "via-ide0-data", 8);
  pci_register_bar(dev, 0, PCI_BASE_ADDRESS_SPACE_IO, &d->data_bar[0]);
-
-memory_region_init_io(&d->cmd_bar[0], OBJECT(d), &pci_ide_cmd_le_ops,
-  &d->bus[0], "via-ide0-cmd", 4);
  pci_register_bar(dev, 1, PCI_BASE_ADDRESS_SPACE_IO, &d->cmd_bar[0]);
-
-memory_region_init_io(&d->data_bar[1], OBJECT(d), &pci_ide_data_le_ops,
-  &d->bus[1], "via-ide1-data", 8);
  pci_register_bar(dev, 2, PCI_BASE_ADDRESS_SPACE_IO, &d->data_bar[1]);
-
-memory_region_init_io(&d->cmd_bar[1], OBJECT(d), &pci_ide_cmd_le_ops,
-  &d->bus[1], "via-ide1-cmd", 4);
  pci_register_bar(dev, 3, PCI_BASE_ADDRESS_SPACE_IO, &d->cmd_bar[1]);
  
  bmdma_init_ops(d, &via_bmdma_ops);


I'd also be inclined to agree with Phil/Zoltan re: dropping the trailing "-ops" in 
the name, otherwise:


Reviewed-by: Mark Cave-Ayland 


ATB,

Mark.



Re: [PATCH 08/13] hw/ide: Rename PCIIDEState::*_bar attributes

2023-04-26 Thread Mark Cave-Ayland

On 22/04/2023 16:07, Bernhard Beschow wrote:


The attributes represent memory regions containing operations which are mapped
by the device models into PCI BARs. Reflect this by changing the suffix to
"_ops".

Note that in a few commits piix will also use the {cmd,data}_ops but won't map
them into BARs. This further suggests that the "_bar" suffix doesn't fit
very well.

Signed-off-by: Bernhard Beschow 
---
  include/hw/ide/pci.h |  6 +++---
  hw/ide/cmd646.c  | 10 +-
  hw/ide/pci.c | 18 +-
  hw/ide/piix.c|  2 +-
  hw/ide/via.c | 10 +-
  5 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/include/hw/ide/pci.h b/include/hw/ide/pci.h
index 597c77c7ad..5025df5b82 100644
--- a/include/hw/ide/pci.h
+++ b/include/hw/ide/pci.h
@@ -51,9 +51,9 @@ struct PCIIDEState {
  BMDMAState bmdma[2];
  qemu_irq isa_irq[2];
  uint32_t secondary; /* used only for cmd646 */
-MemoryRegion bmdma_bar;
-MemoryRegion cmd_bar[2];
-MemoryRegion data_bar[2];
+MemoryRegion bmdma_ops;
+MemoryRegion cmd_ops[2];
+MemoryRegion data_ops[2];
  };
  
  void bmdma_init(IDEBus *bus, BMDMAState *bm, PCIIDEState *d);

diff --git a/hw/ide/cmd646.c b/hw/ide/cmd646.c
index 85716aaf17..b9d005a357 100644
--- a/hw/ide/cmd646.c
+++ b/hw/ide/cmd646.c
@@ -251,13 +251,13 @@ static void pci_cmd646_ide_realize(PCIDevice *dev, Error **errp)
  dev->wmask[MRDMODE] = 0x0;
  dev->w1cmask[MRDMODE] = MRDMODE_INTR_CH0 | MRDMODE_INTR_CH1;
  
-pci_register_bar(dev, 0, PCI_BASE_ADDRESS_SPACE_IO, &d->data_bar[0]);

-pci_register_bar(dev, 1, PCI_BASE_ADDRESS_SPACE_IO, &d->cmd_bar[0]);
-pci_register_bar(dev, 2, PCI_BASE_ADDRESS_SPACE_IO, &d->data_bar[1]);
-pci_register_bar(dev, 3, PCI_BASE_ADDRESS_SPACE_IO, &d->cmd_bar[1]);
+pci_register_bar(dev, 0, PCI_BASE_ADDRESS_SPACE_IO, &d->data_ops[0]);
+pci_register_bar(dev, 1, PCI_BASE_ADDRESS_SPACE_IO, &d->cmd_ops[0]);
+pci_register_bar(dev, 2, PCI_BASE_ADDRESS_SPACE_IO, &d->data_ops[1]);
+pci_register_bar(dev, 3, PCI_BASE_ADDRESS_SPACE_IO, &d->cmd_ops[1]);
  
  bmdma_init_ops(d, &cmd646_bmdma_ops);

-pci_register_bar(dev, 4, PCI_BASE_ADDRESS_SPACE_IO, &d->bmdma_bar);
+pci_register_bar(dev, 4, PCI_BASE_ADDRESS_SPACE_IO, &d->bmdma_ops);
  
  /* TODO: RST# value should be 0 */

  pci_conf[PCI_INTERRUPT_PIN] = 0x01; // interrupt on pin 1
diff --git a/hw/ide/pci.c b/hw/ide/pci.c
index a9194313bd..b2fcc00a64 100644
--- a/hw/ide/pci.c
+++ b/hw/ide/pci.c
@@ -527,15 +527,15 @@ void bmdma_init_ops(PCIIDEState *d, const MemoryRegionOps 
*bmdma_ops)
  {
  size_t i;
  
-memory_region_init(&d->bmdma_bar, OBJECT(d), "bmdma-container", 16);

+memory_region_init(&d->bmdma_ops, OBJECT(d), "bmdma-container", 16);
  for (i = 0; i < ARRAY_SIZE(d->bmdma); i++) {
  BMDMAState *bm = &d->bmdma[i];
  
  memory_region_init_io(&bm->extra_io, OBJECT(d), bmdma_ops, bm, "bmdma-ops", 4);

-memory_region_add_subregion(&d->bmdma_bar, i * 8, &bm->extra_io);
+memory_region_add_subregion(&d->bmdma_ops, i * 8, &bm->extra_io);
  memory_region_init_io(&bm->addr_ioport, OBJECT(d), 
&bmdma_addr_ioport_ops, bm,
"bmdma-ioport-ops", 4);
-memory_region_add_subregion(&d->bmdma_bar, i * 8 + 4, 
&bm->addr_ioport);
+memory_region_add_subregion(&d->bmdma_ops, i * 8 + 4, 
&bm->addr_ioport);
  }
  }
  
@@ -543,14 +543,14 @@ static void pci_ide_init(Object *obj)

  {
  PCIIDEState *d = PCI_IDE(obj);
  
-memory_region_init_io(&d->data_bar[0], OBJECT(d), &pci_ide_data_le_ops,

+memory_region_init_io(&d->data_ops[0], OBJECT(d), &pci_ide_data_le_ops,
&d->bus[0], "pci-ide0-data-ops", 8);
-memory_region_init_io(&d->cmd_bar[0], OBJECT(d), &pci_ide_cmd_le_ops,
+memory_region_init_io(&d->cmd_ops[0], OBJECT(d), &pci_ide_cmd_le_ops,
&d->bus[0], "pci-ide0-cmd-ops", 4);
  
-memory_region_init_io(&d->data_bar[1], OBJECT(d), &pci_ide_data_le_ops,

+memory_region_init_io(&d->data_ops[1], OBJECT(d), &pci_ide_data_le_ops,
&d->bus[1], "pci-ide1-data-ops", 8);
-memory_region_init_io(&d->cmd_bar[1], OBJECT(d), &pci_ide_cmd_le_ops,
+memory_region_init_io(&d->cmd_ops[1], OBJECT(d), &pci_ide_cmd_le_ops,
&d->bus[1], "pci-ide1-cmd-ops", 4);
  
  qdev_init_gpio_out(DEVICE(d), d->isa_irq, ARRAY_SIZE(d->isa_irq));

@@ -562,8 +562,8 @@ static void pci_ide_exitfn(PCIDevice *dev)
  unsigned i;
  
  for (i = 0; i < ARRAY_SIZE(d->bmdma); ++i) {

-memory_region_del_subregion(&d->bmdma_bar, &d->bmdma[i].extra_io);
-memory_region_del_subregion(&d->bmdma_bar, &d->bmdma[i].addr_ioport);
+memory_region_del_subregion(&d->bmdma_ops, &d->bmdma[i].extra_io);
+memory_region_del_subregion(&d->bmdma_ops, &d->bmdma[i].addr_ioport);
  }
  }
  
diff --git a/hw/i

Re: [PATCH v3 18/57] tcg/loongarch64: Rationalize args to tcg_out_qemu_{ld, st}

2023-04-26 Thread Philippe Mathieu-Daudé

On 24/4/23 07:40, Richard Henderson wrote:

Interpret the variable argument placement in the caller.  Shift some
code around slightly to share more between softmmu and user-only.

Signed-off-by: Richard Henderson 
---
  tcg/loongarch64/tcg-target.c.inc | 100 +--
  1 file changed, 42 insertions(+), 58 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH] multifd: Avoid busy-wait in multifd_send_pages()

2023-04-26 Thread manish.mishra



On 26/04/23 4:35 pm, Juan Quintela wrote:

"manish.mishra"  wrote:

On 26/04/23 3:58 pm, Juan Quintela wrote:

"manish.mishra"  wrote:

multifd_send_sync_main() posts request on the multifd channel
but does not call sem_wait() on channels_ready semaphore, making
the channels_ready semaphore count keep increasing.
As a result, sem_wait() on channels_ready in multifd_send_pages()
is always non-blocking hence multifd_send_pages() keeps searching
for a free channel in a busy loop until a channel is freed.

Signed-off-by: manish.mishra 
---
   migration/multifd.c | 3 ++-
   1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index cce3ad6988..43d26e7012 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -615,6 +615,7 @@ int multifd_send_sync_main(QEMUFile *f)
 trace_multifd_send_sync_main_signal(p->id);
   +qemu_sem_wait(&multifd_send_state->channels_ready);
   qemu_mutex_lock(&p->mutex);
 if (p->quit) {

We need this, but I think it is better to put it on the second loop.


@@ -919,7 +920,7 @@ int multifd_save_setup(Error **errp)
   multifd_send_state = g_malloc0(sizeof(*multifd_send_state));
   multifd_send_state->params = g_new0(MultiFDSendParams, thread_count);
   multifd_send_state->pages = multifd_pages_init(page_count);
-qemu_sem_init(&multifd_send_state->channels_ready, 0);
+qemu_sem_init(&multifd_send_state->channels_ready, thread_count);
   qatomic_set(&multifd_send_state->exiting, 0);
   multifd_send_state->ops = multifd_ops[migrate_multifd_compression()];

I think this bit is wrong.
We should not set the channels ready until the thread is ready and
channel is created.

What do you think about this patch:

  From bcb0ef9b97b858458c403d2e4dc9e0dbd96721b3 Mon Sep 17 00:00:00 2001
From: Juan Quintela 
Date: Wed, 26 Apr 2023 12:20:36 +0200
Subject: [PATCH] multifd: Fix the number of channels ready

We don't wait on the sem when we are doing a sync_main.  Make it wait
there.  To make things clearer, we mark the channel ready at the
beginning of the thread loop.

The missing wait causes a busy loop in multifd_send_pages().
Found-by: manish.mishra 

Signed-off-by: Juan Quintela 
---
   migration/multifd.c | 3 ++-
   1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index 903df2117b..e625e8725e 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -635,6 +635,7 @@ int multifd_send_sync_main(QEMUFile *f)
   for (i = 0; i < migrate_multifd_channels(); i++) {
   MultiFDSendParams *p = &multifd_send_state->params[i];
   +qemu_sem_wait(&multifd_send_state->channels_ready);
   trace_multifd_send_sync_main_wait(p->id);
   qemu_sem_wait(&p->sem_sync);
   @@ -668,6 +669,7 @@ static void *multifd_send_thread(void *opaque)
   p->num_packets = 1;
 while (true) {
+qemu_sem_post(&multifd_send_state->channels_ready);


This has one issue though: if we mark channels_ready here, the channel is 
not actually ready yet, so we can still busy-loop?

Before:

while (true) {
 
 sem_post(channels_ready)
}

And you want to add to the initialization a counter equal to the number
of channels.

Now:

while (true) {
 sem_post(channels_ready)
 
}

It is semantically the same, but when we setup it ready it means that
when we set it to 1, we now that the channel and thread are ready for
action.


Maybe we can do one thing: leave the sem_post in the while loop at the same
position, but do another post just before the start

I can't see how this can make any difference.



of this while loop. As that will be called only once, it should do the work
of initialising the count equal to the number of multifd channels?

Yeap.  But I can't see what difference that makes here.

Later, Juan.



Thanks Juan,
Just confirming in case I misunderstood something :)

I meant that your approach makes sense; I was just suggesting a small change, to do 
something like this.

qemu_sem_init(&multifd_send_state->channels_ready, 0);

static void *multifd_send_thread(void *opaque) {
 ...

 sem_post(channels_ready); // Post once at thread start, leaving the one in the
loop as it is.

 while (true) {
    
    sem_post(channels_ready)
 }
}


Something like the below has the issue that we mark channels_ready even before the 
channel is actually ready; I mean, if the network is slow it may take some time to 
update pending_job, and hence we can busy-loop in multifd_send_pages().

static void *multifd_send_thread(void *opaque) {
 ...

 while (true) {
    sem_post(channels_ready);
    
 }
}


Not sure if we are already in agreement :) just confirming.


Thanks

Manish Mishra




Re: [PATCH 09/13] hw/ide/piix: Disuse isa_get_irq()

2023-04-26 Thread Mark Cave-Ayland

On 22/04/2023 16:07, Bernhard Beschow wrote:


isa_get_irq() asks for an ISADevice which piix-ide doesn't provide.
Passing a NULL pointer works but causes the isabus global to be used
then. By fishing out TYPE_ISA_BUS from the QOM tree it is possible to
achieve the same as using isa_get_irq().

This is an alternative solution to commit 9405d87be25d 'hw/ide: Fix
crash when plugging a piix3-ide device into the x-remote machine' which
allows for cleaning up the ISA API while keeping PIIX IDE functions
user-createable.

Signed-off-by: Bernhard Beschow 
---
  hw/ide/piix.c | 23 ---
  1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/hw/ide/piix.c b/hw/ide/piix.c
index 6942b484f9..a3a15dc7db 100644
--- a/hw/ide/piix.c
+++ b/hw/ide/piix.c
@@ -104,7 +104,8 @@ static void piix_ide_reset(DeviceState *dev)
  pci_set_byte(pci_conf + 0x20, 0x01);  /* BMIBA: 20-23h */
  }
  
-static bool pci_piix_init_bus(PCIIDEState *d, unsigned i, Error **errp)

+static bool pci_piix_init_bus(PCIIDEState *d, unsigned i, ISABus *isa_bus,
+  Error **errp)
  {
  static const struct {
  int iobase;
@@ -124,7 +125,8 @@ static bool pci_piix_init_bus(PCIIDEState *d, unsigned i, 
Error **errp)
   object_get_typename(OBJECT(d)), i);
  return false;
  }
-ide_bus_init_output_irq(&d->bus[i], isa_get_irq(NULL, 
port_info[i].isairq));
+ide_bus_init_output_irq(&d->bus[i],
+isa_bus_get_irq(isa_bus, port_info[i].isairq));


I don't think this is the right solution here, since ultimately we want to move the IRQ 
routing out of the device itself and into the PCI-ISA bridge. I'd go for the same 
solution as you've done for VIA IDE in patch 2, i.e. update the PIIX interrupt 
handler to set the legacy_irqs in PCIIDEState, and then wire them up to ISA IRQs 
14 and 15, similar to what Phil has done in his patches:


https://patchew.org/QEMU/20230302224058.43315-1-phi...@linaro.org/20230302224058.43315-4-phi...@linaro.org/

https://patchew.org/QEMU/20230302224058.43315-1-phi...@linaro.org/20230302224058.43315-5-phi...@linaro.org/

This also reminds me, given that the first patch above is doing wiring in pc_init1() 
then we are still missing part of your tidy-up series :/



  bmdma_init(&d->bus[i], &d->bmdma[i], d);
  ide_bus_register_restart_cb(&d->bus[i]);
@@ -136,14 +138,29 @@ static void pci_piix_ide_realize(PCIDevice *dev, Error 
**errp)
  {
  PCIIDEState *d = PCI_IDE(dev);
  uint8_t *pci_conf = dev->config;
+ISABus *isa_bus;
+bool ambiguous;
  
  pci_conf[PCI_CLASS_PROG] = 0x80; // legacy ATA mode
  
  bmdma_init_ops(d, &piix_bmdma_ops);

  pci_register_bar(dev, 4, PCI_BASE_ADDRESS_SPACE_IO, &d->bmdma_ops);
  
+isa_bus = ISA_BUS(object_resolve_path_type("", TYPE_ISA_BUS, &ambiguous));

+if (ambiguous) {
+error_setg(errp,
+   "More than one ISA bus found while %s supports only one",
+   object_get_typename(OBJECT(d)));
+return;
+}
+if (!isa_bus) {
+error_setg(errp, "No ISA bus found while %s requires one",
+   object_get_typename(OBJECT(d)));
+return;
+}


Again I think this should go away with using PCIIDEState's legacy_irqs, since you 
simply let the board wire them up to the ISABus (or not) as required.



  for (unsigned i = 0; i < 2; i++) {
-if (!pci_piix_init_bus(d, i, errp)) {
+if (!pci_piix_init_bus(d, i, isa_bus, errp)) {
  return;
  }
  }



ATB,

Mark.



Re: [PATCH 10/13] hw/ide/piix: Reuse PCIIDEState::{cmd,data}_ops

2023-04-26 Thread Mark Cave-Ayland

On 22/04/2023 16:07, Bernhard Beschow wrote:


Now that PCIIDEState::{cmd,data}_ops are initialized in the base class
constructor there is an opportunity for PIIX to reuse these attributes. This
resolves usage of ide_init_ioport() which would fall back internally to using
the isabus global due to NULL being passed as ISADevice by PIIX.

Signed-off-by: Bernhard Beschow 
---
  hw/ide/piix.c | 30 +-
  1 file changed, 13 insertions(+), 17 deletions(-)

diff --git a/hw/ide/piix.c b/hw/ide/piix.c
index a3a15dc7db..406a67fa0f 100644
--- a/hw/ide/piix.c
+++ b/hw/ide/piix.c
@@ -104,34 +104,32 @@ static void piix_ide_reset(DeviceState *dev)
  pci_set_byte(pci_conf + 0x20, 0x01);  /* BMIBA: 20-23h */
  }
  
-static bool pci_piix_init_bus(PCIIDEState *d, unsigned i, ISABus *isa_bus,

-  Error **errp)
+static void pci_piix_init_bus(PCIIDEState *d, unsigned i, ISABus *isa_bus)
  {
  static const struct {
  int iobase;
  int iobase2;
  int isairq;
  } port_info[] = {
-{0x1f0, 0x3f6, 14},
-{0x170, 0x376, 15},
+{0x1f0, 0x3f4, 14},
+{0x170, 0x374, 15},
  };
-int ret;
+MemoryRegion *address_space_io = pci_address_space_io(PCI_DEVICE(d));
  
  ide_bus_init(&d->bus[i], sizeof(d->bus[i]), DEVICE(d), i, 2);

-ret = ide_init_ioport(&d->bus[i], NULL, port_info[i].iobase,
-  port_info[i].iobase2);
-if (ret) {
-error_setg_errno(errp, -ret, "Failed to realize %s port %u",
- object_get_typename(OBJECT(d)), i);
-return false;
-}
+memory_region_add_subregion(address_space_io, port_info[i].iobase,
+&d->data_ops[i]);
+/*
+ * PIIX forwards the last byte of cmd_ops to ISA. Model this using a low
+ * prio so competing memory regions take precedence.
+ */
+memory_region_add_subregion_overlap(address_space_io, port_info[i].iobase2,
+&d->cmd_ops[i], -1);


Interesting. Is this behaviour documented somewhere and/or used in one of your test 
images at all? If I'd seen this myself, I'd probably have thought that the addresses 
were a typo...



  ide_bus_init_output_irq(&d->bus[i],
  isa_bus_get_irq(isa_bus, port_info[i].isairq));
  
  bmdma_init(&d->bus[i], &d->bmdma[i], d);

  ide_bus_register_restart_cb(&d->bus[i]);
-
-return true;
  }
  
  static void pci_piix_ide_realize(PCIDevice *dev, Error **errp)

@@ -160,9 +158,7 @@ static void pci_piix_ide_realize(PCIDevice *dev, Error 
**errp)
  }
  
  for (unsigned i = 0; i < 2; i++) {

-if (!pci_piix_init_bus(d, i, isa_bus, errp)) {
-return;
-}
+pci_piix_init_bus(d, i, isa_bus);
  }
  }
  



ATB,

Mark.



Re: [PATCH v2] tests/unit/test-blockjob: Re-enable complete_in_standby test

2023-04-26 Thread Vladimir Sementsov-Ogievskiy

On 26.04.23 13:40, Emanuele Giuseppe Esposito wrote:

Pause the job while draining so that pause_count will be
increased and bdrv_drain_all_end() won't reset it to 0, so the
job will not continue.

With this fix, the test is not flaky anymore.

Additionally remove useless aiocontext lock around bdrv_drain_all_end()
in test_complete_in_standby().

Fixes: b6903cbe3a2 "tests/unit/test-blockjob: Disable
complete_in_standby test"
Suggested-by: Hanna Czenczek
Signed-off-by: Emanuele Giuseppe Esposito


Reviewed-by: Vladimir Sementsov-Ogievskiy 

--
Best regards,
Vladimir




Re: [PATCH 11/13] hw/ide/sii3112: Reuse PCIIDEState::{cmd,data}_ops

2023-04-26 Thread Mark Cave-Ayland

On 22/04/2023 16:07, Bernhard Beschow wrote:


Allows pci_ide_{cmd,data}_le_ops to be unexported, and models TYPE_SII3112_PCI as a
standard-compliant PCI IDE device.

Signed-off-by: Bernhard Beschow 
---
  include/hw/ide/pci.h |  2 --
  hw/ide/pci.c |  4 ++--
  hw/ide/sii3112.c | 50 
  3 files changed, 20 insertions(+), 36 deletions(-)

diff --git a/include/hw/ide/pci.h b/include/hw/ide/pci.h
index 5025df5b82..dbb4b13161 100644
--- a/include/hw/ide/pci.h
+++ b/include/hw/ide/pci.h
@@ -62,6 +62,4 @@ void bmdma_cmd_writeb(BMDMAState *bm, uint32_t val);
  extern MemoryRegionOps bmdma_addr_ioport_ops;
  void pci_ide_create_devs(PCIDevice *dev);
  
-extern const MemoryRegionOps pci_ide_cmd_le_ops;

-extern const MemoryRegionOps pci_ide_data_le_ops;
  #endif
diff --git a/hw/ide/pci.c b/hw/ide/pci.c
index b2fcc00a64..97ccc75aa6 100644
--- a/hw/ide/pci.c
+++ b/hw/ide/pci.c
@@ -60,7 +60,7 @@ static void pci_ide_ctrl_write(void *opaque, hwaddr addr,
  ide_ctrl_write(bus, addr + 2, data);
  }
  
-const MemoryRegionOps pci_ide_cmd_le_ops = {

+static const MemoryRegionOps pci_ide_cmd_le_ops = {
  .read = pci_ide_status_read,
  .write = pci_ide_ctrl_write,
  .endianness = DEVICE_LITTLE_ENDIAN,
@@ -98,7 +98,7 @@ static void pci_ide_data_write(void *opaque, hwaddr addr,
  }
  }
  
-const MemoryRegionOps pci_ide_data_le_ops = {

+static const MemoryRegionOps pci_ide_data_le_ops = {
  .read = pci_ide_data_read,
  .write = pci_ide_data_write,
  .endianness = DEVICE_LITTLE_ENDIAN,
diff --git a/hw/ide/sii3112.c b/hw/ide/sii3112.c
index 0af897a9ef..9cf920369f 100644
--- a/hw/ide/sii3112.c
+++ b/hw/ide/sii3112.c
@@ -88,21 +88,9 @@ static uint64_t sii3112_reg_read(void *opaque, hwaddr addr,
  val |= (d->regs[1].confstat & (1UL << 11) ? (1 << 4) : 0);
  val |= (uint32_t)d->i.bmdma[1].status << 16;
  break;
-case 0x80 ... 0x87:
-val = pci_ide_data_le_ops.read(&d->i.bus[0], addr - 0x80, size);
-break;
-case 0x8a:
-val = pci_ide_cmd_le_ops.read(&d->i.bus[0], 2, size);
-break;
  case 0xa0:
  val = d->regs[0].confstat;
  break;
-case 0xc0 ... 0xc7:
-val = pci_ide_data_le_ops.read(&d->i.bus[1], addr - 0xc0, size);
-break;
-case 0xca:
-val = pci_ide_cmd_le_ops.read(&d->i.bus[1], 2, size);
-break;
  case 0xe0:
  val = d->regs[1].confstat;
  break;
@@ -171,18 +159,6 @@ static void sii3112_reg_write(void *opaque, hwaddr addr,
  case 0x0c ... 0x0f:
  bmdma_addr_ioport_ops.write(&d->i.bmdma[1], addr - 12, val, size);
  break;
-case 0x80 ... 0x87:
-pci_ide_data_le_ops.write(&d->i.bus[0], addr - 0x80, val, size);
-break;
-case 0x8a:
-pci_ide_cmd_le_ops.write(&d->i.bus[0], 2, val, size);
-break;
-case 0xc0 ... 0xc7:
-pci_ide_data_le_ops.write(&d->i.bus[1], addr - 0xc0, val, size);
-break;
-case 0xca:
-pci_ide_cmd_le_ops.write(&d->i.bus[1], 2, val, size);
-break;
  case 0x100:
  d->regs[0].scontrol = val & 0xfff;
  if (val & 1) {
@@ -259,6 +235,11 @@ static void sii3112_pci_realize(PCIDevice *dev, Error 
**errp)
  pci_config_set_interrupt_pin(dev->config, 1);
  pci_set_byte(dev->config + PCI_CACHE_LINE_SIZE, 8);
  
+pci_register_bar(dev, 0, PCI_BASE_ADDRESS_SPACE_IO, &s->data_ops[0]);

+pci_register_bar(dev, 1, PCI_BASE_ADDRESS_SPACE_IO, &s->cmd_ops[0]);
+pci_register_bar(dev, 2, PCI_BASE_ADDRESS_SPACE_IO, &s->data_ops[1]);
+pci_register_bar(dev, 3, PCI_BASE_ADDRESS_SPACE_IO, &s->cmd_ops[1]);
+
  /* BAR5 is in PCI memory space */
  memory_region_init_io(&d->mmio, OBJECT(d), &sii3112_reg_ops, d,
   "sii3112.bar5", 0x200);
@@ -266,17 +247,22 @@ static void sii3112_pci_realize(PCIDevice *dev, Error 
**errp)
  
  /* BAR0-BAR4 are PCI I/O space aliases into BAR5 */

  mr = g_new(MemoryRegion, 1);
-memory_region_init_alias(mr, OBJECT(d), "sii3112.bar0", &d->mmio, 0x80, 8);
-pci_register_bar(dev, 0, PCI_BASE_ADDRESS_SPACE_IO, mr);
+memory_region_init_alias(mr, OBJECT(d), "sii3112.bar0", &s->data_ops[0], 0,
+ memory_region_size(&s->data_ops[0]));
+memory_region_add_subregion_overlap(&d->mmio, 0x80, mr, 1);
  mr = g_new(MemoryRegion, 1);
-memory_region_init_alias(mr, OBJECT(d), "sii3112.bar1", &d->mmio, 0x88, 4);
-pci_register_bar(dev, 1, PCI_BASE_ADDRESS_SPACE_IO, mr);
+memory_region_init_alias(mr, OBJECT(d), "sii3112.bar1", &s->cmd_ops[0], 0,
+ memory_region_size(&s->cmd_ops[0]));
+memory_region_add_subregion_overlap(&d->mmio, 0x88, mr, 1);
  mr = g_new(MemoryRegion, 1);
-memory_region_init_alias(mr, OBJECT(d), "sii3112.bar2", &d->mmio, 0xc0, 8);
-pci_register_bar(dev, 2, PCI_BASE_ADDRESS_SPACE_IO, mr);
+memory_region_init_alias(mr, OBJECT(d),

[PATCH] hw/remote: Fix vfu_cfg trace offset format

2023-04-26 Thread Mattias Nissler
The printed offset value is prefixed with 0x, but was actually printed
in decimal. To spare others the confusion, adjust the format specifier
to hexadecimal.

Signed-off-by: Mattias Nissler 
---
 hw/remote/trace-events | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/remote/trace-events b/hw/remote/trace-events
index c167b3c7a5..0d1b7d56a5 100644
--- a/hw/remote/trace-events
+++ b/hw/remote/trace-events
@@ -5,8 +5,8 @@ mpqemu_recv_io_error(int cmd, int size, int nfds) "failed to 
receive %d size %d,
 
 # vfio-user-obj.c
 vfu_prop(const char *prop, const char *val) "vfu: setting %s as %s"
-vfu_cfg_read(uint32_t offset, uint32_t val) "vfu: cfg: 0x%u -> 0x%x"
-vfu_cfg_write(uint32_t offset, uint32_t val) "vfu: cfg: 0x%u <- 0x%x"
+vfu_cfg_read(uint32_t offset, uint32_t val) "vfu: cfg: 0x%x -> 0x%x"
+vfu_cfg_write(uint32_t offset, uint32_t val) "vfu: cfg: 0x%x <- 0x%x"
 vfu_dma_register(uint64_t gpa, size_t len) "vfu: registering GPA 0x%"PRIx64", 
%zu bytes"
 vfu_dma_unregister(uint64_t gpa) "vfu: unregistering GPA 0x%"PRIx64""
 vfu_bar_register(int i, uint64_t addr, uint64_t size) "vfu: BAR %d: addr 
0x%"PRIx64" size 0x%"PRIx64""
-- 
2.25.1




Test-qga:Qga write is blocked when the client is no connected, virtio-serial port is throttled.

2023-04-26 Thread overgo
Hi, I am testing qga pushing messages to libvirt. When the VM is saved and
restored from a memory file, qga hangs in a write.

In flush_buf:

(gdb) f 4
#4  0x55ffc716894a in flush_buf (port=0x55ffc87bf800, buf=, len=)
    at ../hw/char/virtio-console.c:100
100             vcon->watch = qemu_chr_fe_add_watch(&vcon->chr,
(gdb) l
95           * use throttling on host side.
96           */
97          if (!k->is_console) {
98              virtio_serial_throttle_port(port, true);
99              if (!vcon->watch) {
100                 vcon->watch = qemu_chr_fe_add_watch(&vcon->chr,
101                                                     G_IO_OUT|G_IO_HUP,
102                                                     chr_write_unblocked, vcon);
103             }
104         }

vcon->watch is 0, but the port is throttled, so the frontend event is never
processed until qga or libvirtd is restarted. This issue happens when the VM
is migrated in or restored from a saved memory file. So I think it can be
fixed by the following:

100                 vcon->watch = qemu_chr_fe_add_watch(&vcon->chr,
101                                                     G_IO_OUT|G_IO_HUP,
102                                                     chr_write_unblocked, vcon);
+                   if (!vcon->watch)
+                       virtio_serial_throttle_port(port, false);
103             }
104         }

I have tested the code, and it works.


Re: [PATCH 00/21] Hexagon (target/hexagon) short-circuit and move to DisasContext

2023-04-26 Thread Anton Johansson via



On 4/26/23 02:39, Taylor Simpson wrote:

This patch series achieves two major goals
Goal 1:  Short-circuit packet semantics
 In certain cases, we can avoid the overhead of writing to
 hex_new_value and write directly to hex_gpr.

 Here's a simple example of the TCG generated for
 0x004000b4:  0x7800c020 {   R0 = #0x1 }

 BEFORE:
   004000b4
  movi_i32 new_r0,$0x1
  mov_i32 r0,new_r0

 AFTER:
   004000b4
  movi_i32 r0,$0x1
Goal 2:  Move bookkeeping items from CPUHexagonState to DisasContext
 Suggested-by: Richard Henderson 
 Several fields in CPUHexagonState are only used for bookkeeping
 within the translation of a packet.  With recent changes to eliminate
 the need to free TCGv variables, these make more sense to be
 transient and kept in DisasContext.


This patch series can be divided into 3 main parts
Part 1:  Patches 1-9
 Cleanup in preparation for parts 2 and 3
 The main goal is to move functionality out of generated helpers
Part 2:  Patches 10-15
 Short-circuit packet semantics
Part 3:  Patches 16-21
 Move bookkeeping items from CPUHexagonState to DisasContext



Taylor Simpson (21):
   meson.build Add CONFIG_HEXAGON_IDEF_PARSER
   Hexagon (target/hexagon) Add DisasContext arg to gen_log_reg_write
   Hexagon (target/hexagon) Add overrides for loop setup instructions
   Hexagon (target/hexagon) Add overrides for allocframe/deallocframe
   Hexagon (target/hexagon) Add overrides for clr[tf]new
   Hexagon (target/hexagon) Remove log_reg_write from op_helper.[ch]
   Hexagon (target/hexagon) Eliminate uses of log_pred_write function
   Hexagon (target/hexagon) Clean up pred_written usage
   Hexagon (target/hexagon) Don't overlap dest writes with source reads
   Hexagon (target/hexagon) Mark registers as read during packet analysis
   Hexagon (target/hexagon) Short-circuit packet register writes
   Hexagon (target/hexagon) Short-circuit packet predicate writes
   Hexagon (target/hexagon) Short-circuit packet HVX writes
   Hexagon (target/hexagon) Short-circuit more HVX single instruction
 packets
   Hexagon (target/hexagon) Add overrides for disabled idef-parser insns
   Hexagon (target/hexagon) Make special new_value for USR
   Hexagon (target/hexagon) Move new_value to DisasContext
   Hexagon (target/hexagon) Move new_pred_value to DisasContext
   Hexagon (target/hexagon) Move pred_written to DisasContext
   Hexagon (target/hexagon) Move pkt_has_store_s1 to DisasContext
   Hexagon (target/hexagon) Move items to DisasContext

  meson.build |   1 +
  target/hexagon/cpu.h|  10 +-
  target/hexagon/gen_tcg.h| 118 ++-
  target/hexagon/gen_tcg_hvx.h|  23 ++
  target/hexagon/genptr.h |   6 +-
  target/hexagon/helper.h |   6 +-
  target/hexagon/macros.h |  57 ++--
  target/hexagon/op_helper.h  |  16 +-
  target/hexagon/translate.h  |  52 ++-
  target/hexagon/attribs_def.h.inc|   6 +-
  target/hexagon/arch.c   |   3 +-
  target/hexagon/cpu.c|   5 +-
  target/hexagon/genptr.c | 347 
  target/hexagon/idef-parser/parser-helpers.c |   4 +-
  target/hexagon/op_helper.c  | 154 ++---
  target/hexagon/translate.c  | 274 +++-
  tests/tcg/hexagon/hvx_misc.c|  21 ++
  tests/tcg/hexagon/read_write_overlap.c  | 136 
  target/hexagon/README   |   6 +-
  target/hexagon/gen_analyze_funcs.py |  51 ++-
  target/hexagon/gen_helper_funcs.py  |   9 +-
  target/hexagon/gen_helper_protos.py |  10 +-
  target/hexagon/gen_idef_parser_funcs.py |   7 +
  target/hexagon/gen_tcg_funcs.py |  21 +-
  target/hexagon/hex_common.py|  16 +-
  tests/tcg/hexagon/Makefile.target   |   1 +
  26 files changed, 1066 insertions(+), 294 deletions(-)
  create mode 100644 tests/tcg/hexagon/read_write_overlap.c


For some reason this patchset didn't arrive in one piece, see patchew:

https://patchew.org/QEMU/20230426003945.1318446-1-tsimp...@quicinc.com/

--
Anton Johansson,
rev.ng Labs Srl.



