Re: [PULL 18/38] hw/arm/virt: Honor highmem setting when computing the memory map

2022-02-13 Thread Marc Zyngier
[+ Alex for HVF]

On Sun, 13 Feb 2022 05:05:33 +,
Akihiko Odaki  wrote:
> 
> On 2022/01/20 21:36, Peter Maydell wrote:
> > From: Marc Zyngier 
> > 
> > Even when the VM is configured with highmem=off, the highest_gpa
> > field includes devices that are above the 4GiB limit.
> > Similarily, nothing seem to check that the memory is within
> > the limit set by the highmem=off option.
> > 
> > This leads to failures in virt_kvm_type() on systems that have
> > a crippled IPA range, as the reported IPA space is larger than
> > what it should be.
> > 
> > Instead, honor the user-specified limit to only use the devices
> > at the lowest end of the spectrum, and fail if we have memory
> > crossing the 4GiB limit.
> > 
> > Reviewed-by: Andrew Jones 
> > Reviewed-by: Eric Auger 
> > Signed-off-by: Marc Zyngier 
> > Message-id: 20220114140741.1358263-4-...@kernel.org
> > Signed-off-by: Peter Maydell 
> > ---
> >   hw/arm/virt.c | 10 +++---
> >   1 file changed, 7 insertions(+), 3 deletions(-)
> > 
> > diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> > index 62bdce1eb4b..3b839ba78ba 100644
> > --- a/hw/arm/virt.c
> > +++ b/hw/arm/virt.c
> > @@ -1670,7 +1670,7 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState 
> > *vms, int idx)
> >   static void virt_set_memmap(VirtMachineState *vms)
> >   {
> >   MachineState *ms = MACHINE(vms);
> > -hwaddr base, device_memory_base, device_memory_size;
> > +hwaddr base, device_memory_base, device_memory_size, memtop;
> >   int i;
> > vms->memmap = extended_memmap;
> > @@ -1697,7 +1697,11 @@ static void virt_set_memmap(VirtMachineState *vms)
> >   device_memory_size = ms->maxram_size - ms->ram_size + ms->ram_slots * 
> > GiB;
> > /* Base address of the high IO region */
> > -base = device_memory_base + ROUND_UP(device_memory_size, GiB);
> > +memtop = base = device_memory_base + ROUND_UP(device_memory_size, GiB);
> > +if (!vms->highmem && memtop > 4 * GiB) {
> > +error_report("highmem=off, but memory crosses the 4GiB limit\n");
> > +exit(EXIT_FAILURE);
> > +}
> >   if (base < device_memory_base) {
> >   error_report("maxmem/slots too huge");
> >   exit(EXIT_FAILURE);
> > @@ -1714,7 +1718,7 @@ static void virt_set_memmap(VirtMachineState *vms)
> >   vms->memmap[i].size = size;
> >   base += size;
> >   }
> > -vms->highest_gpa = base - 1;
> > +vms->highest_gpa = (vms->highmem ? base : memtop) - 1;
> >   if (device_memory_size > 0) {
> >   ms->device_memory = g_malloc0(sizeof(*ms->device_memory));
> >   ms->device_memory->base = device_memory_base;
> 
> Hi,
> This breaks in a case where highmem is disabled but can have more than
> 4 GiB of RAM. M1 (Apple Silicon) actually can have 36-bit PA with HVF,
> which is not enough for highmem MMIO but is enough to contain 32 GiB
> of RAM.

Funny. The whole point of this series is to make it all work correctly
on M1.

> Where the magic number of 4 GiB / 32-bit came from?

Not exactly a magic number. From QEMU's docs/system/arm/virt.rst:


highmem
  Set ``on``/``off`` to enable/disable placing devices and RAM in physical
  address space above 32 bits. The default is ``on`` for machine types
  later than ``virt-2.12``.


TL;DR: Removing the bogus 'highmem=off' option from your command-line
should get you going with large memory spaces, up to the IPA limit.

The fact that you could run with 32GB of RAM while mandating that the
guest IPA space was limited to 32bit was nothing but a bug, further
"exploited" by HVF to allow disabling the highhmem devices which are
out of reach given the HW limitations (see [1] for details on the
discussion, specially around patch 3).

This is now fixed, and has been extended to work with any IPA size
(including 36bit machines such as M1).

> I also don't quite understand what failures virt_kvm_type() had.

QEMU works by first computing the memory map and passing the required
IPA limit to KVM as part of the VM type. By failing to take into
account the initial limit requirements to the IPA space (either via a
command-line option such as 'highmem', or by using the value provided
by KVM itself), QEMU would try to create a VM that cannot run on the
HW, and KVM would simply return an error.

All of this is documented as part of the KVM/arm64 API [2]. And with
this fixed, QEMU is able to correctly drive KVM on M1.

M.

[1] https://lore.kernel.org/all/2021082211.1290891-1-...@kernel.org
[2] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/virt/kvm/api.rst#n138

-- 
Without deviation from the norm, progress is not possible.



Re: [PULL v2 0/7] nbd: handle AioContext change correctly

2022-02-13 Thread Peter Maydell
On Fri, 11 Feb 2022 at 13:22, Vladimir Sementsov-Ogievskiy
 wrote:
>
> The following changes since commit 0a301624c2f4ced3331ffd5bce85b4274fe132af:
>
>   Merge remote-tracking branch 
> 'remotes/pmaydell/tags/pull-target-arm-20220208' into staging (2022-02-08 
> 11:40:08 +)
>
> are available in the Git repository at:
>
>   https://src.openvz.org/scm/~vsementsov/qemu.git tags/pull-nbd-2022-02-09-v2
>
> for you to fetch changes up to 8cfbe929e8c26050f0a4580a1606a370a947d4ce:
>
>   iotests/281: Let NBD connection yield in iothread (2022-02-11 14:06:18 
> +0100)
>
> 
> nbd: handle AioContext change correctly
>
> v2: add my s-o-b marks to each commit
>
> 


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/7.0
for any user-visible changes.

-- PMM



Re: [PULL 18/38] hw/arm/virt: Honor highmem setting when computing the memory map

2022-02-13 Thread Peter Maydell
On Sun, 13 Feb 2022 at 10:22, Marc Zyngier  wrote:
>
> [+ Alex for HVF]
>
> On Sun, 13 Feb 2022 05:05:33 +,
> Akihiko Odaki  wrote:
> > Hi,
> > This breaks in a case where highmem is disabled but can have more than
> > 4 GiB of RAM. M1 (Apple Silicon) actually can have 36-bit PA with HVF,
> > which is not enough for highmem MMIO but is enough to contain 32 GiB
> > of RAM.
>
> Funny. The whole point of this series is to make it all work correctly
> on M1.
>
> > Where the magic number of 4 GiB / 32-bit came from?
>
> Not exactly a magic number. From QEMU's docs/system/arm/virt.rst:
>
> 
> highmem
>   Set ``on``/``off`` to enable/disable placing devices and RAM in physical
>   address space above 32 bits. The default is ``on`` for machine types
>   later than ``virt-2.12``.
> 
>
> TL;DR: Removing the bogus 'highmem=off' option from your command-line
> should get you going with large memory spaces, up to the IPA limit.

Yep. I've tested this with hvf, and we now correctly:
 * refuse to put RAM above 32-bits if you asked for a 32-bit
   IPA space with highmem=off
 * use the full 36-bit address space if you don't say highmem=off
   on an M1

Note that there is a macos bug where if you don't say highmem=off
on an M1 Pro then you'll get a macos kernel panic. M1 non-Pro is fine.

thanks
-- PMM



Re: [PATCH] ui/cocoa: Fix the leak of qemu_console_get_label

2022-02-13 Thread BALATON Zoltan

On Sun, 13 Feb 2022, Akihiko Odaki wrote:

Signed-off-by: Akihiko Odaki 
---
ui/cocoa.m | 5 -
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/ui/cocoa.m b/ui/cocoa.m
index ac18e14ce01..fdf52a7c2f7 100644
--- a/ui/cocoa.m
+++ b/ui/cocoa.m
@@ -1680,7 +1680,10 @@ static void create_initial_menus(void)
/* Returns a name for a given console */
static NSString * getConsoleName(QemuConsole * console)
{
-return [NSString stringWithFormat: @"%s", qemu_console_get_label(console)];
+char *label = qemu_console_get_label(console);


I guess you could do g_autofree char *label to save a g_free but not a big 
deal and only saves one line here so it's also good as it is.


Regards,
BALATON Zoltan


+NSString *nslabel = [NSString stringWithUTF8String:label];
+g_free(label);
+return nslabel;
}

/* Add an entry to the View menu for each console */





Re: [PULL 18/38] hw/arm/virt: Honor highmem setting when computing the memory map

2022-02-13 Thread Akihiko Odaki
On Sun, Feb 13, 2022 at 7:46 PM Peter Maydell  wrote:
>
> On Sun, 13 Feb 2022 at 10:22, Marc Zyngier  wrote:
> >
> > [+ Alex for HVF]
> >
> > On Sun, 13 Feb 2022 05:05:33 +,
> > Akihiko Odaki  wrote:
> > > Hi,
> > > This breaks in a case where highmem is disabled but can have more than
> > > 4 GiB of RAM. M1 (Apple Silicon) actually can have 36-bit PA with HVF,
> > > which is not enough for highmem MMIO but is enough to contain 32 GiB
> > > of RAM.
> >
> > Funny. The whole point of this series is to make it all work correctly
> > on M1.
> >
> > > Where the magic number of 4 GiB / 32-bit came from?
> >
> > Not exactly a magic number. From QEMU's docs/system/arm/virt.rst:
> >
> > 
> > highmem
> >   Set ``on``/``off`` to enable/disable placing devices and RAM in physical
> >   address space above 32 bits. The default is ``on`` for machine types
> >   later than ``virt-2.12``.
> > 
> >
> > TL;DR: Removing the bogus 'highmem=off' option from your command-line
> > should get you going with large memory spaces, up to the IPA limit.
>
> Yep. I've tested this with hvf, and we now correctly:
>  * refuse to put RAM above 32-bits if you asked for a 32-bit
>IPA space with highmem=off
>  * use the full 36-bit address space if you don't say highmem=off
>on an M1
>
> Note that there is a macos bug where if you don't say highmem=off
> on an M1 Pro then you'll get a macos kernel panic. M1 non-Pro is fine.
>
> thanks
> -- PMM

I found that it actually gets the available PA bit of the emulated CPU
when highmem=on. I used "cortex-a72", which can have more than 36
bits. I just simply switched to "host"; hvf didn't support "host" when
I set up my VM but now it does.
Thanks for your prompt replies.

Regards,
Akihiko Odaki



Re: [PULL 18/38] hw/arm/virt: Honor highmem setting when computing the memory map

2022-02-13 Thread Peter Maydell
On Sun, 13 Feb 2022 at 11:38, Akihiko Odaki  wrote:
> I found that it actually gets the available PA bit of the emulated CPU
> when highmem=on. I used "cortex-a72", which can have more than 36
> bits. I just simply switched to "host"; hvf didn't support "host" when
> I set up my VM but now it does.

It's a bug that we accept 'cortex-a72' there. What should happen
is something like:
 * we want to use the ID register values of a cortex-a72
 * QEMU's hvf layer should say "no, that doesn't match the actual
   CPU we're running on", and give an error

This works correctly with KVM because there the kernel refuses
attempts to set ID registers to values that don't match the host;
for hvf the hvf APIs do permit lying to the guest about ID register
values so we need to do the check ourselves.

(The other approach would be to check the ID register values
and allow them to the extent that the host CPU actually has
the support for the features they imply, so you could "downgrade"
to a less capable CPU but not tell the guest it has feature X
if it isn't really there. But this is (a) a lot more complicated
and (b) gets into the swamp of trying to figure out how to tell
the guest about CPU errata -- the guest needs to apply errata
workarounds for the real host CPU, not for whatever the emulated
CPU is. So "just reject anything that's not an exact match" is
the easy approach.)

-- PMM



Re: [PATCH v2 1/5] malta: Move PCI interrupt handling from gt64xxx to piix4

2022-02-13 Thread Bernhard Beschow
Am 12. Februar 2022 17:44:30 MEZ schrieb BALATON Zoltan :
>On Sat, 12 Feb 2022, BALATON Zoltan wrote:
>> On Sat, 12 Feb 2022, Bernhard Beschow wrote:
>>> Handling PCI interrupts in piix4 increases cohesion and reduces differences
>>> between piix4 and piix3.
>>> 
>>> Signed-off-by: Bernhard Beschow 
>>> Reviewed-by: Philippe Mathieu-Daudé 
>>
>> Sorry for being late in commenting, I've missed the first round. Apologies 
>> if 
>> this causes a delay or another version.
>>
>>> ---
>>> hw/isa/piix4.c | 58 +++
>>> hw/mips/gt64xxx_pci.c  | 62 --
>>> hw/mips/malta.c|  6 +---
>>> include/hw/mips/mips.h |  2 +-
>>> 4 files changed, 65 insertions(+), 63 deletions(-)
>>> 
>>> diff --git a/hw/isa/piix4.c b/hw/isa/piix4.c
>>> index 0fe7b69bc4..5a86308689 100644
>>> --- a/hw/isa/piix4.c
>>> +++ b/hw/isa/piix4.c
>>> @@ -45,6 +45,7 @@ struct PIIX4State {
>>> PCIDevice dev;
>>> qemu_irq cpu_intr;
>>> qemu_irq *isa;
>>> +qemu_irq i8259[ISA_NUM_IRQS];
>>>
>>> RTCState rtc;
>>> /* Reset Control Register */
>>> @@ -54,6 +55,30 @@ struct PIIX4State {
>>> 
>>> OBJECT_DECLARE_SIMPLE_TYPE(PIIX4State, PIIX4_PCI_DEVICE)
>>> 
>>> +static int pci_irq_levels[4];
>>> +
>>> +static void piix4_set_irq(void *opaque, int irq_num, int level)
>>> +{
>>> +int i, pic_irq, pic_level;
>>> +qemu_irq *pic = opaque;
>>> +
>>> +pci_irq_levels[irq_num] = level;
>>> +
>>> +/* now we change the pic irq level according to the piix irq mappings 
>>> */
>>> +/* XXX: optimize */
>>> +pic_irq = piix4_dev->config[PIIX_PIRQCA + irq_num];
>>> +if (pic_irq < 16) {
>>> +/* The pic level is the logical OR of all the PCI irqs mapped to 
>>> it. */
>>> +pic_level = 0;
>>> +for (i = 0; i < 4; i++) {
>>> +if (pic_irq == piix4_dev->config[PIIX_PIRQCA + i]) {
>>> +pic_level |= pci_irq_levels[i];
>>> +}
>>> +}
>>> +qemu_set_irq(pic[pic_irq], pic_level);
>>> +}
>>> +}
>>> +
>>> static void piix4_isa_reset(DeviceState *dev)
>>> {
>>> PIIX4State *d = PIIX4_PCI_DEVICE(dev);
>>> @@ -248,8 +273,34 @@ static void piix4_register_types(void)
>>> 
>>> type_init(piix4_register_types)
>>> 
>>> +static int pci_slot_get_pirq(PCIDevice *pci_dev, int irq_num)
>>> +{
>>> +int slot;
>>> +
>>> +slot = PCI_SLOT(pci_dev->devfn);
>>> +
>>> +switch (slot) {
>>> +/* PIIX4 USB */
>>> +case 10:
>>> +return 3;
>>> +/* AMD 79C973 Ethernet */
>>> +case 11:
>>> +return 1;
>>> +/* Crystal 4281 Sound */
>>> +case 12:
>>> +return 2;
>>> +/* PCI slot 1 to 4 */
>>> +case 18 ... 21:
>>> +return ((slot - 18) + irq_num) & 0x03;
>>> +/* Unknown device, don't do any translation */
>>> +default:
>>> +return irq_num;
>>> +}
>>> +}
>>> +
>>> DeviceState *piix4_create(PCIBus *pci_bus, ISABus **isa_bus, I2CBus 
>>> **smbus)
>>> {
>>> +PIIX4State *s;
>>> PCIDevice *pci;
>>> DeviceState *dev;
>>> int devfn = PCI_DEVFN(10, 0);
>>> @@ -257,6 +308,7 @@ DeviceState *piix4_create(PCIBus *pci_bus, ISABus 
>>> **isa_bus, I2CBus **smbus)
>>> pci = pci_create_simple_multifunction(pci_bus, devfn,  true,
>>>   TYPE_PIIX4_PCI_DEVICE);
>>> dev = DEVICE(pci);
>>> +s = PIIX4_PCI_DEVICE(pci);
>>> if (isa_bus) {
>>> *isa_bus = ISA_BUS(qdev_get_child_bus(dev, "isa.0"));
>>> }
>>> @@ -271,5 +323,11 @@ DeviceState *piix4_create(PCIBus *pci_bus, ISABus 
>>> **isa_bus, I2CBus **smbus)
>>>NULL, 0, NULL);
>>> }
>>> 
>>> +pci_bus_irqs(pci_bus, piix4_set_irq, pci_slot_get_pirq, s->i8259, 4);
>>> +
>>> +for (int i = 0; i < ISA_NUM_IRQS; i++) {
>
>Sorry one more nit. This is code movement but are we OK with declaring 
>local variable within for? I thinks coding style advises against this, not 
>sure if checkpatch accepts it or not. This could be cleaned up the next 
>patch together with removing the i8259 field which are simple enough to do 
>in one patch then this one stays as code movement and changes in next 
>patch.

I'll move the i8259-removing patch right after the code movement patch where 
this loop is removed entirely.

>>> +s->i8259[i] = qdev_get_gpio_in_named(dev, "isa", i);
>>> +}
>>> +
>>> return dev;
>>> }
>>> diff --git a/hw/mips/gt64xxx_pci.c b/hw/mips/gt64xxx_pci.c
>[...]
>>> diff --git a/include/hw/mips/mips.h b/include/hw/mips/mips.h
>>> index 6c9c8805f3..ff88942e63 100644
>>> --- a/include/hw/mips/mips.h
>>> +++ b/include/hw/mips/mips.h
>>> @@ -10,7 +10,7 @@
>>> #include "exec/memory.h"
>>> 
>>> /* gt64xxx.c */
>>> -PCIBus *gt64120_register(qemu_irq *pic);
>>> +PCIBus *gt64120_register(void);
>>
>> Now that you don't need to pass anything to it, do you still need this 
>> function? Maybe what it does now could be done in the gt64120 device's 
>> realize functi

bdrv_is_allocated

2022-02-13 Thread 沈梦姣
Hi,
I’m trying to understand this function, but seems no note in the header file, 
could anyone help explain this function? It will be great if there is an 
example. Thanks in advance!

thanks 


Re: [PATCH v2 2/5] pci: Always pass own DeviceState to pci_map_irq_fn's

2022-02-13 Thread Bernhard Beschow
Am 12. Februar 2022 17:13:19 MEZ schrieb BALATON Zoltan :
>On Sat, 12 Feb 2022, Bernhard Beschow wrote:
>> Am 12. Februar 2022 14:27:32 MEZ schrieb BALATON Zoltan :
>>> On Sat, 12 Feb 2022, Bernhard Beschow wrote:
 Passing own DeviceState rather than just the IRQs allows for resolving
 global variables.
>>>
>>> Do you mean pci_set_irq_fn instead of pci_map_irq_fn in the patch title?
>>
>> I'm referring to the typedef in pci.h because the patch establishes
>> consistency across the board.
>>
 Signed-off-by: Bernhard Beschow 
 Reviewed-by: Peter Maydell 
 Reviewed-by: Philippe Mathieu-Daudé 
 ---
 hw/isa/piix4.c  | 6 +++---
 hw/pci-host/sh_pci.c| 6 +++---
 hw/pci-host/versatile.c | 6 +++---
 hw/ppc/ppc440_pcix.c| 6 +++---
 hw/ppc/ppc4xx_pci.c | 6 +++---
 5 files changed, 15 insertions(+), 15 deletions(-)
>>>
>>> You don't seem to change any global function definition that reqires this
>>> change in all these devices so why can't these decide on their own what
>>> their preferred opaque data is for their set irq function and only change
>>> piix4? Am I missing something? I'm not opposed to this change but it seems
>>> to be unnecessary to touch other device implementations for this and may
>>> just make them more complex for no good reason.
>>
>> This patch is a segway into a direction where the type of the opaque 
>> parameter
>> of the pci_map_irq_fn typedef could be changed from void* to DeviceState* in
>
>I'm still not quite sure what you mean considering these typedefs in 
>include/hw/pci/pci.h:
>
>typedef void (*pci_set_irq_fn)(void *opaque, int irq_num, int level);
>typedef int (*pci_map_irq_fn)(PCIDevice *pci_dev, int irq_num);
>typedef PCIINTxRoute (*pci_route_irq_fn)(void *opaque, int pin);
>
>pci_map_irq_fn already has PCIDevice *, it's pci_set_irq_fn that has void 
>*opaque and is what this patch appears to be changing. Am I looking at the 
>wrong typedefs?

Oh sorry, I mixed things up. You're correct: I meant pci_set_irq_fn.

>> order to facilitate the more typesafe QOM casting. I see, though, why this is
>> confusing in the context of this patch series. I don't have strong feelings 
>> for
>> converting the other devices or not. So I can only change piix4 if desired.
>
>Yes that seems to be an independent cahange so maybe it's better to just 
>change piix4 in this series and then have a separate patch for changing 
>the void *opaque to DeviceState * independent of this series. But I'm not 
>sure that's necessarily a good idea. It may give some more checks but for 
>callbacks used internally by device implementations I think it can be 
>expected that code that registers the callback also knows what its opaque 
>data should be so it does not have to be checked additionally, especially 
>in code that may be called frequently. Although in a similar via-ide 
>callback I could not measure a big penalty the last time I tried but I 
>suspect there still mey be some overhead involving QOM casts (maybe there 
>are just bigger bottle necks in ide emulation so it was masked) so I'm not 
>sure it's worth the added complexity but if others prefer that I'm not 
>that much opposed to it but it's clearer as a separate change anyway.

I'll just change piix4, leaving the other devices as is. This also allows for 
merging this patch with the next.

Regards,
Bernhard

>Regards,
>BALATON Zoltan



Re: [PATCH v2 1/5] malta: Move PCI interrupt handling from gt64xxx to piix4

2022-02-13 Thread Bernhard Beschow
Am 12. Februar 2022 14:18:54 MEZ schrieb BALATON Zoltan :
>On Sat, 12 Feb 2022, Bernhard Beschow wrote:
>> Handling PCI interrupts in piix4 increases cohesion and reduces differences
>> between piix4 and piix3.
>>
>> Signed-off-by: Bernhard Beschow 
>> Reviewed-by: Philippe Mathieu-Daudé 
>
>Sorry for being late in commenting, I've missed the first round. Apologies 
>if this causes a delay or another version.

Don't worry. Your comments are appreciated!

>> ---
>> hw/isa/piix4.c | 58 +++
>> hw/mips/gt64xxx_pci.c  | 62 --
>> hw/mips/malta.c|  6 +---
>> include/hw/mips/mips.h |  2 +-
>> 4 files changed, 65 insertions(+), 63 deletions(-)
>>
>> diff --git a/hw/isa/piix4.c b/hw/isa/piix4.c
>> index 0fe7b69bc4..5a86308689 100644
>> --- a/hw/isa/piix4.c
>> +++ b/hw/isa/piix4.c
>> @@ -45,6 +45,7 @@ struct PIIX4State {
>> PCIDevice dev;
>> qemu_irq cpu_intr;
>> qemu_irq *isa;
>> +qemu_irq i8259[ISA_NUM_IRQS];
>>
>> RTCState rtc;
>> /* Reset Control Register */
>> @@ -54,6 +55,30 @@ struct PIIX4State {
>>
>> OBJECT_DECLARE_SIMPLE_TYPE(PIIX4State, PIIX4_PCI_DEVICE)
>>
>> +static int pci_irq_levels[4];
>> +
>> +static void piix4_set_irq(void *opaque, int irq_num, int level)
>> +{
>> +int i, pic_irq, pic_level;
>> +qemu_irq *pic = opaque;
>> +
>> +pci_irq_levels[irq_num] = level;
>> +
>> +/* now we change the pic irq level according to the piix irq mappings */
>> +/* XXX: optimize */
>> +pic_irq = piix4_dev->config[PIIX_PIRQCA + irq_num];
>> +if (pic_irq < 16) {
>> +/* The pic level is the logical OR of all the PCI irqs mapped to 
>> it. */
>> +pic_level = 0;
>> +for (i = 0; i < 4; i++) {
>> +if (pic_irq == piix4_dev->config[PIIX_PIRQCA + i]) {
>> +pic_level |= pci_irq_levels[i];
>> +}
>> +}
>> +qemu_set_irq(pic[pic_irq], pic_level);
>> +}
>> +}
>> +
>> static void piix4_isa_reset(DeviceState *dev)
>> {
>> PIIX4State *d = PIIX4_PCI_DEVICE(dev);
>> @@ -248,8 +273,34 @@ static void piix4_register_types(void)
>>
>> type_init(piix4_register_types)
>>
>> +static int pci_slot_get_pirq(PCIDevice *pci_dev, int irq_num)
>> +{
>> +int slot;
>> +
>> +slot = PCI_SLOT(pci_dev->devfn);
>> +
>> +switch (slot) {
>> +/* PIIX4 USB */
>> +case 10:
>> +return 3;
>> +/* AMD 79C973 Ethernet */
>> +case 11:
>> +return 1;
>> +/* Crystal 4281 Sound */
>> +case 12:
>> +return 2;
>> +/* PCI slot 1 to 4 */
>> +case 18 ... 21:
>> +return ((slot - 18) + irq_num) & 0x03;
>> +/* Unknown device, don't do any translation */
>> +default:
>> +return irq_num;
>> +}
>> +}
>> +
>> DeviceState *piix4_create(PCIBus *pci_bus, ISABus **isa_bus, I2CBus **smbus)
>> {
>> +PIIX4State *s;
>> PCIDevice *pci;
>> DeviceState *dev;
>> int devfn = PCI_DEVFN(10, 0);
>> @@ -257,6 +308,7 @@ DeviceState *piix4_create(PCIBus *pci_bus, ISABus 
>> **isa_bus, I2CBus **smbus)
>> pci = pci_create_simple_multifunction(pci_bus, devfn,  true,
>>   TYPE_PIIX4_PCI_DEVICE);
>> dev = DEVICE(pci);
>> +s = PIIX4_PCI_DEVICE(pci);
>> if (isa_bus) {
>> *isa_bus = ISA_BUS(qdev_get_child_bus(dev, "isa.0"));
>> }
>> @@ -271,5 +323,11 @@ DeviceState *piix4_create(PCIBus *pci_bus, ISABus 
>> **isa_bus, I2CBus **smbus)
>>NULL, 0, NULL);
>> }
>>
>> +pci_bus_irqs(pci_bus, piix4_set_irq, pci_slot_get_pirq, s->i8259, 4);
>> +
>> +for (int i = 0; i < ISA_NUM_IRQS; i++) {
>> +s->i8259[i] = qdev_get_gpio_in_named(dev, "isa", i);
>> +}
>> +
>> return dev;
>> }
>> diff --git a/hw/mips/gt64xxx_pci.c b/hw/mips/gt64xxx_pci.c
>> index c7480bd019..9e23e32eff 100644
>> --- a/hw/mips/gt64xxx_pci.c
>> +++ b/hw/mips/gt64xxx_pci.c
>> @@ -981,56 +981,6 @@ static const MemoryRegionOps isd_mem_ops = {
>> },
>> };
>>
>> -static int gt64120_pci_map_irq(PCIDevice *pci_dev, int irq_num)
>> -{
>> -int slot;
>> -
>> -slot = PCI_SLOT(pci_dev->devfn);
>> -
>> -switch (slot) {
>> -/* PIIX4 USB */
>> -case 10:
>> -return 3;
>> -/* AMD 79C973 Ethernet */
>> -case 11:
>> -return 1;
>> -/* Crystal 4281 Sound */
>> -case 12:
>> -return 2;
>> -/* PCI slot 1 to 4 */
>> -case 18 ... 21:
>> -return ((slot - 18) + irq_num) & 0x03;
>> -/* Unknown device, don't do any translation */
>> -default:
>> -return irq_num;
>> -}
>> -}
>> -
>> -static int pci_irq_levels[4];
>> -
>> -static void gt64120_pci_set_irq(void *opaque, int irq_num, int level)
>> -{
>> -int i, pic_irq, pic_level;
>> -qemu_irq *pic = opaque;
>> -
>> -pci_irq_levels[irq_num] = level;
>> -
>> -/* now we change the pic irq level according to the piix irq mappings */
>> -/* XXX: optimize */
>> -

Re: [PATCH v2 2/5] pci: Always pass own DeviceState to pci_map_irq_fn's

2022-02-13 Thread BALATON Zoltan

On Sun, 13 Feb 2022, Bernhard Beschow wrote:

Am 12. Februar 2022 17:13:19 MEZ schrieb BALATON Zoltan :

On Sat, 12 Feb 2022, Bernhard Beschow wrote:

Am 12. Februar 2022 14:27:32 MEZ schrieb BALATON Zoltan :

On Sat, 12 Feb 2022, Bernhard Beschow wrote:

Passing own DeviceState rather than just the IRQs allows for resolving
global variables.


Do you mean pci_set_irq_fn instead of pci_map_irq_fn in the patch title?


I'm referring to the typedef in pci.h because the patch establishes
consistency across the board.


Signed-off-by: Bernhard Beschow 
Reviewed-by: Peter Maydell 
Reviewed-by: Philippe Mathieu-Daudé 
---
hw/isa/piix4.c  | 6 +++---
hw/pci-host/sh_pci.c| 6 +++---
hw/pci-host/versatile.c | 6 +++---
hw/ppc/ppc440_pcix.c| 6 +++---
hw/ppc/ppc4xx_pci.c | 6 +++---
5 files changed, 15 insertions(+), 15 deletions(-)


You don't seem to change any global function definition that reqires this
change in all these devices so why can't these decide on their own what
their preferred opaque data is for their set irq function and only change
piix4? Am I missing something? I'm not opposed to this change but it seems
to be unnecessary to touch other device implementations for this and may
just make them more complex for no good reason.


This patch is a segway into a direction where the type of the opaque parameter
of the pci_map_irq_fn typedef could be changed from void* to DeviceState* in


I'm still not quite sure what you mean considering these typedefs in
include/hw/pci/pci.h:

typedef void (*pci_set_irq_fn)(void *opaque, int irq_num, int level);
typedef int (*pci_map_irq_fn)(PCIDevice *pci_dev, int irq_num);
typedef PCIINTxRoute (*pci_route_irq_fn)(void *opaque, int pin);

pci_map_irq_fn already has PCIDevice *, it's pci_set_irq_fn that has void
*opaque and is what this patch appears to be changing. Am I looking at the
wrong typedefs?


Oh sorry, I mixed things up. You're correct: I meant pci_set_irq_fn.


order to facilitate the more typesafe QOM casting. I see, though, why this is
confusing in the context of this patch series. I don't have strong feelings for
converting the other devices or not. So I can only change piix4 if desired.


Yes that seems to be an independent cahange so maybe it's better to just
change piix4 in this series and then have a separate patch for changing
the void *opaque to DeviceState * independent of this series. But I'm not
sure that's necessarily a good idea. It may give some more checks but for
callbacks used internally by device implementations I think it can be
expected that code that registers the callback also knows what its opaque
data should be so it does not have to be checked additionally, especially
in code that may be called frequently. Although in a similar via-ide
callback I could not measure a big penalty the last time I tried but I
suspect there still mey be some overhead involving QOM casts (maybe there
are just bigger bottle necks in ide emulation so it was masked) so I'm not
sure it's worth the added complexity but if others prefer that I'm not
that much opposed to it but it's clearer as a separate change anyway.


I'll just change piix4, leaving the other devices as is. This also allows for 
merging this patch with the next.


I'm not sure what will be the next patch after all the changes we were 
talking about but don't put too many changes in one patch. It may worth 
keeping this as a separate change even if it's simple and only affecting 
piix4 now just to make it clear and simpler to review and maybe bisect 
later. Generally one change per patch is preferred even if it results in a 
lot of small patches (unless maybe combining a small style fix or very 
simple one line change with another simple patch that may not worth a 
separate patch). This may be over that limit so better to keep as separate 
patch but again not sure what patch you meant to combine it with but the 
patch submission guidelines say separate changes should be separate 
patches.


Regards,
BALATON Zoltan

Re: bdrv_is_allocated

2022-02-13 Thread Philippe Mathieu-Daudé via

On 13/2/22 15:24, 沈梦姣 wrote:

Hi,
I’m trying to understand this function, but seems no note in the header file, 
could anyone help explain this function? It will be great if there is an 
example. Thanks in advance!

thanks


Cc'ing qemu-block@ list.



Re: [PULL 0/5] 9p queue 2022-02-10

2022-02-13 Thread Peter Maydell
On Thu, 10 Feb 2022 at 11:33, Christian Schoenebeck
 wrote:
>
> The following changes since commit 0a301624c2f4ced3331ffd5bce85b4274fe132af:
>
>   Merge remote-tracking branch 
> 'remotes/pmaydell/tags/pull-target-arm-20220208' into staging (2022-02-08 
> 11:40:08 +)
>
> are available in the Git repository at:
>
>   https://github.com/cschoenebeck/qemu.git tags/pull-9p-20220210
>
> for you to fetch changes up to de19c79dad6a2cad54ae04ce754d47c07bf9bc93:
>
>   9pfs: Fix segfault in do_readdir_many caused by struct dirent overread 
> (2022-02-10 11:56:01 +0100)
>
> 
> 9pfs: fixes and cleanup
>
> * Fifth patch fixes a 9pfs server crash that happened on some systems due
>   to incorrect (system dependant) handling of struct dirent size.
>
> * Tests: Second patch fixes a test error that happened on some systems due
>   mkdir() being called twice for creating the test directory for the 9p
>   'local' tests.
>
> * Tests: Third patch fixes a memory leak.
>
> * Tests: The remaining two patches are code cleanup.
>
> 

Hi; this fails CI for the build-oss-fuzz job, which finds
a heap-buffer-overflow:
https://gitlab.com/qemu-project/qemu/-/jobs/2087610013

8/152 qemu:qtest+qtest-i386 / qtest-i386/qos-test ERROR 66.74s killed
by signal 6 SIGABRT
>>> QTEST_QEMU_BINARY=./qemu-system-i386 
>>> QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon 
>>> MALLOC_PERTURB_=120 
>>> G_TEST_DBUS_DAEMON=/builds/qemu-project/qemu/tests/dbus-vmstate-daemon.sh 
>>> QTEST_QEMU_IMG=./qemu-img 
>>> /builds/qemu-project/qemu/build-oss-fuzz/tests/qtest/qos-test --tap -k
― ✀ ―
Listing only the last 100 lines from a long log.
For details see https://github.com/google/sanitizers/issues/189
==7270==WARNING: ASan doesn't fully support makecontext/swapcontext
functions and may produce false positives in some cases!
==7270==WARNING: ASan is ignoring requested __asan_handle_no_return:
stack type: default top: 0x7ffc79fb; bottom 0x7ff908ffd000; size:
0x000370fb3000 (14780411904)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
==7276==WARNING: ASan doesn't fully support makecontext/swapcontext
functions and may produce false positives in some cases!
==7276==WARNING: ASan is ignoring requested __asan_handle_no_return:
stack type: default top: 0x7fff7e4a8000; bottom 0x7fd6363fd000; size:
0x0029480ab000 (177302319104)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
==7282==WARNING: ASan doesn't fully support makecontext/swapcontext
functions and may produce false positives in some cases!
==7282==WARNING: ASan is ignoring requested __asan_handle_no_return:
stack type: default top: 0x7ffee6e7f000; bottom 0x7f32fb5fd000; size:
0x00cbeb882000 (875829927936)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
==7288==WARNING: ASan doesn't fully support makecontext/swapcontext
functions and may produce false positives in some cases!
==7288==WARNING: ASan is ignoring requested __asan_handle_no_return:
stack type: default top: 0x7ffc6118e000; bottom 0x7f6391cfd000; size:
0x0098cf491000 (656312700928)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
==7294==WARNING: ASan doesn't fully support makecontext/swapcontext
functions and may produce false positives in some cases!
==7294==WARNING: ASan is ignoring requested __asan_handle_no_return:
stack type: default top: 0x7ffef665d000; bottom 0x7f69dc8fd000; size:
0x009519d6 (640383582208)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
==7300==WARNING: ASan doesn't fully support makecontext/swapcontext
functions and may produce false positives in some cases!
==7300==WARNING: ASan is ignoring requested __asan_handle_no_return:
stack type: default top: 0x7ffe33db; bottom 0x7f01421fd000; size:
0x00fcf1bb3000 (1086387335168)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
==7306==WARNING: ASan doesn't fully support makecontext/swapcontext
functions and may produce false positives in some cases!
==7306==WARNING: ASan is ignoring requested __asan_handle_no_return:
stack type: default top: 0x7ffebd618000; bottom 0x7ff1179fd000; size:
0x000da5c1b000 (58615508992)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
=
==7306==ERROR: AddressSanitizer: heap-buffer-overflow on address
0x61230768 at pc 0x562351066c74 bp 0x7ff1078c3a90 sp
0x7ff1078c3240
READ of size 48830 at 0x61230768 thread T4
#0 0x562351066c73 in __interceptor_memcpy.part.0 asan_intercep

Re: [PATCH] hw/virtio: vdpa: Fix leak of host-notifier memory-region

2022-02-13 Thread Jason Wang
On Sat, Feb 12, 2022 at 1:03 AM Laurent Vivier  wrote:
>
> If call virtio_queue_set_host_notifier_mr fails, should free
> host-notifier memory-region.
>
> This problem can trigger a coredump with some vDPA drivers (mlx5,
> but not with the vdpasim), if we unplug the virtio-net card from
> the guest after a stop/start.
>
> The same fix has been done for vhost-user:
>   1f89d3b91e3e ("hw/virtio: Fix leak of host-notifier memory-region")
>
> Fixes: d0416d487bd5 ("vhost-vdpa: map virtqueue notification area if 
> possible")
> Cc: jasow...@redhat.com
> Resolves: https://bugzilla.redhat.com/2027208
> Signed-off-by: Laurent Vivier 

Cc: qemu-sta...@nongnu.org
Acked-by: Jason Wang 

> ---
>  hw/virtio/vhost-vdpa.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 04ea43704f5d..11f696468dc1 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -431,6 +431,7 @@ static int vhost_vdpa_host_notifier_init(struct vhost_dev 
> *dev, int queue_index)
>  g_free(name);
>
>  if (virtio_queue_set_host_notifier_mr(vdev, queue_index, &n->mr, true)) {
> +object_unparent(OBJECT(&n->mr));
>  munmap(addr, page_size);
>  goto err;
>  }
> --
> 2.34.1
>




Re: [RFC] vhost-vdpa: make notifiers _init()/_uninit() symmetric

2022-02-13 Thread Jason Wang
On Sat, Feb 12, 2022 at 12:13 AM Laurent Vivier  wrote:
>
> vhost_vdpa_host_notifiers_init() initializes queue notifiers
> for queues "dev->vq_index" to queue "dev->vq_index + dev->nvqs",
> whereas vhost_vdpa_host_notifiers_uninit() uninitializes the
> same notifiers for queue "0" to queue "dev->nvqs".
>
> This asymmetry seems buggy, fix that by using dev->vq_index
> as the base for both.
>
> Fixes: d0416d487bd5 ("vhost-vdpa: map virtqueue notification area if 
> possible")
> Cc: jasow...@redhat.com
> Signed-off-by: Laurent Vivier 
> ---
>  hw/virtio/vhost-vdpa.c | 20 ++--
>  1 file changed, 10 insertions(+), 10 deletions(-)
>
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 04ea43704f5d..9be3dc66580c 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -395,15 +395,6 @@ static void vhost_vdpa_host_notifier_uninit(struct 
> vhost_dev *dev,
>  }
>  }
>
> -static void vhost_vdpa_host_notifiers_uninit(struct vhost_dev *dev, int n)
> -{
> -int i;
> -
> -for (i = 0; i < n; i++) {
> -vhost_vdpa_host_notifier_uninit(dev, i);
> -}
> -}
> -
>  static int vhost_vdpa_host_notifier_init(struct vhost_dev *dev, int 
> queue_index)
>  {
>  size_t page_size = qemu_real_host_page_size;
> @@ -442,6 +433,15 @@ err:
>  return -1;
>  }
>
> +static void vhost_vdpa_host_notifiers_uninit(struct vhost_dev *dev, int n)
> +{
> +int i;
> +
> +for (i = dev->vq_index; i < dev->vq_index + n; i++) {
> +vhost_vdpa_host_notifier_uninit(dev, i);
> +}
> +}

Patch looks good but I wonder why we need to move this function?

Thanks

> +
>  static void vhost_vdpa_host_notifiers_init(struct vhost_dev *dev)
>  {
>  int i;
> @@ -455,7 +455,7 @@ static void vhost_vdpa_host_notifiers_init(struct 
> vhost_dev *dev)
>  return;
>
>  err:
> -vhost_vdpa_host_notifiers_uninit(dev, i);
> +vhost_vdpa_host_notifiers_uninit(dev, i - dev->vq_index);
>  return;
>  }
>
> --
> 2.34.1
>




Re: [PATCH v4 12/13] ui/cocoa: Remove allowedFileTypes restriction in SavePanel

2022-02-13 Thread Cameron Esfahani
Reviewed-by: Cameron Esfahani mailto:di...@apple.com>>


> On Feb 11, 2022, at 8:34 AM, Philippe Mathieu-Daudé via 
>  wrote:
> 
> setAllowedFileTypes is deprecated in macOS 12.
> 
> Per Akihiko Odaki [*]:
> 
>  An image file, which is being chosen by the panel, can be a
>  raw file and have a variety of file extensions and many are not
>  covered by the provided list (e.g. "udf"). Other platforms like
>  GTK can provide an option to open a file with an extension not
>  listed, but Cocoa can't. It forces the user to rename the file
>  to give an extension in the list. Moreover, Cocoa does not tell
>  which extensions are in the list so the user needs to read the
>  source code, which is pretty bad.
> 
> Since this code is harming the usability rather than improving it,
> simply remove the [NSSavePanel allowedFileTypes:] call, fixing:
> 
>  [2789/6622] Compiling Objective-C object libcommon.fa.p/ui_cocoa.m.o
>  ui/cocoa.m:1411:16: error: 'setAllowedFileTypes:' is deprecated: first 
> deprecated in macOS 12.0 - Use -allowedContentTypes instead 
> [-Werror,-Wdeprecated-declarations]
>  [openPanel setAllowedFileTypes: supportedImageFileTypes];
> ^
>  
> /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/System/Library/Frameworks/AppKit.framework/Headers/NSSavePanel.h:215:49:
>  note: property 'allowedFileTypes' is declared deprecated here
>  @property (nullable, copy) NSArray *allowedFileTypes 
> API_DEPRECATED("Use -allowedContentTypes instead", macos(10.3,12.0));
>  ^
>  
> /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/System/Library/Frameworks/AppKit.framework/Headers/NSSavePanel.h:215:49:
>  note: 'setAllowedFileTypes:' has been explicitly marked deprecated here
>  FAILED: libcommon.fa.p/ui_cocoa.m.o
> 
> [*] 
> https://lore.kernel.org/qemu-devel/4dde2e66-63cb-4390-9538-c032310db...@gmail.com/
> 
> Suggested-by: Akihiko Odaki 
> Reviewed-by: Roman Bolshakov 
> Tested-by: Roman Bolshakov 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
> ui/cocoa.m | 6 --
> 1 file changed, 6 deletions(-)
> 
> diff --git a/ui/cocoa.m b/ui/cocoa.m
> index ac18e14ce0..7a1ddd4075 100644
> --- a/ui/cocoa.m
> +++ b/ui/cocoa.m
> @@ -100,7 +100,6 @@ static int gArgc;
> static char **gArgv;
> static bool stretch_video;
> static NSTextField *pauseLabel;
> -static NSArray * supportedImageFileTypes;
> 
> static QemuSemaphore display_init_sem;
> static QemuSemaphore app_started_sem;
> @@ -1168,10 +1167,6 @@ QemuCocoaView *cocoaView;
> [pauseLabel setTextColor: [NSColor blackColor]];
> [pauseLabel sizeToFit];
> 
> -// set the supported image file types that can be opened
> -supportedImageFileTypes = [NSArray arrayWithObjects: @"img", @"iso", 
> @"dmg",
> - @"qcow", @"qcow2", @"cloop", @"vmdk", 
> @"cdr",
> -  @"toast", nil];
> [self make_about_window];
> }
> return self;
> @@ -1414,7 +1409,6 @@ QemuCocoaView *cocoaView;
> openPanel = [NSOpenPanel openPanel];
> [openPanel setCanChooseFiles: YES];
> [openPanel setAllowsMultipleSelection: NO];
> -[openPanel setAllowedFileTypes: supportedImageFileTypes];
> if([openPanel runModal] == NSModalResponseOK) {
> NSString * file = [[[openPanel URLs] objectAtIndex: 0] path];
> if(file == nil) {
> -- 
> 2.34.1
> 
> 



Re: [PATCH v4 09/13] block/file-posix: Remove a deprecation warning on macOS 12

2022-02-13 Thread Cameron Esfahani
Reviewed by: Cameron Esfahani mailto:di...@apple.com>>

> On Feb 11, 2022, at 8:34 AM, Philippe Mathieu-Daudé via 
>  wrote:
> 
> When building on macOS 12 we get:
> 
>  block/file-posix.c:3335:18: warning: 'IOMasterPort' is deprecated: first 
> deprecated in macOS 12.0 [-Wdeprecated-declarations]
>  kernResult = IOMasterPort( MACH_PORT_NULL, &masterPort );
>   ^~~~
>   IOMainPort
> 
> Replace by IOMainPort, redefining it to IOMasterPort if not available.
> 
> Suggested-by: Akihiko Odaki 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
> block/file-posix.c | 14 ++
> 1 file changed, 10 insertions(+), 4 deletions(-)
> 
> diff --git a/block/file-posix.c b/block/file-posix.c
> index 1f1756e192..13393ad296 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -3319,17 +3319,23 @@ BlockDriver bdrv_file = {
> #if defined(__APPLE__) && defined(__MACH__)
> static kern_return_t GetBSDPath(io_iterator_t mediaIterator, char *bsdPath,
> CFIndex maxPathSize, int flags);
> +
> +#if !defined(MAC_OS_VERSION_12_0) \
> +|| (MAC_OS_X_VERSION_MIN_REQUIRED < MAC_OS_VERSION_12_0)
> +#define IOMainPort IOMasterPort
> +#endif
> +
> static char *FindEjectableOpticalMedia(io_iterator_t *mediaIterator)
> {
> kern_return_t kernResult = KERN_FAILURE;
> -mach_port_t masterPort;
> +mach_port_t mainPort;
> CFMutableDictionaryRef  classesToMatch;
> const char *matching_array[] = {kIODVDMediaClass, kIOCDMediaClass};
> char *mediaType = NULL;
> 
> -kernResult = IOMasterPort( MACH_PORT_NULL, &masterPort );
> +kernResult = IOMainPort(MACH_PORT_NULL, &mainPort);
> if ( KERN_SUCCESS != kernResult ) {
> -printf( "IOMasterPort returned %d\n", kernResult );
> +printf("IOMainPort returned %d\n", kernResult);
> }
> 
> int index;
> @@ -3342,7 +3348,7 @@ static char *FindEjectableOpticalMedia(io_iterator_t 
> *mediaIterator)
> }
> CFDictionarySetValue(classesToMatch, CFSTR(kIOMediaEjectableKey),
>  kCFBooleanTrue);
> -kernResult = IOServiceGetMatchingServices(masterPort, classesToMatch,
> +kernResult = IOServiceGetMatchingServices(mainPort, classesToMatch,
>   mediaIterator);
> if (kernResult != KERN_SUCCESS) {
> error_report("Note: IOServiceGetMatchingServices returned %d",
> -- 
> 2.34.1
> 
> 



Re: [PATCH v2] hw/net: e1000e: Clear ICR on read when using non MSI-X interrupts

2022-02-13 Thread Jason Wang



在 2022/2/12 下午5:44, Nick Hudson 写道:

In section 7.4.3 of the 82574 datasheet it states that

 "In systems that do not support MSI-X, reading the ICR
  register clears it's bits..."

Some OSes rely on this.

Signed-off-by: Nick Hudson 



Applied.

Thanks



---
  hw/net/e1000e_core.c | 5 +
  hw/net/trace-events  | 1 +
  2 files changed, 6 insertions(+)

diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
index 8ae6fb7e14..2c51089a82 100644
--- a/hw/net/e1000e_core.c
+++ b/hw/net/e1000e_core.c
@@ -2607,6 +2607,11 @@ e1000e_mac_icr_read(E1000ECore *core, int index)
  core->mac[ICR] = 0;
  }
  
+if (!msix_enabled(core->owner)) {

+trace_e1000e_irq_icr_clear_nonmsix_icr_read();
+core->mac[ICR] = 0;
+}
+
  if ((core->mac[ICR] & E1000_ICR_ASSERTED) &&
  (core->mac[CTRL_EXT] & E1000_CTRL_EXT_IAME)) {
  trace_e1000e_irq_icr_clear_iame();
diff --git a/hw/net/trace-events b/hw/net/trace-events
index 643338f610..4c0ec3fda1 100644
--- a/hw/net/trace-events
+++ b/hw/net/trace-events
@@ -221,6 +221,7 @@ e1000e_irq_write_ics(uint32_t val) "Adding ICR bits 0x%x"
  e1000e_irq_icr_process_iame(void) "Clearing IMS bits due to IAME"
  e1000e_irq_read_ics(uint32_t ics) "Current ICS: 0x%x"
  e1000e_irq_read_ims(uint32_t ims) "Current IMS: 0x%x"
+e1000e_irq_icr_clear_nonmsix_icr_read(void) "Clearing ICR on read due to non MSI-X 
int"
  e1000e_irq_icr_read_entry(uint32_t icr) "Starting ICR read. Current ICR: 0x%x"
  e1000e_irq_icr_read_exit(uint32_t icr) "Ending ICR read. Current ICR: 0x%x"
  e1000e_irq_icr_clear_zero_ims(void) "Clearing ICR on read due to zero IMS"





[PATCH 1/8] hw/net/vmxnet3: Log guest-triggerable errors using LOG_GUEST_ERROR

2022-02-13 Thread Jason Wang
From: Philippe Mathieu-Daudé 

The "Interrupt Cause" register (VMXNET3_REG_ICR) is read-only.
Write accesses are ignored. Log them with as LOG_GUEST_ERROR
instead of aborting:

  [R +0.239743] writeq 0xe0002031 0x46291a5a55460800
  ERROR:hw/net/vmxnet3.c:1819:vmxnet3_io_bar1_write: code should not be reached
  Thread 1 "qemu-system-i38" received signal SIGABRT, Aborted.
  (gdb) bt
  #3  0x74c397d3 in __GI_abort () at abort.c:79
  #4  0x76d3cd4c in g_assertion_message (domain=, 
file=, line=, func=, 
message=) at ../glib/gtestutils.c:3223
  #5  0x76d9d45f in g_assertion_message_expr
  (domain=0x0, file=0x59fc2e53 "hw/net/vmxnet3.c", line=1819, 
func=0x59fc11e0 <__func__.vmxnet3_io_bar1_write> "vmxnet3_io_bar1_write", 
expr=)
  at ../glib/gtestutils.c:3249
  #6  0x57e80a3a in vmxnet3_io_bar1_write (opaque=0x62814100, addr=56, val=70, 
size=4) at hw/net/vmxnet3.c:1819
  #7  0x58c2d894 in memory_region_write_accessor (mr=0x62816b90, addr=56, 
value=0x7fff9450, size=4, shift=0, mask=4294967295, attrs=...) at 
softmmu/memory.c:492
  #8  0x58c2d1d2 in access_with_adjusted_size (addr=56, value=0x7fff9450, 
size=1, access_size_min=4, access_size_max=4, access_fn=
  0x58c2d290 , mr=0x62816b90, attrs=...) at 
softmmu/memory.c:554
  #9  0x58c2bae7 in memory_region_dispatch_write (mr=0x62816b90, addr=56, 
data=70, op=MO_8, attrs=...) at softmmu/memory.c:1504
  #10 0x58bfd034 in flatview_write_continue (fv=0x606000181700, 
addr=0xe0002038, attrs=..., ptr=0x7fffb9e0, len=1, addr1=56, l=1, mr=0x62816b90)
  at softmmu/physmem.c:2782
  #11 0x58beba00 in flatview_write (fv=0x606000181700, addr=0xe0002031, 
attrs=..., buf=0x7fffb9e0, len=8) at softmmu/physmem.c:2822
  #12 0x58beb589 in address_space_write (as=0x60815f20, addr=0xe0002031, 
attrs=..., buf=0x7fffb9e0, len=8) at softmmu/physmem.c:2914

Reported-by: Dike 
Reported-by: Duhao <504224...@qq.com>
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=2032932
Signed-off-by: Philippe Mathieu-Daudé 
Signed-off-by: Jason Wang 
---
 hw/net/vmxnet3.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/net/vmxnet3.c b/hw/net/vmxnet3.c
index f65af4e9ef..0b7acf7f89 100644
--- a/hw/net/vmxnet3.c
+++ b/hw/net/vmxnet3.c
@@ -1816,7 +1816,9 @@ vmxnet3_io_bar1_write(void *opaque,
 case VMXNET3_REG_ICR:
 VMW_CBPRN("Write BAR1 [VMXNET3_REG_ICR] = %" PRIx64 ", size %d",
   val, size);
-g_assert_not_reached();
+qemu_log_mask(LOG_GUEST_ERROR,
+  "%s: write to read-only register VMXNET3_REG_ICR\n",
+  TYPE_VMXNET3);
 break;
 
 /* Event Cause Register */
-- 
2.32.0 (Apple Git-132)




[PATCH 2/8] net/tap: Set return code on failure

2022-02-13 Thread Jason Wang
From: Peter Foley 

Match the other error handling in this function.

Fixes: e7b347d0bf6 ("net: detect errors from probing vnet hdr flag for TAP 
devices")

Reviewed-by: Patrick Venture 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Peter Foley 
Signed-off-by: Jason Wang 
---
 net/tap.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/tap.c b/net/tap.c
index f716be3e3f..c5cbeaa7a2 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -900,6 +900,7 @@ int net_init_tap(const Netdev *netdev, const char *name,
 if (i == 0) {
 vnet_hdr = tap_probe_vnet_hdr(fd, errp);
 if (vnet_hdr < 0) {
+ret = -1;
 goto free_fail;
 }
 } else if (vnet_hdr != tap_probe_vnet_hdr(fd, NULL)) {
-- 
2.32.0 (Apple Git-132)




[PATCH 5/8] net/colo-compare.c: Update the default value comments

2022-02-13 Thread Jason Wang
From: Zhang Chen 

Make the comments consistent with the REGULAR_PACKET_CHECK_MS.

Signed-off-by: Zhang Chen 
Signed-off-by: Jason Wang 
---
 net/colo-compare.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index 216de5a12b..62554b5b3c 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -1267,7 +1267,7 @@ static void colo_compare_complete(UserCreatable *uc, 
Error **errp)
 }
 
 if (!s->expired_scan_cycle) {
-/* Set default value to 3000 MS */
+/* Set default value to 1000 MS */
 s->expired_scan_cycle = REGULAR_PACKET_CHECK_MS;
 }
 
-- 
2.32.0 (Apple Git-132)




[PATCH 7/8] hw/net: e1000e: Clear ICR on read when using non MSI-X interrupts

2022-02-13 Thread Jason Wang
From: Nick Hudson 

In section 7.4.3 of the 82574 datasheet it states that

"In systems that do not support MSI-X, reading the ICR
 register clears it's bits..."

Some OSes rely on this.

Signed-off-by: Nick Hudson 
Signed-off-by: Jason Wang 
---
 hw/net/e1000e_core.c | 5 +
 hw/net/trace-events  | 1 +
 2 files changed, 6 insertions(+)

diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
index 8ae6fb7e14..2c51089a82 100644
--- a/hw/net/e1000e_core.c
+++ b/hw/net/e1000e_core.c
@@ -2607,6 +2607,11 @@ e1000e_mac_icr_read(E1000ECore *core, int index)
 core->mac[ICR] = 0;
 }
 
+if (!msix_enabled(core->owner)) {
+trace_e1000e_irq_icr_clear_nonmsix_icr_read();
+core->mac[ICR] = 0;
+}
+
 if ((core->mac[ICR] & E1000_ICR_ASSERTED) &&
 (core->mac[CTRL_EXT] & E1000_CTRL_EXT_IAME)) {
 trace_e1000e_irq_icr_clear_iame();
diff --git a/hw/net/trace-events b/hw/net/trace-events
index 643338f610..4c0ec3fda1 100644
--- a/hw/net/trace-events
+++ b/hw/net/trace-events
@@ -221,6 +221,7 @@ e1000e_irq_write_ics(uint32_t val) "Adding ICR bits 0x%x"
 e1000e_irq_icr_process_iame(void) "Clearing IMS bits due to IAME"
 e1000e_irq_read_ics(uint32_t ics) "Current ICS: 0x%x"
 e1000e_irq_read_ims(uint32_t ims) "Current IMS: 0x%x"
+e1000e_irq_icr_clear_nonmsix_icr_read(void) "Clearing ICR on read due to non 
MSI-X int"
 e1000e_irq_icr_read_entry(uint32_t icr) "Starting ICR read. Current ICR: 0x%x"
 e1000e_irq_icr_read_exit(uint32_t icr) "Ending ICR read. Current ICR: 0x%x"
 e1000e_irq_icr_clear_zero_ims(void) "Clearing ICR on read due to zero IMS"
-- 
2.32.0 (Apple Git-132)




[PATCH 3/8] net: Fix uninitialized data usage

2022-02-13 Thread Jason Wang
From: Peter Foley 

e.g.
1109 15:16:20.151506 Uninitialized bytes in ioctl_common_pre at offset 0 inside 
[0x7ffc516af9b8, 4)
 1109 15:16:20.151659 ==588974==WARNING: MemorySanitizer: 
use-of-uninitialized-value
 1109 15:16:20.312923 #0 0x5639b88acb21 in tap_probe_vnet_hdr_len 
third_party/qemu/net/tap-linux.c:183:9
 1109 15:16:20.312952 #1 0x5639b88afd66 in net_tap_fd_init 
third_party/qemu/net/tap.c:409:9
 1109 15:16:20.312954 #2 0x5639b88b2d1b in net_init_tap_one 
third_party/qemu/net/tap.c:681:19
 1109 15:16:20.312956 #3 0x5639b88b16a8 in net_init_tap 
third_party/qemu/net/tap.c:912:13
 1109 15:16:20.312957 #4 0x5639b8890175 in net_client_init1 
third_party/qemu/net/net.c:1110:9
 1109 15:16:20.312958 #5 0x5639b888f912 in net_client_init 
third_party/qemu/net/net.c:1208:15
 1109 15:16:20.312960 #6 0x5639b8894aa5 in net_param_nic 
third_party/qemu/net/net.c:1588:11
 1109 15:16:20.312961 #7 0x5639b900cd18 in qemu_opts_foreach 
third_party/qemu/util/qemu-option.c:1135:14
 1109 15:16:20.312962 #8 0x5639b889393c in net_init_clients 
third_party/qemu/net/net.c:1612:9
 1109 15:16:20.312964 #9 0x5639b717aaf3 in qemu_create_late_backends 
third_party/qemu/softmmu/vl.c:1962:5
 1109 15:16:20.312965 #10 0x5639b717aaf3 in qemu_init 
third_party/qemu/softmmu/vl.c:3694:5
 1109 15:16:20.312967 #11 0x5639b71083b8 in main 
third_party/qemu/softmmu/main.c:49:5
 1109 15:16:20.312968 #12 0x7f464de1d8d2 in __libc_start_main 
(/usr/grte/v5/lib64/libc.so.6+0x628d2)
 1109 15:16:20.312969 #13 0x5639b6bbd389 in _start 
/usr/grte/v5/debug-src/src/csu/../sysdeps/x86_64/start.S:120
 1109 15:16:20.312970
 1109 15:16:20.312975   Uninitialized value was stored to memory at
 1109 15:16:20.313393 #0 0x5639b88acbee in tap_probe_vnet_hdr_len 
third_party/qemu/net/tap-linux.c
 1109 15:16:20.313396 #1 0x5639b88afd66 in net_tap_fd_init 
third_party/qemu/net/tap.c:409:9
 1109 15:16:20.313398 #2 0x5639b88b2d1b in net_init_tap_one 
third_party/qemu/net/tap.c:681:19
 1109 15:16:20.313399 #3 0x5639b88b16a8 in net_init_tap 
third_party/qemu/net/tap.c:912:13
 1109 15:16:20.313400 #4 0x5639b8890175 in net_client_init1 
third_party/qemu/net/net.c:1110:9
 1109 15:16:20.313401 #5 0x5639b888f912 in net_client_init 
third_party/qemu/net/net.c:1208:15
 1109 15:16:20.313403 #6 0x5639b8894aa5 in net_param_nic 
third_party/qemu/net/net.c:1588:11
 1109 15:16:20.313404 #7 0x5639b900cd18 in qemu_opts_foreach 
third_party/qemu/util/qemu-option.c:1135:14
 1109 15:16:20.313405 #8 0x5639b889393c in net_init_clients 
third_party/qemu/net/net.c:1612:9
 1109 15:16:20.313407 #9 0x5639b717aaf3 in qemu_create_late_backends 
third_party/qemu/softmmu/vl.c:1962:5
 1109 15:16:20.313408 #10 0x5639b717aaf3 in qemu_init 
third_party/qemu/softmmu/vl.c:3694:5
 1109 15:16:20.313409 #11 0x5639b71083b8 in main 
third_party/qemu/softmmu/main.c:49:5
 1109 15:16:20.313410 #12 0x7f464de1d8d2 in __libc_start_main 
(/usr/grte/v5/lib64/libc.so.6+0x628d2)
 1109 15:16:20.313412 #13 0x5639b6bbd389 in _start 
/usr/grte/v5/debug-src/src/csu/../sysdeps/x86_64/start.S:120
 1109 15:16:20.313413
 1109 15:16:20.313417   Uninitialized value was stored to memory at
 1109 15:16:20.313791 #0 0x5639b88affbd in net_tap_fd_init 
third_party/qemu/net/tap.c:400:26
 1109 15:16:20.313826 #1 0x5639b88b2d1b in net_init_tap_one 
third_party/qemu/net/tap.c:681:19
 1109 15:16:20.313829 #2 0x5639b88b16a8 in net_init_tap 
third_party/qemu/net/tap.c:912:13
 1109 15:16:20.313831 #3 0x5639b8890175 in net_client_init1 
third_party/qemu/net/net.c:1110:9
 1109 15:16:20.313836 #4 0x5639b888f912 in net_client_init 
third_party/qemu/net/net.c:1208:15
 1109 15:16:20.313838 #5 0x5639b8894aa5 in net_param_nic 
third_party/qemu/net/net.c:1588:11
 1109 15:16:20.313839 #6 0x5639b900cd18 in qemu_opts_foreach 
third_party/qemu/util/qemu-option.c:1135:14
 1109 15:16:20.313841 #7 0x5639b889393c in net_init_clients 
third_party/qemu/net/net.c:1612:9
 1109 15:16:20.313843 #8 0x5639b717aaf3 in qemu_create_late_backends 
third_party/qemu/softmmu/vl.c:1962:5
 1109 15:16:20.313844 #9 0x5639b717aaf3 in qemu_init 
third_party/qemu/softmmu/vl.c:3694:5
 1109 15:16:20.313845 #10 0x5639b71083b8 in main 
third_party/qemu/softmmu/main.c:49:5
 1109 15:16:20.313846 #11 0x7f464de1d8d2 in __libc_start_main 
(/usr/grte/v5/lib64/libc.so.6+0x628d2)
 1109 15:16:20.313847 #12 0x5639b6bbd389 in _start 
/usr/grte/v5/debug-src/src/csu/../sysdeps/x86_64/start.S:120
 1109 15:16:20.313849
 1109 15:16:20.313851   Uninitialized value was created by an allocation of 
'ifr' in the stack frame of function 'tap_probe_vnet_hdr'
 1109 15:16:20.313855 #0 0x5639b88ac680 in tap_probe_vnet_hdr 
third_party/qemu/net/tap-linux.c:151
 1109 15:16:20.313856
 1109 15:16:20.313878 SUMMARY: MemorySanitizer: use-of-uninitialized-value 
third_party/qemu/net/tap-linux.c:183:9 in tap_probe_vnet_hdr_len

Fixes: dc69004c7d8 (

[PULL 0/8] Net patches

2022-02-13 Thread Jason Wang
The following changes since commit 48033ad678ae2def43bf0d543a2c4c3d2a93feaf:

  Merge remote-tracking branch 'remotes/vsementsov/tags/pull-nbd-2022-02-09-v2' 
into staging (2022-02-12 22:04:07 +)

are available in the git repository at:

  https://github.com/jasowang/qemu.git tags/net-pull-request

for you to fetch changes up to 9d6267b240c114d1a3cd314a08fd6e1339d34b83:

  net/eth: Don't consider ESP to be an IPv6 option header (2022-02-14 11:50:44 
+0800)




Nick Hudson (1):
  hw/net: e1000e: Clear ICR on read when using non MSI-X interrupts

Peter Foley (2):
  net/tap: Set return code on failure
  net: Fix uninitialized data usage

Philippe Mathieu-Daudé (1):
  hw/net/vmxnet3: Log guest-triggerable errors using LOG_GUEST_ERROR

Rao Lei (1):
  net/filter: Optimize filter_send to coroutine

Thomas Jansen (1):
  net/eth: Don't consider ESP to be an IPv6 option header

Zhang Chen (2):
  net/colo-compare.c: Optimize compare order for performance
  net/colo-compare.c: Update the default value comments

 hw/net/e1000e_core.c |  5 
 hw/net/trace-events  |  1 +
 hw/net/vmxnet3.c |  4 +++-
 net/colo-compare.c   | 28 +++---
 net/eth.c|  1 -
 net/filter-mirror.c  | 66 +---
 net/tap-linux.c  |  1 +
 net/tap.c|  1 +
 8 files changed, 78 insertions(+), 29 deletions(-)




[PATCH 6/8] net/filter: Optimize filter_send to coroutine

2022-02-13 Thread Jason Wang
From: Rao Lei 

This patch is to improve the logic of QEMU main thread sleep code in
qemu_chr_write_buffer() where it can be blocked and can't run other
coroutines during COLO IO stress test.

Our approach is to put filter_send() in a coroutine. In this way,
filter_send() will call qemu_coroutine_yield() in qemu_co_sleep_ns(),
so that it can be scheduled out and QEMU main thread has opportunity to
run other tasks.

Signed-off-by: Lei Rao 
Signed-off-by: Zhang Chen 
Reviewed-by: Li Zhijian 
Reviewed-by: Zhang Chen 
Signed-off-by: Jason Wang 
---
 net/filter-mirror.c | 66 -
 1 file changed, 53 insertions(+), 13 deletions(-)

diff --git a/net/filter-mirror.c b/net/filter-mirror.c
index f20240cc9f..34a63b5dbb 100644
--- a/net/filter-mirror.c
+++ b/net/filter-mirror.c
@@ -20,6 +20,7 @@
 #include "chardev/char-fe.h"
 #include "qemu/iov.h"
 #include "qemu/sockets.h"
+#include "block/aio-wait.h"
 
 #define TYPE_FILTER_MIRROR "filter-mirror"
 typedef struct MirrorState MirrorState;
@@ -42,20 +43,21 @@ struct MirrorState {
 bool vnet_hdr;
 };
 
-static int filter_send(MirrorState *s,
-   const struct iovec *iov,
-   int iovcnt)
+typedef struct FilterSendCo {
+MirrorState *s;
+char *buf;
+ssize_t size;
+bool done;
+int ret;
+} FilterSendCo;
+
+static int _filter_send(MirrorState *s,
+   char *buf,
+   ssize_t size)
 {
 NetFilterState *nf = NETFILTER(s);
 int ret = 0;
-ssize_t size = 0;
 uint32_t len = 0;
-char *buf;
-
-size = iov_size(iov, iovcnt);
-if (!size) {
-return 0;
-}
 
 len = htonl(size);
 ret = qemu_chr_fe_write_all(&s->chr_out, (uint8_t *)&len, sizeof(len));
@@ -80,10 +82,7 @@ static int filter_send(MirrorState *s,
 }
 }
 
-buf = g_malloc(size);
-iov_to_buf(iov, iovcnt, 0, buf, size);
 ret = qemu_chr_fe_write_all(&s->chr_out, (uint8_t *)buf, size);
-g_free(buf);
 if (ret != size) {
 goto err;
 }
@@ -94,6 +93,47 @@ err:
 return ret < 0 ? ret : -EIO;
 }
 
+static void coroutine_fn filter_send_co(void *opaque)
+{
+FilterSendCo *data = opaque;
+
+data->ret = _filter_send(data->s, data->buf, data->size);
+data->done = true;
+g_free(data->buf);
+aio_wait_kick();
+}
+
+static int filter_send(MirrorState *s,
+   const struct iovec *iov,
+   int iovcnt)
+{
+ssize_t size = iov_size(iov, iovcnt);
+char *buf = NULL;
+
+if (!size) {
+return 0;
+}
+
+buf = g_malloc(size);
+iov_to_buf(iov, iovcnt, 0, buf, size);
+
+FilterSendCo data = {
+.s = s,
+.size = size,
+.buf = buf,
+.ret = 0,
+};
+
+Coroutine *co = qemu_coroutine_create(filter_send_co, &data);
+qemu_coroutine_enter(co);
+
+while (!data.done) {
+aio_poll(qemu_get_aio_context(), true);
+}
+
+return data.ret;
+}
+
 static void redirector_to_filter(NetFilterState *nf,
  const uint8_t *buf,
  int len)
-- 
2.32.0 (Apple Git-132)




[PATCH 8/8] net/eth: Don't consider ESP to be an IPv6 option header

2022-02-13 Thread Jason Wang
From: Thomas Jansen 

The IPv6 option headers all have in common that they start with some
common fields, in particular the type of the next header followed by the
extention header length. This is used to traverse the list of the
options. The ESP header does not follow that format, which can break the
IPv6 option header traversal code in eth_parse_ipv6_hdr().

The effect of that is that network interfaces such as vmxnet3 that use
the following call chain
  eth_is_ip6_extension_header_type
  eth_parse_ipv6_hdr
  net_tx_pkt_parse_headers
  net_tx_pkt_parse
  vmxnet3_process_tx_queue
to send packets from the VM out to the host will drop packets of the
following structure:
  Ethernet-Header(IPv6-Header(ESP(encrypted data)))

Note that not all types of network interfaces use the net_tx_pkt_parse
function though, leading to inconsistent behavior regarding sending
those packets. The e1000 network interface for example does not suffer
from this limitation.

By not considering ESP to be an IPv6 header we can allow sending those
packets out to the host on all types of network interfaces.

Fixes: 75020a702151 ("Common definitions for VMWARE devices")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/149
Buglink: https://bugs.launchpad.net/qemu/+bug/1758091
Signed-off-by: Thomas Jansen 
Signed-off-by: Jason Wang 
---
 net/eth.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/eth.c b/net/eth.c
index fe876d1a55..f074b2f9f3 100644
--- a/net/eth.c
+++ b/net/eth.c
@@ -389,7 +389,6 @@ eth_is_ip6_extension_header_type(uint8_t hdr_type)
 case IP6_HOP_BY_HOP:
 case IP6_ROUTING:
 case IP6_FRAGMENT:
-case IP6_ESP:
 case IP6_AUTHENTICATION:
 case IP6_DESTINATON:
 case IP6_MOBILITY:
-- 
2.32.0 (Apple Git-132)




[PATCH 4/8] net/colo-compare.c: Optimize compare order for performance

2022-02-13 Thread Jason Wang
From: Zhang Chen 

COLO-compare use the glib function g_queue_find_custom to dump
another VM's networking packet to compare. But this function always
start find from the queue->head(here is the newest packet), It will
reduce the success rate of comparison. So this patch reversed
the order of the queues for performance.

Signed-off-by: Zhang Chen 
Reported-by: leirao 
Signed-off-by: Jason Wang 
---
 net/colo-compare.c | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index b966e7e514..216de5a12b 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -197,7 +197,7 @@ static void colo_compare_inconsistency_notify(CompareState 
*s)
 /* Use restricted to colo_insert_packet() */
 static gint seq_sorter(Packet *a, Packet *b, gpointer data)
 {
-return a->tcp_seq - b->tcp_seq;
+return b->tcp_seq - a->tcp_seq;
 }
 
 static void fill_pkt_tcp_info(void *data, uint32_t *max_ack)
@@ -421,13 +421,13 @@ pri:
 if (g_queue_is_empty(&conn->primary_list)) {
 return;
 }
-ppkt = g_queue_pop_head(&conn->primary_list);
+ppkt = g_queue_pop_tail(&conn->primary_list);
 sec:
 if (g_queue_is_empty(&conn->secondary_list)) {
-g_queue_push_head(&conn->primary_list, ppkt);
+g_queue_push_tail(&conn->primary_list, ppkt);
 return;
 }
-spkt = g_queue_pop_head(&conn->secondary_list);
+spkt = g_queue_pop_tail(&conn->secondary_list);
 
 if (ppkt->tcp_seq == ppkt->seq_end) {
 colo_release_primary_pkt(s, ppkt);
@@ -458,7 +458,7 @@ sec:
 }
 }
 if (!ppkt) {
-g_queue_push_head(&conn->secondary_list, spkt);
+g_queue_push_tail(&conn->secondary_list, spkt);
 goto pri;
 }
 }
@@ -477,7 +477,7 @@ sec:
 if (mark == COLO_COMPARE_FREE_PRIMARY) {
 conn->compare_seq = ppkt->seq_end;
 colo_release_primary_pkt(s, ppkt);
-g_queue_push_head(&conn->secondary_list, spkt);
+g_queue_push_tail(&conn->secondary_list, spkt);
 goto pri;
 } else if (mark == COLO_COMPARE_FREE_SECONDARY) {
 conn->compare_seq = spkt->seq_end;
@@ -490,8 +490,8 @@ sec:
 goto pri;
 }
 } else {
-g_queue_push_head(&conn->primary_list, ppkt);
-g_queue_push_head(&conn->secondary_list, spkt);
+g_queue_push_tail(&conn->primary_list, ppkt);
+g_queue_push_tail(&conn->secondary_list, spkt);
 
 #ifdef DEBUG_COLO_PACKETS
 qemu_hexdump(stderr, "colo-compare ppkt", ppkt->data, ppkt->size);
@@ -673,7 +673,7 @@ static void colo_compare_packet(CompareState *s, Connection 
*conn,
 
 while (!g_queue_is_empty(&conn->primary_list) &&
!g_queue_is_empty(&conn->secondary_list)) {
-pkt = g_queue_pop_head(&conn->primary_list);
+pkt = g_queue_pop_tail(&conn->primary_list);
 result = g_queue_find_custom(&conn->secondary_list,
  pkt, (GCompareFunc)HandlePacket);
 
@@ -689,7 +689,7 @@ static void colo_compare_packet(CompareState *s, Connection 
*conn,
  * timeout, it will trigger a checkpoint request.
  */
 trace_colo_compare_main("packet different");
-g_queue_push_head(&conn->primary_list, pkt);
+g_queue_push_tail(&conn->primary_list, pkt);
 
 colo_compare_inconsistency_notify(s);
 break;
@@ -819,7 +819,7 @@ static int compare_chr_send(CompareState *s,
 entry->buf = g_malloc(size);
 memcpy(entry->buf, buf, size);
 }
-g_queue_push_head(&sendco->send_list, entry);
+g_queue_push_tail(&sendco->send_list, entry);
 
 if (sendco->done) {
 sendco->co = qemu_coroutine_create(_compare_chr_send, sendco);
@@ -1347,7 +1347,7 @@ static void colo_flush_packets(void *opaque, void 
*user_data)
 Packet *pkt = NULL;
 
 while (!g_queue_is_empty(&conn->primary_list)) {
-pkt = g_queue_pop_head(&conn->primary_list);
+pkt = g_queue_pop_tail(&conn->primary_list);
 compare_chr_send(s,
  pkt->data,
  pkt->size,
@@ -1357,7 +1357,7 @@ static void colo_flush_packets(void *opaque, void 
*user_data)
 packet_destroy_partial(pkt, NULL);
 }
 while (!g_queue_is_empty(&conn->secondary_list)) {
-pkt = g_queue_pop_head(&conn->secondary_list);
+pkt = g_queue_pop_tail(&conn->secondary_list);
 packet_destroy(pkt, NULL);
 }
 }
-- 
2.32.0 (Apple Git-132)




[PULL 8/8] net/eth: Don't consider ESP to be an IPv6 option header

2022-02-13 Thread Jason Wang
From: Thomas Jansen 

The IPv6 option headers all have in common that they start with some
common fields, in particular the type of the next header followed by the
extention header length. This is used to traverse the list of the
options. The ESP header does not follow that format, which can break the
IPv6 option header traversal code in eth_parse_ipv6_hdr().

The effect of that is that network interfaces such as vmxnet3 that use
the following call chain
  eth_is_ip6_extension_header_type
  eth_parse_ipv6_hdr
  net_tx_pkt_parse_headers
  net_tx_pkt_parse
  vmxnet3_process_tx_queue
to send packets from the VM out to the host will drop packets of the
following structure:
  Ethernet-Header(IPv6-Header(ESP(encrypted data)))

Note that not all types of network interfaces use the net_tx_pkt_parse
function though, leading to inconsistent behavior regarding sending
those packets. The e1000 network interface for example does not suffer
from this limitation.

By not considering ESP to be an IPv6 header we can allow sending those
packets out to the host on all types of network interfaces.

Fixes: 75020a702151 ("Common definitions for VMWARE devices")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/149
Buglink: https://bugs.launchpad.net/qemu/+bug/1758091
Signed-off-by: Thomas Jansen 
Signed-off-by: Jason Wang 
---
 net/eth.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/eth.c b/net/eth.c
index fe876d1..f074b2f 100644
--- a/net/eth.c
+++ b/net/eth.c
@@ -389,7 +389,6 @@ eth_is_ip6_extension_header_type(uint8_t hdr_type)
 case IP6_HOP_BY_HOP:
 case IP6_ROUTING:
 case IP6_FRAGMENT:
-case IP6_ESP:
 case IP6_AUTHENTICATION:
 case IP6_DESTINATON:
 case IP6_MOBILITY:
-- 
2.7.4




Re: [PATCH 1/8] hw/net/vmxnet3: Log guest-triggerable errors using LOG_GUEST_ERROR

2022-02-13 Thread Jason Wang
Hi:

Hit the wrong button, please ignore this series.

Thanks

On Mon, Feb 14, 2022 at 11:57 AM Jason Wang  wrote:
>
> From: Philippe Mathieu-Daudé 
>
> The "Interrupt Cause" register (VMXNET3_REG_ICR) is read-only.
> Write accesses are ignored. Log them with as LOG_GUEST_ERROR
> instead of aborting:
>
>   [R +0.239743] writeq 0xe0002031 0x46291a5a55460800
>   ERROR:hw/net/vmxnet3.c:1819:vmxnet3_io_bar1_write: code should not be 
> reached
>   Thread 1 "qemu-system-i38" received signal SIGABRT, Aborted.
>   (gdb) bt
>   #3  0x74c397d3 in __GI_abort () at abort.c:79
>   #4  0x76d3cd4c in g_assertion_message (domain=, 
> file=, line=, func=, 
> message=) at ../glib/gtestutils.c:3223
>   #5  0x76d9d45f in g_assertion_message_expr
>   (domain=0x0, file=0x59fc2e53 "hw/net/vmxnet3.c", line=1819, 
> func=0x59fc11e0 <__func__.vmxnet3_io_bar1_write> "vmxnet3_io_bar1_write", 
> expr=)
>   at ../glib/gtestutils.c:3249
>   #6  0x57e80a3a in vmxnet3_io_bar1_write (opaque=0x62814100, addr=56, 
> val=70, size=4) at hw/net/vmxnet3.c:1819
>   #7  0x58c2d894 in memory_region_write_accessor (mr=0x62816b90, addr=56, 
> value=0x7fff9450, size=4, shift=0, mask=4294967295, attrs=...) at 
> softmmu/memory.c:492
>   #8  0x58c2d1d2 in access_with_adjusted_size (addr=56, value=0x7fff9450, 
> size=1, access_size_min=4, access_size_max=4, access_fn=
>   0x58c2d290 , mr=0x62816b90, attrs=...) at 
> softmmu/memory.c:554
>   #9  0x58c2bae7 in memory_region_dispatch_write (mr=0x62816b90, addr=56, 
> data=70, op=MO_8, attrs=...) at softmmu/memory.c:1504
>   #10 0x58bfd034 in flatview_write_continue (fv=0x606000181700, 
> addr=0xe0002038, attrs=..., ptr=0x7fffb9e0, len=1, addr1=56, l=1, 
> mr=0x62816b90)
>   at softmmu/physmem.c:2782
>   #11 0x58beba00 in flatview_write (fv=0x606000181700, addr=0xe0002031, 
> attrs=..., buf=0x7fffb9e0, len=8) at softmmu/physmem.c:2822
>   #12 0x58beb589 in address_space_write (as=0x60815f20, addr=0xe0002031, 
> attrs=..., buf=0x7fffb9e0, len=8) at softmmu/physmem.c:2914
>
> Reported-by: Dike 
> Reported-by: Duhao <504224...@qq.com>
> BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=2032932
> Signed-off-by: Philippe Mathieu-Daudé 
> Signed-off-by: Jason Wang 
> ---
>  hw/net/vmxnet3.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/hw/net/vmxnet3.c b/hw/net/vmxnet3.c
> index f65af4e9ef..0b7acf7f89 100644
> --- a/hw/net/vmxnet3.c
> +++ b/hw/net/vmxnet3.c
> @@ -1816,7 +1816,9 @@ vmxnet3_io_bar1_write(void *opaque,
>  case VMXNET3_REG_ICR:
>  VMW_CBPRN("Write BAR1 [VMXNET3_REG_ICR] = %" PRIx64 ", size %d",
>val, size);
> -g_assert_not_reached();
> +qemu_log_mask(LOG_GUEST_ERROR,
> +  "%s: write to read-only register VMXNET3_REG_ICR\n",
> +  TYPE_VMXNET3);
>  break;
>
>  /* Event Cause Register */
> --
> 2.32.0 (Apple Git-132)
>




[PULL 1/8] hw/net/vmxnet3: Log guest-triggerable errors using LOG_GUEST_ERROR

2022-02-13 Thread Jason Wang
From: Philippe Mathieu-Daudé 

The "Interrupt Cause" register (VMXNET3_REG_ICR) is read-only.
Write accesses are ignored. Log them with as LOG_GUEST_ERROR
instead of aborting:

  [R +0.239743] writeq 0xe0002031 0x46291a5a55460800
  ERROR:hw/net/vmxnet3.c:1819:vmxnet3_io_bar1_write: code should not be reached
  Thread 1 "qemu-system-i38" received signal SIGABRT, Aborted.
  (gdb) bt
  #3  0x74c397d3 in __GI_abort () at abort.c:79
  #4  0x76d3cd4c in g_assertion_message (domain=, 
file=, line=, func=, 
message=) at ../glib/gtestutils.c:3223
  #5  0x76d9d45f in g_assertion_message_expr
  (domain=0x0, file=0x59fc2e53 "hw/net/vmxnet3.c", line=1819, 
func=0x59fc11e0 <__func__.vmxnet3_io_bar1_write> "vmxnet3_io_bar1_write", 
expr=)
  at ../glib/gtestutils.c:3249
  #6  0x57e80a3a in vmxnet3_io_bar1_write (opaque=0x62814100, addr=56, val=70, 
size=4) at hw/net/vmxnet3.c:1819
  #7  0x58c2d894 in memory_region_write_accessor (mr=0x62816b90, addr=56, 
value=0x7fff9450, size=4, shift=0, mask=4294967295, attrs=...) at 
softmmu/memory.c:492
  #8  0x58c2d1d2 in access_with_adjusted_size (addr=56, value=0x7fff9450, 
size=1, access_size_min=4, access_size_max=4, access_fn=
  0x58c2d290 , mr=0x62816b90, attrs=...) at 
softmmu/memory.c:554
  #9  0x58c2bae7 in memory_region_dispatch_write (mr=0x62816b90, addr=56, 
data=70, op=MO_8, attrs=...) at softmmu/memory.c:1504
  #10 0x58bfd034 in flatview_write_continue (fv=0x606000181700, 
addr=0xe0002038, attrs=..., ptr=0x7fffb9e0, len=1, addr1=56, l=1, mr=0x62816b90)
  at softmmu/physmem.c:2782
  #11 0x58beba00 in flatview_write (fv=0x606000181700, addr=0xe0002031, 
attrs=..., buf=0x7fffb9e0, len=8) at softmmu/physmem.c:2822
  #12 0x58beb589 in address_space_write (as=0x60815f20, addr=0xe0002031, 
attrs=..., buf=0x7fffb9e0, len=8) at softmmu/physmem.c:2914

Reported-by: Dike 
Reported-by: Duhao <504224...@qq.com>
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=2032932
Signed-off-by: Philippe Mathieu-Daudé 
Signed-off-by: Jason Wang 
---
 hw/net/vmxnet3.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/net/vmxnet3.c b/hw/net/vmxnet3.c
index f65af4e..0b7acf7 100644
--- a/hw/net/vmxnet3.c
+++ b/hw/net/vmxnet3.c
@@ -1816,7 +1816,9 @@ vmxnet3_io_bar1_write(void *opaque,
 case VMXNET3_REG_ICR:
 VMW_CBPRN("Write BAR1 [VMXNET3_REG_ICR] = %" PRIx64 ", size %d",
   val, size);
-g_assert_not_reached();
+qemu_log_mask(LOG_GUEST_ERROR,
+  "%s: write to read-only register VMXNET3_REG_ICR\n",
+  TYPE_VMXNET3);
 break;
 
 /* Event Cause Register */
-- 
2.7.4




[PULL 4/8] net/colo-compare.c: Optimize compare order for performance

2022-02-13 Thread Jason Wang
From: Zhang Chen 

COLO-compare use the glib function g_queue_find_custom to dump
another VM's networking packet to compare. But this function always
start find from the queue->head(here is the newest packet), It will
reduce the success rate of comparison. So this patch reversed
the order of the queues for performance.

Signed-off-by: Zhang Chen 
Reported-by: leirao 
Signed-off-by: Jason Wang 
---
 net/colo-compare.c | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index b966e7e..216de5a 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -197,7 +197,7 @@ static void colo_compare_inconsistency_notify(CompareState 
*s)
 /* Use restricted to colo_insert_packet() */
 static gint seq_sorter(Packet *a, Packet *b, gpointer data)
 {
-return a->tcp_seq - b->tcp_seq;
+return b->tcp_seq - a->tcp_seq;
 }
 
 static void fill_pkt_tcp_info(void *data, uint32_t *max_ack)
@@ -421,13 +421,13 @@ pri:
 if (g_queue_is_empty(&conn->primary_list)) {
 return;
 }
-ppkt = g_queue_pop_head(&conn->primary_list);
+ppkt = g_queue_pop_tail(&conn->primary_list);
 sec:
 if (g_queue_is_empty(&conn->secondary_list)) {
-g_queue_push_head(&conn->primary_list, ppkt);
+g_queue_push_tail(&conn->primary_list, ppkt);
 return;
 }
-spkt = g_queue_pop_head(&conn->secondary_list);
+spkt = g_queue_pop_tail(&conn->secondary_list);
 
 if (ppkt->tcp_seq == ppkt->seq_end) {
 colo_release_primary_pkt(s, ppkt);
@@ -458,7 +458,7 @@ sec:
 }
 }
 if (!ppkt) {
-g_queue_push_head(&conn->secondary_list, spkt);
+g_queue_push_tail(&conn->secondary_list, spkt);
 goto pri;
 }
 }
@@ -477,7 +477,7 @@ sec:
 if (mark == COLO_COMPARE_FREE_PRIMARY) {
 conn->compare_seq = ppkt->seq_end;
 colo_release_primary_pkt(s, ppkt);
-g_queue_push_head(&conn->secondary_list, spkt);
+g_queue_push_tail(&conn->secondary_list, spkt);
 goto pri;
 } else if (mark == COLO_COMPARE_FREE_SECONDARY) {
 conn->compare_seq = spkt->seq_end;
@@ -490,8 +490,8 @@ sec:
 goto pri;
 }
 } else {
-g_queue_push_head(&conn->primary_list, ppkt);
-g_queue_push_head(&conn->secondary_list, spkt);
+g_queue_push_tail(&conn->primary_list, ppkt);
+g_queue_push_tail(&conn->secondary_list, spkt);
 
 #ifdef DEBUG_COLO_PACKETS
 qemu_hexdump(stderr, "colo-compare ppkt", ppkt->data, ppkt->size);
@@ -673,7 +673,7 @@ static void colo_compare_packet(CompareState *s, Connection 
*conn,
 
 while (!g_queue_is_empty(&conn->primary_list) &&
!g_queue_is_empty(&conn->secondary_list)) {
-pkt = g_queue_pop_head(&conn->primary_list);
+pkt = g_queue_pop_tail(&conn->primary_list);
 result = g_queue_find_custom(&conn->secondary_list,
  pkt, (GCompareFunc)HandlePacket);
 
@@ -689,7 +689,7 @@ static void colo_compare_packet(CompareState *s, Connection 
*conn,
  * timeout, it will trigger a checkpoint request.
  */
 trace_colo_compare_main("packet different");
-g_queue_push_head(&conn->primary_list, pkt);
+g_queue_push_tail(&conn->primary_list, pkt);
 
 colo_compare_inconsistency_notify(s);
 break;
@@ -819,7 +819,7 @@ static int compare_chr_send(CompareState *s,
 entry->buf = g_malloc(size);
 memcpy(entry->buf, buf, size);
 }
-g_queue_push_head(&sendco->send_list, entry);
+g_queue_push_tail(&sendco->send_list, entry);
 
 if (sendco->done) {
 sendco->co = qemu_coroutine_create(_compare_chr_send, sendco);
@@ -1347,7 +1347,7 @@ static void colo_flush_packets(void *opaque, void 
*user_data)
 Packet *pkt = NULL;
 
 while (!g_queue_is_empty(&conn->primary_list)) {
-pkt = g_queue_pop_head(&conn->primary_list);
+pkt = g_queue_pop_tail(&conn->primary_list);
 compare_chr_send(s,
  pkt->data,
  pkt->size,
@@ -1357,7 +1357,7 @@ static void colo_flush_packets(void *opaque, void 
*user_data)
 packet_destroy_partial(pkt, NULL);
 }
 while (!g_queue_is_empty(&conn->secondary_list)) {
-pkt = g_queue_pop_head(&conn->secondary_list);
+pkt = g_queue_pop_tail(&conn->secondary_list);
 packet_destroy(pkt, NULL);
 }
 }
-- 
2.7.4




[PULL 2/8] net/tap: Set return code on failure

2022-02-13 Thread Jason Wang
From: Peter Foley 

Match the other error handling in this function.

Fixes: e7b347d0bf6 ("net: detect errors from probing vnet hdr flag for TAP 
devices")

Reviewed-by: Patrick Venture 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Peter Foley 
Signed-off-by: Jason Wang 
---
 net/tap.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/tap.c b/net/tap.c
index f716be3..c5cbeaa 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -900,6 +900,7 @@ int net_init_tap(const Netdev *netdev, const char *name,
 if (i == 0) {
 vnet_hdr = tap_probe_vnet_hdr(fd, errp);
 if (vnet_hdr < 0) {
+ret = -1;
 goto free_fail;
 }
 } else if (vnet_hdr != tap_probe_vnet_hdr(fd, NULL)) {
-- 
2.7.4




[PULL 6/8] net/filter: Optimize filter_send to coroutine

2022-02-13 Thread Jason Wang
From: Rao Lei 

This patch is to improve the logic of QEMU main thread sleep code in
qemu_chr_write_buffer() where it can be blocked and can't run other
coroutines during COLO IO stress test.

Our approach is to put filter_send() in a coroutine. In this way,
filter_send() will call qemu_coroutine_yield() in qemu_co_sleep_ns(),
so that it can be scheduled out and QEMU main thread has opportunity to
run other tasks.

Signed-off-by: Lei Rao 
Signed-off-by: Zhang Chen 
Reviewed-by: Li Zhijian 
Reviewed-by: Zhang Chen 
Signed-off-by: Jason Wang 
---
 net/filter-mirror.c | 66 ++---
 1 file changed, 53 insertions(+), 13 deletions(-)

diff --git a/net/filter-mirror.c b/net/filter-mirror.c
index f20240c..34a63b5 100644
--- a/net/filter-mirror.c
+++ b/net/filter-mirror.c
@@ -20,6 +20,7 @@
 #include "chardev/char-fe.h"
 #include "qemu/iov.h"
 #include "qemu/sockets.h"
+#include "block/aio-wait.h"
 
 #define TYPE_FILTER_MIRROR "filter-mirror"
 typedef struct MirrorState MirrorState;
@@ -42,20 +43,21 @@ struct MirrorState {
 bool vnet_hdr;
 };
 
-static int filter_send(MirrorState *s,
-   const struct iovec *iov,
-   int iovcnt)
+typedef struct FilterSendCo {
+MirrorState *s;
+char *buf;
+ssize_t size;
+bool done;
+int ret;
+} FilterSendCo;
+
+static int _filter_send(MirrorState *s,
+   char *buf,
+   ssize_t size)
 {
 NetFilterState *nf = NETFILTER(s);
 int ret = 0;
-ssize_t size = 0;
 uint32_t len = 0;
-char *buf;
-
-size = iov_size(iov, iovcnt);
-if (!size) {
-return 0;
-}
 
 len = htonl(size);
 ret = qemu_chr_fe_write_all(&s->chr_out, (uint8_t *)&len, sizeof(len));
@@ -80,10 +82,7 @@ static int filter_send(MirrorState *s,
 }
 }
 
-buf = g_malloc(size);
-iov_to_buf(iov, iovcnt, 0, buf, size);
 ret = qemu_chr_fe_write_all(&s->chr_out, (uint8_t *)buf, size);
-g_free(buf);
 if (ret != size) {
 goto err;
 }
@@ -94,6 +93,47 @@ err:
 return ret < 0 ? ret : -EIO;
 }
 
+static void coroutine_fn filter_send_co(void *opaque)
+{
+FilterSendCo *data = opaque;
+
+data->ret = _filter_send(data->s, data->buf, data->size);
+data->done = true;
+g_free(data->buf);
+aio_wait_kick();
+}
+
+static int filter_send(MirrorState *s,
+   const struct iovec *iov,
+   int iovcnt)
+{
+ssize_t size = iov_size(iov, iovcnt);
+char *buf = NULL;
+
+if (!size) {
+return 0;
+}
+
+buf = g_malloc(size);
+iov_to_buf(iov, iovcnt, 0, buf, size);
+
+FilterSendCo data = {
+.s = s,
+.size = size,
+.buf = buf,
+.ret = 0,
+};
+
+Coroutine *co = qemu_coroutine_create(filter_send_co, &data);
+qemu_coroutine_enter(co);
+
+while (!data.done) {
+aio_poll(qemu_get_aio_context(), true);
+}
+
+return data.ret;
+}
+
 static void redirector_to_filter(NetFilterState *nf,
  const uint8_t *buf,
  int len)
-- 
2.7.4




[PULL 7/8] hw/net: e1000e: Clear ICR on read when using non MSI-X interrupts

2022-02-13 Thread Jason Wang
From: Nick Hudson 

In section 7.4.3 of the 82574 datasheet it states that

"In systems that do not support MSI-X, reading the ICR
 register clears it's bits..."

Some OSes rely on this.

Signed-off-by: Nick Hudson 
Signed-off-by: Jason Wang 
---
 hw/net/e1000e_core.c | 5 +
 hw/net/trace-events  | 1 +
 2 files changed, 6 insertions(+)

diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
index 8ae6fb7..2c51089 100644
--- a/hw/net/e1000e_core.c
+++ b/hw/net/e1000e_core.c
@@ -2607,6 +2607,11 @@ e1000e_mac_icr_read(E1000ECore *core, int index)
 core->mac[ICR] = 0;
 }
 
+if (!msix_enabled(core->owner)) {
+trace_e1000e_irq_icr_clear_nonmsix_icr_read();
+core->mac[ICR] = 0;
+}
+
 if ((core->mac[ICR] & E1000_ICR_ASSERTED) &&
 (core->mac[CTRL_EXT] & E1000_CTRL_EXT_IAME)) {
 trace_e1000e_irq_icr_clear_iame();
diff --git a/hw/net/trace-events b/hw/net/trace-events
index 643338f..4c0ec3f 100644
--- a/hw/net/trace-events
+++ b/hw/net/trace-events
@@ -221,6 +221,7 @@ e1000e_irq_write_ics(uint32_t val) "Adding ICR bits 0x%x"
 e1000e_irq_icr_process_iame(void) "Clearing IMS bits due to IAME"
 e1000e_irq_read_ics(uint32_t ics) "Current ICS: 0x%x"
 e1000e_irq_read_ims(uint32_t ims) "Current IMS: 0x%x"
+e1000e_irq_icr_clear_nonmsix_icr_read(void) "Clearing ICR on read due to non 
MSI-X int"
 e1000e_irq_icr_read_entry(uint32_t icr) "Starting ICR read. Current ICR: 0x%x"
 e1000e_irq_icr_read_exit(uint32_t icr) "Ending ICR read. Current ICR: 0x%x"
 e1000e_irq_icr_clear_zero_ims(void) "Clearing ICR on read due to zero IMS"
-- 
2.7.4




[PULL 3/8] net: Fix uninitialized data usage

2022-02-13 Thread Jason Wang
From: Peter Foley 

e.g.
1109 15:16:20.151506 Uninitialized bytes in ioctl_common_pre at offset 0 inside 
[0x7ffc516af9b8, 4)
 1109 15:16:20.151659 ==588974==WARNING: MemorySanitizer: 
use-of-uninitialized-value
 1109 15:16:20.312923 #0 0x5639b88acb21 in tap_probe_vnet_hdr_len 
third_party/qemu/net/tap-linux.c:183:9
 1109 15:16:20.312952 #1 0x5639b88afd66 in net_tap_fd_init 
third_party/qemu/net/tap.c:409:9
 1109 15:16:20.312954 #2 0x5639b88b2d1b in net_init_tap_one 
third_party/qemu/net/tap.c:681:19
 1109 15:16:20.312956 #3 0x5639b88b16a8 in net_init_tap 
third_party/qemu/net/tap.c:912:13
 1109 15:16:20.312957 #4 0x5639b8890175 in net_client_init1 
third_party/qemu/net/net.c:1110:9
 1109 15:16:20.312958 #5 0x5639b888f912 in net_client_init 
third_party/qemu/net/net.c:1208:15
 1109 15:16:20.312960 #6 0x5639b8894aa5 in net_param_nic 
third_party/qemu/net/net.c:1588:11
 1109 15:16:20.312961 #7 0x5639b900cd18 in qemu_opts_foreach 
third_party/qemu/util/qemu-option.c:1135:14
 1109 15:16:20.312962 #8 0x5639b889393c in net_init_clients 
third_party/qemu/net/net.c:1612:9
 1109 15:16:20.312964 #9 0x5639b717aaf3 in qemu_create_late_backends 
third_party/qemu/softmmu/vl.c:1962:5
 1109 15:16:20.312965 #10 0x5639b717aaf3 in qemu_init 
third_party/qemu/softmmu/vl.c:3694:5
 1109 15:16:20.312967 #11 0x5639b71083b8 in main 
third_party/qemu/softmmu/main.c:49:5
 1109 15:16:20.312968 #12 0x7f464de1d8d2 in __libc_start_main 
(/usr/grte/v5/lib64/libc.so.6+0x628d2)
 1109 15:16:20.312969 #13 0x5639b6bbd389 in _start 
/usr/grte/v5/debug-src/src/csu/../sysdeps/x86_64/start.S:120
 1109 15:16:20.312970
 1109 15:16:20.312975   Uninitialized value was stored to memory at
 1109 15:16:20.313393 #0 0x5639b88acbee in tap_probe_vnet_hdr_len 
third_party/qemu/net/tap-linux.c
 1109 15:16:20.313396 #1 0x5639b88afd66 in net_tap_fd_init 
third_party/qemu/net/tap.c:409:9
 1109 15:16:20.313398 #2 0x5639b88b2d1b in net_init_tap_one 
third_party/qemu/net/tap.c:681:19
 1109 15:16:20.313399 #3 0x5639b88b16a8 in net_init_tap 
third_party/qemu/net/tap.c:912:13
 1109 15:16:20.313400 #4 0x5639b8890175 in net_client_init1 
third_party/qemu/net/net.c:1110:9
 1109 15:16:20.313401 #5 0x5639b888f912 in net_client_init 
third_party/qemu/net/net.c:1208:15
 1109 15:16:20.313403 #6 0x5639b8894aa5 in net_param_nic 
third_party/qemu/net/net.c:1588:11
 1109 15:16:20.313404 #7 0x5639b900cd18 in qemu_opts_foreach 
third_party/qemu/util/qemu-option.c:1135:14
 1109 15:16:20.313405 #8 0x5639b889393c in net_init_clients 
third_party/qemu/net/net.c:1612:9
 1109 15:16:20.313407 #9 0x5639b717aaf3 in qemu_create_late_backends 
third_party/qemu/softmmu/vl.c:1962:5
 1109 15:16:20.313408 #10 0x5639b717aaf3 in qemu_init 
third_party/qemu/softmmu/vl.c:3694:5
 1109 15:16:20.313409 #11 0x5639b71083b8 in main 
third_party/qemu/softmmu/main.c:49:5
 1109 15:16:20.313410 #12 0x7f464de1d8d2 in __libc_start_main 
(/usr/grte/v5/lib64/libc.so.6+0x628d2)
 1109 15:16:20.313412 #13 0x5639b6bbd389 in _start 
/usr/grte/v5/debug-src/src/csu/../sysdeps/x86_64/start.S:120
 1109 15:16:20.313413
 1109 15:16:20.313417   Uninitialized value was stored to memory at
 1109 15:16:20.313791 #0 0x5639b88affbd in net_tap_fd_init 
third_party/qemu/net/tap.c:400:26
 1109 15:16:20.313826 #1 0x5639b88b2d1b in net_init_tap_one 
third_party/qemu/net/tap.c:681:19
 1109 15:16:20.313829 #2 0x5639b88b16a8 in net_init_tap 
third_party/qemu/net/tap.c:912:13
 1109 15:16:20.313831 #3 0x5639b8890175 in net_client_init1 
third_party/qemu/net/net.c:1110:9
 1109 15:16:20.313836 #4 0x5639b888f912 in net_client_init 
third_party/qemu/net/net.c:1208:15
 1109 15:16:20.313838 #5 0x5639b8894aa5 in net_param_nic 
third_party/qemu/net/net.c:1588:11
 1109 15:16:20.313839 #6 0x5639b900cd18 in qemu_opts_foreach 
third_party/qemu/util/qemu-option.c:1135:14
 1109 15:16:20.313841 #7 0x5639b889393c in net_init_clients 
third_party/qemu/net/net.c:1612:9
 1109 15:16:20.313843 #8 0x5639b717aaf3 in qemu_create_late_backends 
third_party/qemu/softmmu/vl.c:1962:5
 1109 15:16:20.313844 #9 0x5639b717aaf3 in qemu_init 
third_party/qemu/softmmu/vl.c:3694:5
 1109 15:16:20.313845 #10 0x5639b71083b8 in main 
third_party/qemu/softmmu/main.c:49:5
 1109 15:16:20.313846 #11 0x7f464de1d8d2 in __libc_start_main 
(/usr/grte/v5/lib64/libc.so.6+0x628d2)
 1109 15:16:20.313847 #12 0x5639b6bbd389 in _start 
/usr/grte/v5/debug-src/src/csu/../sysdeps/x86_64/start.S:120
 1109 15:16:20.313849
 1109 15:16:20.313851   Uninitialized value was created by an allocation of 
'ifr' in the stack frame of function 'tap_probe_vnet_hdr'
 1109 15:16:20.313855 #0 0x5639b88ac680 in tap_probe_vnet_hdr 
third_party/qemu/net/tap-linux.c:151
 1109 15:16:20.313856
 1109 15:16:20.313878 SUMMARY: MemorySanitizer: use-of-uninitialized-value 
third_party/qemu/net/tap-linux.c:183:9 in tap_probe_vnet_hdr_len

Fixes: dc69004c7d8 (

[PULL 5/8] net/colo-compare.c: Update the default value comments

2022-02-13 Thread Jason Wang
From: Zhang Chen 

Make the comments consistent with the REGULAR_PACKET_CHECK_MS.

Signed-off-by: Zhang Chen 
Signed-off-by: Jason Wang 
---
 net/colo-compare.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index 216de5a..62554b5 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -1267,7 +1267,7 @@ static void colo_compare_complete(UserCreatable *uc, 
Error **errp)
 }
 
 if (!s->expired_scan_cycle) {
-/* Set default value to 3000 MS */
+/* Set default value to 1000 MS */
 s->expired_scan_cycle = REGULAR_PACKET_CHECK_MS;
 }
 
-- 
2.7.4




[PATCH] intel_iommu: support snoop control

2022-02-13 Thread Jason Wang
SC is required for some kernel features like vhost-vDPA. So this patch
implements basic SC feature. The idea is pretty simple, for software
emulated DMA it would be always coherent. In this case we can simple
advertise ECAP_SC bit. For VFIO and vhost, thing will be more much
complicated, so this patch simply fail the IOMMU notifier
registration.

In the future, we may want to have a dedicated notifiers flag or
similar mechanism to demonstrate the coherency so VFIO could advertise
that if it has VFIO_DMA_CC_IOMMU, for vhost kernel backend we don't
need that since it's a software backend.

Signed-off-by: Jason Wang 
---
 hw/i386/intel_iommu.c  | 14 +-
 hw/i386/intel_iommu_internal.h |  1 +
 include/hw/i386/intel_iommu.h  |  1 +
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 5b865ac08c..5fa8e361b8 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -3022,6 +3022,13 @@ static int 
vtd_iommu_notify_flag_changed(IOMMUMemoryRegion *iommu,
 VTDAddressSpace *vtd_as = container_of(iommu, VTDAddressSpace, iommu);
 IntelIOMMUState *s = vtd_as->iommu_state;
 
+/* TODO: add support for VFIO and vhost users */
+if (s->snoop_control) {
+error_setg_errno(errp, -ENOTSUP,
+ "Snoop Control with vhost or VFIO is not supported");
+return -ENOTSUP;
+}
+
 /* Update per-address-space notifier flags */
 vtd_as->notifier_flags = new;
 
@@ -3105,6 +3112,7 @@ static Property vtd_properties[] = {
   VTD_HOST_ADDRESS_WIDTH),
 DEFINE_PROP_BOOL("caching-mode", IntelIOMMUState, caching_mode, FALSE),
 DEFINE_PROP_BOOL("x-scalable-mode", IntelIOMMUState, scalable_mode, FALSE),
+DEFINE_PROP_BOOL("snoop-control", IntelIOMMUState, snoop_control, false),
 DEFINE_PROP_BOOL("dma-drain", IntelIOMMUState, dma_drain, true),
 DEFINE_PROP_END_OF_LIST(),
 };
@@ -3635,7 +3643,7 @@ static void vtd_init(IntelIOMMUState *s)
 vtd_spte_rsvd_large[3] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->aw_bits,
  
x86_iommu->dt_supported);
 
-if (s->scalable_mode) {
+if (s->scalable_mode || s->snoop_control) {
 vtd_spte_rsvd[1] &= ~VTD_SPTE_SNP;
 vtd_spte_rsvd_large[2] &= ~VTD_SPTE_SNP;
 vtd_spte_rsvd_large[3] &= ~VTD_SPTE_SNP;
@@ -3666,6 +3674,10 @@ static void vtd_init(IntelIOMMUState *s)
 s->ecap |= VTD_ECAP_SMTS | VTD_ECAP_SRS | VTD_ECAP_SLTS;
 }
 
+if (s->snoop_control) {
+s->ecap |= VTD_ECAP_SC;
+}
+
 vtd_reset_caches(s);
 
 /* Define registers with default values and bit semantics */
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index a6c788049b..1ff13b40f9 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -188,6 +188,7 @@
 #define VTD_ECAP_IR (1ULL << 3)
 #define VTD_ECAP_EIM(1ULL << 4)
 #define VTD_ECAP_PT (1ULL << 6)
+#define VTD_ECAP_SC (1ULL << 7)
 #define VTD_ECAP_MHMV   (15ULL << 20)
 #define VTD_ECAP_SRS(1ULL << 31)
 #define VTD_ECAP_SMTS   (1ULL << 43)
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 41783ee46d..3b5ac869db 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -228,6 +228,7 @@ struct IntelIOMMUState {
 
 bool caching_mode;  /* RO - is cap CM enabled? */
 bool scalable_mode; /* RO - is Scalable Mode supported? */
+bool snoop_control; /* RO - is SNP filed supported? */
 
 dma_addr_t root;/* Current root table pointer */
 bool root_scalable; /* Type of root table (scalable or not) */
-- 
2.25.1




Re: [PATCH] intel_iommu: support snoop control

2022-02-13 Thread Peter Xu
On Mon, Feb 14, 2022 at 02:03:46PM +0800, Jason Wang wrote:
> SC is required for some kernel features like vhost-vDPA. So this patch
> implements basic SC feature. The idea is pretty simple, for software
> emulated DMA it would be always coherent. In this case we can simple
> advertise ECAP_SC bit. For VFIO and vhost, thing will be more much
> complicated, so this patch simply fail the IOMMU notifier
> registration.

Could we spell out which vhost branch won't work?  How about also mention what
this patch is used for (perhaps for some pure vdpa tests on fully emulated)?

> 
> In the future, we may want to have a dedicated notifiers flag or
> similar mechanism to demonstrate the coherency so VFIO could advertise
> that if it has VFIO_DMA_CC_IOMMU, for vhost kernel backend we don't
> need that since it's a software backend.
> 
> Signed-off-by: Jason Wang 
> ---
>  hw/i386/intel_iommu.c  | 14 +-
>  hw/i386/intel_iommu_internal.h |  1 +
>  include/hw/i386/intel_iommu.h  |  1 +
>  3 files changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 5b865ac08c..5fa8e361b8 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -3022,6 +3022,13 @@ static int 
> vtd_iommu_notify_flag_changed(IOMMUMemoryRegion *iommu,
>  VTDAddressSpace *vtd_as = container_of(iommu, VTDAddressSpace, iommu);
>  IntelIOMMUState *s = vtd_as->iommu_state;
>  
> +/* TODO: add support for VFIO and vhost users */
> +if (s->snoop_control) {
> +error_setg_errno(errp, -ENOTSUP,
> + "Snoop Control with vhost or VFIO is not 
> supported");
> +return -ENOTSUP;
> +}

IIUC this will also fail things like e.g. vhost-kernel but afaiu that can be
fully emulated too.  That's expected, am I right?

Thanks,

> +
>  /* Update per-address-space notifier flags */
>  vtd_as->notifier_flags = new;

-- 
Peter Xu




Re: [PATCH] intel_iommu: support snoop control

2022-02-13 Thread Jason Wang
On Mon, Feb 14, 2022 at 2:31 PM Peter Xu  wrote:
>
> On Mon, Feb 14, 2022 at 02:03:46PM +0800, Jason Wang wrote:
> > SC is required for some kernel features like vhost-vDPA. So this patch
> > implements basic SC feature. The idea is pretty simple, for software
> > emulated DMA it would be always coherent. In this case we can simple
> > advertise ECAP_SC bit. For VFIO and vhost, thing will be more much
> > complicated, so this patch simply fail the IOMMU notifier
> > registration.
>
> Could we spell out which vhost branch won't work?

For vhost, it should work but the problem is that we need to introduce
more logics to demonstrate the notifier ability (e.g a dedicated
notifier flag for cc).

> How about also mention what
> this patch is used for (perhaps for some pure vdpa tests on fully emulated)?

That's fine, the main use case so far is to test vDPA in L1 guest.

>
> >
> > In the future, we may want to have a dedicated notifiers flag or
> > similar mechanism to demonstrate the coherency so VFIO could advertise
> > that if it has VFIO_DMA_CC_IOMMU, for vhost kernel backend we don't
> > need that since it's a software backend.
> >
> > Signed-off-by: Jason Wang 
> > ---
> >  hw/i386/intel_iommu.c  | 14 +-
> >  hw/i386/intel_iommu_internal.h |  1 +
> >  include/hw/i386/intel_iommu.h  |  1 +
> >  3 files changed, 15 insertions(+), 1 deletion(-)
> >
> > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> > index 5b865ac08c..5fa8e361b8 100644
> > --- a/hw/i386/intel_iommu.c
> > +++ b/hw/i386/intel_iommu.c
> > @@ -3022,6 +3022,13 @@ static int 
> > vtd_iommu_notify_flag_changed(IOMMUMemoryRegion *iommu,
> >  VTDAddressSpace *vtd_as = container_of(iommu, VTDAddressSpace, iommu);
> >  IntelIOMMUState *s = vtd_as->iommu_state;
> >
> > +/* TODO: add support for VFIO and vhost users */
> > +if (s->snoop_control) {
> > +error_setg_errno(errp, -ENOTSUP,
> > + "Snoop Control with vhost or VFIO is not 
> > supported");
> > +return -ENOTSUP;
> > +}
>
> IIUC this will also fail things like e.g. vhost-kernel but afaiu that can be
> fully emulated too.  That's expected, am I right?

Yes, I try to make the patch as simple as possible, so VFIO or any
kind of vhost won't work.

Thanks

>
> Thanks,
>
> > +
> >  /* Update per-address-space notifier flags */
> >  vtd_as->notifier_flags = new;
>
> --
> Peter Xu
>




Re: [PATCH] intel_iommu: support snoop control

2022-02-13 Thread Peter Xu
On Mon, Feb 14, 2022 at 02:35:18PM +0800, Jason Wang wrote:
> On Mon, Feb 14, 2022 at 2:31 PM Peter Xu  wrote:
> >
> > On Mon, Feb 14, 2022 at 02:03:46PM +0800, Jason Wang wrote:
> > > SC is required for some kernel features like vhost-vDPA. So this patch
> > > implements basic SC feature. The idea is pretty simple, for software
> > > emulated DMA it would be always coherent. In this case we can simple
> > > advertise ECAP_SC bit. For VFIO and vhost, thing will be more much
> > > complicated, so this patch simply fail the IOMMU notifier
> > > registration.
> >
> > Could we spell out which vhost branch won't work?
> 
> For vhost, it should work but the problem is that we need to introduce
> more logics to demonstrate the notifier ability (e.g a dedicated
> notifier flag for cc).
> 
> > How about also mention what
> > this patch is used for (perhaps for some pure vdpa tests on fully emulated)?
> 
> That's fine, the main use case so far is to test vDPA in L1 guest.

Yeah, that looks okay.  Leave it be or add some commit message would work too,
either way:

Reviewed-by: Peter Xu 

Thanks,

-- 
Peter Xu




Re: [PATCH] intel_iommu: support snoop control

2022-02-13 Thread Jason Wang
On Mon, Feb 14, 2022 at 2:35 PM Jason Wang  wrote:
>
> On Mon, Feb 14, 2022 at 2:31 PM Peter Xu  wrote:
> >
> > On Mon, Feb 14, 2022 at 02:03:46PM +0800, Jason Wang wrote:
> > > SC is required for some kernel features like vhost-vDPA. So this patch
> > > implements basic SC feature. The idea is pretty simple, for software
> > > emulated DMA it would be always coherent. In this case we can simple
> > > advertise ECAP_SC bit. For VFIO and vhost, thing will be more much
> > > complicated, so this patch simply fail the IOMMU notifier
> > > registration.
> >
> > Could we spell out which vhost branch won't work?
>
> For vhost, it should work but the problem is that we need to introduce
> more logics to demonstrate the notifier ability (e.g a dedicated
> notifier flag for cc).

Or we can do tricks like checking IOMMU_NOTIFIER_DEVIOTLB_UNMAP and
assume it's vhost.

But I'm not sure it's worth it.

Thanks

>
> > How about also mention what
> > this patch is used for (perhaps for some pure vdpa tests on fully emulated)?
>
> That's fine, the main use case so far is to test vDPA in L1 guest.
>
> >
> > >
> > > In the future, we may want to have a dedicated notifiers flag or
> > > similar mechanism to demonstrate the coherency so VFIO could advertise
> > > that if it has VFIO_DMA_CC_IOMMU, for vhost kernel backend we don't
> > > need that since it's a software backend.
> > >
> > > Signed-off-by: Jason Wang 
> > > ---
> > >  hw/i386/intel_iommu.c  | 14 +-
> > >  hw/i386/intel_iommu_internal.h |  1 +
> > >  include/hw/i386/intel_iommu.h  |  1 +
> > >  3 files changed, 15 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> > > index 5b865ac08c..5fa8e361b8 100644
> > > --- a/hw/i386/intel_iommu.c
> > > +++ b/hw/i386/intel_iommu.c
> > > @@ -3022,6 +3022,13 @@ static int 
> > > vtd_iommu_notify_flag_changed(IOMMUMemoryRegion *iommu,
> > >  VTDAddressSpace *vtd_as = container_of(iommu, VTDAddressSpace, 
> > > iommu);
> > >  IntelIOMMUState *s = vtd_as->iommu_state;
> > >
> > > +/* TODO: add support for VFIO and vhost users */
> > > +if (s->snoop_control) {
> > > +error_setg_errno(errp, -ENOTSUP,
> > > + "Snoop Control with vhost or VFIO is not 
> > > supported");
> > > +return -ENOTSUP;
> > > +}
> >
> > IIUC this will also fail things like e.g. vhost-kernel but afaiu that can be
> > fully emulated too.  That's expected, am I right?
>
> Yes, I try to make the patch as simple as possible, so VFIO or any
> kind of vhost won't work.
>
> Thanks
>
> >
> > Thanks,
> >
> > > +
> > >  /* Update per-address-space notifier flags */
> > >  vtd_as->notifier_flags = new;
> >
> > --
> > Peter Xu
> >




Re: [PATCH] intel_iommu: support snoop control

2022-02-13 Thread Peter Xu
On Mon, Feb 14, 2022 at 02:40:20PM +0800, Jason Wang wrote:
> Or we can do tricks like checking IOMMU_NOTIFIER_DEVIOTLB_UNMAP and
> assume it's vhost.
> 
> But I'm not sure it's worth it.

If not any use, then probably not at all. :-)

-- 
Peter Xu




Re: Call for GSoC and Outreachy project ideas for summer 2022

2022-02-13 Thread Jason Wang
On Fri, Jan 28, 2022 at 11:47 PM Stefan Hajnoczi  wrote:
>
> Dear QEMU, KVM, and rust-vmm communities,
> QEMU will apply for Google Summer of Code 2022
> (https://summerofcode.withgoogle.com/) and has been accepted into
> Outreachy May-August 2022 (https://www.outreachy.org/). You can now
> submit internship project ideas for QEMU, KVM, and rust-vmm!
>
> If you have experience contributing to QEMU, KVM, or rust-vmm you can
> be a mentor. It's a great way to give back and you get to work with
> people who are just starting out in open source.
>
> Please reply to this email by February 21st with your project ideas.
>
> Good project ideas are suitable for remote work by a competent
> programmer who is not yet familiar with the codebase. In
> addition, they are:
> - Well-defined - the scope is clear
> - Self-contained - there are few dependencies
> - Uncontroversial - they are acceptable to the community
> - Incremental - they produce deliverables along the way
>
> Feel free to post ideas even if you are unable to mentor the project.
> It doesn't hurt to share the idea!

Implementing the VIRTIO_F_IN_ORDER feature for both Qemu and kernel
(vhost/virtio drivers) would be an interesting idea.

It satisfies all the points above since it's supported by virtio spec.

(Unfortunately, I won't have time in the mentoring)

Thanks

>
> I will review project ideas and keep you up-to-date on QEMU's
> acceptance into GSoC.
>
> Internship program details:
> - Paid, remote work open source internships
> - GSoC projects are 175 or 350 hours, Outreachy projects are 30
> hrs/week for 12 weeks
> - Mentored by volunteers from QEMU, KVM, and rust-vmm
> - Mentors typically spend at least 5 hours per week during the coding period
>
> Changes since last year: GSoC now has 175 or 350 hour project sizes
> instead of 12 week full-time projects. GSoC will accept applicants who
> are not students, before it was limited to students.
>
> For more background on QEMU internships, check out this video:
> https://www.youtube.com/watch?v=xNVCX7YMUL8
>
> Please let me know if you have any questions!
>
> Stefan
>




Re: [PATCH] intel_iommu: support snoop control

2022-02-13 Thread Jason Wang
On Mon, Feb 14, 2022 at 3:05 PM Peter Xu  wrote:
>
> On Mon, Feb 14, 2022 at 02:40:20PM +0800, Jason Wang wrote:
> > Or we can do tricks like checking IOMMU_NOTIFIER_DEVIOTLB_UNMAP and
> > assume it's vhost.
> >
> > But I'm not sure it's worth it.
>
> If not any use, then probably not at all. :-)
>

Right. Let's leave it until there's a real user.

Thanks

> --
> Peter Xu
>




Re: [PATCH v4 4/4] hw/i386/sgx: Attach SGX-EPC objects to machine

2022-02-13 Thread Yang Zhong
On Mon, Feb 07, 2022 at 09:37:52AM +0100, Igor Mammedov wrote:
> On Sat,  5 Feb 2022 13:45:26 +0100
> Philippe Mathieu-Daudé  wrote:
> 
> > Previously SGX-EPC objects were exposed in the QOM tree at a path
> > 
> >   /machine/unattached/device[nn]
> > 
> > where the 'nn' varies depending on what devices were already created.
> > 
> > With this change the SGX-EPC objects are now at
> > 
> >   /machine/sgx-epc[nn]
> > 
> > where the 'nn' of the first SGX-EPC object is always zero.
> 
> yet again, why it's necessary?


  Igor, Sorry for delay feedback because of Chinese New Year holiday.

  This series patches are to fix below issues I reported before,
  https://lists.nongnu.org/archive/html/qemu-devel/2021-11/msg05670.html

  Since the /machine/unattached/device[0] is used by vcpu and Libvirt
  use this interface to get unavailable-features list. But in the SGX
  VM, the device[0] will be occupied by virtual sgx epc device, Libvirt
  can't get unavailable-features from this device[0].

  Although patch 2 in this series already fixed "unavailable-features" issue,
  this patch can move sgx virtual device from /machine/unattached/device[nn]
  to /machine/sgx-epc[nn], which seems more clear. Thanks!

  Yang
  

> 
> > 
> > Reported-by: Yang Zhong 
> > Suggested-by: Paolo Bonzini 
> > Reviewed-by: Daniel P. Berrangé 
> > Signed-off-by: Philippe Mathieu-Daudé 
> > ---
> >  hw/i386/sgx.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/hw/i386/sgx.c b/hw/i386/sgx.c
> > index a2b318dd938..3ab2217ca43 100644
> > --- a/hw/i386/sgx.c
> > +++ b/hw/i386/sgx.c
> > @@ -304,6 +304,8 @@ void pc_machine_init_sgx_epc(PCMachineState *pcms)
> >  for (list = x86ms->sgx_epc_list; list; list = list->next) {
> >  obj = object_new("sgx-epc");
> >  
> > +object_property_add_child(OBJECT(pcms), "sgx-epc[*]", OBJECT(obj));
> > +
> >  /* set the memdev link with memory backend */
> >  object_property_parse(obj, SGX_EPC_MEMDEV_PROP, 
> > list->value->memdev,
> >&error_fatal);



Re: [RFC] vhost-vdpa: make notifiers _init()/_uninit() symmetric

2022-02-13 Thread Laurent Vivier

On 14/02/2022 04:20, Jason Wang wrote:

On Sat, Feb 12, 2022 at 12:13 AM Laurent Vivier  wrote:


vhost_vdpa_host_notifiers_init() initializes queue notifiers
for queues "dev->vq_index" to queue "dev->vq_index + dev->nvqs",
whereas vhost_vdpa_host_notifiers_uninit() uninitializes the
same notifiers for queue "0" to queue "dev->nvqs".

This asymmetry seems buggy, fix that by using dev->vq_index
as the base for both.

Fixes: d0416d487bd5 ("vhost-vdpa: map virtqueue notification area if possible")
Cc: jasow...@redhat.com
Signed-off-by: Laurent Vivier 
---
  hw/virtio/vhost-vdpa.c | 20 ++--
  1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 04ea43704f5d..9be3dc66580c 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -395,15 +395,6 @@ static void vhost_vdpa_host_notifier_uninit(struct 
vhost_dev *dev,
  }
  }

-static void vhost_vdpa_host_notifiers_uninit(struct vhost_dev *dev, int n)
-{
-int i;
-
-for (i = 0; i < n; i++) {
-vhost_vdpa_host_notifier_uninit(dev, i);
-}
-}
-
  static int vhost_vdpa_host_notifier_init(struct vhost_dev *dev, int 
queue_index)
  {
  size_t page_size = qemu_real_host_page_size;
@@ -442,6 +433,15 @@ err:
  return -1;
  }

+static void vhost_vdpa_host_notifiers_uninit(struct vhost_dev *dev, int n)
+{
+int i;
+
+for (i = dev->vq_index; i < dev->vq_index + n; i++) {
+vhost_vdpa_host_notifier_uninit(dev, i);
+}
+}


Patch looks good but I wonder why we need to move this function?


I moved the _uninit function close to the _init one to be able to compare them 
easier.
I think it will help reviewers in the future if code is side-by-side.

But we can let it at its original place.

Thanks,
Laurent




Re: [PATCH] hw/timer: fix a9gtimer vmstate

2022-02-13 Thread Pavel Dovgalyuk

ping

On 07.02.2022 11:44, Pavel Dovgalyuk wrote:

A9 gtimer includes global control field and number of per-cpu fields.
But only per-cpu ones are migrated. This patch adds a subsection for
global control field migration.

Signed-off-by: Pavel Dovgalyuk 
---
  hw/timer/a9gtimer.c |   21 +
  1 file changed, 21 insertions(+)

diff --git a/hw/timer/a9gtimer.c b/hw/timer/a9gtimer.c
index 7233068a37..5e959b6d09 100644
--- a/hw/timer/a9gtimer.c
+++ b/hw/timer/a9gtimer.c
@@ -318,6 +318,12 @@ static void a9_gtimer_realize(DeviceState *dev, Error 
**errp)
  }
  }
  
+static bool vmstate_a9_gtimer_control_needed(void *opaque)

+{
+A9GTimerState *s = opaque;
+return s->control != 0;
+}
+
  static const VMStateDescription vmstate_a9_gtimer_per_cpu = {
  .name = "arm.cortex-a9-global-timer.percpu",
  .version_id = 1,
@@ -331,6 +337,17 @@ static const VMStateDescription vmstate_a9_gtimer_per_cpu 
= {
  }
  };
  
+static const VMStateDescription vmstate_a9_gtimer_control = {

+.name = "arm.cortex-a9-global-timer.control",
+.version_id = 1,
+.minimum_version_id = 1,
+.needed = vmstate_a9_gtimer_control_needed,
+.fields = (VMStateField[]) {
+VMSTATE_UINT32(control, A9GTimerState),
+VMSTATE_END_OF_LIST()
+}
+};
+
  static const VMStateDescription vmstate_a9_gtimer = {
  .name = "arm.cortex-a9-global-timer",
  .version_id = 1,
@@ -344,6 +361,10 @@ static const VMStateDescription vmstate_a9_gtimer = {
   1, vmstate_a9_gtimer_per_cpu,
   A9GTimerPerCPU),
  VMSTATE_END_OF_LIST()
+},
+.subsections = (const VMStateDescription*[]) {
+&vmstate_a9_gtimer_control,
+NULL
  }
  };
  






Re: [PATCH v4 4/4] hw/i386/sgx: Attach SGX-EPC objects to machine

2022-02-13 Thread Yang Zhong
On Mon, Feb 07, 2022 at 09:37:52AM +0100, Igor Mammedov wrote:
> On Sat,  5 Feb 2022 13:45:26 +0100
> Philippe Mathieu-Daudé  wrote:
> 
> > Previously SGX-EPC objects were exposed in the QOM tree at a path
> > 
> >   /machine/unattached/device[nn]
> > 
> > where the 'nn' varies depending on what devices were already created.
> > 
> > With this change the SGX-EPC objects are now at
> > 
> >   /machine/sgx-epc[nn]
> > 
> > where the 'nn' of the first SGX-EPC object is always zero.
> 
> yet again, why it's necessary?
> 
> > 
> > Reported-by: Yang Zhong 
> > Suggested-by: Paolo Bonzini 
> > Reviewed-by: Daniel P. Berrangé 
> > Signed-off-by: Philippe Mathieu-Daudé 
> > ---
> >  hw/i386/sgx.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/hw/i386/sgx.c b/hw/i386/sgx.c
> > index a2b318dd938..3ab2217ca43 100644
> > --- a/hw/i386/sgx.c
> > +++ b/hw/i386/sgx.c
> > @@ -304,6 +304,8 @@ void pc_machine_init_sgx_epc(PCMachineState *pcms)
> >  for (list = x86ms->sgx_epc_list; list; list = list->next) {
> >  obj = object_new("sgx-epc");
> >  
> > +object_property_add_child(OBJECT(pcms), "sgx-epc[*]", OBJECT(obj));
> > +
> >  /* set the memdev link with memory backend */
> >  object_property_parse(obj, SGX_EPC_MEMDEV_PROP, 
> > list->value->memdev,
> >&error_fatal);

   Philippe, I verified this patch, which work well. Thanks a lot!

   (qemu) qom-list /machine
   ..
   sgx-epc[2] (child)
   ..
   sgx-epc[0] (child)
   acpi-device (link)
   sgx-epc[1] (child)
   ..

   Yang
  



Re: [RFC] vhost-vdpa: make notifiers _init()/_uninit() symmetric

2022-02-13 Thread Jason Wang
On Mon, Feb 14, 2022 at 3:33 PM Laurent Vivier  wrote:
>
> On 14/02/2022 04:20, Jason Wang wrote:
> > On Sat, Feb 12, 2022 at 12:13 AM Laurent Vivier  wrote:
> >>
> >> vhost_vdpa_host_notifiers_init() initializes queue notifiers
> >> for queues "dev->vq_index" to queue "dev->vq_index + dev->nvqs",
> >> whereas vhost_vdpa_host_notifiers_uninit() uninitializes the
> >> same notifiers for queue "0" to queue "dev->nvqs".
> >>
> >> This asymmetry seems buggy, fix that by using dev->vq_index
> >> as the base for both.
> >>
> >> Fixes: d0416d487bd5 ("vhost-vdpa: map virtqueue notification area if 
> >> possible")
> >> Cc: jasow...@redhat.com
> >> Signed-off-by: Laurent Vivier 
> >> ---
> >>   hw/virtio/vhost-vdpa.c | 20 ++--
> >>   1 file changed, 10 insertions(+), 10 deletions(-)
> >>
> >> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> >> index 04ea43704f5d..9be3dc66580c 100644
> >> --- a/hw/virtio/vhost-vdpa.c
> >> +++ b/hw/virtio/vhost-vdpa.c
> >> @@ -395,15 +395,6 @@ static void vhost_vdpa_host_notifier_uninit(struct 
> >> vhost_dev *dev,
> >>   }
> >>   }
> >>
> >> -static void vhost_vdpa_host_notifiers_uninit(struct vhost_dev *dev, int n)
> >> -{
> >> -int i;
> >> -
> >> -for (i = 0; i < n; i++) {
> >> -vhost_vdpa_host_notifier_uninit(dev, i);
> >> -}
> >> -}
> >> -
> >>   static int vhost_vdpa_host_notifier_init(struct vhost_dev *dev, int 
> >> queue_index)
> >>   {
> >>   size_t page_size = qemu_real_host_page_size;
> >> @@ -442,6 +433,15 @@ err:
> >>   return -1;
> >>   }
> >>
> >> +static void vhost_vdpa_host_notifiers_uninit(struct vhost_dev *dev, int n)
> >> +{
> >> +int i;
> >> +
> >> +for (i = dev->vq_index; i < dev->vq_index + n; i++) {
> >> +vhost_vdpa_host_notifier_uninit(dev, i);
> >> +}
> >> +}
> >
> > Patch looks good but I wonder why we need to move this function?
>
> I moved the _uninit function close to the _init one to be able to compare 
> them easier.
> I think it will help reviewers in the future if code is side-by-side.

Fine.

So

Acked-by: Jason Wang 

Thanks

>
> But we can let it at its original place.
>
> Thanks,
> Laurent
>




Re: [RFC PATCH] spapr: Add SPAPR_CAP_AIL_MODES for supported AIL modes for H_SET_MODE hcall

2022-02-13 Thread Nicholas Piggin
Excerpts from David Gibson's message of February 7, 2022 11:41 am:
> On Sat, Jan 29, 2022 at 04:50:07PM +1000, Nicholas Piggin wrote:
>> The behaviour of the Address Translation Mode on Interrupt resource is
>> not consistently supported by all CPU versions or all KVM versions.  In
>> particular KVM HV only supports mode 0 on POWER7 processors, and does
>> not support mode 2 on any processors. KVM PR only supports mode 0. TCG
>> can support all modes (0,2,3).
>> 
>> This leads to inconsistencies in guest behaviour and could cause
>> problems migrating guests.
>> 
>> This was not too noticable for Linux guests for a long time because the
>> kernel only used mode 0 or 3, and it used to consider AIL to be somewhat
>> advisory (KVM would not always honor it either) and it kept both sets of
>> interrupt vectors around.
>> 
>> Recent Linux guests depend on the AIL mode working as defined by the ISA
>> to support the SCV facility interrupt. If AIL mode 3 can not be provided,
>> then Linux must be given an error so it can disable the SCV facility.
>> 
>> Add the ail-modes capability which is a bitmap of the supported values
>> for the H_SET_MODE Address Translation Mode on Interrupt resource. Add
>> a new KVM CAP that exports the same thing, and provide defaults for PR
>> and HV KVM that predate the cap.
>> ---
>> 
>> I just wanted to get some feedback on the approach before submitting a
>> patch for the KVM cap.
>> 
>> The reason I don't make that a boolean cap for AIL=3 is that future
>> processors might implement new modes a guest would like to use even
>> though it's not the nicest interface.
> 
> [snip]
>>  SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
>>  [SPAPR_CAP_HTM] = {
>>  .name = "htm",
>> @@ -730,6 +802,15 @@ SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
>>  .type = "bool",
>>  .apply = cap_rpt_invalidate_apply,
>>  },
>> +[SPAPR_CAP_AIL_MODES] = {
>> +.name = "ail-modes",
>> +.description = "Bitmap of AIL (alternate interrupt location) mode 
>> support",
> 
> A bitmap doesn't quite work as an spapr cap.  The general caps code
> assumes that bigger is always better, or more precisely that migrating
> from an instance that has a lower value to one which has a higher
> value is "good enough" to be compatible.  That's obviously not the
> case for a bitmap.

Yeah, it was clearly a nasty wart :P

> I think to handle this properly within the limitations of papr caps,
> you instead want a separate boolean cap for each supported AIL mode
> (or at least for each AIL mode you want to have control over).

Oh that's a good idea. We could just do ail-mode-3 for now.

2 is supported by previous processors, but is now deprecated. I don't 
think AIX ever used it although it (or something else) may have. Even
KVM never really implemented it correctly. Although I think TCG does.

So in theory we could be causing a regression if we leave out mode 2,
although it should be easy to re-add (we can leave the support in TCG
for a while and it's not much work anyway).

I'll try with just 3 as the optional cap. Should make it a lot cleaner.

Thanks,
Nick