date:20190514

Re: [Qemu-devel] [PATCH v2 1/2] vfio/mdev: add version attribute for mdev device

2019-05-14 Thread Cornelia Huck

On Tue, 14 May 2019 02:12:35 -0400
Yan Zhao  wrote:

> On Mon, May 13, 2019 at 09:28:04PM +0800, Erik Skultety wrote:

> > In case of libvirt checking the compatibility, it won't matter how good the
> > error message in the kernel log is and regardless of how many error states you
> > want to handle, libvirt's only limited to errno here, since we're going to do
> > plain read/write, so our internal error message returned to the user is only
> > going to contain what the errno says - okay, of course we can (and we DO)
> > provide libvirt specific string, further specifying the error but like I
> > mentioned, depending on how many error cases we want to distinguish this may be
> > hard for anyone to figure out solely on the error code, as apps will most
> > probably not parse the logs.
> > 
> > Regards,
> > Erik  
> hi Erik
> do you mean you are agreeing on defining common errors and only returning errno?
>
> e.g.
> #define ENOMIGRATION 140  /* device not supporting migration */
> #define EUNATCH       49  /* software version not match */
> #define EHWNM        142  /* hardware not matching */

Defining custom error codes is probably not such a good idea... can we
match to common error codes instead? Do we have a good idea about
common error categories, anyway?

(Btw: does libvirt do a generic error-to-description translation, or
does it match to the context? I.e., can libvirt translate well-defined
error codes to a useful message for a specific case?)
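A minimal sketch of what "only returning errno" means for a userspace consumer such as libvirt. It assumes the convention from the v2 patch text quoted later in this thread (-ENODEV: the device cannot migrate at all; -EINVAL: incompatible with this particular source) and maps the errno from a write to the target's version attribute onto a category. The enum and function names are hypothetical, and the attribute path is supplied by the caller.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical outcome categories; not part of any kernel or libvirt API. */
enum mdev_compat {
    MDEV_COMPATIBLE,
    MDEV_NO_MIGRATION,   /* -ENODEV in the proposed patch text */
    MDEV_INCOMPATIBLE,   /* -EINVAL in the proposed patch text */
    MDEV_UNKNOWN_ERROR   /* anything else: only the raw errno is known */
};

/* Map an errno from the version-attribute access onto a category. */
static enum mdev_compat mdev_classify_errno(int err)
{
    switch (err) {
    case 0:
        return MDEV_COMPATIBLE;
    case ENODEV:
        return MDEV_NO_MIGRATION;
    case EINVAL:
        return MDEV_INCOMPATIBLE;
    default:
        return MDEV_UNKNOWN_ERROR;
    }
}

/*
 * Write the source device's version string into the target device's
 * version attribute and classify the result. A failed open() is
 * classified through its errno as well.
 */
static enum mdev_compat mdev_check_version(const char *attr_path,
                                           const char *src_version)
{
    int fd = open(attr_path, O_WRONLY);
    if (fd < 0)
        return mdev_classify_errno(errno);

    size_t len = strlen(src_version);
    ssize_t n = write(fd, src_version, len);
    int err = (n < 0) ? errno : 0;   /* save before close() can clobber it */
    close(fd);
    if (n >= 0 && (size_t)n != len)  /* short write: treat as unknown */
        return MDEV_UNKNOWN_ERROR;
    return mdev_classify_errno(err);
}
```

Everything beyond the three named categories collapses into "unknown", which is the information loss Erik points out: the message shown to the user can only be as specific as the errno.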

Re: [Qemu-devel] [PATCH v4] s390: diagnose 318 info reset and migration support

2019-05-14 Thread Christian Borntraeger




On 13.05.19 13:46, Cornelia Huck wrote:
> On Mon, 13 May 2019 13:34:35 +0200
> David Hildenbrand  wrote:
> 
>> On 13.05.19 12:55, Christian Borntraeger wrote:
>>>
>>>
>>> On 13.05.19 11:57, David Hildenbrand wrote:  
 On 13.05.19 11:51, Christian Borntraeger wrote:  
>
>
> On 13.05.19 11:40, David Hildenbrand wrote:  
>> On 13.05.19 11:34, Christian Borntraeger wrote:  
>>>
>>>
>>> On 13.05.19 10:03, David Hildenbrand wrote:  
>> +    if ((SCCB_SIZE - sizeof(ReadInfo)) / sizeof(CPUEntry) < S390_MAX_CPUS)
>> +        mc->max_cpus = S390_MAX_CPUS - 8;
>
> This is too complicated, just set it always to 240.
>
> However, I am still not sure how to best handle this scenario. One
> solution is:
>
> 1. Set it statically to 240 for machines >= 4.1
> 2. Keep the old machines unmodified
> 3. Don't indicate the CPU feature for machines <= 4.0
>
> #3 is the problematic part, as it mixes host CPU features and machines.
> Bad. The host CPU model should always look the same on all machines. I
> don't like this.
>  

 FWIW, #3 is only an issue when modeling it via the CPU model, like
 Christian suggested.

 I suggest the following

 1. Set the max #cpus for 4.1 to 240 (already done)
 2. Keep it for the other machines unmodified (as suggested by Thomas)
 3. Create the layout of the SCCB depending on the machine type (to be done)

 If we want to model diag318 via a CPU feature (which makes sense for
 migration):

 4. Disable diag318 with a warning if used with a machine < 4.1
  
>>>
>>> I think there is a simpler solution. It is perfectly fine to fail the startup
>>> if we cannot fulfil the cpu model. So let's just allow 248 and allow this feature
>>> also for older machines. And if somebody chooses both at the same time,
>>> let's fail the startup.
>>
>> To which knob do you want to glue the layout of the SCLP response? Like
>> I described?  Do you mean instead of warning and masking the feature off
>> as I suggested, simply failing?  
>
> The sclp response will depend on the diag318 cpu model flag. If it's on, the sclp
> response will have it, otherwise not.
> - host-passthrough: not migration safe anyway
> - host-model: if the target has diag318, good; otherwise we reject migration
>>
>> In that case, -machine ..-4.0 -cpu host will not work on new HW with new
>> KVM. Just noting.  
>
> Only if you have 248 CPUs (which is unlikely). My point was to do that for all
> machine levels.
>  

 The issue with this approach is that e.g. libvirt is not aware of this
 restriction. It could query "max_cpus" and expand the host-cpu model,
 but starting a guest with > 240 cpus would fail. Maybe this is acceptable. 
  
>>>
>>> As of today we do the cpu model check in the same way. libvirt actually tries
>>> to run QEMU and handles failures.
>>>
>>> For a failure, the user still has to use >240 CPUs in their XML. The only downside
>>> is that libvirt will not reject this right away.
>>>
>>> During startup we would then print an error message like
>>>
>>> "The diag318 cpu feature is only supported for 240 and less CPUs."
>>>
>>> This is of similar quality as
>>> "Selected CPU GA level is too new. Maximum supported model in the configuration: \'%s\'",
>>>   
>>
>> But that can be tested using the runnability information if I am not wrong.
> 
> You mean the cpu level information, right?
> 
>>
>>> and others that we have today.
>>>
>>> So yes, I think this would be acceptable.  
>>
>> I guess it is acceptable yes. I doubt anybody uses that many CPUs in
>> production either way. But you never know.
> 
> I think that using that many cpus is a more uncommon setup, but I still
> think that having to wait for actual failure

That can happen all the time today. You can easily specify z14 in the XML when
on a zEC12; you only get the error at startup. The question is really:
do you want to error out on definition of the XML or on startup? I think
startup is the better place here. This allows creating definitions that will
be useful in the future (pre-planning), e.g. if you know that you will update
your machine or the code soon.

> is worse than being able
> to find out beforehand. Any way to make this discoverable?
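The `(SCCB_SIZE - sizeof(ReadInfo)) / sizeof(CPUEntry)` check quoted above is plain integer arithmetic: the CPU entries share one SCCB with the Read Info header, so growing the header (e.g. by a diag318 field) lowers the number of CPUs that fit. A small sketch, assuming a 4096-byte SCCB, 16-byte CPU entries and a 248-CPU maximum; the header sizes in the usage note are illustrative, not taken from the QEMU sources.

```c
#include <stddef.h>

/* Assumed values for illustration; check the QEMU SCLP headers for the real ones. */
#define SKETCH_SCCB_SIZE      4096u
#define SKETCH_CPU_ENTRY_SIZE 16u
#define SKETCH_MAX_CPUS       248u

/* Number of CPU entries that fit after a Read Info header of the given size. */
static unsigned cpus_fitting(size_t read_info_size)
{
    if (read_info_size >= SKETCH_SCCB_SIZE)
        return 0;
    return (unsigned)((SKETCH_SCCB_SIZE - read_info_size) / SKETCH_CPU_ENTRY_SIZE);
}

/*
 * Mirror of the quoted patch logic: if fewer than SKETCH_MAX_CPUS entries
 * fit, lower the limit by 8 (248 -> 240).
 */
static unsigned machine_max_cpus(size_t read_info_size)
{
    if (cpus_fitting(read_info_size) < SKETCH_MAX_CPUS)
        return SKETCH_MAX_CPUS - 8;
    return SKETCH_MAX_CPUS;
}
```

Under these assumptions a 128-byte header leaves room for exactly 248 entries, and any header between 129 and 256 bytes still leaves room for at least 240, which is why a fixed limit of 240 would be sufficient.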

[Qemu-devel] [PULL 04/31] tcg: Specify optional vector requirements with a list

2019-05-14 Thread Richard Henderson

Replace the single opcode in .opc with a null-terminated
array in .opt_opc.  We still require that all opcodes be
used with the same .vece.

Validate the contents of this list with CONFIG_DEBUG_TCG.
All tcg_gen_*_vec functions will check any list active
during .fniv expansion.  Swap the active list in and out
as we expand other opcodes, or take control away from the
front-end function.

Convert all existing vector aware front ends.

Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 tcg/tcg-op-gvec.h                   |  24 +--
 tcg/tcg.h                           |  20 +++
 target/arm/translate-sve.c          |   9 +-
 target/arm/translate.c              | 123 +-
 target/ppc/translate/vmx-impl.inc.c |   7 +-
 tcg/tcg-op-gvec.c                   | 249
 tcg/tcg-op-vec.c                    | 102
 7 files changed, 372 insertions(+), 162 deletions(-)

diff --git a/tcg/tcg-op-gvec.h b/tcg/tcg-op-gvec.h
index c093243c4c..ac744ff7c9 100644
--- a/tcg/tcg-op-gvec.h
+++ b/tcg/tcg-op-gvec.h
@@ -91,8 +91,8 @@ typedef struct {
     void (*fniv)(unsigned, TCGv_vec, TCGv_vec);
     /* Expand out-of-line helper w/descriptor.  */
     gen_helper_gvec_2 *fno;
-    /* The opcode, if any, to which this corresponds.  */
-    TCGOpcode opc;
+    /* The optional opcodes, if any, utilized by .fniv.  */
+    const TCGOpcode *opt_opc;
     /* The data argument to the out-of-line helper.  */
     int32_t data;
     /* The vector element size, if applicable.  */
@@ -112,8 +112,8 @@ typedef struct {
     gen_helper_gvec_2 *fno;
     /* Expand out-of-line helper w/descriptor, data as argument.  */
     gen_helper_gvec_2i *fnoi;
-    /* The opcode, if any, to which this corresponds.  */
-    TCGOpcode opc;
+    /* The optional opcodes, if any, utilized by .fniv.  */
+    const TCGOpcode *opt_opc;
     /* The vector element size, if applicable.  */
     uint8_t vece;
     /* Prefer i64 to v64.  */
@@ -131,8 +131,8 @@ typedef struct {
     void (*fniv)(unsigned, TCGv_vec, TCGv_vec, TCGv_vec);
     /* Expand out-of-line helper w/descriptor.  */
     gen_helper_gvec_2i *fno;
-    /* The opcode, if any, to which this corresponds.  */
-    TCGOpcode opc;
+    /* The optional opcodes, if any, utilized by .fniv.  */
+    const TCGOpcode *opt_opc;
     /* The data argument to the out-of-line helper.  */
     uint32_t data;
     /* The vector element size, if applicable.  */
@@ -152,8 +152,8 @@ typedef struct {
     void (*fniv)(unsigned, TCGv_vec, TCGv_vec, TCGv_vec);
     /* Expand out-of-line helper w/descriptor.  */
     gen_helper_gvec_3 *fno;
-    /* The opcode, if any, to which this corresponds.  */
-    TCGOpcode opc;
+    /* The optional opcodes, if any, utilized by .fniv.  */
+    const TCGOpcode *opt_opc;
     /* The data argument to the out-of-line helper.  */
     int32_t data;
     /* The vector element size, if applicable.  */
@@ -175,8 +175,8 @@ typedef struct {
     void (*fniv)(unsigned, TCGv_vec, TCGv_vec, TCGv_vec, int64_t);
     /* Expand out-of-line helper w/descriptor, data in descriptor.  */
     gen_helper_gvec_3 *fno;
-    /* The opcode, if any, to which this corresponds.  */
-    TCGOpcode opc;
+    /* The optional opcodes, if any, utilized by .fniv.  */
+    const TCGOpcode *opt_opc;
     /* The vector element size, if applicable.  */
     uint8_t vece;
     /* Prefer i64 to v64.  */
@@ -194,8 +194,8 @@ typedef struct {
     void (*fniv)(unsigned, TCGv_vec, TCGv_vec, TCGv_vec, TCGv_vec);
     /* Expand out-of-line helper w/descriptor.  */
     gen_helper_gvec_4 *fno;
-    /* The opcode, if any, to which this corresponds.  */
-    TCGOpcode opc;
+    /* The optional opcodes, if any, utilized by .fniv.  */
+    const TCGOpcode *opt_opc;
     /* The data argument to the out-of-line helper.  */
     int32_t data;
     /* The vector element size, if applicable.  */
diff --git a/tcg/tcg.h b/tcg/tcg.h
index cfc57110a1..2c7315da25 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -692,6 +692,7 @@ struct TCGContext {
 #ifdef CONFIG_DEBUG_TCG
     int temps_in_use;
     int goto_tb_issue_mask;
+    const TCGOpcode *vecop_list;
 #endif
 
     /* Code generation.  Note that we specifically do not use tcg_insn_unit
@@ -1492,4 +1493,23 @@ void helper_atomic_sto_le_mmu(CPUArchState *env, target_ulong addr, Int128 val,
 void helper_atomic_sto_be_mmu(CPUArchState *env, target_ulong addr, Int128 val,
                               TCGMemOpIdx oi, uintptr_t retaddr);
 
+#ifdef CONFIG_DEBUG_TCG
+void tcg_assert_listed_vecop(TCGOpcode);
+#else
+static inline void tcg_assert_listed_vecop(TCGOpcode op) { }
+#endif
+
+static inline const TCGOpcode *tcg_swap_vecop_list(const TCGOpcode *n)
+{
+#ifdef CONFIG_DEBUG_TCG
+    const TCGOpcode *o = tcg_ctx->vecop_list;
+    tcg_ctx->vecop_list = n;
+    return o;
+#else
+    return NULL;
+#endif
+}
+
+bool tcg_can_emit_vecop_list(const TCGOpcode *, TCGType, unsigned);
+
 #endif /* TCG_H */
diff --git a/target/arm/translate-sve.c b/target/ar
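The commit message above describes a null-terminated list of optional opcodes in `.opt_opc`, a debug-only active list that is swapped in and out around `.fniv` expansion, and a check that every emitted vector opcode is on that list. The standalone model below illustrates the mechanism only. The opcode enum, the global standing in for the `TCGContext` field, and all function names are hypothetical; they are not QEMU's `tcg_swap_vecop_list()` or `tcg_assert_listed_vecop()`, and the real `tcg_can_emit_vecop_list()` additionally takes the vector type and element size, which this model leaves out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical opcode numbering; 0 terminates a list, as in the patch. */
typedef enum {
    OP_END = 0,
    OP_ADD_VEC,
    OP_SHLI_VEC,
    OP_MUL_VEC,
} SketchOpcode;

/* Stand-in for the CONFIG_DEBUG_TCG-only TCGContext.vecop_list field. */
static const SketchOpcode *active_vecop_list;

/* Install a new active list and hand back the previous one for restoring. */
static const SketchOpcode *swap_vecop_list(const SketchOpcode *n)
{
    const SketchOpcode *o = active_vecop_list;
    active_vecop_list = n;
    return o;
}

/*
 * Check performed when a vector opcode is emitted: with an active list,
 * the opcode must appear on it. In this sketch, no active list (NULL)
 * means nothing is checked.
 */
static bool vecop_allowed(SketchOpcode op)
{
    if (active_vecop_list == NULL)
        return true;
    for (const SketchOpcode *p = active_vecop_list; *p != OP_END; p++) {
        if (*p == op)
            return true;
    }
    return false;
}

/* Hypothetical backend capability: only add and shift-immediate exist. */
static bool backend_supports(SketchOpcode op)
{
    return op == OP_ADD_VEC || op == OP_SHLI_VEC;
}

/* .fniv may be used only if every listed optional opcode is supported. */
static bool can_emit_list(const SketchOpcode *list)
{
    if (list == NULL)
        return true;
    for (const SketchOpcode *p = list; *p != OP_END; p++) {
        if (!backend_supports(*p))
            return false;
    }
    return true;
}
```

In this model a front end would declare something like `static const SketchOpcode shli_list[] = { OP_SHLI_VEC, OP_END };`, swap it in before running its `.fniv` expander, and swap the previous value back afterwards, so nested expansions of other opcodes see their own lists.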

Re: [Qemu-devel] [PATCH v2 1/2] vfio/mdev: add version attribute for mdev device

2019-05-14 Thread Erik Skultety

On Tue, May 14, 2019 at 02:12:35AM -0400, Yan Zhao wrote:
> On Mon, May 13, 2019 at 09:28:04PM +0800, Erik Skultety wrote:
> > On Fri, May 10, 2019 at 11:48:38AM +0200, Cornelia Huck wrote:
> > > On Fri, 10 May 2019 10:36:09 +0100
> > > "Dr. David Alan Gilbert"  wrote:
> > >
> > > > * Cornelia Huck (coh...@redhat.com) wrote:
> > > > > On Thu, 9 May 2019 17:48:26 +0100
> > > > > "Dr. David Alan Gilbert"  wrote:
> > > > >
> > > > > > * Cornelia Huck (coh...@redhat.com) wrote:
> > > > > > > On Thu, 9 May 2019 16:48:57 +0100
> > > > > > > "Dr. David Alan Gilbert"  wrote:
> > > > > > >
> > > > > > > > * Cornelia Huck (coh...@redhat.com) wrote:
> > > > > > > > > On Tue, 7 May 2019 15:18:26 -0600
> > > > > > > > > Alex Williamson  wrote:
> > > > > > > > >
> > > > > > > > > > On Sun,  5 May 2019 21:49:04 -0400
> > > > > > > > > > Yan Zhao  wrote:
> > > > > > > > >
> > > > > > > > > > > +  Errno:
> > > > > > > > > > > +  If vendor driver wants to claim a mdev device incompatible to all other mdev
> > > > > > > > > > > +  devices, it should not register version attribute for this mdev device. But if
> > > > > > > > > > > +  a vendor driver has already registered version attribute and it wants to claim
> > > > > > > > > > > +  a mdev device incompatible to all other mdev devices, it needs to return
> > > > > > > > > > > +  -ENODEV on access to this mdev device's version attribute.
> > > > > > > > > > > +  If a mdev device is only incompatible to certain mdev devices, write of
> > > > > > > > > > > +  incompatible mdev devices's version strings to its version attribute should
> > > > > > > > > > > +  return -EINVAL;
> > > > > > > > > >
> > > > > > > > > > I think it's best not to define the specific errno returned for a
> > > > > > > > > > specific situation, let the vendor driver decide, userspace simply
> > > > > > > > > > needs to know that an errno on read indicates the device does not
> > > > > > > > > > support migration version comparison and that an errno on write
> > > > > > > > > > indicates the devices are incompatible or the target doesn't support
> > > > > > > > > > migration versions.
> > > > > > > > >
> > > > > > > > > I think I have to disagree here: It's probably valuable to have an
> > > > > > > > > agreed error for 'cannot migrate at all' vs 'cannot migrate between
> > > > > > > > > those two particular devices'. Userspace might want to do different
> > > > > > > > > things (e.g. trying with different device pairs).
> > > > > > > >
> > > > > > > > Trying to stuff these things down an errno seems a bad idea; we can't
> > > > > > > > get much information that way.
> > > > > > >
> > > > > > > So, what would be a reasonable approach? Userspace should first read
> > > > > > > the version attributes on both devices (to find out whether migration
> > > > > > > is supported at all), and only then figure out via writing whether they
> > > > > > > are compatible?
> > > > > > >
> > > > > > > (Or just go ahead and try, if it does not care about the reason.)
> > > > > >
> > > > > > Well, I'm OK with something like writing to test whether it's
> > > > > > compatible, it's just we need a better way of saying 'no'.
> > > > > > I'm not sure if that involves reading back from somewhere after
> > > > > > the write or what.
> > > > >
> > > > > Hm, so I basically see two ways of doing that:
> > > > > - standardize on some error codes... problem: error codes can be hard
> > > > >   to fit to reasons
> > > > > - make the error available in some attribute that can be read
> > > > >
> > > > > I'm not sure how we can serialize the readback with the last write,
> > > > > though (this looks inherently racy).
> > > > >
> > > > > How important is detailed error reporting here?
> > > >
> > > > I think we need something, otherwise we're just going to get vague
> > > > user reports of 'but my VM doesn't migrate'; I'd like the error to be
> > > > good enough to point most users to something they can understand
> > > > (e.g. wrong card family/too old a driver etc).
> > >
> > > Ok, that sounds like a reasonable point. Not that I have a better idea
> > > how to achieve that, though... we could also log a more verbose error
> > > message to the kernel log, but that's not necessarily where a user will
> > > look first.
> >
> > In case of libvirt checking the compatibility, it won't matter how good the
> > error message in the kernel log is and regardless of how many error states you
> > want to handle, libvirt's only limited to errno here, since we're going to do
> > plain read/write, so our internal error message returned to the user is only
> > goi
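Cornelia's suggested order (first read the version attribute to learn whether migration is supported at all, then write the source string to the target to test compatibility) can be sketched as below. It assumes, as the quoted patch text proposes, that a device without migration support fails the read and an incompatible pair fails the write. All names and paths are hypothetical, and the probe only ever sees an errno, which is exactly the limitation discussed above.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical result of the two-step probe. */
enum probe_result {
    PROBE_COMPATIBLE,
    PROBE_SRC_UNSUPPORTED,   /* reading the source version failed */
    PROBE_DST_REJECTED,      /* writing to the target version failed */
};

/* Read a small sysfs-style attribute into buf (NUL-terminated); -errno on error. */
static int read_attr(const char *path, char *buf, size_t size)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -errno;
    ssize_t n = read(fd, buf, size - 1);
    int err = (n < 0) ? errno : 0;
    close(fd);
    if (n < 0)
        return -err;
    buf[n] = '\0';
    return 0;
}

/* Write a string to an attribute; -errno on error. */
static int write_attr(const char *path, const char *val)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -errno;
    ssize_t n = write(fd, val, strlen(val));
    int err = (n < 0) ? errno : 0;
    close(fd);
    return (n < 0) ? -err : 0;
}

/* Step 1: read the source version. Step 2: offer it to the target. */
static enum probe_result probe_migration(const char *src_attr,
                                         const char *dst_attr, int *err_out)
{
    char version[256];
    int rc = read_attr(src_attr, version, sizeof(version));
    if (rc < 0) {
        *err_out = -rc;
        return PROBE_SRC_UNSUPPORTED;
    }
    rc = write_attr(dst_attr, version);
    if (rc < 0) {
        *err_out = -rc;
        return PROBE_DST_REJECTED;
    }
    *err_out = 0;
    return PROBE_COMPATIBLE;
}
```

Splitting the probe into two steps at least separates "cannot migrate at all" from "cannot migrate between these two devices" without standardizing any particular errno; the finer-grained reason still has to come from somewhere else.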

Re: [Qemu-devel] [PATCH v4] s390: diagnose 318 info reset and migration support

2019-05-14 Thread David Hildenbrand

>>> But that can be tested using the runnability information if I am not wrong.
>>
>> You mean the cpu level information, right?

Yes, query-cpu-definitions includes, for each model, runnability information
via "unavailable-features" (valid for the machine type QEMU was started with).

>>
>>>
 and others that we have today.

 So yes, I think this would be acceptable.  
>>>
>>> I guess it is acceptable yes. I doubt anybody uses that many CPUs in
>>> production either way. But you never know.
>>
>> I think that using that many cpus is a more uncommon setup, but I still
>> think that having to wait for actual failure
> 
> That can happen all the time today. You can easily say z14 in the xml when 
> on a zEC12. Only at startup you get the error. The question is really:

"-smp 248 -cpu host" will no longer work, while e.g. "-smp 248 -cpu z12"
will work. Actually, even "-smp 248" will no longer work on affected
machines.

That is why I wonder if it is better to disable the feature and print a
warning, similar to CMMA, where we want to tolerate CMMA not being
possible in the current environment (huge pages).

"Diag318 will not be enabled because it is not compatible with more than
240 CPUs".

However, I still think that implementing support for more than one SCLP
response page is the best solution. Guests will need adaptions for > 240
CPUs with Diag318, but who cares? Existing setups will continue to work.

Implementing that SCLP thingy will avoid any warnings and any errors. It
just works from the QEMU perspective.

Is implementing this realistic?

> do you want to error on definition of the xml or on startup.

I actually have no idea what the best practice on the libvirt side is.
There seems to be a user for max-cpus and unavailable-features in QEMU.

> And I think
> startup is the better place here. This allows to create definitions that will
> be useful in the future (pre-planning), e.g. if you know that you will update
> your machine or the code soon.
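The two startup policies discussed in this thread differ only in what happens on the conflicting combination: failing when diag318 is requested with more than 240 CPUs (Christian's proposal) or masking the feature off with a warning (David's proposal). A hypothetical sketch; the limit of 240 comes from the thread and the message wording is adapted from it, while the function and enum names are illustrative and do not reflect QEMU's actual code.

```c
#include <stdbool.h>
#include <stdio.h>

#define SKETCH_DIAG318_MAX_CPUS 240  /* limit discussed in the thread */

enum diag318_policy { POLICY_FAIL, POLICY_WARN_AND_DISABLE };

/*
 * Returns 0 if startup may continue (possibly with *diag318 cleared),
 * -1 if startup must fail. Messages go to stderr.
 */
static int check_diag318(enum diag318_policy policy, unsigned smp_cpus,
                         bool *diag318)
{
    if (!*diag318 || smp_cpus <= SKETCH_DIAG318_MAX_CPUS)
        return 0;

    if (policy == POLICY_FAIL) {
        fprintf(stderr, "The diag318 cpu feature is only supported for "
                        "%d or fewer CPUs.\n", SKETCH_DIAG318_MAX_CPUS);
        return -1;
    }
    fprintf(stderr, "Diag318 will not be enabled because it is not "
                    "compatible with more than %d CPUs.\n",
            SKETCH_DIAG318_MAX_CPUS);
    *diag318 = false;
    return 0;
}
```

With the fail policy, "-smp 248 -cpu host" stops working once the host model includes diag318; with the warn-and-disable policy it keeps working but silently loses the feature, which matters for migration since the resulting CPU model differs.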



-- 

Thanks,

David / dhildenb

Re: [Qemu-devel] [Qemu-ppc] [PATCH] docs: provide documentation on the POWER9 XIVE interrupt controller

2019-05-14 Thread Satheesh Rajendran

On Tue, May 14, 2019 at 08:46:27AM +0200, Cédric Le Goater wrote:
> This documents the overall XIVE architecture and gives an overview of
> the QEMU models. It also provides documentation on the 'info pic'
> command.
> 
> Signed-off-by: Cédric Le Goater 
> ---
>  docs/index.rst     |   1 +
>  docs/ppc/index.rst |  13 ++
>  docs/ppc/xive.rst  | 344 +
>  MAINTAINERS        |   1 +
>  4 files changed, 359 insertions(+)
>  create mode 100644 docs/ppc/index.rst
>  create mode 100644 docs/ppc/xive.rst

Overall, the doc looks great; I have a few minor suggestions below.
> 
> diff --git a/docs/index.rst b/docs/index.rst
> index 3690955dd1f5..557fe86233e3 100644
> --- a/docs/index.rst
> +++ b/docs/index.rst
> @@ -12,4 +12,5 @@ Welcome to QEMU's documentation!
>  
>     interop/index
>     devel/index
> +   ppc/index
>  
> diff --git a/docs/ppc/index.rst b/docs/ppc/index.rst
> new file mode 100644
> index ..146f416ea3a0
> --- /dev/null
> +++ b/docs/ppc/index.rst
> @@ -0,0 +1,13 @@
> +.. This is the top level page for the 'ppc' manual
> +
> +
> +QEMU PowerPC Machine and Controller Guide
> +=========================================
> +
> +
> +Contents:
> +
> +.. toctree::
> +   :maxdepth: 2
> +
> +   xive
> diff --git a/docs/ppc/xive.rst b/docs/ppc/xive.rst
> new file mode 100644
> index ..90ddde6bf39f
> --- /dev/null
> +++ b/docs/ppc/xive.rst
> @@ -0,0 +1,344 @@
> +
> +POWER9 XIVE interrupt controller
> +
> +
> +The POWER9 processor comes with a new interrupt controller
> +architecture, called XIVE for "eXternal Interrupt Virtualization
> +Engine".
> +
> +Compared to the previous architecture, the main characteristics of
> +XIVE are to support a larger number of interrupt sources and to
> +deliver interrupts directly to virtual processors without hypervisor
> +assistance. This removes the context switches required for the
> +delivery process.
> +
> +
> +Overall architecture
> +
> +
> +The XIVE IC is composed of three sub-engines, each taking care of a
> +processing layer of external interrupts:
> +
> +- Interrupt Virtualization Source Engine (IVSE), or Source Controller
> +  (SC). These are found in PCI PHBs, in the PSI host bridge
> +  controller, but also inside the main controller for the core IPIs
> +  and other sub-chips (NX, CAP, NPU) of the chip/processor. They are
> +  configured to feed the IVRE with events.
> +- Interrupt Virtualization Routing Engine (IVRE) or Virtualization
> +  Controller (VC). It handles event coalescing and performs interrupt
> +  routing by matching an event source number with an Event
> +  Notification Descriptor (END).
> +- Interrupt Virtualization Presentation Engine (IVPE) or Presentation
> +  Controller (PC). It maintains the interrupt context state of each
> +  thread and handles the delivery of the external interrupt to the
> +  thread.
> +
> +::
> +
> +    [Figure: XIVE Interrupt Controller block diagram. The IVRE, the Common
> +    Queue Bridge and the IVPE (with esb, eas, end, tctx and nvt structures)
> +    sit on the Power Bus, which connects them to the CORES (IPIs), to RAM
> +    holding the esb, eas, end and nvt tables, to other chips, and to IVSEs
> +    with local PQ-bits (PCIe) or PQ-bits in the VC (NX, NPU, CAPI).]
> +
> +
> +PQ-bits: 2 bits source state machine (P:pending Q:queued)
> +esb: Event State Buffer (Array of PQ bits in an IVSE)
> +eas: Event Assignment Structure
> +end: Event Notification Descriptor
> +nvt: Notification Virtual Target
> +tctx: Thread interrupt Context registers
> +
> +
> +
> +XIVE internal tables
> +
> +
> +Each of the sub-engines uses a set of tables to redirect interrupts
> +from event sources to CPU threads.
> +
> +::
> +
> +
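The legend in the quoted patch describes PQ-bits as a 2-bit source state machine (P: pending, Q: queued). A minimal sketch of how such a state machine coalesces triggers, assuming the usual ESB encoding in which P set with Q clear means "pending", both set means "queued", and Q set with P clear means "off" (masked). The names are hypothetical and the transitions are an assumption based on that encoding, not a copy of QEMU's XIVE model.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed 2-bit encoding: bit 1 = P (pending), bit 0 = Q (queued). */
#define ESB_P 0x2
#define ESB_Q 0x1
enum {
    ESB_RESET   = 0x0,           /* P=0 Q=0: idle */
    ESB_OFF     = ESB_Q,         /* P=0 Q=1: source masked */
    ESB_PENDING = ESB_P,         /* P=1 Q=0: event forwarded, awaiting EOI */
    ESB_QUEUED  = ESB_P | ESB_Q  /* P=1 Q=1: another event arrived meanwhile */
};

/* An event arrives at the source. Returns true if it is forwarded to the IVRE. */
static bool esb_trigger(uint8_t *pq)
{
    switch (*pq & 0x3) {
    case ESB_RESET:
        *pq = ESB_PENDING;
        return true;
    case ESB_PENDING:
    case ESB_QUEUED:
        *pq = ESB_QUEUED;   /* coalesce: remember it, but do not forward */
        return false;
    default:                /* ESB_OFF: the event is dropped */
        return false;
    }
}

/* End of interrupt. Returns true if a queued event must be forwarded again. */
static bool esb_eoi(uint8_t *pq)
{
    switch (*pq & 0x3) {
    case ESB_QUEUED:
        *pq = ESB_PENDING;
        return true;
    case ESB_RESET:
    case ESB_PENDING:
        *pq = ESB_RESET;
        return false;
    default:                /* ESB_OFF stays off */
        return false;
    }
}
```

Any number of triggers arriving while an event is pending collapse into a single queued event, which is the "event coalescing" the IVRE description refers to.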

Re: [Qemu-devel] [PATCH v2 1/2] vfio/mdev: add version attribute for mdev device

2019-05-14 Thread Yan Zhao

On Tue, May 14, 2019 at 03:20:40PM +0800, Erik Skultety wrote:
> On Tue, May 14, 2019 at 02:12:35AM -0400, Yan Zhao wrote:

Re: [Qemu-devel] [PATCH v2 1/2] vfio/mdev: add version attribute for mdev device

2019-05-14 Thread Erik Skultety

On Tue, May 14, 2019 at 03:32:19AM -0400, Yan Zhao wrote:
> On Tue, May 14, 2019 at 03:20:40PM +0800, Erik Skultety wrote:

Re: [Qemu-devel] [PATCH v2 1/2] vfio/mdev: add version attribute for mdev device

2019-05-14 Thread Yan Zhao

On Tue, May 14, 2019 at 03:43:44PM +0800, Erik Skultety wrote:
> On Tue, May 14, 2019 at 03:32:19AM -0400, Yan Zhao wrote:
> > On Tue, May 14, 2019 at 03:20:40PM +0800, Erik Skultety wrote:
> > > On Tue, May 14, 2019 at 02:12:35AM -0400, Yan Zhao wrote:
> > > > On Mon, May 13, 2019 at 09:28:04PM +0800, Erik Skultety wrote:
> > > > > On Fri, May 10, 2019 at 11:48:38AM +0200, Cornelia Huck wrote:
> > > > > > On Fri, 10 May 2019 10:36:09 +0100
> > > > > > "Dr. David Alan Gilbert"  wrote:
> > > > > >
> > > > > > > * Cornelia Huck (coh...@redhat.com) wrote:
> > > > > > > > On Thu, 9 May 2019 17:48:26 +0100
> > > > > > > > "Dr. David Alan Gilbert"  wrote:
> > > > > > > >
> > > > > > > > > * Cornelia Huck (coh...@redhat.com) wrote:
> > > > > > > > > > On Thu, 9 May 2019 16:48:57 +0100
> > > > > > > > > > "Dr. David Alan Gilbert"  wrote:
> > > > > > > > > >
> > > > > > > > > > > * Cornelia Huck (coh...@redhat.com) wrote:
> > > > > > > > > > > > On Tue, 7 May 2019 15:18:26 -0600
> > > > > > > > > > > > Alex Williamson  wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > On Sun,  5 May 2019 21:49:04 -0400
> > > > > > > > > > > > > Yan Zhao  wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > > +  Errno:
> > > > > > > > > > > > > > > +  If vendor driver wants to claim a mdev device incompatible to all other mdev
> > > > > > > > > > > > > > > +  devices, it should not register version attribute for this mdev device. But if
> > > > > > > > > > > > > > > +  a vendor driver has already registered version attribute and it wants to claim
> > > > > > > > > > > > > > > +  a mdev device incompatible to all other mdev devices, it needs to return
> > > > > > > > > > > > > > > +  -ENODEV on access to this mdev device's version attribute.
> > > > > > > > > > > > > > > +  If a mdev device is only incompatible to certain mdev devices, write of
> > > > > > > > > > > > > > > +  incompatible mdev devices's version strings to its version attribute should
> > > > > > > > > > > > > > > +  return -EINVAL;
> > > > > > > > > > > > >
> > > > > > > > > > > > > > I think it's best not to define the specific errno returned for a
> > > > > > > > > > > > > > specific situation, let the vendor driver decide, userspace simply
> > > > > > > > > > > > > > needs to know that an errno on read indicates the device does not
> > > > > > > > > > > > > > support migration version comparison and that an errno on write
> > > > > > > > > > > > > > indicates the devices are incompatible or the target doesn't support
> > > > > > > > > > > > > > migration versions.
> > > > > > > > > > > >
> > > > > > > > > > > > I think I have to disagree here: It's probably valuable to have an
> > > > > > > > > > > > agreed error for 'cannot migrate at all' vs 'cannot migrate between
> > > > > > > > > > > > those two particular devices'. Userspace might want to do different
> > > > > > > > > > > > things (e.g. trying with different device pairs).
> > > > > > > > > > >
> > > > > > > > > > > Trying to stuff these things down an errno seems a bad idea; we can't
> > > > > > > > > > > get much information that way.
> > > > > > > > > >
> > > > > > > > > > So, what would be a reasonable approach? Userspace should first read
> > > > > > > > > > the version attributes on both devices (to find out whether migration
> > > > > > > > > > is supported at all), and only then figure out via writing whether they
> > > > > > > > > > are compatible?
> > > > > > > > > >
> > > > > > > > > > (Or just go ahead and try, if it does not care about the reason.)
> > > > > > > > >
> > > > > > > > > Well, I'm OK with something like writing to test whether it's
> > > > > > > > > compatible, it's just we need a better way of saying 'no'.
> > > > > > > > > I'm not sure if that involves reading back from somewhere after
> > > > > > > > > the write or what.
> > > > > > > >
> > > > > > > > Hm, so I basically see two ways of doing that:
> > > > > > > > - standardize on some error codes... problem: error codes can be hard
> > > > > > > >   to fit to reasons
> > > > > > > > - make the error available in some attribute that can be read
> > > > > > > >
> > > > > > > > I'm not sure how we can serialize the readback with the last write,
> > > > > > > > though (this looks inherently racy).
> > > > > > > >
> > > > > > > > How important is detailed error reporting here?
> > > > > > >
> > > > > > > I think we need something, otherwise we're just going to get vague
> > > > > > > us

[Qemu-devel] [PATCH v4 0/3] rng-builtin: add an RNG backend that uses qemu_guest_getrandom()

2019-05-14 Thread Laurent Vivier

Add a new RNG backend using the QEMU builtin getrandom function.

This patch applies on top of
"[PATCH v6 00/24] Add qemu_getrandom and ARMv8.5-RNG etc"
Based-on: 20190510173049.28171-1-richard.hender...@linaro.org

v4: update PATCH 1 commit message

v3: Include Kashyap's patch in the series
    Add a patch to change virtio-rng default backend to rng-builtin

v2: Update qemu-options.hx to describe the new backend and specify
    that virtio-rng uses rng-random by default

Kashyap Chamarthy (1):
  VirtIO-RNG: Update default entropy source to `/dev/urandom`

Laurent Vivier (2):
  rng-builtin: add an RNG backend that uses qemu_guest_getrandom()
  virtio-rng: change default backend to rng-builtin

 backends/Makefile.objs |  2 +-
 backends/rng-builtin.c | 54 ++
 backends/rng-random.c  |  2 +-
 hw/virtio/virtio-rng.c |  2 +-
 include/hw/virtio/virtio-rng.h |  4 +--
 include/sysemu/rng-builtin.h   | 17 +++
 qemu-options.hx |  9 +-
 7 files changed, 84 insertions(+), 6 deletions(-)
 create mode 100644 backends/rng-builtin.c
 create mode 100644 include/sysemu/rng-builtin.h

-- 
2.20.1

[Qemu-devel] [PATCH v4 1/3] VirtIO-RNG: Update default entropy source to `/dev/urandom`

2019-05-14 Thread Laurent Vivier

From: Kashyap Chamarthy 

When QEMU exposes a VirtIO-RNG device to the guest, that device needs a
source of entropy, and that source needs to be "non-blocking", like
`/dev/urandom`.  However, currently QEMU defaults to the problematic
`/dev/random`, which on Linux is "blocking" (as in, it waits until
sufficient entropy is available).

Why prefer `/dev/urandom` over `/dev/random`?
---------------------------------------------

The man pages of urandom(4) and random(4) state:

"The /dev/random device is a legacy interface which dates back to a
time where the cryptographic primitives used in the implementation
of /dev/urandom were not widely trusted.  It will return random
bytes only within the estimated number of bits of fresh noise in the
entropy pool, blocking if necessary.  /dev/random is suitable for
applications that need high quality randomness, and can afford
indeterminate delays."

Further, the "Usage" section of the same man pages states:

"The /dev/random interface is considered a legacy interface, and
/dev/urandom is preferred and sufficient in all use cases, with the
exception of applications which require randomness during early boot
time; for these applications, getrandom(2) must be used instead,
because it will block until the entropy pool is initialized.

"If a seed file is saved across reboots as recommended below (all
major Linux distributions have done this since 2000 at least), the
output is cryptographically secure against attackers without local
root access as soon as it is reloaded in the boot sequence, and
perfectly adequate for network encryption session keys.  Since reads
from /dev/random may block, users will usually want to open it in
nonblocking mode (or perform a read with timeout), and provide some
sort of user notification if the desired entropy is not immediately
available."

And refer to random(7) for a comparison of `/dev/random` and
`/dev/urandom`.
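As an illustration (not part of this patch), a minimal sketch of reading
from `/dev/urandom` in C, assuming a POSIX host where the device exists
(Linux and the BSDs listed below). The helper name read_urandom() is
hypothetical. It retries on EINTR and on short reads, since read(2) may
return fewer bytes than requested; unlike the legacy `/dev/random`
behaviour quoted above, the amount returned is not capped by the kernel's
entropy estimate.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: fill buf with len bytes read from /dev/urandom.
 * Returns 0 on success, -1 on failure with errno set.
 */
static int read_urandom(unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    int fd = open("/dev/urandom", O_RDONLY);
    if (fd < 0) {
        return -1;
    }

    while (len > 0) {
        ssize_t n = read(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR) {
                continue;           /* interrupted by a signal: retry */
            }
            int saved = errno;      /* close() must not clobber errno */
            close(fd);
            errno = saved;
            return -1;
        }
        if (n == 0) {               /* unexpected EOF: report an I/O error */
            close(fd);
            errno = EIO;
            return -1;
        }
        buf += n;                   /* short read: advance and continue */
        len -= (size_t)n;
    }

    close(fd);
    return 0;
}
```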

What about other OSes?
----------------------

`/dev/urandom` exists and works on OS-X, FreeBSD, DragonFlyBSD, NetBSD
and OpenBSD, which cover all the non-Linux platforms we explicitly
support, aside from Windows.

On Windows, `/dev/random` doesn't work either, so we don't regress.
This is actually another argument in favour of using the newly
proposed 'rng-builtin' backend by default, as that will work on
Windows.

- - -

Given the above, change the entropy source for VirtIO-RNG device to
`/dev/urandom`.

Related discussion can be found in these past threads [1][2].

[1] https://lists.nongnu.org/archive/html/qemu-devel/2018-06/msg08335.html
-- "RNG: Any reason QEMU doesn't default to `/dev/urandom`?"
[2] https://lists.nongnu.org/archive/html/qemu-devel/2018-09/msg02724.html
-- "[RFC] Virtio RNG: Consider changing the default entropy source to
   /dev/urandom"

Signed-off-by: Kashyap Chamarthy 
Reviewed-by: Daniel P. Berrangé 
Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Laurent Vivier 
---
 backends/rng-random.c | 2 +-
 qemu-options.hx   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/backends/rng-random.c b/backends/rng-random.c
index e2a49b0571d7..eff36ef14084 100644
--- a/backends/rng-random.c
+++ b/backends/rng-random.c
@@ -112,7 +112,7 @@ static void rng_random_init(Object *obj)
                             rng_random_set_filename,
                             NULL);
 
-    s->filename = g_strdup("/dev/random");
+    s->filename = g_strdup("/dev/urandom");
     s->fd = -1;
 }
 
diff --git a/qemu-options.hx b/qemu-options.hx
index 0191ef8b1eb7..4df0ea3aed5c 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -4286,7 +4286,7 @@ Creates a random number generator backend which obtains entropy from
 a device on the host. The @option{id} parameter is a unique ID that
 will be used to reference this entropy backend from the @option{virtio-rng}
 device. The @option{filename} parameter specifies which file to obtain
-entropy from and if omitted defaults to @option{/dev/random}.
+entropy from and if omitted defaults to @option{/dev/urandom}.
 
 @item -object rng-egd,id=@var{id},chardev=@var{chardevid}
 
-- 
2.20.1

[Qemu-devel] [PATCH v4 2/3] rng-builtin: add an RNG backend that uses qemu_guest_getrandom()

2019-05-14 Thread Laurent Vivier

Add a new RNG backend using the QEMU builtin getrandom function.

It can be created and used with something like:

... -object rng-builtin,id=rng0 -device virtio-rng,rng=rng0 ...
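The request loop added in backends/rng-builtin.c below can be modelled in
plain userspace C. This is a sketch under stated assumptions, not QEMU
code: struct entropy_request, struct entropy_backend, fill_random(),
drain_requests() and count_bytes() are hypothetical stand-ins, and
getrandom(2) (Linux, glibc 2.25 or later) takes the place of
qemu_guest_getrandom_nofail(). Like the patch, it serves every queued
request in order: fill the buffer, pass the bytes to the request's
callback, then drop the request from the queue.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/random.h>

/* Hypothetical stand-in for QEMU's RngRequest (not the real type). */
struct entropy_request {
    unsigned char *data;
    size_t size;
    void (*receive_entropy)(void *opaque, const unsigned char *buf, size_t size);
    void *opaque;
    struct entropy_request *next;
};

/* Hypothetical stand-in for the backend's queue of pending requests. */
struct entropy_backend {
    struct entropy_request *head;
};

/* Fill buf completely; getrandom(2) may return fewer bytes than requested. */
static int fill_random(unsigned char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = getrandom(buf, len, 0);
        if (n < 0) {
            if (errno == EINTR) {
                continue;
            }
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Serve every queued request in FIFO order, as the loop in the patch does. */
static int drain_requests(struct entropy_backend *b)
{
    assert(b != NULL);
    while (b->head != NULL) {
        struct entropy_request *req = b->head;

        if (fill_random(req->data, req->size) != 0) {
            return -1;          /* sketch choice: leave the request queued */
        }
        req->receive_entropy(req->opaque, req->data, req->size);
        b->head = req->next;    /* unlink; QEMU's finalize also releases it */
    }
    return 0;
}

/* Example callback: accumulate the number of bytes delivered. */
static void count_bytes(void *opaque, const unsigned char *buf, size_t size)
{
    (void)buf;
    *(size_t *)opaque += size;
}
```

One point the model makes visible: the patch drains the whole queue rather
than only the request passed in, and its loop-local req shadows the req
parameter of rng_builtin_request_entropy().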

Reviewed-by: Richard Henderson 
Signed-off-by: Laurent Vivier 
---
 backends/Makefile.objs |  2 +-
 backends/rng-builtin.c | 56 ++
 qemu-options.hx | 10 +++-
 3 files changed, 66 insertions(+), 2 deletions(-)
 create mode 100644 backends/rng-builtin.c

diff --git a/backends/Makefile.objs b/backends/Makefile.objs
index 981e8e122f2c..f0691116e86e 100644
--- a/backends/Makefile.objs
+++ b/backends/Makefile.objs
@@ -1,4 +1,4 @@
-common-obj-y += rng.o rng-egd.o
+common-obj-y += rng.o rng-egd.o rng-builtin.o
 common-obj-$(CONFIG_POSIX) += rng-random.o
 
 common-obj-$(CONFIG_TPM) += tpm.o
diff --git a/backends/rng-builtin.c b/backends/rng-builtin.c
new file mode 100644
index ..b1264b745407
--- /dev/null
+++ b/backends/rng-builtin.c
@@ -0,0 +1,56 @@
+/*
+ * QEMU Builtin Random Number Generator Backend
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "sysemu/rng.h"
+#include "qapi/error.h"
+#include "qapi/qmp/qerror.h"
+#include "qemu/main-loop.h"
+#include "qemu/guest-random.h"
+
+#define TYPE_RNG_BUILTIN "rng-builtin"
+#define RNG_BUILTIN(obj) OBJECT_CHECK(RngBuiltin, (obj), TYPE_RNG_BUILTIN)
+
+typedef struct RngBuiltin {
+    RngBackend parent;
+} RngBuiltin;
+
+static void rng_builtin_request_entropy(RngBackend *b, RngRequest *req)
+{
+    RngBuiltin *s = RNG_BUILTIN(b);
+
+    while (!QSIMPLEQ_EMPTY(&s->parent.requests)) {
+        RngRequest *req = QSIMPLEQ_FIRST(&s->parent.requests);
+
+        qemu_guest_getrandom_nofail(req->data, req->size);
+
+        req->receive_entropy(req->opaque, req->data, req->size);
+
+        rng_backend_finalize_request(&s->parent, req);
+    }
+}
+
+static void rng_builtin_class_init(ObjectClass *klass, void *data)
+{
+    RngBackendClass *rbc = RNG_BACKEND_CLASS(klass);
+
+    rbc->request_entropy = rng_builtin_request_entropy;
+}
+
+static const TypeInfo rng_builtin_info = {
+    .name = TYPE_RNG_BUILTIN,
+    .parent = TYPE_RNG_BACKEND,
+    .instance_size = sizeof(RngBuiltin),
+    .class_init = rng_builtin_class_init,
+};
+
+static void register_types(void)
+{
+    type_register_static(&rng_builtin_info);
+}
+
+type_init(register_types);
diff --git a/qemu-options.hx b/qemu-options.hx
index 4df0ea3aed5c..6ab920f12be4 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -4280,13 +4280,21 @@ other options.
 
 The @option{share} boolean option is @var{on} by default with memfd.
 
+@item -object rng-builtin,id=@var{id}
+
+Creates a random number generator backend which obtains entropy from
+QEMU builtin functions. The @option{id} parameter is a unique ID that
+will be used to reference this entropy backend from the @option{virtio-rng}
+device.
+
 @item -object rng-random,id=@var{id},filename=@var{/dev/random}
 
 Creates a random number generator backend which obtains entropy from
 a device on the host. The @option{id} parameter is a unique ID that
 will be used to reference this entropy backend from the @option{virtio-rng}
 device. The @option{filename} parameter specifies which file to obtain
-entropy from and if omitted defaults to @option{/dev/urandom}.
+entropy from and if omitted defaults to @option{/dev/urandom}. By default,
+the @option{virtio-rng} device uses this RNG backend.
 
 @item -object rng-egd,id=@var{id},chardev=@var{chardevid}
 
-- 
2.20.1

[Qemu-devel] [PATCH v4 3/3] virtio-rng: change default backend to rng-builtin

2019-05-14 Thread Laurent Vivier

Signed-off-by: Laurent Vivier 
---
 backends/rng-builtin.c |  8 +++-
 hw/virtio/virtio-rng.c |  2 +-
 include/hw/virtio/virtio-rng.h |  4 ++--
 include/sysemu/rng-builtin.h   | 17 +
 qemu-options.hx |  5 ++---
 5 files changed, 25 insertions(+), 11 deletions(-)
 create mode 100644 include/sysemu/rng-builtin.h

diff --git a/backends/rng-builtin.c b/backends/rng-builtin.c
index b1264b745407..27675301933b 100644
--- a/backends/rng-builtin.c
+++ b/backends/rng-builtin.c
@@ -7,17 +7,15 @@
 
 #include "qemu/osdep.h"
 #include "sysemu/rng.h"
+#include "sysemu/rng-builtin.h"
 #include "qapi/error.h"
 #include "qapi/qmp/qerror.h"
 #include "qemu/main-loop.h"
 #include "qemu/guest-random.h"
 
-#define TYPE_RNG_BUILTIN "rng-builtin"
-#define RNG_BUILTIN(obj) OBJECT_CHECK(RngBuiltin, (obj), TYPE_RNG_BUILTIN)
-
-typedef struct RngBuiltin {
+struct RngBuiltin {
     RngBackend parent;
-} RngBuiltin;
+};
 
 static void rng_builtin_request_entropy(RngBackend *b, RngRequest *req)
 {
diff --git a/hw/virtio/virtio-rng.c b/hw/virtio/virtio-rng.c
index 30493a258622..67209f63ddbc 100644
--- a/hw/virtio/virtio-rng.c
+++ b/hw/virtio/virtio-rng.c
@@ -189,7 +189,7 @@ static void virtio_rng_device_realize(DeviceState *dev, Error **errp)
     }
 
     if (vrng->conf.rng == NULL) {
-        vrng->conf.default_backend = RNG_RANDOM(object_new(TYPE_RNG_RANDOM));
+        vrng->conf.default_backend = RNG_BUILTIN(object_new(TYPE_RNG_BUILTIN));
 
         user_creatable_complete(USER_CREATABLE(vrng->conf.default_backend),
                                 &local_err);
diff --git a/include/hw/virtio/virtio-rng.h b/include/hw/virtio/virtio-rng.h
index 922dce7caccf..f9b6339b19a4 100644
--- a/include/hw/virtio/virtio-rng.h
+++ b/include/hw/virtio/virtio-rng.h
@@ -13,7 +13,7 @@
 #define QEMU_VIRTIO_RNG_H
 
 #include "sysemu/rng.h"
-#include "sysemu/rng-random.h"
+#include "sysemu/rng-builtin.h"
 #include "standard-headers/linux/virtio_rng.h"
 
 #define TYPE_VIRTIO_RNG "virtio-rng-device"
@@ -26,7 +26,7 @@ struct VirtIORNGConf {
     RngBackend *rng;
     uint64_t max_bytes;
     uint32_t period_ms;
-    RngRandom *default_backend;
+    RngBuiltin *default_backend;
 };
 
 typedef struct VirtIORNG {
diff --git a/include/sysemu/rng-builtin.h b/include/sysemu/rng-builtin.h
new file mode 100644
index ..a0f75f97dde8
--- /dev/null
+++ b/include/sysemu/rng-builtin.h
@@ -0,0 +1,17 @@
+/*
+ * QEMU Builtin Random Number Generator Backend
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+#ifndef QEMU_RNG_BUILTIN_H
+#define QEMU_RNG_BUILTIN_H
+
+#include "qom/object.h"
+
+#define TYPE_RNG_BUILTIN "rng-builtin"
+#define RNG_BUILTIN(obj) OBJECT_CHECK(RngBuiltin, (obj), TYPE_RNG_BUILTIN)
+
+typedef struct RngBuiltin RngBuiltin;
+
+#endif
diff --git a/qemu-options.hx b/qemu-options.hx
index 6ab920f12be4..c9784be83cb5 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -4285,7 +4285,7 @@ The @option{share} boolean option is @var{on} by default with memfd.
 Creates a random number generator backend which obtains entropy from
 QEMU builtin functions. The @option{id} parameter is a unique ID that
 will be used to reference this entropy backend from the @option{virtio-rng}
-device.
+device. By default, the @option{virtio-rng} device uses this RNG backend.
 
 @item -object rng-random,id=@var{id},filename=@var{/dev/random}
 
@@ -4293,8 +4293,7 @@ Creates a random number generator backend which obtains entropy from
 a device on the host. The @option{id} parameter is a unique ID that
 will be used to reference this entropy backend from the @option{virtio-rng}
 device. The @option{filename} parameter specifies which file to obtain
-entropy from and if omitted defaults to @option{/dev/urandom}. By default,
-the @option{virtio-rng} device uses this RNG backend.
+entropy from and if omitted defaults to @option{/dev/urandom}.
 
 @item -object rng-egd,id=@var{id},chardev=@var{chardevid}
 
-- 
2.20.1

Re: [Qemu-devel] [Bug 1826393] Re: QEMU 3.1.0 stuck waiting for 800ms (5 times slower) in pre-bios phase

2019-05-14 Thread Stefano Garzarella

On Mon, May 06, 2019 at 05:40:05PM -, Waldemar Kozaczuk wrote:
> The last bios indeed helped. It now runs under 200ms.
> 
> Do you anticipate doing a minor release of 3.1.0 with the updated bios to address
> this issue? Or are users expected to upgrade to QEMU 4.0.0?

CCing Gerd

I'm not sure we will release SeaBIOS 1.12.1 with a minor release of QEMU
3.1.0, so upgrading to QEMU 4.0 should be the way to address this issue.

Regards,
Stefano

> 
> Regards,
> Waldek
> 
> On Thu, May 2, 2019 at 4:05 AM Stefano Garzarella <
> 1826...@bugs.launchpad.net> wrote:
> 
> > Oh sorry, you're using the 'pc' machine, so please try this bios:
> > https://github.com/qemu/qemu/blob/v4.0.0/pc-bios/bios.bin
> >
> > --
> > You received this bug notification because you are subscribed to the bug
> > report.
> > https://bugs.launchpad.net/bugs/1826393
> >
> > Title:
> >   QEMU 3.1.0 stuck waiting for 800ms (5 times slower) in pre-bios phase
> >
> > Status in QEMU:
> >   New
> >
> > Bug description:
> >   Yesterday I upgraded my laptop from Ubuntu 18.10 to 19.04 and
> >   that way got the newer QEMU 3.1.0 instead of QEMU 2.12.0. I have
> >   noticed that every time I start QEMU to run OSv, QEMU seems to hang
> >   noticeably longer (~1 second) before showing SeaBIOS output. I have
> >   tried all kinds of combinations to get rid of that pause and nothing
> >   helped.
> >
> >   Here is my start command:
> >   time qemu-system-x86_64 -m 256M -smp 1 -nographic -nodefaults \
> >-device virtio-blk-pci,id=blk0,bootindex=0,drive=hd0,scsi=off \
> >-drive file=usr.img,if=none,id=hd0,cache=none,aio=thre\
> >-enable-kvm \
> >-cpu host,+x2apic -chardev stdio,mux=on,id=stdio,signal=off \
> >-mon chardev=stdio,mode=readline -device isa-serial,chardev=stdio
> >
> >   It looks like the qemu process starts, waits almost a second for something
> >   and then prints the SeaBIOS splash screen and continues booting:
> >
> >   --> waits here
> >   SeaBIOS (version 1.12.0-1)
> >   Booting from Hard Disk..OSv v0.53.0-6-gc8395118
> > disk read (real mode): 27.25ms, (+27.25ms)
> > uncompress lzloader.elf: 46.22ms, (+18.97ms)
> > TLS initialization: 46.79ms, (+0.57ms)
> > .init functions: 47.82ms, (+1.03ms)
> > SMP launched: 48.08ms, (+0.26ms)
> > VFS initialized: 49.25ms, (+1.17ms)
> > Network initialized: 49.48ms, (+0.24ms)
> > pvpanic done: 49.57ms, (+0.08ms)
> > pci enumerated: 52.42ms, (+2.85ms)
> > drivers probe: 52.42ms, (+0.00ms)
> > drivers loaded: 55.33ms, (+2.90ms)
> > ROFS mounted: 56.37ms, (+1.04ms)
> > Total time: 56.37ms, (+0.00ms)
> >   Found optarg
> >   dev  etc  hello  libenviron.solibvdso.so  proc  tmp  tools  usr
> >
> >   real  0m0.935s
> >   user  0m0.426s
> >   sys   0m0.490s
> >
> >   With version 2.12.0 I used to see a real time below 200ms. So it seems
> >   QEMU slowed down 5 times.
> >
> >   I ran strace -tt against it and I have noticed a pause here:
> >   ...
> >   07:31:41.848579 futex(0x55c4a2fd34c0, FUTEX_WAKE_PRIVATE, 1) = 0
> >   07:31:41.848604 futex(0x55c4a2ff6308, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
> >   07:31:41.848649 ioctl(10, KVM_SET_PIT2, 0x7ffdd272d1f0) = 0
> >   07:31:41.848674 ioctl(9, KVM_CHECK_EXTENSION, KVM_CAP_KVMCLOCK_CTRL) = 1
> >   07:31:41.848699 ioctl(10, KVM_SET_CLOCK, 0x7ffdd272d230) = 0
> >   07:31:41.848724 futex(0x55c4a49a9a9c, FUTEX_WAKE_PRIVATE, 2147483647) = 1
> >   07:31:41.848747 getpid()= 5162
> >   07:31:41.848769 tgkill(5162, 5166, SIGUSR1) = 0
> >   07:31:41.848791 futex(0x55c4a2fd34c0, FUTEX_WAKE_PRIVATE, 1) = 0
> >   07:31:41.848814 futex(0x55c4a49a9a98, FUTEX_WAKE_PRIVATE, 2147483647) = 1
> >   07:31:41.848837 getpid()= 5162
> >   07:31:41.848858 tgkill(5162, 5166, SIGUSR1) = 0
> >   07:31:41.848889 write(8, "\1\0\0\0\0\0\0\0", 8) = 8
> >   07:31:41.848919 futex(0x55c4a2fd34c0, FUTEX_WAKE_PRIVATE, 1) = 1
> >   07:31:41.848943 ppoll([{fd=0, events=POLLIN}, {fd=4, events=POLLIN},
> > {fd=5, events=POLLIN}, {fd=7, events=POLLIN},
> >   {fd=8, events=POLLIN}], 5, {tv_sec=0, tv_nsec=0}, NULL, 8) = 1 ([{fd=8,
> > revents=POLLIN}], left {tv_sec=0, tv_nsec=0
> >   })
> >   07:31:41.849003 futex(0x55c4a2fd34c0, FUTEX_WAIT_PRIVATE, 2, NULL) = -1
> > EAGAIN (Resource temporarily unavailable)
> >   07:31:41.849031 read(8, "\5\0\0\0\0\0\0\0", 16) = 8
> >   07:31:41.849064 futex(0x55c4a2fd34c0, FUTEX_WAKE_PRIVATE, 1) = 0
> >   07:31:41.849086 ppoll([{fd=0, events=POLLIN}, {fd=4, events=POLLIN},
> > {fd=5, events=POLLIN}, {fd=7, events=POLLIN},
> >   {fd=8, events=POLLIN}], 5, {tv_sec=0, tv_nsec=984624000}, NULL, 8) = 1
> > ([{fd=7, revents=POLLIN}], left {tv_sec=0, t
> >   v_nsec=190532609})
> >
> >   --> waits for almost 800ms
> >
> >   07:31:42.643272 futex(0x55c4a2fd34c0, FUTEX_WAIT_PRIVATE, 2, NULL) = 0
> >   07:31:42.643522 read(7, "\1\0\0\0\0\0\0\0", 512) = 8
> >   07:31:42.643625 futex(0x55c4a2fd34c0, FUTEX_WAKE_PRIVATE, 1) = 1
> >   07:31:42.6436

Re: [Qemu-devel] [PATCH v4 1/3] VirtIO-RNG: Update default entropy source to `/dev/urandom`

2019-05-14 Thread Kashyap Chamarthy

On Tue, May 14, 2019 at 09:56:00AM +0200, Laurent Vivier wrote:
> From: Kashyap Chamarthy 
> 
> When QEMU exposes a VirtIO-RNG device to the guest, that device needs a
> source of entropy, and that source needs to be "non-blocking", like
> `/dev/urandom`.  However, currently QEMU defaults to the problematic
> `/dev/random`, which on linux is "blocking" (as in, it waits until

OCD nit: s/linux/Linux/

Maybe Michael can do the touch up when applying.

Thanks, Laurent, for reworking the commit message update.

> sufficient entropy is available).

[...]

-- 
/kashyap

Re: [Qemu-devel] [PATCH v4 1/3] VirtIO-RNG: Update default entropy source to `/dev/urandom`

2019-05-14 Thread Laurent Vivier


On 14/05/2019 10:08, Kashyap Chamarthy wrote:
> On Tue, May 14, 2019 at 09:56:00AM +0200, Laurent Vivier wrote:
>> From: Kashyap Chamarthy 
>>
>> When QEMU exposes a VirtIO-RNG device to the guest, that device needs a
>> source of entropy, and that source needs to be "non-blocking", like
>> `/dev/urandom`.  However, currently QEMU defaults to the problematic
>> `/dev/random`, which on linux is "blocking" (as in, it waits until
>
> OCD nit: s/linux/Linux/
>
> Maybe Michael can do the touch up when applying.


A little reminder: this patch can be applied alone, but the following ones
need the series from Richard to be applied first.


Thanks,
Laurent

Re: [Qemu-devel] [PATCH v8 2/6] virtio-pmem: Add virtio pmem driver

2019-05-14 Thread David Hildenbrand

On 10.05.19 17:51, Pankaj Gupta wrote:
> This patch adds virtio-pmem driver for KVM guest.
> 
> Guest reads the persistent memory range information from
> Qemu over VIRTIO and registers it on nvdimm_bus. It also
> creates a nd_region object with the persistent memory
> range information so that existing 'nvdimm/pmem' driver
> can reserve this into system memory map. This way
> 'virtio-pmem' driver uses existing functionality of pmem
> driver to register persistent memory compatible for DAX
> capable filesystems.
> 
> This also provides function to perform guest flush over
> VIRTIO from 'pmem' driver when userspace performs flush
> on DAX memory range.
> 
> Signed-off-by: Pankaj Gupta 
> Reviewed-by: Yuval Shaia 
> ---
>  drivers/nvdimm/Makefile  |   1 +
>  drivers/nvdimm/nd_virtio.c   | 129 +++
>  drivers/nvdimm/virtio_pmem.c | 117 
>  drivers/virtio/Kconfig   |  10 +++
>  include/linux/virtio_pmem.h  |  60 ++
>  include/uapi/linux/virtio_ids.h  |   1 +
>  include/uapi/linux/virtio_pmem.h |  10 +++
>  7 files changed, 328 insertions(+)
>  create mode 100644 drivers/nvdimm/nd_virtio.c
>  create mode 100644 drivers/nvdimm/virtio_pmem.c
>  create mode 100644 include/linux/virtio_pmem.h
>  create mode 100644 include/uapi/linux/virtio_pmem.h
> 
> diff --git a/drivers/nvdimm/Makefile b/drivers/nvdimm/Makefile
> index 6f2a088afad6..cefe233e0b52 100644
> --- a/drivers/nvdimm/Makefile
> +++ b/drivers/nvdimm/Makefile
> @@ -5,6 +5,7 @@ obj-$(CONFIG_ND_BTT) += nd_btt.o
>  obj-$(CONFIG_ND_BLK) += nd_blk.o
>  obj-$(CONFIG_X86_PMEM_LEGACY) += nd_e820.o
>  obj-$(CONFIG_OF_PMEM) += of_pmem.o
> +obj-$(CONFIG_VIRTIO_PMEM) += virtio_pmem.o nd_virtio.o
>  
>  nd_pmem-y := pmem.o
>  
> diff --git a/drivers/nvdimm/nd_virtio.c b/drivers/nvdimm/nd_virtio.c
> new file mode 100644
> index ..ed7ddcc5a62c
> --- /dev/null
> +++ b/drivers/nvdimm/nd_virtio.c
> @@ -0,0 +1,129 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * virtio_pmem.c: Virtio pmem Driver
> + *
> + * Discovers persistent memory range information
> + * from host and provides a virtio based flushing
> + * interface.
> + */
> +#include 
> +#include "nd.h"
> +
> + /* The interrupt handler */
> +void host_ack(struct virtqueue *vq)
> +{
> +	unsigned int len;
> +	unsigned long flags;
> +	struct virtio_pmem_request *req, *req_buf;
> +	struct virtio_pmem *vpmem = vq->vdev->priv;

Nit: use reverse Christmas tree layout :)
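
For reference, "reverse Christmas tree" orders local variable declarations
from the longest line to the shortest, so host_ack() would declare vpmem
first, then req/req_buf, flags and len. The rule fits in a tiny checker;
is_reverse_xmas_tree() is a hypothetical helper, not a kernel or
checkpatch tool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: true if no declaration line is longer than the
 * one before it (longest first), i.e. "reverse Christmas tree" order.
 */
static bool is_reverse_xmas_tree(const char *const decls[], size_t n)
{
    assert(decls != NULL || n == 0);
    for (size_t i = 1; i < n; i++) {
        if (strlen(decls[i]) > strlen(decls[i - 1])) {
            return false;
        }
    }
    return true;
}

/* host_ack() declarations in the order posted. */
static const char *const posted[] = {
    "unsigned int len;",
    "unsigned long flags;",
    "struct virtio_pmem_request *req, *req_buf;",
    "struct virtio_pmem *vpmem = vq->vdev->priv;",
};

/* The same declarations reordered longest first. */
static const char *const reordered[] = {
    "struct virtio_pmem *vpmem = vq->vdev->priv;",
    "struct virtio_pmem_request *req, *req_buf;",
    "unsigned long flags;",
    "unsigned int len;",
};
```

Where an initializer depends on an earlier declaration (vpmem = vdev->priv
in virtio_pmem_flush() needs vdev), that dependency has to win, so the
layout is usually applied only as far as such dependencies allow.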

> +
> +	spin_lock_irqsave(&vpmem->pmem_lock, flags);
> +	while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
> +		req->done = true;
> +		wake_up(&req->host_acked);
> +
> +		if (!list_empty(&vpmem->req_list)) {
> +			req_buf = list_first_entry(&vpmem->req_list,
> +					struct virtio_pmem_request, list);
> +			req_buf->wq_buf_avail = true;
> +			wake_up(&req_buf->wq_buf);
> +			list_del(&req_buf->list);
> +		}
> +	}
> +	spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> +}
> +EXPORT_SYMBOL_GPL(host_ack);
> +
> + /* The request submission function */
> +int virtio_pmem_flush(struct nd_region *nd_region)
> +{
> +	int err, err1;
> +	unsigned long flags;
> +	struct scatterlist *sgs[2], sg, ret;
> +	struct virtio_device *vdev = nd_region->provider_data;
> +	struct virtio_pmem *vpmem = vdev->priv;
> +	struct virtio_pmem_request *req;

Nit: use reverse Christmas tree layout :)

> +
> +	might_sleep();
> +	req = kmalloc(sizeof(*req), GFP_KERNEL);
> +	if (!req)
> +		return -ENOMEM;
> +
> +	req->done = false;
> +	strcpy(req->name, "FLUSH");
> +	init_waitqueue_head(&req->host_acked);
> +	init_waitqueue_head(&req->wq_buf);
> +	INIT_LIST_HEAD(&req->list);
> +	sg_init_one(&sg, req->name, strlen(req->name));
> +	sgs[0] = &sg;
> +	sg_init_one(&ret, &req->ret, sizeof(req->ret));
> +	sgs[1] = &ret;
> +
> +	spin_lock_irqsave(&vpmem->pmem_lock, flags);
> +	 /*
> +	  * If virtqueue_add_sgs returns -ENOSPC then req_vq virtual
> +	  * queue does not have free descriptor. We add the request
> +	  * to req_list and wait for host_ack to wake us up when free
> +	  * slots are available.
> +	  */
> +	while ((err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req,
> +					GFP_ATOMIC)) == -ENOSPC) {
> +
> +		dev_err(&vdev->dev, "failed to send command to virtio pmem"\
> +			"device, no free slots in the virtqueue\n");
> +		req->wq_buf_avail = false;
> +		list_add_tail(&req->list, &vpmem->req_list);
> +		spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> +
> +		/* When host has read buffer, this completes via host_ack */

"A host repsonse results in "host_ack" getting called" ... ?

> +

Re: [Qemu-devel] [PATCH v4] s390: diagnose 318 info reset and migration support

2019-05-14 Thread Christian Borntraeger




On 14.05.19 09:28, David Hildenbrand wrote:
 But that can be tested using the runability information if I am not wrong.
>>>
>>> You mean the cpu level information, right?
> 
> Yes, query-cpu-definition includes for each model runability information
> via "unavailable-features" (valid under the started QEMU machine).
> 
>>>

> and others that we have today.
>
> So yes, I think this would be acceptable.  

 I guess it is acceptable yes. I doubt anybody uses that many CPUs in
 production either way. But you never know.
>>>
>>> I think that using that many cpus is a more uncommon setup, but I still
>>> think that having to wait for actual failure
>>
>> That can happen all the time today. You can easily say z14 in the xml when 
>> on a zEC12. Only at startup you get the error. The question is really:
> 
> "-smp 248 -cpu host" will no longer work, while e.g. "-smp 248 -cpu z12"
> will work. Actually, even "-smp 248" will no longer work on affected
> machines.
> 
> That is why I wonder if it is better to disable the feature and print a
> warning. Similar to CMMA, where we want to tolerate when CMMA is not
> possible in the current environment (huge pages).
> 
> "Diag318 will not be enabled because it is not compatible with more than
> 240 CPUs".
> 
> However, I still think that implementing support for more than one SCLP
> response page is the best solution. Guests will need adaptions for > 240
> CPUs with Diag318, but who cares? Existing setups will continue to work.
> 
> Implementing that SCLP thingy will avoid any warnings and any errors. It
> just works from the QEMU perspective.
> 
> Is implementing this realistic?

Yes, it is, but it will take time. I will try to get this rolling. To make
progress on the diag318 thing, can we error on startup now and simply
remove that check when we have implemented a larger SCCB? If we played
all kinds of "change the max number" games now, that would be harder to
"fix" later.



> 
>> do you want to error on definition of the xml or on startup.
> 
> I actually have no idea what the best practice on the libvirt side is.
> There seems to be a user for max-cpus and unavailable-features in QEMU.
> 
> And I think
>> startup is the better place here. This allows to create definitions that will
>> be useful in the future (pre-planning), e.g. if you know that you will update
>> your machine or the code soon.
> 
> 
>

Re: [Qemu-devel] [PATCH v4] s390: diagnose 318 info reset and migration support

2019-05-14 Thread Cornelia Huck

On Tue, 14 May 2019 10:37:32 +0200
Christian Borntraeger  wrote:

> On 14.05.19 09:28, David Hildenbrand wrote:
>  But that can be tested using the runability information if I am not 
>  wrong.  
> >>>
> >>> You mean the cpu level information, right?  
> > 
> > Yes, query-cpu-definition includes for each model runability information
> > via "unavailable-features" (valid under the started QEMU machine).
> >   
> >>>  
>   
> > and others that we have today.
> >
> > So yes, I think this would be acceptable.
> 
>  I guess it is acceptable yes. I doubt anybody uses that many CPUs in
>  production either way. But you never know.  
> >>>
> >>> I think that using that many cpus is a more uncommon setup, but I still
> >>> think that having to wait for actual failure  
> >>
> >> That can happen all the time today. You can easily say z14 in the xml when 
> >> on a zEC12. Only at startup you get the error. The question is really:  
> > 
> > "-smp 248 -cpu host" will no longer work, while e.g. "-smp 248 -cpu z12"
> > will work. Actually, even "-smp 248" will no longer work on affected
> > machines.
> > 
> > That is why I wonder if it is better to disable the feature and print a
> > warning. Similar to CMMA, where we want to tolerate when CMMA is not
> > possible in the current environment (huge pages).
> > 
> > "Diag318 will not be enabled because it is not compatible with more than
> > 240 CPUs".
> > 
> > However, I still think that implementing support for more than one SCLP
> > response page is the best solution. Guests will need adaptions for > 240
> > CPUs with Diag318, but who cares? Existing setups will continue to work.
> > 
> > Implementing that SCLP thingy will avoid any warnings and any errors. It
> > just works from the QEMU perspective.
> > 
> > Is implementing this realistic?  
> 
> Yes, it is, but it will take time. I will try to get this rolling. To make
> progress on the diag318 thing, can we error on startup now and simply
> remove that check when we have implemented a larger sccb? If we now
> play all kinds of "change the max number" games, that would be harder to "fix".

So, the idea right now is:

- fail to start if you try to specify a diag318 device and more than
  240 cpus (do we need a knob to turn off the device?)
- in the future, support more than one SCLP response page

I'm getting a bit lost in the discussion; but the above sounds
reasonable to me.

Re: [Qemu-devel] [PATCH v4] s390: diagnose 318 info reset and migration support

2019-05-14 Thread David Hildenbrand

On 14.05.19 10:37, Christian Borntraeger wrote:
> 
> 
> On 14.05.19 09:28, David Hildenbrand wrote:
> But that can be tested using the runnability information if I am not wrong.

 You mean the cpu level information, right?
>>
>> Yes, query-cpu-definitions includes runnability information for each model
>> via "unavailable-features" (valid under the started QEMU machine).
>>

>
>> and others that we have today.
>>
>> So yes, I think this would be acceptable.  
>
> I guess it is acceptable yes. I doubt anybody uses that many CPUs in
> production either way. But you never know.

 I think that using that many cpus is a more uncommon setup, but I still
 think that having to wait for actual failure
>>>
>>> That can happen all the time today. You can easily say z14 in the xml when 
>>> on a zEC12. Only at startup you get the error. The question is really:
>>
>> "-smp 248 -cpu host" will no longer work, while e.g. "-smp 248 -cpu z12"
>> will work. Actually, even "-smp 248" will no longer work on affected
>> machines.
>>
>> That is why I wonder if it is better to disable the feature and print a
>> warning. Similar to CMMA, where we want to tolerate when CMMA is not
>> possible in the current environment (huge pages).
>>
>> "Diag318 will not be enabled because it is not compatible with more than
>> 240 CPUs".
>>
>> However, I still think that implementing support for more than one SCLP
>> response page is the best solution. Guests will need adaptions for > 240
>> CPUs with Diag318, but who cares? Existing setups will continue to work.
>>
>> Implementing that SCLP thingy will avoid any warnings and any errors. It
>> just works from the QEMU perspective.
>>
>> Is implementing this realistic?
> 
> Yes, it is, but it will take time. I will try to get this rolling. To make
> progress on the diag318 thing, can we error on startup now and simply
> remove that check when we have implemented a larger sccb? If we now
> play all kinds of "change the max number" games, that would be harder to "fix".


Another idea for temporary handling: Simply only indicate 240 CPUs to
the guest if the response does not fit into a page. Once we have that
SCLP thingy, this will be fixed. Guest migration back and forth should
work, as the VCPUs are fully functional (and initially always stopped),
the guest will simply not be able to detect them via SCLP when booting
up, and therefore not use them.

-- 

Thanks,

David / dhildenb

Re: [Qemu-devel] [Bug 1826393] Re: QEMU 3.1.0 stuck waiting for 800ms (5 times slower) in pre-bios phase

2019-05-14 Thread Gerd Hoffmann

On Tue, May 14, 2019 at 10:04:14AM +0200, Stefano Garzarella wrote:
> On Mon, May 06, 2019 at 05:40:05PM -, Waldemar Kozaczuk wrote:
> > The last bios indeed helped. It now runs in under 200ms.
> > 
> > Do you anticipate doing a minor release of 3.1.0 with an updated BIOS to
> > address this issue? Or are users expected to upgrade to QEMU 4.0.0?
> 
> CCing Gerd
> 
> I'm not sure we will release SeaBIOS 1.12.1 with a minor release of QEMU
> 3.1.0, so upgrading to QEMU 4.0 should be the way to address this issue.

I think with the 4.0 release, 3.1 is EOL, and there will be no more 3.1.x
stable releases ...

cheers,
  Gerd

Re: [Qemu-devel] [PATCH v4] s390: diagnose 318 info reset and migration support

2019-05-14 Thread Christian Borntraeger




On 14.05.19 10:49, Cornelia Huck wrote:
> On Tue, 14 May 2019 10:37:32 +0200
> Christian Borntraeger  wrote:
> 
>> On 14.05.19 09:28, David Hildenbrand wrote:
>> But that can be tested using the runnability information if I am not
>> wrong.
>
> You mean the cpu level information, right?  
>>>
>>> Yes, query-cpu-definitions includes runnability information for each model
>>> via "unavailable-features" (valid under the started QEMU machine).
>>>   
>  
>>  
>>> and others that we have today.
>>>
>>> So yes, I think this would be acceptable.
>>
>> I guess it is acceptable yes. I doubt anybody uses that many CPUs in
>> production either way. But you never know.  
>
> I think that using that many cpus is a more uncommon setup, but I still
> think that having to wait for actual failure  

 That can happen all the time today. You can easily say z14 in the xml when 
 on a zEC12. Only at startup you get the error. The question is really:  
>>>
>>> "-smp 248 -cpu host" will no longer work, while e.g. "-smp 248 -cpu z12"
>>> will work. Actually, even "-smp 248" will no longer work on affected
>>> machines.
>>>
>>> That is why I wonder if it is better to disable the feature and print a
>>> warning. Similar to CMMA, where we want to tolerate when CMMA is not
>>> possible in the current environment (huge pages).
>>>
>>> "Diag318 will not be enabled because it is not compatible with more than
>>> 240 CPUs".
>>>
>>> However, I still think that implementing support for more than one SCLP
>>> response page is the best solution. Guests will need adaptions for > 240
>>> CPUs with Diag318, but who cares? Existing setups will continue to work.
>>>
>>> Implementing that SCLP thingy will avoid any warnings and any errors. It
>>> just works from the QEMU perspective.
>>>
>>> Is implementing this realistic?  
>>
>> Yes, it is, but it will take time. I will try to get this rolling. To make
>> progress on the diag318 thing, can we error on startup now and simply
>> remove that check when we have implemented a larger sccb? If we now
>> play all kinds of "change the max number" games, that would be harder to "fix".
> 
> So, the idea right now is:
> 
> - fail to start if you try to specify a diag318 device and more than
>   240 cpus (do we need a knob to turn off the device?)

The knob will be the CPU model, e.g. -cpu z14,diag318=off or something like that.
> - in the future, support more than one SCLP response page
> 
> I'm getting a bit lost in the discussion; but the above sounds
> reasonable to me.
>

Re: [Qemu-devel] [PATCH v4] s390: diagnose 318 info reset and migration support

2019-05-14 Thread Christian Borntraeger




On 14.05.19 10:50, David Hildenbrand wrote:
> On 14.05.19 10:37, Christian Borntraeger wrote:
>>
>>
>> On 14.05.19 09:28, David Hildenbrand wrote:
>> But that can be tested using the runnability information if I am not
>> wrong.
>
> You mean the cpu level information, right?
>>>
>>> Yes, query-cpu-definitions includes runnability information for each model
>>> via "unavailable-features" (valid under the started QEMU machine).
>>>
>
>>
>>> and others that we have today.
>>>
>>> So yes, I think this would be acceptable.  
>>
>> I guess it is acceptable yes. I doubt anybody uses that many CPUs in
>> production either way. But you never know.
>
> I think that using that many cpus is a more uncommon setup, but I still
> think that having to wait for actual failure

 That can happen all the time today. You can easily say z14 in the xml when 
 on a zEC12. Only at startup you get the error. The question is really:
>>>
>>> "-smp 248 -cpu host" will no longer work, while e.g. "-smp 248 -cpu z12"
>>> will work. Actually, even "-smp 248" will no longer work on affected
>>> machines.
>>>
>>> That is why I wonder if it is better to disable the feature and print a
>>> warning. Similar to CMMA, where we want to tolerate when CMMA is not
>>> possible in the current environment (huge pages).
>>>
>>> "Diag318 will not be enabled because it is not compatible with more than
>>> 240 CPUs".
>>>
>>> However, I still think that implementing support for more than one SCLP
>>> response page is the best solution. Guests will need adaptions for > 240
>>> CPUs with Diag318, but who cares? Existing setups will continue to work.
>>>
>>> Implementing that SCLP thingy will avoid any warnings and any errors. It
>>> just works from the QEMU perspective.
>>>
>>> Is implementing this realistic?
>>
>> Yes, it is, but it will take time. I will try to get this rolling. To make
>> progress on the diag318 thing, can we error on startup now and simply
>> remove that check when we have implemented a larger sccb? If we now
>> play all kinds of "change the max number" games, that would be harder to "fix".
> 
> 
> Another idea for temporary handling: Simply only indicate 240 CPUs to
> the guest if the response does not fit into a page. Once we have that
> SCLP thingy, this will be fixed. Guest migration back and forth should
> work, as the VCPUs are fully functional (and initially always stopped),
> the guest will simply not be able to detect them via SCLP when booting
> up, and therefore not use them.

Yes, that looks like a good temporary solution. In fact, if the guest relies
on simply probing, it could even make use of the additional CPUs. It's just
the sclp response that is limited to 240 (or make it 247?)

Re: [Qemu-devel] [PATCH v4] s390: diagnose 318 info reset and migration support

2019-05-14 Thread David Hildenbrand

On 14.05.19 10:49, Cornelia Huck wrote:
> On Tue, 14 May 2019 10:37:32 +0200
> Christian Borntraeger  wrote:
> 
>> On 14.05.19 09:28, David Hildenbrand wrote:
>> But that can be tested using the runnability information if I am not
>> wrong.
>
> You mean the cpu level information, right?  
>>>
>>> Yes, query-cpu-definitions includes runnability information for each model
>>> via "unavailable-features" (valid under the started QEMU machine).
>>>   
>  
>>  
>>> and others that we have today.
>>>
>>> So yes, I think this would be acceptable.
>>
>> I guess it is acceptable yes. I doubt anybody uses that many CPUs in
>> production either way. But you never know.  
>
> I think that using that many cpus is a more uncommon setup, but I still
> think that having to wait for actual failure  

 That can happen all the time today. You can easily say z14 in the xml when 
 on a zEC12. Only at startup you get the error. The question is really:  
>>>
>>> "-smp 248 -cpu host" will no longer work, while e.g. "-smp 248 -cpu z12"
>>> will work. Actually, even "-smp 248" will no longer work on affected
>>> machines.
>>>
>>> That is why I wonder if it is better to disable the feature and print a
>>> warning. Similar to CMMA, where we want to tolerate when CMMA is not
>>> possible in the current environment (huge pages).
>>>
>>> "Diag318 will not be enabled because it is not compatible with more than
>>> 240 CPUs".
>>>
>>> However, I still think that implementing support for more than one SCLP
>>> response page is the best solution. Guests will need adaptions for > 240
>>> CPUs with Diag318, but who cares? Existing setups will continue to work.
>>>
>>> Implementing that SCLP thingy will avoid any warnings and any errors. It
>>> just works from the QEMU perspective.
>>>
>>> Is implementing this realistic?  
>>
>> Yes, it is, but it will take time. I will try to get this rolling. To make
>> progress on the diag318 thing, can we error on startup now and simply
>> remove that check when we have implemented a larger sccb? If we now
>> play all kinds of "change the max number" games, that would be harder to "fix".
> 
> So, the idea right now is:
> 
> - fail to start if you try to specify a diag318 device and more than
>   240 cpus (do we need a knob to turn off the device?)
> - in the future, support more than one SCLP response page
> 
> I'm getting a bit lost in the discussion; but the above sounds
> reasonable to me.
> 

We can

1. Fail to start with #cpus > 240 when diag318=on
2. Remove the error once we support more than one SCLP response page

Or

1. Allow starting with #cpus > 240 when diag318=on, but indicate only
   240 CPUs via SCLP
2. Print a warning
3. Remove the restriction and the warning once we support more than one
   SCLP response page

While I prefer the second approach (similar to defining zPCI devices
without zpci=on), I could also live with the first approach.
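The two approaches can be sketched as one startup check with a policy switch. This is a hypothetical illustration, not QEMU code: `enum diag318_policy`, `check_diag318_cpus` and `SCLP_SINGLE_PAGE_MAX_CPUS` are invented names, and the limit uses the 240 figure from the message above.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical: CPUs that fit in one SCLP response page (240 as discussed). */
#define SCLP_SINGLE_PAGE_MAX_CPUS 240u

/* Hypothetical switch between the two approaches listed above. */
enum diag318_policy {
    DIAG318_FAIL_ON_TOO_MANY_CPUS, /* approach 1: refuse to start */
    DIAG318_CLAMP_AND_WARN,        /* approach 2: start, report fewer CPUs */
};

/*
 * Returns the number of CPUs to indicate via SCLP, or 0 if startup must
 * fail. The restriction only applies when diag318 is enabled.
 */
unsigned check_diag318_cpus(enum diag318_policy policy, bool diag318_on,
                            unsigned ncpus)
{
    if (!diag318_on || ncpus <= SCLP_SINGLE_PAGE_MAX_CPUS) {
        return ncpus; /* response fits into one page: nothing to do */
    }
    if (policy == DIAG318_FAIL_ON_TOO_MANY_CPUS) {
        fprintf(stderr, "diag318=on is not compatible with more than %u CPUs\n",
                SCLP_SINGLE_PAGE_MAX_CPUS);
        return 0;
    }
    /* The extra VCPUs still exist; they are just not listed via SCLP. */
    fprintf(stderr, "warning: only %u of %u CPUs will be indicated via SCLP\n",
            SCLP_SINGLE_PAGE_MAX_CPUS, ncpus);
    return SCLP_SINGLE_PAGE_MAX_CPUS;
}
```

Returning 0 as the failure signal is a simplification for the sketch; real QEMU code would more likely report the error through its `Error **errp` convention. Once multi-page SCLP responses are supported, both branches above would simply go away.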

-- 

Thanks,

David / dhildenb

Re: [Qemu-devel] [PATCH v4] s390: diagnose 318 info reset and migration support

2019-05-14 Thread Cornelia Huck

On Tue, 14 May 2019 10:56:43 +0200
Christian Borntraeger  wrote:

> On 14.05.19 10:50, David Hildenbrand wrote:

> > Another idea for temporary handling: Simply only indicate 240 CPUs to
> > the guest if the response does not fit into a page. Once we have that
> > SCLP thingy, this will be fixed. Guest migration back and forth should
> > work, as the VCPUs are fully functional (and initially always stopped),
> > the guest will simply not be able to detect them via SCLP when booting
> > up, and therefore not use them.  
> 
> Yes, that looks like a good temporary solution. In fact if the guest relies
> on simply probing it could even make use of the additional CPUs. It's just
> the sclp response that is limited to 240 (or make it 247?)

Where did the 240 come from - extra spare room? If so, 247 would
probably be all right?

Re: [Qemu-devel] [PATCH v4] s390: diagnose 318 info reset and migration support

2019-05-14 Thread David Hildenbrand

On 14.05.19 10:56, Christian Borntraeger wrote:
> 
> 
> On 14.05.19 10:50, David Hildenbrand wrote:
>> On 14.05.19 10:37, Christian Borntraeger wrote:
>>>
>>>
>>> On 14.05.19 09:28, David Hildenbrand wrote:
>>> But that can be tested using the runnability information if I am not
>>> wrong.
>>
>> You mean the cpu level information, right?

 Yes, query-cpu-definitions includes runnability information for each model
 via "unavailable-features" (valid under the started QEMU machine).

>>
>>>
 and others that we have today.

 So yes, I think this would be acceptable.  
>>>
>>> I guess it is acceptable yes. I doubt anybody uses that many CPUs in
>>> production either way. But you never know.
>>
>> I think that using that many cpus is a more uncommon setup, but I still
>> think that having to wait for actual failure
>
> That can happen all the time today. You can easily say z14 in the xml
> when on a zEC12. Only at startup you get the error. The question is really:

 "-smp 248 -cpu host" will no longer work, while e.g. "-smp 248 -cpu z12"
 will work. Actually, even "-smp 248" will no longer work on affected
 machines.

 That is why I wonder if it is better to disable the feature and print a
 warning. Similar to CMMA, where we want to tolerate when CMMA is not
 possible in the current environment (huge pages).

 "Diag318 will not be enabled because it is not compatible with more than
 240 CPUs".

 However, I still think that implementing support for more than one SCLP
 response page is the best solution. Guests will need adaptions for > 240
 CPUs with Diag318, but who cares? Existing setups will continue to work.

 Implementing that SCLP thingy will avoid any warnings and any errors. It
 just works from the QEMU perspective.

 Is implementing this realistic?
>>>
>>> Yes, it is, but it will take time. I will try to get this rolling. To make
>>> progress on the diag318 thing, can we error on startup now and simply
>>> remove that check when we have implemented a larger sccb? If we now
>>> play all kinds of "change the max number" games, that would be harder to "fix".
>>
>>
>> Another idea for temporary handling: Simply only indicate 240 CPUs to
>> the guest if the response does not fit into a page. Once we have that
>> SCLP thingy, this will be fixed. Guest migration back and forth should
>> work, as the VCPUs are fully functional (and initially always stopped),
>> the guest will simply not be able to detect them via SCLP when booting
>> up, and therefore not use them.
> 
> Yes, that looks like a good temporary solution. In fact if the guest relies
> on simply probing it could even make use of the additional CPUs. It's just
> the sclp response that is limited to 240 (or make it 247?)

I think the limiting factor was more than a single CPU, but I don't
recall. We can do the math again and come up with the right number.

-- 

Thanks,

David / dhildenb

Re: [Qemu-devel] How do we do user input bitmap properties?

2019-05-14 Thread Andrew Jones

On Tue, May 14, 2019 at 06:54:03AM +0200, Markus Armbruster wrote:
> Andrew Jones  writes:
> 
> > On Thu, Apr 18, 2019 at 07:48:09PM +0200, Markus Armbruster wrote:
> >> Daniel P. Berrangé  writes:
> >> 
> >> > On Thu, Apr 18, 2019 at 11:28:41AM +0200, Andrew Jones wrote:
> >> >> Hi all,
> >> >> 
> >> >> First some background:
> >> >> 
> >> >> For the userspace side of AArch64 guest SVE support we need to
> >> >> expose KVM's allowed vector lengths bitmap to the user and allow
> >> >> the user to choose a subset of that bitmap. Since bitmaps are a
> >> >> bit awkward to work with, we'll likely want to expose it as
> >> >> an array of vector lengths instead. Also, assuming we want to
> >> >> expose the lengths as number-of-quadwords (quadword == 128 bits
> >> >> for AArch64 and vector lengths must be multiples of quadwords)
> >> >> rather than number-of-bits, then an example array (which will
> >> >> always be a sequence) might be
> >> >> 
> >> >>  [ 8, 16, 32 ]
> >> >> 
> >> >> The user may choose a subsequence, but only through truncation,
> >> >> i.e. [ 8, 32 ] is not valid, but [ 8, 16 ] is.
> >> >> 
> >> >> Furthermore, different hosts may support different sequences
> >> >> which have the same maximum. For example, if the above sequence
> >> >> is for Host_A, then Host_B could be
> >> >> 
> >> >>  [ 8, 16, 24, 32 ]
> >> >> 
> >> >> The host must support all lengths in the sequence, which means
> >> >> that while Host_A supports 32, since it doesn't support 24 and
> >> >> we can only truncate sequences, we must use either [ 8 ] or
> >> >> [ 8, 16 ] for a compatible sequence if we intend to migrate
> >> >> between the hosts.
> >> >> 
> >> >> Now to the $SUBJECT question:
> >> >> 
> >> >> My feeling is that we should require the sequence to be
> >> >> provided on the command line as a cpu property. Something
> >> >> like
> >> >> 
> >> >>   -cpu host,sve-vl-list=8:16
> >> >> 
> >> >> (I chose ':' for the delimiter because ',' can't work, but
> >> >> if there's a better choice, then that's fine by me.)
> >> >> 
> >> >> Afaict a property list like this will require a new parser,
> >> 
> >> We had 20+ of those when I last counted.  Among the more annoying
> >> reasons CLI QAPIfication is hard[1].
> >> 
> >> >> which feels a bit funny since it seems we should already
> >> >> have support for this type of thing somewhere in QEMU. So,
> >> >> the question is: do we? I see we have array properties, but
> >> >> I don't believe that works with the command line. Should we
> >> >> only use QMP for this? We already want some QMP in order to
> >> >> query the supported vector lengths. Maybe we should use QMP
> >> >> to set the selection too? But then what about command line
> >> >> support for developers? And if the property is on the command
> >> >> line then we don't have to add it to the migration stream.
> >> >
> >> > You should be able to use arrays from the CLI with QemuOpts by repeating
> >> > the same option name many times, though I can't say it is a very
> >> > nice approach if you have many values to list, as it gets very repetitive.
> >> 
> >> Yes, this is one of the ways the current CLI does lists.  It's also one
> >> of the more annoying reasons CLI QAPIfication is hard[2].
> >> 
> >> QemuOpts let the last param=value win the stupidest way that could
> >> possibly work (I respect that): add to the front of the list, search it
> >> front to back.
> >> 
> >> Then somebody discovered that if you search the list manually, you can
> >> see them all, and abuse that to get a list-valued param.  I'm sure that
> >> felt clever at the time.
> >> 
> >> Another way to do lists is the funky list feature of the string input and
> >> opts visitors.  Yet another annoying reason CLI QAPIfication is hard[3].
> >> 
> >> We use the opts visitor's list feature for -numa node,cpus=...  Hmm,
> >> looks like we even combine it with the "multiple param=value build up a
> >> list" technique: -numa node,cpus=0-1,cpus=4-5 denotes [0,1,4,5].
> >> 
> >> > That's the curse of not having a good CLI syntax for non-scalar data in
> >> > QemuOpts & why Markus believes we should switch to JSON for the CLI too
> >> >
> >> >  -cpu host,sve-vl=8,sve-vl=16
> >> 
> >> We actually have CLI syntax for non-scalar data: dotted keys.  Dotted
> >> keys are syntactic sugar for JSON.  It looks friendlier than JSON for
> >> simple cases, then gets uglier as things get more complex, and then it
> >> falls apart: it can't quite express all of JSON.
> >> 
> >> Example: sve-vl.0=8,sve-vl.1=16
> >> gets desugared into {"sve-vl": [8, 16]}
> >> if the QAPI schema has 'sve-vl': ['int'].
> >> 
> >> The comment at the beginning of util/keyval.c explains it in more
> >> detail.
> >> 
> >> It powers -blockdev and -display.  Both options accept either JSON or
> >> dotted keys.  If the option argument starts with '{', it's JSON.
> >> Management applications should stick to JSON.
> >> 
> >> 
> >> [1] Towards a more expressive and introspectable QEMU command line
> >>
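The sequence rules Andrew describes above (the user may only truncate the host's list of vector lengths, and a list usable across hosts must be a prefix of every host's list) can be sketched in C. This is a hypothetical illustration, not an existing QEMU parser: the function names are invented, and the ':' delimiter follows the proposed `-cpu host,sve-vl-list=8:16` syntax.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Parse a ':'-separated list of vector lengths in quadwords, e.g. "8:16",
 * into vq[]. Returns the number of entries, or -1 on malformed input.
 */
int parse_vl_list(const char *s, unsigned vq[], size_t max)
{
    size_t n = 0;

    while (*s != '\0') {
        char *end;
        unsigned long v;

        if (*s < '0' || *s > '9') {
            return -1; /* empty field, sign, or junk */
        }
        errno = 0;
        v = strtoul(s, &end, 10);
        if (errno != 0 || v == 0 || v > UINT_MAX || n == max) {
            return -1; /* out of range, zero length, or too many entries */
        }
        vq[n++] = (unsigned)v;

        if (*end == ':') {
            s = end + 1;
            if (*s == '\0') {
                return -1; /* trailing ':' */
            }
        } else if (*end == '\0') {
            s = end;
        } else {
            return -1; /* unexpected delimiter */
        }
    }
    return n == 0 ? -1 : (int)n;
}

/* A selection is valid iff it is a non-empty prefix (truncation) of host[]. */
bool is_valid_selection(const unsigned *sel, size_t nsel,
                        const unsigned *host, size_t nhost)
{
    if (nsel == 0 || nsel > nhost) {
        return false;
    }
    for (size_t i = 0; i < nsel; i++) {
        if (sel[i] != host[i]) {
            return false;
        }
    }
    return true;
}

/* Longest common prefix: the longest selection valid on both hosts. */
size_t common_prefix_len(const unsigned *a, size_t na,
                         const unsigned *b, size_t nb)
{
    size_t i = 0;

    while (i < na && i < nb && a[i] == b[i]) {
        i++;
    }
    return i;
}
```

For Host_A `{8, 16, 32}` and Host_B `{8, 16, 24, 32}`, `common_prefix_len` returns 2, i.e. [ 8, 16 ]; [ 8 ] is the only shorter choice, matching the two compatible sequences named in the mail. `[ 8, 32 ]` is rejected by `is_valid_selection` because it is not a truncation.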

Re: [Qemu-devel] [PATCH v4] s390: diagnose 318 info reset and migration support

2019-05-14 Thread Christian Borntraeger




On 14.05.19 11:00, David Hildenbrand wrote:
> On 14.05.19 10:56, Christian Borntraeger wrote:
>>
>>
>> On 14.05.19 10:50, David Hildenbrand wrote:
>>> On 14.05.19 10:37, Christian Borntraeger wrote:


 On 14.05.19 09:28, David Hildenbrand wrote:
 But that can be tested using the runnability information if I am not
 wrong.
>>>
>>> You mean the cpu level information, right?
>
> Yes, query-cpu-definitions includes runnability information for each model
> via "unavailable-features" (valid under the started QEMU machine).
>
>>>

> and others that we have today.
>
> So yes, I think this would be acceptable.  

 I guess it is acceptable yes. I doubt anybody uses that many CPUs in
 production either way. But you never know.
>>>
>>> I think that using that many cpus is a more uncommon setup, but I still
>>> think that having to wait for actual failure
>>
>> That can happen all the time today. You can easily say z14 in the xml
>> when on a zEC12. Only at startup you get the error. The question is really:
>
> "-smp 248 -cpu host" will no longer work, while e.g. "-smp 248 -cpu z12"
> will work. Actually, even "-smp 248" will no longer work on affected
> machines.
>
> That is why I wonder if it is better to disable the feature and print a
> warning. Similar to CMMA, where we want to tolerate when CMMA is not
> possible in the current environment (huge pages).
>
> "Diag318 will not be enabled because it is not compatible with more than
> 240 CPUs".
>
> However, I still think that implementing support for more than one SCLP
> response page is the best solution. Guests will need adaptions for > 240
> CPUs with Diag318, but who cares? Existing setups will continue to work.
>
> Implementing that SCLP thingy will avoid any warnings and any errors. It
> just works from the QEMU perspective.
>
> Is implementing this realistic?

 Yes, it is, but it will take time. I will try to get this rolling. To make
 progress on the diag318 thing, can we error on startup now and simply
 remove that check when we have implemented a larger sccb? If we now
 play all kinds of "change the max number" games, that would be harder to "fix".
>>>
>>>
>>> Another idea for temporary handling: Simply only indicate 240 CPUs to
>>> the guest if the response does not fit into a page. Once we have that
>>> SCLP thingy, this will be fixed. Guest migration back and forth should
>>> work, as the VCPUs are fully functional (and initially always stopped),
>>> the guest will simply not be able to detect them via SCLP when booting
>>> up, and therefore not use them.
>>
>> Yes, that looks like a good temporary solution. In fact if the guest relies
>> on simply probing it could even make use of the additional CPUs. It's just
>> the sclp response that is limited to 240 (or make it 247?)
> 
> I think the limiting factor was more than a single CPU, but I don't
> recall. We can do the math again and come up with the right number.

I think we need 8 bytes per CPU. With byte 134, we should still be OK with
247. Collin can do the math in the patch description.

Re: [Qemu-devel] [PATCH] cadence_gem: Don't define GEM_INT_Q1_MASK twice

2019-05-14 Thread Philippe Mathieu-Daudé

On 5/13/19 9:43 PM, Jonathan Behrens wrote:
> Signed-off-by: Jonathan Behrens 
> ---
>  hw/net/cadence_gem.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/hw/net/cadence_gem.c b/hw/net/cadence_gem.c
> index 7f63411430..37cb8a4e5c 100644
> --- a/hw/net/cadence_gem.c
> +++ b/hw/net/cadence_gem.c
> @@ -146,7 +146,6 @@
>  #define GEM_DESCONF7  (0x0298/4)
>  
>  #define GEM_INT_Q1_STATUS   (0x0400 / 4)
> -#define GEM_INT_Q1_MASK (0x0640 / 4)
>  
>  #define GEM_TRANSMIT_Q1_PTR (0x0440 / 4)
>  #define GEM_TRANSMIT_Q7_PTR (GEM_TRANSMIT_Q1_PTR + 6)
> 

Reviewed-by: Philippe Mathieu-Daudé

Re: [Qemu-devel] [PATCH v4] s390: diagnose 318 info reset and migration support

2019-05-14 Thread Christian Borntraeger

On 14.05.19 10:59, David Hildenbrand wrote:
> On 14.05.19 10:49, Cornelia Huck wrote:
>> On Tue, 14 May 2019 10:37:32 +0200
>> Christian Borntraeger  wrote:
>>
>>> On 14.05.19 09:28, David Hildenbrand wrote:
>>> But that can be tested using the runnability information if I am not
>>> wrong.
>>
>> You mean the cpu level information, right?  

 Yes, query-cpu-definitions includes runnability information for each model
 via "unavailable-features" (valid under the started QEMU machine).

>>  
>>>  
 and others that we have today.

 So yes, I think this would be acceptable.
>>>
>>> I guess it is acceptable yes. I doubt anybody uses that many CPUs in
>>> production either way. But you never know.  
>>
>> I think that using that many cpus is a more uncommon setup, but I still
>> think that having to wait for actual failure  
>
> That can happen all the time today. You can easily say z14 in the xml
> when on a zEC12. Only at startup you get the error. The question is really:

 "-smp 248 -cpu host" will no longer work, while e.g. "-smp 248 -cpu z12"
 will work. Actually, even "-smp 248" will no longer work on affected
 machines.

 That is why I wonder if it is better to disable the feature and print a
 warning. Similar to CMMA, where we want to tolerate when CMMA is not
 possible in the current environment (huge pages).

 "Diag318 will not be enabled because it is not compatible with more than
 240 CPUs".

 However, I still think that implementing support for more than one SCLP
 response page is the best solution. Guests will need adaptions for > 240
 CPUs with Diag318, but who cares? Existing setups will continue to work.

 Implementing that SCLP thingy will avoid any warnings and any errors. It
 just works from the QEMU perspective.

 Is implementing this realistic?  
>>>
>>> Yes, it is, but it will take time. I will try to get this rolling. To make
>>> progress on the diag318 thing, can we error on startup now and simply
>>> remove that check when we have implemented a larger sccb? If we now
>>> play all kinds of "change the max number" games, that would be harder to "fix".
>>
>> So, the idea right now is:
>>
>> - fail to start if you try to specify a diag318 device and more than
>>   240 cpus (do we need a knob to turn off the device?)
>> - in the future, support more than one SCLP response page
>>
>> I'm getting a bit lost in the discussion; but the above sounds
>> reasonable to me.
>>
> 
> We can
> 
> 1. Fail to start with #cpus > 240 when diag318=on
> 2. Remove the error once we support more than one SCLP response page
> 
> Or
> 
> 1. Allow starting with #cpus > 240 when diag318=on, but indicate only
>    240 CPUs via SCLP
> 2. Print a warning
> 3. Remove the restriction and the warning once we support more than one
>    SCLP response page
> 
> While I prefer the second approach (similar to defining zPCI devices
> without zpci=on), I could also live with the first approach.

Let's just continue with your other suggestion: simply limit the SCLP
response and do not add any failure or machine change. That seems like
the easiest solution.

Re: [Qemu-devel] [PATCH 00/13] target/arm/kvm: enable SVE in guests

2019-05-14 Thread Peter Maydell

On Mon, 13 May 2019 at 19:46, Richard Henderson
 wrote:
>
> On 5/12/19 1:36 AM, Andrew Jones wrote:
> >       CPU type | accel | sve-max-vq | sve-vls-map
> >    ----------------------------------------------
> >  1)   max      | tcg   |  $MAX_VQ   |  $VLS_MAP
> >  2)   max      | kvm   |  $MAX_VQ   |  $VLS_MAP
> >  3)   host     | kvm   |  N/A       |  $VLS_MAP
>
> This doesn't seem right.  Why is -cpu host not whatever the host supports?  It
> certainly has been so far.  I really don't see how -cpu max makes any sense
> for kvm.

The point of '-cpu max' is that it works and gives you the
best thing QEMU can support regardless of what accelerator
is in use. This means that you don't need to do tedious
workarounds like "if KVM then -cpu host else -cpu somethingelse".

thanks
-- PMM

Re: [Qemu-devel] [PATCH v4] s390: diagnose 318 info reset and migration support

2019-05-14 Thread David Hildenbrand

On 14.05.19 11:00, Cornelia Huck wrote:
> On Tue, 14 May 2019 10:56:43 +0200
> Christian Borntraeger  wrote:
> 
>> On 14.05.19 10:50, David Hildenbrand wrote:
> 
>>> Another idea for temporary handling: Simply only indicate 240 CPUs to
>>> the guest if the response does not fit into a page. Once we have that
>>> SCLP thingy, this will be fixed. Guest migration back and forth should
>>> work, as the VCPUs are fully functional (and initially always stopped),
>>> the guest will simply not be able to detect them via SCLP when booting
>>> up, and therefore not use them.  
>>
>> Yes, that looks like a good temporary solution. In fact if the guest relies
>> on simply probing it could even make use of the additional CPUs. It's just
>> the sclp response that is limited to 240 (or make it 247?)
> 
> Where did the 240 come from - extra spare room? If so, 247 would
> probably be all right?
> 

+++ b/include/hw/s390x/sclp.h
@@ -133,6 +133,8 @@ typedef struct ReadInfo {
     uint16_t highest_cpu;
     uint8_t  _reserved5[124 - 122]; /* 122-123 */
     uint32_t hmfai;
+    uint8_t  _reserved7[134 - 128]; /* 128-133 */
+    uint8_t  fac134;
     struct CPUEntry entries[0];
 } QEMU_PACKED ReadInfo;


So we have "4096 - 135 + 1" memory. Each element is 16 bytes wide.
-> 246 CPUs fit.


-- 

Thanks,

David / dhildenb

Re: [Qemu-devel] [PATCH v4] s390: diagnose 318 info reset and migration support

2019-05-14 Thread David Hildenbrand

On 14.05.19 11:03, David Hildenbrand wrote:
> [...]
>
> So we have "4096 - 135 + 1" memory. Each element is 16 bytes wide.
> -> 246 CPUs fit.

(I meant 247 :( )


-- 

Thanks,

David / dhildenb
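The size calculation above can be checked with a few lines of C. This is a sketch under stated assumptions, not QEMU code: the SCCB is a single 4096-byte page, the CPU entries start at byte offset 135 (directly after `fac134`, since `ReadInfo` is packed), and each entry is 16 bytes wide. `struct cpu_entry` and `max_cpu_entries()` are hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for QEMU's struct CPUEntry: only its size matters. */
struct cpu_entry {
    uint8_t bytes[16];
};
static_assert(sizeof(struct cpu_entry) == 16, "CPU entries are 16 bytes wide");

#define SCCB_PAGE_SIZE   4096u  /* assumption: single-page SCCB */
#define CPU_ENTRY_OFFSET 135u   /* first byte after fac134 (offset 134) */

/* Number of whole CPU entries that fit between entries_offset and the end
 * of a single SCCB page; a partial trailing entry does not count. */
static size_t max_cpu_entries(size_t entries_offset)
{
    return (SCCB_PAGE_SIZE - entries_offset) / sizeof(struct cpu_entry);
}
```

`max_cpu_entries(CPU_ENTRY_OFFSET)` evaluates (4096 - 135) / 16 = 3961 / 16, i.e. 247 whole entries, matching the corrected figure. Going by the context lines of the diff, the entries previously started at offset 128, directly after `hmfai`, which would leave room for 248.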

Re: [Qemu-devel] [PATCH v4] s390: diagnose 318 info reset and migration support

2019-05-14 Thread Christian Borntraeger

On 14.05.19 10:59, David Hildenbrand wrote:
> On 14.05.19 10:49, Cornelia Huck wrote:
>> On Tue, 14 May 2019 10:37:32 +0200
>> Christian Borntraeger  wrote:
>>
>>> On 14.05.19 09:28, David Hildenbrand wrote:
>>> But that can be tested using the runnability information if I am not
>>> wrong.
>>
>> You mean the cpu level information, right?  

 Yes, query-cpu-definitions includes runnability information for each model
 via "unavailable-features" (valid under the started QEMU machine).

>>  
>>>  
 and others that we have today.

 So yes, I think this would be acceptable.
>>>
>>> I guess it is acceptable yes. I doubt anybody uses that many CPUs in
>>> production either way. But you never know.  
>>
>> I think that using that many cpus is a more uncommon setup, but I still
>> think that having to wait for actual failure  
>
> That can happen all the time today. You can easily say z14 in the XML
> when on a zEC12; you only get the error at startup. The question is really:

 "-smp 248 -cpu host" will no longer work, while e.g. "-smp 248 -cpu z12"
 will work. Actually, even "-smp 248" will no longer work on affected
 machines.

 That is why I wonder if it is better to disable the feature and print a
 warning, similar to CMMA, where we want to tolerate that CMMA is not
 possible in the current environment (huge pages).

 "Diag318 will not be enabled because it is not compatible with more than
 240 CPUs".

 However, I still think that implementing support for more than one SCLP
 response page is the best solution. Guests will need adaptions for > 240
 CPUs with Diag318, but who cares? Existing setups will continue to work.

 Implementing that SCLP thingy will avoid any warnings and any errors. It
 just works from the QEMU perspective.

 Is implementing this realistic?  
>>>
>>> Yes, it is, but it will take time. I will try to get this rolling. To make
>>> progress on the diag318 thing, can we error on startup now and simply
>>> remove that check once we have implemented a larger SCCB? If we now play
>>> all kinds of "change the max number" games, that would be harder to "fix".
>>
>> So, the idea right now is:
>>
>> - fail to start if you try to specify a diag318 device and more than
>>   240 cpus (do we need a knob to turn off the device?)
>> - in the future, support more than one SCLP response page
>>
>> I'm getting a bit lost in the discussion; but the above sounds
>> reasonable to me.
>>
> 
> We can
> 
> 1. Fail to start with #cpus > 240 when diag318=on
> 2. Remove the error once we support more than one SCLP response page
> 
> Or
> 
> 1. Allow to start with #cpus > 240 when diag318=on, but indicate only
>    240 CPUs via SCLP
> 2. Print a warning
> 3. Remove the restriction and the warning once we support more than one
>    SCLP response page
> 
> While I prefer the second approach (similar to defining zPCI devices
> without zpci=on), I could also live with the first approach.

I prefer approach 1.

Re: [Qemu-devel] [PATCH 05/13] target/arm/kvm: Add kvm_arch_get/put_sve

2019-05-14 Thread Dave Martin

On Mon, May 13, 2019 at 05:58:59PM +0100, Richard Henderson wrote:
> On 5/13/19 7:39 AM, Dave Martin wrote:
> > On that point, could TCG easily be made to expose a larger vector length
> > to the kernel?  I'd be interested to see what happened.
> 
> It would be easy enough to extend the maximum vector length within TCG.
> 
> Increase ARM_MAX_VQ.  Alter the couple of places where we manipulate ZCR.LEN
> to extend the current 4-bit mask.
> 
> How large do you need the max to be, for testing?

Anything above 256 bytes (the current architectural maximum of 2048 bits)
is interesting.

The architecture reserves space for the LEN field to grow to 9 bits, though
it's unlikely vector lengths would ever get that large in reality.

So if you wanted to go crazy, you might be able to go up to 8192 bytes.

This is just for fun, since it goes outside the architecture and Linux
officially doesn't support it today in any case.  So definitely not a
priority!

Cheers
---Dave
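Dave's figures follow from how the vector length is encoded: ZCR_ELx.LEN holds the requested length in 128-bit (16-byte) quadwords, minus one, so the width of that field bounds the maximum. A small sketch of the arithmetic; the helper names are hypothetical and not taken from QEMU's `target/arm` code.

```c
#include <assert.h>
#include <stdint.h>

#define SVE_VQ_BYTES 16u  /* one quadword: 128 bits */

/* Vector length in bytes requested by a given ZCR_ELx.LEN value:
 * (LEN + 1) quadwords. */
static uint32_t sve_vl_bytes(uint32_t len)
{
    return (len + 1) * SVE_VQ_BYTES;
}

/* Largest vector length, in bytes, expressible with a LEN field that is
 * len_field_bits wide (all bits of LEN set). */
static uint32_t sve_max_vl_bytes(unsigned int len_field_bits)
{
    assert(len_field_bits > 0 && len_field_bits < 24);  /* keep the math in range */
    uint32_t max_len = (UINT32_C(1) << len_field_bits) - 1;
    return sve_vl_bytes(max_len);
}
```

With today's 4-bit field, `sve_max_vl_bytes(4)` gives 16 x 16 = 256 bytes (2048 bits); with the full 9 bits the architecture reserves, `sve_max_vl_bytes(9)` gives 512 x 16 = 8192 bytes.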

Re: [Qemu-devel] [PATCH v4] s390: diagnose 318 info reset and migration support

2019-05-14 Thread Cornelia Huck

On Tue, 14 May 2019 11:07:41 +0200
Christian Borntraeger  wrote:

> On 14.05.19 10:59, David Hildenbrand wrote:

> > We can
> > 
> > 1. Fail to start with #cpus > 240 when diag318=on
> > 2. Remove the error once we support more than one SCLP response page
> > 
> > Or
> > 
> > 1. Allow to start with #cpus > 240 when diag318=on, but indicate only
> >    240 CPUs via SCLP
> > 2. Print a warning
> > 3. Remove the restriction and the warning once we support more than one
> >    SCLP response page

We'd need compat handling for step 3., then?

> > 
> > While I prefer the second approach (similar to defining zPCI devices
> > without zpci=on), I could also live with the first approach.  
> 
> Let's just continue with your other suggestion to simply limit the SCLP
> response and not introduce any failure or machine change. That seems like
> the easiest solution.

That's the second option, right? Should be reasonable.
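A minimal sketch of the second approach, assuming the behaviour described in the thread: start anyway, describe at most as many CPUs in the SCLP response as fit into one page, and print a warning. The function and macro names are hypothetical, the cap of 247 is taken from David's calculation earlier in the thread, and real QEMU code would report through its own error-reporting helpers rather than `fprintf()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical cap: CPU entries that fit into a single-page SCCB once the
 * Read Info block carries the diag318 facility byte. */
#define SCLP_MAX_CPUS_DIAG318 247

/* Returns how many CPUs to describe in the SCLP Read Info response. VCPUs
 * beyond the cap still exist; the guest can only find them by probing. */
static int sclp_reported_cpus(int smp_cpus, bool diag318)
{
    assert(smp_cpus > 0);
    if (diag318 && smp_cpus > SCLP_MAX_CPUS_DIAG318) {
        fprintf(stderr,
                "warning: diag318 is enabled; only %d of %d CPUs "
                "are reported via SCLP\n",
                SCLP_MAX_CPUS_DIAG318, smp_cpus);
        return SCLP_MAX_CPUS_DIAG318;
    }
    return smp_cpus;
}
```

Once responses spanning more than one SCLP page are supported, the cap and the warning can be dropped; Cornelia raises the question of whether that step needs compat handling.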

Re: [Qemu-devel] [PATCH] nvme: add Get/Set Feature Timestamp support

2019-05-14 Thread Philippe Mathieu-Daudé

Hi Kenneth,

On 4/5/19 11:41 PM, Kenneth Heitke wrote:
> Signed-off-by: Kenneth Heitke 
> ---
>  hw/block/nvme.c   | 120 +-
>  hw/block/nvme.h   |   3 ++
>  hw/block/trace-events |   2 +
>  include/block/nvme.h  |   2 +
>  4 files changed, 125 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index 7caf92532a..e775e89299 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -219,6 +219,30 @@ static uint16_t nvme_map_prp(QEMUSGList *qsg, QEMUIOVector *iov, uint64_t prp1,
>      return NVME_INVALID_FIELD | NVME_DNR;
>  }
>  
> +static uint16_t nvme_dma_write_prp(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
> +                                   uint64_t prp1, uint64_t prp2)
> +{
> +    QEMUSGList qsg;
> +    QEMUIOVector iov;
> +    uint16_t status = NVME_SUCCESS;
> +
> +    if (nvme_map_prp(&qsg, &iov, prp1, prp2, len, n)) {
> +        return NVME_INVALID_FIELD | NVME_DNR;
> +    }
> +    if (qsg.nsg > 0) {
> +        if (dma_buf_write(ptr, len, &qsg)) {
> +            status = NVME_INVALID_FIELD | NVME_DNR;
> +        }
> +        qemu_sglist_destroy(&qsg);
> +    } else {
> +        if (qemu_iovec_from_buf(&iov, 0, ptr, len) != len) {
> +            status = NVME_INVALID_FIELD | NVME_DNR;
> +        }
> +        qemu_iovec_destroy(&iov);
> +    }
> +    return status;
> +}
> +
>  static uint16_t nvme_dma_read_prp(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
>                                    uint64_t prp1, uint64_t prp2)
>  {
> @@ -678,7 +702,6 @@ static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeIdentify *c)
>      return ret;
>  }
>  
> -
>  static uint16_t nvme_identify(NvmeCtrl *n, NvmeCmd *cmd)
>  {
>      NvmeIdentify *c = (NvmeIdentify *)cmd;
> @@ -696,6 +719,63 @@ static uint16_t nvme_identify(NvmeCtrl *n, NvmeCmd *cmd)
>      }
>  }
>  
> +static inline void nvme_set_timestamp(NvmeCtrl *n, uint64_t ts)
> +{
> +    n->host_timestamp = ts;

Can we keep the endianness conversion local to this static inline function?

    trace_nvme_set_timestamp(ts);

    n->host_timestamp = le64_to_cpu(ts);

> +    n->timestamp_set_qemu_clock_ms = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> +}
> +
> +static inline uint64_t nvme_get_timestamp(const NvmeCtrl *n)
> +{
> +    uint64_t current_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> +    uint64_t elapsed_time = current_time - n->timestamp_set_qemu_clock_ms;
> +
> +    union nvme_timestamp {
> +        struct {
> +            uint64_t timestamp:48;
> +            uint64_t sync:1;
> +            uint64_t origin:3;
> +            uint64_t rsvd1:12;
> +        };
> +        uint64_t all;
> +    };
> +
> +    union nvme_timestamp ts;
> +    ts.all = 0;
> +
> +    /*
> +     * If the sum of the Timestamp value set by the host and the elapsed
> +     * time exceeds 2^48, the value returned should be reduced modulo 2^48.
> +     */
> +    ts.timestamp = (n->host_timestamp + elapsed_time) & 0xffffffffffff;
> +
> +    /* If the host timestamp is non-zero, set the timestamp origin */
> +    ts.origin = n->host_timestamp ? 0x01 : 0x00;
> +
> +    return ts.all;

Same here: can we return the timestamp in the correct endianness directly?

    trace_nvme_get_timestamp(ts.all);

    return cpu_to_le64(ts.all);

> +}
> +
> +static uint16_t nvme_get_feature_timestamp(NvmeCtrl *n, NvmeCmd *cmd)
> +{
> +    uint32_t dw10 = le32_to_cpu(cmd->cdw10);
> +    uint64_t prp1 = le64_to_cpu(cmd->prp1);
> +    uint64_t prp2 = le64_to_cpu(cmd->prp2);
> +
> +    uint64_t timestamp = nvme_get_timestamp(n);
> +
> +    if (!(n->oncs & NVME_ONCS_TIMESTAMP)) {
> +        trace_nvme_err_invalid_getfeat(dw10);
> +        return NVME_INVALID_FIELD | NVME_DNR;
> +    }
> +
> +    trace_nvme_getfeat_timestamp(timestamp);
> +
> +    timestamp = cpu_to_le64(timestamp);

So you can drop the previous 2 lines, ...

> +
> +    return nvme_dma_read_prp(n, (uint8_t *)&timestamp,
> +                             sizeof(timestamp), prp1, prp2);
> +}
> +
>  static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
>  {
>      uint32_t dw10 = le32_to_cpu(cmd->cdw10);
> @@ -710,6 +790,9 @@ static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
>          result = cpu_to_le32((n->num_queues - 2) | ((n->num_queues - 2) << 16));
>          trace_nvme_getfeat_numq(result);
>          break;
> +    case NVME_TIMESTAMP:
> +        return nvme_get_feature_timestamp(n, cmd);
> +        break;
>      default:
>          trace_nvme_err_invalid_getfeat(dw10);
>          return NVME_INVALID_FIELD | NVME_DNR;
> @@ -719,6 +802,31 @@ static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
>      return NVME_SUCCESS;
>  }
>  
> +static uint16_t nvme_set_feature_timestamp(NvmeCtrl *n, NvmeCmd *cmd)
> +{
> +    uint16_t ret;
> +    uint64_t timestamp;
> +    uint32_t dw10 = le32_to_cpu(cmd->cdw10);
> +    uint64_t prp1 = le64_to_cpu(cmd->prp1);
> +    uint64_t prp2 = le64_to_cpu(cmd->prp2);
> +

Re: [Qemu-devel] [PATCH] docs: provide documentation on the POWER9 XIVE interrupt controller

2019-05-14 Thread Peter Maydell

On Tue, 14 May 2019 at 07:46, Cédric Le Goater  wrote:
>
> This documents the overall XIVE architecture and gives an overview of
> the QEMU models. It also provides documentation on the 'info pic'
> command.
>
> Signed-off-by: Cédric Le Goater 
> ---
>  docs/index.rst |   1 +
>  docs/ppc/index.rst |  13 ++
>  docs/ppc/xive.rst  | 344 +
>  MAINTAINERS|   1 +
>  4 files changed, 359 insertions(+)
>  create mode 100644 docs/ppc/index.rst
>  create mode 100644 docs/ppc/xive.rst

Hi -- it's great to see this documentation. Unfortunately,
where you've put it doesn't match our intended layout for docs.

Each subdirectory of docs/ becomes its own manual, and
the intention is to eventually have five manuals
(as sketched out in https://wiki.qemu.org/Features/Documentation):
 * QEMU user mode emulation -- docs/user
 * QEMU full-system emulation user's guide -- docs/system
 * QEMU full-system emulation management and interoperability guide -- docs/interop
 * QEMU full-system emulation guest hardware specifications -- docs/specs
 * QEMU developer's guide -- docs/devel

We don't want to have a separate PPC-specific manual.

Currently we only have interop and devel. I have on
my todo list to try to sort out the others, including
figuring out how to transition from our current set
of texinfo-based manuals to this layout.

I'm not sure exactly where this document should live.
From a quick scan it appears to be mixing together
information aimed at several different audiences --
the "Overview of the QEMU models for XIVE" part looks
like information about QEMU internals which belongs
in docs/devel, but some other parts seem to be user
facing information which should go in one of the
other manuals.

thanks
-- PMM

Re: [Qemu-devel] [PATCH] tests/libqtest: Fix description of qtest_vinitf() and qtest_initf()

2019-05-14 Thread Philippe Mathieu-Daudé

On 5/13/19 5:47 PM, Thomas Huth wrote:
> These functions are convenience wrappers of qtest_init() and not of
> qtest_start().

Maybe "The qtest_vinitf() and qtest_initf() functions are convenience
wrappers of qtest_init() and not of qtest_start().", as that makes the
commit description easier to read in some git review tools (e.g. gitk).

> 
> Signed-off-by: Thomas Huth 

Reviewed-by: Philippe Mathieu-Daudé 

> ---
>  tests/libqtest.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/tests/libqtest.h b/tests/libqtest.h
> index 3f7675fcf0..a98ea15b7d 100644
> --- a/tests/libqtest.h
> +++ b/tests/libqtest.h
> @@ -29,7 +29,7 @@ extern QTestState *global_qtest;
>   * @fmt...: Format for creating other arguments to pass to QEMU, formatted
>   * like sprintf().
>   *
> - * Convenience wrapper around qtest_start().
> + * Convenience wrapper around qtest_init().
>   *
>   * Returns: #QTestState instance.
>   */
> @@ -41,7 +41,7 @@ QTestState *qtest_initf(const char *fmt, ...) GCC_FMT_ATTR(1, 2);
>   * like vsprintf().
>   * @ap: Format arguments.
>   *
> - * Convenience wrapper around qtest_start().
> + * Convenience wrapper around qtest_init().
>   *
>   * Returns: #QTestState instance.
>   */
>
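The corrected comments describe a common C layering: a variadic `...f()` function packs its arguments into a `va_list`, a `v...f()` variant formats them into a single string, and that string is passed to the plain init function, which is `qtest_init()` here, not `qtest_start()`. The sketch below mirrors that shape with hypothetical `demo_*` names; it is an illustration, not libqtest's implementation.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for QTestState: just remembers its arguments. */
struct demo_state {
    char *args;
};

/* Stand-in for qtest_init(): takes the final, already formatted arguments. */
static struct demo_state *demo_init(const char *args)
{
    assert(args != NULL);
    struct demo_state *s = malloc(sizeof(*s));
    size_t len = strlen(args) + 1;

    if (!s) {
        return NULL;
    }
    s->args = malloc(len);
    if (!s->args) {
        free(s);
        return NULL;
    }
    memcpy(s->args, args, len);
    return s;
}

/* Analogue of qtest_vinitf(): format the arguments, then delegate. */
static struct demo_state *demo_vinitf(const char *fmt, va_list ap)
{
    va_list ap2;
    int n;
    char *args;
    struct demo_state *s;

    va_copy(ap2, ap);
    n = vsnprintf(NULL, 0, fmt, ap2);   /* measure the formatted length */
    va_end(ap2);
    if (n < 0) {
        return NULL;
    }
    args = malloc((size_t)n + 1);
    if (!args) {
        return NULL;
    }
    vsnprintf(args, (size_t)n + 1, fmt, ap);
    s = demo_init(args);
    free(args);
    return s;
}

/* Analogue of qtest_initf(): the variadic front end. */
static struct demo_state *demo_initf(const char *fmt, ...)
{
    va_list ap;
    struct demo_state *s;

    va_start(ap, fmt);
    s = demo_vinitf(fmt, ap);
    va_end(ap);
    return s;
}
```

Because only the v-variant does the formatting, the variadic function stays a thin wrapper, and both end up in the same init function, which is what the fixed comments now say.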

Re: [Qemu-devel] [PATCH v4] s390: diagnose 318 info reset and migration support

2019-05-14 Thread David Hildenbrand

On 14.05.19 11:10, Christian Borntraeger wrote:
> 
> 
> On 14.05.19 10:59, David Hildenbrand wrote:
>> [...]
>>
>> We can
>>
>> 1. Fail to start with #cpus > 240 when diag318=on
>> 2. Remove the error once we support more than one SCLP response page
>>
>> Or
>>
>> 1. Allow to start with #cpus > 240 when diag318=on, but indicate only
>>    240 CPUs via SCLP
>> 2. Print a warning
>> 3. Remove the restriction and the warning once we support more than one
>>    SCLP response page
>>
>> While I prefer the second approach (similar to defining zPCI devices
>> without zpci=on), I could also live with the first approach.
> 
> I prefer approach 1.
> 

Isn't approach #2 what we discussed (limiting the SCLP response, but of
course to 247 CPUs), just with an additional warning? I'm confused.

-- 

Thanks,

David / dhildenb

Re: [Qemu-devel] [PATCH v4] s390: diagnose 318 info reset and migration support

2019-05-14 Thread Christian Borntraeger




On 14.05.19 11:20, David Hildenbrand wrote:
> On 14.05.19 11:10, Christian Borntraeger wrote:
>>
>>
>> On 14.05.19 10:59, David Hildenbrand wrote:
>>> [...]
>>>
>>> We can
>>>
>>> 1. Fail to start with #cpus > 240 when diag318=on
>>> 2. Remove the error once we support more than one SCLP response page
>>>
>>> Or
>>>
>>> 1. Allow to start with #cpus > 240 when diag318=on, but indicate only
>>>    240 CPUs via SCLP
>>> 2. Print a warning
>>> 3. Remove the restriction and the warning once we support more than one
>>>    SCLP response page
>>>
>>> While I prefer the second approach (similar to defining zPCI devices
>>> without zpci=on), I could also live with the first approach.
>>
>> I prefer approach 1.
>>
> 
> Isn't approach #2 what we discussed (limiting sclp, but of course to 247
> CPUs), but with an additional warning? I'm confused.

Different numbering interpretation. By 1, I meant "Allow to start with
#cpus > 240 when diag318=on, but indicate only 240 CPUs via SCLP".

Re: [Qemu-devel] [qemu-s390x] [PATCH v4] s390: diagnose 318 info reset and migration support

2019-05-14 Thread Christian Borntraeger




On 14.05.19 11:23, Christian Borntraeger wrote:
> 
> 
> On 14.05.19 11:20, David Hildenbrand wrote:
>> On 14.05.19 11:10, Christian Borntraeger wrote:
>>> [...]
>>
>> Isn't approach #2 what we discussed (limiting sclp, but of course to 247
>> CPUs), but with an additional warning? I'm confused.
> 
> Different numbering interpretation. By 1, I meant "Allow to start with
> #cpus > 240 when diag318=on, but indicate only 240 CPUs via SCLP"

So yes, variant 2 when I use your numbering. The only question is: do we need
a warning? It probably does not hurt.

Re: [Qemu-devel] [libvirt] QMP; unsigned 64-bit ints; JSON standards compliance

2019-05-14 Thread Daniel P . Berrangé

On Tue, May 14, 2019 at 08:02:49AM +0200, Markus Armbruster wrote:
> Eric Blake  writes:
> 
> > On 5/13/19 8:53 AM, Markus Armbruster wrote:
> >
> >>> We have a few options
> >>>
> >>>  1. Use string format for values > 2^53-1, int format below that
> >>>  2. Use string format for all fields which are 64-bit ints whether
> >>> signed or unsigned
> >>>  3. Use string format for all fields which are integers, even 32-bit
> >>> ones
> >>>
> >>> I would probably suggest option 2. It would make the QEMU impl quite
> >>> easy IIUC, as we'd just change the QAPI visitor's impl for the int64
> >>> and uint64 fields to use string format (when the right capability is
> >>> negotiated by QMP).
> >>>
> >>> I include 3 only for completeness - I don't think there's a hugely
> >>> compelling reason to mess with 32-bit ints.
> >> 
> >> Agree.
> >
> > Other than if we ever change the type of a QMP integer. Right now, if we
> > widen from 'int32' to 'int' (aka 'int64'), it is invisible to clients;
> > but once we start stringizing 64-bit numbers (at client request) but NOT
> > 32-bit numbers, then changing a type from 32 to 64 bits (or the
> > converse) becomes an API change to clients. Introspection will at least
> > let a client know which form to expect, but it does mean we have to be
> > more aware of typing issues going forward.
> 
> Thank you so much for helping my old synapses finally fire!  Option 2 is
> not what we thought it is.  Let me explain.
> 
> Introspection reports *all* QAPI integer types as "int".  This is
> deliberate.
> 
> So, when the client that negotiated the interoperability capability sees
> "int", it has to accept *both* integer encodings: JSON number and JSON
> string.
> 
> The difference between option 1 and option 2 for the client is that
> option 2 will use only one encoding.  But the client must not rely on
> that!  Another QEMU version may well use the other encoding (because we
> narrowed or widened the QAPI integer type in the QAPI schema).
> 
> Elsewhere in this thread, David pointed out that option 1 complicates
> testing QEMU: full coverage requires passing both a small number (to
> cover JSON number encoding) and a large number (to cover JSON string
> encoding), to which I replied that there are very few places to test.
> 
> Option 2 complicates testing clients: full coverage requires testing
> with both a version of QEMU (or a mock-up) that uses wide integers
> (encoded as JSON string) and narrow integers (encoded as JSON number).
> Impractical.
> 
> >>> Option 1 is the bare minimum needed to ensure precision, but to me
> >>> it feels a bit dirty to say a given field will have different encoding
> >>> depending on the value. If apps need to deal with string encoding, they
> >>> might as well just use it for all values in a given field.
> >> 
> >> I guess that depends on what this interoperability capability does for
> >> QMP *input*.
> >
> > "Be liberal in what you accept, strict in what you produce" - that
> > argues we should accept both forms on input (it's easy enough to ALWAYS
> > permit a string in place of an integer, and to take an in-range integer
> > even when we would in turn output it as a string).
> 
> With option 2, QEMU *has* to be liberal in what it accepts, because the
> client cannot deduce from introspection whether the integer is wide or
> narrow.
> 
> [...]
> 
> Daniel, you wrote you'd probably suggest option 2.  Would you like to
> reconsider?

Based on the above, let me try to summarize what we need the behaviour to be:

  - Integer mode (current default):

   - QEMU & clients MUST format integer fields as numbers
     regardless of size

   - QEMU & clients MUST parse number format for any integer
     fields

  - String mode:

   - QEMU & clients MUST format integer fields as strings
     if their value cannot fit in a 32-bit integer.

   - QEMU & clients MAY format integer fields as strings
     even if their value can fit in a 32-bit integer.

   - QEMU & clients MUST parse both string and number format
     for any integer fields.

Unless I'm missing something, this should ensure we don't lose precision,
can always parse large numbers, and can internally change QEMU precision
from int8/16/32 up to full int64 without breaking clients.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
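A sketch of what the string-mode rules could look like on the wire, operating on raw JSON token text. The function names are hypothetical, only signed 64-bit values are handled (a real implementation would need a uint64 variant as well), and "fits in a 32-bit integer" is read here as the signed 32-bit range.

```c
#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Output side of "string mode": values outside the signed 32-bit range MUST
 * be emitted as JSON strings; in-range values are emitted as JSON numbers
 * here, although the proposal would also allow strings for them. */
static void format_int_member(char *buf, size_t size, int64_t v)
{
    assert(buf != NULL && size > 0);
    if (v < INT32_MIN || v > INT32_MAX) {
        snprintf(buf, size, "\"%" PRId64 "\"", v);
    } else {
        snprintf(buf, size, "%" PRId64, v);
    }
}

/* Input side: accept either a JSON number token (42) or a JSON string
 * token ("42"). Returns false on malformed or out-of-range input. */
static bool parse_int_member(const char *tok, int64_t *out)
{
    char digits[32];
    size_t len = strlen(tok);
    char *end;
    long long v;

    if (len >= 2 && tok[0] == '"' && tok[len - 1] == '"') {
        tok++;          /* strip the surrounding quotes */
        len -= 2;
    }
    if (len == 0 || len >= sizeof(digits)) {
        return false;
    }
    memcpy(digits, tok, len);
    digits[len] = '\0';

    errno = 0;
    v = strtoll(digits, &end, 10);
    if (errno == ERANGE || end == digits || *end != '\0') {
        return false;
    }
    *out = (int64_t)v;
    return true;
}
```

For example, 9007199254740993 (2^53 + 1), the smallest positive integer an IEEE double cannot represent exactly, goes out as the string "9007199254740993" and is read back without loss, while 42 stays a plain number.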

Re: [Qemu-devel] [qemu-s390x] [PATCH v4] s390: diagnose 318 info reset and migration support

2019-05-14 Thread David Hildenbrand

On 14.05.19 11:25, Christian Borntraeger wrote:
> 
> 
> On 14.05.19 11:23, Christian Borntraeger wrote:
>>
>>
>> On 14.05.19 11:20, David Hildenbrand wrote:
>>> On 14.05.19 11:10, Christian Borntraeger wrote:
>>>> [...]
>>>
>>> Isn't approach #2 what we discussed (limiting sclp, but of course to 247
>>> CPUs), but with an additional warning? I'm confused.
>>
>> Different numbering interpretation. I was talking about 1 = "Allow to start
>> with #cpus > 240 when diag318=on, but indicate only
>> 240 CPUs via SCLP"
> 
> So yes, variant 2 when I use your numbering. The only question is: do we need
> a warning? It probably does not hurt. 

After all, we are talking about 1 VCPU that the guest can only use by
indirect probing ... I leave that up to Collin :)


-- 

Thanks,

David / dhildenb

Re: [Qemu-devel] [PATCH v8 2/6] virtio-pmem: Add virtio pmem driver

2019-05-14 Thread Pankaj Gupta



Hi David,

Thank you for the review.

> On 10.05.19 17:51, Pankaj Gupta wrote:
> > This patch adds virtio-pmem driver for KVM guest.
> > 
> > Guest reads the persistent memory range information from
> > Qemu over VIRTIO and registers it on nvdimm_bus. It also
> > creates a nd_region object with the persistent memory
> > range information so that existing 'nvdimm/pmem' driver
> > can reserve this into system memory map. This way
> > 'virtio-pmem' driver uses existing functionality of pmem
> > driver to register persistent memory compatible for DAX
> > capable filesystems.
> > 
> > This also provides function to perform guest flush over
> > VIRTIO from 'pmem' driver when userspace performs flush
> > on DAX memory range.
> > 
> > Signed-off-by: Pankaj Gupta 
> > Reviewed-by: Yuval Shaia 
> > ---
> >  drivers/nvdimm/Makefile  |   1 +
> >  drivers/nvdimm/nd_virtio.c   | 129 +++
> >  drivers/nvdimm/virtio_pmem.c | 117 
> >  drivers/virtio/Kconfig   |  10 +++
> >  include/linux/virtio_pmem.h  |  60 ++
> >  include/uapi/linux/virtio_ids.h  |   1 +
> >  include/uapi/linux/virtio_pmem.h |  10 +++
> >  7 files changed, 328 insertions(+)
> >  create mode 100644 drivers/nvdimm/nd_virtio.c
> >  create mode 100644 drivers/nvdimm/virtio_pmem.c
> >  create mode 100644 include/linux/virtio_pmem.h
> >  create mode 100644 include/uapi/linux/virtio_pmem.h
> > 
> > diff --git a/drivers/nvdimm/Makefile b/drivers/nvdimm/Makefile
> > index 6f2a088afad6..cefe233e0b52 100644
> > --- a/drivers/nvdimm/Makefile
> > +++ b/drivers/nvdimm/Makefile
> > @@ -5,6 +5,7 @@ obj-$(CONFIG_ND_BTT) += nd_btt.o
> >  obj-$(CONFIG_ND_BLK) += nd_blk.o
> >  obj-$(CONFIG_X86_PMEM_LEGACY) += nd_e820.o
> >  obj-$(CONFIG_OF_PMEM) += of_pmem.o
> > +obj-$(CONFIG_VIRTIO_PMEM) += virtio_pmem.o nd_virtio.o
> >  
> >  nd_pmem-y := pmem.o
> >  
> > diff --git a/drivers/nvdimm/nd_virtio.c b/drivers/nvdimm/nd_virtio.c
> > new file mode 100644
> > index ..ed7ddcc5a62c
> > --- /dev/null
> > +++ b/drivers/nvdimm/nd_virtio.c
> > @@ -0,0 +1,129 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * virtio_pmem.c: Virtio pmem Driver
> > + *
> > + * Discovers persistent memory range information
> > + * from host and provides a virtio based flushing
> > + * interface.
> > + */
> > +#include 
> > +#include "nd.h"
> > +
> > + /* The interrupt handler */
> > +void host_ack(struct virtqueue *vq)
> > +{
> > +   unsigned int len;
> > +   unsigned long flags;
> > +   struct virtio_pmem_request *req, *req_buf;
> > +   struct virtio_pmem *vpmem = vq->vdev->priv;
> 
> Nit: use reverse Christmas tree layout :)

o.k

> 
> > +
> > +   spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > +   while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
> > +   req->done = true;
> > +   wake_up(&req->host_acked);
> > +
> > +   if (!list_empty(&vpmem->req_list)) {
> > +   req_buf = list_first_entry(&vpmem->req_list,
> > +   struct virtio_pmem_request, list);
> > +   req_buf->wq_buf_avail = true;
> > +   wake_up(&req_buf->wq_buf);
> > +   list_del(&req_buf->list);
> > +   }
> > +   }
> > +   spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > +}
> > +EXPORT_SYMBOL_GPL(host_ack);
> > +
> > + /* The request submission function */
> > +int virtio_pmem_flush(struct nd_region *nd_region)
> > +{
> > +   int err, err1;
> > +   unsigned long flags;
> > +   struct scatterlist *sgs[2], sg, ret;
> > +   struct virtio_device *vdev = nd_region->provider_data;
> > +   struct virtio_pmem *vpmem = vdev->priv;
> > +   struct virtio_pmem_request *req;
> 
> Nit: use reverse Christmas tree layout :)

o.k

> 
> > +
> > +   might_sleep();
> > +   req = kmalloc(sizeof(*req), GFP_KERNEL);
> > +   if (!req)
> > +   return -ENOMEM;
> > +
> > +   req->done = false;
> > +   strcpy(req->name, "FLUSH");
> > +   init_waitqueue_head(&req->host_acked);
> > +   init_waitqueue_head(&req->wq_buf);
> > +   INIT_LIST_HEAD(&req->list);
> > +   sg_init_one(&sg, req->name, strlen(req->name));
> > +   sgs[0] = &sg;
> > +   sg_init_one(&ret, &req->ret, sizeof(req->ret));
> > +   sgs[1] = &ret;
> > +
> > +   spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > +/*
> > + * If virtqueue_add_sgs returns -ENOSPC then req_vq virtual
> > + * queue does not have free descriptor. We add the request
> > + * to req_list and wait for host_ack to wake us up when free
> > + * slots are available.
> > + */
> > +   while ((err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req,
> > +   GFP_ATOMIC)) == -ENOSPC) {
> > +
> > +   dev_err(&vdev->dev, "failed to send command to virtio pmem "\
> > +   "device, no free slots in the virtqueue\n");
> > +   req->wq_buf_avail = false;
> > +   list_add_tail(&req->list, &vpmem->req_list);

Re: [Qemu-devel] [RFC PATCH] target/arm: semihosting docs, formatting and return clean-ups

2019-05-14 Thread Philippe Mathieu-Daudé

Hi Alex,

On 5/10/19 9:10 PM, Alex Bennée wrote:
> This is a clean-up of the semihosting calls after reading ver 2.0 of
> the specification. There are a number of small fixes that seemed too
> insignificant to split into smaller patches:
> 

Can you split at least this one of:

>   - fixup block comments as per standard

The rest is probably acceptable as a single patch:

>   - add reference to the ARM semihosting spec
>   - add some additional commentary on return values
>   - audit return values, return 0xdeadbeef for corrupted values
>   - fix up leaks from early returns with lock_user_string
>   - return bytes not written/read instead of -1
>   - add LOG_UNIMP for missing functionality

Thanks!

Phil.

> 
> This is very much a Friday patch. It might be worth splitting up if
> coming back for a more concerted clean-up series for semihosting as
> the asynchronous gdb calls probably need more attention.
> 
> Signed-off-by: Alex Bennée 
> ---
>  target/arm/arm-semi.c | 180 +-
>  1 file changed, 109 insertions(+), 71 deletions(-)
> 
> diff --git a/target/arm/arm-semi.c b/target/arm/arm-semi.c
> index 4c326fdc2fb..8deaed2807c 100644
> --- a/target/arm/arm-semi.c
> +++ b/target/arm/arm-semi.c
> @@ -2,6 +2,7 @@
>   *  Arm "Angel" semihosting syscalls
>   *
>   *  Copyright (c) 2005, 2007 CodeSourcery.
> + *  Copyright (c) 2019 Linaro
>   *  Written by Paul Brook.
>   *
>   *  This program is free software; you can redistribute it and/or modify
> @@ -15,13 +16,19 @@
>   *  GNU General Public License for more details.
>   *
>   *  You should have received a copy of the GNU General Public License
> - *  along with this program; if not, see .
> + *  along with this program; if not, see
> + *  .
> + *
> + *  ARM Semihosting is documented in:
> + * Semihosting for AArch32 and AArch64 Release 2.0
> + * https://static.docs.arm.com/100863/0200/semihosting.pdf
>   */
>  
>  #include "qemu/osdep.h"
>  
>  #include "cpu.h"
>  #include "exec/semihost.h"
> +#include "exec/log.h"
>  #ifdef CONFIG_USER_ONLY
>  #include "qemu.h"
>  
> @@ -241,13 +248,18 @@ static target_ulong arm_gdb_syscall(ARMCPU *cpu, 
> gdb_syscall_complete_cb cb,
>   put_user_u64(val, args + (n) * 8) :\
>   put_user_u32(val, args + (n) * 4))
>  
> +/*
> + * Do a semihosting call. Returns the "RETURN REGISTER" which is
> + * documented as corrupted for some calls. In this case we use the
> + * venerable 0xdeadbeef.
> + */
>  target_ulong do_arm_semihosting(CPUARMState *env)
>  {
>  ARMCPU *cpu = arm_env_get_cpu(env);
>  CPUState *cs = CPU(cpu);
>  target_ulong args;
>  target_ulong arg0, arg1, arg2, arg3;
> -char * s;
> +char *s;
>  int nr;
>  uint32_t ret;
>  uint32_t len;
> @@ -273,9 +285,9 @@ target_ulong do_arm_semihosting(CPUARMState *env)
>  GET_ARG(2);
>  s = lock_user_string(arg0);
>  if (!s) {
> -/* FIXME - should this error code be -TARGET_EFAULT ? */
>  return (uint32_t)-1;
>  }
> +/* check for invalid open mode */
>  if (arg1 >= 12) {
>  unlock_user(s, arg0, 0);
>  return (uint32_t)-1;
> @@ -287,7 +299,7 @@ target_ulong do_arm_semihosting(CPUARMState *env)
>  }
>  if (use_gdb_syscalls()) {
>  ret = arm_gdb_syscall(cpu, arm_semi_cb, "open,%s,%x,1a4", arg0,
> -  (int)arg2+1, gdb_open_modeflags[arg1]);
> +  (int) arg2 + 1, gdb_open_modeflags[arg1]);
>  } else {
>  ret = set_swi_errno(ts, open(s, open_modeflags[arg1], 0644));
>  }
> @@ -301,48 +313,51 @@ target_ulong do_arm_semihosting(CPUARMState *env)
>  return set_swi_errno(ts, close(arg0));
>  }
>  case TARGET_SYS_WRITEC:
> -{
> -  char c;
> -
> -  if (get_user_u8(c, args))
> -  /* FIXME - should this error code be -TARGET_EFAULT ? */
> -  return (uint32_t)-1;
> -  /* Write to debug console.  stderr is near enough.  */
> -  if (use_gdb_syscalls()) {
> +{
> +char c;
> +if (!get_user_u8(c, args)) {
> +/* Write to debug console.  stderr is near enough.  */
> +if (use_gdb_syscalls()) {
>  return arm_gdb_syscall(cpu, arm_semi_cb, "write,2,%x,1", 
> args);
> -  } else {
> +} else {
>  #ifdef CONFIG_SOFTMMU
> -  Chardev *chardev = semihosting_get_chardev();
> -  if (chardev) {
> -  return qemu_chr_write_all(chardev, (uint8_t *) &c, 1);
> -  } else
> +Chardev *chardev = semihosting_get_chardev();
> +if (chardev) {
> +return qemu_chr_write_all(chardev, (uint8_t *) &c, 1);
> +}
>  #endif
> -  {
> -  return write(STDE

Re: [Qemu-devel] [qemu-s390x] [PATCH v4] s390: diagnose 318 info reset and migration support

2019-05-14 Thread Cornelia Huck

On Tue, 14 May 2019 11:27:32 +0200
David Hildenbrand  wrote:

> On 14.05.19 11:25, Christian Borntraeger wrote:
> > 
> > 
> > On 14.05.19 11:23, Christian Borntraeger wrote:  
> >>
> >>
> >> On 14.05.19 11:20, David Hildenbrand wrote:  
> >>> On 14.05.19 11:10, Christian Borntraeger wrote:  
> 
> 
>  On 14.05.19 10:59, David Hildenbrand wrote:  

> > We can
> >
> > 1. Fail to start with #cpus > 240 when diag318=on
> > 2. Remove the error once we support more than one SCLP response page
> >
> > Or
> >
> > 1. Allow to start with #cpus > 240 when diag318=on, but indicate only
> >240 CPUs via SCLP
> > 2. Print a warning
> > 3. Remove the restriction and the warning once we support more than one
> >SCLP response page
> >
> > While I prefer the second approach (similar to defining zPCI devices
> > without zpci=on), I could also live with the first approach.  
> 
>  I prefer approach 1.
>   
> >>>
> >>> Isn't approach #2 what we discussed (limiting sclp, but of course to 247
> >>> CPUs), but with an additional warning? I'm confused.  
> >>
> >> Different numbering interpretation. I was talking about 1 = "Allow to start
> >> with #cpus > 240 when diag318=on, but indicate only
> >> 240 CPUs via SCLP"  
> > 
> > So yes, variant 2 when I use your numbering. The only question is: do we
> > need a warning? It probably does not hurt.
> 
> After all, we are talking about 1 VCPU that the guest can only use by
> indirect probing ... I leave that up to Collin :)

I'd prefer a warning... even if it is a corner case, I think it's
better to be explicit instead of silent.

Re: [Qemu-devel] [PATCH v8 2/6] virtio-pmem: Add virtio pmem driver

2019-05-14 Thread David Hildenbrand

>>
>>> +   }
>>> +
>>> +   /* When host has read buffer, this completes via host_ack */
>>
>> "A host response results in "host_ack" getting called" ... ?
>>
>>> +   wait_event(req->host_acked, req->done);
>>> +   err = req->ret;
>>> +ret:
>>> +   kfree(req);
>>> +   return err;
>>> +};
>>> +
>>> +/* The asynchronous flush callback function */
>>> +int async_pmem_flush(struct nd_region *nd_region, struct bio *bio)
>>> +{
>>> +   int rc = 0;
>>> +
>>> +   /* Create child bio for asynchronous flush and chain with
>>> +* parent bio. Otherwise directly call nd_region flush.
>>> +*/
>>> +   if (bio && bio->bi_iter.bi_sector != -1) {
>>> +   struct bio *child = bio_alloc(GFP_ATOMIC, 0);
>>> +
>>> +   if (!child)
>>> +   return -ENOMEM;
>>> +   bio_copy_dev(child, bio);
>>> +   child->bi_opf = REQ_PREFLUSH;
>>> +   child->bi_iter.bi_sector = -1;
>>> +   bio_chain(child, bio);
>>> +   submit_bio(child);
>>
>> return 0;
>>
>> Then, drop the "else" case and "int rc" and do directly
>>
>> if (virtio_pmem_flush(nd_region))
>>  return -EIO;
> 
> and another 'return 0' here :)
> 
> I don't like returning from multiple places; I prefer a
> single exit point from the function.

Makes this function more complicated than necessary. I agree when there
are locks involved.

>  
>>
>>> +
>>> +   return 0;
>>> +};
>>> +
>>> +static int virtio_pmem_probe(struct virtio_device *vdev)
>>> +{
>>> +   int err = 0;
>>> +   struct resource res;
>>> +   struct virtio_pmem *vpmem;
>>> +   struct nd_region_desc ndr_desc = {};
>>> +   int nid = dev_to_node(&vdev->dev);
>>> +   struct nd_region *nd_region;
>>
>> Nit: use reverse Christmas tree layout :)
> 
> Done.
> 
>>
>>> +
>>> +   if (!vdev->config->get) {
>>> +   dev_err(&vdev->dev, "%s failure: config access disabled\n",
>>> +   __func__);
>>> +   return -EINVAL;
>>> +   }
>>> +
>>> +   vpmem = devm_kzalloc(&vdev->dev, sizeof(*vpmem), GFP_KERNEL);
>>> +   if (!vpmem) {
>>> +   err = -ENOMEM;
>>> +   goto out_err;
>>> +   }
>>> +
>>> +   vpmem->vdev = vdev;
>>> +   vdev->priv = vpmem;
>>> +   err = init_vq(vpmem);
>>> +   if (err)
>>> +   goto out_err;
>>> +
>>> +   virtio_cread(vpmem->vdev, struct virtio_pmem_config,
>>> +   start, &vpmem->start);
>>> +   virtio_cread(vpmem->vdev, struct virtio_pmem_config,
>>> +   size, &vpmem->size);
>>> +
>>> +   res.start = vpmem->start;
>>> +   res.end   = vpmem->start + vpmem->size-1;
>>> +   vpmem->nd_desc.provider_name = "virtio-pmem";
>>> +   vpmem->nd_desc.module = THIS_MODULE;
>>> +
>>> +   vpmem->nvdimm_bus = nvdimm_bus_register(&vdev->dev,
>>> +   &vpmem->nd_desc);
>>> +   if (!vpmem->nvdimm_bus)
>>> +   goto out_vq;
>>> +
>>> +   dev_set_drvdata(&vdev->dev, vpmem->nvdimm_bus);
>>> +
>>> +   ndr_desc.res = &res;
>>> +   ndr_desc.numa_node = nid;
>>> +   ndr_desc.flush = async_pmem_flush;
>>> +   set_bit(ND_REGION_PAGEMAP, &ndr_desc.flags);
>>> +   set_bit(ND_REGION_ASYNC, &ndr_desc.flags);
>>> +   nd_region = nvdimm_pmem_region_create(vpmem->nvdimm_bus, &ndr_desc);
>>> +
>>
>> I'd drop this empty line.
> 
> hmm.
> 

The common pattern is to check for it immediately on the next line after
allocating something (like you do throughout this patch ;) )

...
>> You are not freeing "vdev->priv".
> 
> vdev->priv is vpmem which is allocated using devm API.

I'm confused. Looking at drivers/virtio/virtio_balloon.c:

static void virtballoon_remove(struct virtio_device *vdev)
{
struct virtio_balloon *vb = vdev->priv;

...

kfree(vb);
}

I think you should do the same here, vdev->priv is allocated in
virtio_pmem_probe.

But maybe I am missing something important here :)

>>
>>> +   nvdimm_bus_unregister(nvdimm_bus);
>>> +   vdev->config->del_vqs(vdev);
>>> +   vdev->config->reset(vdev);
>>> +}
>>> +

-- 

Thanks,

David / dhildenb

Re: [Qemu-devel] [PATCH] migration: Fix handling fd protocol

2019-05-14 Thread Yury Kotov

Ping

18.04.2019, 20:46, "Yury Kotov" :
> 18.04.2019, 20:01, "Dr. David Alan Gilbert" :
>>  * Yury Kotov (yury-ko...@yandex-team.ru) wrote:
>>>   18.04.2019, 19:03, "Dr. David Alan Gilbert" :
>>>   > * Yury Kotov (yury-ko...@yandex-team.ru) wrote:
>>>   >>  18.04.2019, 17:20, "Dr. David Alan Gilbert" :
>>>   >>  > * Yury Kotov (yury-ko...@yandex-team.ru) wrote:
>>>   >>  >>  15.04.2019, 14:30, "Dr. David Alan Gilbert" :
>>>   >>  >>  > * Daniel P. Berrangé (berra...@redhat.com) wrote:
>>>   >>  >>  >>  On Mon, Apr 15, 2019 at 12:15:12PM +0100, Dr. David Alan 
>>> Gilbert wrote:
>>>   >>  >>  >>  > * Daniel P. Berrangé (berra...@redhat.com) wrote:
>>>   >>  >>  >>  > > On Mon, Apr 15, 2019 at 01:33:21PM +0300, Yury Kotov 
>>> wrote:
>>>   >>  >>  >>  > > > 15.04.2019, 13:25, "Daniel P. Berrangé" 
>>> :
>>>   >>  >>  >>  > > > > On Mon, Apr 15, 2019 at 01:17:06PM +0300, Yury Kotov 
>>> wrote:
>>>   >>  >>  >>  > > > >>  15.04.2019, 13:11, "Daniel P. Berrangé" 
>>> :
>>>   >>  >>  >>  > > > >>  > On Mon, Apr 15, 2019 at 12:50:08PM +0300, Yury 
>>> Kotov wrote:
>>>   >>  >>  >>  > > > >>  >>  Hi,
>>>   >>  >>  >>  > > > >>  >>
>>>   >>  >>  >>  > > > >>  >>  Just to clarify. I see two possible solutions:
>>>   >>  >>  >>  > > > >>  >>
>>>   >>  >>  >>  > > > >>  >>  1) Since the migration code doesn't receive fd, 
>>> it isn't responsible for
>>>   >>  >>  >>  > > > >>  >>  closing it. So, it may be better to use 
>>> migrate_fd_param for both
>>>   >>  >>  >>  > > > >>  >>  incoming/outgoing and add dupping for 
>>> migrate_fd_param. Thus, clients must
>>>   >>  >>  >>  > > > >>  >>  close the fd themselves. But existing clients 
>>> will have a leak.
>>>   >>  >>  >>  > > > >>  >
>>>   >>  >>  >>  > > > >>  > We can't break existing clients in this way as 
>>> they are correctly
>>>   >>  >>  >>  > > > >>  > using the monitor with its current semantics.
>>>   >>  >>  >>  > > > >>  >
>>>   >>  >>  >>  > > > >>  >>  2) If we don't duplicate fd, then at least we 
>>> should remove fd from
>>>   >>  >>  >>  > > > >>  >>  the corresponding list. Therefore, the solution 
>>> is to fix qemu_close to find
>>>   >>  >>  >>  > > > >>  >>  the list and remove fd from it. But qemu_close 
>>> is currently consistent with
>>>   >>  >>  >>  > > > >>  >>  qemu_open (which opens/dups fd), so adding 
>>> additional logic might not be
>>>   >>  >>  >>  > > > >>  >>  a very good idea.
>>>   >>  >>  >>  > > > >>  >
>>>   >>  >>  >>  > > > >>  > qemu_close is not the appropriate place to deal with 
>>> something specific
>>>   >>  >>  >>  > > > >>  > to the monitor.
>>>   >>  >>  >>  > > > >>  >
>>>   >>  >>  >>  > > > >>  >>  I don't see any other solution, but I might 
>>> miss something.
>>>   >>  >>  >>  > > > >>  >>  What do you think?
>>>   >>  >>  >>  > > > >>  >
>>>   >>  >>  >>  > > > >>  > All callers of monitor_get_fd() will close() the 
>>> FD they get back.
>>>   >>  >>  >>  > > > >>  > Thus monitor_get_fd() should remove it from the 
>>> list when it returns
>>>   >>  >>  >>  > > > >>  > it, and we should add API docs to 
>>> monitor_get_fd() to explain this.
>>>   >>  >>  >>  > > > >>  >
>>>   >>  >>  >>  > > > >>  Ok, it sounds reasonable. But monitor_get_fd is 
>>> only about outgoing migration.
>>>   >>  >>  >>  > > > >>  But what about the incoming migration? It doesn't 
>>> use monitor_get_fd but just
>>>   >>  >>  >>  > > > >>  converts input string to int and use it as fd.
>>>   >>  >>  >>  > > > >
>>>   >>  >>  >>  > > > > The incoming migration expects the FD to be passed 
>>> into QEMU by the mgmt
>>>   >>  >>  >>  > > > > app when it is exec'ing the QEMU binary. It doesn't 
>>> interact with the
>>>   >>  >>  >>  > > > > monitor at all AFAIR.
>>>   >>  >>  >>  > > > >
>>>   >>  >>  >>  > > >
>>>   >>  >>  >>  > > > Oh, sorry. This use case is not obvious. We used add-fd 
>>> to pass fd for
>>>   >>  >>  >>  > > > migrate-incoming and such way has described problems.
>>>   >>  >>  >>  > >
>>>   >>  >>  >>  > > That's a bug in your usage of QEMU IMHO, as the incoming 
>>> code is not
>>>   >>  >>  >>  > > designed to use add-fd.
>>>   >>  >>  >>  >
>>>   >>  >>  >>  > Hmm, that's true - although:
>>>   >>  >>  >>  > a) It's very non-obvious
>>>   >>  >>  >>  > b) Unfortunate, since it would go well with -incoming defer
>>>   >>  >>  >>
>>>   >>  >>  >>  Yeah I think this is a screw up on QEMU's part when 
>>> introducing 'defer'.
>>>   >>  >>  >>
>>>   >>  >>  >>  We should have mandated use of 'add-fd' when using 'defer', 
>>> since FD
>>>   >>  >>  >>  inheritance-over-execve() should only be used for command 
>>> line args,
>>>   >>  >>  >>  not monitor commands.
>>>   >>  >>  >>
>>>   >>  >>  >>  Not sure how to best fix this in QEMU though without breaking 
>>> back
>>>   >>  >>  >>  compat for apps using 'defer' already.
>>>   >>  >>  >
>>>   >>  >>  > We could add mon-fd: transports that have the same behaviour as 
>>> now for
>>>   >>  >>  > outgoing, and for incoming uses the add-fd stash.
>>>   >>  >>

Re: [Qemu-devel] [libvirt] QMP; unsigned 64-bit ints; JSON standards compliance

2019-05-14 Thread Dr. David Alan Gilbert

* Daniel P. Berrangé (berra...@redhat.com) wrote:
> On Tue, May 14, 2019 at 08:02:49AM +0200, Markus Armbruster wrote:
> > Eric Blake  writes:
> > 
> > > On 5/13/19 8:53 AM, Markus Armbruster wrote:
> > >
> > >>> We have a few options
> > >>>
> > >>>  1. Use string format for values > 2^53-1, int format below that
> > >>>  2. Use string format for all fields which are 64-bit ints whether
> > >>> signed or unsigned
> > >>>  3. Use string format for all fields which are integers, even 32-bit
> > >>> ones
> > >>>
> > >>> I would probably suggest option 2. It would make the QEMU impl quite
> > >>> easy IIUC, we'd just change the QAPI visitor's impl for the int64
> > >>> and uint64 fields to use string format (when the right capability is
> > >>> negotiated by QMP).
> > >>>
> > >>> I include 3 only for completeness - I don't think there's a hugely
> > >>> compelling reason to mess with 32-bit ints.
> > >> 
> > >> Agree.
> > >
> > > Other than if we ever change the type of a QMP integer. Right now, if we
> > > widen from 'int32' to 'int' (aka 'int64'), it is invisible to clients;
> > > but once we start stringizing 64-bit numbers (at client request) but NOT
> > > 32-bit numbers, then changing a type from 32 to 64 bits (or the
> > > converse) becomes an API change to clients. Introspection will at least
> > > let a client know which form to expect, but it does mean we have to be
> > > more aware of typing issues going forward.
> > 
> > Thank you so much for helping my old synapses finally fire!  Option 2 is
> > not what we thought it is.  Let me explain.
> > 
> > Introspection reports *all* QAPI integer types as "int".  This is
> > deliberate.
> > 
> > So, when the client that negotiated the interoperability capability sees
> > "int", it has to accept *both* integer encodings: JSON number and JSON
> > string.
> > 
> > The difference between option 1 and option 2 for the client is that
> > option 2 will use only one encoding.  But the client must not rely on
> > that!  Another QEMU version may well use the other encoding (because we
> > narrowed or widened the QAPI integer type in the QAPI schema).
> > 
> > Elsewhere in this thread, David pointed out that option 1 complicates
> > testing QEMU: full coverage requires passing both a small number (to
> > cover JSON number encoding) and a large number (to cover JSON string
> > encoding), to which I replied that there are very few places to test.
> > 
> > Option 2 complicates testing clients: full coverage requires testing
> > with both a version of QEMU (or a mock-up) that uses wide integers
> > (encoded as JSON string) and narrow integers (encoded as JSON number).
> > Impractical.
> > 
> > >>> Option 1 is the bare minimum needed to ensure precision, but to me
> > >>> it feels a bit dirty to say a given field will have different encoding
> > >>> depending on the value. If apps need to deal with string encoding, they
> > >>> might as well just use it for all values in a given field.
> > >> 
> > >> I guess that depends on what this interoperability capability does for
> > >> QMP *input*.
> > >
> > > "Be liberal in what you accept, strict in what you produce" - that
> > > argues we should accept both forms on input (it's easy enough to ALWAYS
> > > permit a string in place of an integer, and to take an in-range integer
> > > even when we would in turn output it as a string).
> > 
> > With option 2, QEMU *has* to be liberal in what it accepts, because the
> > client cannot deduce from introspection whether the integer is wide or
> > narrow.
> > 
> > [...]
> > 
> > Daniel, you wrote you'd probably suggest option 2.  Would you like to
> > reconsider?
> 
> Based on the above, let me try & summarize what we need the behaviour to be:
> 
>   - Integer mode (current default):
> 
>- QEMU & clients MUST format integer fields as numbers
>  regardless of size
> 
>- QEMU & clients MUST parse number format for any integer
>  fields
> 
>   - String mode:
> 
>- QEMU & clients MUST format integer fields as strings
>  if their value can not fit in a 32-bit integer.
> 
>- QEMU & clients MAY format integer fields as strings
>  even if their value can fit in 32-bit integer
> 
>- QEMU & clients MUST parse both string and number format
>  for any integer fields.
> 
> Unless I'm missing something, this should ensure we don't lose precision,
> can always parse large numbers, and can internally change QEMU precision
> from int8/16/32 up to full int64 without breaking clients.

But we could be stricter and simpler in string mode:

  - QEMU & clients MUST format integer fields as strings, always
  - QEMU & clients MUST parse only strings for integer fields.

That's (3) above, but also meets your requirements.

Dave

> Regards,
> Daniel

Re: [Qemu-devel] [PATCH v2 1/2] vfio/mdev: add version attribute for mdev device

2019-05-14 Thread Cornelia Huck

On Tue, 14 May 2019 03:47:36 -0400
Yan Zhao  wrote:

> On Tue, May 14, 2019 at 03:43:44PM +0800, Erik Skultety wrote:
> > On Tue, May 14, 2019 at 03:32:19AM -0400, Yan Zhao wrote:  
> > > On Tue, May 14, 2019 at 03:20:40PM +0800, Erik Skultety wrote:  

> > > > That said, from libvirt POV as a consumer, I'd expect there to be truly only 2
> > > > errors (I believe Alex has mentioned something similar in one of his responses
> > > > in one of the threads):
> > > > a) read error indicating that an mdev type doesn't support migration
> > > > - I assume if one type doesn't support migration, none of the other
> > > >   types exposed on the parent device do, is that a fair assumption?

Probably; but there might be cases where the migratability depends not
on the device type, but on how the partitioning has been done... or is
that too contrived?

> > > > b) write error indicating that the mdev types are incompatible for
> > > > migration
> > > >
> > > > Regards,
> > > > Erik  
> > > Thanks for this explanation.
> > > so, can we arrive at below agreements?
> > >
> > > 1. "not to define the specific errno returned for a specific situation,
> > > let the vendor driver decide, userspace simply needs to know that an errno on
> > > read indicates the device does not support migration version comparison and
> > > that an errno on write indicates the devices are incompatible or the target
> > > doesn't support migration versions. "
> > > 2. vendor driver should log detailed error reasons in kernel log.  
> > 
> > That would be my take on this, yes, but I'm open to hearing any other suggestions and
> > ideas I couldn't think of as well.

So, is reading to find out whether migration is supported at all, and
writing to find out whether it is supported for that concrete pairing,
reasonable for libvirt?

> > 
> > Erik  
> got it. thanks a lot!
> 
> hi Cornelia and Dave,
> do you also agree on:
> 1. "not to define the specific errno returned for a specific situation,
> let the vendor driver decide, userspace simply needs to know that an errno on
> read indicates the device does not support migration version comparison and
> that an errno on write indicates the devices are incompatible or the target
> doesn't support migration versions. "
> 2. vendor driver should log detailed error reasons in kernel log.

Two questions:
- How reasonable is it to refer to the system log in order to find out
  what exactly went wrong?
- If detailed error reporting is basically done to the syslog, do
  different error codes still provide useful information? Or should the
  vendor driver decide what it wants to do?

Re: [Qemu-devel] [libvirt] QMP; unsigned 64-bit ints; JSON standards compliance

2019-05-14 Thread Peter Krempa

On Tue, May 14, 2019 at 10:43:31 +0100, Daniel Berrange wrote:
> On Tue, May 14, 2019 at 10:37:55AM +0100, Dr. David Alan Gilbert wrote:
> > * Daniel P. Berrangé (berra...@redhat.com) wrote:
> > > On Tue, May 14, 2019 at 08:02:49AM +0200, Markus Armbruster wrote:
> > > > Eric Blake  writes:

[...]

> > > Unless I'm missing something, this should ensure we don't lose precision,
> > > can always parse large numbers, and can internally change QEMU precision
> > > from int8/16/32 up to full int64 without breaking clients.
> > 
> > But we could be stricter and simpler in string mode:
> > 
> >   - QEMU & clients MUST format integer fields as strings, always
> >   - QEMU & clients MUST parse only strings for integer fields.
> > 
> > That's (3) above, but also meets your requirements.
> 
> Yep, given that we don't actually expose the int8/int16/int32/int64
> distinction via the QMP introspection data, that would be fine too.
> 
> It's basically saying we'll never use JSON's number format.

I think this would make the most sense. If you are going to switch to
the "string" mode, why bother doing any compat?



Re: [Qemu-devel] [PATCH v8 2/6] virtio-pmem: Add virtio pmem driver

2019-05-14 Thread Pankaj Gupta



> 
> >>
> >>> + }
> >>> +
> >>> + /* When host has read buffer, this completes via host_ack */
> >>
> >> "A host response results in "host_ack" getting called" ... ?
> >>
> >>> + wait_event(req->host_acked, req->done);
> >>> + err = req->ret;
> >>> +ret:
> >>> + kfree(req);
> >>> + return err;
> >>> +};
> >>> +
> >>> +/* The asynchronous flush callback function */
> >>> +int async_pmem_flush(struct nd_region *nd_region, struct bio *bio)
> >>> +{
> >>> + int rc = 0;
> >>> +
> >>> + /* Create child bio for asynchronous flush and chain with
> >>> +  * parent bio. Otherwise directly call nd_region flush.
> >>> +  */
> >>> + if (bio && bio->bi_iter.bi_sector != -1) {
> >>> + struct bio *child = bio_alloc(GFP_ATOMIC, 0);
> >>> +
> >>> + if (!child)
> >>> + return -ENOMEM;
> >>> + bio_copy_dev(child, bio);
> >>> + child->bi_opf = REQ_PREFLUSH;
> >>> + child->bi_iter.bi_sector = -1;
> >>> + bio_chain(child, bio);
> >>> + submit_bio(child);
> >>
> >> return 0;
> >>
> >> Then, drop the "else" case and "int rc" and do directly
> >>
> >> if (virtio_pmem_flush(nd_region))
> >>return -EIO;
> > 
> > and another 'return 0' here :)
> > 
> > I don't like returning from multiple places; I prefer a
> > single exit point from the function.
> 
> Makes this function more complicated than necessary. I agree when there
> are locks involved.

o.k. I will change as you suggest :)

> 
> >  
> >>
> >>> +
> >>> + return 0;
> >>> +};
> >>> +
> >>> +static int virtio_pmem_probe(struct virtio_device *vdev)
> >>> +{
> >>> + int err = 0;
> >>> + struct resource res;
> >>> + struct virtio_pmem *vpmem;
> >>> + struct nd_region_desc ndr_desc = {};
> >>> + int nid = dev_to_node(&vdev->dev);
> >>> + struct nd_region *nd_region;
> >>
> >> Nit: use reverse Christmas tree layout :)
> > 
> > Done.
> > 
> >>
> >>> +
> >>> + if (!vdev->config->get) {
> >>> + dev_err(&vdev->dev, "%s failure: config access disabled\n",
> >>> + __func__);
> >>> + return -EINVAL;
> >>> + }
> >>> +
> >>> + vpmem = devm_kzalloc(&vdev->dev, sizeof(*vpmem), GFP_KERNEL);
> >>> + if (!vpmem) {
> >>> + err = -ENOMEM;
> >>> + goto out_err;
> >>> + }
> >>> +
> >>> + vpmem->vdev = vdev;
> >>> + vdev->priv = vpmem;
> >>> + err = init_vq(vpmem);
> >>> + if (err)
> >>> + goto out_err;
> >>> +
> >>> + virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> >>> + start, &vpmem->start);
> >>> + virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> >>> + size, &vpmem->size);
> >>> +
> >>> + res.start = vpmem->start;
> >>> + res.end   = vpmem->start + vpmem->size-1;
> >>> + vpmem->nd_desc.provider_name = "virtio-pmem";
> >>> + vpmem->nd_desc.module = THIS_MODULE;
> >>> +
> >>> + vpmem->nvdimm_bus = nvdimm_bus_register(&vdev->dev,
> >>> + &vpmem->nd_desc);
> >>> + if (!vpmem->nvdimm_bus)
> >>> + goto out_vq;
> >>> +
> >>> + dev_set_drvdata(&vdev->dev, vpmem->nvdimm_bus);
> >>> +
> >>> + ndr_desc.res = &res;
> >>> + ndr_desc.numa_node = nid;
> >>> + ndr_desc.flush = async_pmem_flush;
> >>> + set_bit(ND_REGION_PAGEMAP, &ndr_desc.flags);
> >>> + set_bit(ND_REGION_ASYNC, &ndr_desc.flags);
> >>> + nd_region = nvdimm_pmem_region_create(vpmem->nvdimm_bus, &ndr_desc);
> >>> +
> >>
> >> I'd drop this empty line.
> > 
> > hmm.
> > 
> 
> The common pattern is, after allocating something, to check for it
> immediately in the next line (like you do throughout this patch ;) )

Right. But on the rare occasions when I think a blank line makes the code
prettier, I tend to add one. Maybe I should not :)

> 
> ...
> >> You are not freeing "vdev->priv".
> > 
> > vdev->priv is vpmem which is allocated using devm API.
> 
> I'm confused. Looking at drivers/virtio/virtio_balloon.c:
> 
> static void virtballoon_remove(struct virtio_device *vdev)
> {
>   struct virtio_balloon *vb = vdev->priv;
> 
>   ...
> 
>   kfree(vb);
> }
> 
> I think you should do the same here, vdev->priv is allocated in
> virtio_pmem_probe.
> 
> But maybe I am missing something important here :)

That's because virtio_balloon uses "kzalloc" for the allocation, so it needs
to be freed explicitly. virtio pmem uses "devm_kzalloc", which automatically
frees the memory when the associated device is detached.

Thanks,
Pankaj
> 
> >>
> >>> + nvdimm_bus_unregister(nvdimm_bus);
> >>> + vdev->config->del_vqs(vdev);
> >>> + vdev->config->reset(vdev);
> >>> +}
> >>> +
> 
> --
> 
> Thanks,
> 
> David / dhildenb
>

Re: [Qemu-devel] [libvirt] QMP; unsigned 64-bit ints; JSON standards compliance

2019-05-14 Thread Daniel P . Berrangé

On Tue, May 14, 2019 at 10:37:55AM +0100, Dr. David Alan Gilbert wrote:
> * Daniel P. Berrangé (berra...@redhat.com) wrote:
> > On Tue, May 14, 2019 at 08:02:49AM +0200, Markus Armbruster wrote:
> > > Eric Blake  writes:
> > > 
> > > > On 5/13/19 8:53 AM, Markus Armbruster wrote:
> > > >
> > > >>> We have a few options
> > > >>>
> > > >>>  1. Use string format for values > 2^53-1, int format below that
> > > >>>  2. Use string format for all fields which are 64-bit ints whether
> > > >>> signed or unsigned
> > > >>>  3. Use string format for all fields which are integers, even 32-bit
> > > >>> ones
> > > >>>
> > > >>> I would probably suggest option 2. It would make the QEMU impl quite
> > > >>> easy IIUC, we'd just change the QAPI visitor's impl for the int64
> > > >>> and uint64 fields to use string format (when the right capability is
> > > >>> negotiated by QMP).
> > > >>>
> > > >>> I include 3 only for completeness - I don't think there's a hugely
> > > >>> compelling reason to mess with 32-bit ints.
> > > >> 
> > > >> Agree.
> > > >
> > > > Other than if we ever change the type of a QMP integer. Right now, if we
> > > > widen from 'int32' to 'int' (aka 'int64'), it is invisible to clients;
> > > > but once we start stringizing 64-bit numbers (at client request) but NOT
> > > > 32-bit numbers, then changing a type from 32 to 64 bits (or the
> > > > converse) becomes an API change to clients. Introspection will at least
> > > > let a client know which form to expect, but it does mean we have to be
> > > > more aware of typing issues going forward.
> > > 
> > > Thank you so much for helping my old synapses finally fire!  Option 2 is
> > > not what we thought it is.  Let me explain.
> > > 
> > > Introspection reports *all* QAPI integer types as "int".  This is
> > > deliberate.
> > > 
> > > So, when the client that negotiated the interoperability capability sees
> > > "int", it has to accept *both* integer encodings: JSON number and JSON
> > > string.
> > > 
> > > The difference between option 1 and option 2 for the client is that
> > > option 2 will use only one encoding.  But the client must not rely on
> > > that!  Another QEMU version may well use the other encoding (because we
> > > narrowed or widened the QAPI integer type in the QAPI schema).
> > > 
> > > Elsewhere in this thread, David pointed out that option 1 complicates
> > > testing QEMU: full coverage requires passing both a small number (to
> > > cover JSON number encoding) and a large number (to cover JSON string
> > > encoding), to which I replied that there are very few places to test.
> > > 
> > > Option 2 complicates testing clients: full coverage requires testing
> > > with both a version of QEMU (or a mock-up) that uses wide integers
> > > (encoded as JSON string) and narrow integers (encoded as JSON number).
> > > Impractical.
> > > 
> > > >>> Option 1 is the bare minimum needed to ensure precision, but to me
> > > >>> it feels a bit dirty to say a given field will have different encoding
> > > >>> depending on the value. If apps need to deal with string encoding, they
> > > >>> might as well just use it for all values in a given field.
> > > >> 
> > > >> I guess that depends on what this interoperability capability does for
> > > >> QMP *input*.
> > > >
> > > > "Be liberal in what you accept, strict in what you produce" - that
> > > > argues we should accept both forms on input (it's easy enough to ALWAYS
> > > > permit a string in place of an integer, and to take an in-range integer
> > > > even when we would in turn output it as a string).
> > > 
> > > With option 2, QEMU *has* to be liberal in what it accepts, because the
> > > client cannot deduce from introspection whether the integer is wide or
> > > narrow.
> > > 
> > > [...]
> > > 
> > > Daniel, you wrote you'd probably suggest option 2.  Would you like to
> > > reconsider?
> > 
> > Based on the above, let me try & summarize what we need the behaviour to be:
> > 
> >   - Integer mode (current default):
> > 
> >- QEMU & clients MUST format integer fields as numbers
> >  regardless of size
> > 
> >- QEMU & clients MUST parse number format for any integer
> >  fields
> > 
> >   - String mode:
> > 
> >- QEMU & clients MUST format integer fields as strings
> >  if their value cannot fit in a 32-bit integer.
> > 
> >- QEMU & clients MAY format integer fields as strings
> >  even if their value can fit in a 32-bit integer.
> > 
> >- QEMU & clients MUST parse both string and number format
> >  for any integer fields.
> > 
> > Unless I'm missing something, this should ensure we don't lose precision,
> > can always parse large numbers, and can internally change QEMU precision
> > from int8/16/32 up to full int64 without breaking clients.
> 
> But we could be stricter and simpler in string mode:
> 
>   - QEMU & clients MUST format integer fields as strings, always
>   - QEMU & clients MUST p
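The "string mode" rules summarized above (MUST emit a string when the value does not fit in 32 bits, MUST accept both encodings on input) can be sketched as a pair of C helpers. The names encode_int_field() and decode_int_field() are hypothetical; the sketch covers only signed int64 (an unsigned variant would use strtoull() the same way) and works on a single already-tokenized JSON value rather than QEMU's real QAPI visitors. Note that strtoll() also tolerates leading whitespace and a '+' sign, which a strict implementation would reject.

```c
#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Encode: values outside the 32-bit range MUST be strings; smaller ones
 * are left as plain JSON numbers here (the MAY case). */
static int encode_int_field(char *buf, size_t len, int64_t v)
{
    if (v < INT32_MIN || v > INT32_MAX)
        return snprintf(buf, len, "\"%" PRId64 "\"", v);
    return snprintf(buf, len, "%" PRId64, v);
}

/* Decode: accept both 123 and "123". Returns false on malformed input
 * or on overflow of int64_t. */
static bool decode_int_field(const char *tok, int64_t *out)
{
    char digits[32];
    size_t n = strlen(tok);

    if (n >= 2 && tok[0] == '"' && tok[n - 1] == '"') {
        if (n - 2 >= sizeof(digits))
            return false;
        memcpy(digits, tok + 1, n - 2);   /* strip the quotes */
        digits[n - 2] = '\0';
    } else {
        if (n >= sizeof(digits))
            return false;
        memcpy(digits, tok, n + 1);       /* copy including the NUL */
    }
    if (digits[0] == '\0')
        return false;

    char *end;
    errno = 0;
    long long v = strtoll(digits, &end, 10);
    if (errno == ERANGE || *end != '\0')
        return false;
    *out = v;
    return true;
}
```

Under the stricter variant proposed in the reply (always strings), encode_int_field() would simply always take the quoted branch; the decoder would stay the same.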

Re: [Qemu-devel] [PATCH] migration: Fix use-after-free during process exit

2019-05-14 Thread Yury Kotov

Ping ping

17.04.2019, 15:44, "Yury Kotov" :
> Ping
>
> 08.04.2019, 14:34, "Yury Kotov" :
>>  It fixes a heap-use-after-free which was found by clang's ASAN.
>>
>>  Control flow of this use-after-free:
>>  main_thread:
>>  * Got SIGTERM and completes main loop
>>  * Calls migration_shutdown
>>    - migrate_fd_cancel (so, migration_thread begins to complete)
>>    - object_unref(OBJECT(current_migration));
>>
>>  migration_thread:
>>  * migration_iteration_finish -> schedule cleanup bh
>>  * object_unref(OBJECT(s)); (Now, current_migration is freed)
>>  * exits
>>
>>  main_thread:
>>  * Calls vm_shutdown -> drain bdrvs -> main loop
>>    -> cleanup_bh -> use after free
>>
>>  If you want to reproduce it, these two sleeps will help:
>>  vl.c:4613:
>>   migration_shutdown();
>>  + sleep(2);
>>  migration.c:3269:
>>  + sleep(1);
>>   trace_migration_thread_after_loop();
>>   migration_iteration_finish(s);
>>
>>  Original output:
>>  qemu-system-x86_64: terminating on signal 15 from pid 31980 (> process>)
>>  =
>>  ==31958==ERROR: AddressSanitizer: heap-use-after-free on address 0x6191d210
>>    at pc 0x58a535ca bp 0x7fffb190 sp 0x7fffb188
>>  READ of size 8 at 0x6191d210 thread T0 (qemu-vm-0)
>>  #0 0x58a535c9 in migrate_fd_cleanup migration/migration.c:1502:23
>>  #1 0x594fde0a in aio_bh_call util/async.c:90:5
>>  #2 0x594fe522 in aio_bh_poll util/async.c:118:13
>>  #3 0x59524783 in aio_poll util/aio-posix.c:725:17
>>  #4 0x59504fb3 in aio_wait_bh_oneshot util/aio-wait.c:71:5
>>  #5 0x573bddf6 in virtio_blk_data_plane_stop
>>    hw/block/dataplane/virtio-blk.c:282:5
>>  #6 0x589d5c09 in virtio_bus_stop_ioeventfd hw/virtio/virtio-bus.c:246:9
>>  #7 0x589e9917 in virtio_pci_stop_ioeventfd hw/virtio/virtio-pci.c:287:5
>>  #8 0x589e22bf in virtio_pci_vmstate_change hw/virtio/virtio-pci.c:1072:9
>>  #9 0x57628931 in virtio_vmstate_change hw/virtio/virtio.c:2257:9
>>  #10 0x57c36713 in vm_state_notify vl.c:1605:9
>>  #11 0x5716ef53 in do_vm_stop cpus.c:1074:9
>>  #12 0x5716eeff in vm_shutdown cpus.c:1092:12
>>  #13 0x57c4283e in main vl.c:4617:5
>>  #14 0x7fffdfdb482f in __libc_start_main
>>    (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
>>  #15 0x56ecb118 in _start (x86_64-softmmu/qemu-system-x86_64+0x1977118)
>>
>>  0x6191d210 is located 144 bytes inside of 952-byte region
>>    [0x6191d180,0x6191d538)
>>  freed by thread T6 (live_migration) here:
>>  #0 0x56f76782 in __interceptor_free
>>    /tmp/final/llvm.src/projects/compiler-rt/lib/asan/asan_malloc_linux.cc:124:3
>>  #1 0x58d5fa94 in object_finalize qom/object.c:618:9
>>  #2 0x58d57651 in object_unref qom/object.c:1068:9
>>  #3 0x58a55588 in migration_thread migration/migration.c:3272:5
>>  #4 0x595393f2 in qemu_thread_start util/qemu-thread-posix.c:502:9
>>  #5 0x7fffe057f6b9 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76b9)
>>
>>  previously allocated by thread T0 (qemu-vm-0) here:
>>  #0 0x56f76b03 in __interceptor_malloc
>>    /tmp/final/llvm.src/projects/compiler-rt/lib/asan/asan_malloc_linux.cc:146:3
>>  #1 0x76ee37b8 in g_malloc (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x4f7b8)
>>  #2 0x58d58031 in object_new qom/object.c:640:12
>>  #3 0x58a31f21 in migration_object_init migration/migration.c:139:25
>>  #4 0x57c41398 in main vl.c:4320:5
>>  #5 0x7fffdfdb482f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
>>
>>  Thread T6 (live_migration) created by T0 (qemu-vm-0) here:
>>  #0 0x56f5f0dd in pthread_create
>>    /tmp/final/llvm.src/projects/compiler-rt/lib/asan/asan_interceptors.cc:210:3
>>  #1 0x59538cf9 in qemu_thread_create util/qemu-thread-posix.c:539:11
>>  #2 0x58a53304 in migrate_fd_connect migration/migration.c:3332:5
>>  #3 0x58a72bd8 in migration_channel_connect migration/channel.c:92:5
>>  #4 0x58a6ef87 in exec_start_outgoing_migration migration/exec.c:42:5
>>  #5 0x58a4f3c2 in qmp_migrate migration/migration.c:1922:9
>>  #6 0x58bb4f6a in qmp_marshal_migrate qapi/qapi-commands-migration.c:607:5
>>  #7 0x59363738 in do_qmp_dispatch qapi/qmp-dispatch.c:131:5
>>  #8 0x59362a15 in qmp_dispatch qapi/qmp-dispatch.c:174:11
>>  #9 0x571bac15 in monitor_qmp_dispatch monitor.c:4124:11
>>  #10 0x5719a22d in monitor_qmp_bh_dispatcher monitor.c:4207:9
>>  #11 0x594fde0a in aio_bh_call util/async.c:90:5
>>  #12 0x594fe522 in aio_bh_poll util/async.c:118:13
>>  #13 0x595201e0 in aio_dispatch util/aio-posix.c:460:5
>>  #14 0x59503553 in aio_ctx_dispatch util/async.c:261:5
>>  #15 0x76ede196
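The control flow in the patch description above comes down to a deferred cleanup callback (a bottom half) running after the last reference to the object it touches has been dropped. One common way to close that kind of race is for whoever schedules the callback to take its own reference and for the callback to drop it. The sketch below models the sequence single-threaded in plain C; struct obj, schedule_cleanup() and run_pending_bhs() are hypothetical stand-ins, not QEMU's QOM or AIO APIs, and this is not necessarily how the patch itself fixes the bug.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical refcounted object, standing in for the migration state. */
struct obj {
    int refcount;
    bool cleaned_up;
};

static int objects_freed;

static void obj_ref(struct obj *o)
{
    o->refcount++;
}

static void obj_unref(struct obj *o)
{
    if (--o->refcount == 0) {
        free(o);
        objects_freed++;
    }
}

/* One-slot queue of deferred callbacks run later by the "main loop". */
typedef void (*bh_fn)(struct obj *);
static bh_fn pending_fn;
static struct obj *pending_arg;

static void cleanup_bh(struct obj *o)
{
    o->cleaned_up = true;   /* touches the object: it must still be alive */
    obj_unref(o);           /* drop the reference taken when scheduling */
}

/* Scheduling takes its own reference, so the object outlives every
 * other reference until the bottom half has run. */
static void schedule_cleanup(struct obj *o)
{
    obj_ref(o);
    pending_fn = cleanup_bh;
    pending_arg = o;
}

static void run_pending_bhs(void)
{
    if (pending_fn) {
        bh_fn fn = pending_fn;
        pending_fn = NULL;
        fn(pending_arg);
    }
}
```

In the test sequence below, the "migration thread" schedules cleanup and drops its reference, the "main thread" drops its reference at shutdown, and the object is only freed once the pending bottom half has actually run.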

Re: [Qemu-devel] [RFC PATCH] QEMU may write to system_memory before guest starts

2019-05-14 Thread Yury Kotov

Ping ping

17.04.2019, 15:46, "Yury Kotov" :
> Ping
>
> 04.04.2019, 13:01, "Yury Kotov" :
>>  I saw Catherine Ho's patch series and it seems OK to me, but in this RFC I
>>  asked about a way to detect other writes which may not be covered by
>>  particular fixes.
>>
>>  Regards,
>>  Yury
>>
>>  04.04.2019, 12:52, "Dr. David Alan Gilbert" :
>>>   * Юрий Котов (yury-ko...@yandex-team.ru) wrote:
    Ping
>>>
>>>   Is this fixed by Catherine Ho's patch series?
>>>
>>>   Dave
>>>
    21.03.2019, 19:27, "Yury Kotov" :
    > Hi,
    >
    > 19.03.2019, 14:52, "Dr. David Alan Gilbert" :
    >>  * Peter Maydell (peter.mayd...@linaro.org) wrote:
    >>>   On Tue, 19 Mar 2019 at 11:03, Dr. David Alan Gilbert
    >>>    wrote:
    >>>   >
    >>>   > * Peter Maydell (peter.mayd...@linaro.org) wrote:
    >>>   > > I didn't think migration distinguished between "main memory"
    >>>   > > and any other kind of RAMBlock-backed memory ?
    >>>   >
    >>>   > In Yury's case there's a distinction between RAMBlocks that are mapped
    >>>   > with RAM_SHARED (which normally ends up as MAP_SHARED) and all others.
    >>>   > You can set that for main memory by using -numa to specify a memdev
    >>>   > that's backed by a file and has the share=on property.
    >>>   >
    >>>   > On x86 the ROMs end up as separate RAMBlocks that aren't affected
    >>>   > by that -numa/share=on - so they don't fight Yury's trick.
    >>>
    >>>   You can use the generic loader on x86 to load an ELF file
    >>>   into RAM if you want, which would I think also trigger this.
    >>
    >>  OK, although that doesn't worry me too much - since in the majority
    >>  of cases Yury's trick still works well.
    >>
    >>  I wonder if there's a way to make Yury's code detect these cases
    >>  and not allow the feature; the best thing for the moment would seem to
    >>  be to skip the aarch test that uses elf loading.
    >
    > Currently, I have no idea how to detect such cases, but it is possible to
    > detect memory corruption. I want to update the RFC patch to let the user map
    > some memory regions as read-only until incoming migration starts.
    >
    > E.g.
    > 1) If x-ignore-shared is enabled on the command line or the memory region
    >    is marked (something like ',readonly=on'),
    > 2) Memory region is shared (,share=on),
    > 3) And qemu is started with the '-incoming' option
    >
    > Then map such regions as read-only until incoming migration finishes.
    > Thus, the patch will be able to detect memory corruption and will not
    > affect normal cases.
    >
    > What do you think, is it needed?
    >
    > I already have a cleaner version of the RFC patch, but I'm not sure about 1).
    > Which way is better: enable a capability on the command line, add a new
    > option for memory-backend, or something else?
    >
    >>  Dave
    >>
    >>>   thanks
    >>>   -- PMM
    >>  --
    >>  Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
    >
    > Regards,
    > Yury
>>>   --
>>>   Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

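The idea discussed in this RFC (keep shared memory regions read-only until an incoming migration starts, so that a stray early write faults instead of silently corrupting guest RAM) can be illustrated with plain POSIX mmap()/mprotect(). This is a minimal sketch under assumptions: an anonymous MAP_SHARED mapping stands in for a share=on memory backend, and map_shared_readonly()/allow_writes() are hypothetical helpers, not QEMU functions. Any write to the region before allow_writes() is called would raise SIGSEGV, which is the detection mechanism described above.

```c
#define _DEFAULT_SOURCE     /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map a shared region read-only while waiting for incoming migration.
 * Returns NULL on failure. */
static void *map_shared_readonly(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ, MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Called once migration starts: from now on writes are legitimate.
 * Returns 0 on success, -1 on error (errno set by mprotect). */
static int allow_writes(void *p, size_t len)
{
    return mprotect(p, len, PROT_READ | PROT_WRITE);
}
```

Reads are allowed throughout; only the switch to PROT_WRITE marks the point after which QEMU may legitimately write to the region.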
Re: [Qemu-devel] [PATCH v8 2/6] virtio-pmem: Add virtio pmem driver

2019-05-14 Thread David Hildenbrand

>>
>> I think you should do the same here, vdev->priv is allocated in
>> virtio_pmem_probe.
>>
>> But maybe I am missing something important here :)
> 
> That's because virtio_balloon uses "kzalloc" for the allocation, so it needs
> to be freed explicitly. virtio pmem uses "devm_kzalloc", which automatically
> frees the memory when the associated device is detached.

Hehe, thanks, that was the part that I was missing!
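The point settled here is about ownership: with devm_kzalloc(), the allocation is recorded against the device and released by the driver core when the device is detached, so the driver's remove path does not call kfree() itself. The C sketch below is a hypothetical userspace model of that bookkeeping (fake_devm_kzalloc() and fake_device_detach() are invented names, not the kernel's devres implementation).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Each managed allocation carries a list link in front of its payload. */
struct managed_res {
    struct managed_res *next;
    max_align_t payload[];   /* keeps the payload suitably aligned */
};

struct fake_device {
    struct managed_res *res_list;
    int live_allocs;
};

/* Zeroed allocation tied to the device's lifetime. */
static void *fake_devm_kzalloc(struct fake_device *dev, size_t size)
{
    struct managed_res *r = calloc(1, sizeof(*r) + size);

    if (!r)
        return NULL;
    r->next = dev->res_list;
    dev->res_list = r;
    dev->live_allocs++;
    return r->payload;
}

/* Detach releases every managed allocation; the driver never frees them. */
static void fake_device_detach(struct fake_device *dev)
{
    while (dev->res_list) {
        struct managed_res *r = dev->res_list;

        dev->res_list = r->next;
        free(r);
        dev->live_allocs--;
    }
}
```

The driver-side counterpart is simply not calling free on the managed pointer in its remove function, which is why virtio_pmem's remove path has no kfree(vpmem) while virtio_balloon's does.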

-- 

Thanks,

David / dhildenb

[Qemu-devel] [PATCH 0/4] Kconfig switches for core / misc devices

2019-05-14 Thread Thomas Huth

Here are some more Kconfig patches that introduce proper config
switches for some devices in the hw/core and hw/misc directories.

Thomas Huth (4):
  hw/core: Add a config switch for the "register" device
  hw/core: Add a config switch for the "or-irq" device
  hw/core: Add a config switch for the "split-irq" device
  hw/misc: Add a config switch for the "unimplemented" device

 hw/arm/Kconfig| 12 
 hw/core/Kconfig   |  9 +
 hw/core/Makefile.objs |  6 +++---
 hw/dma/Kconfig|  1 +
 hw/microblaze/Kconfig |  1 +
 hw/misc/Kconfig   |  3 +++
 hw/misc/Makefile.objs |  2 +-
 hw/pci-host/Kconfig   |  3 ++-
 hw/sparc64/Kconfig|  1 +
 hw/timer/Kconfig  |  1 +
 10 files changed, 34 insertions(+), 5 deletions(-)

-- 
2.21.0

[Qemu-devel] [PATCH 2/4] hw/core: Add a config switch for the "or-irq" device

2019-05-14 Thread Thomas Huth

The "or-irq" device is only used by certain machines. Let's add
a proper config switch for it so that it only gets compiled when we
really need it.

Signed-off-by: Thomas Huth 
---
 hw/arm/Kconfig| 2 ++
 hw/core/Kconfig   | 3 +++
 hw/core/Makefile.objs | 2 +-
 hw/pci-host/Kconfig   | 3 ++-
 4 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
index af8cffde9c..0bb3bbe9d3 100644
--- a/hw/arm/Kconfig
+++ b/hw/arm/Kconfig
@@ -277,6 +277,7 @@ config RASPI
 config STM32F205_SOC
 bool
 select ARM_V7M
+select OR_IRQ
 select STM32F2XX_TIMER
 select STM32F2XX_USART
 select STM32F2XX_SYSCFG
@@ -424,6 +425,7 @@ config ARMSSE
 select IOTKIT_SECCTL
 select IOTKIT_SYSCTL
 select IOTKIT_SYSINFO
+select OR_IRQ
 select TZ_MPC
 select TZ_MSC
 select TZ_PPC
diff --git a/hw/core/Kconfig b/hw/core/Kconfig
index d11920fcb3..984143456a 100644
--- a/hw/core/Kconfig
+++ b/hw/core/Kconfig
@@ -7,6 +7,9 @@ config PTIMER
 config FITLOADER
 bool
 
+config OR_IRQ
+bool
+
 config PLATFORM_BUS
 bool
 
diff --git a/hw/core/Makefile.objs b/hw/core/Makefile.objs
index d493a051ee..dd2c2ca812 100644
--- a/hw/core/Makefile.objs
+++ b/hw/core/Makefile.objs
@@ -17,7 +17,7 @@ common-obj-$(CONFIG_SOFTMMU) += loader.o
 common-obj-$(CONFIG_FITLOADER) += loader-fit.o
 common-obj-$(CONFIG_SOFTMMU) += qdev-properties-system.o
 common-obj-$(CONFIG_REGISTER) += register.o
-common-obj-$(CONFIG_SOFTMMU) += or-irq.o
+common-obj-$(CONFIG_OR_IRQ) += or-irq.o
 common-obj-$(CONFIG_SOFTMMU) += split-irq.o
 common-obj-$(CONFIG_PLATFORM_BUS) += platform-bus.o
 common-obj-$(CONFIG_SOFTMMU) += generic-loader.o
diff --git a/hw/pci-host/Kconfig b/hw/pci-host/Kconfig
index 8c16d96b3f..1edc1a31d4 100644
--- a/hw/pci-host/Kconfig
+++ b/hw/pci-host/Kconfig
@@ -2,8 +2,9 @@ config PAM
 bool
 
 config PREP_PCI
-select PCI
 bool
+select PCI
+select OR_IRQ
 
 config GRACKLE_PCI
 select PCI
-- 
2.21.0

[Qemu-devel] [PATCH 1/4] hw/core: Add a config switch for the "register" device

2019-05-14 Thread Thomas Huth

The "register" device is only used by certain machines. Let's add
a proper config switch for it so that it only gets compiled when we
really need it.

Signed-off-by: Thomas Huth 
---
 hw/core/Kconfig   | 3 +++
 hw/core/Makefile.objs | 2 +-
 hw/dma/Kconfig| 1 +
 hw/timer/Kconfig  | 1 +
 4 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/hw/core/Kconfig b/hw/core/Kconfig
index c2a1ae8122..d11920fcb3 100644
--- a/hw/core/Kconfig
+++ b/hw/core/Kconfig
@@ -9,3 +9,6 @@ config FITLOADER
 
 config PLATFORM_BUS
 bool
+
+config REGISTER
+bool
diff --git a/hw/core/Makefile.objs b/hw/core/Makefile.objs
index a799c83815..d493a051ee 100644
--- a/hw/core/Makefile.objs
+++ b/hw/core/Makefile.objs
@@ -16,7 +16,7 @@ common-obj-$(CONFIG_SOFTMMU) += machine.o
 common-obj-$(CONFIG_SOFTMMU) += loader.o
 common-obj-$(CONFIG_FITLOADER) += loader-fit.o
 common-obj-$(CONFIG_SOFTMMU) += qdev-properties-system.o
-common-obj-$(CONFIG_SOFTMMU) += register.o
+common-obj-$(CONFIG_REGISTER) += register.o
 common-obj-$(CONFIG_SOFTMMU) += or-irq.o
 common-obj-$(CONFIG_SOFTMMU) += split-irq.o
 common-obj-$(CONFIG_PLATFORM_BUS) += platform-bus.o
diff --git a/hw/dma/Kconfig b/hw/dma/Kconfig
index 751dec5426..5c61b67bc0 100644
--- a/hw/dma/Kconfig
+++ b/hw/dma/Kconfig
@@ -16,6 +16,7 @@ config I8257
 
 config ZYNQ_DEVCFG
 bool
+select REGISTER
 
 config STP2000
 bool
diff --git a/hw/timer/Kconfig b/hw/timer/Kconfig
index 51921eb63f..f575481210 100644
--- a/hw/timer/Kconfig
+++ b/hw/timer/Kconfig
@@ -36,6 +36,7 @@ config TWL92230
 
 config XLNX_ZYNQMP
 bool
+select REGISTER
 
 config ALTERA_TIMER
 bool
-- 
2.21.0

[Qemu-devel] [PATCH 4/4] hw/misc: Add a config switch for the "unimplemented" device

2019-05-14 Thread Thomas Huth

The device is only used by certain Arm boards. Now that we have
fine-grained Kconfig for these machines, too, we can enable the
"unimplemented" devices only for the machines that really need it.

Signed-off-by: Thomas Huth 
---
 hw/arm/Kconfig| 9 +
 hw/microblaze/Kconfig | 1 +
 hw/misc/Kconfig   | 3 +++
 hw/misc/Makefile.objs | 2 +-
 hw/sparc64/Kconfig| 1 +
 5 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
index ac1e94f63a..7e261f5d73 100644
--- a/hw/arm/Kconfig
+++ b/hw/arm/Kconfig
@@ -201,6 +201,7 @@ config STELLARIS
 select SSI_SD
 select STELLARIS_INPUT
 select STELLARIS_ENET # ethernet
+select UNIMP
 
 config STRONGARM
 bool
@@ -267,6 +268,7 @@ config ALLWINNER_A10
 select ALLWINNER_A10_PIC
 select ALLWINNER_EMAC
 select SERIAL
+select UNIMP
 
 config RASPI
 bool
@@ -304,6 +306,7 @@ config XLNX_VERSAL
 select PL011
 select CADENCE
 select VIRTIO_MMIO
+select UNIMP
 
 config FSL_IMX25
 bool
@@ -339,6 +342,7 @@ config ASPEED_SOC
 select SSI_M25P80
 select TMP105
 select TMP421
+select UNIMP
 
 config MPS2
 bool
@@ -360,6 +364,7 @@ config FSL_IMX7
 select IMX_I2C
 select PCI_EXPRESS_DESIGNWARE
 select SDHCI
+select UNIMP
 
 config ARM_SMMUV3
 bool
@@ -371,6 +376,7 @@ config FSL_IMX6UL
 select IMX_FEC
 select IMX_I2C
 select SDHCI
+select UNIMP
 
 config MICROBIT
 bool
@@ -380,6 +386,7 @@ config NRF51_SOC
 bool
 select I2C
 select ARM_V7M
+select UNIMP
 
 config EMCRAFT_SF2
 bool
@@ -392,6 +399,7 @@ config MSF2
 select PTIMER
 select SERIAL
 select SSI
+select UNIMP
 
 config ZAURUS
 bool
@@ -430,6 +438,7 @@ config ARMSSE
 select TZ_MPC
 select TZ_MSC
 select TZ_PPC
+select UNIMP
 
 config ARMSSE_CPUID
 bool
diff --git a/hw/microblaze/Kconfig b/hw/microblaze/Kconfig
index c4dc120973..e2697ced9c 100644
--- a/hw/microblaze/Kconfig
+++ b/hw/microblaze/Kconfig
@@ -4,6 +4,7 @@ config PETALOGIX_S3ADSP1800
 select XILINX
 select XILINX_AXI
 select XILINX_ETHLITE
+select UNIMP
 
 config PETALOGIX_ML605
 bool
diff --git a/hw/misc/Kconfig b/hw/misc/Kconfig
index 385e1b0cec..51754bb47c 100644
--- a/hw/misc/Kconfig
+++ b/hw/misc/Kconfig
@@ -117,4 +117,7 @@ config AUX
 bool
 select I2C
 
+config UNIMP
+bool
+
 source macio/Kconfig
diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs
index c71e07ae35..7a0902c76f 100644
--- a/hw/misc/Makefile.objs
+++ b/hw/misc/Makefile.objs
@@ -9,7 +9,7 @@ common-obj-$(CONFIG_PCI_TESTDEV) += pci-testdev.o
 common-obj-$(CONFIG_EDU) += edu.o
 common-obj-$(CONFIG_PCA9552) += pca9552.o
 
-common-obj-y += unimp.o
+common-obj-$(CONFIG_UNIMP) += unimp.o
 common-obj-$(CONFIG_FW_CFG_DMA) += vmcoreinfo.o
 
 # ARM devices
diff --git a/hw/sparc64/Kconfig b/hw/sparc64/Kconfig
index d4d76a89be..f9f8b0f73a 100644
--- a/hw/sparc64/Kconfig
+++ b/hw/sparc64/Kconfig
@@ -17,3 +17,4 @@ config NIAGARA
 bool
 select EMPTY_SLOT
 select SUN4V_RTC
+select UNIMP
-- 
2.21.0

Re: [Qemu-devel] [PATCH v2 1/2] vl: Deprecate -virtfs_synth

2019-05-14 Thread Thomas Huth

On 13/05/2019 12.34, Greg Kurz wrote:
> The synth fsdriver never got used for anything else but the QTest
> testcase for VirtIO 9P. And even there, QTest uses -fsdev synth and
> -device virtio-9p-... directly.
> 
> Signed-off-by: Greg Kurz 
> ---
> v2: - change "no replacement" to "use '-fsdev synth' instead"
> ---
>  qemu-deprecated.texi |5 +
>  qemu-options.hx  |3 ++-
>  vl.c |4 
>  3 files changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/qemu-deprecated.texi b/qemu-deprecated.texi
> index 842e71b11dcc..1a821b68f435 100644
> --- a/qemu-deprecated.texi
> +++ b/qemu-deprecated.texi
> @@ -72,6 +72,11 @@ backend settings instead of environment variables.  To ease migration to
>  the new format, the ``-audiodev-help'' option can be used to convert
>  the current values of the environment variables to ``-audiodev'' options.
>  
> +@subsection -virtfs_synth (since 4.1)
> +
> +The ``-virtfs_synth'' argument is now deprecated. Please use ``-fsdev synth''
> +and ``-device virtio-9p-...'' instead.
> +
>  @section QEMU Machine Protocol (QMP) commands
>  
>  @subsection block-dirty-bitmap-add "autoload" parameter (since 2.12.0)
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 51802cbb266a..03c50ba0f0b2 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -1368,7 +1368,8 @@ DEF("virtfs_synth", 0, QEMU_OPTION_virtfs_synth,
>  STEXI
>  @item -virtfs_synth
>  @findex -virtfs_synth
> -Create synthetic file system image
> +Create synthetic file system image. Note that this option is now deprecated.
> +Please use @code{-fsdev synth} and @code{-device virtio-9p-...} instead.
>  ETEXI
>  
>  DEF("iscsi", HAS_ARG, QEMU_OPTION_iscsi,
> diff --git a/vl.c b/vl.c
> index b6709514c1bb..8456f006edbd 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -3535,6 +3535,10 @@ int main(int argc, char **argv, char **envp)
>  QemuOpts *fsdev;
>  QemuOpts *device;
>  
> +warn_report("'-virtfs_synth' is deprecated, please use "
> + "'-fsdev synth' and '-device virtio-9p-...' "
> +"instead");
> +
>  fsdev = qemu_opts_create(qemu_find_opts("fsdev"), "v_synth",
>   1, NULL);
>  if (!fsdev) {
> 

Reviewed-by: Thomas Huth

[Qemu-devel] [PATCH 3/4] hw/core: Add a config switch for the "split-irq" device

2019-05-14 Thread Thomas Huth

The "split-irq" device is currently only used by machines that use
CONFIG_ARMSSE. Let's add a proper CONFIG_SPLIT_IRQ switch for this
so that it only gets compiled when we really need it.

Signed-off-by: Thomas Huth 
---
 hw/arm/Kconfig| 1 +
 hw/core/Kconfig   | 3 +++
 hw/core/Makefile.objs | 2 +-
 3 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
index 0bb3bbe9d3..ac1e94f63a 100644
--- a/hw/arm/Kconfig
+++ b/hw/arm/Kconfig
@@ -426,6 +426,7 @@ config ARMSSE
 select IOTKIT_SYSCTL
 select IOTKIT_SYSINFO
 select OR_IRQ
+select SPLIT_IRQ
 select TZ_MPC
 select TZ_MSC
 select TZ_PPC
diff --git a/hw/core/Kconfig b/hw/core/Kconfig
index 984143456a..fffb3d62b2 100644
--- a/hw/core/Kconfig
+++ b/hw/core/Kconfig
@@ -15,3 +15,6 @@ config PLATFORM_BUS
 
 config REGISTER
 bool
+
+config SPLIT_IRQ
+bool
diff --git a/hw/core/Makefile.objs b/hw/core/Makefile.objs
index dd2c2ca812..d8c908da14 100644
--- a/hw/core/Makefile.objs
+++ b/hw/core/Makefile.objs
@@ -18,7 +18,7 @@ common-obj-$(CONFIG_FITLOADER) += loader-fit.o
 common-obj-$(CONFIG_SOFTMMU) += qdev-properties-system.o
 common-obj-$(CONFIG_REGISTER) += register.o
 common-obj-$(CONFIG_OR_IRQ) += or-irq.o
-common-obj-$(CONFIG_SOFTMMU) += split-irq.o
+common-obj-$(CONFIG_SPLIT_IRQ) += split-irq.o
 common-obj-$(CONFIG_PLATFORM_BUS) += platform-bus.o
 common-obj-$(CONFIG_SOFTMMU) += generic-loader.o
 common-obj-$(CONFIG_SOFTMMU) += null-machine.o
-- 
2.21.0

Re: [Qemu-devel] [Qemu-arm] [PATCH 4/4] hw/misc: Add a config switch for the "unimplemented" device

2019-05-14 Thread Peter Maydell

On Tue, 14 May 2019 at 11:02, Thomas Huth  wrote:
>
> The device is only used by certain Arm boards. Now that we have
> fine-grained Kconfig for these machines, too, we can enable the
> "unimplemented" devices only for the machines that really need it.
>
> Signed-off-by: Thomas Huth 
> ---
>  hw/arm/Kconfig| 9 +
>  hw/microblaze/Kconfig | 1 +
>  hw/misc/Kconfig   | 3 +++
>  hw/misc/Makefile.objs | 2 +-
>  hw/sparc64/Kconfig| 1 +
>  5 files changed, 15 insertions(+), 1 deletion(-)

Commit message says "only Arm" but code change changes
sparc and microblaze Kconfigs too :-)

thanks
-- PMM

Re: [Qemu-devel] [PATCH v8 2/6] virtio-pmem: Add virtio pmem driver

2019-05-14 Thread Pankaj Gupta



> >>
> >> I think you should do the same here, vdev->priv is allocated in
> >> virtio_pmem_probe.
> >>
> >> But maybe I am missing something important here :)
> > 
> > That's because virtio_balloon uses "kzalloc" for the allocation, so it needs
> > to be freed explicitly. virtio pmem uses "devm_kzalloc", which automatically
> > frees the memory when the associated device is detached.
> 
> Hehe, thanks, that was the part that I was missing!

Thank you for the review.

Best regards,
Pankaj
> 
> --
> 
> Thanks,
> 
> David / dhildenb
>

Re: [Qemu-devel] [Qemu-arm] [PATCH 2/4] hw/core: Add a config switch for the "or-irq" device

2019-05-14 Thread Peter Maydell

On Tue, 14 May 2019 at 11:00, Thomas Huth  wrote:
>
> The "or-irq" device is only used by certain machines. Let's add
> a proper config switch for it so that it only gets compiled when we
> really need it.
>
> Signed-off-by: Thomas Huth 
> ---
>  hw/arm/Kconfig| 2 ++
>  hw/core/Kconfig   | 3 +++
>  hw/core/Makefile.objs | 2 +-
>  hw/pci-host/Kconfig   | 3 ++-
>  4 files changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
> index af8cffde9c..0bb3bbe9d3 100644
> --- a/hw/arm/Kconfig
> +++ b/hw/arm/Kconfig
> @@ -277,6 +277,7 @@ config RASPI
>  config STM32F205_SOC
>  bool
>  select ARM_V7M
> +select OR_IRQ
>  select STM32F2XX_TIMER
>  select STM32F2XX_USART
>  select STM32F2XX_SYSCFG
> @@ -424,6 +425,7 @@ config ARMSSE
>  select IOTKIT_SECCTL
>  select IOTKIT_SYSCTL
>  select IOTKIT_SYSINFO
> +select OR_IRQ
>  select TZ_MPC
>  select TZ_MSC
>  select TZ_PPC

In cases like this where a device is used both by
an SoC and also directly by the board code that uses
that SoC, should we put the select OR_IRQ only in
the SoC's config, or also in the board model's config
(ie, in "config MPS2" as well as "config ARMSSE") ?

thanks
-- PMM

Re: [Qemu-devel] [PATCH 0/4] Kconfig switches for core / misc devices

2019-05-14 Thread Paolo Bonzini

On 14/05/19 12:00, Thomas Huth wrote:
> Here are some more Kconfig patches that introduce proper config
> switches for some devices in the hw/core and hw/misc directories.
> 
> Thomas Huth (4):
>   hw/core: Add a config switch for the "register" device
>   hw/core: Add a config switch for the "or-irq" device
>   hw/core: Add a config switch for the "split-irq" device
>   hw/misc: Add a config switch for the "unimplemented" device
> 
>  hw/arm/Kconfig| 12 
>  hw/core/Kconfig   |  9 +
>  hw/core/Makefile.objs |  6 +++---
>  hw/dma/Kconfig|  1 +
>  hw/microblaze/Kconfig |  1 +
>  hw/misc/Kconfig   |  3 +++
>  hw/misc/Makefile.objs |  2 +-
>  hw/pci-host/Kconfig   |  3 ++-
>  hw/sparc64/Kconfig|  1 +
>  hw/timer/Kconfig  |  1 +
>  10 files changed, 34 insertions(+), 5 deletions(-)
> 

Acked-by: Paolo Bonzini 

Paolo

Re: [Qemu-devel] [PATCH v3 3/3] contrib: add vhost-user-input

2019-05-14 Thread Marc-André Lureau

Hi

On Tue, May 14, 2019 at 8:51 AM Gerd Hoffmann  wrote:
>
> On Mon, May 13, 2019 at 08:33:25PM +0200, Marc-André Lureau wrote:
> > Add a vhost-user input backend example, based on virtio-input-host
> > device. It takes an evdev path as argument, and can be associated with
> > a vhost-user-input device via a UNIX socket:
> >
> > $ vhost-user-input -p /dev/input/eventX -s /tmp/vui.sock
> >
> > $ qemu ... -chardev socket,id=vuic,path=/tmp/vui.sock
> >   -device vhost-user-input-pci,chardev=vuic
> >
> > This example is intentionally not included in $TOOLS, and not
> > installed by default.
>
> Patch doesn't apply cleanly to git master.  Also git complains that it
> can't find the sha1 and therefore can't try a 3way merge.  Does this
> depend on unmerged local patches?
>
> (same goes for the vhost-user-gpu patch in the other series btw).

Nothing special, patchew managed to apply the series.

But patch 2/3 here is bad, I'll resend.


-- 
Marc-André Lureau

Re: [Qemu-devel] [PATCH 2/4] hw/core: Add a config switch for the "or-irq" device

2019-05-14 Thread Philippe Mathieu-Daudé

On 5/14/19 12:00 PM, Thomas Huth wrote:
> The "or-irq" device is only used by certain machines. Let's add
> a proper config switch for it so that it only gets compiled when we
> really need it.
> 
> Signed-off-by: Thomas Huth 
> ---
>  hw/arm/Kconfig| 2 ++
>  hw/core/Kconfig   | 3 +++
>  hw/core/Makefile.objs | 2 +-
>  hw/pci-host/Kconfig   | 3 ++-
>  4 files changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
> index af8cffde9c..0bb3bbe9d3 100644
> --- a/hw/arm/Kconfig
> +++ b/hw/arm/Kconfig
> @@ -277,6 +277,7 @@ config RASPI
>  config STM32F205_SOC
>  bool
>  select ARM_V7M
> +select OR_IRQ
>  select STM32F2XX_TIMER
>  select STM32F2XX_USART
>  select STM32F2XX_SYSCFG
> @@ -424,6 +425,7 @@ config ARMSSE
>  select IOTKIT_SECCTL
>  select IOTKIT_SYSCTL
>  select IOTKIT_SYSINFO
> +select OR_IRQ
>  select TZ_MPC
>  select TZ_MSC
>  select TZ_PPC

You missed the MPS2* boards

> diff --git a/hw/core/Kconfig b/hw/core/Kconfig
> index d11920fcb3..984143456a 100644
> --- a/hw/core/Kconfig
> +++ b/hw/core/Kconfig
> @@ -7,6 +7,9 @@ config PTIMER
>  config FITLOADER
>  bool
>  
> +config OR_IRQ
> +bool
> +
>  config PLATFORM_BUS
>  bool
>  
> diff --git a/hw/core/Makefile.objs b/hw/core/Makefile.objs
> index d493a051ee..dd2c2ca812 100644
> --- a/hw/core/Makefile.objs
> +++ b/hw/core/Makefile.objs
> @@ -17,7 +17,7 @@ common-obj-$(CONFIG_SOFTMMU) += loader.o
>  common-obj-$(CONFIG_FITLOADER) += loader-fit.o
>  common-obj-$(CONFIG_SOFTMMU) += qdev-properties-system.o
>  common-obj-$(CONFIG_REGISTER) += register.o
> -common-obj-$(CONFIG_SOFTMMU) += or-irq.o
> +common-obj-$(CONFIG_OR_IRQ) += or-irq.o
>  common-obj-$(CONFIG_SOFTMMU) += split-irq.o
>  common-obj-$(CONFIG_PLATFORM_BUS) += platform-bus.o
>  common-obj-$(CONFIG_SOFTMMU) += generic-loader.o
> diff --git a/hw/pci-host/Kconfig b/hw/pci-host/Kconfig
> index 8c16d96b3f..1edc1a31d4 100644
> --- a/hw/pci-host/Kconfig
> +++ b/hw/pci-host/Kconfig
> @@ -2,8 +2,9 @@ config PAM
>  bool
>  
>  config PREP_PCI
> -select PCI
>  bool
> +select PCI
> +select OR_IRQ
>  
>  config GRACKLE_PCI
>  select PCI
>

Re: [Qemu-devel] [PATCH 1/4] hw/core: Add a config switch for the "register" device

2019-05-14 Thread Thomas Huth

On 14/05/2019 12.31, Philippe Mathieu-Daudé wrote:
> On 5/14/19 12:00 PM, Thomas Huth wrote:
>> The "register" device is only used by certain machines. Let's add
>> a proper config switch for it so that it only gets compiled when we
>> really need it.
>>
>> Signed-off-by: Thomas Huth 
>> ---
>>  hw/core/Kconfig   | 3 +++
>>  hw/core/Makefile.objs | 2 +-
>>  hw/dma/Kconfig| 1 +
>>  hw/timer/Kconfig  | 1 +
>>  4 files changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/core/Kconfig b/hw/core/Kconfig
>> index c2a1ae8122..d11920fcb3 100644
>> --- a/hw/core/Kconfig
>> +++ b/hw/core/Kconfig
>> @@ -9,3 +9,6 @@ config FITLOADER
>>  
>>  config PLATFORM_BUS
>>  bool
>> +
>> +config REGISTER
>> +bool
>> diff --git a/hw/core/Makefile.objs b/hw/core/Makefile.objs
>> index a799c83815..d493a051ee 100644
>> --- a/hw/core/Makefile.objs
>> +++ b/hw/core/Makefile.objs
>> @@ -16,7 +16,7 @@ common-obj-$(CONFIG_SOFTMMU) += machine.o
>>  common-obj-$(CONFIG_SOFTMMU) += loader.o
>>  common-obj-$(CONFIG_FITLOADER) += loader-fit.o
>>  common-obj-$(CONFIG_SOFTMMU) += qdev-properties-system.o
>> -common-obj-$(CONFIG_SOFTMMU) += register.o
>> +common-obj-$(CONFIG_REGISTER) += register.o
>>  common-obj-$(CONFIG_SOFTMMU) += or-irq.o
>>  common-obj-$(CONFIG_SOFTMMU) += split-irq.o
>>  common-obj-$(CONFIG_PLATFORM_BUS) += platform-bus.o
>> diff --git a/hw/dma/Kconfig b/hw/dma/Kconfig
>> index 751dec5426..5c61b67bc0 100644
>> --- a/hw/dma/Kconfig
>> +++ b/hw/dma/Kconfig
>> @@ -16,6 +16,7 @@ config I8257
>>  
>>  config ZYNQ_DEVCFG
>>  bool
>> +select REGISTER
>>  
>>  config STP2000
>>  bool
>> diff --git a/hw/timer/Kconfig b/hw/timer/Kconfig
>> index 51921eb63f..f575481210 100644
>> --- a/hw/timer/Kconfig
>> +++ b/hw/timer/Kconfig
>> @@ -36,6 +36,7 @@ config TWL92230
>>  
>>  config XLNX_ZYNQMP
>>  bool
>> +select REGISTER
>>  
>>  config ALTERA_TIMER
>>  bool
>>
> 
> Annoying, this clashes with "hw/microblaze: Kconfig cleanup" which is
> already queued by Paolo:
> https://lists.gnu.org/archive/html/qemu-devel/2019-04/msg04669.html

Ok, I'll wait for the merge of Paolo's next PULL request before sending
the v2 of my series.

 Thomas

[Qemu-devel] [PATCH v4 3/3] contrib: add vhost-user-input

2019-05-14 Thread Marc-André Lureau

Add a vhost-user input backend example, based on virtio-input-host
device. It takes an evdev path as argument, and can be associated with
a vhost-user-input device via a UNIX socket:

$ vhost-user-input -p /dev/input/eventX -s /tmp/vui.sock

$ qemu ... -chardev socket,id=vuic,path=/tmp/vui.sock
  -device vhost-user-input-pci,chardev=vuic

This example is intentionally not included in $TOOLS, and not
installed by default.

Signed-off-by: Marc-André Lureau 
---
 contrib/vhost-user-input/main.c| 393 +
 MAINTAINERS|   1 +
 Makefile   |  11 +
 Makefile.objs  |   1 +
 contrib/vhost-user-input/Makefile.objs |   1 +
 5 files changed, 407 insertions(+)
 create mode 100644 contrib/vhost-user-input/main.c
 create mode 100644 contrib/vhost-user-input/Makefile.objs

diff --git a/contrib/vhost-user-input/main.c b/contrib/vhost-user-input/main.c
new file mode 100644
index 00..8d493f598e
--- /dev/null
+++ b/contrib/vhost-user-input/main.c
@@ -0,0 +1,393 @@
+/*
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * (at your option) any later version.  See the COPYING file in the
+ * top-level directory.
+ */
+
+#include "qemu/osdep.h"
+
+#include 
+#include 
+
+#include "qemu/iov.h"
+#include "qemu/bswap.h"
+#include "qemu/sockets.h"
+#include "contrib/libvhost-user/libvhost-user.h"
+#include "contrib/libvhost-user/libvhost-user-glib.h"
+#include "standard-headers/linux/virtio_input.h"
+#include "qapi/error.h"
+
+typedef struct virtio_input_event virtio_input_event;
+typedef struct virtio_input_config virtio_input_config;
+
+typedef struct VuInput {
+    VugDev dev;
+    GSource *evsrc;
+    int evdevfd;
+    GArray *config;
+    virtio_input_config *sel_config;
+    struct {
+        virtio_input_event event;
+        VuVirtqElement *elem;
+    } *queue;
+    uint32_t qindex, qsize;
+} VuInput;
+
+static void vi_input_send(VuInput *vi, struct virtio_input_event *event)
+{
+    VuDev *dev = &vi->dev.parent;
+    VuVirtq *vq = vu_get_queue(dev, 0);
+    VuVirtqElement *elem;
+    int i, len;
+
+    /* queue up events ... */
+    if (vi->qindex == vi->qsize) {
+        vi->qsize++;
+        vi->queue = g_realloc_n(vi->queue, vi->qsize, sizeof(vi->queue[0]));
+    }
+    vi->queue[vi->qindex++].event = *event;
+
+    /* ... until we see a report sync ... */
+    if (event->type != htole16(EV_SYN) ||
+        event->code != htole16(SYN_REPORT)) {
+        return;
+    }
+
+    /* ... then check available space ... */
+    for (i = 0; i < vi->qindex; i++) {
+        elem = vu_queue_pop(dev, vq, sizeof(VuVirtqElement));
+        if (!elem) {
+            while (--i >= 0) {
+                vu_queue_unpop(dev, vq, vi->queue[i].elem, 0);
+            }
+            vi->qindex = 0;
+            g_warning("virtio-input queue full");
+            return;
+        }
+        vi->queue[i].elem = elem;
+    }
+
+    /* ... and finally pass them to the guest */
+    for (i = 0; i < vi->qindex; i++) {
+        elem = vi->queue[i].elem;
+        len = iov_from_buf(elem->in_sg, elem->in_num,
+                           0, &vi->queue[i].event, sizeof(virtio_input_event));
+        vu_queue_push(dev, vq, elem, len);
+        g_free(elem);
+    }
+
+    vu_queue_notify(&vi->dev.parent, vq);
+    vi->qindex = 0;
+}
+
+static void
+vi_evdev_watch(VuDev *dev, int condition, void *data)
+{
+    VuInput *vi = data;
+    int fd = vi->evdevfd;
+
+    g_debug("Got evdev condition %x", condition);
+
+    struct virtio_input_event virtio;
+    struct input_event evdev;
+    int rc;
+
+    for (;;) {
+        rc = read(fd, &evdev, sizeof(evdev));
+        if (rc != sizeof(evdev)) {
+            break;
+        }
+
+        g_debug("input %d %d %d", evdev.type, evdev.code, evdev.value);
+
+        virtio.type  = htole16(evdev.type);
+        virtio.code  = htole16(evdev.code);
+        virtio.value = htole32(evdev.value);
+        vi_input_send(vi, &virtio);
+    }
+}
+
+
+static void vi_handle_status(VuInput *vi, virtio_input_event *event)
+{
+    struct input_event evdev;
+    int rc;
+
+    if (gettimeofday(&evdev.time, NULL)) {
+        perror("vi_handle_status: gettimeofday");
+        return;
+    }
+
+    evdev.type = le16toh(event->type);
+    evdev.code = le16toh(event->code);
+    evdev.value = le32toh(event->value);
+
+    rc = write(vi->evdevfd, &evdev, sizeof(evdev));
+    if (rc == -1) {
+        perror("vi_host_handle_status: write");
+    }
+}
+
+static void vi_handle_sts(VuDev *dev, int qidx)
+{
+    VuInput *vi = container_of(dev, VuInput, dev.parent);
+    VuVirtq *vq = vu_get_queue(dev, qidx);
+    virtio_input_event event;
+    VuVirtqElement *elem;
+    int len;
+
+    g_debug("%s", G_STRFUNC);
+
+    for (;;) {
+        elem = vu_queue_pop(dev, vq, sizeof(VuVirtqElement));
+        if (!elem) {
+            break;
+        }
+
+        memset(&event, 0, sizeof(event));
+        len = iov_to_buf(

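The batching in vi_input_send() (queue events until an EV_SYN/SYN_REPORT, then deliver the whole report to the guest, or drop it if the guest lacks buffers) can be modelled without libvhost-user. This is a minimal sketch under stated assumptions: sketch_event, event_batch, batch_push() and the deliver_fn callback are hypothetical stand-ins for virtio_input_event, the VuInput queue and the pop/push/unpop sequence on the virtqueue. Events are kept in host byte order, so the htole16()/htole32() conversion done by the patch is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Values from the Linux input headers, defined locally so the sketch
 * builds without them. */
#ifndef EV_SYN
#define EV_SYN 0x00
#endif
#ifndef SYN_REPORT
#define SYN_REPORT 0
#endif

/* Hypothetical host-endian stand-in for struct virtio_input_event. */
typedef struct {
    uint16_t type;
    uint16_t code;
    uint32_t value;
} sketch_event;

/* Hypothetical stand-in for the queue/qindex/qsize fields of VuInput. */
typedef struct {
    sketch_event *events;
    size_t len;
    size_t cap;
} event_batch;

/* Returns false if the guest lacks buffers for the whole batch
 * (the "virtio-input queue full" path in vi_input_send()). */
typedef bool (*deliver_fn)(const sketch_event *ev, size_t n, void *opaque);

/* Example callback: accepts the batch only if it fits into the number
 * of free guest buffers pointed to by opaque. */
static bool deliver_if_fits(const sketch_event *ev, size_t n, void *opaque)
{
    (void)ev;
    return n <= *(const size_t *)opaque;
}

/* Queue one event; on SYN_REPORT hand the whole report to deliver().
 * Returns the number of events delivered: 0 while still batching or
 * when the report was dropped. */
static size_t batch_push(event_batch *b, const sketch_event *ev,
                         deliver_fn deliver, void *opaque)
{
    size_t n;

    if (b->len == b->cap) {
        size_t ncap = b->cap ? b->cap * 2 : 8;
        sketch_event *p = realloc(b->events, ncap * sizeof(*p));

        assert(p != NULL); /* sketch: no graceful out-of-memory handling */
        b->events = p;
        b->cap = ncap;
    }
    b->events[b->len++] = *ev;

    if (ev->type != EV_SYN || ev->code != SYN_REPORT) {
        return 0; /* keep batching until the report is complete */
    }

    n = b->len;
    b->len = 0; /* the report is consumed whether delivered or dropped */
    return deliver(b->events, n, opaque) ? n : 0;
}
```

For example, pushing an EV_KEY event followed by SYN_REPORT delivers both events at once when the callback reports room for at least two buffers, and drops both when it reports only one. Unlike the patch, which grows its queue by one element at a time with g_realloc_n(), the sketch doubles its capacity; both keep delivery of a report all-or-nothing.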
Re: [Qemu-devel] [PATCH 1/4] hw/core: Add a config switch for the "register" device

2019-05-14 Thread Philippe Mathieu-Daudé

On 5/14/19 12:00 PM, Thomas Huth wrote:
> The "register" device is only used by certain machines. Let's add
> a proper config switch for it so that it only gets compiled when we
> really need it.
> 
> Signed-off-by: Thomas Huth 
> ---
>  hw/core/Kconfig   | 3 +++
>  hw/core/Makefile.objs | 2 +-
>  hw/dma/Kconfig| 1 +
>  hw/timer/Kconfig  | 1 +
>  4 files changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/core/Kconfig b/hw/core/Kconfig
> index c2a1ae8122..d11920fcb3 100644
> --- a/hw/core/Kconfig
> +++ b/hw/core/Kconfig
> @@ -9,3 +9,6 @@ config FITLOADER
>  
>  config PLATFORM_BUS
>  bool
> +
> +config REGISTER
> +bool
> diff --git a/hw/core/Makefile.objs b/hw/core/Makefile.objs
> index a799c83815..d493a051ee 100644
> --- a/hw/core/Makefile.objs
> +++ b/hw/core/Makefile.objs
> @@ -16,7 +16,7 @@ common-obj-$(CONFIG_SOFTMMU) += machine.o
>  common-obj-$(CONFIG_SOFTMMU) += loader.o
>  common-obj-$(CONFIG_FITLOADER) += loader-fit.o
>  common-obj-$(CONFIG_SOFTMMU) += qdev-properties-system.o
> -common-obj-$(CONFIG_SOFTMMU) += register.o
> +common-obj-$(CONFIG_REGISTER) += register.o
>  common-obj-$(CONFIG_SOFTMMU) += or-irq.o
>  common-obj-$(CONFIG_SOFTMMU) += split-irq.o
>  common-obj-$(CONFIG_PLATFORM_BUS) += platform-bus.o
> diff --git a/hw/dma/Kconfig b/hw/dma/Kconfig
> index 751dec5426..5c61b67bc0 100644
> --- a/hw/dma/Kconfig
> +++ b/hw/dma/Kconfig
> @@ -16,6 +16,7 @@ config I8257
>  
>  config ZYNQ_DEVCFG
>  bool
> +select REGISTER
>  
>  config STP2000
>  bool
> diff --git a/hw/timer/Kconfig b/hw/timer/Kconfig
> index 51921eb63f..f575481210 100644
> --- a/hw/timer/Kconfig
> +++ b/hw/timer/Kconfig
> @@ -36,6 +36,7 @@ config TWL92230
>  
>  config XLNX_ZYNQMP
>  bool
> +select REGISTER
>  
>  config ALTERA_TIMER
>  bool
> 

Annoying, this clashes with "hw/microblaze: Kconfig cleanup" which is
already queued by Paolo:
https://lists.gnu.org/archive/html/qemu-devel/2019-04/msg04669.html

[Qemu-devel] [PATCH v4 0/3] Add vhost-user-input

2019-05-14 Thread Marc-André Lureau

Hi,

v4:
- update "libvhost-user: fix -Werror=format= on ppc64"

v3:
- rebased, fixing some warnings found during merge

v2:
- build fixes

v1: (changes since original v6 series)
- add "libvhost-user: fix -Waddress-of-packed-member" & "util: simplify unix_listen()"
- use unix_listen()
- build vhost-user-input by default (when applicable)

Marc-André Lureau (3):
  libvhost-user: fix cast warnings on 32 bits
  libvhost-user: fix -Werror=format= on ppc64
  contrib: add vhost-user-input

 contrib/libvhost-user/libvhost-user.c  |  12 +-
 contrib/vhost-user-input/main.c| 393 +
 MAINTAINERS|   1 +
 Makefile   |  11 +
 Makefile.objs  |   1 +
 contrib/vhost-user-input/Makefile.objs |   1 +
 6 files changed, 414 insertions(+), 5 deletions(-)
 create mode 100644 contrib/vhost-user-input/main.c
 create mode 100644 contrib/vhost-user-input/Makefile.objs

-- 
2.21.0.777.g83232e3864

Re: [Qemu-devel] [Qemu-arm] [PATCH 2/4] hw/core: Add a config switch for the "or-irq" device

2019-05-14 Thread Philippe Mathieu-Daudé

On 5/14/19 12:06 PM, Peter Maydell wrote:
> On Tue, 14 May 2019 at 11:00, Thomas Huth  wrote:
>>
>> The "or-irq" device is only used by certain machines. Let's add
>> a proper config switch for it so that it only gets compiled when we
>> really need it.
>>
>> Signed-off-by: Thomas Huth 
>> ---
>>  hw/arm/Kconfig| 2 ++
>>  hw/core/Kconfig   | 3 +++
>>  hw/core/Makefile.objs | 2 +-
>>  hw/pci-host/Kconfig   | 3 ++-
>>  4 files changed, 8 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
>> index af8cffde9c..0bb3bbe9d3 100644
>> --- a/hw/arm/Kconfig
>> +++ b/hw/arm/Kconfig
>> @@ -277,6 +277,7 @@ config RASPI
>>  config STM32F205_SOC
>>  bool
>>  select ARM_V7M
>> +select OR_IRQ
>>  select STM32F2XX_TIMER
>>  select STM32F2XX_USART
>>  select STM32F2XX_SYSCFG
>> @@ -424,6 +425,7 @@ config ARMSSE
>>  select IOTKIT_SECCTL
>>  select IOTKIT_SYSCTL
>>  select IOTKIT_SYSINFO
>> +select OR_IRQ
>>  select TZ_MPC
>>  select TZ_MSC
>>  select TZ_PPC
> 
> In cases like this where a device is used both by
> an SoC and also directly by the board code that uses
> that SoC, should we put the select OR_IRQ only in
> the SoC's config, or also in the board model's config
> (ie, in "config MPS2" as well as "config ARMSSE") ?

Someone should be able to work on the board without having to look at
the SoC code/config, so both :) The idea of Kconfig is that you only worry
about a specific device, and the qgraph sorts the rest out.

So having it in both places is safer, and it helps to visualize dependencies in
the graph tree (I'm slowly working on this feature to help newcomers
understand model dependencies).

Re: [Qemu-devel] [Qemu-arm] [PATCH 2/4] hw/core: Add a config switch for the "or-irq" device

2019-05-14 Thread Philippe Mathieu-Daudé

On 5/14/19 12:25 PM, Philippe Mathieu-Daudé wrote:
> On 5/14/19 12:06 PM, Peter Maydell wrote:
>> On Tue, 14 May 2019 at 11:00, Thomas Huth  wrote:
>>>
>>> The "or-irq" device is only used by certain machines. Let's add
>>> a proper config switch for it so that it only gets compiled when we
>>> really need it.
>>>
>>> Signed-off-by: Thomas Huth 
>>> ---
>>>  hw/arm/Kconfig| 2 ++
>>>  hw/core/Kconfig   | 3 +++
>>>  hw/core/Makefile.objs | 2 +-
>>>  hw/pci-host/Kconfig   | 3 ++-
>>>  4 files changed, 8 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
>>> index af8cffde9c..0bb3bbe9d3 100644
>>> --- a/hw/arm/Kconfig
>>> +++ b/hw/arm/Kconfig
>>> @@ -277,6 +277,7 @@ config RASPI
>>>  config STM32F205_SOC
>>>  bool
>>>  select ARM_V7M
>>> +select OR_IRQ
>>>  select STM32F2XX_TIMER
>>>  select STM32F2XX_USART
>>>  select STM32F2XX_SYSCFG
>>> @@ -424,6 +425,7 @@ config ARMSSE
>>>  select IOTKIT_SECCTL
>>>  select IOTKIT_SYSCTL
>>>  select IOTKIT_SYSINFO
>>> +select OR_IRQ
>>>  select TZ_MPC
>>>  select TZ_MSC
>>>  select TZ_PPC
>>
>> In cases like this where a device is used both by
>> an SoC and also directly by the board code that uses
>> that SoC, should we put the select OR_IRQ only in
>> the SoC's config, or also in the board model's config
>> (ie, in "config MPS2" as well as "config ARMSSE") ?
> 
> Someone should be able to work on the board without having to look at
> the SoC code/config, so both :) The idea of Kconfig is that you only worry
> about a specific device, and the qgraph sorts the rest out.

Hypothetical example if you only use the selector in the SoC config:
if you replace the SoC's OR_IRQ with a more complex or extended device,
the board will lack the OR_IRQ selector.

Using Kconfig selectors in every place where a dependency exists makes it
explicit and also eases backports.

> 
> So having it in both places is safer, and it helps to visualize dependencies in
> the graph tree (I'm slowly working on this feature to help newcomers
> understand model dependencies).
>
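The trade-off discussed above can be illustrated with a toy model of Kconfig "select" semantics, where enabling a symbol also enables, transitively, everything it selects. This is a minimal sketch, not QEMU's actual Kconfig resolver; the ksym table and kselect() are hypothetical and only loosely mirror "config MPS2" and "config ARMSSE" from this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_SELECTS 4

/* Hypothetical Kconfig symbol: its name and the symbols it selects
 * (unused slots are NULL). */
typedef struct {
    const char *name;
    const char *selects[MAX_SELECTS];
} ksym;

/* Index of name in syms, or -1 if it is not defined. */
static int ksym_find(const ksym *syms, int n, const char *name)
{
    for (int i = 0; i < n; i++) {
        if (strcmp(syms[i].name, name) == 0) {
            return i;
        }
    }
    return -1;
}

/* Enable name and, recursively, every symbol it selects.
 * enabled[i] corresponds to syms[i]. */
static void kselect(const ksym *syms, int n, const char *name, bool *enabled)
{
    int i;

    assert(syms != NULL && enabled != NULL);
    i = ksym_find(syms, n, name);
    if (i < 0 || enabled[i]) {
        return; /* unknown symbol, or already visited (stops cycles) */
    }
    enabled[i] = true;
    for (int j = 0; j < MAX_SELECTS && syms[i].selects[j] != NULL; j++) {
        kselect(syms, n, syms[i].selects[j], enabled);
    }
}
```

With the board selecting OR_IRQ itself, the device stays enabled even if ARMSSE stops selecting it; with only the SoC's select, such a change silently drops OR_IRQ from the board's build.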

[Qemu-devel] [PATCH v4 1/3] libvhost-user: fix cast warnings on 32 bits

2019-05-14 Thread Marc-André Lureau

Fixes warnings:
 warning: cast to pointer from integer of different size
 [-Wint-to-pointer-cast]

Signed-off-by: Marc-André Lureau 
---
 contrib/libvhost-user/libvhost-user.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
index 74d42177c5..40443a3daa 100644
--- a/contrib/libvhost-user/libvhost-user.c
+++ b/contrib/libvhost-user/libvhost-user.c
@@ -621,7 +621,7 @@ vu_set_mem_table_exec_postcopy(VuDev *dev, VhostUserMsg *vmsg)
  * data that's already arrived in the shared process.
  * TODO: How to do hugepage
  */
-ret = madvise((void *)dev_region->mmap_addr,
+ret = madvise((void *)(uintptr_t)dev_region->mmap_addr,
   dev_region->size + dev_region->mmap_offset,
   MADV_DONTNEED);
 if (ret) {
@@ -633,7 +633,7 @@ vu_set_mem_table_exec_postcopy(VuDev *dev, VhostUserMsg *vmsg)
  * in neighbouring pages.
  * TODO: Turn this backon later.
  */
-ret = madvise((void *)dev_region->mmap_addr,
+ret = madvise((void *)(uintptr_t)dev_region->mmap_addr,
   dev_region->size + dev_region->mmap_offset,
   MADV_NOHUGEPAGE);
 if (ret) {
@@ -666,7 +666,7 @@ vu_set_mem_table_exec_postcopy(VuDev *dev, VhostUserMsg *vmsg)
 DPRINT("%s: region %d: Registered userfault for %llx + %llx\n",
 __func__, i, reg_struct.range.start, reg_struct.range.len);
 /* Now it's registered we can let the client at it */
-if (mprotect((void *)dev_region->mmap_addr,
+if (mprotect((void *)(uintptr_t)dev_region->mmap_addr,
  dev_region->size + dev_region->mmap_offset,
  PROT_READ | PROT_WRITE)) {
 vu_panic(dev, "failed to mprotect region %d for postcopy (%s)",
-- 
2.21.0.777.g83232e3864

Re: [Qemu-devel] [Qemu-arm] [PATCH 2/4] hw/core: Add a config switch for the "or-irq" device

2019-05-14 Thread Thomas Huth

On 14/05/2019 12.25, Philippe Mathieu-Daudé wrote:
> On 5/14/19 12:06 PM, Peter Maydell wrote:
>> On Tue, 14 May 2019 at 11:00, Thomas Huth  wrote:
>>>
>>> The "or-irq" device is only used by certain machines. Let's add
>>> a proper config switch for it so that it only gets compiled when we
>>> really need it.
>>>
>>> Signed-off-by: Thomas Huth 
>>> ---
>>>  hw/arm/Kconfig| 2 ++
>>>  hw/core/Kconfig   | 3 +++
>>>  hw/core/Makefile.objs | 2 +-
>>>  hw/pci-host/Kconfig   | 3 ++-
>>>  4 files changed, 8 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
>>> index af8cffde9c..0bb3bbe9d3 100644
>>> --- a/hw/arm/Kconfig
>>> +++ b/hw/arm/Kconfig
>>> @@ -277,6 +277,7 @@ config RASPI
>>>  config STM32F205_SOC
>>>  bool
>>>  select ARM_V7M
>>> +select OR_IRQ
>>>  select STM32F2XX_TIMER
>>>  select STM32F2XX_USART
>>>  select STM32F2XX_SYSCFG
>>> @@ -424,6 +425,7 @@ config ARMSSE
>>>  select IOTKIT_SECCTL
>>>  select IOTKIT_SYSCTL
>>>  select IOTKIT_SYSINFO
>>> +select OR_IRQ
>>>  select TZ_MPC
>>>  select TZ_MSC
>>>  select TZ_PPC
>>
>> In cases like this where a device is used both by
>> an SoC and also directly by the board code that uses
>> that SoC, should we put the select OR_IRQ only in
>> the SoC's config, or also in the board model's config
>> (ie, in "config MPS2" as well as "config ARMSSE") ?
> 
> Someone should be able to work on the board without having to look at
> the SoC code/config, so both :) The idea of Kconfig is that you only worry
> about a specific device, and the qgraph sorts the rest out.

I don't have a strong opinion here, but it is likely safer indeed to put
the switch into both sections in this case, so if one of the two ever
gets changed, the config switch is still there for the other one that
still requires it. I'll send a v2.

 Thomas

Re: [Qemu-devel] [PATCH 2/4] hw/core: Add a config switch for the "or-irq" device

2019-05-14 Thread Thomas Huth

On 14/05/2019 12.35, Philippe Mathieu-Daudé wrote:
> On 5/14/19 12:00 PM, Thomas Huth wrote:
>> The "or-irq" device is only used by certain machines. Let's add
>> a proper config switch for it so that it only gets compiled when we
>> really need it.
>>
>> Signed-off-by: Thomas Huth 
>> ---
>>  hw/arm/Kconfig| 2 ++
>>  hw/core/Kconfig   | 3 +++
>>  hw/core/Makefile.objs | 2 +-
>>  hw/pci-host/Kconfig   | 3 ++-
>>  4 files changed, 8 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
>> index af8cffde9c..0bb3bbe9d3 100644
>> --- a/hw/arm/Kconfig
>> +++ b/hw/arm/Kconfig
>> @@ -277,6 +277,7 @@ config RASPI
>>  config STM32F205_SOC
>>  bool
>>  select ARM_V7M
>> +select OR_IRQ
>>  select STM32F2XX_TIMER
>>  select STM32F2XX_USART
>>  select STM32F2XX_SYSCFG
>> @@ -424,6 +425,7 @@ config ARMSSE
>>  select IOTKIT_SECCTL
>>  select IOTKIT_SYSCTL
>>  select IOTKIT_SYSINFO
>> +select OR_IRQ
>>  select TZ_MPC
>>  select TZ_MSC
>>  select TZ_PPC
> 
> You missed the MPS2* boards

No, the MPS2 boards "select ARMSSE", so this gets added via the above
hunk there. But as mentioned in the reply to Peter, it's likely better
to add it there, too.

 Thomas

[Qemu-devel] [PATCH v4 2/3] libvhost-user: fix -Werror=format= on ppc64

2019-05-14 Thread Marc-André Lureau

That should fix the following warning:

/home/pm215/qemu/contrib/libvhost-user/libvhost-user.c: In function
‘vu_set_mem_table_exec_postcopy’:
/home/pm215/qemu/contrib/libvhost-user/libvhost-user.c:666:9: error:
format ‘%llx’ expects argument of type ‘long long unsigned int’, but
argument 5 has type ‘__u64’ [-Werror=format=]
 DPRINT("%s: region %d: Registered userfault for %llx + %llx\n",
 ^
/home/pm215/qemu/contrib/libvhost-user/libvhost-user.c:666:9: error:
format ‘%llx’ expects argument of type ‘long long unsigned int’, but
argument 6 has type ‘__u64’ [-Werror=format=]
cc1: all warnings being treated as errors

Signed-off-by: Marc-André Lureau 
---
 contrib/libvhost-user/libvhost-user.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
index 40443a3daa..ab85166b15 100644
--- a/contrib/libvhost-user/libvhost-user.c
+++ b/contrib/libvhost-user/libvhost-user.c
@@ -663,8 +663,10 @@ vu_set_mem_table_exec_postcopy(VuDev *dev, VhostUserMsg *vmsg)
  __func__, i);
 return false;
 }
-DPRINT("%s: region %d: Registered userfault for %llx + %llx\n",
-__func__, i, reg_struct.range.start, reg_struct.range.len);
+DPRINT("%s: region %d: Registered userfault for %"
+   PRIu64 " + %" PRIu64 "\n", __func__, i,
+   (uint64_t)reg_struct.range.start,
+   (uint64_t)reg_struct.range.len);
 /* Now it's registered we can let the client at it */
 if (mprotect((void *)(uintptr_t)dev_region->mmap_addr,
  dev_region->size + dev_region->mmap_offset,
-- 
2.21.0.777.g83232e3864
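%llx requires an unsigned long long argument, but the kernel's __u64 is not that type on every host, which is what the ppc64 build reports above. The patch uses the portable pattern of casting to uint64_t and printing with the <inttypes.h> macros. One observation on the patch as posted: PRIu64 prints in decimal, whereas the original %llx printed hexadecimal, so the debug output changes base; PRIx64 would keep it in hex. A minimal sketch of the hex-preserving variant; format_range() is a hypothetical helper, not libvhost-user code.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format a registered userfault range portably.
 * Callers holding a __u64 (or any other 64-bit typedef) cast it to
 * uint64_t, so the argument always matches the PRIx64 conversion. */
static int format_range(char *buf, size_t buflen, int region,
                        uint64_t start, uint64_t len)
{
    assert(buf != NULL && buflen > 0);
    /* PRIx64 keeps the hexadecimal output of the original %llx;
     * PRIu64 would print the same values in decimal. */
    return snprintf(buf, buflen,
                    "region %d: Registered userfault for %" PRIx64
                    " + %" PRIx64, region, start, len);
}
```

For instance, a start of 0x7f0000000000 and a length of 0x200000 in region 0 format as "region 0: Registered userfault for 7f0000000000 + 200000".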

Re: [Qemu-devel] [Qemu-arm] [PATCH 4/4] hw/misc: Add a config switch for the "unimplemented" device

2019-05-14 Thread Thomas Huth

On 14/05/2019 12.08, Peter Maydell wrote:
> On Tue, 14 May 2019 at 11:02, Thomas Huth  wrote:
>>
>> The device is only used by certain Arm boards. Now that we have
>> fine-grained Kconfig for these machines, too, we can enable the
>> "unimplemented" devices only for the machines that really need it.
>>
>> Signed-off-by: Thomas Huth 
>> ---
>>  hw/arm/Kconfig| 9 +
>>  hw/microblaze/Kconfig | 1 +
>>  hw/misc/Kconfig   | 3 +++
>>  hw/misc/Makefile.objs | 2 +-
>>  hw/sparc64/Kconfig| 1 +
>>  5 files changed, 15 insertions(+), 1 deletion(-)
> 
> Commit message says "only Arm" but code change changes
> sparc and microblaze Kconfigs too :-)

D'oh! ... I started with grep'ing for TYPE_UNIMPLEMENTED_DEVICE and only
saw Arm boards there. When I tested my patches, I noticed that I must
also add the machines that use the create_unimplemented_device()
function, but apparently forgot to fix up the commit message
accordingly. I'll fix it in v2.

 Thomas

Re: [Qemu-devel] [PATCH v2 1/2] vfio/mdev: add version attribute for mdev device

2019-05-14 Thread Erik Skultety

On Tue, May 14, 2019 at 11:51:35AM +0200, Cornelia Huck wrote:
> On Tue, 14 May 2019 03:47:36 -0400
> Yan Zhao  wrote:
>
> > On Tue, May 14, 2019 at 03:43:44PM +0800, Erik Skultety wrote:
> > > On Tue, May 14, 2019 at 03:32:19AM -0400, Yan Zhao wrote:
> > > > On Tue, May 14, 2019 at 03:20:40PM +0800, Erik Skultety wrote:
>
> > > > > That said, from libvirt POV as a consumer, I'd expect there to be truly only 2
> > > > > errors (I believe Alex has mentioned something similar in one of his responses
> > > > > in one of the threads):
> > > > > a) read error indicating that an mdev type doesn't support migration
> > > > > - I assume if one type doesn't support migration, none of the other
> > > > >   types exposed on the parent device do, is that a fair assumption?
>
> Probably; but there might be cases where the migratability depends not
> on the device type, but how the partitioning has been done... or is
> that too contrived?

No, you have a point. Once again I let my thoughts be carried away by the idea
of heterogeneous setups, which is a discussion for another time anyway; I was
just thinking out loud.

>
> > > > > b) write error indicating that the mdev types are incompatible for
> > > > > migration
> > > > >
> > > > > Regards,
> > > > > Erik
> > > > Thanks for this explanation.
> > > > so, can we arrive at below agreements?
> > > >
> > > > 1. "not to define the specific errno returned for a specific situation,
> > > > let the vendor driver decide, userspace simply needs to know that an errno on
> > > > read indicates the device does not support migration version comparison and
> > > > that an errno on write indicates the devices are incompatible or the target
> > > > doesn't support migration versions. "
> > > > 2. vendor driver should log detailed error reasons in kernel log.
> > >
> > > That would be my take on this, yes, but I am open to hearing any other suggestions and
> > > ideas I couldn't think of as well.
>
> So, read to find out whether migration is supported at all, write to
> find out whether it is supported for that concrete pairing is
> reasonable for libvirt?

Yes. More specifically, in the prepare phase of migration we'd retrieve the
string (potentially reporting an error like: "Failed to query migration
support: "), put the string into the migration cookie and
do the check with a write on the destination. The only catch is that if the
error occurs on the destination, the error message in the kernel log lives only
on the destination, which doesn't help libvirt users. That would require setting
up remote logging, but for layered products this is not a problem, since those
already utilize central logging nodes.

Then there are the libvirt-specific bits, which are out of scope of this
discussion: whether we should only assume identical mdev type pairs, or whether
we should employ a best-effort approach and iterate over all the available types
exposed by the vendor and check whether any of them would support this migration
(back to your note, Connie: partitioning would come into the picture here).

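The read-then-write check agreed on above can be sketched from the userspace side. This is a minimal sketch under loud assumptions: the attribute paths are hypothetical placeholders (this message does not fix an attribute name), source and destination are handled in one process here whereas libvirt would carry the string between hosts in its migration cookie, and, as agreed in the thread, only which step failed is interpreted, not the specific errno.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical result codes for the two failure cases agreed on above. */
enum mig_check {
    MIG_COMPATIBLE = 0,
    MIG_NOT_SUPPORTED,  /* read on the source failed */
    MIG_INCOMPATIBLE,   /* write on the destination failed */
};

/* Read the opaque version string; returns 0 or -errno. */
static int read_version(const char *path, char *buf, size_t len)
{
    ssize_t n;
    int err;
    int fd = open(path, O_RDONLY);

    assert(len > 0);
    if (fd < 0) {
        return -errno;
    }
    n = read(fd, buf, len - 1);
    err = errno; /* saved before close() can overwrite it */
    close(fd);
    if (n < 0) {
        return -err;
    }
    buf[n] = '\0';
    return 0;
}

/* Write the source's string to the destination; returns 0 or -errno.
 * The vendor driver is expected to fail the write on a mismatch. */
static int write_version(const char *path, const char *version)
{
    size_t len = strlen(version);
    ssize_t n;
    int err;
    int fd = open(path, O_WRONLY);

    if (fd < 0) {
        return -errno;
    }
    n = write(fd, version, len);
    err = errno;
    close(fd);
    if (n < 0) {
        return -err;
    }
    return (size_t)n == len ? 0 : -EIO;
}

/* Only which step failed is interpreted; the detailed reason is left
 * to the vendor driver's kernel log. */
static enum mig_check check_migration(const char *src_attr,
                                      const char *dst_attr)
{
    char version[256];

    if (read_version(src_attr, version, sizeof(version)) < 0) {
        return MIG_NOT_SUPPORTED;
    }
    if (write_version(dst_attr, version) < 0) {
        return MIG_INCOMPATIBLE; /* or the target lacks version support */
    }
    return MIG_COMPATIBLE;
}
```

Collapsing every errno into one of two outcomes matches the consumer view described here: userspace can only decide whether to attempt the migration, so finer-grained codes would add little.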
>
> > >
> > > Erik
> > got it. thanks a lot!
> >
> > hi Cornelia and Dave,
> > do you also agree on:
> > 1. "not to define the specific errno returned for a specific situation,
> > let the vendor driver decide, userspace simply needs to know that an errno on
> > read indicates the device does not support migration version comparison and
> > that an errno on write indicates the devices are incompatible or the target
> > doesn't support migration versions. "
> > 2. vendor driver should log detailed error reasons in kernel log.
>
> Two questions:
> - How reasonable is it to refer to the system log in order to find out
>   what exactly went wrong?
> - If detailed error reporting is basically done to the syslog, do
>   different error codes still provide useful information? Or should the
>   vendor driver decide what it wants to do?

I'd leave anything beyond returning -1 on read/write from/to the sysfs to the
vendor driver, as user space has no control over it. Even if there were a
facility to interpret different return codes for us, I'm not sure (in this
migration-related case) how much userspace would be able to recover or
fall back anyway; you either can or cannot migrate smoothly.

Regards,
Erik

Re: [Qemu-devel] [PATCH v2 1/2] vfio/mdev: add version attribute for mdev device

2019-05-14 Thread Dr. David Alan Gilbert

* Cornelia Huck (coh...@redhat.com) wrote:
> On Tue, 14 May 2019 03:47:36 -0400
> Yan Zhao  wrote:
> 
> > On Tue, May 14, 2019 at 03:43:44PM +0800, Erik Skultety wrote:
> > > On Tue, May 14, 2019 at 03:32:19AM -0400, Yan Zhao wrote:  
> > > > On Tue, May 14, 2019 at 03:20:40PM +0800, Erik Skultety wrote:  
> 
> > > > > That said, from libvirt POV as a consumer, I'd expect there to be truly only 2
> > > > > errors (I believe Alex has mentioned something similar in one of his responses
> > > > > in one of the threads):
> > > > > a) read error indicating that an mdev type doesn't support migration
> > > > > - I assume if one type doesn't support migration, none of the other
> > > > >   types exposed on the parent device do, is that a fair assumption?
> 
> Probably; but there might be cases where the migratability depends not
> on the device type, but how the partitioning has been done... or is
> that too contrived?
> 
> > > > > b) write error indicating that the mdev types are incompatible for
> > > > > migration
> > > > >
> > > > > Regards,
> > > > > Erik  
> > > > Thanks for this explanation.
> > > > so, can we arrive at below agreements?
> > > >
> > > > 1. "not to define the specific errno returned for a specific situation,
> > > > let the vendor driver decide, userspace simply needs to know that an errno on
> > > > read indicates the device does not support migration version comparison and
> > > > that an errno on write indicates the devices are incompatible or the target
> > > > doesn't support migration versions. "
> > > > 2. vendor driver should log detailed error reasons in kernel log.  
> > > 
> > > That would be my take on this, yes, but I am open to hearing any other suggestions and
> > > ideas I couldn't think of as well.
> 
> So, read to find out whether migration is supported at all, write to
> find out whether it is supported for that concrete pairing is
> reasonable for libvirt?
> 
> > > 
> > > Erik  
> > got it. thanks a lot!
> > 
> > hi Cornelia and Dave,
> > do you also agree on:
> > 1. "not to define the specific errno returned for a specific situation,
> > let the vendor driver decide, userspace simply needs to know that an errno on
> > read indicates the device does not support migration version comparison and
> > that an errno on write indicates the devices are incompatible or the target
> > doesn't support migration versions. "
> > 2. vendor driver should log detailed error reasons in kernel log.
> 
> Two questions:
> - How reasonable is it to refer to the system log in order to find out
>   what exactly went wrong?
> - If detailed error reporting is basically done to the syslog, do
>   different error codes still provide useful information? Or should the
>   vendor driver decide what it wants to do?

I don't see error codes as being that helpful; if we can't actually get
an error message back up the stack (which was my preference), then I guess
syslog is as good as it will get.

Dave

--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH 0/3] Optimize COLO related codes and description

2019-05-14 Thread Dr. David Alan Gilbert

* Zhang, Chen (chen.zh...@intel.com) wrote:
> Hi Dave,
> 
> I noticed that you have reviewed all the patches in this series, can you queue it?

Yes, I'm about to start a migration pull now.

Dave

> Thanks
> Zhang Chen
> 
> 
> > -----Original Message-----
> > From: Zhang, Chen
> > Sent: Friday, April 26, 2019 5:07 PM
> > To: Laurent Vivier ; Dr. David Alan Gilbert
> > ; Juan Quintela ; zhanghailiang
> > ; Markus Armbruster
> > ; qemu-dev 
> > Cc: Zhang Chen ; Zhang, Chen 
> > Subject: [PATCH 0/3] Optimize COLO related codes and description
> > 
> > From: Zhang Chen 
> > 
> > In this series we optimize codes and fix some tiny issues.
> > 
> > Zhang Chen (3):
> >   migration/colo.c: Remove redundant input parameter
> >   migration/colo.h: Remove obsolete codes
> >   qemu-option.hx: Update missed parameter for colo-compare
> > 
> >  include/migration/colo.h  | 4 +---
> >  migration/colo-failover.c | 2 +-
> >  migration/colo.c  | 2 +-
> >  qemu-options.hx   | 9 ++---
> >  4 files changed, 9 insertions(+), 8 deletions(-)
> > 
> > --
> > 2.17.GIT
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH v2 2/6] luks: Create block_crypto_co_create_generic()

2019-05-14 Thread Daniel P . Berrangé

On Mon, Mar 12, 2018 at 04:02:14PM +0100, Kevin Wolf wrote:
> Everything that refers to the protocol layer or QemuOpts is moved out of
> block_crypto_create_generic(), so that the remaining function is
> suitable to be called by a .bdrv_co_create implementation.
> 
> LUKS is the only driver that actually implements the old interface, and
> we don't intend to use it in any new drivers, so put the moved out code
> directly into a LUKS function rather than creating a generic
> intermediate one.
> 
> Signed-off-by: Kevin Wolf 
> Reviewed-by: Daniel P. Berrangé 
> Reviewed-by: Eric Blake 
> ---
>  block/crypto.c | 95 +-
>  1 file changed, 61 insertions(+), 34 deletions(-)


Reviving a year old commit...

The LUKS driver doesn't implement preallocation during create.

Before this commit this would be reported

 $ qemu-img create -f luks --object secret,id=sec0,data=base -o key-secret=sec0 base.luks 1G -o preallocation=full
 Formatting 'base.luks', fmt=luks size=1073741824 key-secret=sec0 preallocation=full
 qemu-img: base.luks: Parameter 'preallocation' is unexpected


After this commit, there is no error reported - it just silently
ignores the preallocation=full option.

I'm a bit lost in block layer understanding where is the right
place to fix the error reporting in this case.
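
As a hedged illustration only (not QEMU code, and not necessarily the right layer for the fix), one option is for the driver to reject a preallocation value it cannot honor instead of dropping it. `struct create_opts_sketch` and `check_preallocation()` are hypothetical names, and treating "off" or an absent option as the only supported case is an assumption:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified view of the creation options a format driver
 * receives; this is not QEMU's QemuOpts/QDict API. */
struct create_opts_sketch {
    const char *preallocation;  /* NULL when the user did not pass it */
};

/*
 * Return 0 if the driver can honor the request, or -ENOTSUP with a
 * message in 'err' instead of silently ignoring the option.
 * Assumption: this driver only supports "off" (no preallocation).
 */
static int check_preallocation(const struct create_opts_sketch *opts,
                               char *err, size_t errlen)
{
    assert(opts != NULL && err != NULL && errlen > 0);

    if (opts->preallocation == NULL ||
        strcmp(opts->preallocation, "off") == 0) {
        return 0;
    }
    snprintf(err, errlen, "Unsupported preallocation mode '%s'",
             opts->preallocation);
    return -ENOTSUP;
}
```

With this shape, preallocation=full produces an explicit error, which is the behavior the pre-commit output above showed (with a different message).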

> 
> diff --git a/block/crypto.c b/block/crypto.c
> index 77871640cc..b0a4cb3388 100644
> --- a/block/crypto.c
> +++ b/block/crypto.c
> @@ -306,43 +306,29 @@ static int block_crypto_open_generic(QCryptoBlockFormat format,
>  }
>  
>  
> -static int block_crypto_create_generic(QCryptoBlockFormat format,
> -   const char *filename,
> -   QemuOpts *opts,
> -   Error **errp)
> +static int block_crypto_co_create_generic(BlockDriverState *bs,
> +  int64_t size,
> +  QCryptoBlockCreateOptions *opts,
> +  Error **errp)
>  {
> -int ret = -EINVAL;
> -QCryptoBlockCreateOptions *create_opts = NULL;
> +int ret;
> +BlockBackend *blk;
>  QCryptoBlock *crypto = NULL;
> -struct BlockCryptoCreateData data = {
> -.size = ROUND_UP(qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0),
> - BDRV_SECTOR_SIZE),
> -};
> -QDict *cryptoopts;
> -
> -/* Parse options */
> -cryptoopts = qemu_opts_to_qdict(opts, NULL);
> +struct BlockCryptoCreateData data;
>  
> -create_opts = block_crypto_create_opts_init(format, cryptoopts, errp);
> -if (!create_opts) {
> -return -1;
> -}
> +blk = blk_new(BLK_PERM_WRITE | BLK_PERM_RESIZE, BLK_PERM_ALL);
>  
> -/* Create protocol layer */
> -ret = bdrv_create_file(filename, opts, errp);
> +ret = blk_insert_bs(blk, bs, errp);
>  if (ret < 0) {
> -return ret;
> +goto cleanup;
>  }
>  
> -data.blk = blk_new_open(filename, NULL, NULL,
> -BDRV_O_RDWR | BDRV_O_RESIZE | BDRV_O_PROTOCOL,
> -errp);
> -if (!data.blk) {
> -return -EINVAL;
> -}
> +data = (struct BlockCryptoCreateData) {
> +.blk = blk,
> +.size = size,
> +};
>  
> -/* Create format layer */
> -crypto = qcrypto_block_create(create_opts, NULL,
> +crypto = qcrypto_block_create(opts, NULL,
>block_crypto_init_func,
>block_crypto_write_func,
>&data,
> @@ -355,10 +341,8 @@ static int block_crypto_create_generic(QCryptoBlockFormat format,
>  
>  ret = 0;
>   cleanup:
> -QDECREF(cryptoopts);
>  qcrypto_block_free(crypto);
> -blk_unref(data.blk);
> -qapi_free_QCryptoBlockCreateOptions(create_opts);
> +blk_unref(blk);
>  return ret;
>  }
>  
> @@ -563,8 +547,51 @@ static int coroutine_fn block_crypto_co_create_opts_luks(const char *filename,
>   QemuOpts *opts,
>   Error **errp)
>  {
> -return block_crypto_create_generic(Q_CRYPTO_BLOCK_FORMAT_LUKS,
> -   filename, opts, errp);
> +QCryptoBlockCreateOptions *create_opts = NULL;
> +BlockDriverState *bs = NULL;
> +QDict *cryptoopts;
> +int64_t size;
> +int ret;
> +
> +/* Parse options */
> +size = qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0);
> +
> +cryptoopts = qemu_opts_to_qdict_filtered(opts, NULL,
> + &block_crypto_create_opts_luks,
> + true);
> +
> +create_opts = block_crypto_create_opts_init(Q_CRYPTO_BLOCK_FORMAT_LUKS,
> +cryptoopts, errp);
> +if (!cr

Re: [Qemu-devel] [PATCH v7 1/8] vhost-user: add vhost_user_gpu_set_socket()

2019-05-14 Thread Marc-André Lureau

Hi

On Tue, May 14, 2019 at 7:38 AM Gerd Hoffmann  wrote:
>
>   Hi,
>
> > +VhostUserGpuCursorUpdate
> > +^^^^^^^^^^^^^^^^^^^^^^^^
> > +
> > ++-----+-------+-------+--------+
> > +| pos | hot_x | hot_y | cursor |
> > ++-----+-------+-------+--------+
> > +
> > +:pos: a ``VhostUserGpuCursorPos``, the cursor location
> > +
> > +:hot_x/hot_y: ``u32``, the cursor hot location
> > +
> > +:cursor: ``[u32; 64 * 64]``, 64x64 RGBA cursor data
>
> Should clarify here what exactly RGBA is. (PIXMAN_a8r8g8b8 I guess).

ok

>
> > +VhostUserGpuUpdate
> > +^^^^^^^^^^^^^^^^^^
> > +
> > ++------------+---+---+---+---+------+
> > +| scanout-id | x | y | w | h | data |
> > ++------------+---+---+---+---+------+
> > +
> > +:scanout-id: ``u32``, the scanout content to update
> > +
> > +:x/y/w/h: ``u32``, region of the update
> > +
> > +:data: RGBA data (the size is computed based on the region size, and
> > +   the request type)
>
> Likewise.  Also: alpha channel for the framebuffer?

It is actually PIXMAN_x8r8g8b8, fixed

>
> > +C structure
> > +-----------
> > +
> > +In QEMU the vhost-user-gpu message is implemented with the following struct:
> > +
> > +.. code:: c
> > +
> > +  typedef struct VhostUserGpuMsg {
> > +  uint32_t request; /* VhostUserGpuRequest */
> > +  uint32_t flags;
> > +  uint32_t size; /* the following payload size */
>
> uint32_t padding;
>
> > +  union {
> > +  VhostUserGpuCursorPos cursor_pos;
> > +  VhostUserGpuCursorUpdate cursor_update;
> > +  VhostUserGpuScanout scanout;
> > +  VhostUserGpuUpdate update;
> > +  VhostUserGpuDMABUFScanout dmabuf_scanout;
> > +  struct virtio_gpu_resp_display_info display_info;
> > +  uint64_t u64;
>
> ... so this 64bit value will be aligned.

vhost-user didn't bother. Should we?


-- 
Marc-André Lureau
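
To make the alignment point above concrete, here is a minimal sketch with simplified, hypothetical stand-ins for the message header (only two union members kept). It assumes the message is declared packed, using the GCC/Clang `__attribute__((packed))` attribute; that matches common vhost-user practice but is an assumption here:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header without the extra field: three u32, then the union. */
typedef struct __attribute__((packed)) {
    uint32_t request;
    uint32_t flags;
    uint32_t size;
    union {
        uint32_t u32;
        uint64_t u64;
    } payload;
} MsgNoPadding;

/* Same header with the 'uint32_t padding' suggested in review. */
typedef struct __attribute__((packed)) {
    uint32_t request;
    uint32_t flags;
    uint32_t size;
    uint32_t padding;
    union {
        uint32_t u32;
        uint64_t u64;
    } payload;
} MsgWithPadding;

/* Without the pad the payload starts at byte 12, so a u64 member is not
 * 8-byte aligned even when the message buffer itself is. */
static_assert(offsetof(MsgNoPadding, payload) == 12, "payload at byte 12");
/* With the pad the payload starts at byte 16 and the u64 is aligned. */
static_assert(offsetof(MsgWithPadding, payload) % 8 == 0, "u64 aligned");
```

Adding the field changes the message layout on the wire, so it is easiest to do before the protocol is in use.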

[Qemu-devel] [PATCH v17 04/10] acpi: add build_append_ghes_generic_data() helper for Generic Error Data Entry

2019-05-14 Thread Dongjiu Geng

This helper adds a Generic Error Data Entry to ACPI tables
without using packed C structures, which avoids endianness
issues because the API needs no explicit conversion.

Signed-off-by: Dongjiu Geng 
---
 hw/acpi/aml-build.c | 32 
 include/hw/acpi/aml-build.h |  6 ++
 2 files changed, 38 insertions(+)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index fb53f21..102a288 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -296,6 +296,38 @@ void build_append_ghes_notify(GArray *table, const uint8_t type,
 build_append_int_noprefix(table, error_threshold_window, 4);
 }
 
+/* Generic Error Data Entry
+ * ACPI 4.0: 17.3.2.6.1 Generic Error Data
+ */
+void build_append_ghes_generic_data(GArray *table, const char *section_type,
+uint32_t error_severity, uint16_t revision,
+uint8_t validation_bits, uint8_t flags,
+uint32_t error_data_length, uint8_t *fru_id,
+uint8_t *fru_text, uint64_t time_stamp)
+{
+int i;
+
+for (i = 0; i < 16; i++) {
+build_append_int_noprefix(table, section_type[i], 1);
+}
+
+build_append_int_noprefix(table, error_severity, 4);
+build_append_int_noprefix(table, revision, 2);
+build_append_int_noprefix(table, validation_bits, 1);
+build_append_int_noprefix(table, flags, 1);
+build_append_int_noprefix(table, error_data_length, 4);
+
+for (i = 0; i < 16; i++) {
+build_append_int_noprefix(table, fru_id[i], 1);
+}
+
+for (i = 0; i < 20; i++) {
+build_append_int_noprefix(table, fru_text[i], 1);
+}
+
+build_append_int_noprefix(table, time_stamp, 8);
+}
+
 /*
  * Build NAME(, 0x) where 0x is encoded as a dword,
  * and return the offset to 0x for runtime patching.
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index 90c8ef8..a71db2f 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -419,6 +419,12 @@ void build_append_ghes_notify(GArray *table, const uint8_t type,
   uint32_t error_threshold_value,
   uint32_t error_threshold_window);
 
+void build_append_ghes_generic_data(GArray *table, const char *section_type,
+uint32_t error_severity, uint16_t revision,
+uint8_t validation_bits, uint8_t flags,
+uint32_t error_data_length, uint8_t *fru_id,
+uint8_t *fru_text, uint64_t time_stamp);
+
 void build_srat_memory(AcpiSratMemoryAffinity *numamem, uint64_t base,
uint64_t len, int node, MemoryAffinityFlags flags);
 
-- 
1.8.3.1
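
The endianness claim in this patch rests on build_append_int_noprefix() appending each value least-significant byte first, as we understand its implementation. The sketch below uses a hypothetical stand-in, `append_le()`, to lay out the fixed-size fields in the same order as build_append_ghes_generic_data() (GUID and FRU fields zeroed). It also shows that the entry is 16 + 4 + 2 + 1 + 1 + 4 + 16 + 20 + 8 = 72 bytes:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for build_append_int_noprefix(): append the 'size'
 * low-order bytes of 'value', least-significant byte first, independent of
 * the host byte order. Returns the new length of 'buf'. */
static size_t append_le(uint8_t *buf, size_t len, uint64_t value,
                        unsigned size)
{
    assert(size >= 1 && size <= 8);
    for (unsigned i = 0; i < size; i++) {
        buf[len++] = (uint8_t)(value >> (8 * i));
    }
    return len;
}

/* Append 'count' zero bytes (used for the GUID and FRU fields here). */
static size_t append_zeros(uint8_t *buf, size_t len, unsigned count)
{
    while (count--) {
        buf[len++] = 0;
    }
    return len;
}

/* Same field order as build_append_ghes_generic_data(); 'buf' must hold
 * at least 72 bytes. */
static size_t build_generic_data_sketch(uint8_t *buf, uint32_t severity,
                                        uint16_t revision, uint32_t data_len,
                                        uint64_t time_stamp)
{
    size_t len = 0;

    len = append_zeros(buf, len, 16);          /* section_type (GUID) */
    len = append_le(buf, len, severity, 4);    /* error_severity */
    len = append_le(buf, len, revision, 2);    /* revision */
    len = append_le(buf, len, 0, 1);           /* validation_bits */
    len = append_le(buf, len, 0, 1);           /* flags */
    len = append_le(buf, len, data_len, 4);    /* error_data_length */
    len = append_zeros(buf, len, 16);          /* fru_id */
    len = append_zeros(buf, len, 20);          /* fru_text */
    len = append_le(buf, len, time_stamp, 8);  /* time_stamp */
    return len;
}
```

Because every field goes through the same byte-wise helper, the output is identical on little- and big-endian hosts.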

[Qemu-devel] [PATCH v17 01/10] hw/arm/virt: Add RAS platform version for migration

2019-05-14 Thread Dongjiu Geng

RAS is supported starting with the 4.1 machine version; disable it
by default for older machine versions.

Signed-off-by: Dongjiu Geng 
---
 hw/arm/virt.c | 6 ++
 include/hw/arm/virt.h | 1 +
 2 files changed, 7 insertions(+)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 5331ab7..7bdd41b 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -2043,8 +2043,14 @@ DEFINE_VIRT_MACHINE_AS_LATEST(4, 1)
 
 static void virt_machine_4_0_options(MachineClass *mc)
 {
+VirtMachineClass *vmc = VIRT_MACHINE_CLASS(OBJECT_CLASS(mc));
+
 virt_machine_4_1_options(mc);
 compat_props_add(mc->compat_props, hw_compat_4_0, hw_compat_4_0_len);
+/* Disable memory recovery feature for 4.0 as RAS support was
+ * introduced with 4.1.
+ */
+vmc->no_ras = true;
 }
 DEFINE_VIRT_MACHINE(4, 0)
 
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index 4240709..7f1a033 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -104,6 +104,7 @@ typedef struct {
 bool disallow_affinity_adjustment;
 bool no_its;
 bool no_pmu;
+bool no_ras;
 bool claim_edge_triggered_timers;
 bool smbios_old_sys_ver;
 bool no_highmem_ecam;
-- 
1.8.3.1

[Qemu-devel] [PATCH v17 03/10] acpi: add build_append_ghes_notify() helper for Hardware Error Notification

2019-05-14 Thread Dongjiu Geng

This helper adds a Hardware Error Notification structure to ACPI
tables without using packed C structures, which avoids endianness
issues because the API needs no explicit conversion.

Signed-off-by: Dongjiu Geng 
---
 hw/acpi/aml-build.c | 22 ++
 include/hw/acpi/aml-build.h |  8 
 2 files changed, 30 insertions(+)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index 555c24f..fb53f21 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -274,6 +274,28 @@ void build_append_gas(GArray *table, AmlAddressSpace as,
 build_append_int_noprefix(table, address, 8);
 }
 
+/* Hardware Error Notification
+ * ACPI 4.0: 17.3.2.7 Hardware Error Notification
+ */
+void build_append_ghes_notify(GArray *table, const uint8_t type,
+  uint8_t length, uint16_t config_write_enable,
+  uint32_t poll_interval, uint32_t vector,
+  uint32_t polling_threshold_value,
+  uint32_t polling_threshold_window,
+  uint32_t error_threshold_value,
+  uint32_t error_threshold_window)
+{
+build_append_int_noprefix(table, type, 1); /* type */
+build_append_int_noprefix(table, length, 1);
+build_append_int_noprefix(table, config_write_enable, 2);
+build_append_int_noprefix(table, poll_interval, 4);
+build_append_int_noprefix(table, vector, 4);
+build_append_int_noprefix(table, polling_threshold_value, 4);
+build_append_int_noprefix(table, polling_threshold_window, 4);
+build_append_int_noprefix(table, error_threshold_value, 4);
+build_append_int_noprefix(table, error_threshold_window, 4);
+}
+
 /*
  * Build NAME(, 0x) where 0x is encoded as a dword,
  * and return the offset to 0x for runtime patching.
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index 1a563ad..90c8ef8 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -411,6 +411,14 @@ build_append_gas_from_struct(GArray *table, const struct AcpiGenericAddress *s)
  s->access_width, s->address);
 }
 
+void build_append_ghes_notify(GArray *table, const uint8_t type,
+  uint8_t length, uint16_t config_write_enable,
+  uint32_t poll_interval, uint32_t vector,
+  uint32_t polling_threshold_value,
+  uint32_t polling_threshold_window,
+  uint32_t error_threshold_value,
+  uint32_t error_threshold_window);
+
 void build_srat_memory(AcpiSratMemoryAffinity *numamem, uint64_t base,
uint64_t len, int node, MemoryAffinityFlags flags);
 
-- 
1.8.3.1
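
Per the field widths in build_append_ghes_notify(), the Hardware Error Notification structure is 1 + 1 + 2 + 6 * 4 = 28 bytes, which is presumably the value a caller passes as `length`. The sketch below checks that arithmetic with a hypothetical byte-wise little-endian `append_le()` helper standing in for build_append_int_noprefix() (not QEMU code). Type 8 is ACPI_HEST_NOTIFY_SEA from the enum in this series:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for build_append_int_noprefix(). */
static size_t append_le(uint8_t *buf, size_t len, uint64_t value,
                        unsigned size)
{
    assert(size >= 1 && size <= 8);
    for (unsigned i = 0; i < size; i++) {
        buf[len++] = (uint8_t)(value >> (8 * i));
    }
    return len;
}

/* Same field order and widths as build_append_ghes_notify(), with every
 * field after Length left at zero. 'buf' must hold at least 28 bytes. */
static size_t build_notify_sketch(uint8_t *buf, uint8_t type, uint8_t length)
{
    size_t len = 0;

    len = append_le(buf, len, type, 1);     /* type */
    len = append_le(buf, len, length, 1);   /* length */
    len = append_le(buf, len, 0, 2);        /* config_write_enable */
    len = append_le(buf, len, 0, 4);        /* poll_interval */
    len = append_le(buf, len, 0, 4);        /* vector */
    len = append_le(buf, len, 0, 4);        /* polling_threshold_value */
    len = append_le(buf, len, 0, 4);        /* polling_threshold_window */
    len = append_le(buf, len, 0, 4);        /* error_threshold_value */
    len = append_le(buf, len, 0, 4);        /* error_threshold_window */
    return len;
}
```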

[Qemu-devel] [PATCH v17 06/10] docs: APEI GHES generation and CPER record description

2019-05-14 Thread Dongjiu Geng

Add APEI/GHES detailed design document

Signed-off-by: Dongjiu Geng 
---
 docs/specs/acpi_hest_ghes.txt | 97 +++
 1 file changed, 97 insertions(+)
 create mode 100644 docs/specs/acpi_hest_ghes.txt

diff --git a/docs/specs/acpi_hest_ghes.txt b/docs/specs/acpi_hest_ghes.txt
new file mode 100644
index 000..fbfc787
--- /dev/null
+++ b/docs/specs/acpi_hest_ghes.txt
@@ -0,0 +1,97 @@
+APEI tables generating and CPER record
+======================================
+
+Copyright (C) 2017 HuaWei Corporation.
+
+Design Details:
+---------------
+
+   etc/acpi/tables                etc/hardware_errors
+   ===============                ===================================================
+   HEST                           address registers        Error Status Data Blocks
+   GHES1 error_status_address --> error_block_address1 --> Data Block 1: CPER, CPER, ...
+   GHES1 read_ack_register    --> read_ack_register1
+   GHES2 error_status_address --> error_block_address2 --> Data Block 2: CPER, CPER, ...
+   GHES2 read_ack_register    --> read_ack_register2
+   ...                            ...                      ...
+   GHESN error_status_address --> error_block_addressN --> Data Block N: CPER, CPER, ...
+   GHESN read_ack_register    --> read_ack_registerN
+
+   Each GHESx entry in HEST also has read_ack_preserve and read_ack_write
+   fields. The address registers hold error_block_address1..N followed by
+   read_ack_register1..N.
+
+(1) QEMU generates the ACPI HEST table. This table goes in the current
+"etc/acpi/tables" fw_cfg blob. Each error source has a different
+notification type.
+
+(2) A new fw_cfg blob called "etc/hardware_errors" is introduced. QEMU
+also needs to populate this blob. The "etc/hardware_errors" fw_cfg blob
+contains one address registers table and one Error Status Data Block
+table, all of which are pre-allocated.
+
+(3) The address registers table contains N Error Block Address entries
+and N Read Ack Address entries; each entry is 8 bytes.
+The Error Status Data Block table contains N Error Status Data Block
+entries; each entry is 4096 (0x1000) bytes. The total size
+of the "etc/hardware_errors" fw_cfg blob is (N * 8 * 2 + N * 4096) bytes.
+
+(4) QEMU generates the ACPI linker/loader script for the firmware.
+
+(4a) The HEST table is part of "etc/acpi/tables"; the firmware already
+allocates memory for it, because QEMU already generates an ALLOCATE
+linker/loader command for it.
+
+(4b) QEMU creates another ALLOCATE command for the "etc/hardware_errors"
+blob. The firmware allocates memory for this blob and downloads it.
+
+(5) QEMU generates N ADD_POINTER commands, which patch addresses in the
+"error_status_address" fields of the HEST table with a pointer to the
+corresponding "address registers" in th

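As a worked example of the size formula in step (3) of the design document, here is a small sketch. `hardware_errors_blob_size()`, `status_block_offset()` and `read_ack_offset()` are hypothetical helpers. The layout order (all error block addresses, then all read ack registers, then the status blocks) is taken from build_hardware_error_table() in patch 07 of this series. With N = 2 error sources the blob is 2 * 8 * 2 + 2 * 4096 = 32 + 8192 = 8224 bytes:

```c
#include <assert.h>
#include <stdint.h>

#define ADDR_ENTRY_SIZE   8u     /* one Error Block Address or Read Ack entry */
#define STATUS_BLOCK_SIZE 4096u  /* one Error Status Data Block (0x1000) */

/* Total size of "etc/hardware_errors" for n error sources:
 * n error block addresses + n read ack registers + n status blocks. */
static uint64_t hardware_errors_blob_size(uint32_t n)
{
    return (uint64_t)n * ADDR_ENTRY_SIZE * 2 + (uint64_t)n * STATUS_BLOCK_SIZE;
}

/* Offset of the read ack register for error source i. */
static uint64_t read_ack_offset(uint32_t n, uint32_t i)
{
    assert(i < n);
    return (uint64_t)n * ADDR_ENTRY_SIZE + (uint64_t)i * ADDR_ENTRY_SIZE;
}

/* Offset of the Error Status Data Block for error source i. */
static uint64_t status_block_offset(uint32_t n, uint32_t i)
{
    assert(i < n);
    return (uint64_t)n * ADDR_ENTRY_SIZE * 2 + (uint64_t)i * STATUS_BLOCK_SIZE;
}
```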
[Qemu-devel] [PATCH v17 00/10] Add ARMv8 RAS virtualization support in QEMU

2019-05-14 Thread Dongjiu Geng

On the ARMv8 platform, the CPU error types are Synchronous External
Abort (SEA) and SError Interrupt (SEI). If an exception happens in the guest,
it is sometimes better for the guest itself to do the recovery, because the host
does not know the guest's detailed information. For example, if a guest
user-space application encounters an exception, the host does not know which
application encountered the error.

For the ARMv8 SEA/SEI, KVM or the host kernel delivers SIGBUS to notify user
space. After user space gets the notification, it records the CPER
into the guest GHES buffer and injects an exception or IRQ into the guest.

In the current implementation, if the SIGBUS is BUS_MCEERR_AR, we treat it
as a synchronous exception and use the ARMv8 SEA notification type
to notify the guest after recording the CPER for it.

These patches are based on QEMU 4.0 and have two parts:
1. Generate the APEI/GHES tables.
2. Handle the SIGBUS signal, record the CPER at runtime and fill it into guest memory,
   then notify the guest according to the SIGBUS type.

The overall solution was suggested by James (james.mo...@arm.com); the APEI part was
suggested by Laszlo (ler...@redhat.com). Some of the discussion is shown in [1].


These patches have been tested on an ARM64 platform with the RAS feature enabled:
the APEI part verification result is shown in [2], and
the BUS_MCEERR_AR SIGBUS handling verification result is shown in [3].

---
Change since v16:
1. Check whether the ACPI table is enabled when handling the memory error in the SIGBUS handler.

Change since v15:
1. Add a doc-comment in the proper format for 'include/exec/ram_addr.h'
2. Remove write_part_cpustate_to_list() because another bug-fix patch,
   "arm: Allow system registers for KVM guests to be changed by QEMU code", has been merged
3. Add some comments for kvm_inject_arm_sea() in 'target/arm/kvm64.c'
4. Compare the arm_current_el() return value to 0,1,2,3, not to PSTATE_MODE_* constants.
5. Disable RAS support for machine versions before QEMU 4.1, since it was not introduced until 4.1.
6. Move the no_ras flag patch to the beginning of this series

Change since v14:
1. Remove the BUS_MCEERR_AO handling logic because this asynchronous signal is masked by the main thread
2. Address some of Igor Mammedov's comments (ACPI part)
   1) change the comments for the enum AcpiHestNotifyType definition and remove ditto in patch 1
   2) change some patch commit messages and split the "APEI GHES table generation" patch into more patches.
3. Address some of Peter's comments (arm64 Synchronous External Abort injection)
   1) change some code notes
   2) use arm_current_el() for the current EL
   3) use the helper functions for those (syn_data_abort_*).

Change since v13:
1. Move the patches that set guest ESR and inject virtual SError out of this series
2. Clean and optimize the APEI part patches
3. Update the commit messages and add some comments for the code

Change since v12:
1. Address Paolo's comments to move HWPoisonPage definition to accel/kvm/kvm-all.c
2. Only call kvm_cpu_synchronize_state() when receiving the BUS_MCEERR_AR signal
3. Only add and enable two hardware error sources: GPIO-Signal and ARMv8 SEA
4. Address Michael's comments to not sync SPDX from Linux kernel header file

Change since v11:
Address James's comments (james.mo...@arm.com)
1. Check whether KVM has the capability to set ESR instead of detecting host CPU RAS capability
2. For SIGBUS_MCEERR_AR SIGBUS, use the Synchronous-External-Abort (SEA) notification type;
   for SIGBUS_MCEERR_AO SIGBUS, use GPIO-Signal notification


Address Shannon's comments (for ACPI part):
1. Unify hest_ghes.c and hest_ghes.h license declaration
2. Remove unnecessary inclusion of "qmp-commands.h" in hest_ghes.c
3. Unconditionally add guest APEI table based on James's comments (james.mo...@arm.com)
4. Add an option to virt machine for migration compatibility. On new virt machines it is on
   by default, while it is off for old ones; we enabled it starting with 2.12
5. Refer to the ACPI spec version which first introduced Hardware Error Notification
6. Add ACPI_HEST_NOTIFY_RESERVED notification type

Address Igor's comments (for ACPI part):
1. Add a doc patch first, which describes how it is supposed to work between QEMU/firmware/guest
   OS with expected flows.
2. Move APEI diagrams into doc/spec patch
3. Remove redundant g_malloc in ghes_record_cper()
4. Use build_append_int_noprefix() API to compose the whole error status block and the whole APEI table,
   and try to get rid of most structures in patch 1, as they will be left unused after that
5. Reuse something like https://github.com/imammedo/qemu/commit/3d2fd6d13a3ea298d2ee814835495ce6241d085c
   to build GAS
6. Remove most uses of offsetof() in the function
7. Build independent tables first and only then build dependent tables, passing them pointers
   to previously built tables if necessary.
8. Redefine macro GHES_ACPI_HEST_NOTIFY_RESERVED to ACPI_HEST_ERROR_SOURCE_COUNT to avoid confusion


Address Peter Maydell's comments
1. linux-headers is done as a patch of their own created using scripts/update-linux-headers.sh run ag

[Qemu-devel] [PATCH v17 02/10] ACPI: add some GHES structures and macros definition

2019-05-14 Thread Dongjiu Geng

Add Generic Error Status Block structures and some macro
definitions, based on ACPI 4.0 and ACPI 6.2. The HEST table
generation and CPER recording code will use them.

Signed-off-by: Dongjiu Geng 
---
 include/hw/acpi/acpi-defs.h | 52 +
 1 file changed, 52 insertions(+)

diff --git a/include/hw/acpi/acpi-defs.h b/include/hw/acpi/acpi-defs.h
index f9aa4bd..d1996fb 100644
--- a/include/hw/acpi/acpi-defs.h
+++ b/include/hw/acpi/acpi-defs.h
@@ -224,6 +224,25 @@ typedef struct AcpiMultipleApicTable AcpiMultipleApicTable;
 #define ACPI_APIC_RESERVED  16   /* 16 and greater are reserved */
 
 /*
+ * Values for Hardware Error Notification Type field
+ */
+enum AcpiHestNotifyType {
+ACPI_HEST_NOTIFY_POLLED = 0,
+ACPI_HEST_NOTIFY_EXTERNAL = 1,
+ACPI_HEST_NOTIFY_LOCAL = 2,
+ACPI_HEST_NOTIFY_SCI = 3,
+ACPI_HEST_NOTIFY_NMI = 4,
+ACPI_HEST_NOTIFY_CMCI = 5,  /* ACPI 5.0: 18.3.2.7, Table 18-290 */
+ACPI_HEST_NOTIFY_MCE = 6,   /* ACPI 5.0: 18.3.2.7, Table 18-290 */
+ACPI_HEST_NOTIFY_GPIO = 7,  /* ACPI 6.0: 18.3.2.7, Table 18-332 */
+ACPI_HEST_NOTIFY_SEA = 8,   /* ACPI 6.1: 18.3.2.9, Table 18-345 */
+ACPI_HEST_NOTIFY_SEI = 9,   /* ACPI 6.1: 18.3.2.9, Table 18-345 */
+ACPI_HEST_NOTIFY_GSIV = 10, /* ACPI 6.1: 18.3.2.9, Table 18-345 */
+ACPI_HEST_NOTIFY_SDEI = 11, /* ACPI 6.2: 18.3.2.9, Table 18-383 */
+ACPI_HEST_NOTIFY_RESERVED = 12 /* 12 and greater are reserved */
+};
+
+/*
  * MADT sub-structures (Follow MULTIPLE_APIC_DESCRIPTION_TABLE)
  */
 #define ACPI_SUB_HEADER_DEF   /* Common ACPI sub-structure header */\
@@ -400,6 +419,39 @@ struct AcpiSystemResourceAffinityTable {
 } QEMU_PACKED;
 typedef struct AcpiSystemResourceAffinityTable AcpiSystemResourceAffinityTable;
 
+/*
+ * Generic Error Status Block
+ */
+struct AcpiGenericErrorStatus {
+/* It is a bitmask composed of ACPI_GEBS_xxx macros */
+uint32_t block_status;
+uint32_t raw_data_offset;
+uint32_t raw_data_length;
+uint32_t data_length;
+uint32_t error_severity;
+} QEMU_PACKED;
+typedef struct AcpiGenericErrorStatus AcpiGenericErrorStatus;
+
+/*
+ * Masks for block_status flags above
+ */
+#define ACPI_GEBS_UNCORRECTABLE 1
+
+/*
+ * Values for error_severity field above
+ */
+enum AcpiGenericErrorSeverity {
+ACPI_CPER_SEV_RECOVERABLE,
+ACPI_CPER_SEV_FATAL,
+ACPI_CPER_SEV_CORRECTED,
+ACPI_CPER_SEV_NONE,
+};
+
+/*
+ * Generic Hardware Error Source version 2
+ */
+#define ACPI_HEST_SOURCE_GENERIC_ERROR_V2 10
+
 #define ACPI_SRAT_PROCESSOR_APIC 0
 #define ACPI_SRAT_MEMORY 1
 #define ACPI_SRAT_PROCESSOR_x2APIC   2
-- 
1.8.3.1
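
A quick way to check that the packed structure in this patch matches the ACPI layout of five 4-byte fields (20 bytes) is a compile-time size check. This is a standalone sketch: `__attribute__((packed))` (GCC/Clang) stands in for QEMU_PACKED, and `gebs_is_uncorrectable()` is a hypothetical helper that tests the ACPI_GEBS_UNCORRECTABLE bit (value 1):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same fields as AcpiGenericErrorStatus. */
struct GenericErrorStatusSketch {
    uint32_t block_status;      /* bitmask, bit 0 = uncorrectable */
    uint32_t raw_data_offset;
    uint32_t raw_data_length;
    uint32_t data_length;
    uint32_t error_severity;
} __attribute__((packed));

/* Five 4-byte fields, no padding: the same 20 bytes that
 * build_append_ghes_generic_status() appends field by field. */
static_assert(sizeof(struct GenericErrorStatusSketch) == 20,
              "five 4-byte fields");

static int gebs_is_uncorrectable(const struct GenericErrorStatusSketch *s)
{
    return (s->block_status & 1u) != 0;
}
```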

[Qemu-devel] [PATCH v17 07/10] ACPI: Add APEI GHES table generation support

2019-05-14 Thread Dongjiu Geng

This implements APEI GHES table generation via fw_cfg blobs.
For now it only supports two types of GHESv2 error source: GPIO-Signal
and ARMv8 SEA. The supported types can be extended later if needed. The
CPER section type is currently the memory section, because the kernel
mainly wants user space to handle memory errors.

This patch follows the ACPI 6.2 spec to build the Hardware Error Source
Table; for detailed information, please refer to
docs/specs/acpi_hest_ghes.txt.

Suggested-by: Laszlo Ersek 
Signed-off-by: Dongjiu Geng 
---
 default-configs/arm-softmmu.mak |   1 +
 hw/acpi/Kconfig |   4 +
 hw/acpi/Makefile.objs   |   1 +
 hw/acpi/acpi_ghes.c | 171 
 hw/acpi/aml-build.c |   2 +
 hw/arm/virt-acpi-build.c|  12 +++
 include/hw/acpi/acpi_ghes.h |  79 +++
 include/hw/acpi/aml-build.h |   1 +
 8 files changed, 271 insertions(+)
 create mode 100644 hw/acpi/acpi_ghes.c
 create mode 100644 include/hw/acpi/acpi_ghes.h

diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
index 613d19a..7b33ae9 100644
--- a/default-configs/arm-softmmu.mak
+++ b/default-configs/arm-softmmu.mak
@@ -160,3 +160,4 @@ CONFIG_MUSICPAL=y
 
 # for realview and versatilepb
 CONFIG_LSI_SCSI_PCI=y
+CONFIG_ACPI_APEI=y
diff --git a/hw/acpi/Kconfig b/hw/acpi/Kconfig
index eca3bee..5228a4b 100644
--- a/hw/acpi/Kconfig
+++ b/hw/acpi/Kconfig
@@ -23,6 +23,10 @@ config ACPI_NVDIMM
 bool
 depends on ACPI
 
+config ACPI_APEI
+bool
+depends on ACPI
+
 config ACPI_VMGENID
 bool
 default y
diff --git a/hw/acpi/Makefile.objs b/hw/acpi/Makefile.objs
index 2d46e37..5099ada 100644
--- a/hw/acpi/Makefile.objs
+++ b/hw/acpi/Makefile.objs
@@ -6,6 +6,7 @@ common-obj-$(CONFIG_ACPI_MEMORY_HOTPLUG) += memory_hotplug.o
 common-obj-$(CONFIG_ACPI_CPU_HOTPLUG) += cpu.o
 common-obj-$(CONFIG_ACPI_NVDIMM) += nvdimm.o
 common-obj-$(CONFIG_ACPI_VMGENID) += vmgenid.o
+common-obj-$(CONFIG_ACPI_APEI) += acpi_ghes.o
 common-obj-$(call lnot,$(CONFIG_ACPI_X86)) += acpi-stub.o
 
 common-obj-y += acpi_interface.o
diff --git a/hw/acpi/acpi_ghes.c b/hw/acpi/acpi_ghes.c
new file mode 100644
index 000..d03e797
--- /dev/null
+++ b/hw/acpi/acpi_ghes.c
@@ -0,0 +1,171 @@
+/* Support for generating APEI tables and recording CPER for guests
+ *
+ * Copyright (C) 2017 HuaWei Corporation.
+ *
+ * Author: Dongjiu Geng 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see .
+ */
+
+#include "qemu/osdep.h"
+#include "hw/acpi/acpi.h"
+#include "hw/acpi/aml-build.h"
+#include "hw/acpi/acpi_ghes.h"
+#include "hw/nvram/fw_cfg.h"
+#include "sysemu/sysemu.h"
+#include "qemu/error-report.h"
+
+/* Build table for the hardware error fw_cfg blob */
+void build_hardware_error_table(GArray *hardware_errors, BIOSLinker *linker)
+{
+int i;
+
+/*
+ * | +--+
+ * | |error_block_address   |
+ * | |  ..  |
+ * | +--+
+ * | |read_ack_register |
+ * | | ...  |
+ * | +--+
+ * | |  Error Status Data Block |
+ * | |  |
+ * | +--+
+ */
+
+/* Build error_block_address */
+build_append_int_noprefix((void *)hardware_errors, 0,
+GHES_ADDRESS_SIZE * ACPI_HEST_ERROR_SOURCE_COUNT);
+
+/* Build read_ack_register */
+for (i = 0; i < ACPI_HEST_ERROR_SOURCE_COUNT; i++) {
+/* Initialize the value of read_ack_register to 1, so that the GHES is
+ * writable the first time
+ */
+build_append_int_noprefix((void *)hardware_errors, 1, GHES_ADDRESS_SIZE);
+}
+
+ /* Build Error Status Data Block */
+build_append_int_noprefix((void *)hardware_errors, 0,
+GHES_MAX_RAW_DATA_LENGTH * ACPI_HEST_ERROR_SOURCE_COUNT);
+
+/* Allocate guest memory for the hardware error fw_cfg blob */
+bios_linker_loader_alloc(linker, GHES_ERRORS_FW_CFG_FILE, hardware_errors,
+1, false);
+}
+
+/* Build Hardware Error Source Table */
+void build_apei_hest(GArray *table_data, GArray *hardware_errors,
+BIOSLinker *linker)
+{
+uint32_t i, error_status_block_offset, length = table_data->len;
+
+/* Reser

[Qemu-devel] [PATCH v17 05/10] acpi: add build_append_ghes_generic_status() helper for Generic Error Status Block

2019-05-14 Thread Dongjiu Geng

This helper adds a Generic Error Status Block to ACPI tables
without using packed C structures, which avoids endianness
issues because the API needs no explicit conversion.

Signed-off-by: Dongjiu Geng 
---
 hw/acpi/aml-build.c | 14 ++
 include/hw/acpi/aml-build.h |  6 ++
 2 files changed, 20 insertions(+)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index 102a288..ce90970 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -296,6 +296,20 @@ void build_append_ghes_notify(GArray *table, const uint8_t type,
 build_append_int_noprefix(table, error_threshold_window, 4);
 }
 
+/* Generic Error Status Block
+ * ACPI 4.0: 17.3.2.6.1 Generic Error Data
+ */
+void build_append_ghes_generic_status(GArray *table, uint32_t block_status,
+  uint32_t raw_data_offset, uint32_t raw_data_length,
+  uint32_t data_length, uint32_t error_severity)
+{
+build_append_int_noprefix(table, block_status, 4);
+build_append_int_noprefix(table, raw_data_offset, 4);
+build_append_int_noprefix(table, raw_data_length, 4);
+build_append_int_noprefix(table, data_length, 4);
+build_append_int_noprefix(table, error_severity, 4);
+}
+
 /* Generic Error Data Entry
  * ACPI 4.0: 17.3.2.6.1 Generic Error Data
  */
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index a71db2f..1ec7e1b 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -425,6 +425,12 @@ void build_append_ghes_generic_data(GArray *table, const char *section_type,
 uint32_t error_data_length, uint8_t *fru_id,
 uint8_t *fru_text, uint64_t time_stamp);
 
+void
+build_append_ghes_generic_status(GArray *table, uint32_t block_status,
+ uint32_t raw_data_offset,
+ uint32_t raw_data_length,
+ uint32_t data_length, uint32_t error_severity);
+
 void build_srat_memory(AcpiSratMemoryAffinity *numamem, uint64_t base,
uint64_t len, int node, MemoryAffinityFlags flags);
 
-- 
1.8.3.1

Re: [Qemu-devel] [PATCH v16 10/10] target-arm: kvm64: handle SIGBUS signal from kernel or KVM

2019-05-14 Thread gengdongjiu



> 
>> +void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
>> +{
>> +ARMCPU *cpu = ARM_CPU(c);
>> +CPUARMState *env = &cpu->env;
>> +ram_addr_t ram_addr;
>> +hwaddr paddr;
>> +
>> +assert(code == BUS_MCEERR_AR || code == BUS_MCEERR_AO);
>> +
>> +if (addr) {
>> +ram_addr = qemu_ram_addr_from_host(addr);
>> +if (ram_addr != RAM_ADDR_INVALID &&
>> +kvm_physical_memory_addr_from_host(c->kvm_state, addr, &paddr)) {
>> +kvm_hwpoison_page_add(ram_addr);
>> +/* Asynchronous signal will be masked by main thread, so
>> + * only handle synchronous signal.
>> + */
>> +if (code == BUS_MCEERR_AR) {
>> +kvm_cpu_synchronize_state(c);
>> +if (GHES_CPER_FAIL != ghes_record_errors(ACPI_HEST_NOTIFY_SEA, paddr)) {
>> +kvm_inject_arm_sea(c);
>> +} else {
>> +fprintf(stderr, "failed to record the error\n");
>> +}
>> +}
>> +return;
>> +}
>> +fprintf(stderr, "Hardware memory error for memory used by "
>> +"QEMU itself instead of guest system!\n");
>> +}
>> +
>> +if (code == BUS_MCEERR_AR) {
>> +fprintf(stderr, "Hardware memory error!\n");
>> +exit(1);
>> +}
>> +}
> 
> This code appears to still be unconditionally trying to
> notify the guest of the error via the ACPI tables without
> checking whether those ACPI tables even exist. I told you
> about this in a previous round of review :-(

Thanks very much for the comments, and sorry for forgetting.
I added the ACPI check in the new v17 version.

> 
> thanks
> -- PMM
> .
>
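The control flow of the quoted handler can be summarised as a small decision function. This is a sketch under stated assumptions, not the v17 code: `MceCode`, `SigbusAction` and `sigbus_action()` are hypothetical names, and the `ghes_present` flag models the ACPI-table check the reviewer asked for. How the no-GHES case should be handled is an assumption; the sketch treats it like a failed CPER record (report, do not inject).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the BUS_MCEERR_AR / BUS_MCEERR_AO si_codes. */
typedef enum { MCE_CODE_AR, MCE_CODE_AO } MceCode;

typedef enum {
    SIGBUS_IGNORE,       /* nothing more to do (page already queued if guest RAM) */
    SIGBUS_INJECT_SEA,   /* AR on guest RAM and CPER recorded: inject SEA */
    SIGBUS_REPORT_ONLY,  /* AR on guest RAM, but no GHES tables or record failed */
    SIGBUS_EXIT,         /* AR on memory QEMU itself uses: fatal */
} SigbusAction;

/*
 * addr_in_guest_ram: host VA translated to a guest PA successfully
 * ghes_present:      guest ACPI tables contain HEST/GHES (assumed check)
 * record_ok:         the CPER was written to guest memory
 */
static SigbusAction sigbus_action(MceCode code, bool addr_in_guest_ram,
                                  bool ghes_present, bool record_ok)
{
    assert(code == MCE_CODE_AR || code == MCE_CODE_AO);

    if (addr_in_guest_ram) {
        /* Asynchronous (AO) errors are only queued for remapping. */
        if (code != MCE_CODE_AR) {
            return SIGBUS_IGNORE;
        }
        if (ghes_present && record_ok) {
            return SIGBUS_INJECT_SEA;
        }
        return SIGBUS_REPORT_ONLY;
    }
    /* NULL address or memory used by QEMU itself. */
    return code == MCE_CODE_AR ? SIGBUS_EXIT : SIGBUS_IGNORE;
}
```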

Re: [Qemu-devel] [PATCH] migration: comment VMSTATE_UNUSED*() properly

2019-05-14 Thread Dr. David Alan Gilbert

* Peter Xu (pet...@redhat.com) wrote:
> Using VMSTATE_UNUSED*() can be error-prone, especially when the size
> of the field in the migration stream is not the same as its size in
> the structure (boolean is one example).  Document this clearly so
> that people are aware of it when they use these macros.
> 
> Signed-off-by: Peter Xu 

Queued

> ---
>  include/migration/vmstate.h | 14 ++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
> index a668ec75b8..9224370ed5 100644
> --- a/include/migration/vmstate.h
> +++ b/include/migration/vmstate.h
> @@ -1035,6 +1035,20 @@ extern const VMStateInfo vmstate_info_qtailq;
>  #define VMSTATE_BUFFER_UNSAFE(_field, _state, _version, _size)\
>  VMSTATE_BUFFER_UNSAFE_INFO(_field, _state, _version, vmstate_info_buffer, _size)
>  
> +/*
> + * These VMSTATE_UNUSED*() macros can be used to fill in the holes
> + * left by obsolete vmstate fields, so that migration between new
> + * and old binaries stays compatible.
> + *
> + * CAUTION: when using any of the VMSTATE_UNUSED*() macros, make
> + * sure that the size passed in is the size that was actually *sent*
> + * rather than the size of the *structure*.  One example is the
> + * boolean type: the size of the structure can vary depending on the
> + * definition of boolean, but the size actually sent is always
> + * 1 byte (see the implementation of VMSTATE_BOOL_V and
> + * vmstate_info_bool).  So here we should always pass in size==1
> + * rather than size==sizeof(bool).
> + */
>  #define VMSTATE_UNUSED_V(_v, _size)   \
>  VMSTATE_UNUSED_BUFFER(NULL, _v, _size)
>  
> -- 
> 2.17.1
> 
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

[Qemu-devel] [PATCH v17 09/10] target-arm: kvm64: inject synchronous External Abort

2019-05-14 Thread Dongjiu Geng

Add synchronous external abort injection logic: set up the exception
type and syndrome value. When execution returns to the guest, it jumps
to the synchronous exception entry of its vector table.

ESR_ELx.DFSC is set to synchronous external abort (0x10), and
ESR_ELx.FnV is set to not valid (0x1), which tells the guest that FAR
is not valid and holds an UNKNOWN value. These values are set in the
KVM register structures through the KVM_SET_ONE_REG ioctl.

Signed-off-by: Dongjiu Geng 
---
 target/arm/internals.h |  5 +++--
 target/arm/kvm64.c | 34 ++
 target/arm/op_helper.c |  2 +-
 3 files changed, 38 insertions(+), 3 deletions(-)

diff --git a/target/arm/internals.h b/target/arm/internals.h
index 587a1dd..4d67a91 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -451,13 +451,14 @@ static inline uint32_t syn_insn_abort(int same_el, int ea, int s1ptw, int fsc)
 | ARM_EL_IL | (ea << 9) | (s1ptw << 7) | fsc;
 }
 
-static inline uint32_t syn_data_abort_no_iss(int same_el,
+static inline uint32_t syn_data_abort_no_iss(int same_el, int fnv,
  int ea, int cm, int s1ptw,
  int wnr, int fsc)
 {
 return (EC_DATAABORT << ARM_EL_EC_SHIFT) | (same_el << ARM_EL_EC_SHIFT)
| ARM_EL_IL
-   | (ea << 9) | (cm << 8) | (s1ptw << 7) | (wnr << 6) | fsc;
+   | (fnv << 10) | (ea << 9) | (cm << 8) | (s1ptw << 7)
+   | (wnr << 6) | fsc;
 }
 
 static inline uint32_t syn_data_abort_with_iss(int same_el,
diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
index e3ba149..c7bdc6a 100644
--- a/target/arm/kvm64.c
+++ b/target/arm/kvm64.c
@@ -697,6 +697,40 @@ int kvm_arm_cpreg_level(uint64_t regidx)
 return KVM_PUT_RUNTIME_STATE;
 }
 
+/* Inject synchronous external abort */
+static void kvm_inject_arm_sea(CPUState *c)
+{
+    ARMCPU *cpu = ARM_CPU(c);
+    CPUARMState *env = &cpu->env;
+    CPUClass *cc = CPU_GET_CLASS(c);
+    uint32_t esr;
+    bool same_el;
+
+    /*
+     * Set the exception type to synchronous data abort
+     * and the target exception level to EL1.
+     */
+    c->exception_index = EXCP_DATA_ABORT;
+    env->exception.target_el = 1;
+
+    /*
+     * Set the DFSC to synchronous external abort and set FnV to not valid;
+     * this tells the guest that FAR_ELx is UNKNOWN for this abort.
+     */
+
+    /* This exception comes from a lower or the current exception level. */
+    same_el = arm_current_el(env) == env->exception.target_el;
+    esr = syn_data_abort_no_iss(same_el, 1, 0, 0, 0, 0, 0x10);
+
+    env->exception.syndrome = esr;
+
+    /*
+     * The vcpu thread already holds the BQL, so there is no need to
+     * take it again when calling do_interrupt.
+     */
+    cc->do_interrupt(c);
+}
+
 #define AARCH64_CORE_REG(x)   (KVM_REG_ARM64 | KVM_REG_SIZE_U64 | \
  KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(x))
 
diff --git a/target/arm/op_helper.c b/target/arm/op_helper.c
index 8698b4d..d43134a 100644
--- a/target/arm/op_helper.c
+++ b/target/arm/op_helper.c
@@ -109,7 +109,7 @@ static inline uint32_t merge_syn_data_abort(uint32_t template_syn,
  * ISV field.
  */
 if (!(template_syn & ARM_EL_ISV) || target_el != 2 || s1ptw) {
-syn = syn_data_abort_no_iss(same_el,
+syn = syn_data_abort_no_iss(same_el, 0,
 ea, 0, s1ptw, is_write, fsc);
 } else {
 /* Fields: IL, ISV, SAS, SSE, SRT, SF and AR come from the template
-- 
1.8.3.1
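To make the syndrome value built by kvm_inject_arm_sea() concrete, the sketch below re-implements the encoding of syn_data_abort_no_iss() as changed by this patch and decodes the result. The constants follow the ARMv8 ESR_ELx layout (EC in bits [31:26], IL at bit 25, FnV at bit 10, DFSC in bits [5:0]); their names mirror QEMU's but are redefined here so the block compiles on its own, and `data_abort_no_iss()` and the `esr_*()` extractors are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* ESR_ELx layout constants (redefined locally; names mirror QEMU's). */
#define EC_DATAABORT      0x24       /* data abort from a lower EL */
#define ARM_EL_EC_SHIFT   26
#define ARM_EL_IL         (1u << 25) /* 32-bit instruction length */

/* Same bit layout as syn_data_abort_no_iss() after this patch. */
static uint32_t data_abort_no_iss(int same_el, int fnv, int ea, int cm,
                                  int s1ptw, int wnr, int fsc)
{
    assert(fsc >= 0 && fsc <= 0x3f);   /* DFSC is 6 bits wide */
    return ((uint32_t)EC_DATAABORT << ARM_EL_EC_SHIFT)
           | ((uint32_t)same_el << ARM_EL_EC_SHIFT)  /* EC 0x24 -> 0x25 */
           | ARM_EL_IL
           | ((uint32_t)fnv << 10) | ((uint32_t)ea << 9)
           | ((uint32_t)cm << 8) | ((uint32_t)s1ptw << 7)
           | ((uint32_t)wnr << 6) | (uint32_t)fsc;
}

/* Field extractors for checking an encoded syndrome. */
static unsigned esr_ec(uint32_t esr)   { return esr >> ARM_EL_EC_SHIFT; }
static unsigned esr_fnv(uint32_t esr)  { return (esr >> 10) & 1u; }
static unsigned esr_dfsc(uint32_t esr) { return esr & 0x3fu; }
```

With same_el = 0, FnV = 1 and DFSC = 0x10 (the arguments used in kvm_inject_arm_sea()) this yields 0x92000410; with same_el = 1 the EC becomes 0x25 and the value is 0x96000410.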

Re: [Qemu-devel] [PATCH v2 1/2] vfio/mdev: add version attribute for mdev device

2019-05-14 Thread Cornelia Huck

On Tue, 14 May 2019 12:01:45 +0100
"Dr. David Alan Gilbert"  wrote:

> * Cornelia Huck (coh...@redhat.com) wrote:
> > On Tue, 14 May 2019 03:47:36 -0400
> > Yan Zhao  wrote:

> > > hi Cornelia and Dave,
> > > do you also agree on:
> > > 1. "not to define the specific errno returned for a specific situation,
> > > let the vendor driver decide, userspace simply needs to know that an
> > > errno on read indicates the device does not support migration version
> > > comparison and that an errno on write indicates the devices are
> > > incompatible or the target doesn't support migration versions. "
> > > 2. vendor driver should log detailed error reasons in kernel log.
> > 
> > Two questions:
> > - How reasonable is it to refer to the system log in order to find out
> >   what exactly went wrong?
> > - If detailed error reporting is basically done to the syslog, do
> >   different error codes still provide useful information? Or should the
> >   vendor driver decide what it wants to do?  
> 
> I don't see error codes as being that helpful; if we can't actually get
> an error message back up the stack (which was my preference), then I guess
> syslog is as good as it will get.

Ok, so letting the vendor driver simply return an(y) error and possibly
dumping an error message into the syslog seems to be the most
reasonable approach.
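Under the approach agreed above, userspace only distinguishes "read failed" from "write failed" and keeps errno for its own message. A minimal sketch, assuming the version string is exposed as a readable attribute on the source device and a writable one on the target; the attribute paths, `MigCompat` and `check_migration_compat()` are hypothetical, and failing to open the source attribute is treated the same as a failed read.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

typedef enum {
    MIG_COMPAT_OK,
    MIG_COMPAT_SRC_UNSUPPORTED,  /* open/read of the source attribute failed */
    MIG_COMPAT_INCOMPATIBLE,     /* open/write of the target attribute failed */
} MigCompat;

/*
 * Read the source version string and write it to the target attribute.
 * Only errno is returned; the vendor driver's detailed reason is
 * expected in the kernel log.
 */
static MigCompat check_migration_compat(const char *src_attr,
                                        const char *dst_attr, int *err)
{
    char buf[256];
    ssize_t n;
    int fd;

    assert(src_attr != NULL && dst_attr != NULL && err != NULL);
    *err = 0;

    fd = open(src_attr, O_RDONLY);
    if (fd < 0) {
        *err = errno;
        return MIG_COMPAT_SRC_UNSUPPORTED;
    }
    n = read(fd, buf, sizeof(buf));
    if (n < 0) {
        *err = errno;            /* save before close() can change errno */
        close(fd);
        return MIG_COMPAT_SRC_UNSUPPORTED;
    }
    close(fd);

    fd = open(dst_attr, O_WRONLY);
    if (fd < 0) {
        *err = errno;
        return MIG_COMPAT_INCOMPATIBLE;
    }
    if (write(fd, buf, (size_t)n) < 0) {
        *err = errno;
        close(fd);
        return MIG_COMPAT_INCOMPATIBLE;
    }
    close(fd);
    return MIG_COMPAT_OK;
}
```

A management tool could map the two failure classes to fixed messages and append strerror(*err), which matches the point that the specific errno carries little extra information.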

[Qemu-devel] [PATCH v17 10/10] target-arm: kvm64: handle SIGBUS signal from kernel or KVM

2019-05-14 Thread Dongjiu Geng

Add a SIGBUS signal handler. The handler checks the SIGBUS type,
translates the host VA delivered by the host into a guest PA, fills
this PA into the guest APEI GHES memory, and then notifies the guest
according to the SIGBUS type.

If the guest accesses poisoned memory, it generates a Synchronous
External Abort (SEA). The host kernel then gets an APEI notification
and calls memory_failure() to unmap the affected page from the guest's
stage 2 page tables, and finally returns to the guest.

If the guest continues to access the PG_hwpoison page, it traps to KVM
as a stage 2 fault, and a synchronous SIGBUS with si_code BUS_MCEERR_AR
is delivered to QEMU. QEMU records this error address in the guest APEI
GHES memory and notifies the guest using a Synchronous External Abort
(SEA).

Suggested-by: James Morse 
Signed-off-by: Dongjiu Geng 
---
 hw/acpi/acpi_ghes.c | 177 
 include/hw/acpi/acpi_ghes.h |   6 +-
 include/sysemu/kvm.h|   2 +-
 target/arm/kvm64.c  |  39 ++
 4 files changed, 222 insertions(+), 2 deletions(-)

diff --git a/hw/acpi/acpi_ghes.c b/hw/acpi/acpi_ghes.c
index d03e797..06b7374 100644
--- a/hw/acpi/acpi_ghes.c
+++ b/hw/acpi/acpi_ghes.c
@@ -26,6 +26,101 @@
 #include "sysemu/sysemu.h"
 #include "qemu/error-report.h"
 
+/* UEFI 2.6: N.2.5 Memory Error Section */
+static void build_append_mem_cper(GArray *table, uint64_t error_physical_addr)
+{
+    /*
+     * Memory Error Record
+     */
+    build_append_int_noprefix(table,
+                              (1UL << 14) | /* Type Valid */
+                              (1UL << 1) /* Physical Address Valid */,
+                              8);
+    /* Memory error status information */
+    build_append_int_noprefix(table, 0, 8);
+    /* The physical address at which the memory error occurred */
+    build_append_int_noprefix(table, error_physical_addr, 8);
+    build_append_int_noprefix(table, 0, 48);
+    build_append_int_noprefix(table, 0 /* Unknown error */, 1);
+    build_append_int_noprefix(table, 0, 7);
+}
+
+static int ghes_record_mem_error(uint64_t error_block_address,
+                                 uint64_t error_physical_addr)
+{
+    GArray *block;
+    uint64_t current_block_length;
+    uint32_t data_length;
+    /* Memory section */
+    char mem_section_id_le[] = {0x14, 0x11, 0xBC, 0xA5, 0x64, 0x6F, 0xDE,
+                                0x4E, 0xB8, 0x63, 0x3E, 0x83, 0xED, 0x7C,
+                                0x83, 0xB1};
+    uint8_t fru_id[16] = {0};
+    uint8_t fru_text[20] = {0};
+
+    /* Generic Error Status Block
+     * | +---------------------+
+     * | |     block_status    |
+     * | +---------------------+
+     * | |   raw_data_offset   |
+     * | +---------------------+
+     * | |   raw_data_length   |
+     * | +---------------------+
+     * | |     data_length     |
+     * | +---------------------+
+     * | |    error_severity   |
+     * | +---------------------+
+     */
+    block = g_array_new(false, true /* clear */, 1);
+
+    /* Get the length of the Generic Error Data Entries */
+    cpu_physical_memory_read(error_block_address +
+        offsetof(AcpiGenericErrorStatus, data_length), &data_length, 4);
+
+    /* The current whole length of the generic error status block */
+    current_block_length = sizeof(AcpiGenericErrorStatus) + le32_to_cpu(data_length);
+
+    /* This is the length if adding a new generic error data entry */
+    data_length += GHES_DATA_LENGTH;
+    data_length += GHES_MEM_CPER_LENGTH;
+
+    /* Check whether it will run out of the preallocated memory if adding a new
+     * generic error data entry
+     */
+    if ((data_length + sizeof(AcpiGenericErrorStatus)) > GHES_MAX_RAW_DATA_LENGTH) {
+        error_report("Record CPER out of boundary!!!");
+        return GHES_CPER_FAIL;
+    }
+
+    /* Build the new generic error status block header */
+    build_append_ghes_generic_status(block, cpu_to_le32(ACPI_GEBS_UNCORRECTABLE), 0, 0,
+        cpu_to_le32(data_length), cpu_to_le32(ACPI_CPER_SEV_RECOVERABLE));
+
+    /* Write back above generic error status block header to guest memory */
+    cpu_physical_memory_write(error_block_address, block->data,
+                              block->len);
+
+    /* Add a new generic error data entry */
+
+    data_length = block->len;
+    /* Build this new generic error data entry header */
+    build_append_ghes_generic_data(block, mem_section_id_le,
+        cpu_to_le32(ACPI_CPER_SEV_RECOVERABLE), cpu_to_le32(0x300), 0, 0,
+        cpu_to_le32(80)/* the total size of Memory Error Record */, fru_id,
+        fru_text, 0);
+
+    /* Build the memory section CPER for above new generic error data entry */
+    build_append_mem_cper(block, error_physical_addr);
+
+    /* Write back above this new generic error data entry to guest memory */
+    cpu_physical_memory_write(error_block_address + current_block_length,
+        block->data + data_length, block->len - data_length);
+
+    g_array_free(block, true);
+
+retur
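The memory error section appended by build_append_mem_cper() is 8 + 8 + 8 + 48 + 1 + 7 = 80 bytes, which is where the cpu_to_le32(80) passed to build_append_ghes_generic_data() comes from. The sketch below fills the same layout into a plain byte array so the offsets can be checked in isolation; `put_le()`, `fill_mem_cper()` and `MEM_CPER_LEN` are hypothetical names, and the field offsets follow the UEFI memory error section (validation bits at 0, error status at 8, physical address at 16, memory error type at 72).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 8 + 8 + 8 + 48 + 1 + 7 bytes, matching build_append_mem_cper() */
#define MEM_CPER_LEN 80

/* Store the low 'n' bytes of 'v' in little-endian order. */
static void put_le(uint8_t *p, uint64_t v, size_t n)
{
    assert(n <= 8);
    for (size_t i = 0; i < n; i++) {
        p[i] = (uint8_t)(v >> (8 * i));
    }
}

/* Hypothetical helper mirroring the byte layout of build_append_mem_cper(). */
static void fill_mem_cper(uint8_t out[MEM_CPER_LEN], uint64_t phys_addr)
{
    memset(out, 0, MEM_CPER_LEN);
    /* Offset 0: validation bits; bit 1 = physical address valid,
     * bit 14 = memory error type valid */
    put_le(out + 0, (1ULL << 14) | (1ULL << 1), 8);
    /* Offset 8: error status, left 0 */
    /* Offset 16: physical address at which the error occurred */
    put_le(out + 16, phys_addr, 8);
    /* Offsets 24..71: fields not marked valid, left 0 */
    /* Offset 72: memory error type, 0 = unknown */
    out[72] = 0;
    /* Offsets 73..79: remaining fields, left 0 */
}
```

Only the two fields flagged in the validation bits carry meaning; everything else stays zero, which is why the patch can emit the middle 48 bytes as a single zero run.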

Re: [Qemu-devel] [PATCH for-4.1 00/24] Fix record/replay and add reverse debugging

2019-05-14 Thread Pavel Dovgalyuk

> From: Markus Armbruster [mailto:arm...@redhat.com]
> "Pavel Dovgalyuk"  writes:
> 
> > Ping.
> > Can anyone PULL these patches?
> 
> Paolo?

Is there anything new?

Pavel Dovgalyuk

[Qemu-devel] [PATCH v17 08/10] KVM: Move related hwpoison page functions to accel/kvm/ folder

2019-05-14 Thread Dongjiu Geng

kvm_hwpoison_page_add() and kvm_unpoison_all() will be used by both
the x86 and ARM platforms, so move these functions to the common
accel/kvm/ folder to avoid duplicating code.

Signed-off-by: Dongjiu Geng 
---
 accel/kvm/kvm-all.c | 33 +
 include/exec/ram_addr.h | 24 
 target/arm/kvm.c|  3 +++
 target/i386/kvm.c   | 34 +-
 4 files changed, 61 insertions(+), 33 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 524c4dd..b9f9f29 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -625,6 +625,39 @@ int kvm_vm_check_extension(KVMState *s, unsigned int extension)
 return ret;
 }
 
+typedef struct HWPoisonPage {
+    ram_addr_t ram_addr;
+    QLIST_ENTRY(HWPoisonPage) list;
+} HWPoisonPage;
+
+static QLIST_HEAD(, HWPoisonPage) hwpoison_page_list =
+    QLIST_HEAD_INITIALIZER(hwpoison_page_list);
+
+void kvm_unpoison_all(void *param)
+{
+    HWPoisonPage *page, *next_page;
+
+    QLIST_FOREACH_SAFE(page, &hwpoison_page_list, list, next_page) {
+        QLIST_REMOVE(page, list);
+        qemu_ram_remap(page->ram_addr, TARGET_PAGE_SIZE);
+        g_free(page);
+    }
+}
+
+void kvm_hwpoison_page_add(ram_addr_t ram_addr)
+{
+    HWPoisonPage *page;
+
+    QLIST_FOREACH(page, &hwpoison_page_list, list) {
+        if (page->ram_addr == ram_addr) {
+            return;
+        }
+    }
+    page = g_new(HWPoisonPage, 1);
+    page->ram_addr = ram_addr;
+    QLIST_INSERT_HEAD(&hwpoison_page_list, page, list);
+}
+
 static uint32_t adjust_ioeventfd_endianness(uint32_t val, uint32_t size)
 {
 #if defined(HOST_WORDS_BIGENDIAN) != defined(TARGET_WORDS_BIGENDIAN)
diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index 139ad79..193b0a7 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -116,6 +116,30 @@ void qemu_ram_free(RAMBlock *block);
 
 int qemu_ram_resize(RAMBlock *block, ram_addr_t newsize, Error **errp);
 
+/**
+ * kvm_hwpoison_page_add:
+ *
+ * Parameters:
+ *  @ram_addr: the RAM address (ram_addr_t) of the poisoned page
+ *
+ * Add a poisoned page to the list, unless it is already present.
+ *
+ * Return: None.
+ */
+void kvm_hwpoison_page_add(ram_addr_t ram_addr);
+
+/**
+ * kvm_unpoison_all:
+ *
+ * Parameters:
+ *  @param: opaque pointer from the qemu_register_reset() registration; unused
+ *
+ * Remap every poisoned page in the list, then remove and free the entries.
+ *
+ * Return: None.
+ */
+void kvm_unpoison_all(void *param);
+
 #define DIRTY_CLIENTS_ALL ((1 << DIRTY_MEMORY_NUM) - 1)
 #define DIRTY_CLIENTS_NOCODE  (DIRTY_CLIENTS_ALL & ~(1 << DIRTY_MEMORY_CODE))
 
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index 5995634..6d3b25b 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -29,6 +29,7 @@
 #include "exec/address-spaces.h"
 #include "hw/boards.h"
 #include "qemu/log.h"
+#include "exec/ram_addr.h"
 
 const KVMCapabilityInfo kvm_arch_required_capabilities[] = {
 KVM_CAP_LAST_INFO
@@ -187,6 +188,8 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
 
 cap_has_mp_state = kvm_check_extension(s, KVM_CAP_MP_STATE);
 
+qemu_register_reset(kvm_unpoison_all, NULL);
+
 return 0;
 }
 
diff --git a/target/i386/kvm.c b/target/i386/kvm.c
index 3b29ce5..9bdb879 100644
--- a/target/i386/kvm.c
+++ b/target/i386/kvm.c
@@ -46,6 +46,7 @@
 #include "migration/blocker.h"
 #include "exec/memattrs.h"
 #include "trace.h"
+#include "exec/ram_addr.h"
 
 //#define DEBUG_KVM
 
@@ -467,39 +468,6 @@ uint32_t kvm_arch_get_supported_msr_feature(KVMState *s, uint32_t index)
 }
 
 
-typedef struct HWPoisonPage {
-    ram_addr_t ram_addr;
-    QLIST_ENTRY(HWPoisonPage) list;
-} HWPoisonPage;
-
-static QLIST_HEAD(, HWPoisonPage) hwpoison_page_list =
-    QLIST_HEAD_INITIALIZER(hwpoison_page_list);
-
-static void kvm_unpoison_all(void *param)
-{
-    HWPoisonPage *page, *next_page;
-
-    QLIST_FOREACH_SAFE(page, &hwpoison_page_list, list, next_page) {
-        QLIST_REMOVE(page, list);
-        qemu_ram_remap(page->ram_addr, TARGET_PAGE_SIZE);
-        g_free(page);
-    }
-}
-
-static void kvm_hwpoison_page_add(ram_addr_t ram_addr)
-{
-    HWPoisonPage *page;
-
-    QLIST_FOREACH(page, &hwpoison_page_list, list) {
-        if (page->ram_addr == ram_addr) {
-            return;
-        }
-    }
-    page = g_new(HWPoisonPage, 1);
-    page->ram_addr = ram_addr;
-    QLIST_INSERT_HEAD(&hwpoison_page_list, page, list);
-}
-
 static int kvm_get_mce_cap_supported(KVMState *s, uint64_t *mce_cap,
  int *max_banks)
 {
-- 
1.8.3.1
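The two moved functions implement a de-duplicated poison list that is drained on reset. The sketch below reproduces that behaviour with a plain singly linked list instead of QEMU's QLIST macros, and with a caller-supplied callback in place of qemu_ram_remap(); `toy_ram_addr_t`, `ToyPoisonPage`, `toy_hwpoison_page_add()`, `toy_unpoison_all()`, `toy_poison_count()` and `count_remap()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t toy_ram_addr_t;     /* hypothetical stand-in for ram_addr_t */

typedef struct ToyPoisonPage {
    toy_ram_addr_t ram_addr;
    struct ToyPoisonPage *next;
} ToyPoisonPage;

static ToyPoisonPage *poison_list;   /* head of the list */

static void toy_hwpoison_page_add(toy_ram_addr_t ram_addr)
{
    ToyPoisonPage *page;

    /* Skip pages already recorded, as kvm_hwpoison_page_add() does. */
    for (page = poison_list; page != NULL; page = page->next) {
        if (page->ram_addr == ram_addr) {
            return;
        }
    }
    page = malloc(sizeof(*page));
    assert(page != NULL);
    page->ram_addr = ram_addr;
    page->next = poison_list;        /* insert at head, like QLIST_INSERT_HEAD */
    poison_list = page;
}

/* Remap every queued page via 'remap', then free the entries. */
static void toy_unpoison_all(void (*remap)(toy_ram_addr_t))
{
    ToyPoisonPage *page = poison_list;

    while (page != NULL) {
        ToyPoisonPage *next = page->next;  /* read before free, like _SAFE */
        remap(page->ram_addr);
        free(page);
        page = next;
    }
    poison_list = NULL;
}

static size_t toy_poison_count(void)
{
    size_t n = 0;

    for (const ToyPoisonPage *p = poison_list; p != NULL; p = p->next) {
        n++;
    }
    return n;
}

/* Example remap callback that only counts how often it is called. */
static size_t remap_calls;
static void count_remap(toy_ram_addr_t ram_addr)
{
    (void)ram_addr;
    remap_calls++;
}
```

Adding the same address twice leaves one entry, and a single drain remaps each distinct page exactly once, which is why registering kvm_unpoison_all() as a reset handler on both x86 and ARM is enough.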
