date:20150928

Re: [Qemu-devel] [PATCH] trace/simple: Fix warning and wrong trace file name for MinGW

2015-09-28 Thread Markus Armbruster

Stefan Weil  writes:

> On Windows, getpid() always returns an int value, but pid_t (which is
> expected by the format string) is either a 32 bit or a 64 bit value.
>
> Without a type cast (or a modified format string), the compiler prints
> a warning when building for 64 bit Windows and the resulting trace_file_name
> will include a wrong pid:
>
> trace/simple.c:332:9: warning:
>  format ‘%lld’ expects argument of type ‘long long int’,
>  but argument 2 has type ‘int’ [-Wformat=]
>
> Signed-off-by: Stefan Weil 
> ---
>  trace/simple.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/trace/simple.c b/trace/simple.c
> index 11ad030..56a624c 100644
> --- a/trace/simple.c
> +++ b/trace/simple.c
> @@ -329,7 +329,8 @@ bool st_set_trace_file(const char *file)
>  g_free(trace_file_name);
>  
>  if (!file) {
> -trace_file_name = g_strdup_printf(CONFIG_TRACE_FILE, getpid());
> +/* Type cast needed for Windows where getpid() returns an int. */
> +trace_file_name = g_strdup_printf(CONFIG_TRACE_FILE, 
> (pid_t)getpid());
>  } else {
>  trace_file_name = g_strdup_printf("%s", file);
>  }

First we go to the trouble of defining a platform-dependent FMT_pid, and
then we get to cast anyway.  Meh.

Can you explain why osdep.h's

#define FMT_pid "%" PRId64

is appropriate for Windows?

Re: [Qemu-devel] [PATCH v11 09/12] netfilter: add a netbuffer filter

2015-09-28 Thread Markus Armbruster

Jason Wang  writes:

> On 09/25/2015 11:07 PM, Markus Armbruster wrote:
>> Yang Hongyang  writes:
>>
>>> On 09/24/2015 05:12 PM, Markus Armbruster wrote:
 Yang Hongyang  writes:

> This filter is to buffer/release packets, this feature can be used
> when using MicroCheckpointing, or other Remus like VM FT solutions, you
 What's "Remus"?
>
> [...]
>

> +
> +static void filter_buffer_release_timer(void *opaque)
> +{
> +NetFilterState *nf = opaque;
> +FilterBufferState *s = FILTER_BUFFER(nf);
 Style nit: blank line between declarations and statements, please.

> +filter_buffer_flush(nf);
 Is purging correct here?
>> When the timer expires, we flush as many buffered packets as we can,
>> then throw away the rest.  Why throw them away?  Shouldn't we leave them
>> in the buffer, and only throw away packets when the buffer is full?
>
> May need a "FIXME" or "TODO" here. I think this is for simplicity. We
> could queue the packet if the receiver or next filter could not receive
> packets. But currently there's no way for the next filter or recivier to
> notify us that it can receive more packet. This could be done in the future.

Good enough for me.  Make it FIXME if purging packets is actually wrong,
else TODO.

Re: [Qemu-devel] feature idea: allow user to run custom scripts

2015-09-28 Thread Markus Armbruster

Programmingkid  writes:

> On Sep 27, 2015, at 10:30 PM, Michael Roth wrote:
>
>> Quoting Programmingkid (2015-09-27 20:49:24)
>>> 
>>> On Sep 27, 2015, at 2:53 PM, Peter Crosthwaite wrote:
>>> 
 On Sun, Sep 27, 2015 at 3:13 AM, Peter Maydell
  wrote:
> On 27 September 2015 at 04:39, Programmingkid
>  wrote:
>> Would you be open to a feature that allows the user to select
>> and run a custom file that has commands in it that would run
>> in the monitor?
> 
> Sounds like a VM management layer feature.
> 
 
 Should -monitor file:/foo/bar do something like this?
>>> 
>>> If you are saying this command line argument loads monitor commands,
>>> then it would only work when QEMU is first started. The feature I want
>>> would work anytime during the running of QEMU. 
>> 
>> For that sort of flexibility I think writing commands to a socket via
>> a script/program is simple enough that an additional interface doesn't
>> seem worthwhile. Even our qtest unit tests use this approach. Plus you
>> get the flexibility of a being able to branch based on the return
>> value of commands (error-handling, stateful commands, incorporating
>> output from a serial console, etc.). It seems like a nice feature but
>> it's vastly inferior to what's possible with an external driver.
>
> How many people know how to communicate with QEMU via a socket?
> How do you even do it? It doesn't sound very easy to do.

It's easy, as QEMU command line goes:

-qmp unix:test-hmp,server,nowait

This is syntactic sugar for something like

-chardev socket,id=compat_monitor1,path=sock-qmp,server=on,wait=off
-mon mode=control,chardev=compat_monitor1

The long form is more flexible.  If you use it, don't use
id=compat_monitor1, obviously.

Easier on the eyes as configuration file for -readconfig:

[chardev "qmp"]
  backend = "socket"
  path = "sock-qmp"
  server = "on"
  wait = "off"

[mon "qmp"]
  mode = "control"
  chardev = "qmp"

>  A menu item
> that displays a file open dialog is very easy to use. The user just selects
> a file and QEMU loads and runs all the commands in it. This feature
> would make QEMU easier to use. It would also make QEMU easily
> expandable. Typing long commands in the monitor is difficult and
> error prone. Saving these commands in a file would make it much
> easier for the user. An example command someone could put in a
> file is sending Control-Alt-Delete to the emulator. Another command
> could be mounting an image file. This feature would make things
> much easier for the user.

You didn't mention you're talking about a *GUI* feature.

Re: [Qemu-devel] [RFC v5 1/6] exec.c: Add new exclusive bitmap to ram_list

2015-09-28 Thread alvise rigo

On Sat, Sep 26, 2015 at 7:15 PM, Richard Henderson  wrote:

> On 09/24/2015 01:32 AM, Alvise Rigo wrote:
> > +if (cpu == smp_cpus) {
> > +if (smp_cpus >= EXCL_BITMAP_CELL_SZ) {
> > +return bitmap[EXCL_BITMAP_GET_BYTE_OFFSET(addr)];
> > +} else {
> > +return bitmap[EXCL_BITMAP_GET_BYTE_OFFSET(addr)] &
> > +((1 << smp_cpus) - 1);
> > +}
> > +} else {
> > +return bitmap[EXCL_BITMAP_GET_BYTE_OFFSET(addr)] & (1 <<
> EXCL_IDX(cpu));
> > +}
>
> How can more than one cpu have the same address exclusively?
>

The bitmap is used to track which cpus have the EXCL flag set for that
particular page.
A bit set to 0 assures that all the corresponding cpus are following the
slow-path in that page.

>
> Isn't this scheme giving a whole page to a cpu, not a cacheline?
>

The actual exclusive range in set in CPUArchState, this is the information
that we evaluate to see whether there was a conflict or not.

Regards,
alvise

> That's going to cause ll/sc conflicts where real hardware wouldn't.
>
>
>
> r~
>

Re: [Qemu-devel] [PATCH v3 6/7] qdev: Protect device-list-properties against broken devices

2015-09-28 Thread Markus Armbruster

Thomas Huth  writes:

> On 25/09/15 16:17, Markus Armbruster wrote:
>> Thomas Huth  writes:
>> 
>>> On 24/09/15 20:57, Markus Armbruster wrote:
 Several devices don't survive object_unref(object_new(T)): they crash
 or hang during cleanup, or they leave dangling pointers behind.

 This breaks at least device-list-properties, because
 qmp_device_list_properties() needs to create a device to find its
 properties.  Broken in commit f4eb32b "qmp: show QOM properties in
 device-list-properties", v2.1.  Example reproducer:

 $ qemu-system-aarch64 -nodefaults -display none -machine none
 -S -qmp stdio
 {"QMP": {"version": {"qemu": {"micro": 50, "minor": 4,
 "major": 2}, "package": ""}, "capabilities": []}}
 { "execute": "qmp_capabilities" }
 {"return": {}}
 { "execute": "device-list-properties", "arguments": {
 "typename": "pxa2xx-pcmcia" } }
 qemu-system-aarch64: /home/armbru/work/qemu/memory.c:1307:
 memory_region_finalize: Assertion `((&mr->subregions)->tqh_first
 == ((void *)0))' failed.
 Aborted (core dumped)
 [Exit 134 (SIGABRT)]

 Unfortunately, I can't fix the problems in these devices right now.
 Instead, add DeviceClass member cannot_even_create_with_object_new_yet
 to mark them:
> ...
  static void pxa2xx_pcmcia_register_types(void)
 diff --git a/hw/ppc/spapr_rng.c b/hw/ppc/spapr_rng.c
 index ed43d5e..e1b115d 100644
 --- a/hw/ppc/spapr_rng.c
 +++ b/hw/ppc/spapr_rng.c
 @@ -169,6 +169,11 @@ static void spapr_rng_class_init(ObjectClass *oc, 
 void *data)
  dc->realize = spapr_rng_realize;
  set_bit(DEVICE_CATEGORY_MISC, dc->categories);
  dc->props = spapr_rng_properties;
 +
 +/*
 + * Reason: crashes device-introspect-test for unknown reason.
 + */
 +dc->cannot_even_create_with_object_new_yet = true;
  }
>>>
>>> Please don't do that! That breaks the help output from
>>> "-device spapr-rng,?" which should help the user to see how to use this
>>> device!
>> 
>> Well, device-introspection-test makes qemu crash, with the backtrace
>> pointing squarely to this device.  Stands to reason that device
>> introspection could crash in normal usage, too.  Until the crash is
>> debugged, we better disable introspection of this device.
>> 
>> I quite agree that disabling introspection hurts users.  Just not as
>> much as crashes :)
>> 
>>> I tried to debug why this device breaks the test, but the test
>>> environment is giving me a hard time ... how do you best hook a gdb into
>>> that framework, so you can trace such problems?
>>> Anyway, with some trial and error, I found out that it seems like the
>>>
>>>   object_resolve_path_type("", TYPE_SPAPR_RNG, NULL)
>>>
>>> in spapr_rng_instance_init() is causing the problems. Could it be that
>>> object_resolve_path_type is not working with the test environment?
>> 
>> I tried to figure out why this device breaks under this test, but
>> couldn't, so I posted with the "for unknown reason" comment.
>
> I've debugged this now for a while (thanks for the tip with
> MALLOC_PERTURB, by the way!) and it seems to me that the problem is in
> the macio object than in spapr-rng - the latter is just the victim of
> some memory corruption caused by the first one: The
> object_resolve_path_type() crashes while trying to go through the macio
> object.
>
> So could you please add the "dc->cannot_even_create_with_object_new_yet
> = true;" to macio_class_init() instead? ... that seems to fix the crash
> for me, too, and is likely the better place.

Hmm.

For most of the devices my patch marks, we have a pretty good idea on
what's wrong with them.  spapr-rng is among the exceptions.  You believe
it's actually "the macio object".  Which one?  "macio" is abstract...

You report introspecting "spapr-rng" crashes "while trying to go through
the macio object".  I wonder how omitting introspection of macio objects
(that's what marking them does to this test) could affect the object
we're going through when we crash.

> Or maybe we could get this also fixed? The problem could be the
> memory_region_init(&s->bar, NULL, "macio", 0x8) in
> macio_instance_init() ... is this ok here? Or does this rather have to
> go to the realize() function instead?

Hmm, does creating and destroying a macio object leave the memory region
behind?

Paolo, is calling memory_region_init() in an instance_init() method
okay?

If yes, where should they be destroyed, and how?

If no, we should search for the erroneous pattern and mark the
offenders.

Some more evidence for macio's culpability: valgrind lets me happily
introspect spapr-rng as often as I want, but once I introspected
macio-newworld, further introspection of spapr-rng throws "Invalid read"
errors.

Re: [Qemu-devel] [RFC 2/6] monitor: Simplify event throttling

2015-09-28 Thread Markus Armbruster

Eric Blake  writes:

> On 09/25/2015 08:00 AM, Markus Armbruster wrote:
>> The event throttling state machine is hard to understand.  I'm not
>> sure it's entirely correct.  Rewrite it in a more straightforward
>> manner:
>> 
>> State 1: No event sent recently (less than evconf->rate ns ago)
>> 
>> Invariant: evstate->timer is not pending, evstate->qdict is null
>> 
>> On event: send event, arm timer, goto state 2
>> 
>> State 2: Event sent recently, no additional event being delayed
>> 
>> Invariant: evstate->timer is pending, evstate->qdict is null
>> 
>> On event: store it in evstate->qdict, goto state 3
>> 
>> On timer: goto state 1
>> 
>> State 3: Event sent recently, additional event being delayed
>> 
>> Invariant: evstate->timer is pending, evstate->qdict is non-null
>> 
>> On event: store it in evstate->qdict, goto state 3
>> 
>> On timer: send evstate->qdict, clear evstate->qdict,
>>   arm timer, goto state 2
>> 
>
> Makes sense for what throttling is supposed to be.
>
>> FIXME update trace-event (and recompile the world)
>
> Obviously something you'd fold into a v2, so I'll defer R-b until seeing
> it.  But I like the approach of this patch.
>
>> 
>> Signed-off-by: Markus Armbruster 
>> ---
>>  monitor.c | 70 
>> +++
>>  1 file changed, 35 insertions(+), 35 deletions(-)
>> 
>
>>  if (!evstate->rate) {
>> +/* Unthrottled event */
>>  monitor_qapi_event_emit(event, qdict);
>> -evstate->last = now;
>>  } else {
>> -int64_t delta = now - evstate->last;
>> -if (evstate->qdict ||
>> -delta < evstate->rate) {
>> -/* If there's an existing event pending, replace
>> - * it with the new event, otherwise schedule a
>> - * timer for delayed emission
>> +if (timer_pending(evstate->timer)) {
>
> Possible in states 2 and 3...
>
>> +/*
>> + * Timer is pending for (at least) evstate->rate ns after
>> + * last send.  Store event for sending when timer fires,
>> + * replacing a prior stored event if any.
>>   */
>> -if (evstate->qdict) {
>> -QDECREF(evstate->qdict);
>> -} else {
>> -int64_t then = evstate->last + evstate->rate;
>> -timer_mod_ns(evstate->timer, then);
>> -}
>> +QDECREF(evstate->qdict);
>
> no-op in state 2, otherwise replaces the old pending event in state 3
>
>>  evstate->qdict = qdict;
>>  QINCREF(evstate->qdict);
>
> either way, we are now in state 3 awaiting the next event or timer.
>
>>  } else {
>
> we are in state 1...
>
>> +/*
>> + * Last send was (at least) evstate->rate ns ago.
>> + * Send immediately, and arm the timer to call
>> + * monitor_qapi_event_handler() in evstate->rate ns.  Any
>> + * events arriving before then will be delayed until then.
>> + */
>> +int64_t now = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
>> +
>>  monitor_qapi_event_emit(event, qdict);
>> -evstate->last = now;
>> +timer_mod_ns(evstate->timer, now + evstate->rate);
>
> and now in state 2.
>
>>  }
>>  }
>> +
>>  qemu_mutex_unlock(&monitor_lock);
>>  }
>>  
>>  /*
>> - * The callback invoked by QemuTimer when a delayed
>> - * event is ready to be emitted
>> + * This function runs evstate->rate ns after sending a throttled
>> + * event.
>> + * If another event has since been stored, send it.
>>   */
>>  static void monitor_qapi_event_handler(void *opaque)
>>  {
>
> We get here in either state 2 or 3...
>
>>  MonitorQAPIEventState *evstate = opaque;
>> -int64_t now = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
>>  
>>  trace_monitor_protocol_event_handler(evstate->event,
>>   evstate->qdict,
>> - evstate->last,
>> - now);
>> + 0, 0); /* FIXME drop */
>>  qemu_mutex_lock(&monitor_lock);
>> +
>>  if (evstate->qdict) {
>
> state 3, so send it...
>
>> +int64_t now = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
>> +
>>  monitor_qapi_event_emit(evstate->event, evstate->qdict);
>>  QDECREF(evstate->qdict);
>>  evstate->qdict = NULL;
>> +timer_mod_ns(evstate->timer, now + evstate->rate);
>
> ...and rearm to go back to state 2
>
>>  }
>> -evstate->last = now;
>> +
>
> ...otherwise, we didn't rearm, so we go back to state 1
>
>>  qemu_mutex_unlock(&monitor_lock);
>>  }
>>  
>
> Matches the state logic called out in the commit message.
>
>> @@ -539,20 +542,17 @@ static void
>>  monitor_qapi_event_throttle(QAPIEvent event, int64_t rate)
>>  {
>>  MonitorQAPIEventState *evstate;
>> +
>>

Re: [Qemu-devel] [PATCH v3 6/7] qdev: Protect device-list-properties against broken devices

2015-09-28 Thread Andreas Färber

Am 28.09.2015 um 10:11 schrieb Markus Armbruster:
> Hmm, does creating and destroying a macio object leave the memory region
> behind?
> 
> Paolo, is calling memory_region_init() in an instance_init() method
> okay?
> 
> If yes, where should they be destroyed, and how?

The counterpart to .instance_init is .instance_finalize aka uninit.

Cheers,
Andreas

-- 
SUSE Linux GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Felix Imendörffer, Jane Smithard, Graham Norton; HRB 21284 (AG Nürnberg)

Re: [Qemu-devel] [RFC 3/6] monitor: Split MonitorQAPIEventConf off MonitorQAPIEventState

2015-09-28 Thread Markus Armbruster

Eric Blake  writes:

> On 09/25/2015 08:00 AM, Markus Armbruster wrote:
>> In preparation of turning monitor_qapi_event_state[] into a hash table
>> for finer grained throttling.
>> 
>> Signed-off-by: Markus Armbruster 
>> ---
>>  monitor.c | 79 
>> ++-
>>  1 file changed, 38 insertions(+), 41 deletions(-)
>> 
>
>> -/*
>> - * @event: the event ID to be limited
>> - * @rate: the rate limit in milliseconds
>> - *
>> - * Sets a rate limit on a particular event, so no
>> - * more than 1 event will be emitted within @rate
>> - * milliseconds
>> - */
>> -static void
>> -monitor_qapi_event_throttle(QAPIEvent event, int64_t rate)
>> -{
>
> At one point, I think we were considering having the rate be
> user-tunable through a qom property, in which case this function is
> still nicer than a monitor_qapi_event_conf[] fixed-rate table.  But
> since I don't think we ever used it, I'm fine with dropping the
> complexity and living with a constant rate.

I'm no friend on complicating stuff now to hopefully facilitate things
we might want to do some day, but don't really understand now.

Say we provide a way for the user to configure the rate.  Say you
accidentally set the rate to a month, and now want to fix your blunder
by setting it to a minute.  If an event has already arrived, the timer
is currently set to expire next month, and your fixed rate will take
effect then.

My point is configuring the rate could be more complicated than storing
the new rate, and keeping monitor_qapi_event_throttle() around won't
make the job materially easier.

> Reviewed-by: Eric Blake 

Thanks!

Re: [Qemu-devel] [PATCH] trace/simple: Fix warning and wrong trace file name for MinGW

2015-09-28 Thread Stefan Weil

Am 28.09.2015 um 09:00 schrieb Markus Armbruster:
> Stefan Weil  writes:
>
>> On Windows, getpid() always returns an int value, but pid_t (which is
>> expected by the format string) is either a 32 bit or a 64 bit value.
>>
>> Without a type cast (or a modified format string), the compiler prints
>> a warning when building for 64 bit Windows and the resulting trace_file_name
>> will include a wrong pid:
>>
>> trace/simple.c:332:9: warning:
>>  format ‘%lld’ expects argument of type ‘long long int’,
>>  but argument 2 has type ‘int’ [-Wformat=]
>>
>> Signed-off-by: Stefan Weil 
>> ---
>>  trace/simple.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/trace/simple.c b/trace/simple.c
>> index 11ad030..56a624c 100644
>> --- a/trace/simple.c
>> +++ b/trace/simple.c
>> @@ -329,7 +329,8 @@ bool st_set_trace_file(const char *file)
>>  g_free(trace_file_name);
>>  
>>  if (!file) {
>> -trace_file_name = g_strdup_printf(CONFIG_TRACE_FILE, getpid());
>> +/* Type cast needed for Windows where getpid() returns an int. */
>> +trace_file_name = g_strdup_printf(CONFIG_TRACE_FILE, 
>> (pid_t)getpid());
>>  } else {
>>  trace_file_name = g_strdup_printf("%s", file);
>>  }
> First we go to the trouble of defining a platform-dependent FMT_pid, and
> then we get to cast anyway.  Meh.
>
> Can you explain why osdep.h's
>
> #define FMT_pid "%" PRId64
>
> is appropriate for Windows?

Don't blame me for any strangeness which you might find in Windows. :-)

Mingw-w64 sys/types.h defines pid_t to be either an int or an __int64.
FMT_pid must match these definitions.

But getpid returns an int, not a pid_t...

Re: [Qemu-devel] [RFC 4/6] monitor: Turn monitor_qapi_event_state[] into a hash table

2015-09-28 Thread Markus Armbruster

Eric Blake  writes:

> On 09/25/2015 08:00 AM, Markus Armbruster wrote:
>> In preparation of finer grained throttling.
>> 
>> Signed-off-by: Markus Armbruster 
>> ---
>>  monitor.c | 55 ++-
>>  1 file changed, 38 insertions(+), 17 deletions(-)
>> 
>
>> @@ -512,6 +518,14 @@ monitor_qapi_event_queue(QAPIEvent event, QDict *qdict, 
>> Error **errp)
>>  int64_t now = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
>>  
>>  monitor_qapi_event_emit(event, qdict);
>> +
>> +evstate = g_new(MonitorQAPIEventState, 1);
>> +evstate->event = event;
>> +evstate->qdict = NULL;
>> +evstate->timer = timer_new_ns(QEMU_CLOCK_REALTIME,
>> +  monitor_qapi_event_handler,
>> +  evstate);
>
> Now timers are created and destroyed dynamically upon use rather than
> reused and just waiting to be rearmed; I hope there aren't any obvious
> inefficiencies from doing that.

I hope there aren't any unobvious inefficienies!  :)

If I read qemu-timer.c correctly, creating and destroying timers is
cheap.

Hmm, looks like I'm missing a timer_del() before timer_free().

> The conversion looks sane:
> Reviewed-by: Eric Blake 

Thanks!

Re: [Qemu-devel] [RFC 5/6] monitor: Throttle event VSERPORT_CHANGE separately by "id"

2015-09-28 Thread Markus Armbruster

Eric Blake  writes:

> On 09/25/2015 08:00 AM, Markus Armbruster wrote:
>> VSERPORT_CHANGE is emitted when the guest opens or closes a
>> virtio-serial port.  The event's member "id" identifies the port.
>> 
>> When several events arrive quickly, throttling drops all but the last
>> of them.  Because of that, a QMP client must assume that *any* port
>> may have changed state when it receives a VSERPORT_CHANGE event and
>> throttling may have happened.
>> 
>> Make the event more useful by throttling it for each port separately.
>> 
>> Signed-off-by: Markus Armbruster 
>> ---
>>  monitor.c | 18 --
>>  1 file changed, 16 insertions(+), 2 deletions(-)
>> 
>
> All future differentiation would be added as additional special cases
> within the hash functions, but I like this approach for keeping the rest
> of the algorithm independent from what the hashing considers as equivalent.

Should that ever become unwieldy, we can easily add indirections right
there.

> Reviewed-by: Eric Blake 

Thanks!

[Qemu-devel] [PATCH v12 00/10] Add a netfilter object and netbuffer filter

2015-09-28 Thread Yang Hongyang

This patch add an netfilter abstract object, captures all network packets
on associated netdev. Also implement a concrete filter buffer based on
this abstract object. the "buffer" netfilter could be used by VM FT solutions
like MicroCheckpointing, to buffer/release packets. Or to simulate
packet delay.

You can also get the series from:
https://github.com/macrosheep/qemu/tree/netfilter-v12

Usage:
 -netdev tap,id=bn0
 -device e1000,netdev=bn0
 -object filter-buffer,id=f0,netdev=bn0,queue=rx,interval=1000

dynamically add/remove netfilters:
 object_add filter-buffer,id=f0,netdev=bn0,queue=rx,interval=1000
 object_del f0

NOTE:
 interval is in microseconds and can't be omiited.
 queue is optional, and is one of rx|tx|all, default is "all". See
 enum NetFilterDirection for detail.

v12:
 - Address Markus's comments.
 - Rebased to the latest master.

v11:
 - address Jason&Daniel's comments
 - add multiqueue support, the last 2 patches
 - rebased to the latest master

v10:
 - Reimplemented using QOM (suggested by stefan)
 - Do not export NetQueue internals (suggested by stefan)
 - see individual patch for detail

v9:
 - squash command description and help to patch 1&3
 - qapi changes according to Markus&Eric's comments
 - see individual patch for detail

v8:
 - some minor fixes according to Thomas's comments
 - rebased to the latest master branch

v7:
 - print filter info when execute 'info network'
 - addressed Jason's comments

v6:
 - add multiqueue support, please see individual patch for detail

v5:
 - add a sent_cb param to filter receive_iov api
 - squash the 4th patch into patch 3
 - remove dummy sent_cb (buffer filter)
 - addressed Jason's other comments, see individual patches for detail

v4:
 - get rid of struct Filter
 - squash the 4th patch into patch 2
 - fix qemu_netfilter_pass_to_next_iov
 - get rid of bh (buffer filter)
 - release the packet to next filter instead of to receiver (buffer filter)

v3:
 - add an api to pass the packet to next filter
 - remove netfilters when delete netdev
 - add qtest testcases for netfilter
 - addressed comments from Jason

v2:
 - add a chain option to netfilter object
 - move the hook place earlier, before net_queue_send
 - drop the unused api in buffer filter
 - squash buffer filter patches into one
 - remove receive() api from netfilter, only receive_iov() is enough
 - addressed comments from Jason&Thomas

v1:
 initial patch.

Yang Hongyang (10):
  vl.c: init delayed object after net_init_clients
  init/cleanup of netfilter object
  netfilter: hook packets before net queue send
  net: merge qemu_deliver_packet and qemu_deliver_packet_iov
  net/queue: introduce NetQueueDeliverFunc
  netfilter: add an API to pass the packet to next filter
  netfilter: print filter info associate with the netdev
  net/queue: export qemu_net_queue_append_iov
  netfilter: add a netbuffer filter
  tests: add test cases for netfilter object

 include/net/filter.h|  78 
 include/net/net.h   |   6 +-
 include/net/queue.h |  20 -
 include/qemu/typedefs.h |   1 +
 net/Makefile.objs   |   2 +
 net/filter-buffer.c | 187 ++
 net/filter.c| 233 
 net/net.c   | 121 +++--
 net/queue.c |  24 +++--
 qapi-schema.json|  19 
 qemu-options.hx |  14 +++
 tests/.gitignore|   1 +
 tests/Makefile  |   2 +
 tests/test-netfilter.c  | 200 +
 vl.c|  17 ++--
 15 files changed, 876 insertions(+), 49 deletions(-)
 create mode 100644 include/net/filter.h
 create mode 100644 net/filter-buffer.c
 create mode 100644 net/filter.c
 create mode 100644 tests/test-netfilter.c

-- 
1.9.1

Re: [Qemu-devel] [PATCH v3 6/7] qdev: Protect device-list-properties against broken devices

2015-09-28 Thread Paolo Bonzini

On 28/09/2015 10:11, Markus Armbruster wrote:
> 
> For most of the devices my patch marks, we have a pretty good idea on
> what's wrong with them.  spapr-rng is among the exceptions.  You believe
> it's actually "the macio object".  Which one?  "macio" is abstract...
> 
> You report introspecting "spapr-rng" crashes "while trying to go through
> the macio object".  I wonder how omitting introspection of macio objects
> (that's what marking them does to this test) could affect the object
> we're going through when we crash.
> 
>> > Or maybe we could get this also fixed? The problem could be the
>> > memory_region_init(&s->bar, NULL, "macio", 0x8) in
>> > macio_instance_init() ... is this ok here? Or does this rather have to
>> > go to the realize() function instead?
> Hmm, does creating and destroying a macio object leave the memory region
> behind?
> 
> Paolo, is calling memory_region_init() in an instance_init() method
> okay?

Yes, but it has to have a non-NULL owner.

The reason why this particular call has a NULL owner is that the
(non-qdevified) DBDMA_init object inside it is also passing a NULL
owner.  DBDMA_init object is also doing a few more non-idempotent things
such as a malloc, a vmstate_register and a qemu_register_reset.

The full solution would be to qdev-ify DBDMA.  A simpler but also valid
solution would be to move DBDMA_init to macio_common_realize, in
addition to setting the owner of s->bar.

Paolo

[Qemu-devel] [PATCH v12 03/10] netfilter: hook packets before net queue send

2015-09-28 Thread Yang Hongyang

Capture packets that will be sent.

Signed-off-by: Yang Hongyang 
Reviewed-by: Thomas Huth 
Signed-off-by: Jason Wang 
---
 include/net/filter.h |  8 +++
 net/filter.c | 17 ++
 net/net.c| 66 
 3 files changed, 91 insertions(+)

diff --git a/include/net/filter.h b/include/net/filter.h
index e3e14ea..a4c56fd 100644
--- a/include/net/filter.h
+++ b/include/net/filter.h
@@ -59,4 +59,12 @@ struct NetFilterState {
 QTAILQ_ENTRY(NetFilterState) next;
 };
 
+ssize_t qemu_netfilter_receive(NetFilterState *nf,
+   NetFilterDirection direction,
+   NetClientState *sender,
+   unsigned flags,
+   const struct iovec *iov,
+   int iovcnt,
+   NetPacketSent *sent_cb);
+
 #endif /* QEMU_NET_FILTER_H */
diff --git a/net/filter.c b/net/filter.c
index 34e32cd..4c63556 100644
--- a/net/filter.c
+++ b/net/filter.c
@@ -15,6 +15,23 @@
 #include "net/vhost_net.h"
 #include "qom/object_interfaces.h"
 
+ssize_t qemu_netfilter_receive(NetFilterState *nf,
+   NetFilterDirection direction,
+   NetClientState *sender,
+   unsigned flags,
+   const struct iovec *iov,
+   int iovcnt,
+   NetPacketSent *sent_cb)
+{
+if (nf->direction == direction ||
+nf->direction == NET_FILTER_DIRECTION_ALL) {
+return NETFILTER_GET_CLASS(OBJECT(nf))->receive_iov(
+   nf, sender, flags, iov, iovcnt, sent_cb);
+}
+
+return 0;
+}
+
 static char *netfilter_get_netdev_id(Object *obj, Error **errp)
 {
 NetFilterState *nf = NETFILTER(obj);
diff --git a/net/net.c b/net/net.c
index 033f4f3..e27643d 100644
--- a/net/net.c
+++ b/net/net.c
@@ -561,6 +561,44 @@ int qemu_can_send_packet(NetClientState *sender)
 return 1;
 }
 
+static ssize_t filter_receive_iov(NetClientState *nc,
+  NetFilterDirection direction,
+  NetClientState *sender,
+  unsigned flags,
+  const struct iovec *iov,
+  int iovcnt,
+  NetPacketSent *sent_cb)
+{
+ssize_t ret = 0;
+NetFilterState *nf = NULL;
+
+QTAILQ_FOREACH(nf, &nc->filters, next) {
+ret = qemu_netfilter_receive(nf, direction, sender, flags, iov,
+ iovcnt, sent_cb);
+if (ret) {
+return ret;
+}
+}
+
+return ret;
+}
+
+static ssize_t filter_receive(NetClientState *nc,
+  NetFilterDirection direction,
+  NetClientState *sender,
+  unsigned flags,
+  const uint8_t *data,
+  size_t size,
+  NetPacketSent *sent_cb)
+{
+struct iovec iov = {
+.iov_base = (void *)data,
+.iov_len = size
+};
+
+return filter_receive_iov(nc, direction, sender, flags, &iov, 1, sent_cb);
+}
+
 ssize_t qemu_deliver_packet(NetClientState *sender,
 unsigned flags,
 const uint8_t *data,
@@ -632,6 +670,7 @@ static ssize_t 
qemu_send_packet_async_with_flags(NetClientState *sender,
  NetPacketSent *sent_cb)
 {
 NetQueue *queue;
+int ret;
 
 #ifdef DEBUG_NET
 printf("qemu_send_packet_async:\n");
@@ -642,6 +681,19 @@ static ssize_t 
qemu_send_packet_async_with_flags(NetClientState *sender,
 return size;
 }
 
+/* Let filters handle the packet first */
+ret = filter_receive(sender, NET_FILTER_DIRECTION_TX,
+ sender, flags, buf, size, sent_cb);
+if (ret) {
+return ret;
+}
+
+ret = filter_receive(sender->peer, NET_FILTER_DIRECTION_RX,
+ sender, flags, buf, size, sent_cb);
+if (ret) {
+return ret;
+}
+
 queue = sender->peer->incoming_queue;
 
 return qemu_net_queue_send(queue, sender, flags, buf, size, sent_cb);
@@ -712,11 +764,25 @@ ssize_t qemu_sendv_packet_async(NetClientState *sender,
 NetPacketSent *sent_cb)
 {
 NetQueue *queue;
+int ret;
 
 if (sender->link_down || !sender->peer) {
 return iov_size(iov, iovcnt);
 }
 
+/* Let filters handle the packet first */
+ret = filter_receive_iov(sender, NET_FILTER_DIRECTION_TX, sender,
+ QEMU_NET_PACKET_FLAG_NONE, iov, iovcnt, sent_cb);
+if (ret) {
+return ret;
+}
+
+ret = filter_receive_iov(sender->peer, NET_FILTER_DIRECTION_RX, sender,
+

[Qemu-devel] [PATCH v12 01/10] vl.c: init delayed object after net_init_clients

2015-09-28 Thread Yang Hongyang

Init delayed object after net_init_clients, because netfilters need
to be initialized after net clients initialized.

Signed-off-by: Yang Hongyang 
---
 vl.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/vl.c b/vl.c
index e211f6a..6f27b64 100644
--- a/vl.c
+++ b/vl.c
@@ -2760,6 +2760,7 @@ static bool object_create_initial(const char *type)
 if (g_str_equal(type, "rng-egd")) {
 return false;
 }
+/* TODO: implement netfilters */
 return true;
 }
 
@@ -4279,12 +4280,6 @@ int main(int argc, char **argv, char **envp)
 exit(0);
 }
 
-if (qemu_opts_foreach(qemu_find_opts("object"),
-  object_create,
-  object_create_delayed, NULL)) {
-exit(1);
-}
-
 machine_opts = qemu_get_machine_opts();
 if (qemu_opt_foreach(machine_opts, machine_set_property, current_machine,
  NULL)) {
@@ -4390,6 +4385,12 @@ int main(int argc, char **argv, char **envp)
 exit(1);
 }
 
+if (qemu_opts_foreach(qemu_find_opts("object"),
+  object_create,
+  object_create_delayed, NULL)) {
+exit(1);
+}
+
 #ifdef CONFIG_TPM
 if (tpm_init() < 0) {
 exit(1);
-- 
1.9.1

[Qemu-devel] [PATCH v12 05/10] net/queue: introduce NetQueueDeliverFunc

2015-09-28 Thread Yang Hongyang

net/queue.c has logic to send/queue/flush packets but a
qemu_deliver_packet_iov() call is hardcoded. Abstract this
func so that we can use our own deliver function in netfilter.

Signed-off-by: Yang Hongyang 
Cc: Stefan Hajnoczi 
Signed-off-by: Jason Wang 
---
 include/net/queue.h | 13 -
 net/net.c   |  2 +-
 net/queue.c |  8 +---
 3 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/include/net/queue.h b/include/net/queue.h
index fc02b33..b4a7183 100644
--- a/include/net/queue.h
+++ b/include/net/queue.h
@@ -34,7 +34,18 @@ typedef void (NetPacketSent) (NetClientState *sender, 
ssize_t ret);
 #define QEMU_NET_PACKET_FLAG_NONE  0
 #define QEMU_NET_PACKET_FLAG_RAW  (1<<0)
 
-NetQueue *qemu_new_net_queue(void *opaque);
+/* Returns:
+ *   >0 - success
+ *0 - queue packet for future redelivery
+ *   <0 - failure (discard packet)
+ */
+typedef ssize_t (NetQueueDeliverFunc)(NetClientState *sender,
+  unsigned flags,
+  const struct iovec *iov,
+  int iovcnt,
+  void *opaque);
+
+NetQueue *qemu_new_net_queue(NetQueueDeliverFunc *deliver, void *opaque);
 
 void qemu_del_net_queue(NetQueue *queue);
 
diff --git a/net/net.c b/net/net.c
index 2f939f9..c0ebb13 100644
--- a/net/net.c
+++ b/net/net.c
@@ -286,7 +286,7 @@ static void qemu_net_client_setup(NetClientState *nc,
 }
 QTAILQ_INSERT_TAIL(&net_clients, nc, next);
 
-nc->incoming_queue = qemu_new_net_queue(nc);
+nc->incoming_queue = qemu_new_net_queue(qemu_deliver_packet_iov, nc);
 nc->destructor = destructor;
 QTAILQ_INIT(&nc->filters);
 }
diff --git a/net/queue.c b/net/queue.c
index cf8db3a..16dddf0 100644
--- a/net/queue.c
+++ b/net/queue.c
@@ -52,13 +52,14 @@ struct NetQueue {
 void *opaque;
 uint32_t nq_maxlen;
 uint32_t nq_count;
+NetQueueDeliverFunc *deliver;
 
 QTAILQ_HEAD(packets, NetPacket) packets;
 
 unsigned delivering : 1;
 };
 
-NetQueue *qemu_new_net_queue(void *opaque)
+NetQueue *qemu_new_net_queue(NetQueueDeliverFunc *deliver, void *opaque)
 {
 NetQueue *queue;
 
@@ -67,6 +68,7 @@ NetQueue *qemu_new_net_queue(void *opaque)
 queue->opaque = opaque;
 queue->nq_maxlen = 1;
 queue->nq_count = 0;
+queue->deliver = deliver;
 
 QTAILQ_INIT(&queue->packets);
 
@@ -158,7 +160,7 @@ static ssize_t qemu_net_queue_deliver(NetQueue *queue,
 };
 
 queue->delivering = 1;
-ret = qemu_deliver_packet_iov(sender, flags, &iov, 1, queue->opaque);
+ret = queue->deliver(sender, flags, &iov, 1, queue->opaque);
 queue->delivering = 0;
 
 return ret;
@@ -173,7 +175,7 @@ static ssize_t qemu_net_queue_deliver_iov(NetQueue *queue,
 ssize_t ret = -1;
 
 queue->delivering = 1;
-ret = qemu_deliver_packet_iov(sender, flags, iov, iovcnt, queue->opaque);
+ret = queue->deliver(sender, flags, iov, iovcnt, queue->opaque);
 queue->delivering = 0;
 
 return ret;
-- 
1.9.1

[Qemu-devel] [PATCH v12 02/10] init/cleanup of netfilter object

2015-09-28 Thread Yang Hongyang

Add a netfilter object based on QOM.

A netfilter is attached to a netdev, captures all network packets
that pass through the netdev. When we delete the netdev, we also
delete the netfilter object attached to it, because if the netdev is
removed, the filter which attached to it is useless.

QTAILQ_ENTRY next used by netdev, filter belongs to the specific netdev is
in this queue.

Signed-off-by: Yang Hongyang 
---
 include/net/filter.h|  62 ++
 include/net/net.h   |   1 +
 include/qemu/typedefs.h |   1 +
 net/Makefile.objs   |   1 +
 net/filter.c| 138 
 net/net.c   |   7 +++
 qapi-schema.json|  19 +++
 7 files changed, 229 insertions(+)
 create mode 100644 include/net/filter.h
 create mode 100644 net/filter.c

diff --git a/include/net/filter.h b/include/net/filter.h
new file mode 100644
index 000..e3e14ea
--- /dev/null
+++ b/include/net/filter.h
@@ -0,0 +1,62 @@
+/*
+ * Copyright (c) 2015 FUJITSU LIMITED
+ * Author: Yang Hongyang 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_NET_FILTER_H
+#define QEMU_NET_FILTER_H
+
+#include "qom/object.h"
+#include "qemu-common.h"
+#include "qemu/typedefs.h"
+#include "net/queue.h"
+
+#define TYPE_NETFILTER "netfilter"
+#define NETFILTER(obj) \
+OBJECT_CHECK(NetFilterState, (obj), TYPE_NETFILTER)
+#define NETFILTER_GET_CLASS(obj) \
+OBJECT_GET_CLASS(NetFilterClass, (obj), TYPE_NETFILTER)
+#define NETFILTER_CLASS(klass) \
+OBJECT_CLASS_CHECK(NetFilterClass, (klass), TYPE_NETFILTER)
+
+typedef void (FilterSetup) (NetFilterState *nf, Error **errp);
+typedef void (FilterCleanup) (NetFilterState *nf);
+/*
+ * Return:
+ *   0: finished handling the packet, we should continue
+ *   size: filter stolen this packet, we stop pass this packet further
+ */
+typedef ssize_t (FilterReceiveIOV)(NetFilterState *nc,
+   NetClientState *sender,
+   unsigned flags,
+   const struct iovec *iov,
+   int iovcnt,
+   NetPacketSent *sent_cb);
+
+struct NetFilterClass {
+ObjectClass parent_class;
+
+/* optional */
+FilterSetup *setup;
+FilterCleanup *cleanup;
+/* mandatory */
+FilterReceiveIOV *receive_iov;
+};
+typedef struct NetFilterClass NetFilterClass;
+
+
+struct NetFilterState {
+/* private */
+Object parent;
+
+/* protected */
+char *netdev_id;
+NetClientState *netdev;
+NetFilterDirection direction;
+QTAILQ_ENTRY(NetFilterState) next;
+};
+
+#endif /* QEMU_NET_FILTER_H */
diff --git a/include/net/net.h b/include/net/net.h
index 6a6cbef..36e5fab 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -92,6 +92,7 @@ struct NetClientState {
 NetClientDestructor *destructor;
 unsigned int queue_index;
 unsigned rxfilter_notify_enabled:1;
+QTAILQ_HEAD(, NetFilterState) filters;
 };
 
 typedef struct NICState {
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index 3a835ff..ee1ce1d 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -45,6 +45,7 @@ typedef struct Monitor Monitor;
 typedef struct MouseTransformInfo MouseTransformInfo;
 typedef struct MSIMessage MSIMessage;
 typedef struct NetClientState NetClientState;
+typedef struct NetFilterState NetFilterState;
 typedef struct NICInfo NICInfo;
 typedef struct PcGuestInfo PcGuestInfo;
 typedef struct PCIBridge PCIBridge;
diff --git a/net/Makefile.objs b/net/Makefile.objs
index ec19cb3..914aec0 100644
--- a/net/Makefile.objs
+++ b/net/Makefile.objs
@@ -13,3 +13,4 @@ common-obj-$(CONFIG_HAIKU) += tap-haiku.o
 common-obj-$(CONFIG_SLIRP) += slirp.o
 common-obj-$(CONFIG_VDE) += vde.o
 common-obj-$(CONFIG_NETMAP) += netmap.o
+common-obj-y += filter.o
diff --git a/net/filter.c b/net/filter.c
new file mode 100644
index 000..34e32cd
--- /dev/null
+++ b/net/filter.c
@@ -0,0 +1,138 @@
+/*
+ * Copyright (c) 2015 FUJITSU LIMITED
+ * Author: Yang Hongyang 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#include "qemu-common.h"
+#include "qapi/qmp/qerror.h"
+#include "qemu/error-report.h"
+
+#include "net/filter.h"
+#include "net/net.h"
+#include "net/vhost_net.h"
+#include "qom/object_interfaces.h"
+
+static char *netfilter_get_netdev_id(Object *obj, Error **errp)
+{
+NetFilterState *nf = NETFILTER(obj);
+
+return g_strdup(nf->netdev_id);
+}
+
+static void netfilter_set_netdev_id(Object *obj, const char *str, Error **errp)
+{
+NetFilterState *nf = NETFILTER(obj);
+
+nf->netdev_id = g_strdup(str);
+}
+
+static int netfilter_get_direction(Object *obj, Error **errp G_GNUC_UNUSED)
+{
+NetFilterState *nf = NETFILTER(obj);
+return nf->direction

[Qemu-devel] [PATCH v12 06/10] netfilter: add an API to pass the packet to next filter

2015-09-28 Thread Yang Hongyang

add an API qemu_netfilter_pass_to_next() to pass the packet
to next filter.

Signed-off-by: Yang Hongyang 
Reviewed-by: Thomas Huth 
Signed-off-by: Jason Wang 
---
 include/net/filter.h |  7 +++
 net/filter.c | 58 
 2 files changed, 65 insertions(+)

diff --git a/include/net/filter.h b/include/net/filter.h
index a4c56fd..28a9ce2 100644
--- a/include/net/filter.h
+++ b/include/net/filter.h
@@ -67,4 +67,11 @@ ssize_t qemu_netfilter_receive(NetFilterState *nf,
int iovcnt,
NetPacketSent *sent_cb);
 
+/* pass the packet to the next filter */
+ssize_t qemu_netfilter_pass_to_next(NetClientState *sender,
+unsigned flags,
+const struct iovec *iov,
+int iovcnt,
+void *opaque);
+
 #endif /* QEMU_NET_FILTER_H */
diff --git a/net/filter.c b/net/filter.c
index 4c63556..2d7939b 100644
--- a/net/filter.c
+++ b/net/filter.c
@@ -14,6 +14,7 @@
 #include "net/net.h"
 #include "net/vhost_net.h"
 #include "qom/object_interfaces.h"
+#include "qemu/iov.h"
 
 ssize_t qemu_netfilter_receive(NetFilterState *nf,
NetFilterDirection direction,
@@ -32,6 +33,63 @@ ssize_t qemu_netfilter_receive(NetFilterState *nf,
 return 0;
 }
 
+ssize_t qemu_netfilter_pass_to_next(NetClientState *sender,
+unsigned flags,
+const struct iovec *iov,
+int iovcnt,
+void *opaque)
+{
+int ret = 0;
+int direction;
+NetFilterState *nf = opaque;
+NetFilterState *next = QTAILQ_NEXT(nf, next);
+
+if (!sender || !sender->peer) {
+/* no receiver, or sender been deleted, no need to pass it further */
+goto out;
+}
+
+if (nf->direction == NET_FILTER_DIRECTION_ALL) {
+if (sender == nf->netdev) {
+/* This packet is sent by netdev itself */
+direction = NET_FILTER_DIRECTION_TX;
+} else {
+direction = NET_FILTER_DIRECTION_RX;
+}
+} else {
+direction = nf->direction;
+}
+
+while (next) {
+/*
+ * if qemu_netfilter_pass_to_next been called, means that
+ * the packet has been hold by filter and has already retured size
+ * to the sender, so sent_cb shouldn't be called later, just
+ * pass NULL to next.
+ */
+ret = qemu_netfilter_receive(next, direction, sender, flags, iov,
+ iovcnt, NULL);
+if (ret) {
+return ret;
+}
+next = QTAILQ_NEXT(next, next);
+}
+
+/*
+ * We have gone through all filters, pass it to receiver.
+ * Do the valid check again incase sender or receiver been
+ * deleted while we go through filters.
+ */
+if (sender && sender->peer) {
+qemu_net_queue_send_iov(sender->peer->incoming_queue,
+sender, flags, iov, iovcnt, NULL);
+}
+
+out:
+/* no receiver, or sender been deleted */
+return iov_size(iov, iovcnt);
+}
+
 static char *netfilter_get_netdev_id(Object *obj, Error **errp)
 {
 NetFilterState *nf = NETFILTER(obj);
-- 
1.9.1

[Qemu-devel] [PATCH v12 08/10] net/queue: export qemu_net_queue_append_iov

2015-09-28 Thread Yang Hongyang

This will be used by buffer filter implementation later to
queue packets.

Signed-off-by: Yang Hongyang 
Reviewed-by: Thomas Huth 
Signed-off-by: Jason Wang 
---
 include/net/queue.h |  7 +++
 net/queue.c | 12 ++--
 2 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/include/net/queue.h b/include/net/queue.h
index b4a7183..5469fdb 100644
--- a/include/net/queue.h
+++ b/include/net/queue.h
@@ -47,6 +47,13 @@ typedef ssize_t (NetQueueDeliverFunc)(NetClientState *sender,
 
 NetQueue *qemu_new_net_queue(NetQueueDeliverFunc *deliver, void *opaque);
 
+void qemu_net_queue_append_iov(NetQueue *queue,
+   NetClientState *sender,
+   unsigned flags,
+   const struct iovec *iov,
+   int iovcnt,
+   NetPacketSent *sent_cb);
+
 void qemu_del_net_queue(NetQueue *queue);
 
 ssize_t qemu_net_queue_send(NetQueue *queue,
diff --git a/net/queue.c b/net/queue.c
index 16dddf0..de8b9d3 100644
--- a/net/queue.c
+++ b/net/queue.c
@@ -112,12 +112,12 @@ static void qemu_net_queue_append(NetQueue *queue,
 QTAILQ_INSERT_TAIL(&queue->packets, packet, entry);
 }
 
-static void qemu_net_queue_append_iov(NetQueue *queue,
-  NetClientState *sender,
-  unsigned flags,
-  const struct iovec *iov,
-  int iovcnt,
-  NetPacketSent *sent_cb)
+void qemu_net_queue_append_iov(NetQueue *queue,
+   NetClientState *sender,
+   unsigned flags,
+   const struct iovec *iov,
+   int iovcnt,
+   NetPacketSent *sent_cb)
 {
 NetPacket *packet;
 size_t max_len = 0;
-- 
1.9.1

[Qemu-devel] [PATCH v12 04/10] net: merge qemu_deliver_packet and qemu_deliver_packet_iov

2015-09-28 Thread Yang Hongyang

qemu_deliver_packet_iov already have the compat delivery, we
can drop qemu_deliver_packet.

Signed-off-by: Yang Hongyang 
Signed-off-by: Jason Wang 
---
 include/net/net.h |  5 -
 net/net.c | 51 ---
 net/queue.c   |  6 +-
 3 files changed, 21 insertions(+), 41 deletions(-)

diff --git a/include/net/net.h b/include/net/net.h
index 36e5fab..7af3e15 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -152,11 +152,6 @@ void qemu_check_nic_model(NICInfo *nd, const char *model);
 int qemu_find_nic_model(NICInfo *nd, const char * const *models,
 const char *default_model);
 
-ssize_t qemu_deliver_packet(NetClientState *sender,
-unsigned flags,
-const uint8_t *data,
-size_t size,
-void *opaque);
 ssize_t qemu_deliver_packet_iov(NetClientState *sender,
 unsigned flags,
 const struct iovec *iov,
diff --git a/net/net.c b/net/net.c
index e27643d..2f939f9 100644
--- a/net/net.c
+++ b/net/net.c
@@ -599,36 +599,6 @@ static ssize_t filter_receive(NetClientState *nc,
 return filter_receive_iov(nc, direction, sender, flags, &iov, 1, sent_cb);
 }
 
-ssize_t qemu_deliver_packet(NetClientState *sender,
-unsigned flags,
-const uint8_t *data,
-size_t size,
-void *opaque)
-{
-NetClientState *nc = opaque;
-ssize_t ret;
-
-if (nc->link_down) {
-return size;
-}
-
-if (nc->receive_disabled) {
-return 0;
-}
-
-if (flags & QEMU_NET_PACKET_FLAG_RAW && nc->info->receive_raw) {
-ret = nc->info->receive_raw(nc, data, size);
-} else {
-ret = nc->info->receive(nc, data, size);
-}
-
-if (ret == 0) {
-nc->receive_disabled = 1;
-}
-
-return ret;
-}
-
 void qemu_purge_queued_packets(NetClientState *nc)
 {
 if (!nc->peer) {
@@ -719,14 +689,25 @@ ssize_t qemu_send_packet_raw(NetClientState *nc, const 
uint8_t *buf, int size)
 }
 
 static ssize_t nc_sendv_compat(NetClientState *nc, const struct iovec *iov,
-   int iovcnt)
+   int iovcnt, unsigned flags)
 {
-uint8_t buffer[NET_BUFSIZE];
+uint8_t buf[NET_BUFSIZE];
+uint8_t *buffer;
 size_t offset;
 
-offset = iov_to_buf(iov, iovcnt, 0, buffer, sizeof(buffer));
+if (iovcnt == 1) {
+buffer = iov[0].iov_base;
+offset = iov[0].iov_len;
+} else {
+buffer = buf;
+offset = iov_to_buf(iov, iovcnt, 0, buffer, sizeof(buffer));
+}
 
-return nc->info->receive(nc, buffer, offset);
+if (flags & QEMU_NET_PACKET_FLAG_RAW && nc->info->receive_raw) {
+return nc->info->receive_raw(nc, buffer, offset);
+} else {
+return nc->info->receive(nc, buffer, offset);
+}
 }
 
 ssize_t qemu_deliver_packet_iov(NetClientState *sender,
@@ -749,7 +730,7 @@ ssize_t qemu_deliver_packet_iov(NetClientState *sender,
 if (nc->info->receive_iov) {
 ret = nc->info->receive_iov(nc, iov, iovcnt);
 } else {
-ret = nc_sendv_compat(nc, iov, iovcnt);
+ret = nc_sendv_compat(nc, iov, iovcnt, flags);
 }
 
 if (ret == 0) {
diff --git a/net/queue.c b/net/queue.c
index ebbe2bb..cf8db3a 100644
--- a/net/queue.c
+++ b/net/queue.c
@@ -152,9 +152,13 @@ static ssize_t qemu_net_queue_deliver(NetQueue *queue,
   size_t size)
 {
 ssize_t ret = -1;
+struct iovec iov = {
+.iov_base = (void *)data,
+.iov_len = size
+};
 
 queue->delivering = 1;
-ret = qemu_deliver_packet(sender, flags, data, size, queue->opaque);
+ret = qemu_deliver_packet_iov(sender, flags, &iov, 1, queue->opaque);
 queue->delivering = 0;
 
 return ret;
-- 
1.9.1

[Qemu-devel] [PATCH v12 09/10] netfilter: add a netbuffer filter

2015-09-28 Thread Yang Hongyang

This filter is to buffer/release packets, this feature can be used
when using MicroCheckpointing, or other Remus like VM FT solutions, you
can also use it to simulate the network delay.

Usage:
 -netdev tap,id=bn0
 -object filter-buffer,id=f0,netdev=bn0,queue=rx,interval=1000

NOTE:
 Interval is in microseconds, it can't be omitted currently, and can't be 0.

Signed-off-by: Yang Hongyang 
Signed-off-by: Jason Wang 
---
 net/Makefile.objs   |   1 +
 net/filter-buffer.c | 187 
 qemu-options.hx |  14 
 vl.c|   6 +-
 4 files changed, 207 insertions(+), 1 deletion(-)
 create mode 100644 net/filter-buffer.c

diff --git a/net/Makefile.objs b/net/Makefile.objs
index 914aec0..5fa2f97 100644
--- a/net/Makefile.objs
+++ b/net/Makefile.objs
@@ -14,3 +14,4 @@ common-obj-$(CONFIG_SLIRP) += slirp.o
 common-obj-$(CONFIG_VDE) += vde.o
 common-obj-$(CONFIG_NETMAP) += netmap.o
 common-obj-y += filter.o
+common-obj-y += filter-buffer.o
diff --git a/net/filter-buffer.c b/net/filter-buffer.c
new file mode 100644
index 000..776827e
--- /dev/null
+++ b/net/filter-buffer.c
@@ -0,0 +1,187 @@
+/*
+ * Copyright (c) 2015 FUJITSU LIMITED
+ * Author: Yang Hongyang 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#include "net/filter.h"
+#include "net/queue.h"
+#include "qemu-common.h"
+#include "qemu/timer.h"
+#include "qemu/iov.h"
+#include "qapi/qmp/qerror.h"
+#include "qapi-visit.h"
+#include "qom/object.h"
+
+#define TYPE_FILTER_BUFFER "filter-buffer"
+
+#define FILTER_BUFFER(obj) \
+OBJECT_CHECK(FilterBufferState, (obj), TYPE_FILTER_BUFFER)
+
+struct FilterBufferState {
+NetFilterState parent_obj;
+
+NetQueue *incoming_queue;
+uint32_t interval;
+QEMUTimer release_timer;
+};
+typedef struct FilterBufferState FilterBufferState;
+
+static void filter_buffer_flush(NetFilterState *nf)
+{
+FilterBufferState *s = FILTER_BUFFER(nf);
+
+if (!qemu_net_queue_flush(s->incoming_queue)) {
+/* Unable to empty the queue, purge remaining packets */
+qemu_net_queue_purge(s->incoming_queue, nf->netdev);
+}
+}
+
+static void filter_buffer_release_timer(void *opaque)
+{
+NetFilterState *nf = opaque;
+
+FilterBufferState *s = FILTER_BUFFER(nf);
+/*
+ * TODO: We should queue the packet if the receiver or next filter
+ * could not receive packets. But currently there's no way for the
+ * next filter or recivier to notify us that it can receive more packet.
+ * This could be done in the future.
+ */
+filter_buffer_flush(nf);
+/* Timer rearmed to fire again in s->interval microseconds. */
+timer_mod(&s->release_timer,
+  qemu_clock_get_us(QEMU_CLOCK_VIRTUAL) + s->interval);
+}
+
+/* filter APIs */
+static ssize_t filter_buffer_receive_iov(NetFilterState *nf,
+ NetClientState *sender,
+ unsigned flags,
+ const struct iovec *iov,
+ int iovcnt,
+ NetPacketSent *sent_cb)
+{
+FilterBufferState *s = FILTER_BUFFER(nf);
+
+/*
+ * We return size when buffer a packet, the sender will take it as
+ * a already sent packet, so sent_cb should not be called later.
+ *
+ * FIXME: Even if guest can't receive packet for some reasons. Filter
+ * can still accept packet until its internal queue is full.
+ * For example:
+ *   For some reason, receiver could not receive more packets
+ * (.can_receive() returns zero). Without a filter, at most one packet
+ * will be queued in incoming queue and sender's poll will be disabled
+ * unit its sent_cb() was called. With a filter, it will keep receive
+ * the packets without caring about the receiver. This is suboptimal.
+ * May need more thoughts (e.g keeping sent_cb).
+ */
+qemu_net_queue_append_iov(s->incoming_queue, sender, flags,
+  iov, iovcnt, NULL);
+return iov_size(iov, iovcnt);
+}
+
+static void filter_buffer_cleanup(NetFilterState *nf)
+{
+FilterBufferState *s = FILTER_BUFFER(nf);
+
+if (s->interval) {
+timer_del(&s->release_timer);
+}
+
+/* flush packets */
+if (s->incoming_queue) {
+filter_buffer_flush(nf);
+g_free(s->incoming_queue);
+}
+}
+
+static void filter_buffer_setup(NetFilterState *nf, Error **errp)
+{
+FilterBufferState *s = FILTER_BUFFER(nf);
+
+/*
+ * We may want to accept zero interval when VM FT solutions like MC
+ * or COLO use this filter to release packets on demand.
+ */
+if (!s->interval) {
+error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "interval",
+   "a non-zero interval");
+return;
+}
+
+s->incoming_queue = qemu_new_net_queu

[Qemu-devel] [PATCH v12 10/10] tests: add test cases for netfilter object

2015-09-28 Thread Yang Hongyang

Using qtest qmp interface to implement following cases:
1) add/remove netfilter
2) add a netfilter then delete the netdev
3) add/remove more than one netfilters
4) add more than one netfilters and then delete the netdev

Signed-off-by: Yang Hongyang 
Signed-off-by: Jason Wang 
---
 tests/.gitignore   |   1 +
 tests/Makefile |   2 +
 tests/test-netfilter.c | 200 +
 3 files changed, 203 insertions(+)
 create mode 100644 tests/test-netfilter.c

diff --git a/tests/.gitignore b/tests/.gitignore
index a607bdd..65496aa 100644
--- a/tests/.gitignore
+++ b/tests/.gitignore
@@ -49,5 +49,6 @@ test-vmstate
 test-write-threshold
 test-x86-cpuid
 test-xbzrle
+test-netfilter
 *-test
 qapi-schema/*.test.*
diff --git a/tests/Makefile b/tests/Makefile
index 4063639..bacfd8a 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -189,6 +189,7 @@ check-qtest-i386-y += tests/pc-cpu-test$(EXESUF)
 check-qtest-i386-y += tests/q35-test$(EXESUF)
 gcov-files-i386-y += hw/pci-host/q35.c
 check-qtest-i386-$(CONFIG_LINUX) += tests/vhost-user-test$(EXESUF)
+check-qtest-i386-y += tests/test-netfilter$(EXESUF)
 check-qtest-x86_64-y = $(check-qtest-i386-y)
 gcov-files-i386-y += i386-softmmu/hw/timer/mc146818rtc.c
 gcov-files-x86_64-y = $(subst 
i386-softmmu/,x86_64-softmmu/,$(gcov-files-i386-y))
@@ -435,6 +436,7 @@ tests/vhost-user-test$(EXESUF): tests/vhost-user-test.o 
qemu-char.o qemu-timer.o
 tests/qemu-iotests/socket_scm_helper$(EXESUF): 
tests/qemu-iotests/socket_scm_helper.o
 tests/test-qemu-opts$(EXESUF): tests/test-qemu-opts.o $(test-util-obj-y)
 tests/test-write-threshold$(EXESUF): tests/test-write-threshold.o 
$(test-block-obj-y)
+tests/test-netfilter$(EXESUF): tests/test-netfilter.o $(qtest-obj-y)
 
 ifeq ($(CONFIG_POSIX),y)
 LIBS += -lutil
diff --git a/tests/test-netfilter.c b/tests/test-netfilter.c
new file mode 100644
index 000..303deb7
--- /dev/null
+++ b/tests/test-netfilter.c
@@ -0,0 +1,200 @@
+/*
+ * QTest testcase for netfilter
+ *
+ * Copyright (c) 2015 FUJITSU LIMITED
+ * Author: Yang Hongyang 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#include 
+#include "libqtest.h"
+
+/* add a netfilter to a netdev and then remove it */
+static void add_one_netfilter(void)
+{
+QDict *response;
+
+response = qmp("{'execute': 'object-add',"
+   " 'arguments': {"
+   "   'qom-type': 'filter-buffer',"
+   "   'id': 'qtest-f0',"
+   "   'props': {"
+   " 'netdev': 'qtest-bn0',"
+   " 'queue': 'rx',"
+   " 'interval': 1000"
+   "}}}");
+
+g_assert(response);
+g_assert(!qdict_haskey(response, "error"));
+QDECREF(response);
+
+response = qmp("{'execute': 'object-del',"
+   " 'arguments': {"
+   "   'id': 'qtest-f0'"
+   "}}");
+g_assert(response);
+g_assert(!qdict_haskey(response, "error"));
+QDECREF(response);
+}
+
+/* add a netfilter to a netdev and then remove the netdev */
+static void remove_netdev_with_one_netfilter(void)
+{
+QDict *response;
+
+response = qmp("{'execute': 'object-add',"
+   " 'arguments': {"
+   "   'qom-type': 'filter-buffer',"
+   "   'id': 'qtest-f0',"
+   "   'props': {"
+   " 'netdev': 'qtest-bn0',"
+   " 'queue': 'rx',"
+   " 'interval': 1000"
+   "}}}");
+
+g_assert(response);
+g_assert(!qdict_haskey(response, "error"));
+QDECREF(response);
+
+response = qmp("{'execute': 'netdev_del',"
+   " 'arguments': {"
+   "   'id': 'qtest-bn0'"
+   "}}");
+g_assert(response);
+g_assert(!qdict_haskey(response, "error"));
+QDECREF(response);
+
+/* add back the netdev */
+response = qmp("{'execute': 'netdev_add',"
+   " 'arguments': {"
+   "   'type': 'user',"
+   "   'id': 'qtest-bn0'"
+   "}}");
+g_assert(response);
+g_assert(!qdict_haskey(response, "error"));
+QDECREF(response);
+}
+
+/* add multi(2) netfilters to a netdev and then remove them */
+static void add_multi_netfilter(void)
+{
+QDict *response;
+
+response = qmp("{'execute': 'object-add',"
+   " 'arguments': {"
+   "   'qom-type': 'filter-buffer',"
+   "   'id': 'qtest-f0',"
+   "   'props': {"
+   " 'netdev': 'qtest-bn0',"
+   " 'queue': 'rx',"
+   " 'interval': 1000"
+   "}}}");
+
+g_assert(response);
+g_assert(!qdict_haskey(response, "error"));
+QDECREF(response);
+
+response = qmp("{'execute': 'object-add',"
+

[Qemu-devel] [PATCH v12 07/10] netfilter: print filter info associate with the netdev

2015-09-28 Thread Yang Hongyang

From: Yang Hongyang 

When execute "info network", print filter info also.
add a info_str member to NetFilterState, store specific filters
info.

Signed-off-by: Yang Hongyang 
Signed-off-by: Jason Wang 
---
 include/net/filter.h |  1 +
 net/filter.c | 20 
 net/net.c| 11 +++
 3 files changed, 32 insertions(+)

diff --git a/include/net/filter.h b/include/net/filter.h
index 28a9ce2..5264c06 100644
--- a/include/net/filter.h
+++ b/include/net/filter.h
@@ -56,6 +56,7 @@ struct NetFilterState {
 char *netdev_id;
 NetClientState *netdev;
 NetFilterDirection direction;
+char info_str[256];
 QTAILQ_ENTRY(NetFilterState) next;
 };
 
diff --git a/net/filter.c b/net/filter.c
index 2d7939b..60b7bb1 100644
--- a/net/filter.c
+++ b/net/filter.c
@@ -15,6 +15,7 @@
 #include "net/vhost_net.h"
 #include "qom/object_interfaces.h"
 #include "qemu/iov.h"
+#include "qapi/string-output-visitor.h"
 
 ssize_t qemu_netfilter_receive(NetFilterState *nf,
NetFilterDirection direction,
@@ -148,6 +149,9 @@ static void netfilter_complete(UserCreatable *uc, Error 
**errp)
 NetFilterClass *nfc = NETFILTER_GET_CLASS(uc);
 int queues;
 Error *local_err = NULL;
+char *str, *info;
+ObjectProperty *prop;
+StringOutputVisitor *ov;
 
 if (!nf->netdev_id) {
 error_setg(errp, "Parameter 'netdev' is required");
@@ -181,6 +185,22 @@ static void netfilter_complete(UserCreatable *uc, Error 
**errp)
 }
 }
 QTAILQ_INSERT_TAIL(&nf->netdev->filters, nf, next);
+
+/* generate info str */
+QTAILQ_FOREACH(prop, &OBJECT(nf)->properties, node) {
+if (!strcmp(prop->name, "type")) {
+continue;
+}
+ov = string_output_visitor_new(false);
+object_property_get(OBJECT(nf), string_output_get_visitor(ov),
+prop->name, errp);
+str = string_output_get_string(ov);
+string_output_visitor_cleanup(ov);
+info = g_strdup_printf(",%s=%s", prop->name, str);
+g_strlcat(nf->info_str, info, sizeof(nf->info_str));
+g_free(str);
+g_free(info);
+}
 }
 
 static void netfilter_class_init(ObjectClass *oc, void *data)
diff --git a/net/net.c b/net/net.c
index c0ebb13..39af893 100644
--- a/net/net.c
+++ b/net/net.c
@@ -1179,10 +1179,21 @@ void qmp_netdev_del(const char *id, Error **errp)
 
 void print_net_client(Monitor *mon, NetClientState *nc)
 {
+NetFilterState *nf;
+
 monitor_printf(mon, "%s: index=%d,type=%s,%s\n", nc->name,
nc->queue_index,
NetClientOptionsKind_lookup[nc->info->type],
nc->info_str);
+if (!QTAILQ_EMPTY(&nc->filters)) {
+monitor_printf(mon, "filters:\n");
+}
+QTAILQ_FOREACH(nf, &nc->filters, next) {
+monitor_printf(mon, "  - %s: type=%s%s\n",
+   object_get_canonical_path_component(OBJECT(nf)),
+   object_get_typename(OBJECT(nf)),
+   nf->info_str);
+}
 }
 
 RxFilterInfoList *qmp_query_rx_filter(bool has_name, const char *name,
-- 
1.9.1

Re: [Qemu-devel] [RFC 6/6] docs: Document QMP event rate limiting

2015-09-28 Thread Markus Armbruster

Eric Blake  writes:

> On 09/25/2015 08:00 AM, Markus Armbruster wrote:
>> Signed-off-by: Markus Armbruster 
>> ---
>>  docs/qmp/qmp-events.txt | 12 
>>  docs/qmp/qmp-spec.txt   |  5 +
>>  2 files changed, 17 insertions(+)
>
> Obvious rebase implied if your other patch to s,docs/qmp/,docs/, goes
> through.
>
>> 
>> diff --git a/docs/qmp/qmp-events.txt b/docs/qmp/qmp-events.txt
>> index d92cc48..d2f1ce4 100644
>> --- a/docs/qmp/qmp-events.txt
>> +++ b/docs/qmp/qmp-events.txt
>> @@ -28,6 +28,8 @@ Example:
>>  "data": { "actual": 944766976 },
>>  "timestamp": { "seconds": 1267020223, "microseconds": 435656 } }
>>  
>> +Note: this event is rate-limited.
>
> Marc-Andre's series moves all the documentation into the .json files;
> maybe we could make it easier by just listing rate-limiting in the .json
> files for now, rather than having to churn through which file(s)
> document things.

As long as qmp-events.txt exists, shouldn't we document rate-limiting
there?

> Do we want to document that VSERPORT_CHANGE rate limiting is per-"id"?

I did:

@@ -632,6 +640,8 @@ Example:
 "data": { "id": "channel0", "open": true },
 "timestamp": { "seconds": 1401385907, "microseconds": 422329 } }

+Note: this event is rate-limited separately for each "id".
+
 WAKEUP
 --

>> +++ b/docs/qmp/qmp-spec.txt
>> @@ -175,6 +175,11 @@ The format of asynchronous events is:
>>  For a listing of supported asynchronous events, please, refer to the
>>  qmp-events.txt file.
>>  
>> +Some events are rate-limited to at most one per second.  If more
>> +events arrive within one second, all but the last one are dropped, and
>> +the last one is delayed.  Rate-limiting applies to each kind of event
>> +separately.
>
> Do we also want to document that limits might be further tuned according
> to other elements of the event, with VSERPORT_CHANGE "id" as the example?

You need to interpret "each kind of event" in a sufficiently fuzzy
manner :)

Seriously, I'm open to suggestions for better language here.

Re: [Qemu-devel] [PATCH v3 6/7] qdev: Protect device-list-properties against broken devices

2015-09-28 Thread Paolo Bonzini

On 28/09/2015 10:15, Andreas Färber wrote:
> Am 28.09.2015 um 10:11 schrieb Markus Armbruster:
>> Hmm, does creating and destroying a macio object leave the memory region
>> behind?
>>
>> Paolo, is calling memory_region_init() in an instance_init() method
>> okay?
>>
>> If yes, where should they be destroyed, and how?
> 
> The counterpart to .instance_init is .instance_finalize aka uninit.

memory_region_init adds the region as a child of the object, so it need
not be destroyed anywhere.

Lo and behold, it's even documented in docs/memory.txt: "Destruction of
a memory region happens automatically when the owner object dies".

Paolo

Re: [Qemu-devel] [PATCH] Makefile: fix build when VPATH is outside GIT tree

2015-09-28 Thread Paolo Bonzini

On 25/09/2015 23:24, Steve Ellcey wrote:
> On Fri, 2015-09-18 at 13:40 +0200, Paolo Bonzini wrote:
> 
> > > The problem is due to the fact that some sub directory deps
> > > were listed against SOFTMMU_SUBDIR_RULES instead of SUBDIR_RULES,
> > > so were only processed for system emulators, not user emalutors.
> > > 
> > > Signed-off-by: Daniel P. Berrange 
> > 
> > Queued, thanks.
> 
> Is being queued a status for qemu patches or does that just mean it is
> in your personal queue to handle when you have time?

It's both.  I handle patches that fall through the cracks between
maintained areas (which includes the build system, as it has no formal
maintainer).  So if it's "queued" that means that the submitter need not
care about it anymore, as the patch is in my hands now.

I tend to flush the queue pretty often, around once a week.  The patch
is now in.

Paolo

> I would like to remove my local patch for this and go back to using the
> top-of-tree sources, but it doesn't look like this change has been
> checked in yet.  It does fix my build problem.

Re: [Qemu-devel] [RFC 0/6] Throttle event VSERPORT_CHANGE separately by "id"

2015-09-28 Thread Markus Armbruster

Marc-André Lureau  writes:

> Hi
>
> On Fri, Sep 25, 2015 at 4:00 PM, Markus Armbruster  wrote:
>> VSERPORT_CHANGE is emitted when the guest opens or closes a
>> virtio-serial port.  The event's member "id" identifies the port.
>>
>> When several events arrive quickly, throttling drops all but the last
>> of them.  Because of that, a QMP client must assume that *any* port
>> may have changed state when it receives a VSERPORT_CHANGE event and
>> throttling may have happened.
>>
>> Make the event more useful by throttling it for each port separately.
>>
>> Marc-André recently posted 'monitor: throttle VSERPORT_CHANGED by
>> "id"' to do the same thing.  Why am I redoing his work?  Let me
>> explain.
>>
>> In master, event throttling works per QAPIEvent.  Throttling state is
>> kept in MonitorQAPIEventState monitor_qapi_event_state[].  Going from
>> event to throttling state is a simple array subscript.
>>
>> Marc-André's series makes the code governing throttling pluggable per
>> event.  MonitorQAPIEventState gets two new members, the governor
>> function and an opaque.  It loses the actual throttling state.  That
>> one's now in new type MonitorQAPIEventPending.
>>
>> The standard governor has the opaque point to a single
>> MonitorQAPIEventPending.
>>
>> The governor for VSERPORT_CHANGED has the opaque point to a hash table
>> mapping "id" to MonitorQAPIEventPending.
>>
>> In my review, I challenged the complexity of this solution, but
>> Marc-André thinks it's justified.  That leaves me two choices: concede
>> the point, or come up with something that's actually simpler.
>>
>> My solution doesn't make anything pluggable.  Instead, I simply
>> generalize the map from event to throttling state.  Instead of using
>> just the QAPIEvent as key for looking up state, I permit additional
>> use of event-specific data.  For VSERPORT_CHANGED, I additionally use
>> "id".  monitor_qapi_event_state[] becomes a hash table.
>>
>> No callbacks, no type-punning, less code, and fewer code paths to
>> test.
>
> I dislike the approach to put all event filters in the same hashtable.
> The simple case doesn't need it, and I kept it that way. But
> certainly, it shorten a bit the code. Since it is probably not
> performance critical, it doesn't matter, so you may prefer less code.

Yes, I do prefer simpler code.  Fewer bugs, cheaper to maintain.

The performance differences aren't entirely obvious.  Sure, going
through a hash table is slower than a simple array subscript.  But your
indirections aren't free, either.  Mispredicted indirect calls, in
particular.

> I would still argue that it's cleaner and faster the way I proposed.
> ymmv

[...]

Re: [Qemu-devel] [PATCH] e1000: use alias for default model

2015-09-28 Thread Markus Armbruster

Jason Wang  writes:

> Instead of using a new type for default model (82540em), using an
> alias for this to avoid bit duplication.

Suggest to rephrase as

Instead of duplicating the "e1000-82540em" device model as "e1000",
make the latter an alias for the former.

> Cc: Markus Armbruster 
> Signed-off-by: Jason Wang 

Reviewed-by: Markus Armbruster

Re: [Qemu-devel] [PATCH 5/7] memory: Allow replay of IOMMU mapping notifications

2015-09-28 Thread Paolo Bonzini



On 26/09/2015 08:54, David Gibson wrote:
> On Fri, Sep 25, 2015 at 02:04:14PM +0200, Paolo Bonzini wrote:
>> -BEGIN PGP SIGNED MESSAGE- Hash: SHA256
>> 
>> 
>> 
>> On 25/09/2015 13:33, David Gibson wrote:
>>> 1) Is there a case where using the no-replay functions makes 
>>> sense?
>>> 
>>> I'm not sure.  I think vfio is the only user so far, so I
>>> guess that's technically a no.  I was reluctant to change the
>>> interface and semantics just off the bat, though.
>> 
>> Considering memory_region_listener does the reply, I think it's 
>> okay.
> 
> Uh.. just to be clear, are you saying I should change this so
> there's only the replaying interface?

Maybe...  The only issue is the "granularity" argument, which is
not in memory_region_register_iommu_notifier.  That makes me wonder if
the replay and registration make sense as separate operations.

What about adding a new function memory_region_iommu_replay and
separate the two phases?

>> For solving the problem that Laurent mentioned, using int128
>> seems like the easiest solution...
> 
> Maybe.  It means I have to do all the address calculation in the
> loop with an int128, then truncate it to do the actual call.  That
> seems harder to me than the overflow check I added, but I suppose
> it's conceptually similar in some ways.

Your overflow check is also okay, I wrote this before seeing the
updated version.

Paolo

Re: [Qemu-devel] [PATCH] block: disable I/O limits at the beginning of bdrv_close()

2015-09-28 Thread Alberto Garcia

On Mon 28 Sep 2015 02:18:33 AM CEST, Fam Zheng  wrote:

>> > Can this be abused? If I have a guest running in a cloud where the
>> > cloud provider has put severe throttling limits on me, but lets me
>> > hotplug to my heart's content, couldn't I just repeatedly
>> > plug/unplug the disk to get around the throttling (every time I
>> > unplug, all writes flush at full speed, then I immediately replug
>> > to start batching up a new set of writes).  In other words,
>> > shouldn't the draining still be throttled, to prevent my abuse?
>> 
>> I didn't think about this case, and I don't know how practical this
>> is, but note that bdrv_drain() (which is already at the beginning of
>> bdrv_close()) flushes the I/O queue explicitly bypassing the limits,
>> so other cases where a user can trigger a bdrv_drain() would also be
>> vulnerable to this.
>
> Yes, the issue is pre-existing. This patch only reordered things
> inside bdrv_close() so it's no worse.
>
> But indeed there is this vulnerability, maybe we should throttle the
> queue in all cases?

I would like to see a test case with numbers that show how much you can
actually bypass the I/O limits.

Berto

Re: [Qemu-devel] ARM CPU affinities

2015-09-28 Thread Peter Maydell

On 27 September 2015 at 22:28, Peter Crosthwaite
 wrote:
> Hi Peter,
>
> I am looking at this:
>
> static void arm_cpu_initfn(Object *obj)
> {
> ...
> Aff1 = cs->cpu_index / ARM_CPUS_PER_CLUSTER;
> Aff0 = cs->cpu_index % ARM_CPUS_PER_CLUSTER;
> cpu->mp_affinity = (Aff1 << ARM_AFF1_SHIFT) | Aff0;
>
>
> Should we push this up to the machine model? I am trying to fix a
> machine where Aff1 of the one and only cluster is non-zero. The kernel
> SMP support barfs when Aff1 is mismatched to whats in the DTB (CPU reg
> property).
>
> I think this modulo 8 starting from 0:0 policy might be specific to mach-virt?

Yeah, it should be a CPU property if you need it to be something
different. I think we just left it as a hardcoded thing until
somebody needed it to be different.

NB that as the comment says KVM currently imposes its own numbering
anyway -- if you care about that you need to get the kernel to
support having userspace tell it about affinity numbering.

thanks
-- PMM

Re: [Qemu-devel] [PATCH 2/2] pc: memhp: force gaps between DIMM's GPA

2015-09-28 Thread Igor Mammedov

On Mon, 28 Sep 2015 10:09:26 +0530
Bharata B Rao  wrote:

> On Sun, Sep 27, 2015 at 04:04:06PM +0200, Igor Mammedov wrote:
> > On Sun, 27 Sep 2015 16:11:02 +0300
> > "Michael S. Tsirkin"  wrote:
> > 
> > > On Sun, Sep 27, 2015 at 03:06:24PM +0200, Igor Mammedov wrote:
> > > > On Sun, 27 Sep 2015 13:48:21 +0300
> > > > "Michael S. Tsirkin"  wrote:
> > > > 
> > > > > On Fri, Sep 25, 2015 at 03:53:12PM +0200, Igor Mammedov wrote:
> > > > > > mapping DIMMs non contiguously allows to workaround
> > > > > > virtio bug reported earlier:
> > > > > > http://lists.nongnu.org/archive/html/qemu-devel/2015-08/msg00522.html
> > > > > > in this case guest kernel doesn't allocate buffers
> > > > > > that can cross DIMM boundary keeping each buffer
> > > > > > local to a DIMM.
> > > > > > 
> > > > > > Suggested-by: Michael S. Tsirkin 
> > > > > > Signed-off-by: Igor Mammedov 
> > > > > > ---
> > > > > > benefit of this workaround is that no guest side
> > > > > > changes are required.
> > > > > 
> > > > > That's a hard requirement, I agree.
> > > > > 
> > > > > 
> > > > > > ---
> > > > > >  hw/i386/pc.c | 4 +++-
> > > > > >  hw/i386/pc_piix.c| 3 +++
> > > > > >  hw/i386/pc_q35.c | 3 +++
> > > > > >  include/hw/i386/pc.h | 2 ++
> > > > > >  4 files changed, 11 insertions(+), 1 deletion(-)
> > > > > 
> > > > > Aren't other architectures besides PC ever affected?
> > > > > Do they all allocate all of memory contigious in HVA space?
> > > > I'm not sure about other targets I've CCed interested parties.
> > > > 
> > > > > 
> > > > > Also - does the issue only affect hotplugged memory?
> > > > Potentially it affects -numa memdev=foo, but however I've
> > > > tried I wasn't able to reproduce.
> > > > We could do it as
> > > > separate workaround later if it would affect someone
> > > > and virtio is not fixed to handle split buffers by that time.
> > > > 
> > > 
> > > You can't reproduce a crash or you can't reproduce getting
> > > contigious GPA with fragmented HVA?
> > > If you can see fragmentation that's enough to assume guest crash
> > > can be triggered, even if it doesn't with Linux.
> > I'll check it.
> > 
> > > 
> > > >  
> > > > > Can't the patch be local to pc-dimm (except maybe the
> > > > > backwards compatibility thing)?
> > > > I think decision about using gaps and its size
> > > > should be done by board and not generic pc-dimm.
> > > > 
> > > 
> > > Well virtio is generic and can be used by all boards.
> > Still pc-dimm.addr is not allocation is not part of pc-dimm
> > device. it's just helper functions that happen to live in
> > the same file source file.
> > 
> > But more importantly every target might have it's own
> > notion how it partitions hotplug address space so making
> > the same gap global might break them.
> > 
> > It's safer to enable gaps per target, I think ppc guys
> > will make their own patch on top of this to taking
> > in account their target specific and compat stuff.
> 
> I have never seen this issue that you mention at
> 
> http://lists.nongnu.org/archive/html/qemu-devel/2015-08/msg00522.html
> 
> in PowerPC. I have not been able to reproduce the QEMU crash with the
> commandline suggested there.
> 
> (# ./ppc64-softmmu/qemu-system-ppc64 --enable-kvm --nographic
> -machine pseries -m 8G,slots=32,maxmem=32G -device
> virtio-blk-pci,drive=rootdisk -drive
> file=/home/bharata/F20-snap1,if=none,cache=none,id=rootdisk,format=qcow2
> -monitor telnet:localhost:1235,server,nowait -vga none
> -bios /home/bharata/slof/slof.bin -smp 16,maxcpus=32 -netdev
> tap,id=foo,ifname=tap0,script=/home/bharata/qemu-ifup -device
> virtio-net-pci,id=n1,netdev=foo `for i in $(seq 0 15); do echo -n
> "-object memory-backend-ram,id=m$i,size=256M -device
> pc-dimm,id=dimm$i,memdev=m$i "; done` -snapshot)
> 
> PowerPC sPAPR memory hotplug enforces memory alignment of 256MB
> for both boottime as well as hotplugged memory.
> 
> So not sure if anything other than the default gap=0 which you have
> done in this patchset for PowerPC is necessary.
The bigger initial memory and dimm sizes the less likelihood of
triggering the bug. You don't see it mostly due to luck, but it doesn't
rule out possibility of it happening in production.
So please consider turning on gaps for ppc machine.

Looking at how hotplug_mem_size is sized in hw/ppc/spapr.c it doesn't
look that just turning on gaps would work since it doesn't have a space
for alignment adjustment.

try to plug dimm device in following order:
  -m 8G,slots=2,maxmem=1256M \

  -object memory-backend-ram,id=m1,size=256M -device pc-dimm,memdev=m1 \

  -object memory-backend-file,id=hugepage1g,size=1G,file=/path/to1Gb/hugepagefs 
\
  -device pc-dimm,memdev=hugepage1g

it should fail when adding second dimm since alignment for 1Gb huge page
would be 1Gb but hotplug_mem container size is only 1256M total.

in PC target we apply following adjustment to container size:
/* size hotplug region assuming 1G page max alignment per slot */
hotplug

Re: [Qemu-devel] [PATCH v5 07/46] qapi: Don't pass pre-existing error to later call

2015-09-28 Thread Markus Armbruster

Eric Blake  writes:

> On 09/24/2015 10:14 AM, Eric Blake wrote:
>

 visit_start_implicit_struct(m, (void **)obj, sizeof(FOO), &err);
 if (!err) {
 visit_type_FOO_fields(m, obj, &err);
 visit_end_implicit_struct(m, err ? NULL : &err);
 }
 error_propagate(errp, err);
>>>
>>> Hm... not sure we do this anywhere else, yet.  The ternary isn't
>>> exactly pretty, but the intent to ignore additional error is clear
>>> enough, I think.
>>>
>>> If we elect to adopt this new error handling pattern, we should perhaps
>>> document it in error.h.
>>>
>>> Third option: drop visit_end_implicit_struct()'s errp parameter.  If we
>>> find a compelling use for it, we'll put it back and solve the problem.
>>>
>> 
>> Ooh, interesting idea.  It changes the contract - but since the contract
>> isn't (yet) documented, and happens to work with existing uses without a
>> contract, it could indeed be nicer.  It would have knock-on effects to
>> 24/46 where I first try documenting the contract.
>
> Oh well.  We do have a compelling use: qmp-input-visitor.c can set errp
> during visit_end_struct() when in strict mode (basically, the mode which
> warns if the input QDict has leftover key:value pairs that were not
> consumed by visit_type_FOO() between the start/end call).  I don't know
> if visit_end_list() or visit_end_implicit_struct() care; but then we
> have the argument that it is worth keeping them consistent with
> visit_end_struct() which can indeed raise an error.
>
> One other potential alternative:  What if we split visit_end_struct()
> into two visitor functions, one that checks for success, and the other
> that is called unconditionally to clean up resources.  That is, go from:
>
> visit_start_struct(m, (void **)obj, "foo", name, sizeof(FOO), &err);
> if (!err) {
> if (*obj) {
> visit_type_FOO_fields(m, obj, errp);
> }
> visit_end_struct(m, &err);
> }
> error_propagate(errp, err);
>
> to a form where the check for leftover key/value pairs is only done on
> success with a new visit_check_struct():
>
> visit_start_struct(m, (void **)obj, "foo", name, sizeof(FOO), &err);
> if (!err) {
> if (*obj) {
> visit_type_FOO_fields(m, obj, &err);
> }
> if (!err) {
> visit_check_struct(m, &err);
> }
> visit_end_struct(m);
> }
> error_propagate(errp, err);

I think this split could help with writing safe code: in
visit_check_struct() you can rely on "no error so far", as usual.  In
visit_end_struct(), you can't, but it should be a pure cleanup function,
where that's quite normal.

Looks like we're getting drawn into visitor contract territory again.

Re: [Qemu-devel] [PATCH v3 6/7] qdev: Protect device-list-properties against broken devices

2015-09-28 Thread Thomas Huth

On 28/09/15 10:11, Markus Armbruster wrote:
> Thomas Huth  writes:
> 
>> On 25/09/15 16:17, Markus Armbruster wrote:
>>> Thomas Huth  writes:
>>>
 On 24/09/15 20:57, Markus Armbruster wrote:
> Several devices don't survive object_unref(object_new(T)): they crash
> or hang during cleanup, or they leave dangling pointers behind.
>
> This breaks at least device-list-properties, because
> qmp_device_list_properties() needs to create a device to find its
> properties.  Broken in commit f4eb32b "qmp: show QOM properties in
> device-list-properties", v2.1.  Example reproducer:
>
> $ qemu-system-aarch64 -nodefaults -display none -machine none
> -S -qmp stdio
> {"QMP": {"version": {"qemu": {"micro": 50, "minor": 4,
> "major": 2}, "package": ""}, "capabilities": []}}
> { "execute": "qmp_capabilities" }
> {"return": {}}
> { "execute": "device-list-properties", "arguments": {
> "typename": "pxa2xx-pcmcia" } }
> qemu-system-aarch64: /home/armbru/work/qemu/memory.c:1307:
> memory_region_finalize: Assertion `((&mr->subregions)->tqh_first
> == ((void *)0))' failed.
> Aborted (core dumped)
> [Exit 134 (SIGABRT)]
>
> Unfortunately, I can't fix the problems in these devices right now.
> Instead, add DeviceClass member cannot_even_create_with_object_new_yet
> to mark them:
>> ...
>  static void pxa2xx_pcmcia_register_types(void)
> diff --git a/hw/ppc/spapr_rng.c b/hw/ppc/spapr_rng.c
> index ed43d5e..e1b115d 100644
> --- a/hw/ppc/spapr_rng.c
> +++ b/hw/ppc/spapr_rng.c
> @@ -169,6 +169,11 @@ static void spapr_rng_class_init(ObjectClass *oc, 
> void *data)
>  dc->realize = spapr_rng_realize;
>  set_bit(DEVICE_CATEGORY_MISC, dc->categories);
>  dc->props = spapr_rng_properties;
> +
> +/*
> + * Reason: crashes device-introspect-test for unknown reason.
> + */
> +dc->cannot_even_create_with_object_new_yet = true;
>  }

 Please don't do that! That breaks the help output from
 "-device spapr-rng,?" which should help the user to see how to use this
 device!
>>>
>>> Well, device-introspection-test makes qemu crash, with the backtrace
>>> pointing squarely to this device.  Stands to reason that device
>>> introspection could crash in normal usage, too.  Until the crash is
>>> debugged, we better disable introspection of this device.
>>>
>>> I quite agree that disabling introspection hurts users.  Just not as
>>> much as crashes :)
>>>
 I tried to debug why this device breaks the test, but the test
 environment is giving me a hard time ... how do you best hook a gdb into
 that framework, so you can trace such problems?
 Anyway, with some trial and error, I found out that it seems like the

   object_resolve_path_type("", TYPE_SPAPR_RNG, NULL)

 in spapr_rng_instance_init() is causing the problems. Could it be that
 object_resolve_path_type is not working with the test environment?
>>>
>>> I tried to figure out why this device breaks under this test, but
>>> couldn't, so I posted with the "for unknown reason" comment.
>>
>> I've debugged this now for a while (thanks for the tip with
>> MALLOC_PERTURB, by the way!) and it seems to me that the problem is in
>> the macio object than in spapr-rng - the latter is just the victim of
>> some memory corruption caused by the first one: The
>> object_resolve_path_type() crashes while trying to go through the macio
>> object.
>>
>> So could you please add the "dc->cannot_even_create_with_object_new_yet
>> = true;" to macio_class_init() instead? ... that seems to fix the crash
>> for me, too, and is likely the better place.
> 
> Hmm.
> 
> For most of the devices my patch marks, we have a pretty good idea on
> what's wrong with them.  spapr-rng is among the exceptions.  You believe
> it's actually "the macio object".  Which one?  "macio" is abstract...
> 
> You report introspecting "spapr-rng" crashes "while trying to go through
> the macio object".  I wonder how omitting introspection of macio objects
> (that's what marking them does to this test) could affect the object
> we're going through when we crash.

I have to correct myself: It's not going through the macio object, the
problem is actually the "macio[0]" property that is created during
memory_region_init() with object_property_add_child() ... the property
points to a free()d object when the crash happens.

>> Or maybe we could get this also fixed? The problem could be the
>> memory_region_init(&s->bar, NULL, "macio", 0x8) in
>> macio_instance_init() ... is this ok here? Or does this rather have to
>> go to the realize() function instead?
> 
> Hmm, does creating and destroying a macio object leave the memory region
> behind?
> 
> Paolo, is calling memory_region_init() in an instance_init() method
> okay?

As Paolo mentioned, we likely need to pass an "owner" to
mem

Re: [Qemu-devel] [PATCH 2/2] pc: memhp: force gaps between DIMM's GPA

2015-09-28 Thread Igor Mammedov

On Sun, 27 Sep 2015 17:18:48 +0300
"Michael S. Tsirkin"  wrote:

> On Sun, Sep 27, 2015 at 04:04:06PM +0200, Igor Mammedov wrote:
> > On Sun, 27 Sep 2015 16:11:02 +0300
> > "Michael S. Tsirkin"  wrote:
> > 
> > > On Sun, Sep 27, 2015 at 03:06:24PM +0200, Igor Mammedov wrote:
> > > > On Sun, 27 Sep 2015 13:48:21 +0300
> > > > "Michael S. Tsirkin"  wrote:
> > > > 
> > > > > On Fri, Sep 25, 2015 at 03:53:12PM +0200, Igor Mammedov wrote:
> > > > > > mapping DIMMs non contiguously allows to workaround
> > > > > > virtio bug reported earlier:
> > > > > > http://lists.nongnu.org/archive/html/qemu-devel/2015-08/msg00522.html
> > > > > > in this case guest kernel doesn't allocate buffers
> > > > > > that can cross DIMM boundary keeping each buffer
> > > > > > local to a DIMM.
> > > > > > 
> > > > > > Suggested-by: Michael S. Tsirkin 
> > > > > > Signed-off-by: Igor Mammedov 
> > > > > > ---
> > > > > > benefit of this workaround is that no guest side
> > > > > > changes are required.
> > > > > 
> > > > > That's a hard requirement, I agree.
> > > > > 
> > > > > 
> > > > > > ---
> > > > > >  hw/i386/pc.c | 4 +++-
> > > > > >  hw/i386/pc_piix.c| 3 +++
> > > > > >  hw/i386/pc_q35.c | 3 +++
> > > > > >  include/hw/i386/pc.h | 2 ++
> > > > > >  4 files changed, 11 insertions(+), 1 deletion(-)
> > > > > 
> > > > > Aren't other architectures besides PC ever affected?
> > > > > Do they all allocate all of memory contigious in HVA space?
> > > > I'm not sure about other targets I've CCed interested parties.
> > > > 
> > > > > 
> > > > > Also - does the issue only affect hotplugged memory?
> > > > Potentially it affects -numa memdev=foo, but however I've
> > > > tried I wasn't able to reproduce.
> > > > We could do it as
> > > > separate workaround later if it would affect someone
> > > > and virtio is not fixed to handle split buffers by that time.
> > > > 
> > > 
> > > You can't reproduce a crash or you can't reproduce getting
> > > contigious GPA with fragmented HVA?
> > > If you can see fragmentation that's enough to assume guest crash
> > > can be triggered, even if it doesn't with Linux.
> > I'll check it.
> > 
> > > 
> > > >  
> > > > > Can't the patch be local to pc-dimm (except maybe the
> > > > > backwards compatibility thing)?
> > > > I think decision about using gaps and its size
> > > > should be done by board and not generic pc-dimm.
> > > > 
> > > 
> > > Well virtio is generic and can be used by all boards.
> > Still pc-dimm.addr is not allocation is not part of pc-dimm
> > device. it's just helper functions that happen to live in
> > the same file source file.
> > 
> > But more importantly every target might have it's own
> > notion how it partitions hotplug address space so making
> > the same gap global might break them.
> 
> That's why each target has it's own alignment, no?
yep, it's target specific how and where it would map hotplugged
memory.

> 
> > It's safer to enable gaps per target, I think ppc guys
> > will make their own patch on top of this to taking
> > in account their target specific and compat stuff.
> 
> Sure, it can be split up but we need to address at least
> the kvm platforms.
I've put Bharata in the loop, so that if issue is applicable to PPC it
would be fixed.

> > > 
> > > 
> > > > > > diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> > > > > > index 91d134c..c462c4e 100644
> > > > > > --- a/hw/i386/pc.c
> > > > > > +++ b/hw/i386/pc.c
> > > > > > @@ -1629,6 +1629,7 @@ static void
> > > > > > pc_dimm_plug(HotplugHandler *hotplug_dev,
> > > > > > HotplugHandlerClass *hhc; Error *local_err = NULL;
> > > > > >  PCMachineState *pcms = PC_MACHINE(hotplug_dev);
> > > > > > +PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
> > > > > >  PCDIMMDevice *dimm = PC_DIMM(dev);
> > > > > >  PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
> > > > > >  MemoryRegion *mr = ddc->get_memory_region(dimm);
> > > > > > @@ -1644,7 +1645,8 @@ static void
> > > > > > pc_dimm_plug(HotplugHandler *hotplug_dev, goto out;
> > > > > >  }
> > > > > >  
> > > > > > -pc_dimm_memory_plug(dev, &pcms->hotplug_memory, mr,
> > > > > > align, 0, &local_err);
> > > > > > +pc_dimm_memory_plug(dev, &pcms->hotplug_memory, mr,
> > > > > > align,
> > > > > > +pcmc->inter_dimm_gap, &local_err);
> > > > > >  if (local_err) {
> > > > > >  goto out;
> > > > > >  }
> > > > > > diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
> > > > > > index 3ffb05f..3165667 100644
> > > > > > --- a/hw/i386/pc_piix.c
> > > > > > +++ b/hw/i386/pc_piix.c
> > > > > > @@ -457,11 +457,13 @@ static void
> > > > > > pc_xen_hvm_init(MachineState *machine) 
> > > > > >  static void pc_i440fx_machine_options(MachineClass *m)
> > > > > >  {
> > > > > > +PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
> > > > > >  m->family = "pc_piix";
> > > > > >  m->desc = "Standard PC (i440FX + PIIX, 1996)";
> > > > > >  m->hot_add_cpu = pc_hot_add_cpu;
> > > > > >  m->defaul

[Qemu-devel] Agenda items for today's MTTCG call

2015-09-28 Thread Alex Bennée

User-agent: mu4e 0.9.12; emacs 24.5.50.4
Hi,

It's been a while since we last synced up via phone call so could I add
the following items for discussion:

1. Status of Alvise and Fred's trees
2. Any cross-pollination ideas between Alvise's LL/SC work and Emilio's
series

Emilio's series has a number of interesting features to it, not least
that it works with softmmu and linux-user. However I do prefer the TCG
op based approach of Alvise's tree. Is there any scope for getting the
best of both worlds?

3. Memory barriers

I now have working barrier tests which fail on real hardware when
barriers don't exist. I'll be sending a new series soon. Again Emilio's
tree introduces some barrier TCG ops.

4. Other outstanding work

I've had a couple of people prod me with offers of help so it would be
nice to see what other tasks are currently pending to avoid too much
duplication of work.

Does anyone else have anything then want to bring up?

-- 
Alex Bennée

Re: [Qemu-devel] [PATCH v5 07/46] qapi: Don't pass pre-existing error to later call

2015-09-28 Thread Markus Armbruster

Eric Blake  writes:

> On 09/24/2015 08:58 AM, Markus Armbruster wrote:
>> Eric Blake  writes:
>> 
>>> Due to the existing semantics of the error_set() family,
>>> generated sequences in the qapi visitors such as:
>>>
>>> visit_start_implicit_struct(m, (void **)obj, sizeof(FOO), &err);
>>> if (!err) {
>>> visit_type_FOO_fields(m, obj, errp);
>>> visit_end_implicit_struct(m, &err);
>>> }
>>> error_propagate(errp, err);
>> 
>> Indentation seems off.  Intentional?
>> 
>>>
>>> are risky: if visit_type_FOO_fields() sets errp, and then
>>> visit_end_implicit_struct() also encounters an error, the
>>> attempt to overwrite the first error will cause an abort().
>
> I didn't even read error_propagate()'s contract correctly. It
> specifically specifies that if errp is already set, then err is ignored.

Yes.  Differs from error_set() & friends, where the destination must not
contain an error.  The inconsistency is a bit ugly.  Perhaps it adds
enough convenience to make it worthwhile.  Anyway, changing it now would
be a huge bother.

Note that GLib's g_propagate_error() requires the destination not to
contain an error.

> So the above sequence is actually just fine, because only the following
> paths exist:
>
> visit_start_implicit_struct() fails into &err,
> error_propagate() passes err into caller's errp
>
> visit_start_implicit_struct() succeeds,
> visit_type_FOO_fields() fails into caller's errp,
> visit_end_implicit_struct() succeeds,
> error_propagate() does nothing
>
> visit_start_implicit_struct() succeeds,
> visit_type_FOO_fields() fails into caller's errp,
> visit_end_implicit_struct() fails int &err,
> error_propagate() does nothing (errp trumps err)

Yes, but visit_end_implicit_struct() gets called with an errp argument
that may already contain an error, and that's unusual.  Prominent notice
in the contract required.

> visit_start_implicit_struct() succeeds,
> visit_type_FOO_fields() succeeds,
> visit_end_implicit_struct() fails int &err,
> error_propagate() passes err into caller's errp
>
> visit_start_implicit_struct() succeeds,
> visit_type_FOO_fields() succeeds,
> visit_end_implicit_struct() succeeds,
> error_propagate() does nothing
>
>
> As such, I'm revisiting if anything is needed at all, other than making
> the various visit_start/visit_end patterns consistent with one another
> using existing idioms, and it may turn out we don't need the ternary
> after all.

Re: [Qemu-devel] [PATCH v5 05/46] qapi: Test use of 'number' within alternates

2015-09-28 Thread Markus Armbruster

Eric Blake  writes:

> On 09/24/2015 10:29 AM, Markus Armbruster wrote:
>
> +
> +/* FIXME: Order of alternate should not affect semantics */

 Inhowfar does it affect semantics?  Or asked differently: what exactly
 is wrong with this test now?

> +v = visitor_input_test_init(data, "42");
> +visit_type_AltThree(v, &three, NULL, &error_abort);
> +g_assert_cmpint(three->kind, ==, ALT_THREE_KIND_N);
> +g_assert_cmpfloat(three->n, ==, 42);
> +qapi_free_AltThree(three);
> +one = NULL;
>>>
>>>
>>> AltTwo and AltThree are ostensibly the same struct (two branches, one
>>> for 'str' and one for 'number', just in a different order), but they
>>> parsed differently (AltTwo failed, AltThree succeeded).  The bug is
>>> fixed later when the order of the branch declaration no longer impacts
>>> the result of the parse.
>> 
>> Then nothing is wrong with this test case, and the FIXME doesn't belong
>> here.
>
> Actually, the test for AltThree succeeds only by accident. There are two
> bugs at play; when I fix the first bug (order shouldn't matter: AltTwo
> and AltThree should parse identically), then the second bug is finally
> exposed (integers aren't being parsed as numbers, in either AltTwo or
> AltThree). But I can certainly rework the FIXMEs both here and on the
> first fix (19/46) to make it more obvious what the second fix (20/46) is
> good for.

Yes, please.  I find the FIXME quoted above confusing, because it makes
me look for problems exposed by this test when there are none.

Re: [Qemu-devel] [PATCH v3 6/7] qdev: Protect device-list-properties against broken devices

2015-09-28 Thread Peter Maydell

On 28 September 2015 at 10:17, Thomas Huth  wrote:
> By the way, there are some more spots like this in the code, e.g. in
> pxa2xx_fir_instance_init() in hw/arm/pxa2xx.c ...

That's an oversight from when it was converted to qdev, I think,
and could be fixed.

-- PMM

Re: [Qemu-devel] [PATCH v2 00/11] aio: Introduce handler type to fix nested aio_poll for dataplane

2015-09-28 Thread Stefan Hajnoczi

On Fri, Sep 11, 2015 at 02:22:14PM +0200, Kevin Wolf wrote:
> Am 11.09.2015 um 13:46 hat Fam Zheng geschrieben:
> > On Fri, 09/11 12:39, Kevin Wolf wrote:
> > > Am 29.07.2015 um 06:42 hat Fam Zheng geschrieben:
> > > > v2: Switch to disable/enable model. [Paolo]
> > > > 
> > > > Most existing nested aio_poll()'s in block layer are inconsiderate of
> > > > dispatching potential new r/w requests from ioeventfds and nbd exports, 
> > > > which
> > > > might result in responsiveness issues (e.g. bdrv_drain_all will not 
> > > > return when
> > > > new requests keep coming), or even wrong semantics (e.g. 
> > > > qmp_transaction cannot
> > > > enforce atomicity due to aio_poll in bdrv_drain_all).
> > > > 
> > > > Previous attampts to address this issue include new op blocker[1], 
> > > > bdrv_lock[2]
> > > > and nested AioContext (patches not posted to qemu-devel).
> > > > 
> > > > This approach is based on the idea proposed by Paolo Bonzini. The 
> > > > original idea
> > > > is introducing "aio_context_disable_client / aio_context_enable_client 
> > > > to
> > > > filter AioContext handlers according to the "client", e.g.
> > > > AIO_CLIENT_DATAPLANE (ioeventfd), AIO_CLIENT_PROTOCOL, 
> > > > AIO_CLIENT_NBD_SERVER,
> > > > AIO_CLIENT_CONTEXT, ... Extend aio_set_{event_notifier,fd}_handler to 
> > > > pass a
> > > > client (type)." 
> > > > 
> > > > After this series, block layer aio_poll() will only process those 
> > > > "protocol"
> > > > fds that are used in block I/O, plus the ctx->notifier for 
> > > > aio_notify();  other
> > > > aio_poll()'s keep unchanged.
> > > > 
> > > > The biggest advantage over approaches [1] and [2] is, no change is 
> > > > needed in
> > > > virtio-{blk,scsi}-dataplane code, also this doesn't depend on 
> > > > converting QMP to
> > > > coroutines.
> > > 
> > > It seems that I haven't replied on the mailing list yet, even though I
> > > think I already said this in person at KVM Forum: This series fixes only
> > > a special case of the real problem, which is that bdrv_drain/all at a
> > > single point doesn't make a lot of sense, but needs to be replaced by a
> > > whole section with exclusive access, like a bdrv_drained_begin/end pair.
> > > 
> > > To be clear: Anything that works with types of users instead of
> > > individual users is bound to fall short of being a complete solution. I
> > > don't prefer partial solutions when we know there is a bigger problem.
> > > 
> > > This series addresses your immediate need of protecting against new data
> > > plane requests, which it arguably achieves. The second case I always
> > > have in mind is Berto's case where he has multiple streaming block jobs
> > > in the same backing file chain [1].
> > > 
> > > This involves a bdrv_reopen() of the target BDS to make it writable, and
> > > bdrv_reopen() uses bdrv_drain_all() so drivers don't have to cope with
> > > running requests while reopening themselves. It can however involve
> > > nested event loops for synchronous operations like bdrv_flush(), and if
> > > those can process completions of block jobs, which can respond by doing
> > > anything to the respective node, things can go wrong.
> > 
> > Just to get a better idea of bdrv_drained_begin/end, could you explain how 
> > to
> > use the pair to fix the above problem?
> 
> How to use it is easy part: In bdrv_reopen_multiple(), you would replace
> the existing bdrv_drain_all() with begin and you would add the
> corresponding end right before the return statement.
> 
> > > You don't solve this by adding client types (then problematic request
> > > would be PROTOCOL in your proposal and you can never exclude that), but
> > > you really need to have bdrv_drained_being/end pairs, where only
> > > requests issued in between are processed and everything else waits.
> > 
> > What do you mean by "only requests issued in between are processed"? Where 
> > are
> > the requests from?
> 
> Generally speaking, you would have code that looks like this:
> 
> bdrv_drain_begin()
> ...
> bdrv_something_synchronous()
> ...
> bdrv_drain_end()
> 
> You want to process everything that is necessary for completing
> bdrv_something_synchronous(), but nothing else.
> 
> The trickier question is how to implement this. I know that it's much
> easier to say that your series doesn't work than actually proposing
> something else that works...
> 
> One relatively obvious answer we found when we discussed this a while
> back was some kind of a recursive CoRwLock (reader = in-flight request;
> writer = drained section), but that requires obviously that you're
> running in a coroutine if you want to do something with a drained
> request queue.

At one point I thought about converting vl.c:main() to run everything in
a coroutine.  It would allow us to eliminate synchronous operations
completely.

The problem is that it's quite a big change that requires protecting
formerly synchronous operations against concurrency.

Stefan

Re: [Qemu-devel] [PATCH v5 45/46] net: Complete qapi-fication of netdev_add

2015-09-28 Thread Markus Armbruster

Paolo Bonzini  writes:

> On 23/09/2015 18:37, Eric Blake wrote:
 
 Is this still type-unsafe like the old netdev_add (e.g. accepts
 a string for an integer)?
>> I did not address that yet; it still needs further patches to
>> accept an integer as a port number. I can investigate what
>> additional patches are needed along those lines, while still
>> preserving back-compat to previously-accepted QMP command lines (it
>> may require the use of an 'alternate' type that accepts both int
>> and string).  The main goal here was that the command line is
>> unchanged, and that now the QMP command is introspectable, even if
>> what introspection shows is ugly types.
>
> This in fact is a laudable goal.  However, while changing the command to
>
> { 'command': 'netdev_add', 'data': 'Netdev', 'box': true, 'gen': false }
>
> for better introspection, you should keep 'gen':'false' and the manual
> implementation based on qemu_opts_from_qdict, otherwise you break
> backwards-compatibility.

Non sequitur :)

We need to stay sufficiently backwards compatible somehow.  'gen' false
is one possible solution.  Liberal use of alternate types could be
another.  A special input visitor mode could be a third.

Since my review cursor is >30 patches behind this one, I don't have an
opinion, yet.

Re: [Qemu-devel] [PATCH 01/18] iothread: release iothread around aio_poll

2015-09-28 Thread Stefan Hajnoczi

On Thu, Aug 06, 2015 at 03:35:59PM +0200, Paolo Bonzini wrote:
> This is the first step towards having fine-grained critical sections in
> dataplane threads, which resolves lock ordering problems between
> address_space_* functions (which need the BQL when doing MMIO, even
> after we complete RCU-based dispatch) and the AioContext.
> 
> Because AioContext does not use contention callbacks anymore, the
> unit test has to be changed.
> 
> Previously applied as a0710f7995f914e3044e5899bd8ff6c43c62f916 and
> then reverted.

commit da5e1de95bb235330d7724316e7a29239d1359d5
Author: Stefan Hajnoczi 
Date:   Wed Jun 3 10:15:33 2015 +0100

Revert "iothread: release iothread around aio_poll"

This reverts commit a0710f7995f914e3044e5899bd8ff6c43c62f916.

In qemu-devel email message <556dbf87.2020...@de.ibm.com>, Christian
Borntraeger writes:

  Having many guests all with a kernel/ramdisk (via -kernel) and
  several null block devices will result in hangs. All hanging
  guests are in partition detection code waiting for an I/O to return
  so very early maybe even the first I/O.

  Reverting that commit "fixes" the hangs.

Reverting this commit for the 2.4 release.  More time is needed to
investigate and correct this patch.

Did we ever find the root cause for hangs caused by this patch?

[Qemu-devel] [PATCH v2 1/2] memhp: extend address auto assignment to support gaps

2015-09-28 Thread Igor Mammedov

setting gap to non 0 value will make sparse DIMM
address auto allocation, leaving gaps between
a new DIMM address and preceeding existing DIMM.

Signed-off-by: Igor Mammedov 
---
 hw/i386/pc.c |  2 +-
 hw/mem/pc-dimm.c | 13 +++--
 hw/ppc/spapr.c   |  2 +-
 include/hw/mem/pc-dimm.h |  7 ---
 4 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 461c128..91d134c 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1644,7 +1644,7 @@ static void pc_dimm_plug(HotplugHandler *hotplug_dev,
 goto out;
 }
 
-pc_dimm_memory_plug(dev, &pcms->hotplug_memory, mr, align, &local_err);
+pc_dimm_memory_plug(dev, &pcms->hotplug_memory, mr, align, 0, &local_err);
 if (local_err) {
 goto out;
 }
diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
index bb04862..1e5a8ea 100644
--- a/hw/mem/pc-dimm.c
+++ b/hw/mem/pc-dimm.c
@@ -32,7 +32,8 @@ typedef struct pc_dimms_capacity {
 } pc_dimms_capacity;
 
 void pc_dimm_memory_plug(DeviceState *dev, MemoryHotplugState *hpms,
- MemoryRegion *mr, uint64_t align, Error **errp)
+ MemoryRegion *mr, uint64_t align, uint64_t gap,
+ Error **errp)
 {
 int slot;
 MachineState *machine = MACHINE(qdev_get_machine());
@@ -48,7 +49,7 @@ void pc_dimm_memory_plug(DeviceState *dev, MemoryHotplugState 
*hpms,
 
 addr = pc_dimm_get_free_addr(hpms->base,
  memory_region_size(&hpms->mr),
- !addr ? NULL : &addr, align,
+ !addr ? NULL : &addr, align, gap,
  memory_region_size(mr), &local_err);
 if (local_err) {
 goto out;
@@ -287,8 +288,8 @@ static int pc_dimm_built_list(Object *obj, void *opaque)
 
 uint64_t pc_dimm_get_free_addr(uint64_t address_space_start,
uint64_t address_space_size,
-   uint64_t *hint, uint64_t align, uint64_t size,
-   Error **errp)
+   uint64_t *hint, uint64_t align, uint64_t gap,
+   uint64_t size, Error **errp)
 {
 GSList *list = NULL, *item;
 uint64_t new_addr, ret = 0;
@@ -333,13 +334,13 @@ uint64_t pc_dimm_get_free_addr(uint64_t 
address_space_start,
 goto out;
 }
 
-if (ranges_overlap(dimm->addr, dimm_size, new_addr, size)) {
+if (ranges_overlap(dimm->addr, dimm_size, new_addr, size + gap)) {
 if (hint) {
 DeviceState *d = DEVICE(dimm);
 error_setg(errp, "address range conflicts with '%s'", d->id);
 goto out;
 }
-new_addr = QEMU_ALIGN_UP(dimm->addr + dimm_size, align);
+new_addr = QEMU_ALIGN_UP(dimm->addr + dimm_size + gap, align);
 }
 }
 ret = new_addr;
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index a9b5f2a..a40ada5 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2096,7 +2096,7 @@ static void spapr_memory_plug(HotplugHandler 
*hotplug_dev, DeviceState *dev,
 goto out;
 }
 
-pc_dimm_memory_plug(dev, &ms->hotplug_memory, mr, align, &local_err);
+pc_dimm_memory_plug(dev, &ms->hotplug_memory, mr, align, 0, &local_err);
 if (local_err) {
 goto out;
 }
diff --git a/include/hw/mem/pc-dimm.h b/include/hw/mem/pc-dimm.h
index d83bf30..aa784f8 100644
--- a/include/hw/mem/pc-dimm.h
+++ b/include/hw/mem/pc-dimm.h
@@ -83,15 +83,16 @@ typedef struct MemoryHotplugState {
 
 uint64_t pc_dimm_get_free_addr(uint64_t address_space_start,
uint64_t address_space_size,
-   uint64_t *hint, uint64_t align, uint64_t size,
-   Error **errp);
+   uint64_t *hint, uint64_t align, uint64_t gap,
+   uint64_t size, Error **errp);
 
 int pc_dimm_get_free_slot(const int *hint, int max_slots, Error **errp);
 
 int qmp_pc_dimm_device_list(Object *obj, void *opaque);
 uint64_t pc_existing_dimms_capacity(Error **errp);
 void pc_dimm_memory_plug(DeviceState *dev, MemoryHotplugState *hpms,
- MemoryRegion *mr, uint64_t align, Error **errp);
+ MemoryRegion *mr, uint64_t align, uint64_t gap,
+ Error **errp);
 void pc_dimm_memory_unplug(DeviceState *dev, MemoryHotplugState *hpms,
MemoryRegion *mr);
 #endif
-- 
1.8.3.1

[Qemu-devel] [PATCH v2 0/2] pc: memhp: enforce gaps between DIMMs

2015-09-28 Thread Igor Mammedov

v2:
   make inter_dimm_gap a boolean and inster gap in 1 byte
   instead of 2Mb, due to alignment that gap would be extended
   up to natural backend alignment value.

it's a simplier way suggested by Michael S. Tsirkin
to workaround virtio bug reported earlier:
http://lists.nongnu.org/archive/html/qemu-devel/2015-08/msg00522.html
where virtio can't handle buffer that crosses border
between 2 DIMM's (i.e. 2 MemoryRegions).

idea is to leave gaps between DIMMs, making their GPAs
non contiguous, which effectively forces kmalloc
to not use DIMM if buffer doesn't fit inside of it.

Simpler reproducer:
 qemu-system-x86_64 -enable-kvm  -m 256,slots=250,maxmem=32G  -drive 
if=virtio,file=/dev/slow/rhel72  \
  `for i in $(seq 0 58); do echo -n "-object memory-backend-ram,id=m$i,size=10M 
-device pc-dimm,id=dimm$i,memdev=m$i "; done` \
   -nodefaults -snapshot -serial stdio -nographic -M pc-i440fx-2.4

if guest manages to boot just login and do:
  dd if=/dev/vda of=/dev/null bs=128M

it will crash QEMU in 99% cases with following message:
  qemu-system-x86_64: virtio: error trying to map MMIO memory

Igor Mammedov (2):
  memhp: extend address auto assignment to support gaps
  pc: memhp: force gaps between DIMM's GPA

 hw/i386/pc.c |  4 +++-
 hw/i386/pc_piix.c|  3 +++
 hw/i386/pc_q35.c |  3 +++
 hw/mem/pc-dimm.c | 13 +++--
 hw/ppc/spapr.c   |  2 +-
 include/hw/i386/pc.h |  1 +
 include/hw/mem/pc-dimm.h |  7 ---
 7 files changed, 22 insertions(+), 11 deletions(-)

-- 
1.8.3.1

[Qemu-devel] [PATCH v2 2/2] pc: memhp: force gaps between DIMM's GPA

2015-09-28 Thread Igor Mammedov

mapping DIMMs non contiguously allows to workaround
virtio bug reported earlier:
http://lists.nongnu.org/archive/html/qemu-devel/2015-08/msg00522.html
in this case guest kernel doesn't allocate buffers
that can cross DIMM boundary keeping each buffer
local to a DIMM.

Suggested-by: Michael S. Tsirkin 
Signed-off-by: Igor Mammedov 
---
v2:
  make inter_dimm_gap boolean

benefit of this workaround is that no guest side
changes are required.
---
 hw/i386/pc.c | 4 +++-
 hw/i386/pc_piix.c| 3 +++
 hw/i386/pc_q35.c | 3 +++
 include/hw/i386/pc.h | 1 +
 4 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 91d134c..fc97463 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1629,6 +1629,7 @@ static void pc_dimm_plug(HotplugHandler *hotplug_dev,
 HotplugHandlerClass *hhc;
 Error *local_err = NULL;
 PCMachineState *pcms = PC_MACHINE(hotplug_dev);
+PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
 PCDIMMDevice *dimm = PC_DIMM(dev);
 PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
 MemoryRegion *mr = ddc->get_memory_region(dimm);
@@ -1644,7 +1645,8 @@ static void pc_dimm_plug(HotplugHandler *hotplug_dev,
 goto out;
 }
 
-pc_dimm_memory_plug(dev, &pcms->hotplug_memory, mr, align, 0, &local_err);
+pc_dimm_memory_plug(dev, &pcms->hotplug_memory, mr, align,
+pcmc->inter_dimm_gap ? 1 : 0, &local_err);
 if (local_err) {
 goto out;
 }
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 3ffb05f..ece735d 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -457,11 +457,13 @@ static void pc_xen_hvm_init(MachineState *machine)
 
 static void pc_i440fx_machine_options(MachineClass *m)
 {
+PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
 m->family = "pc_piix";
 m->desc = "Standard PC (i440FX + PIIX, 1996)";
 m->hot_add_cpu = pc_hot_add_cpu;
 m->default_machine_opts = "firmware=bios-256k.bin";
 m->default_display = "std";
+pcmc->inter_dimm_gap = true;
 }
 
 static void pc_i440fx_2_5_machine_options(MachineClass *m)
@@ -482,6 +484,7 @@ static void pc_i440fx_2_4_machine_options(MachineClass *m)
 m->alias = NULL;
 m->is_default = 0;
 pcmc->broken_reserved_end = true;
+pcmc->inter_dimm_gap = false;
 SET_MACHINE_COMPAT(m, PC_COMPAT_2_4);
 }
 
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index 1b7d3b6..ca73f25 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -360,6 +360,7 @@ static void pc_compat_1_4(MachineState *machine)
 
 static void pc_q35_machine_options(MachineClass *m)
 {
+PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
 m->family = "pc_q35";
 m->desc = "Standard PC (Q35 + ICH9, 2009)";
 m->hot_add_cpu = pc_hot_add_cpu;
@@ -368,6 +369,7 @@ static void pc_q35_machine_options(MachineClass *m)
 m->default_display = "std";
 m->no_floppy = 1;
 m->no_tco = 0;
+pcmc->inter_dimm_gap = true;
 }
 
 static void pc_q35_2_5_machine_options(MachineClass *m)
@@ -385,6 +387,7 @@ static void pc_q35_2_4_machine_options(MachineClass *m)
 pc_q35_2_5_machine_options(m);
 m->alias = NULL;
 pcmc->broken_reserved_end = true;
+pcmc->inter_dimm_gap = false;
 SET_MACHINE_COMPAT(m, PC_COMPAT_2_4);
 }
 
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index ab5413f..c13e91d 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -60,6 +60,7 @@ struct PCMachineClass {
 
 /*< public >*/
 bool broken_reserved_end;
+bool inter_dimm_gap;
 HotplugHandler *(*get_hotplug_handler)(MachineState *machine,
DeviceState *dev);
 };
-- 
1.8.3.1

Re: [Qemu-devel] [Xen-devel] [PULL 0/19] xen-2015-09-08-tag

2015-09-28 Thread Stefano Stabellini

No, unfortunately it is not possible at this stage of the release cycle.
But users can still use QEMU 2.5 (as soon as it is released, which
should be in a couple of months) with Xen 4.6 as there is not a strong
tie between QEMU releases and Xen releases.

On Mon, 28 Sep 2015, Chen, Tiejun wrote:
> On 9/22/2015 12:03 AM, Stefano Stabellini wrote:
> > It is going to be in QEMU 2.5 and qemu-xen 4.7.
> 
> Thanks for your reply.
> 
> Do we have any possibility of just merging this series into qemu-xen 4.6? We
> really want to support IGD passthrough on xen 4.6 if possible :)
> 
> Thanks
> Tiejun
> 
> > 
> > On Mon, 21 Sep 2015, Chen, Tiejun wrote:
> > > Stefano,
> > > 
> > > I have two questions,
> > > 
> > > #1. Which qemu version is this igd stuff going into? 2.6?
> > > #2. Is this igd stuff going into qemu-xen inside xen? Any plan to go into
> > > xen
> > > 4.6?
> > > 
> > > Thanks
> > > Tiejun
> > > 
> > > On 9/9/2015 1:21 AM, Stefano Stabellini wrote:
> > > > The following changes since commit
> > > 8611280505119e296757a60711a881341603fa5a:
> > > >
> > > >target-microblaze: Use setcond for pcmp* (2015-09-08 08:49:33 +0200)
> > > >
> > > > are available in the git repository at:
> > > >
> > > >git://xenbits.xen.org/people/sstabellini/qemu-dm.git
> > > > tags/xen-2015-09-08-tag
> > > >
> > > > for you to fetch changes up to ba2250ad148997b1352aba976aac66b55410e7e4:
> > > >
> > > >xen/pt: Use XEN_PT_LOG properly to guard against compiler warnings.
> > > > (2015-09-08 15:21:56 +)
> > > >
> > > > 
> > > > Xen branch xen-2015-09-08
> > > >
> > > > 
> > > > Don Slutz (1):
> > > >xen-hvm: Add trace to ioreq
> > > >
> > > > Jan Beulich (1):
> > > >xen/HVM: atomically access pointers in bufioreq handling
> > > >
> > > > Konrad Rzeszutek Wilk (7):
> > > >xen-hvm: When using xc_domain_add_to_physmap also include errno
> > > when
> > > > reporting
> > > >xen/pt: Update comments with proper function name.
> > > >xen/pt: Make xen_pt_msi_set_enable static
> > > >xen/pt: xen_host_pci_config_read returns -errno, not -1 on
> > > failure
> > > >xen: use errno instead of rc for xc_domain_add_to_physmap
> > > >xen/pt/msi: Add the register value when printing logging and
> > > error
> > > > messages
> > > >xen/pt: Use XEN_PT_LOG properly to guard against compiler
> > > warnings.
> > > >
> > > > Michael S. Tsirkin (1):
> > > >i440fx: make types configurable at run-time
> > > >
> > > > Tiejun Chen (9):
> > > >pc_init1: pass parameters just with types
> > > >piix: create host bridge to passthrough
> > > >hw/pci-assign: split pci-assign.c
> > > >xen, gfx passthrough: basic graphics passthrough support
> > > >xen, gfx passthrough: retrieve VGA BIOS to work
> > > >igd gfx passthrough: create a isa bridge
> > > >xen, gfx passthrough: register a isa bridge
> > > >xen, gfx passthrough: register host bridge specific to
> > > passthrough
> > > >xen, gfx passthrough: add opregion mapping
> > > >
> > > >   configure |   28 +
> > > >   hw/core/machine.c |   20 +++
> > > >   hw/i386/Makefile.objs |1 +
> > > >   hw/i386/kvm/pci-assign.c  |   82 ++---
> > > >   hw/i386/pc_piix.c |  139 -
> > > >   hw/i386/pci-assign-load-rom.c |   93 ++
> > > >   hw/pci-host/piix.c|   91 +-
> > > >   hw/xen/Makefile.objs  |1 +
> > > >   hw/xen/xen-host-pci-device.c  |5 +
> > > >   hw/xen/xen-host-pci-device.h  |1 +
> > > >   hw/xen/xen_pt.c   |   42 ++-
> > > >   hw/xen/xen_pt.h   |   22 +++-
> > > >   hw/xen/xen_pt_config_init.c   |   59 -
> > > >   hw/xen/xen_pt_graphics.c  |  272
> > > > +
> > > >   hw/xen/xen_pt_msi.c   |2 +-
> > > >   include/hw/boards.h   |1 +
> > > >   include/hw/i386/pc.h  |9 +-
> > > >   include/hw/pci/pci-assign.h   |   27 
> > > >   include/hw/xen/xen_common.h   |   34 +-
> > > >   qemu-options.hx   |3 +
> > > >   trace-events  |7 ++
> > > >   vl.c  |   10 ++
> > > >   xen-hvm.c |   55 +++--
> > > >   23 files changed, 891 insertions(+), 113 deletions(-)
> > > >   create mode 100644 hw/i386/pci-assign-load-rom.c
> > > >   create mode 100644 hw/xen/xen_pt_graphics.c
> > > >   create mode 100644 include/hw/pci/pci-assign.h
> > > >
> > > > ___
> > > > Xen-devel mailing list
> > > > xen-de...@lists.xen.org
> > > > http://lists.xen.org/xen-devel
> > > >
> > > 
> > 
> > 
>

[Qemu-devel] [PATCH v2 1/2] target-arm: Fix GDB breakpoint handling

2015-09-28 Thread Sergey Fedorov

GDB breakpoints have higher priority so they have to be checked first.
Should GDB breakpoint match, just return from the debug exception
handler.

Signed-off-by: Sergey Fedorov 
---
 target-arm/op_helper.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/target-arm/op_helper.c b/target-arm/op_helper.c
index 1425a1d..1d4d8cb 100644
--- a/target-arm/op_helper.c
+++ b/target-arm/op_helper.c
@@ -897,6 +897,15 @@ void arm_debug_excp_handler(CPUState *cs)
 }
 }
 } else {
+CPUBreakpoint *bp;
+uint64_t pc = is_a64(env) ? env->pc : env->regs[15];
+
+QTAILQ_FOREACH(bp, &cs->breakpoints, entry) {
+if (bp->pc == pc && !(bp->flags & BP_CPU)) {
+return;
+}
+}
+
 if (check_breakpoints(cpu)) {
 bool same_el = (arm_debug_target_el(env) == arm_current_el(env));
 if (extended_addresses_enabled(env)) {
-- 
1.9.1

[Qemu-devel] [PATCH v2 0/2] target-arm: Fix breakpoint handling

2015-09-28 Thread Sergey Fedorov

This series is intended to fix ARM breakpoint emulation misbehavior.
QEMU hangs when QEMU breakpoint fires but it does not pass additional
architectural checks in ARM CPU debug exception handler. For details,
please see individual patches. The most relevant parts of the original
discussion about ARM breakpoint emulation misbehavior can be found at:
https://lists.gnu.org/archive/html/qemu-devel/2015-08/msg02715.html

Changes in v2:
 * The initial commit split into two parts
 * Extra block eliminated in arm_debug_excp_handler()
 * Just one instruction always translated in case of breakpoint PC match

Sergey Fedorov (2):
  target-arm: Fix GDB breakpoint handling
  target-arm: Fix CPU breakpoint handling

 target-arm/helper.h|  2 ++
 target-arm/op_helper.c | 36 ++--
 target-arm/translate-a64.c | 14 --
 target-arm/translate.c | 13 -
 4 files changed, 44 insertions(+), 21 deletions(-)

-- 
1.9.1

[Qemu-devel] [PATCH] spapr: add a default rng device

2015-09-28 Thread Greg Kurz

A recent patch by Thomas Huth brought a new spapr-rng pseudo-device to
provide high-quality random numbers to guests. The device may either be
backed by a "RngBackend" or the in-kernel implementation of the H_RANDOM
hypercall.

Since modern POWER8 based servers always provide a hardware rng, it makes
sense to create a spapr-rng device with use-kvm=true by default when it
is available.

Of course we want the user to have full control on how the rng is handled.
The default device WILL NOT be created in the following cases:
- the -nodefaults option was passed
- a spapr-rng device was already passed on the command line

The default device is created at reset time to ensure devices specified on
the command line have been created.

Signed-off-by: Greg Kurz 
---
 hw/ppc/spapr.c   |   17 +
 hw/ppc/spapr_rng.c   |2 +-
 target-ppc/kvm.c |9 +
 target-ppc/kvm_ppc.h |6 ++
 4 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 7f4f196e53e5..ee048ecffd0c 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1059,6 +1059,14 @@ static int spapr_check_htab_fd(sPAPRMachineState *spapr)
 return rc;
 }
 
+static void spapr_rng_create(void)
+{
+Object *rng = object_new(TYPE_SPAPR_RNG);
+
+object_property_set_bool(rng, true, "use-kvm", &error_abort);
+object_property_set_bool(rng, true, "realized", &error_abort);
+}
+
 static void ppc_spapr_reset(void)
 {
 sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
@@ -1082,6 +1090,15 @@ static void ppc_spapr_reset(void)
 spapr->rtas_addr = rtas_limit - RTAS_MAX_SIZE;
 spapr->fdt_addr = spapr->rtas_addr - FDT_MAX_SIZE;
 
+/* Create a rng device if the user did not provide it already and
+ * KVM has hwrng support.
+ */
+if (defaults_enabled() &&
+kvmppc_hwrng_present() &&
+!object_resolve_path_type("", TYPE_SPAPR_RNG, NULL)) {
+spapr_rng_create();
+}
+
 /* Load the fdt */
 spapr_finalize_fdt(spapr, spapr->fdt_addr, spapr->rtas_addr,
spapr->rtas_size);
diff --git a/hw/ppc/spapr_rng.c b/hw/ppc/spapr_rng.c
index ed43d5e04221..ee5af302bd4d 100644
--- a/hw/ppc/spapr_rng.c
+++ b/hw/ppc/spapr_rng.c
@@ -114,7 +114,7 @@ static void spapr_rng_realize(DeviceState *dev, Error 
**errp)
 sPAPRRngState *rngstate = SPAPR_RNG(dev);
 
 if (rngstate->use_kvm) {
-if (kvmppc_enable_hwrng() == 0) {
+if (kvmppc_hwrng_present() && kvmppc_enable_hwrng() == 0) {
 return;
 }
 /*
diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index e641680fb146..084bb034f1fd 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -2490,11 +2490,12 @@ int kvm_arch_msi_data_to_gsi(uint32_t data)
 return data & 0x;
 }
 
-int kvmppc_enable_hwrng(void)
+bool kvmppc_hwrng_present(void)
 {
-if (!kvm_enabled() || !kvm_check_extension(kvm_state, KVM_CAP_PPC_HWRNG)) {
-return -1;
-}
+return kvm_enabled() && kvm_check_extension(kvm_state, KVM_CAP_PPC_HWRNG);
+}
 
+int kvmppc_enable_hwrng(void)
+{
 return kvmppc_enable_hcall(kvm_state, H_RANDOM);
 }
diff --git a/target-ppc/kvm_ppc.h b/target-ppc/kvm_ppc.h
index 470f6d62f7bb..a76338c9aa16 100644
--- a/target-ppc/kvm_ppc.h
+++ b/target-ppc/kvm_ppc.h
@@ -54,6 +54,7 @@ void kvmppc_hash64_free_pteg(uint64_t token);
 void kvmppc_hash64_write_pte(CPUPPCState *env, target_ulong pte_index,
  target_ulong pte0, target_ulong pte1);
 bool kvmppc_has_cap_fixup_hcalls(void);
+bool kvmppc_hwrng_present(void);
 int kvmppc_enable_hwrng(void);
 
 #else
@@ -252,6 +253,11 @@ static inline bool kvmppc_has_cap_fixup_hcalls(void)
 abort();
 }
 
+static inline bool kvmppc_hwrng_present(void)
+{
+return false;
+}
+
 static inline int kvmppc_enable_hwrng(void)
 {
 return -1;

Re: [Qemu-devel] [PATCH 03/18] qemu-thread: introduce QemuLockCnt

2015-09-28 Thread Stefan Hajnoczi

On Thu, Aug 06, 2015 at 03:36:01PM +0200, Paolo Bonzini wrote:
> +int qemu_lockcnt_count(QemuLockCnt *lockcnt);

Why use int here when the counter field is unsigned?

Re: [Qemu-devel] [PATCH 02/18] aio: rename bh_lock to list_lock

2015-09-28 Thread Stefan Hajnoczi

On Thu, Aug 06, 2015 at 03:36:00PM +0200, Paolo Bonzini wrote:
> This will be used for AioHandlers too.  There is going to be little
> or no contention, so it is better to reuse the same lock.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  async.c | 16 
>  include/block/aio.h |  2 +-
>  2 files changed, 9 insertions(+), 9 deletions(-)

Reviewed-by: Stefan Hajnoczi

[Qemu-devel] [PATCH v2 2/2] target-arm: Fix CPU breakpoint handling

2015-09-28 Thread Sergey Fedorov

A QEMU breakpoint match is not definitely an architectural breakpoint
match. If an exception is generated unconditionally during translation,
it is hardly possible to ignore it in the debug exception handler.

Generate a call to a helper to check CPU breakpoints and raise an
exception only if any breakpoint matches architecturally.

Signed-off-by: Sergey Fedorov 
---
 target-arm/helper.h|  2 ++
 target-arm/op_helper.c | 29 ++---
 target-arm/translate-a64.c | 14 --
 target-arm/translate.c | 13 -
 4 files changed, 36 insertions(+), 22 deletions(-)

diff --git a/target-arm/helper.h b/target-arm/helper.h
index 827b33d..c2a85c7 100644
--- a/target-arm/helper.h
+++ b/target-arm/helper.h
@@ -54,6 +54,8 @@ DEF_HELPER_1(yield, void, env)
 DEF_HELPER_1(pre_hvc, void, env)
 DEF_HELPER_2(pre_smc, void, env, i32)
 
+DEF_HELPER_1(check_breakpoints, void, env)
+
 DEF_HELPER_3(cpsr_write, void, env, i32, i32)
 DEF_HELPER_1(cpsr_read, i32, env)
 
diff --git a/target-arm/op_helper.c b/target-arm/op_helper.c
index 1d4d8cb..8ec8590 100644
--- a/target-arm/op_helper.c
+++ b/target-arm/op_helper.c
@@ -867,6 +867,15 @@ static bool check_breakpoints(ARMCPU *cpu)
 return false;
 }
 
+void HELPER(check_breakpoints)(CPUARMState *env)
+{
+ARMCPU *cpu = arm_env_get_cpu(env);
+
+if (check_breakpoints(cpu)) {
+HELPER(exception_internal(env, EXCP_DEBUG));
+}
+}
+
 void arm_debug_excp_handler(CPUState *cs)
 {
 /* Called by core code when a watchpoint or breakpoint fires;
@@ -899,6 +908,7 @@ void arm_debug_excp_handler(CPUState *cs)
 } else {
 CPUBreakpoint *bp;
 uint64_t pc = is_a64(env) ? env->pc : env->regs[15];
+bool same_el = (arm_debug_target_el(env) == arm_current_el(env));
 
 QTAILQ_FOREACH(bp, &cs->breakpoints, entry) {
 if (bp->pc == pc && !(bp->flags & BP_CPU)) {
@@ -906,18 +916,15 @@ void arm_debug_excp_handler(CPUState *cs)
 }
 }
 
-if (check_breakpoints(cpu)) {
-bool same_el = (arm_debug_target_el(env) == arm_current_el(env));
-if (extended_addresses_enabled(env)) {
-env->exception.fsr = (1 << 9) | 0x22;
-} else {
-env->exception.fsr = 0x2;
-}
-/* FAR is UNKNOWN, so doesn't need setting */
-raise_exception(env, EXCP_PREFETCH_ABORT,
-syn_breakpoint(same_el),
-arm_debug_target_el(env));
+if (extended_addresses_enabled(env)) {
+env->exception.fsr = (1 << 9) | 0x22;
+} else {
+env->exception.fsr = 0x2;
 }
+/* FAR is UNKNOWN, so doesn't need setting */
+raise_exception(env, EXCP_PREFETCH_ABORT,
+syn_breakpoint(same_el),
+arm_debug_target_el(env));
 }
 }
 
diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index ec0936c..426229f 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -11082,11 +11082,14 @@ void gen_intermediate_code_internal_a64(ARMCPU *cpu,
 if (unlikely(!QTAILQ_EMPTY(&cs->breakpoints))) {
 QTAILQ_FOREACH(bp, &cs->breakpoints, entry) {
 if (bp->pc == dc->pc) {
-gen_exception_internal_insn(dc, 0, EXCP_DEBUG);
-/* Advance PC so that clearing the breakpoint will
-   invalidate this TB.  */
-dc->pc += 2;
-goto done_generating;
+if (bp->flags & BP_CPU) {
+gen_helper_check_breakpoints(cpu_env);
+} else {
+gen_exception_internal_insn(dc, 0, EXCP_DEBUG);
+}
+/* End the TB early; it's likely not going to be executed 
*/
+dc->is_jmp = DISAS_UPDATE;
+break;
 }
 }
 }
@@ -11209,7 +11212,6 @@ void gen_intermediate_code_internal_a64(ARMCPU *cpu,
 }
 }
 
-done_generating:
 gen_tb_end(tb, num_insns);
 
 #ifdef DEBUG_DISAS
diff --git a/target-arm/translate.c b/target-arm/translate.c
index 84a21ac..405d6d0 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -11328,11 +11328,14 @@ static inline void 
gen_intermediate_code_internal(ARMCPU *cpu,
 if (unlikely(!QTAILQ_EMPTY(&cs->breakpoints))) {
 QTAILQ_FOREACH(bp, &cs->breakpoints, entry) {
 if (bp->pc == dc->pc) {
-gen_exception_internal_insn(dc, 0, EXCP_DEBUG);
-/* Advance PC so that clearing the breakpoint will
-   invalidate this TB.  */
-dc->pc += 2;
-goto done_generating;
+if (bp->flags & BP_CPU) {
+gen_helper_check_breakpoints(cpu_env);
+} else

Re: [Qemu-devel] [PATCH 01/18] iothread: release iothread around aio_poll

2015-09-28 Thread Paolo Bonzini



On 28/09/2015 11:50, Stefan Hajnoczi wrote:
> On Thu, Aug 06, 2015 at 03:35:59PM +0200, Paolo Bonzini wrote:
>> This is the first step towards having fine-grained critical sections in
>> dataplane threads, which resolves lock ordering problems between
>> address_space_* functions (which need the BQL when doing MMIO, even
>> after we complete RCU-based dispatch) and the AioContext.
>>
>> Because AioContext does not use contention callbacks anymore, the
>> unit test has to be changed.
>>
>> Previously applied as a0710f7995f914e3044e5899bd8ff6c43c62f916 and
>> then reverted.
> 
> commit da5e1de95bb235330d7724316e7a29239d1359d5
> Author: Stefan Hajnoczi 
> Date:   Wed Jun 3 10:15:33 2015 +0100
> 
> Revert "iothread: release iothread around aio_poll"
> 
> This reverts commit a0710f7995f914e3044e5899bd8ff6c43c62f916.
> 
> In qemu-devel email message <556dbf87.2020...@de.ibm.com>, Christian
> Borntraeger writes:
> 
>   Having many guests all with a kernel/ramdisk (via -kernel) and
>   several null block devices will result in hangs. All hanging
>   guests are in partition detection code waiting for an I/O to return
>   so very early maybe even the first I/O.
> 
>   Reverting that commit "fixes" the hangs.
> 
> Reverting this commit for the 2.4 release.  More time is needed to
> investigate and correct this patch.
> 
> Did we ever find the root cause for hangs caused by this patch?

It was fixed by commit 53ec73e ("block: Use bdrv_drain to replace
uncessary bdrv_drain_all", 2015-05-29)'s change to bdrv_set_aio_context.
 We never investigated the root cause, but I'd guess it's gone after the
2.4-rc bugfixes to AioContext.

Paolo

Re: [Qemu-devel] [PATCH 03/18] qemu-thread: introduce QemuLockCnt

2015-09-28 Thread Paolo Bonzini



On 28/09/2015 12:15, Stefan Hajnoczi wrote:
> On Thu, Aug 06, 2015 at 03:36:01PM +0200, Paolo Bonzini wrote:
>> > +int qemu_lockcnt_count(QemuLockCnt *lockcnt);
> Why use int here when the counter field is unsigned?

Nice catch, will fix.

Paolo

[Qemu-devel] [PATCH] target-arm: Add MDCR_EL2

2015-09-28 Thread Sergey Fedorov

Signed-off-by: Sergey Fedorov 
---

This patch is a prerequisite for a debug exception routing patch:
https://lists.gnu.org/archive/html/qemu-devel/2015-09/msg03542.html

 target-arm/cpu-qom.h |  1 +
 target-arm/cpu.c |  1 +
 target-arm/cpu.h |  1 +
 target-arm/cpu64.c   |  1 +
 target-arm/helper.c  | 13 +
 5 files changed, 17 insertions(+)

diff --git a/target-arm/cpu-qom.h b/target-arm/cpu-qom.h
index 25fb1ce..d2b0769 100644
--- a/target-arm/cpu-qom.h
+++ b/target-arm/cpu-qom.h
@@ -167,6 +167,7 @@ typedef struct ARMCPU {
 uint64_t id_aa64mmfr0;
 uint64_t id_aa64mmfr1;
 uint32_t dbgdidr;
+uint32_t mdcr;
 uint32_t clidr;
 uint64_t mp_affinity; /* MP ID without feature bits */
 /* The elements of this array are the CCSIDR values for each cache,
diff --git a/target-arm/cpu.c b/target-arm/cpu.c
index d7b4445..6474c0d 100644
--- a/target-arm/cpu.c
+++ b/target-arm/cpu.c
@@ -1125,6 +1125,7 @@ static void cortex_a15_initfn(Object *obj)
 cpu->id_isar3 = 0x2131;
 cpu->id_isar4 = 0x10011142;
 cpu->dbgdidr = 0x3515f021;
+cpu->mdcr = 0x0006;
 cpu->clidr = 0x0a200023;
 cpu->ccsidr[0] = 0x701fe00a; /* 32K L1 dcache */
 cpu->ccsidr[1] = 0x201fe00a; /* 32K L1 icache */
diff --git a/target-arm/cpu.h b/target-arm/cpu.h
index 1b80516..d57ed20 100644
--- a/target-arm/cpu.h
+++ b/target-arm/cpu.h
@@ -380,6 +380,7 @@ typedef struct CPUARMState {
 uint64_t dbgwvr[16]; /* watchpoint value registers */
 uint64_t dbgwcr[16]; /* watchpoint control registers */
 uint64_t mdscr_el1;
+uint64_t mdcr_el2;
 /* If the counter is enabled, this stores the last time the counter
  * was reset. Otherwise it stores the counter value
  */
diff --git a/target-arm/cpu64.c b/target-arm/cpu64.c
index 63c8b1c..a41cd28 100644
--- a/target-arm/cpu64.c
+++ b/target-arm/cpu64.c
@@ -136,6 +136,7 @@ static void aarch64_a57_initfn(Object *obj)
 cpu->id_aa64isar0 = 0x00011120;
 cpu->id_aa64mmfr0 = 0x1124;
 cpu->dbgdidr = 0x3516d000;
+cpu->mdcr = 0x0006;
 cpu->clidr = 0x0a200023;
 cpu->ccsidr[0] = 0x701fe00a; /* 32KB L1 dcache */
 cpu->ccsidr[1] = 0x201fe012; /* 48KB L1 icache */
diff --git a/target-arm/helper.c b/target-arm/helper.c
index 12ea88f..5fe1291 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -3221,6 +3221,9 @@ static const ARMCPRegInfo el3_no_el2_cp_reginfo[] = {
 { .name = "CNTHP_CTL_EL2", .state = ARM_CP_STATE_BOTH,
   .opc0 = 3, .opc1 = 4, .crn = 14, .crm = 2, .opc2 = 1,
   .access = PL2_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
+{ .name = "MDCR_EL2", .state = ARM_CP_STATE_BOTH,
+  .opc0 = 3, .opc1 = 4, .crn = 1, .crm = 1, .opc2 = 1,
+  .access = PL2_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
 REGINFO_SENTINEL
 };
 
@@ -3870,6 +3873,13 @@ static void define_debug_regs(ARMCPU *cpu)
 .name = "DBGDIDR", .cp = 14, .crn = 0, .crm = 0, .opc1 = 0, .opc2 = 0,
 .access = PL0_R, .type = ARM_CP_CONST, .resetvalue = cpu->dbgdidr,
 };
+ARMCPRegInfo mdcr_el2 = {
+.name = "MDCR_EL2", .state = ARM_CP_STATE_BOTH,
+.opc0 = 3, .opc1 = 4, .crn = 1, .crm = 1, .opc2 = 1,
+.access = PL2_RW,
+.fieldoffset = offsetof(CPUARMState, cp15.mdcr_el2),
+.resetvalue = cpu->mdcr,
+};
 
 /* Note that all these register fields hold "number of Xs minus 1". */
 brps = extract32(cpu->dbgdidr, 24, 4);
@@ -3894,6 +3904,9 @@ static void define_debug_regs(ARMCPU *cpu)
 if (arm_feature(&cpu->env, ARM_FEATURE_LPAE)) {
 define_arm_cp_regs(cpu, debug_lpae_cp_reginfo);
 }
+if (arm_feature(&cpu->env, ARM_FEATURE_EL2)) {
+define_one_arm_cp_reg(cpu, &mdcr_el2);
+}
 
 for (i = 0; i < brps + 1; i++) {
 ARMCPRegInfo dbgregs[] = {
-- 
1.9.1

[Qemu-devel] [PATCH 0/3] target-i386: save/restore vcpu's TSC rate during migration

2015-09-28 Thread Haozhong Zhang

This patchset enables QEMU to save/restore vcpu's TSC rate during the
migration. When cooperating with KVM which supports TSC scaling, guest
programs can observe a consistent guest TSC rate even though they are
migrated among machines with different host TSC rates.

Haozhong Zhang (3):
  target-i386: add a subsection of vcpu's TSC rate in vmstate_x86_cpu
  target-i386: initialize vcpu's TSC rate to the value from KVM
  kvm-all: notice KVM of vcpu's TSC rate after migration

 kvm-all.c | 13 +
 target-i386/kvm.c |  7 +++
 target-i386/machine.c | 20 
 3 files changed, 40 insertions(+)

--
2.4.8

[Qemu-devel] [PATCH 1/3] target-i386: add a subsection of vcpu's TSC rate in vmstate_x86_cpu

2015-09-28 Thread Haozhong Zhang

The newly added subsection 'vmstate_tsc_khz' in this patch results in
vcpu's TSC rate being saved on the source machine and loaded on the
target machine during the migration.

Signed-off-by: Haozhong Zhang 
---
 target-i386/machine.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/target-i386/machine.c b/target-i386/machine.c
index 9fa0563..80108a3 100644
--- a/target-i386/machine.c
+++ b/target-i386/machine.c
@@ -752,6 +752,25 @@ static const VMStateDescription vmstate_xss = {
 }
 };
 
+static bool tsc_khz_needed(void *opaque)
+{
+X86CPU *cpu = opaque;
+CPUX86State *env = &cpu->env;
+
+return env->tsc_khz != 0;
+}
+
+static const VMStateDescription vmstate_tsc_khz = {
+.name = "cpu/tsc_khz",
+.version_id = 1,
+.minimum_version_id = 1,
+.needed = tsc_khz_needed,
+.fields = (VMStateField[]) {
+VMSTATE_INT64(env.tsc_khz, X86CPU),
+VMSTATE_END_OF_LIST()
+}
+};
+
 VMStateDescription vmstate_x86_cpu = {
 .name = "cpu",
 .version_id = 12,
@@ -871,6 +890,7 @@ VMStateDescription vmstate_x86_cpu = {
 &vmstate_msr_hyperv_crash,
 &vmstate_avx512,
 &vmstate_xss,
+&vmstate_tsc_khz,
 NULL
 }
 };
-- 
2.4.8

[Qemu-devel] [PATCH 2/3] target-i386: initialize vcpu's TSC rate to the value from KVM

2015-09-28 Thread Haozhong Zhang

When creating a vcpu, we initialize its TSC rate to the value from
KVM (through ioctl KVM_GET_TSC_KHZ).

Signed-off-by: Haozhong Zhang 
---
 target-i386/kvm.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 7b0ba17..c2b161a 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -751,6 +751,13 @@ int kvm_arch_init_vcpu(CPUState *cs)
 }
 }
 
+r = kvm_vcpu_ioctl(cs, KVM_GET_TSC_KHZ);
+if (r < 0) {
+fprintf(stderr, "KVM_GET_TSC_KHZ failed\n");
+return r;
+}
+env->tsc_khz = r;
+
 if (kvm_has_xsave()) {
 env->kvm_xsave_buf = qemu_memalign(4096, sizeof(struct kvm_xsave));
 }
-- 
2.4.8

[Qemu-devel] [PATCH 3/3] kvm-all: notice KVM of vcpu's TSC rate after migration

2015-09-28 Thread Haozhong Zhang

When a vcpu is created in KVM, its TSC rate is initially identical to
the host TSC rate. If its state is migrated to a vcpu on another
machine (target machine) which may uses a different host TSC rate, QEMU
on the target machine should notice KVM of the migrated vcpu's TSC
rate. In case that KVM on the target machine supports TSC scaling, guest
programs running on the migrated vcpu will observe the same TSC rate
before and after the migration.

Signed-off-by: Haozhong Zhang 
---
 kvm-all.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/kvm-all.c b/kvm-all.c
index 0be4615..e8de038 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1769,6 +1769,19 @@ void kvm_cpu_synchronize_post_reset(CPUState *cpu)
 static void do_kvm_cpu_synchronize_post_init(void *arg)
 {
 CPUState *cpu = arg;
+CPUX86State *env = &X86_CPU(cpu)->env;
+int r;
+
+/*
+ * XXX: KVM_SET_TSC_KHZ must be done before kvm_arch_put_registers().
+ */
+r = kvm_check_extension(cpu->kvm_state, KVM_CAP_TSC_CONTROL);
+if (r && env->tsc_khz) {
+r = kvm_vcpu_ioctl(cpu, KVM_SET_TSC_KHZ, env->tsc_khz);
+if (r < 0) {
+fprintf(stderr, "KVM_SET_TSC_KHZ failed\n");
+}
+}
 
 kvm_arch_put_registers(cpu, KVM_PUT_FULL_STATE);
 cpu->kvm_vcpu_dirty = false;
-- 
2.4.8

Re: [Qemu-devel] [PATCH 01/10][TRIVIAL] adb: add to input category

2015-09-28 Thread Thomas Huth

On 26/09/15 18:22, Laurent Vivier wrote:
> The Apple Desktop Bus is used to connect a keyboard and a mouse,
> so add it to the input category.
> 
> Signed-off-by: Laurent Vivier 
> ---
>  hw/input/adb.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/hw/input/adb.c b/hw/input/adb.c
> index a18eea2..09eead9 100644
> --- a/hw/input/adb.c
> +++ b/hw/input/adb.c
> @@ -362,6 +362,7 @@ static void adb_kbd_class_init(ObjectClass *oc, void 
> *data)
>  
>  akc->parent_realize = dc->realize;
>  dc->realize = adb_kbd_realizefn;
> +set_bit(DEVICE_CATEGORY_INPUT, dc->categories);
>  
>  adc->devreq = adb_kbd_request;
>  dc->reset = adb_kbd_reset;
> @@ -566,6 +567,7 @@ static void adb_mouse_class_init(ObjectClass *oc, void 
> *data)
>  
>  amc->parent_realize = dc->realize;
>  dc->realize = adb_mouse_realizefn;
> +set_bit(DEVICE_CATEGORY_INPUT, dc->categories);
>  
>  adc->devreq = adb_mouse_request;
>  dc->reset = adb_mouse_reset;

Reviewed-by: Thomas Huth

Re: [Qemu-devel] [PATCH 02/10][TRIVIAL] cmd646: add to storage category

2015-09-28 Thread Thomas Huth

On 26/09/15 18:22, Laurent Vivier wrote:
> cmd646 is an IDE controller, so add it to the
> storage category.
> 
> Signed-off-by: Laurent Vivier 
> ---
>  hw/ide/cmd646.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/hw/ide/cmd646.c b/hw/ide/cmd646.c
> index 66fb9d9..27f3da2 100644
> --- a/hw/ide/cmd646.c
> +++ b/hw/ide/cmd646.c
> @@ -417,6 +417,7 @@ static void cmd646_ide_class_init(ObjectClass *klass, 
> void *data)
>  k->config_read = cmd646_pci_config_read;
>  k->config_write = cmd646_pci_config_write;
>  dc->props = cmd646_ide_properties;
> +set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
>  }

Reviewed-by: Thomas Huth

Re: [Qemu-devel] [PATCH 03/10][TRIVIAL] escc: add to input category

2015-09-28 Thread Thomas Huth

On 26/09/15 18:22, Laurent Vivier wrote:
> ESCC is a serial port controller, so add it
> to the input category.
> 
> Signed-off-by: Laurent Vivier 
> ---
>  hw/char/escc.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/hw/char/escc.c b/hw/char/escc.c
> index ba653ef..9816154 100644
> --- a/hw/char/escc.c
> +++ b/hw/char/escc.c
> @@ -1035,6 +1035,7 @@ static void escc_class_init(ObjectClass *klass, void 
> *data)
>  dc->reset = escc_reset;
>  dc->vmsd = &vmstate_escc;
>  dc->props = escc_properties;
> +set_bit(DEVICE_CATEGORY_INPUT, dc->categories);
>  }

Reviewed-by: Thomas Huth

Re: [Qemu-devel] [PATCH 06/10][TRIVIAL] macio-ide: add to storage category

2015-09-28 Thread Thomas Huth

On 26/09/15 18:22, Laurent Vivier wrote:
> macio-ide is an IDE controller, so add it
> to the storage category.
> 
> Signed-off-by: Laurent Vivier 
> ---
>  hw/ide/macio.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/hw/ide/macio.c b/hw/ide/macio.c
> index 66ac2ba..893c9b9 100644
> --- a/hw/ide/macio.c
> +++ b/hw/ide/macio.c
> @@ -590,6 +590,7 @@ static void macio_ide_class_init(ObjectClass *oc, void 
> *data)
>  dc->realize = macio_ide_realizefn;
>  dc->reset = macio_ide_reset;
>  dc->vmsd = &vmstate_pmac;
> +set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
>  }

Reviewed-by: Thomas Huth

Re: [Qemu-devel] [PATCH 3/3] Target-ppc: Remove unnecessary variable

2015-09-28 Thread Paolo Bonzini



On 26/09/2015 18:15, Eric Blake wrote:
> On 09/25/2015 02:37 AM, Shraddha Barke wrote:
>> Compress lines and remove the variable.
>>
> 
>> +++ b/target-ppc/kvm.c
>> @@ -1782,8 +1782,7 @@ uint32_t kvmppc_get_tbfreq(void)
>>  
>>  ns++;
>>  
>> -retval = atoi(ns);
>> -return retval;
>> +return atoi(ns);
> 
> atoi() is lousy; it cannot properly detect user input errors.  This
> should probably be converted to use the appropriate qemu_strtol variant
> instead.

But it's more or less okay here, it's parsing /proc/cpuinfo.

Paolo

Re: [Qemu-devel] [PATCH repost 2/4] oslib: allocate PROT_NONE pages on top of RAM

2015-09-28 Thread Paolo Bonzini

On 27/09/2015 12:14, Michael S. Tsirkin wrote:
> -if (total > size) {
> -munmap(ptr + size, total - size);
> +if (total > size + getpagesize()) {
> +munmap(ptr + size + getpagesize(), total - size - getpagesize());
>  }
>  

Please add a comment here, also noting that "total" always allocates at
least an extra page, even if size is already aligned.

>  if (ptr) {
> -munmap(ptr, size);
> +munmap(ptr, size + getpagesize());
>  }

And another comment here.

Paolo

Re: [Qemu-devel] [PATCH repost 0/4] add mitigation against buffer overflows

2015-09-28 Thread Paolo Bonzini



On 27/09/2015 12:14, Michael S. Tsirkin wrote:
> Multiple places in QEMU map guest memory, then access it
> directly. Unfortunately since we are using C, there's always
> a chance that we'll miss a bounds check when we do this.
> This has a potential to corrupt QEMU memory.
> 
> As a mitigation strategy against such exploits,
> allocate a page in HVA space on top of each RAM chunk
> with PROT_NONE protection.
> 
> Buffer overflows will now cause QEMU to crash.
> 
> This is a repost, combining separate patches into a single
> series. No changes to patches themselves.
> 
> Michael S. Tsirkin (4):
>   oslib: rework anonimous RAM allocation
>   oslib: allocate PROT_NONE pages on top of RAM
>   exec: allocate PROT_NONE pages on top of RAM
>   exec: factor out duplicate mmap code
> 
>  include/qemu/mmap-alloc.h | 10 +
>  exec.c| 19 -
>  util/mmap-alloc.c | 52 
> +++
>  util/oslib-posix.c| 20 --
>  util/Makefile.objs|  2 +-
>  5 files changed, 81 insertions(+), 22 deletions(-)
>  create mode 100644 include/qemu/mmap-alloc.h
>  create mode 100644 util/mmap-alloc.c
> 

Reviewed-by: Paolo Bonzini 
Acked-by: Paolo Bonzini 

Regarding my request to add comments in patch 2, feel free to add them
directly in patch 4 instead.

Paolo

Re: [Qemu-devel] [PATCH 04/10][TRIVIAL] grackle: add to bridge category

2015-09-28 Thread Thomas Huth

On 26/09/15 18:22, Laurent Vivier wrote:
> Grackle is the PCI host controller of oldworld powermac,
> so add it to the bridge category.
> 
> Signed-off-by: Laurent Vivier 
> ---
>  hw/pci-host/grackle.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/hw/pci-host/grackle.c b/hw/pci-host/grackle.c
> index bfe707a..ea31b72 100644
> --- a/hw/pci-host/grackle.c
> +++ b/hw/pci-host/grackle.c
> @@ -146,8 +146,10 @@ static const TypeInfo grackle_pci_info = {
>  static void pci_grackle_class_init(ObjectClass *klass, void *data)
>  {
>  SysBusDeviceClass *k = SYS_BUS_DEVICE_CLASS(klass);
> +DeviceClass *dc = DEVICE_CLASS(klass);
>  
>  k->init = pci_grackle_init_device;
> +set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
>  }

Reviewed-by: Thomas Huth

Re: [Qemu-devel] [PATCH 07/10][TRIVIAL] uninorth: add to bridge category

2015-09-28 Thread Thomas Huth

On 26/09/15 18:22, Laurent Vivier wrote:
> Uninorth is the mac99 PCI host controller, so add
> it to the bridge category.
> 
> Signed-off-by: Laurent Vivier 
> ---
>  hw/pci-host/uninorth.c | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/hw/pci-host/uninorth.c b/hw/pci-host/uninorth.c
> index f0144eb..215b64f 100644
> --- a/hw/pci-host/uninorth.c
> +++ b/hw/pci-host/uninorth.c
> @@ -446,8 +446,10 @@ static const TypeInfo unin_internal_pci_host_info = {
>  static void pci_unin_main_class_init(ObjectClass *klass, void *data)
>  {
>  SysBusDeviceClass *sbc = SYS_BUS_DEVICE_CLASS(klass);
> +DeviceClass *dc = DEVICE_CLASS(klass);
>  
>  sbc->init = pci_unin_main_init_device;
> +set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
>  }
>  
>  static const TypeInfo pci_unin_main_info = {
> @@ -460,8 +462,10 @@ static const TypeInfo pci_unin_main_info = {
>  static void pci_u3_agp_class_init(ObjectClass *klass, void *data)
>  {
>  SysBusDeviceClass *sbc = SYS_BUS_DEVICE_CLASS(klass);
> +DeviceClass *dc = DEVICE_CLASS(klass);
>  
>  sbc->init = pci_u3_agp_init_device;
> +set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
>  }
>  
>  static const TypeInfo pci_u3_agp_info = {
> @@ -474,8 +478,10 @@ static const TypeInfo pci_u3_agp_info = {
>  static void pci_unin_agp_class_init(ObjectClass *klass, void *data)
>  {
>  SysBusDeviceClass *sbc = SYS_BUS_DEVICE_CLASS(klass);
> +DeviceClass *dc = DEVICE_CLASS(klass);
>  
>  sbc->init = pci_unin_agp_init_device;
> +set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
>  }
>  
>  static const TypeInfo pci_unin_agp_info = {
> @@ -488,8 +494,10 @@ static const TypeInfo pci_unin_agp_info = {
>  static void pci_unin_internal_class_init(ObjectClass *klass, void *data)
>  {
>  SysBusDeviceClass *sbc = SYS_BUS_DEVICE_CLASS(klass);
> +DeviceClass *dc = DEVICE_CLASS(klass);
>  
>  sbc->init = pci_unin_internal_init_device;
> +set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
>  }

Reviewed-by: Thomas Huth

Re: [Qemu-devel] [PATCH 08/10][TRIVIAL] macio: add to bridge category

2015-09-28 Thread Thomas Huth

On 26/09/15 18:22, Laurent Vivier wrote:
> macio is a bridge between the PCI bus and the Mac nvram,
> IDE controller and PIC, so add it to the bridge category.
> 
> Signed-off-by: Laurent Vivier 
> ---
>  hw/misc/macio/macio.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/hw/misc/macio/macio.c b/hw/misc/macio/macio.c
> index e3c0242..c0989a0 100644
> --- a/hw/misc/macio/macio.c
> +++ b/hw/misc/macio/macio.c
> @@ -393,6 +393,7 @@ static void macio_class_init(ObjectClass *klass, void 
> *data)
>  k->vendor_id = PCI_VENDOR_ID_APPLE;
>  k->class_id = PCI_CLASS_OTHERS << 8;
>  dc->props = macio_properties;
> +set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
>  }

Reviewed-by: Thomas Huth

Re: [Qemu-devel] [PATCH 09/10][TRIVIAL] macio-nvram: add to misc category

2015-09-28 Thread Thomas Huth

On 26/09/15 18:22, Laurent Vivier wrote:
> The macio nvram is a non volatile RAM, so add it
> the misc category.
> 
> Signed-off-by: Laurent Vivier 
> ---
>  hw/nvram/mac_nvram.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/hw/nvram/mac_nvram.c b/hw/nvram/mac_nvram.c
> index d35f8a3..9f16566 100644
> --- a/hw/nvram/mac_nvram.c
> +++ b/hw/nvram/mac_nvram.c
> @@ -123,6 +123,7 @@ static void macio_nvram_class_init(ObjectClass *oc, void 
> *data)
>  dc->reset = macio_nvram_reset;
>  dc->vmsd = &vmstate_macio_nvram;
>  dc->props = macio_nvram_properties;
> +set_bit(DEVICE_CATEGORY_MISC, dc->categories);
>  }

Reviewed-by: Thomas Huth

Re: [Qemu-devel] [PATCH 10/10][TRIVIAL] openpic: add to misc category

2015-09-28 Thread Thomas Huth

On 26/09/15 18:22, Laurent Vivier wrote:
> openpic is a programmable interrupt controller, so
> add it to the misc category.
> 
> Signed-off-by: Laurent Vivier 
> ---
>  hw/intc/openpic.c | 1 +
>  hw/intc/openpic_kvm.c | 1 +
>  2 files changed, 2 insertions(+)
> 
> diff --git a/hw/intc/openpic.c b/hw/intc/openpic.c
> index 14ab0e3..bfcf155 100644
> --- a/hw/intc/openpic.c
> +++ b/hw/intc/openpic.c
> @@ -1643,6 +1643,7 @@ static void openpic_class_init(ObjectClass *oc, void 
> *data)
>  dc->props = openpic_properties;
>  dc->reset = openpic_reset;
>  dc->vmsd = &vmstate_openpic;
> +set_bit(DEVICE_CATEGORY_MISC, dc->categories);
>  }
>  
>  static const TypeInfo openpic_info = {
> diff --git a/hw/intc/openpic_kvm.c b/hw/intc/openpic_kvm.c
> index f7cac58..649f476 100644
> --- a/hw/intc/openpic_kvm.c
> +++ b/hw/intc/openpic_kvm.c
> @@ -275,6 +275,7 @@ static void kvm_openpic_class_init(ObjectClass *oc, void 
> *data)
>  dc->realize = kvm_openpic_realize;
>  dc->props = kvm_openpic_properties;
>  dc->reset = kvm_openpic_reset;
> +set_bit(DEVICE_CATEGORY_MISC, dc->categories);
>  }

Reviewed-by: Thomas Huth

Re: [Qemu-devel] [PATCH v5 45/46] net: Complete qapi-fication of netdev_add

2015-09-28 Thread Paolo Bonzini



On 28/09/2015 11:31, Markus Armbruster wrote:
>> However, while changing the command to
>> >
>> > { 'command': 'netdev_add', 'data': 'Netdev', 'box': true, 'gen': false }
>> >
>> > for better introspection, you should keep 'gen':'false' and the manual
>> > implementation based on qemu_opts_from_qdict, otherwise you break
>> > backwards-compatibility.
> Non sequitur :)
> 
> We need to stay sufficiently backwards compatible somehow.  'gen' false
> is one possible solution.  Liberal use of alternate types could be
> another.  A special input visitor mode could be a third.

Agreed; my suggestion is the minimal change on top of this patch,
actually undoing part of it.  Any other approach should be done separately.

Paolo

> Since my review cursor is >30 patches behind this one, I don't have an
> opinion, yet.

[Qemu-devel] [PATCH 1/1] migration: fix deadlock

2015-09-28 Thread Denis V. Lunev

Release qemu global mutex before call synchronize_rcu().
synchronize_rcu() waiting for all readers to finish their critical
sections. There is at least one critical section in which we try
to get QGM (critical section is in address_space_rw() and
prepare_mmio_access() is trying to aquire QGM).

Both functions (migration_end() and migration_bitmap_extend())
are called from main thread which is holding QGM.

Thus there is a race condition that ends up with deadlock:
main thread working thread
Lock QGA|
| Call KVM_EXIT_IO handler
|   |
|Open rcu reader's critical section
Migration cleanup bh|
|   |
synchronize_rcu() is|
waiting for readers |
|prepare_mmio_access() is waiting for QGM
  \   /
 deadlock

The patch changes bitmap freeing from direct g_free after synchronize_rcu
to free inside call_rcu.

Signed-off-by: Denis V. Lunev 
Reported-by: Igor Redko 
Tested-by: Igor Redko 
CC: Anna Melekhova 
CC: Juan Quintela 
CC: Amit Shah 
CC: Paolo Bonzini 
CC: Wen Congyang 
---
 migration/ram.c | 44 +++-
 1 file changed, 27 insertions(+), 17 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 7f007e6..e7c5bcf 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -221,12 +221,16 @@ static RAMBlock *last_seen_block;
 /* This is the last block from where we have sent data */
 static RAMBlock *last_sent_block;
 static ram_addr_t last_offset;
-static unsigned long *migration_bitmap;
 static QemuMutex migration_bitmap_mutex;
 static uint64_t migration_dirty_pages;
 static uint32_t last_version;
 static bool ram_bulk_stage;
 
+static struct BitmapRcu {
+struct rcu_head rcu;
+unsigned long *bmap;
+} *migration_bitmap_rcu;
+
 struct CompressParam {
 bool start;
 bool done;
@@ -508,7 +512,7 @@ ram_addr_t 
migration_bitmap_find_and_reset_dirty(MemoryRegion *mr,
 
 unsigned long next;
 
-bitmap = atomic_rcu_read(&migration_bitmap);
+bitmap = atomic_rcu_read(&migration_bitmap_rcu)->bmap;
 if (ram_bulk_stage && nr > base) {
 next = nr + 1;
 } else {
@@ -526,7 +530,7 @@ ram_addr_t 
migration_bitmap_find_and_reset_dirty(MemoryRegion *mr,
 static void migration_bitmap_sync_range(ram_addr_t start, ram_addr_t length)
 {
 unsigned long *bitmap;
-bitmap = atomic_rcu_read(&migration_bitmap);
+bitmap = atomic_rcu_read(&migration_bitmap_rcu)->bmap;
 migration_dirty_pages +=
 cpu_physical_memory_sync_dirty_bitmap(bitmap, start, length);
 }
@@ -1024,17 +1028,22 @@ void free_xbzrle_decoded_buf(void)
 xbzrle_decoded_buf = NULL;
 }
 
+static void migration_bitmap_free(struct BitmapRcu *bmap)
+{
+g_free(bmap->bmap);
+g_free(bmap);
+}
+
 static void migration_end(void)
 {
 /* caller have hold iothread lock or is in a bh, so there is
  * no writing race against this migration_bitmap
  */
-unsigned long *bitmap = migration_bitmap;
-atomic_rcu_set(&migration_bitmap, NULL);
+struct BitmapRcu *bitmap = migration_bitmap_rcu;
+atomic_rcu_set(&migration_bitmap_rcu, NULL);
 if (bitmap) {
 memory_global_dirty_log_stop();
-synchronize_rcu();
-g_free(bitmap);
+call_rcu(bitmap, migration_bitmap_free, rcu);
 }
 
 XBZRLE_cache_lock();
@@ -1070,9 +1079,10 @@ void migration_bitmap_extend(ram_addr_t old, ram_addr_t 
new)
 /* called in qemu main thread, so there is
  * no writing race against this migration_bitmap
  */
-if (migration_bitmap) {
-unsigned long *old_bitmap = migration_bitmap, *bitmap;
-bitmap = bitmap_new(new);
+if (migration_bitmap_rcu) {
+struct BitmapRcu *old_bitmap = migration_bitmap_rcu, *bitmap;
+bitmap = g_new(struct BitmapRcu, 1);
+bitmap->bmap = bitmap_new(new);
 
 /* prevent migration_bitmap content from being set bit
  * by migration_bitmap_sync_range() at the same time.
@@ -1080,13 +1090,12 @@ void migration_bitmap_extend(ram_addr_t old, ram_addr_t 
new)
  * at the same time.
  */
 qemu_mutex_lock(&migration_bitmap_mutex);
-bitmap_copy(bitmap, old_bitmap, old);
-bitmap_set(bitmap, old, new - old);
-atomic_rcu_set(&migration_bitmap, bitmap);
+bitmap_copy(bitmap->bmap, old_bitmap->bmap, old);
+bitmap_set(bitmap->bmap, old, new - old);
+atomic_rcu_set(&migration_bitmap_rcu, bitmap);
 qemu_mutex_unlock(&migration_bitmap_mutex);
 migration_dirty_pages += new - old;
-synchronize_rcu();
-g_free(old_bitmap);
+call_rcu(old_bitmap, migration_bitmap_free, rcu);
 }
 }
 
@@ -1145,8 +1154,9 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
 reset_ram_globals();
 
 ram_bitmap_pages = last_ram_offset() >> TARGET_PAGE_BITS;
-migration_bitmap = bitmap_new(ram_bitmap_pages);
-bitmap_set(migration_bitmap, 0, ram_bitmap_pa

Re: [Qemu-devel] [RFC PATCH v3] os-android: Add support to android platform

2015-09-28 Thread Paolo Bonzini



On 24/09/2015 15:21, Houcheng Lin wrote:
> +if [ "$android" = "yes" ] ; then
> +  LIBS="-lglib-2.0 -lgthread-2.0 -lz -lpixman-1 -lintl -liconv -lc $LIBS"
> +  libs_qga="-lglib-2.0 -lgthread-2.0 -lz -lpixman-1 -lintl -liconv -lc"
> +fi

This change should not be necessary.

> +#define getdtablesize qemu_getdtablesize

Please instead replace all occurrences of getdtablesize with
qemu_getdtablesize.

> 
> +#ifdef CONFIG_ANDROID
> +#include "sysemu/os-android.h"
> +#endif
> +

Please replace this with

#include 

#ifndef IOV_MAX
#define IOV_MAX 1024
#endif

and get rid of os-android.h.

> 
> +#if defined(CONFIG_ANDROID)
> +char pty_buf[PATH_MAX];
> +#define ptsname(fd) pty_buf
> +#endif
>  const char *slave;
>  int mfd = -1, sfd = -1;
>  
> @@ -67,17 +72,21 @@ static int openpty(int *amaster, int *aslave, char *name,
>  
>  if (grantpt(mfd) == -1 || unlockpt(mfd) == -1)
>  goto err;
> -
> +#if defined(CONFIG_ANDROID)
> +if (ptsname_r(mfd, pty_buf, PATH_MAX) < 0)
> +goto err;
> +#endif
>  if ((slave = ptsname(mfd)) == NULL)
>  goto err;
>  


Better:

#if defined(CONFIG_ANDROID)
char slave[PATH_MAX];
#else
const char *slave;
#endif

...

#if defined(CONFIG_ANDROID)
if (ptsname_r(mfd, slave, PATH_MAX) < 0)
goto err;
#else
if ((slave = ptsname(mfd)) == NULL)
goto err;
#endif

Re: [Qemu-devel] [PATCH 1/1] migration: fix deadlock

2015-09-28 Thread Paolo Bonzini



On 28/09/2015 13:41, Denis V. Lunev wrote:
> Release qemu global mutex before call synchronize_rcu().
> synchronize_rcu() waiting for all readers to finish their critical
> sections. There is at least one critical section in which we try
> to get QGM (critical section is in address_space_rw() and
> prepare_mmio_access() is trying to aquire QGM).
> 
> Both functions (migration_end() and migration_bitmap_extend())
> are called from main thread which is holding QGM.
> 
> Thus there is a race condition that ends up with deadlock:
> main thread working thread
> Lock QGA|
> | Call KVM_EXIT_IO handler
> |   |
> |Open rcu reader's critical section
> Migration cleanup bh|
> |   |
> synchronize_rcu() is|
> waiting for readers |
> |prepare_mmio_access() is waiting for QGM
>   \   /
>  deadlock
> 
> The patch changes bitmap freeing from direct g_free after synchronize_rcu
> to free inside call_rcu.
> 
> Signed-off-by: Denis V. Lunev 
> Reported-by: Igor Redko 
> Tested-by: Igor Redko 
> CC: Anna Melekhova 
> CC: Juan Quintela 
> CC: Amit Shah 
> CC: Paolo Bonzini 
> CC: Wen Congyang 
> ---
>  migration/ram.c | 44 +++-
>  1 file changed, 27 insertions(+), 17 deletions(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index 7f007e6..e7c5bcf 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -221,12 +221,16 @@ static RAMBlock *last_seen_block;
>  /* This is the last block from where we have sent data */
>  static RAMBlock *last_sent_block;
>  static ram_addr_t last_offset;
> -static unsigned long *migration_bitmap;
>  static QemuMutex migration_bitmap_mutex;
>  static uint64_t migration_dirty_pages;
>  static uint32_t last_version;
>  static bool ram_bulk_stage;
>  
> +static struct BitmapRcu {
> +struct rcu_head rcu;
> +unsigned long *bmap;
> +} *migration_bitmap_rcu;
> +
>  struct CompressParam {
>  bool start;
>  bool done;
> @@ -508,7 +512,7 @@ ram_addr_t 
> migration_bitmap_find_and_reset_dirty(MemoryRegion *mr,
>  
>  unsigned long next;
>  
> -bitmap = atomic_rcu_read(&migration_bitmap);
> +bitmap = atomic_rcu_read(&migration_bitmap_rcu)->bmap;
>  if (ram_bulk_stage && nr > base) {
>  next = nr + 1;
>  } else {
> @@ -526,7 +530,7 @@ ram_addr_t 
> migration_bitmap_find_and_reset_dirty(MemoryRegion *mr,
>  static void migration_bitmap_sync_range(ram_addr_t start, ram_addr_t length)
>  {
>  unsigned long *bitmap;
> -bitmap = atomic_rcu_read(&migration_bitmap);
> +bitmap = atomic_rcu_read(&migration_bitmap_rcu)->bmap;
>  migration_dirty_pages +=
>  cpu_physical_memory_sync_dirty_bitmap(bitmap, start, length);
>  }
> @@ -1024,17 +1028,22 @@ void free_xbzrle_decoded_buf(void)
>  xbzrle_decoded_buf = NULL;
>  }
>  
> +static void migration_bitmap_free(struct BitmapRcu *bmap)
> +{
> +g_free(bmap->bmap);
> +g_free(bmap);
> +}
> +
>  static void migration_end(void)
>  {
>  /* caller have hold iothread lock or is in a bh, so there is
>   * no writing race against this migration_bitmap
>   */
> -unsigned long *bitmap = migration_bitmap;
> -atomic_rcu_set(&migration_bitmap, NULL);
> +struct BitmapRcu *bitmap = migration_bitmap_rcu;
> +atomic_rcu_set(&migration_bitmap_rcu, NULL);
>  if (bitmap) {
>  memory_global_dirty_log_stop();
> -synchronize_rcu();
> -g_free(bitmap);
> +call_rcu(bitmap, migration_bitmap_free, rcu);
>  }
>  
>  XBZRLE_cache_lock();
> @@ -1070,9 +1079,10 @@ void migration_bitmap_extend(ram_addr_t old, 
> ram_addr_t new)
>  /* called in qemu main thread, so there is
>   * no writing race against this migration_bitmap
>   */
> -if (migration_bitmap) {
> -unsigned long *old_bitmap = migration_bitmap, *bitmap;
> -bitmap = bitmap_new(new);
> +if (migration_bitmap_rcu) {
> +struct BitmapRcu *old_bitmap = migration_bitmap_rcu, *bitmap;
> +bitmap = g_new(struct BitmapRcu, 1);
> +bitmap->bmap = bitmap_new(new);
>  
>  /* prevent migration_bitmap content from being set bit
>   * by migration_bitmap_sync_range() at the same time.
> @@ -1080,13 +1090,12 @@ void migration_bitmap_extend(ram_addr_t old, 
> ram_addr_t new)
>   * at the same time.
>   */
>  qemu_mutex_lock(&migration_bitmap_mutex);
> -bitmap_copy(bitmap, old_bitmap, old);
> -bitmap_set(bitmap, old, new - old);
> -atomic_rcu_set(&migration_bitmap, bitmap);
> +bitmap_copy(bitmap->bmap, old_bitmap->bmap, old);
> +bitmap_set(bitmap->bmap, old, new - old);
> +atomic_rcu_set(&migration_bitmap_rcu, bitmap);
>  qemu_mutex_unlock(&migration_bitmap_mutex);
>  migration_dirty_pages += new - old;
> -synchronize_rcu();
> -g_free(old_bitmap);
> +call_rcu(ol

Re: [Qemu-devel] ARM CPU affinities

2015-09-28 Thread Pavel Fedin

 Hello!

> NB that as the comment says KVM currently imposes its own numbering
> anyway -- if you care about that you need to get the kernel to
> support having userspace tell it about affinity numbering.

 Yes. You cannot set MPIDR values for the KVM, because upon reset they are 
reverted to something which KVM thinks must be. IIRC it is 16 CPUs per cluster, 
starting from 0:0. So, currently you can do whatever you want only for TCG.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

[Qemu-devel] [PATCH] target-i386: add ABM to Haswell* and Broadwell* CPU models

2015-09-28 Thread Paolo Bonzini

ABM is only implemented as a single instruction set by AMD; all AMD
processors support both instructions or neither. Intel considers POPCNT
as part of SSE4.2, and LZCNT as part of BMI1, but Intel also uses AMD's
ABM flag to indicate support for both POPCNT and LZCNT.  It has to be
added to Haswell and Broadwell because Haswell, by adding LZCNT, has
completed the ABM.

Tested with "qemu-kvm -cpu Haswell-noTSX,enforce" (and also with older
machine types) on an Haswell-EP machine.

Signed-off-by: Paolo Bonzini 
---
 include/hw/i386/pc.h | 22 +-
 target-i386/cpu.c|  8 
 2 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index ab5413f..7e89c6a 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -297,7 +297,27 @@ int e820_get_num_entries(void);
 bool e820_get_entry(int, uint32_t, uint64_t *, uint64_t *);
 
 #define PC_COMPAT_2_4 \
-HW_COMPAT_2_4
+HW_COMPAT_2_4 \
+{\
+.driver   = "Haswell-" TYPE_X86_CPU,\
+.property = "abm",\
+.value= "off",\
+},\
+{\
+.driver   = "Haswell-noTSX-" TYPE_X86_CPU,\
+.property = "abm",\
+.value= "off",\
+},\
+{\
+.driver   = "Broadwell-" TYPE_X86_CPU,\
+.property = "abm",\
+.value= "off",\
+},\
+{\
+.driver   = "Broadwell-noTSX-" TYPE_X86_CPU,\
+.property = "abm",\
+.value= "off",\
+},
 
 #define PC_COMPAT_2_3 \
 PC_COMPAT_2_4 \
diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index bd411b9..d6994c3 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -1113,7 +1113,7 @@ static X86CPUDefinition builtin_x86_defs[] = {
 CPUID_EXT2_LM | CPUID_EXT2_RDTSCP | CPUID_EXT2_NX |
 CPUID_EXT2_SYSCALL,
 .features[FEAT_8000_0001_ECX] =
-CPUID_EXT3_LAHF_LM,
+CPUID_EXT3_ABM | CPUID_EXT3_LAHF_LM,
 .features[FEAT_7_0_EBX] =
 CPUID_7_0_EBX_FSGSBASE | CPUID_7_0_EBX_BMI1 |
 CPUID_7_0_EBX_AVX2 | CPUID_7_0_EBX_SMEP |
@@ -1148,7 +1148,7 @@ static X86CPUDefinition builtin_x86_defs[] = {
 CPUID_EXT2_LM | CPUID_EXT2_RDTSCP | CPUID_EXT2_NX |
 CPUID_EXT2_SYSCALL,
 .features[FEAT_8000_0001_ECX] =
-CPUID_EXT3_LAHF_LM,
+CPUID_EXT3_ABM | CPUID_EXT3_LAHF_LM,
 .features[FEAT_7_0_EBX] =
 CPUID_7_0_EBX_FSGSBASE | CPUID_7_0_EBX_BMI1 |
 CPUID_7_0_EBX_HLE | CPUID_7_0_EBX_AVX2 | CPUID_7_0_EBX_SMEP |
@@ -1185,7 +1185,7 @@ static X86CPUDefinition builtin_x86_defs[] = {
 CPUID_EXT2_LM | CPUID_EXT2_RDTSCP | CPUID_EXT2_NX |
 CPUID_EXT2_SYSCALL,
 .features[FEAT_8000_0001_ECX] =
-CPUID_EXT3_LAHF_LM | CPUID_EXT3_3DNOWPREFETCH,
+CPUID_EXT3_ABM | CPUID_EXT3_LAHF_LM | CPUID_EXT3_3DNOWPREFETCH,
 .features[FEAT_7_0_EBX] =
 CPUID_7_0_EBX_FSGSBASE | CPUID_7_0_EBX_BMI1 |
 CPUID_7_0_EBX_AVX2 | CPUID_7_0_EBX_SMEP |
@@ -1223,7 +1223,7 @@ static X86CPUDefinition builtin_x86_defs[] = {
 CPUID_EXT2_LM | CPUID_EXT2_RDTSCP | CPUID_EXT2_NX |
 CPUID_EXT2_SYSCALL,
 .features[FEAT_8000_0001_ECX] =
-CPUID_EXT3_LAHF_LM | CPUID_EXT3_3DNOWPREFETCH,
+CPUID_EXT3_ABM | CPUID_EXT3_LAHF_LM | CPUID_EXT3_3DNOWPREFETCH,
 .features[FEAT_7_0_EBX] =
 CPUID_7_0_EBX_FSGSBASE | CPUID_7_0_EBX_BMI1 |
 CPUID_7_0_EBX_HLE | CPUID_7_0_EBX_AVX2 | CPUID_7_0_EBX_SMEP |
-- 
2.5.0

Re: [Qemu-devel] Agenda items for today's MTTCG call

2015-09-28 Thread Claudio Fontana

On 28.09.2015 11:19, Alex Bennée wrote:
> User-agent: mu4e 0.9.12; emacs 24.5.50.4
> Hi,
> 
> It's been a while since we last synced up via phone call so could I add
> the following items for discussion:
> 
> 1. Status of Alvise and Fred's trees
> 2. Any cross-pollination ideas between Alvise's LL/SC work and Emilio's
> series
> 
> Emilio's series has a number of interesting features to it, not least
> that it works with softmmu and linux-user. However I do prefer the TCG
> op based approach of Alvise's tree. Is there any scope for getting the
> best of both worlds?
> 
> 3. Memory barriers
> 
> I now have working barrier tests which fail on real hardware when
> barriers don't exist. I'll be sending a new series soon. Again Emilio's
> tree introduces some barrier TCG ops.
> 
> 4. Other outstanding work
> 
> I've had a couple of people prod me with offers of help so it would be
> nice to see what other tasks are currently pending to avoid too much
> duplication of work.
> 
> Does anyone else have anything then want to bring up?
> 

Hi Alex, thanks for this and we'll hear in a few :-)

In general the way I see this is that we should start upstreaming one small 
thing at a time, rather than wait for a perfect solution which starts to 
include more and more patches and diverge more and more from upstream qemu.
I don't think we want to maintain a separate "MT" branch right?

Ciao,

Claudio

Re: [Qemu-devel] [PATCH 08/16] block: Manage backing file references in bdrv_set_backing_hd()

2015-09-28 Thread Alberto Garcia

On Thu 17 Sep 2015 03:48:12 PM CEST, Kevin Wolf  wrote:
> --- a/block/stream.c
> +++ b/block/stream.c
> @@ -55,28 +55,7 @@ static int coroutine_fn stream_populate(BlockDriverState 
> *bs,
>  static void close_unused_images(BlockDriverState *top, BlockDriverState 
> *base,
>  const char *base_id)
>  {
> -BlockDriverState *intermediate;
> -intermediate = top->backing ? top->backing->bs : NULL;
> -
> -/* Must assign before bdrv_delete() to prevent traversing dangling 
> pointer
> - * while we delete backing image instances.
> - */
>  bdrv_set_backing_hd(top, base);
> -
> -while (intermediate) {
> -BlockDriverState *unused;
> -
> -/* reached base */
> -if (intermediate == base) {
> -break;
> -}
> -
> -unused = intermediate;
> -intermediate = intermediate->backing ? intermediate->backing->bs : 
> NULL;
> -bdrv_set_backing_hd(unused, NULL);
> -bdrv_unref(unused);
> -}
> -
>  bdrv_refresh_limits(top, NULL);
>  }


bdrv_refresh_limits(bs, NULL) is already called at the end of
bdrv_set_backing_hd(), so there's no need to call it again here.

And then close_unused_images() is used only once in this file and it
would only contain the bdrv_set_backing_hd() call, so you don't need a
separate function anymore.

Berto

Re: [Qemu-devel] [PATCH 10/16] block/io: Make bdrv_requests_pending() public

2015-09-28 Thread Alberto Garcia

On Thu 17 Sep 2015 03:48:14 PM CEST, Kevin Wolf wrote:
> Signed-off-by: Kevin Wolf 
> ---
>  block/io.c| 2 +-
>  include/block/block_int.h | 1 +
>  2 files changed, 2 insertions(+), 1 deletion(-)

Reviewed-by: Alberto Garcia 

Berto

[Qemu-devel] [PATCH v5 1/1] block/gluster: add support for multiple gluster backup volfile servers

2015-09-28 Thread Prasanna Kumar Kalever

This patch adds a way to specify multiple volfile servers to the gluster
block backend of QEMU with tcp|rdma transport types and their port numbers.

Problem:

Currenly VM Image on gluster volume is specified like this:

file=gluster[+tcp]://server1[:port]/testvol/a.img

Assuming we have have three servers in trustred pool with replica 3 volume
in action and unfortunately server1 (mentioned in the command above) went down
for some reason, since the volume is replica 3 we now have other 2 servers
active from which we can boot the VM.

But currently there is no mechanism to pass the other 2 gluster server
addresses to qemu.

Solution:

New way of specifying VM Image on gluster volume with volfile servers:
(We still support old syntax to maintain backward compatibility)

Basic command line syntax looks like:

Pattern I:
 -drive driver=gluster,
volname=testvol,image-path=/path/a.raw,
volfile-servers.0.server=1.2.3.4,
   [volfile-servers.0.port=24007,]
   [volfile-servers.0.transport=tcp,]
volfile-servers.1.server=5.6.7.8,
   [volfile-servers.1.port=24008,]
   [volfile-servers.1.transport=rdma,] ...

Pattern II:
 'json:{"driver":"qcow2","file":{"driver":"gluster",
   "volname":"testvol","image-path":"/path/a.qcow2",
   "volfile-servers":[{tuple0},{tuple1}, ...{tupleN}]}}'

   driver   => 'gluster' (protocol name)
   volname  => name of gluster volume where our VM image resides
   image-path   => is the absolute path of image in gluster volume

  {tuple}   => {"server":"1.2.3.4"[,"port":"24007","transport":"tcp"]}

   server   => server address (hostname/ipv4/ipv6 addresses)
   port => port number on which glusterd is listening. (default 24007)
   tranport => transport type used to connect to gluster management daemon,
it can be tcp|rdma (default 'tcp')

Examples:
1.
 -drive driver=qcow2,file.driver=gluster,
file.volname=testvol,file.image-path=/path/a.qcow2,
file.volfile-servers.0.server=1.2.3.4,
file.volfile-servers.0.port=24007,
file.volfile-servers.0.transport=tcp,
file.volfile-servers.1.server=5.6.7.8,
file.volfile-servers.1.port=24008,
file.volfile-servers.1.transport=rdma
2.
 'json:{"driver":"qcow2","file":{"driver":"gluster","volname":"testvol",
 "image-path":"/path/a.qcow2","volfile-servers":
 [{"server":"1.2.3.4","port":"24007","transport":"tcp"},
  {"server":"4.5.6.7","port":"24008","transport":"rdma"}] } }'

This patch gives a mechanism to provide all the server addresses which are in
replica set, so in case server1 is down VM can still boot from any of the
active servers.

This is equivalent to the volfile-servers option supported by
mount.glusterfs (FUSE way of mounting gluster volume)

This patch depends on a recent fix in libgfapi raised as part of this work:
http://review.gluster.org/#/c/12114/

Credits: Sincere thanks to Kevin Wolf  and
"Deepak C Shetty"  for inputs and all their support

Signed-off-by: Prasanna Kumar Kalever 
---
v1:
multiple server addresses but common port number and transport type
pattern: URI syntax with query (?) delimitor
syntax:
file=gluster[+transport-type]://server1:24007/testvol/a.img\
 ?backup-volfile-servers=server2&backup-volfile-servers=server3

v2:
multiple server addresses each have their own port number, but all use
common transport type
pattern: URI syntax  with query (?) delimiter
syntax:
file=gluster[+transport-type]://[server[:port]]/testvol/a.img\
 [?backup-volfile-servers=server1[:port]\
  &backup-volfile-servers=server2[:port]]

v3:
multiple server addresses each have their own port number and transport type
pattern: changed to json
syntax:
'json:{"driver":"qcow2","file":{"driver":"gluster","volname":"testvol",
   "image-path":"/path/a.qcow2","backup-volfile-servers":
 [{"server":"1.2.3.4","port":"24007","transport":"tcp"},
  {"server":"4.5.6.7","port":"24008","transport":"rdma"}] } }'

v4, v5:
address comments from "Eric Blake" 

v5:
renamed option "backup-volfile-servers" as "volfile-servers"
pattern: json
syntax:
'json:{"driver":"qcow2","file":{"driver":"gluster","volname":"testvol",
   "image-path":"/path/a.qcow2","volfile-servers":
 [{"server":"1.2.3.4","port":"24007","transport":"tcp"},
  {"server":"4.5.6.7","port":"24008","transport":"rdma"}] } }'
---
 block/gluster.c  | 428 +--
 qapi/block-core.json |  60 +++-
 2 files changed, 439 insertions(+), 49 deletions(-)

diff --git a/block/gluster.c b/block/gluster.c
index 1eb3a8c..63c3dcb 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -11,6 +11,15 @@
 #include "block/block_int.h"
 #include "qemu/uri.h"
 
+#define GLUSTER_OPT_FILENAME  "filename"
+#define GLUSTER_OPT_VOLNAME   "volname"
+#define GLUSTER_OPT_IMAGE_PATH"image-path"
+#define GLUSTER_OPT_SERVER"serve

Re: [Qemu-devel] [PATCH 4/5] disk_deadlines: add control of requests time expiration

2015-09-28 Thread Stefan Hajnoczi

On Fri, Sep 25, 2015 at 01:34:22PM +0100, Dr. David Alan Gilbert wrote:
> * Stefan Hajnoczi (stefa...@gmail.com) wrote:
> > On Tue, Sep 08, 2015 at 04:48:24PM +0200, Kevin Wolf wrote:
> > > Am 08.09.2015 um 16:23 hat Denis V. Lunev geschrieben:
> > > > On 09/08/2015 04:05 PM, Kevin Wolf wrote:
> > > > >Am 08.09.2015 um 13:27 hat Denis V. Lunev geschrieben:
> > > > >>interesting point. Yes, it flushes all requests and most likely
> > > > >>hangs inside waiting requests to complete. But fortunately
> > > > >>this happens after the switch to paused state thus
> > > > >>the guest becomes paused. That's why I have missed this
> > > > >>fact.
> > > > >>
> > > > >>This (could) be considered as a problem but I have no (good)
> > > > >>solution at the moment. Should think a bit on.
> > > > >Let me suggest a radically different design. Note that I don't say this
> > > > >is necessarily how things should be done, I'm just trying to introduce
> > > > >some new ideas and broaden the discussion, so that we have a larger set
> > > > >of ideas from which we can pick the right solution(s).
> > > > >
> > > > >The core of my idea would be a new filter block driver 'timeout' that
> > > > >can be added on top of each BDS that could potentially fail, like a
> > > > >raw-posix BDS pointing to a file on NFS. This way most pieces of the
> > > > >solution are nicely modularised and don't touch the block layer core.
> > > > >
> > > > >During normal operation the driver would just be passing through
> > > > >requests to the lower layer. When it detects a timeout, however, it
> > > > >completes the request it received with -ETIMEDOUT. It also completes 
> > > > >any
> > > > >new request it receives with -ETIMEDOUT without passing the request on
> > > > >until the request that originally timed out returns. This is our safety
> > > > >measure against anyone seeing whether or how the timed out request
> > > > >modified data.
> > > > >
> > > > >We need to make sure that bdrv_drain() doesn't wait for this request.
> > > > >Possibly we need to introduce a .bdrv_drain callback that replaces the
> > > > >default handling, because bdrv_requests_pending() in the default
> > > > >handling considers bs->file, which would still have the timed out
> > > > >request. We don't want to see this; bdrv_drain_all() should complete
> > > > >even though that request is still pending internally (externally, we
> > > > >returned -ETIMEDOUT, so we can consider it completed). This way the
> > > > >monitor stays responsive and background jobs can go on if they don't 
> > > > >use
> > > > >the failing block device.
> > > > >
> > > > >And then we essentially reuse the rerror/werror mechanism that we
> > > > >already have to stop the VM. The device models would be extended to
> > > > >always stop the VM on -ETIMEDOUT, regardless of the error policy. In
> > > > >this state, the VM would even be migratable if you make sure that the
> > > > >pending request can't modify the image on the destination host any 
> > > > >more.
> > > > >
> > > > >Do you think this could work, or did I miss something important?
> > > > >
> > > > >Kevin
> > > > could I propose even more radical solution then?
> > > > 
> > > > My original approach was based on the fact that
> > > > this could should be maintainable out-of-stream.
> > > > If the patch will be merged - this boundary condition
> > > > could be dropped.
> > > > 
> > > > Why not to invent 'terror' field on BdrvOptions
> > > > and process things in core block layer without
> > > > a filter? RB Tree entry will just not created if
> > > > the policy will be set to 'ignore'.
> > > 
> > > 'terror' might not be the most fortunate name... ;-)
> > > 
> > > The reason why I would prefer a filter driver is so the code and the
> > > associated data structures are cleanly modularised and we can keep the
> > > actual block layer core small and clean. The same is true for some other
> > > functions that I would rather move out of the core into filter drivers
> > > than add new cases (e.g. I/O throttling, backup notifiers, etc.), but
> > > which are a bit harder to actually move because we already have old
> > > interfaces that we can't break (we'll probably do it anyway eventually,
> > > even if it needs a bit more compatibility code).
> > > 
> > > However, it seems that you are mostly touching code that is maintained
> > > by Stefan, and Stefan used to be a bit more open to adding functionality
> > > to the core, so my opinion might not be the last word.
> > 
> > I've been thinking more about the correctness of this feature:
> > 
> > QEMU cannot cancel I/O because there is no Linux userspace API for doing
> > so.  Linux AIO's io_cancel(2) syscall is a nop since file systems don't
> > implement a kiocb_cancel_fn.  Sending a signal to a task blocked in
> > O_DIRECT preadv(2)/pwritev(2) doesn't work either because the task is in
> > uninterruptible sleep.
> 
> There are things that work on some devices, but nothing generic.
> For NBD/iSCSI/(ceph?

Re: [Qemu-devel] [PATCH v5 11/46] qapi: Don't use info as witness of implicit object type

2015-09-28 Thread Markus Armbruster

Eric Blake  writes:

> A future patch will enable error reporting from the various
> check() methods.  But to report an error on an implicit type,
> we'll need to associate a location with the type (the same
> location as the top-level entity that is causing the creation
> of the implicit type), and once we do that, keying off of
> whether foo.info exists is no longer a viable way to determine
> if foo is an implicit type.

Ensuring error messages are good even for implicit types could be hard.
But pretty much anything's better than error messages without location
information.

> Rename the info member to _info (so that sub-classes can still
> use it, but external code should not), add an is_implicit()
> method to QAPISchemaObjectType, and adjust the visitor to pass
> another parameter about whether the type is implicit.

I have doubts on the rename.

When you create an stable interface for use in other programs,
religiously hiding instance variables behind accessor methods can pay.
But in a purely internal interface like this one, I don't see the point.

If we run into a case where we want to use a QAPISchemaEntity's info, I
want to write .info and be done with it.  If we rename it to _info now,
we'll rename it back then.

So far, we've used the '_' prefix only for instance variables that are
clearly internal.  Mostly for stuff flowing from __init__() to check().

> Signed-off-by: Eric Blake 
> ---
>  scripts/qapi-types.py  |  4 ++--
>  scripts/qapi-visit.py  |  4 ++--
>  scripts/qapi.py| 33 +++--
>  tests/qapi-schema/test-qapi.py |  2 +-
>  4 files changed, 24 insertions(+), 19 deletions(-)
>
> diff --git a/scripts/qapi-types.py b/scripts/qapi-types.py
> index b292682..aa25e03 100644
> --- a/scripts/qapi-types.py
> +++ b/scripts/qapi-types.py
> @@ -253,8 +253,8 @@ class QAPISchemaGenTypeVisitor(QAPISchemaVisitor):
>  self.decl += gen_array(name, element_type)
>  self._gen_type_cleanup(name)
>
> -def visit_object_type(self, name, info, base, members, variants):
> -if info:
> +def visit_object_type(self, name, info, base, members, variants, 
> implicit):

This is now right at the PEP8 line length limit, and the number of
parameters is getting unwieldy, too.  Hmm.

> +if not implicit:
>  self._fwdecl += gen_fwd_object_or_array(name)
>  if variants:
>  assert not members  # not implemented
> diff --git a/scripts/qapi-visit.py b/scripts/qapi-visit.py
> index 1f287ba..62a47fa 100644
> --- a/scripts/qapi-visit.py
> +++ b/scripts/qapi-visit.py
> @@ -348,8 +348,8 @@ class QAPISchemaGenVisitVisitor(QAPISchemaVisitor):
>  self.decl += decl
>  self.defn += defn
>
> -def visit_object_type(self, name, info, base, members, variants):
> -if info:
> +def visit_object_type(self, name, info, base, members, variants, 
> implicit):
> +if not implicit:
>  self.decl += gen_visit_decl(name)
>  if variants:
>  assert not members  # not implemented
> diff --git a/scripts/qapi.py b/scripts/qapi.py
> index 7275daa..1dc7641 100644
> --- a/scripts/qapi.py
> +++ b/scripts/qapi.py
> @@ -786,7 +786,7 @@ class QAPISchemaEntity(object):
>  def __init__(self, name, info):
>  assert isinstance(name, str)
>  self.name = name
> -self.info = info
> +self._info = info
>
>  def c_name(self):
>  return c_name(self.name)
> @@ -814,7 +814,7 @@ class QAPISchemaVisitor(object):
>  def visit_array_type(self, name, info, element_type):
>  pass
>
> -def visit_object_type(self, name, info, base, members, variants):
> +def visit_object_type(self, name, info, base, members, variants, 
> implicit):
>  pass
>
>  def visit_object_type_flat(self, name, info, members, variants):
> @@ -877,7 +877,7 @@ class QAPISchemaBuiltinType(QAPISchemaType):
>  return self._json_type_name
>
>  def visit(self, visitor):
> -visitor.visit_builtin_type(self.name, self.info, self.json_type())
> +visitor.visit_builtin_type(self.name, self._info, self.json_type())
>
>
>  class QAPISchemaEnumType(QAPISchemaType):
> @@ -903,7 +903,7 @@ class QAPISchemaEnumType(QAPISchemaType):
>  return 'string'
>
>  def visit(self, visitor):
> -visitor.visit_enum_type(self.name, self.info,
> +visitor.visit_enum_type(self.name, self._info,
>  self.values, self.prefix)
>
>
> @@ -922,7 +922,7 @@ class QAPISchemaArrayType(QAPISchemaType):
>  return 'array'
>
>  def visit(self, visitor):
> -visitor.visit_array_type(self.name, self.info, self.element_type)
> +visitor.visit_array_type(self.name, self._info, self.element_type)
>
>
>  class QAPISchemaObjectType(QAPISchemaType):
> @@ -961,21 +961,25 @@ class QAPISchemaObjectType(QAPISchemaType):
>  self.variants.che

Re: [Qemu-devel] [PATCH v5 12/46] qapi: Track location that created an implicit type

2015-09-28 Thread Markus Armbruster

Eric Blake  writes:

> A future patch will enable error detection in the various
> QapiSchema check() methods.  But since all errors have to
> have an associated 'info' location, we need a location to
> be associated with all implicit types.  Easiest is to reuse
> the location of the enclosing entity that includes the
> dictionary defining the implicit type.
>
> While at it, we were always passing None as the location of
> array types, making that parameter useless; sharing the
> location (if any) of the underlying element type makes sense.

The parameter is useless only because all array types are implicit.
Once we change that, it won't be anymore.

>
> Signed-off-by: Eric Blake 
> ---
>  scripts/qapi.py | 22 --
>  1 file changed, 12 insertions(+), 10 deletions(-)
>
> diff --git a/scripts/qapi.py b/scripts/qapi.py
> index 1dc7641..e982970 100644
> --- a/scripts/qapi.py
> +++ b/scripts/qapi.py
> @@ -908,8 +908,8 @@ class QAPISchemaEnumType(QAPISchemaType):
>
>
>  class QAPISchemaArrayType(QAPISchemaType):
> -def __init__(self, name, info, element_type):
> -QAPISchemaType.__init__(self, name, info)
> +def __init__(self, name, element_type):
> +QAPISchemaType.__init__(self, name, None)
>  assert isinstance(element_type, str)
>  self._element_type_name = element_type
>  self.element_type = None
> @@ -917,6 +917,7 @@ class QAPISchemaArrayType(QAPISchemaType):
>  def check(self, schema):
>  self.element_type = schema.lookup_type(self._element_type_name)
>  assert self.element_type
> +self._info = self.element_type._info
>
>  def json_type(self):
>  return 'array'

Implicit array type's info is the element type's info.  Okay.

> @@ -928,6 +929,7 @@ class QAPISchemaArrayType(QAPISchemaType):
>  class QAPISchemaObjectType(QAPISchemaType):
>  def __init__(self, name, info, base, local_members, variants):
>  QAPISchemaType.__init__(self, name, info)
> +assert info or name == ':empty'

I think what we really want to assert is "we got info unless this is a
built-in entity", in QAPISchemaEntity.__init__().

Built-in entities are exactly the types defined by
QAPISchema._def_predefineds(), currently the built-in types and
':empty'.

>  assert base is None or isinstance(base, str)
>  for m in local_members:
>  assert isinstance(m, QAPISchemaObjectTypeMember)
> @@ -1165,15 +1167,15 @@ class QAPISchema(object):
>  def _make_array_type(self, element_type):
>  name = element_type + 'List'
>  if not self.lookup_type(name):
> -self._def_entity(QAPISchemaArrayType(name, None, element_type))
> +self._def_entity(QAPISchemaArrayType(name, element_type))
>  return name
>
> -def _make_implicit_object_type(self, name, role, members):
> +def _make_implicit_object_type(self, name, info, role, members):
>  if not members:
>  return None
>  name = ':obj-%s-%s' % (name, role)
>  if not self.lookup_entity(name, QAPISchemaObjectType):
> -self._def_entity(QAPISchemaObjectType(name, None, None,
> +self._def_entity(QAPISchemaObjectType(name, info, None,
>members, None))
>  return name
>
> @@ -1210,11 +1212,11 @@ class QAPISchema(object):
>  def _make_variant(self, case, typ):
>  return QAPISchemaObjectTypeVariant(case, typ)
>
> -def _make_simple_variant(self, case, typ):
> +def _make_simple_variant(self, info, case, typ):
>  if isinstance(typ, list):
>  assert len(typ) == 1
>  typ = self._make_array_type(typ[0])
> -typ = self._make_implicit_object_type(typ, 'wrapper',
> +typ = self._make_implicit_object_type(typ, info, 'wrapper',
>[self._make_member('data', 
> typ)])
>  return QAPISchemaObjectTypeVariant(case, typ)
>
> @@ -1232,7 +1234,7 @@ class QAPISchema(object):
>  variants = [self._make_variant(key, value)
>  for (key, value) in data.iteritems()]
>  else:
> -variants = [self._make_simple_variant(key, value)
> +variants = [self._make_simple_variant(info, key, value)
>  for (key, value) in data.iteritems()]
>  tag_enum = self._make_tag_enum(name, variants)
>  self._def_entity(

A simple union type's implicit wrapper types' info is the simple union
type's info.  Okay.

> @@ -1263,7 +1265,7 @@ class QAPISchema(object):
>  gen = expr.get('gen', True)
>  success_response = expr.get('success-response', True)
>  if isinstance(data, OrderedDict):
> -data = self._make_implicit_object_type(name, 'arg',
> +data = self._make_implicit_object_type(name, info, 'arg',
> self._make_members(data))
>

Re: [Qemu-devel] [PATCH v5 00/46] post-introspection cleanups, and qapi-ify netdev_add

2015-09-28 Thread Markus Armbruster

I think the first ten patches are a about as large a bite as we can chew
and commit quickly.  If you split them off, I can try to get them in
quickly, and in parallel continue reviewing the rest of your series.

Re: [Qemu-devel] [PATCH 12/16] block: Introduce parents list

2015-09-28 Thread Alberto Garcia

On Thu 17 Sep 2015 03:48:16 PM CEST, Kevin Wolf wrote:

> @@ -1090,6 +1090,7 @@ static BdrvChild *bdrv_attach_child(BlockDriverState 
> *parent_bs,
>  };
>  
>  QLIST_INSERT_HEAD(&parent_bs->children, child, next);
> +QLIST_INSERT_HEAD(&child_bs->parents, child, next_parent);
>  
>  return child;
>  }

Ok, I'm probably slow today, but what is this used for? :-? And why is
it called 'parents'? The list doesn't contain pointers to the parents of
child_bs, but to child_bs itself...

I would expect a BdrvChild *parent, with parent->bs = parent_bs.

Berto

Re: [Qemu-devel] [PATCH 0/4] linux-user: CLI cleanup

2015-09-28 Thread Riku Voipio

On Mon, Jul 06, 2015 at 11:03:37AM -0700, mead...@codesourcery.com wrote:
> From: Meador Inge 
> 
> This patch series fixes a few nits in the Linux
> usermode driver to make the general behavior less
> surprising (proper error codes, --foo options, and
> -help) and to make it easier to discover bad command
> line input.

Applied to linux-user, thanks.

> Meador Inge (4):
>   linux-user: Exit 0 when -h is used
>   linux-user: Add -help
>   linux-user: Add proper error messages for bad options
>   linux-user: Treat --foo options the same as -foo
> 
>  linux-user/main.c | 30 --
>  1 file changed, 20 insertions(+), 10 deletions(-)
> 
> -- 
> 1.8.1.1
> 
>

Re: [Qemu-devel] [PATCH 1/4] linux-user: Exit 0 when -h is used

2015-09-28 Thread Riku Voipio

On Mon, Jul 13, 2015 at 01:08:02PM -0700, Meador Inge wrote:
> On Mon, Jul 06, 2015 at 09:43:20PM +0200, Laurent Vivier wrote:
> 
> > Global comment: you should use EXIT_SUCCESS and EXIT_FAILURE from stdlib.h
 
> On second thought, I was following an existing pattern in 'main.c'. Really
> fixing this would require changing around 30 other locations too.  I think
> if we are going to switch to the EXIT_* macros, then the whole file should
> be changed in one patch and making that fix shouldn't hold this patch up.

I've just done a commit to for wholesale transition. This code is
only ever compiled for Linux, so portability doesn't matter. I think
it makes the code a bit more self-documenting.

Riku

Re: [Qemu-devel] [PATCH 16/16] block: Remove bdrv_swap()

2015-09-28 Thread Alberto Garcia

On Thu 17 Sep 2015 03:48:20 PM CEST, Kevin Wolf wrote:
> bdrv_swap() is unused now. Remove it and all functions that have
> no other users than bdrv_swap(). In particular, this removes the
> .bdrv_rebind callbacks from block drivers.
>
> Signed-off-by: Kevin Wolf 

Reviewed-by: Alberto Garcia 

Berto

Re: [Qemu-devel] [PATCH 2/2] linux-user: remove MAX_ARG_PAGES limit

2015-09-28 Thread Riku Voipio

On Mon, Sep 14, 2015 at 09:37:10PM +0200, Stefan Bruens wrote:
> On Thursday 03 September 2015 18:27:20 Peter Maydell wrote:
> > On 2 September 2015 at 02:38, Stefan Brüns  
> wrote:
> > > Instead of creating a temporary copy for the whole environment and
> > > the arguments, directly copy everything to the target stack.
> > > 
> > > For this to work, we have to change the order of stack creation and
> > > copying the arguments.
> > > 
> > > Signed-off-by: Stefan Brüns 
> > 
> > Reviewed-by: Peter Maydell 
> > 
> > Thanks for working through the code review process.
> > 
> > thanks
> > -- PMM
> 
> Hi Peter,
> 
> is there anything that needs to be done before picking up these two patches?

Hi,

Both patches applied to linux-user tree, thanks

Riku

Re: [Qemu-devel] [PATCH] fw_cfg: insert string blobs via qemu cmdline

2015-09-28 Thread Laszlo Ersek

On 09/27/15 00:55, Gabriel L. Somlo wrote:
> Allow users to provide custom fw_cfg blobs with ascii string
> payloads specified directly on the qemu command line.
> 
> Suggested-by: Jordan Justen 
> Suggested-by: Laszlo Ersek 
> Signed-off-by: Gabriel Somlo 
> ---
>  docs/specs/fw_cfg.txt |  5 +
>  qemu-options.hx   |  7 ++-
>  vl.c  | 30 --
>  3 files changed, 35 insertions(+), 7 deletions(-)
> 
> diff --git a/docs/specs/fw_cfg.txt b/docs/specs/fw_cfg.txt
> index b5f4b5d..4d85701 100644
> --- a/docs/specs/fw_cfg.txt
> +++ b/docs/specs/fw_cfg.txt
> @@ -236,6 +236,11 @@ the following syntax:
>  where  is the fw_cfg item name, and  is the location
>  on the host file system of a file containing the data to be inserted.
>  
> +Small enough items may be provided directly as strings on the command
> +line, using the syntax:
> +
> +-fw_cfg [name=],content=
> +

Please consider spelling out that these blobs will NOT be NUL-terminated
when viewed on the guest. (It kinda follows from all the other fw_cfg
things, but once we leave host-side files for qemu command line strings,
it might become non-obvious to users.)

So something like, "... the content presented to the guest will not be
NUL-terminated, same as if the string was written to a host-side file
first".

Please also stress that such content (and actually name too, but that's
more generic) should be plain ASCII; let's sidestep any tricks a shell's
locale settings could pull on us.

>  NOTE: Users *SHOULD* choose item names beginning with the prefix "opt/"
>  when using the "-fw_cfg" command line option, to avoid conflicting with
>  item names used internally by QEMU. For instance:
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 328404c..0b6f5bd 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -2724,13 +2724,18 @@ ETEXI
>  
>  DEF("fw_cfg", HAS_ARG, QEMU_OPTION_fwcfg,
>  "-fw_cfg [name=],file=\n"
> -"add named fw_cfg entry from file\n",
> +"add named fw_cfg entry from file\n"
> +"-fw_cfg [name=],content=\n"
> +"add named fw_cfg entry from string\n",

Looks good. I got worried for a second that spelling out -fw_cfg twice
is not consistent with the rest of the documentation, but there are many
other such examples (-smbios, -netdev, -net, -chardev etc).

>  QEMU_ARCH_ALL)
>  STEXI
>  @item -fw_cfg [name=]@var{name},file=@var{file}
>  @findex -fw_cfg
>  Add named fw_cfg entry from file. @var{name} determines the name of
>  the entry in the fw_cfg file directory exposed to the guest.
> +
> +@item -fw_cfg [name=]@var{name},content=@var{str}
> +Add named fw_cfg entry from string.
>  ETEXI
>  
>  DEF("serial", HAS_ARG, QEMU_OPTION_serial, \

Looks good. I checked with -chardev, and indeed @findex stands only
after the first occurrence of the option.

> diff --git a/vl.c b/vl.c
> index e211f6a..7f35f40 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -512,6 +512,10 @@ static QemuOptsList qemu_fw_cfg_opts = {
>  .type = QEMU_OPT_STRING,
>  .help = "Sets the name of the file from which\n"
>  "the fw_cfg blob will be loaded",
> +}, {
> +.name = "content",
> +.type = QEMU_OPT_STRING,
> +.help = "Sets the content of the fw_cfg blob to be inserted",
>  },
>  { /* end of list */ }
>  },
> @@ -2236,7 +2240,7 @@ static int parse_fw_cfg(void *opaque, QemuOpts *opts, 
> Error **errp)
>  {
>  gchar *buf;
>  size_t size;
> -const char *name, *file;
> +const char *name, *file, *content;
>  
>  if (opaque == NULL) {
>  error_report("fw_cfg device not available");
> @@ -2244,8 +2248,17 @@ static int parse_fw_cfg(void *opaque, QemuOpts *opts, 
> Error **errp)
>  }
>  name = qemu_opt_get(opts, "name");
>  file = qemu_opt_get(opts, "file");
> -if (name == NULL || *name == '\0' || file == NULL || *file == '\0') {
> -error_report("invalid argument value");
> +content = qemu_opt_get(opts, "content");
> +
> +#define HAVE_OPT(arg) ((arg) && (*arg))

Not sure if this is usual in QEMU, but if it is, please also #undef the
macro after you are done using it.

Also, I recommend renaming HAVE_OPT() to NONEMPTY_STR().

> +
> +/* we need name and either a file or the content string */
> +if (!(HAVE_OPT(name) && (HAVE_OPT(file) || HAVE_OPT(content {
> +error_report("invalid argument(s)!");

Please drop the exclamation mark.

> +return -1;
> +}
> +if (HAVE_OPT(file) && HAVE_OPT(content)) {
> +error_report("file and content are mutually exclusive!");

Ditto.

>  return -1;
>  }
>  if (strlen(name) > FW_CFG_MAX_FILE_PATH - 1) {
> @@ -2256,9 +2269,14 @@ static int parse_fw_cfg(void *opaque, QemuOpts *opts, 
> Error **errp)
>  error_report("WARNING: externally provided fw_cfg item names "
>   "should be prefixed with \"o

Re: [Qemu-devel] [PATCH] linux-user: fix cmsg conversion in case of multiple headers

2015-09-28 Thread Riku Voipio

Hi,

On Thu, Sep 03, 2015 at 07:27:26AM +0200, Jonathan Neuschäfer wrote:
> Currently, __target_cmsg_nxthdr compares a pointer derived from
> target_cmsg against the msg_control field of target_msgh (through
> subtraction).  This failed for me when emulating i386 code under x86_64,
> because pointers in the host address space and pointers in the guest
> address space were not the same.  This patch passes the initial value of
> target_cmsg into __target_cmsg_nxthdr.
> 
> I found and fixed two more related bugs:
> - __target_cmsg_nxthdr now returns the new cmsg pointer instead of the
>   old one.
> - tgt_space (in host_to_target_cmsg) doesn't count "sizeof (struct
>   target_cmsghdr)" twice anymore.

Applied to linux-user tree, thanks.

> Signed-off-by: Jonathan Neuschäfer 
> --

For next time, use three dashes "---" to separate version-to-version
change descriptions from commit messages. This way git will
automatically leave them out when doing git am.

> Changes since v2:
> - The patch is now clean WRT checkpatch.pl
> Changes since v1:
> - Follow Peter Maydell's advice on how to fix the first bug
> - The "two more related bugs"
> ---
>  linux-user/syscall.c  | 14 +-
>  linux-user/syscall_defs.h | 14 +-
>  2 files changed, 18 insertions(+), 10 deletions(-)
> 
> diff --git a/linux-user/syscall.c b/linux-user/syscall.c
> index f62c698..12a6cd2 100644
> --- a/linux-user/syscall.c
> +++ b/linux-user/syscall.c
> @@ -1181,7 +1181,7 @@ static inline abi_long target_to_host_cmsg(struct 
> msghdr *msgh,
>  struct cmsghdr *cmsg = CMSG_FIRSTHDR(msgh);
>  abi_long msg_controllen;
>  abi_ulong target_cmsg_addr;
> -struct target_cmsghdr *target_cmsg;
> +struct target_cmsghdr *target_cmsg, *target_cmsg_start;
>  socklen_t space = 0;
>  
>  msg_controllen = tswapal(target_msgh->msg_controllen);
> @@ -1189,6 +1189,7 @@ static inline abi_long target_to_host_cmsg(struct 
> msghdr *msgh,
>  goto the_end;
>  target_cmsg_addr = tswapal(target_msgh->msg_control);
>  target_cmsg = lock_user(VERIFY_READ, target_cmsg_addr, msg_controllen, 
> 1);
> +target_cmsg_start = target_cmsg;
>  if (!target_cmsg)
>  return -TARGET_EFAULT;
>  
> @@ -1247,7 +1248,8 @@ static inline abi_long target_to_host_cmsg(struct 
> msghdr *msgh,
>  }
>  
>  cmsg = CMSG_NXTHDR(msgh, cmsg);
> -target_cmsg = TARGET_CMSG_NXTHDR(target_msgh, target_cmsg);
> +target_cmsg = TARGET_CMSG_NXTHDR(target_msgh, target_cmsg,
> + target_cmsg_start);
>  }
>  unlock_user(target_cmsg, target_cmsg_addr, 0);
>   the_end:
> @@ -1261,7 +1263,7 @@ static inline abi_long host_to_target_cmsg(struct 
> target_msghdr *target_msgh,
>  struct cmsghdr *cmsg = CMSG_FIRSTHDR(msgh);
>  abi_long msg_controllen;
>  abi_ulong target_cmsg_addr;
> -struct target_cmsghdr *target_cmsg;
> +struct target_cmsghdr *target_cmsg, *target_cmsg_start;
>  socklen_t space = 0;
>  
>  msg_controllen = tswapal(target_msgh->msg_controllen);
> @@ -1269,6 +1271,7 @@ static inline abi_long host_to_target_cmsg(struct 
> target_msghdr *target_msgh,
>  goto the_end;
>  target_cmsg_addr = tswapal(target_msgh->msg_control);
>  target_cmsg = lock_user(VERIFY_WRITE, target_cmsg_addr, msg_controllen, 
> 0);
> +target_cmsg_start = target_cmsg;
>  if (!target_cmsg)
>  return -TARGET_EFAULT;
>  
> @@ -1382,14 +1385,15 @@ static inline abi_long host_to_target_cmsg(struct 
> target_msghdr *target_msgh,
>  }
>  
>  target_cmsg->cmsg_len = tswapal(tgt_len);
> -tgt_space = TARGET_CMSG_SPACE(tgt_len);
> +tgt_space = TARGET_CMSG_SPACE(len);
>  if (msg_controllen < tgt_space) {
>  tgt_space = msg_controllen;
>  }
>  msg_controllen -= tgt_space;
>  space += tgt_space;
>  cmsg = CMSG_NXTHDR(msgh, cmsg);
> -target_cmsg = TARGET_CMSG_NXTHDR(target_msgh, target_cmsg);
> +target_cmsg = TARGET_CMSG_NXTHDR(target_msgh, target_cmsg,
> + target_cmsg_start);
>  }
>  unlock_user(target_cmsg, target_cmsg_addr, space);
>   the_end:
> diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h
> index edd5f3c..9d3c537 100644
> --- a/linux-user/syscall_defs.h
> +++ b/linux-user/syscall_defs.h
> @@ -234,7 +234,8 @@ struct target_cmsghdr {
>  };
>  
>  #define TARGET_CMSG_DATA(cmsg) ((unsigned char *) ((struct target_cmsghdr *) 
> (cmsg) + 1))
> -#define TARGET_CMSG_NXTHDR(mhdr, cmsg) __target_cmsg_nxthdr (mhdr, cmsg)
> +#define TARGET_CMSG_NXTHDR(mhdr, cmsg, cmsg_start) \
> +   __target_cmsg_nxthdr(mhdr, cmsg, cmsg_start)
>  #define TARGET_CMSG_ALIGN(len) (((len) + sizeof (abi_long) - 1) \
> & (size_t) ~(sizeof (abi_long) - 1))
>  #define TARGET_CMSG_SPACE(len) (TARGET_CMSG_ALIGN (len) \
> @@ -242,17 +243,20 @

[Qemu-devel] [RFC PATCH 1/6] kvm: Make KVM_CAP_SIGNAL_MSI globally available

2015-09-28 Thread Pavel Fedin

This capability is useful to determine whether we can use KVM ITS
emulation on ARM

Signed-off-by: Pavel Fedin 
---
 include/sysemu/kvm.h |  9 +
 kvm-all.c| 10 +-
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index 2a58b4d..d42c464 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -52,6 +52,7 @@ extern bool kvm_msi_via_irqfd_allowed;
 extern bool kvm_gsi_routing_allowed;
 extern bool kvm_gsi_direct_mapping;
 extern bool kvm_readonly_mem_allowed;
+extern bool kvm_direct_msi_allowed;
 
 #if defined CONFIG_KVM || !defined NEED_CPU_H
 #define kvm_enabled()   (kvm_allowed)
@@ -145,6 +146,13 @@ extern bool kvm_readonly_mem_allowed;
  */
 #define kvm_readonly_mem_enabled() (kvm_readonly_mem_allowed)
 
+/**
+ * kvm_direct_msi_enabled:
+ *
+ * Returns: true if KVM allows direct MSI injection.
+ */
+#define kvm_direct_msi_enabled() (kvm_direct_msi_allowed)
+
 #else
 #define kvm_enabled()   (0)
 #define kvm_irqchip_in_kernel() (false)
@@ -157,6 +165,7 @@ extern bool kvm_readonly_mem_allowed;
 #define kvm_gsi_routing_allowed() (false)
 #define kvm_gsi_direct_mapping() (false)
 #define kvm_readonly_mem_enabled() (false)
+#define kvm_direct_msi_enabled() (false)
 #endif
 
 struct kvm_run;
diff --git a/kvm-all.c b/kvm-all.c
index 0be4615..4931b27 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -93,7 +93,6 @@ struct KVMState
 uint32_t *used_gsi_bitmap;
 unsigned int gsi_count;
 QTAILQ_HEAD(msi_hashtab, KVMMSIRoute) msi_hashtab[KVM_MSI_HASHTAB_SIZE];
-bool direct_msi;
 #endif
 KVMMemoryListener memory_listener;
 };
@@ -111,6 +110,7 @@ bool kvm_gsi_direct_mapping;
 bool kvm_allowed;
 bool kvm_readonly_mem_allowed;
 bool kvm_vm_attributes_allowed;
+bool kvm_direct_msi_allowed;
 
 static const KVMCapabilityInfo kvm_required_capabilites[] = {
 KVM_CAP_INFO(USER_MEMORY),
@@ -979,7 +979,7 @@ void kvm_init_irq_routing(KVMState *s)
 s->irq_routes = g_malloc0(sizeof(*s->irq_routes));
 s->nr_allocated_irq_routes = 0;
 
-if (!s->direct_msi) {
+if (!kvm_direct_msi_allowed) {
 for (i = 0; i < KVM_MSI_HASHTAB_SIZE; i++) {
 QTAILQ_INIT(&s->msi_hashtab[i]);
 }
@@ -1113,7 +1113,7 @@ static int kvm_irqchip_get_virq(KVMState *s)
  * number can succeed even though a new route entry cannot be added.
  * When this happens, flush dynamic MSI entries to free IRQ route entries.
  */
-if (!s->direct_msi && s->irq_routes->nr == s->gsi_count) {
+if (!kvm_direct_msi_allowed && s->irq_routes->nr == s->gsi_count) {
 kvm_flush_dynamic_msi_routes(s);
 }
 
@@ -1150,7 +1150,7 @@ int kvm_irqchip_send_msi(KVMState *s, MSIMessage msg)
 struct kvm_msi msi;
 KVMMSIRoute *route;
 
-if (s->direct_msi) {
+if (kvm_direct_msi_allowed) {
 msi.address_lo = (uint32_t)msg.address;
 msi.address_hi = msg.address >> 32;
 msi.data = le32_to_cpu(msg.data);
@@ -1598,7 +1598,7 @@ static int kvm_init(MachineState *ms)
 #endif
 
 #ifdef KVM_CAP_IRQ_ROUTING
-s->direct_msi = (kvm_check_extension(s, KVM_CAP_SIGNAL_MSI) > 0);
+kvm_direct_msi_allowed = (kvm_check_extension(s, KVM_CAP_SIGNAL_MSI) > 0);
 #endif
 
 s->intx_set_mask = kvm_check_extension(s, KVM_CAP_PCI_2_3);
-- 
2.4.4

[Qemu-devel] [RFC PATCH 3/6] Add vGICv3 ITS definitions

2015-09-28 Thread Pavel Fedin

This temporary patch adds kernel API definitions. Use proper header update
procedure after these features are released.

Signed-off-by: Pavel Fedin 
---
 linux-headers/asm-arm64/kvm.h | 1 +
 linux-headers/linux/kvm.h | 9 +++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/linux-headers/asm-arm64/kvm.h b/linux-headers/asm-arm64/kvm.h
index c8abf25..07a02da 100644
--- a/linux-headers/asm-arm64/kvm.h
+++ b/linux-headers/asm-arm64/kvm.h
@@ -81,6 +81,7 @@ struct kvm_regs {
 /* Supported VGICv3 address types  */
 #define KVM_VGIC_V3_ADDR_TYPE_DIST 2
 #define KVM_VGIC_V3_ADDR_TYPE_REDIST   3
+#define KVM_VGIC_V3_ADDR_TYPE_ITS  4
 
 #define KVM_VGIC_V3_DIST_SIZE  SZ_64K
 #define KVM_VGIC_V3_REDIST_SIZE(2 * SZ_64K)
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index 683f713..1294a8f 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -831,7 +831,10 @@ struct kvm_irq_routing_msi {
__u32 address_lo;
__u32 address_hi;
__u32 data;
-   __u32 pad;
+   union {
+   __u32 pad;
+   __u32 devid;
+   };
 };
 
 struct kvm_irq_routing_s390_adapter {
@@ -970,12 +973,14 @@ struct kvm_one_reg {
__u64 addr;
 };
 
+#define KVM_MSI_VALID_DEVID(1U << 0)
 struct kvm_msi {
__u32 address_lo;
__u32 address_hi;
__u32 data;
__u32 flags;
-   __u8  pad[16];
+   __u32 devid;
+   __u8  pad[12];
 };
 
 struct kvm_arm_device_addr {
-- 
2.4.4

[Qemu-devel] [RFC PATCH 0/6] vITS support

2015-09-28 Thread Pavel Fedin

This series introduces support for in-kernel GICv3 ITS emulation.
It is based on kernel API which is not released yet, therefore i post it
as an RFC. However, if respective maintainers think this is appropriate,
infrastructure-related parts of this set (patch N1 and parts of patch N4)
can be applied right now, in order to simplify future addition.

Kernel patches which implement this functionality are:
- [PATCH v2 00/15] KVM: arm64: GICv3 ITS emulation
  http://www.spinics.net/lists/arm-kernel/msg430724.html
- [PATCH v3 0/7] KVM: arm/arm64: gsi routing support
  http://www.spinics.net/lists/kvm/msg119567.html

Pavel Fedin (6):
  kvm: Make KVM_CAP_SIGNAL_MSI globally available
  hw/intc: Implement ITS base class
  Add vGICv3 ITS definitions
  kvm: Implement passing device ID to MSI routing functions
  kvm_arm: Implement support for ITS emulation by KVM
  arm/virt: Add ITS to the virt board

 hw/arm/virt.c  | 47 ++---
 hw/i386/kvm/pci-assign.c   |  9 ++--
 hw/intc/Makefile.objs  |  2 +
 hw/intc/arm_gicv3_its_common.c | 96 ++
 hw/intc/arm_gicv3_its_kvm.c| 88 +++
 hw/vfio/pci.c  | 11 ++--
 hw/virtio/virtio-pci.c |  5 +-
 include/hw/intc/arm_gicv3_its_common.h | 63 ++
 include/sysemu/kvm.h   | 15 +-
 kvm-all.c  | 27 ++
 kvm-stub.c |  5 +-
 linux-headers/asm-arm64/kvm.h  |  1 +
 linux-headers/linux/kvm.h  |  9 +++-
 target-arm/kvm_arm.h   | 10 
 target-arm/machine.c   | 16 ++
 15 files changed, 372 insertions(+), 32 deletions(-)
 create mode 100644 hw/intc/arm_gicv3_its_common.c
 create mode 100644 hw/intc/arm_gicv3_its_kvm.c
 create mode 100644 include/hw/intc/arm_gicv3_its_common.h

-- 
2.4.4

[Qemu-devel] [RFC PATCH 4/6] kvm: Implement passing device ID to MSI routing functions

2015-09-28 Thread Pavel Fedin

Routing add/update functions now take additional PCIDevice pointer. Also,
in order to provide backwards compatibility with older kernels, a new
kvm_msi_flags global variable is provided, and machines, wishing to use
new MSI features, must set appropriates flags in it.

Signed-off-by: Pavel Fedin 
---
 hw/i386/kvm/pci-assign.c |  9 +
 hw/vfio/pci.c| 11 ++-
 hw/virtio/virtio-pci.c   |  5 +++--
 include/sysemu/kvm.h |  6 --
 kvm-all.c| 17 +
 kvm-stub.c   |  5 +++--
 6 files changed, 34 insertions(+), 19 deletions(-)

diff --git a/hw/i386/kvm/pci-assign.c b/hw/i386/kvm/pci-assign.c
index b1beaa6..132f249 100644
--- a/hw/i386/kvm/pci-assign.c
+++ b/hw/i386/kvm/pci-assign.c
@@ -980,7 +980,7 @@ static void assigned_dev_update_msi(PCIDevice *pci_dev)
 MSIMessage msg = msi_get_message(pci_dev, 0);
 int virq;
 
-virq = kvm_irqchip_add_msi_route(kvm_state, msg);
+virq = kvm_irqchip_add_msi_route(kvm_state, msg, pci_dev);
 if (virq < 0) {
 perror("assigned_dev_update_msi: kvm_irqchip_add_msi_route");
 return;
@@ -1018,7 +1018,7 @@ static void assigned_dev_update_msi_msg(PCIDevice 
*pci_dev)
 }
 
 kvm_irqchip_update_msi_route(kvm_state, assigned_dev->msi_virq[0],
- msi_get_message(pci_dev, 0));
+ msi_get_message(pci_dev, 0), pci_dev);
 }
 
 static bool assigned_dev_msix_masked(MSIXTableEntry *entry)
@@ -1084,7 +1084,7 @@ static int assigned_dev_update_msix_mmio(PCIDevice 
*pci_dev)
 
 msg.address = entry->addr_lo | ((uint64_t)entry->addr_hi << 32);
 msg.data = entry->data;
-r = kvm_irqchip_add_msi_route(kvm_state, msg);
+r = kvm_irqchip_add_msi_route(kvm_state, msg, pci_dev);
 if (r < 0) {
 return r;
 }
@@ -1603,7 +1603,8 @@ static void assigned_dev_msix_mmio_write(void *opaque, 
hwaddr addr,
 msg.data = entry->data;
 
 ret = kvm_irqchip_update_msi_route(kvm_state,
-   adev->msi_virq[i], msg);
+   adev->msi_virq[i], msg,
+   pci_dev);
 if (ret) {
 error_report("Error updating irq routing entry (%d)", ret);
 }
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index dcabb6d..8fadbcf 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -424,7 +424,7 @@ static void vfio_add_kvm_msi_virq(VFIOPCIDevice *vdev, 
VFIOMSIVector *vector,
 return;
 }
 
-virq = kvm_irqchip_add_msi_route(kvm_state, *msg);
+virq = kvm_irqchip_add_msi_route(kvm_state, *msg, &vdev->pdev);
 if (virq < 0) {
 event_notifier_cleanup(&vector->kvm_interrupt);
 return;
@@ -449,9 +449,10 @@ static void vfio_remove_kvm_msi_virq(VFIOMSIVector *vector)
 event_notifier_cleanup(&vector->kvm_interrupt);
 }
 
-static void vfio_update_kvm_msi_virq(VFIOMSIVector *vector, MSIMessage msg)
+static void vfio_update_kvm_msi_virq(VFIOMSIVector *vector, MSIMessage msg,
+ PCIDevice *pdev)
 {
-kvm_irqchip_update_msi_route(kvm_state, vector->virq, msg);
+kvm_irqchip_update_msi_route(kvm_state, vector->virq, msg, pdev);
 }
 
 static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
@@ -486,7 +487,7 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, 
unsigned int nr,
 if (!msg) {
 vfio_remove_kvm_msi_virq(vector);
 } else {
-vfio_update_kvm_msi_virq(vector, *msg);
+vfio_update_kvm_msi_virq(vector, *msg, pdev);
 }
 } else {
 vfio_add_kvm_msi_virq(vdev, vector, msg, true);
@@ -760,7 +761,7 @@ static void vfio_update_msi(VFIOPCIDevice *vdev)
 }
 
 msg = msi_get_message(&vdev->pdev, i);
-vfio_update_kvm_msi_virq(vector, msg);
+vfio_update_kvm_msi_virq(vector, msg, &vdev->pdev);
 }
 }
 
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index eda8205..8f0a063 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -590,7 +590,7 @@ static int kvm_virtio_pci_vq_vector_use(VirtIOPCIProxy 
*proxy,
 int ret;
 
 if (irqfd->users == 0) {
-ret = kvm_irqchip_add_msi_route(kvm_state, msg);
+ret = kvm_irqchip_add_msi_route(kvm_state, msg, &proxy->pci_dev);
 if (ret < 0) {
 return ret;
 }
@@ -726,7 +726,8 @@ static int virtio_pci_vq_vector_unmask(VirtIOPCIProxy 
*proxy,
 if (proxy->vector_irqfd) {
 irqfd = &proxy->vector_irqfd[vector];
 if (irqfd->msg.data != msg.data || irqfd->msg.address != msg.address) {
-ret = kvm_irqchip_update_msi_route(kvm_state, irqfd->virq, msg);
+ret = kvm_irqchip_update_msi_route(kvm_state, irqfd->virq, msg,
+   &proxy

[Qemu-devel] [RFC PATCH 5/6] kvm_arm: Implement support for ITS emulation by KVM

2015-09-28 Thread Pavel Fedin

This patch relies on new kernel API which is not released yet.

Signed-off-by: Pavel Fedin 
---
 hw/intc/Makefile.objs   |  1 +
 hw/intc/arm_gicv3_its_kvm.c | 88 +
 2 files changed, 89 insertions(+)
 create mode 100644 hw/intc/arm_gicv3_its_kvm.c

diff --git a/hw/intc/Makefile.objs b/hw/intc/Makefile.objs
index 2d6543b..8d5 100644
--- a/hw/intc/Makefile.objs
+++ b/hw/intc/Makefile.objs
@@ -19,6 +19,7 @@ common-obj-$(CONFIG_OPENPIC) += openpic.o
 obj-$(CONFIG_APIC) += apic.o apic_common.o
 obj-$(CONFIG_ARM_GIC_KVM) += arm_gic_kvm.o
 obj-$(call land,$(CONFIG_ARM_GIC_KVM),$(TARGET_AARCH64)) += arm_gicv3_kvm.o
+obj-$(call land,$(CONFIG_ARM_GIC_KVM),$(TARGET_AARCH64)) += arm_gicv3_its_kvm.o
 obj-$(CONFIG_STELLARIS) += armv7m_nvic.o
 obj-$(CONFIG_EXYNOS4) += exynos4210_gic.o exynos4210_combiner.o
 obj-$(CONFIG_GRLIB) += grlib_irqmp.o
diff --git a/hw/intc/arm_gicv3_its_kvm.c b/hw/intc/arm_gicv3_its_kvm.c
new file mode 100644
index 000..2bbf607
--- /dev/null
+++ b/hw/intc/arm_gicv3_its_kvm.c
@@ -0,0 +1,88 @@
+/*
+ * KVM-based ITS implementation for a GICv3-based system
+ *
+ * Copyright (c) 2015 Samsung Electronics Co., Ltd.
+ * Written by Pavel Fedin 
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see .
+ */
+
+#include "hw/intc/arm_gicv3_its_common.h"
+#include "sysemu/kvm.h"
+#include "kvm_arm.h"
+
+#define TYPE_KVM_ARM_ITS "arm-its-kvm"
+#define KVM_ARM_ITS(obj) OBJECT_CHECK(GICv3ITSState, (obj), TYPE_KVM_ARM_ITS)
+
+static int kvm_its_send_msi(GICv3ITSState *s, uint32_t value, uint16_t devid)
+{
+struct kvm_msi msi;
+
+msi.address_lo = 0x0040;
+msi.address_hi = 0;
+msi.data = value;
+msi.flags = KVM_MSI_VALID_DEVID;
+msi.devid = devid;
+memset(msi.pad, 0, sizeof(msi.pad));
+
+return kvm_vm_ioctl(kvm_state, KVM_SIGNAL_MSI, &msi);
+}
+
+static void kvm_arm_its_realize(DeviceState *dev, Error **errp)
+{
+GICv3ITSState *s = ARM_GICV3_ITS_COMMON(dev);
+
+gicv3_its_init_mmio(s, NULL);
+kvm_arm_register_device(&s->iomem_its_cntrl, -1, KVM_DEV_ARM_VGIC_GRP_ADDR,
+KVM_VGIC_V3_ADDR_TYPE_ITS, s->gicv3->dev_fd);
+
+kvm_msi_flags = KVM_MSI_VALID_DEVID;
+kvm_gsi_routing_allowed = kvm_has_gsi_routing();
+kvm_msi_via_irqfd_allowed = kvm_gsi_routing_allowed;
+}
+
+static void kvm_arm_its_init(Object *obj)
+{
+GICv3ITSState *s = KVM_ARM_ITS(obj);
+
+object_property_add_link(obj, "parent-gicv3",
+ "kvm-arm-gicv3", (Object **)&s->gicv3,
+ object_property_allow_set_link,
+ OBJ_PROP_LINK_UNREF_ON_RELEASE,
+ &error_abort);
+}
+
+static void kvm_arm_its_class_init(ObjectClass *klass, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(klass);
+GICv3ITSCommonClass *icc = ARM_GICV3_ITS_COMMON_CLASS(klass);
+
+dc->realize = kvm_arm_its_realize;
+icc->send_msi = kvm_its_send_msi;
+}
+
+static const TypeInfo kvm_arm_its_info = {
+.name = TYPE_KVM_ARM_ITS,
+.parent = TYPE_ARM_GICV3_ITS_COMMON,
+.instance_size = sizeof(GICv3ITSState),
+.instance_init = kvm_arm_its_init,
+.class_init = kvm_arm_its_class_init,
+};
+
+static void kvm_arm_its_register_types(void)
+{
+type_register_static(&kvm_arm_its_info);
+}
+
+type_init(kvm_arm_its_register_types)
-- 
2.4.4

[Qemu-devel] [RFC PATCH 6/6] arm/virt: Add ITS to the virt board

2015-09-28 Thread Pavel Fedin

If supported by the configuration, ITS will be added automatically.

This patch also renames v2m_phandle to msi_phandle because it's now used
by both MSI implementations.

Signed-off-by: Pavel Fedin 
---
 hw/arm/virt.c | 47 +--
 1 file changed, 41 insertions(+), 6 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index d25d6cf..be5879e 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -70,7 +70,7 @@ typedef struct VirtBoardInfo {
 int fdt_size;
 uint32_t clock_phandle;
 uint32_t gic_phandle;
-uint32_t v2m_phandle;
+uint32_t msi_phandle;
 } VirtBoardInfo;
 
 typedef struct {
@@ -351,9 +351,22 @@ static void fdt_add_cpu_nodes(const VirtBoardInfo *vbi)
 }
 }
 
+static void fdt_add_its_gic_node(VirtBoardInfo *vbi)
+{
+vbi->msi_phandle = qemu_fdt_alloc_phandle(vbi->fdt);
+qemu_fdt_add_subnode(vbi->fdt, "/intc/its");
+qemu_fdt_setprop_string(vbi->fdt, "/intc/its", "compatible",
+"arm,gic-v3-its");
+qemu_fdt_setprop(vbi->fdt, "/intc/its", "msi-controller", NULL, 0);
+qemu_fdt_setprop_sized_cells(vbi->fdt, "/intc/its", "reg",
+ 2, vbi->memmap[VIRT_GIC_ITS].base,
+ 2, vbi->memmap[VIRT_GIC_ITS].size);
+qemu_fdt_setprop_cell(vbi->fdt, "/intc/its", "phandle", vbi->msi_phandle);
+}
+
 static void fdt_add_v2m_gic_node(VirtBoardInfo *vbi)
 {
-vbi->v2m_phandle = qemu_fdt_alloc_phandle(vbi->fdt);
+vbi->msi_phandle = qemu_fdt_alloc_phandle(vbi->fdt);
 qemu_fdt_add_subnode(vbi->fdt, "/intc/v2m");
 qemu_fdt_setprop_string(vbi->fdt, "/intc/v2m", "compatible",
 "arm,gic-v2m-frame");
@@ -361,7 +374,7 @@ static void fdt_add_v2m_gic_node(VirtBoardInfo *vbi)
 qemu_fdt_setprop_sized_cells(vbi->fdt, "/intc/v2m", "reg",
  2, vbi->memmap[VIRT_GIC_V2M].base,
  2, vbi->memmap[VIRT_GIC_V2M].size);
-qemu_fdt_setprop_cell(vbi->fdt, "/intc/v2m", "phandle", vbi->v2m_phandle);
+qemu_fdt_setprop_cell(vbi->fdt, "/intc/v2m", "phandle", vbi->msi_phandle);
 }
 
 static void fdt_add_gic_node(VirtBoardInfo *vbi, int type)
@@ -397,6 +410,26 @@ static void fdt_add_gic_node(VirtBoardInfo *vbi, int type)
 qemu_fdt_setprop_cell(vbi->fdt, "/intc", "phandle", vbi->gic_phandle);
 }
 
+static void create_its(VirtBoardInfo *vbi, DeviceState *gicdev)
+{
+const char *itsclass = its_class_name();
+DeviceState *dev;
+
+if (!itsclass) {
+/* Do nothing if not supported */
+return;
+}
+
+dev = qdev_create(NULL, itsclass);
+
+object_property_set_link(OBJECT(dev), OBJECT(gicdev), "parent-gicv3",
+ &error_abort);
+qdev_init_nofail(dev);
+sysbus_mmio_map(SYS_BUS_DEVICE(dev), 0, vbi->memmap[VIRT_GIC_ITS].base);
+
+fdt_add_its_gic_node(vbi);
+}
+
 static void create_v2m(VirtBoardInfo *vbi, qemu_irq *pic)
 {
 int i;
@@ -480,7 +513,9 @@ static void create_gic(VirtBoardInfo *vbi, qemu_irq *pic, 
int type, bool secure)
 
 fdt_add_gic_node(vbi, type);
 
-if (type == 2) {
+if (type == 3) {
+create_its(vbi, gicdev);
+} else {
 create_v2m(vbi, pic);
 }
 }
@@ -799,9 +834,9 @@ static void create_pcie(const VirtBoardInfo *vbi, qemu_irq 
*pic,
 qemu_fdt_setprop_cells(vbi->fdt, nodename, "bus-range", 0,
nr_pcie_buses - 1);
 
-if (vbi->v2m_phandle) {
+if (vbi->msi_phandle) {
 qemu_fdt_setprop_cells(vbi->fdt, nodename, "msi-parent",
-   vbi->v2m_phandle);
+   vbi->msi_phandle);
 }
 
 qemu_fdt_setprop_sized_cells(vbi->fdt, nodename, "reg",
-- 
2.4.4

[Qemu-devel] [RFC PATCH 2/6] hw/intc: Implement ITS base class

2015-09-28 Thread Pavel Fedin

This is the basic skeleton for both KVM and software-emulated ITS.
Currently it is mostly a placeholder, however in future it is going to
contain device state necessary for live migration

Signed-off-by: Pavel Fedin 
---
 hw/intc/Makefile.objs  |  1 +
 hw/intc/arm_gicv3_its_common.c | 96 ++
 include/hw/intc/arm_gicv3_its_common.h | 63 ++
 target-arm/kvm_arm.h   | 10 
 target-arm/machine.c   | 16 ++
 5 files changed, 186 insertions(+)
 create mode 100644 hw/intc/arm_gicv3_its_common.c
 create mode 100644 include/hw/intc/arm_gicv3_its_common.h

diff --git a/hw/intc/Makefile.objs b/hw/intc/Makefile.objs
index 004b0c2..2d6543b 100644
--- a/hw/intc/Makefile.objs
+++ b/hw/intc/Makefile.objs
@@ -13,6 +13,7 @@ common-obj-$(CONFIG_ARM_GIC) += arm_gic_common.o
 common-obj-$(CONFIG_ARM_GIC) += arm_gic.o
 common-obj-$(CONFIG_ARM_GIC) += arm_gicv2m.o
 common-obj-$(CONFIG_ARM_GIC) += arm_gicv3_common.o
+common-obj-$(CONFIG_ARM_GIC) += arm_gicv3_its_common.o
 common-obj-$(CONFIG_OPENPIC) += openpic.o
 
 obj-$(CONFIG_APIC) += apic.o apic_common.o
diff --git a/hw/intc/arm_gicv3_its_common.c b/hw/intc/arm_gicv3_its_common.c
new file mode 100644
index 000..f62728e
--- /dev/null
+++ b/hw/intc/arm_gicv3_its_common.c
@@ -0,0 +1,96 @@
+/*
+ * ITS base class for a GICv3-based system
+ *
+ * Copyright (c) 2015 Samsung Electronics Co., Ltd.
+ * Written by Pavel Fedin
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see .
+ */
+
+#include "hw/pci/msi.h"
+#include "hw/intc/arm_gicv3_its_common.h"
+
+static uint64_t gicv3_its_trans_read(void *opaque, hwaddr offset, unsigned 
size)
+{
+qemu_log_mask(LOG_GUEST_ERROR, "ITS read at offset 0x%jX\n", offset);
+return ~0ULL;
+}
+
+static MemTxResult gicv3_its_trans_write(void *opaque, hwaddr offset,
+ uint64_t value, unsigned size,
+ MemTxAttrs attrs)
+{
+if (offset == 0x0040) {
+GICv3ITSState *s = ARM_GICV3_ITS_COMMON(opaque);
+GICv3ITSCommonClass *c = ARM_GICV3_ITS_COMMON_GET_CLASS(s);
+int ret = c->send_msi(s, le32_to_cpu(value), attrs.stream_id);
+
+if (ret) {
+qemu_log_mask(LOG_GUEST_ERROR,
+  "ITS: Error sending MSI: %s\n", strerror(-ret));
+return MEMTX_DECODE_ERROR;
+}
+
+return MEMTX_OK;
+} else {
+qemu_log_mask(LOG_GUEST_ERROR,
+  "ITS write at bad offset 0x%jX\n", offset);
+return MEMTX_DECODE_ERROR;
+}
+}
+
+static const MemoryRegionOps gicv3_its_trans_ops = {
+.read = gicv3_its_trans_read,
+.write_with_attrs = gicv3_its_trans_write,
+.impl = {
+ .min_access_size = 4,
+ .max_access_size = 4,
+ },
+.endianness = DEVICE_NATIVE_ENDIAN,
+};
+
+void gicv3_its_init_mmio(GICv3ITSState *s, const MemoryRegionOps *ops)
+{
+SysBusDevice *sbd = SYS_BUS_DEVICE(s);
+
+memory_region_init_io(&s->iomem_its_cntrl, OBJECT(s), ops, s,
+  "control", ITS_CONTROL_SIZE);
+memory_region_init_io(&s->iomem_its, OBJECT(s), &gicv3_its_trans_ops, s,
+  "translation", ITS_TRANS_SIZE);
+
+/* Our two regions are always adjacent, therefore we now combine them
+ * into a single one in order to make our users' life easier.
+ */
+memory_region_init(&s->iomem_main, OBJECT(s), "gicv3_its", ITS_SIZE);
+memory_region_add_subregion(&s->iomem_main, 0, &s->iomem_its_cntrl);
+memory_region_add_subregion(&s->iomem_main, ITS_CONTROL_SIZE,
+&s->iomem_its);
+sysbus_init_mmio(sbd, &s->iomem_main);
+
+msi_supported = true;
+}
+
+static const TypeInfo gicv3_its_common_info = {
+.name = TYPE_ARM_GICV3_ITS_COMMON,
+.parent = TYPE_SYS_BUS_DEVICE,
+.instance_size = sizeof(GICv3ITSState),
+.class_size = sizeof(GICv3ITSCommonClass),
+};
+
+static void gicv3_its_common_register_types(void)
+{
+type_register_static(&gicv3_its_common_info);
+}
+
+type_init(gicv3_its_common_register_types)
diff --git a/include/hw/intc/arm_gicv3_its_common.h 
b/include/hw/intc/arm_gicv3_its_common.h
new file mode 100644
index 000..5538bc6
--- /dev/null
+++ b/include/hw/intc/arm_gicv3_its_common.h
@@ -0,0 +1,63 @@
+/*
+ * ITS support for A

Re: [Qemu-devel] [PATCH v2] linux-user: add name_to_handle_at/open_by_handle_at

2015-09-28 Thread Riku Voipio

On Tue, Sep 01, 2015 at 10:27:33PM +0200, Laurent Vivier wrote:
> This patch allows to run example given by open_by_handle_at(2):
> 
>   The following shell session demonstrates the use of these two programs:
> 
>$ echo 'Can you please think about it?' > cecilia.txt
>$ ./t_name_to_handle_at cecilia.txt > fh
>$ ./t_open_by_handle_at < fh
>open_by_handle_at: Operation not permitted
>$ sudo ./t_open_by_handle_at < fh  # Need CAP_SYS_ADMIN
>Read 31 bytes
>$ rm cecilia.txt
> 
>Now  we delete and (quickly) re-create the file so that it has the same
>content and (by chance) the  same  inode.[...]
> 
>$ stat --printf="%i\n" cecilia.txt # Display inode number
>4072121
>$ rm cecilia.txt
>$ echo 'Can you please think about it?' > cecilia.txt
>$ stat --printf="%i\n" cecilia.txt # Check inode number
>4072121
>$ sudo ./t_open_by_handle_at < fh
>open_by_handle_at: Stale NFS file handle
> 
> See the man page for source code.

Applied to linux-user,
Thanks

> Signed-off-by: Laurent Vivier 
> ---
> v2: Rename function parameters, swap fh->handle_type in both functions,
> When I have re-tested the patch I have found a bug in the size of
> the data, so I fix this too (handle_bytes is the size of f_handle[]
> only, not of the whole structure)
> 
>  linux-user/syscall.c | 98 
> 
>  1 file changed, 98 insertions(+)
> 
> diff --git a/linux-user/syscall.c b/linux-user/syscall.c
> index f62c698..8a17351 100644
> --- a/linux-user/syscall.c
> +++ b/linux-user/syscall.c
> @@ -5246,6 +5246,94 @@ static int do_futex(target_ulong uaddr, int op, int 
> val, target_ulong timeout,
>  return -TARGET_ENOSYS;
>  }
>  }
> +#if defined(TARGET_NR_name_to_handle_at) && defined(CONFIG_OPEN_BY_HANDLE)
> +static abi_long do_name_to_handle_at(abi_long dirfd, abi_long pathname,
> + abi_long handle, abi_long mount_id,
> + abi_long flags)
> +{
> +struct file_handle *target_fh;
> +struct file_handle *fh;
> +int mid = 0;
> +abi_long ret;
> +char *name;
> +unsigned int size, total_size;
> +
> +if (get_user_s32(size, handle)) {
> +return -TARGET_EFAULT;
> +}
> +
> +name = lock_user_string(pathname);
> +if (!name) {
> +return -TARGET_EFAULT;
> +}
> +
> +total_size = sizeof(struct file_handle) + size;
> +target_fh = lock_user(VERIFY_WRITE, handle, total_size, 0);
> +if (!target_fh) {
> +unlock_user(name, pathname, 0);
> +return -TARGET_EFAULT;
> +}
> +
> +fh = g_malloc0(total_size);
> +fh->handle_bytes = size;
> +
> +ret = get_errno(name_to_handle_at(dirfd, path(name), fh, &mid, flags));
> +unlock_user(name, pathname, 0);
> +
> +/* man name_to_handle_at(2):
> + * Other than the use of the handle_bytes field, the caller should treat
> + * the file_handle structure as an opaque data type
> + */
> +
> +memcpy(target_fh, fh, total_size);
> +target_fh->handle_bytes = tswap32(fh->handle_bytes);
> +target_fh->handle_type = tswap32(fh->handle_type);
> +g_free(fh);
> +unlock_user(target_fh, handle, total_size);
> +
> +if (put_user_s32(mid, mount_id)) {
> +return -TARGET_EFAULT;
> +}
> +
> +return ret;
> +
> +}
> +#endif
> +
> +#if defined(TARGET_NR_open_by_handle_at) && defined(CONFIG_OPEN_BY_HANDLE)
> +static abi_long do_open_by_handle_at(abi_long mount_fd, abi_long handle,
> + abi_long flags)
> +{
> +struct file_handle *target_fh;
> +struct file_handle *fh;
> +unsigned int size, total_size;
> +abi_long ret;
> +
> +if (get_user_s32(size, handle)) {
> +return -TARGET_EFAULT;
> +}
> +
> +total_size = sizeof(struct file_handle) + size;
> +target_fh = lock_user(VERIFY_READ, handle, total_size, 1);
> +if (!target_fh) {
> +return -TARGET_EFAULT;
> +}
> +
> +fh = g_malloc0(total_size);
> +memcpy(fh, target_fh, total_size);
> +fh->handle_bytes = size;
> +fh->handle_type = tswap32(target_fh->handle_type);
> +
> +ret = get_errno(open_by_handle_at(mount_fd, fh,
> +target_to_host_bitmask(flags, fcntl_flags_tbl)));
> +
> +g_free(fh);
> +
> +unlock_user(target_fh, handle, total_size);
> +
> +return ret;
> +}
> +#endif
>  
>  /* Map host to target signal numbers for the wait family of syscalls.
> Assume all other status bits are the same.  */
> @@ -5655,6 +5743,16 @@ abi_long do_syscall(void *cpu_env, int num, abi_long 
> arg1,
>arg4));
>  unlock_user(p, arg2, 0);
>  break;
> +#if defined(TARGET_NR_name_to_handle_at) && defined(CONFIG_OPEN_BY_HANDLE)
> +case TARGET_NR_

1 2 3 >

1 - 100 of 248 matches

Mail list logo