date:20170629

Re: [Qemu-devel] [PATCH v1] s390x/cpumodel: allow to enable "idtes" feature for TCG

2017-06-29 Thread Thomas Huth

On 28.06.2017 19:02, David Hildenbrand wrote:
> On 28.06.2017 16:21, Thomas Huth wrote:
>> On 27.06.2017 18:10, David Hildenbrand wrote:
>>> STFL bit 4 and 5 are just indications to the guest, which TLB entries an
>>> IDTE call will clear. These are performance indicators for the guest.
>>>
>>> STFL bit 4:
>>> INVALIDATE DAT TABLE ENTRY (IDTE) performs
>>> the invalidation-and-clearing operation by
>>> selectively clearing TLB segment-table entries
>>> when a segment-table entry or entries are
>>> invalidated. IDTE also performs the clearing-by-
>>> ASCE operation. Unless bit 4 is one, IDTE simply
>>> purges all TLBs. Bit 3 is one if bit 4 is one.
>>>
>>> We can simply set STFL bit 4 ("idtes") and still purge the complete TLB.
>>> Purging more than advertised is never bad. E.g. Linux doesn't even care
>>> about this bit. We can optimize this later.
>>
>> Not sure, but why do we need this bit in add_qemu_cpu_model_features()
>> if Linux does not care about it? We will get it automatically once we
>> support the z9 in TCG...
> 
> The idea is to use this as a list we support in addition to the z900
> features. This is later helpful when actually switching to a new model
> (z9 might still take some time). Nobody has to go over all features
> again and see if they are implemented.

OK, I agree, that makes sense.

However, I'm not sure whether you can simply ignore the clearing-by-ASCE
stuff in this case. For example, according to the PoP:

"When the clearing-by-ASCE-option bit (bit 52 of general
 register R2 is one), the M4 field is ignored."

And the idte helper function currently always takes the M4 field into
account...

 Thomas
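
To make the concern concrete, here is a minimal sketch of the check the PoP sentence implies. It is not the QEMU idte helper; idte_effective_m4() and the macro name are made up for illustration, and "ignoring M4" is modelled simply as treating it as zero.

```c
#include <assert.h>
#include <stdint.h>

/* The PoP numbers bits from the most significant end, so bit 52 of a
 * 64-bit register is mask 1 << (63 - 52) = 0x800. */
#define IDTE_CLEARING_BY_ASCE (UINT64_C(1) << (63 - 52))

/* Hypothetical helper: return the M4 value an IDTE implementation should
 * act on. With the clearing-by-ASCE option selected, M4 is ignored, which
 * this sketch models by returning 0. */
static uint8_t idte_effective_m4(uint64_t r2, uint8_t m4)
{
    assert(m4 <= 0xf);              /* M4 is a 4-bit mask field */
    if (r2 & IDTE_CLEARING_BY_ASCE) {
        return 0;
    }
    return m4;
}
```

A helper that always honours M4 would behave differently from this sketch exactly when bit 52 of R2 is set and M4 is non-zero.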

[Qemu-devel] [PATCH 0/2] Pending MTTCG patches

2017-06-29 Thread Pranith Kumar

Hello,

Please find these two pending MTTCG fixes I have in my repo.

I've reworked the async_safe_* patch according to pbonzini's suggestion.

Thanks,

Pranith Kumar (2):
  Revert "exec.c: Fix breakpoint invalidation race"
  mttcg/i386: Patch instruction using async_safe_* framework

 exec.c | 25 ++-
 hw/i386/kvmvapic.c | 73 +++---
 2 files changed, 61 insertions(+), 37 deletions(-)

-- 
2.13.0

[Qemu-devel] [PATCH v3 1/2] Revert "exec.c: Fix breakpoint invalidation race"

2017-06-29 Thread Pranith Kumar

Now that we have proper locking after MTTCG patches have landed, we
can revert the commit.  This reverts commit

a9353fe897ca2687e5b3385ed39e3db3927a90e0.

CC: Peter Maydell 
CC: Alex Bennée 
Signed-off-by: Pranith Kumar 
---
 exec.c | 25 +++--
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/exec.c b/exec.c
index 42ad1eaedd..c8403baa46 100644
--- a/exec.c
+++ b/exec.c
@@ -770,15 +770,28 @@ void cpu_exec_realizefn(CPUState *cpu, Error **errp)
 #endif
 }
 
+#if defined(CONFIG_USER_ONLY)
 static void breakpoint_invalidate(CPUState *cpu, target_ulong pc)
 {
-/* Flush the whole TB as this will not have race conditions
- * even if we don't have proper locking yet.
- * Ideally we would just invalidate the TBs for the
- * specified PC.
- */
-tb_flush(cpu);
+mmap_lock();
+tb_lock();
+tb_invalidate_phys_page_range(pc, pc + 1, 0);
+tb_unlock();
+mmap_unlock();
 }
+#else
+static void breakpoint_invalidate(CPUState *cpu, target_ulong pc)
+{
+MemTxAttrs attrs;
+hwaddr phys = cpu_get_phys_page_attrs_debug(cpu, pc, &attrs);
+int asidx = cpu_asidx_from_attrs(cpu, attrs);
+if (phys != -1) {
+/* Locks grabbed by tb_invalidate_phys_addr */
+tb_invalidate_phys_addr(cpu->cpu_ases[asidx].as,
+phys | (pc & ~TARGET_PAGE_MASK));
+}
+}
+#endif
 
 #if defined(CONFIG_USER_ONLY)
 void cpu_watchpoint_remove_all(CPUState *cpu, int mask)
-- 
2.13.0
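
The softmmu variant above passes "phys | (pc & ~TARGET_PAGE_MASK)" to the invalidation function. A small sketch of that address arithmetic follows, assuming 4 KiB pages; PAGE_BITS, PAGE_MASK and phys_addr_of() are illustrative names, not QEMU definitions.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_BITS 12                                  /* assumed page size: 4 KiB */
#define PAGE_MASK (~((UINT64_C(1) << PAGE_BITS) - 1)) /* keeps the page number */

/* Combine the physical page base returned by a page lookup with the
 * in-page offset taken from the virtual PC. */
static uint64_t phys_addr_of(uint64_t phys_page, uint64_t pc)
{
    assert((phys_page & ~PAGE_MASK) == 0);   /* page base must be aligned */
    return phys_page | (pc & ~PAGE_MASK);
}
```

For example, a breakpoint at virtual PC 0xffff1234 whose page maps to physical page 0x7000 yields physical address 0x7234.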

[Qemu-devel] [PATCH v3 2/2] mttcg/i386: Patch instruction using async_safe_* framework

2017-06-29 Thread Pranith Kumar

In mttcg, calling pause_all_vcpus() during execution from the
generated TBs causes a deadlock if some vCPU is waiting for exclusive
execution in start_exclusive(). Fix this by using the async_safe_*
framework instead of pausing vcpus for patching instructions.

CC: Paolo Bonzini 
CC: Peter Maydell 
Reviewed-by: Richard Henderson 
Reviewed-by: Alex Bennée 
Signed-off-by: Pranith Kumar 
---
 hw/i386/kvmvapic.c | 73 +++---
 1 file changed, 42 insertions(+), 31 deletions(-)

diff --git a/hw/i386/kvmvapic.c b/hw/i386/kvmvapic.c
index 82a49556af..5e0c8219b0 100644
--- a/hw/i386/kvmvapic.c
+++ b/hw/i386/kvmvapic.c
@@ -383,8 +383,7 @@ static void patch_byte(X86CPU *cpu, target_ulong addr, uint8_t byte)
 cpu_memory_rw_debug(CPU(cpu), addr, &byte, 1, 1);
 }
 
-static void patch_call(VAPICROMState *s, X86CPU *cpu, target_ulong ip,
-   uint32_t target)
+static void patch_call(X86CPU *cpu, target_ulong ip, uint32_t target)
 {
 uint32_t offset;
 
@@ -393,23 +392,24 @@ static void patch_call(VAPICROMState *s, X86CPU *cpu, target_ulong ip,
 cpu_memory_rw_debug(CPU(cpu), ip + 1, (void *)&offset, sizeof(offset), 1);
 }
 
-static void patch_instruction(VAPICROMState *s, X86CPU *cpu, target_ulong ip)
+struct PatchInfo {
+VAPICHandlers *handler;
+target_ulong ip;
+};
+
+static void do_patch_instruction(CPUState *cs, run_on_cpu_data data)
 {
-CPUState *cs = CPU(cpu);
-CPUX86State *env = &cpu->env;
-VAPICHandlers *handlers;
+X86CPU *x86_cpu = X86_CPU(cs);
+CPUX86State *env = &x86_cpu->env;
+struct PatchInfo *info = (struct PatchInfo *) data.host_ptr;
+VAPICHandlers *handlers = info->handler;
+target_ulong ip = info->ip;
 uint8_t opcode[2];
 uint32_t imm32 = 0;
 target_ulong current_pc = 0;
 target_ulong current_cs_base = 0;
 uint32_t current_flags = 0;
 
-if (smp_cpus == 1) {
-handlers = &s->rom_state.up;
-} else {
-handlers = &s->rom_state.mp;
-}
-
 if (!kvm_enabled()) {
 cpu_get_tb_cpu_state(env, &current_pc, &current_cs_base,
  &current_flags);
@@ -421,48 +421,59 @@ static void patch_instruction(VAPICROMState *s, X86CPU *cpu, target_ulong ip)
 }
 }
 
-pause_all_vcpus();
-
 cpu_memory_rw_debug(cs, ip, opcode, sizeof(opcode), 0);
 
 switch (opcode[0]) {
 case 0x89: /* mov r32 to r/m32 */
-patch_byte(cpu, ip, 0x50 + modrm_reg(opcode[1]));  /* push reg */
-patch_call(s, cpu, ip + 1, handlers->set_tpr);
+patch_byte(x86_cpu, ip, 0x50 + modrm_reg(opcode[1]));  /* push reg */
+patch_call(x86_cpu, ip + 1, handlers->set_tpr);
 break;
 case 0x8b: /* mov r/m32 to r32 */
-patch_byte(cpu, ip, 0x90);
-patch_call(s, cpu, ip + 1, handlers->get_tpr[modrm_reg(opcode[1])]);
+patch_byte(x86_cpu, ip, 0x90);
+patch_call(x86_cpu, ip + 1, handlers->get_tpr[modrm_reg(opcode[1])]);
 break;
 case 0xa1: /* mov abs to eax */
-patch_call(s, cpu, ip, handlers->get_tpr[0]);
+patch_call(x86_cpu, ip, handlers->get_tpr[0]);
 break;
 case 0xa3: /* mov eax to abs */
-patch_call(s, cpu, ip, handlers->set_tpr_eax);
+patch_call(x86_cpu, ip, handlers->set_tpr_eax);
 break;
 case 0xc7: /* mov imm32, r/m32 (c7/0) */
-patch_byte(cpu, ip, 0x68);  /* push imm32 */
+patch_byte(x86_cpu, ip, 0x68);  /* push imm32 */
 cpu_memory_rw_debug(cs, ip + 6, (void *)&imm32, sizeof(imm32), 0);
 cpu_memory_rw_debug(cs, ip + 1, (void *)&imm32, sizeof(imm32), 1);
-patch_call(s, cpu, ip + 5, handlers->set_tpr);
+patch_call(x86_cpu, ip + 5, handlers->set_tpr);
 break;
 case 0xff: /* push r/m32 */
-patch_byte(cpu, ip, 0x50); /* push eax */
-patch_call(s, cpu, ip + 1, handlers->get_tpr_stack);
+patch_byte(x86_cpu, ip, 0x50); /* push eax */
+patch_call(x86_cpu, ip + 1, handlers->get_tpr_stack);
 break;
 default:
 abort();
 }
 
-resume_all_vcpus();
+g_free(info);
+}
 
-if (!kvm_enabled()) {
-/* Both tb_lock and iothread_mutex will be reset when
- *  longjmps back into the cpu_exec loop. */
-tb_lock();
-tb_gen_code(cs, current_pc, current_cs_base, current_flags, 1);
-cpu_loop_exit_noexc(cs);
+static void patch_instruction(VAPICROMState *s, X86CPU *cpu, target_ulong ip)
+{
+CPUState *cs = CPU(cpu);
+VAPICHandlers *handlers;
+struct PatchInfo *info;
+const run_on_cpu_func fn = do_patch_instruction;
+
+if (smp_cpus == 1) {
+handlers = &s->rom_state.up;
+} else {
+handlers = &s->rom_state.mp;
 }
+
+info  = g_new(struct PatchInfo, 1);
+info->handler = handlers;
+info->ip = ip;
+
+async_safe_run_on_cpu(cs, fn, RUN_ON_CPU_HOST_PTR(info));
+cpu_exit(cs);
 }
 
 void vapic_report_tpr_access(DeviceState *dev, CPUState *cs
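
The ownership pattern in this patch (the caller heap-allocates a request, queues it, and the deferred callback frees it) can be sketched in isolation as below. This is only an illustration with a single-threaded toy queue. It does not model the exclusive-execution guarantee of async_safe_run_on_cpu(), and every name in it is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

typedef void (*work_fn)(void *opaque);

struct WorkItem {
    work_fn fn;
    void *opaque;
    struct WorkItem *next;
};

/* Toy FIFO of deferred work; a real implementation would need locking. */
static struct WorkItem *work_head;
static struct WorkItem **work_tail = &work_head;

static void queue_work(work_fn fn, void *opaque)
{
    struct WorkItem *wi = malloc(sizeof(*wi));
    assert(wi != NULL);
    wi->fn = fn;
    wi->opaque = opaque;
    wi->next = NULL;
    *work_tail = wi;
    work_tail = &wi->next;
}

/* Run everything queued so far; returns the number of items executed. */
static int run_queued_work(void)
{
    int n = 0;
    while (work_head) {
        struct WorkItem *wi = work_head;
        work_head = wi->next;
        if (!work_head) {
            work_tail = &work_head;
        }
        wi->fn(wi->opaque);
        free(wi);
        n++;
    }
    return n;
}

struct PatchRequest {
    unsigned long ip;           /* address to patch */
    unsigned long *patched_ip;  /* where the callback records its work */
};

/* Deferred callback: does the work, then frees the request it was given. */
static void do_patch(void *opaque)
{
    struct PatchRequest *req = opaque;
    *req->patched_ip = req->ip;
    free(req);
}

/* Caller side: allocate the request and hand ownership to the queue. */
static void request_patch(unsigned long ip, unsigned long *patched_ip)
{
    struct PatchRequest *req = malloc(sizeof(*req));
    assert(req != NULL);
    req->ip = ip;
    req->patched_ip = patched_ip;
    queue_work(do_patch, req);
}
```

The point of the design is that the request must outlive the function that created it, so it cannot live on the caller's stack.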

Re: [Qemu-devel] [Qemu-block] [RFC] QMP design: Fixing query-block and friends

2017-06-29 Thread Markus Armbruster

John Snow  writes:

> On 06/28/2017 03:15 AM, Markus Armbruster wrote:
>> John Snow  writes:
>> 
>>> On 06/27/2017 12:31 PM, Kevin Wolf wrote:
>>>> Hi,
>>>>
>>>> I haven't really liked query-block for a long time, but now that
>>>> blockdev-add and -blockdev have settled, it might finally be the time to
>>>> actually do something about it. In fact, if used together with these
>>>> modern interfaces, our query commands are simply broken, so we have to
>>>> fix something.
>>>>
>>>
>>> [...words...]
>>>
>>>>
>>>> So how do we go forward from here?
>>>>
>>>> I guess we could add a few hacks to fix the really urgent things, and
>>>> just adding more information is always possible (at the cost of even
>>>> more duplication).
>>>>
>>>
>>> I think you've included this suggestion so that you can summarily
>>> dismiss it as foolish.
>>>
>>>> However, it appears to me that I dislike so many things about our current
>>>> query commands that I'm tempted to say: Throw it all away and start
>>>> over.
>>>>
>>>
>>> Inclined to agree. The structure of the block layer has changed so much
>>> in the past few years and this is easily seen by the gap you've outlined
>>> here.
>>>
>>> We have to keep the old query commands around for a while as Eric says,
>>> but I worry that they are so wrong and misleading as to be actively harmful.
>>>
>>> Maybe there's some hair trigger somewhere that if $NEW_FEATURE_X is used
>>> to configure QEMU in some way, that the old commands can be deprecated
>>> at runtime, such that we can more aggressively force their retirement.
>> 
>> We warn on use of deprecated command line and HMP features.  I think we
>> want the same for QMP, within QMP.
>> 
>> [...]
>> 
>
> I was thinking of something even stronger than a warning in this case.
> Warn if you use it anyway, but if you use $SOME_2.10_FEATURE, it
> actually disables it.
>
> "Hi, I know that you have seen the 2.10 API. I'm removing access to this
> feature, because you REALLY ought not use it."
>
> Could be as simple as actually disabling the old query command if the
> new query command is utilized.

Such spontaneous API change is bad magic, I'm afraid.  It also sabotages
the value of query-qmp-schema.

What we could do is enrich query-qmp-schema with deprecation
information, and let clients request a compatibility level.  Requesting
a level we no longer provide fails.  Default is the oldest level we
provide.  Clients relying on a stable interface would be well advised to
request the one they need.
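
A minimal sketch of the negotiation rule described above, under the assumption that compatibility levels are small integers and that a request of 0 means "no level requested". The function name and the integer encoding are illustrative only; no such QMP interface exists in this thread.

```c
#include <assert.h>

/* Hypothetical rule: an unspecified request (0) gets the oldest level
 * still provided; a level outside [oldest, newest] fails with -1. */
static int negotiate_compat_level(int requested, int oldest, int newest)
{
    assert(oldest >= 1 && oldest <= newest);
    if (requested == 0) {
        return oldest;              /* default: oldest level we provide */
    }
    if (requested < oldest || requested > newest) {
        return -1;                  /* level no longer (or not yet) provided */
    }
    return requested;
}
```

With levels 2 through 4 provided, a client asking for level 1 is refused rather than silently given different behaviour.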

Re: [Qemu-devel] [PATCH v6 3/4] net/net: Convert parse_host_port() to Error

2017-06-29 Thread Markus Armbruster

"Daniel P. Berrange"  writes:

> On Wed, Jun 28, 2017 at 09:24:58AM -0500, Eric Blake wrote:
>> On 06/28/2017 08:23 AM, Daniel P. Berrange wrote:
>> > On Wed, Jun 28, 2017 at 09:08:49PM +0800, Mao Zhongyi wrote:
>> >> diff --git a/include/qemu/sockets.h b/include/qemu/sockets.h
>> >> index 5c326db..78e2b30 100644
>> >> --- a/include/qemu/sockets.h
>> >> +++ b/include/qemu/sockets.h
>> > 
>> >>  if (qemu_isdigit(buf[0])) {
>> >> -if (!inet_aton(buf, &saddr->sin_addr))
>> >> +if (!inet_aton(buf, &saddr->sin_addr)) {
>> >> +error_setg(errp, "host address '%s' is not a valid "
>> >> +   "IPv4 address", buf);
>> >>  return -1;
>> >> +}
>> >>  } else {
>> >> -if ((he = gethostbyname(buf)) == NULL)
>> >> +he = gethostbyname(buf);
>> >> +if (he == NULL) {
>> >> +error_setg(errp, "can't resolve host address '%s': "
>> >> +   "unknown host", buf);
>> >>  return - 1;
>> >> +}
>> > 
>> > gethostbyname sets  'h_errno' on failure, so you should pass that
>> > into error_setg_errno, instead of hardcoding 'unknown host' as a
>> > message

'unknown host' is misleading when h_errno != HOST_NOT_FOUND.

>> 'man gethostbyname' says it is deprecated, and that applications should
>> use getaddrinfo/getnameinfo instead.  What's our story here?
>
> The real story is to get net/socket.c converted to QIOChannelSocket
> and kill this parse_host_port() method in sockets.c. It is already
> broken by design since it takes a 'struct sockaddr_in' and thus
> can't do IPv6.
>
> This patch doesn't make the existing situation worse, so I think
> it's fine to add this error reporting cleanup now, and not force
> immediate conversion to QIOChannelSocket today.  The net/sockets.c
> code needs a further refactor before that conversion can be done
> in the right way - we've already reverted the wrong way twice ;-)

Until then, let's go with a generic error message, as I requested in my
review of v5.  Just drop the misleading ": unknown host" part.
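
For reference, here is a sketch of what the getaddrinfo() route mentioned above looks like, with the error text coming from gai_strerror() rather than a hardcoded string. It is restricted to numeric IPv4 input (AI_NUMERICHOST) so that it needs no resolver; parse_ipv4_numeric() is a made-up name and this is not the proposed patch.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Parse a numeric IPv4 address. On failure, *errmsg points at the
 * library's own description of what went wrong. */
static int parse_ipv4_numeric(const char *host, struct in_addr *out,
                              const char **errmsg)
{
    struct addrinfo hints, *res = NULL;
    int rc;

    assert(host && out && errmsg);
    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;          /* IPv4 only, like sockaddr_in */
    hints.ai_flags = AI_NUMERICHOST;    /* never consult a name server */

    rc = getaddrinfo(host, NULL, &hints, &res);
    if (rc != 0) {
        *errmsg = gai_strerror(rc);
        return -1;
    }
    *out = ((struct sockaddr_in *)res->ai_addr)->sin_addr;
    freeaddrinfo(res);
    return 0;
}
```

Dropping AI_NUMERICHOST and AF_INET would turn this into a general lookup that also handles host names and IPv6, which is the limitation of parse_host_port() noted above.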

Re: [Qemu-devel] [PATCH v6 4/4] net/socket: Improve -net socket error reporting

2017-06-29 Thread Markus Armbruster

Mao Zhongyi  writes:

> Hi, Daniel
>
> On 06/28/2017 09:27 PM, Daniel P. Berrange wrote:
>> On Wed, Jun 28, 2017 at 09:08:50PM +0800, Mao Zhongyi wrote:
>>> When -net socket fails, it first reports a specific error, then
>>> a generic one, like this:
>>>
>>> $ qemu-system-x86_64 -net socket,
>>> qemu-system-x86_64: -net socket: exactly one of fd=, listen=, connect=, mcast= or udp= is required
>>> qemu-system-x86_64: -net socket: Device 'socket' could not be initialized
>>
>> This second line of error message comes from net/net.c in the
>> net_client_init1 method:
>>
>> /* FIXME drop when all init functions store an Error */
>> if (errp && !*errp) {
>> error_setg(errp, QERR_DEVICE_INIT_FAILED,
>>NetClientDriver_lookup[netdev->type]);
>> }
>>
>>
>> hopefully your patch could drop this code too ?
>>
>> In fact this is the only use of QERR_DEVICE_INIT_FAILED in the
>> whole tree, so the QERR constant could possibly be killed too.
>>
>
> OK, I will. :)

You can do that only when *all* init functions store an Error!  We're not
there, yet:

$ grep 'FIXME error_setg' net/*
net/l2tpv3.c:/* FIXME error_setg(errp, ...) on failure */
net/slirp.c:/* FIXME error_setg(errp, ...) on failure */
net/socket.c:/* FIXME error_setg(errp, ...) on failure */
net/tap-win32.c:/* FIXME error_setg(errp, ...) on failure */
net/vde.c:/* FIXME error_setg(errp, ...) on failure */

Re: [Qemu-devel] [RFC v1 2/3] util/qemu-error: Add a warning_report() function

2017-06-29 Thread Markus Armbruster

"Daniel P. Berrange"  writes:

> On Wed, Jun 28, 2017 at 09:16:45AM -0700, Alistair Francis wrote:
>> On Wed, Jun 28, 2017 at 2:04 AM, Daniel P. Berrange wrote:
>> > On Tue, Jun 27, 2017 at 01:45:45PM -0700, Alistair Francis wrote:
>> >> Add a function which can be used similarly to error_report() except to
>> >> inform the users about warnings instead of errors.
>> >>
>> >> The warning print does not include the timestamp and instead will
>> >> preface the messages with a 'warning: '.
>> >
>> > Not including the timestamp is a bug IMHO. If I've turned on timestamps,
>> > I expect all messages to have the timestamp.
>> 
>> That's fine, I'm happy to add it back in. I just wasn't sure.
>> 
>> >
>> > I'm not particularly convinced by adding the 'warning: ' prefix either,
>> > particularly given the scenario you are using this in, is not actually
>> > a warning - it's just an informative message.
>> 
>> Maybe it makes more sense to add an extra argument to error_report()
>> that can be used to specify error, warning or information. The same
>> way qemu_log_mask() works. That was Edgar's idea in reply to one of
>> the other patches.
>> 
>> Does that sound more useful?
>
> I'd suggest renaming the current 'error_report' to 'message_report' and
> making it take an extra arg that accepts an enum flag INFO | WARNING | ERROR.
> Then add macros for  error_report, warning_report, info_report that call
> message_report with the right enum.  That way you don't have to update any
> of the existing code that calls error_report.

*Functions*, please, not macros.  Macros would bloat the code for no
benefit at all.
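
A sketch of the function-based layout, assuming one shared formatting routine that takes the severity as an enum and three thin variadic wrappers. To keep it testable it formats into a caller-supplied buffer instead of printing; the *_format names, the enum and the "info: " prefix are illustrative assumptions, not the eventual QEMU API.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

typedef enum {
    REPORT_TYPE_ERROR,
    REPORT_TYPE_WARNING,
    REPORT_TYPE_INFO
} ReportType;

/* Shared worker: prefix chosen by severity, then the formatted message. */
static int vmessage_format(char *buf, size_t size, ReportType type,
                           const char *fmt, va_list ap)
{
    static const char *const prefix[] = { "", "warning: ", "info: " };
    int n, m;

    assert((unsigned)type <= REPORT_TYPE_INFO);
    n = snprintf(buf, size, "%s", prefix[type]);
    if (n < 0 || (size_t)n >= size) {
        return n;
    }
    m = vsnprintf(buf + n, size - n, fmt, ap);
    return m < 0 ? m : n + m;
}

/* Thin wrappers: real functions, so each call site stays a plain call. */
static int error_format(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int n;
    va_start(ap, fmt);
    n = vmessage_format(buf, size, REPORT_TYPE_ERROR, fmt, ap);
    va_end(ap);
    return n;
}

static int warning_format(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int n;
    va_start(ap, fmt);
    n = vmessage_format(buf, size, REPORT_TYPE_WARNING, fmt, ap);
    va_end(ap);
    return n;
}

static int info_format(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int n;
    va_start(ap, fmt);
    n = vmessage_format(buf, size, REPORT_TYPE_INFO, fmt, ap);
    va_end(ap);
    return n;
}
```

Existing callers of the error variant need no changes, which was the goal of the renaming suggestion quoted above.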

Re: [Qemu-devel] [PATCH v6 6/8] vmdk: New functions to assist allocating multiple clusters

2017-06-29 Thread Ashijeet Acharya

On Tue, Jun 27, 2017 at 1:32 PM, Fam Zheng  wrote:
> On Mon, 06/05 13:22, Ashijeet Acharya wrote:
>> +/*
>> + * vmdk_handle_alloc
>> + *
>> + * Allocate new clusters for an area that either is yet unallocated or needs a
>> + * copy on write. If *cluster_offset is non_zero, clusters are only allocated if
>> + * the new allocation can match the specified host offset.
>
> I don't think this matches the function body, the passed in *cluster_offset
> value is ignored.
>
>> + *
>> + * Returns:
>> + *   VMDK_OK:   if new clusters were allocated, *bytes may be decreased if
>> + *  the new allocation doesn't cover all of the requested area.
>> + *  *cluster_offset is updated to contain the offset of the
>> + *  first newly allocated cluster.
>> + *
>> + *   VMDK_UNALLOC:  if no clusters could be allocated. *cluster_offset is left
>> + *  unchanged.
>> + *
>> + *   VMDK_ERROR:in error cases
>> + */
>> +static int vmdk_handle_alloc(BlockDriverState *bs, VmdkExtent *extent,
>> + uint64_t offset, uint64_t *cluster_offset,
>> + int64_t *bytes, VmdkMetaData *m_data,
>> + bool allocate, uint32_t *alloc_clusters_counter)
>> +{
>> +int l1_index, l2_offset, l2_index;
>> +uint32_t *l2_table;
>> +uint32_t cluster_sector;
>> +uint32_t nb_clusters;
>> +bool zeroed = false;
>> +uint64_t skip_start_bytes, skip_end_bytes;
>> +int ret;
>> +
>> +ret = get_cluster_table(extent, offset, &l1_index, &l2_offset,
>> +&l2_index, &l2_table);
>> +if (ret < 0) {
>> +return ret;
>> +}
>> +
>> +cluster_sector = le32_to_cpu(l2_table[l2_index]);
>> +
>> +skip_start_bytes = vmdk_find_offset_in_cluster(extent, offset);
>> +/* Calculate the number of clusters to look for. Here we truncate the last
>> + * cluster, i.e. 1 less than the actual value calculated as we may need to
>> + * perform COW for the last one. */
>> +nb_clusters = DIV_ROUND_UP(skip_start_bytes + *bytes,
>> +   extent->cluster_sectors << BDRV_SECTOR_BITS) - 1;
>> +
>> +nb_clusters = MIN(nb_clusters, extent->l2_size - l2_index);
>> +assert(nb_clusters <= INT_MAX);
>> +
>> +/* update bytes according to final nb_clusters value */
>> +if (nb_clusters != 0) {
>> +*bytes = ((nb_clusters * extent->cluster_sectors) << BDRV_SECTOR_BITS)
>> + - skip_start_bytes;
>> +} else {
>> +nb_clusters = 1;
>> +}
>> +*alloc_clusters_counter += nb_clusters;
>> +skip_end_bytes = skip_start_bytes + MIN(*bytes,
>> + extent->cluster_sectors * BDRV_SECTOR_SIZE
>> +- skip_start_bytes);
>
> I don't understand the MIN part, shouldn't skip_end_bytes simply be
> skip_start_bytes + *bytes?
>
>> +
>> +if (extent->has_zero_grain && cluster_sector == VMDK_GTE_ZEROED) {
>> +zeroed = true;
>> +}
>> +
>> +if (!cluster_sector || zeroed) {
>> +if (!allocate) {
>> +return zeroed ? VMDK_ZEROED : VMDK_UNALLOC;
>> +}
>> +
>> +cluster_sector = extent->next_cluster_sector;
>> +extent->next_cluster_sector += extent->cluster_sectors
>> +* nb_clusters;
>> +
>> +ret = vmdk_perform_cow(bs, extent, cluster_sector * BDRV_SECTOR_SIZE,
>> +   offset, skip_start_bytes,
>> +   skip_end_bytes);
>> +if (ret < 0) {
>> +return ret;
>> +}
>> +if (m_data) {
>> +m_data->valid = 1;
>> +m_data->l1_index = l1_index;
>> +m_data->l2_index = l2_index;
>> +m_data->l2_offset = l2_offset;
>> +m_data->l2_cache_entry = &l2_table[l2_index];
>> +m_data->nb_clusters = nb_clusters;
>> +}
>> +}
>> +*cluster_offset = cluster_sector << BDRV_SECTOR_BITS;
>> +return VMDK_OK;
>> +}
>> +
>> +/*
>> + * vmdk_alloc_clusters
>> + *
>> + * For a given offset on the virtual disk, find the cluster offset in vmdk
>> + * file. If the offset is not found, allocate a new cluster.
>> + *
>> + * If the cluster is newly allocated, m_data->nb_clusters is set to the number
>> + * of contiguous clusters that have been allocated. In this case, the other
>> + * fields of m_data are valid and contain information about the first allocated
>> + * cluster.
>> + * cluster.
>> + *
>> + * Returns:
>> + *
>> + *   VMDK_OK:   on success and @cluster_offset was set
>> + *
>> + *   VMDK_UNALLOC:  if no clusters were allocated and @cluster_offset is
>> + *  set to zero
>> + *
>> + *   VMDK_ERROR:in error cases
>> + */
>> +static int vmdk_alloc_clusters(BlockDriverState *bs,
>> +   VmdkExtent *extent,
>>
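
The cluster-count arithmetic in the quoted vmdk_handle_alloc() can be worked through with a small stand-alone sketch. It mirrors only the DIV_ROUND_UP step and the *bytes adjustment, leaves out the clamp against the L2 table size, and uses made-up helper names and byte-based parameters.

```c
#include <assert.h>
#include <stdint.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Number of clusters covered by the request, minus the last one, which
 * the quoted patch leaves out because it may need copy-on-write. */
static uint32_t clusters_before_last(uint64_t skip_start_bytes,
                                     uint64_t bytes, uint64_t cluster_bytes)
{
    assert(cluster_bytes > 0 && skip_start_bytes < cluster_bytes && bytes > 0);
    return (uint32_t)(DIV_ROUND_UP(skip_start_bytes + bytes, cluster_bytes) - 1);
}

/* Request size after trimming to whole clusters. A count of zero means
 * the request fits in one cluster and is left unchanged. */
static uint64_t trimmed_bytes(uint32_t nb_clusters, uint64_t skip_start_bytes,
                              uint64_t bytes, uint64_t cluster_bytes)
{
    if (nb_clusters == 0) {
        return bytes;
    }
    return (uint64_t)nb_clusters * cluster_bytes - skip_start_bytes;
}
```

With 64 KiB clusters, a 200000-byte request starting 4096 bytes into a cluster spans four clusters, so three are handled up front and the request is trimmed to 3 * 65536 - 4096 = 192512 bytes.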

[Qemu-devel] [PATCH v2 0/3] Relax code buffer size limitation on aarch64 hosts

2017-06-29 Thread Pranith Kumar

Hello,

The following patches enable us to relax the 128MB code buffer size
limitation on ARM64 hosts.

Patch 2 increases this limitation to 3GB, even though ADRP+ADD can
address 4GB of pc-relative addresses to give us some slack.

Patch 3 uses LDR (literal) to load the address, allowing us to remove
the code buffer size limitation altogether. However, I feel that 3GB
should be sufficient for now and hence did not change it ;). It
however enables the !USE_DIRECT_JUMP path on aarch64 hosts.

Thanks,

Pranith Kumar (3):
  tcg/aarch64: Introduce and use long branch to register
  tcg/aarch64: Use ADRP+ADD to compute target address
  tcg/aarch64: Enable indirect jump path using LDR (literal)

 accel/tcg/translate-all.c|  2 +-
 tcg/aarch64/tcg-target.inc.c | 69 +++-
 2 files changed, 56 insertions(+), 15 deletions(-)

-- 
2.13.0

[Qemu-devel] [PATCH v2 2/3] tcg/aarch64: Use ADRP+ADD to compute target address

2017-06-29 Thread Pranith Kumar

We use ADRP+ADD to compute the target address for goto_tb. This patch
introduces the NOP instruction which is used to align the above
instruction pair so that we can use one atomic instruction to patch
the destination offsets.

CC: Richard Henderson 
CC: Alex Bennée 
Signed-off-by: Pranith Kumar 
---
 accel/tcg/translate-all.c|  2 +-
 tcg/aarch64/tcg-target.inc.c | 26 +-
 2 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index f6ad46b613..b6d122e087 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -522,7 +522,7 @@ static inline PageDesc *page_find(tb_page_addr_t index)
 #elif defined(__powerpc__)
 # define MAX_CODE_GEN_BUFFER_SIZE  (32u * 1024 * 1024)
 #elif defined(__aarch64__)
-# define MAX_CODE_GEN_BUFFER_SIZE  (128ul * 1024 * 1024)
+# define MAX_CODE_GEN_BUFFER_SIZE  (3ul * 1024 * 1024 * 1024)
 #elif defined(__s390x__)
   /* We have a +- 4GB range on the branches; leave some slop.  */
 # define MAX_CODE_GEN_BUFFER_SIZE  (3ul * 1024 * 1024 * 1024)
diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index 8fce11ace7..b7670ecc90 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -372,6 +372,7 @@ typedef enum {
 I3510_EON   = 0x4a200000,
 I3510_ANDS  = 0x6a000000,
 
+NOP = 0xd503201f,
 /* System instructions.  */
 DMB_ISH = 0xd50338bf,
 DMB_LD  = 0x00000100,
@@ -866,10 +867,18 @@ static inline void tcg_out_call(TCGContext *s, tcg_insn_unit *target)
 void aarch64_tb_set_jmp_target(uintptr_t jmp_addr, uintptr_t addr)
 {
 tcg_insn_unit *code_ptr = (tcg_insn_unit *)jmp_addr;
-tcg_insn_unit *target = (tcg_insn_unit *)addr;
+tcg_insn_unit adrp_insn = *code_ptr++;
+tcg_insn_unit addi_insn = *code_ptr;
 
-reloc_pc26_atomic(code_ptr, target);
-flush_icache_range(jmp_addr, jmp_addr + 4);
+ptrdiff_t offset = (addr >> 12) - (jmp_addr >> 12);
+
+/* patch ADRP */
+adrp_insn = deposit32(adrp_insn, 29, 2, offset & 0x3);
+adrp_insn = deposit32(adrp_insn, 5, 19, offset >> 2);
+/* patch ADDI */
+addi_insn = deposit32(addi_insn, 10, 12, addr & 0xfff);
+atomic_set((uint64_t *)jmp_addr, adrp_insn | ((uint64_t)addi_insn << 32));
+flush_icache_range(jmp_addr, jmp_addr + 8);
 }
 
 static inline void tcg_out_goto_label(TCGContext *s, TCGLabel *l)
@@ -1388,10 +1397,17 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 #endif
 /* consistency for USE_DIRECT_JUMP */
 tcg_debug_assert(s->tb_jmp_insn_offset != NULL);
+/* Ensure that ADRP+ADD are 8-byte aligned so that an atomic
+   write can be used to patch the target address. */
+if ((uintptr_t)s->code_ptr & 7) {
+tcg_out32(s, NOP);
+}
 s->tb_jmp_insn_offset[a0] = tcg_current_code_size(s);
 /* actual branch destination will be patched by
-   aarch64_tb_set_jmp_target later, beware retranslation. */
-tcg_out_goto_noaddr(s);
+   aarch64_tb_set_jmp_target later, beware of retranslation */
+tcg_out_insn(s, 3406, ADRP, TCG_REG_TMP, 0);
+tcg_out_insn(s, 3401, ADDI, TCG_TYPE_I64, TCG_REG_TMP, TCG_REG_TMP, 0);
+tcg_out_callr(s, TCG_REG_TMP);
 s->tb_jmp_reset_offset[a0] = tcg_current_code_size(s);
 break;
 
-- 
2.13.0
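
The field packing done by aarch64_tb_set_jmp_target() in this patch can be checked with a stand-alone sketch that encodes an ADRP+ADD pair and decodes it again. This is an illustration, not the QEMU code: deposit32() is reimplemented locally, the base opcodes are the architectural ADRP and 64-bit ADD (immediate) encodings, the helper names are made up, and placing ADRP in the low half of the 64-bit word assumes a little-endian host.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for QEMU's deposit32(): insert 'field' into 'value'
 * at bit position 'start', 'length' bits wide. */
static uint32_t deposit32(uint32_t value, int start, int length, uint32_t field)
{
    uint32_t mask;
    assert(start >= 0 && length > 0 && length <= 32 - start);
    mask = (length == 32 ? ~0u : ((1u << length) - 1)) << start;
    return (value & ~mask) | ((field << start) & mask);
}

#define ADRP_BASE   0x90000000u   /* ADRP Xd, #0 */
#define ADDI64_BASE 0x91000000u   /* ADD Xd, Xn, #0 (64-bit) */

/* Build the 8-byte ADRP+ADD pair that loads 'target' into register 'reg'
 * when executed at address 'pc'. */
static uint64_t encode_adrp_add(uint64_t pc, uint64_t target, unsigned reg)
{
    /* Page delta; unsigned wrap-around keeps the low 21 bits correct
     * for backward references. */
    uint64_t page_delta = (target >> 12) - (pc >> 12);
    uint32_t adrp = ADRP_BASE | reg;
    uint32_t add = ADDI64_BASE | (reg << 5) | reg;

    assert(reg < 32);
    adrp = deposit32(adrp, 29, 2, (uint32_t)(page_delta & 0x3));   /* immlo */
    adrp = deposit32(adrp, 5, 19, (uint32_t)(page_delta >> 2));    /* immhi */
    add = deposit32(add, 10, 12, (uint32_t)(target & 0xfff));      /* imm12 */
    return adrp | ((uint64_t)add << 32);
}

/* Recover the address the pair computes, as the CPU would. */
static uint64_t decode_adrp_add(uint64_t pc, uint64_t pair)
{
    uint32_t adrp = (uint32_t)pair;
    uint32_t add = (uint32_t)(pair >> 32);
    int64_t imm21 = (int64_t)((((adrp >> 5) & 0x7ffff) << 2) |
                              ((adrp >> 29) & 0x3));
    if (imm21 & (1 << 20)) {
        imm21 -= (1 << 21);          /* sign-extend the 21-bit page delta */
    }
    return (pc & ~UINT64_C(0xfff)) + (uint64_t)(imm21 * 4096) +
           ((add >> 10) & 0xfff);
}
```

The 21-bit signed page delta is what gives ADRP+ADD its +-4GB reach, of which the patch uses 3GB.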

[Qemu-devel] [PATCH v3 3/3] tcg/aarch64: Enable indirect jump path using LDR (literal)

2017-06-29 Thread Pranith Kumar

This patch enables the indirect jump path using an LDR (literal)
instruction. It will be interesting to test and see which performs
better among the two paths.

CC: Richard Henderson 
CC: Alex Bennée 
Signed-off-by: Pranith Kumar 
---
 tcg/aarch64/tcg-target.inc.c | 42 --
 1 file changed, 28 insertions(+), 14 deletions(-)

diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index b7670ecc90..5381c31b45 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -269,6 +269,8 @@ typedef enum {
 I3207_BLR   = 0xd63f0000,
 I3207_RET   = 0xd65f0000,
 
+/* Load literal for loading the address at pc-relative offset */
+I3305_LDR   = 0x58000000,
 /* Load/store register.  Described here as 3.3.12, but the helper
that emits them can transform to 3.3.10 or 3.3.13.  */
 I3312_STRB  = 0x38000000 | LDST_ST << 22 | MO_8 << 30,
@@ -389,6 +391,11 @@ static inline uint32_t tcg_in32(TCGContext *s)
 #define tcg_out_insn(S, FMT, OP, ...) \
 glue(tcg_out_insn_,FMT)(S, glue(glue(glue(I,FMT),_),OP), ## __VA_ARGS__)
 
+static void tcg_out_insn_3305(TCGContext *s, AArch64Insn insn, int imm19, TCGReg rt)
+{
+tcg_out32(s, insn | (imm19 & 0x7ffff) << 5 | rt);
+}
+}
+
 static void tcg_out_insn_3201(TCGContext *s, AArch64Insn insn, TCGType ext,
   TCGReg rt, int imm19)
 {
@@ -864,6 +871,8 @@ static inline void tcg_out_call(TCGContext *s, tcg_insn_unit *target)
 }
 }
 
+#ifdef USE_DIRECT_JUMP
+
 void aarch64_tb_set_jmp_target(uintptr_t jmp_addr, uintptr_t addr)
 {
 tcg_insn_unit *code_ptr = (tcg_insn_unit *)jmp_addr;
@@ -881,6 +890,8 @@ void aarch64_tb_set_jmp_target(uintptr_t jmp_addr, uintptr_t addr)
 flush_icache_range(jmp_addr, jmp_addr + 8);
 }
 
+#endif
+
 static inline void tcg_out_goto_label(TCGContext *s, TCGLabel *l)
 {
 if (!l->has_value) {
@@ -1392,21 +1403,24 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 break;
 
 case INDEX_op_goto_tb:
-#ifndef USE_DIRECT_JUMP
-#error "USE_DIRECT_JUMP required for aarch64"
-#endif
-/* consistency for USE_DIRECT_JUMP */
-tcg_debug_assert(s->tb_jmp_insn_offset != NULL);
-/* Ensure that ADRP+ADD are 8-byte aligned so that an atomic
-   write can be used to patch the target address. */
-if ((uintptr_t)s->code_ptr & 7) {
-tcg_out32(s, NOP);
+if (s->tb_jmp_insn_offset != NULL) {
+/* USE_DIRECT_JUMP */
+/* Ensure that ADRP+ADD are 8-byte aligned so that an atomic
+   write can be used to patch the target address. */
+if ((uintptr_t)s->code_ptr & 7) {
+tcg_out32(s, NOP);
+}
+s->tb_jmp_insn_offset[a0] = tcg_current_code_size(s);
+/* actual branch destination will be patched by
+   aarch64_tb_set_jmp_target later, beware of retranslation */
+tcg_out_insn(s, 3406, ADRP, TCG_REG_TMP, 0);
+tcg_out_insn(s, 3401, ADDI, TCG_TYPE_I64, TCG_REG_TMP, TCG_REG_TMP, 0);
+} else {
+/* !USE_DIRECT_JUMP */
+tcg_debug_assert(s->tb_jmp_target_addr != NULL);
+intptr_t offset = tcg_pcrel_diff(s, (s->tb_jmp_target_addr + a0)) >> 2;
+tcg_out_insn(s, 3305, LDR, offset, TCG_REG_TMP);
 }
-s->tb_jmp_insn_offset[a0] = tcg_current_code_size(s);
-/* actual branch destination will be patched by
-   aarch64_tb_set_jmp_target later, beware of retranslation */
-tcg_out_insn(s, 3406, ADRP, TCG_REG_TMP, 0);
-tcg_out_insn(s, 3401, ADDI, TCG_TYPE_I64, TCG_REG_TMP, TCG_REG_TMP, 0);
 tcg_out_callr(s, TCG_REG_TMP);
 s->tb_jmp_reset_offset[a0] = tcg_current_code_size(s);
 break;
-- 
2.13.0
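
As a cross-check of the LDR (literal) encoding used by tcg_out_insn_3305(), here is a stand-alone sketch. It takes a byte offset instead of the pre-shifted imm19 the patch passes, and encode_ldr_literal() is an illustrative name rather than a QEMU function.

```c
#include <assert.h>
#include <stdint.h>

#define I3305_LDR 0x58000000u   /* LDR Xt, <label> (64-bit literal load) */

/* Encode "LDR Xt, pc + byte_offset". The offset is stored as a signed
 * 19-bit word count in bits [23:5], so it must be 4-byte aligned and
 * within +-1 MiB of the instruction. */
static uint32_t encode_ldr_literal(int32_t byte_offset, unsigned rt)
{
    int32_t imm19;

    assert(byte_offset % 4 == 0);
    assert(byte_offset >= -(1 << 20) && byte_offset < (1 << 20));
    assert(rt < 32);
    imm19 = byte_offset / 4;
    return I3305_LDR | (((uint32_t)imm19 & 0x7ffff) << 5) | rt;
}
```

Loading into X30 from 8 bytes ahead gives 0x5800005e. The +-1 MiB reach applies only to the distance between the instruction and the slot holding the address, so the jump target itself is not range-limited.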

[Qemu-devel] [PATCH v2 1/3] tcg/aarch64: Introduce and use long branch to register

2017-06-29 Thread Pranith Kumar

We can use a branch to register instruction for exit_tb for offsets
greater than 128MB.

CC: Richard Henderson 
CC: Alex Bennée 
Signed-off-by: Pranith Kumar 
---
 tcg/aarch64/tcg-target.inc.c | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index 1fa3bccc89..8fce11ace7 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -819,6 +819,17 @@ static inline void tcg_out_goto(TCGContext *s, tcg_insn_unit *target)
 tcg_out_insn(s, 3206, B, offset);
 }
 
+static inline void tcg_out_goto_long(TCGContext *s, tcg_insn_unit *target)
+{
+ptrdiff_t offset = target - s->code_ptr;
+if (offset == sextract64(offset, 0, 26)) {
+tcg_out_insn(s, 3206, BL, offset);
+} else {
+tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_TMP, (intptr_t)target);
+tcg_out_insn(s, 3207, BR, TCG_REG_TMP);
+}
+}
+
 static inline void tcg_out_goto_noaddr(TCGContext *s)
 {
 /* We pay attention here to not modify the branch target by reading from
@@ -1364,10 +1375,10 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 case INDEX_op_exit_tb:
 /* Reuse the zeroing that exists for goto_ptr.  */
 if (a0 == 0) {
-tcg_out_goto(s, s->code_gen_epilogue);
+tcg_out_goto_long(s, s->code_gen_epilogue);
 } else {
 tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_X0, a0);
-tcg_out_goto(s, tb_ret_addr);
+tcg_out_goto_long(s, tb_ret_addr);
 }
 break;
 
-- 
2.13.0
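
The range test in tcg_out_goto_long() relies on sign-extracting the low 26 bits of the offset and comparing with the original. A stand-alone sketch of that check follows; sextract64() is reimplemented here in a portable way and fits_in_b_range() is a made-up name. Offsets are counted in 4-byte instructions, so a 26-bit signed field covers +-128MB.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Extract 'length' bits starting at bit 'start' and sign-extend them. */
static int64_t sextract64(uint64_t value, int start, int length)
{
    uint64_t field;

    assert(start >= 0 && length > 0 && length <= 64 - start);
    field = value >> start;
    if (length < 64) {
        uint64_t sign = UINT64_C(1) << (length - 1);
        field &= (sign << 1) - 1;        /* keep only the field's bits */
        field = (field ^ sign) - sign;   /* sign-extend */
    }
    return (int64_t)field;
}

/* True if an offset, in instruction units, fits the imm26 field of B/BL. */
static bool fits_in_b_range(int64_t insn_offset)
{
    return insn_offset == sextract64((uint64_t)insn_offset, 0, 26);
}
```

Anything outside that window takes the movi plus branch-to-register path in the patch.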

Re: [Qemu-devel] [RFC PATCH 00/14] Implement network booting directly into the s390-ccw BIOS

2017-06-29 Thread Thomas Huth

On 28.06.2017 17:02, Viktor Mihajlovski wrote:
> On 28.06.2017 12:56, Thomas Huth wrote:
>> On 28.06.2017 10:02, Thomas Huth wrote:
>>> On 28.06.2017 09:28, Viktor Mihajlovski wrote:
>>>> On 27.06.2017 23:40, Thomas Huth wrote:
>>>> [...]
>>> - Is it OK to require loading an .INS file first? Or does anybody
>>>   have a better idea how to load multiple files (kernel, initrd,
>>>   etc. ...)?
>> It would be nice to support PXE-style boot, because the majority of boot
>> servers is set up that way. A straightforward way would be to do a PXE
>> emulation by attempting to download a pxelinux.cfg from the well-known
>> locations, parsing the content (menu) and finally load the kernel,
>> initrd and set the kernel command line as specified there. (I know, but
>> you're already parsing the INS-File).
>
> Please, don't mix up PXE and pxelinux (since you've used both terms in
> above paragraph). Assuming that you're only talking about pxelinux config
> files... are they that common on s390x already? Using the pxelinux
> config file syntax sounds like we would be completely bound to only
> loading Linux guests to me, since the boot loader has to know where to
> load the initrd and how to patch the kernel so that it can find the initrd.
> Using .INS files sounds more flexible to me instead, since you can also
> specify the addresses here - so you can theoretically also load other
> guest kernels, and that's IMHO the better approach since a firmware
> should stay as generic as possible.
>
>>>> In order to be consumable, the network boot should support the most
>>>> common configurations. I would think that most network boot servers are
>>>> setup as PXE boot servers using pxelinux configs.
>>>
>>> Are you really sure about the popularity of pxelinux? It's just one
>>> flavor of secondary stage network boot loaders - which also only exist
>>> on x86 so far, as far as I know.
>>
>> And it seems like it also only works with legacy BIOSes, i.e. you can
>> not use it on EFI-only systems, if I've got that right:
>>
>> https://github.com/openSUSE/kiwi/wiki/Setup-PXE-boot-with-EFI-Using-GRUB2
>>
>> So I guess the significance of pxelinux will very likely decrease in
>> the next years...
>>
> Maybe, but supposed goners tend to linger more often than not. E.g., the
> syslinux project offers an EFI bootloader called syslinux.efi equivalent
> to the pxelinux.0 BIOS loader.
> Further, the OPAL firmware of newer POWER systems embeds petitboot and
> thus offers PXELINUX-compatible network boot as well.

OK, true, ... you're slowly getting me convinced that this pxelinux.cfg
stuff is maybe not such a bad idea after all ... So I guess supporting
at least the basic commands from the pxelinux config file would be
appropriate... (the full set of commands is huge, see
http://www.syslinux.org/wiki/index.php?title=Config )

> I appreciate the idea of a proper BOOTP implementation though, so maybe
> a compromise could be to start off with your proposal with the slight
> modification that the final boot action is controlled by the bootfile
> content (file magic), similar to what you suggested in order to support
> both insfile and binary kernel. PXELINUX emulation could be triggered by
> a specially crafted bootfile then. What do you think?

Yes, something like this could work:

1) Do DHCP to get TFTP server address and boot file name
2) Load the boot file from the TFTP server to address 0
3) If the boot file name ends with ".ins" or ".INS" (and the content
   starts with the "* " magic), treat it as an .INS file and load the
   files that are listed in there accordingly
4) If the boot file looks like a kernel, start it directly
5) If not successful in 3 or 4, start looking for a pxelinux config
   file by trying to download the config files as specified in
   http://www.syslinux.org/wiki/index.php?title=PXELINUX#Configuration
   and then parse the file and load the kernel + initrd accordingly.
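
The decision in steps 3 to 5 could be sketched roughly as follows. This is
only an illustration, not code from the s390-ccw firmware: the enum, the
function names and the looks_like_kernel parameter are all made up here, and
the actual kernel detection is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names: none of these exist in the s390-ccw sources. */
enum boot_action {
    BOOT_INS_FILE,      /* step 3: parse the buffer as an .INS file */
    BOOT_KERNEL,        /* step 4: start the loaded image directly */
    BOOT_TRY_PXELINUX,  /* step 5: fall back to a pxelinux config lookup */
};

/* Return nonzero if 'name' ends with ".ins" or ".INS". */
static int has_ins_suffix(const char *name)
{
    size_t len = strlen(name);

    if (len < 4) {
        return 0;
    }
    return strcmp(name + len - 4, ".ins") == 0 ||
           strcmp(name + len - 4, ".INS") == 0;
}

/*
 * Decide what to do with a boot file that was already loaded via TFTP.
 * 'looks_like_kernel' stands in for whatever kernel detection the
 * firmware would use; that check is an assumption and not shown here.
 */
static enum boot_action classify_boot_file(const char *name,
                                           const unsigned char *buf,
                                           size_t len,
                                           int looks_like_kernel)
{
    assert(name != NULL && buf != NULL);

    /* Both the file name suffix and the "* " magic have to match. */
    if (has_ins_suffix(name) && len >= 2 && buf[0] == '*' && buf[1] == ' ') {
        return BOOT_INS_FILE;
    }
    if (looks_like_kernel) {
        return BOOT_KERNEL;
    }
    return BOOT_TRY_PXELINUX;
}
```

Keeping the classification separate from the actual loading would make it
easy to add the pxelinux fallback later without touching steps 3 and 4.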

Quite a bit of work, so I'll continue to ignore 5 for the first
versions, but I agree now that it can certainly be added later.
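
To give an idea of what step 5 involves: as far as I understand the PXELINUX
documentation, the config file is searched below pxelinux.cfg/ first by
hardware type plus MAC address, then by the IPv4 address in uppercase hex
with one digit removed per attempt, and finally as "default". A minimal
sketch of generating those names, with made-up function and macro names and
without the UUID-based name that PXELINUX can try first:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define PXE_NAME_MAX        64
#define PXE_MAX_CANDIDATES  10  /* 1 MAC name + 8 IP prefixes + "default" */

/*
 * Fill 'out' with config file names in (assumed) PXELINUX search order
 * and return the number of entries written. Hypothetical helper, the
 * UUID-based name is intentionally omitted.
 */
static int pxelinux_candidates(const uint8_t mac[6], uint32_t ip,
                               char out[PXE_MAX_CANDIDATES][PXE_NAME_MAX])
{
    char hexip[9];
    int digits;
    int n = 0;

    assert(mac != NULL && out != NULL);

    /* ARP hardware type 01 (Ethernet), then the MAC in lowercase hex. */
    snprintf(out[n++], PXE_NAME_MAX,
             "pxelinux.cfg/01-%02x-%02x-%02x-%02x-%02x-%02x",
             mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);

    /* IPv4 address as 8 uppercase hex digits, then ever shorter prefixes. */
    snprintf(hexip, sizeof(hexip), "%08X", (unsigned int)ip);
    for (digits = 8; digits >= 1; digits--) {
        snprintf(out[n++], PXE_NAME_MAX, "pxelinux.cfg/%.*s", digits, hexip);
    }

    snprintf(out[n++], PXE_NAME_MAX, "pxelinux.cfg/default");
    return n;
}
```

Each of these names would cost one TFTP request until a file is found, which
is part of why this is better left for a later version.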

 Thomas

Re: [Qemu-devel] [PATCH] replace struct ucontext with ucontext_t type

2017-06-29 Thread Laurent Vivier

Le 28/06/2017 à 22:44, Khem Raj a écrit :
> The ucontext_t type had a tag struct ucontext until now
> but newer glibc will drop it so we need to adjust and use
> the exposed type instead

I couldn't find the commit dropping the struct tag for ucontext in the
glibc git tree. Could you point it out?
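
For context, the typedef name is the only spelling that POSIX specifies, so
code using it does not depend on the struct tag at all. A minimal standalone
sketch of the pattern the patch switches to (not taken from QEMU; the
function names are made up for illustration):

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t got_context;

/* SA_SIGINFO handler: the third argument points to a ucontext_t. */
static void handler(int sig, siginfo_t *info, void *puc)
{
    ucontext_t *uc = puc;   /* works whatever the struct tag is called */

    (void)sig;
    (void)info;
    got_context = (uc != NULL);
}

/*
 * Install the handler for SIGUSR1 and raise the signal once.
 * Returns 1 if the handler saw a context pointer, 0 if not,
 * and -1 if installing the handler failed.
 */
static int probe_ucontext(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0) {
        return -1;
    }
    raise(SIGUSR1);
    return got_context;
}
```

Accessing architecture-specific members such as uc_mcontext still needs the
per-host code from the hostdep.h files; only the type name changes.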

Thanks,
Laurent

> Signed-off-by: Khem Raj 
> Cc: Kamil Rytarowski 
> Cc: Riku Voipio 
> Cc: Laurent Vivier 
> Cc: Paolo Bonzini 
> ---
>  linux-user/host/aarch64/hostdep.h |  2 +-
>  linux-user/host/arm/hostdep.h |  2 +-
>  linux-user/host/i386/hostdep.h|  2 +-
>  linux-user/host/ppc64/hostdep.h   |  2 +-
>  linux-user/host/s390x/hostdep.h   |  2 +-
>  linux-user/host/x86_64/hostdep.h  |  2 +-
>  linux-user/signal.c   | 10 +-
>  tests/tcg/test-i386.c |  4 ++--
>  user-exec.c   | 18 +-
>  9 files changed, 22 insertions(+), 22 deletions(-)
> 
> diff --git a/linux-user/host/aarch64/hostdep.h 
> b/linux-user/host/aarch64/hostdep.h
> index 64f75cef49..a8d41a21ad 100644
> --- a/linux-user/host/aarch64/hostdep.h
> +++ b/linux-user/host/aarch64/hostdep.h
> @@ -24,7 +24,7 @@ extern char safe_syscall_end[];
>  /* Adjust the signal context to rewind out of safe-syscall if we're in it */
>  static inline void rewind_if_in_safe_syscall(void *puc)
>  {
> -struct ucontext *uc = puc;
> +ucontext_t *uc = puc;
>  __u64 *pcreg = &uc->uc_mcontext.pc;
>  
>  if (*pcreg > (uintptr_t)safe_syscall_start
> diff --git a/linux-user/host/arm/hostdep.h b/linux-user/host/arm/hostdep.h
> index 5c1ae60120..9276fe6ceb 100644
> --- a/linux-user/host/arm/hostdep.h
> +++ b/linux-user/host/arm/hostdep.h
> @@ -24,7 +24,7 @@ extern char safe_syscall_end[];
>  /* Adjust the signal context to rewind out of safe-syscall if we're in it */
>  static inline void rewind_if_in_safe_syscall(void *puc)
>  {
> -struct ucontext *uc = puc;
> +ucontext_t *uc = puc;
>  unsigned long *pcreg = &uc->uc_mcontext.arm_pc;
>  
>  if (*pcreg > (uintptr_t)safe_syscall_start
> diff --git a/linux-user/host/i386/hostdep.h b/linux-user/host/i386/hostdep.h
> index d834bd80ea..073be74d87 100644
> --- a/linux-user/host/i386/hostdep.h
> +++ b/linux-user/host/i386/hostdep.h
> @@ -24,7 +24,7 @@ extern char safe_syscall_end[];
>  /* Adjust the signal context to rewind out of safe-syscall if we're in it */
>  static inline void rewind_if_in_safe_syscall(void *puc)
>  {
> -struct ucontext *uc = puc;
> +ucontext_t *uc = puc;
>  greg_t *pcreg = &uc->uc_mcontext.gregs[REG_EIP];
>  
>  if (*pcreg > (uintptr_t)safe_syscall_start
> diff --git a/linux-user/host/ppc64/hostdep.h b/linux-user/host/ppc64/hostdep.h
> index 0b0f5f7821..98979ad917 100644
> --- a/linux-user/host/ppc64/hostdep.h
> +++ b/linux-user/host/ppc64/hostdep.h
> @@ -24,7 +24,7 @@ extern char safe_syscall_end[];
>  /* Adjust the signal context to rewind out of safe-syscall if we're in it */
>  static inline void rewind_if_in_safe_syscall(void *puc)
>  {
> -struct ucontext *uc = puc;
> +ucontext_t *uc = puc;
>  unsigned long *pcreg = &uc->uc_mcontext.gp_regs[PT_NIP];
>  
>  if (*pcreg > (uintptr_t)safe_syscall_start
> diff --git a/linux-user/host/s390x/hostdep.h b/linux-user/host/s390x/hostdep.h
> index 6f9da9c608..4f0171f36f 100644
> --- a/linux-user/host/s390x/hostdep.h
> +++ b/linux-user/host/s390x/hostdep.h
> @@ -24,7 +24,7 @@ extern char safe_syscall_end[];
>  /* Adjust the signal context to rewind out of safe-syscall if we're in it */
>  static inline void rewind_if_in_safe_syscall(void *puc)
>  {
> -struct ucontext *uc = puc;
> +ucontext_t *uc = puc;
>  unsigned long *pcreg = &uc->uc_mcontext.psw.addr;
>  
>  if (*pcreg > (uintptr_t)safe_syscall_start
> diff --git a/linux-user/host/x86_64/hostdep.h 
> b/linux-user/host/x86_64/hostdep.h
> index 3b4259633e..a4fefb5114 100644
> --- a/linux-user/host/x86_64/hostdep.h
> +++ b/linux-user/host/x86_64/hostdep.h
> @@ -24,7 +24,7 @@ extern char safe_syscall_end[];
>  /* Adjust the signal context to rewind out of safe-syscall if we're in it */
>  static inline void rewind_if_in_safe_syscall(void *puc)
>  {
> -struct ucontext *uc = puc;
> +ucontext_t *uc = puc;
>  greg_t *pcreg = &uc->uc_mcontext.gregs[REG_RIP];
>  
>  if (*pcreg > (uintptr_t)safe_syscall_start
> diff --git a/linux-user/signal.c b/linux-user/signal.c
> index 3d18d1b3ee..2c55a4f600 100644
> --- a/linux-user/signal.c
> +++ b/linux-user/signal.c
> @@ -3346,7 +3346,7 @@ static void setup_rt_frame(int sig, struct 
> target_sigaction *ka,
>  *
>  *   a0 = signal number
>  *   a1 = pointer to siginfo_t
> -*   a2 = pointer to struct ucontext
> +*   a2 = pointer to ucontext_t
>  *
>  * $25 and PC point to the signal handler, $29 points to the
>  * struct sigframe.
> @@ -3733,7 +3733,7 @@ struct target_signal_frame {
>  
>  struct rt_signal_frame {
>  siginfo_t info;
> -struct ucontext uc;
> +ucontext_t uc;
>  uint32

[Qemu-devel] [PATCH v2 0/7] qdev: Introduce DEFINE_PROP_LINK

2017-06-29 Thread Fam Zheng

v2: Create a new header for link properties. [Paolo]
Don't wrap, use PropertyInfo.create() (much better diffstat, yay!).
[Paolo]

Link properties of devices created with object_property_add_link() are not
reflected in HMP "info qtree". For example, whether a virtio-blk device has an
iothread (i.e. has data plane enabled) cannot be introspected easily.

Introduce a new type of qdev property macro to fix that.

Fam Zheng (7):
  qom: Make link property API public
  qom: Handle property lookup failure in object_resolve_link
  qdev: Introduce PropertyInfo.create
  qdev: Introduce DEFINE_PROP_LINK
  virtio-blk: Use DEFINE_PROP_LINK
  virtio-scsi: Use DEFINE_PROP_LINK
  virtio-rng: Use DEFINE_PROP_LINK

 hw/block/dataplane/virtio-blk.c |  2 +-
 hw/block/virtio-blk.c   |  7 +++
 hw/core/qdev-properties.c   | 16 
 hw/core/qdev.c  | 31 +++
 hw/scsi/virtio-scsi-dataplane.c |  2 +-
 hw/scsi/virtio-scsi.c   | 15 ---
 hw/virtio/virtio-pci.c  |  6 --
 hw/virtio/virtio-rng.c  | 16 
 include/hw/qdev-core.h  |  6 ++
 include/hw/qdev-properties.h| 11 +++
 include/hw/virtio/virtio-blk.h  |  2 +-
 include/hw/virtio/virtio-rng.h  |  2 +-
 include/hw/virtio/virtio-scsi.h |  2 +-
 include/qom/link-property.h | 31 +++
 qom/object.c| 24 +++-
 15 files changed, 110 insertions(+), 63 deletions(-)
 create mode 100644 include/qom/link-property.h

-- 
2.9.4

[Qemu-devel] [PATCH v2 2/7] qom: Handle property lookup failure in object_resolve_link

2017-06-29 Thread Fam Zheng

Since we have made object_set_link_property a public function, it is
possible that it will be called with a nonexistent property name.  Let's
handle this case and report the error to avoid a segfault in the future.

Signed-off-by: Fam Zheng 
---
 qom/object.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/qom/object.c b/qom/object.c
index fdb8f0d..7ce97d9 100644
--- a/qom/object.c
+++ b/qom/object.c
@@ -1471,7 +1471,10 @@ static Object *object_resolve_link(Object *obj, const char *name,
     Object *target;
 
     /* Go from link to FOO.  */
-    type = object_property_get_type(obj, name, NULL);
+    type = object_property_get_type(obj, name, errp);
+    if (!type) {
+        return NULL;
+    }
     target_type = g_strndup(&type[5], strlen(type) - 6);
     target = object_resolve_path_type(path, target_type, &ambiguous);
 
-- 
2.9.4
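
The NULL check in the hunk above matters because the following line indexes
into the type string: &type[5] skips the "link<" prefix and strlen(type) - 6
drops the prefix plus the trailing '>'. A glib-free sketch of the same
extraction, under the assumption that link types are always spelled
"link<FOO>" (the helper name is hypothetical, and the extra format checks go
beyond what the patch does):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Extract "FOO" from a type string of the form "link<FOO>".
 * Returns a newly allocated string, or NULL if 'type' is NULL or does
 * not have that form. The caller frees the result.
 */
static char *link_target_type(const char *type)
{
    size_t len;
    char *target;

    if (type == NULL) {
        return NULL;            /* the case the patch guards against */
    }
    len = strlen(type);
    if (len < 7 || strncmp(type, "link<", 5) != 0 || type[len - 1] != '>') {
        return NULL;
    }
    /* Skip the 5-byte "link<" prefix and drop the trailing '>'. */
    target = malloc(len - 6 + 1);
    if (target == NULL) {
        return NULL;
    }
    memcpy(target, type + 5, len - 6);
    target[len - 6] = '\0';
    return target;
}
```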

[Qemu-devel] [PATCH v2 6/7] virtio-scsi: Use DEFINE_PROP_LINK

2017-06-29 Thread Fam Zheng

Signed-off-by: Fam Zheng 
---
 hw/scsi/virtio-scsi-dataplane.c |  2 +-
 hw/scsi/virtio-scsi.c   | 15 ---
 hw/virtio/virtio-pci.c  |  2 --
 include/hw/virtio/virtio-scsi.h |  2 +-
 4 files changed, 6 insertions(+), 15 deletions(-)

diff --git a/hw/scsi/virtio-scsi-dataplane.c b/hw/scsi/virtio-scsi-dataplane.c
index 944ea4e..887c100 100644
--- a/hw/scsi/virtio-scsi-dataplane.c
+++ b/hw/scsi/virtio-scsi-dataplane.c
@@ -40,7 +40,7 @@ void virtio_scsi_dataplane_setup(VirtIOSCSI *s, Error **errp)
 error_setg(errp, "ioeventfd is required for iothread");
 return;
 }
-s->ctx = iothread_get_aio_context(vs->conf.iothread);
+s->ctx = iothread_get_aio_context(IOTHREAD(vs->conf.iothread));
 } else {
 if (!virtio_device_ioeventfd_enabled(vdev)) {
 return;
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index f46f06d..afe4389 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -897,16 +897,6 @@ static void virtio_scsi_device_realize(DeviceState *dev, 
Error **errp)
 virtio_scsi_dataplane_setup(s, errp);
 }
 
-static void virtio_scsi_instance_init(Object *obj)
-{
-VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(obj);
-
-object_property_add_link(obj, "iothread", TYPE_IOTHREAD,
- (Object **)&vs->conf.iothread,
- qdev_prop_allow_set_link_before_realize,
- OBJ_PROP_LINK_UNREF_ON_RELEASE, &error_abort);
-}
-
 void virtio_scsi_common_unrealize(DeviceState *dev, Error **errp)
 {
 VirtIODevice *vdev = VIRTIO_DEVICE(dev);
@@ -934,6 +924,10 @@ static Property virtio_scsi_properties[] = {
VIRTIO_SCSI_F_HOTPLUG, true),
 DEFINE_PROP_BIT("param_change", VirtIOSCSI, host_features,
 VIRTIO_SCSI_F_CHANGE, true),
+DEFINE_PROP_LINK("iothread", VirtIOSCSI, parent_obj.conf.iothread,
+ TYPE_IOTHREAD,
+ qdev_prop_allow_set_link_before_realize,
+ OBJ_PROP_LINK_UNREF_ON_RELEASE),
 DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -988,7 +982,6 @@ static const TypeInfo virtio_scsi_info = {
 .name = TYPE_VIRTIO_SCSI,
 .parent = TYPE_VIRTIO_SCSI_COMMON,
 .instance_size = sizeof(VirtIOSCSI),
-.instance_init = virtio_scsi_instance_init,
 .class_init = virtio_scsi_class_init,
 .interfaces = (InterfaceInfo[]) {
 { TYPE_HOTPLUG_HANDLER },
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index e6960ae..eb03ba5 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -2065,8 +2065,6 @@ static void virtio_scsi_pci_instance_init(Object *obj)
 
 virtio_instance_init_common(obj, &dev->vdev, sizeof(dev->vdev),
 TYPE_VIRTIO_SCSI);
-object_property_add_alias(obj, "iothread", OBJECT(&dev->vdev), "iothread",
-  &error_abort);
 }
 
 static const TypeInfo virtio_scsi_pci_info = {
diff --git a/include/hw/virtio/virtio-scsi.h b/include/hw/virtio/virtio-scsi.h
index de6ae5a..1decc40 100644
--- a/include/hw/virtio/virtio-scsi.h
+++ b/include/hw/virtio/virtio-scsi.h
@@ -56,7 +56,7 @@ struct VirtIOSCSIConf {
 #endif
 CharBackend chardev;
 uint32_t boot_tpgt;
-IOThread *iothread;
+Object *iothread;
 };
 
 struct VirtIOSCSI;
-- 
2.9.4

[Qemu-devel] [PATCH v2 3/7] qdev: Introduce PropertyInfo.create

2017-06-29 Thread Fam Zheng

This allows a property implementation to provide a specialized property
creation method.

Update conditions guarding property types accordingly.

Signed-off-by: Fam Zheng 
---
 hw/core/qdev.c | 31 +++
 include/hw/qdev-core.h |  1 +
 2 files changed, 20 insertions(+), 12 deletions(-)

diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index 849952a..ec63fe0 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -744,6 +744,10 @@ static void qdev_property_add_legacy(DeviceState *dev, 
Property *prop,
 return;
 }
 
+if (prop->info->create) {
+return;
+}
+
 name = g_strdup_printf("legacy-%s", prop->name);
 object_property_add(OBJECT(dev), name, "str",
 prop->info->print ? qdev_get_legacy_property : 
prop->info->get,
@@ -770,20 +774,23 @@ void qdev_property_add_static(DeviceState *dev, Property 
*prop,
 Error *local_err = NULL;
 Object *obj = OBJECT(dev);
 
-/*
- * TODO qdev_prop_ptr does not have getters or setters.  It must
- * go now that it can be replaced with links.  The test should be
- * removed along with it: all static properties are read/write.
- */
-if (!prop->info->get && !prop->info->set) {
-return;
+if (prop->info->create) {
+prop->info->create(obj, prop, &local_err);
+} else {
+/*
+ * TODO qdev_prop_ptr does not have getters or setters.  It must
+ * go now that it can be replaced with links.  The test should be
+ * removed along with it: all static properties are read/write.
+ */
+if (!prop->info->get && !prop->info->set) {
+return;
+}
+object_property_add(obj, prop->name, prop->info->name,
+prop->info->get, prop->info->set,
+prop->info->release,
+prop, &local_err);
 }
 
-object_property_add(obj, prop->name, prop->info->name,
-prop->info->get, prop->info->set,
-prop->info->release,
-prop, &local_err);
-
 if (local_err) {
 error_propagate(errp, local_err);
 return;
diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 9d7c1c0..33518ee 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -241,6 +241,7 @@ struct PropertyInfo {
 const char * const *enum_table;
 int (*print)(DeviceState *dev, Property *prop, char *dest, size_t len);
 void (*set_default_value)(Object *obj, const Property *prop);
+void (*create)(Object *obj, Property *prop, Error **errp);
 ObjectPropertyAccessor *get;
 ObjectPropertyAccessor *set;
 ObjectPropertyRelease *release;
-- 
2.9.4
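
The control flow that this patch gives qdev_property_add_static() can be
shown in isolation. The sketch below uses reduced, hypothetical stand-ins for
Property and PropertyInfo (the struct, the enum and the int flags are not
QEMU types) and only models which path is taken:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical stand-in for QEMU's PropertyInfo. */
struct prop_info {
    void (*create)(void *obj, const char *name, int *calls);
    int has_get;
    int has_set;
};

enum add_result { ADDED_CUSTOM, ADDED_GENERIC, SKIPPED };

/* Example create hook that only counts how often it was invoked. */
static void count_create(void *obj, const char *name, int *calls)
{
    (void)obj;
    (void)name;
    (*calls)++;
}

static enum add_result add_static_property(void *obj, const char *name,
                                           const struct prop_info *info,
                                           int *calls)
{
    assert(info != NULL);

    /* A specialized creation hook takes precedence over get/set. */
    if (info->create) {
        info->create(obj, name, calls);
        return ADDED_CUSTOM;
    }
    /* Without any accessor there is nothing to register. */
    if (!info->has_get && !info->has_set) {
        return SKIPPED;
    }
    /* The generic object_property_add() call would go here. */
    return ADDED_GENERIC;
}
```

The point of checking create first is that a link property has neither a
getter nor a setter in its PropertyInfo, so it would otherwise hit the early
return meant for qdev_prop_ptr.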

[Qemu-devel] [PATCH v2 1/7] qom: Make link property API public

2017-06-29 Thread Fam Zheng

The get/set pair and the struct will be reused by the qdev link property,
so make them public.

Signed-off-by: Fam Zheng 
---
 include/qom/link-property.h | 31 +++
 qom/object.c| 19 +++
 2 files changed, 38 insertions(+), 12 deletions(-)
 create mode 100644 include/qom/link-property.h

diff --git a/include/qom/link-property.h b/include/qom/link-property.h
new file mode 100644
index 000..f3c3816
--- /dev/null
+++ b/include/qom/link-property.h
@@ -0,0 +1,31 @@
+/*
+ * QOM link property
+ *
+ * Copyright IBM, Corp. 2011
+ *
+ * Authors:
+ *  Anthony Liguori   
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef QOM_LINK_PROPERTY_H
+#define QOM_LINK_PROPERTY_H
+
+#include "qom/object.h"
+
+typedef struct {
+Object **child;
+void (*check)(Object *, const char *, Object *, Error **);
+ObjectPropertyLinkFlags flags;
+} LinkProperty;
+
+void object_get_link_property(Object *obj, Visitor *v,
+  const char *name, void *opaque,
+  Error **errp);
+void object_set_link_property(Object *obj, Visitor *v,
+  const char *name, void *opaque,
+  Error **errp);
+#endif
diff --git a/qom/object.c b/qom/object.c
index 5f6fdfa..fdb8f0d 100644
--- a/qom/object.c
+++ b/qom/object.c
@@ -28,6 +28,7 @@
 #include "qapi/qmp/qobject.h"
 #include "qapi/qmp/qbool.h"
 #include "qapi/qmp/qstring.h"
+#include "qom/link-property.h"
 
 #define MAX_INTERFACES 32
 
@@ -1434,15 +1435,9 @@ void object_property_allow_set_link(Object *obj, const 
char *name,
 /* Allow the link to be set, always */
 }
 
-typedef struct {
-Object **child;
-void (*check)(Object *, const char *, Object *, Error **);
-ObjectPropertyLinkFlags flags;
-} LinkProperty;
-
-static void object_get_link_property(Object *obj, Visitor *v,
- const char *name, void *opaque,
- Error **errp)
+void object_get_link_property(Object *obj, Visitor *v,
+  const char *name, void *opaque,
+  Error **errp)
 {
 LinkProperty *lprop = opaque;
 Object **child = lprop->child;
@@ -1498,9 +1493,9 @@ static Object *object_resolve_link(Object *obj, const 
char *name,
 return target;
 }
 
-static void object_set_link_property(Object *obj, Visitor *v,
- const char *name, void *opaque,
- Error **errp)
+void object_set_link_property(Object *obj, Visitor *v,
+  const char *name, void *opaque,
+  Error **errp)
 {
 Error *local_err = NULL;
 LinkProperty *prop = opaque;
-- 
2.9.4

[Qemu-devel] [PATCH v2 4/7] qdev: Introduce DEFINE_PROP_LINK

2017-06-29 Thread Fam Zheng

This property can be used to replace the object_property_add_link in
device code, to add a link to other objects, which is a common pattern.

Signed-off-by: Fam Zheng 
---
 hw/core/qdev-properties.c| 16 
 include/hw/qdev-core.h   |  5 +
 include/hw/qdev-properties.h | 11 +++
 3 files changed, 32 insertions(+)

diff --git a/hw/core/qdev-properties.c b/hw/core/qdev-properties.c
index 68cd653..7c11eb8 100644
--- a/hw/core/qdev-properties.c
+++ b/hw/core/qdev-properties.c
@@ -1192,3 +1192,19 @@ PropertyInfo qdev_prop_size = {
 .set = set_size,
 .set_default_value = set_default_value_uint,
 };
+
+/* --- object link property --- */
+
+static void create_link_property(Object *obj, Property *prop, Error **errp)
+{
+Object **child = qdev_get_prop_ptr(DEVICE(obj), prop);
+
+object_property_add_link(obj, prop->name, prop->link_type,
+ child, prop->link.check,
+ prop->link.flags, errp);
+}
+
+PropertyInfo qdev_prop_link = {
+.name = "link",
+.create = create_link_property,
+};
diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 33518ee..40afb3d 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -5,6 +5,7 @@
 #include "qemu/option.h"
 #include "qemu/bitmap.h"
 #include "qom/object.h"
+#include "qom/link-property.h"
 #include "hw/irq.h"
 #include "hw/hotplug.h"
 
@@ -233,6 +234,10 @@ struct Property {
 int  arrayoffset;
 PropertyInfo *arrayinfo;
 int  arrayfieldsize;
+/* Only @check and @flags are used; @child is unuseful because we need a
+ * dynamic pointer in @obj as derived from @offset. */
+LinkProperty link;
+const char   *link_type;
 };
 
 struct PropertyInfo {
diff --git a/include/hw/qdev-properties.h b/include/hw/qdev-properties.h
index 39bf4b2..767c10b 100644
--- a/include/hw/qdev-properties.h
+++ b/include/hw/qdev-properties.h
@@ -30,6 +30,7 @@ extern PropertyInfo qdev_prop_pci_devfn;
 extern PropertyInfo qdev_prop_blocksize;
 extern PropertyInfo qdev_prop_pci_host_devaddr;
 extern PropertyInfo qdev_prop_arraylen;
+extern PropertyInfo qdev_prop_link;
 
 #define DEFINE_PROP(_name, _state, _field, _prop, _type) { \
 .name  = (_name),\
@@ -117,6 +118,16 @@ extern PropertyInfo qdev_prop_arraylen;
 .arrayoffset = offsetof(_state, _arrayfield),   \
 }
 
+#define DEFINE_PROP_LINK(_name, _state, _field, _type, _check, _flags) {\
+.name = (_name),\
+.info = &(qdev_prop_link),  \
+.offset = offsetof(_state, _field)  \
++ type_check(Object *, typeof_field(_state, _field)),   \
+.link.check = _check,   \
+.link.flags = _flags,   \
+.link_type  = _type,\
+}
+
 #define DEFINE_PROP_UINT8(_n, _s, _f, _d)   \
 DEFINE_PROP_UNSIGNED(_n, _s, _f, _d, qdev_prop_uint8, uint8_t)
 #define DEFINE_PROP_UINT16(_n, _s, _f, _d)  \
-- 
2.9.4

[Qemu-devel] [PATCH v2 5/7] virtio-blk: Use DEFINE_PROP_LINK

2017-06-29 Thread Fam Zheng

Signed-off-by: Fam Zheng 
---
 hw/block/dataplane/virtio-blk.c | 2 +-
 hw/block/virtio-blk.c   | 7 +++
 hw/virtio/virtio-pci.c  | 2 --
 include/hw/virtio/virtio-blk.h  | 2 +-
 4 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index 5556f0e..6fdc6f6 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -116,7 +116,7 @@ void virtio_blk_data_plane_create(VirtIODevice *vdev, 
VirtIOBlkConf *conf,
 s->conf = conf;
 
 if (conf->iothread) {
-s->iothread = conf->iothread;
+s->iothread = IOTHREAD(conf->iothread);
 object_ref(OBJECT(s->iothread));
 s->ctx = iothread_get_aio_context(s->iothread);
 } else {
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 604d37d..aa2c38c 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -981,10 +981,6 @@ static void virtio_blk_instance_init(Object *obj)
 {
 VirtIOBlock *s = VIRTIO_BLK(obj);
 
-object_property_add_link(obj, "iothread", TYPE_IOTHREAD,
- (Object **)&s->conf.iothread,
- qdev_prop_allow_set_link_before_realize,
- OBJ_PROP_LINK_UNREF_ON_RELEASE, NULL);
 device_add_bootindex_property(obj, &s->conf.conf.bootindex,
   "bootindex", "/disk@0,0",
   DEVICE(obj), NULL);
@@ -1012,6 +1008,9 @@ static Property virtio_blk_properties[] = {
 DEFINE_PROP_BIT("request-merging", VirtIOBlock, conf.request_merging, 0,
 true),
 DEFINE_PROP_UINT16("num-queues", VirtIOBlock, conf.num_queues, 1),
+DEFINE_PROP_LINK("iothread", VirtIOBlock, conf.iothread, TYPE_IOTHREAD,
+ qdev_prop_allow_set_link_before_realize,
+ OBJ_PROP_LINK_UNREF_ON_RELEASE),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 20d6a08..e6960ae 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1996,8 +1996,6 @@ static void virtio_blk_pci_instance_init(Object *obj)
 
 virtio_instance_init_common(obj, &dev->vdev, sizeof(dev->vdev),
 TYPE_VIRTIO_BLK);
-object_property_add_alias(obj, "iothread", OBJECT(&dev->vdev),"iothread",
-  &error_abort);
 object_property_add_alias(obj, "bootindex", OBJECT(&dev->vdev),
   "bootindex", &error_abort);
 }
diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h
index d3c8a6f..94a9f0c 100644
--- a/include/hw/virtio/virtio-blk.h
+++ b/include/hw/virtio/virtio-blk.h
@@ -33,7 +33,7 @@ struct virtio_blk_inhdr
 struct VirtIOBlkConf
 {
 BlockConf conf;
-IOThread *iothread;
+Object *iothread;
 char *serial;
 uint32_t scsi;
 uint32_t config_wce;
-- 
2.9.4

[Qemu-devel] [PATCH v2 7/7] virtio-rng: Use DEFINE_PROP_LINK

2017-06-29 Thread Fam Zheng

Signed-off-by: Fam Zheng 
---
 hw/virtio/virtio-pci.c |  2 --
 hw/virtio/virtio-rng.c | 16 
 include/hw/virtio/virtio-rng.h |  2 +-
 3 files changed, 5 insertions(+), 15 deletions(-)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index eb03ba5..0938db4 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -2459,8 +2459,6 @@ static void virtio_rng_initfn(Object *obj)
 
 virtio_instance_init_common(obj, &dev->vdev, sizeof(dev->vdev),
 TYPE_VIRTIO_RNG);
-object_property_add_alias(obj, "rng", OBJECT(&dev->vdev), "rng",
-  &error_abort);
 }
 
 static const TypeInfo virtio_rng_pci_info = {
diff --git a/hw/virtio/virtio-rng.c b/hw/virtio/virtio-rng.c
index a6ee501..218778b 100644
--- a/hw/virtio/virtio-rng.c
+++ b/hw/virtio/virtio-rng.c
@@ -199,7 +199,7 @@ static void virtio_rng_device_realize(DeviceState *dev, 
Error **errp)
  "rng", NULL);
 }
 
-vrng->rng = vrng->conf.rng;
+vrng->rng = RNG_BACKEND(vrng->conf.rng);
 if (vrng->rng == NULL) {
 error_setg(errp, "'rng' parameter expects a valid object");
 return;
@@ -246,6 +246,9 @@ static Property virtio_rng_properties[] = {
  */
 DEFINE_PROP_UINT64("max-bytes", VirtIORNG, conf.max_bytes, INT64_MAX),
 DEFINE_PROP_UINT32("period", VirtIORNG, conf.period_ms, 1 << 16),
+DEFINE_PROP_LINK("rng", VirtIORNG, conf.rng, TYPE_RNG_BACKEND,
+ qdev_prop_allow_set_link_before_realize,
+ OBJ_PROP_LINK_UNREF_ON_RELEASE),
 DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -262,21 +265,10 @@ static void virtio_rng_class_init(ObjectClass *klass, 
void *data)
 vdc->get_features = get_features;
 }
 
-static void virtio_rng_initfn(Object *obj)
-{
-VirtIORNG *vrng = VIRTIO_RNG(obj);
-
-object_property_add_link(obj, "rng", TYPE_RNG_BACKEND,
- (Object **)&vrng->conf.rng,
- qdev_prop_allow_set_link_before_realize,
- OBJ_PROP_LINK_UNREF_ON_RELEASE, NULL);
-}
-
 static const TypeInfo virtio_rng_info = {
 .name = TYPE_VIRTIO_RNG,
 .parent = TYPE_VIRTIO_DEVICE,
 .instance_size = sizeof(VirtIORNG),
-.instance_init = virtio_rng_initfn,
 .class_init = virtio_rng_class_init,
 };
 
diff --git a/include/hw/virtio/virtio-rng.h b/include/hw/virtio/virtio-rng.h
index 922dce7..8d45597 100644
--- a/include/hw/virtio/virtio-rng.h
+++ b/include/hw/virtio/virtio-rng.h
@@ -23,7 +23,7 @@
 OBJECT_GET_PARENT_CLASS(obj, TYPE_VIRTIO_RNG)
 
 struct VirtIORNGConf {
-RngBackend *rng;
+Object *rng;
 uint64_t max_bytes;
 uint32_t period_ms;
 RngRandom *default_backend;
-- 
2.9.4

Re: [Qemu-devel] [RFC PATCH 00/14] Implement network booting directly into the s390-ccw BIOS

2017-06-29 Thread Viktor Mihajlovski

On 29.06.2017 09:58, Thomas Huth wrote:
> On 28.06.2017 17:02, Viktor Mihajlovski wrote:
>> On 28.06.2017 12:56, Thomas Huth wrote:
>>> On 28.06.2017 10:02, Thomas Huth wrote:
 On 28.06.2017 09:28, Viktor Mihajlovski wrote:
> On 27.06.2017 23:40, Thomas Huth wrote:
> [...]
 - Is it OK to require loading an .INS file first? Or does anybody
   have a better idea how to load multiple files (kernel, initrd,
   etc. ...)?
>>> It would be nice to support PXE-style boot, because the majority of boot
>>> servers is set up that way. A straightforward way would be to do a PXE
>>> emulation by attempting to download a pxelinux.cfg from the well-known
>>> locations, parsing the content (menu) and finally load the kernel,
>>> initrd and set the kernel command line as specified there. (I know, but
>>> you're already parsing the INS-File).
>>
>> Please, don't mix up PXE and pxelinux (since you've used both terms in
>> above paragraph). Assuming that you're only talking about pxlinux config
>> files... are they that common on s390x already? Using the pxelinux
>> config file syntax sounds like we would be completely bound to only
>> loading Linux guests to me, since the boot loader has to know where to
>> load the initrd and how to patch the kernel so that it can find the 
>> initrd.
>> Using .INS files sounds more flexible to me instead, since you can also
>> specify the addresses here - so you can theoretically also load other
>> guest kernels, and that's IMHO the better approach since a firmware
>> should stay as generic as possible.
>>
> In order to be consumable, the network boot should support the most
> common configurations. I would think that most network boot servers are
> setup as PXE boot servers using pxelinux configs.

 Are you really sure about the popularity of pxelinux? It's just one
 flavor of secondary stage network boot loaders - which also only exist
 on x86 so far, as far as I know.
>>>
>>> And it seems like it also only works with legacy BIOSes, i.e. you can
>>> not use it on EFI-only systems, if I've got that right:
>>>
>>> https://github.com/openSUSE/kiwi/wiki/Setup-PXE-boot-with-EFI-Using-GRUB2
>>>
>>> So I guess the significance of pxelinux will very likely decrease in
>>> the next years...
>>>
>> Maybe, but supposed goners tend to linger more often than not. E.g., the
>> syslinux project offers an EFI bootloader called syslinux.efi equivalent
>> to the pxelinux.0 BIOS loader.
>> Further, the OPAL firmware of newer POWER systems embeds petitboot and
>> thus offers PXELINUX-compatible network boot as well.
> 
> OK, true, ... you're slowly getting me convinced that this pxelinux.cfg
> stuff is maybe not such a bad idea after all ... So I guess supporting
> at least the basic commands from the pxelinux config file would be
> appropriate... (the full set of commands is huge, see
> http://www.syslinux.org/wiki/index.php?title=Config )
> 
>> I appreciate the idea of a proper BOOTP implementation though, so maybe
>> a compromise could be to start off with your proposal with the slight
>> modification that the final boot action is controlled by the bootfile
>> content (file magic), similar to what you suggested in order to support
>> both insfile and binary kernel. PXELINUX emulation could be triggered by
>> a specially crafted bootfile then. What do you think?
> 
> Yes, something like this could work:
> 
> 1) Do DHCP to get TFTP server address and boot file name
> 2) Load the boot file from the TFTP server to address 0
> 3) If the boot file name ended with ".ins" or ".INS" (and the content
>    starts with the "* " magic), treat it as an .INS file and load the
>    files that are listed in there accordingly
> 4) If the boot file looks like a kernel, start it directly
> 5) If not successful in 3 or 4, start looking for a pxelinux config
>    file by trying to download the config files as specified in
>    http://www.syslinux.org/wiki/index.php?title=PXELINUX#Configuration
>    and then parse the file and load the kernel + initrd accordingly.
> 
> Quite a bit of work, so I'll continue to ignore 5 for the first
> versions, but I agree now that it can certainly be added later.
That sounds more than fair. Thanks!
> 
>  Thomas
> 


-- 

Mit freundlichen Grüßen/Kind Regards
   Viktor Mihajlovski

IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martina Köderitz
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294

Re: [Qemu-devel] [PATCH v6 3/3] migration: add bitmap for received page

2017-06-29 Thread Alexey

On Thu, Jun 29, 2017 at 11:28:02AM +0800, Peter Xu wrote:
> On Wed, Jun 28, 2017 at 08:49:32AM -0400, Alexey Perevalov wrote:
> 
> [...]
> 
> > @@ -2324,8 +2352,14 @@ static int ram_load_setup(QEMUFile *f, void *opaque)
> 
> [1]
> 
> >  
> >  static int ram_load_cleanup(void *opaque)
> >  {
> > +RAMBlock *rb;
> >  xbzrle_load_cleanup();
> >  compress_threads_load_cleanup();
> > +
> > +RAMBLOCK_FOREACH(rb) {
> > +g_free(rb->receivedmap);
> > +rb->receivedmap = NULL;
> > +}
> >  return 0;
> >  }
> 
> Can we put init into ram_load_setup()? (I don't have the codes, but I
> see this function above at [1], I suppose it'll be called on dest side
> before migration really starts?)
> 
> Sorry for my harshness, but this just looks weird. I am not asking
> that we _must_ put it somewhere outside RAM codes, but at least I
> think they should be paired. If you cleanup the bitmap in RAM code,
> why not you init it in RAM code as well? Did I miss anything again?
> 
No, it looks like I missed that. I have to admit I didn't fully review
Juan's patches; I have only just now checked the fourth version of his
patch set. I need to take part in the discussion of Juan's patch set.
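
Just to illustrate the pairing Peter is asking for, here is a minimal
standalone sketch in which the bitmap is allocated in the same layer that
frees it. The struct is a reduced, hypothetical stand-in for RAMBlock, and
load_setup()/load_cleanup() are placeholder names for ram_load_setup() and
ram_load_cleanup(); this is not the code from the patch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, reduced stand-in for RAMBlock. */
struct ram_block {
    size_t pages;               /* number of pages in this block */
    unsigned long *receivedmap; /* one bit per page, NULL when unused */
    struct ram_block *next;
};

#define BITS_PER_ULONG (8 * sizeof(unsigned long))

/*
 * Allocate a zeroed bitmap for every block. Returns 0 on success and
 * -1 on allocation failure; in that case the caller is expected to run
 * load_cleanup(), which copes with maps that are still NULL.
 */
static int load_setup(struct ram_block *list)
{
    struct ram_block *rb;

    for (rb = list; rb != NULL; rb = rb->next) {
        size_t longs = (rb->pages + BITS_PER_ULONG - 1) / BITS_PER_ULONG;

        rb->receivedmap = calloc(longs ? longs : 1, sizeof(unsigned long));
        if (rb->receivedmap == NULL) {
            return -1;
        }
    }
    return 0;
}

/* Counterpart of load_setup(): free every bitmap and reset the pointer. */
static void load_cleanup(struct ram_block *list)
{
    struct ram_block *rb;

    for (rb = list; rb != NULL; rb = rb->next) {
        free(rb->receivedmap);
        rb->receivedmap = NULL;
    }
}
```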
> -- 
> Peter Xu
> 

-- 

BR
Alexey

Re: [Qemu-devel] [RFC PATCH 00/14] Implement network booting directly into the s390-ccw BIOS

2017-06-29 Thread Thomas Huth

On 28.06.2017 10:59, Thomas Huth wrote:
> On 28.06.2017 09:43, Christian Borntraeger wrote:
>> Interesting work, thanks for giving it a try.
>>
>> On 06/27/2017 01:48 PM, Thomas Huth wrote:
>>> It's already possible to do a network boot of an s390x guest with an
>>> external netboot image (based on a Linux installation), but it would
>>> be much more convenient if the s390-ccw firmware supported network
>>> booting right out of the box, without the need to assemble such an
>>> external image first.
>>>
>>> This patch series now introduces network booting via DHCP and TFTP
>>> directly into the s390-ccw firmware by re-using the networking stack
>>> from the SLOF firmware (see https://github.com/aik/SLOF/ for details),
>>> and adds a driver for virtio-net-ccw devices.
>>
>> What is the licensing situation with SLOF?
> 
> All its code is licensed under the 3-Clause BSD License:
> 
>  https://github.com/aik/SLOF/blob/master/LICENSE
> 
> AFAIK it should be fine to use such code in GPL-ed sources like the
> s390-ccw firmware, shouldn't it?
> 
>>> - Is it OK to require loading an .INS file first? Or does anybody
>>>   have a better idea how to load multiple files (kernel, initrd,
>>>   etc. ...)?
>>
>> I agree with Viktor that supporting a pxelinux config file is probably the
>> better way, since this is what all existing server admins understand. For
>> the time being Linux will be the most relevant guest and requiring changes in
>> management infrastructure will certainly make things very hard.
> 
> I have to say, the more I read about pxelinux, the more I think we
> should *not* directly support this in the firmware. pxelinux is clearly
> a secondary stage bootloader, with a rather complex config file, and
> features like config file name guessing via MAC-address or IP-address
> ... if we really want to support that on s390x, too, I think it should
> stay in an external binary instead, and not directly incorporated into
> the s390-ccw firmware (so that users finally have the choice whether
> they want to use pxelinux-style config files or grub2 or something
> different one day).
> 
> I guess the .INS file parsing in firmware was already a bad idea... all
> other typical firmware implementations can also only load one file - and
> if you need to load multiple files, you've got to use a secondary stage
> bootloader like pxelinux, yaboot or grub2. So if we agree to add network
> booting directly into the s390-ccw firmware, I think we should do the
> same here and only support loading of one file (without an additional
> config file). But then the question is, of course, whether it makes
> sense to add that support to the firmware at all, or whether we should
> simply continue with the current "s390-netboot.img" secondary-loader
> approach...
> 
>>> - The code from SLOF uses a different coding style (TABs instead
>>>   of space) ... is it OK to keep that coding style here so we
>>>   can share patches between SLOF and s390-ccw more easily?
>>
>> Is it possible to enhance SLOF and then link to the existing SLOF code?
> 
> I already submitted some of the clean-up patches to the SLOF mailing
> list, and Alexey has merged them now, e.g.:
> 
>  https://github.com/aik/SLOF/commit/140c3f39db4ce4c0
> 
> And as I already mentioned in my reply to David, it should theoretically
> be possible to use the code from the roms/SLOF submodule in QEMU ...
> but then we've got to deal with sudden changes in the SLOF repository
> which might cause unwanted problems in the s390-ccw firmware. I guess we
> could give it a try (the libc code is very, very stable in SLOF, and the
> libnet code also changes only very seldom) - of course only if we
> really decide that we want to have TFTP support directly in the
> firmware. If we rather want to continue with the s390-netboot.img
> approach instead, I have got to reconsider whether I continue with my
> efforts by putting that stuff into an external binary, or whether it
> makes more sense to look into porting pxelinux, grub2 or petitboot
> instead...

OK, thinking about all of this again, what do you think of the following
approach:

Let's not include network booting code in the s390-ccw.img, but
keep the way that it uses an external s390-netboot.img for this job.

I'll then try to add an additional s390-netboot.img target to the
Makefile, which will only be built if the roms/SLOF submodule has been
checked out. That target then uses the libc and libnet from the SLOF
submodule and links it with my virtio-net driver and some other required
code from the s390-ccw bios into a s390-netboot.img binary that can be
used for network booting.

This way ...
- the main s390-ccw.img stays independent from the changes in the SLOF,
and we can tackle possible problems in the s390-netboot.img
independently. And we don't have to deal with coding style issues in the
libc and libnet.
- we can ship a s390-netboot.img with QEMU directly, so that network
booting is possible out of the box without forcing t

Re: [Qemu-devel] [PATCH v2] xenfb: remove xen_init_display "temporary" hack

2017-06-29 Thread Paul Durrant

> -----Original Message-----
> From: Stefano Stabellini [mailto:sstabell...@kernel.org]
> Sent: 28 June 2017 19:37
> To: xen-de...@lists.xenproject.org; qemu-devel@nongnu.org
> Cc: sstabell...@kernel.org; peter.mayd...@linaro.org; Anthony Perard
> ; kra...@redhat.com; Paul Durrant
> 
> Subject: [PATCH v2] xenfb: remove xen_init_display "temporary" hack
> 
> Initialize xenfb properly, as all other backends, from its own
> "initialise" function.
> 
> Signed-off-by: Stefano Stabellini 
> 
> ---
> Changes in v2:
> - remove xen_init_display from xen_backend.h
> - handle cases where vkbd is missing
> 
> diff --git a/hw/display/xenfb.c b/hw/display/xenfb.c
> index e76c0d8..3b0168b 100644
> --- a/hw/display/xenfb.c
> +++ b/hw/display/xenfb.c
> @@ -71,7 +71,6 @@ struct XenFB {
>  int   fbpages;
>  int   feature_update;
>  int   bug_trigger;
> -int   have_console;
>  int   do_resize;
> 
>  struct {
> @@ -80,6 +79,7 @@ struct XenFB {
>  int   up_count;
>  int   up_fullscreen;
>  };
> +static const GraphicHwOps xenfb_ops;
> 
>  /*  */
> 
> @@ -855,6 +855,8 @@ static int fb_init(struct XenDevice *xendev)
>  static int fb_initialise(struct XenDevice *xendev)
>  {
>  struct XenFB *fb = container_of(xendev, struct XenFB, c.xendev);
> +struct XenDevice *xin;
> +struct XenInput *in;

I think the scope of 'in' can be limited to the 'else' clause where it is used 
below.

>  struct xenfb_page *fb_page;
>  int videoram;
>  int rc;
> @@ -877,16 +879,16 @@ static int fb_initialise(struct XenDevice *xendev)
>  if (rc != 0)
>   return rc;
> 
> -#if 0  /* handled in xen_init_display() for now */
> -if (!fb->have_console) {
> -fb->c.ds = graphic_console_init(xenfb_update,
> -xenfb_invalidate,
> -NULL,
> -NULL,
> -fb);
> -fb->have_console = 1;
> +fb->c.con = graphic_console_init(NULL, 0, &xenfb_ops, fb);
> +
> +xin = xen_pv_find_xendev("vkbd", xen_domid, 0);
> +if (xin == NULL) {
> +xen_pv_printf(xendev, 1, "xenfb is up, but vkbd is not present\n");
> +} else {
> +in = container_of(xin, struct XenInput, c.xendev);
> +in->c.con = fb->c.con;

I notice that in->c.con is tested in input_initialise() and it will fail if 
NULL, so there is an inherent race between the frontends and also processing of 
xenstore watches. That seems a little fragile at best, but the dependency of 
input_initialise() on in->c.con is also somewhat bogus as the vkbd backend is 
created for xenfv machine types whereas xenfb is only there for xenpv machine 
types. I think this needs to be fixed before xenfb can be relied on to 
initialize correctly in all circumstances.

  Paul

> +xen_be_check_state(xin);
>  }
> -#endif
> 
>  if (xenstore_read_fe_int(xendev, "feature-update", &fb-
> >feature_update) == -1)
>   fb->feature_update = 0;
> @@ -972,42 +974,3 @@ static const GraphicHwOps xenfb_ops = {
>  .gfx_update  = xenfb_update,
>  .update_interval = xenfb_update_interval,
>  };
> -
> -/*
> - * FIXME/TODO: Kill this.
> - * Temporary needed while DisplayState reorganization is in flight.
> - */
> -void xen_init_display(int domid)
> -{
> -struct XenDevice *xfb, *xin;
> -struct XenFB *fb;
> -struct XenInput *in;
> -int i = 0;
> -
> -wait_more:
> -i++;
> -main_loop_wait(true);
> -xfb = xen_pv_find_xendev("vfb", domid, 0);
> -xin = xen_pv_find_xendev("vkbd", domid, 0);
> -if (!xfb || !xin) {
> -if (i < 256) {
> -usleep(1);
> -goto wait_more;
> -}
> -xen_pv_printf(NULL, 1, "displaystate setup failed\n");
> -return;
> -}
> -
> -/* vfb */
> -fb = container_of(xfb, struct XenFB, c.xendev);
> -fb->c.con = graphic_console_init(NULL, 0, &xenfb_ops, fb);
> -fb->have_console = 1;
> -
> -/* vkbd */
> -in = container_of(xin, struct XenInput, c.xendev);
> -in->c.con = fb->c.con;
> -
> -/* retry ->init() */
> -xen_be_check_state(xin);
> -xen_be_check_state(xfb);
> -}
> diff --git a/hw/xenpv/xen_machine_pv.c b/hw/xenpv/xen_machine_pv.c
> index 79aef4e..31d2f25 100644
> --- a/hw/xenpv/xen_machine_pv.c
> +++ b/hw/xenpv/xen_machine_pv.c
> @@ -94,9 +94,6 @@ static void xen_init_pv(MachineState *machine)
> 
>  /* config cleanup hook */
>  atexit(xen_config_cleanup);
> -
> -/* setup framebuffer */
> -xen_init_display(xen_domid);
>  }
> 
>  static void xenpv_machine_init(MachineClass *mc)
> diff --git a/include/hw/xen/xen_backend.h
> b/include/hw/xen/xen_backend.h
> index 852c2ea..8a6fbcb 100644
> --- a/include/hw/xen/xen_backend.h
> +++ b/include/hw/xen/xen_backend.h
> @@ -55,8 +55

Re: [Qemu-devel] [PATCH v3 00/11] make dirty-bitmap byte-based

2017-06-29 Thread Vladimir Sementsov-Ogievskiy


Hi!

Can we apply my "[PATCH v22 00/30] qcow2: persistent dirty bitmaps"
first? It was already close to being merged a week ago, but I had to
rebase it on Paolo's new patches.


28.06.2017 20:55, Eric Blake wrote:

There are patches floating around to add NBD_CMD_BLOCK_STATUS,
but NBD wants to report status on byte granularity (even if the
reporting will probably be naturally aligned to sectors or even
much higher levels).  I've therefore started the task of
converting our block status code to report at a byte granularity
rather than sectors.

This is part two of that conversion: dirty-bitmap. Other parts
include bdrv_is_allocated (at v3 [1]) and replacing
bdrv_get_block_status with a byte based callback in all the
drivers (at v1, needs a rebase [3]).

Available as a tag at:
git fetch git://repo.or.cz/qemu/ericb.git nbd-byte-dirty-v3

Depends on Kevin's block branch and my v3 bdrv_is_allocated [1]

Since v2 [2], I had to rebase on top of Paolo's locking fixes;
patch v2 2/12 is gone, and many of the others had a lot of
context conflicts. But I felt the resolution was simple enough
that I kept R-b on all but patch 8.

[1] https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg06077.html
[2] https://lists.gnu.org/archive/html/qemu-devel/2017-05/msg03859.html
[3] https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg02642.html

(git backport-diff doesn't like the rename in 7/11)

001/11:[----] [--] 'dirty-bitmap: Report BlockDirtyInfo.count in bytes, as documented'
002/11:[0024] [FC] 'dirty-bitmap: Drop unused functions'
003/11:[----] [-C] 'dirty-bitmap: Track size in bytes'
004/11:[----] [-C] 'dirty-bitmap: Set iterator start by offset, not sector'
005/11:[----] [-C] 'dirty-bitmap: Change bdrv_dirty_iter_next() to report byte offset'
006/11:[----] [-C] 'dirty-bitmap: Change bdrv_get_dirty_count() to report bytes'
007/11:[down] 'dirty-bitmap: Change bdrv_get_dirty_locked() to take bytes'
008/11:[0036] [FC] 'dirty-bitmap: Change bdrv_[re]set_dirty_bitmap() to use bytes'
009/11:[----] [--] 'mirror: Switch mirror_dirty_init() to byte-based iteration'
010/11:[0001] [FC] 'dirty-bitmap: Switch bdrv_set_dirty() to bytes'
011/11:[----] [-C] 'dirty-bitmap: Convert internal hbitmap size/granularity'

Eric Blake (11):
   dirty-bitmap: Report BlockDirtyInfo.count in bytes, as documented
   dirty-bitmap: Drop unused functions
   dirty-bitmap: Track size in bytes
   dirty-bitmap: Set iterator start by offset, not sector
   dirty-bitmap: Change bdrv_dirty_iter_next() to report byte offset
   dirty-bitmap: Change bdrv_get_dirty_count() to report bytes
   dirty-bitmap: Change bdrv_get_dirty_locked() to take bytes
   dirty-bitmap: Change bdrv_[re]set_dirty_bitmap() to use bytes
   mirror: Switch mirror_dirty_init() to byte-based iteration
   dirty-bitmap: Switch bdrv_set_dirty() to bytes
   dirty-bitmap: Convert internal hbitmap size/granularity

  include/block/block_int.h|   2 +-
  include/block/dirty-bitmap.h |  28 
  block/backup.c   |   7 ++-
  block/dirty-bitmap.c | 105 +++
  block/io.c   |   6 +--
  block/mirror.c   |  73 +-
  migration/block.c|  12 +++--
  7 files changed, 79 insertions(+), 154 deletions(-)



--
Best regards,
Vladimir

Re: [Qemu-devel] [PATCH 1/6] libqos: fix typo in virtio.h QVirtQueue->used comment

2017-06-29 Thread Fam Zheng

On Wed, 06/28 19:47, Stefan Hajnoczi wrote:
> Signed-off-by: Stefan Hajnoczi 
> ---
>  tests/libqos/virtio.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tests/libqos/virtio.h b/tests/libqos/virtio.h
> index 3397a08..829de5e 100644
> --- a/tests/libqos/virtio.h
> +++ b/tests/libqos/virtio.h
> @@ -26,7 +26,7 @@ typedef struct QVirtioDevice {
>  typedef struct QVirtQueue {
>  uint64_t desc; /* This points to an array of struct vring_desc */
>  uint64_t avail; /* This points to a struct vring_avail */
> -uint64_t used; /* This points to a struct vring_desc */
> +uint64_t used; /* This points to a struct vring_used */
>  uint16_t index;
>  uint32_t size;
>  uint32_t free_head;
> -- 
> 2.9.4
> 
> 

Reviewed-by: Fam Zheng

Re: [Qemu-devel] [RFC PATCH 0/5] hotplug: fix premature rebinding of VFIO devices to host

2017-06-29 Thread Daniel P. Berrange

On Wed, Jun 28, 2017 at 07:24:55PM -0500, Michael Roth wrote:
> Hi everyone. Hoping to get some feedback on this approach, or some
> alternatives proposed below, to the following issue:
> 
> Currently libvirt immediately attempts to rebind a managed device back to the
> host driver when it receives a DEVICE_DELETED event from QEMU. This is
> problematic for 2 reasons:
> 
> 1) If multiple devices from a group are attached to a guest, this can move
>the group into a "non-viable" state where some devices are assigned to
>the host and some to the guest.
> 
> 2) When QEMU emits the DEVICE_DELETED event, there's still a "finalize" phase
>where additional cleanup occurs. In most cases libvirt can ignore this
>cleanup, but in the case of VFIO devices this is where closing of a VFIO
>group FD occurs, and failing to wait before rebinding the device to the
>host driver can result in unexpected behavior. In the case of powernv
>hosts at least, this can lead to a host driver crashing due to the default
>DMA windows not having been fully-restored yet. The window between this
>and the initial DEVICE_DELETED seems to be ~6 seconds in practice. We've
>seen host dumps with Mellanox CX4 VFs being rebound to host driver during
>this period (on powernv hosts).

Why on earth does QEMU's device finalization take 6 seconds to complete?
That feels very broken to me, unless QEMU is not being scheduled due to
the host being overcommitted. If that's not the case, then we have a bug to
investigate in QEMU to find out why cleanup is delayed so long.


From libvirt's POV, we consider 'DEVICE_DELETED' as meaning both that the
frontend has gone *and* the corresponding backend has gone. Aside from
cleaning the VFIO group, we use this as a trigger for all other device
related cleanup like SELinux labelling, cgroup device ACLs, etc. If the
backend is not guaranteed to be closed in QEMU when this event is emitted
then either QEMU needs to delay the event until it is really cleaned up,
or QEMU needs to add a further event to emit when the backend is clean.


> Patches 1-4 address 1) by deferring rebinding of a hostdev to the host driver
> until all the devices in the group have been detached, at which point all
> the hostdevs are rebound as a group. Until that point, the devices are tracked
> by the drvManager's inactiveList in a similar manner to hostdevs that are
> assigned to VFIO via the nodedev-detach interface.
> 
> Patch 5 addresses 2) by adding an additional check that, when the last device
> from a group is detached, polls /proc for open FDs referencing the VFIO group
> path in /dev/vfio/ and waits for the FD to be closed. If we
> time out, we abandon rebinding the hostdevs back to the host.

That is just gross - it is tying libvirt to details of the QEMU internal
implementation. I really don't think we should be doing that. So NACK to
this from my POV.
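
For reference, the deferred rebinding described for patches 1-4 (not the
/proc polling of patch 5) can be sketched roughly as below. This is only an
illustration under stated assumptions: struct group_state, MAX_GROUP_DEVS and
detach_device() are hypothetical names, not libvirt's data structures or API.

```c
#include <stdbool.h>

#define MAX_GROUP_DEVS 8   /* arbitrary limit chosen for this sketch */

/* Hypothetical bookkeeping for one VFIO group. */
struct group_state {
    int ndevs;                      /* number of devices in the group */
    bool attached[MAX_GROUP_DEVS];  /* true while a device is still in the guest */
};

/*
 * Mark one device as detached from the guest.
 * Returns true only when no device of the group remains attached, i.e.
 * when the whole group may be rebound to the host driver in one go.
 */
static bool detach_device(struct group_state *g, int dev)
{
    if (dev < 0 || dev >= g->ndevs) {
        return false;               /* unknown device: never trigger a rebind */
    }
    g->attached[dev] = false;

    for (int i = 0; i < g->ndevs; i++) {
        if (g->attached[i]) {
            return false;           /* group still partly owned by the guest */
        }
    }
    return true;
}
```

Detaching the first of two devices returns false, so nothing is rebound and
the group never enters the mixed host/guest state; only the last detach
reports that the group can be handed back.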

> There are a couple alternatives to Patch 5 that might be worth considering:
> 
> a) Add a DEVICE_FINALIZED event to QEMU and wait for that instead of
>DEVICE_DELETED. Paired with patches 1-4 this would let us drop patch 5 in
>favor of minimal changes to libvirt's event handlers.
> 
>The downsides are:
> - that we'd incur some latency for all device-detach calls, but it's not
>   apparent to me whether this delay is significant for anything outside
>   of VFIO.
> - there may be cases where finalization after DEVICE_DELETED/unparent
>   is not guaranteed, and I'm not sure QOM would encourage such
>   expectations even if that's currently the case.
> 
> b) Add a GROUP_DELETED event to VFIO's finalize callback. This is the most
>direct solution. With this we could completely separate out the handling
>of rebinding to host driver based on receipt of this event.
> 
>The downsides are:
> - this would only work for newer versions of QEMU, though we could use
>   the poll-wait in patch 5 as a fallback.
> - synchronizing sync/async device-detach threads with sync/async
>   handlers for this would be a bit hairy, but I have a WIP in progress
>   that seems *fairly reasonable*
> 
> c) Take the approach in Patch 5, either as a precursor to implementing b) or
>something else, or just sticking with that for now.
> 
> d) ???

Fix DEVICE_DELETED so it's only emitted when the backend associated with
the device is fully cleaned up.


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [Qemu-devel] [RFC PATCH 7/8] tcg/tci: time to remove it :(

2017-06-29 Thread Daniel P. Berrange

On Wed, Jun 28, 2017 at 10:02:59PM -0300, Philippe Mathieu-Daudé wrote:
> "./configure --disable-tcg-interpreter" generates a warning:
>   ./configure: --disable-tcg-interpreter is obsolete, Experimental TCG 
> interpreter has been removed"
> 
> "./configure --enable-tcg-interpreter" generates an error:
> 
>   Experimental TCG interpreter has been removed

configure will already complain about any unknown command line arguments
it is given, so you could just delete --enable-tcg-interpreter support
and rely on it saying the option does not exist.


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [Qemu-devel] [PATCH 2/6] libqos: add virtio used ring support

2017-06-29 Thread Fam Zheng

On Wed, 06/28 19:47, Stefan Hajnoczi wrote:
> Existing tests do not touch the virtqueue used ring.  Instead they poll
> the virtqueue ISR register and peek into their request's device-specific
> status field.
> 
> It turns out that the virtqueue ISR register can be set to 1 more than
> once for a single notification (see commit
> 83d768b5640946b7da55ce8335509df297e2c7cd "virtio: set ISR on dataplane
> notifications").  This causes problems for tests that assume a 1:1
> correspondence between the ISR being 1 and request completion.
> 
> Peeking at device-specific status fields is also problematic if the
> device has no field that can be abused for EINPROGRESS polling
> semantics.  This is the case if all the field's values may be set by the
> device; there's no magic constant left for polling.
> 
> It's time to process the used ring for completed requests, just like a
> real virtio guest driver.  This patch adds the necessary APIs.
> 
> Signed-off-by: Stefan Hajnoczi 

Reviewed-by: Fam Zheng
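
As an illustration of the used ring processing described in the quoted patch,
here is a minimal sketch. The field layout follows the virtio split-ring used
ring (flags, idx, then id/len elements), but QUEUE_SIZE, struct queue_state,
pop_used() and device_complete() are names invented for this example and are
not the libqos API. Endianness conversion and memory barriers are left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define QUEUE_SIZE 8   /* assumed queue size for this sketch */

/* One completed request: descriptor chain head and bytes written. */
struct used_elem {
    uint32_t id;
    uint32_t len;
};

/* Used ring as laid out by the virtio split-ring format. */
struct used_ring {
    uint16_t flags;
    uint16_t idx;                        /* free-running, bumped by the device */
    struct used_elem ring[QUEUE_SIZE];
};

/* Hypothetical driver-side state: how far the ring has been consumed. */
struct queue_state {
    struct used_ring *used;
    uint16_t last_used_idx;
};

/*
 * Fetch the next completed request, if any.
 * Returns false when the device has not published anything new.
 */
static bool pop_used(struct queue_state *q, uint32_t *id, uint32_t *len)
{
    if (q->last_used_idx == q->used->idx) {
        return false;
    }
    const struct used_elem *e = &q->used->ring[q->last_used_idx % QUEUE_SIZE];
    *id = e->id;
    *len = e->len;
    q->last_used_idx++;
    return true;
}

/* Device-side stand-in so the sketch can be exercised without a device. */
static void device_complete(struct used_ring *used, uint32_t id, uint32_t len)
{
    used->ring[used->idx % QUEUE_SIZE] = (struct used_elem){ id, len };
    used->idx++;
}
```

Because completion is detected by comparing indices, a second ISR assertion
for the same notification does not produce a second completion, which is the
problem with the 1:1 ISR assumption mentioned above.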

Re: [Qemu-devel] [RFC PATCH 00/14] Implement network booting directly into the s390-ccw BIOS

2017-06-29 Thread Christian Borntraeger

On 06/29/2017 10:17 AM, Thomas Huth wrote:
> On 28.06.2017 10:59, Thomas Huth wrote:
>> On 28.06.2017 09:43, Christian Borntraeger wrote:
>>> Interesting work, thanks for giving it a try.
>>>
>>> On 06/27/2017 01:48 PM, Thomas Huth wrote:
>>>> It's already possible to do a network boot of an s390x guest with an
>>>> external netboot image (based on a Linux installation), but it would
>>>> be much more convenient if the s390-ccw firmware supported network
>>>> booting right out of the box, without the need to assemble such an
>>>> external image first.
>>>>
>>>> This patch series now introduces network booting via DHCP and TFTP
>>>> directly into the s390-ccw firmware by re-using the networking stack
>>>> from the SLOF firmware (see https://github.com/aik/SLOF/ for details),
>>>> and adds a driver for virtio-net-ccw devices.
>>>
>>> What is the licensing situation with SLOF?
>>
>> All its code is licensed under the 3-Clause BSD License:
>>
>>  https://github.com/aik/SLOF/blob/master/LICENSE
>>
>> AFAIK it should be fine to use such code in GPL-ed sources like the
>> s390-ccw firmware, shouldn't it?
>>
>>>> - Is it OK to require loading an .INS file first? Or does anybody
>>>>   have a better idea how to load multiple files (kernel, initrd,
>>>>   etc. ...)?
>>>
>>> I agree with Viktor that supporting a pxelinux config file is probably the
>>> better way, since this is what all existing server admins understand. For
>>> the time being Linux will be the most relevant guest and requiring changes
>>> in management infrastructure will certainly make things very hard.
>>
>> I have to say, the more I read about pxelinux, the more I think we
>> should *not* directly support this in the firmware. pxelinux is clearly
>> a secondary stage bootloader, with a rather complex config file, and
>> features like config file name guessing via MAC-address or IP-address
>> ... if we really want to support that on s390x, too, I think it should
>> stay in an external binary instead, and not directly incorporated into
>> the s390-ccw firmware (so that users finally have the choice whether
>> they want to use pxelinux-style config files or grub2 or something
>> different one day).
>>
>> I guess the .INS file parsing in firmware was already a bad idea... all
>> other typical firmware implementations can also only load one file - and
>> if you need to load multiple files, you've got to use a secondary stage
>> bootloader like pxelinux, yaboot or grub2. So if we agree to add network
>> booting directly into the s390-ccw firmware, I think we should do the
>> same here and only support loading of one file (without an additional
>> config file). But then the question is, of course, whether it makes
>> sense to add that support to the firmware at all, or whether we should
>> simply continue with the current "s390-netboot.img" secondary-loader
>> approach...
>>
>>>> - The code from SLOF uses a different coding style (TABs instead
>>>>   of space) ... is it OK to keep that coding style here so we
>>>>   can share patches between SLOF and s390-ccw more easily?
>>>
>>> Is it possible to enhance SLOF and then link to the existing SLOF code?
>>
>> I already submitted some of the clean-up patches to the SLOF mailing
>> list, and Alexey has merged them now, e.g.:
>>
>>  https://github.com/aik/SLOF/commit/140c3f39db4ce4c0
>>
>> And as I already mentioned in my reply to David, it should theoretically
>> be possible to use the code from the roms/SLOF submodule in QEMU ...
>> but then we've got to deal with sudden changes in the SLOF repository
>> which might cause unwanted problems in the s390-ccw firmware. I guess we
>> could give it a try (the libc code is very, very stable in SLOF, and the
>> libnet code also changes only very seldom) - of course only if we
>> really decide that we want to have TFTP support directly in the
>> firmware. If we rather want to continue with the s390-netboot.img
>> approach instead, I have got to reconsider whether I continue with my
>> efforts by putting that stuff into an external binary, or whether it
>> makes more sense to look into porting pxelinux, grub2 or petitboot
>> instead...
> 
> OK, thinking about all of this again, what do you think of the following
> approach:
> 
> Let's not include network booting code in the s390-ccw.img, but
> keep the way that it uses an external s390-netboot.img for this job.
> 
> I'll then try to add an additional s390-netboot.img target to the
> Makefile, which will only be built if the roms/SLOF submodule has been
> checked out. That target then uses the libc and libnet from the SLOF
> submodule and links it with my virtio-net driver and some other required
> code from the s390-ccw bios into a s390-netboot.img binary that can be
> used for network booting.
> 
> This way ...
> - the main s390-ccw.img stays independent from the changes in the SLOF,
> and we can tackle possible problems in the s390-netboot.img
> independently. And we don't have to deal with coding style

Re: [Qemu-devel] [PATCH 3/6] tests: fix virtio-scsi-test ISR dependence

2017-06-29 Thread Fam Zheng

On Wed, 06/28 19:47, Stefan Hajnoczi wrote:
> Use the new used ring APIs instead of assuming ISR being set means the
> request has completed.
> 
> Signed-off-by: Stefan Hajnoczi 

Reviewed-by: Fam Zheng

Re: [Qemu-devel] Getting rid of xen_init_display() (and its dubious call into main_loop_wait())

2017-06-29 Thread Paul Durrant

> -----Original Message-----
> From: Stefano Stabellini [mailto:sstabell...@kernel.org]
> Sent: 28 June 2017 19:20
> To: Peter Maydell 
> Cc: Stefano Stabellini ; QEMU Developers  de...@nongnu.org>; Anthony Perard ; Gerd
> Hoffmann ; Paul Durrant ;
> xen-de...@lists.xenproject.org
> Subject: Re: Getting rid of xen_init_display() (and its dubious call into
> main_loop_wait())
> 
> On Wed, 28 Jun 2017, Peter Maydell wrote:
> > On 28 June 2017 at 01:06, Stefano Stabellini 
> wrote:
> > > On Tue, 27 Jun 2017, Peter Maydell wrote:
> > >> So, there is exactly one caller of main_loop_wait() in the tree that
> > >> passes it 'true' as an argument. That caller is in xen_init_display()
> > >> in hw/display/xenfb.c. The function was added in 2009 with the
> comment
> > >> "FIXME/TODO: Kill this. Temporary needed while DisplayState
> > >> reorganization is in flight."
> > >>
> > >> I'd like to think that we've now completed whatever reorg that was,
> > >> 8 years on, so can we now get rid of this function? It definitely
> > >> seems very dubious to have a display init function with a busy loop
> > >> and a call into main_loop_wait()...
> > >
> > > LOL, you gotta love "temporary fixes". I am happy to see it wasn't me
> > > that added it ;-)
> > >
> > > I think the following should do the trick.
> >
> > Thanks!
> >
> > > ---
> > >
> > > xenfb: remove xen_init_display "temporary" hack
> > >
> > > Initialize xenfb properly, as all other backends, from its own
> > > "initialise" function.
> > >
> > > Signed-off-by: Stefano Stabellini 
> > >
> > > diff --git a/hw/display/xenfb.c b/hw/display/xenfb.c
> > > index e76c0d8..873e51f 100644
> > > --- a/hw/display/xenfb.c
> > > +++ b/hw/display/xenfb.c
> > > @@ -71,7 +71,6 @@ struct XenFB {
> > >  int   fbpages;
> > >  int   feature_update;
> > >  int   bug_trigger;
> > > -int   have_console;
> > >  int   do_resize;
> > >
> > >  struct {
> > > @@ -80,6 +79,7 @@ struct XenFB {
> > >  int   up_count;
> > >  int   up_fullscreen;
> > >  };
> > > +static const GraphicHwOps xenfb_ops;
> > >
> > >  /*  
> > > */
> > >
> > > @@ -855,6 +855,8 @@ static int fb_init(struct XenDevice *xendev)
> > >  static int fb_initialise(struct XenDevice *xendev)
> > >  {
> > >  struct XenFB *fb = container_of(xendev, struct XenFB, c.xendev);
> > > +struct XenDevice *xin;
> > > +struct XenInput *in;
> > >  struct xenfb_page *fb_page;
> > >  int videoram;
> > >  int rc;
> > > @@ -877,16 +879,12 @@ static int fb_initialise(struct XenDevice
> *xendev)
> > >  if (rc != 0)
> > > return rc;
> > >
> > > -#if 0  /* handled in xen_init_display() for now */
> > > -if (!fb->have_console) {
> > > -fb->c.ds = graphic_console_init(xenfb_update,
> > > -xenfb_invalidate,
> > > -NULL,
> > > -NULL,
> > > -fb);
> > > -fb->have_console = 1;
> > > -}
> > > -#endif
> > > +fb->c.con = graphic_console_init(NULL, 0, &xenfb_ops, fb);
> > > +
> > > +xin = xen_pv_find_xendev("vkbd", xen_domid, 0);
> > > +in = container_of(xin, struct XenInput, c.xendev);
> > > +in->c.con = fb->c.con;
> >
> > Won't this crash if xen_pv_find_xendev() returned NULL?
> > Or are we guaranteed that that can't happen here?
> 
> As long as there is a vkbd device, it will be already added to the
> xendevs list at this point. However, if there isn't a device at all,
> then it would crash. In that case, I think we should print a warning and
> continue without it.
> 
> I'll send an updated patch.

There is still the fact the vkbd can't initialise until vfb has done so. This 
interdependency is subtle and IMO bogus. It needs to be cleared up.
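
To make the ordering dependency concrete, here is a small self-contained
sketch. All types and functions below are hypothetical stand-ins written for
this illustration; they only mirror the shape of the fb/vkbd relationship
discussed in this thread and are not the QEMU code.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the console and the two backends. */
struct console {
    int id;
};

struct input_dev {
    struct console *con;   /* handed over by the framebuffer backend */
    int initialised;
};

struct fb_dev {
    struct console *con;
};

/* Input init refuses to proceed until a console has been assigned. */
static int input_initialise(struct input_dev *in)
{
    if (in->con == NULL) {
        return -1;
    }
    in->initialised = 1;
    return 0;
}

/*
 * Framebuffer init: take the console, pass it to the input device if one
 * exists, then retry the input device's initialisation.
 */
static int fb_initialise(struct fb_dev *fb, struct console *con,
                         struct input_dev *in)
{
    fb->con = con;
    if (in == NULL) {
        return 0;          /* no input device: carry on without it */
    }
    in->con = fb->con;
    return input_initialise(in);
}
```

In this model the input device can only come up after the framebuffer has
run, and never comes up at all if no framebuffer exists, which is the hidden
coupling being objected to.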

  Paul

Re: [Qemu-devel] [RFC PATCH 0/8] removal of tci (tcg interpreter)

2017-06-29 Thread Thomas Huth

 Hi Philippe,

On 29.06.2017 03:02, Philippe Mathieu-Daudé wrote:
> There have been some comments on the ML about the usefulness of tci.
> 
> https://lists.nongnu.org/archive/html/qemu-devel/2017-06/msg04551.html
> 
>   Peter Maydell> I'd prefer we just got rid of it.
> 
> https://lists.nongnu.org/archive/html/qemu-devel/2017-06/msg04296.html
> 
>   Richard Henderson> Is it time to remove it? I'm pretty sure the only hosts
>  for which it will work have proper backends...
> 
> Richard's quotes are much clearer than my attempt to paraphrase what he told me:
> - it doesn't use libffi, and as such the way it makes calls to helpers
>   doesn't work for many hosts.
> - we already cover almost everything that debian does. if debian or gentoo
>   doesn't support it, one can confidently say there's little interest.
> - if someone *does* want to run qemu on something else, it isn't difficult to
>   port tcg.
> 
> I figured out MAINTAINERS was out of sync, so added patches 1-4; they are not
> really tci-related.

Since they are not related to TCI at all, please submit these as
separate series.

> Patches 5,6 are trivial fixes to leave the codebase sane if there is a future
> need to revert/reimport tci.

I think this should go into 2.10...

> Patches 7,8 are the removal, marked RFC... let's debate!

... but NACK for a direct removal. Common sense is to mark obsolete
features as deprecated first and then wait for 2 public releases before
the final removal, so that users still have a chance to speak up in case
they still need the feature and are willing to maintain it.

Please see the following URL for details (and please also add an entry
for TCI in the Miscellaneous section there):

  http://wiki.qemu.org/Features/LegacyRemoval

 Thomas

Re: [Qemu-devel] [PATCH v6 7/8] vmdk: Update metadata for multiple clusters

2017-06-29 Thread Ashijeet Acharya

On Tue, Jun 27, 2017 at 1:34 PM, Fam Zheng  wrote:
> On Mon, 06/05 13:22, Ashijeet Acharya wrote:
>> @@ -1876,6 +1942,13 @@ static int vmdk_pwritev(BlockDriverState *bs, 
>> uint64_t offset,
>>  offset += n_bytes;
>>  bytes_done += n_bytes;
>>
>> +while (m_data->next != NULL) {
>
> If you do
>
>while (m_data) {
>
>> +VmdkMetaData *next;
>> +next = m_data->next;
>> +g_free(m_data);
>> +m_data = next;
>> +}
>> +
>>  /* update CID on the first write every time the virtual disk is
>>   * opened */
>>  if (!s->cid_updated) {
>> @@ -1886,6 +1959,7 @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t 
>> offset,
>>  s->cid_updated = true;
>>  }
>>  }
>> +g_free(m_data);
>
> then you can remove this line.

As I explained last time, I can't do this because I am reusing the
first allocated m_data. If I am to do it the way you suggest, I will
have to move the allocation of first m_data (m_data =
g_new0(VmdkMetaData, 1)) inside the outer while loop, otherwise things
will segfault.
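
The difference between the two loop forms under discussion can be sketched in
plain C. Node, make_list(), free_all_but_last() and free_all() are
hypothetical names for this illustration (the real code uses VmdkMetaData and
g_free()); this only models the list handling, not the vmdk code itself.

```c
#include <stdlib.h>

/* Hypothetical stand-in for VmdkMetaData: only the link matters here. */
typedef struct Node {
    struct Node *next;
} Node;

/* Build a singly linked list of n nodes (n >= 1) and return its head. */
static Node *make_list(int n)
{
    Node *head = NULL;
    while (n-- > 0) {
        Node *node = calloc(1, sizeof(*node));
        if (node == NULL) {
            abort();
        }
        node->next = head;
        head = node;
    }
    return head;
}

/*
 * "while (node->next != NULL)" form: frees every node except the last one.
 * *list must be non-NULL; on return it points at the surviving node.
 * Returns the number of nodes freed.
 */
static int free_all_but_last(Node **list)
{
    int freed = 0;
    Node *node = *list;
    while (node->next != NULL) {
        Node *next = node->next;
        free(node);
        node = next;
        freed++;
    }
    *list = node;
    return freed;
}

/*
 * "while (node)" form: frees every node and leaves *list as NULL.
 * Returns the number of nodes freed.
 */
static int free_all(Node **list)
{
    int freed = 0;
    Node *node = *list;
    while (node != NULL) {
        Node *next = node->next;
        free(node);
        node = next;
        freed++;
    }
    *list = NULL;
    return freed;
}
```

With the first form one node survives and the pointer stays valid for reuse,
which is why a final free is still needed afterwards. With the second form
everything is released and the pointer ends up NULL, so any later reuse would
need a fresh allocation first.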

Ashijeet

Re: [Qemu-devel] [PATCH 4/6] tests: fix virtio-blk-test ISR dependence

2017-06-29 Thread Fam Zheng

On Wed, 06/28 19:47, Stefan Hajnoczi wrote:
> Use the new used ring APIs instead of assuming ISR being set means the
> request has completed.
> 
> Signed-off-by: Stefan Hajnoczi 

Reviewed-by: Fam Zheng

Re: [Qemu-devel] [PATCH 5/6] tests: fix virtio-net-test ISR dependence

2017-06-29 Thread Fam Zheng

On Wed, 06/28 19:47, Stefan Hajnoczi wrote:
> Use the new used ring APIs instead of assuming ISR being set means the
> request has completed.
> 
> Signed-off-by: Stefan Hajnoczi 

Reviewed-by: Fam Zheng

Re: [Qemu-devel] [PATCH 6/6] virtio-pci: use ioeventfd even when KVM is disabled

2017-06-29 Thread Fam Zheng

On Wed, 06/28 19:47, Stefan Hajnoczi wrote:
> Old kvm.ko versions only supported a tiny number of ioeventfds so
> virtio-pci avoids ioeventfds when kvm_has_many_ioeventfds() returns 0.
> 
> Do not check kvm_has_many_ioeventfds() when KVM is disabled since it
> always returns 0.  Since commit 8c56c1a592b5092d91da8d8943c1d6462a6f
> ("memory: emulate ioeventfd") it has been possible to use ioeventfds in
> qtest or TCG mode.
> 
> This patch makes -device virtio-blk-pci,iothread=iothread0 work even
> when KVM is disabled.
> 
> I have tested that virtio-blk-pci works under TCG both with and without
> iothread.
> 
> This patch fixes qemu-iotests 068, which was accidentally merged early
> despite the dependency on ioeventfd.
> 
> Cc: Michael S. Tsirkin 
> Signed-off-by: Stefan Hajnoczi 
> Reviewed-by: Michael S. Tsirkin 
> Message-id: 20170615163813.7255-2-stefa...@redhat.com
> Signed-off-by: Stefan Hajnoczi 

Reviewed-by: Fam Zheng

Re: [Qemu-devel] [PATCH] replace struct ucontext with ucontext_t type

2017-06-29 Thread Peter Maydell

On 28 June 2017 at 21:44, Khem Raj  wrote:
> The ucontext_t type had a tag struct ucontext until now
> but newer glibc will drop it so we need to adjust and use
> the exposed type instead

If true this seems like a bug in glibc to break
existing working programs, and it should be fixed there...

thanks
-- PMM

Re: [Qemu-devel] [PATCH v5 3/5] virtio-9p: break device if buffers are misconfigured

2017-06-29 Thread Greg Kurz

On Wed, 28 Jun 2017 22:44:30 +0200
Greg Kurz  wrote:

> The 9P protocol is transport agnostic: if the guest misconfigured the
> buffers, the best we can do is to set the broken flag on the device.
> 
> Since virtio_pdu_vmarshal() may be called by several active PDUs, we
> check if the transport isn't broken already to avoid printing extra
> error messages.
> 

Oops, forgot to drop this last sentence... Will do when pushing to my tree.

> Signed-off-by: Greg Kurz 
> ---
> v5: - use ssize_t variable in virtio_pdu_v[un]marshal()
> - drop remaining vdev->broken check (MST suggested to discuss calling
>   virtio_error() when the device is already broken to a separate thread)
> ---
>  hw/9pfs/9p.c   |2 +-
>  hw/9pfs/9p.h   |2 +-
>  hw/9pfs/virtio-9p-device.c |   40 
>  hw/9pfs/xen-9p-backend.c   |3 ++-
>  4 files changed, 40 insertions(+), 7 deletions(-)
> 
> diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
> index 96d268334865..da0d6da65b45 100644
> --- a/hw/9pfs/9p.c
> +++ b/hw/9pfs/9p.c
> @@ -1664,7 +1664,7 @@ static void v9fs_init_qiov_from_pdu(QEMUIOVector *qiov, 
> V9fsPDU *pdu,
>  unsigned int niov;
>  
>  if (is_write) {
> -pdu->s->transport->init_out_iov_from_pdu(pdu, &iov, &niov);
> +pdu->s->transport->init_out_iov_from_pdu(pdu, &iov, &niov, size + 
> skip);
>  } else {
>  pdu->s->transport->init_in_iov_from_pdu(pdu, &iov, &niov, size + 
> skip);
>  }
> diff --git a/hw/9pfs/9p.h b/hw/9pfs/9p.h
> index aac1b0b2ce3d..d1cfeaf10e4f 100644
> --- a/hw/9pfs/9p.h
> +++ b/hw/9pfs/9p.h
> @@ -363,7 +363,7 @@ struct V9fsTransport {
>  void(*init_in_iov_from_pdu)(V9fsPDU *pdu, struct iovec **piov,
>  unsigned int *pniov, size_t size);
>  void(*init_out_iov_from_pdu)(V9fsPDU *pdu, struct iovec **piov,
> - unsigned int *pniov);
> + unsigned int *pniov, size_t size);
>  void(*push_and_notify)(V9fsPDU *pdu);
>  };
>  
> diff --git a/hw/9pfs/virtio-9p-device.c b/hw/9pfs/virtio-9p-device.c
> index 1a68c1622d3a..62650b0a6b99 100644
> --- a/hw/9pfs/virtio-9p-device.c
> +++ b/hw/9pfs/virtio-9p-device.c
> @@ -146,8 +146,16 @@ static ssize_t virtio_pdu_vmarshal(V9fsPDU *pdu, size_t 
> offset,
>  V9fsState *s = pdu->s;
>  V9fsVirtioState *v = container_of(s, V9fsVirtioState, state);
>  VirtQueueElement *elem = v->elems[pdu->idx];
> +ssize_t ret;
>  
> -return v9fs_iov_vmarshal(elem->in_sg, elem->in_num, offset, 1, fmt, ap);
> +ret = v9fs_iov_vmarshal(elem->in_sg, elem->in_num, offset, 1, fmt, ap);
> +if (ret < 0) {
> +VirtIODevice *vdev = VIRTIO_DEVICE(v);
> +
> +virtio_error(vdev, "Failed to encode VirtFS reply type %d",
> + pdu->id + 1);
> +}
> +return ret;
>  }
>  
>  static ssize_t virtio_pdu_vunmarshal(V9fsPDU *pdu, size_t offset,
> @@ -156,28 +164,52 @@ static ssize_t virtio_pdu_vunmarshal(V9fsPDU *pdu, 
> size_t offset,
>  V9fsState *s = pdu->s;
>  V9fsVirtioState *v = container_of(s, V9fsVirtioState, state);
>  VirtQueueElement *elem = v->elems[pdu->idx];
> +ssize_t ret;
> +
> +ret = v9fs_iov_vunmarshal(elem->out_sg, elem->out_num, offset, 1, fmt, 
> ap);
> +if (ret < 0) {
> +VirtIODevice *vdev = VIRTIO_DEVICE(v);
>  
> -return v9fs_iov_vunmarshal(elem->out_sg, elem->out_num, offset, 1, fmt, 
> ap);
> +virtio_error(vdev, "Failed to decode VirtFS request type %d", 
> pdu->id);
> +}
> +return ret;
>  }
>  
> -/* The size parameter is used by other transports. Do not drop it. */
>  static void virtio_init_in_iov_from_pdu(V9fsPDU *pdu, struct iovec **piov,
>  unsigned int *pniov, size_t size)
>  {
>  V9fsState *s = pdu->s;
>  V9fsVirtioState *v = container_of(s, V9fsVirtioState, state);
>  VirtQueueElement *elem = v->elems[pdu->idx];
> +size_t buf_size = iov_size(elem->in_sg, elem->in_num);
> +
> +if (buf_size < size) {
> +VirtIODevice *vdev = VIRTIO_DEVICE(v);
> +
> +virtio_error(vdev,
> + "VirtFS reply type %d needs %zu bytes, buffer has %zu",
> + pdu->id + 1, size, buf_size);
> +}
>  
>  *piov = elem->in_sg;
>  *pniov = elem->in_num;
>  }
>  
>  static void virtio_init_out_iov_from_pdu(V9fsPDU *pdu, struct iovec **piov,
> - unsigned int *pniov)
> + unsigned int *pniov, size_t size)
>  {
>  V9fsState *s = pdu->s;
>  V9fsVirtioState *v = container_of(s, V9fsVirtioState, state);
>  VirtQueueElement *elem = v->elems[pdu->idx];
> +size_t buf_size = iov_size(elem->out_sg, elem->out_num);
> +
> +if (buf_size < size) {
> +VirtIODevice *vdev = VIRTIO_DEVICE(v);
> +
> +virtio_error(vdev,
> +

[Qemu-devel] [Bug 1700380] Re: commit snapshot image got Permission denied error

2017-06-29 Thread Thomas Huth

Closing, according to comment #2

** Changed in: qemu
   Status: New => Invalid

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1700380

Title:
  commit snapshot image got Permission denied error

Status in QEMU:
  Invalid

Bug description:
  qemu 2.9.0, amd64, start image with -snapshot param, make some changes
  in the image, then:

  $telnet localhost 7000

  (qemu) commit virtio0
  'commit' error for 'virtio0': Permission denied

  Never met this problem before; commit used to work. I recently compiled
  v2.9.0, so is there some new param in qemu-system-x86_64 to avoid the
  commit Permission denied error?

  Regards.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1700380/+subscriptions

Re: [Qemu-devel] [RFC v1 1/4] util/aio-win32: Only select on what we are actually waiting for

2017-06-29 Thread Fam Zheng

On Tue, 06/27 16:57, Alistair Francis wrote:
> Signed-off-by: Alistair Francis 
> Acked-by: Edgar E. Iglesias 
> ---
> 
>  util/aio-win32.c | 13 ++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/util/aio-win32.c b/util/aio-win32.c
> index bca496a47a..949979c2f5 100644
> --- a/util/aio-win32.c
> +++ b/util/aio-win32.c
> @@ -71,6 +71,7 @@ void aio_set_fd_handler(AioContext *ctx,
>  }
>  } else {
>  HANDLE event;
> +long bitmask = 0;
>  
>  if (node == NULL) {
>  /* Alloc and insert if it's not already there */
> @@ -95,10 +96,16 @@ void aio_set_fd_handler(AioContext *ctx,
>  node->io_write = io_write;
>  node->is_external = is_external;
>  
> +if (io_read) {
> +bitmask |= FD_READ;
> +}
> +
> +if (io_write) {
> +bitmask |= FD_WRITE;
> +}
> +
>  event = event_notifier_get_handle(&ctx->notifier);
> -WSAEventSelect(node->pfd.fd, event,
> -   FD_READ | FD_ACCEPT | FD_CLOSE |
> -   FD_CONNECT | FD_WRITE | FD_OOB);
> +WSAEventSelect(node->pfd.fd, event, bitmask);
>  }
>  
>  qemu_lockcnt_unlock(&ctx->list_lock);

Not sure if it's okay to drop accept/close/connect/oob altogether, Cc'ing Paolo
who knows Windows stuff.

Fam

[Qemu-devel] [PATCH v7 1/8] vmdk: Move vmdk_find_offset_in_cluster() to the top

2017-06-29 Thread Ashijeet Acharya

Move the existing vmdk_find_offset_in_cluster() function to the top of
the driver.

Signed-off-by: Ashijeet Acharya 
Reviewed-by: Fam Zheng 
---
 block/vmdk.c | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index a9bd22b..22be887 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -242,6 +242,18 @@ static void vmdk_free_last_extent(BlockDriverState *bs)
 s->extents = g_renew(VmdkExtent, s->extents, s->num_extents);
 }
 
+static inline uint64_t vmdk_find_offset_in_cluster(VmdkExtent *extent,
+   int64_t offset)
+{
+uint64_t extent_begin_offset, extent_relative_offset;
+uint64_t cluster_size = extent->cluster_sectors * BDRV_SECTOR_SIZE;
+
+extent_begin_offset =
+(extent->end_sector - extent->sectors) * BDRV_SECTOR_SIZE;
+extent_relative_offset = offset - extent_begin_offset;
+return extent_relative_offset % cluster_size;
+}
+
 static uint32_t vmdk_read_cid(BlockDriverState *bs, int parent)
 {
 char *desc;
@@ -1266,18 +1278,6 @@ static VmdkExtent *find_extent(BDRVVmdkState *s,
 return NULL;
 }
 
-static inline uint64_t vmdk_find_offset_in_cluster(VmdkExtent *extent,
-   int64_t offset)
-{
-uint64_t extent_begin_offset, extent_relative_offset;
-uint64_t cluster_size = extent->cluster_sectors * BDRV_SECTOR_SIZE;
-
-extent_begin_offset =
-(extent->end_sector - extent->sectors) * BDRV_SECTOR_SIZE;
-extent_relative_offset = offset - extent_begin_offset;
-return extent_relative_offset % cluster_size;
-}
-
 static inline uint64_t vmdk_find_index_in_cluster(VmdkExtent *extent,
   int64_t sector_num)
 {
-- 
2.6.2
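
For readers who want to try the arithmetic of the moved helper outside of
QEMU, the following is a minimal standalone sketch. The Extent struct is a
hypothetical, trimmed-down stand-in for VmdkExtent, SKETCH_SECTOR_SIZE
stands in for BDRV_SECTOR_SIZE, and the assert is an addition that the
original helper does not have.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_SECTOR_SIZE 512ULL  /* stands in for BDRV_SECTOR_SIZE */

/* Hypothetical, trimmed-down stand-in for VmdkExtent. */
typedef struct Extent {
    uint64_t sectors;          /* length of the extent, in sectors */
    uint64_t end_sector;       /* first sector past the extent */
    uint64_t cluster_sectors;  /* sectors per cluster */
} Extent;

/* Byte offset of a virtual-disk offset within its containing cluster. */
static uint64_t offset_in_cluster(const Extent *e, uint64_t offset)
{
    uint64_t cluster_size = e->cluster_sectors * SKETCH_SECTOR_SIZE;
    uint64_t extent_begin = (e->end_sector - e->sectors) * SKETCH_SECTOR_SIZE;

    /* Added check: the offset must lie at or after the extent start. */
    assert(offset >= extent_begin);
    return (offset - extent_begin) % cluster_size;
}
```

For an extent covering sectors 1024 to 2047 with 128-sector clusters, an
offset 100 bytes into the second cluster of that extent yields 100.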

[Qemu-devel] [PATCH v7 0/8] Optimize VMDK I/O by allocating multiple clusters

2017-06-29 Thread Ashijeet Acharya

Previously posted series patches:
v1 - http://lists.nongnu.org/archive/html/qemu-devel/2017-03/msg02044.html
v2 - http://lists.nongnu.org/archive/html/qemu-devel/2017-03/msg05080.html
v3 - http://lists.nongnu.org/archive/html/qemu-devel/2017-04/msg00074.html
v4 - http://lists.nongnu.org/archive/html/qemu-devel/2017-04/msg03851.html
v5 - http://lists.nongnu.org/archive/html/qemu-devel/2017-06/msg00929.html
v6 - http://lists.nongnu.org/archive/html/qemu-devel/2017-06/msg00947.html

This series helps to optimize the I/O performance of VMDK driver.

Patch 1 helps us to move vmdk_find_offset_in_cluster.

Patches 2 & 3 perform simple function renaming tasks.

Patch 4 is used to factor out metadata loading code and implement it in separate
functions. This will help us to avoid code duplication in future patches of this
series.

Patch 5 helps to set the upper limit of the bytes handled in one cycle.

Patch 6 adds new functions to help us allocate multiple clusters according to
the size requested, perform COW if required and return the offset of the first
newly allocated cluster.

Patch 7 changes the metadata update code to update the L2 tables for multiple
clusters at once.

Patch 8 finally changes vmdk_get_cluster_offset() to only find the cluster
offset, as the cluster allocation task is now handled by
vmdk_alloc_clusters().
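
As a rough illustration of how patch 6 sizes one allocation round, the
sketch below reproduces the cluster-count arithmetic from
vmdk_handle_alloc() in standalone form. The function name and its
parameters are made up for this example, l2_entries_left stands for
extent->l2_size - l2_index, and the asserts are additions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Returns the number of clusters one allocation round would cover and
 * trims *bytes to match. skip_start_bytes is the offset of the request
 * within its first cluster.
 */
static uint32_t clusters_for_request(uint64_t skip_start_bytes,
                                     uint64_t *bytes,
                                     uint64_t cluster_size,
                                     uint32_t l2_entries_left)
{
    uint64_t nb_clusters;

    assert(*bytes > 0 && cluster_size > 0 && l2_entries_left > 0);
    assert(skip_start_bytes < cluster_size);

    /* Round up, then leave out the last cluster, which may need COW. */
    nb_clusters = (skip_start_bytes + *bytes + cluster_size - 1)
                  / cluster_size - 1;

    /* Never run past the end of the current L2 table. */
    if (nb_clusters > l2_entries_left) {
        nb_clusters = l2_entries_left;
    }

    if (nb_clusters != 0) {
        *bytes = nb_clusters * cluster_size - skip_start_bytes;
    } else {
        /* The request fits in a single cluster; *bytes is unchanged. */
        nb_clusters = 1;
    }
    return (uint32_t)nb_clusters;
}
```

With 64 KB clusters, a 128 KB aligned request is handled as one cluster in
the first round (the second cluster is left for the next round), while a
request of ten clusters with only four L2 entries left is cut to four.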

Optimization test results:

This patch series improves 128 KB sequential write performance to an
empty VMDK file by 54%

Benchmark command: ./qemu-img bench -w -c 1024 -s 128K -d 1 -t none -f
vmdk test.vmdk

Changes in v7:
- comment the use of MIN() in calculating skip_end_bytes
- use extent->cluster_sectors instead of 128
- place check for m_data != NULL
- use g_new0(VmdkMetaData, 1) instead of g_malloc0(sizeof(*m_data))

Changes in v6:
- rename total_alloc_clusters as alloc_clusters_counter (fam)

Changes in v5:
- fix commit message and comment in patch 4 (fam)
- add vmdk_ prefix to handle_alloc() (fam)
- fix alignment issue in patch 6 (fam)
- use BDRV_SECTOR_BITS (fam)
- fix endianness calculation in patch 7 (fam)

Changes in v4:
- fix commit message in patch 1 (fam)
- drop size_to_clusters() function (fam)
- fix grammatical errors in function documentations (fam)
- factor out metadata loading code in a separate patch (patch 4) (fam)
- rename vmdk_alloc_cluster_offset() to vmdk_alloc_clusters() (fam)
- break patch 4(in v3) into separate patches (patch 3 and 8) (fam)
- rename extent_size to extent_end (fam)
- use QEMU_ALIGN_UP instead of vmdk_align_offset. (fam)
- drop next and simply do m_data = m_data->next (fam)

Changes in v3:
- move size_to_clusters() from patch 1 to 3 (fam)
- use DIV_ROUND_UP in size_to_clusters (fam)
- make patch 2 compilable (fam)
- rename vmdk_L2update as vmdk_l2update and use UINT32_MAX (fam)
- combine patch 3 and patch 4 (as in v2) to make them compilable (fam)
- call bdrv_pwrite_sync() for batches of at most 512 clusters at once (fam)

Changes in v2:
- segregate the ugly Patch 1 in v1 into 6 readable and sensible patches
- include benchmark test results in v2

Ashijeet Acharya (8):
  vmdk: Move vmdk_find_offset_in_cluster() to the top
  vmdk: Rename get_whole_cluster() to vmdk_perform_cow()
  vmdk: Rename get_cluster_offset() to vmdk_get_cluster_offset()
  vmdk: Factor out metadata loading code out of
vmdk_get_cluster_offset()
  vmdk: Set maximum bytes allocated in one cycle
  vmdk: New functions to assist allocating multiple clusters
  vmdk: Update metadata for multiple clusters
  vmdk: Make vmdk_get_cluster_offset() return cluster offset only

 block/vmdk.c | 537 +--
 1 file changed, 415 insertions(+), 122 deletions(-)

-- 
2.6.2

[Qemu-devel] [PATCH v7 7/8] vmdk: Update metadata for multiple clusters

2017-06-29 Thread Ashijeet Acharya

Include a next pointer in VmdkMetaData struct to point to the previously
allocated L2 table. Modify vmdk_L2update to start updating metadata for
allocation of multiple clusters at once.

Signed-off-by: Ashijeet Acharya 
---
 block/vmdk.c | 128 ++-
 1 file changed, 101 insertions(+), 27 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index 277db16..60b8adc 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -137,6 +137,8 @@ typedef struct VmdkMetaData {
 int valid;
 uint32_t *l2_cache_entry;
 uint32_t nb_clusters;
+uint32_t offset;
+struct VmdkMetaData *next;
 } VmdkMetaData;
 
 typedef struct VmdkGrainMarker {
@@ -1116,34 +1118,87 @@ exit:
 return ret;
 }
 
-static int vmdk_L2update(VmdkExtent *extent, VmdkMetaData *m_data,
- uint32_t offset)
+static int vmdk_alloc_cluster_link_l2(VmdkExtent *extent,
+  VmdkMetaData *m_data, bool zeroed)
 {
-offset = cpu_to_le32(offset);
+int i;
+uint32_t offset, temp_offset;
+int *l2_table_array;
+int l2_array_size;
+
+if (zeroed) {
+temp_offset = VMDK_GTE_ZEROED;
+} else {
+temp_offset = m_data->offset;
+}
+
+l2_array_size = sizeof(uint32_t) * m_data->nb_clusters;
+l2_table_array = qemu_try_blockalign(extent->file->bs,
+ QEMU_ALIGN_UP(l2_array_size,
+   BDRV_SECTOR_SIZE));
+if (l2_table_array == NULL) {
+return VMDK_ERROR;
+}
+memset(l2_table_array, 0, QEMU_ALIGN_UP(l2_array_size, BDRV_SECTOR_SIZE));
 /* update L2 table */
+offset = temp_offset;
+for (i = 0; i < m_data->nb_clusters; i++) {
+l2_table_array[i] = cpu_to_le32(offset);
+if (!zeroed) {
+offset += extent->cluster_sectors;
+}
+}
 if (bdrv_pwrite_sync(extent->file,
-((int64_t)m_data->l2_offset * 512)
-+ (m_data->l2_index * sizeof(offset)),
-&offset, sizeof(offset)) < 0) {
+ ((int64_t)m_data->l2_offset * 512)
+ + ((m_data->l2_index) * sizeof(offset)),
+ l2_table_array, l2_array_size) < 0) {
 return VMDK_ERROR;
 }
 /* update backup L2 table */
 if (extent->l1_backup_table_offset != 0) {
 m_data->l2_offset = extent->l1_backup_table[m_data->l1_index];
 if (bdrv_pwrite_sync(extent->file,
-((int64_t)m_data->l2_offset * 512)
-+ (m_data->l2_index * sizeof(offset)),
-&offset, sizeof(offset)) < 0) {
+ ((int64_t)m_data->l2_offset * 512)
+ + ((m_data->l2_index) * sizeof(offset)),
+ l2_table_array, l2_array_size) < 0) {
 return VMDK_ERROR;
 }
 }
+
+offset = temp_offset;
 if (m_data->l2_cache_entry) {
-*m_data->l2_cache_entry = offset;
+for (i = 0; i < m_data->nb_clusters; i++) {
+*m_data->l2_cache_entry = cpu_to_le32(offset);
+m_data->l2_cache_entry++;
+
+if (!zeroed) {
+offset += extent->cluster_sectors;
+}
+}
 }
 
+qemu_vfree(l2_table_array);
 return VMDK_OK;
 }
 
+static int vmdk_L2update(VmdkExtent *extent, VmdkMetaData *m_data,
+ bool zeroed)
+{
+int ret;
+
+while (m_data->next != NULL) {
+
+ret = vmdk_alloc_cluster_link_l2(extent, m_data, zeroed);
+if (ret < 0) {
+return ret;
+}
+
+m_data = m_data->next;
+ }
+
+ return VMDK_OK;
+}
+
 /*
  * vmdk_l2load
  *
@@ -1260,9 +1315,10 @@ static int get_cluster_table(VmdkExtent *extent, 
uint64_t offset,
  *
  *   VMDK_ERROR:in error cases
  */
+
 static int vmdk_handle_alloc(BlockDriverState *bs, VmdkExtent *extent,
  uint64_t offset, uint64_t *cluster_offset,
- int64_t *bytes, VmdkMetaData *m_data,
+ int64_t *bytes, VmdkMetaData **m_data,
  bool allocate, uint32_t *total_alloc_clusters)
 {
 int l1_index, l2_offset, l2_index;
@@ -1271,6 +1327,7 @@ static int vmdk_handle_alloc(BlockDriverState *bs, 
VmdkExtent *extent,
 uint32_t nb_clusters;
 bool zeroed = false;
 uint64_t skip_start_bytes, skip_end_bytes;
+VmdkMetaData *old_m_data;
 int ret;
 
 ret = get_cluster_table(extent, offset, &l1_index, &l2_offset,
@@ -1331,13 +1388,21 @@ static int vmdk_handle_alloc(BlockDriverState *bs, 
VmdkExtent *extent,
 if (ret < 0) {
 return ret;
 }
-if (m_data) {
-m_data->valid = 1;
-m_data->l1_index = l1_index;
-m_data->l2_index = l2_index;
-m_data->l2_offset = l2_offset;
-m_data->l2_cac

[Qemu-devel] [PATCH v7 2/8] vmdk: Rename get_whole_cluster() to vmdk_perform_cow()

2017-06-29 Thread Ashijeet Acharya

Rename the existing function get_whole_cluster() to vmdk_perform_cow()
as its sole purpose is to perform COW for the first and the last
allocated clusters if needed.

Signed-off-by: Ashijeet Acharya 
Reviewed-by: Fam Zheng 
---
 block/vmdk.c | 23 ++-
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index 22be887..73ae786 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -1028,8 +1028,8 @@ static void vmdk_refresh_limits(BlockDriverState *bs, 
Error **errp)
 }
 }
 
-/**
- * get_whole_cluster
+/*
+ * vmdk_perform_cow
  *
  * Copy backing file's cluster that covers @sector_num, otherwise write zero,
  * to the cluster at @cluster_sector_num.
@@ -1037,13 +1037,18 @@ static void vmdk_refresh_limits(BlockDriverState *bs, 
Error **errp)
  * If @skip_start_sector < @skip_end_sector, the relative range
  * [@skip_start_sector, @skip_end_sector) is not copied or written, and leave
  * it for call to write user data in the request.
+ *
+ * Returns:
+ *   VMDK_OK:   on success
+ *
+ *   VMDK_ERROR:in error cases
  */
-static int get_whole_cluster(BlockDriverState *bs,
- VmdkExtent *extent,
- uint64_t cluster_offset,
- uint64_t offset,
- uint64_t skip_start_bytes,
- uint64_t skip_end_bytes)
+static int vmdk_perform_cow(BlockDriverState *bs,
+VmdkExtent *extent,
+uint64_t cluster_offset,
+uint64_t offset,
+uint64_t skip_start_bytes,
+uint64_t skip_end_bytes)
 {
 int ret = VMDK_OK;
 int64_t cluster_bytes;
@@ -1244,7 +1249,7 @@ static int get_cluster_offset(BlockDriverState *bs,
  * This problem may occur because of insufficient space on host disk
  * or inappropriate VM shutdown.
  */
-ret = get_whole_cluster(bs, extent, cluster_sector * BDRV_SECTOR_SIZE,
+ret = vmdk_perform_cow(bs, extent, cluster_sector * BDRV_SECTOR_SIZE,
 offset, skip_start_bytes, skip_end_bytes);
 if (ret) {
 return ret;
-- 
2.6.2

[Qemu-devel] [PATCH v7 3/8] vmdk: Rename get_cluster_offset() to vmdk_get_cluster_offset()

2017-06-29 Thread Ashijeet Acharya

Rename the existing get_cluster_offset() to vmdk_get_cluster_offset()
and update name in all the callers accordingly.

Signed-off-by: Ashijeet Acharya 
Reviewed-by: Fam Zheng 
---
 block/vmdk.c | 46 +++---
 1 file changed, 23 insertions(+), 23 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index 73ae786..f403981 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -1144,7 +1144,7 @@ static int vmdk_L2update(VmdkExtent *extent, VmdkMetaData 
*m_data,
 }
 
 /**
- * get_cluster_offset
+ * vmdk_get_cluster_offset
  *
  * Look up cluster offset in extent file by sector number, and store in
  * @cluster_offset.
@@ -1163,14 +1163,14 @@ static int vmdk_L2update(VmdkExtent *extent, 
VmdkMetaData *m_data,
  *  VMDK_UNALLOC if cluster is not mapped and @allocate is false.
  *  VMDK_ERROR if failed.
  */
-static int get_cluster_offset(BlockDriverState *bs,
-  VmdkExtent *extent,
-  VmdkMetaData *m_data,
-  uint64_t offset,
-  bool allocate,
-  uint64_t *cluster_offset,
-  uint64_t skip_start_bytes,
-  uint64_t skip_end_bytes)
+static int vmdk_get_cluster_offset(BlockDriverState *bs,
+   VmdkExtent *extent,
+   VmdkMetaData *m_data,
+   uint64_t offset,
+   bool allocate,
+   uint64_t *cluster_offset,
+   uint64_t skip_start_bytes,
+   uint64_t skip_end_bytes)
 {
 unsigned int l1_index, l2_offset, l2_index;
 int min_index, i, j;
@@ -1304,9 +1304,9 @@ static int64_t coroutine_fn 
vmdk_co_get_block_status(BlockDriverState *bs,
 return 0;
 }
 qemu_co_mutex_lock(&s->lock);
-ret = get_cluster_offset(bs, extent, NULL,
- sector_num * 512, false, &offset,
- 0, 0);
+ret = vmdk_get_cluster_offset(bs, extent, NULL,
+  sector_num * 512, false, &offset,
+  0, 0);
 qemu_co_mutex_unlock(&s->lock);
 
 index_in_cluster = vmdk_find_index_in_cluster(extent, sector_num);
@@ -1497,8 +1497,8 @@ vmdk_co_preadv(BlockDriverState *bs, uint64_t offset, 
uint64_t bytes,
 ret = -EIO;
 goto fail;
 }
-ret = get_cluster_offset(bs, extent, NULL,
- offset, false, &cluster_offset, 0, 0);
+ret = vmdk_get_cluster_offset(bs, extent, NULL,
+  offset, false, &cluster_offset, 0, 0);
 offset_in_cluster = vmdk_find_offset_in_cluster(extent, offset);
 
 n_bytes = MIN(bytes, extent->cluster_sectors * BDRV_SECTOR_SIZE
@@ -1584,10 +1584,10 @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t 
offset,
 n_bytes = MIN(bytes, extent->cluster_sectors * BDRV_SECTOR_SIZE
  - offset_in_cluster);
 
-ret = get_cluster_offset(bs, extent, &m_data, offset,
- !(extent->compressed || zeroed),
- &cluster_offset, offset_in_cluster,
- offset_in_cluster + n_bytes);
+ret = vmdk_get_cluster_offset(bs, extent, &m_data, offset,
+  !(extent->compressed || zeroed),
+  &cluster_offset, offset_in_cluster,
+  offset_in_cluster + n_bytes);
 if (extent->compressed) {
 if (ret == VMDK_OK) {
 /* Refuse write to allocated cluster for streamOptimized */
@@ -1596,8 +1596,8 @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t 
offset,
 return -EIO;
 } else {
 /* allocate */
-ret = get_cluster_offset(bs, extent, &m_data, offset,
- true, &cluster_offset, 0, 0);
+ret = vmdk_get_cluster_offset(bs, extent, &m_data, offset,
+  true, &cluster_offset, 0, 0);
 }
 }
 if (ret == VMDK_ERROR) {
@@ -2229,9 +2229,9 @@ static int vmdk_check(BlockDriverState *bs, 
BdrvCheckResult *result,
 sector_num);
 break;
 }
-ret = get_cluster_offset(bs, extent, NULL,
- sector_num << BDRV_SECTOR_BITS,
- false, &cluster_offset, 0, 0);
+ret = vmdk_get_cluster_offset(bs, extent, NULL,
+  sector_num << BDRV_SECTOR_BITS,
+  false, &cluster_offset, 0, 0);
 if (ret == VMDK_ERROR) {
 fprintf(

[Qemu-devel] [PATCH v7 6/8] vmdk: New functions to assist allocating multiple clusters

2017-06-29 Thread Ashijeet Acharya

Introduce two new helper functions vmdk_handle_alloc() and
vmdk_alloc_clusters(). vmdk_handle_alloc() helps to allocate multiple
clusters at once starting from a given offset on disk and performs COW
if necessary for first and last allocated clusters.
vmdk_alloc_clusters() helps to return the offset of the first of
the many newly allocated clusters. Also, provide proper documentation
for both.

Signed-off-by: Ashijeet Acharya 
---
 block/vmdk.c | 200 ---
 1 file changed, 190 insertions(+), 10 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index fe2046b..277db16 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -136,6 +136,7 @@ typedef struct VmdkMetaData {
 unsigned int l2_offset;
 int valid;
 uint32_t *l2_cache_entry;
+uint32_t nb_clusters;
 } VmdkMetaData;
 
 typedef struct VmdkGrainMarker {
@@ -1242,6 +1243,182 @@ static int get_cluster_table(VmdkExtent *extent, 
uint64_t offset,
 return VMDK_OK;
 }
 
+/*
+ * vmdk_handle_alloc
+ *
+ * Allocate new clusters for an area that either is yet unallocated or needs a
+ * copy on write.
+ *
+ * Returns:
+ *   VMDK_OK:   if new clusters were allocated, *bytes may be decreased if
+ *  the new allocation doesn't cover all of the requested area.
+ *  *cluster_offset is updated to contain the offset of the
+ *  first newly allocated cluster.
+ *
+ *   VMDK_UNALLOC:  if no clusters could be allocated. *cluster_offset is left
+ *  unchanged.
+ *
+ *   VMDK_ERROR:in error cases
+ */
+static int vmdk_handle_alloc(BlockDriverState *bs, VmdkExtent *extent,
+ uint64_t offset, uint64_t *cluster_offset,
+ int64_t *bytes, VmdkMetaData *m_data,
+ bool allocate, uint32_t *total_alloc_clusters)
+{
+int l1_index, l2_offset, l2_index;
+uint32_t *l2_table;
+uint32_t cluster_sector;
+uint32_t nb_clusters;
+bool zeroed = false;
+uint64_t skip_start_bytes, skip_end_bytes;
+int ret;
+
+ret = get_cluster_table(extent, offset, &l1_index, &l2_offset,
+&l2_index, &l2_table);
+if (ret < 0) {
+return ret;
+}
+
+cluster_sector = le32_to_cpu(l2_table[l2_index]);
+
+skip_start_bytes = vmdk_find_offset_in_cluster(extent, offset);
+/* Calculate the number of clusters to look for. Here we truncate the last
+ * cluster, i.e. 1 less than the actual value calculated as we may need to
+ * perform COW for the last one. */
+nb_clusters = DIV_ROUND_UP(skip_start_bytes + *bytes,
+   extent->cluster_sectors << BDRV_SECTOR_BITS) - 
1;
+
+nb_clusters = MIN(nb_clusters, extent->l2_size - l2_index);
+assert(nb_clusters <= INT_MAX);
+
+/* update bytes according to final nb_clusters value */
+if (nb_clusters != 0) {
+*bytes = ((nb_clusters * extent->cluster_sectors) << BDRV_SECTOR_BITS)
+ - skip_start_bytes;
+} else {
+nb_clusters = 1;
+}
+*total_alloc_clusters += nb_clusters;
+
+/* we need to use MIN() for basically 3 cases that arise :
+ * 1. alloc very first cluster : here skip_start_bytes >= 0 and
+ **bytes <= cluster_size.
+ * 2. alloc middle clusters : here *bytes is a perfect multiple of
+ *cluster_size and skip_start_bytes is 0.
+ * 3. alloc very last cluster : here *bytes <= cluster_size and
+ *skip_start_bytes is 0
+ */
+skip_end_bytes = skip_start_bytes + MIN(*bytes,
+ extent->cluster_sectors * BDRV_SECTOR_SIZE
+- skip_start_bytes);
+
+if (extent->has_zero_grain && cluster_sector == VMDK_GTE_ZEROED) {
+zeroed = true;
+}
+
+if (!cluster_sector || zeroed) {
+if (!allocate) {
+return zeroed ? VMDK_ZEROED : VMDK_UNALLOC;
+}
+
+cluster_sector = extent->next_cluster_sector;
+extent->next_cluster_sector += extent->cluster_sectors
+* nb_clusters;
+
+ret = vmdk_perform_cow(bs, extent, cluster_sector * BDRV_SECTOR_SIZE,
+   offset, skip_start_bytes,
+   skip_end_bytes);
+if (ret < 0) {
+return ret;
+}
+if (m_data) {
+m_data->valid = 1;
+m_data->l1_index = l1_index;
+m_data->l2_index = l2_index;
+m_data->l2_offset = l2_offset;
+m_data->l2_cache_entry = &l2_table[l2_index];
+m_data->nb_clusters = nb_clusters;
+}
+}
+*cluster_offset = cluster_sector << BDRV_SECTOR_BITS;
+return VMDK_OK;
+}
+
+/*
+ * vmdk_alloc_clusters
+ *
+ * For a given offset on the virtual disk, find the cluster offset in vmdk
+ * file. If the offset is not found, allocate a new cluster.
+ *
+ * If the cluster is newly alloc

[Qemu-devel] [PATCH v7 4/8] vmdk: Factor out metadata loading code out of vmdk_get_cluster_offset()

2017-06-29 Thread Ashijeet Acharya

Move the cluster tables loading code out of the existing
vmdk_get_cluster_offset() function and implement it in separate
get_cluster_table() and vmdk_l2load() functions.

Signed-off-by: Ashijeet Acharya 
Reviewed-by: Fam Zheng 
---
 block/vmdk.c | 153 ---
 1 file changed, 105 insertions(+), 48 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index f403981..5647f53 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -1143,6 +1143,105 @@ static int vmdk_L2update(VmdkExtent *extent, 
VmdkMetaData *m_data,
 return VMDK_OK;
 }
 
+/*
+ * vmdk_l2load
+ *
+ * Load a new L2 table into memory. If the table is in the cache, the cache
+ * is used; otherwise the L2 table is loaded from the image file.
+ *
+ * Returns:
+ *   VMDK_OK:   on success
+ *   VMDK_ERROR:in error cases
+ */
+static int vmdk_l2load(VmdkExtent *extent, uint64_t offset, int l2_offset,
+   uint32_t **new_l2_table, int *new_l2_index)
+{
+int min_index, i, j;
+uint32_t *l2_table;
+uint32_t min_count;
+
+for (i = 0; i < L2_CACHE_SIZE; i++) {
+if (l2_offset == extent->l2_cache_offsets[i]) {
+/* increment the hit count */
+if (++extent->l2_cache_counts[i] == UINT32_MAX) {
+for (j = 0; j < L2_CACHE_SIZE; j++) {
+extent->l2_cache_counts[j] >>= 1;
+}
+}
+l2_table = extent->l2_cache + (i * extent->l2_size);
+goto found;
+}
+}
+/* not found: load a new entry in the least used one */
+min_index = 0;
+min_count = UINT32_MAX;
+for (i = 0; i < L2_CACHE_SIZE; i++) {
+if (extent->l2_cache_counts[i] < min_count) {
+min_count = extent->l2_cache_counts[i];
+min_index = i;
+}
+}
+l2_table = extent->l2_cache + (min_index * extent->l2_size);
+if (bdrv_pread(extent->file,
+(int64_t)l2_offset * 512,
+l2_table,
+extent->l2_size * sizeof(uint32_t)
+) != extent->l2_size * sizeof(uint32_t)) {
+return VMDK_ERROR;
+}
+
+extent->l2_cache_offsets[min_index] = l2_offset;
+extent->l2_cache_counts[min_index] = 1;
+found:
+*new_l2_index = ((offset >> 9) / extent->cluster_sectors) % 
extent->l2_size;
+*new_l2_table = l2_table;
+
+return VMDK_OK;
+}
+
+/*
+ * get_cluster_table
+ *
+ * For a given offset, load (and allocate if needed) the l2 table.
+ *
+ * Returns:
+ *   VMDK_OK:on success
+ *
+ *   VMDK_UNALLOC:   if cluster is not mapped
+ *
+ *   VMDK_ERROR: in error cases
+ */
+static int get_cluster_table(VmdkExtent *extent, uint64_t offset,
+ int *new_l1_index, int *new_l2_offset,
+ int *new_l2_index, uint32_t **new_l2_table)
+{
+int l1_index, l2_offset, l2_index;
+uint32_t *l2_table;
+int ret;
+
+offset -= (extent->end_sector - extent->sectors) * SECTOR_SIZE;
+l1_index = (offset >> 9) / extent->l1_entry_sectors;
+if (l1_index >= extent->l1_size) {
+return VMDK_ERROR;
+}
+l2_offset = extent->l1_table[l1_index];
+if (!l2_offset) {
+return VMDK_UNALLOC;
+}
+
+ret = vmdk_l2load(extent, offset, l2_offset, &l2_table, &l2_index);
+if (ret < 0) {
+return ret;
+}
+
+*new_l1_index = l1_index;
+*new_l2_offset = l2_offset;
+*new_l2_index = l2_index;
+*new_l2_table = l2_table;
+
+return VMDK_OK;
+}
+
 /**
  * vmdk_get_cluster_offset
  *
@@ -1172,66 +1271,24 @@ static int vmdk_get_cluster_offset(BlockDriverState *bs,
uint64_t skip_start_bytes,
uint64_t skip_end_bytes)
 {
-unsigned int l1_index, l2_offset, l2_index;
-int min_index, i, j;
-uint32_t min_count, *l2_table;
+int l1_index, l2_offset, l2_index;
+uint32_t *l2_table;
 bool zeroed = false;
 int64_t ret;
 int64_t cluster_sector;
 
-if (m_data) {
-m_data->valid = 0;
-}
 if (extent->flat) {
 *cluster_offset = extent->flat_start_offset;
 return VMDK_OK;
 }
 
-offset -= (extent->end_sector - extent->sectors) * SECTOR_SIZE;
-l1_index = (offset >> 9) / extent->l1_entry_sectors;
-if (l1_index >= extent->l1_size) {
-return VMDK_ERROR;
-}
-l2_offset = extent->l1_table[l1_index];
-if (!l2_offset) {
-return VMDK_UNALLOC;
-}
-for (i = 0; i < L2_CACHE_SIZE; i++) {
-if (l2_offset == extent->l2_cache_offsets[i]) {
-/* increment the hit count */
-if (++extent->l2_cache_counts[i] == 0x) {
-for (j = 0; j < L2_CACHE_SIZE; j++) {
-extent->l2_cache_counts[j] >>= 1;
-}
-}
-l2_table = extent->l2_cache + (i * extent->l2_size);
-goto found;
-}
-}
-/* not found: load a new entry in t

[Qemu-devel] [PATCH v7 5/8] vmdk: Set maximum bytes allocated in one cycle

2017-06-29 Thread Ashijeet Acharya

Limit the number of bytes allocated at once so that it never exceeds the
extent size boundary; this handles writes that span two separate extents
appropriately.

Signed-off-by: Ashijeet Acharya 
Reviewed-by: Fam Zheng 
---
 block/vmdk.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index 5647f53..fe2046b 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -1624,6 +1624,7 @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t 
offset,
 uint64_t cluster_offset;
 uint64_t bytes_done = 0;
 VmdkMetaData m_data;
+uint64_t extent_end;
 
 if (DIV_ROUND_UP(offset, BDRV_SECTOR_SIZE) > bs->total_sectors) {
 error_report("Wrong offset: offset=0x%" PRIx64
@@ -1637,9 +1638,17 @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t 
offset,
 if (!extent) {
 return -EIO;
 }
+extent_end = extent->end_sector * BDRV_SECTOR_SIZE;
+
 offset_in_cluster = vmdk_find_offset_in_cluster(extent, offset);
-n_bytes = MIN(bytes, extent->cluster_sectors * BDRV_SECTOR_SIZE
- - offset_in_cluster);
+
+/* truncate n_bytes to first cluster because we need to perform COW */
+if (offset_in_cluster > 0) {
+n_bytes = MIN(bytes, extent->cluster_sectors * BDRV_SECTOR_SIZE
+ - offset_in_cluster);
+} else {
+n_bytes = MIN(bytes, extent_end - offset);
+}
 
 ret = vmdk_get_cluster_offset(bs, extent, &m_data, offset,
   !(extent->compressed || zeroed),
-- 
2.6.2
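
The length computation in the hunk above can be read in isolation. What follows is a minimal sketch, not the driver code: the function name, the parameter list and the fixed 512-byte sector size are assumptions made for illustration. An unaligned write is cut at the end of the first cluster (because COW is needed there), while an aligned write may run up to the end of the extent.

```c
#include <stdint.h>

#define SKETCH_SECTOR_SIZE 512u   /* assumed sector size for this sketch */

static uint64_t sketch_min_u64(uint64_t a, uint64_t b)
{
    return a < b ? a : b;
}

/* Hypothetical stand-alone version of the n_bytes computation. */
static uint64_t sketch_clamp_write_len(uint64_t bytes, uint64_t offset,
                                       uint64_t offset_in_cluster,
                                       uint64_t cluster_sectors,
                                       uint64_t extent_end_sector)
{
    uint64_t cluster_bytes = cluster_sectors * SKETCH_SECTOR_SIZE;
    uint64_t extent_end = extent_end_sector * SKETCH_SECTOR_SIZE;

    if (offset_in_cluster > 0) {
        /* Unaligned start: stop at the end of the first cluster. */
        return sketch_min_u64(bytes, cluster_bytes - offset_in_cluster);
    }
    /* Aligned start: never cross the end of the extent. */
    return sketch_min_u64(bytes, extent_end - offset);
}
```

With 128-sector clusters, a 100000-byte request that starts 512 bytes into a cluster is cut to 65024 bytes, and the remainder is handled by the next loop iteration.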

[Qemu-devel] [PATCH v1 1/3] add memory_region_get_offset_within_address_space

2017-06-29 Thread KONRAD Frederic

This is used by the next patch to find out whether a ROM is pointed to by an
alias.

Signed-off-by: KONRAD Frederic 
---
 include/exec/memory.h | 10 ++
 memory.c  | 22 --
 2 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 8503685..e342412 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1270,6 +1270,16 @@ void memory_region_set_size(MemoryRegion *mr, uint64_t 
size);
 void memory_region_set_alias_offset(MemoryRegion *mr,
 hwaddr offset);
 
+/*
+ * memory_region_get_offset_within_address_space: get the offset of a region
+ *
+ * Returns the offset of a region within its address space. @mr must be mapped
+ * to an #AddressSpace.
+ *
+ * @mr: the #MemoryRegion to check.
+ */
+hwaddr memory_region_get_offset_within_address_space(MemoryRegion *mr);
+
 /**
  * memory_region_present: checks if an address relative to a @container
  * translates into #MemoryRegion within @container
diff --git a/memory.c b/memory.c
index 1044bba..2b7439b 100644
--- a/memory.c
+++ b/memory.c
@@ -598,11 +598,18 @@ static MemTxResult access_with_adjusted_size(hwaddr addr,
 return r;
 }
 
-static AddressSpace *memory_region_to_address_space(MemoryRegion *mr)
+static AddressSpace *memory_region_to_address_space(MemoryRegion *mr,
+hwaddr *offset)
 {
 AddressSpace *as;
 
+if (offset) {
+*offset = 0;
+}
 while (mr->container) {
+if (offset) {
+*offset += mr->addr;
+}
 mr = mr->container;
 }
 QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
@@ -613,6 +620,17 @@ static AddressSpace 
*memory_region_to_address_space(MemoryRegion *mr)
 return NULL;
 }
 
+hwaddr memory_region_get_offset_within_address_space(MemoryRegion *mr)
+{
+hwaddr offset;
+AddressSpace *as;
+
+as = memory_region_to_address_space(mr, &offset);
+assert(as);
+
+return offset;
+}
+
 /* Render a memory region into the global view.  Ranges in @view obscure
  * ranges in @mr.
  */
@@ -2251,7 +2269,7 @@ static MemoryRegionSection 
memory_region_find_rcu(MemoryRegion *mr,
 addr += root->addr;
 }
 
-as = memory_region_to_address_space(root);
+as = memory_region_to_address_space(root, NULL);
 if (!as) {
 return ret;
 }
-- 
1.8.3.1
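
The offset computation added to memory_region_to_address_space() is a walk up the container chain that sums mr->addr at each level. A minimal sketch of that walk follows; SketchRegion is a hypothetical, stripped-down stand-in for MemoryRegion, and the lookup of the matching AddressSpace is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for QEMU's MemoryRegion, reduced to two fields. */
typedef struct SketchRegion {
    uint64_t addr;                  /* offset inside the container */
    struct SketchRegion *container; /* NULL for the root region */
} SketchRegion;

/* Sum the offset of mr and of every container up to the root. */
static uint64_t sketch_offset_within_root(const SketchRegion *mr)
{
    uint64_t offset = 0;

    while (mr->container) {
        offset += mr->addr;
        mr = mr->container;
    }
    return offset;
}
```

A region at 0x1000 inside a container that is itself mapped at 0x08000000 in the root ends up at 0x08001000.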

[Qemu-devel] [PATCH v1 3/3] armv7m_systick: abort instead of locking on a bad rate

2017-06-29 Thread KONRAD Frederic

This helps board developers by asserting that system_clock_rate is not
zero. Using systick with a zero rate leads to a deadlock, so it is better to
abort and show the error.

Signed-off-by: KONRAD Frederic 
---
 hw/timer/armv7m_systick.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/timer/armv7m_systick.c b/hw/timer/armv7m_systick.c
index df8d280..745efb7 100644
--- a/hw/timer/armv7m_systick.c
+++ b/hw/timer/armv7m_systick.c
@@ -54,6 +54,9 @@ static void systick_reload(SysTickState *s, int reset)
 s->tick = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
 }
 s->tick += (s->reload + 1) * systick_scale(s);
+
+/* system_clock_scale = 0 leads to a nasty deadlock, better aborting */
+assert(systick_scale(s));
 timer_mod(s->timer, s->tick);
 }
 
-- 
1.8.3.1
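
For illustration, the deadline arithmetic from systick_reload() can be written as a small helper that rejects a zero scale instead of asserting. This is a sketch under assumptions: the helper name and signature are invented here, and the explanation of the hang (with a zero scale the deadline never moves forward, so the timer presumably fires again immediately) is an inference from the hunk above, not something the patch states.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: compute the next deadline as
 * now + (reload + 1) * scale, and refuse a zero scale.
 */
static bool sketch_systick_next(int64_t now, uint32_t reload, int64_t scale,
                                int64_t *deadline)
{
    if (scale == 0) {
        /* The deadline would be equal to 'now'; report it to the caller. */
        return false;
    }
    *deadline = now + ((int64_t)reload + 1) * scale;
    return true;
}
```

The patch itself uses assert() because a zero scale is a board wiring mistake rather than a runtime condition; a boolean return is only one alternative.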

[Qemu-devel] [PATCH v7 8/8] vmdk: Make vmdk_get_cluster_offset() return cluster offset only

2017-06-29 Thread Ashijeet Acharya

vmdk_alloc_clusters(), introduced earlier, now handles the task of
allocating clusters and performing COW when needed. Thus we can change
vmdk_get_cluster_offset() to stick to the sole purpose of returning the
cluster offset using the sector number. Update all call sites
accordingly.

Signed-off-by: Ashijeet Acharya 
Reviewed-by: Fam Zheng 
---
 block/vmdk.c | 56 
 1 file changed, 12 insertions(+), 44 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index 60b8adc..d41fde9 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -1493,25 +1493,16 @@ static int vmdk_alloc_clusters(BlockDriverState *bs,
  * For flat extents, the start offset as parsed from the description file is
  * returned.
  *
- * For sparse extents, look up in L1, L2 table. If allocate is true, return an
- * offset for a new cluster and update L2 cache. If there is a backing file,
- * COW is done before returning; otherwise, zeroes are written to the allocated
- * cluster. Both COW and zero writing skips the sector range
- * [@skip_start_sector, @skip_end_sector) passed in by caller, because caller
- * has new data to write there.
+ * For sparse extents, look up the L1, L2 table.
  *
  * Returns: VMDK_OK if cluster exists and mapped in the image.
- *  VMDK_UNALLOC if cluster is not mapped and @allocate is false.
- *  VMDK_ERROR if failed.
+ *  VMDK_UNALLOC if cluster is not mapped.
+ *  VMDK_ERROR if failed
  */
 static int vmdk_get_cluster_offset(BlockDriverState *bs,
VmdkExtent *extent,
-   VmdkMetaData *m_data,
uint64_t offset,
-   bool allocate,
-   uint64_t *cluster_offset,
-   uint64_t skip_start_bytes,
-   uint64_t skip_end_bytes)
+   uint64_t *cluster_offset)
 {
 int l1_index, l2_offset, l2_index;
 uint32_t *l2_table;
@@ -1536,31 +1527,9 @@ static int vmdk_get_cluster_offset(BlockDriverState *bs,
 }
 
 if (!cluster_sector || zeroed) {
-if (!allocate) {
-return zeroed ? VMDK_ZEROED : VMDK_UNALLOC;
-}
-
-cluster_sector = extent->next_cluster_sector;
-extent->next_cluster_sector += extent->cluster_sectors;
-
-/* First of all we write grain itself, to avoid race condition
- * that may to corrupt the image.
- * This problem may occur because of insufficient space on host disk
- * or inappropriate VM shutdown.
- */
-ret = vmdk_perform_cow(bs, extent, cluster_sector * BDRV_SECTOR_SIZE,
-offset, skip_start_bytes, skip_end_bytes);
-if (ret) {
-return ret;
-}
-if (m_data) {
-m_data->valid = 1;
-m_data->l1_index = l1_index;
-m_data->l2_index = l2_index;
-m_data->l2_offset = l2_offset;
-m_data->l2_cache_entry = &l2_table[l2_index];
-}
+return zeroed ? VMDK_ZEROED : VMDK_UNALLOC;
 }
+
 *cluster_offset = cluster_sector << BDRV_SECTOR_BITS;
 return VMDK_OK;
 }
@@ -1603,9 +1572,7 @@ static int64_t coroutine_fn 
vmdk_co_get_block_status(BlockDriverState *bs,
 return 0;
 }
 qemu_co_mutex_lock(&s->lock);
-ret = vmdk_get_cluster_offset(bs, extent, NULL,
-  sector_num * 512, false, &offset,
-  0, 0);
+ret = vmdk_get_cluster_offset(bs, extent, sector_num * 512, &offset);
 qemu_co_mutex_unlock(&s->lock);
 
 index_in_cluster = vmdk_find_index_in_cluster(extent, sector_num);
@@ -1796,13 +1763,14 @@ vmdk_co_preadv(BlockDriverState *bs, uint64_t offset, 
uint64_t bytes,
 ret = -EIO;
 goto fail;
 }
-ret = vmdk_get_cluster_offset(bs, extent, NULL,
-  offset, false, &cluster_offset, 0, 0);
+
 offset_in_cluster = vmdk_find_offset_in_cluster(extent, offset);
 
 n_bytes = MIN(bytes, extent->cluster_sectors * BDRV_SECTOR_SIZE
  - offset_in_cluster);
 
+ret = vmdk_get_cluster_offset(bs, extent, offset, &cluster_offset);
+
 if (ret != VMDK_OK) {
 /* if not allocated, try to read from parent image, if exist */
 if (bs->backing && ret != VMDK_ZEROED) {
@@ -2549,9 +2517,9 @@ static int vmdk_check(BlockDriverState *bs, 
BdrvCheckResult *result,
 sector_num);
 break;
 }
-ret = vmdk_get_cluster_offset(bs, extent, NULL,
+ret = vmdk_get_cluster_offset(bs, extent,
   sector_num << BDRV_SECTOR_BITS,
-  false, &cluster_offset, 0, 0);
+  &cluster_offset);
 if (re
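
After this change the function is a pure two-level lookup. The following is a minimal in-memory sketch of such a lookup, not the driver code: the SketchExtent type, the return codes and the choice to keep L2 tables as pointers (the real driver stores on-disk sector offsets and goes through an L2 cache) are assumptions, as is treating an L2 entry of 1 as the "zeroed" marker.

```c
#include <stddef.h>
#include <stdint.h>

enum {
    SKETCH_OK = 0,
    SKETCH_ERROR = -1,
    SKETCH_UNALLOC = -2,
    SKETCH_ZEROED = -3,
};

/* Hypothetical in-memory extent: L1 entries point directly at L2 tables. */
typedef struct {
    uint32_t cluster_sectors;         /* sectors per cluster */
    uint32_t l2_size;                 /* entries per L2 table */
    uint32_t l1_size;                 /* entries in the L1 table */
    const uint32_t *const *l1_table;  /* NULL entry: no L2 table */
} SketchExtent;

static int sketch_get_cluster_offset(const SketchExtent *e, uint64_t offset,
                                     uint64_t *cluster_offset)
{
    uint64_t sector = offset >> 9;
    uint64_t l1_entry_sectors = (uint64_t)e->l2_size * e->cluster_sectors;
    uint64_t l1_index = sector / l1_entry_sectors;
    const uint32_t *l2_table;
    uint32_t cluster_sector;

    if (l1_index >= e->l1_size) {
        return SKETCH_ERROR;
    }
    l2_table = e->l1_table[l1_index];
    if (!l2_table) {
        return SKETCH_UNALLOC;
    }
    cluster_sector = l2_table[(sector / e->cluster_sectors) % e->l2_size];
    if (cluster_sector == 0) {
        return SKETCH_UNALLOC;
    }
    if (cluster_sector == 1) {
        return SKETCH_ZEROED;   /* assumed marker value */
    }
    *cluster_offset = (uint64_t)cluster_sector << 9;
    return SKETCH_OK;
}
```

Callers that need allocation or COW are expected to go through vmdk_alloc_clusters() instead, as the commit message says.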

Re: [Qemu-devel] [RFC v1 2/4] util/oslib-win32: Remove invalid check

2017-06-29 Thread Fam Zheng

On Tue, 06/27 16:57, Alistair Francis wrote:
> There is no way nhandles can be zero in this section so that part of the
> if statement will always be false. Let's just remove it to make the code
> easier to read.
> 
> Signed-off-by: Alistair Francis 
> Acked-by: Edgar E. Iglesias 
> ---
> 
>  util/oslib-win32.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/util/oslib-win32.c b/util/oslib-win32.c
> index 80e4668935..7ec0f8e083 100644
> --- a/util/oslib-win32.c
> +++ b/util/oslib-win32.c
> @@ -414,7 +414,7 @@ static int poll_rest(gboolean poll_msgs, HANDLE *handles, 
> gint nhandles,
>  /* If we have a timeout, or no handles to poll, be satisfied
>   * with just noticing we have messages waiting.
>   */
> -if (timeout != 0 || nhandles == 0) {
> +if (timeout != 0) {
>  return 1;
>  }
>  
> -- 
> 2.11.0
> 

Reviewed-by: Fam Zheng

[Qemu-devel] [PATCH v1 0/3] Some armv7m fixes

2017-06-29 Thread KONRAD Frederic

Hi,

While playing with armv7m, I found two little bugs:
  - When there is an alias @0x to a flash memory, the CPU state isn't
reset correctly, which later leads to an exception because the ARM instruction
set is used. This bug is presumably present with the netduino2 board as well.
  - If the developer forgets to set system_clock_rate, we later end up in a
livelock when systick is triggered. It is better to abort early and avoid the
pain of chasing the livelock.

Thanks,
Fred

KONRAD Frederic (3):
  add memory_region_get_offset_within_address_space
  arm: fix the armv7m reset state
  armv7m_systick: abort instead of locking on a bad rate

 hw/timer/armv7m_systick.c |  3 +++
 include/exec/memory.h | 10 ++
 memory.c  | 22 --
 target/arm/cpu.c  | 14 ++
 4 files changed, 47 insertions(+), 2 deletions(-)

-- 
1.8.3.1

[Qemu-devel] [PATCH v1 2/3] arm: fix the armv7m reset state

2017-06-29 Thread KONRAD Frederic

This fixes an odd bug when a ROM is present somewhere and an alias @0x
points to the ROM. The "if (rom)" test fails and we don't get a valid reset
state. QEMU later crashes with an exception because the ARMv7-M starts with the
ARM instruction set (i.e. PC & 0x01 is 0).

This patch uses memory_region_get_offset_within_address_space, introduced in
the previous patch, to check whether an alias points to a flash somewhere.

Signed-off-by: KONRAD Frederic 
---
 target/arm/cpu.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 28a9141..b8afd97 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -201,6 +201,20 @@ static void arm_cpu_reset(CPUState *s)
 
 /* Load the initial SP and PC from the vector table at address 0 */
 rom = rom_ptr(0);
+
+if (!rom) {
+/* Sometimes address 0x is an alias to a flash which
+ * actually has a ROM.
+ */
+MemoryRegionSection section;
+hwaddr offset = 0;
+
+section = memory_region_find(s->as->root, 0, 8);
+offset = memory_region_get_offset_within_address_space(section.mr);
+memory_region_unref(section.mr);
+rom = rom_ptr(offset);
+}
+
 if (rom) {
 /* Address zero is covered by ROM which hasn't yet been
  * copied into physical memory.
-- 
1.8.3.1
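
The "PC & 0x01" remark refers to how the reset state is derived from the vector table: the first word is the initial stack pointer, the second is the initial PC, and bit 0 of that PC selects Thumb state. A minimal sketch of this decoding follows, assuming a little-endian table; the type and function names are invented for illustration and are not taken from target/arm/cpu.c.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the values loaded at reset. */
typedef struct {
    uint32_t sp;
    uint32_t pc;
    bool thumb;
} SketchResetState;

/* Load a 32-bit little-endian value. */
static uint32_t sketch_ldl_le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode initial SP and PC from the first 8 bytes of a vector table. */
static SketchResetState sketch_reset_from_vectors(const uint8_t *vectors)
{
    SketchResetState st;
    uint32_t initial_pc = sketch_ldl_le(vectors + 4);

    st.sp = sketch_ldl_le(vectors);
    st.thumb = (initial_pc & 1) != 0;   /* bit 0 selects Thumb state */
    st.pc = initial_pc & ~(uint32_t)1;
    return st;
}
```

If the table cannot be found and the words read as zero, thumb ends up false, which matches the failure described in the commit message.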

Re: [Qemu-devel] [RFC v1 3/4] util/oslib-win32: Fix up if conditional

2017-06-29 Thread Fam Zheng

On Tue, 06/27 16:57, Alistair Francis wrote:
> Signed-off-by: Alistair Francis 
> Acked-by: Edgar E. Iglesias 
> ---
> 
>  util/oslib-win32.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/util/oslib-win32.c b/util/oslib-win32.c
> index 7ec0f8e083..a015e1ac96 100644
> --- a/util/oslib-win32.c
> +++ b/util/oslib-win32.c
> @@ -438,7 +438,7 @@ static int poll_rest(gboolean poll_msgs, HANDLE *handles, 
> gint nhandles,
>  if (timeout == 0 && nhandles > 1) {
>  /* Remove the handle that fired */
>  int i;
> -if (ready < nhandles - 1) {
> +if ((ready - WAIT_OBJECT_0) < nhandles - 1) {
>  for (i = ready - WAIT_OBJECT_0 + 1; i < nhandles; i++) {
>  handles[i-1] = handles[i];
>  }
> -- 
> 2.11.0
> 

Reviewed-by: Fam Zheng
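
The fix treats the wait result as WAIT_OBJECT_0 plus an array index. A portable sketch of the "remove the handle that fired" step is below. SKETCH_WAIT_OBJECT_0 is a local stand-in so the code builds without the Windows headers (as far as I know the real WAIT_OBJECT_0 is 0, in which case the patch changes correctness on paper rather than behaviour), and the bounds check is an addition that is not in the quoted code.

```c
/* Local stand-in for the Windows constant, assumed to be 0. */
#define SKETCH_WAIT_OBJECT_0 0u

/*
 * Remove the handle whose wait result was 'ready' by shifting the rest
 * of the array down by one. Returns the new number of handles.
 */
static int sketch_remove_fired(void **handles, int nhandles, unsigned ready)
{
    unsigned idx = ready - SKETCH_WAIT_OBJECT_0;
    int i;

    if (nhandles <= 0 || idx >= (unsigned)nhandles) {
        return nhandles;   /* not an index into the array: leave it alone */
    }
    for (i = (int)idx + 1; i < nhandles; i++) {
        handles[i - 1] = handles[i];
    }
    return nhandles - 1;
}
```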

Re: [Qemu-devel] [RFC v1 4/4] util/oslib-win32: Recursivly pass the timeout

2017-06-29 Thread Fam Zheng

On Tue, 06/27 16:57, Alistair Francis wrote:
> Signed-off-by: Alistair Francis 
> Acked-by: Edgar E. Iglesias 
> ---
> 
>  util/oslib-win32.c | 21 +
>  1 file changed, 17 insertions(+), 4 deletions(-)
> 
> diff --git a/util/oslib-win32.c b/util/oslib-win32.c
> index a015e1ac96..3630e46499 100644
> --- a/util/oslib-win32.c
> +++ b/util/oslib-win32.c
> @@ -432,10 +432,10 @@ static int poll_rest(gboolean poll_msgs, HANDLE 
> *handles, gint nhandles,
>  }
>  }
>  
> -/* If no timeout and polling several handles, recurse to poll
> - * the rest of them.
> +/* We only found one and we are waiting on more then one. Let's try
> + * again.
>   */
> -if (timeout == 0 && nhandles > 1) {
> +if (nhandles > 1) {
>  /* Remove the handle that fired */
>  int i;
>  if ((ready - WAIT_OBJECT_0) < nhandles - 1) {
> @@ -444,7 +444,20 @@ static int poll_rest(gboolean poll_msgs, HANDLE 
> *handles, gint nhandles,
>  }
>  }
>  nhandles--;
> -recursed_result = poll_rest(FALSE, handles, nhandles, fds, nfds, 
> 0);
> +
> +/* If we just had a very small timeout let's increase it when we
> + * recurse to ensure we don't just busy wait. This ensures we let
> + * the Windows threads block at least a little. If we previously
> + * had some wait let's set it to zero to avoid blocking for too
> + * long.
> + */
> +if (timeout < 10) {
> +timeout = timeout + 1;
> +} else {
> +timeout = 0;
> +}
> +recursed_result = poll_rest(FALSE, handles, nhandles, fds,
> +nfds, timeout);
>  return (recursed_result == -1) ? -1 : 1 + recursed_result;
>  }
>  return 1;
> -- 
> 2.11.0
> 

This is a hack; can we fix what is causing the busy wait instead?

Fam

[Qemu-devel] [PATCH v7 0/4] Improve error reporting

2017-06-29 Thread Mao Zhongyi

v7:
* PATCH 01
  -fix the error message. [Daniel P. Berrange]
  -fix the indentation problem. [Eric Blake]
* PATCH 03
  -print a generic message when gethostbyname() fails in parse_host_port(),
   drop the misleading ": unknown host" part. [Markus Armbruster]

v6:
* PATCH 02
  -rename the subject
  -drop the "qemu: error: " prefix.
  -correct inappropriate error information settings.
* PATCH 03,04
  -correct inappropriate error information settings.[Markus Armbruster]

v5:
* PATCH 01 make the commit message more exact about the actual function.
  [Markus Armbruster]
* PATCH 02, 03, 04 still retain the original function, but the specific
  content and order of each patch has been adjusted substantially, so
  that each patch is a complete fix. [Markus Armbruster]

v4:
* PATCH 01 is redoing previous patch 1, replacing the fprintf() with
  error_report() in the 'default' case of net_socket_fd_init().
  [Markus Armbruster]

v3:
* PATCH 01 is suggested by Markus and Daniel and removes the dubious
  'default' case in net_socket_fd_init(). Jason agreed.
* PATCH 02 is redoing previous patch 4.
* PATCH 04 is redoing previous patch 2 and improves some of the error
  messages.

v2:
* PATCH 02 is a rework of patch 2 following Markus's suggestion to convert
  error_report() in the functions called by net_socket_*_init() to Error.
  It also adds a lot of error handling information.
* PATCH 03 net_socket_mcast_create(), net_socket_fd_init_dgram() and
  net_socket_fd_init() use functions such as fprintf() and perror() to
  report an error message. Convert them to Error.
* PATCH 04 parse_host_port() may fail without reporting an error. Fix it
  to set an error when it fails.

Cc: jasow...@redhat.com
Cc: arm...@redhat.com
Cc: berra...@redhat.com
Cc: kra...@redhat.com
Cc: pbonz...@redhat.com
Cc: ebl...@redhat.com

Mao Zhongyi (4):
  net/socket: Don't treat odd socket type as SOCK_STREAM
  net/socket: Convert several helper functions to Error
  net/net: Convert parse_host_port() to Error
  net/socket: Improve -net socket error reporting

 include/qemu/sockets.h |   3 +-
 net/net.c  |  22 +--
 net/socket.c   | 156 -
 3 files changed, 109 insertions(+), 72 deletions(-)

-- 
2.9.4

[Qemu-devel] [PATCH v7 3/4] net/net: Convert parse_host_port() to Error

2017-06-29 Thread Mao Zhongyi

Cc: berra...@redhat.com
Cc: kra...@redhat.com
Cc: pbonz...@redhat.com
Cc: jasow...@redhat.com
Cc: arm...@redhat.com
Cc: ebl...@redhat.com
Signed-off-by: Mao Zhongyi 
---
 include/qemu/sockets.h |  3 ++-
 net/net.c  | 22 +-
 net/socket.c   | 19 ++-
 3 files changed, 33 insertions(+), 11 deletions(-)

diff --git a/include/qemu/sockets.h b/include/qemu/sockets.h
index 5c326db..78e2b30 100644
--- a/include/qemu/sockets.h
+++ b/include/qemu/sockets.h
@@ -53,7 +53,8 @@ void socket_listen_cleanup(int fd, Error **errp);
 int socket_dgram(SocketAddress *remote, SocketAddress *local, Error **errp);
 
 /* Old, ipv4 only bits.  Don't use for new code. */
-int parse_host_port(struct sockaddr_in *saddr, const char *str);
+int parse_host_port(struct sockaddr_in *saddr, const char *str,
+Error **errp);
 int socket_init(void);
 
 /**
diff --git a/net/net.c b/net/net.c
index 6235aab..77b6deb 100644
--- a/net/net.c
+++ b/net/net.c
@@ -100,7 +100,8 @@ static int get_str_sep(char *buf, int buf_size, const char 
**pp, int sep)
 return 0;
 }
 
-int parse_host_port(struct sockaddr_in *saddr, const char *str)
+int parse_host_port(struct sockaddr_in *saddr, const char *str,
+Error **errp)
 {
 char buf[512];
 struct hostent *he;
@@ -108,24 +109,35 @@ int parse_host_port(struct sockaddr_in *saddr, const char 
*str)
 int port;
 
 p = str;
-if (get_str_sep(buf, sizeof(buf), &p, ':') < 0)
+if (get_str_sep(buf, sizeof(buf), &p, ':') < 0) {
+error_setg(errp, "host address '%s' doesn't contain ':' "
+   "separating host from port", str);
 return -1;
+}
 saddr->sin_family = AF_INET;
 if (buf[0] == '\0') {
 saddr->sin_addr.s_addr = 0;
 } else {
 if (qemu_isdigit(buf[0])) {
-if (!inet_aton(buf, &saddr->sin_addr))
+if (!inet_aton(buf, &saddr->sin_addr)) {
+error_setg(errp, "host address '%s' is not a valid "
+   "IPv4 address", buf);
 return -1;
+}
 } else {
-if ((he = gethostbyname(buf)) == NULL)
+he = gethostbyname(buf);
+if (he == NULL) {
+error_setg(errp, "can't resolve host address '%s'", buf);
 return - 1;
+}
 saddr->sin_addr = *(struct in_addr *)he->h_addr;
 }
 }
 port = strtol(p, (char **)&r, 0);
-if (r == p)
+if (r == p) {
+error_setg(errp, "port number '%s' is invalid", p);
 return -1;
+}
 saddr->sin_port = htons(port);
 return 0;
 }
diff --git a/net/socket.c b/net/socket.c
index 44fb504..bd80b3c 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -500,9 +500,12 @@ static int net_socket_listen_init(NetClientState *peer,
 NetSocketState *s;
 struct sockaddr_in saddr;
 int fd, ret;
+Error *err = NULL;
 
-if (parse_host_port(&saddr, host_str) < 0)
+if (parse_host_port(&saddr, host_str, &err) < 0) {
+error_report_err(err);
 return -1;
+}
 
 fd = qemu_socket(PF_INET, SOCK_STREAM, 0);
 if (fd < 0) {
@@ -547,8 +550,10 @@ static int net_socket_connect_init(NetClientState *peer,
 struct sockaddr_in saddr;
 Error *err = NULL;
 
-if (parse_host_port(&saddr, host_str) < 0)
+if (parse_host_port(&saddr, host_str, &err) < 0) {
+error_report_err(err);
 return -1;
+}
 
 fd = qemu_socket(PF_INET, SOCK_STREAM, 0);
 if (fd < 0) {
@@ -600,8 +605,10 @@ static int net_socket_mcast_init(NetClientState *peer,
 struct in_addr localaddr, *param_localaddr;
 Error *err = NULL;
 
-if (parse_host_port(&saddr, host_str) < 0)
+if (parse_host_port(&saddr, host_str, &err) < 0) {
+error_report_err(err);
 return -1;
+}
 
 if (localaddr_str != NULL) {
 if (inet_aton(localaddr_str, &localaddr) == 0)
@@ -643,11 +650,13 @@ static int net_socket_udp_init(NetClientState *peer,
 struct sockaddr_in laddr, raddr;
 Error *err = NULL;
 
-if (parse_host_port(&laddr, lhost) < 0) {
+if (parse_host_port(&laddr, lhost, &err) < 0) {
+error_report_err(err);
 return -1;
 }
 
-if (parse_host_port(&raddr, rhost) < 0) {
+if (parse_host_port(&raddr, rhost, &err) < 0) {
+error_report_err(err);
 return -1;
 }
 
-- 
2.9.4
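
The pattern in this patch is that each failure path fills in a message and the caller decides how to report it. A minimal stand-alone sketch of the same idea follows, under loud assumptions: it uses a plain character buffer instead of QEMU's Error API, it only splits "host:port" and does not resolve the host, and the function name and the "too long" message are invented here.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: split "host:port" at the first ':'.
 * On failure, write a message into err and return -1.
 */
static int sketch_split_host_port(const char *str, char *host,
                                  size_t host_len, long *port,
                                  char *err, size_t err_len)
{
    const char *sep = strchr(str, ':');
    char *end;
    size_t len;

    if (!sep) {
        snprintf(err, err_len, "host address '%s' doesn't contain ':' "
                 "separating host from port", str);
        return -1;
    }
    len = (size_t)(sep - str);
    if (len >= host_len) {
        snprintf(err, err_len, "host name in '%s' is too long", str);
        return -1;
    }
    memcpy(host, str, len);
    host[len] = '\0';

    /* Base 0 accepts decimal, octal and hex, like the strtol() call above. */
    *port = strtol(sep + 1, &end, 0);
    if (end == sep + 1) {
        snprintf(err, err_len, "port number '%s' is invalid", sep + 1);
        return -1;
    }
    return 0;
}
```

The caller prints err once at the top level, which is the role error_report_err() plays at the call sites in net/socket.c.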

[Qemu-devel] [PATCH v7 2/4] net/socket: Convert several helper functions to Error

2017-06-29 Thread Mao Zhongyi

Currently, net_socket_mcast_create(), net_socket_fd_init_dgram() and
net_socket_fd_init() use functions such as fprintf() and perror() to
report error messages.

Now, convert these functions to Error.

Cc: jasow...@redhat.com
Cc: arm...@redhat.com
Cc: berra...@redhat.com
Signed-off-by: Mao Zhongyi 
Reviewed-by: Daniel P. Berrange 
---
 net/socket.c | 81 +---
 1 file changed, 50 insertions(+), 31 deletions(-)

diff --git a/net/socket.c b/net/socket.c
index 7d05e70..44fb504 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -209,7 +209,9 @@ static void net_socket_send_dgram(void *opaque)
 }
 }
 
-static int net_socket_mcast_create(struct sockaddr_in *mcastaddr, struct 
in_addr *localaddr)
+static int net_socket_mcast_create(struct sockaddr_in *mcastaddr,
+   struct in_addr *localaddr,
+   Error **errp)
 {
 struct ip_mreq imr;
 int fd;
@@ -221,16 +223,16 @@ static int net_socket_mcast_create(struct sockaddr_in 
*mcastaddr, struct in_addr
 #endif
 
 if (!IN_MULTICAST(ntohl(mcastaddr->sin_addr.s_addr))) {
-fprintf(stderr, "qemu: error: specified mcastaddr \"%s\" (0x%08x) "
-"does not contain a multicast address\n",
-inet_ntoa(mcastaddr->sin_addr),
-(int)ntohl(mcastaddr->sin_addr.s_addr));
+error_setg(errp, "specified mcastaddr %s (0x%08x) "
+   "does not contain a multicast address",
+   inet_ntoa(mcastaddr->sin_addr),
+   (int)ntohl(mcastaddr->sin_addr.s_addr));
 return -1;
 
 }
 fd = qemu_socket(PF_INET, SOCK_DGRAM, 0);
 if (fd < 0) {
-perror("socket(PF_INET, SOCK_DGRAM)");
+error_setg_errno(errp, errno, "failed to create datagram socket");
 return -1;
 }
 
@@ -242,13 +244,15 @@ static int net_socket_mcast_create(struct sockaddr_in 
*mcastaddr, struct in_addr
 val = 1;
 ret = qemu_setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &val, sizeof(val));
 if (ret < 0) {
-perror("setsockopt(SOL_SOCKET, SO_REUSEADDR)");
+error_setg_errno(errp, errno, "set the socket 'SO_REUSEADDR'"
+ " attribute failed");
 goto fail;
 }
 
 ret = bind(fd, (struct sockaddr *)mcastaddr, sizeof(*mcastaddr));
 if (ret < 0) {
-perror("bind");
+error_setg_errno(errp, errno, "bind ip=%s to socket failed",
+ inet_ntoa(mcastaddr->sin_addr));
 goto fail;
 }
 
@@ -263,7 +267,8 @@ static int net_socket_mcast_create(struct sockaddr_in 
*mcastaddr, struct in_addr
 ret = qemu_setsockopt(fd, IPPROTO_IP, IP_ADD_MEMBERSHIP,
   &imr, sizeof(struct ip_mreq));
 if (ret < 0) {
-perror("setsockopt(IP_ADD_MEMBERSHIP)");
+error_setg_errno(errp, errno, "add socket to multicast group %s"
+ " failed", inet_ntoa(imr.imr_multiaddr));
 goto fail;
 }
 
@@ -272,7 +277,8 @@ static int net_socket_mcast_create(struct sockaddr_in 
*mcastaddr, struct in_addr
 ret = qemu_setsockopt(fd, IPPROTO_IP, IP_MULTICAST_LOOP,
   &loop, sizeof(loop));
 if (ret < 0) {
-perror("setsockopt(SOL_IP, IP_MULTICAST_LOOP)");
+error_setg_errno(errp, errno, "force multicast message to loopback"
+ " failed");
 goto fail;
 }
 
@@ -281,7 +287,8 @@ static int net_socket_mcast_create(struct sockaddr_in 
*mcastaddr, struct in_addr
 ret = qemu_setsockopt(fd, IPPROTO_IP, IP_MULTICAST_IF,
   localaddr, sizeof(*localaddr));
 if (ret < 0) {
-perror("setsockopt(IP_MULTICAST_IF)");
+error_setg_errno(errp, errno, "set the default network send "
+ "interface failed");
 goto fail;
 }
 }
@@ -320,7 +327,8 @@ static NetClientInfo net_dgram_socket_info = {
 static NetSocketState *net_socket_fd_init_dgram(NetClientState *peer,
 const char *model,
 const char *name,
-int fd, int is_connected)
+int fd, int is_connected,
+Error **errp)
 {
 struct sockaddr_in saddr;
 int newfd;
@@ -337,14 +345,13 @@ static NetSocketState 
*net_socket_fd_init_dgram(NetClientState *peer,
 if (getsockname(fd, (struct sockaddr *) &saddr, &saddr_len) == 0) {
 /* must be bound */
 if (saddr.sin_addr.s_addr == 0) {
-fprintf(stderr, "qemu: error: init_dgram: fd=%d unbound, "
-"cannot setup multicast dst addr\n", fd);
+error_setg(errp, "init_dgram: fd=%d unbound, "
+   "cannot setup multicast dst addr", fd

[Qemu-devel] [PATCH v7 4/4] net/socket: Improve -net socket error reporting

2017-06-29 Thread Mao Zhongyi

When -net socket fails, it first reports a specific error, then
a generic one, like this:

$ qemu-system-x86_64 -net socket,
qemu-system-x86_64: -net socket: exactly one of fd=, listen=, connect=, mcast= or udp= is required
qemu-system-x86_64: -net socket: Device 'socket' could not be initialized

Convert net_socket_*_init() to Error to get rid of the superfluous second
error message. After the patch, the output looks like this:

$ qemu-system-x86_64 -net socket,
qemu-system-x86_64: -net socket: exactly one of fd=, listen=, connect=, mcast= or udp= is required

At the same time, add explicit error messages for many of the failure paths.

Cc: jasow...@redhat.com
Cc: arm...@redhat.com
Cc: berra...@redhat.com
Signed-off-by: Mao Zhongyi 
---
 net/socket.c | 94 +---
 1 file changed, 45 insertions(+), 49 deletions(-)

diff --git a/net/socket.c b/net/socket.c
index bd80b3c..a891c3a 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -494,22 +494,21 @@ static void net_socket_accept(void *opaque)
 static int net_socket_listen_init(NetClientState *peer,
   const char *model,
   const char *name,
-  const char *host_str)
+  const char *host_str,
+  Error **errp)
 {
 NetClientState *nc;
 NetSocketState *s;
 struct sockaddr_in saddr;
 int fd, ret;
-Error *err = NULL;
 
-if (parse_host_port(&saddr, host_str, &err) < 0) {
-error_report_err(err);
+if (parse_host_port(&saddr, host_str, errp) < 0) {
 return -1;
 }
 
 fd = qemu_socket(PF_INET, SOCK_STREAM, 0);
 if (fd < 0) {
-perror("socket");
+error_setg_errno(errp, errno, "failed to create stream socket");
 return -1;
 }
 qemu_set_nonblock(fd);
@@ -518,13 +517,14 @@ static int net_socket_listen_init(NetClientState *peer,
 
 ret = bind(fd, (struct sockaddr *)&saddr, sizeof(saddr));
 if (ret < 0) {
-perror("bind");
+error_setg_errno(errp, errno, "bind ip=%s to socket failed",
+ inet_ntoa(saddr.sin_addr));
 closesocket(fd);
 return -1;
 }
 ret = listen(fd, 0);
 if (ret < 0) {
-perror("listen");
+error_setg_errno(errp, errno, "listen socket failed");
 closesocket(fd);
 return -1;
 }
@@ -543,21 +543,20 @@ static int net_socket_listen_init(NetClientState *peer,
 static int net_socket_connect_init(NetClientState *peer,
const char *model,
const char *name,
-   const char *host_str)
+   const char *host_str,
+   Error **errp)
 {
 NetSocketState *s;
 int fd, connected, ret;
 struct sockaddr_in saddr;
-Error *err = NULL;
 
-if (parse_host_port(&saddr, host_str, &err) < 0) {
-error_report_err(err);
+if (parse_host_port(&saddr, host_str, errp) < 0) {
 return -1;
 }
 
 fd = qemu_socket(PF_INET, SOCK_STREAM, 0);
 if (fd < 0) {
-perror("socket");
+error_setg_errno(errp, errno, "failed to create stream socket");
 return -1;
 }
 qemu_set_nonblock(fd);
@@ -573,7 +572,7 @@ static int net_socket_connect_init(NetClientState *peer,
errno == EINVAL) {
 break;
 } else {
-perror("connect");
+error_setg_errno(errp, errno, "connection failed");
 closesocket(fd);
 return -1;
 }
@@ -582,9 +581,8 @@ static int net_socket_connect_init(NetClientState *peer,
 break;
 }
 }
-s = net_socket_fd_init(peer, model, name, fd, connected, &err);
+s = net_socket_fd_init(peer, model, name, fd, connected, errp);
 if (!s) {
-error_report_err(err);
 return -1;
 }
 snprintf(s->nc.info_str, sizeof(s->nc.info_str),
@@ -597,36 +595,36 @@ static int net_socket_mcast_init(NetClientState *peer,
  const char *model,
  const char *name,
  const char *host_str,
- const char *localaddr_str)
+ const char *localaddr_str,
+ Error **errp)
 {
 NetSocketState *s;
 int fd;
 struct sockaddr_in saddr;
 struct in_addr localaddr, *param_localaddr;
-Error *err = NULL;
 
-if (parse_host_port(&saddr, host_str, &err) < 0) {
-error_report_err(err);
+if (parse_host_port(&saddr, host_str, errp) < 0) {
 return -1;
 }
 
 if (localaddr_str != NULL) {
-if (inet_aton(localaddr_str, &localaddr) == 0)
+if (inet_aton(localaddr_str, &localaddr) == 0) {
+

[Qemu-devel] [PATCH v7 1/4] net/socket: Don't treat odd socket type as SOCK_STREAM

2017-06-29 Thread Mao Zhongyi

In net_socket_fd_init(), the 'default' case is odd: it warns,
then continues as if the socket type was SOCK_STREAM. The
comment explains "this could be a eg. a pty", but that makes
no sense. If @fd really was a pty, getsockopt() would fail
with ENOTSOCK. If @fd was a socket, but neither SOCK_DGRAM nor
SOCK_STREAM, it should not be treated as if it was SOCK_STREAM.

Turn this case into an Error. If there is a genuine reason to
support something like SOCK_RAW, it should be explicitly
handled.

Cc: jasow...@redhat.com
Cc: arm...@redhat.com
Cc: berra...@redhat.com
Cc: ebl...@redhat.com
Suggested-by: Markus Armbruster 
Suggested-by: Daniel P. Berrange 
Signed-off-by: Mao Zhongyi 
Reviewed-by: Markus Armbruster 
---
 net/socket.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/socket.c b/net/socket.c
index dcae1ae..7d05e70 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -449,9 +449,9 @@ static NetSocketState *net_socket_fd_init(NetClientState 
*peer,
 case SOCK_STREAM:
 return net_socket_fd_init_stream(peer, model, name, fd, is_connected);
 default:
-/* who knows ... this could be a eg. a pty, do warn and continue as 
stream */
-fprintf(stderr, "qemu: warning: socket type=%d for fd=%d is not 
SOCK_DGRAM or SOCK_STREAM\n", so_type, fd);
-return net_socket_fd_init_stream(peer, model, name, fd, is_connected);
+error_report("socket type=%d for fd=%d must be either"
+ " SOCK_DGRAM or SOCK_STREAM", so_type, fd);
+closesocket(fd);
 }
 return NULL;
 }
-- 
2.9.4
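
The reasoning in the commit message can be checked with a few lines of POSIX code. This is a sketch only: the function name is invented, it assumes a POSIX system rather than the Windows path QEMU also supports, and it returns -1 instead of reporting through error_report().

```c
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: return SOCK_DGRAM or SOCK_STREAM for a socket of
 * that type, and -1 for anything else, including descriptors that are
 * not sockets at all.
 */
static int sketch_socket_type(int fd)
{
    int so_type = -1;
    socklen_t optlen = sizeof(so_type);

    if (getsockopt(fd, SOL_SOCKET, SO_TYPE, &so_type, &optlen) < 0) {
        return -1;   /* for a pty or pipe this fails with ENOTSOCK */
    }
    switch (so_type) {
    case SOCK_DGRAM:
    case SOCK_STREAM:
        return so_type;
    default:
        return -1;   /* a socket, but not a type this code handles */
    }
}
```

A pipe or pty never reaches the switch, so the old "could be a pty" fallback to SOCK_STREAM could only ever be hit by a real socket of some other type.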

Re: [Qemu-devel] SPARC64 supported processors

2017-06-29 Thread Artyom Tarasenko

Hi Pasha,

On Tue, Jun 27, 2017 at 7:59 PM, Pasha Tatashin
 wrote:
> Hi,
>
> I am trying to evaluate the current qemu support for sparc64 processors.
> First, it seems -smp is not supported for any processor, is this correct?
> When I set -smp greater than 1, I am getting:
>
> qemu-system-sparc64: Number of SMP CPUs requested (2) exceeds max CPUs
> supported by machine 'sun4u' (1)
>
> I've done some testing for all available sparc64 cpus + latest linux kernel:
>
> Fujitsu Sparc64      Working
> Fujitsu Sparc64 III  Exception 0x30 (DAE_side_effect_page) in OpenBios
> Fujitsu Sparc64 IV   Working
> Fujitsu Sparc64 V    Working
> TI UltraSparc I      Working
> TI UltraSparc II     Working
> TI UltraSparc IIi    Working
> TI UltraSparc IIe    Exception 0x28 (division_by_zero) in init_tick_ops
>                      Can make it work if is_hummingbird() is changed
>                      to return 0. The IO stick and OpenBios stick properties
>                      are absent, so we have to default to %tick for now.
>
> Sun UltraSparc III  Illegal instruction in cheetah_boot():
>   wr  %g0, %g1, %dcr
>   It appears dispatch control register is not implemented.
>
> Sun UltraSparc IIIi
> Sun UltraSparc IV
> Sun UltraSparc IV+
> Sun UltraSparc IIIi+
>  In these four CPUs, I am getting exception 0x32 in
>  cheetah_generic_boot: stxa  %g0, [ %g3 ] #ASI_DMMU

The UltraSPARC III {,i,i+} and IV(+) MMUs are not implemented. IIi is
the best for the sun4u target, I think.

> Sun UltraSparc T1
> Sun UltraSparc T2

Same here. T2 is pretty much a stub currently. The emulation uses the
same MMU as for T1.

> Both of the above boot pretty far but fail in this function when tmpfs is
> mounted:
> direct_pcr_write(unsigned long reg_num, u64 val)
> __asm__ __volatile__("wr %0, 0x0, %%pcr" : : "r" (val));
>
> Seems like performance counter registers are not supported.

Correct. As discussed off-list they are not.
Btw, you can try to get through even without modifying qemu.
Start qemu-system-sparc64 with -s -S options, connect to it with gdb,
set a breakpoint before the instruction and skip it by modifying the
%pc and %npc registers.


> needed to add these to kernel parameters:
> keep_bootcon -> to see where we are panicking
> lpj=1000 -> jiffies could not be calculated for some reason.
>
> NEC UltraSparc I    Working
>
> Does this look right, or maybe I have missed something, and we can get some
> of the Sun UltraSparc to work for example?
>
> Thank you,
> Pasha
>


-- 
Regards,
Artyom Tarasenko

SPARC and PPC PReP under qemu blog: http://tyom.blogspot.com/search/label/qemu

Re: [Qemu-devel] [PATCH 0/6] virtio: use ioeventfd in TCG and qtest mode

2017-06-29 Thread Kevin Wolf

On 28.06.2017 at 21:38, Eric Blake wrote:
> On 06/28/2017 01:47 PM, Stefan Hajnoczi wrote:
> > This patch series fixes qemu-iotests 068.  Since commit
> > ea4f3cebc4e0224605ab9dd9724aa4e7768fe372 ("qemu-iotests: 068: test iothread
> > mode") the test case has attempted to use dataplane without -M accel=kvm.
> > Although QEMU is capable of running TCG or qtest with emulated
> > ioeventfd/irqfd we haven't enabled it yet.
> > 
> > Unfortunately the virtio test cases fail when ioeventfd is enabled in qtest
> > mode.  This is because they make assumptions about virtqueue ISR signalling.
> > They assume that a request is completed when ISR becomes 1.  However, the
> > ISR can be set to 1 even though no new request has completed since commit
> > 83d768b5640946b7da55ce8335509df297e2c7cd "virtio: set ISR on dataplane
> > notifications".
> > 
> > This issue is solved by introducing a proper qvirtqueue_get_buf() API
> > (similar to the Linux guest drivers) instead of making assumptions about
> > the ISR.  Most of the patches update the test cases to use the new API.
> > 
> > Stefan Hajnoczi (6):
> >   libqos: fix typo in virtio.h QVirtQueue->used comment
> >   libqos: add virtio used ring support
> >   tests: fix virtio-scsi-test ISR dependence
> >   tests: fix virtio-blk-test ISR dependence
> >   tests: fix virtio-net-test ISR dependence
> >   virtio-pci: use ioeventfd even when KVM is disabled
> 
> I'm less familiar with the code in question, so I'll let others review,
> but it did fix the failure of 068 for me.
> 
> Tested-by: Eric Blake 

Also passes the virtio-scsi qtest under valgrind for me now.

Tested-by: Kevin Wolf 
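
For readers following along, the idea behind replacing the ISR check with a
get_buf-style call can be sketched in plain C.  This is only an illustration,
not the libqos code from the series: the struct layouts, RING_SIZE and
queue_get_buf() are hypothetical stand-ins.  The point is that completion is
detected by comparing the driver's own consumed count against the device's
used index, so an ISR that was raised without a new completion cannot be
mistaken for a finished request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified used ring; not the libqos or virtio headers. */
#define RING_SIZE 8

struct used_elem {
    uint32_t id;    /* head descriptor index of the completed request */
    uint32_t len;   /* number of bytes the device wrote */
};

struct used_ring {
    uint16_t idx;                       /* device-owned completion counter */
    struct used_elem ring[RING_SIZE];
};

struct queue {
    struct used_ring used;
    uint16_t last_used_idx;             /* driver-owned: completions consumed */
};

/*
 * Pop one completed request if there is one.  Returns true and stores the
 * descriptor id, or returns false when nothing new has completed, whatever
 * the ISR currently says.
 */
bool queue_get_buf(struct queue *q, uint32_t *desc_id)
{
    assert(q != NULL && desc_id != NULL);

    if (q->last_used_idx == q->used.idx) {
        return false;                   /* no new used-ring entry */
    }
    *desc_id = q->used.ring[q->last_used_idx % RING_SIZE].id;
    q->last_used_idx++;                 /* uint16_t wraps like the device index */
    return true;
}
```

A test would then loop on queue_get_buf() until it returns true (with a
timeout) instead of waiting for the ISR to become 1.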



Re: [Qemu-devel] [PATCH v5 1/5] throttle: factor out duplicate code

2017-06-29 Thread Pradeep Jagadeesh


On 6/22/2017 4:38 PM, Markus Armbruster wrote:

Pradeep Jagadeesh  writes:


This patch factors out the duplicate throttle code that was present in
block and fsdev devices.

Signed-off-by: Pradeep Jagadeesh 
---
 blockdev.c  | 44 +---
 fsdev/qemu-fsdev-throttle.c | 43 +--
 fsdev/qemu-fsdev-throttle.h |  1 +
 include/qemu/throttle-options.h |  4 
 util/throttle.c | 50 +
 5 files changed, 57 insertions(+), 85 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 6472548..5db9e5c 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -386,49 +386,7 @@ static void extract_common_blockdev_options(QemuOpts *opts, int *bdrv_flags,
 }

 if (throttle_cfg) {
-throttle_config_init(throttle_cfg);
-throttle_cfg->buckets[THROTTLE_BPS_TOTAL].avg =
-qemu_opt_get_number(opts, "throttling.bps-total", 0);
-throttle_cfg->buckets[THROTTLE_BPS_READ].avg  =
-qemu_opt_get_number(opts, "throttling.bps-read", 0);
-throttle_cfg->buckets[THROTTLE_BPS_WRITE].avg =
-qemu_opt_get_number(opts, "throttling.bps-write", 0);
-throttle_cfg->buckets[THROTTLE_OPS_TOTAL].avg =
-qemu_opt_get_number(opts, "throttling.iops-total", 0);
-throttle_cfg->buckets[THROTTLE_OPS_READ].avg =
-qemu_opt_get_number(opts, "throttling.iops-read", 0);
-throttle_cfg->buckets[THROTTLE_OPS_WRITE].avg =
-qemu_opt_get_number(opts, "throttling.iops-write", 0);
-
-throttle_cfg->buckets[THROTTLE_BPS_TOTAL].max =
-qemu_opt_get_number(opts, "throttling.bps-total-max", 0);
-throttle_cfg->buckets[THROTTLE_BPS_READ].max  =
-qemu_opt_get_number(opts, "throttling.bps-read-max", 0);
-throttle_cfg->buckets[THROTTLE_BPS_WRITE].max =
-qemu_opt_get_number(opts, "throttling.bps-write-max", 0);
-throttle_cfg->buckets[THROTTLE_OPS_TOTAL].max =
-qemu_opt_get_number(opts, "throttling.iops-total-max", 0);
-throttle_cfg->buckets[THROTTLE_OPS_READ].max =
-qemu_opt_get_number(opts, "throttling.iops-read-max", 0);
-throttle_cfg->buckets[THROTTLE_OPS_WRITE].max =
-qemu_opt_get_number(opts, "throttling.iops-write-max", 0);
-
-throttle_cfg->buckets[THROTTLE_BPS_TOTAL].burst_length =
-qemu_opt_get_number(opts, "throttling.bps-total-max-length", 1);
-throttle_cfg->buckets[THROTTLE_BPS_READ].burst_length  =
-qemu_opt_get_number(opts, "throttling.bps-read-max-length", 1);
-throttle_cfg->buckets[THROTTLE_BPS_WRITE].burst_length =
-qemu_opt_get_number(opts, "throttling.bps-write-max-length", 1);
-throttle_cfg->buckets[THROTTLE_OPS_TOTAL].burst_length =
-qemu_opt_get_number(opts, "throttling.iops-total-max-length", 1);
-throttle_cfg->buckets[THROTTLE_OPS_READ].burst_length =
-qemu_opt_get_number(opts, "throttling.iops-read-max-length", 1);
-throttle_cfg->buckets[THROTTLE_OPS_WRITE].burst_length =
-qemu_opt_get_number(opts, "throttling.iops-write-max-length", 1);
-
-throttle_cfg->op_size =
-qemu_opt_get_number(opts, "throttling.iops-size", 0);
-
+throttle_parse_options(throttle_cfg, opts);
 if (!throttle_is_valid(throttle_cfg, errp)) {
 return;
 }
diff --git a/fsdev/qemu-fsdev-throttle.c b/fsdev/qemu-fsdev-throttle.c
index 7ae4e86..da9c225 100644
--- a/fsdev/qemu-fsdev-throttle.c
+++ b/fsdev/qemu-fsdev-throttle.c
@@ -31,48 +31,7 @@ static void fsdev_throttle_write_timer_cb(void *opaque)

 void fsdev_throttle_parse_opts(QemuOpts *opts, FsThrottle *fst, Error **errp)
 {
-throttle_config_init(&fst->cfg);
-fst->cfg.buckets[THROTTLE_BPS_TOTAL].avg =
-qemu_opt_get_number(opts, "throttling.bps-total", 0);
-fst->cfg.buckets[THROTTLE_BPS_READ].avg  =
-qemu_opt_get_number(opts, "throttling.bps-read", 0);
-fst->cfg.buckets[THROTTLE_BPS_WRITE].avg =
-qemu_opt_get_number(opts, "throttling.bps-write", 0);
-fst->cfg.buckets[THROTTLE_OPS_TOTAL].avg =
-qemu_opt_get_number(opts, "throttling.iops-total", 0);
-fst->cfg.buckets[THROTTLE_OPS_READ].avg =
-qemu_opt_get_number(opts, "throttling.iops-read", 0);
-fst->cfg.buckets[THROTTLE_OPS_WRITE].avg =
-qemu_opt_get_number(opts, "throttling.iops-write", 0);
-
-fst->cfg.buckets[THROTTLE_BPS_TOTAL].max =
-qemu_opt_get_number(opts, "throttling.bps-total-max", 0);
-fst->cfg.buckets[THROTTLE_BPS_READ].max  =
-qemu_opt_get_number(opts, "throttling.bps-read-max", 0);
-fst->cfg.buckets[THROTTLE_BPS_WRITE].max =
-qemu_opt_get_number(opts, "throttling.bps-write-max", 0);
-fst->cfg.buckets[THROTTLE_OPS_TOTAL].max =
-qemu_opt_get_number(opts, "throttling.iops-total-max", 0);
-fst->cfg.buckets[THROT
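
The util/throttle.c half of the diff is cut off above, so what follows is
not the patch's throttle_parse_options().  It is a minimal, self-contained
sketch of one way a shared helper could fill every bucket from the
throttling.* option names seen in the removed lines, using a name table
instead of repeated assignments.  The struct names, the flat option array and
opt_get_number() are assumptions made for illustration; QEMU's own
ThrottleConfig and QemuOpts are not used here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Simplified stand-ins for the throttle bucket indices and config. */
enum { BPS_TOTAL, BPS_READ, BPS_WRITE,
       OPS_TOTAL, OPS_READ, OPS_WRITE, BUCKETS_COUNT };

struct bucket { uint64_t avg, max, burst_length; };
struct config { struct bucket buckets[BUCKETS_COUNT]; uint64_t op_size; };

/* Hypothetical option store: a flat name/value array. */
struct opt { const char *name; uint64_t value; };

static uint64_t opt_get_number(const struct opt *opts, size_t n,
                               const char *name, uint64_t defval)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(opts[i].name, name) == 0) {
            return opts[i].value;
        }
    }
    return defval;                      /* option not given */
}

/* Option name stem for each bucket, as used in the removed code. */
static const char *const bucket_names[BUCKETS_COUNT] = {
    [BPS_TOTAL] = "bps-total",  [BPS_READ] = "bps-read",
    [BPS_WRITE] = "bps-write",  [OPS_TOTAL] = "iops-total",
    [OPS_READ]  = "iops-read",  [OPS_WRITE] = "iops-write",
};

/* One helper that both callers (block and fsdev) could share. */
void parse_throttle_options(struct config *cfg,
                            const struct opt *opts, size_t n)
{
    char key[64];

    assert(cfg != NULL);
    memset(cfg, 0, sizeof(*cfg));
    for (int i = 0; i < BUCKETS_COUNT; i++) {
        snprintf(key, sizeof(key), "throttling.%s", bucket_names[i]);
        cfg->buckets[i].avg = opt_get_number(opts, n, key, 0);

        snprintf(key, sizeof(key), "throttling.%s-max", bucket_names[i]);
        cfg->buckets[i].max = opt_get_number(opts, n, key, 0);

        /* burst length defaults to 1, matching the removed assignments */
        snprintf(key, sizeof(key), "throttling.%s-max-length",
                 bucket_names[i]);
        cfg->buckets[i].burst_length = opt_get_number(opts, n, key, 1);
    }
    cfg->op_size = opt_get_number(opts, n, "throttling.iops-size", 0);
}
```

Whether the helper copies the assignments verbatim or walks a table like
this, the benefit is the same: blockdev.c and fsdev share one definition of
the option names and their defaults.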

Re: [Qemu-devel] [PATCH v2 07/23] hyperv: ensure VP index equal to QEMU cpu_index

2017-06-29 Thread Roman Kagan

On Wed, Jun 28, 2017 at 04:47:43PM +0200, Igor Mammedov wrote:
> On Wed, 21 Jun 2017 19:24:08 +0300
> Roman Kagan  wrote:
> 
> > Hyper-V identifies vCPUs by Virtual Processor (VP) index which can be
> > queried by the guest via HV_X64_MSR_VP_INDEX msr.  It is defined by the
> > spec as a sequential number which can't exceed the maximum number of
> > vCPUs per VM.
> > 
> > It has to be owned by QEMU in order to preserve it across migration.
> > 
> > However, the initial implementation in KVM didn't allow to set this
> > msr, and KVM used its own notion of VP index.  Fortunately, the way
> > vCPUs are created in QEMU/KVM makes it likely that the KVM value is
> > equal to QEMU cpu_index.
> > 
> > So choose cpu_index as the value for vp_index, and push that to KVM on
> > kernels that support setting the msr.  On older ones that don't, query
> > the kernel value and assert that it's in sync with QEMU.
> > 
> > Besides, since handling errors from vCPU init at hotplug time is
> > impossible, disable vCPU hotplug.
> proper place to check if cpu might be created is at 
> pc_cpu_pre_plug() where you can gracefully abort cpu creation process. 

Thanks for the suggestion, I'll rework it this way.

> Also it's possible to create cold-plugged CPUs in out of order
> sequence using
>  -device cpu-foo on CLI
> will be hyperv kvm/guest side ok with it?

On kernels that support setting HV_X64_MSR_VP_INDEX QEMU will
synchronize all sides.  On kernels that don't, if out-of-order creation
results in vp_index mismatch between the kernel and QEMU, vcpu creation
will fail.
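
To make the two cases concrete, here is a small sketch of the decision just
described.  It is not the code from the patch: init_vp_index() and its
parameters are hypothetical, and the real implementation talks to KVM rather
than taking the kernel's value as an argument.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Pick the vp_index for a vCPU.
 *   can_set_msr  - assumed flag: the kernel lets userspace write the msr
 *   kernel_value - what the kernel reports when the msr cannot be set
 *   cpu_index    - QEMU's index for this vCPU
 * Returns 0 and stores the index on success, -1 if the kernel and QEMU
 * disagree and the kernel value cannot be overridden.
 */
int init_vp_index(bool can_set_msr, uint32_t kernel_value,
                  uint32_t cpu_index, uint32_t *vp_index)
{
    assert(vp_index != NULL);

    if (!can_set_msr && kernel_value != cpu_index) {
        return -1;              /* mismatch on an older kernel: fail creation */
    }
    *vp_index = cpu_index;      /* either pushed to the kernel or already equal */
    return 0;
}
```

With out-of-order cold-plug on an older kernel, this is the path where the
-1 branch would be taken and the vCPU would be rejected.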

> > This patch also introduces accessor functions to wrap the mapping
> > between a vCPU and its vp_index.  Besides, a few variables are renamed
> > to avoid confusion of vp_index with vcpu_id (== apic_id).
> > 
> > Signed-off-by: Roman Kagan 
> > ---
> > v1 -> v2:
> >  - were patches 5, 6 in v1
> >  - move vp_index initialization to hyperv_init_vcpu
> >  - check capability before trying to set the msr
> >  - set the msr on the usual kvm_put_msrs path
> >  - disable cpu hotplug if msr is not settable
> > 
> >  target/i386/hyperv.h |  5 -
> >  target/i386/hyperv.c | 16 +---
> >  target/i386/kvm.c    | 51 +++
> >  3 files changed, 68 insertions(+), 4 deletions(-)
> > 
> > diff --git a/target/i386/hyperv.h b/target/i386/hyperv.h
> > index 0c3b562..82f4757 100644
> > --- a/target/i386/hyperv.h
> > +++ b/target/i386/hyperv.h
> > @@ -32,11 +32,14 @@ struct HvSintRoute {
> >  
> >  int kvm_hv_handle_exit(X86CPU *cpu, struct kvm_hyperv_exit *exit);
> >  
> > -HvSintRoute *kvm_hv_sint_route_create(uint32_t vcpu_id, uint32_t sint,
> > +HvSintRoute *kvm_hv_sint_route_create(uint32_t vp_index, uint32_t sint,
> >HvSintAckClb sint_ack_clb);
> >  
> >  void kvm_hv_sint_route_destroy(HvSintRoute *sint_route);
> >  
> >  int kvm_hv_sint_route_set_sint(HvSintRoute *sint_route);
> >  
> > +uint32_t hyperv_vp_index(X86CPU *cpu);
> > +X86CPU *hyperv_find_vcpu(uint32_t vp_index);
> > +
> >  #endif
> > diff --git a/target/i386/hyperv.c b/target/i386/hyperv.c
> > index 227185c..4f57447 100644
> > --- a/target/i386/hyperv.c
> > +++ b/target/i386/hyperv.c
> > @@ -16,6 +16,16 @@
> >  #include "hyperv.h"
> >  #include "hyperv_proto.h"
> >  
> > +uint32_t hyperv_vp_index(X86CPU *cpu)
> > +{
> > +return CPU(cpu)->cpu_index;
> > +}
> 
> 
> > +X86CPU *hyperv_find_vcpu(uint32_t vp_index)
> > +{
> > +return X86_CPU(qemu_get_cpu(vp_index));
> > +}
> this helper isn't used in this patch, add it in the patch that would actually 
> use it

I thought I would put the only two functions that encapsulate the
knowledge of how vp_index is related to cpu_index in a single patch.

I'm now thinking of open-coding the iteration over cpus here and
directly looking for the cpu whose hyperv_vp_index() matches.  Then that
knowledge will be encapsulated in a single place, and indeed, this
helper can go into another patch where it's used.
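
Roughly what I have in mind, as a standalone sketch rather than the actual
QEMU code: struct vcpu, the linked list and both function names below are
made up for illustration, standing in for X86CPU, the global CPU list and the
helpers in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical miniature CPU list; QEMU has its own types and iterator. */
struct vcpu {
    int cpu_index;
    struct vcpu *next;
};

/* The only place that knows how vp_index is derived from a vCPU. */
uint32_t vcpu_vp_index(const struct vcpu *cpu)
{
    return (uint32_t)cpu->cpu_index;
}

/* Walk the list and compare through the accessor, not through cpu_index. */
struct vcpu *find_vcpu_by_vp_index(struct vcpu *head, uint32_t vp_index)
{
    for (struct vcpu *cpu = head; cpu != NULL; cpu = cpu->next) {
        if (vcpu_vp_index(cpu) == vp_index) {
            return cpu;
        }
    }
    return NULL;                /* no vCPU with that vp_index */
}
```

If the mapping ever stops being the identity, only the accessor changes and
the lookup keeps working.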

> also if  qemu_get_cpu() were called from each CPU init,
> it would incur O(N^2) complexity, could you do without it?

It isn't called on hot paths (ATM it's called only when SINT routes are
created, which is at most one per cpu).  I don't see a problem here.

> > @@ -105,7 +115,7 @@ HvSintRoute *kvm_hv_sint_route_create(uint32_t vcpu_id, uint32_t sint,
> >  }
> >  sint_route->gsi = gsi;
> >  sint_route->sint_ack_clb = sint_ack_clb;
> > -sint_route->vcpu_id = vcpu_id;
> > +sint_route->vcpu_id = vp_index;
>^^^ - shouldn't it also be re-named?

Right, but vcpu_id on sint_route is replaced with X86CPU pointer in a
followup patch, so I wasn't sure if it was worth while to add more
churn...

> 
> maybe split all renaming into separate patch ...

Part of the renaming will disappear eventually in the followup patches,
so I'm not sure it's a good idea...  Opinions?

> > +static int hyperv_init_vcpu(X86CPU *cpu)
> > +{
> > +if (c

Re: [Qemu-devel] SPARC64 supported processors

2017-06-29 Thread Artyom Tarasenko

 Hi Mark,

On Thu, Jun 29, 2017 at 8:57 AM, Mark Cave-Ayland
 wrote:
> On 27/06/17 18:59, Pasha Tatashin wrote:
>
> Hi Pasha,
>
>> Hi,
>>
>> I am trying to evaluate the current qemu support for sparc64 processors.
>> First, it seems -smp is not supported for any processor, is this
>> correct? When I set -smp greater than 1, I am getting:
>>
>> qemu-system-sparc64: Number of SMP CPUs requested (2) exceeds max CPUs
>> supported by machine 'sun4u' (1)
>
> Yes, that is correct for the moment. Before the MTTCG patches were
> merged, TCG only ran one thread so -smp 2 would effectively split your
> single host CPU into 2 virtual CPUs which doesn't give much benefit.
>
> Now that MTTCG has been included, multiple CPUs become much more useful and
> really for SPARC64 there are 2 main tasks (for sun4u): reviewing the
> atomic instructions and marking them with the appropriate barriers in
> TCG, and then updating OpenBIOS (and possibly fw_cfg) to handle multiple
> CPUs on startup.
>
> The other point to note here is that since the default sun4u CPU doesn't
> have the equivalent of a "sleep until interrupt" instruction then the
> guest processes will run at 100% CPU on the host since there is no way
> for the guest to indicate that it should yield.

Well, it should be possible to recognize the scheduler loops of the
popular OSes at translation time and halt or throttle down if
nothing happens. But I'm not sure the hack is worth the effort.

> I would expect that this
> would be possible with other CPUs though.

The IIIi has ways to reduce the frequency, but not halt completely.
Not sure about IV(+).

>> I've done some testing for all available sparc64 cpus + latest linux
>> kernel:
>>
>> Fujitsu Sparc64     Working
>> Fujitsu Sparc64 III Exception 0x30 (DAE_side_effect_page) in OpenBios
>> Fujitsu Sparc64 IV  Working
>> Fujitsu Sparc64 V   Working
>> TI UltraSparc I     Working
>> TI UltraSparc II    Working
>> TI UltraSparc IIi   Working
>> TI UltraSparc IIe   Exception 0x28 (division_by_zero) in init_tick_ops
>>   Can make it to work if is_hummingbird() is changed
>>   to return 0. The IO stick, and OpenBios stick properties
>>   are absent, so we have to default to %tick for now.
>>
>> Sun UltraSparc III  Illegal instruction in cheetah_boot():
>>   wr  %g0, %g1, %dcr
>>   It appears dispatch control register is not implemented.
>>
>> Sun UltraSparc IIIi
>> Sun UltraSparc IV
>> Sun UltraSparc IV+
>> Sun UltraSparc IIIi+
>>  In these four CPUs, I am getting exception 0x32 in
>>  cheetah_generic_boot: stxa  %g0, [ %g3 ] #ASI_DMMU
>>
>> Sun UltraSparc T1
>> Sun UltraSparc T2
>>
>> Both of the above boot pretty far but fail in this function when tmpfs
>> is mounted:
>> direct_pcr_write(unsigned long reg_num, u64 val)
>> __asm__ __volatile__("wr %0, 0x0, %%pcr" : : "r" (val));
>>
>> Seems like performance counter registers are not supported.
>>
>> needed to add these to kernel parameters:
>> keep_bootcon -> to see where we are panicking
>> lpj=1000 -> jiffies could not be calculated for some reason.
>>
>> NEC UltraSparc I    Working
>>
>> Does this look right, or maybe I have missed something, and we can get
>> some of the Sun UltraSparc to work for example?
>
> That's certainly a very comprehensive set of tests :) I'd say that it's
> a fairly accurate reflection of where we are at the moment in that the
> basic SPARC64 core is working well and it's the CPU-specific parts which
> need more work.
>
> Currently myself and Artyom (CCd) are the SPARC maintainers and we tend
> to focus on different areas: my interest is with the sun4u machine as I
> have some legacy Solaris images I'm trying to run whilst Artyom has
> recently contributed niagara support to the 2.9 release. Note that I do
> try and keep the wiki page at
> http://wiki.qemu.org/Documentation/Platforms/SPARC up to date so that
> people can track current progress.
>
> Yes we can certainly look at trying to support more of the UltraSPARC
> CPUs if that is of interest. In reality the main reason that things
> haven't progressed further is due to lack of available time: whilst both
> myself and Artyom have done bits of work for particular clients, the
> majority of the work takes place in our own time outside of work and
> home which is, of course, limited.
>
> If there are particular features you would like added to QEMU's SPARC
> emulation then offers of help in terms of
> people/documentation/sponsorship are always welcome. Feel free to
> follow-up with both myself and Artyom off-list if this is something of
> interest to you.
>
>

-- 
Regards,
Artyom Tarasenko

SPARC and PPC PReP under qemu blog: http://tyom.blogspot.com/search/label/qemu

Re: [Qemu-devel] [PATCH v4 11/13] virtio-console: chardev hotswap support

2017-06-29 Thread Marc-André Lureau

Hi

Looks good, but please write something in the commit message about what needs 
to be done for be-change (what this patch does).

thanks

- Original Message -
> Signed-off-by: Anton Nefedov 
> Reviewed-by: Vladimir Sementsov-Ogievskiy 
> CC: Amit Shah 
> ---
>  hw/char/virtio-console.c | 35 ++-
>  1 file changed, 30 insertions(+), 5 deletions(-)
> 
> diff --git a/hw/char/virtio-console.c b/hw/char/virtio-console.c
> index afb4949..198b2a8 100644
> --- a/hw/char/virtio-console.c
> +++ b/hw/char/virtio-console.c
> @@ -49,7 +49,7 @@ static ssize_t flush_buf(VirtIOSerialPort *port,
>  VirtConsole *vcon = VIRTIO_CONSOLE(port);
>  ssize_t ret;
>  
> -if (!qemu_chr_fe_get_driver(&vcon->chr)) {
> +if (!qemu_chr_fe_backend_connected(&vcon->chr)) {
>  /* If there's no backend, we can just say we consumed all data. */
>  return len;
>  }
> @@ -163,12 +163,35 @@ static void chr_event(void *opaque, int event)
>  }
>  }
>  
> +static int chr_be_change(void *opaque)
> +{
> +VirtConsole *vcon = opaque;
> +VirtIOSerialPort *port = VIRTIO_SERIAL_PORT(vcon);
> +VirtIOSerialPortClass *k = VIRTIO_SERIAL_PORT_GET_CLASS(port);
> +
> +if (k->is_console) {
> +qemu_chr_fe_set_handlers(&vcon->chr, chr_can_read, chr_read,
> + NULL, chr_be_change, vcon, NULL, true);
> +} else {
> +qemu_chr_fe_set_handlers(&vcon->chr, chr_can_read, chr_read,
> + chr_event, chr_be_change, vcon, NULL, false);
> +}
> +
> +if (vcon->watch) {
> +g_source_remove(vcon->watch);
> +vcon->watch = qemu_chr_fe_add_watch(&vcon->chr,
> +G_IO_OUT | G_IO_HUP,
> +chr_write_unblocked, vcon);
> +}
> +
> +return 0;
> +}
> +
>  static void virtconsole_realize(DeviceState *dev, Error **errp)
>  {
>  VirtIOSerialPort *port = VIRTIO_SERIAL_PORT(dev);
>  VirtConsole *vcon = VIRTIO_CONSOLE(dev);
>  VirtIOSerialPortClass *k = VIRTIO_SERIAL_PORT_GET_CLASS(dev);
> -Chardev *chr = qemu_chr_fe_get_driver(&vcon->chr);
>  
>  if (port->id == 0 && !k->is_console) {
>  error_setg(errp, "Port number 0 on virtio-serial devices reserved "
> @@ -176,7 +199,7 @@ static void virtconsole_realize(DeviceState *dev, Error **errp)
>  return;
>  }
>  
> -if (chr) {
> +if (qemu_chr_fe_backend_connected(&vcon->chr)) {
>  /*
>   * For consoles we don't block guest data transfer just
>   * because nothing is connected - we'll just let it go
> @@ -188,11 +211,13 @@ static void virtconsole_realize(DeviceState *dev, Error **errp)
>   */
>  if (k->is_console) {
>  qemu_chr_fe_set_handlers(&vcon->chr, chr_can_read, chr_read,
> - NULL, NULL, vcon, NULL, true);
> + NULL, chr_be_change,
> + vcon, NULL, true);
>  virtio_serial_open(port);
>  } else {
>  qemu_chr_fe_set_handlers(&vcon->chr, chr_can_read, chr_read,
> - chr_event, NULL, vcon, NULL, false);
> + chr_event, chr_be_change,
> + vcon, NULL, false);
>  }
>  }
>  }
> --
> 2.7.4
> 
>

Re: [Qemu-devel] [PATCH v4 00/13] chardevice hotswap

2017-06-29 Thread Marc-André Lureau

Hi,

The series looks good to me. You could try to ping the subsystem maintainers to 
get their reviews (serial/virtio/hmp).

Paolo, would you like to take the series then?

thanks 

- Original Message -
> Changed in v4:
>   - rebased on top of the latest chardev changes
>   - remarks applied
>   - patch 1 fixed so it works with alias names
> 
> 
> 
> Changed in v3:
>   - minor remarks to patch 1 applied
>   - patch 3: avoid using a bottom half, handle synchronously.
>     As mentioned, it complicates things and is only a problem for
>     a monitor-connected chardev hotswap, which is not supported for now
>   - tests added (patches 6-9)
> 
> 
> 
> This series is a v2 of the February submission
> http://lists.nongnu.org/archive/html/qemu-devel/2017-02/msg01989.html
> 
> The interface is changed as requested and the changes are slightly reworked
> and split into separate patches.
> 
> 
> 
> The patchset adds support for changing the character device without
> removing the frontend device.
> So far, the isa-serial and virtio-serial frontends are supported.
> 
> The feature can be helpful for e.g. Windows debugging, allowing one to
> establish a connection to a live VM from a VM with WinDbg.
> 
> Anton Nefedov (13):
>   char: move QemuOpts->ChardevBackend translation to a separate func
>   char: add backend hotswap handler
>   char: chardevice hotswap
>   char: forbid direct chardevice access for hotswap devices
>   char: avoid chardevice direct access
>   test-char: unref chardev-udp after test
>   test-char: split char_udp_test
>   test-char: split char_file_test
>   test-char: add hotswap test
>   hmp: add hmp analogue for qmp-chardev-change
>   virtio-console: chardev hotswap support
>   serial: move TIOCM update to a separate function
>   serial: chardev hotswap support
> 
>  include/chardev/char-fe.h   |  39 +++
>  include/chardev/char.h  |  19 
>  hmp.h   |   1 +
>  backends/rng-egd.c  |   2 +-
>  chardev/char-fe.c   |  49 +++--
>  chardev/char-mux.c  |   1 +
>  chardev/char.c  | 167 +++-
>  gdbstub.c   |   2 +-
>  hmp-commands.hx |  18 ++-
>  hmp.c   |  34 ++
>  hw/arm/pxa2xx.c |   3 +-
>  hw/arm/strongarm.c  |   4 +-
>  hw/char/bcm2835_aux.c   |   2 +-
>  hw/char/cadence_uart.c  |   4 +-
>  hw/char/debugcon.c  |   4 +-
>  hw/char/digic-uart.c|   2 +-
>  hw/char/escc.c  |   8 +-
>  hw/char/etraxfs_ser.c   |   2 +-
>  hw/char/exynos4210_uart.c   |   4 +-
>  hw/char/grlib_apbuart.c |   4 +-
>  hw/char/imx_serial.c|   2 +-
>  hw/char/ipoctal232.c|   4 +-
>  hw/char/lm32_juart.c|   2 +-
>  hw/char/lm32_uart.c |   2 +-
>  hw/char/mcf_uart.c  |   2 +-
>  hw/char/milkymist-uart.c|   2 +-
>  hw/char/parallel.c  |   2 +-
>  hw/char/pl011.c |   2 +-
>  hw/char/sclpconsole-lm.c|   4 +-
>  hw/char/sclpconsole.c   |   4 +-
>  hw/char/serial.c|  63 ---
>  hw/char/sh_serial.c |   4 +-
>  hw/char/spapr_vty.c |   4 +-
>  hw/char/stm32f2xx_usart.c   |   3 +-
>  hw/char/terminal3270.c  |   4 +-
>  hw/char/virtio-console.c|  35 +-
>  hw/char/xen_console.c   |   4 +-
>  hw/char/xilinx_uartlite.c   |   2 +-
>  hw/ipmi/ipmi_bmc_extern.c   |   4 +-
>  hw/mips/boston.c|   2 +-
>  hw/mips/mips_malta.c|   2 +-
>  hw/misc/ivshmem.c   |   6 +-
>  hw/usb/ccid-card-passthru.c |   6 +-
>  hw/usb/dev-serial.c |   7 +-
>  hw/usb/redirect.c   |   7 +-
>  monitor.c   |   4 +-
>  net/colo-compare.c  |  10 +-
>  net/filter-mirror.c |   8 +-
>  net/slirp.c |   2 +-
>  net/vhost-user.c|   7 +-
>  qapi-schema.json|  40 +++
>  qtest.c |   2 +-
>  tests/test-char.c   | 263 +---
>  tests/test-hmp.c|   1 +
>  tests/vhost-user-test.c |   2 +-
>  55 files changed, 685 insertions(+), 202 deletions(-)
> 
> --
> 2.7.4
> 
>

Re: [Qemu-devel] [PATCH] iotests: Add test for dataplane mirroring

2017-06-29 Thread Kevin Wolf

On 29.06.2017 at 01:23, Max Reitz wrote:
> Signed-off-by: Max Reitz 
> ---
> Depends on Stefan's "virtio: use ioeventfd in TCG and qtest mode" series
> to work at all, and on "mirror: Fix inconsistent backing AioContext for
> after mirroring" (in my block branch) so it does not fail.
> ---
>  tests/qemu-iotests/106     | 97 ++
>  tests/qemu-iotests/106.out | 14 +++

Your initiative to fill up the numbering hole is laudable, but are you
intentionally using 106 for multiple patches of yours? ;-)

>  tests/qemu-iotests/group   |  1 +
>  3 files changed, 112 insertions(+)
>  create mode 100755 tests/qemu-iotests/106
>  create mode 100644 tests/qemu-iotests/106.out
> 
> diff --git a/tests/qemu-iotests/106 b/tests/qemu-iotests/106
> new file mode 100755
> index 000..ad438b5
> --- /dev/null
> +++ b/tests/qemu-iotests/106
> @@ -0,0 +1,97 @@
> +#!/bin/bash
> +#
> +# Test case for mirroring with dataplane
> +#
> +# Copyright (C) 2017 Red Hat, Inc.
> +#
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 2 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program.  If not, see .
> +#
> +
> +# creator
> +owner=mre...@redhat.com
> +
> +seq=$(basename $0)
> +echo "QA output created by $seq"
> +
> +here=$PWD
> +status=1 # failure is the default!
> +
> +_cleanup()
> +{
> +_cleanup_qemu
> +_cleanup_test_img
> +_rm_test_img "$TEST_IMG.overlay0"
> +_rm_test_img "$TEST_IMG.overlay1"
> +}
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +# get standard environment, filters and qemu instance handling
> +. ./common.rc
> +. ./common.filter
> +. ./common.qemu
> +
> +_supported_fmt qcow2
> +_supported_proto file
> +_supported_os Linux
> +
> +IMG_SIZE=64K
> +
> +_make_test_img $IMG_SIZE
> +TEST_IMG="$TEST_IMG.overlay0" _make_test_img -b "$TEST_IMG" $IMG_SIZE
> +TEST_IMG="$TEST_IMG.overlay1" _make_test_img -b "$TEST_IMG" $IMG_SIZE
> +
> +# So that we actually have something to mirror and the job does not return
> +# immediately (which may be bad because then we cannot know whether the
> +# 'return' or the 'BLOCK_JOB_READY' comes first).
> +$QEMU_IO -c 'write 0 64' "$TEST_IMG.overlay0" | _filter_qemu_io

64 bytes? Unusual, but yes, why not. We probably don't test this too
often. :-)

> +# We cannot use virtio-blk here because that does not actually set the
> +# attached BB's AioContext in qtest mode

Why that? I don't see any qtest special casing in the virtio-blk code,
so is this intentional?

> +_launch_qemu \
> +-object iothread,id=iothr \
> +-blockdev node-name=source,driver=$IMGFMT,file.driver=file,file.filename="$TEST_IMG.overlay0" \
> +-device virtio-scsi,id=scsi-bus,iothread=iothr \
> +-device scsi-hd,bus=scsi-bus.0,drive=source
> +
> +_send_qemu_cmd $QEMU_HANDLE \
> +"{ 'execute': 'qmp_capabilities' }" \
> +'return'
> +
> +_send_qemu_cmd $QEMU_HANDLE \
> +"{ 'execute': 'drive-mirror',
> +   'arguments': {
> +   'job-id': 'mirror',
> +   'device': 'source',
> +   'target': '$TEST_IMG.overlay1',
> +   'mode':   'existing',
> +   'sync':   'top'
> +   } }" \
> +'BLOCK_JOB_READY'
> +
> +# The backing BDS should be assigned the overlay's AioContext
> +_send_qemu_cmd $QEMU_HANDLE \
> +"{ 'execute': 'block-job-complete',
> +   'arguments': { 'device': 'mirror' } }" \
> +'BLOCK_JOB_COMPLETED'
> +
> +_send_qemu_cmd $QEMU_HANDLE \
> +"{ 'execute': 'quit' }" \
> +'return'
> +
> +wait=yes _cleanup_qemu
> +
> +# success, all done
> +echo '*** done'
> +rm -f $seq.full
> +status=0

The actual test looks good to me.

Kevin

Re: [Qemu-devel] [PATCH V3 0/3] COLO-compare: Make COLO-compare support remote COLO-frame

2017-06-29 Thread Dr. David Alan Gilbert

* Zhang Chen (zhangchen.f...@cn.fujitsu.com) wrote:
> This series focuses on COLO-proxy remote colo-frame support.
> Xen COLO-frame is the first user. We add a new chardev socket
> in colo-compare as the way to communicate with the remote COLO-frame.
> The part where the remote COLO-frame notifies colo-proxy depends on this series:
> https://lists.nongnu.org/archive/html/qemu-devel/2017-04/msg03904.html

Can you explain a bit more about how the 'remote colo-frame' works?
Is this the comparison that's separate or what?

Dave

> I will send the other part of this series after the patchset it depends on
> has been merged.
> 
> V3:
>  - Fix codestyle.
> 
> V2:
>  - Rename this series.
>  - Change communication way to remote colo-frame.
>  - Some bugfix.
>  - Split the main function; the other part waits for the patchset it depends on.
> 
> 
> Zhang Chen (3):
>   COLO-compare: Add new parameter for communicate with remote colo-frame
>   COLO-compare: Add remote checkpoint notify chardev socket handler
> frame
>   COLO-compare: Add remote initialization and checkpoint notification
> 
>  net/colo-compare.c | 91 --
>  qemu-options.hx| 41 
>  2 files changed, 124 insertions(+), 8 deletions(-)
> 
> -- 
> 2.7.4
> 
> 
> 
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH v4 3/5] migration: Create load_setup()/cleanup() methods

2017-06-29 Thread Dr. David Alan Gilbert

* Juan Quintela (quint...@redhat.com) wrote:
> "Dr. David Alan Gilbert"  wrote:
> > * Juan Quintela (quint...@redhat.com) wrote:
> >> We need to do things at load time and at cleanup time.
> >> 
> >> Signed-off-by: Juan Quintela 
> >> 
> >> --
> >> 
> >> Move the printing of the error message so we can print the device
> >> giving the error.
> >> Add call to postcopy stuff
> >> ---
> >>  include/migration/register.h |  2 ++
> >>  migration/savevm.c           | 45 +++-
> >>  migration/savevm.h           |  1 +
> >>  migration/trace-events       |  2 ++
> >>  4 files changed, 49 insertions(+), 1 deletion(-)
> >> 
> >> diff --git a/include/migration/register.h b/include/migration/register.h
> >> index 938ea2b..a0f1edd 100644
> >> --- a/include/migration/register.h
> >> +++ b/include/migration/register.h
> >> @@ -39,6 +39,8 @@ typedef struct SaveVMHandlers {
> >>uint64_t *non_postcopiable_pending,
> >>uint64_t *postcopiable_pending);
> >>  LoadStateHandler *load_state;
> >> +int (*load_setup)(QEMUFile *f, void *opaque);
> >> +int (*load_cleanup)(void *opaque);
> >>  } SaveVMHandlers;
> >>  
> >>  int register_savevm_live(DeviceState *dev,
> >> diff --git a/migration/savevm.c b/migration/savevm.c
> >> index fee11c5..fdd15fa 100644
> >> --- a/migration/savevm.c
> >> +++ b/migration/savevm.c
> >> @@ -1541,7 +1541,7 @@ static void *postcopy_ram_listen_thread(void *opaque)
> >>   * got a bad migration state).
> >>   */
> >>  migration_incoming_state_destroy();
> >> -
> >> +qemu_loadvm_state_cleanup();
> >
> > Is that order right? It seems wrong to call the cleanup
> > code after MIS is destroyed.
> > (The precopy path seems to call mis_destroy at the end of
> > process_incoming_migration_bh which is much later).
> 
> we can do it either way; for now it doesn't matter.
> 
> Once there, it got me thinking that we are doing things in a very
> "interesting" way on the incoming side:
> 
> (postcopy)
> 
> postcopy_ram_incoming_cleanup()
> migration_incoming_state_destroy()
> qemu_loadvm_state_cleanup()
> 
> (Ok, probably it is better to exchange the last two).
> 
> But I *think* that we should move the postcopy_ram_incoming_cleanup()
> inside ram_load_cleanup(), no?

postcopy_ram_incoming_cleanup shuts down a thread that's shared across
all RAMBlocks, so I don't think it can all be merged into
ram_load_cleanup.   You might be able to do the equivalent of the
cleanup_range function.

> And we don't have a postcopy_ram_incoming_setup().  We could put the
> mmap of mis->postcopy_tmp_zero_page and mis->largest_page_size there, no?

Again that's a single shared zero page, not per RAMBlock.

> I am trying to understand if the postcopy_ram_incoming_init() can be
> moved soon, but I think no.

Dave

> 
> Later, Juan.
> 
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH 3/8] MAINTAINERS: update Xen entry

2017-06-29 Thread Anthony PERARD

On Wed, Jun 28, 2017 at 10:02:55PM -0300, Philippe Mathieu-Daudé wrote:
> moved in 56e2cd24..28b99f47 to accel/

That is not accurate, files have been moved to hw/i386/xen/ as written
in both commit messages.

Besides that:
Acked-by: Anthony PERARD 

> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  MAINTAINERS | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 86a08c5aac..530293044b 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -323,7 +323,6 @@ M: Stefano Stabellini 
>  M: Anthony Perard 
>  L: xen-de...@lists.xenproject.org
>  S: Supported
> -F: xen-*
>  F: */xen*
>  F: hw/9pfs/xen-9p-backend.c
>  F: hw/char/xen_console.c
> -- 
> 2.13.1
> 

-- 
Anthony PERARD

Re: [Qemu-devel] [PATCH v2 4/7] qdev: Introduce DEFINE_PROP_LINK

2017-06-29 Thread Igor Mammedov

On Thu, 29 Jun 2017 16:04:49 +0800
Fam Zheng  wrote:

> This property can be used to replace the object_property_add_link in
> device code, to add a link to other objects, which is a common pattern.
> 
> Signed-off-by: Fam Zheng 
> ---
>  hw/core/qdev-properties.c| 16 
>  include/hw/qdev-core.h   |  5 +
>  include/hw/qdev-properties.h | 11 +++
>  3 files changed, 32 insertions(+)
> 
> diff --git a/hw/core/qdev-properties.c b/hw/core/qdev-properties.c
> index 68cd653..7c11eb8 100644
> --- a/hw/core/qdev-properties.c
> +++ b/hw/core/qdev-properties.c
> @@ -1192,3 +1192,19 @@ PropertyInfo qdev_prop_size = {
>  .set = set_size,
>  .set_default_value = set_default_value_uint,
>  };
> +
> +/* --- object link property --- */
> +
> +static void create_link_property(Object *obj, Property *prop, Error **errp)
> +{
> +Object **child = qdev_get_prop_ptr(DEVICE(obj), prop);
> +
> +object_property_add_link(obj, prop->name, prop->link_type,
> + child, prop->link.check,
> + prop->link.flags, errp);
> +}
> +
> +PropertyInfo qdev_prop_link = {
> +.name = "link",
> +.create = create_link_property,
> +};
> diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
> index 33518ee..40afb3d 100644
> --- a/include/hw/qdev-core.h
> +++ b/include/hw/qdev-core.h
> @@ -5,6 +5,7 @@
>  #include "qemu/option.h"
>  #include "qemu/bitmap.h"
>  #include "qom/object.h"
> +#include "qom/link-property.h"
>  #include "hw/irq.h"
>  #include "hw/hotplug.h"
>  
> @@ -233,6 +234,10 @@ struct Property {
>  int  arrayoffset;
>  PropertyInfo *arrayinfo;
>  int  arrayfieldsize;
> +/* Only @check and @flags are used; @child is unuseful because we need a
> + * dynamic pointer in @obj as derived from @offset. */
@check, @flags, @child and @obj are not fields of struct Property, so it's
not clear what the doc comment is talking about. Maybe adding prefixes would
help, for example:
  @link.child

> +LinkProperty link;
> +const char   *link_type;
>  };
>  
>  struct PropertyInfo {
> diff --git a/include/hw/qdev-properties.h b/include/hw/qdev-properties.h
> index 39bf4b2..767c10b 100644
> --- a/include/hw/qdev-properties.h
> +++ b/include/hw/qdev-properties.h
> @@ -30,6 +30,7 @@ extern PropertyInfo qdev_prop_pci_devfn;
>  extern PropertyInfo qdev_prop_blocksize;
>  extern PropertyInfo qdev_prop_pci_host_devaddr;
>  extern PropertyInfo qdev_prop_arraylen;
> +extern PropertyInfo qdev_prop_link;
>  
>  #define DEFINE_PROP(_name, _state, _field, _prop, _type) { \
>  .name  = (_name),\
> @@ -117,6 +118,16 @@ extern PropertyInfo qdev_prop_arraylen;
>  .arrayoffset = offsetof(_state, _arrayfield),   \
>  }
>  
> +#define DEFINE_PROP_LINK(_name, _state, _field, _type, _check, _flags) {\
> +.name = (_name),\
> +.info = &(qdev_prop_link),  \
> +.offset = offsetof(_state, _field)  \
> ++ type_check(Object *, typeof_field(_state, _field)),   \
> +.link.check = _check,   \
> +.link.flags = _flags,   \
Maybe we shouldn't have custom _check and _flags fields, as the majority of
devices use the qdev_prop_allow_set_link_before_realize +
OBJ_PROP_LINK_UNREF_ON_RELEASE policies. That would save us ~2 LOC of
boilerplate code per property.

I've looked at the current device link usage and there are only a few that
actually use or need custom check/flags, and several are misusing
object_property_allow_set_link() where they should use
qdev_prop_allow_set_link_before_realize().

We could leave the exceptions that require custom check/flags as they are,
or provide DEFINE_PROP_LINK_CUSTOM() for the few that actually need it and
fix the ones that misuse it.

There is a bunch of board code that uses links, but that's not for Device,
so it's not related to this macro anyway.

> +.link_type  = _type,\
> +}
> +
>  #define DEFINE_PROP_UINT8(_n, _s, _f, _d)   \
>  DEFINE_PROP_UNSIGNED(_n, _s, _f, _d, qdev_prop_uint8, uint8_t)
>  #define DEFINE_PROP_UINT16(_n, _s, _f, _d)  \
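
A minimal sketch of the suggestion above, i.e. a link macro with the common
policy baked in plus a _CUSTOM variant for the exceptions. Every SKETCH_/
Sketch/sketch_ name is a hypothetical stand-in invented for illustration;
none of this is the QEMU API or the posted patch, and the real check callback
and flag live in QOM/qdev.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the QEMU types; only the shape matters here. */
typedef struct Object Object;
typedef struct Error Error;
typedef void SketchLinkCheck(Object *obj, const char *name,
                             Object *val, Error **errp);

/* Stand-in for OBJ_PROP_LINK_UNREF_ON_RELEASE. */
enum { SKETCH_LINK_UNREF_ON_RELEASE = 1 };

typedef struct SketchProperty {
    const char      *name;
    size_t           offset;     /* offset of the Object * field in the state */
    const char      *link_type;
    SketchLinkCheck *check;
    int              flags;
} SketchProperty;

/* Stand-in for qdev_prop_allow_set_link_before_realize(). */
static void sketch_allow_set_before_realize(Object *obj, const char *name,
                                            Object *val, Error **errp)
{
    (void)obj; (void)name; (void)val; (void)errp;
}

/* Variant for the few devices that really need their own policy. */
#define SKETCH_PROP_LINK_CUSTOM(_name, _state, _field, _type, _check, _flags) { \
    .name      = (_name),                                                       \
    .offset    = offsetof(_state, _field),                                      \
    .link_type = (_type),                                                       \
    .check     = (_check),                                                      \
    .flags     = (_flags),                                                      \
}

/* Common case: no check/flags arguments, the default policy is implied. */
#define SKETCH_PROP_LINK(_name, _state, _field, _type)                          \
    SKETCH_PROP_LINK_CUSTOM(_name, _state, _field, _type,                       \
                            sketch_allow_set_before_realize,                    \
                            SKETCH_LINK_UNREF_ON_RELEASE)

typedef struct SketchDevice {
    int     unrelated;
    Object *backend;
} SketchDevice;

static const SketchProperty sketch_props[] = {
    SKETCH_PROP_LINK("backend", SketchDevice, backend, "sketch-backend"),
};
```

With this shape, a device that follows the usual policy declares a link in one
line, and the custom variant stays available for the exceptions.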

Re: [Qemu-devel] [PATCH] linux-user: Put PPC AT_IGNOREPPC auxv entries in the right place

2017-06-29 Thread Peter Maydell

On 27 June 2017 at 19:05, Richard Henderson  wrote:
> On 06/27/2017 09:49 AM, Peter Maydell wrote:
>>
>> The 32-bit PPC auxv is a bit complicated because in the
>> mists of time it used to be 16-aligned rather than directly
>> after the environment. Older glibc versions had code to
>> try to probe for whether it needed alignment or not:
>>
>> https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/powerpc/dl-sysdep.c;hb=e84eabb3871c9b39e59323bf3f6b98c2ca9d1cd0
>> and the kernel has code which puts some magic entries at
>> the bottom to ensure that the alignment probe fails:
>>
>> http://elixir.free-electrons.com/linux/latest/source/arch/powerpc/include/asm/elf.h#L158
>>
>> QEMU has similar code too, but it was broken by commit
>> 7c4ee5bcc82e64, which changed elfload.c from filling in
>> the auxv starting at the highest address and working down
>> to starting at the lowest address and working up. This
>> means that the ARCH_DLINFO hook must now be invoked first
>> rather than last, and the entries in it for PPC must
>> be reversed so that the magic AT_IGNOREPPC entries come
>> at the lowest address in the auxv as they should.
>>
>> The effect of this was that if running a guest binary that
>> used an old glibc with the alignment probing the guest ld.so
>> code would segfault if the size of the guest environment and
>> argv happened to put the auxv at an address that triggered
>> the alignment code in the guest glibc.
>>
>> Signed-off-by: Peter Maydell
>> ---
>>   linux-user/elfload.c | 23 ---
>>   1 file changed, 12 insertions(+), 11 deletions(-)
>
>
> Reviewed-by: Richard Henderson 
> Tested-by:  Richard Henderson 

Thanks; applied directly to master since this has been
causing my mergebuild tests to fail (some recent environment
change resulted in it triggering this week...)

thanks
-- PMM
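
The ordering constraint described in the commit message can be sketched as
follows. This is an illustration under assumptions, not the code in
linux-user/elfload.c: the sketch_/Sketch names are made up, the entry type is
simplified to two 32-bit words, and the AT_IGNOREPPC value (22) is taken from
the Linux PPC headers as I remember them.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_AT_NULL       0
#define SKETCH_AT_PAGESZ     6
#define SKETCH_AT_IGNOREPPC 22  /* assumed value, see lead-in */

typedef struct {
    uint32_t type;
    uint32_t val;
} SketchAuxEnt;

/* Append one entry at index n and return the next free index. */
static size_t sketch_put(SketchAuxEnt *auxv, size_t n,
                         uint32_t type, uint32_t val)
{
    auxv[n].type = type;
    auxv[n].val = val;
    return n + 1;
}

/*
 * Fill the vector from the lowest address upward. Because we work upward,
 * the arch-specific entries have to be emitted first so that the two
 * AT_IGNOREPPC entries end up at the bottom of the auxv, where an old
 * glibc alignment probe looks for them.
 */
static size_t sketch_build_auxv(SketchAuxEnt *auxv, uint32_t pagesz)
{
    size_t n = 0;

    /* Arch hook first: the magic entries that defeat the probe. */
    n = sketch_put(auxv, n, SKETCH_AT_IGNOREPPC, SKETCH_AT_IGNOREPPC);
    n = sketch_put(auxv, n, SKETCH_AT_IGNOREPPC, SKETCH_AT_IGNOREPPC);

    /* Generic entries follow; only one is shown here. */
    n = sketch_put(auxv, n, SKETCH_AT_PAGESZ, pagesz);

    /* Terminator at the highest address. */
    n = sketch_put(auxv, n, SKETCH_AT_NULL, 0);
    return n;
}
```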

[Qemu-devel] [PATCH V2 3/8] block/qcow2: parse compress create options

2017-06-29 Thread Peter Lieven

this adds parsing and validation for the compress create
options. They are only validated but not yet used.

Signed-off-by: Peter Lieven 
---
 block/qcow2.c | 56 +--
 block/qcow2.h |  9 
 include/block/block_int.h |  2 ++
 3 files changed, 65 insertions(+), 2 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 2f94f03..308121a 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2144,7 +2144,8 @@ static int qcow2_create2(const char *filename, int64_t 
total_size,
  const char *backing_file, const char *backing_format,
  int flags, size_t cluster_size, PreallocMode prealloc,
  QemuOpts *opts, int version, int refcount_order,
- Error **errp)
+ const char *compress_format_name,
+ uint8_t compress_level, Error **errp)
 {
 int cluster_bits;
 QDict *options;
@@ -2390,11 +2391,24 @@ out:
 return ret;
 }
 
+static int qcow2_compress_format_from_name(char *fmt)
+{
+if (!fmt || !fmt[0]) {
+return QCOW2_COMPRESS_ZLIB_COMPAT;
+} else if (g_str_equal(fmt, "zlib")) {
+return QCOW2_COMPRESS_ZLIB;
+} else {
+return -EINVAL;
+}
+}
+
 static int qcow2_create(const char *filename, QemuOpts *opts, Error **errp)
 {
 char *backing_file = NULL;
 char *backing_fmt = NULL;
 char *buf = NULL;
+char *compress_format_name = NULL;
+uint64_t compress_level = 0;
 uint64_t size = 0;
 int flags = 0;
 size_t cluster_size = DEFAULT_CLUSTER_SIZE;
@@ -2475,15 +2489,40 @@ static int qcow2_create(const char *filename, QemuOpts 
*opts, Error **errp)
 
 refcount_order = ctz32(refcount_bits);
 
+compress_format_name = qemu_opt_get_del(opts,
+BLOCK_OPT_COMPRESS_FORMAT);
+ret = qcow2_compress_format_from_name(compress_format_name);
+if (ret < 0) {
+error_setg(errp, "Compress format '%s' is not supported",
+   compress_format_name);
+goto finish;
+}
+compress_level = qemu_opt_get_number_del(opts, BLOCK_OPT_COMPRESS_LEVEL,
+ compress_level);
+if (ret == QCOW2_COMPRESS_ZLIB_COMPAT && compress_level > 0) {
+error_setg(errp, "Compress level can only be defined in conjunction"
+   " with compress format");
+ret = -EINVAL;
+goto finish;
+}
+if ((ret == QCOW2_COMPRESS_ZLIB && compress_level > 9) ||
+compress_level > 0xff) {
+error_setg(errp, "Compress level %" PRIu64 " is not supported for"
+   " format '%s'", compress_level, compress_format_name);
+ret = -EINVAL;
+goto finish;
+}
+
 ret = qcow2_create2(filename, size, backing_file, backing_fmt, flags,
 cluster_size, prealloc, opts, version, refcount_order,
-&local_err);
+compress_format_name, compress_level, &local_err);
 error_propagate(errp, local_err);
 
 finish:
 g_free(backing_file);
 g_free(backing_fmt);
 g_free(buf);
+g_free(compress_format_name);
 return ret;
 }
 
@@ -3458,6 +3497,19 @@ static QemuOptsList qcow2_create_opts = {
 .help = "Width of a reference count entry in bits",
 .def_value_str = "16"
 },
+{
+.name = BLOCK_OPT_COMPRESS_FORMAT,
+.type = QEMU_OPT_STRING,
+.help = "Compress format used for compressed clusters (zlib)",
+.def_value_str = ""
+},
+{
+.name = BLOCK_OPT_COMPRESS_LEVEL,
+.type = QEMU_OPT_NUMBER,
+.help = "Compress level used for compressed clusters (0 = default,"
+" 1=fastest, x=best; x varies depending on 
compress.format)",
+.def_value_str = "0"
+},
 { /* end of list */ }
 }
 };
diff --git a/block/qcow2.h b/block/qcow2.h
index 87b15eb..d21da33 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -171,6 +171,15 @@ typedef struct Qcow2UnknownHeaderExtension {
 } Qcow2UnknownHeaderExtension;
 
 enum {
+/* QCOW2_COMPRESS_ZLIB_COMPAT specifies to use the old standard
+ * zlib compression with a smaller window size that is compatible with
+ * old QEMU versions. This compression is used if no compression format
+ * is specified at create time */
+QCOW2_COMPRESS_ZLIB_COMPAT = 0,
+QCOW2_COMPRESS_ZLIB= 1,
+};
+
+enum {
 QCOW2_FEAT_TYPE_INCOMPATIBLE= 0,
 QCOW2_FEAT_TYPE_COMPATIBLE  = 1,
 QCOW2_FEAT_TYPE_AUTOCLEAR   = 2,
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 15fa602..49811b0 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -57,6 +57,8 @@
 #define BLOCK_OPT_NOCOW "nocow"
 #define BLOCK_OPT_OBJECT_SIZE   "object_size"
 #de

[Qemu-devel] [PATCH V2 8/8] block/qcow2: add lzo compress format

2017-06-29 Thread Peter Lieven

Signed-off-by: Peter Lieven 
---
 block/qcow2-cluster.c | 15 +++
 block/qcow2.c | 26 +-
 block/qcow2.h |  1 +
 configure |  2 +-
 qapi/block-core.json  | 14 --
 qemu-img.texi |  1 +
 6 files changed, 55 insertions(+), 4 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 353ac87..666d090 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -24,6 +24,9 @@
 
 #include "qemu/osdep.h"
 #include 
+#ifdef CONFIG_LZO
+#include 
+#endif
 
 #include "qapi/error.h"
 #include "qemu-common.h"
@@ -1546,6 +1549,18 @@ static int decompress_buffer(uint8_t *out_buf, int 
out_buf_size,
 inflateEnd(&z_strm);
 break;
 }
+#ifdef CONFIG_LZO
+case QCOW2_COMPRESS_LZO:
+out_len = out_buf_size;
+ret = lzo1x_decompress_safe(buf, buf_size, out_buf,
+(lzo_uint *) &out_len, NULL);
+if (ret == LZO_E_INPUT_NOT_CONSUMED) {
+/* We always read up to the next sector boundary. Thus
+ * buf_size may be larger than the original compressed size. */
+ret = 0;
+}
+break;
+#endif
 default:
 abort(); /* should never reach this point */
 }
diff --git a/block/qcow2.c b/block/qcow2.c
index b41e58d..ef94193 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -26,6 +26,9 @@
 #include "sysemu/block-backend.h"
 #include "qemu/module.h"
 #include 
+#ifdef CONFIG_LZO
+#include 
+#endif
 #include "block/qcow2.h"
 #include "qemu/error-report.h"
 #include "qapi/qmp/qerror.h"
@@ -83,6 +86,10 @@ static int qcow2_compress_format_from_name(char *fmt)
 return QCOW2_COMPRESS_ZLIB_COMPAT;
 } else if (g_str_equal(fmt, "zlib")) {
 return QCOW2_COMPRESS_ZLIB;
+#ifdef CONFIG_LZO
+} else if (g_str_equal(fmt, "lzo")) {
+return QCOW2_COMPRESS_LZO;
+#endif
 } else {
 return -EINVAL;
 }
@@ -92,6 +99,7 @@ static int qcow2_compress_level_supported(int id, uint64_t 
level)
 {
 if ((id == QCOW2_COMPRESS_ZLIB_COMPAT && level > 0) ||
 (id == QCOW2_COMPRESS_ZLIB && level > 9) ||
+(id == QCOW2_COMPRESS_LZO && level > 0) ||
 level > 0xff) {
 return -EINVAL;
 }
@@ -197,6 +205,13 @@ static int qcow2_read_extensions(BlockDriverState *bs, 
uint64_t start_offset,
s->compress_format.level, s->compress_format.name);
 return 5;
 }
+#ifdef CONFIG_LZO
+if (s->compress_format_id == QCOW2_COMPRESS_LZO &&
+lzo_init() != LZO_E_OK) {
+error_setg(errp, "ERROR: internal error - lzo_init() failed");
+return 6;
+}
+#endif
 
 #ifdef DEBUG_EXT
 printf("Qcow2: Got compress format %s with compress level %"
@@ -2751,7 +2766,7 @@ qcow2_co_pwritev_compressed(BlockDriverState *bs, 
uint64_t offset,
 z_stream z_strm = {};
 int z_windowBits = -15, z_level = Z_DEFAULT_COMPRESSION;
 int ret, out_len = 0;
-uint8_t *buf, *out_buf = NULL, *local_buf = NULL;
+uint8_t *buf, *out_buf = NULL, *local_buf = NULL, *work_buf = NULL;
 uint64_t cluster_offset;
 
 if (bytes == 0) {
@@ -2803,6 +2818,14 @@ qcow2_co_pwritev_compressed(BlockDriverState *bs, 
uint64_t offset,
 
 ret = ret != Z_STREAM_END;
 break;
+#ifdef CONFIG_LZO
+case QCOW2_COMPRESS_LZO:
+out_buf = g_malloc(s->cluster_size + s->cluster_size / 16 + 64 + 3);
+work_buf = g_malloc(LZO1X_1_MEM_COMPRESS);
+ret = lzo1x_1_compress(buf, s->cluster_size, out_buf,
+   (lzo_uint *) &out_len, work_buf);
+break;
+#endif
 default:
 abort(); /* should never reach this point */
 }
@@ -2848,6 +2871,7 @@ success:
 fail:
 qemu_vfree(local_buf);
 g_free(out_buf);
+g_free(work_buf);
 return ret;
 }
 
diff --git a/block/qcow2.h b/block/qcow2.h
index 4ceaba1..038688d 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -177,6 +177,7 @@ enum {
  * is specified at create time */
 QCOW2_COMPRESS_ZLIB_COMPAT = 0,
 QCOW2_COMPRESS_ZLIB= 1,
+QCOW2_COMPRESS_LZO = 2,
 };
 
 enum {
diff --git a/configure b/configure
index c571ad1..81d3286 100755
--- a/configure
+++ b/configure
@@ -1890,7 +1890,7 @@ if test "$lzo" != "no" ; then
 int main(void) { lzo_version(); return 0; }
 EOF
 if compile_prog "" "-llzo2" ; then
-libs_softmmu="$libs_softmmu -llzo2"
+LIBS="$LIBS -llzo2"
 lzo="yes"
 else
 if test "$lzo" = "yes"; then
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 1574ffb..736073a 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2284,11 +2284,12 @@
 ##
 # @Qcow2CompressFormat:
 # @zlib: standard zlib deflate compression
+# @lzo: lzo1x compression
 #
 # Since: 2.10
 ##
 { 'enum': 'Qcow2CompressFormat',
-  'data': [ 'zlib' ] }
+  'data': [ 'zlib', 'lzo' ] }
 
 ##
 # @Qcow2Co

[Qemu-devel] [PATCH V2 4/8] qemu-img: add documentation for compress settings

2017-06-29 Thread Peter Lieven

Signed-off-by: Peter Lieven 
---
 qemu-img.texi | 21 +
 1 file changed, 21 insertions(+)

diff --git a/qemu-img.texi b/qemu-img.texi
index 5b925ec..430f0b9 100644
--- a/qemu-img.texi
+++ b/qemu-img.texi
@@ -621,6 +621,27 @@ file which is COW and has data blocks already, it couldn't 
be changed to NOCOW
 by setting @code{nocow=on}. One can issue @code{lsattr filename} to check if
 the NOCOW flag is set or not (Capital 'C' is NOCOW flag).
 
+@item compress.format
+Defines which compression algorithm should be used for compressed clusters.
+The following options are available if support for the respective libraries
+has been enabled at compile time:
+
+   zlib    Uses standard zlib compression
+
+The compression algorithm can only be defined at image create time and cannot
+be changed later.
+
+Note: defining a compression format will result in the compression format
+  extension being written to the Qcow2 image. Older versions of QEMU will
+  not be able to open images with this extension.
+
+@item compress.level
+Defines which compression level is used for the selected compression format.
+The default of @code{compress.level=0} will use the default compression level
+for the format. Alternate values range from 1 for fastest compression to
+x for the best compression (x may vary between compression formats). This is
+always a trade-off between compression speed and compressed size.
+
 @end table
 
 @item Other
-- 
1.9.1

[Qemu-devel] [PATCH V2 0/8] add Qcow2 compress format extension

2017-06-29 Thread Peter Lieven

This adds a create option for Qcow2 images to specify the compression format
and level for compressed clusters. The series adds 2 algorithms to choose from:
zlib and lzo. zlib is the current default, but with suboptimal settings.
If no compress.format option is specified, the old zlib with the old parameters
is used and the created images are backwards compatible with older QEMU versions.
As soon as a compression format is specified, a new compress format header
extension is written and the Qcow2 images are incompatible with older QEMU
versions.
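
The selection rule above (empty format means the old backwards-compatible
zlib, a named format means the new extension) can be sketched roughly as
below. The sketch_/SKETCH_ names are hypothetical and only mirror the ids and
limits that appear in the patches of this series (zlib levels up to 9, no
explicit level for the compat mode or for lzo); the stand-alone functions use
plain libc rather than the glib helpers used in the series.

```c
#include <stdint.h>
#include <string.h>

enum {
    SKETCH_COMPRESS_ZLIB_COMPAT = 0,  /* no extension written, old parameters */
    SKETCH_COMPRESS_ZLIB        = 1,
    SKETCH_COMPRESS_LZO         = 2,
};

/* Map the compress.format string to an id; returns -1 if unsupported. */
static int sketch_format_from_name(const char *fmt)
{
    if (!fmt || !fmt[0]) {
        return SKETCH_COMPRESS_ZLIB_COMPAT;
    }
    if (strcmp(fmt, "zlib") == 0) {
        return SKETCH_COMPRESS_ZLIB;
    }
    if (strcmp(fmt, "lzo") == 0) {
        return SKETCH_COMPRESS_LZO;
    }
    return -1;
}

/* Return 1 if the level is acceptable for the format, 0 otherwise. */
static int sketch_level_ok(int id, uint64_t level)
{
    switch (id) {
    case SKETCH_COMPRESS_ZLIB_COMPAT:
        return level == 0;            /* a level needs an explicit format */
    case SKETCH_COMPRESS_ZLIB:
        return level <= 9;            /* 0 = default, 1..9 = zlib levels */
    case SKETCH_COMPRESS_LZO:
        return level == 0;            /* only the default level */
    default:
        return 0;
    }
}
```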

Some numbers for an uncompressed Debian 9 QCOW2 image (size 1148MB):

compress.format   compress time   compressed size   decompress time
none              35.7s           339MB             3.4s
zlib (default)    30.5s           320MB             3.2s
zlib (level 1)    12.8s           348MB             3.2s
lzo                4.2s           429MB             1.6s

Changes V1->V2:
  - split the series into more patches
  - added a QAPI schema for the compression settings
  - renamed compression_algorithm to compress.format and added compress.level
  - updated the header extension to carry a variable extra payload and compress
level.
  - removed extra reservations for header extensions
  - added missing lzo_init and fixed compress overhead for lzo

Peter Lieven (8):
  docs: add compress format extension to qcow2 spec
  qapi: add compress parameters to Qcow2 Blockdev options
  block/qcow2: parse compress create options
  qemu-img: add documentation for compress settings
  block/qcow2: read and write the compress format extension
  block/qcow2: optimize qcow2_co_pwritev_compressed
  block/qcow2: start using the compress format extension
  block/qcow2: add lzo compress format

 block/qcow2-cluster.c |  73 +++-
 block/qcow2.c | 219 +++---
 block/qcow2.h |  33 +--
 configure |   2 +-
 docs/interop/qcow2.txt|  43 -
 include/block/block_int.h |   2 +
 qapi/block-core.json  |  54 +++-
 qemu-img.texi |  22 +
 8 files changed, 384 insertions(+), 64 deletions(-)

-- 
1.9.1

[Qemu-devel] [PATCH V2 2/8] qapi: add compress parameters to Qcow2 Blockdev options

2017-06-29 Thread Peter Lieven

Signed-off-by: Peter Lieven 
---
 qapi/block-core.json | 44 +++-
 1 file changed, 43 insertions(+), 1 deletion(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index f85c223..1574ffb 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2282,6 +2282,43 @@
 'mode':  'Qcow2OverlapCheckMode' } }
 
 ##
+# @Qcow2CompressFormat:
+# @zlib: standard zlib deflate compression
+#
+# Since: 2.10
+##
+{ 'enum': 'Qcow2CompressFormat',
+  'data': [ 'zlib' ] }
+
+##
+# @Qcow2CompressZLib:
+#
+# Since: 2.10
+##
+{ 'struct': 'Qcow2CompressZLib',
+  'data': { } }
+
+##
+# @Qcow2Compress:
+#
+# Specifies the compression format and compression level that should
+# be used for compressed Qcow2 clusters.
+#
+# @format: specifies the compression format to use. (defaults to zlib)
+#
+# @level: specifies the compression level. 0 = default compression,
+# 1 = fastest compression, x = highest compression (x may vary between
+# different compression formats)
+#
+# Since: 2.10
+##
+{ 'union': 'Qcow2Compress',
+  'base': { 'format': 'Qcow2CompressFormat',
+'level': 'uint8' },
+  'discriminator': 'format',
+  'data': { 'zlib': 'Qcow2CompressZLib' } }
+
+##
 # @BlockdevOptionsQcow2:
 #
 # Driver specific block device options for qcow2.
@@ -2316,6 +2353,10 @@
 # caches. The interval is in seconds. The default value
 # is 0 and it disables this feature (since 2.5)
 #
+# @compress:  which format and compression level to use for
+# compressed clusters. Defaults to zlib with default
+# compression level (since 2.10)
+#
 # Since: 2.9
 ##
 { 'struct': 'BlockdevOptionsQcow2',
@@ -2328,7 +2369,8 @@
 '*cache-size': 'int',
 '*l2-cache-size': 'int',
 '*refcount-cache-size': 'int',
-'*cache-clean-interval': 'int' } }
+'*cache-clean-interval': 'int',
+'*compress': 'Qcow2Compress' } }
 
 
 ##
-- 
1.9.1

[Qemu-devel] [PATCH V2 1/8] docs: add compress format extension to qcow2 spec

2017-06-29 Thread Peter Lieven

Signed-off-by: Peter Lieven 
---
 docs/interop/qcow2.txt | 43 ++-
 1 file changed, 42 insertions(+), 1 deletion(-)

diff --git a/docs/interop/qcow2.txt b/docs/interop/qcow2.txt
index 80cdfd0..c01daf3 100644
--- a/docs/interop/qcow2.txt
+++ b/docs/interop/qcow2.txt
@@ -85,7 +85,12 @@ in the description of a field.
 be written to (unless for regaining
 consistency).
 
-Bits 2-63:  Reserved (set to 0)
+Bit 2:  Compress format bit.  If and only if this bit
+is set then the compress format extension
+MUST be present and MUST be parsed and checked
+for compatibility.
+
+Bits 3-63:  Reserved (set to 0)
 
  80 -  87:  compatible_features
 Bitmask of compatible features. An implementation can
@@ -135,6 +140,7 @@ be stored. Each extension has a structure like the 
following:
 0xE2792ACA - Backing file format name
 0x6803f857 - Feature name table
 0x23852875 - Bitmaps extension
+0xC03183A3 - Compress format extension
 other  - Unknown header extension, can be safely
  ignored
 
@@ -208,6 +214,41 @@ The fields of the bitmaps extension are:
starts. Must be aligned to a cluster boundary.
 
 
+== Compress format extension ==
+
+The compress format extension is an optional header extension. It provides
+the ability to specify the compress algorithm and compress parameters
+that are used for compressed clusters. This new header MUST be present if
+the incompatible-feature bit "compress format bit" is set and MUST be absent
+otherwise.
+
+The fields of the compress format extension are:
+
+Byte  0 - 15:  compress_format_name (padded with zeros, but not
+   necessarily null terminated if it has full length)
+
+  16:  compress_level (uint8_t)
+   0 = default compress level
+   1 = lowest compress level
+   x = highest compress level (the highest compress
+   level may vary for different compress formats)
+
+ 17 - 19:  Reserved for future use, must be zero.
+
+ 20 - 23:  extra_data_size
+   Size of compress format specific extra data.
+   For now, as no format specifies extra data,
+   extra_data_size is reserved and should be zero.
+
+variable:  extra_data
+   Extra data with additional parameters for the compress
+   format, occupying extra_data_size bytes.
+
+variable:  Padding to round up the size of compress format extension
+   to the next multiple of 8. All bytes of the padding must be
+   zero.
+
+
 == Host cluster management ==
 
 qcow2 manages the allocation of host clusters by maintaining a reference count
-- 
1.9.1
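
For readers who prefer a struct to the field table in the spec text above, the
fixed part of the extension could be modelled roughly like this. The type and
function names are hypothetical (the series defines its own structure in
block/qcow2.h, which is not shown here), and the sketch assumes the usual
qcow2 convention that multi-byte numbers are stored big-endian on disk.

```c
#include <stddef.h>
#include <stdint.h>

/* Fixed part of the compress format extension, as laid out in the spec text. */
typedef struct SketchCompressFormatExt {
    char     name[16];          /* bytes  0 - 15, zero padded               */
    uint8_t  level;             /* byte  16, 0 = default compress level     */
    uint8_t  reserved[3];       /* bytes 17 - 19, must be zero              */
    uint32_t extra_data_size;   /* bytes 20 - 23, big-endian on disk        */
} SketchCompressFormatExt;

/* The in-memory layout matches the byte offsets given in the spec. */
_Static_assert(offsetof(SketchCompressFormatExt, level) == 16, "level offset");
_Static_assert(offsetof(SketchCompressFormatExt, extra_data_size) == 20,
               "extra_data_size offset");
_Static_assert(sizeof(SketchCompressFormatExt) == 24, "fixed part is 24 bytes");

/*
 * Length of the extension payload including extra data and the trailing
 * padding that rounds it up to the next multiple of 8.
 */
static uint32_t sketch_ext_padded_len(uint32_t extra_data_size)
{
    uint32_t len = (uint32_t)sizeof(SketchCompressFormatExt) + extra_data_size;
    return (len + 7u) & ~7u;
}
```

Since the fixed part is already 24 bytes, no padding is needed as long as
extra_data_size is zero, which is the only case the spec text currently allows.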

[Qemu-devel] [PATCH V2 7/8] block/qcow2: start using the compress format extension

2017-06-29 Thread Peter Lieven

we now pass the parameters to the zlib compressor if the
extension is present and use the old default values if
the extension is absent.

Signed-off-by: Peter Lieven 
---
 block/qcow2-cluster.c | 58 ++-
 block/qcow2.c | 57 +++---
 2 files changed, 65 insertions(+), 50 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 3d341fd..353ac87 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1521,30 +1521,39 @@ again:
 }
 
 static int decompress_buffer(uint8_t *out_buf, int out_buf_size,
- const uint8_t *buf, int buf_size)
+ const uint8_t *buf, int buf_size,
+ uint32_t compress_format_id)
 {
-z_stream strm1, *strm = &strm1;
-int ret, out_len;
-
-memset(strm, 0, sizeof(*strm));
-
-strm->next_in = (uint8_t *)buf;
-strm->avail_in = buf_size;
-strm->next_out = out_buf;
-strm->avail_out = out_buf_size;
-
-ret = inflateInit2(strm, -12);
-if (ret != Z_OK)
-return -1;
-ret = inflate(strm, Z_FINISH);
-out_len = strm->next_out - out_buf;
-if ((ret != Z_STREAM_END && ret != Z_BUF_ERROR) ||
-out_len != out_buf_size) {
-inflateEnd(strm);
-return -1;
-}
-inflateEnd(strm);
-return 0;
+int ret = 0, out_len;
+
+switch (compress_format_id) {
+case QCOW2_COMPRESS_ZLIB:
+case QCOW2_COMPRESS_ZLIB_COMPAT: {
+z_stream z_strm = {};
+
+z_strm.next_in = (uint8_t *)buf;
+z_strm.avail_in = buf_size;
+z_strm.next_out = out_buf;
+z_strm.avail_out = out_buf_size;
+
+ret = inflateInit2(&z_strm, -15);
+if (ret != Z_OK) {
+return -1;
+}
+ret = inflate(&z_strm, Z_FINISH);
+out_len = z_strm.next_out - out_buf;
+ret = -(ret != Z_STREAM_END);
+inflateEnd(&z_strm);
+break;
+}
+default:
+abort(); /* should never reach this point */
+}
+
+if (out_len != out_buf_size) {
+ret = -1;
+}
+return ret;
 }
 
 int qcow2_decompress_cluster(BlockDriverState *bs, uint64_t cluster_offset)
@@ -1565,7 +1574,8 @@ int qcow2_decompress_cluster(BlockDriverState *bs, 
uint64_t cluster_offset)
 return ret;
 }
 if (decompress_buffer(s->cluster_cache, s->cluster_size,
-  s->cluster_data + sector_offset, csize) < 0) {
+  s->cluster_data + sector_offset, csize,
+  s->compress_format_id) < 0) {
 return -EIO;
 }
 s->cluster_cache_offset = coffset;
diff --git a/block/qcow2.c b/block/qcow2.c
index 0a7202a..b41e58d 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2748,9 +2748,10 @@ qcow2_co_pwritev_compressed(BlockDriverState *bs, 
uint64_t offset,
 BDRVQcow2State *s = bs->opaque;
 QEMUIOVector hd_qiov;
 struct iovec iov;
-z_stream strm;
-int ret, out_len;
-uint8_t *buf, *out_buf, *local_buf = NULL;
+z_stream z_strm = {};
+int z_windowBits = -15, z_level = Z_DEFAULT_COMPRESSION;
+int ret, out_len = 0;
+uint8_t *buf, *out_buf = NULL, *local_buf = NULL;
 uint64_t cluster_offset;
 
 if (bytes == 0) {
@@ -2775,34 +2776,38 @@ qcow2_co_pwritev_compressed(BlockDriverState *bs, 
uint64_t offset,
 buf = qiov->iov[0].iov_base;
 }
 
-out_buf = g_malloc(s->cluster_size);
+switch (s->compress_format_id) {
+case QCOW2_COMPRESS_ZLIB_COMPAT:
+z_windowBits = -12;
+case QCOW2_COMPRESS_ZLIB:
+out_buf = g_malloc(s->cluster_size);
+if (s->compress_format.level > 0) {
+z_level = s->compress_format.level;
+}
 
-/* best compression, small window, no zlib header */
-memset(&strm, 0, sizeof(strm));
-ret = deflateInit2(&strm, Z_DEFAULT_COMPRESSION,
-   Z_DEFLATED, -12,
-   9, Z_DEFAULT_STRATEGY);
-if (ret != 0) {
-ret = -EINVAL;
-goto fail;
-}
+ret = deflateInit2(&z_strm, z_level, Z_DEFLATED, z_windowBits, 9,
+   Z_DEFAULT_STRATEGY);
+if (ret != Z_OK) {
+ret = -EINVAL;
+goto fail;
+}
 
-strm.avail_in = s->cluster_size;
-strm.next_in = (uint8_t *)buf;
-strm.avail_out = s->cluster_size;
-strm.next_out = out_buf;
+z_strm.avail_in = s->cluster_size;
+z_strm.next_in = (uint8_t *)buf;
+z_strm.avail_out = s->cluster_size;
+z_strm.next_out = out_buf;
 
-ret = deflate(&strm, Z_FINISH);
-if (ret != Z_STREAM_END && ret != Z_OK) {
-deflateEnd(&strm);
-ret = -EINVAL;
-goto fail;
-}
-out_len = strm.next_out - out_buf;
+ret = deflate(&z_strm, Z_FINISH);
+out_len = z_strm.next_out - out_buf;
+deflateEnd(&z_strm);
 
-deflateEnd

[Qemu-devel] [PATCH V2 6/8] block/qcow2: optimize qcow2_co_pwritev_compressed

2017-06-29 Thread Peter Lieven

if we specify exactly one iov of s->cluster_size bytes we can avoid
the bounce buffer.

Signed-off-by: Peter Lieven 
---
 block/qcow2.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 39a8afc..0a7202a 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2750,7 +2750,7 @@ qcow2_co_pwritev_compressed(BlockDriverState *bs, 
uint64_t offset,
 struct iovec iov;
 z_stream strm;
 int ret, out_len;
-uint8_t *buf, *out_buf;
+uint8_t *buf, *out_buf, *local_buf = NULL;
 uint64_t cluster_offset;
 
 if (bytes == 0) {
@@ -2760,8 +2760,8 @@ qcow2_co_pwritev_compressed(BlockDriverState *bs, 
uint64_t offset,
 return bdrv_truncate(bs->file, cluster_offset, NULL);
 }
 
-buf = qemu_blockalign(bs, s->cluster_size);
-if (bytes != s->cluster_size) {
+if (bytes != s->cluster_size || qiov->niov != 1) {
+buf = local_buf = qemu_blockalign(bs, s->cluster_size);
 if (bytes > s->cluster_size ||
 offset + bytes != bs->total_sectors << BDRV_SECTOR_BITS)
 {
@@ -2770,8 +2770,10 @@ qcow2_co_pwritev_compressed(BlockDriverState *bs, 
uint64_t offset,
 }
 /* Zero-pad last write if image size is not cluster aligned */
 memset(buf + bytes, 0, s->cluster_size - bytes);
+qemu_iovec_to_buf(qiov, 0, buf, bytes);
+} else {
+buf = qiov->iov[0].iov_base;
 }
-qemu_iovec_to_buf(qiov, 0, buf, bytes);
 
 out_buf = g_malloc(s->cluster_size);
 
@@ -2839,7 +2841,7 @@ qcow2_co_pwritev_compressed(BlockDriverState *bs, 
uint64_t offset,
 success:
 ret = 0;
 fail:
-qemu_vfree(buf);
+qemu_vfree(local_buf);
 g_free(out_buf);
 return ret;
 }
-- 
1.9.1
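
The decision made in this patch (use the caller's buffer directly when the
request is exactly one iov of cluster size, otherwise copy into a zero-padded
bounce buffer) can be sketched outside QEMU as below. SketchIov and
sketch_cluster_input are hypothetical names, plain calloc/free stand in for
qemu_blockalign/qemu_vfree, and the end-of-image check for short writes from
the real function is left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    void  *base;
    size_t len;
} SketchIov;

/*
 * Return a pointer to cluster_size bytes of input data.
 * If a bounce buffer had to be allocated, *bounce points to it and the
 * caller must free() it; otherwise *bounce is NULL and the returned
 * pointer aliases the caller's single iov. Returns NULL on error.
 */
static const uint8_t *sketch_cluster_input(const SketchIov *iov, int niov,
                                           size_t bytes, size_t cluster_size,
                                           uint8_t **bounce)
{
    *bounce = NULL;

    if (bytes == cluster_size && niov == 1) {
        return iov[0].base;              /* fast path: no copy needed */
    }
    if (bytes > cluster_size) {
        return NULL;
    }

    /* calloc() gives the zero padding for a short last cluster. */
    uint8_t *buf = calloc(1, cluster_size);
    if (!buf) {
        return NULL;
    }

    /* Gather the first 'bytes' bytes of the vector into the buffer. */
    size_t off = 0;
    for (int i = 0; i < niov && off < bytes; i++) {
        size_t n = iov[i].len < bytes - off ? iov[i].len : bytes - off;
        memcpy(buf + off, iov[i].base, n);
        off += n;
    }

    *bounce = buf;
    return buf;
}
```

Freeing only the bounce pointer at the end mirrors the switch from
qemu_vfree(buf) to qemu_vfree(local_buf) in the patch: the caller's own buffer
must never be freed.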

[Qemu-devel] [PATCH V2 5/8] block/qcow2: read and write the compress format extension

2017-06-29 Thread Peter Lieven

we now read the extension on open and write it on update, but
do not yet use it.

Signed-off-by: Peter Lieven 
---
 block/qcow2.c | 100 ++
 block/qcow2.h |  23 +++---
 2 files changed, 104 insertions(+), 19 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 308121a..39a8afc 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -63,6 +63,7 @@ typedef struct {
 #define  QCOW2_EXT_MAGIC_END 0
 #define  QCOW2_EXT_MAGIC_BACKING_FORMAT 0xE2792ACA
 #define  QCOW2_EXT_MAGIC_FEATURE_TABLE 0x6803f857
+#define  QCOW2_EXT_MAGIC_COMPRESS_FORMAT 0xC03183A3
 
 static int qcow2_probe(const uint8_t *buf, int buf_size, const char *filename)
 {
@@ -76,6 +77,26 @@ static int qcow2_probe(const uint8_t *buf, int buf_size, 
const char *filename)
 return 0;
 }
 
+static int qcow2_compress_format_from_name(char *fmt)
+{
+if (!fmt || !fmt[0]) {
+return QCOW2_COMPRESS_ZLIB_COMPAT;
+} else if (g_str_equal(fmt, "zlib")) {
+return QCOW2_COMPRESS_ZLIB;
+} else {
+return -EINVAL;
+}
+}
+
+static int qcow2_compress_level_supported(int id, uint64_t level)
+{
+if ((id == QCOW2_COMPRESS_ZLIB_COMPAT && level > 0) ||
+(id == QCOW2_COMPRESS_ZLIB && level > 9) ||
+level > 0xff) {
+return -EINVAL;
+}
+return 0;
+}
 
 /* 
  * read qcow2 extension and fill bs
@@ -148,6 +169,43 @@ static int qcow2_read_extensions(BlockDriverState *bs, 
uint64_t start_offset,
 #endif
 break;
 
+case QCOW2_EXT_MAGIC_COMPRESS_FORMAT:
+if (ext.len != sizeof(s->compress_format)) {
+error_setg(errp, "ERROR: ext_compress_format: len=%"
+   PRIu32 " invalid (!=%zu)", ext.len,
+   sizeof(s->compress_format));
+return 2;
+}
+ret = bdrv_pread(bs->file, offset, &s->compress_format,
+ ext.len);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "ERROR: ext_compress_format:"
+ " Could not read extension");
+return 3;
+}
+s->compress_format_id =
+qcow2_compress_format_from_name(s->compress_format.name);
+if (s->compress_format_id < 0) {
+error_setg(errp, "ERROR: compression algorithm '%s' is "
+   " unsupported", s->compress_format.name);
+return 4;
+}
+if (qcow2_compress_level_supported(s->compress_format_id,
+   s->compress_format.level) < 0) {
+error_setg(errp, "ERROR: compress level %" PRIu8 " is not"
+   " supported for format '%s'",
+   s->compress_format.level, s->compress_format.name);
+return 5;
+}
+
+#ifdef DEBUG_EXT
+printf("Qcow2: Got compress format %s with compress level %"
+   PRIu8 "\n", s->compress_format.name,
+   s->compress_format.level);
+#endif
+break;
+
+
 case QCOW2_EXT_MAGIC_FEATURE_TABLE:
 if (p_feature_table != NULL) {
 void* feature_table = g_malloc0(ext.len + 2 * 
sizeof(Qcow2Feature));
@@ -1981,6 +2039,20 @@ int qcow2_update_header(BlockDriverState *bs)
 buflen -= ret;
 }
 
+/* Compress Format header extension */
+if (s->compress_format.name[0]) {
+assert(!s->compress_format.extra_data_size);
+ret = header_ext_add(buf, QCOW2_EXT_MAGIC_COMPRESS_FORMAT,
+ &s->compress_format, sizeof(s->compress_format),
+ buflen);
+if (ret < 0) {
+goto fail;
+}
+buf += ret;
+buflen -= ret;
+header->incompatible_features |= cpu_to_be64(QCOW2_INCOMPAT_COMPRESS);
+}
+
 /* Feature table */
 if (s->qcow_version >= 3) {
 Qcow2Feature features[] = {
@@ -1995,6 +2067,11 @@ int qcow2_update_header(BlockDriverState *bs)
 .name = "corrupt bit",
 },
 {
+.type = QCOW2_FEAT_TYPE_INCOMPATIBLE,
+.bit  = QCOW2_INCOMPAT_COMPRESS_BITNR,
+.name = "compress format bit",
+},
+{
 .type = QCOW2_FEAT_TYPE_COMPATIBLE,
 .bit  = QCOW2_COMPAT_LAZY_REFCOUNTS_BITNR,
 .name = "lazy refcounts",
@@ -2333,6 +2410,13 @@ static int qcow2_create2(const char *filename, int64_t total_size,
 abort();
 }
 
+if (compress_format_name[0]) {
+BDRVQcow2State *s = blk_bs(blk)->opaque;
+memcpy(s->compress_format.name, compress_format_name,
+   strlen(compress_format_name));
+s->compress_format.level = compress_level;
+}
+
 /* Create a full header (including things like feature table) */
 ret = qcow2_
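
The checks performed on the header extension in the hunk above (length, algorithm name, level) can be sketched in isolation. This is only an illustration under stated assumptions: `CompressFormatExt`, its field sizes, the `known_formats` table and `validate_compress_ext()` are hypothetical and are not taken from the patch. The sketch also checks that the on-disk name is NUL-terminated before treating it as a C string.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical on-disk record; the field sizes are assumptions and are
 * not the layout used by the patch. */
typedef struct CompressFormatExt {
    char name[16];
    uint8_t level;
} CompressFormatExt;

/* Hypothetical table of known algorithms and their maximum level. */
static const struct {
    const char *name;
    uint8_t max_level;
} known_formats[] = {
    { "zlib", 9 },
};

/* Returns 0 if the record is acceptable, a positive code otherwise:
 * 1 = wrong length, 2 = unterminated name, 3 = unknown name,
 * 4 = unsupported level. */
int validate_compress_ext(const void *data, size_t len)
{
    CompressFormatExt ext;
    size_t i;

    if (len != sizeof(ext)) {
        return 1;
    }
    memcpy(&ext, data, sizeof(ext));

    /* Never treat on-disk bytes as a C string without checking. */
    if (memchr(ext.name, '\0', sizeof(ext.name)) == NULL) {
        return 2;
    }
    for (i = 0; i < sizeof(known_formats) / sizeof(known_formats[0]); i++) {
        if (strcmp(ext.name, known_formats[i].name) == 0) {
            return ext.level <= known_formats[i].max_level ? 0 : 4;
        }
    }
    return 3;
}
```

Returning distinct codes mirrors the distinct return values in the hunk; a real implementation would report the reason through its own error mechanism.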

[Qemu-devel] Questions about GPU passthrough + multiple PCIE switches on host

2017-06-29 Thread Bob Chen

Hi folks,

I have 8 GPU cards that need to be passed through to a single VM.

These cards are attached to 2 PCIe switches on the host server, in case
there is a bandwidth limit within a single bus.

So what is the correct QEMU bus parameter if I want to achieve the best
performance? Can QEMU's pcie.0/1 parameter really reflect the actual
physical device?



Thanks,
Bob

Re: [Qemu-devel] [PATCH v2 4/7] qdev: Introduce DEFINE_PROP_LINK

2017-06-29 Thread Fam Zheng

On Thu, 06/29 12:40, Igor Mammedov wrote:
> On Thu, 29 Jun 2017 16:04:49 +0800
> Fam Zheng  wrote:
> 
> > This property can be used to replace the object_property_add_link in
> > device code, to add a link to other objects, which is a common pattern.
> > 
> > Signed-off-by: Fam Zheng 
> > ---
> >  hw/core/qdev-properties.c| 16 
> >  include/hw/qdev-core.h   |  5 +
> >  include/hw/qdev-properties.h | 11 +++
> >  3 files changed, 32 insertions(+)
> > 
> > diff --git a/hw/core/qdev-properties.c b/hw/core/qdev-properties.c
> > index 68cd653..7c11eb8 100644
> > --- a/hw/core/qdev-properties.c
> > +++ b/hw/core/qdev-properties.c
> > @@ -1192,3 +1192,19 @@ PropertyInfo qdev_prop_size = {
> >  .set = set_size,
> >  .set_default_value = set_default_value_uint,
> >  };
> > +
> > +/* --- object link property --- */
> > +
> > +static void create_link_property(Object *obj, Property *prop, Error **errp)
> > +{
> > +Object **child = qdev_get_prop_ptr(DEVICE(obj), prop);
> > +
> > +object_property_add_link(obj, prop->name, prop->link_type,
> > + child, prop->link.check,
> > + prop->link.flags, errp);
> > +}
> > +
> > +PropertyInfo qdev_prop_link = {
> > +.name = "link",
> > +.create = create_link_property,
> > +};
> > diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
> > index 33518ee..40afb3d 100644
> > --- a/include/hw/qdev-core.h
> > +++ b/include/hw/qdev-core.h
> > @@ -5,6 +5,7 @@
> >  #include "qemu/option.h"
> >  #include "qemu/bitmap.h"
> >  #include "qom/object.h"
> > +#include "qom/link-property.h"
> >  #include "hw/irq.h"
> >  #include "hw/hotplug.h"
> >  
> > @@ -233,6 +234,10 @@ struct Property {
> >  int  arrayoffset;
> >  PropertyInfo *arrayinfo;
> >  int  arrayfieldsize;
> > +/* Only @check and @flags are used; @child is unuseful because we need a
> > + * dynamic pointer in @obj as derived from @offset. */
> @check, @flags, @child, @obj are not fields of struct Property, so it's
> not clear what the doc comment is talking about. Maybe adding prefixes would help,
> for example:
>   @link.child

Good point, I was being too stingy with words.

> 
> > +LinkProperty link;
> > +const char   *link_type;
> >  };
> >  
> >  struct PropertyInfo {
> > diff --git a/include/hw/qdev-properties.h b/include/hw/qdev-properties.h
> > index 39bf4b2..767c10b 100644
> > --- a/include/hw/qdev-properties.h
> > +++ b/include/hw/qdev-properties.h
> > @@ -30,6 +30,7 @@ extern PropertyInfo qdev_prop_pci_devfn;
> >  extern PropertyInfo qdev_prop_blocksize;
> >  extern PropertyInfo qdev_prop_pci_host_devaddr;
> >  extern PropertyInfo qdev_prop_arraylen;
> > +extern PropertyInfo qdev_prop_link;
> >  
> >  #define DEFINE_PROP(_name, _state, _field, _prop, _type) { \
> >  .name  = (_name),\
> > @@ -117,6 +118,16 @@ extern PropertyInfo qdev_prop_arraylen;
> >  .arrayoffset = offsetof(_state, _arrayfield),   \
> >  }
> >  
> > +#define DEFINE_PROP_LINK(_name, _state, _field, _type, _check, _flags) {\
> > +.name = (_name),\
> > +.info = &(qdev_prop_link),  \
> > +.offset = offsetof(_state, _field)  \
> > ++ type_check(Object *, typeof_field(_state, _field)),   \
> > +.link.check = _check,   \
> > +.link.flags = _flags,   \
> maybe we shouldn't have custom _check and _flags fields, as the majority of devices
> use the qdev_prop_allow_set_link_before_realize + OBJ_PROP_LINK_UNREF_ON_RELEASE
> policies. That would save us ~2 LOC of boilerplate code per property.
> 
> I've looked at current device link usage: only a few devices actually
> use or need custom check/flags, and several misuse
> object_property_allow_set_link()
> where they should use qdev_prop_allow_set_link_before_realize().
> 
> We could leave the exceptions that require custom check/flags as they are,
> or provide DEFINE_PROP_LINK_CUSTOM() for the few that actually need it and fix
> the ones that misuse it.
> 
> There is a bunch of board code that uses links, but that's not for Device,
> so it's not related to this macro anyway.

OK, I wasn't sure about this when Paolo pointed it out in v1, but since you've now
taken a look, I will simplify it.

I will also try to cover more devices in v3. Thanks!

Fam

[Qemu-devel] [PATCH] qom: enforce readonly nature of link's check callback

2017-06-29 Thread Igor Mammedov

A link's check callback is supposed to verify/permit setting the link;
however, currently nothing prevents the callback from misusing it
and modifying the target object from within.
Make sure that read-only semantics are checked by the compiler
to prevent the callback's misuse.

Signed-off-by: Igor Mammedov 
---
Fam,
 it probably conflicts with your DEFINE_PROP_LINK series,
 feel free to include this patch if you have to respin

---
 include/hw/qdev-properties.h | 3 ++-
 include/qom/object.h | 6 +++---
 hw/core/qdev-properties.c| 3 ++-
 hw/display/xlnx_dp.c | 2 +-
 hw/ipmi/ipmi.c   | 2 +-
 hw/mem/pc-dimm.c | 2 +-
 hw/misc/ivshmem.c| 2 +-
 qom/object.c | 8 
 8 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/include/hw/qdev-properties.h b/include/hw/qdev-properties.h
index 306bbab..6dfe16e 100644
--- a/include/hw/qdev-properties.h
+++ b/include/hw/qdev-properties.h
@@ -234,7 +234,8 @@ void qdev_prop_set_after_realize(DeviceState *dev, const char *name,
  * This function should be used as the check() argument to
  * object_property_add_link().
  */
-void qdev_prop_allow_set_link_before_realize(Object *obj, const char *name,
+void qdev_prop_allow_set_link_before_realize(const Object *obj,
+ const char *name,
  Object *val, Error **errp);
 
 #endif
diff --git a/include/qom/object.h b/include/qom/object.h
index 5ecc2d1..5223692 100644
--- a/include/qom/object.h
+++ b/include/qom/object.h
@@ -788,7 +788,7 @@ ObjectClass *object_get_class(Object *obj);
  *
  * Returns: The QOM typename of @obj.
  */
-const char *object_get_typename(Object *obj);
+const char *object_get_typename(const Object *obj);
 
 /**
  * type_register_static:
@@ -1320,7 +1320,7 @@ typedef enum {
  * callback function.  It allows the link property to be set and never returns
  * an error.
  */
-void object_property_allow_set_link(Object *, const char *,
+void object_property_allow_set_link(const Object *, const char *,
 Object *, Error **);
 
 /**
@@ -1353,7 +1353,7 @@ void object_property_allow_set_link(Object *, const char *,
  */
 void object_property_add_link(Object *obj, const char *name,
   const char *type, Object **child,
-  void (*check)(Object *obj, const char *name,
+  void (*check)(const Object *obj, const char 
*name,
 Object *val, Error **errp),
   ObjectPropertyLinkFlags flags,
   Error **errp);
diff --git a/hw/core/qdev-properties.c b/hw/core/qdev-properties.c
index 2a82768..95e5fdb 100644
--- a/hw/core/qdev-properties.c
+++ b/hw/core/qdev-properties.c
@@ -25,7 +25,8 @@ void qdev_prop_set_after_realize(DeviceState *dev, const char *name,
 }
 }
 
-void qdev_prop_allow_set_link_before_realize(Object *obj, const char *name,
+void qdev_prop_allow_set_link_before_realize(const Object *obj,
+ const char *name,
  Object *val, Error **errp)
 {
 DeviceState *dev = DEVICE(obj);
diff --git a/hw/display/xlnx_dp.c b/hw/display/xlnx_dp.c
index f43eb09..3ed81ff 100644
--- a/hw/display/xlnx_dp.c
+++ b/hw/display/xlnx_dp.c
@@ -515,7 +515,7 @@ static void xlnx_dp_aux_set_command(XlnxDPState *s, uint32_t value)
 s->core_registers[DP_INTERRUPT_SIGNAL_STATE] |= 0x04;
 }
 
-static void xlnx_dp_set_dpdma(Object *obj, const char *name, Object *val,
+static void xlnx_dp_set_dpdma(const Object *obj, const char *name, Object *val,
   Error **errp)
 {
 XlnxDPState *s = XLNX_DP(obj);
diff --git a/hw/ipmi/ipmi.c b/hw/ipmi/ipmi.c
index 5cf1caa..a2fd1eb 100644
--- a/hw/ipmi/ipmi.c
+++ b/hw/ipmi/ipmi.c
@@ -90,7 +90,7 @@ static TypeInfo ipmi_interface_type_info = {
 .class_init = ipmi_interface_class_init,
 };
 
-static void isa_ipmi_bmc_check(Object *obj, const char *name,
+static void isa_ipmi_bmc_check(const Object *obj, const char *name,
Object *val, Error **errp)
 {
 IPMIBmc *bmc = IPMI_BMC(val);
diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
index 9e8dab0..380cb30 100644
--- a/hw/mem/pc-dimm.c
+++ b/hw/mem/pc-dimm.c
@@ -366,7 +366,7 @@ static void pc_dimm_get_size(Object *obj, Visitor *v, const char *name,
 visit_type_int(v, name, &value, errp);
 }
 
-static void pc_dimm_check_memdev_is_busy(Object *obj, const char *name,
+static void pc_dimm_check_memdev_is_busy(const Object *obj, const char *name,
   Object *val, Error **errp)
 {
 Error *local_err = NULL;
diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
index abeaf3d..e25016c 100644
--- a/hw/misc/ivshmem.c
+++ b/hw/misc/ivshmem.c
@@ -1005,7 +1005,7 @@ static const TypeInfo ivshmem_common_info = {
 .class_init
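
The idea behind the patch above can be shown in a small standalone sketch. The `Obj` type, `LinkCheck`, `allow_before_realize()` and `obj_set_link()` are hypothetical miniatures, not QEMU's QOM API; they only illustrate how a const-qualified owner pointer lets the compiler reject writes from inside a check callback.

```c
#include <stddef.h>

/* Hypothetical miniature object type; not QEMU's Object. */
typedef struct Obj {
    int realized;
    struct Obj *link;
} Obj;

/* The check callback receives the owner as a pointer to const, so any
 * attempt to write through it (e.g. obj->realized = 1) is rejected at
 * compile time. It returns 0 to permit the link and -1 to refuse it. */
typedef int (*LinkCheck)(const Obj *obj, Obj *val);

/* Example policy: links may only be set before the owner is realized. */
int allow_before_realize(const Obj *obj, Obj *val)
{
    (void)val;
    return obj->realized ? -1 : 0;
}

/* Only the setter, which holds a non-const pointer, mutates the owner. */
int obj_set_link(Obj *obj, Obj *val, LinkCheck check)
{
    if (check && check(obj, val) < 0) {
        return -1;
    }
    obj->link = val;
    return 0;
}
```

With this shape, a callback that tried to assign to `obj->link` itself would fail to compile, which is the kind of misuse the commit message describes.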

Re: [Qemu-devel] [RISU PATCH v6 00/10] Record/replay patches

2017-06-29 Thread Peter Maydell

On 21 June 2017 at 16:42, Alex Bennée  wrote:
> Hi Peter,
>
> Re-based with review comments addressed and tags added where
> appropriate.
>
> Alex Bennée (10):
>   README: document the coding style used for risu
>   build-all-archs: support cross building via docker
>   risu: a bit more verbosity when starting
>   risu: paramterise send/receive functions
>   risu: add header to trace stream
>   risu: add simple trace and replay support
>   risu: handle trace through stdin/stdout
>   risu: add support compressed tracefiles
>   new: record_traces.sh helper script
>   new: run_risu.sh script

Hi -- these look OK to me so I have applied them to risu master
(I fixed up a few checkpatch style issues but nothing major).

I notice there's no documentation of the record/replay
feature, though -- could you send a patch which adds
discussion of how to use it to the README file, please?

The 'Building' section in the README should also mention that
we recommend having zlib available.

thanks
-- PMM

Re: [Qemu-devel] [PATCH] block: fix bs->file leak in bdrv_new_open_driver()

2017-06-29 Thread Kevin Wolf

Am 29.06.2017 um 08:03 hat Manos Pitsidianakis geschrieben:
> bdrv_open_driver() is called in two places, bdrv_new_open_driver() and
> bdrv_open_common(). In the latter, failure cleanup is in its caller,
> bdrv_open_inherit(), which unrefs the bs->file of the failed driver open
> if it exists. Let's check for this in bdrv_new_open_driver() as well.
> 
> Signed-off-by: Manos Pitsidianakis 
> ---
>  block.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/block.c b/block.c
> index 694396281b..aeacd520e0 100644
> --- a/block.c
> +++ b/block.c
> @@ -1165,6 +1165,9 @@ BlockDriverState *bdrv_new_open_driver(BlockDriver *drv, const char *node_name,
>  
>  ret = bdrv_open_driver(bs, drv, node_name, bs->options, flags, errp);
>  if (ret < 0) {
> +if (bs->file != NULL) {
> +bdrv_unref_child(bs, bs->file);
> +}
>  QDECREF(bs->explicit_options);
>  QDECREF(bs->options);
>  bdrv_unref(bs);

I think we should set bs->file = NULL here to remove the dangling
pointer. I think it is never accessed anyway because of the
bs->drv = NULL in the error path of bdrv_open_driver(), but better safe
than sorry.

But what would you think about avoiding the code duplication and just
moving the bdrv_unref_child() call from bdrv_open_inherit() down to
bdrv_open_driver(), so that bdrv_new_open_driver() is automatically
covered?

And later we can maybe move it into the individual .bdrv_open
implementations where it really belongs (whoever creates something is
responsible for cleaning it up in error cases).

Kevin
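
The two suggestions in this reply (clear the pointer after dropping the reference, and do the cleanup in the function that created the child) can be sketched as follows. `Node`, `Child`, `child_unref()` and `node_open()` are hypothetical stand-ins, not the QEMU block layer functions, and the `fail` parameter exists only to force the error path.

```c
#include <stdlib.h>

/* Hypothetical miniature types; not QEMU's BlockDriverState/BdrvChild. */
typedef struct Child {
    int refcnt;
} Child;

typedef struct Node {
    Child *file;
} Node;

void child_unref(Child *c)
{
    if (c && --c->refcnt == 0) {
        free(c);
    }
}

/* Hypothetical open routine that attaches a child and may then fail.
 * 'fail' forces the error path for illustration. */
int node_open(Node *n, int fail)
{
    n->file = calloc(1, sizeof(*n->file));
    if (!n->file) {
        return -1;
    }
    n->file->refcnt = 1;

    if (fail) {
        /* Clean up where the child was created, so every caller is
         * covered, and clear the pointer so nothing dangles. */
        child_unref(n->file);
        n->file = NULL;
        return -1;
    }
    return 0;
}
```

Because the error path lives inside `node_open()`, no caller needs its own copy of the unref logic.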

Re: [Qemu-devel] [Qemu-block] [PATCH v4 1/2] live-block-ops.txt: Rename, rewrite, and improve it

2017-06-29 Thread Alberto Garcia

On Wed 28 Jun 2017 10:33:49 PM CEST, Eric Blake wrote:
>>> +Disk image backing chain notation
>>> +-
>>   [...]
>>> +.. important::
>>> +The base disk image can be raw format; however, all the overlay
>>> +files must be of QCOW2 format.
>> 
>> This is not quite like that: overlay files must be in a format that
>> supports backing files. QCOW2 is the most common one, but there are
>> others (qed). Grep for 'supports_backing' in the source code.
>
> At the same time, other image formats are not as frequently tested, or
> may be read-only.  Maybe a compromise of "The overlay files can
> generally be any format that supports a backing file, although qcow2
> is the preferred format and the one used in this document".

That sounds good.

>>> +(2) ``block-commit``: Live merge of data from overlay files into backing
>>> +files (with the optional goal of removing the overlay file from the
>>> +chain).  Since QEMU 2.0, this includes "active ``block-commit``"
>>> +(i.e.  merge the current active layer into the base image).
>> 
>> Same question about the 'optional' here.
>
> Here, optional is a bit more correct. With non-active (intermediate)
> commit, qemu ALWAYS rewrites the backing chain to be shorter; but with
> live commit, you can choose whether to pivot to the now-shorter chain
> (job-complete) or whether to keep the active image intact (starting to
> collect a new delta from the point-in-time of the just-completed
> commit, by job-cancel).

That's correct, I think in this case we can probably leave it as it is
now.

>>> +writing to it.  (The rule of thumb is: live QEMU will always be pointing
>>> +to the right-most image in a disk image chain.)
>> 
>> I think it's 'rightmost', without the hyphen.
>
> Sadly, I think this is one case where both spellings work to a native
> reader, and where I don't know of a specific style-guide preference.
> I probably would have written with the hyphen.

Ah, I didn't know! :-)

>>> +(3) Intermediate streaming (available since QEMU 2.8): Starting afresh
>>> +    with the original example disk image chain, with a total of four
>>> +    images, it is possible to copy contents from image [B] into image
>>> +    [C].  Once the copy is finished, image [B] can now be (optionally)
>>> +    discarded; and the backing file pointer of image [C] will be
>>> +    adjusted to point to [A].
>> 
>> The 'optional' usage again. [B] will be removed from the chain and
>> can be (optionally) removed from the disk, but that you have to do
>> yourself, QEMU won't do that.
>
> Indeed, we may need to be specifically clear of the cases where qemu
> shortens the chain, but where disk images that are no longer used by
> the chain (whether they are still viable [as in stream], or
> invalidated [as in commit crossing more than one element of the
> chain]) are still left on the disk for the user to discard separately
> from qemu.

Yes, I think that should be clarified. The distinction between valid and
invalid images is also an important one, although it's already mentioned
in the document.

>>> +(QEMU) block-commit device=node-D base=a.qcow2 top=d.qcow2 job-id=job0
>>> +{
>>> +"execute": "block-commit",
>>> +"arguments": {
>>> +"device": "node-D",
>>> +"job-id": "job0",
>>> +"top": "d.qcow2",
>>> +"base": "a.qcow2"
>>> +}
>>> +}
>> 
>> This is correct, but I don't know if it's worth mentioning that if
>> you omit the 'top' parameter it defaults to the active layer (node-D
>> in this example).
>
> I think it's better to be explicit in the examples (always provide all
> parameters, even if what you provide would also be the default when
> omitted), and then maybe the prose can mention which parameters have
> defaults.

Sounds good to me.

Berto
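
The distinction discussed above, that QEMU shortens the chain while the file of the skipped image stays on disk, can be modelled with a small sketch. The `Image` struct and both functions are hypothetical and only model the pointer adjustment; they do not copy any data and are not part of QEMU.

```c
#include <stddef.h>

/* Hypothetical in-memory model of a backing chain; each image points at
 * the image it is backed by (NULL for the base). */
typedef struct Image {
    const char *name;
    struct Image *backing;
    int on_disk;          /* 1 while the file still exists on disk */
} Image;

/* Model of intermediate streaming: once the data of 'top->backing' has
 * been copied into 'top', repoint 'top' past it. The skipped image only
 * leaves the chain; its file is left for the user to discard. */
void stream_from_backing(Image *top)
{
    Image *skipped = top->backing;

    if (skipped) {
        top->backing = skipped->backing;
    }
}

/* Number of images reachable from the active layer, including it. */
size_t chain_length(const Image *active)
{
    size_t n = 0;

    for (; active != NULL; active = active->backing) {
        n++;
    }
    return n;
}
```

For a chain [A] <- [B] <- [C] <- [D], calling `stream_from_backing()` on [C] leaves [A] <- [C] <- [D], while [B] is untouched and still marked as present on disk.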

Re: [Qemu-devel] [PATCH v4 3/5] migration: Create load_setup()/cleanup() methods

2017-06-29 Thread Dr. David Alan Gilbert

* Dr. David Alan Gilbert (dgilb...@redhat.com) wrote:
> * Juan Quintela (quint...@redhat.com) wrote:
> > "Dr. David Alan Gilbert"  wrote:
> > > * Juan Quintela (quint...@redhat.com) wrote:
> > >> We need to do things at load time and at cleanup time.
> > >> 
> > >> Signed-off-by: Juan Quintela 
> > >> 
> > >> --
> > >> 
> > >> Move the printing of the error message so we can print the device
> > >> giving the error.
> > >> Add call to postcopy stuff
> > >> ---
> > >>  include/migration/register.h |  2 ++
> > >>  migration/savevm.c   | 45 
> > >> +++-
> > >>  migration/savevm.h   |  1 +
> > >>  migration/trace-events   |  2 ++
> > >>  4 files changed, 49 insertions(+), 1 deletion(-)
> > >> 
> > >> diff --git a/include/migration/register.h b/include/migration/register.h
> > >> index 938ea2b..a0f1edd 100644
> > >> --- a/include/migration/register.h
> > >> +++ b/include/migration/register.h
> > >> @@ -39,6 +39,8 @@ typedef struct SaveVMHandlers {
> > >>uint64_t *non_postcopiable_pending,
> > >>uint64_t *postcopiable_pending);
> > >>  LoadStateHandler *load_state;
> > >> +int (*load_setup)(QEMUFile *f, void *opaque);
> > >> +int (*load_cleanup)(void *opaque);
> > >>  } SaveVMHandlers;
> > >>  
> > >>  int register_savevm_live(DeviceState *dev,
> > >> diff --git a/migration/savevm.c b/migration/savevm.c
> > >> index fee11c5..fdd15fa 100644
> > >> --- a/migration/savevm.c
> > >> +++ b/migration/savevm.c
> > >> @@ -1541,7 +1541,7 @@ static void *postcopy_ram_listen_thread(void *opaque)
> > >>   * got a bad migration state).
> > >>   */
> > >>  migration_incoming_state_destroy();
> > >> -
> > >> +qemu_loadvm_state_cleanup();
> > >
> > > Is that order right? It seems wrong to call the cleanup
> > > code after MIS is destroyed.
> > > (The precopy path seems to call mis_destroy at the end of
> > > process_incoming_migration_bh which is much later).
> > 
> > we can do it either way; for now it doesn't matter.
> > 
> > Once there, it got me thinking that we are doing things in a very
> > "interesting" way on the incoming side:
> > 
> > (postcopy)
> > 
> > postcopy_ram_incoming_cleanup()
> > migration_incoming_state_destroy()
> > qemu_loadvm_state_cleanup()
> > 
> > (Ok, probably it is better to exchange the last two).
> > 
> > But I *think* that we should move the postcopy_ram_incoming_cleanup()
> > inside ram_load_cleanup(), no?
> 
> postcopy_ram_incoming_cleanup shuts down a thread that's shared across
> all RAMBlock's, so I don't think it can all be merged into
> ram_load_cleanup.   You might be able to do the equivalent of the
> cleanup_range function.

Actually that's wrong, we only call ram_load_cleanup once - because
RAM is special and is only register_savevm_live once, not per device.

So yes you probably can do that.

Dave

> > And we don't have a postcopy_ram_incoming_setup(). We could put there the
> > mmap of mis->postcopy_tmp_zero_page and mis->largest_page_size, no?
> 
> Again that's a single shared zero page, not per RAMBlock.
> 
> > I am trying to understand if the postcopy_ram_incoming_init() can be
> > moved soon, but I think no.
> 
> Dave
> 
> > 
> > Later, Juan.
> > 
> > 
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
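
The optional load_setup()/load_cleanup() hooks discussed in this thread can be sketched as a table of callbacks that are invoked only when present. Everything here is a simplified assumption: the `LoadHandlers` and `LoadEntry` types, `loadvm_setup()`, `loadvm_cleanup()` and the demo handlers are hypothetical, and the setup hook takes only an opaque pointer rather than the signature in the patch.

```c
#include <stddef.h>

/* Hypothetical handler table modelled on the two callbacks added by the
 * patch; both are optional. */
typedef struct LoadHandlers {
    int (*load_setup)(void *opaque);
    int (*load_cleanup)(void *opaque);
} LoadHandlers;

typedef struct LoadEntry {
    const LoadHandlers *ops;
    void *opaque;
} LoadEntry;

/* Call load_setup on every entry that provides one; stop at the first
 * error and return it. */
int loadvm_setup(LoadEntry *entries, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (entries[i].ops && entries[i].ops->load_setup) {
            int ret = entries[i].ops->load_setup(entries[i].opaque);
            if (ret < 0) {
                return ret;
            }
        }
    }
    return 0;
}

/* Call load_cleanup on every entry that provides one. */
void loadvm_cleanup(LoadEntry *entries, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (entries[i].ops && entries[i].ops->load_cleanup) {
            entries[i].ops->load_cleanup(entries[i].opaque);
        }
    }
}

/* Example handlers that track whether their state is live. */
int demo_setup(void *opaque)
{
    ++*(int *)opaque;
    return 0;
}

int demo_cleanup(void *opaque)
{
    --*(int *)opaque;
    return 0;
}
```

In terms of the ordering question raised above, a caller would run `loadvm_cleanup()` while the state that the handlers refer to still exists, and destroy that state afterwards.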

Re: [Qemu-devel] [PATCH 3/8] MAINTAINERS: update Xen entry

2017-06-29 Thread Paolo Bonzini



On 29/06/2017 03:02, Philippe Mathieu-Daudé wrote:
> moved in 56e2cd24..28b99f47 to accel/

Actually to hw/.

I can certainly queue patches 1-4 immediately, thanks.

Paolo

> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  MAINTAINERS | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 86a08c5aac..530293044b 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -323,7 +323,6 @@ M: Stefano Stabellini 
>  M: Anthony Perard 
>  L: xen-de...@lists.xenproject.org
>  S: Supported
> -F: xen-*
>  F: */xen*
>  F: hw/9pfs/xen-9p-backend.c
>  F: hw/char/xen_console.c
>

Re: [Qemu-devel] [PATCH v2 3/3] tests: Add a tester for HMP commands

2017-06-29 Thread Dr. David Alan Gilbert

* Eric Blake (ebl...@redhat.com) wrote:
> On 03/30/2017 02:50 AM, Thomas Huth wrote:
> > HMP commands do not get any automatic testing yet, so on certain
> > QEMU machines, some HMP commands were causing crashes in the past.
> > Thus we should test HMP commands in our test suite, too, to avoid
> > that such problems creep in again in the future.
> > 
> > Signed-off-by: Thomas Huth 
> > ---
> 
> > +static const char *hmp_cmds[] = {
> > +"boot_set ndc",
> 
> > +/* Run through the list of pre-defined commands */
> > +static void test_commands(void)
> > +{
> > +char *response;
> > +int i;
> > +
> > +for (i = 0; hmp_cmds[i] != NULL; i++) {
> > +if (verbose) {
> > +fprintf(stderr, "\t%s\n", hmp_cmds[i]);
> > +}
> > +response = hmp(hmp_cmds[i]);
> 
> I failed to notice this sooner, but hmp() is passing its first arg as a
> format string through a printf family.  If hmp_cmds[i] ever gets
> modified to include something with a %, it will misbehave.  Better is to
> use hmp("%s", variable).
> 
> I've patched it locally as part of rebasing my work on avoiding dynamic
> JSON format strings, if no one beats me to a fix (my series also adds
> the gcc format attribute tag, so that the compiler catches any further
> mismatches in hmp() format vs. arguments).

Ah yes, good spot.  Please include that fix.

Dave

> > +while (*info) {
> > +/* Extract the info command, ignore parameters and description */
> > +g_assert(strncmp(info, "info ", 5) == 0);
> > +endp = strchr(&info[5], ' ');
> > +g_assert(endp != NULL);
> > +*endp = '\0';
> > +/* Now run the info command */
> > +if (verbose) {
> > +fprintf(stderr, "\t%s\n", info);
> > +}
> > +resp = hmp(info);
> 
> Another instance.
> 
> -- 
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.   +1-919-301-3266
> Virtualization:  qemu.org | libvirt.org
> 



--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
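
The format-string issue pointed out above can be reproduced with a small standalone sketch. `run_cmd()` and `run_literal_cmd()` are hypothetical stand-ins for a printf-style helper such as hmp(); they are not the test library's implementation. The `format` attribute is a GCC/Clang extension that makes the compiler check the variadic arguments against the format string.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical printf-style helper, standing in for something like hmp().
 * The format attribute (GCC/Clang) makes the compiler check that the
 * variadic arguments match the format string. */
__attribute__((format(printf, 3, 4)))
int run_cmd(char *out, size_t outlen, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(out, outlen, fmt, ap);
    va_end(ap);
    return n;
}

/* Safe way to forward an arbitrary command string: the command is data,
 * never the format, so a '%' inside it is copied literally. */
int run_literal_cmd(char *out, size_t outlen, const char *cmd)
{
    return run_cmd(out, outlen, "%s", cmd);
}
```

Passing the command directly as the format, as in `run_cmd(buf, len, cmd)`, would interpret any `%` in the command as a conversion specifier; routing it through `"%s"` avoids that.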

Re: [Qemu-devel] [PATCH v2 4/7] qdev: Introduce DEFINE_PROP_LINK

2017-06-29 Thread Paolo Bonzini

On 29/06/2017 10:04, Fam Zheng wrote:
> +#define DEFINE_PROP_LINK(_name, _state, _field, _type, _check, _flags) {\
> +.name = (_name),\
> +.info = &(qdev_prop_link),  \
> +.offset = offsetof(_state, _field)  \
> ++ type_check(Object *, typeof_field(_state, _field)),   \
> +.link.check = _check,   \
> +.link.flags = _flags,   \
> +.link_type  = _type,\
> +}
> +

Still unsure about _check; qdev_prop_allow_set_link_before_realize is
mimicking the same behavior of any other qdev property, so it should be
always okay for DEFINE_PROP_LINK.

Paolo
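
One way to picture the simplification suggested here is a property macro that takes no check argument and fills in a default policy. The sketch below is an assumption-laden illustration: `LinkProp`, `PROP_LINK`, `default_link_check()` and `DemoState` are hypothetical names, and the default shown simply permits every link rather than reproducing the behavior of qdev_prop_allow_set_link_before_realize.

```c
#include <stddef.h>

typedef struct Obj Obj;   /* hypothetical opaque object type */
typedef int (*LinkCheck)(const Obj *owner, const Obj *val);

typedef struct LinkProp {
    const char *name;
    size_t offset;        /* where the Obj * field lives in the state */
    LinkCheck check;
    const char *link_type;
} LinkProp;

/* Hypothetical default policy that always permits the link. */
int default_link_check(const Obj *owner, const Obj *val)
{
    (void)owner;
    (void)val;
    return 0;
}

/* Common case: no _check argument, the default is filled in. */
#define PROP_LINK(_name, _state, _field, _type) { \
    .name = (_name),                              \
    .offset = offsetof(_state, _field),           \
    .check = default_link_check,                  \
    .link_type = (_type),                         \
}

typedef struct DemoState {
    int id;
    Obj *backend;
} DemoState;

const LinkProp demo_props[] = {
    PROP_LINK("backend", DemoState, backend, "demo-backend"),
};
```

Each property definition then needs only the name, state type, field and link type, which is the boilerplate saving discussed in the thread.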

Re: [Qemu-devel] [PATCH v2 07/23] hyperv: ensure VP index equal to QEMU cpu_index

2017-06-29 Thread Igor Mammedov

On Thu, 29 Jun 2017 12:53:27 +0300
Roman Kagan  wrote:

> On Wed, Jun 28, 2017 at 04:47:43PM +0200, Igor Mammedov wrote:
> > On Wed, 21 Jun 2017 19:24:08 +0300
> > Roman Kagan  wrote:
> >   
> > > Hyper-V identifies vCPUs by Virtual Processor (VP) index which can be
> > > queried by the guest via HV_X64_MSR_VP_INDEX msr.  It is defined by the
> > > spec as a sequential number which can't exceed the maximum number of
> > > vCPUs per VM.
> > > 
> > > It has to be owned by QEMU in order to preserve it across migration.
> > > 
> > > > However, the initial implementation in KVM didn't allow setting this
> > > msr, and KVM used its own notion of VP index.  Fortunately, the way
> > > vCPUs are created in QEMU/KVM makes it likely that the KVM value is
> > > equal to QEMU cpu_index.
> > > 
> > > So choose cpu_index as the value for vp_index, and push that to KVM on
> > > kernels that support setting the msr.  On older ones that don't, query
> > > the kernel value and assert that it's in sync with QEMU.
> > > 
> > > Besides, since handling errors from vCPU init at hotplug time is
> > > impossible, disable vCPU hotplug.  
> > > The proper place to check whether the cpu may be created is
> > > pc_cpu_pre_plug(), where you can gracefully abort the cpu creation process.
> 
> Thanks for the suggestion, I'll rework it this way.
> 
> > > Also, it's possible to create cold-plugged CPUs out of order
> > > using
> > >  -device cpu-foo on the CLI.
> > > Will the hyperv kvm/guest side be OK with it?
> 
> On kernels that support setting HV_X64_MSR_VP_INDEX QEMU will
> synchronize all sides.  On kernels that don't, if out-of-order creation
> results in vp_index mismatch between the kernel and QEMU, vcpu creation
> will fail.

An additional question:
what would happen if a VM is started on a host that supports VP index setting
and then migrated to a host without it?

> 
> > > This patch also introduces accessor functions to wrap the mapping
> > > between a vCPU and its vp_index.  Besides, a few variables are renamed
> > > to avoid confusion of vp_index with vcpu_id (== apic_id).
> > > 
> > > Signed-off-by: Roman Kagan 
> > > ---
> > > v1 -> v2:
> > >  - were patches 5, 6 in v1
> > >  - move vp_index initialization to hyperv_init_vcpu
> > >  - check capability before trying to set the msr
> > >  - set the msr on the usual kvm_put_msrs path
> > >  - disable cpu hotplug if msr is not settable
> > > 
> > >  target/i386/hyperv.h |  5 -
> > >  target/i386/hyperv.c | 16 +---
> > >  target/i386/kvm.c| 51 
> > > +++
> > >  3 files changed, 68 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/target/i386/hyperv.h b/target/i386/hyperv.h
> > > index 0c3b562..82f4757 100644
> > > --- a/target/i386/hyperv.h
> > > +++ b/target/i386/hyperv.h
> > > @@ -32,11 +32,14 @@ struct HvSintRoute {
> > >  
> > >  int kvm_hv_handle_exit(X86CPU *cpu, struct kvm_hyperv_exit *exit);
> > >  
> > > -HvSintRoute *kvm_hv_sint_route_create(uint32_t vcpu_id, uint32_t sint,
> > > +HvSintRoute *kvm_hv_sint_route_create(uint32_t vp_index, uint32_t sint,
> > >HvSintAckClb sint_ack_clb);
> > >  
> > >  void kvm_hv_sint_route_destroy(HvSintRoute *sint_route);
> > >  
> > >  int kvm_hv_sint_route_set_sint(HvSintRoute *sint_route);
> > >  
> > > +uint32_t hyperv_vp_index(X86CPU *cpu);
> > > +X86CPU *hyperv_find_vcpu(uint32_t vp_index);
> > > +
> > >  #endif
> > > diff --git a/target/i386/hyperv.c b/target/i386/hyperv.c
> > > index 227185c..4f57447 100644
> > > --- a/target/i386/hyperv.c
> > > +++ b/target/i386/hyperv.c
> > > @@ -16,6 +16,16 @@
> > >  #include "hyperv.h"
> > >  #include "hyperv_proto.h"
> > >  
> > > +uint32_t hyperv_vp_index(X86CPU *cpu)
> > > +{
> > > +return CPU(cpu)->cpu_index;
> > > +}  
> > 
> >   
> > > +X86CPU *hyperv_find_vcpu(uint32_t vp_index)
> > > +{
> > > +return X86_CPU(qemu_get_cpu(vp_index));
> > > +}  
> > this helper isn't used in this patch, add it in the patch that would 
> > actually use it  
> 
> I thought I would put the only two functions that encapsulate the
> knowledge of how vp_index is related to cpu_index, in a single patch.
> 
> I'm now thinking of open-coding the iteration over cpus here and
> directly looking for the cpu whose hyperv_vp_index() matches.  Then that
> knowledge will become encapsulated in a single place, and indeed, this
> helper can go into another patch where it's used.
> 
> > also if  qemu_get_cpu() were called from each CPU init,
> > it would incur O(N^2) complexity, could you do without it?  
> 
> It isn't called on hot paths (ATM it's called only when SINT routes are
> created, which is at most one per cpu).  I don't see a problem here.
For what/where do you need this lookup?

> 
> > > @@ -105,7 +115,7 @@ HvSintRoute *kvm_hv_sint_route_create(uint32_t 
> > > vcpu_id, uint32_t sint,
> > >  }
> > >  sint_route->gsi = gsi;
> > >  sint_route->sint_ack_clb = sint_ack_clb;
> > > -sint_route->vcpu_id = vcpu_id
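
The open-coded lookup mentioned in the quoted discussion can be sketched as a linear search that compares through a single accessor. `VCpu`, `vcpu_vp_index()` and `find_vcpu_by_vp_index()` are hypothetical names used for illustration only; they are not the functions from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical miniature vCPU record; not QEMU's X86CPU/CPUState. */
typedef struct VCpu {
    int cpu_index;
    struct VCpu *next;
} VCpu;

/* The only place that knows how vp_index is derived from a vCPU. */
uint32_t vcpu_vp_index(const VCpu *cpu)
{
    return (uint32_t)cpu->cpu_index;
}

/* Linear search over the vCPU list, comparing through vcpu_vp_index()
 * rather than assuming vp_index == cpu_index at the call site. */
VCpu *find_vcpu_by_vp_index(VCpu *head, uint32_t vp_index)
{
    VCpu *cpu;

    for (cpu = head; cpu != NULL; cpu = cpu->next) {
        if (vcpu_vp_index(cpu) == vp_index) {
            return cpu;
        }
    }
    return NULL;
}
```

If the mapping between vp_index and cpu_index ever changed, only `vcpu_vp_index()` would need to be updated; the search is linear, which matches the observation that it is not called on a hot path.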

Re: [Qemu-devel] [PATCH 3/8] MAINTAINERS: update Xen entry

2017-06-29 Thread Philippe Mathieu-Daudé

On Thu, Jun 29, 2017 at 7:39 AM, Anthony PERARD
 wrote:
> On Wed, Jun 28, 2017 at 10:02:55PM -0300, Philippe Mathieu-Daudé wrote:
>> moved in 56e2cd24..28b99f47 to accel/
>
> That is not accurate, files have been moved to hw/i386/xen/ as written
> in both commit messages.

Oops, fortunately you noticed! I copied the commit ranges from patch 1
and forgot to update the paths, which are actually hw/xen and
hw/i386/xen.

> Besides that:
> Acked-by: Anthony PERARD 

Thank you.

[Qemu-devel] [PATCH 0/3] Qemu: Add Xen vIOMMU interrupt remapping function support

2017-06-29 Thread Lan Tianyu

This patchset deals with MSI interrupt remapping requests when the guest
updates MSI registers.

Chao Gao (3):
  i386/msi: Correct mask of destination ID in MSI address
  xen-pt: bind/unbind interrupt remapping format MSI
  msi: Handle remappable format interrupt request

 configure | 54 +++
 hw/pci/msi.c  |  5 ++--
 hw/pci/msix.c |  4 +++-
 hw/xen/xen_pt_msi.c   | 52 ++---
 include/hw/i386/apic-msidef.h |  3 ++-
 include/hw/xen/xen.h  |  2 +-
 include/hw/xen/xen_common.h   | 25 
 xen-hvm-stub.c|  2 +-
 xen-hvm.c |  8 ++-
 9 files changed, 134 insertions(+), 21 deletions(-)

-- 
1.8.3.1
