Re: [Qemu-devel] [PULL 23/44] target/ppc: Use vector variable shifts for VSL, VSR, VSRA

2019-06-11 Thread Laurent Vivier
On 11/06/2019 04:43, David Gibson wrote:
> On Fri, Jun 07, 2019 at 09:28:49AM -0500, Richard Henderson wrote:
>> On 6/7/19 9:09 AM, Laurent Vivier wrote:
>>> On 07/06/2019 11:29, Laurent Vivier wrote:
 On 29/05/2019 08:49, David Gibson wrote:
> From: Richard Henderson 
>
> The gvec expanders take care of masking the shift amount
> against the element width.
>
> Signed-off-by: Richard Henderson 
> Message-Id: <20190518191430.21686-2-richard.hender...@linaro.org>
> Signed-off-by: David Gibson 
> ---
>  target/ppc/helper.h | 12 --
>  target/ppc/int_helper.c | 37 -
>  target/ppc/translate/vmx-impl.inc.c | 24 +--
>  3 files changed, 12 insertions(+), 61 deletions(-)

 This patch introduces a regression with a Fedora 29 guest:

 - during kernel boot:

 [   40.397876] crypto_register_alg 'aes' = 0
 [   40.577517] crypto_register_alg 'cbc(aes)' = 0
 [   40.743576] crypto_register_alg 'ctr(aes)' = 0
 [   41.061379] alg: skcipher: Test 1 failed (invalid result) on encryption for p8_aes_xts
 [   41.062054] : 91 7c f6 9e bd 68 b2 ec 9b 9f e9 a3 ea dd a6 92
 [   41.062163] 0010: 98 10 35 57 5e dc 36 1e 9a f7 bc ba 39 f2 5c eb
 [   41.062834] crypto_register_alg 'xts(aes)' = 0
 [   41.077358] alg: hash: Test 2 failed for p8_ghash
 [   41.077553] : 5f 89 ab f7 20 57 20 57 20 57 20 57 20 57 20 57

 - with libssl:

 # curl -o /dev/null https://www.google.com
   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                  Dload  Upload   Total   Spent    Left  Speed
   0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
 curl: (35) error:1408F119:SSL routines:ssl3_get_record:decryption failed or bad record mac

 [before, this one fails with:
 curl: (35) error:04091068:rsa routines:int_rsa_verify:bad signature ]

 If I revert this patch on top of 0d74f3b427 + "target/ppc: Fix lxvw4x, lxvh8x and lxvb16x", everything works fine.

 Thanks,
 Laurent

>>>
>>> This seems to fix the problem:
>>>
>>> diff --git a/accel/tcg/tcg-runtime-gvec.c b/accel/tcg/tcg-runtime-gvec.c
>>> index 3b6052fe97..6f0709b307 100644
>>> --- a/accel/tcg/tcg-runtime-gvec.c
>>> +++ b/accel/tcg/tcg-runtime-gvec.c
>>> @@ -874,7 +874,7 @@ void HELPER(gvec_sar8v)(void *d, void *a, void *b, uint32_t desc)
>>>      intptr_t oprsz = simd_oprsz(desc);
>>>      intptr_t i;
>>>
>>> -    for (i = 0; i < oprsz; i += sizeof(vec8)) {
>>> +    for (i = 0; i < oprsz; i += sizeof(uint8_t)) {
>>>          uint8_t sh = *(uint8_t *)(b + i) & 7;
>>>          *(int8_t *)(d + i) = *(int8_t *)(a + i) >> sh;
>>>      }
>>
>> Grr.  I really really need to come up with a solution for testing that allows
>> me to test paths that the host cpu would not ordinarily take.  This bug is
>> hidden on a host with AVX2.
>>
>> Thanks for the digging.
> 
> Can one of you send this fix formally with a S-o-b and so forth?

I'm going to send it.

Thanks,
Laurent




Re: [Qemu-devel] PCI(e): Documentation "io-reserve" and related properties?

2019-06-11 Thread Kashyap Chamarthy
On Thu, Jun 06, 2019 at 02:20:18PM -0400, Michael S. Tsirkin wrote:
> On Thu, Jun 06, 2019 at 06:19:43PM +0200, Kashyap Chamarthy wrote:
> > Hi folks,
> > 
> > Today I learnt about some obscure PCIe-related properties, in the context of
> > adding PCIe root ports to a guest, namely:
> > 
> > io-reserve
> > mem-reserve
> > bus-reserve
> > pref32-reserve
> > pref64-reserve
> > 
> > Unfortunately, the commit[*] that added them provided no documentation
> > whatsoever.
> > 
> > In my scenario, I was specifically wondering what "io-reserve" means,
> > in what context to use it, etc.  (But documentation about the other
> > properties is also welcome.)
> > 
> > Anyone more well-versed in this area care to shed some light?
> > 
> > 
> > [*] 6755e618d0 (hw/pci: add PCI resource reserve capability to legacy
> > PCI bridge, 2018-08-21)
> 
> So normally the BIOS would reserve just enough I/O space to satisfy all
> devices behind a bridge. What if you intend to hotplug more devices?
> These properties allow you to ask the BIOS to reserve extra space.

Thanks.  It would be useful to have them documented in the official QEMU
command-line documentation.  Otherwise, they will remain arcane
properties that barely anyone knows about.

-- 
/kashyap



Re: [Qemu-devel] [RFC] vhost-user: don't ignore CTRL_VLAN feature

2019-06-11 Thread Jason Wang



On 2019/6/11 2:51 PM, Tiwei Bie wrote:

The VIRTIO_NET_F_CTRL_VLAN feature requires support from the
vhost-user backend. But it is advertised to the guest driver
whenever users enable it in QEMU, even when the vhost-user
backend does not support it. This patch fixes this issue.

Fixes: 72018d1e1917 ("vhost-user: ignore qemu-only features")



My understanding is that we may want to revert this patch.



Cc: qemu-sta...@nongnu.org

Signed-off-by: Tiwei Bie 
---
It's not clear from the spec whether VLAN filtering is
also best-effort:
https://github.com/oasis-tcs/virtio-spec/blob/37057052e7/content.tex#L3372



It should be a bug in the code; we should consider implementing the ctrl
command for vhost-user.


Thanks




  hw/net/vhost_net.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index a6b719035c..1444fc9230 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -75,6 +75,8 @@ static const int user_feature_bits[] = {
  VIRTIO_NET_F_MTU,
  VIRTIO_F_IOMMU_PLATFORM,
  
+    VIRTIO_NET_F_CTRL_VLAN,
+
  /* This bit implies RARP isn't sent by QEMU out of band */
  VIRTIO_NET_F_GUEST_ANNOUNCE,
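
A minimal sketch of why listing the bit matters, assuming features are filtered through a per-backend bit list in roughly this way. The helper name mask_backend_features and the FEATURE_LIST_END terminator are hypothetical and exist only for this illustration; the two bit numbers are taken from the virtio-net specification.

```c
#include <stdint.h>

/* Bit numbers from the virtio-net specification. */
#define VIRTIO_NET_F_MTU        3
#define VIRTIO_NET_F_CTRL_VLAN 19
/* Hypothetical list terminator used only by this sketch. */
#define FEATURE_LIST_END       (-1)

/*
 * Hypothetical helper: for every bit named in 'bits', clear it from
 * 'features' unless the backend also offers it.  Bits that are not
 * named in the list pass through untouched.
 */
static uint64_t mask_backend_features(const int *bits,
                                      uint64_t backend_features,
                                      uint64_t features)
{
    for (; *bits != FEATURE_LIST_END; bits++) {
        uint64_t mask = 1ULL << *bits;

        if (!(backend_features & mask)) {
            features &= ~mask;
        }
    }
    return features;
}
```

Under this model, a bit that is missing from the list is never checked against the backend, so it stays set for the guest even when the backend cannot honor it. Adding the bit to the list is what lets the backend's answer take effect.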
  




Re: [Qemu-devel] Sketch of a transition of QEMU docs to Sphinx

2019-06-11 Thread Markus Armbruster
Peter Maydell  writes:

> On Tue, 21 May 2019 at 19:56, Peter Maydell  wrote:
>>
>> Currently we have a vague plan that we should migrate our
>> documentation away from Texinfo to using Sphinx, plus some isolated
>> bits of documentation already in .rst format. This email is an attempt
>> to sketch out a transition plan for getting us from where we are today
>> to where (I think) we want to be.
>
> Since nobody seemed to disagree particularly with this sketch,

I don't think rST is an improvement over Texinfo.  As Paolo said, it's
the Perl of ASCII-based markups.  But I (reluctantly) agree with Paolo
that our current mix of Texinfo, rST, Markdown, and ad hoc markup is
worse than consistent use of one markup system, even if it's an
ill-conceived one like rST.

March of progress, I guess.

[...]



Re: [Qemu-devel] [PULL 23/44] target/ppc: Use vector variable shifts for VSL, VSR, VSRA

2019-06-11 Thread Laurent Vivier
On 11/06/2019 09:05, Laurent Vivier wrote:
> On 11/06/2019 04:43, David Gibson wrote:
>> On Fri, Jun 07, 2019 at 09:28:49AM -0500, Richard Henderson wrote:
>>> On 6/7/19 9:09 AM, Laurent Vivier wrote:
 On 07/06/2019 11:29, Laurent Vivier wrote:
> On 29/05/2019 08:49, David Gibson wrote:
>> From: Richard Henderson 
>>
>> The gvec expanders take care of masking the shift amount
>> against the element width.
>>
>> Signed-off-by: Richard Henderson 
>> Message-Id: <20190518191430.21686-2-richard.hender...@linaro.org>
>> Signed-off-by: David Gibson 
>> ---
>>  target/ppc/helper.h | 12 --
>>  target/ppc/int_helper.c | 37 -
>>  target/ppc/translate/vmx-impl.inc.c | 24 +--
>>  3 files changed, 12 insertions(+), 61 deletions(-)
>
> This patch introduces a regression with a Fedora 29 guest:
>
> - during kernel boot:
>
> [   40.397876] crypto_register_alg 'aes' = 0
> [   40.577517] crypto_register_alg 'cbc(aes)' = 0
> [   40.743576] crypto_register_alg 'ctr(aes)' = 0
> [   41.061379] alg: skcipher: Test 1 failed (invalid result) on encryption for p8_aes_xts
> [   41.062054] : 91 7c f6 9e bd 68 b2 ec 9b 9f e9 a3 ea dd a6 92
> [   41.062163] 0010: 98 10 35 57 5e dc 36 1e 9a f7 bc ba 39 f2 5c eb
> [   41.062834] crypto_register_alg 'xts(aes)' = 0
> [   41.077358] alg: hash: Test 2 failed for p8_ghash
> [   41.077553] : 5f 89 ab f7 20 57 20 57 20 57 20 57 20 57 20 57
>
> - with libssl:
>
> # curl -o /dev/null https://www.google.com
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
>   0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
> curl: (35) error:1408F119:SSL routines:ssl3_get_record:decryption failed or bad record mac
>
> [before, this one fails with:
> curl: (35) error:04091068:rsa routines:int_rsa_verify:bad signature ]
>
> If I revert this patch on top of 0d74f3b427 + "target/ppc: Fix lxvw4x, 
> lxvh8x and lxvb16x", all works fine.
>
> Thanks,
> Laurent
>

 This seems to fix the problem:

 diff --git a/accel/tcg/tcg-runtime-gvec.c b/accel/tcg/tcg-runtime-gvec.c
 index 3b6052fe97..6f0709b307 100644
 --- a/accel/tcg/tcg-runtime-gvec.c
 +++ b/accel/tcg/tcg-runtime-gvec.c
 @@ -874,7 +874,7 @@ void HELPER(gvec_sar8v)(void *d, void *a, void *b, uint32_t desc)
      intptr_t oprsz = simd_oprsz(desc);
      intptr_t i;

 -    for (i = 0; i < oprsz; i += sizeof(vec8)) {
 +    for (i = 0; i < oprsz; i += sizeof(uint8_t)) {
          uint8_t sh = *(uint8_t *)(b + i) & 7;
          *(int8_t *)(d + i) = *(int8_t *)(a + i) >> sh;
      }
>>>
>>> Grr.  I really really need to come up with a solution for testing that allows
>>> me to test paths that the host cpu would not ordinarily take.  This bug is
>>> hidden on a host with AVX2.
>>>
>>> Thanks for the digging.
>>
>> Can one of you send this fix formally with a S-o-b and so forth?
> 
> I'm going to send it.

Richard already sent it:

  [PATCH] tcg: Fix typos in helper_gvec_sar{8,32,64}v
  <20190607183016.8285-1-richard.hender...@linaro.org>

Thanks,
Laurent



Re: [Qemu-devel] [QUESTION] How to reduce network latency to improve netperf TCP_RR drastically?

2019-06-11 Thread Jason Wang



On 2019/6/10 11:55 PM, Michael S. Tsirkin wrote:

On Tue, Jun 04, 2019 at 03:10:43PM +0800, Like Xu wrote:

Hi Michael,

At https://www.linux-kvm.org/page/NetworkingTodo, there is an entry for
network latency saying:

---
reduce networking latency:
  allow handling short packets from softirq or VCPU context
  Plan:
We are going through the scheduler 3 times
(could be up to 5 if softirqd is involved)
Consider RX: host irq -> io thread -> VCPU thread ->
guest irq -> guest thread.
This adds a lot of latency.
We can cut it by some 1.5x if we do a bit of work
either in the VCPU or softirq context.
  Testing: netperf TCP RR - should be improved drastically
   netperf TCP STREAM guest to host - no regression
  Contact: MST
---

I am trying to contribute to improving netperf TCP_RR.
Could you please share more ideas, plans, or implementation details to make it
happen?

Thanks,
Like Xu


So some of this did happen. netif_receive_skb is now called
directly from tun_get_user.

Question is about the rx/tun_put_user path now.

If the vhost thread is idle and there's a single packet
outstanding, then maybe we can forward the packet to userspace
directly from BH without waking up the thread.



After the batch dequeue, it's pretty hard to determine from tun itself
whether or not any packet is outstanding.





For this to work we need to map some userspace memory into the kernel
ahead of time. For example, maybe it can happen when
guest adds RX buffers? Copying Jason who's looking into
memory mapping matters.



We would need to go over the rx queue, pin the pages, and then use MMU
notifiers to unpin them if necessary.  We would also need to consider a
way to work with batch dequeue.


Thanks




Re: [Qemu-devel] [PATCH v5 02/12] qapi/block-core: add option for io_uring

2019-06-11 Thread Fam Zheng
On Mon, 06/10 19:18, Aarushi Mehta wrote:
> Option only enumerates for hosts that support it.
> 
> Signed-off-by: Aarushi Mehta 
> Reviewed-by: Stefan Hajnoczi 
> ---
>  qapi/block-core.json | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 1defcde048..db7eedd058 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -2792,11 +2792,13 @@
>  #
>  # @threads: Use qemu's thread pool
>  # @native:  Use native AIO backend (only Linux and Windows)
> +# @io_uring:Use linux io_uring (since 4.1)
>  #
>  # Since: 2.9
>  ##
>  { 'enum': 'BlockdevAioOptions',
> -  'data': [ 'threads', 'native' ] }
> +  'data': [ 'threads', 'native',
> +            { 'name': 'io_uring', 'if': 'defined(CONFIG_LINUX_IO_URING)' } ] }

Question: 'native' has a dependency on libaio but it doesn't have the
condition.  Is the inconsistency intended?

>  
>  ##
>  # @BlockdevCacheOptions:
> -- 
> 2.17.1
> 




Re: [Qemu-devel] [PATCH] tcg: Fix typos in helper_gvec_sar{8, 32, 64}v

2019-06-11 Thread Laurent Vivier
On 07/06/2019 20:30, Richard Henderson wrote:
> The loop is written with scalars, not vectors.
> Use the correct type when incrementing.
> 
> Fixes: 5ee5c14cacd
> Reported-by: Laurent Vivier 
> Signed-off-by: Richard Henderson 
> ---
>  accel/tcg/tcg-runtime-gvec.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/accel/tcg/tcg-runtime-gvec.c b/accel/tcg/tcg-runtime-gvec.c
> index 3b6052fe97..51cb29ca79 100644
> --- a/accel/tcg/tcg-runtime-gvec.c
> +++ b/accel/tcg/tcg-runtime-gvec.c
> @@ -874,7 +874,7 @@ void HELPER(gvec_sar8v)(void *d, void *a, void *b, uint32_t desc)
>      intptr_t oprsz = simd_oprsz(desc);
>      intptr_t i;
>  
> -    for (i = 0; i < oprsz; i += sizeof(vec8)) {
> +    for (i = 0; i < oprsz; i += sizeof(int8_t)) {
>          uint8_t sh = *(uint8_t *)(b + i) & 7;
>          *(int8_t *)(d + i) = *(int8_t *)(a + i) >> sh;
>      }
> @@ -898,7 +898,7 @@ void HELPER(gvec_sar32v)(void *d, void *a, void *b, uint32_t desc)
>      intptr_t oprsz = simd_oprsz(desc);
>      intptr_t i;
>  
> -    for (i = 0; i < oprsz; i += sizeof(vec32)) {
> +    for (i = 0; i < oprsz; i += sizeof(int32_t)) {
>          uint8_t sh = *(uint32_t *)(b + i) & 31;
>          *(int32_t *)(d + i) = *(int32_t *)(a + i) >> sh;
>      }
> @@ -910,7 +910,7 @@ void HELPER(gvec_sar64v)(void *d, void *a, void *b, uint32_t desc)
>      intptr_t oprsz = simd_oprsz(desc);
>      intptr_t i;
>  
> -    for (i = 0; i < oprsz; i += sizeof(vec64)) {
> +    for (i = 0; i < oprsz; i += sizeof(int64_t)) {
>          uint8_t sh = *(uint64_t *)(b + i) & 63;
>          *(int64_t *)(d + i) = *(int64_t *)(a + i) >> sh;
>      }
> 

Tested-by: Laurent Vivier 
Reviewed-by: Laurent Vivier 
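
To make the failure mode concrete, here is a minimal stand-alone sketch; it is not QEMU's helper. The function sar_bytes and its stride parameter are hypothetical names introduced for illustration, and the 16-byte stride assumes that the vector type used in the buggy loop is 16 bytes wide.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of a scalar fallback loop; not QEMU's helper.
 * 'stride' is the loop increment in bytes, exposed only so that the
 * correct and the buggy increment can be compared.
 * Right-shifting a negative value is implementation-defined in C;
 * GCC and Clang perform an arithmetic shift.
 */
static void sar_bytes(int8_t *d, const int8_t *a, const uint8_t *b,
                      size_t oprsz, size_t stride)
{
    for (size_t i = 0; i < oprsz; i += stride) {
        uint8_t sh = b[i] & 7;      /* mask the count to the element width */

        d[i] = (int8_t)(a[i] >> sh);
    }
}
```

Called with stride = sizeof(int8_t), all 16 lanes are written. Called with stride = 16, only lane 0 is written and lanes 1 to 15 keep their previous contents, which would be consistent with the wrong crypto results reported in the thread.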



Re: [Qemu-devel] [PATCH v2] q35: fix mmconfig and PCI0._CRS

2019-06-11 Thread Marcel Apfelbaum




On 6/7/19 10:34 AM, Gerd Hoffmann wrote:

This patch changes the handling of the mmconfig area.  Thanks to the
pci(e) expander devices we already have the logic to exclude address
ranges from PCI0._CRS.  We can simply add the mmconfig address range
to the list to get it excluded as well.

With that in place we can go with a fixed pci hole which covers the
whole area from the end of (low) ram to the ioapic.

This will make the whole logic a lot less fragile.  No matter where the
firmware places the mmconfig xbar, things should work correctly.  The
guest also gets a bit more PCI address space (seabios boot):

 # cat /proc/iomem
 [ ... ]
 7ffdd000-7fff : reserved
 8000-afff : PCI Bus :00<<-- this is new
 b000-bfff : PCI MMCONFIG  [bus 00-ff]
   b000-bfff : reserved
 c000-febf : PCI Bus :00
   f800-fbff : :00:01.0
 [ ... ]

So this is a guest visible change.
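
The exclusion idea can be sketched as follows. This is a simplified model under stated assumptions, not the code in acpi-build.c: the Range type and the free_ranges helper are hypothetical, the excluded ranges are assumed to be sorted, non-overlapping and inside the hole, and all limits are assumed to be below UINT64_MAX.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical range type with inclusive bounds. */
typedef struct {
    uint64_t base;
    uint64_t limit;
} Range;

/*
 * Hypothetical helper: write to 'out' the parts of 'hole' that are not
 * covered by any entry of 'excl' and return how many were written.
 * 'out' must have room for n + 1 entries.
 */
static size_t free_ranges(Range hole, const Range *excl, size_t n, Range *out)
{
    size_t count = 0;
    uint64_t start = hole.base;

    for (size_t i = 0; i < n; i++) {
        if (excl[i].base > start) {
            out[count++] = (Range){ start, excl[i].base - 1 };
        }
        start = excl[i].limit + 1;  /* continue after the excluded range */
    }
    if (start <= hole.limit) {
        out[count++] = (Range){ start, hole.limit };
    }
    return count;
}
```

With an illustrative fixed hole of 0x80000000..0xfebfffff and a 256 MiB mmconfig window assumed at 0xb0000000, the sketch yields two windows, one below and one above the mmconfig area, wherever inside the hole the firmware places it.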

Cc: László Érsek 
Signed-off-by: Gerd Hoffmann 
Reviewed-by: Igor Mammedov 
---
  tests/bios-tables-test-allowed-diff.h |  8 +++
  hw/i386/acpi-build.c  | 14 
  hw/pci-host/q35.c | 31 +++
  3 files changed, 30 insertions(+), 23 deletions(-)

diff --git a/tests/bios-tables-test-allowed-diff.h b/tests/bios-tables-test-allowed-diff.h
index dfb8523c8bf4..3bbd22c62a3b 100644
--- a/tests/bios-tables-test-allowed-diff.h
+++ b/tests/bios-tables-test-allowed-diff.h
@@ -1 +1,9 @@
  /* List of comma-separated changed AML files to ignore */
+"tests/data/acpi/q35/DSDT",
+"tests/data/acpi/q35/DSDT.bridge",
+"tests/data/acpi/q35/DSDT.mmio64",
+"tests/data/acpi/q35/DSDT.ipmibt",
+"tests/data/acpi/q35/DSDT.cphp",
+"tests/data/acpi/q35/DSDT.memhp",
+"tests/data/acpi/q35/DSDT.numamem",
+"tests/data/acpi/q35/DSDT.dimmpxm",
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 85dc1640bc67..8e4f26977619 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -122,6 +122,8 @@ typedef struct FwCfgTPMConfig {
  uint8_t tpmppi_version;
  } QEMU_PACKED FwCfgTPMConfig;
  
+static bool acpi_get_mcfg(AcpiMcfgInfo *mcfg);
+
  static void init_common_fadt_data(Object *o, AcpiFadtData *data)
  {
  uint32_t io = object_property_get_uint(o, ACPI_PM_PROP_PM_IO_BASE, NULL);
@@ -1807,6 +1809,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
  CrsRangeSet crs_range_set;
  PCMachineState *pcms = PC_MACHINE(machine);
  PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(machine);
+AcpiMcfgInfo mcfg;
  uint32_t nr_mem = machine->ram_slots;
  int root_bus_limit = 0xFF;
  PCIBus *bus = NULL;
@@ -1921,6 +1924,17 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
  }
  }
  
+    /*
+     * At this point crs_range_set has all the ranges used by pci
+     * busses *other* than PCI0.  These ranges will be excluded from
+     * the PCI0._CRS.  Add mmconfig to the set so it will be excluded
+     * too.
+     */
+    if (acpi_get_mcfg(&mcfg)) {
+        crs_range_insert(crs_range_set.mem_ranges,
+                         mcfg.base, mcfg.base + mcfg.size - 1);
+    }
+
  scope = aml_scope("\\_SB.PCI0");
  /* build PCI0._CRS */
  crs = aml_resource_template();
diff --git a/hw/pci-host/q35.c b/hw/pci-host/q35.c
index 960939f5ed3e..72093320befe 100644
--- a/hw/pci-host/q35.c
+++ b/hw/pci-host/q35.c
@@ -258,15 +258,6 @@ static void q35_host_initfn(Object *obj)
  object_property_add_link(obj, MCH_HOST_PROP_IO_MEM, TYPE_MEMORY_REGION,
   (Object **) &s->mch.address_space_io,
   qdev_prop_allow_set_link_before_realize, 0, 
NULL);
-
-/* Leave enough space for the biggest MCFG BAR */
-/* TODO: this matches current bios behaviour, but
- * it's not a power of two, which means an MTRR
- * can't cover it exactly.
- */
-range_set_bounds(&s->mch.pci_hole,
-MCH_HOST_BRIDGE_PCIEXBAR_DEFAULT + MCH_HOST_BRIDGE_PCIEXBAR_MAX,
-IO_APIC_DEFAULT_ADDRESS - 1);
  }
  
  static const TypeInfo q35_host_info = {

@@ -338,20 +329,6 @@ static void mch_update_pciexbar(MCHPCIState *mch)
  }
  addr = pciexbar & addr_mask;
  pcie_host_mmcfg_update(pehb, enable, addr, length);
-/* Leave enough space for the MCFG BAR */
-/*
- * TODO: this matches current bios behaviour, but it's not a power of two,
- * which means an MTRR can't cover it exactly.
- */
-if (enable) {
-range_set_bounds(&mch->pci_hole,
- addr + length,
- IO_APIC_DEFAULT_ADDRESS - 1);
-} else {
-range_set_bounds(&mch->pci_hole,
- MCH_HOST_BRIDGE_PCIEXBAR_DEFAULT,
- IO_APIC_DEFAULT_ADDRESS - 1);
-}
  }
  
  /* PAM */

@@ -484,6 +461,14 @@ static void mch_update(MCHPCIState *mch)
  mch_update_pam(mch);
  mch_update_smram(mch);
  mch_update_ext_tseg_mb

Re: [Qemu-devel] [PATCH] migration: remove unused field bytes_xfer

2019-06-11 Thread Wei Yang
On Tue, Apr 02, 2019 at 08:31:06AM +0800, Wei Yang wrote:
>MigrationState->bytes_xfer is only set to 0 in migrate_init().
>
>Remove this unnecessary field.
>
>Signed-off-by: Wei Yang 

Hi, David

Are you willing to pick up this one?

>---
> migration/migration.c | 1 -
> migration/migration.h | 1 -
> 2 files changed, 2 deletions(-)
>
>diff --git a/migration/migration.c b/migration/migration.c
>index dea7078bf4..c929cf8d0f 100644
>--- a/migration/migration.c
>+++ b/migration/migration.c
>@@ -1681,7 +1681,6 @@ void migrate_init(MigrationState *s)
>  * parameters/capabilities that the user set, and
>  * locks.
>  */
>-s->bytes_xfer = 0;
> s->cleanup_bh = 0;
> s->to_dst_file = NULL;
> s->rp_state.from_dst_file = NULL;
>diff --git a/migration/migration.h b/migration/migration.h
>index 852eb3c4e9..b9efbe9168 100644
>--- a/migration/migration.h
>+++ b/migration/migration.h
>@@ -116,7 +116,6 @@ struct MigrationState
> DeviceState parent_obj;
> 
> /*< public >*/
>-size_t bytes_xfer;
> QemuThread thread;
> QEMUBH *cleanup_bh;
> QEMUFile *to_dst_file;
>-- 
>2.19.1

-- 
Wei Yang
Help you, Help me



Re: [Qemu-devel] [PATCH] migration: cleanup check on ops in savevm.handlers iteration

2019-06-11 Thread Wei Yang
On Mon, Apr 01, 2019 at 02:14:57PM +0800, Wei Yang wrote:
>During migration, there are several places that iterate over
>savevm.handlers. On each iteration, we need to check its ops and
>related callbacks before invoking them.
>
>Generally, ops is the first element to check, and it is only necessary
>to check it once.
>
>This patch cleans up all the related parts in savevm.c to check ops only
>once in those iterations.
>
>Signed-off-by: Wei Yang 

Hi, David

Are you willing to pick up this one?

>---
> migration/savevm.c | 35 ++-
> 1 file changed, 14 insertions(+), 21 deletions(-)
>
>diff --git a/migration/savevm.c b/migration/savevm.c
>index 5f0ca7fac2..92af2471cd 100644
>--- a/migration/savevm.c
>+++ b/migration/savevm.c
>@@ -1096,10 +1096,9 @@ void qemu_savevm_state_setup(QEMUFile *f)
> if (!se->ops || !se->ops->save_setup) {
> continue;
> }
>-if (se->ops && se->ops->is_active) {
>-if (!se->ops->is_active(se->opaque)) {
>+if (se->ops->is_active &&
>+!se->ops->is_active(se->opaque)) {
> continue;
>-}
> }
> save_section_header(f, se, QEMU_VM_SECTION_START);
> 
>@@ -1127,10 +1126,9 @@ int qemu_savevm_state_resume_prepare(MigrationState *s)
> if (!se->ops || !se->ops->resume_prepare) {
> continue;
> }
>-if (se->ops && se->ops->is_active) {
>-if (!se->ops->is_active(se->opaque)) {
>+if (se->ops->is_active &&
>+!se->ops->is_active(se->opaque)) {
> continue;
>-}
> }
> ret = se->ops->resume_prepare(s, se->opaque);
> if (ret < 0) {
>@@ -1223,10 +1221,9 @@ void qemu_savevm_state_complete_postcopy(QEMUFile *f)
> if (!se->ops || !se->ops->save_live_complete_postcopy) {
> continue;
> }
>-if (se->ops && se->ops->is_active) {
>-if (!se->ops->is_active(se->opaque)) {
>+if (se->ops->is_active &&
>+!se->ops->is_active(se->opaque)) {
> continue;
>-}
> }
> trace_savevm_section_start(se->idstr, se->section_id);
> /* Section type */
>@@ -1265,18 +1262,16 @@ int qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only,
> cpu_synchronize_all_states();
> 
> QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
>-if (!se->ops ||
>+if (!se->ops || !se->ops->save_live_complete_precopy ||
> (in_postcopy && se->ops->has_postcopy &&
>  se->ops->has_postcopy(se->opaque)) ||
>-(in_postcopy && !iterable_only) ||
>-!se->ops->save_live_complete_precopy) {
>+(in_postcopy && !iterable_only)) {
> continue;
> }
> 
>-if (se->ops && se->ops->is_active) {
>-if (!se->ops->is_active(se->opaque)) {
>+if (se->ops->is_active &&
>+!se->ops->is_active(se->opaque)) {
> continue;
>-}
> }
> trace_savevm_section_start(se->idstr, se->section_id);
> 
>@@ -1377,10 +1372,9 @@ void qemu_savevm_state_pending(QEMUFile *f, uint64_t threshold_size,
> if (!se->ops || !se->ops->save_live_pending) {
> continue;
> }
>-if (se->ops && se->ops->is_active) {
>-if (!se->ops->is_active(se->opaque)) {
>+if (se->ops->is_active &&
>+!se->ops->is_active(se->opaque)) {
> continue;
>-}
> }
> se->ops->save_live_pending(f, se->opaque, threshold_size,
>res_precopy_only, res_compatible,
>@@ -2276,10 +2270,9 @@ static int qemu_loadvm_state_setup(QEMUFile *f)
> if (!se->ops || !se->ops->load_setup) {
> continue;
> }
>-if (se->ops && se->ops->is_active) {
>-if (!se->ops->is_active(se->opaque)) {
>+if (se->ops->is_active &&
>+!se->ops->is_active(se->opaque)) {
> continue;
>-}
> }
> 
> ret = se->ops->load_setup(f, se->opaque);
>-- 
>2.19.1

-- 
Wei Yang
Help you, Help me
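
The shape of the cleanup can be modeled in isolation. The sketch below uses trimmed-down, hypothetical stand-ins (Ops, Entry, skip_entry and the two example callbacks) rather than the real savevm structures; it only shows that once the first guard has rejected a NULL ops pointer, later checks need not test it again.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-ins for the savevm structures. */
typedef struct {
    int  (*save_setup)(void *opaque);
    bool (*is_active)(void *opaque);
} Ops;

typedef struct {
    const Ops *ops;
    void *opaque;
} Entry;

/* Example callbacks used to exercise the sketch. */
static int setup_ok(void *opaque)
{
    (void)opaque;
    return 0;
}

static bool never_active(void *opaque)
{
    (void)opaque;
    return false;
}

/* Returns true when the entry should be skipped in a save_setup pass. */
static bool skip_entry(const Entry *se)
{
    if (!se->ops || !se->ops->save_setup) {
        return true;            /* ops is checked once, here */
    }
    /* se->ops is known to be non-NULL from this point on. */
    if (se->ops->is_active && !se->ops->is_active(se->opaque)) {
        return true;
    }
    return false;
}
```

The behavior is the same as with the nested form; the only change is that the redundant `se->ops &&` test and one level of nesting are gone.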



Re: [Qemu-devel] [PATCH] migration: remove unused field bytes_xfer

2019-06-11 Thread Juan Quintela
Wei Yang  wrote:
> On Tue, Apr 02, 2019 at 08:31:06AM +0800, Wei Yang wrote:
>>MigrationState->bytes_xfer is only set to 0 in migrate_init().
>>
>>Remove this unnecessary field.
>>
>>Signed-off-by: Wei Yang 
>
> Hi, David

Hi

I am on duty this week, will get it.

>
> Are you willing to pick up this one?
>
>>---
>> migration/migration.c | 1 -
>> migration/migration.h | 1 -
>> 2 files changed, 2 deletions(-)
>>
>>diff --git a/migration/migration.c b/migration/migration.c
>>index dea7078bf4..c929cf8d0f 100644
>>--- a/migration/migration.c
>>+++ b/migration/migration.c
>>@@ -1681,7 +1681,6 @@ void migrate_init(MigrationState *s)
>>  * parameters/capabilities that the user set, and
>>  * locks.
>>  */
>>-s->bytes_xfer = 0;
>> s->cleanup_bh = 0;
>> s->to_dst_file = NULL;
>> s->rp_state.from_dst_file = NULL;
>>diff --git a/migration/migration.h b/migration/migration.h
>>index 852eb3c4e9..b9efbe9168 100644
>>--- a/migration/migration.h
>>+++ b/migration/migration.h
>>@@ -116,7 +116,6 @@ struct MigrationState
>> DeviceState parent_obj;
>> 
>> /*< public >*/
>>-size_t bytes_xfer;
>> QemuThread thread;
>> QEMUBH *cleanup_bh;
>> QEMUFile *to_dst_file;
>>-- 
>>2.19.1



Re: [Qemu-devel] [PATCH] migration: cleanup check on ops in savevm.handlers iteration

2019-06-11 Thread Juan Quintela
Wei Yang  wrote:
> On Mon, Apr 01, 2019 at 02:14:57PM +0800, Wei Yang wrote:
>>During migration, there are several places that iterate over
>>savevm.handlers. On each iteration, we need to check its ops and
>>related callbacks before invoking them.
>>
>>Generally, ops is the first element to check, and it is only necessary
>>to check it once.
>>
>>This patch cleans up all the related parts in savevm.c to check ops only
>>once in those iterations.
>>
>>Signed-off-by: Wei Yang 
>
> Hi, David
>
> Are you willing to pick up this one?

I will also pick up this one.

Later, Juan.



Re: [Qemu-devel] [PATCH v7 0/4] rng-builtin: add an RNG backend that uses qemu_guest_getrandom()

2019-06-11 Thread Laurent Vivier
Michael,

Could you pick this series in the next virtio pull request?

If you disagree with some of my patches, could you take at least the
first one (from Kashyap)?

Thanks,
Laurent

On 29/05/2019 16:31, Laurent Vivier wrote:
> Add a new RNG backend using QEMU builtin getrandom function.
> 
> v7: rebase on master
> Make rng-builtin asynchronous with QEMUBH (removed existing R-b)
> 
> v6: remove "sysemu/rng-random.h" from virtio-rng.c
> rebase on qemu_getrandom v8
> 
> v5: PATCH 1 s/linux/Linux/
> remove superfluous includes from rng-builtin.c
> don't update rng-random documentation
> add a patch from Markus to keep the default backend out of VirtIORNGConf
> move TYPE_RNG_BUILTIN to sysemu/rng.h and remove sysemu/rng-builtin.h
> 
> v4: update PATCH 1 commit message
> 
> v3: Include Kashyap's patch in the series
> Add a patch to change virtio-rng default backend to rng-builtin
> 
> v2: Update qemu-options.hx
> describe the new backend and specify virtio-rng uses the
> rng-random by default
> 
> Kashyap Chamarthy (1):
>   VirtIO-RNG: Update default entropy source to `/dev/urandom`
> 
> Laurent Vivier (2):
>   rng-builtin: add an RNG backend that uses qemu_guest_getrandom()
>   virtio-rng: change default backend to rng-builtin
> 
> Markus Armbruster (1):
>   virtio-rng: Keep the default backend out of VirtIORNGConf
> 
>  backends/Makefile.objs |  2 +-
>  backends/rng-builtin.c | 77 ++
>  backends/rng-random.c  |  2 +-
>  hw/virtio/virtio-rng.c | 19 -
>  include/hw/virtio/virtio-rng.h |  2 -
>  include/sysemu/rng.h   |  2 +
>  qemu-options.hx|  9 +++-
>  7 files changed, 97 insertions(+), 16 deletions(-)
>  create mode 100644 backends/rng-builtin.c
> 




Re: [Qemu-devel] [PATCH] hax: Honor CPUState::halted

2019-06-11 Thread Philippe Mathieu-Daudé
Cc'ing Paolo & Richard.

On 6/10/19 4:27 AM, Colin Xu wrote:
> cc more.
> 
> On 2019-06-10 10:19, Colin Xu wrote:
>> QEMU tracks whether a vcpu is halted using CPUState::halted. E.g.,
>> after initialization or reset, halted is 0 for the BSP (vcpu 0)
>> and 1 for the APs (vcpu 1, 2, ...). A halted vcpu should not be
>> handed to the hypervisor to run (e.g. hax_vcpu_run()).
>>
>> Under HAXM, Android Emulator sometimes boots into a "vcpu shutdown
>> request" error while executing in SeaBIOS, with the HAXM driver
>> logging a guest triple fault in vcpu 1, 2, ... at RIP 0x3. That is
>> ultimately because the HAX accelerator asks HAXM to run those APs
>> when they are still in the halted state.
>>
>> Normally, the vcpu thread for an AP will start by looping in
>> qemu_wait_io_event(), until the BSP kicks it via a pair of IPIs
>> (INIT followed by SIPI). But because the HAX accelerator does not
>> honor cpu->halted, it allows the AP vcpu thread to proceed to
>> hax_vcpu_run() as soon as it receives any kick, even if the kick
>> does not come from the BSP. It turns out that emulator has a
>> worker thread which periodically kicks every vcpu thread (possibly
>> to collect CPU usage data), and if one of these kicks comes before
>> those by the BSP, the AP will start execution from the wrong RIP,
>> resulting in the aforementioned SMP boot failure.
>>
>> The solution is inspired by the KVM accelerator (credit to
>> Chuanxiao Dong  for the pointer):
>>
>> 1. Get rid of questionable logic that unconditionally resets
>>     cpu->halted before hax_vcpu_run(). Instead, only reset it at the
>>     right moments (there are only a few "unhalt" events).
>> 2. Add a check for cpu->halted before hax_vcpu_run().
>>
>> Note that although the non-Unrestricted Guest (!ug_platform) code
>> path also forcibly resets cpu->halted, it is left untouched,
>> because only the UG code path supports SMP guests.
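
The two steps above can be sketched as a small state check. This is a simplified model and not the HAX accelerator code: VcpuModel and vcpu_may_run are hypothetical, the IRQ_HARD and IRQ_NMI values are placeholders rather than QEMU's definitions, and the INIT and SIPI events are left out for brevity.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder flag values for this sketch, not QEMU's definitions. */
#define IRQ_HARD  0x1u
#define IRQ_NMI   0x2u
#define IF_MASK   0x200u    /* EFLAGS.IF is bit 9 on x86 */

typedef struct {
    uint32_t interrupt_request;
    uint32_t eflags;
    bool halted;
} VcpuModel;

/* Returns true if the vcpu may be handed to the hypervisor. */
static bool vcpu_may_run(VcpuModel *cpu)
{
    bool maskable = (cpu->interrupt_request & IRQ_HARD) != 0 &&
                    (cpu->eflags & IF_MASK) != 0;
    bool nmi = (cpu->interrupt_request & IRQ_NMI) != 0;

    if (maskable || nmi) {
        cpu->halted = false;    /* an "unhalt" event arrived */
    }
    return !cpu->halted;        /* a halted vcpu must not be run */
}
```

In this model an arbitrary kick that carries no unhalt event leaves a halted AP halted, which is the property the description above asks for.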
>>
>> The patch is first merged to android emulator with Change-Id:
>> I9c5752cc737fd305d7eace1768ea12a07309d716
>>
>> Cc: Yu Ning 
>> Cc: Chuanxiao Dong 
>> Signed-off-by: Colin Xu 
>> ---
>>   cpus.c    |  1 -
>>   target/i386/hax-all.c | 36 ++--
>>   2 files changed, 34 insertions(+), 3 deletions(-)
>>
>> diff --git a/cpus.c b/cpus.c
>> index ffc57119ca5e..c1a56cd9ab01 100644
>> --- a/cpus.c
>> +++ b/cpus.c
>> @@ -1591,7 +1591,6 @@ static void *qemu_hax_cpu_thread_fn(void *arg)
>>     cpu->thread_id = qemu_get_thread_id();
>>   cpu->created = true;
>> -    cpu->halted = 0;
>>   current_cpu = cpu;
>>     hax_init_vcpu(cpu);
>> diff --git a/target/i386/hax-all.c b/target/i386/hax-all.c
>> index 44b89c1d74ae..58a27b475ec8 100644
>> --- a/target/i386/hax-all.c
>> +++ b/target/i386/hax-all.c
>> @@ -471,13 +471,35 @@ static int hax_vcpu_hax_exec(CPUArchState *env)
>>   return 0;
>>   }
>>   -    cpu->halted = 0;
>> -
>>   if (cpu->interrupt_request & CPU_INTERRUPT_POLL) {
>>   cpu->interrupt_request &= ~CPU_INTERRUPT_POLL;
>>   apic_poll_irq(x86_cpu->apic_state);
>>   }
>> +    /* After a vcpu is halted (either because it is an AP and has just been
>> +     * reset, or because it has executed the HLT instruction), it will not be
>> +     * run (hax_vcpu_run()) until it is unhalted. The next few if blocks check
>> +     * for events that may change the halted state of this vcpu:
>> +     *  a) Maskable interrupt, when RFLAGS.IF is 1;
>> +     *     Note: env->eflags may not reflect the current RFLAGS state, because
>> +     *           it is not updated after each hax_vcpu_run(). We cannot afford
>> +     *           to fail to recognize any unhalt-by-maskable-interrupt event
>> +     *           (in which case the vcpu will halt forever), and yet we cannot
>> +     *           afford the overhead of hax_vcpu_sync_state(). The current
>> +     *           solution is to err on the side of caution and have the HLT
>> +     *           handler (see case HAX_EXIT_HLT below) unconditionally set the
>> +     *           IF_MASK bit in env->eflags, which, in effect, disables the
>> +     *           RFLAGS.IF check.
>> +     *  b) NMI;
>> +     *  c) INIT signal;
>> +     *  d) SIPI signal.
>> +     */
>> +    if (((cpu->interrupt_request & CPU_INTERRUPT_HARD) &&
>> +         (env->eflags & IF_MASK)) ||
>> +        (cpu->interrupt_request & CPU_INTERRUPT_NMI)) {
>> +        cpu->halted = 0;
>> +    }
>> +
>>   if (cpu->interrupt_request & CPU_INTERRUPT_INIT) {
>>   DPRINTF("\nhax_vcpu_hax_exec: handling INIT for %d\n",
>>   cpu->cpu_index);
>> @@ -493,6 +515,16 @@ static int hax_vcpu_hax_exec(CPUArchState *env)
>>   hax_vcpu_sync_state(env, 1);
>>   }
>>   +    if (cpu->halted) {
>> +    /* If this vcpu is halted, we must not ask HAXM to run it.
>> Instead, we
>> + * break out of hax_smp_cpu_exec() as if this vcpu had
>> executed HLT.
>> + * That way, this

Re: [Qemu-devel] [PATCH] net: cadence_gem: fix compilation error when debug is on

2019-06-11 Thread Philippe Mathieu-Daudé
Hi Ramon,

On 6/9/19 12:08 PM, Ramon Fried wrote:
> defining CADENCE_GEM_ERR_DEBUG causes compilation
> errors, fix that.
> 
> Signed-off-by: Ramon Fried 
> ---
>  hw/net/cadence_gem.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/net/cadence_gem.c b/hw/net/cadence_gem.c
> index 7f63411430..5cc5a71524 100644
> --- a/hw/net/cadence_gem.c
> +++ b/hw/net/cadence_gem.c
> @@ -982,8 +982,8 @@ static ssize_t gem_receive(NetClientState *nc, const 
> uint8_t *buf, size_t size)
>  return -1;
>  }
>  
> -DB_PRINT("copy %d bytes to 0x%x\n", MIN(bytes_to_copy, rxbufsize),
> -rx_desc_get_buffer(s->rx_desc[q]));
> +DB_PRINT("copy %d bytes to 0x%lx\n", MIN(bytes_to_copy, rxbufsize),
> +rx_desc_get_buffer(s, s->rx_desc[q]));

Your patch fails on 32-bit hosts:

./hw/net/cadence_gem.c:987:18: error: format '%lx' expects argument of
type 'long unsigned int', but argument 4 has type 'uint64_t {aka long
long unsigned int}' [-Werror=format=]
 DB_PRINT("copy %d bytes to 0x%lx\n", MIN(bytes_to_copy, rxbufsize),
  ^
./hw/net/cadence_gem.c:39:24: note: in definition of macro 'DB_PRINT'
 fprintf(stderr, ## __VA_ARGS__); \
^
./hw/net/cadence_gem.c: In function 'gem_transmit':
./hw/net/cadence_gem.c:1160:26: error: format '%lx' expects argument of
type 'long unsigned int', but argument 5 has type 'unsigned int'
[-Werror=format=]
 DB_PRINT("TX descriptor @ 0x%x too large: size 0x%x
space " \
  ^
./hw/net/cadence_gem.c:39:24: note: in definition of macro 'DB_PRINT'
 fprintf(stderr, ## __VA_ARGS__); \
^
cc1: all warnings being treated as errors

QEMU provides the "HWADDR_PRIx" format for addresses; see for example a few
lines earlier:

DB_PRINT("read descriptor 0x%" HWADDR_PRIx "\n", packet_desc_addr);


>  
>  /* Copy packet data to emulated DMA buffer */
>  address_space_write(&s->dma_as, rx_desc_get_buffer(s, s->rx_desc[q]) 
> +
> @@ -1156,7 +1156,7 @@ static void gem_transmit(CadenceGEMState *s)
>  if (tx_desc_get_length(desc) > sizeof(tx_packet) -
> (p - tx_packet)) {
>  DB_PRINT("TX descriptor @ 0x%x too large: size 0x%x space " \
> - "0x%x\n", (unsigned)packet_desc_addr,
> + "0x%lx\n", (unsigned)packet_desc_addr,

Here the correct format seems to be "%zd" (difference of sizeof).

>   (unsigned)tx_desc_get_length(desc),
>   sizeof(tx_packet) - (p - tx_packet));
>  break;
> 

Nowadays QEMU prefers to move from the old DB_PRINT() macros to the
trace events API, see for example this commit:

https://git.qemu.org/?p=qemu.git;a=commitdiff;h=da1804d17a9ed7f060c072fbc4889db5fbc9c7d2;hp=a4f667b6714916683408b983cfe0a615a725775f

The first line you changed would be replaced by a trace event, while the
second could be replaced by a qemu_log_mask() call (it is an error
condition).
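
The format-specifier points raised in this review can be sketched in isolation. This is a minimal illustration, not the actual cadence_gem.c fix: ADDR_PRIx is a hypothetical stand-in defined here on top of the standard PRIx64 macro (QEMU's own HWADDR_PRIx lives in QEMU headers and is not reproduced), and the function names are made up for the example. The remaining-space value is stored in a size_t explicitly so that a 'z' length modifier matches on both 32-bit and 64-bit hosts.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for an address format macro, built on PRIx64. */
#define ADDR_PRIx PRIx64

/* Format a 64-bit address portably: no "%lx", which breaks on 32-bit hosts. */
static int format_copy_msg(char *out, size_t outlen,
                           unsigned nbytes, uint64_t addr)
{
    assert(out != NULL);
    return snprintf(out, outlen, "copy %u bytes to 0x%" ADDR_PRIx,
                    nbytes, addr);
}

/* Format the space left in a buffer; the value is a size_t, so use "%zx". */
static int format_space_msg(char *out, size_t outlen,
                            size_t total, ptrdiff_t used)
{
    assert(out != NULL && used >= 0 && (size_t)used <= total);
    size_t space = total - (size_t)used;
    return snprintf(out, outlen, "space 0x%zx", space);
}
```

Both helpers return the snprintf() length, so a caller can check for truncation against the buffer size.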

Also I suggest to include "QEMU Trivial " in
the list of recipients, so your patch might get reviewed/merged quicker.

Regards,

Phil.



Re: [Qemu-devel] [PATCH v5 1/3] machine: show if CLI option '-numa node, mem' is supported in QAPI schema

2019-06-11 Thread Markus Armbruster
Igor Mammedov  writes:

> Legacy '-numa node,mem' option has a number of issues and mgmt often
> defaults to it. Unfortunately it's not possible to replace it with
> an alternative '-numa memdev' without breaking migration compatibility.
> What's possible though is to deprecate it, keeping option working with
> old machine types only.
>
> In order to help users to find out if being deprecated CLI option
> '-numa node,mem' is still supported by particular machine type, add new
> "numa-mem-supported" property to output of query-machines.
>
> "numa-mem-supported" is set to 'true' for machines that currently support
> NUMA, but it will be flipped to 'false' later on, once deprecation period
> expires and kept 'true' only for old machine types that used to support
> the legacy option so it won't break existing configuration that are using
> it.
>
> Signed-off-by: Igor Mammedov 

Reviewed-by: Markus Armbruster 



Re: [Qemu-devel] [PATCH v4 1/3] machine: show if CLI option '-numa node, mem' is supported in QAPI schema

2019-06-11 Thread Markus Armbruster
Eduardo Habkost  writes:

> On Fri, Jun 07, 2019 at 07:39:17PM +0200, Markus Armbruster wrote:
>> This is correct when the TYPE_VIRT_MACHINE, TYPE_PC_MACHINE and
>> TYPE_SPAPR_MACHINE are exactly the machines supporting NUMA.  How could
>> I check that?
>
> parse_numa_node() rejects the -numa option if the machine doesn't
> implement MachineClass::get_default_cpu_node_id().
>
> Grepping for it:
>
> $ git grep -pw get_default_cpu_node_id
> hw/arm/virt.c=static void virt_machine_class_init(ObjectClass *oc, void *data)
> hw/arm/virt.c:mc->get_default_cpu_node_id = virt_get_default_cpu_node_id;
> hw/core/machine.c=static void machine_numa_finish_cpu_init(MachineState 
> *machine)
> hw/core/machine.c:props.node_id = 
> mc->get_default_cpu_node_id(machine, i);
> hw/i386/pc.c=static void pc_machine_class_init(ObjectClass *oc, void *data)
> hw/i386/pc.c:mc->get_default_cpu_node_id = pc_get_default_cpu_node_id;
> hw/ppc/spapr.c=static void spapr_machine_class_init(ObjectClass *oc, void 
> *data)
> hw/ppc/spapr.c:mc->get_default_cpu_node_id = 
> spapr_get_default_cpu_node_id;
> include/hw/boards.h=typedef struct {
> include/hw/boards.h: * @get_default_cpu_node_id:
> include/hw/boards.h=struct MachineClass {
> include/hw/boards.h:int64_t (*get_default_cpu_node_id)(const MachineState 
> *ms, int idx);
> numa.c=static void parse_numa_node(MachineState *ms, NumaNodeOptions *node,
> numa.c:if (!mc->cpu_index_to_instance_props || 
> !mc->get_default_cpu_node_id) {
>
>
> Related:
>   [PATCH v4 01/11] numa: move numa global variable nb_numa_nodes into 
> MachineState
> which adds a MachineClass::numa_supported flag to those machines.

Thanks, Eduardo!

Preferably with commit message and doc comment tweaked:
Reviewed-by: Markus Armbruster 



Re: [Qemu-devel] [Qemu-block] [PATCH v2 1/5] block/nvme: don't flip CQ phase bits

2019-06-11 Thread Maxim Levitsky
On Fri, 2019-06-07 at 15:28 -0400, John Snow wrote:
> 
> On 6/7/19 7:08 AM, Paolo Bonzini wrote:
> > On 06/06/19 23:23, John Snow wrote:
> > > So: This looks right; does this fix a bug that can be observed? Do we
> > > have any regression tests for block/NVMe?
> > 
> > I don't think it fixes a bug; by the time the CQ entry is picked up by
> > QEMU, the device is not supposed to touch it anymore.
> > 
> > However, the idea behind the phase bits is that you can decide whether
> > the driver has placed a completion in the queue.  When we get here, we have
> > 
> > le16_to_cpu(c->status) & 0x1) == !q->cq_phase
> > 
> > On the next pass through the ring buffer q->cq_phase will be flipped,
> > and thus when we see this element we'll get
> > 
> > le16_to_cpu(c->status) & 0x1) == q->cq_phase
> > 
> > and not process it.  Since block/nvme.c flips the bit, this mechanism
> > does not work and the loop termination relies on the other part of the
> > condition, "if (!c->cid) break;".
> > 
> > So the patch is correct, but it would also be nice to also either remove
> > phase handling altogether, or check that the phase handling works
> > properly and drop the !c->cid test.
> > 
> > Paolo


I agree with that and I'll send an updated patch soon.

The driver should not touch the completion entries at all, but rather just scan 
for the entries whose
phase bit was flipped by the hardware.

In fact, I don't even think that 'c->cid' became the exit condition; 
rather, since the device is not allowed 
to completely fill the completion queue (it must always keep at least one free entry 
there), the end condition would still
be the check on the flipped phase bit.
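
The phase-bit scheme described in this thread can be sketched with a tiny simulated queue. This is only an illustration under stated assumptions: the struct layout, CQ_SIZE, and all function names are hypothetical, the "device" is simulated in the same file, and endianness conversion (le16_to_cpu in the real driver) is left out. The driver side only reads entries and stops at the first one whose phase bit does not match the phase it currently expects.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CQ_SIZE 4               /* hypothetical, tiny queue for illustration */

typedef struct {
    uint16_t cid;               /* command identifier */
    uint16_t status;            /* bit 0 is the phase tag */
} CqEntry;

typedef struct {
    CqEntry  ring[CQ_SIZE];
    unsigned head;              /* next entry the driver examines */
    unsigned phase;             /* phase value the driver expects */
    unsigned dev_tail;          /* simulated device: next slot to write */
    unsigned dev_phase;         /* simulated device: phase it writes */
} Cq;

static void cq_init(Cq *q)
{
    assert(q != NULL);
    memset(q, 0, sizeof(*q));   /* all entries start with phase 0 */
    q->phase = 1;               /* so the first pass uses phase 1 */
    q->dev_phase = 1;
}

/* Simulated device: post one completion, flipping its phase on wrap. */
static void dev_post(Cq *q, uint16_t cid)
{
    CqEntry *e = &q->ring[q->dev_tail];
    e->cid = cid;
    e->status = (uint16_t)q->dev_phase;
    if (++q->dev_tail == CQ_SIZE) {
        q->dev_tail = 0;
        q->dev_phase ^= 1;
    }
}

/* Driver: collect new completions without ever writing to the entries. */
static unsigned cq_poll(Cq *q, uint16_t *cids, unsigned max)
{
    unsigned n = 0;
    assert(q != NULL && cids != NULL);
    while (n < max) {
        const CqEntry *e = &q->ring[q->head];
        if ((e->status & 0x1) != q->phase) {
            break;              /* stale entry from the previous pass */
        }
        cids[n++] = e->cid;
        if (++q->head == CQ_SIZE) {
            q->head = 0;
            q->phase ^= 1;      /* expect the opposite phase next pass */
        }
    }
    return n;
}
```

Because the expected phase flips only when the head wraps, entries left over from the previous pass are never processed twice, and no separate check on the command identifier is needed to end the loop.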


I'll fix that to be up to the spec,

Best regards,
Maxim Levitsky




Re: [Qemu-devel] [Qemu-block] [PATCH] file-posix: unlock qemu_global_mutex before pread when attach disk

2019-06-11 Thread Kevin Wolf
Am 11.06.2019 um 04:53 hat l00284672 geschrieben:
> -- Would the "open" hang as well in that case?
>The "open" doesn't hang in that case.
> 
> Do you have any better solutions to solve this problem in the case?

Yes, but unfortunately it's a lot harder.

This is roughly what you'd have to do:

1. Make QMP command handlers async (patches from Marc-André are on the
   list)
2. Stop using HMP drive_add and instead switch to QMP blockdev-add
3. Move the blockdev-add code into a coroutine
4. Make .bdrv_open a coroutine_fn
5. Move the pread() in file-posix to the thread pool and let the
   coroutine yield while the request is running

Only with all of these pieces in place we'll be able to release the
global mutex while we're waiting for the pread() to complete.

Kevin



Re: [Qemu-devel] [PATCH] migration: remove unused field bytes_xfer

2019-06-11 Thread Wei Yang
On Tue, Jun 11, 2019 at 10:33:29AM +0200, Juan Quintela wrote:
>Wei Yang  wrote:
>> On Tue, Apr 02, 2019 at 08:31:06AM +0800, Wei Yang wrote:
>>>MigrationState->bytes_xfer is only set to 0 in migrate_init().
>>>
>>>Remove this unnecessary field.
>>>
>>>Signed-off-by: Wei Yang 
>>
>> Hi, David
>
>Hi
>
>I am on duty this week, will get it.

Thanks :-)

>
>>
>> Are you willing to pick up this one?
>>
>>>---
>>> migration/migration.c | 1 -
>>> migration/migration.h | 1 -
>>> 2 files changed, 2 deletions(-)
>>>
>>>diff --git a/migration/migration.c b/migration/migration.c
>>>index dea7078bf4..c929cf8d0f 100644
>>>--- a/migration/migration.c
>>>+++ b/migration/migration.c
>>>@@ -1681,7 +1681,6 @@ void migrate_init(MigrationState *s)
>>>  * parameters/capabilities that the user set, and
>>>  * locks.
>>>  */
>>>-s->bytes_xfer = 0;
>>> s->cleanup_bh = 0;
>>> s->to_dst_file = NULL;
>>> s->rp_state.from_dst_file = NULL;
>>>diff --git a/migration/migration.h b/migration/migration.h
>>>index 852eb3c4e9..b9efbe9168 100644
>>>--- a/migration/migration.h
>>>+++ b/migration/migration.h
>>>@@ -116,7 +116,6 @@ struct MigrationState
>>> DeviceState parent_obj;
>>> 
>>> /*< public >*/
>>>-size_t bytes_xfer;
>>> QemuThread thread;
>>> QEMUBH *cleanup_bh;
>>> QEMUFile *to_dst_file;
>>>-- 
>>>2.19.1

-- 
Wei Yang
Help you, Help me



Re: [Qemu-devel] [PATCH v6 5/7] qemu-coroutine-sleep: introduce qemu_co_sleep_wake

2019-06-11 Thread Kevin Wolf
Am 07.06.2019 um 19:10 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 07.06.2019 18:52, Vladimir Sementsov-Ogievskiy wrote:
> > 07.06.2019 16:02, Kevin Wolf wrote:
> >> Am 07.06.2019 um 13:18 hat Vladimir Sementsov-Ogievskiy geschrieben:
> >>> 07.06.2019 10:57, Kevin Wolf wrote:
>  Am 11.04.2019 um 19:27 hat Vladimir Sementsov-Ogievskiy geschrieben:
> > Introduce a function to gracefully wake-up a coroutine, sleeping in
> > qemu_co_sleep_ns() sleep.
> >
> > Signed-off-by: Vladimir Sementsov-Ogievskiy 
> 
>  You can simply reenter the coroutine while it has yielded in
>  qemu_co_sleep_ns(). This is supported.
> >>>
> >>> No it doesn't. qemu_aio_coroutine_enter checks for scheduled field,
> >>> and aborts if it is set.
> >>
> >> Ah, yes, it has been broken since commit
> >>
> >> I actually tried to fix it once, but it turned out more complicated and
> >> I think we found a different solution for the problem at hand:
> >>
> >>  Subject: [PATCH for-2.11 0/4] Fix qemu-iotests failures
> >>  Message-Id: <20171128154350.21504-1-kw...@redhat.com>
> >>
> >> In this case, I guess your approach with a new function to interrupt
> >> qemu_co_sleep_ns() is okay.
> >>
> >> Do we need to timer_del() when taking the shortcut? We don't necessarily
> >> reenter the coroutine immediately, but might only be scheduling it. In
> >> this case, the timer could fire before qemu_co_sleep_ns() has run and
> >> schedule the coroutine a second time
> > 
> > No it will not, as we do cmpxchg, scheduled to NULL, so second call will do
> > nothing..
> > 
> > But it seems unsafe, as even coroutine pointer may be stale when we call
> > qemu_co_sleep_wake second time. So, we possibly should remove timer, but ..
> > 
> >   (ignoring co->scheduled again -
> >> maybe we should actually not do that in the timer callback path, but
> >> instead let it run into the assertion because it would be a bug for the
> >> timer callback to end up in this situation).
> >>
> >> Kevin
> >>
> > 
> > Interesting, could there be a race condition, when we call 
> > qemu_co_sleep_wake,
> > but co_sleep_cb already scheduled in some queue and will run soon? Then 
> > removing
> > the timer will not help.
> > 
> > 
> 
> Hmm, it's commented that timer_del is thread-safe..
> 
> Hmm, so, if anyway want to return Timer pointer from qemu_co_sleep_ns, may be 
> it's better
> to just call timer_mod(ts, 0) to shorten waiting instead of cheating with 
> .scheduled?

This is probably slower than timer_del() and directly entering the
coroutine. Is there any advantage in using timer_mod()? I don't think
messing with .scheduled is too bad as it's set in the function just
below, so it pairs nicely enough.
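
The compare-and-swap on .scheduled mentioned earlier in this thread can be sketched in isolation with C11 atomics. This is not QEMU's coroutine code: ExampleSleeper, the tag string, and the function names are hypothetical, and the sketch only shows why a second wake attempt becomes a no-op once the first one has swapped the field to NULL. It says nothing about the stale-pointer concern, which is a lifetime problem rather than an atomicity one.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the sleeping coroutine's state. */
typedef struct {
    _Atomic(const char *) scheduled; /* non-NULL while a sleep is pending */
    int wakeups;                     /* how often the sleeper was woken */
} ExampleSleeper;

static const char example_sleep_tag[] = "example_sleep";

static void example_sleeper_init(ExampleSleeper *s)
{
    assert(s != NULL);
    atomic_init(&s->scheduled, (const char *)NULL);
    s->wakeups = 0;
}

/* Mark the sleeper as scheduled by the sleep function. */
static void example_sleep_begin(ExampleSleeper *s)
{
    assert(s != NULL);
    atomic_store(&s->scheduled, example_sleep_tag);
}

/* Wake the sleeper at most once: only the caller that swaps the tag
 * back to NULL gets to do the wake-up; everyone else does nothing. */
static bool example_sleep_wake(ExampleSleeper *s)
{
    const char *expected = example_sleep_tag;
    assert(s != NULL);
    if (!atomic_compare_exchange_strong(&s->scheduled, &expected,
                                        (const char *)NULL)) {
        return false;
    }
    s->wakeups++;
    return true;
}
```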

Kevin



Re: [Qemu-devel] qgraph

2019-06-11 Thread Markus Armbruster
Paolo Bonzini  writes:

> On 10/06/19 18:12, Andreas Färber wrote:
>> Am 10.06.19 um 15:52 schrieb Paolo Bonzini:
>>> On 10/06/19 15:28, Andreas Färber wrote:
 Am 10.06.19 um 14:03 schrieb Paolo Bonzini:
> Well, that was explained upthread---finding out what device can be
> plugged where.

Fair feature request.  It has come up before.

[...]
 So if we want a new QMP operation, the most sense would probably make
 where-can-I-attach-type(foo) returning a list of QOM paths, showing only
 the first free slot per bus. That would allow a more efficient lookup
 implementation inside QEMU than needing to check each slot[n] property
 via qom-get after discovering it with qom-list.
>>>
>>> Note that what Natalia is seeking is an introspection mechanism to be
>>> used _before_ creating a virtual machine though.

This requires introspecting the machine to find its onboard devices,
then introspecting onboard devices to find relevant sockets.  Perhaps
even introspect the devices that could be plugged into available sockets
to find more sockets.

I'm afraid this founders right on the first step: we can't introspect
machines that way, can we?

Instead, we need to run with -M $machine_of_interest, then walk the QOM
tree to find the onboard devices.

>> QMP implied creating a virtual machine though.
>
> Yes, but you can start QEMU with -M none and just invoke QOM
> introspection commands.

Yes, this is how introspection (both QMP and QOM) is commonly used.
Just keep in mind one difference: QMP is static, QOM is dynamic.

QMP being static means it's defined at compile time.  So is the value of
query-qmp-schema.  Same QEMU build, same value.  This permits caching.

QOM being dynamic means to introspect an object's properties, you have
to create it.  Worse, an object's properties may (in theory) change at
any time.  *Properties*, not just property *values*.  In practice, I'd
expect properties to change only at realize time.

QOM introspection can only see the properties in a newly created object.
Even these could (in theory) depend on state, i.e.  the next time you
introspect, you could get a different result.  Even in the same process.

I never quite understood why QOM needs *that* much flexibility.  But it
is how it is.  The common way for a management application to deal with
it is to assume what introspection shows us is for all practical
purposes close enough to what we'll actually get.

[...]



Re: [Qemu-devel] [RFC PATCH 10/10] monitor: Split out monitor/core.c

2019-06-11 Thread Kevin Wolf
Am 07.06.2019 um 19:29 hat Dr. David Alan Gilbert geschrieben:
> * Kevin Wolf (kw...@redhat.com) wrote:
> > Move the monitor core infrastructure from monitor/misc.c to
> > monitor/core.c. This is code that can be shared for all targets, so
> > compile it only once.
> > 
> > What remains in monitor/misc.c after this patch is mostly monitor
> > command implementations and code that requires a system emulator or is
> > even target-dependent.
> > 
> > The amount of function and particularly extern variables in
> > monitor_int.h is probably a bit larger than it needs to be, but this way
> > no non-trivial code modifications are needed. The interfaces between all
> > monitor parts can be cleaned up later.
> > 
> > Signed-off-by: Kevin Wolf 
> 
> OK, but can you call it anything other than core.* - I regularly end up
> deleting things like that!

Oh, I didn't even think of this kind of core.*!

I imagine in practice it wouldn't be so bad to have a monitor/core.c
because it's in a subdirectory, and it's under version control anyway.
We already seem to have quite a few of them in subdirectories:

./hw/acpi/core.c
./hw/bt/core.c
./hw/cpu/core.c
./hw/i2c/core.c
./hw/ide/core.c
./hw/sd/core.c
./hw/usb/core.c

But I'll gladly rename it if I can find a good name. Do you have any
suggestions? Maybe just monitor/monitor.c?

Kevin



Re: [Qemu-devel] [RFC PATCH 10/10] monitor: Split out monitor/core.c

2019-06-11 Thread Dr. David Alan Gilbert
* Kevin Wolf (kw...@redhat.com) wrote:
> Am 07.06.2019 um 19:29 hat Dr. David Alan Gilbert geschrieben:
> > * Kevin Wolf (kw...@redhat.com) wrote:
> > > Move the monitor core infrastructure from monitor/misc.c to
> > > monitor/core.c. This is code that can be shared for all targets, so
> > > compile it only once.
> > > 
> > > What remains in monitor/misc.c after this patch is mostly monitor
> > > command implementations and code that requires a system emulator or is
> > > even target-dependent.
> > > 
> > > The amount of function and particularly extern variables in
> > > monitor_int.h is probably a bit larger than it needs to be, but this way
> > > no non-trivial code modifications are needed. The interfaces between all
> > > monitor parts can be cleaned up later.
> > > 
> > > Signed-off-by: Kevin Wolf 
> > 
> > OK, but can you call it anything other than core.* - I regularly end up
> > deleting things like that!
> 
> Oh, I didn't even think of this kind of core.*!
> 
> I imagine in practice it wouldn't be so bad to have a monitor/core.c
> because it's in a subdirectory, and it's under version control anyway.
> We already seem to have quite a few of them in subdirectories:
> 
> ./hw/acpi/core.c
> ./hw/bt/core.c
> ./hw/cpu/core.c
> ./hw/i2c/core.c
> ./hw/ide/core.c
> ./hw/sd/core.c
> ./hw/usb/core.c

Yes, they all annoy me in the same way :-)

> But I'll gladly rename it if I can find a good name. Do you have any
> suggestions? Maybe just monitor/monitor.c?

Yes that's fine, thanks!

Dave

> Kevin
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-devel] [PATCH v10 0/3] linux-user: A set of miscellaneous patches

2019-06-11 Thread Aleksandar Markovic
Ping
On Jun 7, 2019 2:21 PM, "Aleksandar Markovic" 
wrote:

> From: Aleksandar Markovic 
>
> This is a collection of misc patches for Linux user that I recently
> accumulated from various sources. All of them originate from problems
> observed on mips target. However, these changes actually affect and fix
> problems on multiple targets.
>
> v9->v10:
>
>   - improved commit messages for patches 2 and 3
>
> v8->v9:
>
>   - fixed build error on some systems related to SOL_ALG
>
> v7->v8:
>
>   - added a patch on setsockopt() option SOL_ALG
>
> v6->v7:
>
>   - fixed a build error for older kernels related to the patch on
> setsockopt() options
>   - removed four patches that on the meantime got accepted into the
> main source tree
>
> v5->v6:
>
>   - fixed a mistake in patch #4
>   - improved commit messages in patches #4 and #6
>
> v4->v5:
>
>   - added the patch on statx() support
>   - improved the patch on IPV6__MEMBERSHIP to take into
> account the possibility of different names for a field
>   - minor corrections in commit messages
>
> v3->v4:
>
>   - improved commit messages (fixed some typos, improved relevance)
>
> v2->v3:
>
>   - updated and improved commit messages
>   - added IPV6_DROP_MEMBERSHIP support to the patch on setsockopt()'s
> option
>
> v1->v2:
>
>   - added the patch on setsockopt()'s option IPV6_ADD_MEMBERSHIP
>   - improved the commit me
>
> Aleksandar Rikalo (1):
>   linux-user: Add support for statx() syscall
>
> Neng Chen (1):
>   linux-user: Add support for setsockopt() options
> IPV6__MEMBERSHIP
>
> Yunqiang Su (1):
>   linux-user: Add support for setsockopt() option SOL_ALG
>
>  linux-user/syscall.c  | 193 ++
> +++-
>  linux-user/syscall_defs.h |  37 +
>  2 files changed, 229 insertions(+), 1 deletion(-)
>
> --
> 2.7.4
>
>
>


Re: [Qemu-devel] [PATCH v5 02/12] qapi/block-core: add option for io_uring

2019-06-11 Thread Stefan Hajnoczi
On Tue, Jun 11, 2019 at 03:36:53PM +0800, Fam Zheng wrote:
> On Mon, 06/10 19:18, Aarushi Mehta wrote:
> > Option only enumerates for hosts that support it.
> > 
> > Signed-off-by: Aarushi Mehta 
> > Reviewed-by: Stefan Hajnoczi 
> > ---
> >  qapi/block-core.json | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/qapi/block-core.json b/qapi/block-core.json
> > index 1defcde048..db7eedd058 100644
> > --- a/qapi/block-core.json
> > +++ b/qapi/block-core.json
> > @@ -2792,11 +2792,13 @@
> >  #
> >  # @threads: Use qemu's thread pool
> >  # @native:  Use native AIO backend (only Linux and Windows)
> > +# @io_uring:Use linux io_uring (since 4.1)
> >  #
> >  # Since: 2.9
> >  ##
> >  { 'enum': 'BlockdevAioOptions',
> > -  'data': [ 'threads', 'native' ] }
> > +  'data': [ 'threads', 'native',
> > +{ 'name': 'io_uring', 'if': 'defined(CONFIG_LINUX_IO_URING)' } 
> > ] }
> 
> Question: 'native' has a dependency on libaio but it doesn't have the
> condition.  Is the inconsistency intended?

'native' could be conditional too but I guess it's a historical thing.
Either QAPI 'if' didn't exist when BlockdevAioOptions was defined or we
simply forgot to use it :).

It doesn't need to be changed in this patch series.

Stefan




Re: [Qemu-devel] [PATCH v5 09/12] block: add trace events for io_uring

2019-06-11 Thread Stefan Hajnoczi
On Mon, Jun 10, 2019 at 07:19:02PM +0530, Aarushi Mehta wrote:
> @@ -294,6 +302,7 @@ LuringState *luring_init(Error **errp)
>  int rc;
>  LuringState *s;
>  s = g_malloc0(sizeof(*s));
> +trace_luring_init_state((void *)s, sizeof(*s));

In C conversion to void * is automatic and doesn't need to be done
manually.
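
A minimal illustration of the point about void *, with hypothetical names (ExampleState and trace_example_init are not QEMU symbols): when the callee is declared with a prototype that takes a void pointer, any object pointer converts implicitly, so the cast adds nothing.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int fd; } ExampleState;    /* hypothetical state struct */

static const void *last_traced;             /* records the last traced pointer */

/* Stand-in for a trace function whose prototype takes a void pointer. */
static void trace_example_init(const void *s, size_t size)
{
    assert(s != NULL && size > 0);
    last_traced = s;
}

static void example_init(ExampleState *s)
{
    /* No (void *) cast: the prototype makes the conversion implicit. */
    trace_example_init(s, sizeof(*s));
}
```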

> diff --git a/block/trace-events b/block/trace-events
> index eab51497fc..c4564dcd96 100644
> --- a/block/trace-events
> +++ b/block/trace-events
> @@ -60,6 +60,14 @@ qmp_block_stream(void *bs, void *job) "bs %p job %p"
>  file_paio_submit(void *acb, void *opaque, int64_t offset, int count, int 
> type) "acb %p opaque %p offset %"PRId64" count %d type %d"
>  file_copy_file_range(void *bs, int src, int64_t src_off, int dst, int64_t 
> dst_off, int64_t bytes, int flags, int64_t ret) "bs %p src_fd %d offset 
> %"PRIu64" dst_fd %d offset %"PRIu64" bytes %"PRIu64" flags %d ret %"PRId64
>  
> +#io_uring.c
> +luring_init_state(void *s, size_t size) "s %p size %zu"
> +luring_cleanup_state(void) "s freed"
> +disable luring_io_plug(void) "plug"
> +disable luring_io_unplug(int blocked, int plugged, int queued, int inflight) 
> "blocked %d plugged %d queued %d inflight %d"
> +disable luring_do_submit(int blocked, int plugged, int queued, int inflight) 
> "blocked %d plugged %d queued %d inflight %d"
> +disable luring_do_submit_done(int ret) "submitted to kernel %d"

Why are these disabled?  "disable" compiles them out and they won't be
available at runtime.  "disable" should probably be dropped here.

Please include the LuringState *s pointer in trace events since there
can be multiple LuringStates at any given time and it should be possible
to correlate trace events.




Re: [Qemu-devel] [PATCH 1/2] docs/specs/index.rst: Fix minor syntax issues

2019-06-11 Thread Peter Maydell
On Mon, 10 Jun 2019 at 22:41, Aleksandar Markovic
 wrote:
>
>
> On Jun 10, 2019 5:25 PM, "Peter Maydell"  wrote:
> >
> > The docs/specs/index.rst has a couple of minor issues which
> > we didn't notice because we weren't building the manual:
> >  * the ToC entry for the new PPC XIVE docs points to
> >a nonexistent file
> >  * the initial comment needs to be marked by '..', not '.',
> >or it will appear in the output
> >  * the title doesn't match the capitalization used by
> >the existing interop or devel manuals, and uses
> >'full-system emulation' rather than the 'system emulation'
> >that the interop manual title uses
> >
> > Fix these minor issues before we start trying to build the manual.
> >
> > Signed-off-by: Peter Maydell 
> > ---
>
> Acked-by: Aleksandar Markovic 

Hi Aleksandar; I'm just wondering what you were meaning
with this acked-by tag. Generally acked-by means  (to me,
and I think usually with qemu) "this patch touches an
area that I maintain, I haven't reviewed it but I'm OK with
it". But this series isn't mips-related, so maybe you
meant reviewed-by instead ?

(Acked-by is a bit of an odd tag because it's less
clear what it means than reviewed-by or signed-off-by,
so it's not very surprising if you've picked up a
different opinion on what it's for.)

thanks
-- PMM



Re: [Qemu-devel] [PATCH v5 10/12] block/io_uring: adds userspace completion polling

2019-06-11 Thread Stefan Hajnoczi
On Mon, Jun 10, 2019 at 07:19:03PM +0530, Aarushi Mehta wrote:
> +static bool qemu_luring_poll_cb(void *opaque)
> +{
> +LuringState *s = opaque;
> +struct io_uring_cqe *cqes;
> +
> +if (io_uring_peek_cqe(&s->ring, &cqes) == 0) {
> +if (!cqes) {
> +qemu_luring_process_completions_and_submit(s);
> +return true;
> +}

Is this logic inverted?  We have a completion when cqes != NULL.
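
If the check is indeed inverted, the intended shape might look like the sketch below. Everything here is a stand-in: example_ring, example_cqe, and example_peek_cqe are hypothetical and only model a peek call that returns 0 and stores either the next completion or NULL. This is an assumption about the behaviour under discussion, not a reproduction of liburing or of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct example_cqe { int res; };            /* hypothetical completion entry */

struct example_ring {
    struct example_cqe *pending;            /* NULL when nothing has completed */
    int processed;                          /* completions handled so far */
};

/* Stand-in peek: returns 0 and stores the pending completion, or NULL. */
static int example_peek_cqe(struct example_ring *ring,
                            struct example_cqe **cqe)
{
    assert(ring != NULL && cqe != NULL);
    *cqe = ring->pending;
    return 0;
}

/* Poll callback: report progress only when a completion is present. */
static bool example_poll_cb(struct example_ring *ring)
{
    struct example_cqe *cqe = NULL;

    if (example_peek_cqe(ring, &cqe) == 0 && cqe != NULL) {
        ring->processed++;                  /* stands in for real processing */
        ring->pending = NULL;
        return true;
    }
    return false;
}
```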




Re: [Qemu-devel] [PATCH v5 11/12] qemu-io: adds support for io_uring

2019-06-11 Thread Stefan Hajnoczi
On Mon, Jun 10, 2019 at 07:19:04PM +0530, Aarushi Mehta wrote:
> Signed-off-by: Aarushi Mehta 
> ---
>  qemu-io.c | 13 +
>  1 file changed, 13 insertions(+)
> 
> diff --git a/qemu-io.c b/qemu-io.c
> index 8d5d5911cb..54b82151c4 100644
> --- a/qemu-io.c
> +++ b/qemu-io.c
> @@ -129,6 +129,7 @@ static void open_help(void)
>  " -n, -- disable host cache, short for -t none\n"
>  " -U, -- force shared permissions\n"
>  " -k, -- use kernel AIO implementation (on Linux only)\n"
> +" -i  -- use kernel io_uring (Linux 5.1+)\n"
>  " -t, -- use the given cache mode for the image\n"
>  " -d, -- use the given discard mode for the image\n"
>  " -o, -- options to be given to the block driver"
> @@ -188,6 +189,11 @@ static int open_f(BlockBackend *blk, int argc, char 
> **argv)
>  case 'k':
>  flags |= BDRV_O_NATIVE_AIO;
>  break;
> +#ifdef CONFIG_LINUX_IO_URING
> +case 'i':
> +flags |= BDRV_O_IO_URING;
> +break;
> +#endif
>  case 't':
>  if (bdrv_parse_cache_mode(optarg, &flags, &writethrough) < 0) {
>  error_report("Invalid cache option: %s", optarg);
> @@ -290,6 +296,7 @@ static void usage(const char *name)
>  "  -C, --copy-on-read   enable copy-on-read\n"
>  "  -m, --misalign   misalign allocations for O_DIRECT\n"
>  "  -k, --native-aio use kernel AIO implementation (on Linux only)\n"
> +"  -i  --io_uring   use kernel io_uring (Linux 5.1+)\n"
>  "  -t, --cache=MODE use the given cache mode for the image\n"
>  "  -d, --discard=MODE   use the given discard mode for the image\n"
>  "  -T, --trace [[enable=]][,events=][,file=]\n"
> @@ -499,6 +506,7 @@ int main(int argc, char **argv)
>  { "copy-on-read", no_argument, NULL, 'C' },
>  { "misalign", no_argument, NULL, 'm' },
>  { "native-aio", no_argument, NULL, 'k' },
> +{ "io_uring", no_argument, NULL, 'i' },
>  { "discard", required_argument, NULL, 'd' },
>  { "cache", required_argument, NULL, 't' },
>  { "trace", required_argument, NULL, 'T' },
> @@ -566,6 +574,11 @@ int main(int argc, char **argv)
>  case 'k':
>  flags |= BDRV_O_NATIVE_AIO;
>  break;
> +#ifdef CONFIG_LINUX_IO_URING
> +case 'i':
> +flags |= BDRV_O_IO_URING;
> +break;
> +#endif

An --aio=threads|native|io_uring option would be more general than
adding --io_uring.  That way, new AIO engines do not require their own
command-line options.

Can you implement something like the -drive aio= parameter so that a
single option can specify threads, native, or io_uring?
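
One possible shape for such a single option is sketched below. The flag names and values are hypothetical placeholders rather than QEMU's BDRV_O_* constants, and example_parse_aio_mode is an invented helper; the point is only that one string parser replaces a separate switch case per engine.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag bits, standing in for the real open flags. */
enum {
    EXAMPLE_O_NATIVE_AIO = 1 << 0,
    EXAMPLE_O_IO_URING   = 1 << 1,
};

/* Parse "threads", "native" or "io_uring" into *flags.
 * Returns 0 on success, -1 on an unknown mode (flags left untouched). */
static int example_parse_aio_mode(const char *mode, int *flags)
{
    int aio_bits;

    assert(mode != NULL && flags != NULL);
    if (strcmp(mode, "threads") == 0) {
        aio_bits = 0;                       /* thread pool: no extra flag */
    } else if (strcmp(mode, "native") == 0) {
        aio_bits = EXAMPLE_O_NATIVE_AIO;
    } else if (strcmp(mode, "io_uring") == 0) {
        aio_bits = EXAMPLE_O_IO_URING;
    } else {
        return -1;
    }
    /* Clear any previously selected engine before setting the new one. */
    *flags &= ~(EXAMPLE_O_NATIVE_AIO | EXAMPLE_O_IO_URING);
    *flags |= aio_bits;
    return 0;
}
```

A new engine then needs one more string comparison here instead of a new short option, a new long option, and new help text.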

Thanks,
Stefan




[Qemu-devel] [Bug 1832250] Re: arm32v6/golang:1.10-alpine is broken for qemu 2.8 on MacOS cross-compilation

2019-06-11 Thread Peter Maydell
Please can you try with a more recent version of QEMU? 2.8 is pretty
old, and there are definitely some bugs involving Alpine Linux glibc and
also go that we've fixed in later versions.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1832250

Title:
  arm32v6/golang:1.10-alpine is broken for qemu 2.8 on MacOS cross-
  compilation

Status in QEMU:
  New

Bug description:
  FROM arm32v6/golang:1.10-alpine

  docker build -t openhorizon/ibm.gps_arm:2.0.7 -f ./Dockerfile.arm .
  Sending build context to Docker daemon  110.6kB
  Step 1/12 : FROM arm32v6/golang:1.10-alpine
  1.10-alpine: Pulling from arm32v6/golang
  05276f4299f2: Pull complete 
  5657e63df536: Pull complete 
  febca98d0249: Pull complete 
  5053a7aa5dea: Pull complete 
  d048463a3701: Pull complete 
  b628c679d668: Pull complete 
  Digest: 
sha256:94c5fd97b17d0e9fe89e011446bedda4784cb0af7a60494989e2a21c0dcba92f
  Status: Downloaded newer image for arm32v6/golang:1.10-alpine
   ---> 3110964e8c9a
  Step 2/12 : RUN apk --no-cache update && apk add git
   ---> Running in 14ffb11506bb
  fetch http://dl-cdn.alpinelinux.org/alpine/v3.9/main/armhf/APKINDEX.tar.gz
  fetch 
http://dl-cdn.alpinelinux.org/alpine/v3.9/community/armhf/APKINDEX.tar.gz
  v3.9.4-24-g4e2ff29bbe [http://dl-cdn.alpinelinux.org/alpine/v3.9/main]
  v3.9.4-25-g65097c9cdc [http://dl-cdn.alpinelinux.org/alpine/v3.9/community]
  OK: 9547 distinct packages available
  fetch http://dl-cdn.alpinelinux.org/alpine/v3.9/main/armhf/APKINDEX.tar.gz
  fetch 
http://dl-cdn.alpinelinux.org/alpine/v3.9/community/armhf/APKINDEX.tar.gz
  (1/7) Installing nghttp2-libs (1.35.1-r0)
  (2/7) Installing libssh2 (1.8.2-r0)
  (3/7) Installing libcurl (7.64.0-r2)
  (4/7) Installing libgcc (8.3.0-r0)
  (5/7) Installing expat (2.2.6-r0)
  (6/7) Installing pcre2 (10.32-r1)
  (7/7) Installing git (2.20.1-r0)
  Executing busybox-1.29.3-r10.trigger
  OK: 18 MiB in 22 packages
  Removing intermediate container 14ffb11506bb
   ---> 6890ea7ed09b
  Step 3/12 : RUN mkdir -p /build/bin
   ---> Running in 44e52d78d7b4
  Removing intermediate container 44e52d78d7b4
   ---> 0763afda41d1
  Step 4/12 : COPY src /build/src
   ---> 05bab9a72a34
  Step 5/12 : WORKDIR /build
   ---> Running in 5a663caff249
  Removing intermediate container 5a663caff249
   ---> 5a6ca53c00de
  Step 6/12 : RUN env GOPATH=/build GOOPTIONS_ARM='CGO_ENABLED=0 GOOS=linux 
GOARCH=arm GOARM=6' go get github.com/kellydunn/golang-geo
   ---> Running in 05b09ee0c206
  Removing intermediate container 05b09ee0c206
   ---> e68c6e222e51
  Step 7/12 : RUN env GOPATH=/build GOOPTIONS_ARM='CGO_ENABLED=0 GOOS=linux 
GOARCH=arm GOARM=6' go build -o /build/bin/armv6_gps /build/src/main.go
   ---> Running in ea6d2707e35f
  qemu-arm: /build/qemu-rwi8RH/qemu-2.8+dfsg/translate-all.c:175: tb_lock: 
Assertion `!have_tb_lock' failed.
  qemu-arm: /build/qemu-rwi8RH/qemu-2.8+dfsg/translate-all.c:175: tb_lock: 
Assertion `!have_tb_lock' failed.
  The command '/bin/sh -c env GOPATH=/build GOOPTIONS_ARM='CGO_ENABLED=0 
GOOS=linux GOARCH=arm GOARM=6' go build -o /build/bin/armv6_gps 
/build/src/main.go' returned a non-zero code: 139
  make: *** [build] Error 139

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1832250/+subscriptions



Re: [Qemu-devel] [PATCH v5 00/12] Add support for io_uring

2019-06-11 Thread Stefan Hajnoczi
On Mon, Jun 10, 2019 at 07:18:53PM +0530, Aarushi Mehta wrote:
> This patch series adds support for the newly developed io_uring Linux AIO
> interface. Linux io_uring is faster than Linux's AIO asynchronous I/O code,
> offers efficient buffered asynchronous I/O support, the ability to do I/O
> without performing a system call via polled I/O, and other efficiency 
> enhancements.
> 
> Testing it requires a host kernel (5.1+) and the liburing library.
> Use the option -drive aio=io_uring to enable it.
> 
> v5:
> - Adds completion polling
> - Extends qemu-io
> - Adds qemu-iotest

Flush is not hooked up.  Please use the io_uring IORING_OP_FSYNC that
you've already written and connect it to file-posix.c.

When doing this watch out for the qiov->size check during completion
processing.  Flush doesn't have a qiov so it may be NULL.

Stefan




Re: [Qemu-devel] [PATCH v5 12/12] qemu-iotests/087: checks for io_uring

2019-06-11 Thread Stefan Hajnoczi
On Mon, Jun 10, 2019 at 07:19:05PM +0530, Aarushi Mehta wrote:
> Signed-off-by: Aarushi Mehta 
> ---
>  tests/qemu-iotests/087 | 26 ++
>  tests/qemu-iotests/087.out | 10 ++
>  2 files changed, 36 insertions(+)
> 
> diff --git a/tests/qemu-iotests/087 b/tests/qemu-iotests/087
> index d6c8613419..0cc7283ad8 100755
> --- a/tests/qemu-iotests/087
> +++ b/tests/qemu-iotests/087
> @@ -124,6 +124,32 @@ run_qemu_filter_aio <  { "execute": "quit" }
>  EOF
>  
> +echo
> +echo === aio=io_uring without O_DIRECT ===
> +echo
> +
> +# Skip this test if io_uring is not enabled in this build

Is this comment a todo?  I see nothing that skips the test.




Re: [Qemu-devel] PCI(e): Documentation "io-reserve" and related properties?

2019-06-11 Thread Marcel Apfelbaum




On 6/7/19 2:43 PM, Andrea Bolognani wrote:
> On Thu, 2019-06-06 at 14:20 -0400, Michael S. Tsirkin wrote:
>> On Thu, Jun 06, 2019 at 06:19:43PM +0200, Kashyap Chamarthy wrote:
>>> Hi folks,
>>>
>>> Today I learnt about some obscure PCIe-related properties, in the
>>> context of adding PCIe root ports to a guest, namely:
>>>
>>>  io-reserve
>>>  mem-reserve
>>>  bus-reserve
>>>  pref32-reserve
>>>  pref64-reserve
>>>
>>> Unfortunately, the commit[*] that added them provided no documentation
>>> whatsoever.
>>>
>>> In my scenario, I was specifically wondering what "io-reserve" means,
>>> in what context to use it, etc.  (But documentation about other
>>> properties is also welcome.)
>>>
>>> Anyone more well-versed in this area care to shed some light?
>>>
>>>
>>> [*] 6755e618d0 (hw/pci: add PCI resource reserve capability to legacy
>>>  PCI bridge, 2018-08-21)
>>
>> So normally bios would reserve just enough io space to satisfy all
>> devices behind a bridge. What if you intend to hotplug more devices?
>> These properties allow you to ask bios to reserve extra space.
>
> Is it fair to say that setting io-reserve=0 for a pcie-root-port
> would be a way to implement the requirements set forth in
>
>    https://bugzilla.redhat.com/show_bug.cgi?id=1408810
>
> ? I tested this on aarch64 and it seems to work as expected, but
> then again without documentation it's hard to tell.
>
> More specifically, I created an aarch64/virt guest with several
> pcie-root-ports and it couldn't boot much further than GRUB when
> the number of ports exceeded 24, but as soon as I added the
> io-reserve=0 option I could get the same guest to boot fine with
> 32 or even 64 pcie-root-ports. I'm attaching the boot log for
> reference: there are a bunch of messages about the topic but they
> would appear to be benign.
>
> Hotplug seemed to work too: I tried with a single virtio-net-pci
> and I could access the network. My understanding is that PCIe
> devices are required to work without IO space, so this behavior
> matches my expectations.
>
> I wonder, though, what would happen if I had something like
>
>    -device pcie-root-port,io-reserve=0,id=pci.1
>    -device pcie-pci-bridge,bus=pci.1
>
> Would I be able to hotplug conventional PCI devices into the
> pcie-pci-bridge, or would the lack of IO space reservation for
> the pcie-root-port cause issues with that?



You would not have any IO space for a PCI device, or for a PCIe device
that for some reason requires IO space (even if it shouldn't),
and the hotplug operation would fail.

On the other hand, if the pcie-pci-bridge device itself requires
some IO space, it will work. It is worth trying.

Thanks,
Marcel








Re: [Qemu-devel] PCI(e): Documentation "io-reserve" and related properties?

2019-06-11 Thread Marcel Apfelbaum




On 6/11/19 10:21 AM, Kashyap Chamarthy wrote:
> On Thu, Jun 06, 2019 at 02:20:18PM -0400, Michael S. Tsirkin wrote:
>> On Thu, Jun 06, 2019 at 06:19:43PM +0200, Kashyap Chamarthy wrote:
>>> Hi folks,
>>>
>>> Today I learnt about some obscure PCIe-related properties, in the
>>> context of adding PCIe root ports to a guest, namely:
>>>
>>>  io-reserve
>>>  mem-reserve
>>>  bus-reserve
>>>  pref32-reserve
>>>  pref64-reserve
>>>
>>> Unfortunately, the commit[*] that added them provided no documentation
>>> whatsoever.
>>>
>>> In my scenario, I was specifically wondering what "io-reserve" means,
>>> in what context to use it, etc.  (But documentation about other
>>> properties is also welcome.)
>>>
>>> Anyone more well-versed in this area care to shed some light?
>>>
>>>
>>> [*] 6755e618d0 (hw/pci: add PCI resource reserve capability to legacy
>>>  PCI bridge, 2018-08-21)
>>
>> So normally bios would reserve just enough io space to satisfy all
>> devices behind a bridge. What if you intend to hotplug more devices?
>> These properties allow you to ask bios to reserve extra space.
>
> Thanks.  Would be useful to have them documented in the official QEMU
> command-line documentation.  Otherwise, they will remain as arcane
> properties that barely anyone knows about.



There is some documentation under qemu/docs/pcie_pci_bridge.txt.
I agree there is always room for QEMU cmd-line improvement.

Thanks,
Marcel







[Qemu-devel] [PATCH 3/3] block/nbd: merge NBDClientSession struct back to BDRVNBDState

2019-06-11 Thread Vladimir Sementsov-Ogievskiy
There is no reason to keep it separate: it differs from the behavior
of other block drivers and is therefore confusing. Instead of the generic
  'state = (State*)bs->opaque' we have to use a special helper.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/nbd.c | 197 +---
 1 file changed, 94 insertions(+), 103 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 1f00be2d66..81edabbf35 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -53,7 +53,7 @@ typedef struct {
 bool receiving; /* waiting for connection_co? */
 } NBDClientRequest;
 
-typedef struct NBDClientSession {
+typedef struct BDRVNBDState {
 QIOChannelSocket *sioc; /* The master data channel */
 QIOChannel *ioc; /* The current I/O channel which may differ (eg TLS) */
 NBDExportInfo info;
@@ -67,24 +67,13 @@ typedef struct NBDClientSession {
 NBDReply reply;
 BlockDriverState *bs;
 bool quit;
-} NBDClientSession;
-
-typedef struct BDRVNBDState {
-NBDClientSession client;
 
 /* For nbd_refresh_filename() */
 SocketAddress *saddr;
 char *export, *tlscredsid;
 } BDRVNBDState;
 
-static NBDClientSession *nbd_get_client_session(BlockDriverState *bs)
-{
-BDRVNBDState *s = bs->opaque;
-return &s->client;
-}
-
-
-static void nbd_recv_coroutines_wake_all(NBDClientSession *s)
+static void nbd_recv_coroutines_wake_all(BDRVNBDState *s)
 {
 int i;
 
@@ -99,14 +88,15 @@ static void nbd_recv_coroutines_wake_all(NBDClientSession *s)
 
 static void nbd_client_detach_aio_context(BlockDriverState *bs)
 {
-NBDClientSession *client = nbd_get_client_session(bs);
-qio_channel_detach_aio_context(QIO_CHANNEL(client->ioc));
+BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
+
+qio_channel_detach_aio_context(QIO_CHANNEL(s->ioc));
 }
 
 static void nbd_client_attach_aio_context_bh(void *opaque)
 {
 BlockDriverState *bs = opaque;
-NBDClientSession *client = nbd_get_client_session(bs);
+BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
 
 /*
  * The node is still drained, so we know the coroutine has yielded in
@@ -114,15 +104,16 @@ static void nbd_client_attach_aio_context_bh(void *opaque)
  * entered for the first time. Both places are safe for entering the
  * coroutine.
  */
-qemu_aio_coroutine_enter(bs->aio_context, client->connection_co);
+qemu_aio_coroutine_enter(bs->aio_context, s->connection_co);
 bdrv_dec_in_flight(bs);
 }
 
 static void nbd_client_attach_aio_context(BlockDriverState *bs,
   AioContext *new_context)
 {
-NBDClientSession *client = nbd_get_client_session(bs);
-qio_channel_attach_aio_context(QIO_CHANNEL(client->ioc), new_context);
+BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
+
+qio_channel_attach_aio_context(QIO_CHANNEL(s->ioc), new_context);
 
 bdrv_inc_in_flight(bs);
 
@@ -136,26 +127,26 @@ static void nbd_client_attach_aio_context(BlockDriverState *bs,
 
 static void nbd_teardown_connection(BlockDriverState *bs)
 {
-NBDClientSession *client = nbd_get_client_session(bs);
+BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
 
-assert(client->ioc);
+assert(s->ioc);
 
 /* finish any pending coroutines */
-qio_channel_shutdown(client->ioc,
+qio_channel_shutdown(s->ioc,
  QIO_CHANNEL_SHUTDOWN_BOTH,
  NULL);
-BDRV_POLL_WHILE(bs, client->connection_co);
+BDRV_POLL_WHILE(bs, s->connection_co);
 
 nbd_client_detach_aio_context(bs);
-object_unref(OBJECT(client->sioc));
-client->sioc = NULL;
-object_unref(OBJECT(client->ioc));
-client->ioc = NULL;
+object_unref(OBJECT(s->sioc));
+s->sioc = NULL;
+object_unref(OBJECT(s->ioc));
+s->ioc = NULL;
 }
 
 static coroutine_fn void nbd_connection_entry(void *opaque)
 {
-NBDClientSession *s = opaque;
+BDRVNBDState *s = opaque;
 uint64_t i;
 int ret = 0;
 Error *local_err = NULL;
@@ -223,7 +214,7 @@ static int nbd_co_send_request(BlockDriverState *bs,
NBDRequest *request,
QEMUIOVector *qiov)
 {
-NBDClientSession *s = nbd_get_client_session(bs);
+BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
 int rc, i;
 
 qemu_co_mutex_lock(&s->send_mutex);
@@ -298,7 +289,7 @@ static inline uint64_t payload_advance64(uint8_t **payload)
 return ldq_be_p(*payload - 8);
 }
 
-static int nbd_parse_offset_hole_payload(NBDClientSession *client,
+static int nbd_parse_offset_hole_payload(BDRVNBDState *s,
  NBDStructuredReplyChunk *chunk,
  uint8_t *payload, uint64_t orig_offset,
  QEMUIOVector *qiov, Error **errp)
@@ -321,8 +312,8 @@ static int nbd_parse_offset_hole_payload(NBDClientSession *client,
  " region");
 return -EINVAL;
 }
-if (client->info.min_block &&
-   

[Qemu-devel] [PATCH 1/3] block/nbd-client: drop stale logout

2019-06-11 Thread Vladimir Sementsov-Ogievskiy
Drop the logout() call on the failure path (we have errp) and turn the
two others into trace points.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/nbd-client.h | 9 -
 block/nbd-client.c | 6 +++---
 block/trace-events | 2 ++
 3 files changed, 5 insertions(+), 12 deletions(-)

diff --git a/block/nbd-client.h b/block/nbd-client.h
index 09e03013d2..231dc13c48 100644
--- a/block/nbd-client.h
+++ b/block/nbd-client.h
@@ -6,15 +6,6 @@
 #include "block/block_int.h"
 #include "io/channel-socket.h"
 
-/* #define DEBUG_NBD */
-
-#if defined(DEBUG_NBD)
-#define logout(fmt, ...) \
-fprintf(stderr, "nbd\t%-24s" fmt, __func__, ##__VA_ARGS__)
-#else
-#define logout(fmt, ...) ((void)0)
-#endif
-
 #define MAX_NBD_REQUESTS    16
 
 typedef struct {
diff --git a/block/nbd-client.c b/block/nbd-client.c
index 790ecc1ee1..f89a67c23b 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -1136,7 +1136,7 @@ static int nbd_client_connect(BlockDriverState *bs,
 }
 
 /* NBD handshake */
-logout("session init %s\n", export);
+trace_nbd_client_connect(export);
 qio_channel_set_blocking(QIO_CHANNEL(sioc), true, NULL);
 
 client->info.request_sizes = true;
@@ -1149,7 +1149,6 @@ static int nbd_client_connect(BlockDriverState *bs,
 g_free(client->info.x_dirty_bitmap);
 g_free(client->info.name);
 if (ret < 0) {
-logout("Failed to negotiate with the NBD server\n");
 object_unref(OBJECT(sioc));
 return ret;
 }
@@ -1187,7 +1186,8 @@ static int nbd_client_connect(BlockDriverState *bs,
 bdrv_inc_in_flight(bs);
 nbd_client_attach_aio_context(bs, bdrv_get_aio_context(bs));
 
-logout("Established connection with NBD server\n");
+trace_nbd_client_connect_success(export);
+
 return 0;
 
  fail:
diff --git a/block/trace-events b/block/trace-events
index eab51497fc..01fa5eb081 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -165,6 +165,8 @@ nbd_parse_blockstatus_compliance(const char *err) "ignoring extra data from non-
 nbd_structured_read_compliance(const char *type) "server sent non-compliant unaligned read %s chunk"
 nbd_read_reply_entry_fail(int ret, const char *err) "ret = %d, err: %s"
 nbd_co_request_fail(uint64_t from, uint32_t len, uint64_t handle, uint16_t flags, uint16_t type, const char *name, int ret, const char *err) "Request failed { .from = %" PRIu64", .len = %" PRIu32 ", .handle = %" PRIu64 ", .flags = 0x%" PRIx16 ", .type = %" PRIu16 " (%s) } ret = %d, err: %s"
+nbd_client_connect(const char *export_name) "export '%s'"
+nbd_client_connect_success(const char *export_name) "export '%s'"
 
 # ssh.c
 ssh_restart_coroutine(void *co) "co=%p"
-- 
2.18.0




[Qemu-devel] [PATCH 0/3] nbd: merge block/nbd.c and block/nbd-client.c

2019-06-11 Thread Vladimir Sementsov-Ogievskiy
Hi all!

I need some fields of BDRVNBDState to be available in the nbd-client.c
code for my nbd-reconnect series. This leads to the following idea:
there seems to be no actual benefit in splitting NBDClientSession
out of BDRVNBDState and nbd-client.c out of nbd.c. It only increases
confusion around the NBD client architecture and makes it different
from other formats. So, I propose to merge them back.

The only thing I am unsure about:
The NBD client block driver is called "nbd", so it seems logical to keep
the BDRVNBDState structure name and the block/nbd.c filename. But I can't
rename all nbd_client_* handlers to nbd_*, as they would start conflicting
with definitions in include/block/nbd.h (nbd_init for example). So,
for now I've kept the old names, which means some handlers are nbd_* and
some are nbd_client_*, which is definitely inconsistent. So, maybe they
should all become block_nbd_ or bdrv_nbd_ or nbddrv_ or something
like this to stress that this is the BlockDriver implementation, not
the nbd/client.c code (which is separate from the generic block layer).
Or keep them as is, a bit inconsistent. What do you think?

Vladimir Sementsov-Ogievskiy (3):
  block/nbd-client: drop stale logout
  block/nbd: merge nbd-client.* to nbd.c
  block/nbd: merge NBDClientSession struct back to BDRVNBDState

 block/nbd-client.h  |   72 ---
 block/nbd-client.c  | 1226 -
 block/nbd.c | 1282 +--
 block/Makefile.objs |2 +-
 block/trace-events  |4 +-
 5 files changed, 1251 insertions(+), 1335 deletions(-)
 delete mode 100644 block/nbd-client.h
 delete mode 100644 block/nbd-client.c

-- 
2.18.0




[Qemu-devel] [PATCH 2/3] block/nbd: merge nbd-client.* to nbd.c

2019-06-11 Thread Vladimir Sementsov-Ogievskiy
There is no reason to keep the implementation of the driver handlers
separate from the driver structure. We can get rid of the extra header file.

While here, fix comment style, restore forgotten comments for
NBD_FOREACH_REPLY_CHUNK and nbd_reply_chunk_iter_receive, and remove extra
includes.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/nbd-client.h  |   63 ---
 block/nbd-client.c  | 1226 -
 block/nbd.c | 1285 +--
 block/Makefile.objs |2 +-
 block/trace-events  |2 +-
 5 files changed, 1255 insertions(+), 1323 deletions(-)
 delete mode 100644 block/nbd-client.h
 delete mode 100644 block/nbd-client.c

diff --git a/block/nbd-client.h b/block/nbd-client.h
deleted file mode 100644
index 231dc13c48..00
--- a/block/nbd-client.h
+++ /dev/null
@@ -1,63 +0,0 @@
-#ifndef NBD_CLIENT_H
-#define NBD_CLIENT_H
-
-#include "qemu-common.h"
-#include "block/nbd.h"
-#include "block/block_int.h"
-#include "io/channel-socket.h"
-
-#define MAX_NBD_REQUESTS    16
-
-typedef struct {
-Coroutine *coroutine;
-uint64_t offset; /* original offset of the request */
-bool receiving; /* waiting for connection_co? */
-} NBDClientRequest;
-
-typedef struct NBDClientSession {
-QIOChannelSocket *sioc; /* The master data channel */
-QIOChannel *ioc; /* The current I/O channel which may differ (eg TLS) */
-NBDExportInfo info;
-
-CoMutex send_mutex;
-CoQueue free_sema;
-Coroutine *connection_co;
-int in_flight;
-
-NBDClientRequest requests[MAX_NBD_REQUESTS];
-NBDReply reply;
-BlockDriverState *bs;
-bool quit;
-} NBDClientSession;
-
-NBDClientSession *nbd_get_client_session(BlockDriverState *bs);
-
-int nbd_client_init(BlockDriverState *bs,
-SocketAddress *saddr,
-const char *export_name,
-QCryptoTLSCreds *tlscreds,
-const char *hostname,
-const char *x_dirty_bitmap,
-Error **errp);
-void nbd_client_close(BlockDriverState *bs);
-
-int nbd_client_co_pdiscard(BlockDriverState *bs, int64_t offset, int bytes);
-int nbd_client_co_flush(BlockDriverState *bs);
-int nbd_client_co_pwritev(BlockDriverState *bs, uint64_t offset,
-  uint64_t bytes, QEMUIOVector *qiov, int flags);
-int nbd_client_co_pwrite_zeroes(BlockDriverState *bs, int64_t offset,
-int bytes, BdrvRequestFlags flags);
-int nbd_client_co_preadv(BlockDriverState *bs, uint64_t offset,
- uint64_t bytes, QEMUIOVector *qiov, int flags);
-
-void nbd_client_detach_aio_context(BlockDriverState *bs);
-void nbd_client_attach_aio_context(BlockDriverState *bs,
-   AioContext *new_context);
-
-int coroutine_fn nbd_client_co_block_status(BlockDriverState *bs,
-bool want_zero,
-int64_t offset, int64_t bytes,
-int64_t *pnum, int64_t *map,
-BlockDriverState **file);
-
-#endif /* NBD_CLIENT_H */
diff --git a/block/nbd-client.c b/block/nbd-client.c
deleted file mode 100644
index f89a67c23b..00
--- a/block/nbd-client.c
+++ /dev/null
@@ -1,1226 +0,0 @@
-/*
- * QEMU Block driver for  NBD
- *
- * Copyright (C) 2016 Red Hat, Inc.
- * Copyright (C) 2008 Bull S.A.S.
- * Author: Laurent Vivier 
- *
- * Some parts:
- *Copyright (C) 2007 Anthony Liguori 
- *
- * Permission is hereby granted, free of charge, to any person obtaining a copy
- * of this software and associated documentation files (the "Software"), to deal
- * in the Software without restriction, including without limitation the rights
- * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
- * copies of the Software, and to permit persons to whom the Software is
- * furnished to do so, subject to the following conditions:
- *
- * The above copyright notice and this permission notice shall be included in
- * all copies or substantial portions of the Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
- * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
- * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
- * THE SOFTWARE.
- */
-
-#include "qemu/osdep.h"
-
-#include "trace.h"
-#include "qapi/error.h"
-#include "nbd-client.h"
-
-#define HANDLE_TO_INDEX(bs, handle) ((handle) ^ (uint64_t)(intptr_t)(bs))
-#define INDEX_TO_HANDLE(bs, index)  ((index)  ^ (uint64_t)(intptr_t)(bs))
-
-static void nbd_recv_coroutines_wake_all(NBDClientSession *s)
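
The HANDLE_TO_INDEX and INDEX_TO_HANDLE macros in the removed file apply
the same XOR in both directions. A minimal Python sketch of that round
trip follows. The bs_addr value is a made-up stand-in for the
BlockDriverState pointer, and the 64-bit mask mimics the uint64_t casts;
neither comes from the patch.

```python
MASK64 = (1 << 64) - 1


def index_to_handle(bs_addr, index):
    # Mirrors INDEX_TO_HANDLE: XOR the slot index with the pointer value.
    return (index ^ bs_addr) & MASK64


def handle_to_index(bs_addr, handle):
    # Mirrors HANDLE_TO_INDEX: the same XOR recovers the index.
    return (handle ^ bs_addr) & MASK64


bs_addr = 0x00007F3A5C001230  # hypothetical pointer value
round_trip = [handle_to_index(bs_addr, index_to_handle(bs_addr, i))
              for i in range(16)]  # 16 slots, as in MAX_NBD_REQUESTS
print(round_trip == list(range(16)))
```

Because XOR with a fixed value is its own inverse, every slot index
survives the conversion to a handle and back.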

Re: [Qemu-devel] [PATCH v6 5/7] qemu-coroutine-sleep: introduce qemu_co_sleep_wake

2019-06-11 Thread Vladimir Sementsov-Ogievskiy
11.06.2019 11:53, Kevin Wolf wrote:
> Am 07.06.2019 um 19:10 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> 07.06.2019 18:52, Vladimir Sementsov-Ogievskiy wrote:
>>> 07.06.2019 16:02, Kevin Wolf wrote:
 Am 07.06.2019 um 13:18 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 07.06.2019 10:57, Kevin Wolf wrote:
>> Am 11.04.2019 um 19:27 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>> Introduce a function to gracefully wake-up a coroutine, sleeping in
>>> qemu_co_sleep_ns() sleep.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>>
>> You can simply reenter the coroutine while it has yielded in
>> qemu_co_sleep_ns(). This is supported.
>
> No it doesn't. qemu_aio_coroutine_enter checks for scheduled field,
> and aborts if it is set.

 Ah, yes, it has been broken since commit

 I actually tried to fix it once, but it turned out more complicated and
 I think we found a different solution for the problem at hand:

   Subject: [PATCH for-2.11 0/4] Fix qemu-iotests failures
   Message-Id: <20171128154350.21504-1-kw...@redhat.com>

 In this case, I guess your approach with a new function to interrupt
 qemu_co_sleep_ns() is okay.

 Do we need to timer_del() when taking the shortcut? We don't necessarily
 reenter the coroutine immediately, but might only be scheduling it. In
 this case, the timer could fire before qemu_co_sleep_ns() has run and
 schedule the coroutine a second time
>>>
>>> No it will not, as we do cmpxchg, scheduled to NULL, so second call will do
>>> nothing..
>>>
>>> But it seems unsafe, as even coroutine pointer may be stale when we call
>>> qemu_co_sleep_wake second time. So, we possibly should remove timer, but ..
>>>
>>>    (ignoring co->scheduled again -
 maybe we should actually not do that in the timer callback path, but
 instead let it run into the assertion because it would be a bug for the
 timer callback to end up in this situation).

 Kevin

>>>
>>> Interesting, could there be a race condition, when we call qemu_co_sleep_wake,
>>> but co_sleep_cb already scheduled in some queue and will run soon? Then removing
>>> the timer will not help.
>>>
>>>
>>
>> Hmm, it's commented that timer_del is thread-safe..
>>
>> Hmm, so, if anyway want to return Timer pointer from qemu_co_sleep_ns, may be it's better
>> to just call timer_mod(ts, 0) to shorten waiting instead of cheating with .scheduled?
> 
> This is probably slower than timer_del() and directly entering the
> coroutine. Is there any advantage in using timer_mod()? I don't think
> messing with .scheduled is too bad as it's set in the function just
> below, so it pairs nicely enough.
> 

Ok, will try this variant too.


-- 
Best regards,
Vladimir
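
The "cmpxchg, scheduled to NULL" argument from earlier in the thread can
be modelled outside QEMU. The sketch below is a toy Python model, not the
QEMU implementation: SleepingCoroutine, its fields and the lock-based
compare-and-swap are all invented for illustration. It shows only why a
second wake-up attempt becomes a no-op once the first one has cleared the
marker.

```python
import threading


class SleepingCoroutine:
    """Toy stand-in for a coroutine parked in a sleep (hypothetical)."""

    def __init__(self):
        self.scheduled = "qemu_co_sleep_ns"  # marker set while sleeping
        self.wakeups = 0
        self._lock = threading.Lock()

    def cmpxchg_scheduled(self, expected, new):
        """Emulate an atomic compare-and-swap; return the old value."""
        with self._lock:
            old = self.scheduled
            if old == expected:
                self.scheduled = new
            return old


def wake(co):
    """Wake co only if it is still marked as sleeping."""
    old = co.cmpxchg_scheduled("qemu_co_sleep_ns", None)
    if old != "qemu_co_sleep_ns":
        return False  # someone else already woke it
    co.wakeups += 1
    return True


co = SleepingCoroutine()
first = wake(co)   # e.g. the explicit wake-up call
second = wake(co)  # e.g. the timer callback firing afterwards
print(first, second, co.wakeups)
```

The model says nothing about the stale-pointer concern raised in the same
message, which is about object lifetime rather than the marker.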


Re: [Qemu-devel] [PATCH] hax: Honor CPUState::halted

2019-06-11 Thread Paolo Bonzini
On 11/06/19 10:38, Philippe Mathieu-Daudé wrote:
> Cc'ing Paolo & Richard.
> 
> On 6/10/19 4:27 AM, Colin Xu wrote:
>> cc more.
>>
>> On 2019-06-10 10:19, Colin Xu wrote:
>>> QEMU tracks whether a vcpu is halted using CPUState::halted. E.g.,
>>> after initialization or reset, halted is 0 for the BSP (vcpu 0)
>>> and 1 for the APs (vcpu 1, 2, ...). A halted vcpu should not be
>>> handed to the hypervisor to run (e.g. hax_vcpu_run()).
>>>
>>> Under HAXM, Android Emulator sometimes boots into a "vcpu shutdown
>>> request" error while executing in SeaBIOS, with the HAXM driver
>>> logging a guest triple fault in vcpu 1, 2, ... at RIP 0x3. That is
>>> ultimately because the HAX accelerator asks HAXM to run those APs
>>> when they are still in the halted state.
>>>
>>> Normally, the vcpu thread for an AP will start by looping in
>>> qemu_wait_io_event(), until the BSP kicks it via a pair of IPIs
>>> (INIT followed by SIPI). But because the HAX accelerator does not
>>> honor cpu->halted, it allows the AP vcpu thread to proceed to
>>> hax_vcpu_run() as soon as it receives any kick, even if the kick
>>> does not come from the BSP. It turns out that emulator has a
>>> worker thread which periodically kicks every vcpu thread (possibly
>>> to collect CPU usage data), and if one of these kicks comes before
>>> those by the BSP, the AP will start execution from the wrong RIP,
>>> resulting in the aforementioned SMP boot failure.
>>>
>>> The solution is inspired by the KVM accelerator (credit to
>>> Chuanxiao Dong  for the pointer):
>>>
>>> 1. Get rid of questionable logic that unconditionally resets
>>>     cpu->halted before hax_vcpu_run(). Instead, only reset it at the
>>>     right moments (there are only a few "unhalt" events).
>>> 2. Add a check for cpu->halted before hax_vcpu_run().
>>>
>>> Note that although the non-Unrestricted Guest (!ug_platform) code
>>> path also forcibly resets cpu->halted, it is left untouched,
>>> because only the UG code path supports SMP guests.
>>>
>>> The patch is first merged to android emulator with Change-Id:
>>> I9c5752cc737fd305d7eace1768ea12a07309d716
>>>
>>> Cc: Yu Ning 
>>> Cc: Chuanxiao Dong 
>>> Signed-off-by: Colin Xu 
>>> ---
>>>   cpus.c    |  1 -
>>>   target/i386/hax-all.c | 36 ++--
>>>   2 files changed, 34 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/cpus.c b/cpus.c
>>> index ffc57119ca5e..c1a56cd9ab01 100644
>>> --- a/cpus.c
>>> +++ b/cpus.c
>>> @@ -1591,7 +1591,6 @@ static void *qemu_hax_cpu_thread_fn(void *arg)
>>>     cpu->thread_id = qemu_get_thread_id();
>>>   cpu->created = true;
>>> -    cpu->halted = 0;
>>>   current_cpu = cpu;
>>>     hax_init_vcpu(cpu);
>>> diff --git a/target/i386/hax-all.c b/target/i386/hax-all.c
>>> index 44b89c1d74ae..58a27b475ec8 100644
>>> --- a/target/i386/hax-all.c
>>> +++ b/target/i386/hax-all.c
>>> @@ -471,13 +471,35 @@ static int hax_vcpu_hax_exec(CPUArchState *env)
>>>   return 0;
>>>   }
>>>   -    cpu->halted = 0;
>>> -
>>>   if (cpu->interrupt_request & CPU_INTERRUPT_POLL) {
>>>   cpu->interrupt_request &= ~CPU_INTERRUPT_POLL;
>>>   apic_poll_irq(x86_cpu->apic_state);
>>>   }
>>>   +    /* After a vcpu is halted (either because it is an AP and has just been
>>> + * reset, or because it has executed the HLT instruction), it will not be
>>> + * run (hax_vcpu_run()) until it is unhalted. The next few if blocks check
>>> + * for events that may change the halted state of this vcpu:
>>> + *  a) Maskable interrupt, when RFLAGS.IF is 1;
>>> + * Note: env->eflags may not reflect the current RFLAGS state, because
>>> + *   it is not updated after each hax_vcpu_run(). We cannot afford
>>> + *   to fail to recognize any unhalt-by-maskable-interrupt event
>>> + *   (in which case the vcpu will halt forever), and yet we cannot
>>> + *   afford the overhead of hax_vcpu_sync_state(). The current
>>> + *   solution is to err on the side of caution and have the HLT
>>> + *   handler (see case HAX_EXIT_HLT below) unconditionally set the
>>> + *   IF_MASK bit in env->eflags, which, in effect, disables the
>>> + *   RFLAGS.IF check.
>>> + *  b) NMI;
>>> + *  c) INIT signal;
>>> + *  d) SIPI signal.
>>> + */
>>> +    if (((cpu->interrupt_request & CPU_INTERRUPT_HARD) &&
>>> + (env->eflags & IF_MASK)) ||
>>> +    (cpu->interrupt_request & CPU_INTERRUPT_NMI)) {
>>> +    cpu->halted = 0;
>>> +    }
>>> +
>>>   if (cpu->interrupt_request & CPU_INTERRUPT_INIT) {
>>>   DPRINTF("\nhax_vcpu_hax_exec: handling INIT for %d\n",
>>>   cpu->cpu_index);
>>> @@ -493,6 +515,16 @@ static int hax_vcpu_hax_exec(CPUArchState *env)
>>>   hax_vcpu_sync_state(env, 1);
>>>   }
>>>   +    if (cpu->halted) {
>>> +    /* If this vcpu is 

Re: [Qemu-devel] qgraph

2019-06-11 Thread Paolo Bonzini
On 11/06/19 10:56, Markus Armbruster wrote:
> Yes, this is how introspection (both QMP and QOM) is commonly used.
> Just keep in mind one difference: QMP is static, QOM is dynamic.
> 
> QMP being static means it's defined at compile time.  So is the value of
> query-qmp-schema.  Same QEMU build, same value.  This permits caching.
> 
> QOM being dynamic means to introspect an object's properties, you have
> to create it.  Worse, an object's properties may (in theory) change at
> any time.  *Properties*, not just property *values*.  In practice, I'd
> expect properties to change only at realize time.

Right, and we should move more towards class-based properties so that
the dynamic nature of QOM is only used for the bare minimum needed (e.g.
memory regions).

Paolo



Re: [Qemu-devel] [PATCH v3 1/4] net/announce: Allow optional list of interfaces

2019-06-11 Thread Dr. David Alan Gilbert
* Eric Blake (ebl...@redhat.com) wrote:
> On 6/10/19 1:43 PM, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" 
> > 
> > Allow the caller to restrict the set of interfaces that announces are
> > sent on.  The default is still to send on all interfaces.
> > 
> > e.g.
> > 
> >   { "execute": "announce-self", "arguments": { "initial": 50, "max": 550, 
> > "rounds": 5, "step": 50, "interfaces": ["vn2","vn1"] } }
> > 
> > This doesn't affect the behaviour of migration announcements.
> > 
> > Note: There's still only one timer for the qmp command, so that
> > performing an 'announce-self' on one list of interfaces followed
> > by another 'announce-self' on another list will stop the announces
> > on the existing set.
> > 
> > Signed-off-by: Dr. David Alan Gilbert 
> > ---
> 
> > +++ b/qapi/net.json
> > @@ -699,6 +699,9 @@
> >  #
> >  # @step: Delay increase (in ms) after each self-announcement attempt
> >  #
> > +# @interfaces: An optional list of interface names, which restrict the
> 
> restricts

Done

> > +#announcment to the listed interfaces. (Since 4.1)
> 
> announcement

Done

> > +#
> >  # Since: 4.0
> >  ##
> >  
> > @@ -706,7 +709,8 @@
> >'data': { 'initial': 'int',
> >  'max': 'int',
> >  'rounds': 'int',
> > -'step': 'int' } }
> > +'step': 'int',
> > +'*interfaces': ['str'] } }
> >  
> >  ##
> >  # @announce-self:
> > @@ -718,9 +722,10 @@
> >  #
> >  # Example:
> >  #
> > -# -> { "execute": "announce-self"
> > +# -> { "execute": "announce-self",
> 
> Embarrassing that we didn't notice that one earlier.

The way to avoid it I guess would be to parse the example code.

> >  #  "arguments": {
> > -#  "initial": 50, "max": 550, "rounds": 10, "step": 50 } }
> > +#  "initial": 50, "max": 550, "rounds": 10, "step": 50,
> > +#  "interfaces": ["vn2","vn3"] } }
> 
> Worth a space after the comma? Not required, but I think it looks nicer.

Added

> As I only focused on doc issues, I'll leave the full review to others.

Thanks,

Dave

> -- 
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.   +1-919-301-3226
> Virtualization:  qemu.org | libvirt.org
> 



--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



[Qemu-devel] [PATCH v2 01/42] decodetree: Fix comparison of Field

2019-06-11 Thread Peter Maydell
From: Richard Henderson 

Typo comparing the sign of the field, twice, instead of also comparing
the mask of the field (which itself encodes both position and length).

Reported-by: Peter Maydell 
Signed-off-by: Richard Henderson 
Message-id: 20190604154225.26992-1-richard.hender...@linaro.org
Reviewed-by: Peter Maydell 
Signed-off-by: Peter Maydell 
---
 scripts/decodetree.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/decodetree.py b/scripts/decodetree.py
index 81874e22cc7..d7a59d63ac3 100755
--- a/scripts/decodetree.py
+++ b/scripts/decodetree.py
@@ -184,7 +184,7 @@ class Field:
 return '{0}(insn, {1}, {2})'.format(extr, self.pos, self.len)
 
 def __eq__(self, other):
-return self.sign == other.sign and self.sign == other.sign
+return self.sign == other.sign and self.mask == other.mask
 
 def __ne__(self, other):
 return not self.__eq__(other)
-- 
2.20.1




[Qemu-devel] [PATCH v2 04/42] target/arm: Fix Cortex-R5F MVFR values

2019-06-11 Thread Peter Maydell
The Cortex-R5F initfn was not correctly setting up the MVFR
ID register values. Fill these in, since some subsequent patches
will use ID register checks rather than CPU feature bit checks.

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/cpu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index f70e07fd118..ac5adb81bf1 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -1607,6 +1607,8 @@ static void cortex_r5f_initfn(Object *obj)
 
 cortex_r5_initfn(obj);
 set_feature(&cpu->env, ARM_FEATURE_VFP3);
+cpu->isar.mvfr0 = 0x10110221;
+cpu->isar.mvfr1 = 0x0011;
 }
 
 static const ARMCPRegInfo cortexa8_cp_reginfo[] = {
-- 
2.20.1




[Qemu-devel] [PATCH v2 00/42] target/arm: Convert VFP decoder to decodetree

2019-06-11 Thread Peter Maydell
This patchset converts the Arm VFP instructions to use decodetree
instead of the current hand-written decode.

v2 has only very minor changes since v1:
 * patch 33 (VFP comparisons): added missing TCG frees
 * patch 39 (VJCVT): add back missing jscvt feature check

Patch 39 is the only one still in need of review.



Rest of the cover letter from v1 below, for further context:

We gain:
 * a more maintainable decoder which doesn't live in one big function
 * correct prioritization of UNDEF exceptions against "VFP disabled"
   exceptions and "M-profile lazy FP stacking" activity
 * significant reduction in the use of the "cpu_F0[sd]" and "cpu_F1[sd]"
   TCG globals. These are a relic of a much older translator and
   eventually we should try to get rid of them entirely
 * more accurate decode, UNDEFing some things we were incorrectly lax on
 * a fixed bug for VFP short-vector mixed vector/scalar VMLA/VMLS/VNMLA/VNMLS
   insns: we were incorrectly corrupting the scalar input operand
   in the process of performing the multiply-accumulate, so every
   element after the first was miscalculated
 * a fixed bug in the calculation of the next register number to use
   when VFP short-vector operations wrapped around the vector bank
 * decode which checks ID registers for "do we have D16-D31" rather
   than using "is this VFPv3" -- this means that Cortex-M4, -M33 and -R5F
   all now correctly give the guest only 16 Dregs rather than 32.
   (Note that the old decoder hides this UNDEF handling inside the
   VFP_DREG macros...)
 * the fused multiply-add insns now correctly UNDEF for attempts to
   use them as short-vector operations
 * short-vector functionality is only implemented if the ID registers
   say it should be (which in practice means "only Cortex-A8 or earlier");
   we continue to provide it in -cpu max for compatibility
 * VRINTR, VRINTZ and VRINTX are only provided in v8A and above
 * VFP related translation code split out into its own source file
 * the "is this special register present and accessible" check is
   now consistent between read and write

There is definitely scope for further cleanup:
 * the translate-vfp.inc.c could be further isolated into its
   own standalone .c file rather than being #included into translate.c
 * cpu_F0* are still used in parts of the Neon decode (and the
   iwmmxt code, alas)
 * I noticed some places doing a load-and-shift or load-modify-store
   sequence to update byte or halfword parts of float registers;
   these could be rewritten to do direct byte or halfword loads/stores
 * we could remove the remaining uses of tcg_gen_ld/st_f32()
   (in the Neon decode)
but at 42 patches this is already a pretty hefty patchset, so
I have deferred those to attack later once this has got in.

On the downside, there are more lines of code here, but some of
them we'll get back when we finish some of the cleanups noted
above, some are just copyright-and-license boilerplate, and I
think the rest are well invested in easier to modify code...

Patch 1 is Richard's recent decodetree script bugfix, which
is needed for the VFP decode to behave correctly.

Tested with RISU, a mixture of comparison against real Cortex-A7
and Cortex-A8 and against the old version of QEMU, plus some
smoke-testing of aarch32 system emulation.

thanks
-- PMM


Peter Maydell (41):
  target/arm: Add stubs for AArch32 VFP decodetree
  target/arm: Factor out VFP access checking code
  target/arm: Fix Cortex-R5F MVFR values
  target/arm: Explicitly enable VFP short-vectors for aarch32 -cpu max
  target/arm: Convert the VSEL instructions to decodetree
  target/arm: Convert VMINNM, VMAXNM to decodetree
  target/arm: Convert VRINTA/VRINTN/VRINTP/VRINTM to decodetree
  target/arm: Convert VCVTA/VCVTN/VCVTP/VCVTM to decodetree
  target/arm: Move the VFP trans_* functions to translate-vfp.inc.c
  target/arm: Add helpers for VFP register loads and stores
  target/arm: Convert "double-precision" register moves to decodetree
  target/arm: Convert "single-precision" register moves to decodetree
  target/arm: Convert VFP two-register transfer insns to decodetree
  target/arm: Convert VFP VLDR and VSTR to decodetree
  target/arm: Convert the VFP load/store multiple insns to decodetree
  target/arm: Remove VLDR/VSTR/VLDM/VSTM use of cpu_F0s and cpu_F0d
  target/arm: Convert VFP VMLA to decodetree
  target/arm: Convert VFP VMLS to decodetree
  target/arm: Convert VFP VNMLS to decodetree
  target/arm: Convert VFP VNMLA to decodetree
  target/arm: Convert VMUL to decodetree
  target/arm: Convert VNMUL to decodetree
  target/arm: Convert VADD to decodetree
  target/arm: Convert VSUB to decodetree
  target/arm: Convert VDIV to decodetree
  target/arm: Convert VFP fused multiply-add insns to decodetree
  target/arm: Convert VMOV (imm) to decodetree
  target/arm: Convert VABS to decodetree
  target/arm: Convert VNEG to decodetree
  target/arm: Convert VSQRT to decodetree
  target/arm: Convert VMOV (register) to decodetree
  target/arm:

[Qemu-devel] [PATCH v2 03/42] target/arm: Factor out VFP access checking code

2019-06-11 Thread Peter Maydell
Factor out the VFP access checking code so that we can use it in the
leaf functions of the decodetree decoder.

We call the function full_vfp_access_check() so we can keep
the more natural vfp_access_check() for a version which doesn't
have the 'ignore_vfp_enabled' flag -- that way almost all VFP
insns will be able to use vfp_access_check(s) and only the
special-register access function will have to use
full_vfp_access_check(s, ignore_vfp_enabled).

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/translate-vfp.inc.c | 100 
 target/arm/translate.c | 101 +
 2 files changed, 113 insertions(+), 88 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index 3447b3e6466..cf3d7febaa7 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -29,3 +29,103 @@
 /* Include the generated VFP decoder */
 #include "decode-vfp.inc.c"
 #include "decode-vfp-uncond.inc.c"
+
+/*
+ * Check that VFP access is enabled. If it is, do the necessary
+ * M-profile lazy-FP handling and then return true.
+ * If not, emit code to generate an appropriate exception and
+ * return false.
+ * The ignore_vfp_enabled argument specifies that we should ignore
+ * whether VFP is enabled via FPEXC[EN]: this should be true for FMXR/FMRX
+ * accesses to FPSID, FPEXC, MVFR0, MVFR1, MVFR2, and false for all other insns.
+ */
+static bool full_vfp_access_check(DisasContext *s, bool ignore_vfp_enabled)
+{
+if (s->fp_excp_el) {
+if (arm_dc_feature(s, ARM_FEATURE_M)) {
+gen_exception_insn(s, 4, EXCP_NOCP, syn_uncategorized(),
+   s->fp_excp_el);
+} else {
+gen_exception_insn(s, 4, EXCP_UDEF,
+   syn_fp_access_trap(1, 0xe, false),
+   s->fp_excp_el);
+}
+return false;
+}
+
+if (!s->vfp_enabled && !ignore_vfp_enabled) {
+assert(!arm_dc_feature(s, ARM_FEATURE_M));
+gen_exception_insn(s, 4, EXCP_UDEF, syn_uncategorized(),
+   default_exception_el(s));
+return false;
+}
+
+if (arm_dc_feature(s, ARM_FEATURE_M)) {
+/* Handle M-profile lazy FP state mechanics */
+
+/* Trigger lazy-state preservation if necessary */
+if (s->v7m_lspact) {
+/*
+ * Lazy state saving affects external memory and also the NVIC,
+ * so we must mark it as an IO operation for icount.
+ */
+if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
+gen_io_start();
+}
+gen_helper_v7m_preserve_fp_state(cpu_env);
+if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
+gen_io_end();
+}
+/*
+ * If the preserve_fp_state helper doesn't throw an exception
+ * then it will clear LSPACT; we don't need to repeat this for
+ * any further FP insns in this TB.
+ */
+s->v7m_lspact = false;
+}
+
+/* Update ownership of FP context: set FPCCR.S to match current state */
+if (s->v8m_fpccr_s_wrong) {
+TCGv_i32 tmp;
+
+tmp = load_cpu_field(v7m.fpccr[M_REG_S]);
+if (s->v8m_secure) {
+tcg_gen_ori_i32(tmp, tmp, R_V7M_FPCCR_S_MASK);
+} else {
+tcg_gen_andi_i32(tmp, tmp, ~R_V7M_FPCCR_S_MASK);
+}
+store_cpu_field(tmp, v7m.fpccr[M_REG_S]);
+/* Don't need to do this for any further FP insns in this TB */
+s->v8m_fpccr_s_wrong = false;
+}
+
+if (s->v7m_new_fp_ctxt_needed) {
+/*
+ * Create new FP context by updating CONTROL.FPCA, CONTROL.SFPA
+ * and the FPSCR.
+ */
+TCGv_i32 control, fpscr;
+uint32_t bits = R_V7M_CONTROL_FPCA_MASK;
+
+fpscr = load_cpu_field(v7m.fpdscr[s->v8m_secure]);
+gen_helper_vfp_set_fpscr(cpu_env, fpscr);
+tcg_temp_free_i32(fpscr);
+/*
+ * We don't need to arrange to end the TB, because the only
+ * parts of FPSCR which we cache in the TB flags are the VECLEN
+ * and VECSTRIDE, and those don't exist for M-profile.
+ */
+
+if (s->v8m_secure) {
+bits |= R_V7M_CONTROL_SFPA_MASK;
+}
+control = load_cpu_field(v7m.control[M_REG_S]);
+tcg_gen_ori_i32(control, control, bits);
+store_cpu_field(control, v7m.control[M_REG_S]);
+/* Don't need to do this for any further FP insns in this TB */
+s->v7m_new_fp_ctxt_needed = false;
+}
+}
+
+return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index c75d94952de..4ba3f1287ee 100644
--- a/target/arm/translate.c
+++ b/targ
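
The decision order in the factored-out check can be modelled roughly as below. This is a simplified sketch, not the QEMU code: the M-profile lazy-FP handling is left out entirely, the exception is returned as a string instead of being emitted as TCG code, and passing the DisasContext state as plain arguments is an assumption made for the example.

```python
def full_vfp_access_check(fp_excp_el, vfp_enabled, is_m_profile,
                          ignore_vfp_enabled):
    """Return (ok, exception_name) following the order of checks in the patch."""
    if fp_excp_el:
        # FP access traps to some exception level
        return (False, "EXCP_NOCP" if is_m_profile else "EXCP_UDEF")
    if not vfp_enabled and not ignore_vfp_enabled:
        # FPEXC.EN is clear and this insn is not exempt from that check
        return (False, "EXCP_UDEF")
    return (True, None)


def vfp_access_check(fp_excp_el, vfp_enabled, is_m_profile):
    """The common case: never ignore the VFP enable bit."""
    return full_vfp_access_check(fp_excp_el, vfp_enabled, is_m_profile, False)


# An ordinary insn with VFP disabled gets an UNDEF...
print(vfp_access_check(0, False, False))
# ...while a special-register access that ignores the enable bit is allowed
print(full_vfp_access_check(0, False, False, True))
```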

[Qemu-devel] [PATCH v2 05/42] target/arm: Explicitly enable VFP short-vectors for aarch32 -cpu max

2019-06-11 Thread Peter Maydell
At the moment our -cpu max for AArch32 supports VFP short-vectors
because we always implement them, even for CPUs which should
not have them. The following commits are going to switch to
using the correct ID-register-check to enable or disable short
vector support, so we need to turn it on explicitly for -cpu max,
because Cortex-A15 doesn't implement it.

We don't enable this for the AArch64 -cpu max, because the v8A
architecture never supports short-vectors.

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/cpu.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index ac5adb81bf1..cdd76c5 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -2021,6 +2021,10 @@ static void arm_max_initfn(Object *obj)
 kvm_arm_set_cpu_features_from_host(cpu);
 } else {
 cortex_a15_initfn(obj);
+
+/* old-style VFP short-vector support */
+cpu->isar.mvfr0 = FIELD_DP32(cpu->isar.mvfr0, MVFR0, FPSHVEC, 1);
+
 #ifdef CONFIG_USER_ONLY
 /* We don't set these in system emulation mode for the moment,
  * since we don't correctly set (all of) the ID registers to
-- 
2.20.1
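
A small sketch of what the FIELD_DP32() line does, assuming FPSHVEC occupies bits [27:24] of MVFR0. The deposit32() helper is a Python stand-in written for this example, and the starting register value is only an illustrative input, not something taken from this patch.

```python
def deposit32(value, start, length, field):
    """Return 'value' with 'length' bits at 'start' replaced by 'field'."""
    mask = ((1 << length) - 1) << start
    return (value & ~mask & 0xFFFFFFFF) | ((field << start) & mask)


# Assumed position of the FPSHVEC field within MVFR0
FPSHVEC_START = 24
FPSHVEC_LENGTH = 4

mvfr0 = 0x10110222  # illustrative starting value with FPSHVEC == 0
mvfr0_max = deposit32(mvfr0, FPSHVEC_START, FPSHVEC_LENGTH, 1)
print(hex(mvfr0_max))
```

Only the one field changes; every other ID field inherited from the base CPU is preserved.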




[Qemu-devel] [PATCH v2 02/42] target/arm: Add stubs for AArch32 VFP decodetree

2019-06-11 Thread Peter Maydell
Add the infrastructure for building and invoking a decodetree decoder
for the AArch32 VFP encodings.  At the moment the new decoder covers
nothing, so we always fall back to the existing hand-written decode.

We need to have one decoder for the unconditional insns and one for
the conditional insns, as otherwise the patterns for conditional
insns would incorrectly match against the unconditional ones too.

Since translate.c is over 14,000 lines long and we're going to be
touching pretty much every line of the VFP code as part of the
decodetree conversion, we create a new translate-vfp.inc.c to hold
the code which deals with VFP in the new scheme.  It should be
possible to convert this into a standalone translation unit
eventually, but the conversion process will be much simpler if we
simply #include it midway through translate.c to start with.

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/Makefile.objs   | 13 +
 target/arm/translate-vfp.inc.c | 31 +++
 target/arm/translate.c | 19 +++
 target/arm/vfp-uncond.decode   | 28 
 target/arm/vfp.decode  | 28 
 5 files changed, 119 insertions(+)
 create mode 100644 target/arm/translate-vfp.inc.c
 create mode 100644 target/arm/vfp-uncond.decode
 create mode 100644 target/arm/vfp.decode

diff --git a/target/arm/Makefile.objs b/target/arm/Makefile.objs
index 6bdcc65c2c8..dfa736a3752 100644
--- a/target/arm/Makefile.objs
+++ b/target/arm/Makefile.objs
@@ -19,5 +19,18 @@ target/arm/decode-sve.inc.c: $(SRC_PATH)/target/arm/sve.decode $(DECODETREE)
  $(PYTHON) $(DECODETREE) --decode disas_sve -o $@ $<,\
  "GEN", $(TARGET_DIR)$@)
 
+target/arm/decode-vfp.inc.c: $(SRC_PATH)/target/arm/vfp.decode $(DECODETREE)
+   $(call quiet-command,\
+ $(PYTHON) $(DECODETREE) --static-decode disas_vfp -o $@ $<,\
+ "GEN", $(TARGET_DIR)$@)
+
+target/arm/decode-vfp-uncond.inc.c: $(SRC_PATH)/target/arm/vfp-uncond.decode $(DECODETREE)
+   $(call quiet-command,\
+ $(PYTHON) $(DECODETREE) --static-decode disas_vfp_uncond -o $@ $<,\
+ "GEN", $(TARGET_DIR)$@)
+
 target/arm/translate-sve.o: target/arm/decode-sve.inc.c
+target/arm/translate.o: target/arm/decode-vfp.inc.c
+target/arm/translate.o: target/arm/decode-vfp-uncond.inc.c
+
 obj-$(TARGET_AARCH64) += translate-sve.o sve_helper.o
diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
new file mode 100644
index 000..3447b3e6466
--- /dev/null
+++ b/target/arm/translate-vfp.inc.c
@@ -0,0 +1,31 @@
+/*
+ *  ARM translation: AArch32 VFP instructions
+ *
+ *  Copyright (c) 2003 Fabrice Bellard
+ *  Copyright (c) 2005-2007 CodeSourcery
+ *  Copyright (c) 2007 OpenedHand, Ltd.
+ *  Copyright (c) 2019 Linaro, Ltd.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+/*
+ * This file is intended to be included from translate.c; it uses
+ * some macros and definitions provided by that file.
+ * It might be possible to convert it to a standalone .c file eventually.
+ */
+
+/* Include the generated VFP decoder */
+#include "decode-vfp.inc.c"
+#include "decode-vfp-uncond.inc.c"
diff --git a/target/arm/translate.c b/target/arm/translate.c
index d25e19ef113..c75d94952de 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -1727,6 +1727,9 @@ static inline void gen_mov_vreg_F0(int dp, int reg)
 
 #define ARM_CP_RW_BIT   (1 << 20)
 
+/* Include the VFP decoder */
+#include "translate-vfp.inc.c"
+
 static inline void iwmmxt_load_reg(TCGv_i64 var, int reg)
 {
 tcg_gen_ld_i64(var, cpu_env, offsetof(CPUARMState, iwmmxt.regs[reg]));
@@ -3384,6 +3387,22 @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
 return 1;
 }
 
+/*
+ * If the decodetree decoder handles this insn it will always
+ * emit code to either execute the insn or generate an appropriate
+ * exception; so we don't need to ever return non-zero to tell
+ * the calling code to emit an UNDEF exception.
+ */
+if (extract32(insn, 28, 4) == 0xf) {
+if (disas_vfp_uncond(s, insn)) {
+return 0;
+}
+} else {
+if (disas_vfp(s, insn)) {
+return 0;
+}
+}
+
 /* FIXME: this access check should not take precedence ov
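
The dispatch added to disas_vfp_insn() can be sketched as follows. This is a simplified model: in the patch the fallback is the remainder of the same C function rather than a separate callable, and the make_stub() helper and the sample instruction words are invented for this example.

```python
def extract32(value, start, length):
    """Return 'length' bits of 'value' starting at bit 'start'."""
    return (value >> start) & ((1 << length) - 1)


def disas_vfp_insn(insn, disas_vfp_uncond, disas_vfp, legacy_decode):
    """Try the matching decodetree decoder, then fall back to the old decode.

    Returns 0 if the insn was handled, otherwise whatever the legacy
    decoder returns (non-zero meaning UNDEF).
    """
    if extract32(insn, 28, 4) == 0xF:
        # Unconditional encoding space
        if disas_vfp_uncond(insn):
            return 0
    else:
        # Conditional encodings
        if disas_vfp(insn):
            return 0
    return legacy_decode(insn)


def make_stub(name, handled, log):
    """Build a fake decoder that records its name and returns 'handled'."""
    def decoder(insn):
        log.append(name)
        return handled
    return decoder


# With stub decoders that match nothing, everything reaches the old code
log = []
result = disas_vfp_insn(0xFE000A00,
                        make_stub("uncond", False, log),
                        make_stub("cond", False, log),
                        lambda insn: 1)
print(result, log)
```

Only one of the two generated decoders is ever consulted for a given insn, which is why the conditional patterns cannot accidentally match unconditional encodings.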

[Qemu-devel] [PATCH v2 06/42] target/arm: Convert the VSEL instructions to decodetree

2019-06-11 Thread Peter Maydell
Convert the VSEL instructions to decodetree.
We leave trans_VSEL() in translate.c for now as this allows
the patch to show just the changes from the old handle_vsel().

In the old code the check for "do D16-D31 exist" was hidden in
the VFP_DREG macro, and assumed that VFPv3 always implied that
D16-D31 exist. In the new code we do the correct ID register test.
This gives identical behaviour for most of our CPUs, and fixes
previously incorrect handling for Cortex-R5F, Cortex-M4 and
Cortex-M33, which all implement VFPv3 or better with only 16
double-precision registers.

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/cpu.h   |  6 ++
 target/arm/translate-vfp.inc.c |  9 +
 target/arm/translate.c | 35 --
 target/arm/vfp-uncond.decode   | 19 ++
 4 files changed, 59 insertions(+), 10 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 8fa9772c9da..c612901daeb 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -3372,6 +3372,12 @@ static inline bool isar_feature_aa32_fp16_arith(const ARMISARegisters *id)
 return FIELD_EX64(id->id_aa64pfr0, ID_AA64PFR0, FP) == 1;
 }
 
+static inline bool isar_feature_aa32_fp_d32(const ARMISARegisters *id)
+{
+/* Return true if D16-D31 are implemented */
+return FIELD_EX64(id->mvfr0, MVFR0, SIMDREG) >= 2;
+}
+
 /*
  * We always set the FP and SIMD FP16 fields to indicate identical
  * levels of support (assuming SIMD is implemented at all), so
diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index cf3d7febaa7..f7535138d0f 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -129,3 +129,12 @@ static bool full_vfp_access_check(DisasContext *s, bool ignore_vfp_enabled)
 
 return true;
 }
+
+/*
+ * The most usual kind of VFP access check, for everything except
+ * FMXR/FMRX to the always-available special registers.
+ */
+static bool vfp_access_check(DisasContext *s)
+{
+return full_vfp_access_check(s, false);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 4ba3f1287ee..6ee60303eeb 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -3076,10 +3076,27 @@ static void gen_neon_dup_high16(TCGv_i32 var)
 tcg_temp_free_i32(tmp);
 }
 
-static int handle_vsel(uint32_t insn, uint32_t rd, uint32_t rn, uint32_t rm,
-   uint32_t dp)
+static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
 {
-uint32_t cc = extract32(insn, 20, 2);
+uint32_t rd, rn, rm;
+bool dp = a->dp;
+
+if (!dc_isar_feature(aa32_vsel, s)) {
+return false;
+}
+
+/* UNDEF accesses to D16-D31 if they don't exist */
+if (dp && !dc_isar_feature(aa32_fp_d32, s) &&
+((a->vm | a->vn | a->vd) & 0x10)) {
+return false;
+}
+rd = a->vd;
+rn = a->vn;
+rm = a->vm;
+
+if (!vfp_access_check(s)) {
+return true;
+}
 
 if (dp) {
 TCGv_i64 frn, frm, dest;
@@ -3101,7 +3118,7 @@ static int handle_vsel(uint32_t insn, uint32_t rd, uint32_t rn, uint32_t rm,
 
 tcg_gen_ld_f64(frn, cpu_env, vfp_reg_offset(dp, rn));
 tcg_gen_ld_f64(frm, cpu_env, vfp_reg_offset(dp, rm));
-switch (cc) {
+switch (a->cc) {
 case 0: /* eq: Z */
 tcg_gen_movcond_i64(TCG_COND_EQ, dest, zf, zero,
 frn, frm);
@@ -3148,7 +3165,7 @@ static int handle_vsel(uint32_t insn, uint32_t rd, uint32_t rn, uint32_t rm,
 dest = tcg_temp_new_i32();
 tcg_gen_ld_f32(frn, cpu_env, vfp_reg_offset(dp, rn));
 tcg_gen_ld_f32(frm, cpu_env, vfp_reg_offset(dp, rm));
-switch (cc) {
+switch (a->cc) {
 case 0: /* eq: Z */
 tcg_gen_movcond_i32(TCG_COND_EQ, dest, cpu_ZF, zero,
 frn, frm);
@@ -3182,7 +3199,7 @@ static int handle_vsel(uint32_t insn, uint32_t rd, uint32_t rn, uint32_t rm,
 tcg_temp_free_i32(zero);
 }
 
-return 0;
+return true;
 }
 
 static int handle_vminmaxnm(uint32_t insn, uint32_t rd, uint32_t rn,
@@ -3354,10 +3371,8 @@ static int disas_vfp_misc_insn(DisasContext *s, uint32_t insn)
 rm = VFP_SREG_M(insn);
 }
 
-if ((insn & 0x0f800e50) == 0x0e000a00 && dc_isar_feature(aa32_vsel, s)) {
-return handle_vsel(insn, rd, rn, rm, dp);
-} else if ((insn & 0x0fb00e10) == 0x0e800a00 &&
-   dc_isar_feature(aa32_vminmaxnm, s)) {
+if ((insn & 0x0fb00e10) == 0x0e800a00 &&
+dc_isar_feature(aa32_vminmaxnm, s)) {
 return handle_vminmaxnm(insn, rd, rn, rm, dp);
 } else if ((insn & 0x0fbc0ed0) == 0x0eb80a40 &&
dc_isar_feature(aa32_vrint, s)) {
diff --git a/target/arm/vfp-uncond.decode b/target/arm/vfp-uncond.decode
index b1d9dc507c2..b7f7c27fe86 100644
--- a/target/arm/vfp-uncond.decode
+++ b/target/arm/vfp-uncond.decode
@@ -26,3 +26,22 @@
 #   1110    101.  .
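
The "do D16-D31 exist" test that each converted trans function now performs can be sketched like this. The function name and the variadic register list are assumptions for the example; the condition itself mirrors the one in trans_VSEL() above.

```python
def dreg_access_undefs(dp, has_d32, *regs):
    """True if a double-precision insn names D16-D31 on a CPU without them.

    'regs' are the decoded register numbers (vd, vn, vm as applicable);
    bit 4 set in any of them means a register in the D16-D31 range.
    """
    combined = 0
    for reg in regs:
        combined |= reg
    return bool(dp and not has_d32 and (combined & 0x10))


# D17 used on a CPU with only 16 D registers: must UNDEF
print(dreg_access_undefs(True, False, 3, 17, 0))
# Same registers on a CPU with all 32: fine
print(dreg_access_undefs(True, True, 3, 17, 0))
```

ORing the register numbers together before testing bit 4 checks all operands with a single comparison.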

[Qemu-devel] [PATCH v2 19/42] target/arm: Convert VFP VMLS to decodetree

2019-06-11 Thread Peter Maydell
Convert the VFP VMLS instruction to decodetree.

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/translate-vfp.inc.c | 38 ++
 target/arm/translate.c |  8 +--
 target/arm/vfp.decode  |  5 +
 3 files changed, 44 insertions(+), 7 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index 4f922dc8405..00f64401dda 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -1303,3 +1303,41 @@ static bool trans_VMLA_dp(DisasContext *s, arg_VMLA_sp *a)
 {
 return do_vfp_3op_dp(s, gen_VMLA_dp, a->vd, a->vn, a->vm, true);
 }
+
+static void gen_VMLS_sp(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst)
+{
+/*
+ * VMLS: vd = vd + -(vn * vm)
+ * Note that order of inputs to the add matters for NaNs.
+ */
+TCGv_i32 tmp = tcg_temp_new_i32();
+
+gen_helper_vfp_muls(tmp, vn, vm, fpst);
+gen_helper_vfp_negs(tmp, tmp);
+gen_helper_vfp_adds(vd, vd, tmp, fpst);
+tcg_temp_free_i32(tmp);
+}
+
+static bool trans_VMLS_sp(DisasContext *s, arg_VMLS_sp *a)
+{
+return do_vfp_3op_sp(s, gen_VMLS_sp, a->vd, a->vn, a->vm, true);
+}
+
+static void gen_VMLS_dp(TCGv_i64 vd, TCGv_i64 vn, TCGv_i64 vm, TCGv_ptr fpst)
+{
+/*
+ * VMLS: vd = vd + -(vn * vm)
+ * Note that order of inputs to the add matters for NaNs.
+ */
+TCGv_i64 tmp = tcg_temp_new_i64();
+
+gen_helper_vfp_muld(tmp, vn, vm, fpst);
+gen_helper_vfp_negd(tmp, tmp);
+gen_helper_vfp_addd(vd, vd, tmp, fpst);
+tcg_temp_free_i64(tmp);
+}
+
+static bool trans_VMLS_dp(DisasContext *s, arg_VMLS_sp *a)
+{
+return do_vfp_3op_dp(s, gen_VMLS_dp, a->vd, a->vn, a->vm, true);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 4e40a8562c4..bddc0d20447 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -3134,7 +3134,7 @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
 rn = VFP_SREG_N(insn);
 
 switch (op) {
-case 0:
+case 0 ... 1:
 /* Already handled by decodetree */
 return 1;
 default:
@@ -3320,12 +3320,6 @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
 for (;;) {
 /* Perform the calculation.  */
 switch (op) {
-case 1: /* VMLS: fd + -(fn * fm) */
-gen_vfp_mul(dp);
-gen_vfp_F1_neg(dp);
-gen_mov_F0_vreg(dp, rd);
-gen_vfp_add(dp);
-break;
 case 2: /* VNMLS: -fd + (fn * fm) */
 /* Note that it isn't valid to replace (-A + B) with (B - A)
  * or similar plausible looking simplifications
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index 9530e17ae02..7bcf2260eec 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -102,3 +102,8 @@ VMLA_sp   1110 0.00   1010 .0.0  \
  vm=%vm_sp vn=%vn_sp vd=%vd_sp
 VMLA_dp   1110 0.00   1011 .0.0  \
  vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
+VMLS_sp   1110 0.00   1010 .1.0  \
+ vm=%vm_sp vn=%vn_sp vd=%vd_sp
+VMLS_dp   1110 0.00   1011 .1.0  \
+ vm=%vm_dp vn=%vn_dp vd=%vd_dp
-- 
2.20.1
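
A minimal arithmetic sketch of the sequence gen_VMLS_dp() emits: multiply, negate, then add with vd as the first addend. Python floats are IEEE doubles, so this only approximates the double-precision variant, and it does not model the float_status flags or the NaN propagation rules the comment in the patch refers to.

```python
def vmls(vd, vn, vm):
    """Non-fused VMLS model: vd + -(vn * vm), with a rounding after each step."""
    tmp = vn * vm      # corresponds to gen_helper_vfp_muld
    tmp = -tmp         # corresponds to gen_helper_vfp_negd
    return vd + tmp    # corresponds to gen_helper_vfp_addd, vd first


print(vmls(10.0, 2.0, 3.0))
```

Because the product is rounded before the add, this is not the same operation as the fused multiply-add insns converted later in the series.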




[Qemu-devel] [PATCH v2 07/42] target/arm: Convert VMINNM, VMAXNM to decodetree

2019-06-11 Thread Peter Maydell
Convert the VMINNM and VMAXNM instructions to decodetree.
As with VSEL, we leave the trans_VMINMAXNM() function
in translate.c for the moment.

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/translate.c   | 41 
 target/arm/vfp-uncond.decode |  5 +
 2 files changed, 33 insertions(+), 13 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index 6ee60303eeb..53badde1f52 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -3202,11 +3202,31 @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
 return true;
 }
 
-static int handle_vminmaxnm(uint32_t insn, uint32_t rd, uint32_t rn,
-uint32_t rm, uint32_t dp)
+static bool trans_VMINMAXNM(DisasContext *s, arg_VMINMAXNM *a)
 {
-uint32_t vmin = extract32(insn, 6, 1);
-TCGv_ptr fpst = get_fpstatus_ptr(0);
+uint32_t rd, rn, rm;
+bool dp = a->dp;
+bool vmin = a->op;
+TCGv_ptr fpst;
+
+if (!dc_isar_feature(aa32_vminmaxnm, s)) {
+return false;
+}
+
+/* UNDEF accesses to D16-D31 if they don't exist */
+if (dp && !dc_isar_feature(aa32_fp_d32, s) &&
+((a->vm | a->vn | a->vd) & 0x10)) {
+return false;
+}
+rd = a->vd;
+rn = a->vn;
+rm = a->vm;
+
+if (!vfp_access_check(s)) {
+return true;
+}
+
+fpst = get_fpstatus_ptr(0);
 
 if (dp) {
 TCGv_i64 frn, frm, dest;
@@ -3247,7 +3267,7 @@ static int handle_vminmaxnm(uint32_t insn, uint32_t rd, uint32_t rn,
 }
 
 tcg_temp_free_ptr(fpst);
-return 0;
+return true;
 }
 
 static int handle_vrint(uint32_t insn, uint32_t rd, uint32_t rm, uint32_t dp,
@@ -3359,23 +3379,18 @@ static const uint8_t fp_decode_rm[] = {
 
 static int disas_vfp_misc_insn(DisasContext *s, uint32_t insn)
 {
-uint32_t rd, rn, rm, dp = extract32(insn, 8, 1);
+uint32_t rd, rm, dp = extract32(insn, 8, 1);
 
 if (dp) {
 VFP_DREG_D(rd, insn);
-VFP_DREG_N(rn, insn);
 VFP_DREG_M(rm, insn);
 } else {
 rd = VFP_SREG_D(insn);
-rn = VFP_SREG_N(insn);
 rm = VFP_SREG_M(insn);
 }
 
-if ((insn & 0x0fb00e10) == 0x0e800a00 &&
-dc_isar_feature(aa32_vminmaxnm, s)) {
-return handle_vminmaxnm(insn, rd, rn, rm, dp);
-} else if ((insn & 0x0fbc0ed0) == 0x0eb80a40 &&
-   dc_isar_feature(aa32_vrint, s)) {
+if ((insn & 0x0fbc0ed0) == 0x0eb80a40 &&
+dc_isar_feature(aa32_vrint, s)) {
 /* VRINTA, VRINTN, VRINTP, VRINTM */
 int rounding = fp_decode_rm[extract32(insn, 16, 2)];
 return handle_vrint(insn, rd, rm, dp, rounding);
diff --git a/target/arm/vfp-uncond.decode b/target/arm/vfp-uncond.decode
index b7f7c27fe86..8ab201fa058 100644
--- a/target/arm/vfp-uncond.decode
+++ b/target/arm/vfp-uncond.decode
@@ -45,3 +45,8 @@ VSEL 1110 0. cc:2   1010 .0.0  \
 vm=%vm_sp vn=%vn_sp vd=%vd_sp dp=0
 VSEL 1110 0. cc:2   1011 .0.0  \
 vm=%vm_dp vn=%vn_dp vd=%vd_dp dp=1
+
+VMINMAXNM    1110 1.00   1010 . op:1 .0  \
+vm=%vm_sp vn=%vn_sp vd=%vd_sp dp=0
+VMINMAXNM    1110 1.00   1011 . op:1 .0  \
+vm=%vm_dp vn=%vn_dp vd=%vd_dp dp=1
-- 
2.20.1
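
To show what the old hand-written match was testing, here is a sketch that applies the removed mask/value pair and then reads the two bits the decodetree patterns now name (op at bit 6, and bit 8 selecting double precision). The sample instruction words are hand-built for illustration, with all register fields zero; the function name is made up.

```python
def extract32(value, start, length):
    """Return 'length' bits of 'value' starting at bit 'start'."""
    return (value >> start) & ((1 << length) - 1)


def decode_vminmaxnm(insn):
    """Return (mnemonic, dp) if insn matches the old VMINNM/VMAXNM test."""
    if (insn & 0x0FB00E10) != 0x0E800A00:
        return None
    dp = bool(extract32(insn, 8, 1))     # 1011 vs 1010 in bits [11:8]
    vmin = bool(extract32(insn, 6, 1))   # the 'op' field
    return ("VMINNM" if vmin else "VMAXNM", dp)


print(decode_vminmaxnm(0xFE800A00))
print(decode_vminmaxnm(0xFE800B40))
```

Note that the old mask did not test the top nibble itself; it relied on being reached only from the unconditional path, which the separate vfp-uncond.decode file now makes explicit.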




[Qemu-devel] [PATCH v2 08/42] target/arm: Convert VRINTA/VRINTN/VRINTP/VRINTM to decodetree

2019-06-11 Thread Peter Maydell
Convert the VRINTA/VRINTN/VRINTP/VRINTM instructions to decodetree.
Again, trans_VRINT() is temporarily left in translate.c.

Signed-off-by: Peter Maydell
Reviewed-by: Richard Henderson
---
 target/arm/translate.c   | 60 +++-
 target/arm/vfp-uncond.decode |  5 +++
 2 files changed, 43 insertions(+), 22 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index 53badde1f52..1f106645bca 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -3270,11 +3270,43 @@ static bool trans_VMINMAXNM(DisasContext *s, arg_VMINMAXNM *a)
 return true;
 }
 
-static int handle_vrint(uint32_t insn, uint32_t rd, uint32_t rm, uint32_t dp,
-int rounding)
+/*
+ * Table for converting the most common AArch32 encoding of
+ * rounding mode to arm_fprounding order (which matches the
+ * common AArch64 order); see ARM ARM pseudocode FPDecodeRM().
+ */
+static const uint8_t fp_decode_rm[] = {
+FPROUNDING_TIEAWAY,
+FPROUNDING_TIEEVEN,
+FPROUNDING_POSINF,
+FPROUNDING_NEGINF,
+};
+
+static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
 {
-TCGv_ptr fpst = get_fpstatus_ptr(0);
+uint32_t rd, rm;
+bool dp = a->dp;
+TCGv_ptr fpst;
 TCGv_i32 tcg_rmode;
+int rounding = fp_decode_rm[a->rm];
+
+if (!dc_isar_feature(aa32_vrint, s)) {
+return false;
+}
+
+/* UNDEF accesses to D16-D31 if they don't exist */
+if (dp && !dc_isar_feature(aa32_fp_d32, s) &&
+((a->vm | a->vd) & 0x10)) {
+return false;
+}
+rd = a->vd;
+rm = a->vm;
+
+if (!vfp_access_check(s)) {
+return true;
+}
+
+fpst = get_fpstatus_ptr(0);
 
 tcg_rmode = tcg_const_i32(arm_rmode_to_sf(rounding));
 gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
@@ -3305,7 +3337,7 @@ static int handle_vrint(uint32_t insn, uint32_t rd, uint32_t rm, uint32_t dp,
 tcg_temp_free_i32(tcg_rmode);
 
 tcg_temp_free_ptr(fpst);
-return 0;
+return true;
 }
 
 static int handle_vcvt(uint32_t insn, uint32_t rd, uint32_t rm, uint32_t dp,
@@ -3366,17 +3398,6 @@ static int handle_vcvt(uint32_t insn, uint32_t rd, uint32_t rm, uint32_t dp,
 return 0;
 }
 
-/* Table for converting the most common AArch32 encoding of
- * rounding mode to arm_fprounding order (which matches the
- * common AArch64 order); see ARM ARM pseudocode FPDecodeRM().
- */
-static const uint8_t fp_decode_rm[] = {
-FPROUNDING_TIEAWAY,
-FPROUNDING_TIEEVEN,
-FPROUNDING_POSINF,
-FPROUNDING_NEGINF,
-};
-
 static int disas_vfp_misc_insn(DisasContext *s, uint32_t insn)
 {
 uint32_t rd, rm, dp = extract32(insn, 8, 1);
@@ -3389,13 +3410,8 @@ static int disas_vfp_misc_insn(DisasContext *s, uint32_t insn)
 rm = VFP_SREG_M(insn);
 }
 
-if ((insn & 0x0fbc0ed0) == 0x0eb80a40 &&
-dc_isar_feature(aa32_vrint, s)) {
-/* VRINTA, VRINTN, VRINTP, VRINTM */
-int rounding = fp_decode_rm[extract32(insn, 16, 2)];
-return handle_vrint(insn, rd, rm, dp, rounding);
-} else if ((insn & 0x0fbc0e50) == 0x0ebc0a40 &&
-   dc_isar_feature(aa32_vcvt_dr, s)) {
+if ((insn & 0x0fbc0e50) == 0x0ebc0a40 &&
+dc_isar_feature(aa32_vcvt_dr, s)) {
 /* VCVTA, VCVTN, VCVTP, VCVTM */
 int rounding = fp_decode_rm[extract32(insn, 16, 2)];
 return handle_vcvt(insn, rd, rm, dp, rounding);
diff --git a/target/arm/vfp-uncond.decode b/target/arm/vfp-uncond.decode
index 8ab201fa058..0aa83285de2 100644
--- a/target/arm/vfp-uncond.decode
+++ b/target/arm/vfp-uncond.decode
@@ -50,3 +50,8 @@ VMINMAXNM    1110 1.00   1010 . op:1 .0  \
 vm=%vm_sp vn=%vn_sp vd=%vd_sp dp=0
 VMINMAXNM    1110 1.00   1011 . op:1 .0  \
 vm=%vm_dp vn=%vn_dp vd=%vd_dp dp=1
+
+VRINT    1110 1.11 10 rm:2  1010 01.0  \
+vm=%vm_sp vd=%vd_sp dp=0
+VRINT    1110 1.11 10 rm:2  1011 01.0  \
+vm=%vm_dp vd=%vd_dp dp=1
-- 
2.20.1
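
The fp_decode_rm[] table maps the 2-bit rm field to a rounding mode. The sketch below models the four resulting round-to-integral behaviours using Python's decimal module; mapping the FPROUNDING_* constants to decimal rounding modes is my own analogy for illustration, and floating-point exception flags are not modelled.

```python
from decimal import (Decimal, ROUND_HALF_UP, ROUND_HALF_EVEN,
                     ROUND_CEILING, ROUND_FLOOR)

# Index is the rm field; order matches fp_decode_rm[] in the patch:
# TIEAWAY (VRINTA), TIEEVEN (VRINTN), POSINF (VRINTP), NEGINF (VRINTM)
FP_DECODE_RM = [ROUND_HALF_UP, ROUND_HALF_EVEN, ROUND_CEILING, ROUND_FLOOR]


def vrint(x, rm):
    """Round x to an integral floating-point value using rounding mode rm."""
    # Decimal(x) converts the float exactly, so ties are detected reliably
    return float(Decimal(x).to_integral_value(rounding=FP_DECODE_RM[rm]))


for rm, name in enumerate(["VRINTA", "VRINTN", "VRINTP", "VRINTM"]):
    print(name, vrint(2.5, rm), vrint(-2.5, rm))
```

ROUND_HALF_UP in the decimal module rounds ties away from zero, which is why it stands in for TIEAWAY here.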




[Qemu-devel] [PATCH v2 09/42] target/arm: Convert VCVTA/VCVTN/VCVTP/VCVTM to decodetree

2019-06-11 Thread Peter Maydell
Convert the VCVTA/VCVTN/VCVTP/VCVTM instructions to decodetree.
trans_VCVT() is temporarily left in translate.c.

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/translate.c   | 72 +---
 target/arm/vfp-uncond.decode |  6 +++
 2 files changed, 39 insertions(+), 39 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index 1f106645bca..6da472dbca8 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -3340,12 +3340,31 @@ static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
 return true;
 }
 
-static int handle_vcvt(uint32_t insn, uint32_t rd, uint32_t rm, uint32_t dp,
-   int rounding)
+static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
 {
-bool is_signed = extract32(insn, 7, 1);
-TCGv_ptr fpst = get_fpstatus_ptr(0);
+uint32_t rd, rm;
+bool dp = a->dp;
+TCGv_ptr fpst;
 TCGv_i32 tcg_rmode, tcg_shift;
+int rounding = fp_decode_rm[a->rm];
+bool is_signed = a->op;
+
+if (!dc_isar_feature(aa32_vcvt_dr, s)) {
+return false;
+}
+
+/* UNDEF accesses to D16-D31 if they don't exist */
+if (dp && !dc_isar_feature(aa32_fp_d32, s) && (a->vm & 0x10)) {
+return false;
+}
+rd = a->vd;
+rm = a->vm;
+
+if (!vfp_access_check(s)) {
+return true;
+}
+
+fpst = get_fpstatus_ptr(0);
 
 tcg_shift = tcg_const_i32(0);
 
@@ -3355,10 +3374,6 @@ static int handle_vcvt(uint32_t insn, uint32_t rd, uint32_t rm, uint32_t dp,
 if (dp) {
 TCGv_i64 tcg_double, tcg_res;
 TCGv_i32 tcg_tmp;
-/* Rd is encoded as a single precision register even when the source
- * is double precision.
- */
-rd = ((rd << 1) & 0x1e) | ((rd >> 4) & 0x1);
 tcg_double = tcg_temp_new_i64();
 tcg_res = tcg_temp_new_i64();
 tcg_tmp = tcg_temp_new_i32();
@@ -3395,28 +3410,7 @@ static int handle_vcvt(uint32_t insn, uint32_t rd, uint32_t rm, uint32_t dp,
 
 tcg_temp_free_ptr(fpst);
 
-return 0;
-}
-
-static int disas_vfp_misc_insn(DisasContext *s, uint32_t insn)
-{
-uint32_t rd, rm, dp = extract32(insn, 8, 1);
-
-if (dp) {
-VFP_DREG_D(rd, insn);
-VFP_DREG_M(rm, insn);
-} else {
-rd = VFP_SREG_D(insn);
-rm = VFP_SREG_M(insn);
-}
-
-if ((insn & 0x0fbc0e50) == 0x0ebc0a40 &&
-dc_isar_feature(aa32_vcvt_dr, s)) {
-/* VCVTA, VCVTN, VCVTP, VCVTM */
-int rounding = fp_decode_rm[extract32(insn, 16, 2)];
-return handle_vcvt(insn, rd, rm, dp, rounding);
-}
-return 1;
+return true;
 }
 
 /*
@@ -3452,6 +3446,15 @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
 }
 }
 
+if (extract32(insn, 28, 4) == 0xf) {
+/*
+ * Encodings with T=1 (Thumb) or unconditional (ARM): these
+ * were all handled by the decodetree decoder, so any insn
+ * patterns which get here must be UNDEF.
+ */
+return 1;
+}
+
 /*
  * FIXME: this access check should not take precedence over UNDEF
  * for invalid encodings; we will generate incorrect syndrome information
@@ -3468,15 +3471,6 @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
 return 0;
 }
 
-if (extract32(insn, 28, 4) == 0xf) {
-/*
- * Encodings with T=1 (Thumb) or unconditional (ARM):
- * only used for the "miscellaneous VFP features" added in v8A
- * and v7M (and gated on the MVFR2.FPMisc field).
- */
-return disas_vfp_misc_insn(s, insn);
-}
-
 dp = ((insn & 0xf00) == 0xb00);
 switch ((insn >> 24) & 0xf) {
 case 0xe:
diff --git a/target/arm/vfp-uncond.decode b/target/arm/vfp-uncond.decode
index 0aa83285de2..5af1f2ee664 100644
--- a/target/arm/vfp-uncond.decode
+++ b/target/arm/vfp-uncond.decode
@@ -55,3 +55,9 @@ VRINT    1110 1.11 10 rm:2  1010 01.0  \
 vm=%vm_sp vd=%vd_sp dp=0
 VRINT    1110 1.11 10 rm:2  1011 01.0  \
 vm=%vm_dp vd=%vd_dp dp=1
+
+# VCVT float to int with specified rounding mode; Vd is always single-precision
+VCVT 1110 1.11 11 rm:2  1010 op:1 1.0  \
+vm=%vm_sp vd=%vd_sp dp=0
+VCVT 1110 1.11 11 rm:2  1011 op:1 1.0  \
+vm=%vm_dp vd=%vd_sp dp=1
-- 
2.20.1




[Qemu-devel] [PATCH v2 21/42] target/arm: Convert VFP VNMLA to decodetree

2019-06-11 Thread Peter Maydell
Convert the VFP VNMLA instruction to decodetree.
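
As a rough illustration of the operation order gen_VNMLA_sp() uses in
this patch (multiply, negate the product, negate the destination, then
add), here is a minimal sketch in plain C. It uses host float arithmetic
rather than QEMU's softfloat helpers, and toy_vnmla_sp is a made-up name,
so it only models the arithmetic for ordinary values and says nothing
about NaN handling or rounding modes.

```c
/*
 * Toy model of VNMLA: -fd + -(fn * fm), using host floats.
 * toy_vnmla_sp is a hypothetical name for illustration only; QEMU
 * emits TCG ops that call softfloat helpers instead.
 */
static float toy_vnmla_sp(float fd, float fn, float fm)
{
    float tmp = fn * fm;   /* mirrors gen_helper_vfp_muls(tmp, vn, vm, fpst) */

    tmp = -tmp;            /* mirrors gen_helper_vfp_negs(tmp, tmp) */
    fd = -fd;              /* mirrors gen_helper_vfp_negs(vd, vd) */
    return fd + tmp;       /* mirrors gen_helper_vfp_adds(vd, vd, tmp, fpst) */
}
```

For example, toy_vnmla_sp(1.0f, 2.0f, 3.0f) evaluates -1 + -(6).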

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/translate-vfp.inc.c | 34 ++
 target/arm/translate.c | 19 +--
 target/arm/vfp.decode  |  5 +
 3 files changed, 40 insertions(+), 18 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index 1d7100debe4..8532bf4abcd 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -1383,3 +1383,37 @@ static bool trans_VNMLS_dp(DisasContext *s, arg_VNMLS_sp *a)
 {
 return do_vfp_3op_dp(s, gen_VNMLS_dp, a->vd, a->vn, a->vm, true);
 }
+
+static void gen_VNMLA_sp(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst)
+{
+/* VNMLA: -fd + -(fn * fm) */
+TCGv_i32 tmp = tcg_temp_new_i32();
+
+gen_helper_vfp_muls(tmp, vn, vm, fpst);
+gen_helper_vfp_negs(tmp, tmp);
+gen_helper_vfp_negs(vd, vd);
+gen_helper_vfp_adds(vd, vd, tmp, fpst);
+tcg_temp_free_i32(tmp);
+}
+
+static bool trans_VNMLA_sp(DisasContext *s, arg_VNMLA_sp *a)
+{
+return do_vfp_3op_sp(s, gen_VNMLA_sp, a->vd, a->vn, a->vm, true);
+}
+
+static void gen_VNMLA_dp(TCGv_i64 vd, TCGv_i64 vn, TCGv_i64 vm, TCGv_ptr fpst)
+{
+/* VNMLA: -fd + -(fn * fm) */
+TCGv_i64 tmp = tcg_temp_new_i64();
+
+gen_helper_vfp_muld(tmp, vn, vm, fpst);
+gen_helper_vfp_negd(tmp, tmp);
+gen_helper_vfp_negd(vd, vd);
+gen_helper_vfp_addd(vd, vd, tmp, fpst);
+tcg_temp_free_i64(tmp);
+}
+
+static bool trans_VNMLA_dp(DisasContext *s, arg_VNMLA_sp *a)
+{
+return do_vfp_3op_dp(s, gen_VNMLA_dp, a->vd, a->vn, a->vm, true);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index b3d0648bb50..1f83723b81a 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -1393,16 +1393,6 @@ VFP_OP2(div)
 
 #undef VFP_OP2
 
-static inline void gen_vfp_F1_neg(int dp)
-{
-/* Like gen_vfp_neg() but put result in F1 */
-if (dp) {
-gen_helper_vfp_negd(cpu_F1d, cpu_F0d);
-} else {
-gen_helper_vfp_negs(cpu_F1s, cpu_F0s);
-}
-}
-
 static inline void gen_vfp_abs(int dp)
 {
 if (dp)
@@ -3122,7 +3112,7 @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
 rn = VFP_SREG_N(insn);
 
 switch (op) {
-case 0 ... 2:
+case 0 ... 3:
 /* Already handled by decodetree */
 return 1;
 default:
@@ -3308,13 +3298,6 @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
 for (;;) {
 /* Perform the calculation.  */
 switch (op) {
-case 3: /* VNMLA: -fd + -(fn * fm) */
-gen_vfp_mul(dp);
-gen_vfp_F1_neg(dp);
-gen_mov_F0_vreg(dp, rd);
-gen_vfp_neg(dp);
-gen_vfp_add(dp);
-break;
 case 4: /* mul: fn * fm */
 gen_vfp_mul(dp);
 break;
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index 08e4f427408..c50d2c3ebf3 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -112,3 +112,8 @@ VNMLS_sp  1110 0.01   1010 .0.0  \
  vm=%vm_sp vn=%vn_sp vd=%vd_sp
 VNMLS_dp  1110 0.01   1011 .0.0  \
  vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
+VNMLA_sp  1110 0.01   1010 .1.0  \
+ vm=%vm_sp vn=%vn_sp vd=%vd_sp
+VNMLA_dp  1110 0.01   1011 .1.0  \
+ vm=%vm_dp vn=%vn_dp vd=%vd_dp
-- 
2.20.1




[Qemu-devel] [PATCH v2 15/42] target/arm: Convert VFP VLDR and VSTR to decodetree

2019-06-11 Thread Peter Maydell
Convert the VFP single load/store insns VLDR and VSTR to decodetree.
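
For readers following the address calculation in the new trans
functions, here is a small standalone sketch of it in C. The function
names are hypothetical and the sketch works on plain integers rather
than TCG values; it only shows how the 8-bit immediate, the U bit and
the Thumb PC-relative base combine.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy model of the VLDR/VSTR effective address: the 8-bit immediate
 * is scaled by 4 and added to or subtracted from the base depending
 * on the U bit. Names are illustrative, not QEMU functions.
 */
static uint32_t toy_vldr_vstr_addr(uint32_t base, uint32_t imm8, bool u)
{
    uint32_t offset = (imm8 & 0xff) << 2;

    if (!u) {
        offset = 0u - offset;   /* unsigned negate, wraps modulo 2^32 */
    }
    return base + offset;
}

/* Base used by the patch when Thumb code names r15: s->pc & ~2 */
static uint32_t toy_thumb_pc_base(uint32_t pc)
{
    return pc & ~(uint32_t)2;
}
```

With base 0x1000 and imm8 == 4 this gives 0x1010 when U is set and
0x0ff0 when it is clear.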

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/translate-vfp.inc.c | 73 ++
 target/arm/translate.c | 22 +-
 target/arm/vfp.decode  |  7 
 3 files changed, 82 insertions(+), 20 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index 5f081221b83..40f2cac3e2e 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -853,3 +853,76 @@ static bool trans_VMOV_64_dp(DisasContext *s, arg_VMOV_64_sp *a)
 
 return true;
 }
+
+static bool trans_VLDR_VSTR_sp(DisasContext *s, arg_VLDR_VSTR_sp *a)
+{
+uint32_t offset;
+TCGv_i32 addr;
+
+if (!vfp_access_check(s)) {
+return true;
+}
+
+offset = a->imm << 2;
+if (!a->u) {
+offset = -offset;
+}
+
+if (s->thumb && a->rn == 15) {
+/* This is actually UNPREDICTABLE */
+addr = tcg_temp_new_i32();
+tcg_gen_movi_i32(addr, s->pc & ~2);
+} else {
+addr = load_reg(s, a->rn);
+}
+tcg_gen_addi_i32(addr, addr, offset);
+if (a->l) {
+gen_vfp_ld(s, false, addr);
+gen_mov_vreg_F0(false, a->vd);
+} else {
+gen_mov_F0_vreg(false, a->vd);
+gen_vfp_st(s, false, addr);
+}
+tcg_temp_free_i32(addr);
+
+return true;
+}
+
+static bool trans_VLDR_VSTR_dp(DisasContext *s, arg_VLDR_VSTR_sp *a)
+{
+uint32_t offset;
+TCGv_i32 addr;
+
+/* UNDEF accesses to D16-D31 if they don't exist */
+if (!dc_isar_feature(aa32_fp_d32, s) && (a->vd & 0x10)) {
+return false;
+}
+
+if (!vfp_access_check(s)) {
+return true;
+}
+
+offset = a->imm << 2;
+if (!a->u) {
+offset = -offset;
+}
+
+if (s->thumb && a->rn == 15) {
+/* This is actually UNPREDICTABLE */
+addr = tcg_temp_new_i32();
+tcg_gen_movi_i32(addr, s->pc & ~2);
+} else {
+addr = load_reg(s, a->rn);
+}
+tcg_gen_addi_i32(addr, addr, offset);
+if (a->l) {
+gen_vfp_ld(s, true, addr);
+gen_mov_vreg_F0(true, a->vd);
+} else {
+gen_mov_F0_vreg(true, a->vd);
+gen_vfp_st(s, true, addr);
+}
+tcg_temp_free_i32(addr);
+
+return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index d2dced7c45a..d954e8de1eb 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -3713,26 +3713,8 @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
 else
 rd = VFP_SREG_D(insn);
 if ((insn & 0x0120) == 0x0100) {
-/* Single load/store */
-offset = (insn & 0xff) << 2;
-if ((insn & (1 << 23)) == 0)
-offset = -offset;
-if (s->thumb && rn == 15) {
-/* This is actually UNPREDICTABLE */
-addr = tcg_temp_new_i32();
-tcg_gen_movi_i32(addr, s->pc & ~2);
-} else {
-addr = load_reg(s, rn);
-}
-tcg_gen_addi_i32(addr, addr, offset);
-if (insn & (1 << 20)) {
-gen_vfp_ld(s, dp, addr);
-gen_mov_vreg_F0(dp, rd);
-} else {
-gen_mov_F0_vreg(dp, rd);
-gen_vfp_st(s, dp, addr);
-}
-tcg_temp_free_i32(addr);
+/* Already handled by decodetree */
+return 1;
 } else {
 /* load/store multiple */
 int w = insn & (1 << 21);
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index 134f1c9ef58..8fa7fa0bead 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -71,3 +71,10 @@ VMOV_64_sp    1100 010 op:1 rt2:4 rt:4 1010 00.1  \
  vm=%vm_sp
 VMOV_64_dp    1100 010 op:1 rt2:4 rt:4 1011 00.1  \
  vm=%vm_dp
+
+# Note that the half-precision variants of VLDR and VSTR are
+# not part of this decodetree at all because they have bits [9:8] == 0b01
+VLDR_VSTR_sp  1101 u:1 .0 l:1 rn:4  1010 imm:8 \
+ vd=%vd_sp
+VLDR_VSTR_dp  1101 u:1 .0 l:1 rn:4  1011 imm:8 \
+ vd=%vd_dp
-- 
2.20.1




[Qemu-devel] [PATCH v2 10/42] target/arm: Move the VFP trans_* functions to translate-vfp.inc.c

2019-06-11 Thread Peter Maydell
Move the trans_*() functions we've just created from translate.c
to translate-vfp.inc.c. This is pure code motion with no textual
changes (this can be checked with 'git show --color-moved').

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/translate-vfp.inc.c | 337 +
 target/arm/translate.c | 337 -
 2 files changed, 337 insertions(+), 337 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index f7535138d0f..2f070a6e0d9 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -138,3 +138,340 @@ static bool vfp_access_check(DisasContext *s)
 {
 return full_vfp_access_check(s, false);
 }
+
+static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
+{
+uint32_t rd, rn, rm;
+bool dp = a->dp;
+
+if (!dc_isar_feature(aa32_vsel, s)) {
+return false;
+}
+
+/* UNDEF accesses to D16-D31 if they don't exist */
+if (dp && !dc_isar_feature(aa32_fp_d32, s) &&
+((a->vm | a->vn | a->vd) & 0x10)) {
+return false;
+}
+rd = a->vd;
+rn = a->vn;
+rm = a->vm;
+
+if (!vfp_access_check(s)) {
+return true;
+}
+
+if (dp) {
+TCGv_i64 frn, frm, dest;
+TCGv_i64 tmp, zero, zf, nf, vf;
+
+zero = tcg_const_i64(0);
+
+frn = tcg_temp_new_i64();
+frm = tcg_temp_new_i64();
+dest = tcg_temp_new_i64();
+
+zf = tcg_temp_new_i64();
+nf = tcg_temp_new_i64();
+vf = tcg_temp_new_i64();
+
+tcg_gen_extu_i32_i64(zf, cpu_ZF);
+tcg_gen_ext_i32_i64(nf, cpu_NF);
+tcg_gen_ext_i32_i64(vf, cpu_VF);
+
+tcg_gen_ld_f64(frn, cpu_env, vfp_reg_offset(dp, rn));
+tcg_gen_ld_f64(frm, cpu_env, vfp_reg_offset(dp, rm));
+switch (a->cc) {
+case 0: /* eq: Z */
+tcg_gen_movcond_i64(TCG_COND_EQ, dest, zf, zero,
+frn, frm);
+break;
+case 1: /* vs: V */
+tcg_gen_movcond_i64(TCG_COND_LT, dest, vf, zero,
+frn, frm);
+break;
+case 2: /* ge: N == V -> N ^ V == 0 */
+tmp = tcg_temp_new_i64();
+tcg_gen_xor_i64(tmp, vf, nf);
+tcg_gen_movcond_i64(TCG_COND_GE, dest, tmp, zero,
+frn, frm);
+tcg_temp_free_i64(tmp);
+break;
+case 3: /* gt: !Z && N == V */
+tcg_gen_movcond_i64(TCG_COND_NE, dest, zf, zero,
+frn, frm);
+tmp = tcg_temp_new_i64();
+tcg_gen_xor_i64(tmp, vf, nf);
+tcg_gen_movcond_i64(TCG_COND_GE, dest, tmp, zero,
+dest, frm);
+tcg_temp_free_i64(tmp);
+break;
+}
+tcg_gen_st_f64(dest, cpu_env, vfp_reg_offset(dp, rd));
+tcg_temp_free_i64(frn);
+tcg_temp_free_i64(frm);
+tcg_temp_free_i64(dest);
+
+tcg_temp_free_i64(zf);
+tcg_temp_free_i64(nf);
+tcg_temp_free_i64(vf);
+
+tcg_temp_free_i64(zero);
+} else {
+TCGv_i32 frn, frm, dest;
+TCGv_i32 tmp, zero;
+
+zero = tcg_const_i32(0);
+
+frn = tcg_temp_new_i32();
+frm = tcg_temp_new_i32();
+dest = tcg_temp_new_i32();
+tcg_gen_ld_f32(frn, cpu_env, vfp_reg_offset(dp, rn));
+tcg_gen_ld_f32(frm, cpu_env, vfp_reg_offset(dp, rm));
+switch (a->cc) {
+case 0: /* eq: Z */
+tcg_gen_movcond_i32(TCG_COND_EQ, dest, cpu_ZF, zero,
+frn, frm);
+break;
+case 1: /* vs: V */
+tcg_gen_movcond_i32(TCG_COND_LT, dest, cpu_VF, zero,
+frn, frm);
+break;
+case 2: /* ge: N == V -> N ^ V == 0 */
+tmp = tcg_temp_new_i32();
+tcg_gen_xor_i32(tmp, cpu_VF, cpu_NF);
+tcg_gen_movcond_i32(TCG_COND_GE, dest, tmp, zero,
+frn, frm);
+tcg_temp_free_i32(tmp);
+break;
+case 3: /* gt: !Z && N == V */
+tcg_gen_movcond_i32(TCG_COND_NE, dest, cpu_ZF, zero,
+frn, frm);
+tmp = tcg_temp_new_i32();
+tcg_gen_xor_i32(tmp, cpu_VF, cpu_NF);
+tcg_gen_movcond_i32(TCG_COND_GE, dest, tmp, zero,
+dest, frm);
+tcg_temp_free_i32(tmp);
+break;
+}
+tcg_gen_st_f32(dest, cpu_env, vfp_reg_offset(dp, rd));
+tcg_temp_free_i32(frn);
+tcg_temp_free_i32(frm);
+tcg_temp_free_i32(dest);
+
+tcg_temp_free_i32(zero);
+}
+
+return true;
+}
+
+static bool trans_VMINMAXNM(DisasContext *s, arg_VMINMAXNM *a)
+{
+uint32_t rd, rn, rm;
+bool dp = a->dp;
+bool vmin = a->op;
+TCGv_ptr fpst;
+
+if (!dc_isar_fea

[Qemu-devel] [PATCH v2 11/42] target/arm: Add helpers for VFP register loads and stores

2019-06-11 Thread Peter Maydell
The current VFP code has two different idioms for
loading and storing from the VFP register file:
 1 using the gen_mov_F0_vreg() and similar functions,
   which load and store to a fixed set of TCG globals
   cpu_F0s, cpu_F0d, etc
 2 by direct calls to tcg_gen_ld_f64() and friends

We want to phase out idiom 1 (because the use of the
fixed globals is a relic of a much older version of TCG),
but idiom 2 is quite longwinded:
 tcg_gen_ld_f64(tmp, cpu_env, vfp_reg_offset(true, reg))
requires us to specify the 64-bitness twice, once in
the function name and once by passing 'true' to
vfp_reg_offset(). There's no guard against accidentally
passing the wrong flag.

Instead, let's move to a convention of accessing 64-bit
registers via the existing neon_load_reg64() and
neon_store_reg64(), and provide new neon_load_reg32()
and neon_store_reg32() for the 32-bit equivalents.

Implement the new functions and use them in the code in
translate-vfp.inc.c. We will convert the rest of the VFP
code as we do the decodetree conversion in subsequent
commits.
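
To make the motivation a bit more concrete, here is a toy model in
plain C of the difference between "pass a width flag to an offset
function" and "call a helper whose name fixes the width". Everything
here (ToyVfpRegs, toy_reg_index, toy_load_reg32 and so on) is a
hypothetical stand-in, not the real QEMU code; it assumes only the
architectural layout where D<n> overlays S<2n> (low half) and S<2n+1>
(high half).

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy register file: 32 D registers stored as 64 32-bit words. */
typedef struct {
    uint32_t words[64];
} ToyVfpRegs;

/* Analogue of idiom 2: the caller has to get the 'dp' flag right. */
static unsigned toy_reg_index(bool dp, unsigned reg)
{
    return dp ? reg * 2 : reg;
}

/* Width-specific helpers: the width is stated once, in the name. */
static uint32_t toy_load_reg32(const ToyVfpRegs *r, unsigned sreg)
{
    return r->words[toy_reg_index(false, sreg)];
}

static void toy_store_reg32(ToyVfpRegs *r, unsigned sreg, uint32_t val)
{
    r->words[toy_reg_index(false, sreg)] = val;
}

static uint64_t toy_load_reg64(const ToyVfpRegs *r, unsigned dreg)
{
    unsigned i = toy_reg_index(true, dreg);

    return (uint64_t)r->words[i] | ((uint64_t)r->words[i + 1] << 32);
}

static void toy_store_reg64(ToyVfpRegs *r, unsigned dreg, uint64_t val)
{
    unsigned i = toy_reg_index(true, dreg);

    r->words[i] = (uint32_t)val;
    r->words[i + 1] = (uint32_t)(val >> 32);
}
```

Callers of the width-specific helpers never touch the flag, so a
mismatch between the function name and the flag cannot happen at the
call site.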

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/translate-vfp.inc.c | 40 +-
 target/arm/translate.c | 10 +
 2 files changed, 30 insertions(+), 20 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index 2f070a6e0d9..24358f3d3eb 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -179,8 +179,8 @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
 tcg_gen_ext_i32_i64(nf, cpu_NF);
 tcg_gen_ext_i32_i64(vf, cpu_VF);
 
-tcg_gen_ld_f64(frn, cpu_env, vfp_reg_offset(dp, rn));
-tcg_gen_ld_f64(frm, cpu_env, vfp_reg_offset(dp, rm));
+neon_load_reg64(frn, rn);
+neon_load_reg64(frm, rm);
 switch (a->cc) {
 case 0: /* eq: Z */
 tcg_gen_movcond_i64(TCG_COND_EQ, dest, zf, zero,
@@ -207,7 +207,7 @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
 tcg_temp_free_i64(tmp);
 break;
 }
-tcg_gen_st_f64(dest, cpu_env, vfp_reg_offset(dp, rd));
+neon_store_reg64(dest, rd);
 tcg_temp_free_i64(frn);
 tcg_temp_free_i64(frm);
 tcg_temp_free_i64(dest);
@@ -226,8 +226,8 @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
 frn = tcg_temp_new_i32();
 frm = tcg_temp_new_i32();
 dest = tcg_temp_new_i32();
-tcg_gen_ld_f32(frn, cpu_env, vfp_reg_offset(dp, rn));
-tcg_gen_ld_f32(frm, cpu_env, vfp_reg_offset(dp, rm));
+neon_load_reg32(frn, rn);
+neon_load_reg32(frm, rm);
 switch (a->cc) {
 case 0: /* eq: Z */
 tcg_gen_movcond_i32(TCG_COND_EQ, dest, cpu_ZF, zero,
@@ -254,7 +254,7 @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
 tcg_temp_free_i32(tmp);
 break;
 }
-tcg_gen_st_f32(dest, cpu_env, vfp_reg_offset(dp, rd));
+neon_store_reg32(dest, rd);
 tcg_temp_free_i32(frn);
 tcg_temp_free_i32(frm);
 tcg_temp_free_i32(dest);
@@ -298,14 +298,14 @@ static bool trans_VMINMAXNM(DisasContext *s, arg_VMINMAXNM *a)
 frm = tcg_temp_new_i64();
 dest = tcg_temp_new_i64();
 
-tcg_gen_ld_f64(frn, cpu_env, vfp_reg_offset(dp, rn));
-tcg_gen_ld_f64(frm, cpu_env, vfp_reg_offset(dp, rm));
+neon_load_reg64(frn, rn);
+neon_load_reg64(frm, rm);
 if (vmin) {
 gen_helper_vfp_minnumd(dest, frn, frm, fpst);
 } else {
 gen_helper_vfp_maxnumd(dest, frn, frm, fpst);
 }
-tcg_gen_st_f64(dest, cpu_env, vfp_reg_offset(dp, rd));
+neon_store_reg64(dest, rd);
 tcg_temp_free_i64(frn);
 tcg_temp_free_i64(frm);
 tcg_temp_free_i64(dest);
@@ -316,14 +316,14 @@ static bool trans_VMINMAXNM(DisasContext *s, arg_VMINMAXNM *a)
 frm = tcg_temp_new_i32();
 dest = tcg_temp_new_i32();
 
-tcg_gen_ld_f32(frn, cpu_env, vfp_reg_offset(dp, rn));
-tcg_gen_ld_f32(frm, cpu_env, vfp_reg_offset(dp, rm));
+neon_load_reg32(frn, rn);
+neon_load_reg32(frm, rm);
 if (vmin) {
 gen_helper_vfp_minnums(dest, frn, frm, fpst);
 } else {
 gen_helper_vfp_maxnums(dest, frn, frm, fpst);
 }
-tcg_gen_st_f32(dest, cpu_env, vfp_reg_offset(dp, rd));
+neon_store_reg32(dest, rd);
 tcg_temp_free_i32(frn);
 tcg_temp_free_i32(frm);
 tcg_temp_free_i32(dest);
@@ -379,9 +379,9 @@ static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
 TCGv_i64 tcg_res;
 tcg_op = tcg_temp_new_i64();
 tcg_res = tcg_temp_new_i64();
-tcg_gen_ld_f64(tcg_op, cpu_env, vfp_reg_offset(dp, rm));
+neon_load_reg64(tcg_op, rm);
 gen_helper_rintd(tcg_res, tcg_op, fpst);
-tcg_gen_st_f64(tcg_res, cpu_env, vfp_reg_offset(dp, rd));
+neon_stor

[Qemu-devel] [PATCH v2 24/42] target/arm: Convert VADD to decodetree

2019-06-11 Thread Peter Maydell
Convert the VADD instruction to decodetree.

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/translate-vfp.inc.c | 10 ++
 target/arm/translate.c |  6 +-
 target/arm/vfp.decode  |  5 +
 3 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index 4c684f033b6..14aeb25f597 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -1451,3 +1451,13 @@ static bool trans_VNMUL_dp(DisasContext *s, arg_VNMUL_sp *a)
 {
 return do_vfp_3op_dp(s, gen_VNMUL_dp, a->vd, a->vn, a->vm, false);
 }
+
+static bool trans_VADD_sp(DisasContext *s, arg_VADD_sp *a)
+{
+return do_vfp_3op_sp(s, gen_helper_vfp_adds, a->vd, a->vn, a->vm, false);
+}
+
+static bool trans_VADD_dp(DisasContext *s, arg_VADD_sp *a)
+{
+return do_vfp_3op_dp(s, gen_helper_vfp_addd, a->vd, a->vn, a->vm, false);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 1f9fa6b03a1..cd1f24798b1 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -1386,7 +1386,6 @@ static inline void gen_vfp_##name(int dp) \
 tcg_temp_free_ptr(fpst);  \
 }
 
-VFP_OP2(add)
 VFP_OP2(sub)
 VFP_OP2(div)
 
@@ -3111,7 +3110,7 @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
 rn = VFP_SREG_N(insn);
 
 switch (op) {
-case 0 ... 5:
+case 0 ... 6:
 /* Already handled by decodetree */
 return 1;
 default:
@@ -3297,9 +3296,6 @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
 for (;;) {
 /* Perform the calculation.  */
 switch (op) {
-case 6: /* add: fn + fm */
-gen_vfp_add(dp);
-break;
 case 7: /* sub: fn - fm */
 gen_vfp_sub(dp);
 break;
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index 3063fcac23f..d911f12dfd0 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -127,3 +127,8 @@ VNMUL_sp  1110 0.10   1010 .1.0  \
  vm=%vm_sp vn=%vn_sp vd=%vd_sp
 VNMUL_dp  1110 0.10   1011 .1.0  \
  vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
+VADD_sp   1110 0.11   1010 .0.0  \
+ vm=%vm_sp vn=%vn_sp vd=%vd_sp
+VADD_dp   1110 0.11   1011 .0.0  \
+ vm=%vm_dp vn=%vn_dp vd=%vd_dp
-- 
2.20.1




[Qemu-devel] [PATCH v2 14/42] target/arm: Convert VFP two-register transfer insns to decodetree

2019-06-11 Thread Peter Maydell
Convert the VFP two-register transfer instructions to decodetree
(in the v8 Arm ARM these are the "Advanced SIMD and floating-point
64-bit move" encoding group).

Again, we expand out the sequences involving gen_vfp_msr() and
gen_msr_vfp().

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/translate-vfp.inc.c | 70 ++
 target/arm/translate.c | 46 +-
 target/arm/vfp.decode  |  5 +++
 3 files changed, 77 insertions(+), 44 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index 74c10f9024b..5f081221b83 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -783,3 +783,73 @@ static bool trans_VMOV_single(DisasContext *s, arg_VMOV_single *a)
 
 return true;
 }
+
+static bool trans_VMOV_64_sp(DisasContext *s, arg_VMOV_64_sp *a)
+{
+TCGv_i32 tmp;
+
+/*
+ * VMOV between two general-purpose registers and two single precision
+ * floating point registers
+ */
+if (!vfp_access_check(s)) {
+return true;
+}
+
+if (a->op) {
+/* fpreg to gpreg */
+tmp = tcg_temp_new_i32();
+neon_load_reg32(tmp, a->vm);
+store_reg(s, a->rt, tmp);
+tmp = tcg_temp_new_i32();
+neon_load_reg32(tmp, a->vm + 1);
+store_reg(s, a->rt2, tmp);
+} else {
+/* gpreg to fpreg */
+tmp = load_reg(s, a->rt);
+neon_store_reg32(tmp, a->vm);
+tmp = load_reg(s, a->rt2);
+neon_store_reg32(tmp, a->vm + 1);
+}
+
+return true;
+}
+
+static bool trans_VMOV_64_dp(DisasContext *s, arg_VMOV_64_sp *a)
+{
+TCGv_i32 tmp;
+
+/*
+ * VMOV between two general-purpose registers and one double precision
+ * floating point register
+ */
+
+/* UNDEF accesses to D16-D31 if they don't exist */
+if (!dc_isar_feature(aa32_fp_d32, s) && (a->vm & 0x10)) {
+return false;
+}
+
+if (!vfp_access_check(s)) {
+return true;
+}
+
+if (a->op) {
+/* fpreg to gpreg */
+tmp = tcg_temp_new_i32();
+neon_load_reg32(tmp, a->vm * 2);
+store_reg(s, a->rt, tmp);
+tmp = tcg_temp_new_i32();
+neon_load_reg32(tmp, a->vm * 2 + 1);
+store_reg(s, a->rt2, tmp);
+} else {
+/* gpreg to fpreg */
+tmp = load_reg(s, a->rt);
+neon_store_reg32(tmp, a->vm * 2);
+tcg_temp_free_i32(tmp);
+tmp = load_reg(s, a->rt2);
+neon_store_reg32(tmp, a->vm * 2 + 1);
+tcg_temp_free_i32(tmp);
+}
+
+return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index cbb86a49213..d2dced7c45a 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -3703,50 +3703,8 @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
 case 0xc:
 case 0xd:
 if ((insn & 0x03e0) == 0x0040) {
-/* two-register transfer */
-rn = (insn >> 16) & 0xf;
-rd = (insn >> 12) & 0xf;
-if (dp) {
-VFP_DREG_M(rm, insn);
-} else {
-rm = VFP_SREG_M(insn);
-}
-
-if (insn & ARM_CP_RW_BIT) {
-/* vfp->arm */
-if (dp) {
-gen_mov_F0_vreg(0, rm * 2);
-tmp = gen_vfp_mrs();
-store_reg(s, rd, tmp);
-gen_mov_F0_vreg(0, rm * 2 + 1);
-tmp = gen_vfp_mrs();
-store_reg(s, rn, tmp);
-} else {
-gen_mov_F0_vreg(0, rm);
-tmp = gen_vfp_mrs();
-store_reg(s, rd, tmp);
-gen_mov_F0_vreg(0, rm + 1);
-tmp = gen_vfp_mrs();
-store_reg(s, rn, tmp);
-}
-} else {
-/* arm->vfp */
-if (dp) {
-tmp = load_reg(s, rd);
-gen_vfp_msr(tmp);
-gen_mov_vreg_F0(0, rm * 2);
-tmp = load_reg(s, rn);
-gen_vfp_msr(tmp);
-gen_mov_vreg_F0(0, rm * 2 + 1);
-} else {
-tmp = load_reg(s, rd);
-gen_vfp_msr(tmp);
-gen_mov_vreg_F0(0, rm);
-tmp = load_reg(s, rn);
-gen_vfp_msr(tmp);
-gen_mov_vreg_F0(0, rm + 1);
-}
-}
+/* Already handled by decodetree */
+return 1;
 } else {
 /* Load/store */
 rn = (insn >> 16) & 0xf;
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index bb7de403df3..134f1c9ef58 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -66,3 +66,8 @@ VDUP  1110 1 b:1 q:1 0  rt:4 1011 . 0 e:1 1 
 \
 VMSR_VMRS 1110 111 l:1 reg:4 rt:4 101

[Qemu-devel] [PATCH v2 13/42] target/arm: Convert "single-precision" register moves to decodetree

2019-06-11 Thread Peter Maydell
Convert the "single-precision" register moves to decodetree:
 * VMSR
 * VMRS
 * VMOV between general purpose register and single precision

Note that the VMSR/VMRS conversions make our handling of
the "should this UNDEF?" checks consistent between the two
instructions:
 * VMSR to MVFR0, MVFR1, MVFR2 now UNDEF from EL0
   (previously was a nop)
 * VMSR to FPSID now UNDEFs from EL0 or if VFPv3 or better
   (previously was a nop)
 * VMSR to FPINST and FPINST2 now UNDEF if VFPv3 or better
   (previously would write to the register, which had no
   guest-visible effect because we always UNDEF reads)

We also tighten up the decode: we were previously underdecoding
some SBZ or SBO bits.
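
As a reading aid, the per-register UNDEF decision can be sketched as a
plain C predicate. This is a simplified model written against the
switch statement in trans_VMSR_VMRS() in this patch, not against the
architecture manual; the enum, the ToyCpu struct and the function name
are all hypothetical, and the M-profile special case is left out.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the ARM_VFP_* register numbers. */
typedef enum {
    TOY_FPSID,
    TOY_FPSCR,
    TOY_MVFR0,
    TOY_MVFR1,
    TOY_MVFR2,
    TOY_FPEXC,
    TOY_FPINST,
    TOY_FPINST2
} ToyVfpSysreg;

/* Hypothetical stand-ins for IS_USER() and the feature tests. */
typedef struct {
    bool is_user;
    bool has_vfp3;
    bool has_mvfr;
    bool has_v8;
} ToyCpu;

/* Returns true if the access should UNDEF, following the patch's switch. */
static bool toy_vmsr_vmrs_undef(ToyVfpSysreg reg, const ToyCpu *cpu)
{
    switch (reg) {
    case TOY_FPSID:
        return cpu->is_user && cpu->has_vfp3;
    case TOY_MVFR0:
    case TOY_MVFR1:
        return cpu->is_user || !cpu->has_mvfr;
    case TOY_MVFR2:
        return cpu->is_user || !cpu->has_v8;
    case TOY_FPSCR:
        return false;
    case TOY_FPEXC:
        return cpu->is_user;
    case TOY_FPINST:
    case TOY_FPINST2:
        return cpu->is_user || cpu->has_vfp3;
    }
    return true;    /* unknown register numbers UNDEF */
}
```

Because the same predicate is consulted for both directions of the
move, VMSR and VMRS end up with identical UNDEF behaviour in this model.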

The conversion of VMOV_single includes the expansion out of the
gen_mov_F0_vreg()/gen_vfp_mrs() and gen_mov_vreg_F0()/gen_vfp_msr()
sequences into the simpler direct load/store of the TCG temp via
neon_{load,store}_reg32(): we know in the new function that we're
always single-precision, we don't need to use the old-and-deprecated
cpu_F0* TCG globals, and we don't happen to have the declaration of
gen_vfp_msr() and gen_vfp_mrs() at the point in the file where the
new function is.

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/translate-vfp.inc.c | 161 +
 target/arm/translate.c | 148 +-
 target/arm/vfp.decode  |   4 +
 3 files changed, 168 insertions(+), 145 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index 8b0899fa05c..74c10f9024b 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -622,3 +622,164 @@ static bool trans_VDUP(DisasContext *s, arg_VDUP *a)
 
 return true;
 }
+
+static bool trans_VMSR_VMRS(DisasContext *s, arg_VMSR_VMRS *a)
+{
+TCGv_i32 tmp;
+bool ignore_vfp_enabled = false;
+
+if (arm_dc_feature(s, ARM_FEATURE_M)) {
+/*
+ * The only M-profile VFP vmrs/vmsr sysreg is FPSCR.
+ * Writes to R15 are UNPREDICTABLE; we choose to undef.
+ */
+if (a->rt == 15 || a->reg != ARM_VFP_FPSCR) {
+return false;
+}
+}
+
+switch (a->reg) {
+case ARM_VFP_FPSID:
+/*
+ * VFPv2 allows access to FPSID from userspace; VFPv3 restricts
+ * all ID registers to privileged access only.
+ */
+if (IS_USER(s) && arm_dc_feature(s, ARM_FEATURE_VFP3)) {
+return false;
+}
+ignore_vfp_enabled = true;
+break;
+case ARM_VFP_MVFR0:
+case ARM_VFP_MVFR1:
+if (IS_USER(s) || !arm_dc_feature(s, ARM_FEATURE_MVFR)) {
+return false;
+}
+ignore_vfp_enabled = true;
+break;
+case ARM_VFP_MVFR2:
+if (IS_USER(s) || !arm_dc_feature(s, ARM_FEATURE_V8)) {
+return false;
+}
+ignore_vfp_enabled = true;
+break;
+case ARM_VFP_FPSCR:
+break;
+case ARM_VFP_FPEXC:
+if (IS_USER(s)) {
+return false;
+}
+ignore_vfp_enabled = true;
+break;
+case ARM_VFP_FPINST:
+case ARM_VFP_FPINST2:
+/* Not present in VFPv3 */
+if (IS_USER(s) || arm_dc_feature(s, ARM_FEATURE_VFP3)) {
+return false;
+}
+break;
+default:
+return false;
+}
+
+if (!full_vfp_access_check(s, ignore_vfp_enabled)) {
+return true;
+}
+
+if (a->l) {
+/* VMRS, move VFP special register to gp register */
+switch (a->reg) {
+case ARM_VFP_FPSID:
+case ARM_VFP_FPEXC:
+case ARM_VFP_FPINST:
+case ARM_VFP_FPINST2:
+case ARM_VFP_MVFR0:
+case ARM_VFP_MVFR1:
+case ARM_VFP_MVFR2:
+tmp = load_cpu_field(vfp.xregs[a->reg]);
+break;
+case ARM_VFP_FPSCR:
+if (a->rt == 15) {
+tmp = load_cpu_field(vfp.xregs[ARM_VFP_FPSCR]);
+tcg_gen_andi_i32(tmp, tmp, 0xf000);
+} else {
+tmp = tcg_temp_new_i32();
+gen_helper_vfp_get_fpscr(tmp, cpu_env);
+}
+break;
+default:
+g_assert_not_reached();
+}
+
+if (a->rt == 15) {
+/* Set the 4 flag bits in the CPSR.  */
+gen_set_nzcv(tmp);
+tcg_temp_free_i32(tmp);
+} else {
+store_reg(s, a->rt, tmp);
+}
+} else {
+/* VMSR, move gp register to VFP special register */
+switch (a->reg) {
+case ARM_VFP_FPSID:
+case ARM_VFP_MVFR0:
+case ARM_VFP_MVFR1:
+case ARM_VFP_MVFR2:
+/* Writes are ignored.  */
+break;
+case ARM_VFP_FPSCR:
+tmp = load_reg(s, a->rt);
+gen_helper_vfp_set_fpscr(cpu_env, tmp);
+tcg_temp_free_i32(tmp);
+gen_lookup_tb(s);
+break;
+case ARM_VFP_FPEXC:
+  

[Qemu-devel] [PATCH v2 20/42] target/arm: Convert VFP VNMLS to decodetree

2019-06-11 Thread Peter Maydell
Convert the VFP VNMLS instruction to decodetree.

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/translate-vfp.inc.c | 42 ++
 target/arm/translate.c | 24 +--
 target/arm/vfp.decode  |  5 
 3 files changed, 48 insertions(+), 23 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index 00f64401dda..1d7100debe4 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -1341,3 +1341,45 @@ static bool trans_VMLS_dp(DisasContext *s, arg_VMLS_sp *a)
 {
 return do_vfp_3op_dp(s, gen_VMLS_dp, a->vd, a->vn, a->vm, true);
 }
+
+static void gen_VNMLS_sp(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst)
+{
+/*
+ * VNMLS: -fd + (fn * fm)
+ * Note that it isn't valid to replace (-A + B) with (B - A) or similar
+ * plausible looking simplifications because this will give wrong results
+ * for NaNs.
+ */
+TCGv_i32 tmp = tcg_temp_new_i32();
+
+gen_helper_vfp_muls(tmp, vn, vm, fpst);
+gen_helper_vfp_negs(vd, vd);
+gen_helper_vfp_adds(vd, vd, tmp, fpst);
+tcg_temp_free_i32(tmp);
+}
+
+static bool trans_VNMLS_sp(DisasContext *s, arg_VNMLS_sp *a)
+{
+return do_vfp_3op_sp(s, gen_VNMLS_sp, a->vd, a->vn, a->vm, true);
+}
+
+static void gen_VNMLS_dp(TCGv_i64 vd, TCGv_i64 vn, TCGv_i64 vm, TCGv_ptr fpst)
+{
+/*
+ * VNMLS: -fd + (fn * fm)
+ * Note that it isn't valid to replace (-A + B) with (B - A) or similar
+ * plausible looking simplifications because this will give wrong results
+ * for NaNs.
+ */
+TCGv_i64 tmp = tcg_temp_new_i64();
+
+gen_helper_vfp_muld(tmp, vn, vm, fpst);
+gen_helper_vfp_negd(vd, vd);
+gen_helper_vfp_addd(vd, vd, tmp, fpst);
+tcg_temp_free_i64(tmp);
+}
+
+static bool trans_VNMLS_dp(DisasContext *s, arg_VNMLS_sp *a)
+{
+return do_vfp_3op_dp(s, gen_VNMLS_dp, a->vd, a->vn, a->vm, true);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index bddc0d20447..b3d0648bb50 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -1393,18 +1393,6 @@ VFP_OP2(div)
 
 #undef VFP_OP2
 
-static inline void gen_vfp_F1_mul(int dp)
-{
-/* Like gen_vfp_mul() but put result in F1 */
-TCGv_ptr fpst = get_fpstatus_ptr(0);
-if (dp) {
-gen_helper_vfp_muld(cpu_F1d, cpu_F0d, cpu_F1d, fpst);
-} else {
-gen_helper_vfp_muls(cpu_F1s, cpu_F0s, cpu_F1s, fpst);
-}
-tcg_temp_free_ptr(fpst);
-}
-
 static inline void gen_vfp_F1_neg(int dp)
 {
 /* Like gen_vfp_neg() but put result in F1 */
@@ -3134,7 +3122,7 @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
 rn = VFP_SREG_N(insn);
 
 switch (op) {
-case 0 ... 1:
+case 0 ... 2:
 /* Already handled by decodetree */
 return 1;
 default:
@@ -3320,16 +3308,6 @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
 for (;;) {
 /* Perform the calculation.  */
 switch (op) {
-case 2: /* VNMLS: -fd + (fn * fm) */
-/* Note that it isn't valid to replace (-A + B) with (B - A)
- * or similar plausible looking simplifications
- * because this will give wrong results for NaNs.
- */
-gen_vfp_F1_mul(dp);
-gen_mov_F0_vreg(dp, rd);
-gen_vfp_neg(dp);
-gen_vfp_add(dp);
-break;
 case 3: /* VNMLA: -fd + -(fn * fm) */
 gen_vfp_mul(dp);
 gen_vfp_F1_neg(dp);
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index 7bcf2260eec..08e4f427408 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -107,3 +107,8 @@ VMLS_sp   1110 0.00   1010 .1.0  \
  vm=%vm_sp vn=%vn_sp vd=%vd_sp
 VMLS_dp   1110 0.00   1011 .1.0  \
  vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
+VNMLS_sp  1110 0.01   1010 .0.0  \
+ vm=%vm_sp vn=%vn_sp vd=%vd_sp
+VNMLS_dp  1110 0.01   1011 .0.0  \
+ vm=%vm_dp vn=%vn_dp vd=%vd_dp
-- 
2.20.1




[Qemu-devel] [PATCH v2 12/42] target/arm: Convert "double-precision" register moves to decodetree

2019-06-11 Thread Peter Maydell
Convert the "double-precision" register moves to decodetree:
this covers VMOV scalar-to-gpreg, VMOV gpreg-to-scalar and VDUP.

Note that the conversion process has tightened up a few of the
UNDEF encoding checks: we now correctly forbid:
 * VMOV-to-gpr with U:opc1:opc2 == 10x00 or x0x10
 * VMOV-from-gpr with opc1:opc2 == 0x10
 * VDUP with B:E == 11
 * VDUP with Q == 1 and Vn<0> == 1
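
The two VDUP checks in that list, plus the element size selection that
follows them in trans_VDUP(), can be summarised in a few lines of
standalone C. This is a sketch with a hypothetical function name that
takes the decoded fields as plain arguments; it leaves out the Neon
feature test and the D16-D31 check.

```c
#include <stdbool.h>

/*
 * Toy model of the VDUP encoding checks: returns the log2 element
 * size in bytes (0, 1 or 2), or -1 if the encoding should UNDEF.
 */
static int toy_vdup_size(bool b, bool e, bool q, unsigned vn)
{
    if (b && e) {
        return -1;      /* B:E == 11 is not a valid size */
    }
    if (q && (vn & 1)) {
        return -1;      /* Q == 1 needs an even-numbered register */
    }
    if (b) {
        return 0;       /* 8-bit elements */
    }
    if (e) {
        return 1;       /* 16-bit elements */
    }
    return 2;           /* 32-bit elements */
}
```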

Signed-off-by: Peter Maydell 
---
The accesses of elements < 32 bits could be improved by doing
direct ld/st of the right size rather than 32-bit read-and-shift
or read-modify-write, but we leave this for later cleanup,
since this series is generally trying to stick to fixing
the decode.
Reviewed-by: Richard Henderson 
---
 target/arm/translate-vfp.inc.c | 147 +
 target/arm/translate.c |  83 +--
 target/arm/vfp.decode  |  36 
 3 files changed, 185 insertions(+), 81 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index 24358f3d3eb..8b0899fa05c 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -475,3 +475,150 @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
 
 return true;
 }
+
+static bool trans_VMOV_to_gp(DisasContext *s, arg_VMOV_to_gp *a)
+{
+/* VMOV scalar to general purpose register */
+TCGv_i32 tmp;
+int pass;
+uint32_t offset;
+
+/* UNDEF accesses to D16-D31 if they don't exist */
+if (!dc_isar_feature(aa32_fp_d32, s) && (a->vn & 0x10)) {
+return false;
+}
+
+offset = a->index << a->size;
+pass = extract32(offset, 2, 1);
+offset = extract32(offset, 0, 2) * 8;
+
+if (a->size != 2 && !arm_dc_feature(s, ARM_FEATURE_NEON)) {
+return false;
+}
+
+if (!vfp_access_check(s)) {
+return true;
+}
+
+tmp = neon_load_reg(a->vn, pass);
+switch (a->size) {
+case 0:
+if (offset) {
+tcg_gen_shri_i32(tmp, tmp, offset);
+}
+if (a->u) {
+gen_uxtb(tmp);
+} else {
+gen_sxtb(tmp);
+}
+break;
+case 1:
+if (a->u) {
+if (offset) {
+tcg_gen_shri_i32(tmp, tmp, 16);
+} else {
+gen_uxth(tmp);
+}
+} else {
+if (offset) {
+tcg_gen_sari_i32(tmp, tmp, 16);
+} else {
+gen_sxth(tmp);
+}
+}
+break;
+case 2:
+break;
+}
+store_reg(s, a->rt, tmp);
+
+return true;
+}
+
+static bool trans_VMOV_from_gp(DisasContext *s, arg_VMOV_from_gp *a)
+{
+/* VMOV general purpose register to scalar */
+TCGv_i32 tmp, tmp2;
+int pass;
+uint32_t offset;
+
+/* UNDEF accesses to D16-D31 if they don't exist */
+if (!dc_isar_feature(aa32_fp_d32, s) && (a->vn & 0x10)) {
+return false;
+}
+
+offset = a->index << a->size;
+pass = extract32(offset, 2, 1);
+offset = extract32(offset, 0, 2) * 8;
+
+if (a->size != 2 && !arm_dc_feature(s, ARM_FEATURE_NEON)) {
+return false;
+}
+
+if (!vfp_access_check(s)) {
+return true;
+}
+
+tmp = load_reg(s, a->rt);
+switch (a->size) {
+case 0:
+tmp2 = neon_load_reg(a->vn, pass);
+tcg_gen_deposit_i32(tmp, tmp2, tmp, offset, 8);
+tcg_temp_free_i32(tmp2);
+break;
+case 1:
+tmp2 = neon_load_reg(a->vn, pass);
+tcg_gen_deposit_i32(tmp, tmp2, tmp, offset, 16);
+tcg_temp_free_i32(tmp2);
+break;
+case 2:
+break;
+}
+neon_store_reg(a->vn, pass, tmp);
+
+return true;
+}
+
+static bool trans_VDUP(DisasContext *s, arg_VDUP *a)
+{
+/* VDUP (general purpose register) */
+TCGv_i32 tmp;
+int size, vec_size;
+
+if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+return false;
+}
+
+/* UNDEF accesses to D16-D31 if they don't exist */
+if (!dc_isar_feature(aa32_fp_d32, s) && (a->vn & 0x10)) {
+return false;
+}
+
+if (a->b && a->e) {
+return false;
+}
+
+if (a->q && (a->vn & 1)) {
+return false;
+}
+
+vec_size = a->q ? 16 : 8;
+if (a->b) {
+size = 0;
+} else if (a->e) {
+size = 1;
+} else {
+size = 2;
+}
+
+if (!vfp_access_check(s)) {
+return true;
+}
+
+tmp = load_reg(s, a->rt);
+tcg_gen_gvec_dup_i32(size, neon_reg_offset(a->vn, 0),
+ vec_size, vec_size, tmp);
+tcg_temp_free_i32(tmp);
+
+return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 3661ed57cd3..08307bb526d 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -3151,87 +3151,8 @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
 /* single register transfer */
 rd = (insn >> 12) & 0xf;
 if (dp) {
-int size;
-int pass;
-
- 

[Qemu-devel] [PATCH v2 25/42] target/arm: Convert VSUB to decodetree

2019-06-11 Thread Peter Maydell
Convert the VSUB instruction to decodetree.

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/translate-vfp.inc.c | 10 ++
 target/arm/translate.c |  6 +-
 target/arm/vfp.decode  |  5 +
 3 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index 14aeb25f597..12da3b8acb8 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -1461,3 +1461,13 @@ static bool trans_VADD_dp(DisasContext *s, arg_VADD_sp *a)
 {
 return do_vfp_3op_dp(s, gen_helper_vfp_addd, a->vd, a->vn, a->vm, false);
 }
+
+static bool trans_VSUB_sp(DisasContext *s, arg_VSUB_sp *a)
+{
+return do_vfp_3op_sp(s, gen_helper_vfp_subs, a->vd, a->vn, a->vm, false);
+}
+
+static bool trans_VSUB_dp(DisasContext *s, arg_VSUB_sp *a)
+{
+return do_vfp_3op_dp(s, gen_helper_vfp_subd, a->vd, a->vn, a->vm, false);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index cd1f24798b1..18d4f9933ad 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -1386,7 +1386,6 @@ static inline void gen_vfp_##name(int dp) \
 tcg_temp_free_ptr(fpst);  \
 }
 
-VFP_OP2(sub)
 VFP_OP2(div)
 
 #undef VFP_OP2
@@ -3110,7 +3109,7 @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
 rn = VFP_SREG_N(insn);
 
 switch (op) {
-case 0 ... 6:
+case 0 ... 7:
 /* Already handled by decodetree */
 return 1;
 default:
@@ -3296,9 +3295,6 @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
 for (;;) {
 /* Perform the calculation.  */
 switch (op) {
-case 7: /* sub: fn - fm */
-gen_vfp_sub(dp);
-break;
 case 8: /* div: fn / fm */
 gen_vfp_div(dp);
 break;
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index d911f12dfd0..de56f44efc9 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -132,3 +132,8 @@ VADD_sp   1110 0.11   1010 .0.0  \
  vm=%vm_sp vn=%vn_sp vd=%vd_sp
 VADD_dp   1110 0.11   1011 .0.0  \
  vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
+VSUB_sp   1110 0.11   1010 .1.0  \
+ vm=%vm_sp vn=%vn_sp vd=%vd_sp
+VSUB_dp   1110 0.11   1011 .1.0  \
+ vm=%vm_dp vn=%vn_dp vd=%vd_dp
-- 
2.20.1




[Qemu-devel] [PATCH v2 16/42] target/arm: Convert the VFP load/store multiple insns to decodetree

2019-06-11 Thread Peter Maydell
Convert the VFP load/store multiple insns to decodetree.
This includes tightening up the UNDEF checking for pre-VFPv3
CPUs which only have D0-D15: they now UNDEF for any access
to D16-D31, not merely when the smallest register in the
transfer list is in D16-D31.
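
As a rough illustration of the tightened check, the two predicates can
be compared in plain C for a CPU that only has D0-D15. The function
names are hypothetical. The "new" condition mirrors the
(a->vd + n) > 16 test in trans_VLDM_VSTM_dp(), while the "old" one is
only a paraphrase of the behaviour described above, not the removed
code.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Old behaviour as described: UNDEF only when the first (lowest)
 * register of the transfer list is one of D16-D31.
 */
static bool dp_ldm_undef_old(int vd, int n)
{
    assert(vd >= 0 && vd < 32 && n > 0);
    return (vd & 0x10) != 0;
}

/*
 * New behaviour: UNDEF when any register in D[vd] .. D[vd + n - 1]
 * is one of D16-D31.
 */
static bool dp_ldm_undef_new(int vd, int n)
{
    assert(vd >= 0 && vd < 32 && n > 0);
    return vd + n > 16;
}
```

A transfer of four registers starting at D14 shows the difference: the
old predicate lets it through because D14 exists, the new one rejects
it because the list runs on into D16 and D17.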

This conversion does not try to share code between the single
precision and the double precision versions; this duplicates
some code, but it leaves the door open for a future
refactoring which gets rid of the use of the "F0" registers
by inlining the various functions like gen_vfp_ld() and
gen_mov_F0_vreg() which are hiding "if (dp) { ... } else { ... }"
conditionalisation.

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/translate-vfp.inc.c | 162 +
 target/arm/translate.c |  97 +---
 target/arm/vfp.decode  |  18 
 3 files changed, 183 insertions(+), 94 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index 40f2cac3e2e..32a1805e582 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -926,3 +926,165 @@ static bool trans_VLDR_VSTR_dp(DisasContext *s, arg_VLDR_VSTR_sp *a)
 
 return true;
 }
+
+static bool trans_VLDM_VSTM_sp(DisasContext *s, arg_VLDM_VSTM_sp *a)
+{
+uint32_t offset;
+TCGv_i32 addr;
+int i, n;
+
+n = a->imm;
+
+if (n == 0 || (a->vd + n) > 32) {
+/*
+ * UNPREDICTABLE cases for bad immediates: we choose to
+ * UNDEF to avoid generating huge numbers of TCG ops
+ */
+return false;
+}
+if (a->rn == 15 && a->w) {
+/* writeback to PC is UNPREDICTABLE, we choose to UNDEF */
+return false;
+}
+
+if (!vfp_access_check(s)) {
+return true;
+}
+
+if (s->thumb && a->rn == 15) {
+/* This is actually UNPREDICTABLE */
+addr = tcg_temp_new_i32();
+tcg_gen_movi_i32(addr, s->pc & ~2);
+} else {
+addr = load_reg(s, a->rn);
+}
+if (a->p) {
+/* pre-decrement */
+tcg_gen_addi_i32(addr, addr, -(a->imm << 2));
+}
+
+if (s->v8m_stackcheck && a->rn == 13 && a->w) {
+/*
+ * Here 'addr' is the lowest address we will store to,
+ * and is either the old SP (if post-increment) or
+ * the new SP (if pre-decrement). For post-increment
+ * where the old value is below the limit and the new
+ * value is above, it is UNKNOWN whether the limit check
+ * triggers; we choose to trigger.
+ */
+gen_helper_v8m_stackcheck(cpu_env, addr);
+}
+
+offset = 4;
+for (i = 0; i < n; i++) {
+if (a->l) {
+/* load */
+gen_vfp_ld(s, false, addr);
+gen_mov_vreg_F0(false, a->vd + i);
+} else {
+/* store */
+gen_mov_F0_vreg(false, a->vd + i);
+gen_vfp_st(s, false, addr);
+}
+tcg_gen_addi_i32(addr, addr, offset);
+}
+if (a->w) {
+/* writeback */
+if (a->p) {
+offset = -offset * n;
+tcg_gen_addi_i32(addr, addr, offset);
+}
+store_reg(s, a->rn, addr);
+} else {
+tcg_temp_free_i32(addr);
+}
+
+return true;
+}
+
+static bool trans_VLDM_VSTM_dp(DisasContext *s, arg_VLDM_VSTM_dp *a)
+{
+uint32_t offset;
+TCGv_i32 addr;
+int i, n;
+
+n = a->imm >> 1;
+
+if (n == 0 || (a->vd + n) > 32 || n > 16) {
+/*
+ * UNPREDICTABLE cases for bad immediates: we choose to
+ * UNDEF to avoid generating huge numbers of TCG ops
+ */
+return false;
+}
+if (a->rn == 15 && a->w) {
+/* writeback to PC is UNPREDICTABLE, we choose to UNDEF */
+return false;
+}
+
+/* UNDEF accesses to D16-D31 if they don't exist */
+if (!dc_isar_feature(aa32_fp_d32, s) && (a->vd + n) > 16) {
+return false;
+}
+
+if (!vfp_access_check(s)) {
+return true;
+}
+
+if (s->thumb && a->rn == 15) {
+/* This is actually UNPREDICTABLE */
+addr = tcg_temp_new_i32();
+tcg_gen_movi_i32(addr, s->pc & ~2);
+} else {
+addr = load_reg(s, a->rn);
+}
+if (a->p) {
+/* pre-decrement */
+tcg_gen_addi_i32(addr, addr, -(a->imm << 2));
+}
+
+if (s->v8m_stackcheck && a->rn == 13 && a->w) {
+/*
+ * Here 'addr' is the lowest address we will store to,
+ * and is either the old SP (if post-increment) or
+ * the new SP (if pre-decrement). For post-increment
+ * where the old value is below the limit and the new
+ * value is above, it is UNKNOWN whether the limit check
+ * triggers; we choose to trigger.
+ */
+gen_helper_v8m_stackcheck(cpu_env, addr);
+}
+
+offset = 8;
+for (i = 0; i < n; i++) {
+if (a->l) {
+/* load */
+gen_vfp_ld(s, true, addr);

[Qemu-devel] [PATCH v2 41/42] target/arm: Convert float-to-integer VCVT insns to decodetree

2019-06-11 Thread Peter Maydell
Convert the float-to-integer VCVT instructions to decodetree.
Since these are the last unconverted instructions, we can
delete the old decoder structure entirely now.

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/translate-vfp.inc.c |  72 ++
 target/arm/translate.c | 241 +
 target/arm/vfp.decode  |   6 +
 3 files changed, 80 insertions(+), 239 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index db07fdd8736..8216dba796e 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -2578,3 +2578,75 @@ static bool trans_VCVT_fix_dp(DisasContext *s, arg_VCVT_fix_dp *a)
 tcg_temp_free_ptr(fpst);
 return true;
 }
+
+static bool trans_VCVT_sp_int(DisasContext *s, arg_VCVT_sp_int *a)
+{
+TCGv_i32 vm;
+TCGv_ptr fpst;
+
+if (!vfp_access_check(s)) {
+return true;
+}
+
+fpst = get_fpstatus_ptr(false);
+vm = tcg_temp_new_i32();
+neon_load_reg32(vm, a->vm);
+
+if (a->s) {
+if (a->rz) {
+gen_helper_vfp_tosizs(vm, vm, fpst);
+} else {
+gen_helper_vfp_tosis(vm, vm, fpst);
+}
+} else {
+if (a->rz) {
+gen_helper_vfp_touizs(vm, vm, fpst);
+} else {
+gen_helper_vfp_touis(vm, vm, fpst);
+}
+}
+neon_store_reg32(vm, a->vd);
+tcg_temp_free_i32(vm);
+tcg_temp_free_ptr(fpst);
+return true;
+}
+
+static bool trans_VCVT_dp_int(DisasContext *s, arg_VCVT_dp_int *a)
+{
+TCGv_i32 vd;
+TCGv_i64 vm;
+TCGv_ptr fpst;
+
+/* UNDEF accesses to D16-D31 if they don't exist. */
+if (!dc_isar_feature(aa32_fp_d32, s) && (a->vm & 0x10)) {
+return false;
+}
+
+if (!vfp_access_check(s)) {
+return true;
+}
+
+fpst = get_fpstatus_ptr(false);
+vm = tcg_temp_new_i64();
+vd = tcg_temp_new_i32();
+neon_load_reg64(vm, a->vm);
+
+if (a->s) {
+if (a->rz) {
+gen_helper_vfp_tosizd(vd, vm, fpst);
+} else {
+gen_helper_vfp_tosid(vd, vm, fpst);
+}
+} else {
+if (a->rz) {
+gen_helper_vfp_touizd(vd, vm, fpst);
+} else {
+gen_helper_vfp_touid(vd, vm, fpst);
+}
+}
+neon_store_reg32(vd, a->vd);
+tcg_temp_free_i32(vd);
+tcg_temp_free_i64(vm);
+tcg_temp_free_ptr(fpst);
+return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 6046bb32247..1e6b0fa769e 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -1418,9 +1418,7 @@ static inline void gen_vfp_##name(int dp, int neon) \
 tcg_temp_free_ptr(statusptr); \
 }
 
-VFP_GEN_FTOI(toui)
 VFP_GEN_FTOI(touiz)
-VFP_GEN_FTOI(tosi)
 VFP_GEN_FTOI(tosiz)
 #undef VFP_GEN_FTOI
 
@@ -1612,33 +1610,7 @@ static TCGv_ptr vfp_reg_ptr(bool dp, int reg)
 }
 
 #define tcg_gen_ld_f32 tcg_gen_ld_i32
-#define tcg_gen_ld_f64 tcg_gen_ld_i64
 #define tcg_gen_st_f32 tcg_gen_st_i32
-#define tcg_gen_st_f64 tcg_gen_st_i64
-
-static inline void gen_mov_F0_vreg(int dp, int reg)
-{
-if (dp)
-tcg_gen_ld_f64(cpu_F0d, cpu_env, vfp_reg_offset(dp, reg));
-else
-tcg_gen_ld_f32(cpu_F0s, cpu_env, vfp_reg_offset(dp, reg));
-}
-
-static inline void gen_mov_F1_vreg(int dp, int reg)
-{
-if (dp)
-tcg_gen_ld_f64(cpu_F1d, cpu_env, vfp_reg_offset(dp, reg));
-else
-tcg_gen_ld_f32(cpu_F1s, cpu_env, vfp_reg_offset(dp, reg));
-}
-
-static inline void gen_mov_vreg_F0(int dp, int reg)
-{
-if (dp)
-tcg_gen_st_f64(cpu_F0d, cpu_env, vfp_reg_offset(dp, reg));
-else
-tcg_gen_st_f32(cpu_F0s, cpu_env, vfp_reg_offset(dp, reg));
-}
 
 #define ARM_CP_RW_BIT   (1 << 20)
 
@@ -2983,9 +2955,6 @@ static void gen_neon_dup_high16(TCGv_i32 var)
  */
 static int disas_vfp_insn(DisasContext *s, uint32_t insn)
 {
-uint32_t rd, rn, rm, op, delta_d, delta_m, bank_mask;
-int dp, veclen;
-
 if (!arm_dc_feature(s, ARM_FEATURE_VFP)) {
 return 1;
 }
@@ -3005,214 +2974,8 @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
 return 0;
 }
 }
-
-if (extract32(insn, 28, 4) == 0xf) {
-/*
- * Encodings with T=1 (Thumb) or unconditional (ARM): these
- * were all handled by the decodetree decoder, so any insn
- * patterns which get here must be UNDEF.
- */
-return 1;
-}
-
-/*
- * FIXME: this access check should not take precedence over UNDEF
- * for invalid encodings; we will generate incorrect syndrome information
- * for attempts to execute invalid vfp/neon encodings with FP disabled.
- */
-if (!vfp_access_check(s)) {
-return 0;
-}
-
-dp = ((insn & 0xf00) == 0xb00);
-switch ((insn >> 24) & 0xf) {
-case 0xe:
-if (insn & (1 << 4)) {
-/* already handled by decodetree */
-return 1;
-} else {
-  

[Qemu-devel] [PATCH v2 26/42] target/arm: Convert VDIV to decodetree

2019-06-11 Thread Peter Maydell
Convert the VDIV instruction to decodetree.

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/translate-vfp.inc.c | 10 ++
 target/arm/translate.c | 21 +
 target/arm/vfp.decode  |  5 +
 3 files changed, 16 insertions(+), 20 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index 12da3b8acb8..6af99605d5c 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -1471,3 +1471,13 @@ static bool trans_VSUB_dp(DisasContext *s, arg_VSUB_sp *a)
 {
 return do_vfp_3op_dp(s, gen_helper_vfp_subd, a->vd, a->vn, a->vm, false);
 }
+
+static bool trans_VDIV_sp(DisasContext *s, arg_VDIV_sp *a)
+{
+return do_vfp_3op_sp(s, gen_helper_vfp_divs, a->vd, a->vn, a->vm, false);
+}
+
+static bool trans_VDIV_dp(DisasContext *s, arg_VDIV_sp *a)
+{
+return do_vfp_3op_dp(s, gen_helper_vfp_divd, a->vd, a->vn, a->vm, false);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 18d4f9933ad..a9ec6eaef80 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -1374,22 +1374,6 @@ static TCGv_ptr get_fpstatus_ptr(int neon)
 return statusptr;
 }
 
-#define VFP_OP2(name) \
-static inline void gen_vfp_##name(int dp) \
-{ \
-TCGv_ptr fpst = get_fpstatus_ptr(0);  \
-if (dp) { \
-gen_helper_vfp_##name##d(cpu_F0d, cpu_F0d, cpu_F1d, fpst);\
-} else {  \
-gen_helper_vfp_##name##s(cpu_F0s, cpu_F0s, cpu_F1s, fpst);\
-} \
-tcg_temp_free_ptr(fpst);  \
-}
-
-VFP_OP2(div)
-
-#undef VFP_OP2
-
 static inline void gen_vfp_abs(int dp)
 {
 if (dp)
@@ -3109,7 +3093,7 @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
 rn = VFP_SREG_N(insn);
 
 switch (op) {
-case 0 ... 7:
+case 0 ... 8:
 /* Already handled by decodetree */
 return 1;
 default:
@@ -3295,9 +3279,6 @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
 for (;;) {
 /* Perform the calculation.  */
 switch (op) {
-case 8: /* div: fn / fm */
-gen_vfp_div(dp);
-break;
 case 10: /* VFNMA : fd = muladd(-fd,  fn, fm) */
 case 11: /* VFNMS : fd = muladd(-fd, -fn, fm) */
 case 12: /* VFMA  : fd = muladd( fd,  fn, fm) */
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index de56f44efc9..de305f60e18 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -137,3 +137,8 @@ VSUB_sp   1110 0.11   1010 .1.0  \
  vm=%vm_sp vn=%vn_sp vd=%vd_sp
 VSUB_dp   1110 0.11   1011 .1.0  \
  vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
+VDIV_sp   1110 1.00   1010 .0.0  \
+ vm=%vm_sp vn=%vn_sp vd=%vd_sp
+VDIV_dp   1110 1.00   1011 .0.0  \
+ vm=%vm_dp vn=%vn_dp vd=%vd_dp
-- 
2.20.1




[Qemu-devel] [PATCH v2 17/42] target/arm: Remove VLDR/VSTR/VLDM/VSTM use of cpu_F0s and cpu_F0d

2019-06-11 Thread Peter Maydell
Expand out the sequences in the new decoder VLDR/VSTR/VLDM/VSTM trans
functions which perform the memory accesses by going via the TCG
globals cpu_F0s and cpu_F0d, to use local TCG temps instead.

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/translate-vfp.inc.c | 46 +-
 target/arm/translate.c | 18 -
 2 files changed, 28 insertions(+), 36 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index 32a1805e582..9729946d734 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -857,7 +857,7 @@ static bool trans_VMOV_64_dp(DisasContext *s, arg_VMOV_64_sp *a)
 static bool trans_VLDR_VSTR_sp(DisasContext *s, arg_VLDR_VSTR_sp *a)
 {
 uint32_t offset;
-TCGv_i32 addr;
+TCGv_i32 addr, tmp;
 
 if (!vfp_access_check(s)) {
 return true;
@@ -876,13 +876,15 @@ static bool trans_VLDR_VSTR_sp(DisasContext *s, arg_VLDR_VSTR_sp *a)
 addr = load_reg(s, a->rn);
 }
 tcg_gen_addi_i32(addr, addr, offset);
+tmp = tcg_temp_new_i32();
 if (a->l) {
-gen_vfp_ld(s, false, addr);
-gen_mov_vreg_F0(false, a->vd);
+gen_aa32_ld32u(s, tmp, addr, get_mem_index(s));
+neon_store_reg32(tmp, a->vd);
 } else {
-gen_mov_F0_vreg(false, a->vd);
-gen_vfp_st(s, false, addr);
+neon_load_reg32(tmp, a->vd);
+gen_aa32_st32(s, tmp, addr, get_mem_index(s));
 }
+tcg_temp_free_i32(tmp);
 tcg_temp_free_i32(addr);
 
 return true;
@@ -892,6 +894,7 @@ static bool trans_VLDR_VSTR_dp(DisasContext *s, arg_VLDR_VSTR_sp *a)
 {
 uint32_t offset;
 TCGv_i32 addr;
+TCGv_i64 tmp;
 
 /* UNDEF accesses to D16-D31 if they don't exist */
 if (!dc_isar_feature(aa32_fp_d32, s) && (a->vd & 0x10)) {
@@ -915,13 +918,15 @@ static bool trans_VLDR_VSTR_dp(DisasContext *s, arg_VLDR_VSTR_sp *a)
 addr = load_reg(s, a->rn);
 }
 tcg_gen_addi_i32(addr, addr, offset);
+tmp = tcg_temp_new_i64();
 if (a->l) {
-gen_vfp_ld(s, true, addr);
-gen_mov_vreg_F0(true, a->vd);
+gen_aa32_ld64(s, tmp, addr, get_mem_index(s));
+neon_store_reg64(tmp, a->vd);
 } else {
-gen_mov_F0_vreg(true, a->vd);
-gen_vfp_st(s, true, addr);
+neon_load_reg64(tmp, a->vd);
+gen_aa32_st64(s, tmp, addr, get_mem_index(s));
 }
+tcg_temp_free_i64(tmp);
 tcg_temp_free_i32(addr);
 
 return true;
@@ -930,7 +935,7 @@ static bool trans_VLDR_VSTR_dp(DisasContext *s, arg_VLDR_VSTR_sp *a)
 static bool trans_VLDM_VSTM_sp(DisasContext *s, arg_VLDM_VSTM_sp *a)
 {
 uint32_t offset;
-TCGv_i32 addr;
+TCGv_i32 addr, tmp;
 int i, n;
 
 n = a->imm;
@@ -976,18 +981,20 @@ static bool trans_VLDM_VSTM_sp(DisasContext *s, arg_VLDM_VSTM_sp *a)
 }
 
 offset = 4;
+tmp = tcg_temp_new_i32();
 for (i = 0; i < n; i++) {
 if (a->l) {
 /* load */
-gen_vfp_ld(s, false, addr);
-gen_mov_vreg_F0(false, a->vd + i);
+gen_aa32_ld32u(s, tmp, addr, get_mem_index(s));
+neon_store_reg32(tmp, a->vd + i);
 } else {
 /* store */
-gen_mov_F0_vreg(false, a->vd + i);
-gen_vfp_st(s, false, addr);
+neon_load_reg32(tmp, a->vd + i);
+gen_aa32_st32(s, tmp, addr, get_mem_index(s));
 }
 tcg_gen_addi_i32(addr, addr, offset);
 }
+tcg_temp_free_i32(tmp);
 if (a->w) {
 /* writeback */
 if (a->p) {
@@ -1006,6 +1013,7 @@ static bool trans_VLDM_VSTM_dp(DisasContext *s, arg_VLDM_VSTM_dp *a)
 {
 uint32_t offset;
 TCGv_i32 addr;
+TCGv_i64 tmp;
 int i, n;
 
 n = a->imm >> 1;
@@ -1056,18 +1064,20 @@ static bool trans_VLDM_VSTM_dp(DisasContext *s, arg_VLDM_VSTM_dp *a)
 }
 
 offset = 8;
+tmp = tcg_temp_new_i64();
 for (i = 0; i < n; i++) {
 if (a->l) {
 /* load */
-gen_vfp_ld(s, true, addr);
-gen_mov_vreg_F0(true, a->vd + i);
+gen_aa32_ld64(s, tmp, addr, get_mem_index(s));
+neon_store_reg64(tmp, a->vd + i);
 } else {
 /* store */
-gen_mov_F0_vreg(true, a->vd + i);
-gen_vfp_st(s, true, addr);
+neon_load_reg64(tmp, a->vd + i);
+gen_aa32_st64(s, tmp, addr, get_mem_index(s));
 }
 tcg_gen_addi_i32(addr, addr, offset);
 }
+tcg_temp_free_i64(tmp);
 if (a->w) {
 /* writeback */
 if (a->p) {
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 5a9d0c30d3d..c197084e925 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -1522,24 +1522,6 @@ VFP_GEN_FIX(uhto, )
 VFP_GEN_FIX(ulto, )
 #undef VFP_GEN_FIX
 
-static inline void gen_vfp_ld(DisasContext *s, int dp, TCGv_i32 addr)
-{
-if (dp) {
-gen_aa32_ld64(s, cpu_F0d, addr, get_mem_index(

[Qemu-devel] [PATCH v2 37/42] target/arm: Convert double-single precision conversion insns to decodetree

2019-06-11 Thread Peter Maydell
Convert the VCVT double/single precision conversion insns to decodetree.

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/translate-vfp.inc.c | 48 ++
 target/arm/translate.c | 13 +
 target/arm/vfp.decode  |  6 +
 3 files changed, 55 insertions(+), 12 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index e94a8f2f0c5..c50093776b6 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -2320,3 +2320,51 @@ static bool trans_VRINTX_dp(DisasContext *s, arg_VRINTX_dp *a)
 tcg_temp_free_i64(tmp);
 return true;
 }
+
+static bool trans_VCVT_sp(DisasContext *s, arg_VCVT_sp *a)
+{
+TCGv_i64 vd;
+TCGv_i32 vm;
+
+/* UNDEF accesses to D16-D31 if they don't exist. */
+if (!dc_isar_feature(aa32_fp_d32, s) && (a->vd & 0x10)) {
+return false;
+}
+
+if (!vfp_access_check(s)) {
+return true;
+}
+
+vm = tcg_temp_new_i32();
+vd = tcg_temp_new_i64();
+neon_load_reg32(vm, a->vm);
+gen_helper_vfp_fcvtds(vd, vm, cpu_env);
+neon_store_reg64(vd, a->vd);
+tcg_temp_free_i32(vm);
+tcg_temp_free_i64(vd);
+return true;
+}
+
+static bool trans_VCVT_dp(DisasContext *s, arg_VCVT_dp *a)
+{
+TCGv_i64 vm;
+TCGv_i32 vd;
+
+/* UNDEF accesses to D16-D31 if they don't exist. */
+if (!dc_isar_feature(aa32_fp_d32, s) && (a->vm & 0x10)) {
+return false;
+}
+
+if (!vfp_access_check(s)) {
+return true;
+}
+
+vd = tcg_temp_new_i32();
+vm = tcg_temp_new_i64();
+neon_load_reg64(vm, a->vm);
+gen_helper_vfp_fcvtsd(vd, vm, cpu_env);
+neon_store_reg32(vd, a->vd);
+tcg_temp_free_i32(vd);
+tcg_temp_free_i64(vm);
+return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 412d8aaedb2..05ee76da77c 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -3050,7 +3050,7 @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
 return 1;
 case 15:
 switch (rn) {
-case 0 ... 14:
+case 0 ... 15:
 /* Already handled by decodetree */
 return 1;
 default:
@@ -3063,10 +3063,6 @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
 if (op == 15) {
 /* rn is opcode, encoded as per VFP_SREG_N. */
 switch (rn) {
-case 0x0f: /* vcvt double<->single */
-rd_is_dp = !dp;
-break;
-
 case 0x10: /* vcvt.fxx.u32 */
 case 0x11: /* vcvt.fxx.s32 */
 rm_is_dp = false;
@@ -3185,13 +3181,6 @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
 switch (op) {
 case 15: /* extension space */
 switch (rn) {
-case 15: /* single<->double conversion */
-if (dp) {
-gen_helper_vfp_fcvtsd(cpu_F0s, cpu_F0d, cpu_env);
-} else {
-gen_helper_vfp_fcvtds(cpu_F0d, cpu_F0s, cpu_env);
-}
-break;
 case 16: /* fuito */
 gen_vfp_uito(dp, 0);
 break;
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index 9942d2ae7ad..56b8b4e6046 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -208,3 +208,9 @@ VRINTX_sp 1110 1.11 0111  1010 01.0  \
  vd=%vd_sp vm=%vm_sp
 VRINTX_dp 1110 1.11 0111  1011 01.0  \
  vd=%vd_dp vm=%vm_dp
+
+# VCVT between single and double: Vm precision depends on size; Vd is its reverse
+VCVT_sp   1110 1.11 0111  1010 11.0  \
+ vd=%vd_dp vm=%vm_sp
+VCVT_dp   1110 1.11 0111  1011 11.0  \
+ vd=%vd_sp vm=%vm_dp
-- 
2.20.1




Re: [Qemu-devel] [PATCH v5 04/12] block/io_uring: implements interfaces for io_uring

2019-06-11 Thread Fam Zheng
On Mon, 06/10 19:18, Aarushi Mehta wrote:
> Aborts when sqe fails to be set as sqes cannot be returned to the ring.
> 
> Signed-off-by: Aarushi Mehta 
> ---
>  MAINTAINERS |   7 +
>  block/Makefile.objs |   3 +
>  block/io_uring.c| 314 
>  include/block/aio.h |  16 +-
>  include/block/raw-aio.h |  12 ++
>  5 files changed, 351 insertions(+), 1 deletion(-)
>  create mode 100644 block/io_uring.c
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 7be1225415..49f896796e 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -2516,6 +2516,13 @@ F: block/file-posix.c
>  F: block/file-win32.c
>  F: block/win32-aio.c
>  
> +Linux io_uring
> +M: Aarushi Mehta 
> +R: Stefan Hajnoczi 
> +L: qemu-bl...@nongnu.org
> +S: Maintained
> +F: block/io_uring.c
> +
>  qcow2
>  M: Kevin Wolf 
>  M: Max Reitz 
> diff --git a/block/Makefile.objs b/block/Makefile.objs
> index ae11605c9f..8fde7a23a5 100644
> --- a/block/Makefile.objs
> +++ b/block/Makefile.objs
> @@ -18,6 +18,7 @@ block-obj-y += block-backend.o snapshot.o qapi.o
>  block-obj-$(CONFIG_WIN32) += file-win32.o win32-aio.o
>  block-obj-$(CONFIG_POSIX) += file-posix.o
>  block-obj-$(CONFIG_LINUX_AIO) += linux-aio.o
> +block-obj-$(CONFIG_LINUX_IO_URING) += io_uring.o
>  block-obj-y += null.o mirror.o commit.o io.o create.o
>  block-obj-y += throttle-groups.o
>  block-obj-$(CONFIG_LINUX) += nvme.o
> @@ -61,5 +62,7 @@ block-obj-$(if $(CONFIG_LZFSE),m,n) += dmg-lzfse.o
>  dmg-lzfse.o-libs   := $(LZFSE_LIBS)
>  qcow.o-libs:= -lz
>  linux-aio.o-libs   := -laio
> +io_uring.o-cflags  := $(LINUX_IO_URING_CFLAGS)
> +io_uring.o-libs:= $(LINUX_IO_URING_LIBS)
>  parallels.o-cflags := $(LIBXML2_CFLAGS)
>  parallels.o-libs   := $(LIBXML2_LIBS)
> diff --git a/block/io_uring.c b/block/io_uring.c
> new file mode 100644
> index 00..f327c7ef96
> --- /dev/null
> +++ b/block/io_uring.c
> @@ -0,0 +1,314 @@
> +/*
> + * Linux io_uring support.
> + *
> + * Copyright (C) 2009 IBM, Corp.
> + * Copyright (C) 2009 Red Hat, Inc.
> + * Copyright (C) 2019 Aarushi Mehta
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +#include "qemu/osdep.h"
> +#include 
> +#include "qemu-common.h"
> +#include "block/aio.h"
> +#include "qemu/queue.h"
> +#include "block/block.h"
> +#include "block/raw-aio.h"
> +#include "qemu/coroutine.h"
> +#include "qapi/error.h"
> +
> +#define MAX_EVENTS 128
> +
> +typedef struct LuringAIOCB {

I have to say it is a good name.

> +Coroutine *co;
> +struct io_uring_sqe sqeq;
> +ssize_t ret;
> +QEMUIOVector *qiov;
> +bool is_read;
> +QSIMPLEQ_ENTRY(LuringAIOCB) next;
> +} LuringAIOCB;
> +
> +typedef struct LuringQueue {
> +int plugged;
> +unsigned int in_queue;
> +unsigned int in_flight;
> +bool blocked;
> +QSIMPLEQ_HEAD(, LuringAIOCB) sq_overflow;
> +} LuringQueue;
> +
> +typedef struct LuringState {
> +AioContext *aio_context;
> +
> +struct io_uring ring;
> +
> +/* io queue for submit at batch.  Protected by AioContext lock. */
> +LuringQueue io_q;
> +
> +/* I/O completion processing.  Only runs in I/O thread.  */
> +QEMUBH *completion_bh;
> +} LuringState;
> +
> +/**
> + * ioq_submit:
> + * @s: AIO state
> + *
> + * Queues pending sqes and submits them
> + *
> + */
> +static int ioq_submit(LuringState *s);
> +
> +/**
> + * qemu_luring_process_completions:
> + * @s: AIO state
> + *
> + * Fetches completed I/O requests, consumes cqes and invokes their callbacks.
> + *
> + */
> +static void qemu_luring_process_completions(LuringState *s)
> +{
> +struct io_uring_cqe *cqes;
> +int ret;
> +
> +/*
> + * Request completion callbacks can run the nested event loop.
> + * Schedule ourselves so the nested event loop will "see" remaining
> + * completed requests and process them.  Without this, completion
> + * callbacks that wait for other requests using a nested event loop
> + * would hang forever.
> + */
> +qemu_bh_schedule(s->completion_bh);
> +
> +while (io_uring_peek_cqe(&s->ring, &cqes) == 0) {
> +if (!cqes) {
> +break;
> +}
> +LuringAIOCB *luringcb = io_uring_cqe_get_data(cqes);
> +ret = cqes->res;

Declarations should be at the beginning of the code block.

> +
> +if (ret == luringcb->qiov->size) {
> +ret = 0;
> +} else if (ret >= 0) {
> +/* Short Read/Write */
> +if (luringcb->is_read) {
> +/* Read, pad with zeroes */
> +qemu_iovec_memset(luringcb->qiov, ret, 0,
> +luringcb->qiov->size - ret);

Should you check that (ret < luringcb->qiov->size), since ret comes from an
external source?

Either way, ret should be assigned 0, I think.
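
A rough sketch of how both remarks could be folded into the result
handling, using a flat buffer in place of the QEMUIOVector. The
function name luring_fixup_ret() and the choice of -EINVAL for an
oversized result are assumptions made for illustration only; the patch
does not say what should happen in that case.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical helper: turn a raw completion result into the value
 * reported to the caller (0 on success, negative errno on failure).
 */
static ssize_t luring_fixup_ret(ssize_t ret, unsigned char *buf,
                                size_t size, bool is_read)
{
    assert(buf != NULL);

    if (ret < 0) {
        return ret;                 /* error code is passed through */
    }
    if ((size_t)ret > size) {
        return -EINVAL;             /* assumption: reject bogus lengths */
    }
    if ((size_t)ret == size) {
        return 0;                   /* full transfer */
    }
    if (is_read) {
        /* Short read: pad the tail with zeroes and report success. */
        memset(buf + ret, 0, size - (size_t)ret);
        return 0;
    }
    return -ENOSPC;                 /* short write */
}
```

With the bound checked first, the length passed to memset() can never
underflow, and the short-read path reports 0 like a full read does.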

> +} else {
> +ret = -ENOSPC;;

s/;;/;/

> +}
> +}
> +luringcb->ret = ret;
> +

[Qemu-devel] [PATCH v2 35/42] target/arm: Convert the VCVT-to-f16 insns to decodetree

2019-06-11 Thread Peter Maydell
Convert the VCVTT and VCVTB instructions which convert from
f32 and f64 to f16 to decodetree.

Since we're no longer constrained to the old decoder's style
using cpu_F0s and cpu_F0d, we can perform a direct 16-bit
store of the right half of the destination single-precision register
rather than doing a load/modify/store sequence on the full
32 bits.
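
The difference between the two approaches can be sketched in plain C
on a single 32-bit register value. Everything here is a hypothetical
model: store_half_rmw(), store_half_direct() and host_is_big_endian()
are illustrative names, and the byte-order handling is done by hand in
the sketch rather than taken from vfp_f16_offset(), whose
implementation is not shown in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Detect host byte order at run time. */
static bool host_is_big_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;

    memcpy(&first, &probe, 1);
    return first == 0;
}

/* Load/modify/store on the full 32 bits. */
static void store_half_rmw(uint32_t *sreg, bool top, uint16_t val)
{
    uint32_t old = *sreg;

    *sreg = top ? (old & 0x0000ffffu) | ((uint32_t)val << 16)
                : (old & 0xffff0000u) | val;
}

/* Direct 16-bit store into the selected half. */
static void store_half_direct(uint32_t *sreg, bool top, uint16_t val)
{
    size_t off = top ? 2 : 0;       /* little-endian byte offset */

    if (host_is_big_endian()) {
        off ^= 2;                   /* halves swap places in memory */
    }
    assert(off + sizeof(val) <= sizeof(*sreg));
    memcpy((unsigned char *)sreg + off, &val, sizeof(val));
}
```

Both functions leave the other half of the register untouched; the
direct version just avoids reading the old value first.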

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/translate-vfp.inc.c | 62 ++
 target/arm/translate.c | 79 +-
 target/arm/vfp.decode  |  6 +++
 3 files changed, 69 insertions(+), 78 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index 732bf6020a9..a19ede86719 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -2095,3 +2095,65 @@ static bool trans_VCVT_f64_f16(DisasContext *s, arg_VCVT_f64_f16 *a)
 tcg_temp_free_i64(vd);
 return true;
 }
+
+static bool trans_VCVT_f16_f32(DisasContext *s, arg_VCVT_f16_f32 *a)
+{
+TCGv_ptr fpst;
+TCGv_i32 ahp_mode;
+TCGv_i32 tmp;
+
+if (!dc_isar_feature(aa32_fp16_spconv, s)) {
+return false;
+}
+
+if (!vfp_access_check(s)) {
+return true;
+}
+
+fpst = get_fpstatus_ptr(false);
+ahp_mode = get_ahp_flag();
+tmp = tcg_temp_new_i32();
+
+neon_load_reg32(tmp, a->vm);
+gen_helper_vfp_fcvt_f32_to_f16(tmp, tmp, fpst, ahp_mode);
+tcg_gen_st16_i32(tmp, cpu_env, vfp_f16_offset(a->vd, a->t));
+tcg_temp_free_i32(ahp_mode);
+tcg_temp_free_ptr(fpst);
+tcg_temp_free_i32(tmp);
+return true;
+}
+
+static bool trans_VCVT_f16_f64(DisasContext *s, arg_VCVT_f16_f64 *a)
+{
+TCGv_ptr fpst;
+TCGv_i32 ahp_mode;
+TCGv_i32 tmp;
+TCGv_i64 vm;
+
+if (!dc_isar_feature(aa32_fp16_dpconv, s)) {
+return false;
+}
+
+/* UNDEF accesses to D16-D31 if they don't exist. */
+if (!dc_isar_feature(aa32_fp_d32, s) && (a->vm  & 0x10)) {
+return false;
+}
+
+if (!vfp_access_check(s)) {
+return true;
+}
+
+fpst = get_fpstatus_ptr(false);
+ahp_mode = get_ahp_flag();
+tmp = tcg_temp_new_i32();
+vm = tcg_temp_new_i64();
+
+neon_load_reg64(vm, a->vm);
+gen_helper_vfp_fcvt_f64_to_f16(tmp, vm, fpst, ahp_mode);
+tcg_temp_free_i64(vm);
+tcg_gen_st16_i32(tmp, cpu_env, vfp_f16_offset(a->vd, a->t));
+tcg_temp_free_i32(ahp_mode);
+tcg_temp_free_ptr(fpst);
+tcg_temp_free_i32(tmp);
+return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 34a82cfa424..143b250a996 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -2963,20 +2963,6 @@ static int disas_dsp_insn(DisasContext *s, uint32_t insn)
 #define VFP_SREG_M(insn) VFP_SREG(insn,  0,  5)
 #define VFP_DREG_M(reg, insn) VFP_DREG(reg, insn,  0,  5)
 
-/* Move between integer and VFP cores.  */
-static TCGv_i32 gen_vfp_mrs(void)
-{
-TCGv_i32 tmp = tcg_temp_new_i32();
-tcg_gen_mov_i32(tmp, cpu_F0s);
-return tmp;
-}
-
-static void gen_vfp_msr(TCGv_i32 tmp)
-{
-tcg_gen_mov_i32(cpu_F0s, tmp);
-tcg_temp_free_i32(tmp);
-}
-
 static void gen_neon_dup_low16(TCGv_i32 var)
 {
 TCGv_i32 tmp = tcg_temp_new_i32();
@@ -3003,8 +2989,6 @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
 {
 uint32_t rd, rn, rm, op, delta_d, delta_m, bank_mask;
 int dp, veclen;
-TCGv_i32 tmp;
-TCGv_i32 tmp2;
 
 if (!arm_dc_feature(s, ARM_FEATURE_VFP)) {
 return 1;
@@ -3066,8 +3050,7 @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
 return 1;
 case 15:
 switch (rn) {
-case 0 ... 5:
-case 8 ... 11:
+case 0 ... 11:
 /* Already handled by decodetree */
 return 1;
 default:
@@ -3080,20 +3063,6 @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
 if (op == 15) {
 /* rn is opcode, encoded as per VFP_SREG_N. */
 switch (rn) {
-case 0x06: /* vcvtb.f16.f32, vcvtb.f16.f64 */
-case 0x07: /* vcvtt.f16.f32, vcvtt.f16.f64 */
-if (dp) {
-if (!dc_isar_feature(aa32_fp16_dpconv, s)) {
-return 1;
-}
-} else {
-if (!dc_isar_feature(aa32_fp16_spconv, s)) {
-return 1;
-}
-}
-rd_is_dp = false;
-break;
-
 case 0x0c: /* vrintr */
 case 0x0d: /* vrintz */
 case 0x0e: /* vrintx */
@@ -3221,52 +3190,6 @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
 switch (op) {
 case 15: /* extension space */
 switch (rn) {
-case 

[Qemu-devel] [PATCH v2 22/42] target/arm: Convert VMUL to decodetree

2019-06-11 Thread Peter Maydell
Convert the VMUL instruction to decodetree.

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/translate-vfp.inc.c | 10 ++
 target/arm/translate.c |  5 +
 target/arm/vfp.decode  |  5 +
 3 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index 8532bf4abcd..a2afe82b349 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -1417,3 +1417,13 @@ static bool trans_VNMLA_dp(DisasContext *s, arg_VNMLA_sp *a)
 {
 return do_vfp_3op_dp(s, gen_VNMLA_dp, a->vd, a->vn, a->vm, true);
 }
+
+static bool trans_VMUL_sp(DisasContext *s, arg_VMUL_sp *a)
+{
+return do_vfp_3op_sp(s, gen_helper_vfp_muls, a->vd, a->vn, a->vm, false);
+}
+
+static bool trans_VMUL_dp(DisasContext *s, arg_VMUL_sp *a)
+{
+return do_vfp_3op_dp(s, gen_helper_vfp_muld, a->vd, a->vn, a->vm, false);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 1f83723b81a..96790e65c6f 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -3112,7 +3112,7 @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
 rn = VFP_SREG_N(insn);
 
 switch (op) {
-case 0 ... 3:
+case 0 ... 4:
 /* Already handled by decodetree */
 return 1;
 default:
@@ -3298,9 +3298,6 @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
 for (;;) {
 /* Perform the calculation.  */
 switch (op) {
-case 4: /* mul: fn * fm */
-gen_vfp_mul(dp);
-break;
 case 5: /* nmul: -(fn * fm) */
 gen_vfp_mul(dp);
 gen_vfp_neg(dp);
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index c50d2c3ebf3..d7fcb9709a9 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -117,3 +117,8 @@ VNMLA_sp     ---- 1110 0.01 .... .... 1010 .1.0 .... \
              vm=%vm_sp vn=%vn_sp vd=%vd_sp
 VNMLA_dp     ---- 1110 0.01 .... .... 1011 .1.0 .... \
              vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
+VMUL_sp      ---- 1110 0.10 .... .... 1010 .0.0 .... \
+             vm=%vm_sp vn=%vn_sp vd=%vd_sp
+VMUL_dp      ---- 1110 0.10 .... .... 1011 .0.0 .... \
+             vm=%vm_dp vn=%vn_dp vd=%vd_dp
-- 
2.20.1




[Qemu-devel] [PATCH v2 28/42] target/arm: Convert VMOV (imm) to decodetree

2019-06-11 Thread Peter Maydell
Convert the VFP VMOV (immediate) instruction to decodetree.
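
For readers unfamiliar with the encoding, the following is a minimal
sketch of how the 8-bit VMOV immediate expands to a single-precision
bit pattern. It follows the same arithmetic as the single-precision
translate function in this patch, but the helper name
vfp_expand_imm_sp() is hypothetical and it takes the whole imm8 rather
than the split imm4h/imm4l fields.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: expand the 8-bit VMOV
 * immediate abcdefgh into the IEEE single-precision bit pattern
 * a : NOT(b) : bbbbb : cdefgh : 19 zero bits.
 */
static uint32_t vfp_expand_imm_sp(uint8_t imm8)
{
    uint32_t n = ((uint32_t)imm8 << 24) & 0x80000000u; /* sign bit a */
    uint32_t i = imm8 & 0x7f;                          /* bcdefgh */

    if (i & 0x40) {
        i |= 0x780;     /* b == 1: top six exponent bits are 011111 */
    } else {
        i |= 0x800;     /* b == 0: top six exponent bits are 100000 */
    }
    return n | (i << 19);
}
```

Under these assumptions an imm8 of 0x70 gives the bit pattern of 1.0f
and an imm8 of 0x00 gives the bit pattern of 2.0f.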

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/translate-vfp.inc.c | 129 +
 target/arm/translate.c |  27 +--
 target/arm/vfp.decode  |   5 ++
 3 files changed, 136 insertions(+), 25 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index ba6506a378c..a2eeb6cb511 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -1602,3 +1602,132 @@ static bool trans_VFM_dp(DisasContext *s, arg_VFM_sp *a)
 
 return true;
 }
+
+static bool trans_VMOV_imm_sp(DisasContext *s, arg_VMOV_imm_sp *a)
+{
+uint32_t delta_d = 0;
+uint32_t bank_mask = 0;
+int veclen = s->vec_len;
+TCGv_i32 fd;
+uint32_t n, i, vd;
+
+vd = a->vd;
+
+if (!dc_isar_feature(aa32_fpshvec, s) &&
+(veclen != 0 || s->vec_stride != 0)) {
+return false;
+}
+
+if (!arm_dc_feature(s, ARM_FEATURE_VFP3)) {
+return false;
+}
+
+if (!vfp_access_check(s)) {
+return true;
+}
+
+if (veclen > 0) {
+bank_mask = 0x18;
+/* Figure out what type of vector operation this is.  */
+if ((vd & bank_mask) == 0) {
+/* scalar */
+veclen = 0;
+} else {
+delta_d = s->vec_stride + 1;
+}
+}
+
+n = (a->imm4h << 28) & 0x80000000;
+i = ((a->imm4h << 4) & 0x70) | a->imm4l;
+if (i & 0x40) {
+i |= 0x780;
+} else {
+i |= 0x800;
+}
+n |= i << 19;
+
+fd = tcg_temp_new_i32();
+tcg_gen_movi_i32(fd, n);
+
+for (;;) {
+neon_store_reg32(fd, vd);
+
+if (veclen == 0) {
+break;
+}
+
+/* Set up the operands for the next iteration */
+veclen--;
+vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
+}
+
+tcg_temp_free_i32(fd);
+return true;
+}
+
+static bool trans_VMOV_imm_dp(DisasContext *s, arg_VMOV_imm_dp *a)
+{
+uint32_t delta_d = 0;
+uint32_t bank_mask = 0;
+int veclen = s->vec_len;
+TCGv_i64 fd;
+uint32_t n, i, vd;
+
+vd = a->vd;
+
+/* UNDEF accesses to D16-D31 if they don't exist. */
+if (!dc_isar_feature(aa32_fp_d32, s) && (vd & 0x10)) {
+return false;
+}
+
+if (!dc_isar_feature(aa32_fpshvec, s) &&
+(veclen != 0 || s->vec_stride != 0)) {
+return false;
+}
+
+if (!arm_dc_feature(s, ARM_FEATURE_VFP3)) {
+return false;
+}
+
+if (!vfp_access_check(s)) {
+return true;
+}
+
+if (veclen > 0) {
+bank_mask = 0xc;
+/* Figure out what type of vector operation this is.  */
+if ((vd & bank_mask) == 0) {
+/* scalar */
+veclen = 0;
+} else {
+delta_d = (s->vec_stride >> 1) + 1;
+}
+}
+
+n = (a->imm4h << 28) & 0x80000000;
+i = ((a->imm4h << 4) & 0x70) | a->imm4l;
+if (i & 0x40) {
+i |= 0x3f80;
+} else {
+i |= 0x4000;
+}
+n |= i << 16;
+
+fd = tcg_temp_new_i64();
+tcg_gen_movi_i64(fd, ((uint64_t)n) << 32);
+
+for (;;) {
+neon_store_reg64(fd, vd);
+
+if (veclen == 0) {
+break;
+}
+
+/* Set up the operands for the next iteration */
+veclen--;
+vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
+}
+
+tcg_temp_free_i64(fd);
+return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 150e9e64cc3..b0a12991131 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -3033,7 +3033,7 @@ static void gen_neon_dup_high16(TCGv_i32 var)
  */
 static int disas_vfp_insn(DisasContext *s, uint32_t insn)
 {
-uint32_t rd, rn, rm, op, i, n, delta_d, delta_m, bank_mask;
+uint32_t rd, rn, rm, op, delta_d, delta_m, bank_mask;
 int dp, veclen;
 TCGv_i32 tmp;
 TCGv_i32 tmp2;
@@ -3093,7 +3093,7 @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
 rn = VFP_SREG_N(insn);
 
 switch (op) {
-case 0 ... 13:
+case 0 ... 14:
 /* Already handled by decodetree */
 return 1;
 default:
@@ -3279,29 +3279,6 @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
 for (;;) {
 /* Perform the calculation.  */
 switch (op) {
-case 14: /* fconst */
-if (!arm_dc_feature(s, ARM_FEATURE_VFP3)) {
-return 1;
-}
-
-n = (insn << 12) & 0x80000000;
-i = ((insn >> 12) & 0x70) | (insn & 0xf);
-if (dp) {
-if (i & 0x40)
-i |= 0x3f80;
-else
-i |= 0x4000;
-n |= i << 16;
-t

[Qemu-devel] [PATCH v2 23/42] target/arm: Convert VNMUL to decodetree

2019-06-11 Thread Peter Maydell
Convert the VNMUL instruction to decodetree.

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/translate-vfp.inc.c | 24 
 target/arm/translate.c |  7 +--
 target/arm/vfp.decode  |  5 +
 3 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index a2afe82b349..4c684f033b6 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -1427,3 +1427,27 @@ static bool trans_VMUL_dp(DisasContext *s, arg_VMUL_sp *a)
 {
 return do_vfp_3op_dp(s, gen_helper_vfp_muld, a->vd, a->vn, a->vm, false);
 }
+
+static void gen_VNMUL_sp(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst)
+{
+/* VNMUL: -(fn * fm) */
+gen_helper_vfp_muls(vd, vn, vm, fpst);
+gen_helper_vfp_negs(vd, vd);
+}
+
+static bool trans_VNMUL_sp(DisasContext *s, arg_VNMUL_sp *a)
+{
+return do_vfp_3op_sp(s, gen_VNMUL_sp, a->vd, a->vn, a->vm, false);
+}
+
+static void gen_VNMUL_dp(TCGv_i64 vd, TCGv_i64 vn, TCGv_i64 vm, TCGv_ptr fpst)
+{
+/* VNMUL: -(fn * fm) */
+gen_helper_vfp_muld(vd, vn, vm, fpst);
+gen_helper_vfp_negd(vd, vd);
+}
+
+static bool trans_VNMUL_dp(DisasContext *s, arg_VNMUL_sp *a)
+{
+return do_vfp_3op_dp(s, gen_VNMUL_dp, a->vd, a->vn, a->vm, false);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 96790e65c6f..1f9fa6b03a1 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -1388,7 +1388,6 @@ static inline void gen_vfp_##name(int dp) \
 
 VFP_OP2(add)
 VFP_OP2(sub)
-VFP_OP2(mul)
 VFP_OP2(div)
 
 #undef VFP_OP2
@@ -3112,7 +3111,7 @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
 rn = VFP_SREG_N(insn);
 
 switch (op) {
-case 0 ... 4:
+case 0 ... 5:
 /* Already handled by decodetree */
 return 1;
 default:
@@ -3298,10 +3297,6 @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
 for (;;) {
 /* Perform the calculation.  */
 switch (op) {
-case 5: /* nmul: -(fn * fm) */
-gen_vfp_mul(dp);
-gen_vfp_neg(dp);
-break;
 case 6: /* add: fn + fm */
 gen_vfp_add(dp);
 break;
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index d7fcb9709a9..3063fcac23f 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -122,3 +122,8 @@ VMUL_sp      ---- 1110 0.10 .... .... 1010 .0.0 .... \
              vm=%vm_sp vn=%vn_sp vd=%vd_sp
 VMUL_dp      ---- 1110 0.10 .... .... 1011 .0.0 .... \
              vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
+VNMUL_sp     ---- 1110 0.10 .... .... 1010 .1.0 .... \
+             vm=%vm_sp vn=%vn_sp vd=%vd_sp
+VNMUL_dp     ---- 1110 0.10 .... .... 1011 .1.0 .... \
+             vm=%vm_dp vn=%vn_dp vd=%vd_dp
-- 
2.20.1




[Qemu-devel] [PATCH v2 34/42] target/arm: Convert the VCVT-from-f16 insns to decodetree

2019-06-11 Thread Peter Maydell
Convert the VCVTT, VCVTB instructions that deal with conversion
from half-precision floats to f32 or f64 to decodetree.

Since we're no longer constrained to the old decoder's style
using cpu_F0s and cpu_F0d we can perform a direct 16 bit
load of the right half of the input single-precision register
rather than loading the full 32 bits and then doing a
separate shift or sign-extension.
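
As an illustration of the direct 16-bit load described above, here is
a minimal host-side sketch of picking one half of a 32-bit word by
byte offset. The names half_offset() and load_half() are hypothetical
stand-ins, and host endianness is probed at run time here instead of
with a compile-time macro.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for an offset helper: byte offset of the top
 * or bottom 16 bits of a 32-bit word as stored in host memory.
 */
static size_t half_offset(int top)
{
    const uint16_t probe = 1;
    uint8_t first;
    int little_endian;

    memcpy(&first, &probe, 1);
    little_endian = (first == 1);
    /* Little-endian: top half at +2. Big-endian: bottom half at +2. */
    return (little_endian == (top != 0)) ? 2 : 0;
}

/* Load the selected half directly, with no shift of the full word. */
static uint16_t load_half(const uint32_t *reg, int top)
{
    uint16_t h;

    memcpy(&h, (const uint8_t *)reg + half_offset(top), sizeof(h));
    return h;
}
```

The point of the design is that the half is chosen by address rather
than by loading all 32 bits and shifting afterwards.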

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/translate-vfp.inc.c | 82 ++
 target/arm/translate.c | 56 +--
 target/arm/vfp.decode  |  6 +++
 3 files changed, 89 insertions(+), 55 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index ebde86210a6..732bf6020a9 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -30,6 +30,26 @@
 #include "decode-vfp.inc.c"
 #include "decode-vfp-uncond.inc.c"
 
+/*
+ * Return the offset of a 16-bit half of the specified VFP single-precision
+ * register. If top is true, returns the top 16 bits; otherwise the bottom
+ * 16 bits.
+ */
+static inline long vfp_f16_offset(unsigned reg, bool top)
+{
+long offs = vfp_reg_offset(false, reg);
+#ifdef HOST_WORDS_BIGENDIAN
+if (!top) {
+offs += 2;
+}
+#else
+if (top) {
+offs += 2;
+}
+#endif
+return offs;
+}
+
 /*
  * Check that VFP access is enabled. If it is, do the necessary
  * M-profile lazy-FP handling and then return true.
@@ -2013,3 +2033,65 @@ static bool trans_VCMP_dp(DisasContext *s, arg_VCMP_dp *a)
 
 return true;
 }
+
+static bool trans_VCVT_f32_f16(DisasContext *s, arg_VCVT_f32_f16 *a)
+{
+TCGv_ptr fpst;
+TCGv_i32 ahp_mode;
+TCGv_i32 tmp;
+
+if (!dc_isar_feature(aa32_fp16_spconv, s)) {
+return false;
+}
+
+if (!vfp_access_check(s)) {
+return true;
+}
+
+fpst = get_fpstatus_ptr(false);
+ahp_mode = get_ahp_flag();
+tmp = tcg_temp_new_i32();
+/* The T bit tells us if we want the low or high 16 bits of Vm */
+tcg_gen_ld16u_i32(tmp, cpu_env, vfp_f16_offset(a->vm, a->t));
+gen_helper_vfp_fcvt_f16_to_f32(tmp, tmp, fpst, ahp_mode);
+neon_store_reg32(tmp, a->vd);
+tcg_temp_free_i32(ahp_mode);
+tcg_temp_free_ptr(fpst);
+tcg_temp_free_i32(tmp);
+return true;
+}
+
+static bool trans_VCVT_f64_f16(DisasContext *s, arg_VCVT_f64_f16 *a)
+{
+TCGv_ptr fpst;
+TCGv_i32 ahp_mode;
+TCGv_i32 tmp;
+TCGv_i64 vd;
+
+if (!dc_isar_feature(aa32_fp16_dpconv, s)) {
+return false;
+}
+
+/* UNDEF accesses to D16-D31 if they don't exist. */
+if (!dc_isar_feature(aa32_fp_d32, s) && (a->vd  & 0x10)) {
+return false;
+}
+
+if (!vfp_access_check(s)) {
+return true;
+}
+
+fpst = get_fpstatus_ptr(false);
+ahp_mode = get_ahp_flag();
+tmp = tcg_temp_new_i32();
+/* The T bit tells us if we want the low or high 16 bits of Vm */
+tcg_gen_ld16u_i32(tmp, cpu_env, vfp_f16_offset(a->vm, a->t));
+vd = tcg_temp_new_i64();
+gen_helper_vfp_fcvt_f16_to_f64(vd, tmp, fpst, ahp_mode);
+neon_store_reg64(vd, a->vd);
+tcg_temp_free_i32(ahp_mode);
+tcg_temp_free_ptr(fpst);
+tcg_temp_free_i32(tmp);
+tcg_temp_free_i64(vd);
+return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 761e8347fa0..34a82cfa424 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -3066,7 +3066,7 @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
 return 1;
 case 15:
 switch (rn) {
-case 0 ... 3:
+case 0 ... 5:
 case 8 ... 11:
 /* Already handled by decodetree */
 return 1;
@@ -3080,24 +3080,6 @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
 if (op == 15) {
 /* rn is opcode, encoded as per VFP_SREG_N. */
 switch (rn) {
-case 0x04: /* vcvtb.f64.f16, vcvtb.f32.f16 */
-case 0x05: /* vcvtt.f64.f16, vcvtt.f32.f16 */
-/*
- * VCVTB, VCVTT: only present with the halfprec extension
- * UNPREDICTABLE if bit 8 is set prior to ARMv8
- * (we choose to UNDEF)
- */
-if (dp) {
-if (!dc_isar_feature(aa32_fp16_dpconv, s)) {
-return 1;
-}
-} else {
-if (!dc_isar_feature(aa32_fp16_spconv, s)) {
-return 1;
-}
-}
-rm_is_dp = false;
-break;
 case 0x06: /* vcvtb.f16.f32, vcvtb.f16.f64 */
 case 0x07: /* vcvtt.f16.f32, vcvtt.f16.f64 */
 if (dp) {
@@ -3239,42 +3221,6 @@ sta

[Qemu-devel] [PATCH v2 39/42] target/arm: Convert VJCVT to decodetree

2019-06-11 Thread Peter Maydell
Convert the VJCVT instruction to decodetree.

Signed-off-by: Peter Maydell 
---
 target/arm/translate-vfp.inc.c | 28 
 target/arm/translate.c | 12 +---
 target/arm/vfp.decode  |  4 
 3 files changed, 33 insertions(+), 11 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index cc3f61d9c41..161f0fdd888 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -2426,3 +2426,31 @@ static bool trans_VCVT_int_dp(DisasContext *s, arg_VCVT_int_dp *a)
 tcg_temp_free_ptr(fpst);
 return true;
 }
+
+static bool trans_VJCVT(DisasContext *s, arg_VJCVT *a)
+{
+TCGv_i32 vd;
+TCGv_i64 vm;
+
+if (!dc_isar_feature(aa32_jscvt, s)) {
+return false;
+}
+
+/* UNDEF accesses to D16-D31 if they don't exist. */
+if (!dc_isar_feature(aa32_fp_d32, s) && (a->vm & 0x10)) {
+return false;
+}
+
+if (!vfp_access_check(s)) {
+return true;
+}
+
+vm = tcg_temp_new_i64();
+vd = tcg_temp_new_i32();
+neon_load_reg64(vm, a->vm);
+gen_helper_vjcvt(vd, vm, cpu_env);
+neon_store_reg32(vd, a->vd);
+tcg_temp_free_i64(vm);
+tcg_temp_free_i32(vd);
+return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 1e28308aa6a..99b436ad6f7 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -3050,7 +3050,7 @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
 return 1;
 case 15:
 switch (rn) {
-case 0 ... 17:
+case 0 ... 19:
 /* Already handled by decodetree */
 return 1;
 default:
@@ -3085,13 +3085,6 @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
 rm_is_dp = false;
 break;
 
-case 0x13: /* vjcvt */
-if (!dp || !dc_isar_feature(aa32_jscvt, s)) {
-return 1;
-}
-rd_is_dp = false;
-break;
-
 default:
 return 1;
 }
@@ -3177,9 +3170,6 @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
 switch (op) {
 case 15: /* extension space */
 switch (rn) {
-case 19: /* vjcvt */
-gen_helper_vjcvt(cpu_F0s, cpu_F0d, cpu_env);
-break;
 case 20: /* fshto */
 gen_vfp_shto(dp, 16 - rm, 0);
 break;
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index 6da9a7913da..1a7c9b533de 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -220,3 +220,7 @@ VCVT_int_sp  ---- 1110 1.11 1000 .... 1010 s:1 1.0 .... \
              vd=%vd_sp vm=%vm_sp
 VCVT_int_dp  ---- 1110 1.11 1000 .... 1011 s:1 1.0 .... \
              vd=%vd_dp vm=%vm_sp
+
+# VJCVT is always dp to sp
+VJCVT        ---- 1110 1.11 1001 .... 1011 11.0 .... \
+             vd=%vd_sp vm=%vm_dp
-- 
2.20.1




[Qemu-devel] [PATCH v2 36/42] target/arm: Convert VFP round insns to decodetree

2019-06-11 Thread Peter Maydell
Convert the VFP round-to-integer instructions VRINTR, VRINTZ and
VRINTX to decodetree.

These instructions were only introduced as part of the "VFP misc"
additions in v8A, so we check this. The old decoder's implementation
was incorrectly providing them even for v7A CPUs.

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/translate-vfp.inc.c | 163 +
 target/arm/translate.c |  45 +
 target/arm/vfp.decode  |  15 +++
 3 files changed, 179 insertions(+), 44 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index a19ede86719..e94a8f2f0c5 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -2157,3 +2157,166 @@ static bool trans_VCVT_f16_f64(DisasContext *s, arg_VCVT_f16_f64 *a)
 tcg_temp_free_i32(tmp);
 return true;
 }
+
+static bool trans_VRINTR_sp(DisasContext *s, arg_VRINTR_sp *a)
+{
+TCGv_ptr fpst;
+TCGv_i32 tmp;
+
+if (!dc_isar_feature(aa32_vrint, s)) {
+return false;
+}
+
+if (!vfp_access_check(s)) {
+return true;
+}
+
+tmp = tcg_temp_new_i32();
+neon_load_reg32(tmp, a->vm);
+fpst = get_fpstatus_ptr(false);
+gen_helper_rints(tmp, tmp, fpst);
+neon_store_reg32(tmp, a->vd);
+tcg_temp_free_ptr(fpst);
+tcg_temp_free_i32(tmp);
+return true;
+}
+
+static bool trans_VRINTR_dp(DisasContext *s, arg_VRINTR_sp *a)
+{
+TCGv_ptr fpst;
+TCGv_i64 tmp;
+
+if (!dc_isar_feature(aa32_vrint, s)) {
+return false;
+}
+
+/* UNDEF accesses to D16-D31 if they don't exist. */
+if (!dc_isar_feature(aa32_fp_d32, s) && ((a->vd | a->vm) & 0x10)) {
+return false;
+}
+
+if (!vfp_access_check(s)) {
+return true;
+}
+
+tmp = tcg_temp_new_i64();
+neon_load_reg64(tmp, a->vm);
+fpst = get_fpstatus_ptr(false);
+gen_helper_rintd(tmp, tmp, fpst);
+neon_store_reg64(tmp, a->vd);
+tcg_temp_free_ptr(fpst);
+tcg_temp_free_i64(tmp);
+return true;
+}
+
+static bool trans_VRINTZ_sp(DisasContext *s, arg_VRINTZ_sp *a)
+{
+TCGv_ptr fpst;
+TCGv_i32 tmp;
+TCGv_i32 tcg_rmode;
+
+if (!dc_isar_feature(aa32_vrint, s)) {
+return false;
+}
+
+if (!vfp_access_check(s)) {
+return true;
+}
+
+tmp = tcg_temp_new_i32();
+neon_load_reg32(tmp, a->vm);
+fpst = get_fpstatus_ptr(false);
+tcg_rmode = tcg_const_i32(float_round_to_zero);
+gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
+gen_helper_rints(tmp, tmp, fpst);
+gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
+neon_store_reg32(tmp, a->vd);
+tcg_temp_free_ptr(fpst);
+tcg_temp_free_i32(tcg_rmode);
+tcg_temp_free_i32(tmp);
+return true;
+}
+
+static bool trans_VRINTZ_dp(DisasContext *s, arg_VRINTZ_sp *a)
+{
+TCGv_ptr fpst;
+TCGv_i64 tmp;
+TCGv_i32 tcg_rmode;
+
+if (!dc_isar_feature(aa32_vrint, s)) {
+return false;
+}
+
+/* UNDEF accesses to D16-D31 if they don't exist. */
+if (!dc_isar_feature(aa32_fp_d32, s) && ((a->vd | a->vm) & 0x10)) {
+return false;
+}
+
+if (!vfp_access_check(s)) {
+return true;
+}
+
+tmp = tcg_temp_new_i64();
+neon_load_reg64(tmp, a->vm);
+fpst = get_fpstatus_ptr(false);
+tcg_rmode = tcg_const_i32(float_round_to_zero);
+gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
+gen_helper_rintd(tmp, tmp, fpst);
+gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
+neon_store_reg64(tmp, a->vd);
+tcg_temp_free_ptr(fpst);
+tcg_temp_free_i64(tmp);
+tcg_temp_free_i32(tcg_rmode);
+return true;
+}
+
+static bool trans_VRINTX_sp(DisasContext *s, arg_VRINTX_sp *a)
+{
+TCGv_ptr fpst;
+TCGv_i32 tmp;
+
+if (!dc_isar_feature(aa32_vrint, s)) {
+return false;
+}
+
+if (!vfp_access_check(s)) {
+return true;
+}
+
+tmp = tcg_temp_new_i32();
+neon_load_reg32(tmp, a->vm);
+fpst = get_fpstatus_ptr(false);
+gen_helper_rints_exact(tmp, tmp, fpst);
+neon_store_reg32(tmp, a->vd);
+tcg_temp_free_ptr(fpst);
+tcg_temp_free_i32(tmp);
+return true;
+}
+
+static bool trans_VRINTX_dp(DisasContext *s, arg_VRINTX_dp *a)
+{
+TCGv_ptr fpst;
+TCGv_i64 tmp;
+
+if (!dc_isar_feature(aa32_vrint, s)) {
+return false;
+}
+
+/* UNDEF accesses to D16-D31 if they don't exist. */
+if (!dc_isar_feature(aa32_fp_d32, s) && ((a->vd | a->vm) & 0x10)) {
+return false;
+}
+
+if (!vfp_access_check(s)) {
+return true;
+}
+
+tmp = tcg_temp_new_i64();
+neon_load_reg64(tmp, a->vm);
+fpst = get_fpstatus_ptr(false);
+gen_helper_rintd_exact(tmp, tmp, fpst);
+neon_store_reg64(tmp, a->vd);
+tcg_temp_free_ptr(fpst);
+tcg_temp_free_i64(tmp);
+return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 143b250a996..412d8aaedb2 100644
--- a/target/arm/translate.c

[Qemu-devel] [PATCH v2 18/42] target/arm: Convert VFP VMLA to decodetree

2019-06-11 Thread Peter Maydell
Convert the VFP VMLA instruction to decodetree.

This is the first of the VFP 3-operand data processing instructions,
so we include in this patch the code which loops over the elements
for an old-style VFP vector operation. The existing code to do this
looping uses the deprecated cpu_F0s/F0d/F1s/F1d TCG globals; since
we are going to be converting instructions one at a time anyway
we can take the opportunity to make the new loop use TCG temporaries,
which means we can do that conversion one operation at a time
rather than needing to do it all in one go.

We include an UNDEF check which was missing in the old code:
short-vector operations (with stride or length non-zero) were
deprecated in v7A and must UNDEF in v8A, so if the MVFR0 FPShVec
field does not indicate that support for short vectors is present
we UNDEF the operations that would use them. (This is a change
of behaviour for Cortex-A7, Cortex-A15 and the v8 CPUs, which
previously were all incorrectly allowing short-vector operations.)

Note that the conversion fixes a bug in the old code for the
case of VFP short-vector "mixed scalar/vector operations". These
happen where the destination register is in a vector bank but
the second operand is in a scalar bank. For example
  vmla.f64 d10, d1, d16   with length 2 stride 2
is equivalent to the pair of scalar operations
  vmla.f64 d10, d1, d16
  vmla.f64 d8, d3, d16
where the destination and first input register cycle through
their vector but the second input is scalar (d16). In the
old decoder the gen_vfp_F1_mul() operation uses cpu_F1{s,d}
as a temporary output for the multiply, which trashes the
second input operand. For the fully-scalar case (where we
never do a second iteration) and the fully-vector case
(where the loop loads the new second input operand) this
doesn't matter, but for the mixed scalar/vector case we
will end up using the wrong value for later loop iterations.
In the new code we use TCG temporaries and so avoid the bug.
This bug is present for all the multiply-accumulate insns
that operate on short vectors: VMLA, VMLS, VNMLA, VNMLS.

Note 2: the expression used to calculate the next register
number in the vector bank is not in fact correct; we leave
this behaviour unchanged from the old decoder and will
fix this bug later in the series.
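
To make the register cycling in the example above concrete, here is a
minimal sketch. It assumes double-precision registers are grouped in
banks of four and that a vector wraps around within its bank; the
helper names are hypothetical, and the wrap expression is the
architectural rule as assumed here, not necessarily the expression
this patch carries over from the old decoder.

```c
/*
 * Hypothetical helper: step a D register number by delta, wrapping
 * within its bank of four registers (an assumption of this sketch).
 */
static unsigned advance_dreg(unsigned reg, unsigned delta)
{
    return ((reg + delta) & 0x3) | (reg & ~0x3u);
}

/*
 * Hypothetical helper: a D register counts as scalar when it falls
 * in a bank selected by (reg & 0xc) == 0, e.g. d0-d3 or d16-d19.
 */
static int dreg_in_scalar_bank(unsigned reg)
{
    return (reg & 0xc) == 0;
}
```

With a step of 2, d10 advances to d8 and d1 advances to d3, while d16
is in a scalar bank and stays fixed, which matches the pair of scalar
operations listed in the example.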

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/cpu.h   |   5 +
 target/arm/translate-vfp.inc.c | 205 +
 target/arm/translate.c |  14 ++-
 target/arm/vfp.decode  |   6 +
 4 files changed, 224 insertions(+), 6 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index c612901daeb..135deb9cd62 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -3378,6 +3378,11 @@ static inline bool isar_feature_aa32_fp_d32(const ARMISARegisters *id)
 return FIELD_EX64(id->mvfr0, MVFR0, SIMDREG) >= 2;
 }
 
+static inline bool isar_feature_aa32_fpshvec(const ARMISARegisters *id)
+{
+return FIELD_EX64(id->mvfr0, MVFR0, FPSHVEC) > 0;
+}
+
 /*
  * We always set the FP and SIMD FP16 fields to indicate identical
  * levels of support (assuming SIMD is implemented at all), so
diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index 9729946d734..4f922dc8405 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -1098,3 +1098,208 @@ static bool trans_VLDM_VSTM_dp(DisasContext *s, arg_VLDM_VSTM_dp *a)
 
 return true;
 }
+
+/*
+ * Types for callbacks for do_vfp_3op_sp() and do_vfp_3op_dp().
+ * The callback should emit code to write a value to vd. If
+ * do_vfp_3op_{sp,dp}() was passed reads_vd then the TCGv vd
+ * will contain the old value of the relevant VFP register;
+ * otherwise it must be written to only.
+ */
+typedef void VFPGen3OpSPFn(TCGv_i32 vd,
+   TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst);
+typedef void VFPGen3OpDPFn(TCGv_i64 vd,
+   TCGv_i64 vn, TCGv_i64 vm, TCGv_ptr fpst);
+
+/*
+ * Perform a 3-operand VFP data processing instruction. fn is the
+ * callback to do the actual operation; this function deals with the
+ * code to handle looping around for VFP vector processing.
+ */
+static bool do_vfp_3op_sp(DisasContext *s, VFPGen3OpSPFn *fn,
+  int vd, int vn, int vm, bool reads_vd)
+{
+uint32_t delta_m = 0;
+uint32_t delta_d = 0;
+uint32_t bank_mask = 0;
+int veclen = s->vec_len;
+TCGv_i32 f0, f1, fd;
+TCGv_ptr fpst;
+
+if (!dc_isar_feature(aa32_fpshvec, s) &&
+(veclen != 0 || s->vec_stride != 0)) {
+return false;
+}
+
+if (!vfp_access_check(s)) {
+return true;
+}
+
+if (veclen > 0) {
+bank_mask = 0x18;
+
+/* Figure out what type of vector operation this is.  */
+if ((vd & bank_mask) == 0) {
+/* scalar */
+veclen = 0;
+} else {
+delta_d = s->vec_stride + 1;
+
+if ((vm & bank_ma

[Qemu-devel] [PATCH v2 27/42] target/arm: Convert VFP fused multiply-add insns to decodetree

2019-06-11 Thread Peter Maydell
Convert the VFP fused multiply-add instructions (VFNMA, VFNMS,
VFMA, VFMS) to decodetree.

Note that in the old decode structure we were implementing
these to honour the VFP vector stride/length. These instructions
were introduced in VFPv4, and in the v7A architecture they
are UNPREDICTABLE if the vector stride or length are non-zero.
In v8A they must UNDEF if stride or length are non-zero, like
all VFP instructions; we choose to UNDEF always.
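
The "UNDEF always" choice amounts to a simple predicate. A minimal
sketch follows, with a hypothetical function name and with the
feature test reduced to a plain boolean parameter:

```c
#include <stdbool.h>

/*
 * Hypothetical predicate: a fused multiply-add is accepted only when
 * VFPv4 is present and both the vector length and stride are zero.
 * Any other combination is treated as UNDEF, for v7A and v8A alike.
 */
static bool vfm_is_valid(bool has_vfp4, int vec_len, int vec_stride)
{
    return has_vfp4 && vec_len == 0 && vec_stride == 0;
}
```

Treating the v7A UNPREDICTABLE case as UNDEF keeps a single code path
for both architecture versions.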

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/translate-vfp.inc.c | 121 +
 target/arm/translate.c |  53 +--
 target/arm/vfp.decode  |   9 +++
 3 files changed, 131 insertions(+), 52 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index 6af99605d5c..ba6506a378c 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -1481,3 +1481,124 @@ static bool trans_VDIV_dp(DisasContext *s, arg_VDIV_sp *a)
 {
 return do_vfp_3op_dp(s, gen_helper_vfp_divd, a->vd, a->vn, a->vm, false);
 }
+
+static bool trans_VFM_sp(DisasContext *s, arg_VFM_sp *a)
+{
+/*
+ * VFNMA : fd = muladd(-fd,  fn, fm)
+ * VFNMS : fd = muladd(-fd, -fn, fm)
+ * VFMA  : fd = muladd( fd,  fn, fm)
+ * VFMS  : fd = muladd( fd, -fn, fm)
+ *
+ * These are fused multiply-add, and must be done as one floating
+ * point operation with no rounding between the multiplication and
+ * addition steps.  NB that doing the negations here as separate
+ * steps is correct : an input NaN should come out with its sign
+ * bit flipped if it is a negated-input.
+ */
+TCGv_ptr fpst;
+TCGv_i32 vn, vm, vd;
+
+/*
+ * Present in VFPv4 only.
+ * In v7A, UNPREDICTABLE with non-zero vector length/stride; from
+ * v8A, must UNDEF. We choose to UNDEF for both v7A and v8A.
+ */
+if (!arm_dc_feature(s, ARM_FEATURE_VFP4) ||
+(s->vec_len != 0 || s->vec_stride != 0)) {
+return false;
+}
+
+if (!vfp_access_check(s)) {
+return true;
+}
+
+vn = tcg_temp_new_i32();
+vm = tcg_temp_new_i32();
+vd = tcg_temp_new_i32();
+
+neon_load_reg32(vn, a->vn);
+neon_load_reg32(vm, a->vm);
+if (a->o2) {
+/* VFNMS, VFMS */
+gen_helper_vfp_negs(vn, vn);
+}
+neon_load_reg32(vd, a->vd);
+if (a->o1 & 1) {
+/* VFNMA, VFNMS */
+gen_helper_vfp_negs(vd, vd);
+}
+fpst = get_fpstatus_ptr(0);
+gen_helper_vfp_muladds(vd, vn, vm, vd, fpst);
+neon_store_reg32(vd, a->vd);
+
+tcg_temp_free_ptr(fpst);
+tcg_temp_free_i32(vn);
+tcg_temp_free_i32(vm);
+tcg_temp_free_i32(vd);
+
+return true;
+}
+
+static bool trans_VFM_dp(DisasContext *s, arg_VFM_sp *a)
+{
+/*
+ * VFNMA : fd = muladd(-fd,  fn, fm)
+ * VFNMS : fd = muladd(-fd, -fn, fm)
+ * VFMA  : fd = muladd( fd,  fn, fm)
+ * VFMS  : fd = muladd( fd, -fn, fm)
+ *
+ * These are fused multiply-add, and must be done as one floating
+ * point operation with no rounding between the multiplication and
+ * addition steps.  NB that doing the negations here as separate
+ * steps is correct : an input NaN should come out with its sign
+ * bit flipped if it is a negated-input.
+ */
+TCGv_ptr fpst;
+TCGv_i64 vn, vm, vd;
+
+/*
+ * Present in VFPv4 only.
+ * In v7A, UNPREDICTABLE with non-zero vector length/stride; from
+ * v8A, must UNDEF. We choose to UNDEF for both v7A and v8A.
+ */
+if (!arm_dc_feature(s, ARM_FEATURE_VFP4) ||
+(s->vec_len != 0 || s->vec_stride != 0)) {
+return false;
+}
+
+/* UNDEF accesses to D16-D31 if they don't exist. */
+if (!dc_isar_feature(aa32_fp_d32, s) && ((a->vd | a->vn | a->vm) & 0x10)) {
+return false;
+}
+
+if (!vfp_access_check(s)) {
+return true;
+}
+
+vn = tcg_temp_new_i64();
+vm = tcg_temp_new_i64();
+vd = tcg_temp_new_i64();
+
+neon_load_reg64(vn, a->vn);
+neon_load_reg64(vm, a->vm);
+if (a->o2) {
+/* VFNMS, VFMS */
+gen_helper_vfp_negd(vn, vn);
+}
+neon_load_reg64(vd, a->vd);
+if (a->o1 & 1) {
+/* VFNMA, VFNMS */
+gen_helper_vfp_negd(vd, vd);
+}
+fpst = get_fpstatus_ptr(0);
+gen_helper_vfp_muladdd(vd, vn, vm, vd, fpst);
+neon_store_reg64(vd, a->vd);
+
+tcg_temp_free_ptr(fpst);
+tcg_temp_free_i64(vn);
+tcg_temp_free_i64(vm);
+tcg_temp_free_i64(vd);
+
+return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index a9ec6eaef80..150e9e64cc3 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -3093,7 +3093,7 @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
 rn = VFP_SREG_N(insn);
 
 switch (op) {
-case 0 ... 8:
+case 0 ... 13:
 /* Already handled by decodetree */
 return 1;

[Qemu-devel] [PATCH v19 19/21] Add rx-softmmu

2019-06-11 Thread Philippe Mathieu-Daudé
From: Yoshinori Sato 

Tested-by: Philippe Mathieu-Daudé 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Yoshinori Sato 
Message-Id: <20190607091116.49044-17-ys...@users.sourceforge.jp>
Signed-off-by: Richard Henderson 
pick ed65c02993 target/rx: Add RX to SysEmuTarget
pick 01372568ae tests: Add rx to machine-none-test.c
[PMD: Squashed patches from Richard Henderson modifying
  qapi/common.json and tests/machine-none-test.c]
Signed-off-by: Philippe Mathieu-Daudé 
---
 arch_init.c| 2 ++
 configure  | 8 
 default-configs/rx-softmmu.mak | 3 +++
 hw/Kconfig | 1 +
 include/exec/poison.h  | 1 +
 include/sysemu/arch_init.h | 1 +
 qapi/common.json   | 3 ++-
 tests/machine-none-test.c  | 1 +
 8 files changed, 19 insertions(+), 1 deletion(-)
 create mode 100644 default-configs/rx-softmmu.mak

diff --git a/arch_init.c b/arch_init.c
index f4f3f610c8..cc25ddd7ca 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -74,6 +74,8 @@ int graphic_depth = 32;
 #define QEMU_ARCH QEMU_ARCH_PPC
 #elif defined(TARGET_RISCV)
 #define QEMU_ARCH QEMU_ARCH_RISCV
+#elif defined(TARGET_RX)
+#define QEMU_ARCH QEMU_ARCH_RX
 #elif defined(TARGET_S390X)
 #define QEMU_ARCH QEMU_ARCH_S390X
 #elif defined(TARGET_SH4)
diff --git a/configure b/configure
index b091b82cb3..d6e16c58c3 100755
--- a/configure
+++ b/configure
@@ -7595,6 +7595,11 @@ case "$target_name" in
 gdb_xml_files="riscv-64bit-cpu.xml riscv-64bit-fpu.xml riscv-64bit-csr.xml"
 target_compiler=$cross_cc_riscv64
   ;;
+  rx)
+TARGET_ARCH=rx
+bflt="yes"
+target_compiler=$cross_cc_rx
+  ;;
   sh4|sh4eb)
 TARGET_ARCH=sh4
 bflt="yes"
@@ -7815,6 +7820,9 @@ for i in $ARCH $TARGET_BASE_ARCH ; do
   riscv*)
 disas_config "RISCV"
   ;;
+  rx)
+disas_config "RX"
+  ;;
   s390*)
 disas_config "S390"
   ;;
diff --git a/default-configs/rx-softmmu.mak b/default-configs/rx-softmmu.mak
new file mode 100644
index 0000000000..a3eecefb11
--- /dev/null
+++ b/default-configs/rx-softmmu.mak
@@ -0,0 +1,3 @@
+# Default configuration for rx-softmmu
+
+CONFIG_RX_VIRT=y
diff --git a/hw/Kconfig b/hw/Kconfig
index 195f541e50..b0c7221240 100644
--- a/hw/Kconfig
+++ b/hw/Kconfig
@@ -54,6 +54,7 @@ source nios2/Kconfig
 source openrisc/Kconfig
 source ppc/Kconfig
 source riscv/Kconfig
+source rx/Kconfig
 source s390x/Kconfig
 source sh4/Kconfig
 source sparc/Kconfig
diff --git a/include/exec/poison.h b/include/exec/poison.h
index b862320fa6..c17911d859 100644
--- a/include/exec/poison.h
+++ b/include/exec/poison.h
@@ -26,6 +26,7 @@
 #pragma GCC poison TARGET_PPC
 #pragma GCC poison TARGET_PPC64
 #pragma GCC poison TARGET_ABI32
+#pragma GCC poison TARGET_RX
 #pragma GCC poison TARGET_S390X
 #pragma GCC poison TARGET_SH4
 #pragma GCC poison TARGET_SPARC
diff --git a/include/sysemu/arch_init.h b/include/sysemu/arch_init.h
index 10cbafe970..3f4f844f7b 100644
--- a/include/sysemu/arch_init.h
+++ b/include/sysemu/arch_init.h
@@ -25,6 +25,7 @@ enum {
 QEMU_ARCH_NIOS2 = (1 << 17),
 QEMU_ARCH_HPPA = (1 << 18),
 QEMU_ARCH_RISCV = (1 << 19),
+QEMU_ARCH_RX = (1 << 20),
 };
 
 extern const uint32_t arch_type;
diff --git a/qapi/common.json b/qapi/common.json
index 99d313ef3b..d0fc931159 100644
--- a/qapi/common.json
+++ b/qapi/common.json
@@ -183,6 +183,7 @@
 #is true even for "qemu-system-x86_64".
 #
 # ppcemb: dropped in 3.1
+# rx: added in 4.1
 #
 # Since: 3.0
 ##
@@ -190,6 +191,6 @@
   'data' : [ 'aarch64', 'alpha', 'arm', 'cris', 'hppa', 'i386', 'lm32',
  'm68k', 'microblaze', 'microblazeel', 'mips', 'mips64',
  'mips64el', 'mipsel', 'moxie', 'nios2', 'or1k', 'ppc',
- 'ppc64', 'riscv32', 'riscv64', 's390x', 'sh4',
+ 'ppc64', 'riscv32', 'riscv64', 'rx', 's390x', 'sh4',
  'sh4eb', 'sparc', 'sparc64', 'tricore', 'unicore32',
  'x86_64', 'xtensa', 'xtensaeb' ] }
diff --git a/tests/machine-none-test.c b/tests/machine-none-test.c
index 4c6d470798..80df277357 100644
--- a/tests/machine-none-test.c
+++ b/tests/machine-none-test.c
@@ -56,6 +56,7 @@ static struct arch2cpu cpus_map[] = {
 { "hppa", "hppa" },
 { "riscv64", "rv64gcsu-v1.10.0" },
 { "riscv32", "rv32gcsu-v1.9.1" },
+{ "rx", "rx62n" },
 };
 
 static const char *get_cpu_model_by_arch(const char *arch)
-- 
2.20.1




[Qemu-devel] [PATCH v2 30/42] target/arm: Convert VNEG to decodetree

2019-06-11 Thread Peter Maydell
Convert the VNEG instruction to decodetree.

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/translate-vfp.inc.c | 10 ++
 target/arm/translate.c |  6 +-
 target/arm/vfp.decode  |  5 +
 3 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index d0282f1f921..6e06b2a130a 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -1898,3 +1898,13 @@ static bool trans_VABS_dp(DisasContext *s, arg_VABS_dp *a)
 {
 return do_vfp_2op_dp(s, gen_helper_vfp_absd, a->vd, a->vm);
 }
+
+static bool trans_VNEG_sp(DisasContext *s, arg_VNEG_sp *a)
+{
+return do_vfp_2op_sp(s, gen_helper_vfp_negs, a->vd, a->vm);
+}
+
+static bool trans_VNEG_dp(DisasContext *s, arg_VNEG_dp *a)
+{
+return do_vfp_2op_dp(s, gen_helper_vfp_negd, a->vd, a->vm);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index dc9076a60a3..b0eecb6ca8f 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -3098,7 +3098,7 @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
 return 1;
 case 15:
 switch (rn) {
-case 1:
+case 1 ... 2:
 /* Already handled by decodetree */
 return 1;
 default:
@@ -3112,7 +3112,6 @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
 /* rn is opcode, encoded as per VFP_SREG_N. */
 switch (rn) {
 case 0x00: /* vmov */
-case 0x02: /* vneg */
 case 0x03: /* vsqrt */
 break;
 
@@ -3291,9 +3290,6 @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
 case 0: /* cpy */
 /* no-op */
 break;
-case 2: /* neg */
-gen_vfp_neg(dp);
-break;
 case 3: /* sqrt */
 gen_vfp_sqrt(dp);
 break;
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index 7035861c270..79e41963be4 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -161,3 +161,8 @@ VABS_sp   1110 1.11   1010 11.0  \
  vd=%vd_sp vm=%vm_sp
 VABS_dp   1110 1.11   1011 11.0  \
  vd=%vd_dp vm=%vm_dp
+
+VNEG_sp   1110 1.11 0001  1010 01.0  \
+ vd=%vd_sp vm=%vm_sp
+VNEG_dp   1110 1.11 0001  1011 01.0  \
+ vd=%vd_dp vm=%vm_dp
-- 
2.20.1




[Qemu-devel] [PATCH v2 31/42] target/arm: Convert VSQRT to decodetree

2019-06-11 Thread Peter Maydell
Convert the VSQRT instruction to decodetree.

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/translate-vfp.inc.c | 20 
 target/arm/translate.c | 14 +-
 target/arm/vfp.decode  |  5 +
 3 files changed, 26 insertions(+), 13 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index 6e06b2a130a..ae2f77a873b 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -1908,3 +1908,23 @@ static bool trans_VNEG_dp(DisasContext *s, arg_VNEG_dp *a)
 {
 return do_vfp_2op_dp(s, gen_helper_vfp_negd, a->vd, a->vm);
 }
+
+static void gen_VSQRT_sp(TCGv_i32 vd, TCGv_i32 vm)
+{
+gen_helper_vfp_sqrts(vd, vm, cpu_env);
+}
+
+static bool trans_VSQRT_sp(DisasContext *s, arg_VSQRT_sp *a)
+{
+return do_vfp_2op_sp(s, gen_VSQRT_sp, a->vd, a->vm);
+}
+
+static void gen_VSQRT_dp(TCGv_i64 vd, TCGv_i64 vm)
+{
+gen_helper_vfp_sqrtd(vd, vm, cpu_env);
+}
+
+static bool trans_VSQRT_dp(DisasContext *s, arg_VSQRT_dp *a)
+{
+return do_vfp_2op_dp(s, gen_VSQRT_dp, a->vd, a->vm);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index b0eecb6ca8f..ce805f0ab28 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -1390,14 +1390,6 @@ static inline void gen_vfp_neg(int dp)
 gen_helper_vfp_negs(cpu_F0s, cpu_F0s);
 }
 
-static inline void gen_vfp_sqrt(int dp)
-{
-if (dp)
-gen_helper_vfp_sqrtd(cpu_F0d, cpu_F0d, cpu_env);
-else
-gen_helper_vfp_sqrts(cpu_F0s, cpu_F0s, cpu_env);
-}
-
 static inline void gen_vfp_cmp(int dp)
 {
 if (dp)
@@ -3098,7 +3090,7 @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
 return 1;
 case 15:
 switch (rn) {
-case 1 ... 2:
+case 1 ... 3:
 /* Already handled by decodetree */
 return 1;
 default:
@@ -3112,7 +3104,6 @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
 /* rn is opcode, encoded as per VFP_SREG_N. */
 switch (rn) {
 case 0x00: /* vmov */
-case 0x03: /* vsqrt */
 break;
 
 case 0x04: /* vcvtb.f64.f16, vcvtb.f32.f16 */
@@ -3290,9 +3281,6 @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
 case 0: /* cpy */
 /* no-op */
 break;
-case 3: /* sqrt */
-gen_vfp_sqrt(dp);
-break;
 case 4: /* vcvtb.f32.f16, vcvtb.f64.f16 */
 {
 TCGv_ptr fpst = get_fpstatus_ptr(false);
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index 79e41963be4..2780e1ed9ea 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -166,3 +166,8 @@ VNEG_sp   1110 1.11 0001  1010 01.0  \
  vd=%vd_sp vm=%vm_sp
 VNEG_dp   1110 1.11 0001  1011 01.0  \
  vd=%vd_dp vm=%vm_dp
+
+VSQRT_sp  1110 1.11 0001  1010 11.0  \
+ vd=%vd_sp vm=%vm_sp
+VSQRT_dp  1110 1.11 0001  1011 11.0  \
+ vd=%vd_dp vm=%vm_dp
-- 
2.20.1




[Qemu-devel] [PATCH v19 21/21] BootLinuxConsoleTest: Test the RX-Virt machine

2019-06-11 Thread Philippe Mathieu-Daudé
Add two tests for the rx-virt machine, based on the recommended test
setup from Yoshinori Sato:
https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg03586.html

- U-Boot prompt
- Linux kernel with Sash shell

These are very quick tests:

  $ avocado run -t arch:rx tests/acceptance/boot_linux_console.py
  JOB ID : 84a6ef01c0b87975ecbfcb31a920afd735753ace
  JOB LOG: /home/phil/avocado/job-results/job-2019-05-24T05.02-84a6ef0/job.log
   (1/2) tests/acceptance/boot_linux_console.py:BootLinuxConsole.test_rx_uboot: PASS (0.11 s)
   (2/2) tests/acceptance/boot_linux_console.py:BootLinuxConsole.test_rx_linux: PASS (0.45 s)
  RESULTS: PASS 2 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0

Tests can also be run with:

  $ avocado --show=console run -t arch:rx tests/acceptance/boot_linux_console.py
  console: U-Boot 2016.05-rc3-23705-ga1ef3c71cb-dirty (Feb 05 2019 - 21:56:06 +0900)
  console: Linux version 4.19.0+ (yo-satoh@yo-satoh-debian) (gcc version 9.0.0 20181105 (experimental) (GCC)) #137 Wed Feb 20 23:20:02 JST 2019
  console: Built 1 zonelists, mobility grouping on.  Total pages: 8128
  ...
  console: SuperH (H)SCI(F) driver initialized
  console: 88240.serial: ttySC0 at MMIO 0x88240 (irq = 215, base_baud = 0) is a sci
  console: console [ttySC0] enabled
  console: 88248.serial: ttySC1 at MMIO 0x88248 (irq = 219, base_baud = 0) is a sci

Signed-off-by: Philippe Mathieu-Daudé 
---
Based-on: 20190517045136.3509-1-richard.hender...@linaro.org
"RX architecture support"
---
 tests/acceptance/boot_linux_console.py | 51 ++
 1 file changed, 51 insertions(+)

diff --git a/tests/acceptance/boot_linux_console.py b/tests/acceptance/boot_linux_console.py
index d5c500ea30..f68aab1df8 100644
--- a/tests/acceptance/boot_linux_console.py
+++ b/tests/acceptance/boot_linux_console.py
@@ -45,6 +45,11 @@ class BootLinuxConsole(Test):
 fail = 'Failure message found in console: %s' % failure_message
 self.fail(fail)
 
+def exec_command_and_wait_for_pattern(self, command, success_message):
+command += '\n'
+self.vm.console_socket.sendall(command.encode())
+self.wait_for_console_pattern(success_message)
+
 def extract_from_deb(self, deb, path):
 """
 Extracts a file from a deb package into the test workdir
@@ -217,3 +222,49 @@ class BootLinuxConsole(Test):
 self.vm.launch()
 console_pattern = 'Kernel command line: %s' % kernel_command_line
 self.wait_for_console_pattern(console_pattern)
+
+def test_rx_uboot(self):
+"""
+:avocado: tags=arch:rx
+:avocado: tags=machine:rx-virt
+:avocado: tags=endian:little
+"""
+uboot_url = ('https://acc.dl.osdn.jp/users/23/23888/u-boot.bin.gz')
+uboot_hash = '9b78dbd43b40b2526848c0b1ce9de02c24f4dcdb'
+uboot_path = self.fetch_asset(uboot_url, asset_hash=uboot_hash)
+uboot_path = archive.uncompress(uboot_path, self.workdir)
+
+self.vm.set_machine('rx-virt')
+self.vm.set_console()
+self.vm.add_args('-bios', uboot_path,
+ '-no-reboot')
+self.vm.launch()
+uboot_version = 'U-Boot 2016.05-rc3-23705-ga1ef3c71cb-dirty'
+self.wait_for_console_pattern(uboot_version)
+gcc_version = 'rx-unknown-linux-gcc (GCC) 9.0.0 20181105 (experimental)'
+# FIXME limit baudrate on chardev, else we type too fast
+#self.exec_command_and_wait_for_pattern('version', gcc_version)
+
+def test_rx_linux(self):
+"""
+:avocado: tags=arch:rx
+:avocado: tags=machine:rx-virt
+:avocado: tags=endian:little
+"""
+dtb_url = ('https://acc.dl.osdn.jp/users/23/23887/rx-qemu.dtb')
+dtb_hash = '7b4e4e2c71905da44e86ce47adee2210b026ac18'
+dtb_path = self.fetch_asset(dtb_url, asset_hash=dtb_hash)
+kernel_url = ('http://acc.dl.osdn.jp/users/23/23845/zImage')
+kernel_hash = '39a81067f8d72faad90866ddfefa19165d68fc99'
+kernel_path = self.fetch_asset(kernel_url, asset_hash=kernel_hash)
+
+self.vm.set_machine('rx-virt')
+self.vm.set_console()
+kernel_command_line = self.KERNEL_COMMON_COMMAND_LINE + 'earlycon'
+self.vm.add_args('-kernel', kernel_path,
+ '-dtb', dtb_path,
+ '-no-reboot')
+self.vm.launch()
+self.wait_for_console_pattern('Sash command shell (version 1.1.1)')
+self.exec_command_and_wait_for_pattern('printenv',
+   'TERM=linux')
-- 
2.20.1




[Qemu-devel] [PATCH v19 13/21] hw/char: RX62N serial communication interface (SCI)

2019-06-11 Thread Philippe Mathieu-Daudé
From: Yoshinori Sato 

This module supports only the non-FIFO type.
Hardware manual:
https://www.renesas.com/us/en/doc/products/mpumcu/doc/rx_family/r01uh0033ej0140_rx62n.pdf

Signed-off-by: Yoshinori Sato 
Reviewed-by: Alex Bennée 
Reviewed-by: Philippe Mathieu-Daudé 
Message-Id: <20190607091116.49044-8-ys...@users.sourceforge.jp>
Tested-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 hw/char/Kconfig   |   3 +
 hw/char/Makefile.objs |   1 +
 hw/char/renesas_sci.c | 340 ++
 include/hw/char/renesas_sci.h |  45 +
 4 files changed, 389 insertions(+)
 create mode 100644 hw/char/renesas_sci.c
 create mode 100644 include/hw/char/renesas_sci.h

diff --git a/hw/char/Kconfig b/hw/char/Kconfig
index 40e7a8b8bb..874627520c 100644
--- a/hw/char/Kconfig
+++ b/hw/char/Kconfig
@@ -46,3 +46,6 @@ config SCLPCONSOLE
 
 config TERMINAL3270
 bool
+
+config RENESAS_SCI
+bool
diff --git a/hw/char/Makefile.objs b/hw/char/Makefile.objs
index 02d8a66925..4472d563b5 100644
--- a/hw/char/Makefile.objs
+++ b/hw/char/Makefile.objs
@@ -21,6 +21,7 @@ obj-$(CONFIG_PSERIES) += spapr_vty.o
 obj-$(CONFIG_DIGIC) += digic-uart.o
 obj-$(CONFIG_STM32F2XX_USART) += stm32f2xx_usart.o
 obj-$(CONFIG_RASPI) += bcm2835_aux.o
+obj-$(CONFIG_RENESAS_SCI) += renesas_sci.o
 
 common-obj-$(CONFIG_CMSDK_APB_UART) += cmsdk-apb-uart.o
 common-obj-$(CONFIG_ETRAXFS) += etraxfs_ser.o
diff --git a/hw/char/renesas_sci.c b/hw/char/renesas_sci.c
new file mode 100644
index 00..6298cbf43a
--- /dev/null
+++ b/hw/char/renesas_sci.c
@@ -0,0 +1,340 @@
+/*
+ * Renesas Serial Communication Interface
+ *
+ * Datasheet: RX62N Group, RX621 Group User's Manual: Hardware
+ * (Rev.1.40 R01UH0033EJ0140)
+ *
+ * Copyright (c) 2019 Yoshinori Sato
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see .
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "qapi/error.h"
+#include "qemu-common.h"
+#include "cpu.h"
+#include "hw/hw.h"
+#include "hw/sysbus.h"
+#include "hw/registerfields.h"
+#include "hw/char/renesas_sci.h"
+#include "qemu/error-report.h"
+
+/* SCI register map */
+REG8(SMR, 0)
+  FIELD(SMR, CKS,  0, 2)
+  FIELD(SMR, MP,   2, 1)
+  FIELD(SMR, STOP, 3, 1)
+  FIELD(SMR, PM,   4, 1)
+  FIELD(SMR, PE,   5, 1)
+  FIELD(SMR, CHR,  6, 1)
+  FIELD(SMR, CM,   7, 1)
+REG8(BRR, 1)
+REG8(SCR, 2)
+  FIELD(SCR, CKE, 0, 2)
+  FIELD(SCR, TEIE, 2, 1)
+  FIELD(SCR, MPIE, 3, 1)
+  FIELD(SCR, RE,   4, 1)
+  FIELD(SCR, TE,   5, 1)
+  FIELD(SCR, RIE,  6, 1)
+  FIELD(SCR, TIE,  7, 1)
+REG8(TDR, 3)
+REG8(SSR, 4)
+  FIELD(SSR, MPBT, 0, 1)
+  FIELD(SSR, MPB,  1, 1)
+  FIELD(SSR, TEND, 2, 1)
+  FIELD(SSR, ERR, 3, 3)
+FIELD(SSR, PER,  3, 1)
+FIELD(SSR, FER,  4, 1)
+FIELD(SSR, ORER, 5, 1)
+  FIELD(SSR, RDRF, 6, 1)
+  FIELD(SSR, TDRE, 7, 1)
+REG8(RDR, 5)
+REG8(SCMR, 6)
+  FIELD(SCMR, SMIF, 0, 1)
+  FIELD(SCMR, SINV, 2, 1)
+  FIELD(SCMR, SDIR, 3, 1)
+  FIELD(SCMR, BCP2, 7, 1)
+REG8(SEMR, 7)
+  FIELD(SEMR, ACS0, 0, 1)
+  FIELD(SEMR, ABCS, 4, 1)
+
+static int can_receive(void *opaque)
+{
+RSCIState *sci = RSCI(opaque);
+if (sci->rx_next > qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL)) {
+return 0;
+} else {
+return FIELD_EX8(sci->scr, SCR, RE);
+}
+}
+
+static void receive(void *opaque, const uint8_t *buf, int size)
+{
+RSCIState *sci = RSCI(opaque);
+sci->rx_next = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + sci->trtime;
+if (FIELD_EX8(sci->ssr, SSR, RDRF) || size > 1) {
+sci->ssr = FIELD_DP8(sci->ssr, SSR, ORER, 1);
+if (FIELD_EX8(sci->scr, SCR, RIE)) {
+qemu_set_irq(sci->irq[ERI], 1);
+}
+} else {
+sci->rdr = buf[0];
+sci->ssr = FIELD_DP8(sci->ssr, SSR, RDRF, 1);
+if (FIELD_EX8(sci->scr, SCR, RIE)) {
+qemu_irq_pulse(sci->irq[RXI]);
+}
+}
+}
+
+static void send_byte(RSCIState *sci)
+{
+if (qemu_chr_fe_backend_connected(&sci->chr)) {
+qemu_chr_fe_write_all(&sci->chr, &sci->tdr, 1);
+}
+timer_mod(sci->timer,
+  qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + sci->trtime);
+sci->ssr = FIELD_DP8(sci->ssr, SSR, TEND, 0);
+sci->ssr = FIELD_DP8(sci->ssr, SSR, TDRE, 1);
+qemu_set_irq(sci->irq[TEI], 0);
+if (FIELD_EX8(sci->scr, SCR, TIE)) {
+qemu_irq_pulse(sci->irq[TXI]);
+}
+}
+
+static void txend(void *opaque)
+{
+RSCIState *sci = RSCI(opaque);
+if (!FIELD_EX8(sci

[Qemu-devel] [PATCH v2 42/42] target/arm: Fix short-vector increment behaviour

2019-06-11 Thread Peter Maydell
For VFP short vectors, the VFP registers are divided into a
series of banks: for single-precision these are s0-s7, s8-s15,
s16-s23 and s24-s31; for double-precision they are d0-d3,
d4-d7, ... d28-d31. Some banks are "scalar" meaning that
use of a register within them triggers a pure-scalar or
mixed vector-scalar operation rather than a full vector
operation. The scalar banks are s0-s7, d0-d3 and d16-d19.
When using a bank as part of a vector operation, we
iterate through it, increasing the register number by
the specified stride each time, and wrapping around to
the beginning of the bank.

Unfortunately our calculation of the "increment" part of this
was incorrect:
 vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask)
will only do the intended thing if bank_mask has exactly
one set high bit. For instance for doubles (bank_mask = 0xc),
if we start with vd = 6 and delta_d = 2 then vd is updated
to 12 rather than the intended 4.

This only causes problems in the unlikely case that the
starting register is not the first in its bank: if the
register number doesn't have to wrap around then the
expression happens to give the right answer.

Fix this bug by abstracting out the "check whether register
is in a scalar bank" and "advance register within bank"
operations to utility functions which use the right
bit masking operations.
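
As a rough illustration of the arithmetic above (a Python sketch, not part of the patch; `advance_buggy` is a made-up name for the old inline expression, and `advance_sreg`/`advance_dreg` only mirror the C helpers introduced below), the old and new calculations can be compared directly:

```python
def advance_buggy(reg, delta, bank_mask):
    # Old inline expression: (bank_mask - 1) is only a proper low-bits
    # mask when bank_mask has a single set bit.
    return ((reg + delta) & (bank_mask - 1)) | (reg & bank_mask)


def advance_dreg(reg, delta):
    # Mirrors vfp_advance_dreg(): D banks hold 4 registers, so only the
    # low 2 bits of the register number change.
    return ((reg + delta) & 0x3) | (reg & ~0x3)


def advance_sreg(reg, delta):
    # Mirrors vfp_advance_sreg(): S banks hold 8 registers (low 3 bits).
    return ((reg + delta) & 0x7) | (reg & ~0x7)


# The example from the commit message: doubles, vd = 6, delta_d = 2.
print(advance_buggy(6, 2, 0xc))   # 12
print(advance_dreg(6, 2))         # 4
# No wrap-around needed: both expressions agree.
print(advance_buggy(4, 2, 0xc), advance_dreg(4, 2))   # 6 6
```

Hard-coding the low-bits mask per register type is what removes the dependence on the shape of bank_mask.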

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/translate-vfp.inc.c | 100 -
 1 file changed, 60 insertions(+), 40 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index 8216dba796e..709fc65374d 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -1139,6 +1139,42 @@ typedef void VFPGen3OpDPFn(TCGv_i64 vd,
 typedef void VFPGen2OpSPFn(TCGv_i32 vd, TCGv_i32 vm);
 typedef void VFPGen2OpDPFn(TCGv_i64 vd, TCGv_i64 vm);
 
+/*
+ * Return true if the specified S reg is in a scalar bank
+ * (ie if it is s0..s7)
+ */
+static inline bool vfp_sreg_is_scalar(int reg)
+{
+return (reg & 0x18) == 0;
+}
+
+/*
+ * Return true if the specified D reg is in a scalar bank
+ * (ie if it is d0..d3 or d16..d19)
+ */
+static inline bool vfp_dreg_is_scalar(int reg)
+{
+return (reg & 0xc) == 0;
+}
+
+/*
+ * Advance the S reg number forwards by delta within its bank
+ * (ie increment the low 3 bits but leave the rest the same)
+ */
+static inline int vfp_advance_sreg(int reg, int delta)
+{
+return ((reg + delta) & 0x7) | (reg & ~0x7);
+}
+
+/*
+ * Advance the D reg number forwards by delta within its bank
+ * (ie increment the low 2 bits but leave the rest the same)
+ */
+static inline int vfp_advance_dreg(int reg, int delta)
+{
+return ((reg + delta) & 0x3) | (reg & ~0x3);
+}
+
 /*
  * Perform a 3-operand VFP data processing instruction. fn is the
  * callback to do the actual operation; this function deals with the
@@ -1149,7 +1185,6 @@ static bool do_vfp_3op_sp(DisasContext *s, VFPGen3OpSPFn *fn,
 {
 uint32_t delta_m = 0;
 uint32_t delta_d = 0;
-uint32_t bank_mask = 0;
 int veclen = s->vec_len;
 TCGv_i32 f0, f1, fd;
 TCGv_ptr fpst;
@@ -1164,16 +1199,14 @@ static bool do_vfp_3op_sp(DisasContext *s, VFPGen3OpSPFn *fn,
 }
 
 if (veclen > 0) {
-bank_mask = 0x18;
-
 /* Figure out what type of vector operation this is.  */
-if ((vd & bank_mask) == 0) {
+if (vfp_sreg_is_scalar(vd)) {
 /* scalar */
 veclen = 0;
 } else {
 delta_d = s->vec_stride + 1;
 
-if ((vm & bank_mask) == 0) {
+if (vfp_sreg_is_scalar(vm)) {
 /* mixed scalar/vector */
 delta_m = 0;
 } else {
@@ -1204,11 +1237,11 @@ static bool do_vfp_3op_sp(DisasContext *s, VFPGen3OpSPFn *fn,
 
 /* Set up the operands for the next iteration */
 veclen--;
-vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
-vn = ((vn + delta_d) & (bank_mask - 1)) | (vn & bank_mask);
+vd = vfp_advance_sreg(vd, delta_d);
+vn = vfp_advance_sreg(vn, delta_d);
 neon_load_reg32(f0, vn);
 if (delta_m) {
-vm = ((vm + delta_m) & (bank_mask - 1)) | (vm & bank_mask);
+vm = vfp_advance_sreg(vm, delta_m);
 neon_load_reg32(f1, vm);
 }
 }
@@ -1226,7 +1259,6 @@ static bool do_vfp_3op_dp(DisasContext *s, VFPGen3OpDPFn *fn,
 {
 uint32_t delta_m = 0;
 uint32_t delta_d = 0;
-uint32_t bank_mask = 0;
 int veclen = s->vec_len;
 TCGv_i64 f0, f1, fd;
 TCGv_ptr fpst;
@@ -1246,16 +1278,14 @@ static bool do_vfp_3op_dp(DisasContext *s, VFPGen3OpDPFn *fn,
 }
 
 if (veclen > 0) {
-bank_mask = 0xc;
-
 /* Figure out what type of vector operation this is.  */
-if ((vd & bank_mask) == 0) {
+if (vfp_dreg_is_scalar(vd)) {
 /* scalar */
 veclen = 0;
 } else {
 delta_d

[Qemu-devel] [PATCH v2 29/42] target/arm: Convert VABS to decodetree

2019-06-11 Thread Peter Maydell
Convert the VFP VABS instruction to decodetree.

Unlike the 3-op versions, we don't pass fpst to the VFPGen2OpSPFn or
VFPGen2OpDPFn because none of the operations which use this format
and support short vectors will need it.

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/translate-vfp.inc.c | 167 +
 target/arm/translate.c |  12 ++-
 target/arm/vfp.decode  |   5 +
 3 files changed, 180 insertions(+), 4 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index a2eeb6cb511..d0282f1f921 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -,6 +,14 @@ typedef void VFPGen3OpSPFn(TCGv_i32 vd,
 typedef void VFPGen3OpDPFn(TCGv_i64 vd,
TCGv_i64 vn, TCGv_i64 vm, TCGv_ptr fpst);
 
+/*
+ * Types for callbacks for do_vfp_2op_sp() and do_vfp_2op_dp().
+ * The callback should emit code to write a value to vd (which
+ * should be written to only).
+ */
+typedef void VFPGen2OpSPFn(TCGv_i32 vd, TCGv_i32 vm);
+typedef void VFPGen2OpDPFn(TCGv_i64 vd, TCGv_i64 vm);
+
 /*
  * Perform a 3-operand VFP data processing instruction. fn is the
  * callback to do the actual operation; this function deals with the
@@ -1274,6 +1282,155 @@ static bool do_vfp_3op_dp(DisasContext *s, VFPGen3OpDPFn *fn,
 return true;
 }
 
+static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
+{
+uint32_t delta_m = 0;
+uint32_t delta_d = 0;
+uint32_t bank_mask = 0;
+int veclen = s->vec_len;
+TCGv_i32 f0, fd;
+
+if (!dc_isar_feature(aa32_fpshvec, s) &&
+(veclen != 0 || s->vec_stride != 0)) {
+return false;
+}
+
+if (!vfp_access_check(s)) {
+return true;
+}
+
+if (veclen > 0) {
+bank_mask = 0x18;
+
+/* Figure out what type of vector operation this is.  */
+if ((vd & bank_mask) == 0) {
+/* scalar */
+veclen = 0;
+} else {
+delta_d = s->vec_stride + 1;
+
+if ((vm & bank_mask) == 0) {
+/* mixed scalar/vector */
+delta_m = 0;
+} else {
+/* vector */
+delta_m = delta_d;
+}
+}
+}
+
+f0 = tcg_temp_new_i32();
+fd = tcg_temp_new_i32();
+
+neon_load_reg32(f0, vm);
+
+for (;;) {
+fn(fd, f0);
+neon_store_reg32(fd, vd);
+
+if (veclen == 0) {
+break;
+}
+
+if (delta_m == 0) {
+/* single source one-many */
+while (veclen--) {
+vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
+neon_store_reg32(fd, vd);
+}
+break;
+}
+
+/* Set up the operands for the next iteration */
+veclen--;
+vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
+vm = ((vm + delta_m) & (bank_mask - 1)) | (vm & bank_mask);
+neon_load_reg32(f0, vm);
+}
+
+tcg_temp_free_i32(f0);
+tcg_temp_free_i32(fd);
+
+return true;
+}
+
+static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
+{
+uint32_t delta_m = 0;
+uint32_t delta_d = 0;
+uint32_t bank_mask = 0;
+int veclen = s->vec_len;
+TCGv_i64 f0, fd;
+
+/* UNDEF accesses to D16-D31 if they don't exist */
+if (!dc_isar_feature(aa32_fp_d32, s) && ((vd | vm) & 0x10)) {
+return false;
+}
+
+if (!dc_isar_feature(aa32_fpshvec, s) &&
+(veclen != 0 || s->vec_stride != 0)) {
+return false;
+}
+
+if (!vfp_access_check(s)) {
+return true;
+}
+
+if (veclen > 0) {
+bank_mask = 0xc;
+
+/* Figure out what type of vector operation this is.  */
+if ((vd & bank_mask) == 0) {
+/* scalar */
+veclen = 0;
+} else {
+delta_d = (s->vec_stride >> 1) + 1;
+
+if ((vm & bank_mask) == 0) {
+/* mixed scalar/vector */
+delta_m = 0;
+} else {
+/* vector */
+delta_m = delta_d;
+}
+}
+}
+
+f0 = tcg_temp_new_i64();
+fd = tcg_temp_new_i64();
+
+neon_load_reg64(f0, vm);
+
+for (;;) {
+fn(fd, f0);
+neon_store_reg64(fd, vd);
+
+if (veclen == 0) {
+break;
+}
+
+if (delta_m == 0) {
+/* single source one-many */
+while (veclen--) {
+vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
+neon_store_reg64(fd, vd);
+}
+break;
+}
+
+/* Set up the operands for the next iteration */
+veclen--;
+vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
+vm = ((vm + delta_m) & (bank_mask - 1)) | (vm & bank_mask);
+neon_load_reg64(f0, vm);
+}
+
+tcg_te

Re: [Qemu-devel] [PATCH] decodetree: Fix comparison of Field

2019-06-11 Thread Philippe Mathieu-Daudé
On 6/4/19 5:42 PM, Richard Henderson wrote:
> Typo comparing the sign of the field, twice, instead of also comparing
> the mask of the field (which itself encodes both position and length).
> 
> Reported-by: Peter Maydell 
> Signed-off-by: Richard Henderson 
> ---
>  scripts/decodetree.py | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/scripts/decodetree.py b/scripts/decodetree.py
> index 81874e22cc..d7a59d63ac 100755
> --- a/scripts/decodetree.py
> +++ b/scripts/decodetree.py
> @@ -184,7 +184,7 @@ class Field:
>  return '{0}(insn, {1}, {2})'.format(extr, self.pos, self.len)
>  
>  def __eq__(self, other):
> -return self.sign == other.sign and self.sign == other.sign
> +return self.sign == other.sign and self.mask == other.mask

Argh

Reviewed-by: Philippe Mathieu-Daudé 

>  
>  def __ne__(self, other):
>  return not self.__eq__(other)
> 
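
For what it's worth, the effect of the typo is easy to reproduce in isolation. The following is only a cut-down sketch: this `Field` is a hypothetical stand-in that keeps just `sign` and `mask`, and `eq_buggy` is a made-up name for the pre-patch comparison, not something in scripts/decodetree.py.

```python
class Field:
    """Hypothetical stand-in for decodetree's Field: a sign flag plus a mask."""

    def __init__(self, sign, pos, length):
        self.sign = sign
        # The mask encodes both position and length of the field.
        self.mask = ((1 << length) - 1) << pos

    def eq_buggy(self, other):
        # Pre-patch comparison: the sign is checked twice, the mask never.
        return self.sign == other.sign and self.sign == other.sign

    def __eq__(self, other):
        # Post-patch comparison.
        return self.sign == other.sign and self.mask == other.mask

    def __ne__(self, other):
        return not self.__eq__(other)


a = Field(False, 0, 4)    # bits [3:0]
b = Field(False, 12, 4)   # bits [15:12]
print(a.eq_buggy(b))      # True: distinct fields compare equal
print(a == b)             # False
```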



[Qemu-devel] [PATCH v19 11/21] hw/intc: RX62N interrupt controller (ICUa)

2019-06-11 Thread Philippe Mathieu-Daudé
From: Yoshinori Sato 

This implementation supports only ICUa.
Hardware manual:
https://www.renesas.com/us/en/doc/products/mpumcu/doc/rx_family/r01uh0033ej0140_rx62n.pdf

Signed-off-by: Yoshinori Sato 
Reviewed-by: Alex Bennée 
Reviewed-by: Philippe Mathieu-Daudé 
Message-Id: <20190607091116.49044-6-ys...@users.sourceforge.jp>
Tested-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 hw/intc/Kconfig  |   3 +
 hw/intc/Makefile.objs|   1 +
 hw/intc/rx_icu.c | 376 +++
 include/hw/intc/rx_icu.h |  56 ++
 4 files changed, 436 insertions(+)
 create mode 100644 hw/intc/rx_icu.c
 create mode 100644 include/hw/intc/rx_icu.h

diff --git a/hw/intc/Kconfig b/hw/intc/Kconfig
index 5347f8412c..67e9d97464 100644
--- a/hw/intc/Kconfig
+++ b/hw/intc/Kconfig
@@ -58,3 +58,6 @@ config S390_FLIC_KVM
 
 config OMPIC
 bool
+
+config RX_ICU
+bool
diff --git a/hw/intc/Makefile.objs b/hw/intc/Makefile.objs
index 03019b9a03..16bdc7e427 100644
--- a/hw/intc/Makefile.objs
+++ b/hw/intc/Makefile.objs
@@ -49,3 +49,4 @@ obj-$(CONFIG_ARM_GIC) += arm_gicv3_cpuif.o
 obj-$(CONFIG_MIPS_CPS) += mips_gic.o
 obj-$(CONFIG_NIOS2) += nios2_iic.o
 obj-$(CONFIG_OMPIC) += ompic.o
+obj-$(CONFIG_RX) += rx_icu.o
diff --git a/hw/intc/rx_icu.c b/hw/intc/rx_icu.c
new file mode 100644
index 00..cb28c7a8d2
--- /dev/null
+++ b/hw/intc/rx_icu.c
@@ -0,0 +1,376 @@
+/*
+ * RX Interrupt Control Unit
+ *
+ * Warning: Only ICUa is supported.
+ *
+ * Datasheet: RX62N Group, RX621 Group User's Manual: Hardware
+ * (Rev.1.40 R01UH0033EJ0140)
+ *
+ * Copyright (c) 2019 Yoshinori Sato
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see .
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+#include "qemu/log.h"
+#include "qapi/error.h"
+#include "cpu.h"
+#include "hw/hw.h"
+#include "hw/sysbus.h"
+#include "hw/registerfields.h"
+#include "hw/intc/rx_icu.h"
+#include "qemu/error-report.h"
+
+REG8(IR, 0)
+  FIELD(IR, IR,  0, 1)
+REG8(DTCER, 0x100)
+  FIELD(DTCER, DTCE,  0, 1)
+REG8(IER, 0x200)
+REG8(SWINTR, 0x2e0)
+  FIELD(SWINTR, SWINT, 0, 1)
+REG16(FIR, 0x2f0)
+  FIELD(FIR, FVCT, 0, 8)
+  FIELD(FIR, FIEN, 15, 1)
+REG8(IPR, 0x300)
+  FIELD(IPR, IPR, 0, 4)
+REG8(DMRSR, 0x400)
+REG8(IRQCR, 0x500)
+  FIELD(IRQCR, IRQMD, 2, 2)
+REG8(NMISR, 0x580)
+  FIELD(NMISR, NMIST, 0, 1)
+  FIELD(NMISR, LVDST, 1, 1)
+  FIELD(NMISR, OSTST, 2, 1)
+REG8(NMIER, 0x581)
+  FIELD(NMIER, NMIEN, 0, 1)
+  FIELD(NMIER, LVDEN, 1, 1)
+  FIELD(NMIER, OSTEN, 2, 1)
+REG8(NMICLR, 0x582)
+  FIELD(NMICLR, NMICLR, 0, 1)
+  FIELD(NMICLR, OSTCLR, 2, 1)
+REG8(NMICR, 0x583)
+  FIELD(NMICR, NMIMD, 3, 1)
+
+#define request(icu, n) (icu->ipr[icu->map[n]] << 8 | n)
+
+static void set_irq(RXICUState *icu, int n_IRQ, int req)
+{
+if ((icu->fir & R_FIR_FIEN_MASK) &&
+(icu->fir & R_FIR_FVCT_MASK) == n_IRQ) {
+qemu_set_irq(icu->_fir, req);
+} else {
+qemu_set_irq(icu->_irq, req);
+}
+}
+
+static void rxicu_request(RXICUState *icu, int n_IRQ)
+{
+int enable;
+
+enable = icu->ier[n_IRQ / 8] & (1 << (n_IRQ & 7));
+if (n_IRQ > 0 && enable != 0 && atomic_read(&icu->req_irq) < 0) {
+atomic_set(&icu->req_irq, n_IRQ);
+set_irq(icu, n_IRQ, request(icu, n_IRQ));
+}
+}
+
+static void rxicu_set_irq(void *opaque, int n_IRQ, int level)
+{
+RXICUState *icu = opaque;
+struct IRQSource *src;
+int issue;
+
+if (n_IRQ >= NR_IRQS) {
+error_report("%s: IRQ %d out of range", __func__, n_IRQ);
+return;
+}
+
+src = &icu->src[n_IRQ];
+
+level = (level != 0);
+switch (src->sense) {
+case TRG_LEVEL:
+/* level-sensitive irq */
+issue = level;
+src->level = level;
+break;
+case TRG_NEDGE:
+issue = (level == 0 && src->level == 1);
+src->level = level;
+break;
+case TRG_PEDGE:
+issue = (level == 1 && src->level == 0);
+src->level = level;
+break;
+case TRG_BEDGE:
+issue = ((level ^ src->level) & 1);
+src->level = level;
+break;
+default:
+g_assert_not_reached();
+}
+if (issue == 0 && src->sense == TRG_LEVEL) {
+icu->ir[n_IRQ] = 0;
+if (atomic_read(&icu->req_irq) == n_IRQ) {
+/* clear request */
+set_irq(icu, n_IRQ, 0);
+atomic_set(&icu->req_irq, -1);
+}
+return;
+}
+if (issue) 

[Qemu-devel] [PATCH v2 32/42] target/arm: Convert VMOV (register) to decodetree

2019-06-11 Thread Peter Maydell
Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/translate-vfp.inc.c | 10 ++
 target/arm/translate.c |  8 +---
 target/arm/vfp.decode  |  5 +
 3 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index ae2f77a873b..a7e4ae31985 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -1889,6 +1889,16 @@ static bool trans_VMOV_imm_dp(DisasContext *s, arg_VMOV_imm_dp *a)
 return true;
 }
 
+static bool trans_VMOV_reg_sp(DisasContext *s, arg_VMOV_reg_sp *a)
+{
+return do_vfp_2op_sp(s, tcg_gen_mov_i32, a->vd, a->vm);
+}
+
+static bool trans_VMOV_reg_dp(DisasContext *s, arg_VMOV_reg_dp *a)
+{
+return do_vfp_2op_dp(s, tcg_gen_mov_i64, a->vd, a->vm);
+}
+
 static bool trans_VABS_sp(DisasContext *s, arg_VABS_sp *a)
 {
 return do_vfp_2op_sp(s, gen_helper_vfp_abss, a->vd, a->vm);
diff --git a/target/arm/translate.c b/target/arm/translate.c
index ce805f0ab28..ad723466b18 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -3090,7 +3090,7 @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
 return 1;
 case 15:
 switch (rn) {
-case 1 ... 3:
+case 0 ... 3:
 /* Already handled by decodetree */
 return 1;
 default:
@@ -3103,9 +3103,6 @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
 if (op == 15) {
 /* rn is opcode, encoded as per VFP_SREG_N. */
 switch (rn) {
-case 0x00: /* vmov */
-break;
-
 case 0x04: /* vcvtb.f64.f16, vcvtb.f32.f16 */
 case 0x05: /* vcvtt.f64.f16, vcvtt.f32.f16 */
 /*
@@ -3278,9 +3275,6 @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
 switch (op) {
 case 15: /* extension space */
 switch (rn) {
-case 0: /* cpy */
-/* no-op */
-break;
 case 4: /* vcvtb.f32.f16, vcvtb.f64.f16 */
 {
 TCGv_ptr fpst = get_fpstatus_ptr(false);
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index 2780e1ed9ea..b72ab8b8067 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -157,6 +157,11 @@ VMOV_imm_sp   1110 1.11 imm4h:4  1010  imm4l:4 
\
 VMOV_imm_dp   1110 1.11 imm4h:4  1011  imm4l:4 \
  vd=%vd_dp
 
+VMOV_reg_sp   1110 1.11   1010 01.0  \
+ vd=%vd_sp vm=%vm_sp
+VMOV_reg_dp   1110 1.11   1011 01.0  \
+ vd=%vd_dp vm=%vm_dp
+
 VABS_sp   1110 1.11   1010 11.0  \
  vd=%vd_sp vm=%vm_sp
 VABS_dp   1110 1.11   1011 11.0  \
-- 
2.20.1




[Qemu-devel] [PATCH v19 15/21] hw/rx: Honor -accel qtest

2019-06-11 Thread Philippe Mathieu-Daudé
From: Richard Henderson 

Issue an error if no kernel, no bios, and not qtest'ing.
Fixes make check-qtest-rx: test/qom-test.

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Yoshinori Sato 
Message-Id: <20190607091116.49044-16-ys...@users.sourceforge.jp>
Tested-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
We could squash this with the previous patch
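
For review, the resulting decision in rx62n_realize() can be summarised
by the minimal sketch below. The enum and the function name are
hypothetical and exist only for illustration; the patch itself keeps the
logic inline and calls rom_add_file_fixed(), error_report() and exit().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names, for illustration only; not part of the patch. */
enum flash_action {
    FLASH_NOTHING,   /* kernel given, or nothing to load under qtest */
    FLASH_LOAD_BIOS, /* no kernel, but a bios image was named */
    FLASH_FATAL,     /* no kernel, no bios, and not running under qtest */
};

static enum flash_action flash_action(bool have_kernel, const char *bios,
                                      bool qtest)
{
    if (have_kernel) {
        return FLASH_NOTHING;
    }
    if (bios) {
        return FLASH_LOAD_BIOS;
    }
    return qtest ? FLASH_NOTHING : FLASH_FATAL;
}
```

Only the last case is new behaviour: before this patch the bios was
loaded unconditionally whenever no kernel was given.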
---
 hw/rx/rx62n.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/hw/rx/rx62n.c b/hw/rx/rx62n.c
index 74d2fd0ee3..05d82d0b8f 100644
--- a/hw/rx/rx62n.c
+++ b/hw/rx/rx62n.c
@@ -21,11 +21,13 @@
 
 #include "qemu/osdep.h"
 #include "qapi/error.h"
+#include "qemu/error-report.h"
 #include "hw/hw.h"
 #include "hw/rx/rx62n.h"
 #include "hw/loader.h"
 #include "hw/sysbus.h"
 #include "sysemu/sysemu.h"
+#include "sysemu/qtest.h"
 #include "cpu.h"
 
 /*
@@ -190,8 +192,14 @@ static void rx62n_realize(DeviceState *dev, Error **errp)
 memory_region_init_rom(&s->c_flash, NULL, "codeflash",
RX62N_CFLASH_SIZE, errp);
 memory_region_add_subregion(s->sysmem, RX62N_CFLASH_BASE, &s->c_flash);
+
 if (!s->kernel) {
-rom_add_file_fixed(bios_name, RX62N_CFLASH_BASE, 0);
+if (bios_name) {
+rom_add_file_fixed(bios_name, RX62N_CFLASH_BASE, 0);
+}  else if (!qtest_enabled()) {
+error_report("No bios or kernel specified");
+exit(1);
+}
 }
 
 /* Initialize CPU */
-- 
2.20.1




[Qemu-devel] [PATCH v2 40/42] target/arm: Convert VCVT fp/fixed-point conversion insns to decodetree

2019-06-11 Thread Peter Maydell
Convert the VCVT (between floating-point and fixed-point) instructions
to decodetree.

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
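Reviewer note: the fraction-bit computation shared by both new trans
functions can be checked in isolation. This is a minimal sketch; the
function name is hypothetical, and opc is assumed to be the op:U:sx
value that the switch statements in the patch use.

```c
/*
 * Hypothetical standalone copy of the frac_bits expression.
 * Bit 0 of opc (sx) selects the fixed-point width: 0 = 16 bit, 1 = 32 bit.
 */
static int vcvt_fix_frac_bits(int opc, int imm)
{
    return (opc & 1) ? (32 - imm) : (16 - imm);
}
```

For example, a 16-bit conversion (opc even) with imm = 10 uses 6
fraction bits, and a 32-bit conversion (opc odd) with imm = 20 uses 12.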
 target/arm/translate-vfp.inc.c | 124 +
 target/arm/translate.c |  57 +--
 target/arm/vfp.decode  |  10 +++
 3 files changed, 136 insertions(+), 55 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index 161f0fdd888..db07fdd8736 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -2454,3 +2454,127 @@ static bool trans_VJCVT(DisasContext *s, arg_VJCVT *a)
 tcg_temp_free_i32(vd);
 return true;
 }
+
+static bool trans_VCVT_fix_sp(DisasContext *s, arg_VCVT_fix_sp *a)
+{
+TCGv_i32 vd, shift;
+TCGv_ptr fpst;
+int frac_bits;
+
+if (!arm_dc_feature(s, ARM_FEATURE_VFP3)) {
+return false;
+}
+
+if (!vfp_access_check(s)) {
+return true;
+}
+
+frac_bits = (a->opc & 1) ? (32 - a->imm) : (16 - a->imm);
+
+vd = tcg_temp_new_i32();
+neon_load_reg32(vd, a->vd);
+
+fpst = get_fpstatus_ptr(false);
+shift = tcg_const_i32(frac_bits);
+
+/* Switch on op:U:sx bits */
+switch (a->opc) {
+case 0:
+gen_helper_vfp_shtos(vd, vd, shift, fpst);
+break;
+case 1:
+gen_helper_vfp_sltos(vd, vd, shift, fpst);
+break;
+case 2:
+gen_helper_vfp_uhtos(vd, vd, shift, fpst);
+break;
+case 3:
+gen_helper_vfp_ultos(vd, vd, shift, fpst);
+break;
+case 4:
+gen_helper_vfp_toshs_round_to_zero(vd, vd, shift, fpst);
+break;
+case 5:
+gen_helper_vfp_tosls_round_to_zero(vd, vd, shift, fpst);
+break;
+case 6:
+gen_helper_vfp_touhs_round_to_zero(vd, vd, shift, fpst);
+break;
+case 7:
+gen_helper_vfp_touls_round_to_zero(vd, vd, shift, fpst);
+break;
+default:
+g_assert_not_reached();
+}
+
+neon_store_reg32(vd, a->vd);
+tcg_temp_free_i32(vd);
+tcg_temp_free_i32(shift);
+tcg_temp_free_ptr(fpst);
+return true;
+}
+
+static bool trans_VCVT_fix_dp(DisasContext *s, arg_VCVT_fix_dp *a)
+{
+TCGv_i64 vd;
+TCGv_i32 shift;
+TCGv_ptr fpst;
+int frac_bits;
+
+if (!arm_dc_feature(s, ARM_FEATURE_VFP3)) {
+return false;
+}
+
+/* UNDEF accesses to D16-D31 if they don't exist. */
+if (!dc_isar_feature(aa32_fp_d32, s) && (a->vd & 0x10)) {
+return false;
+}
+
+if (!vfp_access_check(s)) {
+return true;
+}
+
+frac_bits = (a->opc & 1) ? (32 - a->imm) : (16 - a->imm);
+
+vd = tcg_temp_new_i64();
+neon_load_reg64(vd, a->vd);
+
+fpst = get_fpstatus_ptr(false);
+shift = tcg_const_i32(frac_bits);
+
+/* Switch on op:U:sx bits */
+switch (a->opc) {
+case 0:
+gen_helper_vfp_shtod(vd, vd, shift, fpst);
+break;
+case 1:
+gen_helper_vfp_sltod(vd, vd, shift, fpst);
+break;
+case 2:
+gen_helper_vfp_uhtod(vd, vd, shift, fpst);
+break;
+case 3:
+gen_helper_vfp_ultod(vd, vd, shift, fpst);
+break;
+case 4:
+gen_helper_vfp_toshd_round_to_zero(vd, vd, shift, fpst);
+break;
+case 5:
+gen_helper_vfp_tosld_round_to_zero(vd, vd, shift, fpst);
+break;
+case 6:
+gen_helper_vfp_touhd_round_to_zero(vd, vd, shift, fpst);
+break;
+case 7:
+gen_helper_vfp_tould_round_to_zero(vd, vd, shift, fpst);
+break;
+default:
+g_assert_not_reached();
+}
+
+neon_store_reg64(vd, a->vd);
+tcg_temp_free_i64(vd);
+tcg_temp_free_i32(shift);
+tcg_temp_free_ptr(fpst);
+return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 99b436ad6f7..6046bb32247 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -1439,13 +1439,9 @@ static inline void gen_vfp_##name(int dp, int shift, int 
neon) \
 tcg_temp_free_i32(tmp_shift); \
 tcg_temp_free_ptr(statusptr); \
 }
-VFP_GEN_FIX(tosh, _round_to_zero)
 VFP_GEN_FIX(tosl, _round_to_zero)
-VFP_GEN_FIX(touh, _round_to_zero)
 VFP_GEN_FIX(toul, _round_to_zero)
-VFP_GEN_FIX(shto, )
 VFP_GEN_FIX(slto, )
-VFP_GEN_FIX(uhto, )
 VFP_GEN_FIX(ulto, )
 #undef VFP_GEN_FIX
 
@@ -3050,7 +3046,8 @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
 return 1;
 case 15:
 switch (rn) {
-case 0 ... 19:
+case 0 ... 23:
+case 28 ... 31:
 /* Already handled by decodetree */
 return 1;
 default:
@@ -3070,21 +3067,6 @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
 rd_is_dp = false;
 break;
 
-case 0x14: /* vcvt fp <-> fixed */
-case 0x15:
-case 0x16:
-case 0x17:
- 

[Qemu-devel] [PATCH v2 33/42] target/arm: Convert VFP comparison insns to decodetree

2019-06-11 Thread Peter Maydell
Convert the VFP comparison instructions to decodetree.

Note that comparison instructions should not honour the VFP
short-vector length and stride information: they are scalar-only
operations.  This applies to all the 2-operand instructions except
for VMOV, VABS, VNEG and VSQRT.  (In the old decoder this is
implemented via the "if (op == 15 && rn > 3) { veclen = 0; }" check.)

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
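Reviewer note: the short-vector rule described above amounts to the
following minimal sketch. The function name is hypothetical; rn is the
extension-space opcode of the old decoder, where 0..3 cover VMOV, VABS,
VNEG and VSQRT.

```c
/*
 * Hypothetical helper mirroring the old decoder's
 * "if (op == 15 && rn > 3) { veclen = 0; }" check.
 * Returns the vector length the instruction actually uses.
 */
static int vfp_effective_veclen(int op, int rn, int veclen)
{
    if (op == 15 && rn > 3) {
        /* Scalar-only: ignore the short-vector length and stride. */
        return 0;
    }
    return veclen;
}
```

The comparison opcodes (rn 8..11) therefore always see veclen == 0,
which is why trans_VCMP_sp/dp never look at the vector length.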
 target/arm/translate-vfp.inc.c | 75 ++
 target/arm/translate.c | 51 +--
 target/arm/vfp.decode  |  5 +++
 3 files changed, 81 insertions(+), 50 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index a7e4ae31985..ebde86210a6 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -1938,3 +1938,78 @@ static bool trans_VSQRT_dp(DisasContext *s, arg_VSQRT_dp 
*a)
 {
 return do_vfp_2op_dp(s, gen_VSQRT_dp, a->vd, a->vm);
 }
+
+static bool trans_VCMP_sp(DisasContext *s, arg_VCMP_sp *a)
+{
+TCGv_i32 vd, vm;
+
+/* Vm/M bits must be zero for the Z variant */
+if (a->z && a->vm != 0) {
+return false;
+}
+
+if (!vfp_access_check(s)) {
+return true;
+}
+
+vd = tcg_temp_new_i32();
+vm = tcg_temp_new_i32();
+
+neon_load_reg32(vd, a->vd);
+if (a->z) {
+tcg_gen_movi_i32(vm, 0);
+} else {
+neon_load_reg32(vm, a->vm);
+}
+
+if (a->e) {
+gen_helper_vfp_cmpes(vd, vm, cpu_env);
+} else {
+gen_helper_vfp_cmps(vd, vm, cpu_env);
+}
+
+tcg_temp_free_i32(vd);
+tcg_temp_free_i32(vm);
+
+return true;
+}
+
+static bool trans_VCMP_dp(DisasContext *s, arg_VCMP_dp *a)
+{
+TCGv_i64 vd, vm;
+
+/* Vm/M bits must be zero for the Z variant */
+if (a->z && a->vm != 0) {
+return false;
+}
+
+/* UNDEF accesses to D16-D31 if they don't exist. */
+if (!dc_isar_feature(aa32_fp_d32, s) && ((a->vd | a->vm) & 0x10)) {
+return false;
+}
+
+if (!vfp_access_check(s)) {
+return true;
+}
+
+vd = tcg_temp_new_i64();
+vm = tcg_temp_new_i64();
+
+neon_load_reg64(vd, a->vd);
+if (a->z) {
+tcg_gen_movi_i64(vm, 0);
+} else {
+neon_load_reg64(vm, a->vm);
+}
+
+if (a->e) {
+gen_helper_vfp_cmped(vd, vm, cpu_env);
+} else {
+gen_helper_vfp_cmpd(vd, vm, cpu_env);
+}
+
+tcg_temp_free_i64(vd);
+tcg_temp_free_i64(vm);
+
+return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index ad723466b18..761e8347fa0 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -1390,30 +1390,6 @@ static inline void gen_vfp_neg(int dp)
 gen_helper_vfp_negs(cpu_F0s, cpu_F0s);
 }
 
-static inline void gen_vfp_cmp(int dp)
-{
-if (dp)
-gen_helper_vfp_cmpd(cpu_F0d, cpu_F1d, cpu_env);
-else
-gen_helper_vfp_cmps(cpu_F0s, cpu_F1s, cpu_env);
-}
-
-static inline void gen_vfp_cmpe(int dp)
-{
-if (dp)
-gen_helper_vfp_cmped(cpu_F0d, cpu_F1d, cpu_env);
-else
-gen_helper_vfp_cmpes(cpu_F0s, cpu_F1s, cpu_env);
-}
-
-static inline void gen_vfp_F1_ld0(int dp)
-{
-if (dp)
-tcg_gen_movi_i64(cpu_F1d, 0);
-else
-tcg_gen_movi_i32(cpu_F1s, 0);
-}
-
 #define VFP_GEN_ITOF(name) \
 static inline void gen_vfp_##name(int dp, int neon) \
 { \
@@ -3091,6 +3067,7 @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
 case 15:
 switch (rn) {
 case 0 ... 3:
+case 8 ... 11:
 /* Already handled by decodetree */
 return 1;
 default:
@@ -3135,11 +3112,6 @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
 rd_is_dp = false;
 break;
 
-case 0x08: case 0x0a: /* vcmp, vcmpz */
-case 0x09: case 0x0b: /* vcmpe, vcmpez */
-no_output = true;
-break;
-
 case 0x0c: /* vrintr */
 case 0x0d: /* vrintz */
 case 0x0e: /* vrintx */
@@ -3240,14 +3212,6 @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
 /* Load the initial operands.  */
 if (op == 15) {
 switch (rn) {
-case 0x08: case 0x09: /* Compare */
-gen_mov_F0_vreg(dp, rd);
-gen_mov_F1_vreg(dp, rm);
-break;
-case 0x0a: case 0x0b: /* Compare with zero */
-gen_mov_F0_vreg(dp, rd);
-gen_vfp_F1_ld0(dp);
-break;
 case 0x14: /* vcvt fp <-> fixed */
 case 0x15:
 case 0x16:
@@ -3357,19 +3321,6 @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
 gen_vfp_msr(tmp);
   

[Qemu-devel] [PATCH v19 02/21] target/rx: TCG helper

2019-06-11 Thread Philippe Mathieu-Daudé
From: Yoshinori Sato 

Signed-off-by: Yoshinori Sato 
Reviewed-by: Richard Henderson 
Message-Id: <20190607091116.49044-3-ys...@users.sourceforge.jp>
Tested-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
[PMD: Removed tlb_fill, extracted from patch of Yoshinori Sato
 'Convert to CPUClass::tlb_fill']
Signed-off-by: Philippe Mathieu-Daudé 
---
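Reviewer note: rx_cpu_unpack_psw() does not store the arithmetic flags
as plain bits. The sketch below shows the representation in isolation.
The struct, the helper names and the bit positions (C = 0, Z = 1, S = 2,
O = 3) are assumptions made for this example; the patch takes the
positions from the PSW field definitions via FIELD_EX32().

```c
#include <stdint.h>

/* Assumed bit positions of the arithmetic flags in the packed PSW. */
#define PSW_C_BIT 0
#define PSW_Z_BIT 1
#define PSW_S_BIT 2
#define PSW_O_BIT 3

/* Hypothetical container for the unpacked flag values. */
struct rx_flags {
    uint32_t o, s, z, c;
};

static uint32_t psw_bit(uint32_t psw, int bit)
{
    return (psw >> bit) & 1u;
}

static struct rx_flags unpack_flags(uint32_t psw)
{
    struct rx_flags f;

    f.o = psw_bit(psw, PSW_O_BIT) << 31;  /* overflow kept in bit 31 */
    f.s = psw_bit(psw, PSW_S_BIT) << 31;  /* sign kept in bit 31 */
    f.z = 1u - psw_bit(psw, PSW_Z_BIT);   /* zero value means Z is set */
    f.c = psw_bit(psw, PSW_C_BIT);        /* carry kept as 0 or 1 */
    return f;
}
```

Note the inverted sense of z: the unpacked value is 0 when PSW.Z is set.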
 target/rx/helper.c| 148 +
 target/rx/helper.h|  31 +++
 target/rx/op_helper.c | 470 ++
 3 files changed, 649 insertions(+)
 create mode 100644 target/rx/helper.c
 create mode 100644 target/rx/helper.h
 create mode 100644 target/rx/op_helper.c

diff --git a/target/rx/helper.c b/target/rx/helper.c
new file mode 100644
index 00..1dae74eae7
--- /dev/null
+++ b/target/rx/helper.c
@@ -0,0 +1,148 @@
+/*
+ *  RX emulation
+ *
+ *  Copyright (c) 2019 Yoshinori Sato
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see .
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/bitops.h"
+#include "cpu.h"
+#include "exec/log.h"
+#include "exec/cpu_ldst.h"
+#include "sysemu/sysemu.h"
+
+void rx_cpu_unpack_psw(CPURXState *env, uint32_t psw, int rte)
+{
+if (env->psw_pm == 0) {
+env->psw_ipl = FIELD_EX32(psw, PSW, IPL);
+if (rte) {
+/* PSW.PM can be written by RTE and RTFI */
+env->psw_pm = FIELD_EX32(psw, PSW, PM);
+}
+env->psw_u = FIELD_EX32(psw, PSW, U);
+env->psw_i = FIELD_EX32(psw, PSW, I);
+}
+env->psw_o = FIELD_EX32(psw, PSW, O) << 31;
+env->psw_s = FIELD_EX32(psw, PSW, S) << 31;
+env->psw_z = 1 - FIELD_EX32(psw, PSW, Z);
+env->psw_c = FIELD_EX32(psw, PSW, C);
+}
+
+#define INT_FLAGS (CPU_INTERRUPT_HARD | CPU_INTERRUPT_FIR)
+void rx_cpu_do_interrupt(CPUState *cs)
+{
+RXCPU *cpu = RXCPU(cs);
+CPURXState *env = &cpu->env;
+int do_irq = cs->interrupt_request & INT_FLAGS;
+uint32_t save_psw;
+
+env->in_sleep = 0;
+
+if (env->psw_u) {
+env->usp = env->regs[0];
+} else {
+env->isp = env->regs[0];
+}
+save_psw = rx_cpu_pack_psw(env);
+env->psw_pm = env->psw_i = env->psw_u = 0;
+
+if (do_irq) {
+if (do_irq & CPU_INTERRUPT_FIR) {
+env->bpc = env->pc;
+env->bpsw = save_psw;
+env->pc = env->fintv;
+env->psw_ipl = 15;
+cs->interrupt_request &= ~CPU_INTERRUPT_FIR;
+qemu_set_irq(env->ack, env->ack_irq);
+qemu_log_mask(CPU_LOG_INT, "fast interrupt raised\n");
+} else if (do_irq & CPU_INTERRUPT_HARD) {
+env->isp -= 4;
+cpu_stl_all(env, env->isp, save_psw);
+env->isp -= 4;
+cpu_stl_all(env, env->isp, env->pc);
+env->pc = cpu_ldl_all(env, env->intb + env->ack_irq * 4);
+env->psw_ipl = env->ack_ipl;
+cs->interrupt_request &= ~CPU_INTERRUPT_HARD;
+qemu_set_irq(env->ack, env->ack_irq);
+qemu_log_mask(CPU_LOG_INT,
+  "interrupt 0x%02x raised\n", env->ack_irq);
+}
+} else {
+uint32_t vec = cs->exception_index;
+const char *expname = "unknown exception";
+
+env->isp -= 4;
+cpu_stl_all(env, env->isp, save_psw);
+env->isp -= 4;
+cpu_stl_all(env, env->isp, env->pc);
+
+if (vec < 0x100) {
+env->pc = cpu_ldl_all(env, 0xffc0 + vec * 4);
+} else {
+env->pc = cpu_ldl_all(env, env->intb + (vec & 0xff) * 4);
+}
+switch (vec) {
+case 20:
+expname = "privilege violation";
+break;
+case 21:
+expname = "access exception";
+break;
+case 23:
+expname = "illegal instruction";
+break;
+case 25:
+expname = "fpu exception";
+break;
+case 30:
+expname = "non-maskable interrupt";
+break;
+case 0x100 ... 0x1ff:
+expname = "unconditional trap";
+}
+qemu_log_mask(CPU_LOG_INT, "exception 0x%02x [%s] raised\n",
+  (vec & 0xff), expname);
+}
+env->regs[0] = env->isp;
+}
+
+bool rx_cpu_exec_interrupt(CPUState *cs, int interrupt_request)
+{
+RXCPU *cpu = RXCPU(cs);
+CPURXState *env = &cpu->env;
+int accept = 0;
+/* hardware interrupt (Normal) */
+if ((interrupt_request & CPU_INTERRUP

[Qemu-devel] [PATCH v19 14/21] hw/rx: RX Target hardware definition

2019-06-11 Thread Philippe Mathieu-Daudé
From: Yoshinori Sato 

rx62n - RX62N cpu.
rx-virt - RX QEMU virtual target.

Signed-off-by: Yoshinori Sato 
Tested-by: Philippe Mathieu-Daudé 
Reviewed-by: Philippe Mathieu-Daudé 
Message-Id: <20190607091116.49044-9-ys...@users.sourceforge.jp>
Signed-off-by: Richard Henderson 
[PMD: Use TYPE_RX62N_CPU, use #define for RX62N_NR_TMR/CMT/SCI,
 renamed CPU -> MCU, device -> microcontroller]
Signed-off-by: Philippe Mathieu-Daudé 
---
v19: Fixed typo (Peter Maydell)
---
 hw/rx/Kconfig |  14 +++
 hw/rx/Makefile.objs   |   2 +
 hw/rx/rx-virt.c   | 105 +++
 hw/rx/rx62n.c | 238 ++
 include/hw/rx/rx.h|   7 ++
 include/hw/rx/rx62n.h |  91 
 6 files changed, 457 insertions(+)
 create mode 100644 hw/rx/Kconfig
 create mode 100644 hw/rx/Makefile.objs
 create mode 100644 hw/rx/rx-virt.c
 create mode 100644 hw/rx/rx62n.c
 create mode 100644 include/hw/rx/rx.h
 create mode 100644 include/hw/rx/rx62n.h

diff --git a/hw/rx/Kconfig b/hw/rx/Kconfig
new file mode 100644
index 00..a07490a65e
--- /dev/null
+++ b/hw/rx/Kconfig
@@ -0,0 +1,14 @@
+config RX
+bool
+
+config RX62N
+bool
+select RX
+select RX_ICU
+select RENESAS_TMR8
+select RENESAS_CMT
+select RENESAS_SCI
+
+config RX_VIRT
+bool
+select RX62N
diff --git a/hw/rx/Makefile.objs b/hw/rx/Makefile.objs
new file mode 100644
index 00..63f8be0e82
--- /dev/null
+++ b/hw/rx/Makefile.objs
@@ -0,0 +1,2 @@
+obj-$(CONFIG_RX62N) += rx62n.o
+obj-$(CONFIG_RX_VIRT) += rx-virt.o
diff --git a/hw/rx/rx-virt.c b/hw/rx/rx-virt.c
new file mode 100644
index 00..ed0a3a1da0
--- /dev/null
+++ b/hw/rx/rx-virt.c
@@ -0,0 +1,105 @@
+/*
+ * RX QEMU virtual platform
+ *
+ * Copyright (c) 2019 Yoshinori Sato
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see .
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qemu-common.h"
+#include "cpu.h"
+#include "hw/hw.h"
+#include "hw/sysbus.h"
+#include "hw/loader.h"
+#include "hw/rx/rx62n.h"
+#include "sysemu/sysemu.h"
+#include "sysemu/qtest.h"
+#include "sysemu/device_tree.h"
+#include "hw/boards.h"
+
+/* Same address as the GDB integrated simulator */
+#define SDRAM_BASE 0x01000000
+
+static void rxvirt_init(MachineState *machine)
+{
+RX62NState *s = g_new(RX62NState, 1);
+MemoryRegion *sysmem = get_system_memory();
+MemoryRegion *sdram = g_new(MemoryRegion, 1);
+const char *kernel_filename = machine->kernel_filename;
+const char *dtb_filename = machine->dtb;
+void *dtb = NULL;
+int dtb_size;
+
+/* Allocate memory space */
+memory_region_init_ram(sdram, NULL, "sdram", 16 * MiB,
+   &error_fatal);
+memory_region_add_subregion(sysmem, SDRAM_BASE, sdram);
+
+/* Initialize MCU */
+object_initialize_child(OBJECT(machine), "mcu", s,
+sizeof(RX62NState), TYPE_RX62N,
+&error_fatal, NULL);
+object_property_set_link(OBJECT(s), OBJECT(get_system_memory()),
+ "memory", &error_abort);
+object_property_set_bool(OBJECT(s), kernel_filename != NULL,
+ "load-kernel", &error_abort);
+object_property_set_bool(OBJECT(s), true, "realized", &error_abort);
+
+/* Load kernel and dtb */
+if (kernel_filename) {
+rx_load_image(RXCPU(first_cpu), kernel_filename,
+  SDRAM_BASE + 8 * MiB, 8 * MiB);
+if (dtb_filename) {
+dtb = load_device_tree(dtb_filename, &dtb_size);
+if (dtb == NULL) {
+fprintf(stderr, "Couldn't open dtb file %s\n", dtb_filename);
+exit(1);
+}
+if (machine->kernel_cmdline &&
+qemu_fdt_setprop_string(dtb, "/chosen", "bootargs",
+machine->kernel_cmdline) < 0) {
+fprintf(stderr, "couldn't set /chosen/bootargs\n");
+exit(1);
+}
+rom_add_blob_fixed("dtb", dtb, dtb_size,
+   SDRAM_BASE + 16 * MiB - dtb_size);
+/* Set dtb address to R1 */
+RXCPU(first_cpu)->env.regs[1] = 0x02000000 - dtb_size;
+}
+}
+}
+
+static void rxvirt_class_init(ObjectClass *oc, void *data)
+{
+MachineClass *mc = MACHINE_CLASS(oc);
+
+mc->desc = "RX QEMU Virtual Target";
+mc->init = rxvir

[Qemu-devel] [PATCH v19 00/21] Add RX architecture support

2019-06-11 Thread Philippe Mathieu-Daudé
Hi Yoshinori, Richard, Igor.

This series is an iteration of the previous v16 from Yoshinori with
the fixups requested by Igor here:
https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg07260.html
and
https://lists.gnu.org/archive/html/qemu-devel/2019-06/msg01547.html
plus trivial cleanups.

It is based on commit 19735c837ae2056b4651720290eda59498eca65a,
that is before the merge of pull-tcg-20190610 (CPUNegativeOffsetState)
which resulted in commit a578cdfbdd8f9beff5ced52b7826ddb1669abbbf.

Series reordered, some patches squashed.

Patches modified: 2, 3, 14, 19 (awaiting review from Igor)
New patch: 16 (awaiting review from Yoshinori)
Extra patch: 21 (meant for testing)

This branch is available here: https://gitlab.com/philmd/qemu/tree/rx-v19

We have:

$ qemu-system-rx -cpu help
rx62n

(qemu) info qom-tree
/machine (rx-virt-machine)
  /peripheral (container)
  /mcu (rx62n)
/sci[0] (renesas-sci)
  /renesas-sci[0] (qemu:memory-region)
/icu (rx-icu)
  ...
/cpu (rx62n-rx-cpu)
  /unnamed-gpio-in[0] (irq)
  /unnamed-gpio-in[1] (irq)
...

$ git backport-diff -u rx-16 -r 19735c837ae..rx-v19
Key:
[----] : patches are identical
[####] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively

001/21:[----] [--] 'target/rx: TCG translation'
002/21:[0011] [FC] 'target/rx: TCG helper'
003/21:[0136] [FC] 'target/rx: CPU definition'
004/21:[----] [--] 'target/rx: RX disassembler'
005/21:[----] [--] 'target/rx: Disassemble rx_index_addr into a string'
006/21:[----] [--] 'target/rx: Replace operand with prt_ldmi in disassembler'
007/21:[----] [--] 'target/rx: Use prt_ldmi for XCHG_mr disassembly'
008/21:[----] [--] 'target/rx: Emit all disassembly in one prt()'
009/21:[----] [--] 'target/rx: Collect all bytes during disassembly'
010/21:[----] [--] 'target/rx: Dump bytes for each insn during disassembly'
011/21:[----] [--] 'hw/intc: RX62N interrupt controller (ICUa)'
012/21:[----] [--] 'hw/timer: RX62N internal timer modules'
013/21:[----] [--] 'hw/char: RX62N serial communication interface (SCI)'
014/21:[0013] [FC] 'hw/rx: RX Target hardware definition'
015/21:[----] [-C] 'hw/rx: Honor -accel qtest'
016/21:[down] 'hw/rx: Restrict the RX62N microcontroller to the RX62N CPU core'
017/21:[----] [--] 'qemu/bitops.h: Add extract8 and extract16'
018/21:[----] [--] 'hw/registerfields.h: Add 8bit and 16bit register macros'
019/21:[0005] [FC] 'Add rx-softmmu'
020/21:[----] [--] 'MAINTAINERS: Add RX'
021/21:[down] 'BootLinuxConsoleTest: Test the RX-Virt machine'

Thanks,

Phil.

Philippe Mathieu-Daudé (3):
  hw/rx: Restrict the RX62N microcontroller to the RX62N CPU core
  hw/registerfields.h: Add 8bit and 16bit register macros
  BootLinuxConsoleTest: Test the RX-Virt machine

Richard Henderson (7):
  target/rx: Disassemble rx_index_addr into a string
  target/rx: Replace operand with prt_ldmi in disassembler
  target/rx: Use prt_ldmi for XCHG_mr disassembly
  target/rx: Emit all disassembly in one prt()
  target/rx: Collect all bytes during disassembly
  target/rx: Dump bytes for each insn during disassembly
  hw/rx: Honor -accel qtest

Yoshinori Sato (11):
  target/rx: TCG translation
  target/rx: TCG helper
  target/rx: CPU definition
  target/rx: RX disassembler
  hw/intc: RX62N interrupt controller (ICUa)
  hw/timer: RX62N internal timer modules
  hw/char: RX62N serial communication interface (SCI)
  hw/rx: RX Target hardware definition
  qemu/bitops.h: Add extract8 and extract16
  Add rx-softmmu
  MAINTAINERS: Add RX

 MAINTAINERS|   19 +
 arch_init.c|2 +
 configure  |8 +
 default-configs/rx-softmmu.mak |3 +
 hw/Kconfig |1 +
 hw/char/Kconfig|3 +
 hw/char/Makefile.objs  |1 +
 hw/char/renesas_sci.c  |  340 
 hw/intc/Kconfig|3 +
 hw/intc/Makefile.objs  |1 +
 hw/intc/rx_icu.c   |  376 
 hw/rx/Kconfig  |   14 +
 hw/rx/Makefile.objs|2 +
 hw/rx/rx-virt.c|  113 ++
 hw/rx/rx62n.c  |  246 +++
 hw/timer/Kconfig   |6 +
 hw/timer/Makefile.objs |3 +
 hw/timer/renesas_cmt.c |  275 +++
 hw/timer/renesas_tmr.c |  455 +
 include/disas/dis-asm.h|5 +
 include/exec/poison.h  |1 +
 include/hw/char/renesas_sci.h  |   45 +
 include/hw/intc/rx_icu.h   |   56 +
 include/hw/registerfields.h|   32 +-
 include/hw/rx/rx.h |7 +
 include/hw/rx/rx62n.h  |   91 +
 include/hw/timer/renesas_cmt.h |   38 +
 include/hw/timer/renesas_tmr.h |   53

[Qemu-devel] [PATCH v2 38/42] target/arm: Convert integer-to-float insns to decodetree

2019-06-11 Thread Peter Maydell
Convert the VCVT integer-to-float instructions to decodetree.

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
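Reviewer note: the only difference between the two arms selected by
a->s is how the 32-bit source is interpreted. A minimal sketch for the
double-precision case follows; the function name is hypothetical, and
plain C casts stand in for the vfp_sitod/vfp_uitod helpers (every 32-bit
integer is exactly representable as a double, so rounding plays no role
here).

```c
#include <stdint.h>

/*
 * Hypothetical model of VCVT integer-to-double.
 * s != 0: treat the source bits as a signed 32-bit integer.
 * s == 0: treat the source bits as an unsigned 32-bit integer.
 */
static double vcvt_int_to_f64(uint32_t bits, int s)
{
    if (s) {
        /* Relies on the usual two's complement conversion to int32_t. */
        return (double)(int32_t)bits;
    }
    return (double)bits;
}
```

With bits = 0xffffffff this gives -1.0 for s = 1 and 4294967295.0 for
s = 0.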
 target/arm/translate-vfp.inc.c | 58 ++
 target/arm/translate.c | 12 +--
 target/arm/vfp.decode  |  6 
 3 files changed, 65 insertions(+), 11 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index c50093776b6..cc3f61d9c41 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -2368,3 +2368,61 @@ static bool trans_VCVT_dp(DisasContext *s, arg_VCVT_dp 
*a)
 tcg_temp_free_i64(vm);
 return true;
 }
+
+static bool trans_VCVT_int_sp(DisasContext *s, arg_VCVT_int_sp *a)
+{
+TCGv_i32 vm;
+TCGv_ptr fpst;
+
+if (!vfp_access_check(s)) {
+return true;
+}
+
+vm = tcg_temp_new_i32();
+neon_load_reg32(vm, a->vm);
+fpst = get_fpstatus_ptr(false);
+if (a->s) {
+/* i32 -> f32 */
+gen_helper_vfp_sitos(vm, vm, fpst);
+} else {
+/* u32 -> f32 */
+gen_helper_vfp_uitos(vm, vm, fpst);
+}
+neon_store_reg32(vm, a->vd);
+tcg_temp_free_i32(vm);
+tcg_temp_free_ptr(fpst);
+return true;
+}
+
+static bool trans_VCVT_int_dp(DisasContext *s, arg_VCVT_int_dp *a)
+{
+TCGv_i32 vm;
+TCGv_i64 vd;
+TCGv_ptr fpst;
+
+/* UNDEF accesses to D16-D31 if they don't exist. */
+if (!dc_isar_feature(aa32_fp_d32, s) && (a->vd & 0x10)) {
+return false;
+}
+
+if (!vfp_access_check(s)) {
+return true;
+}
+
+vm = tcg_temp_new_i32();
+vd = tcg_temp_new_i64();
+neon_load_reg32(vm, a->vm);
+fpst = get_fpstatus_ptr(false);
+if (a->s) {
+/* i32 -> f64 */
+gen_helper_vfp_sitod(vd, vm, fpst);
+} else {
+/* u32 -> f64 */
+gen_helper_vfp_uitod(vd, vm, fpst);
+}
+neon_store_reg64(vd, a->vd);
+tcg_temp_free_i32(vm);
+tcg_temp_free_i64(vd);
+tcg_temp_free_ptr(fpst);
+return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 05ee76da77c..1e28308aa6a 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -3050,7 +3050,7 @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
 return 1;
 case 15:
 switch (rn) {
-case 0 ... 15:
+case 0 ... 17:
 /* Already handled by decodetree */
 return 1;
 default:
@@ -3063,10 +3063,6 @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
 if (op == 15) {
 /* rn is opcode, encoded as per VFP_SREG_N. */
 switch (rn) {
-case 0x10: /* vcvt.fxx.u32 */
-case 0x11: /* vcvt.fxx.s32 */
-rm_is_dp = false;
-break;
 case 0x18: /* vcvtr.u32.fxx */
 case 0x19: /* vcvtz.u32.fxx */
 case 0x1a: /* vcvtr.s32.fxx */
@@ -3181,12 +3177,6 @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
 switch (op) {
 case 15: /* extension space */
 switch (rn) {
-case 16: /* fuito */
-gen_vfp_uito(dp, 0);
-break;
-case 17: /* fsito */
-gen_vfp_sito(dp, 0);
-break;
 case 19: /* vjcvt */
 gen_helper_vjcvt(cpu_F0s, cpu_F0d, cpu_env);
 break;
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index 56b8b4e6046..6da9a7913da 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -214,3 +214,9 @@ VCVT_sp   1110 1.11 0111  1010 11.0  \
  vd=%vd_dp vm=%vm_sp
 VCVT_dp   1110 1.11 0111  1011 11.0  \
  vd=%vd_sp vm=%vm_dp
+
+# VCVT from integer to floating point: Vm always single; Vd depends on size
+VCVT_int_sp   1110 1.11 1000  1010 s:1 1.0  \
+ vd=%vd_sp vm=%vm_sp
+VCVT_int_dp   1110 1.11 1000  1011 s:1 1.0  \
+ vd=%vd_dp vm=%vm_sp
-- 
2.20.1




[Qemu-devel] [PATCH v19 07/21] target/rx: Use prt_ldmi for XCHG_mr disassembly

2019-06-11 Thread Philippe Mathieu-Daudé
From: Richard Henderson 

Note that the ld == 3 case handled by prt_ldmi is decoded as
XCHG_rr and cannot appear here.

Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Yoshinori Sato 
Signed-off-by: Yoshinori Sato 
Message-Id: <20190607091116.49044-21-ys...@users.sourceforge.jp>
Tested-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 target/rx/disas.c | 8 +---
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/target/rx/disas.c b/target/rx/disas.c
index 515b365528..db10385fd0 100644
--- a/target/rx/disas.c
+++ b/target/rx/disas.c
@@ -366,13 +366,7 @@ static bool trans_XCHG_rr(DisasContext *ctx, arg_XCHG_rr 
*a)
 /* xchg dsp[rs].,rd */
 static bool trans_XCHG_mr(DisasContext *ctx, arg_XCHG_mr *a)
 {
-static const char msize[][4] = {
-"b", "w", "l", "ub", "uw",
-};
-char dsp[8];
-
-rx_index_addr(ctx, dsp, a->ld, a->mi);
-prt("xchg\t%s[r%d].%s, r%d", dsp, a->rs, msize[a->mi], a->rd);
+prt_ldmi(ctx, "xchg", a->ld, a->mi, a->rs, a->rd);
 return true;
 }
 
-- 
2.20.1




[Qemu-devel] [PATCH v19 12/21] hw/timer: RX62N internal timer modules

2019-06-11 Thread Philippe Mathieu-Daudé
From: Yoshinori Sato 

renesas_tmr: 8bit timer modules.
renesas_cmt: 16bit compare match timer modules.
These parts are used by many Renesas CPUs.
Hardware manual:
https://www.renesas.com/us/en/doc/products/mpumcu/doc/rx_family/r01uh0033ej0140_rx62n.pdf

Signed-off-by: Yoshinori Sato 
Reviewed-by: Alex Bennée 
Reviewed-by: Philippe Mathieu-Daudé 
Message-Id: <20190607091116.49044-7-ys...@users.sourceforge.jp>
Tested-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
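Reviewer note: the CKS divider table documented in update_events() and
the tick-to-nanosecond conversion built on it can be checked standalone.
This is a minimal sketch; both function names and NS_PER_SEC are
hypothetical stand-ins, and the arithmetic follows the same order of
operations as the patch (multiply, divide by the input frequency, then
scale by the divider).

```c
#include <stdint.h>

#define NS_PER_SEC 1000000000LL  /* stand-in for NANOSECONDS_PER_SECOND */

/* CKS 0..3 selects a clock divider of 8, 32, 128 or 512. */
static int cmt_divider(unsigned cks)
{
    return 1 << (3 + (cks & 3) * 2);
}

/* Nanoseconds until 'ticks' more counter increments have elapsed. */
static int64_t cmt_ticks_to_ns(int64_t ticks, int64_t input_freq,
                               unsigned cks)
{
    int64_t ns = ticks;

    ns *= NS_PER_SEC;
    ns /= input_freq;
    ns *= cmt_divider(cks);
    return ns;
}
```

With a 1 MHz input clock and CKS = 0, 1000 ticks correspond to
8,000,000 ns.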
 hw/timer/Kconfig   |   6 +
 hw/timer/Makefile.objs |   3 +
 hw/timer/renesas_cmt.c | 275 
 hw/timer/renesas_tmr.c | 455 +
 include/hw/timer/renesas_cmt.h |  38 +++
 include/hw/timer/renesas_tmr.h |  53 
 6 files changed, 830 insertions(+)
 create mode 100644 hw/timer/renesas_cmt.c
 create mode 100644 hw/timer/renesas_tmr.c
 create mode 100644 include/hw/timer/renesas_cmt.h
 create mode 100644 include/hw/timer/renesas_tmr.h

diff --git a/hw/timer/Kconfig b/hw/timer/Kconfig
index 51921eb63f..2249458f42 100644
--- a/hw/timer/Kconfig
+++ b/hw/timer/Kconfig
@@ -61,3 +61,9 @@ config CMSDK_APB_TIMER
 config CMSDK_APB_DUALTIMER
 bool
 select PTIMER
+
+config RENESAS_TMR8
+bool
+
+config RENESAS_CMT
+bool
diff --git a/hw/timer/Makefile.objs b/hw/timer/Makefile.objs
index 0e9a4530f8..86a75bc8d8 100644
--- a/hw/timer/Makefile.objs
+++ b/hw/timer/Makefile.objs
@@ -40,6 +40,9 @@ obj-$(CONFIG_MC146818RTC) += mc146818rtc.o
 
 obj-$(CONFIG_ALLWINNER_A10_PIT) += allwinner-a10-pit.o
 
+obj-$(CONFIG_RENESAS_TMR8) += renesas_tmr.o
+obj-$(CONFIG_RENESAS_CMT) += renesas_cmt.o
+
 common-obj-$(CONFIG_STM32F2XX_TIMER) += stm32f2xx_timer.o
 common-obj-$(CONFIG_ASPEED_SOC) += aspeed_timer.o
 
diff --git a/hw/timer/renesas_cmt.c b/hw/timer/renesas_cmt.c
new file mode 100644
index 00..a2a2b92055
--- /dev/null
+++ b/hw/timer/renesas_cmt.c
@@ -0,0 +1,275 @@
+/*
+ * Renesas 16bit Compare-match timer
+ *
+ * Datasheet: RX62N Group, RX621 Group User's Manual: Hardware
+ * (Rev.1.40 R01UH0033EJ0140)
+ *
+ * Copyright (c) 2019 Yoshinori Sato
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see .
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+#include "qemu/log.h"
+#include "qapi/error.h"
+#include "qemu/timer.h"
+#include "cpu.h"
+#include "hw/hw.h"
+#include "hw/sysbus.h"
+#include "hw/registerfields.h"
+#include "hw/timer/renesas_cmt.h"
+#include "qemu/error-report.h"
+
+/*
+ *  +0 CMSTR - common control
+ *  +2 CMCR  - ch0
+ *  +4 CMCNT - ch0
+ *  +6 CMCOR - ch0
+ *  +8 CMCR  - ch1
+ * +10 CMCNT - ch1
+ * +12 CMCOR - ch1
+ * If the CH0 registers are taken to start at offset +2, both channels share
+ * the same register layout, so the registers are defined per channel.
+ */
+REG16(CMSTR, 0)
+  FIELD(CMSTR, STR0, 0, 1)
+  FIELD(CMSTR, STR1, 1, 1)
+  FIELD(CMSTR, STR,  0, 2)
+/* This address is the channel offset */
+REG16(CMCR, 0)
+  FIELD(CMCR, CKS, 0, 2)
+  FIELD(CMCR, CMIE, 6, 1)
+REG16(CMCNT, 2)
+REG16(CMCOR, 4)
+
+static void update_events(RCMTState *cmt, int ch)
+{
+int64_t next_time;
+
+if ((cmt->cmstr & (1 << ch)) == 0) {
+/* Counting is disabled, so no next event is scheduled. */
+return;
+}
+next_time = cmt->cmcor[ch] - cmt->cmcnt[ch];
+next_time *= NANOSECONDS_PER_SECOND;
+next_time /= cmt->input_freq;
+/*
+ * CKS -> div rate
+ *  0 -> 8 (1 << 3)
+ *  1 -> 32 (1 << 5)
+ *  2 -> 128 (1 << 7)
+ *  3 -> 512 (1 << 9)
+ */
+next_time *= 1 << (3 + FIELD_EX16(cmt->cmcr[ch], CMCR, CKS) * 2);
+next_time += qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
+timer_mod(cmt->timer[ch], next_time);
+}
+
+static int64_t read_cmcnt(RCMTState *cmt, int ch)
+{
+int64_t delta, now = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
+
+if (cmt->cmstr & (1 << ch)) {
+delta = (now - cmt->tick[ch]);
+delta /= NANOSECONDS_PER_SECOND;
+delta /= cmt->input_freq;
+delta /= 1 << (3 + FIELD_EX16(cmt->cmcr[ch], CMCR, CKS) * 2);
+cmt->tick[ch] = now;
+return cmt->cmcnt[ch] + delta;
+} else {
+return cmt->cmcnt[ch];
+}
+}
+
+static uint64_t cmt_read(void *opaque, hwaddr addr, unsigned size)
+{
+hwaddr offset = addr & 0x0f;
+RCMTState *cmt = opaque;
+int ch = offset / 0x08;
+uint64_t ret;
+
+if (offset == A_CMSTR) {
+ret = 0;
+ret = FIELD_DP16(ret, CMSTR, STR,
+  
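
The count-to-time arithmetic in update_events() above can be sketched in isolation. This is only an illustration under stated assumptions, not code from the patch: the names cks_to_divider, ticks_to_ns and NS_PER_SEC are hypothetical, and the operation order (multiply by nanoseconds per second, divide by the input frequency, then scale by the divider) simply mirrors the quoted hunk.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for QEMU's NANOSECONDS_PER_SECOND. */
#define NS_PER_SEC 1000000000LL

/* CKS 0..3 selects a divider of 8, 32, 128 or 512, i.e. 1 << (3 + 2 * CKS). */
unsigned cks_to_divider(unsigned cks)
{
    assert(cks <= 3);
    return 1u << (3 + cks * 2);
}

/*
 * Convert a number of remaining counter ticks into nanoseconds,
 * using the same order of operations as the quoted update_events().
 */
int64_t ticks_to_ns(int64_t ticks, int64_t input_freq, unsigned cks)
{
    assert(input_freq > 0);
    int64_t ns = ticks * NS_PER_SEC;
    ns /= input_freq;
    ns *= cks_to_divider(cks);
    return ns;
}
```

For example, with an assumed 50 MHz input clock and CKS = 0, 100 remaining ticks correspond to 100 * 20 ns * 8 = 16000 ns until the compare match.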

[Qemu-devel] [PATCH v19 05/21] target/rx: Disassemble rx_index_addr into a string

2019-06-11 Thread Philippe Mathieu-Daudé
From: Richard Henderson 

We were eliding all zero indexes.  It is only ld==0 that does
not have an index in the instruction.  This also allows us to
avoid breaking the final print into multiple pieces.
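
As a rough illustration of the behaviour described above (not part of the patch), the displacement formatting can be sketched as a standalone function. The name format_dsp and the bytes parameter are hypothetical; the patch itself fetches the bytes through ctx->dis->read_memory_func and advances ctx->addr.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format the displacement for an operand of the form dsp[rs].
 *   ld == 0: no displacement in the instruction, so emit an empty string.
 *   ld == 1: 8-bit displacement taken from bytes[0].
 *   ld == 2: 16-bit little-endian displacement from bytes[0..1].
 * mi selects the memory size: 0 = .b, 1 = .w, 2 = .l, 3 = .uw, 4 = .ub,
 * so the scale shift is mi for 0..2 and 4 - mi for 3..4.
 */
void format_dsp(char out[8], int ld, int mi, const uint8_t *bytes)
{
    unsigned dsp;

    assert(ld >= 0 && ld <= 2);
    assert(mi >= 0 && mi <= 4);

    switch (ld) {
    case 0:
        out[0] = '\0';
        return;
    case 1:
        dsp = bytes[0];
        break;
    default:
        dsp = bytes[0] | ((unsigned)bytes[1] << 8);
        break;
    }
    /* Largest value is 65535 << 2 = 262140, which fits in 8 bytes. */
    snprintf(out, 8, "%u", dsp << (mi < 3 ? mi : 4 - mi));
}
```

With this shape, an explicit zero displacement (ld == 1 with a zero byte) prints as "0" instead of being dropped, while ld == 0 prints nothing, which is the distinction the commit message draws.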

Reviewed-by: Yoshinori Sato 
Signed-off-by: Yoshinori Sato 
Message-Id: <20190607091116.49044-19-ys...@users.sourceforge.jp>
Tested-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 target/rx/disas.c | 154 +-
 1 file changed, 55 insertions(+), 99 deletions(-)

diff --git a/target/rx/disas.c b/target/rx/disas.c
index 8cada4825d..64342537ee 100644
--- a/target/rx/disas.c
+++ b/target/rx/disas.c
@@ -107,49 +107,42 @@ static const char psw[] = {
 'i', 'u', 0, 0, 0, 0, 0, 0,
 };
 
-static uint32_t rx_index_addr(int ld, int size, DisasContext *ctx)
+static void rx_index_addr(DisasContext *ctx, char out[8], int ld, int mi)
 {
-bfd_byte buf[2];
+uint32_t addr = ctx->addr;
+uint8_t buf[2];
+uint16_t dsp;
+
 switch (ld) {
 case 0:
-return 0;
+/* No index; return empty string.  */
+out[0] = '\0';
+return;
 case 1:
-ctx->dis->read_memory_func(ctx->addr, buf, 1, ctx->dis);
 ctx->addr += 1;
-return ((uint8_t)buf[0]) << size;
+ctx->dis->read_memory_func(addr, buf, 1, ctx->dis);
+dsp = buf[0];
+break;
 case 2:
-ctx->dis->read_memory_func(ctx->addr, buf, 2, ctx->dis);
 ctx->addr += 2;
-return lduw_le_p(buf) << size;
+ctx->dis->read_memory_func(addr, buf, 2, ctx->dis);
+dsp = lduw_le_p(buf);
+break;
+default:
+g_assert_not_reached();
 }
-g_assert_not_reached();
+
+sprintf(out, "%u", dsp << (mi < 3 ? mi : 4 - mi));
 }
 
 static void operand(DisasContext *ctx, int ld, int mi, int rs, int rd)
 {
-int dsp;
 static const char sizes[][4] = {".b", ".w", ".l", ".uw", ".ub"};
+char dsp[8];
+
 if (ld < 3) {
-switch (mi) {
-case 4:
-/* dsp[rs].ub */
-dsp = rx_index_addr(ld, RX_MEMORY_BYTE, ctx);
-break;
-case 3:
-/* dsp[rs].uw */
-dsp = rx_index_addr(ld, RX_MEMORY_WORD, ctx);
-break;
-default:
-/* dsp[rs].b */
-/* dsp[rs].w */
-/* dsp[rs].l */
-dsp = rx_index_addr(ld, mi, ctx);
-break;
-}
-if (dsp > 0) {
-prt("%d", dsp);
-}
-prt("[r%d]%s", rs, sizes[mi]);
+rx_index_addr(ctx, dsp, ld, mi);
+prt("%s[r%d]%s", dsp, rs, sizes[mi]);
 } else {
 prt("r%d", rs);
 }
@@ -235,7 +228,7 @@ static bool trans_MOV_ra(DisasContext *ctx, arg_MOV_ra *a)
 /* mov.[bwl] rs,rd */
 static bool trans_MOV_mm(DisasContext *ctx, arg_MOV_mm *a)
 {
-int dsp;
+char dspd[8], dsps[8];
 
 prt("mov.%c\t", size[a->sz]);
 if (a->lds == 3 && a->ldd == 3) {
@@ -244,29 +237,15 @@ static bool trans_MOV_mm(DisasContext *ctx, arg_MOV_mm *a)
 return true;
 }
 if (a->lds == 3) {
-prt("r%d, ", a->rd);
-dsp = rx_index_addr(a->ldd, a->sz, ctx);
-if (dsp > 0) {
-prt("%d", dsp);
-}
-prt("[r%d]", a->rs);
+rx_index_addr(ctx, dspd, a->ldd, a->sz);
+prt("r%d, %s[r%d]", a->rs, dspd, a->rd);
 } else if (a->ldd == 3) {
-dsp = rx_index_addr(a->lds, a->sz, ctx);
-if (dsp > 0) {
-prt("%d", dsp);
-}
-prt("[r%d], r%d", a->rs, a->rd);
+rx_index_addr(ctx, dsps, a->lds, a->sz);
+prt("%s[r%d], r%d", dsps, a->rs, a->rd);
 } else {
-dsp = rx_index_addr(a->lds, a->sz, ctx);
-if (dsp > 0) {
-prt("%d", dsp);
-}
-prt("[r%d], ", a->rs);
-dsp = rx_index_addr(a->ldd, a->sz, ctx);
-if (dsp > 0) {
-prt("%d", dsp);
-}
-prt("[r%d]", a->rd);
+rx_index_addr(ctx, dsps, a->lds, a->sz);
+rx_index_addr(ctx, dspd, a->ldd, a->sz);
+prt("%s[r%d], %s[r%d]", dsps, a->rs, dspd, a->rd);
 }
 return true;
 }
@@ -357,12 +336,10 @@ static bool trans_PUSH_r(DisasContext *ctx, arg_PUSH_r *a)
 /* push dsp[rs] */
 static bool trans_PUSH_m(DisasContext *ctx, arg_PUSH_m *a)
 {
-prt("push\t");
-int dsp = rx_index_addr(a->ld, a->sz, ctx);
-if (dsp > 0) {
-prt("%d", dsp);
-}
-prt("[r%d]", a->rs);
+char dsp[8];
+
+rx_index_addr(ctx, dsp, a->ld, a->sz);
+prt("push\t%s[r%d]", dsp, a->rs);
 return true;
 }
 
@@ -389,17 +366,13 @@ static bool trans_XCHG_rr(DisasContext *ctx, arg_XCHG_rr *a)
 /* xchg dsp[rs].,rd */
 static bool trans_XCHG_mr(DisasContext *ctx, arg_XCHG_mr *a)
 {
-int dsp;
 static const char msize[][4] = {
 "b", "w", "l", "ub", "uw",
 };
+char dsp[8];
 
-prt("xchg\t");
-dsp = rx_index_addr(a->ld, a->mi, ctx);
-if (dsp > 0) {
-prt("%d", dsp);
-}
-prt("[r%d]
