RE: [PATCH] vhost: Unbreak SMMU and virtio-iommu on dev-iotlb support

2021-02-07 Thread Tian, Kevin
> From: Peter Xu
> Sent: Friday, February 5, 2021 11:31 PM
> 
> > >
> > >
> > >> or virtio-iommu
> > >> since dev-iotlb (or PCIe ATS)
> > >
> > >
> > > We may need to add this in the future.
> > added Jean-Philippe in CC
> 
> So that's the part I'm unsure about..  Since everybody is cced so maybe good
> time to ask. :)
> 
> The thing is I'm still not clear on whether dev-iotlb is useful for a full
> emulation environment and how that should differ from a normal iotlb, since
> after all normal iotlb will be attached with device information too.

dev-iotlb is useful in two ways. First, it's a functional prerequisite for
supporting I/O page faults. Second, it has a performance benefit, as you don't
need to contend for the lock of the global iotlb.

> 
> For real hardwares, they make sense because they ask for two things: iotlb is
> for IOMMU, but dev-iotlb is for the device cache.  For emulation
> environment
> (virtio-iommu is the case) do we really need that complexity?
> 
> Note that even if there're assigned devices under virtio-iommu in the future,
> we can still isolate that and iiuc we can easily convert an iotlb (from
> virtio-iommu) into a hardware IOMMU dev-iotlb no matter what type of
> IOMMU is
> underneath the vIOMMU.
> 

Didn't get this point. Hardware dev-iotlb is updated by hardware (between
the device and the IOMMU). How could software convert a virtual iotlb
entry into hardware dev-iotlb?

Thanks
Kevin


Re: Increased execution time with TCI in latest git master (was: Re: [PULL 00/46] tcg patch queue)

2021-02-07 Thread Stefan Weil

On 07.02.21 at 04:45, Richard Henderson wrote:


On 2/6/21 11:38 AM, Stefan Weil wrote:

I am still searching for what caused this deterioration. My first suspect was
thread local storage, but that wasn't it. Do you have any idea?

No, but since it's 1/3 of a complete patch set, I don't care to investigate the
intermediate result either.



Your latest code from the rth7680/tci-next branch is twice as fast as my 
code with BIOS boot and qemu-x86_64 on sparc64. That's great.


With that code I don't get any BIOS output at all when running 
qemu-i386. That's not so good.


Did I test the correct branch? If yes, I could try the same test on 
amd64 and arm64 hosts.


Stefan






[PATCH v3] travis-ci: Disable C++ optional objects on AArch64 container

2021-02-07 Thread Philippe Mathieu-Daudé
Travis-CI seems to have enforced a memory limit on containers,
and the 'GCC check-tcg' job started to fail on AArch64 [*]:

  [2041/3679] Compiling C++ object libcommon.fa.p/disas_nanomips.cpp.o
  FAILED: libcommon.fa.p/disas_nanomips.cpp.o
  {standard input}: Assembler messages:
  {standard input}:577781: Warning: end of file not at end of a line; newline inserted
  {standard input}:577882: Error: unknown pseudo-op: `.lvl35769'
  {standard input}: Error: open CFI at the end of file; missing .cfi_endproc directive
  c++: fatal error: Killed signal terminated program cc1plus
  compilation terminated.

Until we have a replacement for this job on GitLab-CI, disable
compilation of C++ files by forcing the C++ compiler to /bin/false
so that the Meson build system cannot detect it:

  $ ../configure --cxx=/bin/false

  Compilation
   C compiler: cc
  Host C compiler: cc
 C++ compiler: NO

[*] https://travis-ci.org/github/qemu/qemu/jobs/757819402#L3754

Signed-off-by: Philippe Mathieu-Daudé 
---
v3: Aarch -> AArch
v2: Link to first line with error, describe Meson

Supersedes: <20210206200537.2249362-1-f4...@amsat.org>
---
 .travis.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.travis.yml b/.travis.yml
index 5f1dea873ec..b4b2d66fa4b 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -261,7 +261,7 @@ jobs:
   - genisoimage
   env:
 - TEST_CMD="make check check-tcg V=1"
-- CONFIG="--disable-containers --target-list=${MAIN_SOFTMMU_TARGETS}"
+- CONFIG="--disable-containers --target-list=${MAIN_SOFTMMU_TARGETS} --cxx=/bin/false"
 - UNRELIABLE=true
 
 - name: "[ppc64] GCC check-tcg"
-- 
2.26.2




[PATCH] target/i386: expose more MSRs to GDB

2021-02-07 Thread Dominik Glöß
This patch exposes 7 more model-specific registers to GDB for use while remote
debugging. Accessing these registers can be useful, for example, when
tracing Linux system calls.

Signed-off-by: Dominik Glöß 
---

Adding registers to GDB like this works fine for now. Should the need
arise to add more MSRs, a rework of the code that reads
the XML file should be considered. Hard-coding the number of registers and
matching the offsets in gdbstub and the XML seems error-prone.
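
To illustrate how the hard-coded totals relate to the per-group counts, here is
a minimal sketch that derives the offsets and totals from one table. The group
sizes are the IDX_NB_* values from this patch; the table layout and the
layout() helper are only illustrative assumptions, not QEMU code.

```python
# Group sizes mirror the IDX_NB_* defines in this patch. The list layout and
# the layout() helper are hypothetical and exist only for illustration.
def layout(cpu_nb_regs):
    groups = [
        ("gpr", cpu_nb_regs),   # 8 on 32-bit, 16 on 64-bit
        ("ip", 1),
        ("flags", 1),
        ("seg", 6),
        ("msr", 10),            # 3 existing base registers + 7 new MSRs
        ("ctl", 6),
        ("fp", 16),
        ("xmm", cpu_nb_regs),
        ("mxcsr", 1),
    ]
    offsets = {}
    base = 0
    for name, count in groups:
        offsets[name] = base    # index of the first register in the group
        base += count
    return offsets, base        # base is now the total register count


offsets32, total32 = layout(8)
offsets64, total64 = layout(16)
print(total32, total64)
```

With a single table like this, the totals assigned to gdb_num_core_regs and the
IDX_* offsets could not drift apart when another group is added.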

This is similar to the patch by Elias Djossou to allow outputting the same
registers via HMP. Both patches are however independent from each other.

gdb-xml/i386-32bit.xml |   7 +++
 gdb-xml/i386-64bit.xml |   7 +++
 target/i386/cpu.c  |   4 +-
 target/i386/gdbstub.c  | 122 -
 4 files changed, 125 insertions(+), 15 deletions(-)

diff --git a/gdb-xml/i386-32bit.xml b/gdb-xml/i386-32bit.xml
index 872fcea9c2..0e650c9027 100644
--- a/gdb-xml/i386-32bit.xml
+++ b/gdb-xml/i386-32bit.xml
@@ -61,6 +61,13 @@
   
   
   
+  
+  
+  
+  
+  
+  
+  
   

   
diff --git a/gdb-xml/i386-64bit.xml b/gdb-xml/i386-64bit.xml
index 6d88969211..d7ca2d8586 100644
--- a/gdb-xml/i386-64bit.xml
+++ b/gdb-xml/i386-64bit.xml
@@ -74,6 +74,13 @@
   
   
   
+  
+  
+  
+  
+  
+  
+  
   

   
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index ae89024d36..2b7be1c248 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -7321,10 +7321,10 @@ static void x86_cpu_common_class_init(ObjectClass *oc, void *data)
 cc->gdb_arch_name = x86_gdb_arch_name;
 #ifdef TARGET_X86_64
 cc->gdb_core_xml_file = "i386-64bit.xml";
-cc->gdb_num_core_regs = 66;
+cc->gdb_num_core_regs = 73;
 #else
 cc->gdb_core_xml_file = "i386-32bit.xml";
-cc->gdb_num_core_regs = 50;
+cc->gdb_num_core_regs = 57;
 #endif
 cc->disas_set_info = x86_disas_set_info;

diff --git a/target/i386/gdbstub.c b/target/i386/gdbstub.c
index 41e265fc67..5743ba39b3 100644
--- a/target/i386/gdbstub.c
+++ b/target/i386/gdbstub.c
@@ -46,7 +46,8 @@ static const int gpr_map32[8] = { 0, 1, 2, 3, 4, 5, 6, 7 };
  */
 #define IDX_NB_IP   1
 #define IDX_NB_FLAGS1
-#define IDX_NB_SEG  (6 + 3)
+#define IDX_NB_SEG  6
+#define IDX_NB_MSR  10
 #define IDX_NB_CTL  6
 #define IDX_NB_FP   16
 /*
@@ -54,13 +55,14 @@ static const int gpr_map32[8] = { 0, 1, 2, 3, 4, 5, 6, 7 };
  */
 #define IDX_NB_MXCSR1
 /*
- *  total > 8+1+1+9+6+16+8+1=50 or 16+1+1+9+6+16+16+1=66
+ *  total > 8+1+1+6+10+6+16+8+1=57 or 16+1+1+6+10+6+16+16+1=73
  */

 #define IDX_IP_REG  CPU_NB_REGS
 #define IDX_FLAGS_REG   (IDX_IP_REG + IDX_NB_IP)
 #define IDX_SEG_REGS(IDX_FLAGS_REG + IDX_NB_FLAGS)
-#define IDX_CTL_REGS(IDX_SEG_REGS + IDX_NB_SEG)
+#define IDX_MSR_REGS(IDX_SEG_REGS + IDX_NB_SEG)
+#define IDX_CTL_REGS(IDX_MSR_REGS + IDX_NB_MSR)
 #define IDX_FP_REGS (IDX_CTL_REGS + IDX_NB_CTL)
 #define IDX_XMM_REGS(IDX_FP_REGS + IDX_NB_FP)
 #define IDX_MXCSR_REG   (IDX_XMM_REGS + CPU_NB_REGS)
@@ -143,25 +145,56 @@ int x86_cpu_gdb_read_register(CPUState *cs, GByteArray *mem_buf, int n)
 case IDX_SEG_REGS + 5:
 return gdb_get_reg32(mem_buf, env->segs[R_GS].selector);

-case IDX_SEG_REGS + 6:
+case IDX_MSR_REGS:
 if ((env->hflags & HF_CS64_MASK) || GDB_FORCE_64) {
 return gdb_get_reg64(mem_buf, env->segs[R_FS].base);
 }
 return gdb_get_reg32(mem_buf, env->segs[R_FS].base);

-case IDX_SEG_REGS + 7:
+case IDX_MSR_REGS + 1:
 if ((env->hflags & HF_CS64_MASK) || GDB_FORCE_64) {
 return gdb_get_reg64(mem_buf, env->segs[R_GS].base);
 }
 return gdb_get_reg32(mem_buf, env->segs[R_GS].base);

-case IDX_SEG_REGS + 8:
-#ifdef TARGET_X86_64
+case IDX_MSR_REGS + 2:
+if ((env->hflags & HF_CS64_MASK) || GDB_FORCE_64) {
+return gdb_get_reg64(mem_buf, env->sysenter_cs);
+}
+return gdb_get_reg32(mem_buf, env->sysenter_cs);
+
+case IDX_MSR_REGS + 3:
+if ((env->hflags & HF_CS64_MASK) || GDB_FORCE_64) {
+return gdb_get_reg64(mem_buf, env->sysenter_esp);
+}
+return gdb_get_reg32(mem_buf, env->sysenter_esp);
+
+case IDX_MSR_REGS + 4:
 if ((env->hflags & HF_CS64_MASK) || GDB_FORCE_64) {
-return gdb_get_reg64(mem_buf, env->kernelgsbase);
+return gdb_get_reg64(mem_buf, env->sysenter_eip);
 }
-return gdb_get_reg32(mem_buf, env->kernelgsbase);
+return gdb_get_reg32(mem_buf, env->sysenter_eip);
+
+case IDX_MSR_REGS + 5:
+if ((env->hflags & HF_CS64_MASK) || GDB_FORCE_64) {
+return gdb_get_reg64(mem_buf, env->star);
+}
+return gdb_get_reg32(mem_buf, env->star);
+
+#ifdef TARGET_X86_64
+case IDX_MSR_REGS + 6

A issue about qemu for rbd attach

2021-02-07 Thread Shen, Tao
Hi QEMU developers,
I have a question: does QEMU support using a CNAME as the host when attaching an RBD device?
When I try to do that, it returns an error:
/# virsh attach-device virtlet-228fa0ac-d53a-tess-node-c7nww disk_vde.yaml
error: Failed to attach device from disk_vde.yaml
error: internal error: unable to execute QEMU command 'device_add': Property 'virtio-blk-device.drive' can't find value 'drive-virtio-disk20'

/# cat disk_vde.yaml

  
  

  
  





  
  
  
157286400
300
  
  pvc-cf38701d-6f6f-4638-96c5-5cfbe16d6068
  

Could you tell me whether QEMU supports this? If yes, in which version? If no, do
you have any plans for this?

Thanks,
Tao


Re: [PATCH] vhost: Unbreak SMMU and virtio-iommu on dev-iotlb support

2021-02-07 Thread Peter Xu
Hi, Kevin,

On Sun, Feb 07, 2021 at 09:04:55AM +, Tian, Kevin wrote:
> > From: Peter Xu
> > Sent: Friday, February 5, 2021 11:31 PM
> > 
> > > >
> > > >
> > > >> or virtio-iommu
> > > >> since dev-iotlb (or PCIe ATS)
> > > >
> > > >
> > > > We may need to add this in the future.
> > > added Jean-Philippe in CC
> > 
> > So that's the part I'm unsure about..  Since everybody is cced so maybe good
> > time to ask. :)
> > 
> > The thing is I'm still not clear on whether dev-iotlb is useful for a full
> > emulation environment and how that should differ from a normal iotlb, since
> > after all normal iotlb will be attached with device information too.
> 
> dev-iotlb is useful in two manners. First, it's a functional prerequisite for
> supporting I/O page faults.

Is this also a hard requirement for virtio-iommu, which is not real hardware
after all?

> Second, it has performance benefit as you don't
> need to contend the lock of global iotlb.

Hmm.. are you talking about e.g. vt-d driver or virtio-iommu?

Assuming it's about vt-d, qi_flush_dev_iotlb() will still call qi_submit_sync()
and take the same global QI lock, as I see it, or I could be wrong somewhere.
I don't see where dev-iotlb has a standalone channel for delivery.

For virtio-iommu, we haven't defined dev-iotlb, right?  Sorry if I missed
things, as I haven't been following virtio-iommu recently. Let's say
virtio-iommu can in the future support a per-device dev-iotlb queue so that it
doesn't need a global lock: what if we keep it per-device but still deliver
iotlb messages?  Again, it's still a bit unclear to me why a fully emulated
iommu would need separate definitions of "iotlb" and "dev-iotlb".

> 
> > 
> > For real hardwares, they make sense because they ask for two things: iotlb 
> > is
> > for IOMMU, but dev-iotlb is for the device cache.  For emulation
> > environment
> > (virtio-iommu is the case) do we really need that complexity?
> > 
> > Note that even if there're assigned devices under virtio-iommu in the 
> > future,
> > we can still isolate that and iiuc we can easily convert an iotlb (from
> > virtio-iommu) into a hardware IOMMU dev-iotlb no matter what type of
> > IOMMU is
> > underneath the vIOMMU.
> > 
> 
> Didn't get this point. Hardware dev-iotlb is updated by hardware (between
> the device and the IOMMU). How could software convert a virtual iotlb
> entry into hardware dev-iotlb?

I mean if virtio-iommu must be run in a guest, then we can trap that message
first, right?  If there are assigned devices in the guest, we must convert that
invalidation to whatever message the host requires. That still doesn't seem to
require virtio-iommu to have dev-iotlb knowledge, does it?

Thanks,

-- 
Peter Xu




[PATCH 00/26] ppc: qemu: Convert qemu-ppce500 to driver model

2021-02-07 Thread Bin Meng
At present when building qemu-ppce500 the following warnings are seen:

= WARNING ==
This board does not use CONFIG_DM. CONFIG_DM will be
compulsory starting with the v2020.01 release.
Failure to update may result in board removal.
  UPD include/generated/timestamp_autogenerated.h
See doc/driver-model/migration.rst for more info.

= WARNING ==
This board does not use CONFIG_DM_PCI Please update
the board to use CONFIG_DM_PCI before the v2019.07 release.
Failure to update by the deadline may result in board removal.
See doc/driver-model/migration.rst for more info.

= WARNING ==
This board does not use CONFIG_DM_ETH (Driver Model
for Ethernet drivers). Please update the board to use
CONFIG_DM_ETH before the v2020.07 release. Failure to
update by the deadline may result in board removal.
See doc/driver-model/migration.rst for more info.


The conversion of qemu-ppce500 board to driver model is long overdue.

When testing the existing qemu-ppce500 support, PCI was found to be broken.
This is caused by 2 separate issues:

- One issue was caused by U-Boot:
  Commit e002474158d1 ("pci: pci-uclass: Dynamically allocate the PCI regions")
  Patch #1 reverts this commit as it broke all boards that have not been
  converted to driver model PCI.
- One issue was caused by QEMU:
  commit e6b4e5f4795b ("PPC: e500: Move CCSR and MMIO space to upper end of address space")
  commit cb3778a0455a ("PPC: e500 pci host: Add support for ATMUs")
  Patches #3-4 fix this issue to keep in sync with the latest QEMU upstream.

Patch #5-8 are minor fixes and clean-ups.

Starting from patch#9, these are driver model conversion patches.

Patches #11-16 are mainly related to CONFIG_ADDR_MAP, a library to support targets
that have non-identity virtual-physical address mappings. A new command 'addrmap'
is introduced to aid debugging, and a fix to arch/powerpc/include/asm/io.h is
made to correct the usage of CONFIG_ADDR_MAP, as it can only be used in the
post-relocation phase. Also, the initialization of this library is moved a bit
earlier in the post-relocation phase; otherwise device drivers won't work.
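
As a rough illustration of what a non-identity virtual-physical mapping means,
here is a toy model of such a lookup table. This is only a sketch and not the
U-Boot implementation: the MapEntry type, the helper names and the addresses
are hypothetical.

```python
from typing import List, NamedTuple, Optional


class MapEntry(NamedTuple):
    """One window of the (hypothetical) address map."""
    paddr: int  # physical base address
    vaddr: int  # virtual base address
    size: int   # window size in bytes


def phys_to_virt(table: List[MapEntry], paddr: int) -> Optional[int]:
    # Find the window containing paddr and apply its offset.
    for e in table:
        if e.paddr <= paddr < e.paddr + e.size:
            return e.vaddr + (paddr - e.paddr)
    return None  # no window covers this physical address


def virt_to_phys(table: List[MapEntry], vaddr: int) -> Optional[int]:
    # Inverse lookup: find the window containing vaddr.
    for e in table:
        if e.vaddr <= vaddr < e.vaddr + e.size:
            return e.paddr + (vaddr - e.vaddr)
    return None


# Made-up example: a 1 MiB window whose physical base lies above 4 GiB,
# mapped to a 32-bit virtual address.
table = [MapEntry(paddr=0xF_E000_0000, vaddr=0xE000_0000, size=0x10_0000)]
print(hex(phys_to_virt(table, 0xF_E000_0100)))
```

The point of the model is that a physical address is not usable as a pointer
directly; accessors have to go through the table, which is only populated
after relocation.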

Patches #18-20 are 85xx PCI driver fixes. They add support for controller register
physical addresses beyond 32 bits, as well as for 64-bit bus and CPU addresses,
since current upstream QEMU uses 64-bit CPU addresses.

Patch #23 is a minor fix to the 'virtio' command dependency.

Patch #24 enables the VirtIO NET support as by default a VirtIO standard PCI
networking device is connected as an ethernet interface at PCI address 0.1.0.

Patch #25 moves the qemu-ppce500 board code to board/emulation, as that is the
place for other QEMU targets like x86, arm, riscv.

Patch #26 adds a reST document to describe how to build and run U-Boot for the
QEMU ppce500 machine.

I hope this series can make it into the U-Boot v2021.04 release.

This series is available at u-boot-x86/qemu-ppc for testing.

This cover letter is cc'ed to QEMU mailing list for a heads-up.
A future patch will be sent to the QEMU mailing list to bring its in-tree
U-Boot source code up to date.


Bin Meng (26):
  Revert "pci: pci-uclass: Dynamically allocate the PCI regions"
  ppc: qemu: Update MAINTAINERS for correct email address
  common: fdt_support: Support special case of PCI address in
fdt_read_prop()
  ppc: qemu: Support non-identity PCI bus address
  ppc: qemu: Fix CONFIG_SYS_PCI_MAP_END
  ppc: mpc85xx: Wrap LAW related codes with CONFIG_FSL_LAW
  ppc: qemu: Drop init_laws() and print_laws()
  ppc: qemu: Drop board_early_init_f()
  ppc: qemu: Enable OF_CONTROL
  ppc: qemu: Enable driver model
  include: Remove extern from addr_map.h
  lib: addr_map: Move address_map[] type to the header file
  cmd: Add a command to display the address map
  lib: kconfig: Mention CONFIG_ADDR_MAP limitation in the help
  ppc: io.h: Use addrmap_ translation APIs only in post-relocation phase
  common: Move initr_addr_map() to a bit earlier
  ppc: qemu: Switch over to use DM serial
  pci: mpc85xx: Wrap LAW programming with CONFIG_FSL_LAW
  pci: mpc85xx: Support controller register physical address beyond
32-bit
  pci: mpc85xx: Support 64-bit bus and cpu address
  ppc: qemu: Switch over to use DM ETH and PCI
  ppc: qemu: Drop CONFIG_OF_BOARD_SETUP
  cmd: Fix virtio command dependency
  ppc: qemu: Enable VirtIO NET support
  ppc: qemu: Move board directory from board/freescale to
board/emulation
  doc: Add a reST document for qemu-ppce500

 arch/powerpc/cpu/mpc85xx/Kconfig   |   2 +-
 arch/powerpc/cpu/mpc85xx/cpu.c |   2 +
 arch/powerpc/cpu/mpc85xx/cpu_init_early.c  |   2 +
 arch/powerpc/include/asm/io.h  |  15 +-
 .../{freescale => emulation}/qemu-ppce500/Kconfig  |   2 +-
 board

Re: Help with Windows XP in qemu-system-i386

2021-02-07 Thread Michael S. Tsirkin
On Fri, Feb 05, 2021 at 04:08:26PM -0500, Programmingkid wrote:
> 
> 
> > On Feb 5, 2021, at 3:49 PM, Michael S. Tsirkin  wrote:
> > 
> > On Fri, Feb 05, 2021 at 03:25:00PM -0500, Programmingkid wrote:
> >> Hi, I'm noticing that my Windows XP Service Pack 3 VM is causing 
> >> qemu-system-i386 to experience 100% host cpu usage even when the guest is 
> >> at idle. I was wondering if you are seeing this issue as well on any 
> >> version of Windows guest? Windows 2000 doesn't seem to have this problem 
> >> so I'm wondering if this is a bug with QEMU or a problem with my VM. Any 
> >> help would be appreciated.
> >> 
> >> Thank you.
> > 
> > Just tried an xp guest, stays below 10% for me. Suggest discussing this
> > on the mailing list.
> 
> Thank you for the reply. Which service pack is your Windows XP VM using?

SP3




Re: [PATCH 2/2] hw/ssi: xilinx_spips: Implement basic QSPI DMA support

2021-02-07 Thread Bin Meng
Hi Peter,

On Sat, Feb 6, 2021 at 11:28 PM Peter Maydell  wrote:
>
> On Sat, 6 Feb 2021 at 14:38, Bin Meng  wrote:
> >
> > From: Xuzhou Cheng 
> >
> > ZynqMP QSPI supports SPI transfer using DMA mode, but currently this
> > is unimplemented. When QSPI is programmed to use DMA mode, QEMU will
> > crash. This is observed when testing VxWorks 7.
> >
> > Add a basic implementation of QSPI DMA functionality.
> >
> > Signed-off-by: Xuzhou Cheng 
> > Signed-off-by: Bin Meng 
>
> > +static size_t xlnx_zynqmp_gspips_dma_push(XlnxZynqMPQSPIPS *s,
> > +  uint8_t *buf, size_t len, bool 
> > eop)
> > +{
> > +hwaddr dst = (hwaddr)s->regs[R_GQSPI_DMA_ADDR_MSB] << 32
> > + | s->regs[R_GQSPI_DMA_ADDR];
> > +uint32_t size = s->regs[R_GQSPI_DMA_SIZE];
> > +uint32_t mlen = MIN(size, len) & (~3); /* Size is word aligned */
> > +
> > +if (size == 0 || len <= 0) {
> > +return 0;
> > +}
> > +
> > +cpu_physical_memory_write(dst, buf, mlen);
> > +size = xlnx_zynqmp_gspips_dma_advance(s, mlen, dst);
> > +
> > +if (size == 0) {
> > +xlnx_zynqmp_gspips_dma_done(s);
> > +xlnx_zynqmp_qspips_update_ixr(s);
> > +}
> > +
> > +   return mlen;
> > +}
>
> > @@ -861,7 +986,7 @@ static void xlnx_zynqmp_qspips_notify(void *opaque)
> >  recv_fifo = &s->rx_fifo;
> >  }
> >  while (recv_fifo->num >= 4
> > -   && stream_can_push(rq->dma, xlnx_zynqmp_qspips_notify, rq))
> > +   && xlnx_zynqmp_gspips_dma_can_push(rq))
> >  {
> >  size_t ret;
> >  uint32_t num;
> > @@ -874,7 +999,7 @@ static void xlnx_zynqmp_qspips_notify(void *opaque)
> >
> >  memcpy(rq->dma_buf, rxd, num);
> >
> > -ret = stream_push(rq->dma, rq->dma_buf, num, false);
> > +ret = xlnx_zynqmp_gspips_dma_push(rq, rq->dma_buf, num, false);
> >  assert(ret == num);
> >  xlnx_zynqmp_qspips_check_flush(rq);
> >  }
>
> This seems to be removing the existing handling of DMA to the
> TYPE_STREAM_SINK via the stream_* functions -- that doesn't look
> right. I don't know any of the details of this device, but if it
> has two different modes of DMA then we need to support both of them,
> surely ?

This DMA engine is a built-in engine dedicated to QSPI, so I think
there is no need to use the stream_* functions.

> If the device really should be doing its own DMA memory
> accesses, please don't use cpu_physical_memory_write() for
> this. The device should take a TYPE_MEMORY_REGION link property,
> and the board code should set this to tell the device what
> its view of the world that it is doing DMA to is. Then the
> device in its realize method calls address_space_init() to create
> an AddressSpace for this MemoryRegion, and does memory accesses
> using functions like address_space_read()/address_space_write()/
> address_space_ld*()/etc. (Examples in hw/dma, eg pl080.c.)
> Note that the address_space* functions have a return value
> indicating whether the access failed, which you should handle.
> (The pl080 code doesn't do that, but that's because it's older code.)

Sure, I will switch to using a DMA AddressSpace in v2.

Regards,
Bin



[PATCH v4 0/6] colo: Introduce resource agent and test suite/CI

2021-02-07 Thread Lukas Straub
Hello Everyone,
So here is v4.

Regards,
Lukas Straub

Changes:

v4:
 -use new yank api that finally has been merged
 -cleanup the test a bit by using numbers instead of "hosta" and "hostb"
 -resource-agent: Don't set master-score to 0 on invalid configuration

v3:
 -resource-agent: Don't determine local qemu state by remote master-score, query
  directly via qmp instead
 -resource-agent: Add max_queue_size parameter for colo-compare
 -resource-agent: Fix monitor action on secondary returning error during
  clean shutdown
 -resource-agent: Fix stop action setting master-score to 0 on primary on
  clean shutdown

v2:
 -use new yank api
 -drop disk_size parameter
 -introduce pick_qemu_util function and use it

Overview:

Hello Everyone,
These patches introduce a resource agent for fully automatic management of colo
and a test suite building upon the resource agent to extensively test colo.

Test suite features:
-Tests failover with peer crashing and hanging and failover during checkpoint
-Tests network using ssh and iperf3
-Quick test requires no special configuration
-Network test for testing colo-compare
-Stress test: failover all the time with network load

Resource agent features:
-Fully automatic management of colo
-Handles many failures: hanging/crashing qemu, replication error, disk error, ...
-Recovers from hanging qemu by using the "yank" oob command
-Tracks which node has up-to-date data
-Works well in clusters with more than 2 nodes

Run times on my laptop:
Quick test: 200s
Network test: 800s (tagged as slow)
Stress test: 1300s (tagged as slow)

For the last two tests, the test suite needs access to a network bridge to
properly test the network, so some parameters need to be given to the test
run. See tests/acceptance/colo.py for more information.

Regards,
Lukas Straub

Lukas Straub (6):
  avocado_qemu: Introduce pick_qemu_util to pick qemu utility binaries
  boot_linux.py: Use pick_qemu_util
  colo: Introduce resource agent
  colo: Introduce high-level test suite
  configure,Makefile: Install colo resource-agent
  MAINTAINERS: Add myself as maintainer for COLO resource agent

 MAINTAINERS   |6 +
 configure |7 +
 meson.build   |5 +
 meson_options.txt |2 +
 scripts/colo-resource-agent/colo  | 1527 +
 scripts/colo-resource-agent/crm_master|   44 +
 scripts/colo-resource-agent/crm_resource  |   12 +
 tests/acceptance/avocado_qemu/__init__.py |   15 +
 tests/acceptance/boot_linux.py|   11 +-
 tests/acceptance/colo.py  |  654 +
 10 files changed, 2274 insertions(+), 9 deletions(-)
 create mode 100755 scripts/colo-resource-agent/colo
 create mode 100755 scripts/colo-resource-agent/crm_master
 create mode 100755 scripts/colo-resource-agent/crm_resource
 create mode 100644 tests/acceptance/colo.py

--
2.30.0




[PATCH v4 1/6] avocado_qemu: Introduce pick_qemu_util to pick qemu utility binaries

2021-02-07 Thread Lukas Straub
This introduces a generic function to pick qemu utility binaries
from the build dir, system or via test parameter.
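
The lookup order described above can be sketched standalone as follows. This
is a simplified model under stated assumptions, not the patch itself:
pick_util(), build_dir and override are hypothetical names, and shutil.which()
stands in for avocado's find_command().

```python
import os
import shutil
import tempfile


def pick_util(util, build_dir, override=None):
    """Return a path to `util`, or None if it cannot be found.

    Order: explicit override (the test parameter in the real code),
    then the build directory, then the system PATH.
    """
    if override:
        return override
    candidate = os.path.join(build_dir, util)
    if os.path.exists(candidate):
        return candidate
    # Look up the bare utility name on PATH; returns None when absent.
    return shutil.which(util)


# Usage: fake a build directory that contains a "qemu-img" file.
with tempfile.TemporaryDirectory() as build_dir:
    open(os.path.join(build_dir, "qemu-img"), "w").close()
    found = pick_util("qemu-img", build_dir)
    missing = pick_util("no-such-qemu-util", build_dir)

print(found, missing)
```

In the test class, a None result is what leads to the test being cancelled
rather than failing.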

Signed-off-by: Lukas Straub 
---
 tests/acceptance/avocado_qemu/__init__.py | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/tests/acceptance/avocado_qemu/__init__.py b/tests/acceptance/avocado_qemu/__init__.py
index bf54e419da..1f8c41cee0 100644
--- a/tests/acceptance/avocado_qemu/__init__.py
+++ b/tests/acceptance/avocado_qemu/__init__.py
@@ -15,6 +15,7 @@ import uuid
 import tempfile

 import avocado
+from avocado.utils.path import find_command

 #: The QEMU build root directory.  It may also be the source directory
 #: if building from the source dir, but it's safer to use BUILD_DIR for
@@ -146,6 +147,20 @@ def exec_command_and_wait_for_pattern(test, command,
 _console_interaction(test, success_message, failure_message, command + '\r')

 class Test(avocado.Test):
+def pick_qemu_util(self, util):
+default = os.path.join(BUILD_DIR, util)
+if not os.path.exists(default):
+default = find_command(default, False)
+if not default:
+default = None
+
+ret = self.params.get(util, default=default)
+
+if ret is None:
+self.cancel("Could not find \"%s\"" % util)
+
+return ret
+
 def _get_unique_tag_val(self, tag_name):
 """
 Gets a tag value, if unique for a key
--
2.30.0





[PATCH v4 5/6] configure,Makefile: Install colo resource-agent

2021-02-07 Thread Lukas Straub
Optionally install the resource-agent so it gets picked up by
pacemaker.

Signed-off-by: Lukas Straub 
---
 configure | 7 +++
 meson.build   | 5 +
 meson_options.txt | 2 ++
 3 files changed, 14 insertions(+)

diff --git a/configure b/configure
index a34f91171d..54fc7e533f 100755
--- a/configure
+++ b/configure
@@ -382,6 +382,7 @@ softmmu="yes"
 linux_user="no"
 bsd_user="no"
 blobs="true"
+install_colo_ra="false"
 pkgversion=""
 pie=""
 qom_cast_debug="yes"
@@ -1229,6 +1230,10 @@ for opt do
   ;;
   --disable-blobs) blobs="false"
   ;;
+  --disable-colo-ra) install_colo_ra="false"
+  ;;
+  --enable-colo-ra) install_colo_ra="true"
+  ;;
   --with-pkgversion=*) pkgversion="$optarg"
   ;;
   --with-coroutine=*) coroutine="$optarg"
@@ -1772,6 +1777,7 @@ Advanced options (experts only):
ucontext, sigaltstack, windows
   --enable-gcovenable test coverage analysis with gcov
   --disable-blobs  disable installing provided firmware blobs
+  --enable-colo-ra enable installing the COLO resource agent for pacemaker
   --with-vss-sdk=SDK-path  enable Windows VSS support in QEMU Guest Agent
   --with-win-sdk=SDK-path  path to Windows Platform SDK (to build VSS .tlb)
   --tls-priority   default TLS protocol/cipher priority string
@@ -6414,6 +6420,7 @@ NINJA=$ninja $meson setup \
 -Dzstd=$zstd -Dseccomp=$seccomp -Dvirtfs=$virtfs -Dcap_ng=$cap_ng \
 -Dattr=$attr -Ddefault_devices=$default_devices \
 -Ddocs=$docs -Dsphinx_build=$sphinx_build -Dinstall_blobs=$blobs \
+-Dinstall_colo_ra=$install_colo_ra \
 -Dvhost_user_blk_server=$vhost_user_blk_server \
 -Dfuse=$fuse -Dfuse_lseek=$fuse_lseek -Dguest_agent_msi=$guest_agent_msi \
 $(if test "$default_features" = no; then echo "-Dauto_features=disabled"; fi) \
diff --git a/meson.build b/meson.build
index 2d8b433ff0..82efa75e36 100644
--- a/meson.build
+++ b/meson.build
@@ -2263,6 +2263,10 @@ elif get_option('guest_agent_msi').enabled()
   error('Guest agent MSI requested, but the guest agent is not being built')
 endif

+if get_option('install_colo_ra')
+  install_data('scripts/colo-resource-agent/colo', install_dir: get_option('libdir') / 'ocf/resource.d/qemu')
+endif
+
 # Don't build qemu-keymap if xkbcommon is not explicitly enabled
 # when we don't build tools or system
 if xkbcommon.found()
@@ -2398,6 +2402,7 @@ summary_info += {'system-mode emulation': have_system}
 summary_info += {'user-mode emulation': have_user}
 summary_info += {'block layer':   have_block}
 summary_info += {'Install blobs': get_option('install_blobs')}
+summary_info += {'Install COLO resource agent': get_option('install_colo_ra')}
 summary_info += {'module support':config_host.has_key('CONFIG_MODULES')}
 if config_host.has_key('CONFIG_MODULES')
   summary_info += {'alternative module path': config_host.has_key('CONFIG_MODULE_UPGRADES')}
diff --git a/meson_options.txt b/meson_options.txt
index 95f1079829..907d5dff61 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -15,6 +15,8 @@ option('gettext', type : 'feature', value : 'auto',
description: 'Localization of the GTK+ user interface')
 option('install_blobs', type : 'boolean', value : true,
description: 'install provided firmware blobs')
+option('install_colo_ra', type : 'boolean', value : false,
+   description: 'install the COLO resource agent for pacemaker')
 option('sparse', type : 'feature', value : 'auto',
description: 'sparse checker')
 option('guest_agent_msi', type : 'feature', value : 'auto',
--
2.30.0





[PATCH v4 2/6] boot_linux.py: Use pick_qemu_util

2021-02-07 Thread Lukas Straub
Replace duplicate code with pick_qemu_util.

Signed-off-by: Lukas Straub 
---
 tests/acceptance/boot_linux.py | 11 ++-
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/tests/acceptance/boot_linux.py b/tests/acceptance/boot_linux.py
index 1da4a53d6a..38029f8c70 100644
--- a/tests/acceptance/boot_linux.py
+++ b/tests/acceptance/boot_linux.py
@@ -31,15 +31,8 @@ class BootLinuxBase(Test):
 def download_boot(self):
 self.log.debug('Looking for and selecting a qemu-img binary to be '
'used to create the bootable snapshot image')
-# If qemu-img has been built, use it, otherwise the system wide one
-# will be used.  If none is available, the test will cancel.
-qemu_img = os.path.join(BUILD_DIR, 'qemu-img')
-if not os.path.exists(qemu_img):
-qemu_img = find_command('qemu-img', False)
-if qemu_img is False:
-self.cancel('Could not find "qemu-img", which is required to '
-'create the bootable image')
-vmimage.QEMU_IMG = qemu_img
+
+vmimage.QEMU_IMG = self.pick_qemu_util("qemu-img")

 self.log.info('Downloading/preparing boot image')
 # Fedora 31 only provides ppc64le images
--
2.30.0





[PATCH v4 4/6] colo: Introduce high-level test suite

2021-02-07 Thread Lukas Straub
Add a high-level test relying on the colo resource-agent to test
all failover cases while checking guest network connectivity.

Signed-off-by: Lukas Straub 
---
 scripts/colo-resource-agent/crm_master   |  44 ++
 scripts/colo-resource-agent/crm_resource |  12 +
 tests/acceptance/colo.py | 654 +++
 3 files changed, 710 insertions(+)
 create mode 100755 scripts/colo-resource-agent/crm_master
 create mode 100755 scripts/colo-resource-agent/crm_resource
 create mode 100644 tests/acceptance/colo.py

diff --git a/scripts/colo-resource-agent/crm_master b/scripts/colo-resource-agent/crm_master
new file mode 100755
index 00..886f523bda
--- /dev/null
+++ b/scripts/colo-resource-agent/crm_master
@@ -0,0 +1,44 @@
+#!/bin/bash
+
+# Fake crm_master for COLO testing
+#
+# Copyright (c) Lukas Straub 
+#
+# This work is licensed under the terms of the GNU GPL, version 2 or
+# later.  See the COPYING file in the top-level directory.
+
+TMPDIR="$HA_RSCTMP"
+score=0
+query=0
+
+OPTIND=1
+while getopts 'Qql:Dv:N:G' opt; do
+case "$opt" in
+Q|q)
+# Noop
+;;
+"l")
+# Noop
+;;
+"D")
+score=0
+;;
+"v")
+score=$OPTARG
+;;
+"N")
+TMPDIR="$COLO_TEST_REMOTE_TMP"
+;;
+"G")
+query=1
+;;
+esac
+done
+
+if (( query )); then
+cat "${TMPDIR}/master_score" || exit 1
+else
+echo $score > "${TMPDIR}/master_score" || exit 1
+fi
+
+exit 0
diff --git a/scripts/colo-resource-agent/crm_resource b/scripts/colo-resource-agent/crm_resource
new file mode 100755
index 00..ad69ff3c6b
--- /dev/null
+++ b/scripts/colo-resource-agent/crm_resource
@@ -0,0 +1,12 @@
+#!/bin/sh
+
+# Fake crm_resource for COLO testing
+#
+# Copyright (c) Lukas Straub 
+#
+# This work is licensed under the terms of the GNU GPL, version 2 or
+# later.  See the COPYING file in the top-level directory.
+
+# Noop
+
+exit 0
diff --git a/tests/acceptance/colo.py b/tests/acceptance/colo.py
new file mode 100644
index 00..2a0027f0c8
--- /dev/null
+++ b/tests/acceptance/colo.py
@@ -0,0 +1,654 @@
+# High-level test suite for qemu COLO testing all failover cases while checking
+# guest network connectivity
+#
+# Copyright (c) Lukas Straub 
+#
+# This work is licensed under the terms of the GNU GPL, version 2 or
+# later.  See the COPYING file in the top-level directory.
+
+# HOWTO:
+#
+# This test has the following parameters:
+# bridge_name: name of the bridge interface to connect qemu to
+# host_address: ip address of the bridge interface
+# guest_address: ip address that the guest gets from the dhcp server
+# bridge_helper: path to the bridge helper
+#                (default: /usr/lib/qemu/qemu-bridge-helper)
+# install_cmd: command to run to install iperf3 and memtester in the guest
+#  (default: "sudo -n dnf -q -y install iperf3 memtester")
+#
+# To run the network tests, you have to specify the parameters.
+#
+# Example for running the colo tests:
+# make check-acceptance FEDORA_31_ARCHES="x86_64" AVOCADO_TAGS="-t colo \
+#  -p bridge_name=br0 -p host_address=192.168.220.1 \
+#  -p guest_address=192.168.220.222"
+#
+# The colo tests currently only use x86_64 test vm images. With the
+# FEDORA_31_ARCHES make variable as in the example, only the x86_64 images will
+# be downloaded.
+#
+# If you're running the network tests as an unprivileged user, you need to set
+# the suid bit on the bridge helper (chmod +s ).
+#
+# The dhcp server should assign a static ip to the guest, else the test may be
+# unreliable. The Mac address for the guest is always 52:54:00:12:34:56.
+
+
+import sys
+import subprocess
+import shutil
+import os
+import signal
+import os.path
+import time
+import tempfile
+
+from avocado import skipUnless
+from avocado.utils import network
+from avocado.utils import vmimage
+from avocado.utils import cloudinit
+from avocado.utils import ssh
+from avocado.utils.path import find_command, CmdNotFoundError
+
+from avocado_qemu import Test, pick_default_qemu_bin, SOURCE_DIR
+from qemu.qmp import QEMUMonitorProtocol
+
+def iperf3_available():
+    try:
+        find_command("iperf3")
+    except CmdNotFoundError:
+        return False
+    return True
+
+class Host:
+
+    logdir = ""
+    tmpdir = ""
+    pid_file = ""
+    master_score_file = ""
+    qmp_sock = ""
+    image = ""
+    bridge_port = 0
+
+class ColoTest(Test):
+
+    # Constants
+    OCF_SUCCESS = 0
+    OCF_ERR_GENERIC = 1
+    OCF_ERR_ARGS = 2
+    OCF_ERR_UNIMPLEMENTED = 3
+    OCF_ERR_PERM = 4
+    OCF_ERR_INSTALLED = 5
+    OCF_ERR_CONFIGURED = 6
+    OCF_NOT_RUNNING = 7
+    OCF_RUNNING_MASTER = 8
+    OCF_FAILED_MASTER = 9
+
+QEMU_OPTIONS = (" -display none -vga none -enable-kvm"
+" -smp 2 -cpu host -m 768"
+" -device e1000,mac=52:54:00:12:34:56,netdev=hn0"
+" -device virtio-blk,drive=colo-di

[PATCH v4 3/6] colo: Introduce resource agent

2021-02-07 Thread Lukas Straub
Introduce a resource agent which can be used to manage qemu COLO
in a pacemaker cluster.

Signed-off-by: Lukas Straub 
---
 scripts/colo-resource-agent/colo | 1527 ++
 1 file changed, 1527 insertions(+)
 create mode 100755 scripts/colo-resource-agent/colo

diff --git a/scripts/colo-resource-agent/colo b/scripts/colo-resource-agent/colo
new file mode 100755
index 00..dc53c2e601
--- /dev/null
+++ b/scripts/colo-resource-agent/colo
@@ -0,0 +1,1527 @@
+#!/usr/bin/env python3
+
+# Resource agent for qemu COLO for use with Pacemaker CRM
+#
+# Copyright (c) Lukas Straub 
+#
+# This work is licensed under the terms of the GNU GPL, version 2 or
+# later.  See the COPYING file in the top-level directory.
+
+import subprocess
+import sys
+import os
+import os.path
+import signal
+import socket
+import select
+import json
+import re
+import time
+import logging
+import logging.handlers
+
+# Constants
+OCF_SUCCESS = 0
+OCF_ERR_GENERIC = 1
+OCF_ERR_ARGS = 2
+OCF_ERR_UNIMPLEMENTED = 3
+OCF_ERR_PERM = 4
+OCF_ERR_INSTALLED = 5
+OCF_ERR_CONFIGURED = 6
+OCF_NOT_RUNNING = 7
+OCF_RUNNING_MASTER = 8
+OCF_FAILED_MASTER = 9
+
+# Get environment variables
+OCF_RESKEY_CRM_meta_notify_type \
+    = os.getenv("OCF_RESKEY_CRM_meta_notify_type")
+OCF_RESKEY_CRM_meta_notify_operation \
+    = os.getenv("OCF_RESKEY_CRM_meta_notify_operation")
+OCF_RESKEY_CRM_meta_notify_key_operation \
+    = os.getenv("OCF_RESKEY_CRM_meta_notify_key_operation")
+OCF_RESKEY_CRM_meta_notify_start_uname \
+    = os.getenv("OCF_RESKEY_CRM_meta_notify_start_uname", "")
+OCF_RESKEY_CRM_meta_notify_stop_uname \
+    = os.getenv("OCF_RESKEY_CRM_meta_notify_stop_uname", "")
+OCF_RESKEY_CRM_meta_notify_active_uname \
+    = os.getenv("OCF_RESKEY_CRM_meta_notify_active_uname", "")
+OCF_RESKEY_CRM_meta_notify_promote_uname \
+    = os.getenv("OCF_RESKEY_CRM_meta_notify_promote_uname", "")
+OCF_RESKEY_CRM_meta_notify_demote_uname \
+    = os.getenv("OCF_RESKEY_CRM_meta_notify_demote_uname", "")
+OCF_RESKEY_CRM_meta_notify_master_uname \
+    = os.getenv("OCF_RESKEY_CRM_meta_notify_master_uname", "")
+OCF_RESKEY_CRM_meta_notify_slave_uname \
+    = os.getenv("OCF_RESKEY_CRM_meta_notify_slave_uname", "")
+
+HA_RSCTMP = os.getenv("HA_RSCTMP", "/run/resource-agents")
+HA_LOGFACILITY = os.getenv("HA_LOGFACILITY")
+HA_LOGFILE = os.getenv("HA_LOGFILE")
+HA_DEBUG = os.getenv("HA_debug", "0")
+HA_DEBUGLOG = os.getenv("HA_DEBUGLOG")
+OCF_RESOURCE_INSTANCE = os.getenv("OCF_RESOURCE_INSTANCE", "default-instance")
+OCF_RESKEY_CRM_meta_timeout \
+    = os.getenv("OCF_RESKEY_CRM_meta_timeout", "6")
+OCF_RESKEY_CRM_meta_interval \
+    = int(os.getenv("OCF_RESKEY_CRM_meta_interval", "1"))
+OCF_RESKEY_CRM_meta_clone_max \
+    = int(os.getenv("OCF_RESKEY_CRM_meta_clone_max", "1"))
+OCF_RESKEY_CRM_meta_clone_node_max \
+    = int(os.getenv("OCF_RESKEY_CRM_meta_clone_node_max", "1"))
+OCF_RESKEY_CRM_meta_master_max \
+    = int(os.getenv("OCF_RESKEY_CRM_meta_master_max", "1"))
+OCF_RESKEY_CRM_meta_master_node_max \
+    = int(os.getenv("OCF_RESKEY_CRM_meta_master_node_max", "1"))
+OCF_RESKEY_CRM_meta_notify \
+    = os.getenv("OCF_RESKEY_CRM_meta_notify")
+OCF_RESKEY_CRM_meta_globally_unique \
+    = os.getenv("OCF_RESKEY_CRM_meta_globally_unique")
+
+HOSTNAME = os.getenv("OCF_RESKEY_CRM_meta_on_node", socket.gethostname())
+
+OCF_ACTION = os.getenv("__OCF_ACTION")
+if not OCF_ACTION and len(sys.argv) == 2:
+    OCF_ACTION = sys.argv[1]
+
+# Resource parameters
+OCF_RESKEY_qemu_binary_default = "qemu-system-x86_64"
+OCF_RESKEY_qemu_img_binary_default = "qemu-img"
+OCF_RESKEY_log_dir_default = HA_RSCTMP
+OCF_RESKEY_options_default = ""
+OCF_RESKEY_active_hidden_dir_default = ""
+OCF_RESKEY_listen_address_default = "0.0.0.0"
+OCF_RESKEY_base_port_default = "9000"
+OCF_RESKEY_checkpoint_interval_default = "2"
+OCF_RESKEY_compare_timeout_default = "3000"
+OCF_RESKEY_expired_scan_cycle_default = "3000"
+OCF_RESKEY_max_queue_size_default = "1024"
+OCF_RESKEY_use_filter_rewriter_default = "true"
+OCF_RESKEY_vnet_hdr_default = "false"
+OCF_RESKEY_max_disk_errors_default = "1"
+OCF_RESKEY_monitor_timeout_default = "2"
+OCF_RESKEY_yank_timeout_default = "1"
+OCF_RESKEY_fail_fast_timeout_default = "5000"
+OCF_RESKEY_debug_default = "0"
+
+OCF_RESKEY_qemu_binary \
+    = os.getenv("OCF_RESKEY_qemu_binary", OCF_RESKEY_qemu_binary_default)
+OCF_RESKEY_qemu_img_binary \
+    = os.getenv("OCF_RESKEY_qemu_img_binary", OCF_RESKEY_qemu_img_binary_default)
+OCF_RESKEY_log_dir \
+    = os.getenv("OCF_RESKEY_log_dir", OCF_RESKEY_log_dir_default)
+OCF_RESKEY_options \
+    = os.getenv("OCF_RESKEY_options", OCF_RESKEY_options_default)
+OCF_RESKEY_active_hidden_dir \
+    = os.getenv("OCF_RESKEY_active_hidden_dir", OCF_RESKEY_active_hidden_dir_default)
+OCF_RESKEY_listen_address \
+    = os.getenv("OCF_RESKEY_listen_address", OCF_RESKEY_listen_address_default)
+OCF_RESKEY_base_port \
+    = os.getenv("OCF_RESKEY_base_port", OCF_RESKEY_base_port_default)
+O

[PATCH v4 6/6] MAINTAINERS: Add myself as maintainer for COLO resource agent

2021-02-07 Thread Lukas Straub
Signed-off-by: Lukas Straub 
---
 MAINTAINERS | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 8d8b0bf966..d04567aa4d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2773,6 +2773,12 @@ F: net/colo*
 F: net/filter-rewriter.c
 F: net/filter-mirror.c

+COLO resource agent and testing
+M: Lukas Straub 
+S: Odd fixes
+F: scripts/colo-resource-agent/*
+F: tests/acceptance/colo.py
+
 Record/replay
 M: Pavel Dovgalyuk 
 R: Paolo Bonzini 
--
2.30.0




Re: [PATCH v2 63/93] tcg/tci: Use ffi for calls

2021-02-07 Thread Stefan Weil

On 04.02.21 at 02:44, Richard Henderson wrote:


This requires adjusting where arguments are stored.
Place them on the stack at left-aligned positions.
Adjust the stack frame to be at entirely positive offsets.

Signed-off-by: Richard Henderson 
---

[...]

diff --git a/tcg/tci.c b/tcg/tci.c
index 6843e837ae..d27db9f720 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -18,6 +18,13 @@
   */
  
  #include "qemu/osdep.h"

+#include "qemu-common.h"
+#include "tcg/tcg.h"   /* MAX_OPC_PARAM_IARGS */
+#include "exec/cpu_ldst.h"
+#include "tcg/tcg-op.h"
+#include "qemu/compiler.h"
+#include <ffi.h>
+



ffi.h is not found on macOS with Homebrew.

This can be fixed by using pkg-config to find the right compiler (and 
maybe also linker) flags:


% pkg-config --cflags libffi
-I/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/ffi
% pkg-config --libs libffi
-lffi

Regards,

Stefan





Interested in contributing to QEMU

2021-02-07 Thread Niteesh G. S.
Hello all,

I am Niteesh, a junior (third-year) student pursuing Electronics and
Communication Engineering. I was also a GSoC student for RTEMS last year. My
main area of interest is low-level development (OS, emulators, hardware
design, etc.).

I wanted to start contributing last year but was occupied with academic
work. I have now started working on small patches. My ultimate goal is to
learn how QEMU works and to contribute as much as possible.

I tried going through the Arduino emulation code. I was able to understand it
at a high level but couldn't follow the underlying details. I went through a
few blog posts on QEMU internals, but they didn't help much. I plan to step
through the code, but the sheer size of the codebase is scary (tips on
debugging are very welcome). AFAIK the source code is mostly the documentation
for QEMU. If someone knows of any docs or articles that would help a beginner
get started, that would be great.

I would also like to take part in GSoC this year. I find the two projects
below interesting:

1)
https://wiki.qemu.org/Google_Summer_of_Code_2020#QEMU_emulated_Arduino_board_visualizer
This one is from last year; AFAIK no one has worked on it. If that is the
case, I would like to work on it. I have CC'ed the mentors of this project in
the hope that they can share some more details. Have you decided on the
netlist parser library and the UI library? Is there something that I could
work on or read to familiarize myself with the JSON event I/O?

2)
https://wiki.qemu.org/Google_Summer_of_Code_2021#Interactive.2C_asynchronous_QEMU_Machine_Protocol_.28QMP.29_text_user_interface_.28TUI.29
This is something that I don't know much about. I have a basic idea of what
QMP is, but I have never used it. The docs say that the Async QMP library is a
work in progress. If someone could point me to some small tasks in this
library, it would really help improve my understanding.

I would like to work on these projects even outside of GSoC if someone is
willing to mentor in their free time :).

Thanks
Niteesh.


Re: [PATCH] migration: Drop unused VMSTATE_FLOAT64 support

2021-02-07 Thread Philippe Mathieu-Daudé
On 10/22/20 2:08 PM, Peter Maydell wrote:
> Commit ef96e3ae9698d6 in January 2019 removed the last user of the
> VMSTATE_FLOAT64* macros. These were used by targets which defined
> their floating point register file as an array of 'float64'.

Similar candidate: VMSTATE_CPUDOUBLE_ARRAY()

> We used to try to maintain a stricter distinction between
> 'float64' (a type for holding an integer representing an IEEE float)
> and 'uint64_t', including having a debug option for 'float64' being
> a struct and supposedly mandatory macros for converting between
> float64 and uint64_t. We no longer think that's a usefully
> strong distinction to draw and we allow ourselves to freely
> assume that float64 really is just a 64-bit integer type, so
> for new targets we would simply recommend use of the uint64_t type
> for a floating point register file. The float64 type remains
> as a useful way of documenting in the type signature of helper
> functions and the like that they expect to receive an IEEE float
> from the TCG generated code rather than an arbitrary integer.
> 
> Since the VMSTATE_FLOAT64* macros have no remaining users and
> we don't recommend new code uses them, delete them.
> 
> Signed-off-by: Peter Maydell 
> ---
>  include/migration/vmstate.h | 13 -
>  migration/vmstate-types.c   | 26 --
>  2 files changed, 39 deletions(-)
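
The quoted commit message says that 'float64' is really just a 64-bit integer holding an IEEE bit pattern. A minimal Python sketch of that idea follows. It is not QEMU code, and the helper names float64_to_bits and bits_to_float64 are hypothetical, chosen only for illustration:

```python
import struct


def float64_to_bits(value: float) -> int:
    """Return the IEEE 754 binary64 bit pattern of value as an unsigned 64-bit int."""
    # Pack as a little-endian double, then reinterpret the same 8 bytes
    # as a little-endian unsigned 64-bit integer.
    return struct.unpack("<Q", struct.pack("<d", value))[0]


def bits_to_float64(bits: int) -> float:
    """Inverse of float64_to_bits: reinterpret a 64-bit integer as a double."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]
```

A register file stored as plain 64-bit integers loses nothing under this view: the value round-trips exactly, which is presumably why a separate migration type for 'float64' is no longer needed.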



Re: [PATCH v2 63/93] tcg/tci: Use ffi for calls

2021-02-07 Thread Richard Henderson
On 2/7/21 8:25 AM, Stefan Weil wrote:
>> +#include "qemu-common.h"
>> +#include "tcg/tcg.h"   /* MAX_OPC_PARAM_IARGS */
>> +#include "exec/cpu_ldst.h"
>> +#include "tcg/tcg-op.h"
>> +#include "qemu/compiler.h"
>> +#include <ffi.h>
>> +
> 
> 
> ffi.h is not found on macOS with Homebrew.
> 
> This can be fixed by using pkg-config to find the right compiler (and maybe
> also linker) flags:
> 
> % pkg-config --cflags libffi
> -I/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/ffi
> % pkg-config --libs libffi
> -lffi


Which is exactly what I do in the previous patch:


> +++ b/meson.build
> @@ -1901,7 +1901,14 @@ specific_ss.add(when: 'CONFIG_TCG', if_true: files(
>'tcg/tcg-op.c',
>'tcg/tcg.c',
>  ))
> -specific_ss.add(when: 'CONFIG_TCG_INTERPRETER', if_true: files('tcg/tci.c'))
> +
> +if get_option('tcg_interpreter')
> +  libffi = dependency('libffi', version: '>=3.0',
> +  static: enable_static, method: 'pkg-config',
> +  required: true)
> +  specific_ss.add(libffi)
> +  specific_ss.add(files('tcg/tci.c'))
> +endif

Did you need a PKG_CONFIG_LIBDIR set for homebrew?


r~



Re: [PATCH v4 2/5] acpi: Permit OEM ID and OEM table ID fields to be changed

2021-02-07 Thread Marian Postevca
"Michael S. Tsirkin"  writes:

>
>
> I queued this but there's a lot of code duplication with this.
> Further, the use of g_strdup adds unnecessary dynamic memory
> management where it's not needed.
> I'd prefer
> -   a new struct AcpiBuildOem including the correct strings
> -   use sizeof of fields in above instead of 8/6
> -   move shared strings and code into a common header
>

So how should I approach this since the patches are queued? A new patch
with the suggested changes, or resending the original patches?



[Bug 1914117] Re: Short files returned via FTP on Qemu with various architectures and OSes

2021-02-07 Thread Chris Pinnock
The more I look at this, the more I think it may be a macOS bug
underneath.

I've tested OpenBSD as a guest on a Debian AWS instance running 4.2.1 - all is
fine.
I've tested OpenBSD as a guest on a FreeBSD AWS instance running whatever is in
ports and all is fine.

Also others are having trouble:
https://twitter.com/astr0baby/status/1354952352713887754
Mac OS on M1 silicon with Free and NetBSD as guest OS.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1914117

Title:
  Short files returned via FTP on Qemu with various architectures and
  OSes

Status in QEMU:
  New

Bug description:
  
  Qemu 5.2 on Mac OS X Big Sur.

  I originally thought that it might be caused by the home-brew version of
  Qemu, but this evening I have removed the brew edition and compiled from
  scratch (using Ninja & Xcode compiler).
  Still getting the same problem.

  On the following architectures: 
  arm64, amd64 and sometimes i386 running NetBSD host OS; 
  i386 running OpenBSD host OS:

  I have seen a consistent problem with FTP returning short files. The
  file will be a couple of bytes too short. I do not believe this is a
  problem with the OS. Downloading the perl source code from CPAN does
  not work properly, nor does downloading bind from isc. I've tried this
  on different architectures as above.

  (Qemu 4.2 on Ubuntu/x86_64 with NetBSD/i386 seems to function fine. My
  gut feel is there is something not right on the Mac OS version of Qemu
  or a bug in 5.2 - obviously in the network layer somewhere. If you
  have anything you want me to try, please let me know - happy to help
  get a resolution.)

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1914117/+subscriptions



Re: Increased execution time with TCI in latest git master (was: Re: [PULL 00/46] tcg patch queue)

2021-02-07 Thread Richard Henderson
On 2/7/21 2:50 AM, Stefan Weil wrote:
> Your latest code from the rth7680/tci-next branch is twice as fast as my code
> with BIOS boot and qemu-x86_64 on sparc64. That's great.
> 
> With that code I don't get any BIOS output at all when running qemu-i386.
> That's not so good.
> 
> Did I test the correct branch? If yes, I could try the same test on amd64 and
> arm64 hosts.

Yes, tci-next is the correct branch.  I've just rebased it against master,
which includes the first 30-odd patches.

On which host do you not see BIOS output from qemu-system-i386 (I assume that's
a typo above)?  I see correct output on x86_64, sparc64, ppc64le, and aarch64
hosts.


r~



Re: [PATCH v4 2/5] acpi: Permit OEM ID and OEM table ID fields to be changed

2021-02-07 Thread Michael S. Tsirkin
On Sun, Feb 07, 2021 at 08:23:33PM +0200, Marian Postevca wrote:
> "Michael S. Tsirkin"  writes:
> 
> >
> >
> > I queued this but there's a lot of code duplication with this.
> > Further, the use of g_strdup adds unnecessary dynamic memory
> > management where it's not needed.
> > I'd prefer
> > -   a new struct AcpiBuildOem including the correct strings
> > -   use sizeof of fields in above instead of 8/6
> > -   move shared strings and code into a common header
> >
> 
> So how should I approach this since the patches are queued? A new patch
> with the suggested changes, or resending the original patches?

A patch on top please. They are merged, so it's really easy: just basing on
master should be good.

-- 
MST




Re: [PATCH] migration: Drop unused VMSTATE_FLOAT64 support

2021-02-07 Thread Peter Maydell
On Sun, 7 Feb 2021 at 17:10, Philippe Mathieu-Daudé  wrote:
>
> On 10/22/20 2:08 PM, Peter Maydell wrote:
> > Commit ef96e3ae9698d6 in January 2019 removed the last user of the
> > VMSTATE_FLOAT64* macros. These were used by targets which defined
> > their floating point register file as an array of 'float64'.
>
> Similar candidate: VMSTATE_CPUDOUBLE_ARRAY()

Isn't that still used by target/sparc ?

-- PMM



Re: [PATCH v2 63/93] tcg/tci: Use ffi for calls

2021-02-07 Thread Peter Maydell
On Sun, 7 Feb 2021 at 17:41, Richard Henderson
 wrote:
>
> On 2/7/21 8:25 AM, Stefan Weil wrote:
> >> +#include "qemu-common.h"
> >> +#include "tcg/tcg.h"   /* MAX_OPC_PARAM_IARGS */
> >> +#include "exec/cpu_ldst.h"
> >> +#include "tcg/tcg-op.h"
> >> +#include "qemu/compiler.h"
> >> +#include <ffi.h>
> >> +
> >
> >
> > ffi.h is not found on macOS with Homebrew.
> >
> > This can be fixed by using pkg-config to find the right compiler (and maybe
> > also linker) flags:
> >
> > % pkg-config --cflags libffi
> > -I/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/ffi
> > % pkg-config --libs libffi
> > -lffi
>
>
> Which is exactly what I do in the previous patch:
>
>
> > +++ b/meson.build
> > @@ -1901,7 +1901,14 @@ specific_ss.add(when: 'CONFIG_TCG', if_true: files(
> >'tcg/tcg-op.c',
> >'tcg/tcg.c',
> >  ))
> > -specific_ss.add(when: 'CONFIG_TCG_INTERPRETER', if_true: 
> > files('tcg/tci.c'))
> > +
> > +if get_option('tcg_interpreter')
> > +  libffi = dependency('libffi', version: '>=3.0',
> > +  static: enable_static, method: 'pkg-config',
> > +  required: true)
> > +  specific_ss.add(libffi)
> > +  specific_ss.add(files('tcg/tci.c'))
> > +endif
>
> Did you need a PKG_CONFIG_LIBDIR set for homebrew?

Is this the "meson doesn't actually add the cflags everywhere it should"
bug again ?

thanks
-- PMM



Re: [PATCH v2 63/93] tcg/tci: Use ffi for calls

2021-02-07 Thread Richard Henderson
On 2/7/21 11:52 AM, Peter Maydell wrote:
> On Sun, 7 Feb 2021 at 17:41, Richard Henderson
>  wrote:
>>
>> On 2/7/21 8:25 AM, Stefan Weil wrote:
>>>> +#include "qemu-common.h"
>>>> +#include "tcg/tcg.h"   /* MAX_OPC_PARAM_IARGS */
>>>> +#include "exec/cpu_ldst.h"
>>>> +#include "tcg/tcg-op.h"
>>>> +#include "qemu/compiler.h"
>>>> +#include <ffi.h>
>>>> +
>>>
>>>
>>> ffi.h is not found on macOS with Homebrew.
>>>
>>> This can be fixed by using pkg-config to find the right compiler (and maybe
>>> also linker) flags:
>>>
>>> % pkg-config --cflags libffi
>>> -I/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/ffi
>>> % pkg-config --libs libffi
>>> -lffi
>>
>>
>> Which is exactly what I do in the previous patch:
>>
>>
>>> +++ b/meson.build
>>> @@ -1901,7 +1901,14 @@ specific_ss.add(when: 'CONFIG_TCG', if_true: files(
>>>'tcg/tcg-op.c',
>>>'tcg/tcg.c',
>>>  ))
>>> -specific_ss.add(when: 'CONFIG_TCG_INTERPRETER', if_true: 
>>> files('tcg/tci.c'))
>>> +
>>> +if get_option('tcg_interpreter')
>>> +  libffi = dependency('libffi', version: '>=3.0',
>>> +  static: enable_static, method: 'pkg-config',
>>> +  required: true)
>>> +  specific_ss.add(libffi)
>>> +  specific_ss.add(files('tcg/tci.c'))
>>> +endif
>>
>> Did you need a PKG_CONFIG_LIBDIR set for homebrew?
> 
> Is this the "meson doesn't actually add the cflags everywhere it should"
> bug again ?

I guess so.  I realized after sending this reply that PKG_CONFIG_LIBDIR can't
be the answer, since the original configure should have failed if pkg-config
didn't find ffi.

Was there a resolution to said meson bug?


r~



Re: [PATCH 0/2] utils/fifo8: minor updates

2021-02-07 Thread Mark Cave-Ayland

On 28/01/2021 22:17, Mark Cave-Ayland wrote:


This patchset contains a couple of minor updates to QEMU's Fifo8 implementation
conceived whilst working on the next revision of the ESP series.

Patch 1 has already been reviewed on-list whilst patch 2 adds a new
VMSTATE_FIFO8_TEST macro which is required to allow the updated ESP series
to handle incoming migrations from previous QEMU versions.

Signed-off-by: Mark Cave-Ayland 


Mark Cave-Ayland (2):
   utils/fifo8: change fatal errors from abort() to assert()
   utils/fifo8: add VMSTATE_FIFO8_TEST macro

  include/qemu/fifo8.h | 16 ++--
  util/fifo8.c | 16 
  2 files changed, 14 insertions(+), 18 deletions(-)


I've applied these to my qemu-sparc branch and will send a PR shortly since they are 
a pre-requisite to the respin of the ESP patchset.



ATB,

Mark.



Re: [PATCH v2 63/93] tcg/tci: Use ffi for calls

2021-02-07 Thread Stefan Weil
On 07.02.21 21:12, Richard Henderson wrote:
> On 2/7/21 11:52 AM, Peter Maydell wrote:
>> On Sun, 7 Feb 2021 at 17:41, Richard Henderson
>>  wrote:
>>>
>>> On 2/7/21 8:25 AM, Stefan Weil wrote:
>>>>> +#include "qemu-common.h"
>>>>> +#include "tcg/tcg.h"   /* MAX_OPC_PARAM_IARGS */
>>>>> +#include "exec/cpu_ldst.h"
>>>>> +#include "tcg/tcg-op.h"
>>>>> +#include "qemu/compiler.h"
>>>>> +#include <ffi.h>
>>>>> +
>>>>
>>>>
>>>> ffi.h is not found on macOS with Homebrew.
>>>>
>>>> This can be fixed by using pkg-config to find the right compiler (and maybe
>>>> also linker) flags:
>>>>
>>>> % pkg-config --cflags libffi
>>>> -I/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/ffi
>>>> % pkg-config --libs libffi
>>>> -lffi
>>>
>>>
>>> Which is exactly what I do in the previous patch:
>>>
>>>
>>>> +++ b/meson.build
>>>> @@ -1901,7 +1901,14 @@ specific_ss.add(when: 'CONFIG_TCG', if_true: files(
>>>>    'tcg/tcg-op.c',
>>>>    'tcg/tcg.c',
>>>>  ))
>>>> -specific_ss.add(when: 'CONFIG_TCG_INTERPRETER', if_true: 
>>>> files('tcg/tci.c'))
>>>> +
>>>> +if get_option('tcg_interpreter')
>>>> +  libffi = dependency('libffi', version: '>=3.0',
>>>> +  static: enable_static, method: 'pkg-config',
>>>> +  required: true)
>>>> +  specific_ss.add(libffi)
>>>> +  specific_ss.add(files('tcg/tci.c'))
>>>> +endif
>>>
>>> Did you need a PKG_CONFIG_LIBDIR set for homebrew?
>>
>> Is this the "meson doesn't actually add the cflags everywhere it should"
>> bug again ?
> 
> I guess so.  I realized after sending this reply that PKG_CONFIG_LIBDIR can't
> be the answer, since the original configure should have failed if pkg-config
> didn't find ffi.
> 
> Was there a resolution to said meson bug?

Meanwhile I noticed an additional detail:

There exist two different pkg-config configurations for libffi on Homebrew:

% pkg-config --cflags libffi
-I/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/ffi
% export PKG_CONFIG_PATH="/opt/homebrew/opt/libffi/lib/pkgconfig"
% pkg-config --cflags libffi
-I/opt/homebrew/Cellar/libffi/3.3_2/include

By default it points to a system directory which does not exist at all
on my Mac, so that will never work.

With the right PKG_CONFIG_PATH a correct include directory is set, and
the latest rebased tci-next branch now works for me with a compiler warning:

/opt/homebrew/Cellar/libffi/3.3_2/include/ffi.h:441:5: warning:
'FFI_GO_CLOSURES' is not defined, evaluates to 0 [-Wundef]

Stefan
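
The PKG_CONFIG_PATH workaround described above can be sketched as a small Python helper that builds the environment for a pkg-config call. This is only an illustration under stated assumptions: the function names with_pkg_config_path and libffi_cflags_command are hypothetical, and nothing here is part of QEMU's build system.

```python
import os


def with_pkg_config_path(extra_dir, env=None):
    """Return a copy of env with extra_dir placed first in PKG_CONFIG_PATH.

    env defaults to the current process environment; it is never modified.
    """
    new_env = dict(os.environ if env is None else env)
    current = new_env.get("PKG_CONFIG_PATH", "")
    # Keep existing entries, but drop empties and avoid duplicating extra_dir.
    rest = [p for p in current.split(os.pathsep) if p and p != extra_dir]
    new_env["PKG_CONFIG_PATH"] = os.pathsep.join([extra_dir] + rest)
    return new_env


def libffi_cflags_command():
    """Argument vector equivalent to `pkg-config --cflags libffi`."""
    return ["pkg-config", "--cflags", "libffi"]
```

Passing libffi_cflags_command() to subprocess.run() with env=with_pkg_config_path("/opt/homebrew/opt/libffi/lib/pkgconfig") would mirror the second pkg-config call shown above; the output depends on the local Homebrew installation.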



[PATCH RFC v2 2/8] hw/block/nvme: remove block accounting for write zeroes

2021-02-07 Thread Klaus Jensen
From: Klaus Jensen 

Write Zeroes commands should not be counted in either the 'Data Units
Written' or the 'Host Write Commands' field of the SMART/Health Information
Log page.

Signed-off-by: Klaus Jensen 
---
 hw/block/nvme.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 6b46925ddd18..e4a01cf9edc5 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -2088,7 +2088,6 @@ static uint16_t nvme_do_write(NvmeCtrl *n, NvmeRequest *req, bool append,
  nvme_rw_cb, req);
 }
 } else {
-block_acct_start(blk_get_stats(blk), &req->acct, 0, BLOCK_ACCT_WRITE);
 req->aiocb = blk_aio_pwrite_zeroes(blk, data_offset, data_size,
BDRV_REQ_MAY_UNMAP, nvme_rw_cb,
req);
-- 
2.30.0




[PATCH RFC v2 4/8] hw/block/nvme: try to deal with the iov/qsg duality

2021-02-07 Thread Klaus Jensen
From: Klaus Jensen 

Introduce NvmeSg and try to deal with that pesky qsg/iov duality that
haunts all the memory-related functions.

Signed-off-by: Klaus Jensen 
---
 hw/block/nvme.h |   8 ++-
 hw/block/nvme.c | 171 
 2 files changed, 90 insertions(+), 89 deletions(-)

diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index cb2b5175f1a1..0e4fbd6990ad 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -29,6 +29,11 @@ typedef struct NvmeAsyncEvent {
 NvmeAerResult result;
 } NvmeAsyncEvent;
 
+typedef struct NvmeSg {
+QEMUSGList   qsg;
+QEMUIOVector iov;
+} NvmeSg;
+
 typedef struct NvmeRequest {
 struct NvmeSQueue   *sq;
 struct NvmeNamespace*ns;
@@ -38,8 +43,7 @@ typedef struct NvmeRequest {
 NvmeCqe cqe;
 NvmeCmd cmd;
 BlockAcctCookie acct;
-QEMUSGList  qsg;
-QEMUIOVectoriov;
+NvmeSg  sg;
 QTAILQ_ENTRY(NvmeRequest)entry;
 } NvmeRequest;
 
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 29902038d618..a0009c057f1e 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -428,14 +428,20 @@ static void nvme_req_clear(NvmeRequest *req)
 req->status = NVME_SUCCESS;
 }
 
-static void nvme_req_exit(NvmeRequest *req)
+static inline void nvme_sg_init(NvmeCtrl *n, NvmeSg *sg)
 {
-if (req->qsg.sg) {
-qemu_sglist_destroy(&req->qsg);
+pci_dma_sglist_init(&sg->qsg, &n->parent_obj, 0);
+qemu_iovec_init(&sg->iov, 0);
+}
+
+static inline void nvme_sg_unmap(NvmeSg *sg)
+{
+if (sg->qsg.sg) {
+qemu_sglist_destroy(&sg->qsg);
 }
 
-if (req->iov.iov) {
-qemu_iovec_destroy(&req->iov);
+if (sg->iov.iov) {
+qemu_iovec_destroy(&sg->iov);
 }
 }
 
@@ -473,8 +479,7 @@ static uint16_t nvme_map_addr_pmr(NvmeCtrl *n, QEMUIOVector *iov, hwaddr addr,
 return NVME_SUCCESS;
 }
 
-static uint16_t nvme_map_addr(NvmeCtrl *n, QEMUSGList *qsg, QEMUIOVector *iov,
-  hwaddr addr, size_t len)
+static uint16_t nvme_map_addr(NvmeCtrl *n, NvmeSg *sg, hwaddr addr, size_t len)
 {
 bool cmb = false, pmr = false;
 
@@ -491,34 +496,22 @@ static uint16_t nvme_map_addr(NvmeCtrl *n, QEMUSGList *qsg, QEMUIOVector *iov,
 }
 
 if (cmb || pmr) {
-if (qsg && qsg->sg) {
+if (sg->qsg.nsg) {
 return NVME_INVALID_USE_OF_CMB | NVME_DNR;
 }
 
-assert(iov);
-
-if (!iov->iov) {
-qemu_iovec_init(iov, 1);
-}
-
 if (cmb) {
-return nvme_map_addr_cmb(n, iov, addr, len);
+return nvme_map_addr_cmb(n, &sg->iov, addr, len);
 } else {
-return nvme_map_addr_pmr(n, iov, addr, len);
+return nvme_map_addr_pmr(n, &sg->iov, addr, len);
 }
 }
 
-if (iov && iov->iov) {
+if (sg->iov.niov) {
 return NVME_INVALID_USE_OF_CMB | NVME_DNR;
 }
 
-assert(qsg);
-
-if (!qsg->sg) {
-pci_dma_sglist_init(qsg, &n->parent_obj, 1);
-}
-
-qemu_sglist_add(qsg, addr, len);
+qemu_sglist_add(&sg->qsg, addr, len);
 
 return NVME_SUCCESS;
 }
@@ -532,20 +525,13 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, uint64_t prp1, uint64_t prp2,
 uint16_t status;
 int ret;
 
-QEMUSGList *qsg = &req->qsg;
-QEMUIOVector *iov = &req->iov;
-
 trace_pci_nvme_map_prp(trans_len, len, prp1, prp2, num_prps);
 
-if (nvme_addr_is_cmb(n, prp1) || (nvme_addr_is_pmr(n, prp1))) {
-qemu_iovec_init(iov, num_prps);
-} else {
-pci_dma_sglist_init(qsg, &n->parent_obj, num_prps);
-}
+nvme_sg_init(n, &req->sg);
 
-status = nvme_map_addr(n, qsg, iov, prp1, trans_len);
+status = nvme_map_addr(n, &req->sg, prp1, trans_len);
 if (status) {
-return status;
+goto unmap;
 }
 
 len -= trans_len;
@@ -560,7 +546,8 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, uint64_t prp1, uint64_t prp2,
 ret = nvme_addr_read(n, prp2, (void *)prp_list, prp_trans);
 if (ret) {
 trace_pci_nvme_err_addr_read(prp2);
-return NVME_DATA_TRAS_ERROR;
+status = NVME_DATA_TRAS_ERROR;
+goto unmap;
 }
 while (len != 0) {
 uint64_t prp_ent = le64_to_cpu(prp_list[i]);
@@ -568,7 +555,8 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, uint64_t prp1, uint64_t prp2,
 if (i == n->max_prp_ents - 1 && len > n->page_size) {
 if (unlikely(prp_ent & (n->page_size - 1))) {
 trace_pci_nvme_err_invalid_prplist_ent(prp_ent);
-return NVME_INVALID_PRP_OFFSET | NVME_DNR;
+status = NVME_INVALID_PRP_OFFSET | NVME_DNR;
+goto unmap;
 }
 
 i = 0;
@@ -578,20 +566,22 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, uint64_t prp1, uint64_t prp2,
 

[PATCH RFC v2 1/8] hw/block/nvme: remove redundant len member in compare context

2021-02-07 Thread Klaus Jensen
From: Klaus Jensen 

The 'len' member of the nvme_compare_ctx struct is redundant since the
same information is available in the 'iov' member.

Signed-off-by: Klaus Jensen 
---
 hw/block/nvme.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 6b84e34843f5..6b46925ddd18 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1656,7 +1656,6 @@ static void nvme_aio_copy_in_cb(void *opaque, int ret)
 struct nvme_compare_ctx {
 QEMUIOVector iov;
 uint8_t *bounce;
-size_t len;
 };
 
 static void nvme_compare_cb(void *opaque, int ret)
@@ -1677,16 +1676,16 @@ static void nvme_compare_cb(void *opaque, int ret)
 goto out;
 }
 
-buf = g_malloc(ctx->len);
+buf = g_malloc(ctx->iov.size);
 
-status = nvme_dma(nvme_ctrl(req), buf, ctx->len, DMA_DIRECTION_TO_DEVICE,
-  req);
+status = nvme_dma(nvme_ctrl(req), buf, ctx->iov.size,
+  DMA_DIRECTION_TO_DEVICE, req);
 if (status) {
 req->status = status;
 goto out;
 }
 
-if (memcmp(buf, ctx->bounce, ctx->len)) {
+if (memcmp(buf, ctx->bounce, ctx->iov.size)) {
 req->status = NVME_CMP_FAILURE;
 }
 
@@ -1924,7 +1923,6 @@ static uint16_t nvme_compare(NvmeCtrl *n, NvmeRequest *req)
 
 ctx = g_new(struct nvme_compare_ctx, 1);
 ctx->bounce = bounce;
-ctx->len = len;
 
 req->opaque = ctx;
 
-- 
2.30.0
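
To illustrate why a separate 'len' member is redundant, here is a toy Python model of an I/O vector that tracks its own total size, loosely mirroring the 'size' member the patch reads via ctx->iov.size. This is a sketch, not QEMU code: the class IOVector and the function buffers_match are hypothetical names.

```python
class IOVector:
    """Toy stand-in for an I/O vector that records its total byte count."""

    def __init__(self):
        self.buffers = []
        self.size = 0  # running total, updated on every add()

    def add(self, buf: bytes) -> None:
        self.buffers.append(buf)
        self.size += len(buf)


def buffers_match(iov: IOVector, bounce: bytes) -> bool:
    """Compare the vector's contents with a bounce buffer over iov.size bytes."""
    data = b"".join(iov.buffers)
    # The length comes from the vector itself; no separately stored 'len'.
    return data[:iov.size] == bounce[:iov.size]
```

Because the vector already knows how many bytes it holds, storing the same number again next to it only creates a second value that has to be kept in sync.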




[PATCH RFC v2 3/8] hw/block/nvme: fix strerror printing

2021-02-07 Thread Klaus Jensen
From: Klaus Jensen 

Fix missing sign inversion.

Signed-off-by: Klaus Jensen 
---
 hw/block/nvme.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index e4a01cf9edc5..29902038d618 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1150,7 +1150,7 @@ static void nvme_aio_err(NvmeRequest *req, int ret)
 break;
 }
 
-trace_pci_nvme_err_aio(nvme_cid(req), strerror(ret), status);
+trace_pci_nvme_err_aio(nvme_cid(req), strerror(-ret), status);
 
 error_setg_errno(&local_err, -ret, "aio failed");
 error_report_err(local_err);
-- 
2.30.0




[PATCH RFC v2 6/8] hw/block/nvme: refactor nvme_dma

2021-02-07 Thread Klaus Jensen
From: Klaus Jensen 

The nvme_dma function doesn't just do DMA (QEMUSGList-based) memory transfers;
it also handles QEMUIOVector copies.

Introduce the NvmeTxDirection enum and rename to nvme_tx. Remove mapping
of PRPs/SGLs from nvme_tx and instead assert that they have been mapped
previously. This allows more fine-grained use in subsequent patches.

Add new (better named) helpers, nvme_{c2h,h2c}, that do both PRP/SGL
mapping and transfer.
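
The split described above can be sketched roughly as follows. This is a
standalone illustration, not the QEMU code: the Mapping type, map_buffer()
and the flat memcpy()-based transfer are stand-ins (assumptions) for NvmeSg,
nvme_map_dptr() and the qsg/iov paths.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Direction of a transfer (illustrative names). */
typedef enum {
    TX_TO_DEVICE,   /* host memory -> controller buffer */
    TX_FROM_DEVICE, /* controller buffer -> host memory */
} TxDirection;

/* Stand-in for a mapped PRP/SGL: one flat region of "host" memory. */
typedef struct {
    uint8_t *host;
    size_t len;
    bool mapped;
} Mapping;

/* Mapping step, kept separate from the transfer itself. */
static int map_buffer(Mapping *m, uint8_t *host, size_t len)
{
    m->host = host;
    m->len = len;
    m->mapped = true;
    return 0;
}

/* Transfer only: the caller must have mapped the buffer beforehand. */
static int tx(Mapping *m, uint8_t *ptr, size_t len, TxDirection dir)
{
    assert(m->mapped);

    if (len != m->len) {
        return -1; /* length does not match the mapping */
    }

    if (dir == TX_TO_DEVICE) {
        memcpy(ptr, m->host, len);
    } else {
        memcpy(m->host, ptr, len);
    }

    return 0;
}

/* Convenience wrapper: map, then copy host -> controller. */
static int h2c(Mapping *m, uint8_t *host, uint8_t *ptr, size_t len)
{
    int status = map_buffer(m, host, len);
    if (status) {
        return status;
    }
    return tx(m, ptr, len, TX_TO_DEVICE);
}

/* Convenience wrapper: map, then copy controller -> host. */
static int c2h(Mapping *m, uint8_t *host, uint8_t *ptr, size_t len)
{
    int status = map_buffer(m, host, len);
    if (status) {
        return status;
    }
    return tx(m, ptr, len, TX_FROM_DEVICE);
}
```

Callers that have already mapped their buffers can call tx() directly, which
is the finer-grained use the later patches rely on.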

Signed-off-by: Klaus Jensen 
---
 hw/block/nvme.c | 143 ++--
 1 file changed, 77 insertions(+), 66 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 24156699b035..2752e5d8572a 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -843,48 +843,72 @@ static uint16_t nvme_map_dptr(NvmeCtrl *n, NvmeSg *sg, 
size_t len,
 }
 }
 
-static uint16_t nvme_dma(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
- DMADirection dir, NvmeRequest *req)
+typedef enum NvmeTxDirection {
+NVME_TX_DIRECTION_TO_DEVICE   = 0,
+NVME_TX_DIRECTION_FROM_DEVICE = 1,
+} NvmeTxDirection;
+
+static uint16_t nvme_tx(NvmeCtrl *n, NvmeSg *sg, uint8_t *ptr, uint32_t len,
+NvmeTxDirection dir)
 {
-uint16_t status = NVME_SUCCESS;
+/* assert that exactly one of qsg and iov carries data */
+assert((sg->qsg.nsg > 0) != (sg->iov.niov > 0));
+
+if (sg->qsg.nsg > 0) {
+uint64_t residual;
+
+if (dir == NVME_TX_DIRECTION_TO_DEVICE) {
+residual = dma_buf_write(ptr, len, &sg->qsg);
+} else {
+residual = dma_buf_read(ptr, len, &sg->qsg);
+}
+
+if (unlikely(residual)) {
+trace_pci_nvme_err_invalid_dma();
+return NVME_INVALID_FIELD | NVME_DNR;
+}
+} else {
+size_t bytes;
+
+if (dir == NVME_TX_DIRECTION_TO_DEVICE) {
+bytes = qemu_iovec_to_buf(&sg->iov, 0, ptr, len);
+} else {
+bytes = qemu_iovec_from_buf(&sg->iov, 0, ptr, len);
+}
+
+if (unlikely(bytes != len)) {
+trace_pci_nvme_err_invalid_dma();
+return NVME_INVALID_FIELD | NVME_DNR;
+}
+}
+
+return NVME_SUCCESS;
+}
+
+static inline uint16_t nvme_c2h(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
+NvmeRequest *req)
+{
+uint16_t status;
 
 status = nvme_map_dptr(n, &req->sg, len, &req->cmd);
 if (status) {
 return status;
 }
 
-/* assert that only one of qsg and iov carries data */
-assert((req->sg.qsg.nsg > 0) != (req->sg.iov.niov > 0));
+return nvme_tx(n, &req->sg, ptr, len, NVME_TX_DIRECTION_FROM_DEVICE);
+}
 
-if (req->sg.qsg.nsg > 0) {
-uint64_t residual;
+static inline uint16_t nvme_h2c(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
+NvmeRequest *req)
+{
+uint16_t status;
 
-if (dir == DMA_DIRECTION_TO_DEVICE) {
-residual = dma_buf_write(ptr, len, &req->sg.qsg);
-} else {
-residual = dma_buf_read(ptr, len, &req->sg.qsg);
-}
-
-if (unlikely(residual)) {
-trace_pci_nvme_err_invalid_dma();
-status = NVME_INVALID_FIELD | NVME_DNR;
-}
-} else {
-size_t bytes;
-
-if (dir == DMA_DIRECTION_TO_DEVICE) {
-bytes = qemu_iovec_to_buf(&req->sg.iov, 0, ptr, len);
-} else {
-bytes = qemu_iovec_from_buf(&req->sg.iov, 0, ptr, len);
-}
-
-if (unlikely(bytes != len)) {
-trace_pci_nvme_err_invalid_dma();
-status = NVME_INVALID_FIELD | NVME_DNR;
-}
+status = nvme_map_dptr(n, &req->sg, len, &req->cmd);
+if (status) {
+return status;
 }
 
-return status;
+return nvme_tx(n, &req->sg, ptr, len, NVME_TX_DIRECTION_TO_DEVICE);
 }
 
 static inline void nvme_blk_read(BlockBackend *blk, int64_t offset,
@@ -1683,8 +1707,7 @@ static void nvme_compare_cb(void *opaque, int ret)
 
 buf = g_malloc(ctx->iov.size);
 
-status = nvme_dma(nvme_ctrl(req), buf, ctx->iov.size,
-  DMA_DIRECTION_TO_DEVICE, req);
+status = nvme_h2c(nvme_ctrl(req), buf, ctx->iov.size, req);
 if (status) {
 req->status = status;
 goto out;
@@ -1720,8 +1743,7 @@ static uint16_t nvme_dsm(NvmeCtrl *n, NvmeRequest *req)
 NvmeDsmRange range[nr];
 uintptr_t *discards = (uintptr_t *)&req->opaque;
 
-status = nvme_dma(n, (uint8_t *)range, sizeof(range),
-  DMA_DIRECTION_TO_DEVICE, req);
+status = nvme_h2c(n, (uint8_t *)range, sizeof(range), req);
 if (status) {
 return status;
 }
@@ -1803,8 +1825,8 @@ static uint16_t nvme_copy(NvmeCtrl *n, NvmeRequest *req)
 
 range = g_new(NvmeCopySourceRange, nr);
 
-status = nvme_dma(n, (uint8_t *)range, nr * sizeof(NvmeCopySourceRange),
-  DMA_DIRECTION_TO_DEVICE, req);
+status = nvme_h2c(n, (uin

[PATCH RFC v2 0/8] hw/block/nvme: metadata and end-to-end data protection support

2021-02-07 Thread Klaus Jensen
From: Klaus Jensen 

This is RFC v2 of a series that adds support for metadata and end-to-end
data protection.

First, on the subject of metadata, in v1, support was restricted to
extended logical blocks, which was pretty trivial to implement, but
required special initialization and broke DULBE. In v2, metadata is
always stored contiguously at the end of the underlying block device.
This has the advantage of not breaking DULBE since the data blocks
remain aligned and allows bdrv_block_status to be used to determine
allocation status. It comes at the expense of complicating the extended
LBA emulation, but on the other hand it also gains support for metadata
transferred as a separate buffer.
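
To make the v2 layout concrete, here is a small standalone sketch of the
arithmetic, modelled on nvme_ns_nlbas() and the mdata_offset calculation in
patch 7. The NsLayout type and ns_layout() function are illustrative names
only and do not exist in the series:

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint64_t nlbas;        /* number of logical blocks that fit */
    uint64_t mdata_offset; /* byte offset where the metadata region starts */
} NsLayout;

/*
 * size:  size of the backing block device in bytes
 * lbads: log2 of the logical block data size
 * ms:    metadata bytes per logical block (0 means no metadata)
 */
static NsLayout ns_layout(uint64_t size, unsigned lbads, uint16_t ms)
{
    NsLayout l;
    uint64_t lsize;

    assert(lbads < 64);
    lsize = (uint64_t)1 << lbads;

    /* every block needs room for its data plus its metadata */
    l.nlbas = ms ? size / (lsize + ms) : size >> lbads;

    /* data blocks stay contiguous; metadata follows the last data block */
    l.mdata_offset = l.nlbas << lbads;

    return l;
}
```

For a 1 MiB backing device with 512-byte blocks and 8 bytes of metadata per
block, this gives 2016 logical blocks with the metadata region starting at
byte 1032192.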

The end-to-end data protection support blew up in terms of required
changes. This is due to the fact that a bunch of new commands have been
added to the device since v1 (zone append, compare, copy), and they all
require various special handling for protection information. If
potential reviewers would like it split up into multiple patches, each
adding pi support to one command, shout out.

The core of the series (metadata and eedp) is preceded by a set of
patches that refactor mapping (yes, again) and try to deal with the
qsg/iov duality mess (maybe also again?).

Support for metadata and end-to-end data protection is all joint work
with Gollu Appalanaidu.

Klaus Jensen (8):
  hw/block/nvme: remove redundant len member in compare context
  hw/block/nvme: remove block accounting for write zeroes
  hw/block/nvme: fix strerror printing
  hw/block/nvme: try to deal with the iov/qsg duality
  hw/block/nvme: remove the req dependency in map functions
  hw/block/nvme: refactor nvme_dma
  hw/block/nvme: add metadata support
  hw/block/nvme: end-to-end data protection

 hw/block/nvme-ns.h|   41 +-
 hw/block/nvme.h   |   44 +-
 include/block/nvme.h  |   26 +-
 hw/block/nvme-ns.c|   29 +-
 hw/block/nvme.c   | 1687 +++--
 hw/block/trace-events |   19 +-
 6 files changed, 1574 insertions(+), 272 deletions(-)

-- 
2.30.0




[PATCH RFC v2 7/8] hw/block/nvme: add metadata support

2021-02-07 Thread Klaus Jensen
From: Klaus Jensen 

Add support for metadata in the form of extended logical blocks as well
as a separate buffer of data. The new `ms` nvme-ns device parameter
specifies the size of metadata per logical block in bytes. The `mset`
nvme-ns device parameter controls whether metadata is transferred as part
of an extended lba (set to '1') or in a separate buffer (set to '0',
the default).

Regardless of the scheme chosen with `mset`, metadata is stored at the
end of the namespace backing block device. This requires the user-provided
PRP/SGLs to be walked and "split" into data and metadata
scatter/gather lists if the extended logical block scheme is used, but
has the advantage of not breaking the deallocated blocks support.

Signed-off-by: Klaus Jensen 
Signed-off-by: Gollu Appalanaidu 
---
 hw/block/nvme-ns.h|  39 ++-
 hw/block/nvme-ns.c|  18 +-
 hw/block/nvme.c   | 652 --
 hw/block/trace-events |   4 +-
 4 files changed, 618 insertions(+), 95 deletions(-)

diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index 7af6884862b5..2281fd39930a 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -29,6 +29,9 @@ typedef struct NvmeNamespaceParams {
 uint32_t nsid;
 QemuUUID uuid;
 
+uint16_t ms;
+uint8_t  mset;
+
 uint16_t mssrl;
 uint32_t mcl;
 uint8_t  msrc;
@@ -47,6 +50,7 @@ typedef struct NvmeNamespace {
 BlockConfblkconf;
 int32_t  bootindex;
 int64_t  size;
+int64_t  mdata_offset;
 NvmeIdNs id_ns;
 const uint32_t *iocs;
 uint8_t  csi;
@@ -99,18 +103,41 @@ static inline uint8_t nvme_ns_lbads(NvmeNamespace *ns)
 return nvme_ns_lbaf(ns)->ds;
 }
 
-/* calculate the number of LBAs that the namespace can accomodate */
-static inline uint64_t nvme_ns_nlbas(NvmeNamespace *ns)
-{
-return ns->size >> nvme_ns_lbads(ns);
-}
-
 /* convert an LBA to the equivalent in bytes */
 static inline size_t nvme_l2b(NvmeNamespace *ns, uint64_t lba)
 {
 return lba << nvme_ns_lbads(ns);
 }
 
+static inline size_t nvme_lsize(NvmeNamespace *ns)
+{
+return 1 << nvme_ns_lbads(ns);
+}
+
+static inline uint16_t nvme_msize(NvmeNamespace *ns)
+{
+return nvme_ns_lbaf(ns)->ms;
+}
+
+static inline size_t nvme_m2b(NvmeNamespace *ns, uint64_t lba)
+{
+return nvme_msize(ns) * lba;
+}
+
+static inline bool nvme_ns_ext(NvmeNamespace *ns)
+{
+return !!NVME_ID_NS_FLBAS_EXTENDED(ns->id_ns.flbas);
+}
+
+/* calculate the number of LBAs that the namespace can accomodate */
+static inline uint64_t nvme_ns_nlbas(NvmeNamespace *ns)
+{
+if (ns->params.ms) {
+return ns->size / (nvme_lsize(ns) + nvme_msize(ns));
+}
+return ns->size >> nvme_ns_lbads(ns);
+}
+
 typedef struct NvmeCtrl NvmeCtrl;
 
 static inline NvmeZoneState nvme_get_zone_state(NvmeZone *zone)
diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index c3b513b0fc78..7a662c170428 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -37,13 +37,25 @@ static int nvme_ns_init(NvmeNamespace *ns, Error **errp)
 BlockDriverInfo bdi;
 NvmeIdNs *id_ns = &ns->id_ns;
 int lba_index = NVME_ID_NS_FLBAS_INDEX(ns->id_ns.flbas);
-int npdg;
+int npdg, nlbas;
 
 ns->id_ns.dlfeat = 0x9;
 
 id_ns->lbaf[lba_index].ds = 31 - clz32(ns->blkconf.logical_block_size);
+id_ns->lbaf[lba_index].ms = ns->params.ms;
 
-id_ns->nsze = cpu_to_le64(nvme_ns_nlbas(ns));
+if (ns->params.ms) {
+id_ns->mc = 0x3;
+
+if (ns->params.mset) {
+id_ns->flbas |= 0x10;
+}
+}
+
+nlbas = nvme_ns_nlbas(ns);
+
+id_ns->nsze = cpu_to_le64(nlbas);
+ns->mdata_offset = nvme_l2b(ns, nlbas);
 
 ns->csi = NVME_CSI_NVM;
 
@@ -395,6 +407,8 @@ static Property nvme_ns_props[] = {
  NvmeSubsystem *),
 DEFINE_PROP_UINT32("nsid", NvmeNamespace, params.nsid, 0),
 DEFINE_PROP_UUID("uuid", NvmeNamespace, params.uuid),
+DEFINE_PROP_UINT16("ms", NvmeNamespace, params.ms, 0),
+DEFINE_PROP_UINT8("mset", NvmeNamespace, params.mset, 0),
 DEFINE_PROP_UINT16("mssrl", NvmeNamespace, params.mssrl, 128),
 DEFINE_PROP_UINT32("mcl", NvmeNamespace, params.mcl, 128),
 DEFINE_PROP_UINT8("msrc", NvmeNamespace, params.msrc, 127),
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 2752e5d8572a..8aa892ec3106 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -339,6 +339,26 @@ static int nvme_addr_read(NvmeCtrl *n, hwaddr addr, void 
*buf, int size)
 return pci_dma_read(&n->parent_obj, addr, buf, size);
 }
 
+static int nvme_addr_write(NvmeCtrl *n, hwaddr addr, void *buf, int size)
+{
+hwaddr hi = addr + size - 1;
+if (hi < addr) {
+return 1;
+}
+
+if (n->bar.cmbsz && nvme_addr_is_cmb(n, addr) && nvme_addr_is_cmb(n, hi)) {
+memcpy(nvme_addr_to_cmb(n, addr), buf, size);
+return 0;
+}
+
+if (nvme_addr_is_pmr(n, addr) && nvme_addr_is_pmr(n, hi)) {
+memcpy(nvme_addr_to_pmr(n, addr), buf, size);
+  

[PATCH RFC v2 5/8] hw/block/nvme: remove the req dependency in map functions

2021-02-07 Thread Klaus Jensen
From: Klaus Jensen 

The PRP and SGL mapping functions do not have any particular need for
the entire NvmeRequest as a parameter. Clean it up.

Signed-off-by: Klaus Jensen 
---
 hw/block/nvme.c   | 61 ++-
 hw/block/trace-events |  4 +--
 2 files changed, 33 insertions(+), 32 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index a0009c057f1e..24156699b035 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -516,8 +516,8 @@ static uint16_t nvme_map_addr(NvmeCtrl *n, NvmeSg *sg, 
hwaddr addr, size_t len)
 return NVME_SUCCESS;
 }
 
-static uint16_t nvme_map_prp(NvmeCtrl *n, uint64_t prp1, uint64_t prp2,
- uint32_t len, NvmeRequest *req)
+static uint16_t nvme_map_prp(NvmeCtrl *n, NvmeSg *sg, uint64_t prp1,
+ uint64_t prp2, uint32_t len)
 {
 hwaddr trans_len = n->page_size - (prp1 % n->page_size);
 trans_len = MIN(len, trans_len);
@@ -527,9 +527,9 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, uint64_t prp1, 
uint64_t prp2,
 
 trace_pci_nvme_map_prp(trans_len, len, prp1, prp2, num_prps);
 
-nvme_sg_init(n, &req->sg);
+nvme_sg_init(n, sg);
 
-status = nvme_map_addr(n, &req->sg, prp1, trans_len);
+status = nvme_map_addr(n, sg, prp1, trans_len);
 if (status) {
 goto unmap;
 }
@@ -579,7 +579,7 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, uint64_t prp1, 
uint64_t prp2,
 }
 
 trans_len = MIN(len, n->page_size);
-status = nvme_map_addr(n, &req->sg, prp_ent, trans_len);
+status = nvme_map_addr(n, sg, prp_ent, trans_len);
 if (status) {
 goto unmap;
 }
@@ -593,7 +593,7 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, uint64_t prp1, 
uint64_t prp2,
 status = NVME_INVALID_PRP_OFFSET | NVME_DNR;
 goto unmap;
 }
-status = nvme_map_addr(n, &req->sg, prp2, len);
+status = nvme_map_addr(n, sg, prp2, len);
 if (status) {
 goto unmap;
 }
@@ -603,7 +603,7 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, uint64_t prp1, 
uint64_t prp2,
 return NVME_SUCCESS;
 
 unmap:
-nvme_sg_unmap(&req->sg);
+nvme_sg_unmap(sg);
 return status;
 }
 
@@ -613,7 +613,7 @@ unmap:
  */
 static uint16_t nvme_map_sgl_data(NvmeCtrl *n, NvmeSg *sg,
   NvmeSglDescriptor *segment, uint64_t nsgld,
-  size_t *len, NvmeRequest *req)
+  size_t *len, NvmeCmd *cmd)
 {
 dma_addr_t addr, trans_len;
 uint32_t dlen;
@@ -624,7 +624,7 @@ static uint16_t nvme_map_sgl_data(NvmeCtrl *n, NvmeSg *sg,
 
 switch (type) {
 case NVME_SGL_DESCR_TYPE_BIT_BUCKET:
-if (req->cmd.opcode == NVME_CMD_WRITE) {
+if (cmd->opcode == NVME_CMD_WRITE) {
 continue;
 }
 case NVME_SGL_DESCR_TYPE_DATA_BLOCK:
@@ -653,7 +653,7 @@ static uint16_t nvme_map_sgl_data(NvmeCtrl *n, NvmeSg *sg,
 break;
 }
 
-trace_pci_nvme_err_invalid_sgl_excess_length(nvme_cid(req));
+trace_pci_nvme_err_invalid_sgl_excess_length(dlen);
 return NVME_DATA_SGL_LEN_INVALID | NVME_DNR;
 }
 
@@ -682,7 +682,7 @@ next:
 }
 
 static uint16_t nvme_map_sgl(NvmeCtrl *n, NvmeSg *sg, NvmeSglDescriptor sgl,
- size_t len, NvmeRequest *req)
+ size_t len, NvmeCmd *cmd)
 {
 /*
  * Read the segment in chunks of 256 descriptors (one 4k page) to avoid
@@ -705,14 +705,14 @@ static uint16_t nvme_map_sgl(NvmeCtrl *n, NvmeSg *sg, 
NvmeSglDescriptor sgl,
 sgld = &sgl;
 addr = le64_to_cpu(sgl.addr);
 
-trace_pci_nvme_map_sgl(nvme_cid(req), NVME_SGL_TYPE(sgl.type), len);
+trace_pci_nvme_map_sgl(NVME_SGL_TYPE(sgl.type), len);
 
 /*
  * If the entire transfer can be described with a single data block it can
  * be mapped directly.
  */
 if (NVME_SGL_TYPE(sgl.type) == NVME_SGL_DESCR_TYPE_DATA_BLOCK) {
-status = nvme_map_sgl_data(n, sg, sgld, 1, &len, req);
+status = nvme_map_sgl_data(n, sg, sgld, 1, &len, cmd);
 if (status) {
 goto unmap;
 }
@@ -751,7 +751,7 @@ static uint16_t nvme_map_sgl(NvmeCtrl *n, NvmeSg *sg, 
NvmeSglDescriptor sgl,
 }
 
 status = nvme_map_sgl_data(n, sg, segment, SEG_CHUNK_SIZE,
-   &len, req);
+   &len, cmd);
 if (status) {
 goto unmap;
 }
@@ -777,7 +777,7 @@ static uint16_t nvme_map_sgl(NvmeCtrl *n, NvmeSg *sg, 
NvmeSglDescriptor sgl,
 switch (NVME_SGL_TYPE(last_sgld->type)) {
 case NVME_SGL_DESCR_TYPE_DATA_BLOCK:
 case NVME_SGL_DESCR_TYPE_BIT_BUCKET:
-status = nvme_map_sgl_data(n, sg, 

[PATCH RFC v2 8/8] hw/block/nvme: end-to-end data protection

2021-02-07 Thread Klaus Jensen
From: Klaus Jensen 

Add support for namespaces formatted with protection information. The
type of end-to-end data protection (i.e. Type 1, Type 2 or Type 3) is
selected with the `pi` nvme-ns device parameter. If the number of
metadata bytes is larger than 8, the `pil` nvme-ns device parameter may
be used to control the location of the 8-byte DIF tuple. The default
`pil` value of '0' causes the DIF tuple to be transferred as the last
8 bytes of the metadata. Set `pil` to '1' to place it in the first 8 bytes
instead.
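
As a rough illustration of the `pil` semantics described above, the offset
of the DIF tuple within the per-block metadata could be computed like this.
The function name dif_tuple_offset() and the DIF_TUPLE_SIZE macro are
hypothetical and only serve this sketch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DIF_TUPLE_SIZE 8 /* size of the DIF tuple in bytes */

/*
 * ms:  metadata bytes per logical block
 * pil: 0 -> tuple occupies the last 8 bytes of the metadata,
 *      1 -> tuple occupies the first 8 bytes
 */
static size_t dif_tuple_offset(uint16_t ms, uint8_t pil)
{
    assert(ms >= DIF_TUPLE_SIZE);

    return pil ? 0 : (size_t)ms - DIF_TUPLE_SIZE;
}
```

With exactly 8 bytes of metadata both settings yield offset 0, which is why
`pil` only matters when the metadata is larger than the tuple.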

This patch is based on work by Gollu Appalanaidu.

Signed-off-by: Klaus Jensen 
Signed-off-by: Gollu Appalanaidu 
---
 hw/block/nvme-ns.h|   2 +
 hw/block/nvme.h   |  36 +++
 include/block/nvme.h  |  26 +-
 hw/block/nvme-ns.c|  11 +
 hw/block/nvme.c   | 731 --
 hw/block/trace-events |  11 +
 6 files changed, 793 insertions(+), 24 deletions(-)

diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index 2281fd39930a..e537bfba18b8 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -31,6 +31,8 @@ typedef struct NvmeNamespaceParams {
 
 uint16_t ms;
 uint8_t  mset;
+uint8_t  pi;
+uint8_t  pil;
 
 uint16_t mssrl;
 uint32_t mcl;
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index 0e4fbd6990ad..f81580353868 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -212,6 +212,42 @@ static inline NvmeCtrl *nvme_ctrl(NvmeRequest *req)
 return sq->ctrl;
 }
 
+/* from Linux kernel (crypto/crct10dif_common.c) */
+static const uint16_t t10_dif_crc_table[256] = {
+0x0000, 0x8BB7, 0x9CD9, 0x176E, 0xB205, 0x39B2, 0x2EDC, 0xA56B,
+0xEFBD, 0x640A, 0x7364, 0xF8D3, 0x5DB8, 0xD60F, 0xC161, 0x4AD6,
+0x54CD, 0xDF7A, 0xC814, 0x43A3, 0xE6C8, 0x6D7F, 0x7A11, 0xF1A6,
+0xBB70, 0x30C7, 0x27A9, 0xAC1E, 0x0975, 0x82C2, 0x95AC, 0x1E1B,
+0xA99A, 0x222D, 0x3543, 0xBEF4, 0x1B9F, 0x9028, 0x8746, 0x0CF1,
+0x4627, 0xCD90, 0xDAFE, 0x5149, 0xF422, 0x7F95, 0x68FB, 0xE34C,
+0xFD57, 0x76E0, 0x618E, 0xEA39, 0x4F52, 0xC4E5, 0xD38B, 0x583C,
+0x12EA, 0x995D, 0x8E33, 0x0584, 0xA0EF, 0x2B58, 0x3C36, 0xB781,
+0xD883, 0x5334, 0x445A, 0xCFED, 0x6A86, 0xE131, 0xF65F, 0x7DE8,
+0x373E, 0xBC89, 0xABE7, 0x2050, 0x853B, 0x0E8C, 0x19E2, 0x9255,
+0x8C4E, 0x07F9, 0x1097, 0x9B20, 0x3E4B, 0xB5FC, 0xA292, 0x2925,
+0x63F3, 0xE844, 0xFF2A, 0x749D, 0xD1F6, 0x5A41, 0x4D2F, 0xC698,
+0x7119, 0xFAAE, 0xEDC0, 0x6677, 0xC31C, 0x48AB, 0x5FC5, 0xD472,
+0x9EA4, 0x1513, 0x027D, 0x89CA, 0x2CA1, 0xA716, 0xB078, 0x3BCF,
+0x25D4, 0xAE63, 0xB90D, 0x32BA, 0x97D1, 0x1C66, 0x0B08, 0x80BF,
+0xCA69, 0x41DE, 0x56B0, 0xDD07, 0x786C, 0xF3DB, 0xE4B5, 0x6F02,
+0x3AB1, 0xB106, 0xA668, 0x2DDF, 0x88B4, 0x0303, 0x146D, 0x9FDA,
+0xD50C, 0x5EBB, 0x49D5, 0xC262, 0x6709, 0xECBE, 0xFBD0, 0x7067,
+0x6E7C, 0xE5CB, 0xF2A5, 0x7912, 0xDC79, 0x57CE, 0x40A0, 0xCB17,
+0x81C1, 0x0A76, 0x1D18, 0x96AF, 0x33C4, 0xB873, 0xAF1D, 0x24AA,
+0x932B, 0x189C, 0x0FF2, 0x8445, 0x212E, 0xAA99, 0xBDF7, 0x3640,
+0x7C96, 0xF721, 0xE04F, 0x6BF8, 0xCE93, 0x4524, 0x524A, 0xD9FD,
+0xC7E6, 0x4C51, 0x5B3F, 0xD088, 0x75E3, 0xFE54, 0xE93A, 0x628D,
+0x285B, 0xA3EC, 0xB482, 0x3F35, 0x9A5E, 0x11E9, 0x0687, 0x8D30,
+0xE232, 0x6985, 0x7EEB, 0xF55C, 0x5037, 0xDB80, 0xCCEE, 0x4759,
+0x0D8F, 0x8638, 0x9156, 0x1AE1, 0xBF8A, 0x343D, 0x2353, 0xA8E4,
+0xB6FF, 0x3D48, 0x2A26, 0xA191, 0x04FA, 0x8F4D, 0x9823, 0x1394,
+0x5942, 0xD2F5, 0xC59B, 0x4E2C, 0xEB47, 0x60F0, 0x779E, 0xFC29,
+0x4BA8, 0xC01F, 0xD771, 0x5CC6, 0xF9AD, 0x721A, 0x6574, 0xEEC3,
+0xA415, 0x2FA2, 0x38CC, 0xB37B, 0x1610, 0x9DA7, 0x8AC9, 0x017E,
+0x1F65, 0x94D2, 0x83BC, 0x080B, 0xAD60, 0x26D7, 0x31B9, 0xBA0E,
+0xF0D8, 0x7B6F, 0x6C01, 0xE7B6, 0x42DD, 0xC96A, 0xDE04, 0x55B3
+};
+
 int nvme_register_namespace(NvmeCtrl *n, NvmeNamespace *ns, Error **errp);
 
 #endif /* HW_NVME_H */
diff --git a/include/block/nvme.h b/include/block/nvme.h
index f82b5ffc2c1d..b6a3fb5e1f0f 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -695,12 +695,17 @@ enum {
 NVME_RW_DSM_LATENCY_LOW = 3 << 4,
 NVME_RW_DSM_SEQ_REQ = 1 << 6,
 NVME_RW_DSM_COMPRESSED  = 1 << 7,
+NVME_RW_PIREMAP = 1 << 9,
 NVME_RW_PRINFO_PRACT= 1 << 13,
 NVME_RW_PRINFO_PRCHK_GUARD  = 1 << 12,
 NVME_RW_PRINFO_PRCHK_APP= 1 << 11,
 NVME_RW_PRINFO_PRCHK_REF= 1 << 10,
+NVME_RW_PRINFO_PRCHK_MASK   = 7 << 10,
+
 };
 
+#define NVME_RW_PRINFO(control) ((control >> 10) & 0xf)
+
 typedef struct QEMU_PACKED NvmeDsmCmd {
 uint8_t opcode;
 uint8_t flags;
@@ -1292,14 +1297,22 @@ typedef struct QEMU_PACKED NvmeIdNsZoned {
 #define NVME_ID_NS_DPC_TYPE_MASK0x7
 
 enum NvmeIdNsDps {
-DPS_TYPE_NONE   = 0,
-DPS_TYPE_1  = 1,
-DPS_TYPE_2  = 2,
-DPS_TYPE_3  = 3,
-DPS_TYPE_MASK   = 0x7,
-DPS_FIRST_EIGHT = 8,
+NVME_ID_NS_DPS_TYPE_NONE   = 0,
+NVME_ID_NS_DPS_TYPE_1  = 1,
+NVME_

Re: Increased execution time with TCI in latest git master (was: Re: [PULL 00/46] tcg patch queue)

2021-02-07 Thread Stefan Weil
On 07.02.21 19:37, Richard Henderson wrote:
> On 2/7/21 2:50 AM, Stefan Weil wrote:
>> Your latest code from the rth7680/tci-next branch is twice as fast as my code
>> with BIOS boot and qemu-x86_64 on sparc64. That's great.
>>
>> With that code I don't get any BIOS output at all when running qemu-i386.
>> That's not so good.
>>
>> Did I test the correct branch? If yes, I could try the same test on amd64 and
>> arm64 hosts.
> 
> Yes, tci-next is the correct branch.  I've just rebased it against master,
> which includes the first 30-odd patches.
> 
> What host do you not see bios output from qemu-system-i386 (I assume that's a
> typo above)?  I see correct output on x86_64, sparc64, ppc64le, and aarch64 
> hosts.

Right, the TCI test was done with qemu-system-i386 of course.

I repeated the TCI test with qemu-system-i386 and qemu-system-x86_64 and
the rebased branch.

The system emulation for a BIOS boot works on Apple M1 arm64 with less
than 5 s user time (about as fast as before the latest TCI changes):

./qemu-system-i386 --nographic
  4,28s user 0,03s system 37% cpu 11,398 total
./qemu-system-x86_64 --nographic
  4,39s user 0,03s system 34% cpu 12,982 total

The same test shows similar timings on an AMD64 server:

./qemu-system-i386 --nographic
 user 0m4,958s before tci-next, 0m5,115s after tci-next

./qemu-system-x86_64 --nographic
 user 0m4,967s before tci-next, 0m5,263s after tci-next

Here tci-next is slightly slower than the old code.

The results on sparc64 did not change with the rebased tci-next:
qemu-system-i386 still fails to run, and qemu-system-x86_64 takes about
20 s user time.

Stefan



[PULL 0/2] qemu-sparc queue 20210207

2021-02-07 Thread Mark Cave-Ayland
The following changes since commit 5b19cb63d9dfda41b412373b8c9fe14641bcab60:

  Merge remote-tracking branch 'remotes/rth-gitlab/tags/pull-tcg-20210205' into 
staging (2021-02-05 22:59:12 +)

are available in the Git repository at:

  git://github.com/mcayland/qemu.git tags/qemu-sparc-20210207

for you to fetch changes up to cdf01ca4810203e229bcac822b42eba58e1abbf9:

  utils/fifo8: add VMSTATE_FIFO8_TEST macro (2021-02-07 20:38:34 +)


qemu-sparc queue


Mark Cave-Ayland (2):
  utils/fifo8: change fatal errors from abort() to assert()
  utils/fifo8: add VMSTATE_FIFO8_TEST macro

 include/qemu/fifo8.h | 16 ++--
 util/fifo8.c | 16 
 2 files changed, 14 insertions(+), 18 deletions(-)



[PULL 2/2] utils/fifo8: add VMSTATE_FIFO8_TEST macro

2021-02-07 Thread Mark Cave-Ayland
Rewrite the existing VMSTATE_FIFO8 macro to use VMSTATE_FIFO8_TEST as per the
standard pattern in include/migration/vmstate.h.
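
The pattern referred to is that the plain macro becomes a thin wrapper around
a _TEST variant that takes a predicate. A simplified standalone sketch
follows. FieldDesc, FIELD, FIELD_TEST and DemoState are made-up names for
illustration, and the real VMStateField has more members and a different
callback signature:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Cut-down stand-in for a field descriptor. */
typedef struct FieldDesc {
    const char *name;
    size_t offset;
    bool (*field_exists)(void *opaque);
} FieldDesc;

/* Variant with a predicate deciding whether the field is present. */
#define FIELD_TEST(_field, _state, _test) {    \
    .name         = #_field,                   \
    .field_exists = (_test),                   \
    .offset       = offsetof(_state, _field),  \
}

/* The unconditional form simply passes a NULL predicate. */
#define FIELD(_field, _state) FIELD_TEST(_field, _state, NULL)

typedef struct DemoState {
    int version;
    int fifo;
} DemoState;

/* Example predicate: the field only exists from version 2 onwards. */
static bool demo_has_fifo(void *opaque)
{
    return ((DemoState *)opaque)->version >= 2;
}

/* A NULL predicate means "always present". */
static bool field_present(const FieldDesc *d, void *opaque)
{
    assert(d != NULL);
    return d->field_exists == NULL || d->field_exists(opaque);
}
```

Keeping a single definition of the initializer means the two forms cannot
drift apart when a member is added later.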

Signed-off-by: Mark Cave-Ayland 
Reviewed-by: Peter Maydell 
Message-Id: <20210128221728.14887-3-mark.cave-ayl...@ilande.co.uk>
---
 include/qemu/fifo8.h | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/include/qemu/fifo8.h b/include/qemu/fifo8.h
index 489c354291..28bf2cee57 100644
--- a/include/qemu/fifo8.h
+++ b/include/qemu/fifo8.h
@@ -148,12 +148,16 @@ uint32_t fifo8_num_used(Fifo8 *fifo);
 
 extern const VMStateDescription vmstate_fifo8;
 
-#define VMSTATE_FIFO8(_field, _state) {  \
-.name   = (stringify(_field)),   \
-.size   = sizeof(Fifo8), \
-.vmsd   = &vmstate_fifo8,\
-.flags  = VMS_STRUCT,\
-.offset = vmstate_offset_value(_state, _field, Fifo8),   \
+#define VMSTATE_FIFO8_TEST(_field, _state, _test) {  \
+.name = (stringify(_field)), \
+.field_exists = (_test), \
+.size = sizeof(Fifo8),   \
+.vmsd = &vmstate_fifo8,  \
+.flags= VMS_STRUCT,  \
+.offset   = vmstate_offset_value(_state, _field, Fifo8), \
 }
 
+#define VMSTATE_FIFO8(_field, _state)\
+VMSTATE_FIFO8_TEST(_field, _state, NULL)
+
 #endif /* QEMU_FIFO8_H */
-- 
2.20.1




[PULL 1/2] utils/fifo8: change fatal errors from abort() to assert()

2021-02-07 Thread Mark Cave-Ayland
Developer errors are better represented with assert() rather than abort(). Also
improve the strictness of the checks by using range checks within the assert()
rather than converting the existing equality checks to inequality checks.
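
To illustrate the difference, here is a small standalone ring buffer written
in the same style. MiniFifo and its functions are illustrative only and are
not the Fifo8 API. A range check such as num < capacity also trips when the
state is already corrupt (num > capacity), which an equality test would let
through:

```c
#include <assert.h>
#include <stdint.h>

#define MINI_FIFO_MAX 8

typedef struct {
    uint8_t data[MINI_FIFO_MAX];
    uint32_t capacity;
    uint32_t head;
    uint32_t num;
} MiniFifo;

static void mini_fifo_init(MiniFifo *f, uint32_t capacity)
{
    assert(capacity > 0 && capacity <= MINI_FIFO_MAX);
    f->capacity = capacity;
    f->head = 0;
    f->num = 0;
}

static void mini_fifo_push(MiniFifo *f, uint8_t byte)
{
    /* range check: catches a full FIFO and any num beyond capacity */
    assert(f->num < f->capacity);
    f->data[(f->head + f->num) % f->capacity] = byte;
    f->num++;
}

static uint8_t mini_fifo_pop(MiniFifo *f)
{
    uint8_t ret;

    /* popping from an empty FIFO is a caller bug */
    assert(f->num > 0);
    ret = f->data[f->head++];
    f->head %= f->capacity;
    f->num--;
    return ret;
}
```

Note that assert() is compiled out when NDEBUG is defined, so this style only
suits checks for programming errors, not for conditions a guest can trigger.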

Signed-off-by: Mark Cave-Ayland 
Reviewed-by: Claudio Fontana 
Reviewed-by: Philippe Mathieu-Daudé 
Message-Id: <20210121102518.20112-1-mark.cave-ayl...@ilande.co.uk>
---
 util/fifo8.c | 16 
 1 file changed, 4 insertions(+), 12 deletions(-)

diff --git a/util/fifo8.c b/util/fifo8.c
index a5dd789ce5..d4d1c135e0 100644
--- a/util/fifo8.c
+++ b/util/fifo8.c
@@ -31,9 +31,7 @@ void fifo8_destroy(Fifo8 *fifo)
 
 void fifo8_push(Fifo8 *fifo, uint8_t data)
 {
-if (fifo->num == fifo->capacity) {
-abort();
-}
+assert(fifo->num < fifo->capacity);
 fifo->data[(fifo->head + fifo->num) % fifo->capacity] = data;
 fifo->num++;
 }
@@ -42,9 +40,7 @@ void fifo8_push_all(Fifo8 *fifo, const uint8_t *data, 
uint32_t num)
 {
 uint32_t start, avail;
 
-if (fifo->num + num > fifo->capacity) {
-abort();
-}
+assert(fifo->num + num <= fifo->capacity);
 
 start = (fifo->head + fifo->num) % fifo->capacity;
 
@@ -63,9 +59,7 @@ uint8_t fifo8_pop(Fifo8 *fifo)
 {
 uint8_t ret;
 
-if (fifo->num == 0) {
-abort();
-}
+assert(fifo->num > 0);
 ret = fifo->data[fifo->head++];
 fifo->head %= fifo->capacity;
 fifo->num--;
@@ -76,9 +70,7 @@ const uint8_t *fifo8_pop_buf(Fifo8 *fifo, uint32_t max, 
uint32_t *num)
 {
 uint8_t *ret;
 
-if (max == 0 || max > fifo->num) {
-abort();
-}
+assert(max > 0 && max <= fifo->num);
 *num = MIN(fifo->capacity - fifo->head, max);
 ret = &fifo->data[fifo->head];
 fifo->head += *num;
-- 
2.20.1




[RFC PATCH 0/6] exec: Remove "tcg/tcg.h" from "exec/cpu_ldst.h"

2021-02-07 Thread Philippe Mathieu-Daudé
Hi,

I wondered why changing something in "tcg/tcg.h" would trigger
rebuilding the whole tree and traced it to the inclusion in
"exec/cpu_ldst.h".

By making tlb_addr_write() static to accel/tcg/cputlb.c we can
remove the "tcg/tcg.h" inclusion and reduce the number of objects
to rebuild.

I added tlb_assert_iotlb_entry_for_ptr_present() but there is
this comment in target/arm/mte_helper.c which I don't understand
much (so have no clue how to fix this TODO) but I suppose this
would be to add a proper implementation and not need this ugly
tlb_assert_iotlb_entry_for_ptr_present():

 * TODO: Perhaps there should be a cputlb helper that returns a
 * matching tlb entry + iotlb entry.

Regards,

Phil.

Philippe Mathieu-Daudé (6):
  target: Replace tcg_debug_assert() by assert()
  target/m68k: Include missing "tcg/tcg.h" header
  target/mips: Include missing "tcg/tcg.h" header
  accel/tcg: Include missing "tcg/tcg.h" header
  accel/tcg: Refactor debugging tlb_assert_iotlb_entry_for_ptr_present()
  exec/cpu_ldst: Move tlb* declarations to "exec/exec-all.h"

 include/exec/cpu_ldst.h | 52 -
 include/exec/exec-all.h | 47 ++
 target/arm/translate.h  |  4 +-
 accel/tcg/cputlb.c  | 23 +++
 accel/tcg/tcg-accel-ops-mttcg.c |  1 +
 accel/tcg/tcg-accel-ops-rr.c|  1 +
 target/arm/mte_helper.c | 15 ++-
 target/arm/sve_helper.c | 18 +++--
 target/arm/translate-a64.c  | 12 +++---
 target/arm/translate-sve.c  |  4 +-
 target/arm/translate.c  | 36 -
 target/hppa/translate.c |  4 +-
 target/m68k/op_helper.c |  1 +
 target/mips/msa_helper.c|  1 +
 target/rx/op_helper.c   |  6 +--
 target/rx/translate.c   | 14 +++
 target/sh4/translate.c  |  4 +-
 target/riscv/insn_trans/trans_rvv.c.inc |  2 +-
 18 files changed, 127 insertions(+), 118 deletions(-)

-- 
2.26.2




[PATCH 6/6] exec/cpu_ldst: Move tlb* declarations to "exec/exec-all.h"

2021-02-07 Thread Philippe Mathieu-Daudé
Keep MMU functions in "exec/cpu_ldst.h", and move TLB functions
to "exec/exec-all.h". As tlb_addr_write() is only called in
accel/tcg/cputlb.c, move it there as a static function.

Doing so removes the "tcg/tcg.h" dependency from "exec/cpu_ldst.h".

Signed-off-by: Philippe Mathieu-Daudé 
---
 include/exec/cpu_ldst.h | 52 -
 include/exec/exec-all.h | 38 ++
 accel/tcg/cputlb.c  |  9 +++
 3 files changed, 47 insertions(+), 52 deletions(-)

diff --git a/include/exec/cpu_ldst.h b/include/exec/cpu_ldst.h
index ef54cb7e1f8..cb0a096497f 100644
--- a/include/exec/cpu_ldst.h
+++ b/include/exec/cpu_ldst.h
@@ -291,34 +291,6 @@ static inline void cpu_stq_le_mmuidx_ra(CPUArchState *env, 
abi_ptr addr,
 
 #else
 
-/* Needed for TCG_OVERSIZED_GUEST */
-#include "tcg/tcg.h"
-
-static inline target_ulong tlb_addr_write(const CPUTLBEntry *entry)
-{
-#if TCG_OVERSIZED_GUEST
-return entry->addr_write;
-#else
-return qatomic_read(&entry->addr_write);
-#endif
-}
-
-/* Find the TLB index corresponding to the mmu_idx + address pair.  */
-static inline uintptr_t tlb_index(CPUArchState *env, uintptr_t mmu_idx,
-  target_ulong addr)
-{
-uintptr_t size_mask = env_tlb(env)->f[mmu_idx].mask >> CPU_TLB_ENTRY_BITS;
-
-return (addr >> TARGET_PAGE_BITS) & size_mask;
-}
-
-/* Find the TLB entry corresponding to the mmu_idx + address pair.  */
-static inline CPUTLBEntry *tlb_entry(CPUArchState *env, uintptr_t mmu_idx,
- target_ulong addr)
-{
-return &env_tlb(env)->f[mmu_idx].table[tlb_index(env, mmu_idx, addr)];
-}
-
 uint32_t cpu_ldub_mmuidx_ra(CPUArchState *env, abi_ptr addr,
 int mmu_idx, uintptr_t ra);
 int cpu_ldsb_mmuidx_ra(CPUArchState *env, abi_ptr addr,
@@ -422,28 +394,4 @@ static inline int cpu_ldsw_code(CPUArchState *env, abi_ptr 
addr)
 return (int16_t)cpu_lduw_code(env, addr);
 }
 
-/**
- * tlb_vaddr_to_host:
- * @env: CPUArchState
- * @addr: guest virtual address to look up
- * @access_type: 0 for read, 1 for write, 2 for execute
- * @mmu_idx: MMU index to use for lookup
- *
- * Look up the specified guest virtual index in the TCG softmmu TLB.
- * If we can translate a host virtual address suitable for direct RAM
- * access, without causing a guest exception, then return it.
- * Otherwise (TLB entry is for an I/O access, guest software
- * TLB fill required, etc) return NULL.
- */
-#ifdef CONFIG_USER_ONLY
-static inline void *tlb_vaddr_to_host(CPUArchState *env, abi_ptr addr,
-  MMUAccessType access_type, int mmu_idx)
-{
-return g2h(addr);
-}
-#else
-void *tlb_vaddr_to_host(CPUArchState *env, abi_ptr addr,
-MMUAccessType access_type, int mmu_idx);
-#endif
-
 #endif /* CPU_LDST_H */
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index c5e8e355b7f..5024b9abd4a 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -297,6 +297,38 @@ void tlb_set_page(CPUState *cpu, target_ulong vaddr,
   hwaddr paddr, int prot,
   int mmu_idx, target_ulong size);
 
+/**
+ * tlb_vaddr_to_host:
+ * @env: CPUArchState
+ * @addr: guest virtual address to look up
+ * @access_type: 0 for read, 1 for write, 2 for execute
+ * @mmu_idx: MMU index to use for lookup
+ *
+ * Look up the specified guest virtual index in the TCG softmmu TLB.
+ * If we can translate a host virtual address suitable for direct RAM
+ * access, without causing a guest exception, then return it.
+ * Otherwise (TLB entry is for an I/O access, guest software
+ * TLB fill required, etc) return NULL.
+ */
+void *tlb_vaddr_to_host(CPUArchState *env, abi_ptr addr,
+MMUAccessType access_type, int mmu_idx);
+
+/* Find the TLB index corresponding to the mmu_idx + address pair.  */
+static inline uintptr_t tlb_index(CPUArchState *env, uintptr_t mmu_idx,
+  target_ulong addr)
+{
+uintptr_t size_mask = env_tlb(env)->f[mmu_idx].mask >> CPU_TLB_ENTRY_BITS;
+
+return (addr >> TARGET_PAGE_BITS) & size_mask;
+}
+
+/* Find the TLB entry corresponding to the mmu_idx + address pair.  */
+static inline CPUTLBEntry *tlb_entry(CPUArchState *env, uintptr_t mmu_idx,
+ target_ulong addr)
+{
+return &env_tlb(env)->f[mmu_idx].table[tlb_index(env, mmu_idx, addr)];
+}
+
 /*
  * Find the iotlbentry for ptr.  This *must* be present in the TLB
  * because we just found the mapping.
@@ -374,6 +406,12 @@ tlb_flush_page_bits_by_mmuidx_all_cpus_synced(CPUState *cpu, target_ulong addr,
   uint16_t idxmap, unsigned bits)
 {
 }
+
+static inline void *tlb_vaddr_to_host(CPUArchState *env, abi_ptr addr,
+  MMUAccessType access_type, int mmu_idx)
+{
+return g2h(addr);
+}
 #endif
 /**
  * probe_access:
diff --git a/accel

[PATCH 2/6] target/m68k: Include missing "tcg/tcg.h" header

2021-02-07 Thread Philippe Mathieu-Daudé
Commit 14f944063af ("target-m68k: add cas/cas2 ops") introduced
use of typedef/prototypes declared in "tcg/tcg.h" without including
it. This was not a problem because "tcg/tcg.h" is pulled in by
"exec/cpu_ldst.h". To be able to remove this header there, we
first need to include it here in op_helper.c, else we get:

  [953/1018] Compiling C object libqemu-m68k-softmmu.fa.p/target_m68k_op_helper.c.o
  target/m68k/op_helper.c: In function ‘do_cas2l’:
  target/m68k/op_helper.c:774:5: error: unknown type name ‘TCGMemOpIdx’
    774 | TCGMemOpIdx oi;
        | ^~~
  target/m68k/op_helper.c:787:18: error: implicit declaration of function ‘make_memop_idx’ [-Werror=implicit-function-declaration]
    787 | oi = make_memop_idx(MO_BEQ, mmu_idx);
        |  ^~
  target/m68k/op_helper.c:787:18: error: nested extern declaration of ‘make_memop_idx’ [-Werror=nested-externs]
  target/m68k/op_helper.c:788:17: error: implicit declaration of function ‘helper_atomic_cmpxchgq_be_mmu’; did you mean ‘helper_atomic_cmpxchgq_be’? [-Werror=implicit-function-declaration]
    788 | l = helper_atomic_cmpxchgq_be_mmu(env, a1, c, u, oi, ra);
        | ^
        | helper_atomic_cmpxchgq_be
  target/m68k/op_helper.c:788:17: error: nested extern declaration of ‘helper_atomic_cmpxchgq_be_mmu’ [-Werror=nested-externs]
  cc1: all warnings being treated as errors

Signed-off-by: Philippe Mathieu-Daudé 
---
 target/m68k/op_helper.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/target/m68k/op_helper.c b/target/m68k/op_helper.c
index 202498deb51..36b68fd318f 100644
--- a/target/m68k/op_helper.c
+++ b/target/m68k/op_helper.c
@@ -18,6 +18,7 @@
  */
 #include "qemu/osdep.h"
 #include "cpu.h"
+#include "tcg/tcg.h"
 #include "exec/helper-proto.h"
 #include "exec/exec-all.h"
 #include "exec/cpu_ldst.h"
-- 
2.26.2




[RFC PATCH 1/6] target: Replace tcg_debug_assert() by assert()

2021-02-07 Thread Philippe Mathieu-Daudé
Since commit 262a69f4282 ("osdep.h: Prohibit disabling assert()
in supported builds") we cannot build QEMU with assert() disabled.

tcg_debug_assert() does nothing unless QEMU is configured with
--enable-debug-tcg.

Since there is no obvious logic for choosing between
tcg_debug_assert() and assert() in files under target/, simplify
by using plain assert() everywhere. Keep tcg_debug_assert() for
the tcg/ and accel/ directories.
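
As a rough illustration of the difference between the two kinds of
check, here is a simplified sketch. It is not the macro from
"tcg/tcg.h": DEMO_DEBUG_CHECKS and all demo_* names are made up for
this example, and the real non-debug definition may differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical switch; define it to mimic a build with debug checks on. */
/* #define DEMO_DEBUG_CHECKS */

#ifdef DEMO_DEBUG_CHECKS
# define demo_debug_assert(cond) assert(cond)
#else
# define demo_debug_assert(cond) ((void)0)   /* condition is not evaluated */
#endif

/* Counts how many times a check expression was actually evaluated. */
int demo_checks_evaluated;

static bool demo_check(bool cond)
{
    demo_checks_evaluated++;
    return cond;
}

/* Toy function that guards its argument range with both kinds of check. */
int demo_set_btype(int val)
{
    demo_debug_assert(demo_check(val >= 1 && val <= 3)); /* debug builds only */
    assert(demo_check(val >= 1 && val <= 3));            /* every build */
    return val;
}
```

Without DEMO_DEBUG_CHECKS only the plain assert() evaluates its
condition, so after this patch the checks under target/ run in every
build.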

Patch created mechanically using:

  $ sed -i s/tcg_debug_assert/assert/ \
  $(git grep -l tcg_debug_assert target/)

Signed-off-by: Philippe Mathieu-Daudé 
---
If there is such logic, we should document it and include "tcg/tcg.h"
in these files.
---
 target/arm/translate.h  |  4 +--
 target/arm/mte_helper.c |  4 +--
 target/arm/sve_helper.c |  8 +++---
 target/arm/translate-a64.c  | 12 -
 target/arm/translate-sve.c  |  4 +--
 target/arm/translate.c  | 36 -
 target/hppa/translate.c |  4 +--
 target/rx/op_helper.c   |  6 ++---
 target/rx/translate.c   | 14 +-
 target/sh4/translate.c  |  4 +--
 target/riscv/insn_trans/trans_rvv.c.inc |  2 +-
 11 files changed, 49 insertions(+), 49 deletions(-)

diff --git a/target/arm/translate.h b/target/arm/translate.h
index 423b0e08df0..e2ddf87629c 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -220,7 +220,7 @@ static inline void set_pstate_bits(uint32_t bits)
 {
 TCGv_i32 p = tcg_temp_new_i32();
 
-tcg_debug_assert(!(bits & CACHED_PSTATE_BITS));
+assert(!(bits & CACHED_PSTATE_BITS));
 
 tcg_gen_ld_i32(p, cpu_env, offsetof(CPUARMState, pstate));
 tcg_gen_ori_i32(p, p, bits);
@@ -233,7 +233,7 @@ static inline void clear_pstate_bits(uint32_t bits)
 {
 TCGv_i32 p = tcg_temp_new_i32();
 
-tcg_debug_assert(!(bits & CACHED_PSTATE_BITS));
+assert(!(bits & CACHED_PSTATE_BITS));
 
 tcg_gen_ld_i32(p, cpu_env, offsetof(CPUARMState, pstate));
 tcg_gen_andi_i32(p, p, ~bits);
diff --git a/target/arm/mte_helper.c b/target/arm/mte_helper.c
index 153bd1e9df8..6cea9d1b506 100644
--- a/target/arm/mte_helper.c
+++ b/target/arm/mte_helper.c
@@ -166,8 +166,8 @@ static uint8_t *allocation_tag_mem(CPUARMState *env, int ptr_mmu_idx,
  * not set in the cputlb lookup above.
  */
 mr = memory_region_from_host(host, &ptr_ra);
-tcg_debug_assert(mr != NULL);
-tcg_debug_assert(memory_region_is_ram(mr));
+assert(mr != NULL);
+assert(memory_region_is_ram(mr));
 ptr_paddr = ptr_ra;
 do {
 ptr_paddr += mr->addr;
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 844db08bd57..c8cdf7618eb 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -4030,7 +4030,7 @@ static intptr_t find_next_active(uint64_t *vg, intptr_t reg_off,
 reg_off += ctz64(pg);
 
 /* We should never see an out of range predicate bit set.  */
-tcg_debug_assert(reg_off < reg_max);
+assert(reg_off < reg_max);
 return reg_off;
 }
 
@@ -4186,7 +4186,7 @@ static bool sve_cont_ldst_elements(SVEContLdSt *info, target_ulong addr,
 /* No active elements, no pages touched. */
 return false;
 }
-tcg_debug_assert(reg_off_last >= 0 && reg_off_last < reg_max);
+assert(reg_off_last >= 0 && reg_off_last < reg_max);
 
 info->reg_off_first[0] = reg_off_first;
 info->mem_off_first[0] = (reg_off_first >> esz) * msize;
@@ -4235,7 +4235,7 @@ static bool sve_cont_ldst_elements(SVEContLdSt *info, target_ulong addr,
  * this may affect the address reported in an exception.
  */
 reg_off_split = find_next_active(vg, reg_off_split, reg_max, esz);
-tcg_debug_assert(reg_off_split <= reg_off_last);
+assert(reg_off_split <= reg_off_last);
 info->reg_off_first[1] = reg_off_split;
 info->mem_off_first[1] = (reg_off_split >> esz) * msize;
 info->reg_off_last[1] = reg_off_last;
@@ -4794,7 +4794,7 @@ void sve_ldnfff1_r(CPUARMState *env, void *vg, const target_ulong addr,
 /* Probe the page(s). */
 if (!sve_cont_ldst_pages(&info, fault, env, addr, MMU_DATA_LOAD, retaddr)) {
 /* Fault on first element. */
-tcg_debug_assert(fault == FAULT_NO);
+assert(fault == FAULT_NO);
 memset(vd, 0, reg_max);
 goto do_fault;
 }
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index ffc060e5d70..f570506133c 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -144,7 +144,7 @@ static void set_btype(DisasContext *s, int val)
 TCGv_i32 tcg_val;
 
 /* BTYPE is a 2-bit field, and 0 should be done with reset_btype.  */
-tcg_debug_assert(val >= 1 && val <= 3);
+assert(val >= 1 && val <= 3);
 
 tcg_val = tcg_const_i32(val);
 tcg_gen_st_i32(tcg_val, cpu_env, offsetof(CPUARMState, btype));
@@ -10659,7 +10659,7 @@ static void handle_vec_simd_shri(Disas

[PATCH 3/6] target/mips: Include missing "tcg/tcg.h" header

2021-02-07 Thread Philippe Mathieu-Daudé
Commit 83be6b54123 ("Fix MSA instructions LD. on big endian
host") introduced use of typedef/prototypes declared in "tcg/tcg.h"
without including it. This was not a problem because "tcg/tcg.h" is
pulled in by "exec/cpu_ldst.h". To be able to remove this header
there, we first need to include it here in msa_helper.c, else we get:

  [222/337] Compiling C object libqemu-mips-softmmu.fa.p/target_mips_msa_helper.c.o
  target/mips/msa_helper.c: In function ‘helper_msa_ld_b’:
  target/mips/msa_helper.c:8214:9: error: unknown type name ‘TCGMemOpIdx’
   8214 | TCGMemOpIdx oi = make_memop_idx(MO_TE | DF | MO_UNALN,  \
        | ^~~
  target/mips/msa_helper.c:8224:5: note: in expansion of macro ‘MEMOP_IDX’
   8224 | MEMOP_IDX(DF_BYTE)
        | ^
  target/mips/msa_helper.c:8214:26: error: implicit declaration of function ‘make_memop_idx’ [-Werror=implicit-function-declaration]
   8214 | TCGMemOpIdx oi = make_memop_idx(MO_TE | DF | MO_UNALN,  \
        |  ^~
  target/mips/msa_helper.c:8227:18: error: implicit declaration of function ‘helper_ret_ldub_mmu’ [-Werror=implicit-function-declaration]
   8227 | pwd->b[0]  = helper_ret_ldub_mmu(env, addr + (0  << DF_BYTE), oi, GETPC());
        |  ^~~
  cc1: all warnings being treated as errors

Signed-off-by: Philippe Mathieu-Daudé 
---
 target/mips/msa_helper.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/target/mips/msa_helper.c b/target/mips/msa_helper.c
index 1298a1917ce..4caefe29ad7 100644
--- a/target/mips/msa_helper.c
+++ b/target/mips/msa_helper.c
@@ -20,6 +20,7 @@
 #include "qemu/osdep.h"
 #include "cpu.h"
 #include "internal.h"
+#include "tcg/tcg.h"
 #include "exec/exec-all.h"
 #include "exec/helper-proto.h"
 #include "exec/memop.h"
-- 
2.26.2




[PATCH 4/6] accel/tcg: Include missing "tcg/tcg.h" header

2021-02-07 Thread Philippe Mathieu-Daudé
Commit 3468b59e18b ("tcg: enable multiple TCG contexts in softmmu")
introduced use of typedef/prototypes declared in "tcg/tcg.h" without
including it. This was not a problem because "tcg/tcg.h" is pulled
in by "exec/cpu_ldst.h". To be able to remove this header there, we
first need to include it in tcg-accel-ops-mttcg.c and
tcg-accel-ops-rr.c, else we get:

  accel/tcg/tcg-accel-ops-mttcg.c: In function ‘mttcg_cpu_thread_fn’:
  accel/tcg/tcg-accel-ops-mttcg.c:52:5: error: implicit declaration of function ‘tcg_register_thread’; did you mean ‘rcu_register_thread’? [-Werror=implicit-function-declaration]
     52 | tcg_register_thread();
        | ^~~
        | rcu_register_thread
  accel/tcg/tcg-accel-ops-mttcg.c:52:5: error: nested extern declaration of ‘tcg_register_thread’ [-Werror=nested-externs]
  cc1: all warnings being treated as errors

  accel/tcg/tcg-accel-ops-rr.c: In function ‘rr_cpu_thread_fn’:
  accel/tcg/tcg-accel-ops-rr.c:153:5: error: implicit declaration of function ‘tcg_register_thread’; did you mean ‘rcu_register_thread’? [-Werror=implicit-function-declaration]
    153 | tcg_register_thread();
        | ^~~
        | rcu_register_thread
  accel/tcg/tcg-accel-ops-rr.c:153:5: error: nested extern declaration of ‘tcg_register_thread’ [-Werror=nested-externs]
  cc1: all warnings being treated as errors

Signed-off-by: Philippe Mathieu-Daudé 
---
 accel/tcg/tcg-accel-ops-mttcg.c | 1 +
 accel/tcg/tcg-accel-ops-rr.c| 1 +
 2 files changed, 2 insertions(+)

diff --git a/accel/tcg/tcg-accel-ops-mttcg.c b/accel/tcg/tcg-accel-ops-mttcg.c
index 42973fb062b..ddbca6c5b8c 100644
--- a/accel/tcg/tcg-accel-ops-mttcg.c
+++ b/accel/tcg/tcg-accel-ops-mttcg.c
@@ -32,6 +32,7 @@
 #include "exec/exec-all.h"
 #include "hw/boards.h"
 
+#include "tcg/tcg.h"
 #include "tcg-accel-ops.h"
 #include "tcg-accel-ops-mttcg.h"
 
diff --git a/accel/tcg/tcg-accel-ops-rr.c b/accel/tcg/tcg-accel-ops-rr.c
index 4a66055e0d7..1bb1d0f8f1c 100644
--- a/accel/tcg/tcg-accel-ops-rr.c
+++ b/accel/tcg/tcg-accel-ops-rr.c
@@ -32,6 +32,7 @@
 #include "exec/exec-all.h"
 #include "hw/boards.h"
 
+#include "tcg/tcg.h"
 #include "tcg-accel-ops.h"
 #include "tcg-accel-ops-rr.h"
 #include "tcg-accel-ops-icount.h"
-- 
2.26.2




[RFC PATCH 5/6] accel/tcg: Refactor debugging tlb_assert_iotlb_entry_for_ptr_present()

2021-02-07 Thread Philippe Mathieu-Daudé
Factor the duplicated debug code out into a
tlb_assert_iotlb_entry_for_ptr_present() helper.

Signed-off-by: Philippe Mathieu-Daudé 
---
What this code does is out of my league, but refactoring it allows
keeping tlb_addr_write() local to accel/tcg/cputlb.c in the next
patch.
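
For readers in the same position, the check being moved boils down to
roughly the following. This is a toy model only: the demo_* names, the
page size and the table size are made up for illustration, and the
real tlb_hit() and CPUTLBEntry are not reproduced here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative constants, not QEMU's. */
#define DEMO_PAGE_BITS 12
#define DEMO_PAGE_MASK (~(((uint64_t)1 << DEMO_PAGE_BITS) - 1))
#define DEMO_TLB_SIZE  256            /* must be a power of two */
#define DEMO_INVALID   UINT64_MAX     /* never equals a page-aligned address */

typedef enum { DEMO_DATA_LOAD, DEMO_DATA_STORE } DemoAccess;

typedef struct {
    uint64_t addr_read;   /* page address this entry maps for loads  */
    uint64_t addr_write;  /* page address this entry maps for stores */
} DemoTLBEntry;

static DemoTLBEntry demo_tlb[DEMO_TLB_SIZE];

/* Mark every entry as unused. */
void demo_tlb_reset(void)
{
    for (size_t i = 0; i < DEMO_TLB_SIZE; i++) {
        demo_tlb[i].addr_read = DEMO_INVALID;
        demo_tlb[i].addr_write = DEMO_INVALID;
    }
}

/* Same shape as tlb_index(): page number masked by the table size. */
size_t demo_tlb_index(uint64_t addr)
{
    return (addr >> DEMO_PAGE_BITS) & (DEMO_TLB_SIZE - 1);
}

/* Install a mapping for the page containing addr. */
void demo_tlb_fill(uint64_t addr, bool writable)
{
    DemoTLBEntry *e = &demo_tlb[demo_tlb_index(addr)];

    e->addr_read = addr & DEMO_PAGE_MASK;
    e->addr_write = writable ? (addr & DEMO_PAGE_MASK) : DEMO_INVALID;
}

/* The debug check: pick the comparator by access type, then compare pages. */
bool demo_tlb_entry_present(uint64_t addr, DemoAccess access)
{
    const DemoTLBEntry *e = &demo_tlb[demo_tlb_index(addr)];
    uint64_t comparator = (access == DEMO_DATA_LOAD
                           ? e->addr_read
                           : e->addr_write);

    return comparator == (addr & DEMO_PAGE_MASK);
}
```

In this model, two addresses whose page numbers differ only above the
index bits share a slot, which is why the comparator is checked rather
than just the index.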
---
 include/exec/exec-all.h |  9 +
 accel/tcg/cputlb.c  | 14 ++
 target/arm/mte_helper.c | 11 ++-
 target/arm/sve_helper.c | 10 ++
 4 files changed, 27 insertions(+), 17 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index f933c74c446..c5e8e355b7f 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -296,6 +296,15 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
 void tlb_set_page(CPUState *cpu, target_ulong vaddr,
   hwaddr paddr, int prot,
   int mmu_idx, target_ulong size);
+
+/*
+ * Find the iotlbentry for ptr.  This *must* be present in the TLB
+ * because we just found the mapping.
+ */
+void tlb_assert_iotlb_entry_for_ptr_present(CPUArchState *env, int ptr_mmu_idx,
+uint64_t ptr,
+MMUAccessType ptr_access,
+uintptr_t index);
 #else
 static inline void tlb_init(CPUState *cpu)
 {
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 8a7b779270a..a6247da34a0 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -429,6 +429,20 @@ void tlb_flush_all_cpus_synced(CPUState *src_cpu)
 tlb_flush_by_mmuidx_all_cpus_synced(src_cpu, ALL_MMUIDX_BITS);
 }
 
+void tlb_assert_iotlb_entry_for_ptr_present(CPUArchState *env, int ptr_mmu_idx,
+uint64_t ptr,
+MMUAccessType ptr_access,
+uintptr_t index)
+{
+#ifdef CONFIG_DEBUG_TCG
+CPUTLBEntry *entry = tlb_entry(env, ptr_mmu_idx, ptr);
+target_ulong comparator = (ptr_access == MMU_DATA_LOAD
+   ? entry->addr_read
+   : tlb_addr_write(entry));
+g_assert(tlb_hit(comparator, ptr));
+#endif
+}
+
 static bool tlb_hit_page_mask_anyprot(CPUTLBEntry *tlb_entry,
   target_ulong page, target_ulong mask)
 {
diff --git a/target/arm/mte_helper.c b/target/arm/mte_helper.c
index 6cea9d1b506..f47d3b4570e 100644
--- a/target/arm/mte_helper.c
+++ b/target/arm/mte_helper.c
@@ -111,15 +111,8 @@ static uint8_t *allocation_tag_mem(CPUARMState *env, int ptr_mmu_idx,
  * matching tlb entry + iotlb entry.
  */
 index = tlb_index(env, ptr_mmu_idx, ptr);
-# ifdef CONFIG_DEBUG_TCG
-{
-CPUTLBEntry *entry = tlb_entry(env, ptr_mmu_idx, ptr);
-target_ulong comparator = (ptr_access == MMU_DATA_LOAD
-   ? entry->addr_read
-   : tlb_addr_write(entry));
-g_assert(tlb_hit(comparator, ptr));
-}
-# endif
+tlb_assert_iotlb_entry_for_ptr_present(env, ptr_mmu_idx, ptr,
+   ptr_access, index);
 iotlbentry = &env_tlb(env)->d[ptr_mmu_idx].iotlb[index];
 
 /* If the virtual page MemAttr != Tagged, access unchecked. */
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index c8cdf7618eb..a5708da0f2f 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -4089,14 +4089,8 @@ static bool sve_probe_page(SVEHostPage *info, bool nofault,
 {
 uintptr_t index = tlb_index(env, mmu_idx, addr);
 
-# ifdef CONFIG_DEBUG_TCG
-CPUTLBEntry *entry = tlb_entry(env, mmu_idx, addr);
-target_ulong comparator = (access_type == MMU_DATA_LOAD
-   ? entry->addr_read
-   : tlb_addr_write(entry));
-g_assert(tlb_hit(comparator, addr));
-# endif
-
+tlb_assert_iotlb_entry_for_ptr_present(env, mmu_idx, addr,
+   access_type, index);
 CPUIOTLBEntry *iotlbentry = &env_tlb(env)->d[mmu_idx].iotlb[index];
 info->attrs = iotlbentry->attrs;
 }
-- 
2.26.2




Re: [PATCH 6/6] exec/cpu_ldst: Move tlb* declarations to "exec/exec-all.h"

2021-02-07 Thread Philippe Mathieu-Daudé
On 2/7/21 11:57 PM, Philippe Mathieu-Daudé wrote:
> Keep MMU functions in "exec/cpu_ldst.h", and move TLB functions
> to "exec/exec-all.h". As tlb_addr_write() is only called in
> accel/tcg/cputlb.c, move it there as a static function.
> 
> Doing so removes the dependency of "exec/cpu_ldst.h" on "tcg/tcg.h".
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  include/exec/cpu_ldst.h | 52 -
>  include/exec/exec-all.h | 38 ++
>  accel/tcg/cputlb.c  |  9 +++
>  3 files changed, 47 insertions(+), 52 deletions(-)

> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
> index c5e8e355b7f..5024b9abd4a 100644
> --- a/include/exec/exec-all.h
> +++ b/include/exec/exec-all.h
> @@ -297,6 +297,38 @@ void tlb_set_page(CPUState *cpu, target_ulong vaddr,
>hwaddr paddr, int prot,
>int mmu_idx, target_ulong size);
>  
> +/**
> + * tlb_vaddr_to_host:
> + * @env: CPUArchState
> + * @addr: guest virtual address to look up
> + * @access_type: 0 for read, 1 for write, 2 for execute
> + * @mmu_idx: MMU index to use for lookup
> + *
> + * Look up the specified guest virtual index in the TCG softmmu TLB.
> + * If we can translate a host virtual address suitable for direct RAM
> + * access, without causing a guest exception, then return it.
> + * Otherwise (TLB entry is for an I/O access, guest software
> + * TLB fill required, etc) return NULL.
> + */
> +void *tlb_vaddr_to_host(CPUArchState *env, abi_ptr addr,
> +MMUAccessType access_type, int mmu_idx);

This causes a non-TCG build failure because abi_ptr is defined in
"exec/cpu_ldst.h":

  typedef target_ulong abi_ptr;



[RFC PATCH v2 0/6] exec: Remove "tcg/tcg.h" from "exec/cpu_ldst.h"

2021-02-07 Thread Philippe Mathieu-Daudé
Since v1:
- Do not move tlb_vaddr_to_host()

Hi,

I wondered why changing something in "tcg/tcg.h" would trigger
rebuilding the whole tree, and traced it to its inclusion in
"exec/cpu_ldst.h".

By making tlb_addr_write() static to accel/tcg/cputlb.c we can
remove the "tcg/tcg.h" inclusion and reduce the number of objects
to rebuild.
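
For reference, the accessor in question looks roughly like this when
written with C11 atomics. This is a sketch only: DEMO_OVERSIZED_GUEST
stands in for TCG_OVERSIZED_GUEST, the demo_* names are hypothetical,
and QEMU uses its own qatomic_read() rather than <stdatomic.h>.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the configuration macro from "tcg/tcg.h". */
#ifndef DEMO_OVERSIZED_GUEST
#define DEMO_OVERSIZED_GUEST 0
#endif

typedef struct {
#if DEMO_OVERSIZED_GUEST
    uint64_t addr_write;          /* too wide for a host atomic access */
#else
    _Atomic uint64_t addr_write;  /* may be updated by another thread */
#endif
} DemoEntry;

/* Helper for the example: store a write comparator into an entry. */
static void demo_set_addr_write(DemoEntry *entry, uint64_t value)
{
#if DEMO_OVERSIZED_GUEST
    entry->addr_write = value;
#else
    atomic_store_explicit(&entry->addr_write, value, memory_order_relaxed);
#endif
}

/*
 * File-local accessor: being static, it is visible only in this .c file,
 * so the configuration macro it depends on no longer needs to be visible
 * to every user of a shared header.
 */
static uint64_t demo_addr_write(DemoEntry *entry)
{
#if DEMO_OVERSIZED_GUEST
    return entry->addr_write;
#else
    return atomic_load_explicit(&entry->addr_write, memory_order_relaxed);
#endif
}
```

Keeping the accessor in a single source file is what lets the header
stop including the file that defines the configuration macro.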

I added tlb_assert_iotlb_entry_for_ptr_present(), but there is
this comment in target/arm/mte_helper.c which I don't fully
understand (so I have no clue how to address this TODO). I suppose
the fix would be to add a proper implementation, which would make
this ugly tlb_assert_iotlb_entry_for_ptr_present() unnecessary:

 * TODO: Perhaps there should be a cputlb helper that returns a
 * matching tlb entry + iotlb entry.

Regards,

Phil.

Philippe Mathieu-Daudé (6):
  target: Replace tcg_debug_assert() by assert()
  target/m68k: Include missing "tcg/tcg.h" header
  target/mips: Include missing "tcg/tcg.h" header
  accel/tcg: Include missing "tcg/tcg.h" header
  accel/tcg: Refactor debugging tlb_assert_iotlb_entry_for_ptr_present()
  exec/cpu_ldst: Move tlb* declarations to "exec/exec-all.h"

 include/exec/cpu_ldst.h | 28 ---
 include/exec/exec-all.h | 25 +
 target/arm/translate.h  |  4 +--
 accel/tcg/cputlb.c  | 23 
 accel/tcg/tcg-accel-ops-mttcg.c |  1 +
 accel/tcg/tcg-accel-ops-rr.c|  1 +
 target/arm/mte_helper.c | 15 +++
 target/arm/sve_helper.c | 18 +
 target/arm/translate-a64.c  | 12 -
 target/arm/translate-sve.c  |  4 +--
 target/arm/translate.c  | 36 -
 target/hppa/translate.c |  4 +--
 target/m68k/op_helper.c |  1 +
 target/mips/msa_helper.c|  1 +
 target/rx/op_helper.c   |  6 ++---
 target/rx/translate.c   | 14 +-
 target/sh4/translate.c  |  4 +--
 target/riscv/insn_trans/trans_rvv.c.inc |  2 +-
 18 files changed, 105 insertions(+), 94 deletions(-)

-- 
2.26.2




[PATCH v2 3/6] target/mips: Include missing "tcg/tcg.h" header

2021-02-07 Thread Philippe Mathieu-Daudé
Commit 83be6b54123 ("Fix MSA instructions LD. on big endian
host") introduced use of typedef/prototypes declared in "tcg/tcg.h"
without including it. This was not a problem because "tcg/tcg.h" is
pulled in by "exec/cpu_ldst.h". To be able to remove this header
there, we first need to include it here in msa_helper.c, else we get:

  [222/337] Compiling C object libqemu-mips-softmmu.fa.p/target_mips_msa_helper.c.o
  target/mips/msa_helper.c: In function ‘helper_msa_ld_b’:
  target/mips/msa_helper.c:8214:9: error: unknown type name ‘TCGMemOpIdx’
   8214 | TCGMemOpIdx oi = make_memop_idx(MO_TE | DF | MO_UNALN,  \
        | ^~~
  target/mips/msa_helper.c:8224:5: note: in expansion of macro ‘MEMOP_IDX’
   8224 | MEMOP_IDX(DF_BYTE)
        | ^
  target/mips/msa_helper.c:8214:26: error: implicit declaration of function ‘make_memop_idx’ [-Werror=implicit-function-declaration]
   8214 | TCGMemOpIdx oi = make_memop_idx(MO_TE | DF | MO_UNALN,  \
        |  ^~
  target/mips/msa_helper.c:8227:18: error: implicit declaration of function ‘helper_ret_ldub_mmu’ [-Werror=implicit-function-declaration]
   8227 | pwd->b[0]  = helper_ret_ldub_mmu(env, addr + (0  << DF_BYTE), oi, GETPC());
        |  ^~~
  cc1: all warnings being treated as errors

Signed-off-by: Philippe Mathieu-Daudé 
---
 target/mips/msa_helper.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/target/mips/msa_helper.c b/target/mips/msa_helper.c
index 1298a1917ce..4caefe29ad7 100644
--- a/target/mips/msa_helper.c
+++ b/target/mips/msa_helper.c
@@ -20,6 +20,7 @@
 #include "qemu/osdep.h"
 #include "cpu.h"
 #include "internal.h"
+#include "tcg/tcg.h"
 #include "exec/exec-all.h"
 #include "exec/helper-proto.h"
 #include "exec/memop.h"
-- 
2.26.2




[PATCH v2 2/6] target/m68k: Include missing "tcg/tcg.h" header

2021-02-07 Thread Philippe Mathieu-Daudé
Commit 14f944063af ("target-m68k: add cas/cas2 ops") introduced
use of typedef/prototypes declared in "tcg/tcg.h" without including
it. This was not a problem because "tcg/tcg.h" is pulled in by
"exec/cpu_ldst.h". To be able to remove this header there, we
first need to include it here in op_helper.c, else we get:

  [953/1018] Compiling C object libqemu-m68k-softmmu.fa.p/target_m68k_op_helper.c.o
  target/m68k/op_helper.c: In function ‘do_cas2l’:
  target/m68k/op_helper.c:774:5: error: unknown type name ‘TCGMemOpIdx’
    774 | TCGMemOpIdx oi;
        | ^~~
  target/m68k/op_helper.c:787:18: error: implicit declaration of function ‘make_memop_idx’ [-Werror=implicit-function-declaration]
    787 | oi = make_memop_idx(MO_BEQ, mmu_idx);
        |  ^~
  target/m68k/op_helper.c:787:18: error: nested extern declaration of ‘make_memop_idx’ [-Werror=nested-externs]
  target/m68k/op_helper.c:788:17: error: implicit declaration of function ‘helper_atomic_cmpxchgq_be_mmu’; did you mean ‘helper_atomic_cmpxchgq_be’? [-Werror=implicit-function-declaration]
    788 | l = helper_atomic_cmpxchgq_be_mmu(env, a1, c, u, oi, ra);
        | ^
        | helper_atomic_cmpxchgq_be
  target/m68k/op_helper.c:788:17: error: nested extern declaration of ‘helper_atomic_cmpxchgq_be_mmu’ [-Werror=nested-externs]
  cc1: all warnings being treated as errors

Signed-off-by: Philippe Mathieu-Daudé 
---
 target/m68k/op_helper.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/target/m68k/op_helper.c b/target/m68k/op_helper.c
index 202498deb51..36b68fd318f 100644
--- a/target/m68k/op_helper.c
+++ b/target/m68k/op_helper.c
@@ -18,6 +18,7 @@
  */
 #include "qemu/osdep.h"
 #include "cpu.h"
+#include "tcg/tcg.h"
 #include "exec/helper-proto.h"
 #include "exec/exec-all.h"
 #include "exec/cpu_ldst.h"
-- 
2.26.2




[RFC PATCH v2 1/6] target: Replace tcg_debug_assert() by assert()

2021-02-07 Thread Philippe Mathieu-Daudé
Since commit 262a69f4282 ("osdep.h: Prohibit disabling assert()
in supported builds") we cannot build QEMU with assert() disabled.

tcg_debug_assert() does nothing unless QEMU is configured with
--enable-debug-tcg.

Since there is no obvious logic for choosing between
tcg_debug_assert() and assert() in files under target/, simplify
by using plain assert() everywhere. Keep tcg_debug_assert() for
the tcg/ and accel/ directories.

Patch created mechanically using:

  $ sed -i s/tcg_debug_assert/assert/ \
  $(git grep -l tcg_debug_assert target/)

Signed-off-by: Philippe Mathieu-Daudé 
---
If there is such logic, we should document it and include "tcg/tcg.h"
in these files.
---
 target/arm/translate.h  |  4 +--
 target/arm/mte_helper.c |  4 +--
 target/arm/sve_helper.c |  8 +++---
 target/arm/translate-a64.c  | 12 -
 target/arm/translate-sve.c  |  4 +--
 target/arm/translate.c  | 36 -
 target/hppa/translate.c |  4 +--
 target/rx/op_helper.c   |  6 ++---
 target/rx/translate.c   | 14 +-
 target/sh4/translate.c  |  4 +--
 target/riscv/insn_trans/trans_rvv.c.inc |  2 +-
 11 files changed, 49 insertions(+), 49 deletions(-)

diff --git a/target/arm/translate.h b/target/arm/translate.h
index 423b0e08df0..e2ddf87629c 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -220,7 +220,7 @@ static inline void set_pstate_bits(uint32_t bits)
 {
 TCGv_i32 p = tcg_temp_new_i32();
 
-tcg_debug_assert(!(bits & CACHED_PSTATE_BITS));
+assert(!(bits & CACHED_PSTATE_BITS));
 
 tcg_gen_ld_i32(p, cpu_env, offsetof(CPUARMState, pstate));
 tcg_gen_ori_i32(p, p, bits);
@@ -233,7 +233,7 @@ static inline void clear_pstate_bits(uint32_t bits)
 {
 TCGv_i32 p = tcg_temp_new_i32();
 
-tcg_debug_assert(!(bits & CACHED_PSTATE_BITS));
+assert(!(bits & CACHED_PSTATE_BITS));
 
 tcg_gen_ld_i32(p, cpu_env, offsetof(CPUARMState, pstate));
 tcg_gen_andi_i32(p, p, ~bits);
diff --git a/target/arm/mte_helper.c b/target/arm/mte_helper.c
index 153bd1e9df8..6cea9d1b506 100644
--- a/target/arm/mte_helper.c
+++ b/target/arm/mte_helper.c
@@ -166,8 +166,8 @@ static uint8_t *allocation_tag_mem(CPUARMState *env, int ptr_mmu_idx,
  * not set in the cputlb lookup above.
  */
 mr = memory_region_from_host(host, &ptr_ra);
-tcg_debug_assert(mr != NULL);
-tcg_debug_assert(memory_region_is_ram(mr));
+assert(mr != NULL);
+assert(memory_region_is_ram(mr));
 ptr_paddr = ptr_ra;
 do {
 ptr_paddr += mr->addr;
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 844db08bd57..c8cdf7618eb 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -4030,7 +4030,7 @@ static intptr_t find_next_active(uint64_t *vg, intptr_t reg_off,
 reg_off += ctz64(pg);
 
 /* We should never see an out of range predicate bit set.  */
-tcg_debug_assert(reg_off < reg_max);
+assert(reg_off < reg_max);
 return reg_off;
 }
 
@@ -4186,7 +4186,7 @@ static bool sve_cont_ldst_elements(SVEContLdSt *info, target_ulong addr,
 /* No active elements, no pages touched. */
 return false;
 }
-tcg_debug_assert(reg_off_last >= 0 && reg_off_last < reg_max);
+assert(reg_off_last >= 0 && reg_off_last < reg_max);
 
 info->reg_off_first[0] = reg_off_first;
 info->mem_off_first[0] = (reg_off_first >> esz) * msize;
@@ -4235,7 +4235,7 @@ static bool sve_cont_ldst_elements(SVEContLdSt *info, target_ulong addr,
  * this may affect the address reported in an exception.
  */
 reg_off_split = find_next_active(vg, reg_off_split, reg_max, esz);
-tcg_debug_assert(reg_off_split <= reg_off_last);
+assert(reg_off_split <= reg_off_last);
 info->reg_off_first[1] = reg_off_split;
 info->mem_off_first[1] = (reg_off_split >> esz) * msize;
 info->reg_off_last[1] = reg_off_last;
@@ -4794,7 +4794,7 @@ void sve_ldnfff1_r(CPUARMState *env, void *vg, const target_ulong addr,
 /* Probe the page(s). */
 if (!sve_cont_ldst_pages(&info, fault, env, addr, MMU_DATA_LOAD, retaddr)) {
 /* Fault on first element. */
-tcg_debug_assert(fault == FAULT_NO);
+assert(fault == FAULT_NO);
 memset(vd, 0, reg_max);
 goto do_fault;
 }
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index ffc060e5d70..f570506133c 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -144,7 +144,7 @@ static void set_btype(DisasContext *s, int val)
 TCGv_i32 tcg_val;
 
 /* BTYPE is a 2-bit field, and 0 should be done with reset_btype.  */
-tcg_debug_assert(val >= 1 && val <= 3);
+assert(val >= 1 && val <= 3);
 
 tcg_val = tcg_const_i32(val);
 tcg_gen_st_i32(tcg_val, cpu_env, offsetof(CPUARMState, btype));
@@ -10659,7 +10659,7 @@ static void handle_vec_simd_shri(Disas

[PATCH v2 4/6] accel/tcg: Include missing "tcg/tcg.h" header

2021-02-07 Thread Philippe Mathieu-Daudé
Commit 3468b59e18b ("tcg: enable multiple TCG contexts in softmmu")
introduced use of typedef/prototypes declared in "tcg/tcg.h" without
including it. This was not a problem because "tcg/tcg.h" is pulled
in by "exec/cpu_ldst.h". To be able to remove this header there, we
first need to include it in tcg-accel-ops-mttcg.c and
tcg-accel-ops-rr.c, else we get:

  accel/tcg/tcg-accel-ops-mttcg.c: In function ‘mttcg_cpu_thread_fn’:
  accel/tcg/tcg-accel-ops-mttcg.c:52:5: error: implicit declaration of function ‘tcg_register_thread’; did you mean ‘rcu_register_thread’? [-Werror=implicit-function-declaration]
     52 | tcg_register_thread();
        | ^~~
        | rcu_register_thread
  accel/tcg/tcg-accel-ops-mttcg.c:52:5: error: nested extern declaration of ‘tcg_register_thread’ [-Werror=nested-externs]
  cc1: all warnings being treated as errors

  accel/tcg/tcg-accel-ops-rr.c: In function ‘rr_cpu_thread_fn’:
  accel/tcg/tcg-accel-ops-rr.c:153:5: error: implicit declaration of function ‘tcg_register_thread’; did you mean ‘rcu_register_thread’? [-Werror=implicit-function-declaration]
    153 | tcg_register_thread();
        | ^~~
        | rcu_register_thread
  accel/tcg/tcg-accel-ops-rr.c:153:5: error: nested extern declaration of ‘tcg_register_thread’ [-Werror=nested-externs]
  cc1: all warnings being treated as errors

Signed-off-by: Philippe Mathieu-Daudé 
---
 accel/tcg/tcg-accel-ops-mttcg.c | 1 +
 accel/tcg/tcg-accel-ops-rr.c| 1 +
 2 files changed, 2 insertions(+)

diff --git a/accel/tcg/tcg-accel-ops-mttcg.c b/accel/tcg/tcg-accel-ops-mttcg.c
index 42973fb062b..ddbca6c5b8c 100644
--- a/accel/tcg/tcg-accel-ops-mttcg.c
+++ b/accel/tcg/tcg-accel-ops-mttcg.c
@@ -32,6 +32,7 @@
 #include "exec/exec-all.h"
 #include "hw/boards.h"
 
+#include "tcg/tcg.h"
 #include "tcg-accel-ops.h"
 #include "tcg-accel-ops-mttcg.h"
 
diff --git a/accel/tcg/tcg-accel-ops-rr.c b/accel/tcg/tcg-accel-ops-rr.c
index 4a66055e0d7..1bb1d0f8f1c 100644
--- a/accel/tcg/tcg-accel-ops-rr.c
+++ b/accel/tcg/tcg-accel-ops-rr.c
@@ -32,6 +32,7 @@
 #include "exec/exec-all.h"
 #include "hw/boards.h"
 
+#include "tcg/tcg.h"
 #include "tcg-accel-ops.h"
 #include "tcg-accel-ops-rr.h"
 #include "tcg-accel-ops-icount.h"
-- 
2.26.2




[RFC PATCH v2 5/6] accel/tcg: Refactor debugging tlb_assert_iotlb_entry_for_ptr_present()

2021-02-07 Thread Philippe Mathieu-Daudé
Factor the duplicated debug code out into a
tlb_assert_iotlb_entry_for_ptr_present() helper.

Signed-off-by: Philippe Mathieu-Daudé 
---
What this code does is out of my league, but refactoring it allows
keeping tlb_addr_write() local to accel/tcg/cputlb.c in the next
patch.
---
 include/exec/exec-all.h |  9 +
 accel/tcg/cputlb.c  | 14 ++
 target/arm/mte_helper.c | 11 ++-
 target/arm/sve_helper.c | 10 ++
 4 files changed, 27 insertions(+), 17 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index f933c74c446..c5e8e355b7f 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -296,6 +296,15 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
 void tlb_set_page(CPUState *cpu, target_ulong vaddr,
   hwaddr paddr, int prot,
   int mmu_idx, target_ulong size);
+
+/*
+ * Find the iotlbentry for ptr.  This *must* be present in the TLB
+ * because we just found the mapping.
+ */
+void tlb_assert_iotlb_entry_for_ptr_present(CPUArchState *env, int ptr_mmu_idx,
+uint64_t ptr,
+MMUAccessType ptr_access,
+uintptr_t index);
 #else
 static inline void tlb_init(CPUState *cpu)
 {
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 8a7b779270a..a6247da34a0 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -429,6 +429,20 @@ void tlb_flush_all_cpus_synced(CPUState *src_cpu)
 tlb_flush_by_mmuidx_all_cpus_synced(src_cpu, ALL_MMUIDX_BITS);
 }
 
+void tlb_assert_iotlb_entry_for_ptr_present(CPUArchState *env, int ptr_mmu_idx,
+uint64_t ptr,
+MMUAccessType ptr_access,
+uintptr_t index)
+{
+#ifdef CONFIG_DEBUG_TCG
+CPUTLBEntry *entry = tlb_entry(env, ptr_mmu_idx, ptr);
+target_ulong comparator = (ptr_access == MMU_DATA_LOAD
+   ? entry->addr_read
+   : tlb_addr_write(entry));
+g_assert(tlb_hit(comparator, ptr));
+#endif
+}
+
 static bool tlb_hit_page_mask_anyprot(CPUTLBEntry *tlb_entry,
   target_ulong page, target_ulong mask)
 {
diff --git a/target/arm/mte_helper.c b/target/arm/mte_helper.c
index 6cea9d1b506..f47d3b4570e 100644
--- a/target/arm/mte_helper.c
+++ b/target/arm/mte_helper.c
@@ -111,15 +111,8 @@ static uint8_t *allocation_tag_mem(CPUARMState *env, int ptr_mmu_idx,
  * matching tlb entry + iotlb entry.
  */
 index = tlb_index(env, ptr_mmu_idx, ptr);
-# ifdef CONFIG_DEBUG_TCG
-{
-CPUTLBEntry *entry = tlb_entry(env, ptr_mmu_idx, ptr);
-target_ulong comparator = (ptr_access == MMU_DATA_LOAD
-   ? entry->addr_read
-   : tlb_addr_write(entry));
-g_assert(tlb_hit(comparator, ptr));
-}
-# endif
+tlb_assert_iotlb_entry_for_ptr_present(env, ptr_mmu_idx, ptr,
+   ptr_access, index);
 iotlbentry = &env_tlb(env)->d[ptr_mmu_idx].iotlb[index];
 
 /* If the virtual page MemAttr != Tagged, access unchecked. */
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index c8cdf7618eb..a5708da0f2f 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -4089,14 +4089,8 @@ static bool sve_probe_page(SVEHostPage *info, bool nofault,
 {
 uintptr_t index = tlb_index(env, mmu_idx, addr);
 
-# ifdef CONFIG_DEBUG_TCG
-CPUTLBEntry *entry = tlb_entry(env, mmu_idx, addr);
-target_ulong comparator = (access_type == MMU_DATA_LOAD
-   ? entry->addr_read
-   : tlb_addr_write(entry));
-g_assert(tlb_hit(comparator, addr));
-# endif
-
+tlb_assert_iotlb_entry_for_ptr_present(env, mmu_idx, addr,
+   access_type, index);
 CPUIOTLBEntry *iotlbentry = &env_tlb(env)->d[mmu_idx].iotlb[index];
 info->attrs = iotlbentry->attrs;
 }
-- 
2.26.2
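
A minimal sketch of the debug-only assertion pattern this patch factors out, for readers who want to try it outside QEMU. Every name here (DemoTLBEntry, DEMO_DEBUG, demo_entry_matches, ...) is hypothetical, and the page-aligned comparison is a simplification that stands in for tlb_hit(); it is not the QEMU implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PAGE_BITS 12
#define DEMO_PAGE_MASK (~((UINT64_C(1) << DEMO_PAGE_BITS) - 1))

typedef enum { DEMO_DATA_LOAD, DEMO_DATA_STORE } DemoAccessType;

/* Hypothetical, reduced TLB entry: one comparator per access type. */
typedef struct {
    uint64_t addr_read;
    uint64_t addr_write;
} DemoTLBEntry;

/* Pick the comparator for the access type and compare page addresses. */
bool demo_entry_matches(const DemoTLBEntry *entry, uint64_t ptr,
                        DemoAccessType access)
{
    uint64_t comparator = (access == DEMO_DATA_LOAD
                           ? entry->addr_read
                           : entry->addr_write);
    return comparator == (ptr & DEMO_PAGE_MASK);
}

/* Checks only when DEMO_DEBUG is defined; otherwise a no-op. */
void demo_assert_entry_present(const DemoTLBEntry *entry, uint64_t ptr,
                               DemoAccessType access)
{
#ifdef DEMO_DEBUG
    assert(demo_entry_matches(entry, ptr, access));
#else
    (void)entry;
    (void)ptr;
    (void)access;
#endif
}
```

Keeping the #ifdef inside one helper means callers need no conditional compilation of their own, which is the shape of the change to mte_helper.c and sve_helper.c above.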




[PATCH v2 6/6] exec/cpu_ldst: Move tlb* declarations to "exec/exec-all.h"

2021-02-07 Thread Philippe Mathieu-Daudé
Keep MMU functions in "exec/cpu_ldst.h", and move TLB functions
to "exec/exec-all.h". As tlb_addr_write() is only called in
accel/tcg/cputlb.c, move it there as a static function.

Doing so removes the "tcg/tcg.h" dependency from "exec/cpu_ldst.h".

Signed-off-by: Philippe Mathieu-Daudé 
---
 include/exec/cpu_ldst.h | 28 
 include/exec/exec-all.h | 16 
 accel/tcg/cputlb.c  |  9 +
 3 files changed, 25 insertions(+), 28 deletions(-)

diff --git a/include/exec/cpu_ldst.h b/include/exec/cpu_ldst.h
index ef54cb7e1f8..c1753a64dfd 100644
--- a/include/exec/cpu_ldst.h
+++ b/include/exec/cpu_ldst.h
@@ -291,34 +291,6 @@ static inline void cpu_stq_le_mmuidx_ra(CPUArchState *env, abi_ptr addr,
 
 #else
 
-/* Needed for TCG_OVERSIZED_GUEST */
-#include "tcg/tcg.h"
-
-static inline target_ulong tlb_addr_write(const CPUTLBEntry *entry)
-{
-#if TCG_OVERSIZED_GUEST
-return entry->addr_write;
-#else
-return qatomic_read(&entry->addr_write);
-#endif
-}
-
-/* Find the TLB index corresponding to the mmu_idx + address pair.  */
-static inline uintptr_t tlb_index(CPUArchState *env, uintptr_t mmu_idx,
-  target_ulong addr)
-{
-uintptr_t size_mask = env_tlb(env)->f[mmu_idx].mask >> CPU_TLB_ENTRY_BITS;
-
-return (addr >> TARGET_PAGE_BITS) & size_mask;
-}
-
-/* Find the TLB entry corresponding to the mmu_idx + address pair.  */
-static inline CPUTLBEntry *tlb_entry(CPUArchState *env, uintptr_t mmu_idx,
- target_ulong addr)
-{
-return &env_tlb(env)->f[mmu_idx].table[tlb_index(env, mmu_idx, addr)];
-}
-
 uint32_t cpu_ldub_mmuidx_ra(CPUArchState *env, abi_ptr addr,
 int mmu_idx, uintptr_t ra);
 int cpu_ldsb_mmuidx_ra(CPUArchState *env, abi_ptr addr,
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index c5e8e355b7f..8e54b537189 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -297,6 +297,22 @@ void tlb_set_page(CPUState *cpu, target_ulong vaddr,
   hwaddr paddr, int prot,
   int mmu_idx, target_ulong size);
 
+/* Find the TLB index corresponding to the mmu_idx + address pair.  */
+static inline uintptr_t tlb_index(CPUArchState *env, uintptr_t mmu_idx,
+  target_ulong addr)
+{
+uintptr_t size_mask = env_tlb(env)->f[mmu_idx].mask >> CPU_TLB_ENTRY_BITS;
+
+return (addr >> TARGET_PAGE_BITS) & size_mask;
+}
+
+/* Find the TLB entry corresponding to the mmu_idx + address pair.  */
+static inline CPUTLBEntry *tlb_entry(CPUArchState *env, uintptr_t mmu_idx,
+ target_ulong addr)
+{
+return &env_tlb(env)->f[mmu_idx].table[tlb_index(env, mmu_idx, addr)];
+}
+
 /*
  * Find the iotlbentry for ptr.  This *must* be present in the TLB
  * because we just found the mapping.
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index a6247da34a0..084d19b52d7 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -429,6 +429,15 @@ void tlb_flush_all_cpus_synced(CPUState *src_cpu)
 tlb_flush_by_mmuidx_all_cpus_synced(src_cpu, ALL_MMUIDX_BITS);
 }
 
+static inline target_ulong tlb_addr_write(const CPUTLBEntry *entry)
+{
+#if TCG_OVERSIZED_GUEST
+return entry->addr_write;
+#else
+return qatomic_read(&entry->addr_write);
+#endif
+}
+
 void tlb_assert_iotlb_entry_for_ptr_present(CPUArchState *env, int ptr_mmu_idx,
 uint64_t ptr,
 MMUAccessType ptr_access,
-- 
2.26.2
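
For illustration, the index computation done by tlb_index() can be reproduced standalone. This is a sketch under assumptions: DEMO_ENTRY_BITS and DEMO_PAGE_BITS are made-up constants, and the mask is assumed to hold (n_entries - 1) shifted left by the entry-size bits, which is what the shift in tlb_index() suggests.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_ENTRY_BITS 5   /* assumed log2 of the TLB entry size */
#define DEMO_PAGE_BITS  12  /* assumed log2 of the guest page size */

/* Same arithmetic as tlb_index(): page number masked by table size. */
uintptr_t demo_tlb_index(uintptr_t mask, uint64_t addr)
{
    uintptr_t size_mask = mask >> DEMO_ENTRY_BITS;

    return (uintptr_t)(addr >> DEMO_PAGE_BITS) & size_mask;
}
```

With a 256-entry table the mask would be 255 << DEMO_ENTRY_BITS, so an address such as 0x12345678 (page number 0x12345) lands in slot 0x45.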




Re: [PATCH v2] hw/block/nvme: add missing mor/mar constraint checks

2021-02-07 Thread Dmitry Fomichev
On Tue, 2021-01-26 at 13:15 +0100, Klaus Jensen wrote:
> From: Klaus Jensen 
> 
> Firstly, if zoned.max_active is non-zero, zoned.max_open must be less
> than or equal to zoned.max_active.
> 
> Secondly, if only zoned.max_active is set, we have to explicitly set
> zoned.max_open or we end up with an invalid MAR/MOR configuration. This
> is an artifact of the parameters not being zeroes-based like in the
> spec.
> 
> Cc: Dmitry Fomichev 
> Reported-by: Gollu Appalanaidu 
> Signed-off-by: Klaus Jensen 

Reviewed-by: Dmitry Fomichev 

> ---
> 
> v2:
> 
>   * Jumped the gun on removing the check on zoned.max_open. It should
> still be done since the device might only have a constraint on open
> zones, not active.
>   * Instead, added an explicit set of zoned.max_open if only
> zoned.max_active is specified.
> 
>  hw/block/nvme-ns.c | 12 
>  1 file changed, 12 insertions(+)
> 
> diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
> index 62b25cf69bfa..df514287b58f 100644
> --- a/hw/block/nvme-ns.c
> +++ b/hw/block/nvme-ns.c
> @@ -153,6 +153,18 @@ static int nvme_ns_zoned_check_calc_geometry(NvmeNamespace *ns, Error **errp)
>  return -1;
>  }
>  
> 
> 
> 
> +if (ns->params.max_active_zones) {
> +if (ns->params.max_open_zones > ns->params.max_active_zones) {
> +error_setg(errp, "max_open_zones (%u) exceeds max_active_zones (%u)",
> +   ns->params.max_open_zones, ns->params.max_active_zones);
> +return -1;
> +}
> +
> +if (!ns->params.max_open_zones) {
> +ns->params.max_open_zones = ns->params.max_active_zones;
> +}
> +}
> +
>  if (ns->params.zd_extension_size) {
>  if (ns->params.zd_extension_size & 0x3f) {
>  error_setg(errp,
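
The two constraints from the quoted patch can be exercised in isolation. The following is only a sketch: DemoZonedParams and demo_check_zone_limits are hypothetical names, and error reporting is reduced to a return code instead of error_setg().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two namespace parameters involved. */
typedef struct {
    uint32_t max_open_zones;
    uint32_t max_active_zones;
} DemoZonedParams;

/* Returns 0 on success, -1 if max_open exceeds a non-zero max_active. */
int demo_check_zone_limits(DemoZonedParams *p)
{
    if (p->max_active_zones) {
        if (p->max_open_zones > p->max_active_zones) {
            return -1;
        }
        /* Only max_active given: an open zone is also active, so cap it. */
        if (!p->max_open_zones) {
            p->max_open_zones = p->max_active_zones;
        }
    }
    return 0;
}
```

A configuration with only max_active = 4 comes back with max_open = 4, while max_open = 5 with max_active = 4 is rejected. A limit on open zones alone is left untouched.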



[PATCH] hw/block/nvme: fix Close Zone

2021-02-07 Thread Dmitry Fomichev
Implicitly and Explicitly Open zones can be closed by Close Zone
management function. This got broken by a recent commit and now such
commands fail with Invalid Zone State Transition status.

Modify nvme_zrm_close() function to make Close Zone work correctly.

Signed-off-by: Dmitry Fomichev 
Fixes: 053b5a302c3 ("hw/block/nvme: refactor zone resource management")
---
 hw/block/nvme.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 6b84e34843..c2f0c88fbf 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1308,14 +1308,13 @@ static uint16_t nvme_zrm_finish(NvmeNamespace *ns, NvmeZone *zone)
 static uint16_t nvme_zrm_close(NvmeNamespace *ns, NvmeZone *zone)
 {
 switch (nvme_get_zone_state(zone)) {
-case NVME_ZONE_STATE_CLOSED:
-return NVME_SUCCESS;
-
 case NVME_ZONE_STATE_EXPLICITLY_OPEN:
 case NVME_ZONE_STATE_IMPLICITLY_OPEN:
 nvme_aor_dec_open(ns);
 nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_CLOSED);
 /* fall through */
+case NVME_ZONE_STATE_CLOSED:
+return NVME_SUCCESS;
 
 default:
 return NVME_ZONE_INVAL_TRANSITION;
-- 
2.28.0
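
The fall-through ordering is the whole fix, so here is a reduced sketch of it. The enum, struct and status codes below are hypothetical stand-ins for the NVMe ones; only the control flow mirrors the patched switch.

```c
#include <assert.h>

typedef enum {
    DEMO_ZONE_EMPTY,
    DEMO_ZONE_IMPLICITLY_OPEN,
    DEMO_ZONE_EXPLICITLY_OPEN,
    DEMO_ZONE_CLOSED,
    DEMO_ZONE_FULL
} DemoZoneState;

typedef struct {
    DemoZoneState state;
} DemoZone;

enum { DEMO_SUCCESS = 0, DEMO_INVAL_TRANSITION = 1 };

/* Open zones are closed, then share the CLOSED case's success return. */
int demo_zone_close(DemoZone *zone, unsigned *nr_open)
{
    switch (zone->state) {
    case DEMO_ZONE_EXPLICITLY_OPEN:
    case DEMO_ZONE_IMPLICITLY_OPEN:
        (*nr_open)--;
        zone->state = DEMO_ZONE_CLOSED;
        /* fall through */
    case DEMO_ZONE_CLOSED:
        return DEMO_SUCCESS;

    default:
        return DEMO_INVAL_TRANSITION;
    }
}
```

If the CLOSED case sits above the open cases, as before the patch, the fall through lands in default and an otherwise valid close reports an invalid transition.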




Re: [PATCH] scsi: mptsas: dequeue request object in case of an error (CVE-2021-3392)

2021-02-07 Thread Li Qiang
P J P wrote on Tue, Feb 2, 2021 at 9:23 PM:
>
> From: Prasad J Pandit 
>
> While processing SCSI i/o requests in mptsas_process_scsi_io_request(),
> the Megaraid emulator appends new MPTSASRequest object 'req' to
> the 's->pending' queue. In case of an error, this same object gets
> dequeued in mptsas_free_request() only if SCSIRequest object
> 'req->sreq' is initialised. This may lead to a use-after-free issue.
> Unconditionally dequeue 'req' object from 's->pending' to avoid it.
>
> Fixes: CVE-2021-3392
> Buglink: https://bugs.launchpad.net/qemu/+bug/1914236
> Reported-by: Cheolwoo Myung 
> Signed-off-by: Prasad J Pandit 

Reviewed-by: Li Qiang 

> ---
>  hw/scsi/mptsas.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/hw/scsi/mptsas.c b/hw/scsi/mptsas.c
> index f86616544b..adff5b0bf2 100644
> --- a/hw/scsi/mptsas.c
> +++ b/hw/scsi/mptsas.c
> @@ -257,8 +257,8 @@ static void mptsas_free_request(MPTSASRequest *req)
>  req->sreq->hba_private = NULL;
>  scsi_req_unref(req->sreq);
>  req->sreq = NULL;
> -QTAILQ_REMOVE(&s->pending, req, next);
>  }
> +QTAILQ_REMOVE(&s->pending, req, next);
>  qemu_sglist_destroy(&req->qsg);
>  g_free(req);
>  }
> --
> 2.29.2
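
To make the lifetime issue concrete, here is a small sketch of the same pattern. It uses TAILQ from the BSD-style <sys/queue.h> as a stand-in for QEMU's QTAILQ, and DemoRequest and the helper names are hypothetical. The point is only that the dequeue must not depend on whether sreq was ever set.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/queue.h>

typedef struct DemoRequest {
    void *sreq;                      /* stands in for req->sreq, may be NULL */
    TAILQ_ENTRY(DemoRequest) next;
} DemoRequest;

TAILQ_HEAD(DemoPending, DemoRequest);

/* Every new request is queued immediately, before sreq is known good. */
DemoRequest *demo_request_new(struct DemoPending *pending, void *sreq)
{
    DemoRequest *req = calloc(1, sizeof(*req));

    if (req == NULL) {
        return NULL;
    }
    req->sreq = sreq;
    TAILQ_INSERT_TAIL(pending, req, next);
    return req;
}

/* Dequeue unconditionally so the queue never keeps a freed pointer. */
void demo_request_free(struct DemoPending *pending, DemoRequest *req)
{
    if (req->sreq != NULL) {
        req->sreq = NULL;            /* per-sreq cleanup would go here */
    }
    TAILQ_REMOVE(pending, req, next);
    free(req);
}
```

Were the TAILQ_REMOVE inside the if, a request that failed before sreq was set would be freed while still linked, and the next walk of the queue would touch freed memory.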
>
>



[PATCH v3 00/70] TCI fixes and cleanups

2021-02-07 Thread Richard Henderson
Changes since v2:
  * 20-something patches are now upstream.
  * Increase testing timeout for tci.
  * Gitlab testing for tci w/ 32-bit host.


r~


Richard Henderson (70):
  gdbstub: Fix handle_query_xfer_auxv
  tcg: Split out tcg_raise_tb_overflow
  tcg: Manage splitwx in tc_ptr_to_region_tree by hand
  tcg/tci: Merge identical cases in generation
  tcg/tci: Remove tci_read_r8
  tcg/tci: Remove tci_read_r8s
  tcg/tci: Remove tci_read_r16
  tcg/tci: Remove tci_read_r16s
  tcg/tci: Remove tci_read_r32
  tcg/tci: Remove tci_read_r32s
  tcg/tci: Reduce use of tci_read_r64
  tcg/tci: Merge basic arithmetic operations
  tcg/tci: Merge extension operations
  tcg/tci: Remove ifdefs for TCG_TARGET_HAS_ext32[us]_i64
  tcg/tci: Merge bswap operations
  tcg/tci: Merge mov, not and neg operations
  tcg/tci: Rename tci_read_r to tci_read_rval
  tcg/tci: Split out tci_args_rrs
  tcg/tci: Split out tci_args_rr
  tcg/tci: Split out tci_args_rrr
  tcg/tci: Split out tci_args_rrrc
  tcg/tci: Split out tci_args_l
  tcg/tci: Split out tci_args_rc
  tcg/tci: Split out tci_args_rrcl and tci_args_cl
  tcg/tci: Split out tci_args_ri and tci_args_rI
  tcg/tci: Reuse tci_args_l for calls.
  tcg/tci: Reuse tci_args_l for exit_tb
  tcg/tci: Reuse tci_args_l for goto_tb
  tcg/tci: Split out tci_args_rr
  tcg/tci: Split out tci_args_
  tcg/tci: Clean up deposit operations
  tcg/tci: Reduce qemu_ld/st TCGMemOpIdx operand to 32-bits
  tcg/tci: Split out tci_args_{rrm,rrrm,m}
  tcg/tci: Hoist op_size checking into tci_args_*
  tcg/tci: Remove tci_disas
  tcg/tci: Implement the disassembler properly
  tcg: Build ffi data structures for helpers
  tcg/tci: Use ffi for calls
  tcg/tci: Improve tcg_target_call_clobber_regs
  tcg/tci: Move call-return regs to end of tcg_target_reg_alloc_order
  tcg/tci: Push opcode emit into each case
  tcg/tci: Split out tcg_out_op_rrs
  tcg/tci: Split out tcg_out_op_l
  tcg/tci: Split out tcg_out_op_p
  tcg/tci: Split out tcg_out_op_rr
  tcg/tci: Split out tcg_out_op_rrr
  tcg/tci: Split out tcg_out_op_rrrc
  tcg/tci: Split out tcg_out_op_rc
  tcg/tci: Split out tcg_out_op_rrrbb
  tcg/tci: Split out tcg_out_op_rrcl
  tcg/tci: Split out tcg_out_op_rr
  tcg/tci: Split out tcg_out_op_
  tcg/tci: Split out tcg_out_op_cl
  tcg/tci: Split out tcg_out_op_{rrm,rrrm,m}
  tcg/tci: Split out tcg_out_op_v
  tcg/tci: Split out tcg_out_op_np
  tcg/tci: Split out tcg_out_op_r[iI]
  tcg/tci: Reserve r13 for a temporary
  tcg/tci: Emit setcond before brcond
  tcg/tci: Remove tci_write_reg
  tcg/tci: Change encoding to uint32_t units
  tcg/tci: Implement goto_ptr
  tcg/tci: Implement movcond
  tcg/tci: Implement andc, orc, eqv, nand, nor
  tcg/tci: Implement extract, sextract
  tcg/tci: Implement clz, ctz, ctpop
  tcg/tci: Implement mulu2, muls2
  tcg/tci: Implement add2, sub2
  tests/tcg: Increase timeout for TCI
  gitlab: Enable cross-i386 builds of TCI

 configure |3 +
 meson.build   |9 +-
 include/exec/helper-ffi.h |  115 ++
 include/exec/helper-tcg.h |   24 +-
 include/tcg/tcg-opc.h |6 +-
 include/tcg/tcg.h |1 +
 target/hppa/helper.h  |2 +
 target/i386/ops_sse_header.h  |6 +
 target/m68k/helper.h  |1 +
 target/ppc/helper.h   |3 +
 tcg/tci/tcg-target-con-set.h  |2 +-
 tcg/tci/tcg-target.h  |   81 +-
 disas/tci.c   |   61 -
 gdbstub.c |   17 +-
 tcg/tcg.c |  117 +-
 tcg/tci.c | 1536 ++---
 tcg/tci/tcg-target.c.inc  |  926 +-
 .gitlab-ci.d/crossbuilds.yml  |   17 +-
 tcg/tci/README|   20 +-
 .../dockerfiles/fedora-i386-cross.docker  |1 +
 tests/docker/dockerfiles/fedora.docker|1 +
 tests/tcg/Makefile.target |6 +-
 22 files changed, 1727 insertions(+), 1228 deletions(-)
 create mode 100644 include/exec/helper-ffi.h
 delete mode 100644 disas/tci.c

-- 
2.25.1




[PATCH v3 01/70] gdbstub: Fix handle_query_xfer_auxv

2021-02-07 Thread Richard Henderson
The main problem was that we were treating a guest address
as a host address with a mere cast.

Use the correct interface for accessing guest memory.  Do not
allow offset == auxv_len, which would result in an empty packet.

Fixes: 51c623b0de1 ("gdbstub: add support to Xfer:auxv:read: packet")
Signed-off-by: Richard Henderson 
---
 gdbstub.c | 17 -
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/gdbstub.c b/gdbstub.c
index c7ca7e9f88..759bb00bcf 100644
--- a/gdbstub.c
+++ b/gdbstub.c
@@ -2245,7 +2245,6 @@ static void handle_query_xfer_auxv(GdbCmdContext *gdb_ctx, void *user_ctx)
 {
 TaskState *ts;
 unsigned long offset, len, saved_auxv, auxv_len;
-const char *mem;
 
 if (gdb_ctx->num_params < 2) {
 put_packet("E22");
@@ -2257,8 +2256,8 @@ static void handle_query_xfer_auxv(GdbCmdContext *gdb_ctx, void *user_ctx)
 ts = gdbserver_state.c_cpu->opaque;
 saved_auxv = ts->info->saved_auxv;
 auxv_len = ts->info->auxv_len;
-mem = (const char *)(saved_auxv + offset);
-if (offset > auxv_len) {
+
+if (offset >= auxv_len) {
 put_packet("E00");
 return;
 }
@@ -2269,12 +2268,20 @@ static void handle_query_xfer_auxv(GdbCmdContext *gdb_ctx, void *user_ctx)
 
 if (len < auxv_len - offset) {
 g_string_assign(gdbserver_state.str_buf, "m");
-memtox(gdbserver_state.str_buf, mem, len);
 } else {
 g_string_assign(gdbserver_state.str_buf, "l");
-memtox(gdbserver_state.str_buf, mem, auxv_len - offset);
+len = auxv_len - offset;
 }
 
+g_byte_array_set_size(gdbserver_state.mem_buf, len);
+if (target_memory_rw_debug(gdbserver_state.g_cpu, saved_auxv + offset,
+   gdbserver_state.mem_buf->data, len, false)) {
+put_packet("E14");
+return;
+}
+
+memtox(gdbserver_state.str_buf,
+   (const char *)gdbserver_state.mem_buf->data, len);
 put_packet_binary(gdbserver_state.str_buf->str,
   gdbserver_state.str_buf->len, true);
 }
-- 
2.25.1
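
The offset and length handling in this patch can be looked at separately from the gdbstub plumbing. This is a sketch with hypothetical names (DemoChunk, demo_auxv_chunk); it leaves out the packet-size clamp that sits in the part of the function not shown in the hunks, and it does no memory access at all.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int error;      /* non-zero when offset is at or past the end */
    char prefix;    /* 'm': more data follows, 'l': last chunk */
    size_t len;     /* number of bytes to send in this reply */
} DemoChunk;

/* Decide reply prefix and length for a read of len bytes at offset. */
DemoChunk demo_auxv_chunk(size_t offset, size_t len, size_t auxv_len)
{
    DemoChunk c = { 0, 'l', 0 };

    if (offset >= auxv_len) {
        c.error = 1;
        return c;
    }
    if (len < auxv_len - offset) {
        c.prefix = 'm';
        c.len = len;
    } else {
        c.prefix = 'l';
        c.len = auxv_len - offset;
    }
    return c;
}
```

With auxv_len = 64, a request at offset 64 is now an error rather than an empty 'l' packet, and a request for 100 bytes at offset 60 is trimmed to the remaining 4.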




[PATCH v3 06/70] tcg/tci: Remove tci_read_r8s

2021-02-07 Thread Richard Henderson
Use explicit casts for ext8s opcodes.

Signed-off-by: Richard Henderson 
---
 tcg/tci.c | 25 -
 1 file changed, 4 insertions(+), 21 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index c44a4aec7b..25db479e62 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -57,13 +57,6 @@ static tcg_target_ulong tci_read_reg(const tcg_target_ulong *regs, TCGReg index)
 return regs[index];
 }
 
-#if TCG_TARGET_HAS_ext8s_i32 || TCG_TARGET_HAS_ext8s_i64
-static int8_t tci_read_reg8s(const tcg_target_ulong *regs, TCGReg index)
-{
-return (int8_t)tci_read_reg(regs, index);
-}
-#endif
-
 #if TCG_TARGET_HAS_ext16s_i32 || TCG_TARGET_HAS_ext16s_i64
 static int16_t tci_read_reg16s(const tcg_target_ulong *regs, TCGReg index)
 {
@@ -164,16 +157,6 @@ tci_read_r(const tcg_target_ulong *regs, const uint8_t **tb_ptr)
 return value;
 }
 
-#if TCG_TARGET_HAS_ext8s_i32 || TCG_TARGET_HAS_ext8s_i64
-/* Read indexed register (8 bit signed) from bytecode. */
-static int8_t tci_read_r8s(const tcg_target_ulong *regs, const uint8_t **tb_ptr)
-{
-int8_t value = tci_read_reg8s(regs, **tb_ptr);
-*tb_ptr += 1;
-return value;
-}
-#endif
-
 /* Read indexed register (16 bit) from bytecode. */
 static uint16_t tci_read_r16(const tcg_target_ulong *regs,
  const uint8_t **tb_ptr)
@@ -712,8 +695,8 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 #if TCG_TARGET_HAS_ext8s_i32
 case INDEX_op_ext8s_i32:
 t0 = *tb_ptr++;
-t1 = tci_read_r8s(regs, &tb_ptr);
-tci_write_reg(regs, t0, t1);
+t1 = tci_read_r(regs, &tb_ptr);
+tci_write_reg(regs, t0, (int8_t)t1);
 break;
 #endif
 #if TCG_TARGET_HAS_ext16s_i32
@@ -927,8 +910,8 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 #if TCG_TARGET_HAS_ext8s_i64
 case INDEX_op_ext8s_i64:
 t0 = *tb_ptr++;
-t1 = tci_read_r8s(regs, &tb_ptr);
-tci_write_reg(regs, t0, t1);
+t1 = tci_read_r(regs, &tb_ptr);
+tci_write_reg(regs, t0, (int8_t)t1);
 break;
 #endif
 #if TCG_TARGET_HAS_ext16s_i64
-- 
2.25.1
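
As a quick illustration of why a cast is enough for the ext* opcodes, the sketch below performs the extensions on a 64-bit register value. The function names are hypothetical. Narrowing an out-of-range value to a signed type is implementation-defined in C before C23; the sketch assumes the usual two's-complement wrap that GCC and Clang document.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low 8 bits into a 64-bit register value. */
uint64_t demo_ext8s(uint64_t v)
{
    return (uint64_t)(int64_t)(int8_t)v;
}

/* Zero-extend the low 16 bits. */
uint64_t demo_ext16u(uint64_t v)
{
    return (uint16_t)v;
}

/* Sign-extend the low 32 bits. */
uint64_t demo_ext32s(uint64_t v)
{
    return (uint64_t)(int64_t)(int32_t)v;
}
```

The narrowing cast discards the high bits and the conversion back to the register type widens with the right signedness, so no dedicated narrow-read helper is needed.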




[PATCH v3 07/70] tcg/tci: Remove tci_read_r16

2021-02-07 Thread Richard Henderson
Use explicit casts for ext16u opcodes, and allow truncation
to happen with the store for st16 opcodes, and with the call
for bswap16 opcodes.

Signed-off-by: Richard Henderson 
---
 tcg/tci.c | 28 +++-
 1 file changed, 7 insertions(+), 21 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 25db479e62..547be0c2f0 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -71,11 +71,6 @@ static int32_t tci_read_reg32s(const tcg_target_ulong *regs, TCGReg index)
 }
 #endif
 
-static uint16_t tci_read_reg16(const tcg_target_ulong *regs, TCGReg index)
-{
-return (uint16_t)tci_read_reg(regs, index);
-}
-
 static uint32_t tci_read_reg32(const tcg_target_ulong *regs, TCGReg index)
 {
 return (uint32_t)tci_read_reg(regs, index);
@@ -157,15 +152,6 @@ tci_read_r(const tcg_target_ulong *regs, const uint8_t **tb_ptr)
 return value;
 }
 
-/* Read indexed register (16 bit) from bytecode. */
-static uint16_t tci_read_r16(const tcg_target_ulong *regs,
- const uint8_t **tb_ptr)
-{
-uint16_t value = tci_read_reg16(regs, **tb_ptr);
-*tb_ptr += 1;
-return value;
-}
-
 #if TCG_TARGET_HAS_ext16s_i32 || TCG_TARGET_HAS_ext16s_i64
 /* Read indexed register (16 bit signed) from bytecode. */
 static int16_t tci_read_r16s(const tcg_target_ulong *regs,
@@ -526,7 +512,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 *(uint8_t *)(t1 + t2) = t0;
 break;
 CASE_32_64(st16)
-t0 = tci_read_r16(regs, &tb_ptr);
+t0 = tci_read_r(regs, &tb_ptr);
 t1 = tci_read_r(regs, &tb_ptr);
 t2 = tci_read_s32(&tb_ptr);
 *(uint16_t *)(t1 + t2) = t0;
@@ -716,14 +702,14 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 #if TCG_TARGET_HAS_ext16u_i32
 case INDEX_op_ext16u_i32:
 t0 = *tb_ptr++;
-t1 = tci_read_r16(regs, &tb_ptr);
-tci_write_reg(regs, t0, t1);
+t1 = tci_read_r(regs, &tb_ptr);
+tci_write_reg(regs, t0, (uint16_t)t1);
 break;
 #endif
 #if TCG_TARGET_HAS_bswap16_i32
 case INDEX_op_bswap16_i32:
 t0 = *tb_ptr++;
-t1 = tci_read_r16(regs, &tb_ptr);
+t1 = tci_read_r(regs, &tb_ptr);
 tci_write_reg(regs, t0, bswap16(t1));
 break;
 #endif
@@ -924,8 +910,8 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 #if TCG_TARGET_HAS_ext16u_i64
 case INDEX_op_ext16u_i64:
 t0 = *tb_ptr++;
-t1 = tci_read_r16(regs, &tb_ptr);
-tci_write_reg(regs, t0, t1);
+t1 = tci_read_r(regs, &tb_ptr);
+tci_write_reg(regs, t0, (uint16_t)t1);
 break;
 #endif
 #if TCG_TARGET_HAS_ext32s_i64
@@ -947,7 +933,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 #if TCG_TARGET_HAS_bswap16_i64
 case INDEX_op_bswap16_i64:
 t0 = *tb_ptr++;
-t1 = tci_read_r16(regs, &tb_ptr);
+t1 = tci_read_r(regs, &tb_ptr);
 tci_write_reg(regs, t0, bswap16(t1));
 break;
 #endif
-- 
2.25.1
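
The commit message relies on truncation happening implicitly at the store and at the call. A small sketch of both, with hypothetical helper names and memcpy used in place of the pointer cast so the example stays alignment-safe:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Storing through a 16-bit object keeps only the low 16 bits. */
void demo_st16(void *base, int32_t offset, uint64_t value)
{
    uint16_t narrowed = (uint16_t)value;

    memcpy((char *)base + offset, &narrowed, sizeof(narrowed));
}

/* A byte swap that takes a 16-bit parameter. */
uint16_t demo_bswap16(uint16_t x)
{
    return (uint16_t)((x << 8) | (x >> 8));
}

/* Passing a wide value truncates it at the call boundary. */
uint16_t demo_bswap16_of(uint64_t reg)
{
    return demo_bswap16(reg);
}
```

Both conversions are to unsigned types, so the result is well defined: the value is reduced modulo 2^16 without an explicit 16-bit read helper.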




[PATCH v3 02/70] tcg: Split out tcg_raise_tb_overflow

2021-02-07 Thread Richard Henderson
Allow other places in tcg to restart with a smaller tb.

Signed-off-by: Richard Henderson 
---
 tcg/tcg.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index 63a12b197b..bbe3dcee03 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -346,6 +346,12 @@ static void set_jmp_reset_offset(TCGContext *s, int which)
 s->tb_jmp_reset_offset[which] = tcg_current_code_size(s);
 }
 
+/* Signal overflow, starting over with fewer guest insns. */
+static void QEMU_NORETURN tcg_raise_tb_overflow(TCGContext *s)
+{
+siglongjmp(s->jmp_trans, -2);
+}
+
 #define C_PFX1(P, A)P##A
 #define C_PFX2(P, A, B) P##A##_##B
 #define C_PFX3(P, A, B, C)  P##A##_##B##_##C
@@ -1310,8 +1316,7 @@ static TCGTemp *tcg_temp_alloc(TCGContext *s)
 int n = s->nb_temps++;
 
 if (n >= TCG_MAX_TEMPS) {
-/* Signal overflow, starting over with fewer guest insns. */
-siglongjmp(s->jmp_trans, -2);
+tcg_raise_tb_overflow(s);
 }
 return memset(&s->temps[n], 0, sizeof(TCGTemp));
 }
-- 
2.25.1
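
For readers unfamiliar with the restart mechanism, the sketch below shows a sigsetjmp/siglongjmp retry loop in the same spirit. It is an assumption-laden toy: DemoContext, DEMO_MAX_TEMPS and the halving policy are made up for the example, and only the -2 value and the idea of "starting over with fewer guest insns" come from the patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>

#define DEMO_MAX_TEMPS 4

typedef struct {
    sigjmp_buf jmp_trans;
    int nb_temps;
} DemoContext;

/* Abandon the current attempt and jump back to the restart point. */
static void demo_raise_overflow(DemoContext *s)
{
    siglongjmp(s->jmp_trans, -2);
}

/* Allocate one temp, or signal overflow when the pool is exhausted. */
static void demo_temp_alloc(DemoContext *s)
{
    if (s->nb_temps >= DEMO_MAX_TEMPS) {
        demo_raise_overflow(s);
    }
    s->nb_temps++;
}

/* Try to "translate" max_insns insns, halving the count on overflow. */
int demo_translate(DemoContext *s, int max_insns)
{
    /* volatile: modified after sigsetjmp and read after siglongjmp */
    volatile int insns = max_insns;

    if (sigsetjmp(s->jmp_trans, 0) != 0) {
        insns = insns / 2;
    }
    s->nb_temps = 0;
    for (int i = 0; i < insns; i++) {
        demo_temp_alloc(s);          /* one temp per insn in this toy */
    }
    return insns;
}
```

Having the jump in one small function lets any later overflow site request the restart without knowing about the jump buffer or the magic return value.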




[PATCH v3 03/70] tcg: Manage splitwx in tc_ptr_to_region_tree by hand

2021-02-07 Thread Richard Henderson
The use in tcg_tb_lookup is given a random pc that comes from the pc
of a signal handler.  Do not assert that the pointer is already within
the code gen buffer at all, much less the writable mirror of it.

Fixes: db0c51a3803
Signed-off-by: Richard Henderson 
---

For TCI, this indicates a bug in handle_cpu_signal, in that we
are taking PC from the host signal frame.  Which is, nearly,
unrelated to TCI at all.

The TCI "pc" is tci_tb_ptr (fixed in the next patch to at least
be thread-local).  We update this only on calls, since we don't
expect SEGV during the interpretation loop.  Which works ok for
softmmu, in which we pass down pc by hand to the helpers, but
is not ok for user-only, where we simply perform the raw memory
operation.

I don't know how to fix this, exactly.  Probably by storing to
tci_tb_ptr before each qemu_ld/qemu_st operation, with barriers.
Then Doing the Right Thing in handle_cpu_signal.  And perhaps
by clearing tci_tb_ptr whenever we're not expecting a SEGV on
behalf of the guest (and thus anything left is a qemu host bug).

---
v2: Retain full struct initialization
---
 tcg/tcg.c | 20 ++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index bbe3dcee03..2991112829 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -513,11 +513,21 @@ static void tcg_region_trees_init(void)
 }
 }
 
-static struct tcg_region_tree *tc_ptr_to_region_tree(const void *cp)
+static struct tcg_region_tree *tc_ptr_to_region_tree(const void *p)
 {
-void *p = tcg_splitwx_to_rw(cp);
 size_t region_idx;
 
+/*
+ * Like tcg_splitwx_to_rw, with no assert.  The pc may come from
+ * a signal handler over which the caller has no control.
+ */
+if (!in_code_gen_buffer(p)) {
+p -= tcg_splitwx_diff;
+if (!in_code_gen_buffer(p)) {
+return NULL;
+}
+}
+
 if (p < region.start_aligned) {
 region_idx = 0;
 } else {
@@ -536,6 +546,7 @@ void tcg_tb_insert(TranslationBlock *tb)
 {
 struct tcg_region_tree *rt = tc_ptr_to_region_tree(tb->tc.ptr);
 
+g_assert(rt != NULL);
 qemu_mutex_lock(&rt->lock);
 g_tree_insert(rt->tree, &tb->tc, tb);
 qemu_mutex_unlock(&rt->lock);
@@ -545,6 +556,7 @@ void tcg_tb_remove(TranslationBlock *tb)
 {
 struct tcg_region_tree *rt = tc_ptr_to_region_tree(tb->tc.ptr);
 
+g_assert(rt != NULL);
 qemu_mutex_lock(&rt->lock);
 g_tree_remove(rt->tree, &tb->tc);
 qemu_mutex_unlock(&rt->lock);
@@ -561,6 +573,10 @@ TranslationBlock *tcg_tb_lookup(uintptr_t tc_ptr)
 TranslationBlock *tb;
 struct tb_tc s = { .ptr = (void *)tc_ptr };
 
+if (rt == NULL) {
+return NULL;
+}
+
 qemu_mutex_lock(&rt->lock);
 tb = g_tree_lookup(rt->tree, &s);
 qemu_mutex_unlock(&rt->lock);
-- 
2.25.1
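
The "like tcg_splitwx_to_rw, with no assert" lookup can be sketched with plain integers. DemoBuffer and its fields are hypothetical, addresses are handled as uintptr_t to keep the arithmetic well defined, and the range test uses a strict bound, which may differ from in_code_gen_buffer().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uintptr_t rw_start;      /* start of the writable mapping */
    size_t size;             /* size of the code buffer */
    uintptr_t splitwx_diff;  /* executable address minus writable address */
} DemoBuffer;

/* Unsigned wrap-around turns the two-sided range test into one compare. */
static int demo_in_buffer(const DemoBuffer *b, uintptr_t p)
{
    return p - b->rw_start < b->size;
}

/* Returns 1 and the writable alias of p, or 0 if p is unrelated. */
int demo_to_rw(const DemoBuffer *b, uintptr_t p, uintptr_t *rw)
{
    if (!demo_in_buffer(b, p)) {
        p -= b->splitwx_diff;        /* maybe it was an executable address */
        if (!demo_in_buffer(b, p)) {
            return 0;                /* arbitrary pc, e.g. from a signal */
        }
    }
    *rw = p;
    return 1;
}
```

Returning a failure value for an unknown address is what lets the lookup path bail out quietly, while insert and remove can still assert because they only ever see addresses they produced themselves.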




[PATCH v3 14/70] tcg/tci: Remove ifdefs for TCG_TARGET_HAS_ext32[us]_i64

2021-02-07 Thread Richard Henderson
These operations are always available under different names:
INDEX_op_ext_i32_i64 and INDEX_op_extu_i32_i64, so we remove
no code with the ifdef.

Signed-off-by: Richard Henderson 
---
 tcg/tci.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index cdfd9b7af8..1819652c5a 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -796,17 +796,13 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 continue;
 }
 break;
-#if TCG_TARGET_HAS_ext32s_i64
 case INDEX_op_ext32s_i64:
-#endif
 case INDEX_op_ext_i32_i64:
 t0 = *tb_ptr++;
 t1 = tci_read_r(regs, &tb_ptr);
 tci_write_reg(regs, t0, (int32_t)t1);
 break;
-#if TCG_TARGET_HAS_ext32u_i64
 case INDEX_op_ext32u_i64:
-#endif
 case INDEX_op_extu_i32_i64:
 t0 = *tb_ptr++;
 t1 = tci_read_r(regs, &tb_ptr);
-- 
2.25.1




[PATCH v3 08/70] tcg/tci: Remove tci_read_r16s

2021-02-07 Thread Richard Henderson
Use explicit casts for ext16s opcodes.

Signed-off-by: Richard Henderson 
---
 tcg/tci.c | 26 --
 1 file changed, 4 insertions(+), 22 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 547be0c2f0..d2bfcb3c93 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -57,13 +57,6 @@ static tcg_target_ulong tci_read_reg(const tcg_target_ulong *regs, TCGReg index)
 return regs[index];
 }
 
-#if TCG_TARGET_HAS_ext16s_i32 || TCG_TARGET_HAS_ext16s_i64
-static int16_t tci_read_reg16s(const tcg_target_ulong *regs, TCGReg index)
-{
-return (int16_t)tci_read_reg(regs, index);
-}
-#endif
-
 #if TCG_TARGET_REG_BITS == 64
 static int32_t tci_read_reg32s(const tcg_target_ulong *regs, TCGReg index)
 {
@@ -152,17 +145,6 @@ tci_read_r(const tcg_target_ulong *regs, const uint8_t **tb_ptr)
 return value;
 }
 
-#if TCG_TARGET_HAS_ext16s_i32 || TCG_TARGET_HAS_ext16s_i64
-/* Read indexed register (16 bit signed) from bytecode. */
-static int16_t tci_read_r16s(const tcg_target_ulong *regs,
- const uint8_t **tb_ptr)
-{
-int16_t value = tci_read_reg16s(regs, **tb_ptr);
-*tb_ptr += 1;
-return value;
-}
-#endif
-
 /* Read indexed register (32 bit) from bytecode. */
 static uint32_t tci_read_r32(const tcg_target_ulong *regs,
  const uint8_t **tb_ptr)
@@ -688,8 +670,8 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 #if TCG_TARGET_HAS_ext16s_i32
 case INDEX_op_ext16s_i32:
 t0 = *tb_ptr++;
-t1 = tci_read_r16s(regs, &tb_ptr);
-tci_write_reg(regs, t0, t1);
+t1 = tci_read_r(regs, &tb_ptr);
+tci_write_reg(regs, t0, (int16_t)t1);
 break;
 #endif
 #if TCG_TARGET_HAS_ext8u_i32
@@ -903,8 +885,8 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 #if TCG_TARGET_HAS_ext16s_i64
 case INDEX_op_ext16s_i64:
 t0 = *tb_ptr++;
-t1 = tci_read_r16s(regs, &tb_ptr);
-tci_write_reg(regs, t0, t1);
+t1 = tci_read_r(regs, &tb_ptr);
+tci_write_reg(regs, t0, (int16_t)t1);
 break;
 #endif
 #if TCG_TARGET_HAS_ext16u_i64
-- 
2.25.1




[PATCH v3 04/70] tcg/tci: Merge identical cases in generation

2021-02-07 Thread Richard Henderson
Use CASE_32_64 and CASE_64 to reduce ifdefs and merge
cases that are identical between 32-bit and 64-bit hosts.

Signed-off-by: Richard Henderson 
---
 tcg/tci/tcg-target.c.inc | 204 ++-
 1 file changed, 73 insertions(+), 131 deletions(-)

diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index feac4659cc..c79f9c32d8 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -380,6 +380,18 @@ static inline void tcg_out_call(TCGContext *s, const tcg_insn_unit *arg)
 old_code_ptr[1] = s->code_ptr - old_code_ptr;
 }
 
+#if TCG_TARGET_REG_BITS == 64
+# define CASE_32_64(x) \
+case glue(glue(INDEX_op_, x), _i64): \
+case glue(glue(INDEX_op_, x), _i32):
+# define CASE_64(x) \
+case glue(glue(INDEX_op_, x), _i64):
+#else
+# define CASE_32_64(x) \
+case glue(glue(INDEX_op_, x), _i32):
+# define CASE_64(x)
+#endif
+
 static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
const int *const_args)
 {
@@ -391,6 +403,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
 case INDEX_op_exit_tb:
 tcg_out64(s, args[0]);
 break;
+
 case INDEX_op_goto_tb:
 if (s->tb_jmp_insn_offset) {
 /* Direct jump method. */
@@ -404,15 +417,18 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
 }
 set_jmp_reset_offset(s, args[0]);
 break;
+
 case INDEX_op_br:
 tci_out_label(s, arg_label(args[0]));
 break;
-case INDEX_op_setcond_i32:
+
+CASE_32_64(setcond)
 tcg_out_r(s, args[0]);
 tcg_out_r(s, args[1]);
 tcg_out_r(s, args[2]);
 tcg_out8(s, args[3]);   /* condition */
 break;
+
 #if TCG_TARGET_REG_BITS == 32
 case INDEX_op_setcond2_i32:
 /* setcond2_i32 cond, t0, t1_low, t1_high, t2_low, t2_high */
@@ -423,60 +439,54 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
 tcg_out_r(s, args[4]);
 tcg_out8(s, args[5]);   /* condition */
 break;
-#elif TCG_TARGET_REG_BITS == 64
-case INDEX_op_setcond_i64:
-tcg_out_r(s, args[0]);
-tcg_out_r(s, args[1]);
-tcg_out_r(s, args[2]);
-tcg_out8(s, args[3]);   /* condition */
-break;
 #endif
-case INDEX_op_ld8u_i32:
-case INDEX_op_ld8s_i32:
-case INDEX_op_ld16u_i32:
-case INDEX_op_ld16s_i32:
+
+CASE_32_64(ld8u)
+CASE_32_64(ld8s)
+CASE_32_64(ld16u)
+CASE_32_64(ld16s)
 case INDEX_op_ld_i32:
-case INDEX_op_st8_i32:
-case INDEX_op_st16_i32:
+CASE_64(ld32u)
+CASE_64(ld32s)
+CASE_64(ld)
+CASE_32_64(st8)
+CASE_32_64(st16)
 case INDEX_op_st_i32:
-case INDEX_op_ld8u_i64:
-case INDEX_op_ld8s_i64:
-case INDEX_op_ld16u_i64:
-case INDEX_op_ld16s_i64:
-case INDEX_op_ld32u_i64:
-case INDEX_op_ld32s_i64:
-case INDEX_op_ld_i64:
-case INDEX_op_st8_i64:
-case INDEX_op_st16_i64:
-case INDEX_op_st32_i64:
-case INDEX_op_st_i64:
+CASE_64(st32)
+CASE_64(st)
 stack_bounds_check(args[1], args[2]);
 tcg_out_r(s, args[0]);
 tcg_out_r(s, args[1]);
 tcg_debug_assert(args[2] == (int32_t)args[2]);
 tcg_out32(s, args[2]);
 break;
-case INDEX_op_add_i32:
-case INDEX_op_sub_i32:
-case INDEX_op_mul_i32:
-case INDEX_op_and_i32:
-case INDEX_op_andc_i32: /* Optional (TCG_TARGET_HAS_andc_i32). */
-case INDEX_op_eqv_i32:  /* Optional (TCG_TARGET_HAS_eqv_i32). */
-case INDEX_op_nand_i32: /* Optional (TCG_TARGET_HAS_nand_i32). */
-case INDEX_op_nor_i32:  /* Optional (TCG_TARGET_HAS_nor_i32). */
-case INDEX_op_or_i32:
-case INDEX_op_orc_i32:  /* Optional (TCG_TARGET_HAS_orc_i32). */
-case INDEX_op_xor_i32:
-case INDEX_op_shl_i32:
-case INDEX_op_shr_i32:
-case INDEX_op_sar_i32:
-case INDEX_op_rotl_i32: /* Optional (TCG_TARGET_HAS_rot_i32). */
-case INDEX_op_rotr_i32: /* Optional (TCG_TARGET_HAS_rot_i32). */
+
+CASE_32_64(add)
+CASE_32_64(sub)
+CASE_32_64(mul)
+CASE_32_64(and)
+CASE_32_64(or)
+CASE_32_64(xor)
+CASE_32_64(andc) /* Optional (TCG_TARGET_HAS_andc_*). */
+CASE_32_64(orc)  /* Optional (TCG_TARGET_HAS_orc_*). */
+CASE_32_64(eqv)  /* Optional (TCG_TARGET_HAS_eqv_*). */
+CASE_32_64(nand) /* Optional (TCG_TARGET_HAS_nand_*). */
+CASE_32_64(nor)  /* Optional (TCG_TARGET_HAS_nor_*). */
+CASE_32_64(shl)
+CASE_32_64(shr)
+CASE_32_64(sar)
+CASE_32_64(rotl) /* Optional (TCG_TARGET_HAS_rot_*). */
+CASE_32_64(rotr) /* Optional (TCG_TARGET_HAS_rot_*). */
+CASE_32_64(div)  /* Optional (TCG_TARGET_HAS_div_*). */
+CASE_32_64(divu) /* Optional (TCG_TARGET_HAS_div_*). */
+CASE_32_64(rem)  /* Optional (TCG_TARGET_HAS_div_*). */
+CASE_32_64(remu) /* Optional

[PATCH v3 10/70] tcg/tci: Remove tci_read_r32s

2021-02-07 Thread Richard Henderson
Use explicit casts for ext32s opcodes.

Signed-off-by: Richard Henderson 
---
 tcg/tci.c | 20 ++--
 1 file changed, 2 insertions(+), 18 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 72ec63e18e..9c8395397a 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -57,13 +57,6 @@ static tcg_target_ulong tci_read_reg(const tcg_target_ulong *regs, TCGReg index)
 return regs[index];
 }
 
-#if TCG_TARGET_REG_BITS == 64
-static int32_t tci_read_reg32s(const tcg_target_ulong *regs, TCGReg index)
-{
-return (int32_t)tci_read_reg(regs, index);
-}
-#endif
-
 #if TCG_TARGET_REG_BITS == 64
 static uint64_t tci_read_reg64(const tcg_target_ulong *regs, TCGReg index)
 {
@@ -149,15 +142,6 @@ static uint64_t tci_read_r64(const tcg_target_ulong *regs,
 return tci_uint64(tci_read_r(regs, tb_ptr), low);
 }
 #elif TCG_TARGET_REG_BITS == 64
-/* Read indexed register (32 bit signed) from bytecode. */
-static int32_t tci_read_r32s(const tcg_target_ulong *regs,
- const uint8_t **tb_ptr)
-{
-int32_t value = tci_read_reg32s(regs, **tb_ptr);
-*tb_ptr += 1;
-return value;
-}
-
 /* Read indexed register (64 bit) from bytecode. */
 static uint64_t tci_read_r64(const tcg_target_ulong *regs,
  const uint8_t **tb_ptr)
@@ -887,8 +871,8 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 #endif
 case INDEX_op_ext_i32_i64:
 t0 = *tb_ptr++;
-t1 = tci_read_r32s(regs, &tb_ptr);
-tci_write_reg(regs, t0, t1);
+t1 = tci_read_r(regs, &tb_ptr);
+tci_write_reg(regs, t0, (int32_t)t1);
 break;
 #if TCG_TARGET_HAS_ext32u_i64
 case INDEX_op_ext32u_i64:
-- 
2.25.1




[PATCH v3 15/70] tcg/tci: Merge bswap operations

2021-02-07 Thread Richard Henderson
This includes bswap16 and bswap32.

Signed-off-by: Richard Henderson 
---
 tcg/tci.c | 22 --
 1 file changed, 4 insertions(+), 18 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 1819652c5a..c979215332 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -652,15 +652,15 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 tci_write_reg(regs, t0, (uint16_t)t1);
 break;
 #endif
-#if TCG_TARGET_HAS_bswap16_i32
-case INDEX_op_bswap16_i32:
+#if TCG_TARGET_HAS_bswap16_i32 || TCG_TARGET_HAS_bswap16_i64
+CASE_32_64(bswap16)
 t0 = *tb_ptr++;
 t1 = tci_read_r(regs, &tb_ptr);
 tci_write_reg(regs, t0, bswap16(t1));
 break;
 #endif
-#if TCG_TARGET_HAS_bswap32_i32
-case INDEX_op_bswap32_i32:
+#if TCG_TARGET_HAS_bswap32_i32 || TCG_TARGET_HAS_bswap32_i64
+CASE_32_64(bswap32)
 t0 = *tb_ptr++;
 t1 = tci_read_r(regs, &tb_ptr);
 tci_write_reg(regs, t0, bswap32(t1));
@@ -808,20 +808,6 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 t1 = tci_read_r(regs, &tb_ptr);
 tci_write_reg(regs, t0, (uint32_t)t1);
 break;
-#if TCG_TARGET_HAS_bswap16_i64
-case INDEX_op_bswap16_i64:
-t0 = *tb_ptr++;
-t1 = tci_read_r(regs, &tb_ptr);
-tci_write_reg(regs, t0, bswap16(t1));
-break;
-#endif
-#if TCG_TARGET_HAS_bswap32_i64
-case INDEX_op_bswap32_i64:
-t0 = *tb_ptr++;
-t1 = tci_read_r(regs, &tb_ptr);
-tci_write_reg(regs, t0, bswap32(t1));
-break;
-#endif
 #if TCG_TARGET_HAS_bswap64_i64
 case INDEX_op_bswap64_i64:
 t0 = *tb_ptr++;
-- 
2.25.1




[PATCH v3 09/70] tcg/tci: Remove tci_read_r32

2021-02-07 Thread Richard Henderson
Use explicit casts for ext32u opcodes, and allow truncation
to happen for other users.

Signed-off-by: Richard Henderson 
---
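A small sketch of why division and remainder need the explicit casts once
the narrowing read is gone, assuming a 64-bit host.  reg_t is a
hypothetical stand-in for tcg_target_ulong, and the three helpers exist
only for this note.

```c
#include <stdint.h>

typedef uint64_t reg_t;   /* hypothetical stand-in for tcg_target_ulong */

/* With explicit casts only the low 32 bits of each register take part. */
static reg_t divu_i32(reg_t a, reg_t b)
{
    return (uint32_t)a / (uint32_t)b;
}

static reg_t remu_i32(reg_t a, reg_t b)
{
    return (uint32_t)a % (uint32_t)b;
}

/* Without casts, stale high bits in the registers change the quotient. */
static reg_t divu_uncast(reg_t a, reg_t b)
{
    return a / b;
}
```

With a = 0xdeadbeef00000064 and b = 0x0badf00d00000005, divu_i32 gives
100 / 5 = 20, whereas divu_uncast divides the full 64-bit values and
gives a different result.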
 tcg/tci.c | 122 --
 1 file changed, 54 insertions(+), 68 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index d2bfcb3c93..72ec63e18e 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -64,11 +64,6 @@ static int32_t tci_read_reg32s(const tcg_target_ulong *regs, TCGReg index)
 }
 #endif
 
-static uint32_t tci_read_reg32(const tcg_target_ulong *regs, TCGReg index)
-{
-return (uint32_t)tci_read_reg(regs, index);
-}
-
 #if TCG_TARGET_REG_BITS == 64
 static uint64_t tci_read_reg64(const tcg_target_ulong *regs, TCGReg index)
 {
@@ -145,22 +140,13 @@ tci_read_r(const tcg_target_ulong *regs, const uint8_t **tb_ptr)
 return value;
 }
 
-/* Read indexed register (32 bit) from bytecode. */
-static uint32_t tci_read_r32(const tcg_target_ulong *regs,
- const uint8_t **tb_ptr)
-{
-uint32_t value = tci_read_reg32(regs, **tb_ptr);
-*tb_ptr += 1;
-return value;
-}
-
 #if TCG_TARGET_REG_BITS == 32
 /* Read two indexed registers (2 * 32 bit) from bytecode. */
 static uint64_t tci_read_r64(const tcg_target_ulong *regs,
  const uint8_t **tb_ptr)
 {
-uint32_t low = tci_read_r32(regs, tb_ptr);
-return tci_uint64(tci_read_r32(regs, tb_ptr), low);
+uint32_t low = tci_read_r(regs, tb_ptr);
+return tci_uint64(tci_read_r(regs, tb_ptr), low);
 }
 #elif TCG_TARGET_REG_BITS == 64
 /* Read indexed register (32 bit signed) from bytecode. */
@@ -421,8 +407,8 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 continue;
 case INDEX_op_setcond_i32:
 t0 = *tb_ptr++;
-t1 = tci_read_r32(regs, &tb_ptr);
-t2 = tci_read_r32(regs, &tb_ptr);
+t1 = tci_read_r(regs, &tb_ptr);
+t2 = tci_read_r(regs, &tb_ptr);
 condition = *tb_ptr++;
 tci_write_reg(regs, t0, tci_compare32(t1, t2, condition));
 break;
@@ -445,7 +431,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 #endif
 case INDEX_op_mov_i32:
 t0 = *tb_ptr++;
-t1 = tci_read_r32(regs, &tb_ptr);
+t1 = tci_read_r(regs, &tb_ptr);
 tci_write_reg(regs, t0, t1);
 break;
 case INDEX_op_tci_movi_i32:
@@ -501,7 +487,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 break;
 case INDEX_op_st_i32:
 CASE_64(st32)
-t0 = tci_read_r32(regs, &tb_ptr);
+t0 = tci_read_r(regs, &tb_ptr);
 t1 = tci_read_r(regs, &tb_ptr);
 t2 = tci_read_s32(&tb_ptr);
 *(uint32_t *)(t1 + t2) = t0;
@@ -511,62 +497,62 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 
 case INDEX_op_add_i32:
 t0 = *tb_ptr++;
-t1 = tci_read_r32(regs, &tb_ptr);
-t2 = tci_read_r32(regs, &tb_ptr);
+t1 = tci_read_r(regs, &tb_ptr);
+t2 = tci_read_r(regs, &tb_ptr);
 tci_write_reg(regs, t0, t1 + t2);
 break;
 case INDEX_op_sub_i32:
 t0 = *tb_ptr++;
-t1 = tci_read_r32(regs, &tb_ptr);
-t2 = tci_read_r32(regs, &tb_ptr);
+t1 = tci_read_r(regs, &tb_ptr);
+t2 = tci_read_r(regs, &tb_ptr);
 tci_write_reg(regs, t0, t1 - t2);
 break;
 case INDEX_op_mul_i32:
 t0 = *tb_ptr++;
-t1 = tci_read_r32(regs, &tb_ptr);
-t2 = tci_read_r32(regs, &tb_ptr);
+t1 = tci_read_r(regs, &tb_ptr);
+t2 = tci_read_r(regs, &tb_ptr);
 tci_write_reg(regs, t0, t1 * t2);
 break;
 case INDEX_op_div_i32:
 t0 = *tb_ptr++;
-t1 = tci_read_r32(regs, &tb_ptr);
-t2 = tci_read_r32(regs, &tb_ptr);
+t1 = tci_read_r(regs, &tb_ptr);
+t2 = tci_read_r(regs, &tb_ptr);
 tci_write_reg(regs, t0, (int32_t)t1 / (int32_t)t2);
 break;
 case INDEX_op_divu_i32:
 t0 = *tb_ptr++;
-t1 = tci_read_r32(regs, &tb_ptr);
-t2 = tci_read_r32(regs, &tb_ptr);
-tci_write_reg(regs, t0, t1 / t2);
+t1 = tci_read_r(regs, &tb_ptr);
+t2 = tci_read_r(regs, &tb_ptr);
+tci_write_reg(regs, t0, (uint32_t)t1 / (uint32_t)t2);
 break;
 case INDEX_op_rem_i32:
 t0 = *tb_ptr++;
-t1 = tci_read_r32(regs, &tb_ptr);
-t2 = tci_read_r32(regs, &tb_ptr);
+t1 = tci_read_r(regs, &tb_ptr);
+t2 = tci_read_r(regs, &tb_ptr);
 tci_write_reg(regs, t0, (int32_t)t1 % (int32_t)t2);
 break;
 case INDEX_op_remu_i32:
 t0 = *tb_ptr++;
-t1 = tci_read_r32(regs, &tb_ptr);
-t2 = tci_rea

[PATCH v3 05/70] tcg/tci: Remove tci_read_r8

2021-02-07 Thread Richard Henderson
Use explicit casts for ext8u opcodes, and allow truncation
to happen with the store for st8 opcodes.

Signed-off-by: Richard Henderson 
---
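A short sketch of "truncation with the store", assuming a 64-bit host.
reg_t and st8 are hypothetical names for this note; the point is only
that a store through a uint8_t pointer keeps the low byte, so the value
does not have to be narrowed when it is read from the register.

```c
#include <stdint.h>

typedef uint64_t reg_t;   /* hypothetical stand-in for tcg_target_ulong */

/* Store the low byte of value at base + ofs; the assignment truncates. */
static void st8(uintptr_t base, int32_t ofs, reg_t value)
{
    *(uint8_t *)(base + ofs) = value;
}
```

Storing 0x1234 this way writes 0x34 and leaves the neighbouring bytes
untouched.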
 tcg/tci.c | 23 +--
 1 file changed, 5 insertions(+), 18 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index fb3c97aaf1..c44a4aec7b 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -78,11 +78,6 @@ static int32_t tci_read_reg32s(const tcg_target_ulong *regs, TCGReg index)
 }
 #endif
 
-static uint8_t tci_read_reg8(const tcg_target_ulong *regs, TCGReg index)
-{
-return (uint8_t)tci_read_reg(regs, index);
-}
-
 static uint16_t tci_read_reg16(const tcg_target_ulong *regs, TCGReg index)
 {
 return (uint16_t)tci_read_reg(regs, index);
@@ -169,14 +164,6 @@ tci_read_r(const tcg_target_ulong *regs, const uint8_t **tb_ptr)
 return value;
 }
 
-/* Read indexed register (8 bit) from bytecode. */
-static uint8_t tci_read_r8(const tcg_target_ulong *regs, const uint8_t **tb_ptr)
-{
-uint8_t value = tci_read_reg8(regs, **tb_ptr);
-*tb_ptr += 1;
-return value;
-}
-
 #if TCG_TARGET_HAS_ext8s_i32 || TCG_TARGET_HAS_ext8s_i64
 /* Read indexed register (8 bit signed) from bytecode. */
 static int8_t tci_read_r8s(const tcg_target_ulong *regs, const uint8_t **tb_ptr)
@@ -550,7 +537,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 tci_write_reg(regs, t0, *(uint32_t *)(t1 + t2));
 break;
 CASE_32_64(st8)
-t0 = tci_read_r8(regs, &tb_ptr);
+t0 = tci_read_r(regs, &tb_ptr);
 t1 = tci_read_r(regs, &tb_ptr);
 t2 = tci_read_s32(&tb_ptr);
 *(uint8_t *)(t1 + t2) = t0;
@@ -739,8 +726,8 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 #if TCG_TARGET_HAS_ext8u_i32
 case INDEX_op_ext8u_i32:
 t0 = *tb_ptr++;
-t1 = tci_read_r8(regs, &tb_ptr);
-tci_write_reg(regs, t0, t1);
+t1 = tci_read_r(regs, &tb_ptr);
+tci_write_reg(regs, t0, (uint8_t)t1);
 break;
 #endif
 #if TCG_TARGET_HAS_ext16u_i32
@@ -933,8 +920,8 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 #if TCG_TARGET_HAS_ext8u_i64
 case INDEX_op_ext8u_i64:
 t0 = *tb_ptr++;
-t1 = tci_read_r8(regs, &tb_ptr);
-tci_write_reg(regs, t0, t1);
+t1 = tci_read_r(regs, &tb_ptr);
+tci_write_reg(regs, t0, (uint8_t)t1);
 break;
 #endif
 #if TCG_TARGET_HAS_ext8s_i64
-- 
2.25.1




[PATCH v3 12/70] tcg/tci: Merge basic arithmetic operations

2021-02-07 Thread Richard Henderson
This includes add, sub, mul, and, or, xor.

Signed-off-by: Richard Henderson 
---
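A sketch of why these six operations can plausibly share one body for
both widths, assuming a 64-bit host: the low 32 bits of the result depend
only on the low 32 bits of the inputs.  reg_t, the op_* helpers, and
low32_agrees are hypothetical names introduced for this note.

```c
#include <stdint.h>

typedef uint64_t reg_t;   /* hypothetical stand-in for tcg_target_ulong */

static reg_t op_add(reg_t a, reg_t b) { return a + b; }
static reg_t op_sub(reg_t a, reg_t b) { return a - b; }
static reg_t op_mul(reg_t a, reg_t b) { return a * b; }
static reg_t op_xor(reg_t a, reg_t b) { return a ^ b; }

/*
 * Run op on clean 32-bit inputs and on the same inputs with arbitrary
 * garbage in the upper halves, then compare the low 32 bits.
 */
static int low32_agrees(reg_t (*op)(reg_t, reg_t), uint32_t a, uint32_t b)
{
    reg_t dirty_a = 0xdeadbeef00000000ull | a;
    reg_t dirty_b = 0x0badf00d00000000ull | b;

    return (uint32_t)op(dirty_a, dirty_b) == (uint32_t)op(a, b);
}
```

Division and remainder do not have this property, which is presumably
why they stay in the width-specific sections with explicit casts.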
 tcg/tci.c | 83 +--
 1 file changed, 25 insertions(+), 58 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 0246e663a3..894e87e1b0 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -468,26 +468,47 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 *(uint32_t *)(t1 + t2) = t0;
 break;
 
-/* Arithmetic operations (32 bit). */
+/* Arithmetic operations (mixed 32/64 bit). */
 
-case INDEX_op_add_i32:
+CASE_32_64(add)
 t0 = *tb_ptr++;
 t1 = tci_read_r(regs, &tb_ptr);
 t2 = tci_read_r(regs, &tb_ptr);
 tci_write_reg(regs, t0, t1 + t2);
 break;
-case INDEX_op_sub_i32:
+CASE_32_64(sub)
 t0 = *tb_ptr++;
 t1 = tci_read_r(regs, &tb_ptr);
 t2 = tci_read_r(regs, &tb_ptr);
 tci_write_reg(regs, t0, t1 - t2);
 break;
-case INDEX_op_mul_i32:
+CASE_32_64(mul)
 t0 = *tb_ptr++;
 t1 = tci_read_r(regs, &tb_ptr);
 t2 = tci_read_r(regs, &tb_ptr);
 tci_write_reg(regs, t0, t1 * t2);
 break;
+CASE_32_64(and)
+t0 = *tb_ptr++;
+t1 = tci_read_r(regs, &tb_ptr);
+t2 = tci_read_r(regs, &tb_ptr);
+tci_write_reg(regs, t0, t1 & t2);
+break;
+CASE_32_64(or)
+t0 = *tb_ptr++;
+t1 = tci_read_r(regs, &tb_ptr);
+t2 = tci_read_r(regs, &tb_ptr);
+tci_write_reg(regs, t0, t1 | t2);
+break;
+CASE_32_64(xor)
+t0 = *tb_ptr++;
+t1 = tci_read_r(regs, &tb_ptr);
+t2 = tci_read_r(regs, &tb_ptr);
+tci_write_reg(regs, t0, t1 ^ t2);
+break;
+
+/* Arithmetic operations (32 bit). */
+
 case INDEX_op_div_i32:
 t0 = *tb_ptr++;
 t1 = tci_read_r(regs, &tb_ptr);
@@ -512,24 +533,6 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 t2 = tci_read_r(regs, &tb_ptr);
 tci_write_reg(regs, t0, (uint32_t)t1 % (uint32_t)t2);
 break;
-case INDEX_op_and_i32:
-t0 = *tb_ptr++;
-t1 = tci_read_r(regs, &tb_ptr);
-t2 = tci_read_r(regs, &tb_ptr);
-tci_write_reg(regs, t0, t1 & t2);
-break;
-case INDEX_op_or_i32:
-t0 = *tb_ptr++;
-t1 = tci_read_r(regs, &tb_ptr);
-t2 = tci_read_r(regs, &tb_ptr);
-tci_write_reg(regs, t0, t1 | t2);
-break;
-case INDEX_op_xor_i32:
-t0 = *tb_ptr++;
-t1 = tci_read_r(regs, &tb_ptr);
-t2 = tci_read_r(regs, &tb_ptr);
-tci_write_reg(regs, t0, t1 ^ t2);
-break;
 
 /* Shift/rotate operations (32 bit). */
 
@@ -712,24 +715,6 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 
 /* Arithmetic operations (64 bit). */
 
-case INDEX_op_add_i64:
-t0 = *tb_ptr++;
-t1 = tci_read_r(regs, &tb_ptr);
-t2 = tci_read_r(regs, &tb_ptr);
-tci_write_reg(regs, t0, t1 + t2);
-break;
-case INDEX_op_sub_i64:
-t0 = *tb_ptr++;
-t1 = tci_read_r(regs, &tb_ptr);
-t2 = tci_read_r(regs, &tb_ptr);
-tci_write_reg(regs, t0, t1 - t2);
-break;
-case INDEX_op_mul_i64:
-t0 = *tb_ptr++;
-t1 = tci_read_r(regs, &tb_ptr);
-t2 = tci_read_r(regs, &tb_ptr);
-tci_write_reg(regs, t0, t1 * t2);
-break;
 case INDEX_op_div_i64:
 t0 = *tb_ptr++;
 t1 = tci_read_r(regs, &tb_ptr);
@@ -754,24 +739,6 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 t2 = tci_read_r(regs, &tb_ptr);
 tci_write_reg(regs, t0, (uint64_t)t1 % (uint64_t)t2);
 break;
-case INDEX_op_and_i64:
-t0 = *tb_ptr++;
-t1 = tci_read_r(regs, &tb_ptr);
-t2 = tci_read_r(regs, &tb_ptr);
-tci_write_reg(regs, t0, t1 & t2);
-break;
-case INDEX_op_or_i64:
-t0 = *tb_ptr++;
-t1 = tci_read_r(regs, &tb_ptr);
-t2 = tci_read_r(regs, &tb_ptr);
-tci_write_reg(regs, t0, t1 | t2);
-break;
-case INDEX_op_xor_i64:
-t0 = *tb_ptr++;
-t1 = tci_read_r(regs, &tb_ptr);
-t2 = tci_read_r(regs, &tb_ptr);
-tci_write_reg(regs, t0, t1 ^ t2);
-break;
 
 /* Shift/rotate operations (64 bit). */
 
-- 
2.25.1




[PATCH v3 18/70] tcg/tci: Split out tci_args_rrs

2021-02-07 Thread Richard Henderson
Begin splitting out functions that do pure argument decode,
without actually loading values from the register set.

This means that decoding need not distinguish between
input and output registers.  We can assert that the register
number is in range during decode, so that it is safe to
simply dereference from regs[] later.

Signed-off-by: Richard Henderson 
---
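A self-contained sketch of the decode-then-execute split described above.
NB_REGS, read_b, read_r, read_s32, args_rrs and exec_ld8u are hypothetical
names for this note, and the bytecode layout (two register bytes followed
by a 32-bit host-order offset) is an assumption rather than a statement
about the real encoding.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NB_REGS 16   /* hypothetical register count */

/* Read one constant byte and advance the bytecode pointer. */
static uint8_t read_b(const uint8_t **p)
{
    return *(*p)++;
}

/* Read a register number; range-check once, here, during decode. */
static uint8_t read_r(const uint8_t **p)
{
    uint8_t regno = read_b(p);
    assert(regno < NB_REGS);
    return regno;
}

/* Read a signed 32-bit offset in host byte order. */
static int32_t read_s32(const uint8_t **p)
{
    int32_t v;
    memcpy(&v, *p, sizeof(v));
    *p += sizeof(v);
    return v;
}

/* Pure argument decode: no access to the register file. */
static void args_rrs(const uint8_t **p, uint8_t *r0, uint8_t *r1, int32_t *ofs)
{
    *r0 = read_r(p);
    *r1 = read_r(p);
    *ofs = read_s32(p);
}

/* ld8u: decode first, then index regs[] directly. */
static const uint8_t *exec_ld8u(const uint8_t *p, uintptr_t *regs)
{
    uint8_t r0, r1;
    int32_t ofs;

    args_rrs(&p, &r0, &r1, &ofs);
    regs[r0] = *(uint8_t *)(regs[r1] + ofs);
    return p;
}
```

Because the decoder treats r0 and r1 identically, the same helper can
serve a store, where r0 is an input, as well as a load, where it is the
output.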
 tcg/tci.c | 111 --
 1 file changed, 67 insertions(+), 44 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 20aaaca959..be298ae39d 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -83,6 +83,20 @@ static uint64_t tci_uint64(uint32_t high, uint32_t low)
 }
 #endif
 
+/* Read constant byte from bytecode. */
+static uint8_t tci_read_b(const uint8_t **tb_ptr)
+{
+return *(tb_ptr[0]++);
+}
+
+/* Read register number from bytecode. */
+static TCGReg tci_read_r(const uint8_t **tb_ptr)
+{
+uint8_t regno = tci_read_b(tb_ptr);
+tci_assert(regno < TCG_TARGET_NB_REGS);
+return regno;
+}
+
 /* Read constant (native size) from bytecode. */
 static tcg_target_ulong tci_read_i(const uint8_t **tb_ptr)
 {
@@ -161,6 +175,23 @@ static tcg_target_ulong tci_read_label(const uint8_t **tb_ptr)
 return label;
 }
 
+/*
+ * Load sets of arguments all at once.  The naming convention is:
+ *   tci_args_
+ * where arguments is a sequence of
+ *
+ *   r = register
+ *   s = signed ldst offset
+ */
+
+static void tci_args_rrs(const uint8_t **tb_ptr,
+ TCGReg *r0, TCGReg *r1, int32_t *i2)
+{
+*r0 = tci_read_r(tb_ptr);
+*r1 = tci_read_r(tb_ptr);
+*i2 = tci_read_s32(tb_ptr);
+}
+
 static bool tci_compare32(uint32_t u0, uint32_t u1, TCGCond condition)
 {
 bool result = false;
@@ -328,6 +359,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 uint8_t op_size = tb_ptr[1];
 const uint8_t *old_code_ptr = tb_ptr;
 #endif
+TCGReg r0, r1;
 tcg_target_ulong t0;
 tcg_target_ulong t1;
 tcg_target_ulong t2;
@@ -342,6 +374,8 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 uint64_t v64;
 #endif
 TCGMemOpIdx oi;
+int32_t ofs;
+void *ptr;
 
 /* Skip opcode and size entry. */
 tb_ptr += 2;
@@ -418,54 +452,46 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 /* Load/store operations (32 bit). */
 
 CASE_32_64(ld8u)
-t0 = *tb_ptr++;
-t1 = tci_read_rval(regs, &tb_ptr);
-t2 = tci_read_s32(&tb_ptr);
-tci_write_reg(regs, t0, *(uint8_t *)(t1 + t2));
+tci_args_rrs(&tb_ptr, &r0, &r1, &ofs);
+ptr = (void *)(regs[r1] + ofs);
+regs[r0] = *(uint8_t *)ptr;
 break;
 CASE_32_64(ld8s)
-t0 = *tb_ptr++;
-t1 = tci_read_rval(regs, &tb_ptr);
-t2 = tci_read_s32(&tb_ptr);
-tci_write_reg(regs, t0, *(int8_t *)(t1 + t2));
+tci_args_rrs(&tb_ptr, &r0, &r1, &ofs);
+ptr = (void *)(regs[r1] + ofs);
+regs[r0] = *(int8_t *)ptr;
 break;
 CASE_32_64(ld16u)
-t0 = *tb_ptr++;
-t1 = tci_read_rval(regs, &tb_ptr);
-t2 = tci_read_s32(&tb_ptr);
-tci_write_reg(regs, t0, *(uint16_t *)(t1 + t2));
+tci_args_rrs(&tb_ptr, &r0, &r1, &ofs);
+ptr = (void *)(regs[r1] + ofs);
+regs[r0] = *(uint16_t *)ptr;
 break;
 CASE_32_64(ld16s)
-t0 = *tb_ptr++;
-t1 = tci_read_rval(regs, &tb_ptr);
-t2 = tci_read_s32(&tb_ptr);
-tci_write_reg(regs, t0, *(int16_t *)(t1 + t2));
+tci_args_rrs(&tb_ptr, &r0, &r1, &ofs);
+ptr = (void *)(regs[r1] + ofs);
+regs[r0] = *(int16_t *)ptr;
 break;
 case INDEX_op_ld_i32:
 CASE_64(ld32u)
-t0 = *tb_ptr++;
-t1 = tci_read_rval(regs, &tb_ptr);
-t2 = tci_read_s32(&tb_ptr);
-tci_write_reg(regs, t0, *(uint32_t *)(t1 + t2));
+tci_args_rrs(&tb_ptr, &r0, &r1, &ofs);
+ptr = (void *)(regs[r1] + ofs);
+regs[r0] = *(uint32_t *)ptr;
 break;
 CASE_32_64(st8)
-t0 = tci_read_rval(regs, &tb_ptr);
-t1 = tci_read_rval(regs, &tb_ptr);
-t2 = tci_read_s32(&tb_ptr);
-*(uint8_t *)(t1 + t2) = t0;
+tci_args_rrs(&tb_ptr, &r0, &r1, &ofs);
+ptr = (void *)(regs[r1] + ofs);
+*(uint8_t *)ptr = regs[r0];
 break;
 CASE_32_64(st16)
-t0 = tci_read_rval(regs, &tb_ptr);
-t1 = tci_read_rval(regs, &tb_ptr);
-t2 = tci_read_s32(&tb_ptr);
-*(uint16_t *)(t1 + t2) = t0;
+tci_args_rrs(&tb_ptr, &r0, &r1, &ofs);
+ptr = (void *)(regs[r1] + ofs);
+*(uint16_t *)ptr = regs[r0];
 break;
 

[PATCH v3 13/70] tcg/tci: Merge extension operations

2021-02-07 Thread Richard Henderson
This includes ext8s, ext8u, ext16s, ext16u.

Signed-off-by: Richard Henderson 
---
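A sketch of the cast-based extensions on a 64-bit register, with reg_t as
a hypothetical stand-in for tcg_target_ulong.  The signed variants rely on
narrowing to a signed type wrapping modulo 2^N, which is
implementation-defined before C23 but is what common compilers such as
GCC and Clang do.

```c
#include <stdint.h>

typedef uint64_t reg_t;   /* hypothetical stand-in for tcg_target_ulong */

/* Narrow with a cast, then let the return conversion widen again. */
static reg_t ext8s(reg_t v)  { return (int8_t)v; }    /* sign-extends */
static reg_t ext16s(reg_t v) { return (int16_t)v; }   /* sign-extends */
static reg_t ext8u(reg_t v)  { return (uint8_t)v; }   /* zero-extends */
static reg_t ext16u(reg_t v) { return (uint16_t)v; }  /* zero-extends */
```

Each helper extends to the full register width, so the result is the
same whether the opcode was the _i32 or the _i64 variant, as long as
32-bit consumers look only at the low half.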
 tcg/tci.c | 44 
 1 file changed, 8 insertions(+), 36 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 894e87e1b0..cdfd9b7af8 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -624,29 +624,29 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 tci_write_reg64(regs, t1, t0, (uint32_t)t2 * tmp64);
 break;
 #endif /* TCG_TARGET_REG_BITS == 32 */
-#if TCG_TARGET_HAS_ext8s_i32
-case INDEX_op_ext8s_i32:
+#if TCG_TARGET_HAS_ext8s_i32 || TCG_TARGET_HAS_ext8s_i64
+CASE_32_64(ext8s)
 t0 = *tb_ptr++;
 t1 = tci_read_r(regs, &tb_ptr);
 tci_write_reg(regs, t0, (int8_t)t1);
 break;
 #endif
-#if TCG_TARGET_HAS_ext16s_i32
-case INDEX_op_ext16s_i32:
+#if TCG_TARGET_HAS_ext16s_i32 || TCG_TARGET_HAS_ext16s_i64
+CASE_32_64(ext16s)
 t0 = *tb_ptr++;
 t1 = tci_read_r(regs, &tb_ptr);
 tci_write_reg(regs, t0, (int16_t)t1);
 break;
 #endif
-#if TCG_TARGET_HAS_ext8u_i32
-case INDEX_op_ext8u_i32:
+#if TCG_TARGET_HAS_ext8u_i32 || TCG_TARGET_HAS_ext8u_i64
+CASE_32_64(ext8u)
 t0 = *tb_ptr++;
 t1 = tci_read_r(regs, &tb_ptr);
 tci_write_reg(regs, t0, (uint8_t)t1);
 break;
 #endif
-#if TCG_TARGET_HAS_ext16u_i32
-case INDEX_op_ext16u_i32:
+#if TCG_TARGET_HAS_ext16u_i32 || TCG_TARGET_HAS_ext16u_i64
+CASE_32_64(ext16u)
 t0 = *tb_ptr++;
 t1 = tci_read_r(regs, &tb_ptr);
 tci_write_reg(regs, t0, (uint16_t)t1);
@@ -796,34 +796,6 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 continue;
 }
 break;
-#if TCG_TARGET_HAS_ext8u_i64
-case INDEX_op_ext8u_i64:
-t0 = *tb_ptr++;
-t1 = tci_read_r(regs, &tb_ptr);
-tci_write_reg(regs, t0, (uint8_t)t1);
-break;
-#endif
-#if TCG_TARGET_HAS_ext8s_i64
-case INDEX_op_ext8s_i64:
-t0 = *tb_ptr++;
-t1 = tci_read_r(regs, &tb_ptr);
-tci_write_reg(regs, t0, (int8_t)t1);
-break;
-#endif
-#if TCG_TARGET_HAS_ext16s_i64
-case INDEX_op_ext16s_i64:
-t0 = *tb_ptr++;
-t1 = tci_read_r(regs, &tb_ptr);
-tci_write_reg(regs, t0, (int16_t)t1);
-break;
-#endif
-#if TCG_TARGET_HAS_ext16u_i64
-case INDEX_op_ext16u_i64:
-t0 = *tb_ptr++;
-t1 = tci_read_r(regs, &tb_ptr);
-tci_write_reg(regs, t0, (uint16_t)t1);
-break;
-#endif
 #if TCG_TARGET_HAS_ext32s_i64
 case INDEX_op_ext32s_i64:
 #endif
-- 
2.25.1




[PATCH v3 11/70] tcg/tci: Reduce use of tci_read_r64

2021-02-07 Thread Richard Henderson
In all cases restricted to 64-bit hosts, tci_read_r is
identical.  We retain the 64-bit symbol for the single
case of INDEX_op_qemu_st_i64.

Signed-off-by: Richard Henderson 
---
 tcg/tci.c | 93 +--
 1 file changed, 42 insertions(+), 51 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 9c8395397a..0246e663a3 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -57,13 +57,6 @@ static tcg_target_ulong tci_read_reg(const tcg_target_ulong *regs, TCGReg index)
 return regs[index];
 }
 
-#if TCG_TARGET_REG_BITS == 64
-static uint64_t tci_read_reg64(const tcg_target_ulong *regs, TCGReg index)
-{
-return tci_read_reg(regs, index);
-}
-#endif
-
 static void
 tci_write_reg(tcg_target_ulong *regs, TCGReg index, tcg_target_ulong value)
 {
@@ -146,9 +139,7 @@ static uint64_t tci_read_r64(const tcg_target_ulong *regs,
 static uint64_t tci_read_r64(const tcg_target_ulong *regs,
  const uint8_t **tb_ptr)
 {
-uint64_t value = tci_read_reg64(regs, **tb_ptr);
-*tb_ptr += 1;
-return value;
+return tci_read_r(regs, tb_ptr);
 }
 #endif
 
@@ -407,8 +398,8 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 #elif TCG_TARGET_REG_BITS == 64
 case INDEX_op_setcond_i64:
 t0 = *tb_ptr++;
-t1 = tci_read_r64(regs, &tb_ptr);
-t2 = tci_read_r64(regs, &tb_ptr);
+t1 = tci_read_r(regs, &tb_ptr);
+t2 = tci_read_r(regs, &tb_ptr);
 condition = *tb_ptr++;
 tci_write_reg(regs, t0, tci_compare64(t1, t2, condition));
 break;
@@ -689,7 +680,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 #if TCG_TARGET_REG_BITS == 64
 case INDEX_op_mov_i64:
 t0 = *tb_ptr++;
-t1 = tci_read_r64(regs, &tb_ptr);
+t1 = tci_read_r(regs, &tb_ptr);
 tci_write_reg(regs, t0, t1);
 break;
 case INDEX_op_tci_movi_i64:
@@ -713,7 +704,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 tci_write_reg(regs, t0, *(uint64_t *)(t1 + t2));
 break;
 case INDEX_op_st_i64:
-t0 = tci_read_r64(regs, &tb_ptr);
+t0 = tci_read_r(regs, &tb_ptr);
 t1 = tci_read_r(regs, &tb_ptr);
 t2 = tci_read_s32(&tb_ptr);
 *(uint64_t *)(t1 + t2) = t0;
@@ -723,62 +714,62 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 
 case INDEX_op_add_i64:
 t0 = *tb_ptr++;
-t1 = tci_read_r64(regs, &tb_ptr);
-t2 = tci_read_r64(regs, &tb_ptr);
+t1 = tci_read_r(regs, &tb_ptr);
+t2 = tci_read_r(regs, &tb_ptr);
 tci_write_reg(regs, t0, t1 + t2);
 break;
 case INDEX_op_sub_i64:
 t0 = *tb_ptr++;
-t1 = tci_read_r64(regs, &tb_ptr);
-t2 = tci_read_r64(regs, &tb_ptr);
+t1 = tci_read_r(regs, &tb_ptr);
+t2 = tci_read_r(regs, &tb_ptr);
 tci_write_reg(regs, t0, t1 - t2);
 break;
 case INDEX_op_mul_i64:
 t0 = *tb_ptr++;
-t1 = tci_read_r64(regs, &tb_ptr);
-t2 = tci_read_r64(regs, &tb_ptr);
+t1 = tci_read_r(regs, &tb_ptr);
+t2 = tci_read_r(regs, &tb_ptr);
 tci_write_reg(regs, t0, t1 * t2);
 break;
 case INDEX_op_div_i64:
 t0 = *tb_ptr++;
-t1 = tci_read_r64(regs, &tb_ptr);
-t2 = tci_read_r64(regs, &tb_ptr);
+t1 = tci_read_r(regs, &tb_ptr);
+t2 = tci_read_r(regs, &tb_ptr);
 tci_write_reg(regs, t0, (int64_t)t1 / (int64_t)t2);
 break;
 case INDEX_op_divu_i64:
 t0 = *tb_ptr++;
-t1 = tci_read_r64(regs, &tb_ptr);
-t2 = tci_read_r64(regs, &tb_ptr);
+t1 = tci_read_r(regs, &tb_ptr);
+t2 = tci_read_r(regs, &tb_ptr);
 tci_write_reg(regs, t0, (uint64_t)t1 / (uint64_t)t2);
 break;
 case INDEX_op_rem_i64:
 t0 = *tb_ptr++;
-t1 = tci_read_r64(regs, &tb_ptr);
-t2 = tci_read_r64(regs, &tb_ptr);
+t1 = tci_read_r(regs, &tb_ptr);
+t2 = tci_read_r(regs, &tb_ptr);
 tci_write_reg(regs, t0, (int64_t)t1 % (int64_t)t2);
 break;
 case INDEX_op_remu_i64:
 t0 = *tb_ptr++;
-t1 = tci_read_r64(regs, &tb_ptr);
-t2 = tci_read_r64(regs, &tb_ptr);
+t1 = tci_read_r(regs, &tb_ptr);
+t2 = tci_read_r(regs, &tb_ptr);
 tci_write_reg(regs, t0, (uint64_t)t1 % (uint64_t)t2);
 break;
 case INDEX_op_and_i64:
 t0 = *tb_ptr++;
-t1 = tci_read_r64(regs, &tb_ptr);
-t2 = tci_read_r64(regs, &tb_ptr);
+t1 = tci_read_r(regs, &tb_ptr);
+t2 = tci_read_r(regs, &tb_p

[PATCH v3 16/70] tcg/tci: Merge mov, not and neg operations

2021-02-07 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/tci.c | 29 +
 1 file changed, 5 insertions(+), 24 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index c979215332..225cb698e8 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -404,7 +404,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 tci_write_reg(regs, t0, tci_compare64(t1, t2, condition));
 break;
 #endif
-case INDEX_op_mov_i32:
+CASE_32_64(mov)
 t0 = *tb_ptr++;
 t1 = tci_read_r(regs, &tb_ptr);
 tci_write_reg(regs, t0, t1);
@@ -666,26 +666,21 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 tci_write_reg(regs, t0, bswap32(t1));
 break;
 #endif
-#if TCG_TARGET_HAS_not_i32
-case INDEX_op_not_i32:
+#if TCG_TARGET_HAS_not_i32 || TCG_TARGET_HAS_not_i64
+CASE_32_64(not)
 t0 = *tb_ptr++;
 t1 = tci_read_r(regs, &tb_ptr);
 tci_write_reg(regs, t0, ~t1);
 break;
 #endif
-#if TCG_TARGET_HAS_neg_i32
-case INDEX_op_neg_i32:
+#if TCG_TARGET_HAS_neg_i32 || TCG_TARGET_HAS_neg_i64
+CASE_32_64(neg)
 t0 = *tb_ptr++;
 t1 = tci_read_r(regs, &tb_ptr);
 tci_write_reg(regs, t0, -t1);
 break;
 #endif
 #if TCG_TARGET_REG_BITS == 64
-case INDEX_op_mov_i64:
-t0 = *tb_ptr++;
-t1 = tci_read_r(regs, &tb_ptr);
-tci_write_reg(regs, t0, t1);
-break;
 case INDEX_op_tci_movi_i64:
 t0 = *tb_ptr++;
 t1 = tci_read_i64(&tb_ptr);
@@ -815,20 +810,6 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 tci_write_reg(regs, t0, bswap64(t1));
 break;
 #endif
-#if TCG_TARGET_HAS_not_i64
-case INDEX_op_not_i64:
-t0 = *tb_ptr++;
-t1 = tci_read_r(regs, &tb_ptr);
-tci_write_reg(regs, t0, ~t1);
-break;
-#endif
-#if TCG_TARGET_HAS_neg_i64
-case INDEX_op_neg_i64:
-t0 = *tb_ptr++;
-t1 = tci_read_r(regs, &tb_ptr);
-tci_write_reg(regs, t0, -t1);
-break;
-#endif
 #endif /* TCG_TARGET_REG_BITS == 64 */
 
 /* QEMU specific operations. */
-- 
2.25.1




[PATCH v3 20/70] tcg/tci: Split out tci_args_rrr

2021-02-07 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/tci.c | 154 --
 1 file changed, 57 insertions(+), 97 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 0bc5294e8b..1736234bfd 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -191,6 +191,14 @@ static void tci_args_rr(const uint8_t **tb_ptr,
 *r1 = tci_read_r(tb_ptr);
 }
 
+static void tci_args_rrr(const uint8_t **tb_ptr,
+ TCGReg *r0, TCGReg *r1, TCGReg *r2)
+{
+*r0 = tci_read_r(tb_ptr);
+*r1 = tci_read_r(tb_ptr);
+*r2 = tci_read_r(tb_ptr);
+}
+
 static void tci_args_rrs(const uint8_t **tb_ptr,
  TCGReg *r0, TCGReg *r1, int32_t *i2)
 {
@@ -366,7 +374,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 uint8_t op_size = tb_ptr[1];
 const uint8_t *old_code_ptr = tb_ptr;
 #endif
-TCGReg r0, r1;
+TCGReg r0, r1, r2;
 tcg_target_ulong t0;
 tcg_target_ulong t1;
 tcg_target_ulong t2;
@@ -503,101 +511,71 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 /* Arithmetic operations (mixed 32/64 bit). */
 
 CASE_32_64(add)
-t0 = *tb_ptr++;
-t1 = tci_read_rval(regs, &tb_ptr);
-t2 = tci_read_rval(regs, &tb_ptr);
-tci_write_reg(regs, t0, t1 + t2);
+tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+regs[r0] = regs[r1] + regs[r2];
 break;
 CASE_32_64(sub)
-t0 = *tb_ptr++;
-t1 = tci_read_rval(regs, &tb_ptr);
-t2 = tci_read_rval(regs, &tb_ptr);
-tci_write_reg(regs, t0, t1 - t2);
+tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+regs[r0] = regs[r1] - regs[r2];
 break;
 CASE_32_64(mul)
-t0 = *tb_ptr++;
-t1 = tci_read_rval(regs, &tb_ptr);
-t2 = tci_read_rval(regs, &tb_ptr);
-tci_write_reg(regs, t0, t1 * t2);
+tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+regs[r0] = regs[r1] * regs[r2];
 break;
 CASE_32_64(and)
-t0 = *tb_ptr++;
-t1 = tci_read_rval(regs, &tb_ptr);
-t2 = tci_read_rval(regs, &tb_ptr);
-tci_write_reg(regs, t0, t1 & t2);
+tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+regs[r0] = regs[r1] & regs[r2];
 break;
 CASE_32_64(or)
-t0 = *tb_ptr++;
-t1 = tci_read_rval(regs, &tb_ptr);
-t2 = tci_read_rval(regs, &tb_ptr);
-tci_write_reg(regs, t0, t1 | t2);
+tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+regs[r0] = regs[r1] | regs[r2];
 break;
 CASE_32_64(xor)
-t0 = *tb_ptr++;
-t1 = tci_read_rval(regs, &tb_ptr);
-t2 = tci_read_rval(regs, &tb_ptr);
-tci_write_reg(regs, t0, t1 ^ t2);
+tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+regs[r0] = regs[r1] ^ regs[r2];
 break;
 
 /* Arithmetic operations (32 bit). */
 
 case INDEX_op_div_i32:
-t0 = *tb_ptr++;
-t1 = tci_read_rval(regs, &tb_ptr);
-t2 = tci_read_rval(regs, &tb_ptr);
-tci_write_reg(regs, t0, (int32_t)t1 / (int32_t)t2);
+tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+regs[r0] = (int32_t)regs[r1] / (int32_t)regs[r2];
 break;
 case INDEX_op_divu_i32:
-t0 = *tb_ptr++;
-t1 = tci_read_rval(regs, &tb_ptr);
-t2 = tci_read_rval(regs, &tb_ptr);
-tci_write_reg(regs, t0, (uint32_t)t1 / (uint32_t)t2);
+tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+regs[r0] = (uint32_t)regs[r1] / (uint32_t)regs[r2];
 break;
 case INDEX_op_rem_i32:
-t0 = *tb_ptr++;
-t1 = tci_read_rval(regs, &tb_ptr);
-t2 = tci_read_rval(regs, &tb_ptr);
-tci_write_reg(regs, t0, (int32_t)t1 % (int32_t)t2);
+tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+regs[r0] = (int32_t)regs[r1] % (int32_t)regs[r2];
 break;
 case INDEX_op_remu_i32:
-t0 = *tb_ptr++;
-t1 = tci_read_rval(regs, &tb_ptr);
-t2 = tci_read_rval(regs, &tb_ptr);
-tci_write_reg(regs, t0, (uint32_t)t1 % (uint32_t)t2);
+tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+regs[r0] = (uint32_t)regs[r1] % (uint32_t)regs[r2];
 break;
 
 /* Shift/rotate operations (32 bit). */
 
 case INDEX_op_shl_i32:
-t0 = *tb_ptr++;
-t1 = tci_read_rval(regs, &tb_ptr);
-t2 = tci_read_rval(regs, &tb_ptr);
-tci_write_reg(regs, t0, (uint32_t)t1 << (t2 & 31));
+tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+regs[r0] = (uint32_t)regs[r1] << (regs[r2] & 31);
 break;
 case INDEX_op_shr_i32:
-t0 = *tb

[PATCH v3 17/70] tcg/tci: Rename tci_read_r to tci_read_rval

2021-02-07 Thread Richard Henderson
In the next patches, we want to use tci_read_r to return
the raw register number.  So rename the existing function,
which returns the register value, to tci_read_rval.

Signed-off-by: Richard Henderson 
---
 tcg/tci.c | 192 +++---
 1 file changed, 96 insertions(+), 96 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 225cb698e8..20aaaca959 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -119,7 +119,7 @@ static uint64_t tci_read_i64(const uint8_t **tb_ptr)
 
 /* Read indexed register (native size) from bytecode. */
 static tcg_target_ulong
-tci_read_r(const tcg_target_ulong *regs, const uint8_t **tb_ptr)
+tci_read_rval(const tcg_target_ulong *regs, const uint8_t **tb_ptr)
 {
 tcg_target_ulong value = tci_read_reg(regs, **tb_ptr);
 *tb_ptr += 1;
@@ -131,15 +131,15 @@ tci_read_r(const tcg_target_ulong *regs, const uint8_t **tb_ptr)
 static uint64_t tci_read_r64(const tcg_target_ulong *regs,
  const uint8_t **tb_ptr)
 {
-uint32_t low = tci_read_r(regs, tb_ptr);
-return tci_uint64(tci_read_r(regs, tb_ptr), low);
+uint32_t low = tci_read_rval(regs, tb_ptr);
+return tci_uint64(tci_read_rval(regs, tb_ptr), low);
 }
 #elif TCG_TARGET_REG_BITS == 64
 /* Read indexed register (64 bit) from bytecode. */
 static uint64_t tci_read_r64(const tcg_target_ulong *regs,
  const uint8_t **tb_ptr)
 {
-return tci_read_r(regs, tb_ptr);
+return tci_read_rval(regs, tb_ptr);
 }
 #endif
 
@@ -147,9 +147,9 @@ static uint64_t tci_read_r64(const tcg_target_ulong *regs,
 static target_ulong
 tci_read_ulong(const tcg_target_ulong *regs, const uint8_t **tb_ptr)
 {
-target_ulong taddr = tci_read_r(regs, tb_ptr);
+target_ulong taddr = tci_read_rval(regs, tb_ptr);
 #if TARGET_LONG_BITS > TCG_TARGET_REG_BITS
-taddr += (uint64_t)tci_read_r(regs, tb_ptr) << 32;
+taddr += (uint64_t)tci_read_rval(regs, tb_ptr) << 32;
 #endif
 return taddr;
 }
@@ -382,8 +382,8 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 continue;
 case INDEX_op_setcond_i32:
 t0 = *tb_ptr++;
-t1 = tci_read_r(regs, &tb_ptr);
-t2 = tci_read_r(regs, &tb_ptr);
+t1 = tci_read_rval(regs, &tb_ptr);
+t2 = tci_read_rval(regs, &tb_ptr);
 condition = *tb_ptr++;
 tci_write_reg(regs, t0, tci_compare32(t1, t2, condition));
 break;
@@ -398,15 +398,15 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 #elif TCG_TARGET_REG_BITS == 64
 case INDEX_op_setcond_i64:
 t0 = *tb_ptr++;
-t1 = tci_read_r(regs, &tb_ptr);
-t2 = tci_read_r(regs, &tb_ptr);
+t1 = tci_read_rval(regs, &tb_ptr);
+t2 = tci_read_rval(regs, &tb_ptr);
 condition = *tb_ptr++;
 tci_write_reg(regs, t0, tci_compare64(t1, t2, condition));
 break;
 #endif
 CASE_32_64(mov)
 t0 = *tb_ptr++;
-t1 = tci_read_r(regs, &tb_ptr);
+t1 = tci_read_rval(regs, &tb_ptr);
 tci_write_reg(regs, t0, t1);
 break;
 case INDEX_op_tci_movi_i32:
@@ -419,51 +419,51 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 
 CASE_32_64(ld8u)
 t0 = *tb_ptr++;
-t1 = tci_read_r(regs, &tb_ptr);
+t1 = tci_read_rval(regs, &tb_ptr);
 t2 = tci_read_s32(&tb_ptr);
 tci_write_reg(regs, t0, *(uint8_t *)(t1 + t2));
 break;
 CASE_32_64(ld8s)
 t0 = *tb_ptr++;
-t1 = tci_read_r(regs, &tb_ptr);
+t1 = tci_read_rval(regs, &tb_ptr);
 t2 = tci_read_s32(&tb_ptr);
 tci_write_reg(regs, t0, *(int8_t *)(t1 + t2));
 break;
 CASE_32_64(ld16u)
 t0 = *tb_ptr++;
-t1 = tci_read_r(regs, &tb_ptr);
+t1 = tci_read_rval(regs, &tb_ptr);
 t2 = tci_read_s32(&tb_ptr);
 tci_write_reg(regs, t0, *(uint16_t *)(t1 + t2));
 break;
 CASE_32_64(ld16s)
 t0 = *tb_ptr++;
-t1 = tci_read_r(regs, &tb_ptr);
+t1 = tci_read_rval(regs, &tb_ptr);
 t2 = tci_read_s32(&tb_ptr);
 tci_write_reg(regs, t0, *(int16_t *)(t1 + t2));
 break;
 case INDEX_op_ld_i32:
 CASE_64(ld32u)
 t0 = *tb_ptr++;
-t1 = tci_read_r(regs, &tb_ptr);
+t1 = tci_read_rval(regs, &tb_ptr);
 t2 = tci_read_s32(&tb_ptr);
 tci_write_reg(regs, t0, *(uint32_t *)(t1 + t2));
 break;
 CASE_32_64(st8)
-t0 = tci_read_r(regs, &tb_ptr);
-t1 = tci_read_r(regs, &tb_ptr);
+t0 = tci_read_rval(regs, &tb_ptr);
+t1 = tci_read_rval(regs, &tb_ptr);
 t2 = tci_read_s32(&tb_ptr);
 *(uint8_t *)(t1 + t2) = t0;
 

[PATCH v3 19/70] tcg/tci: Split out tci_args_rr

2021-02-07 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/tci.c | 67 +--
 1 file changed, 31 insertions(+), 36 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index be298ae39d..0bc5294e8b 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -184,6 +184,13 @@ static tcg_target_ulong tci_read_label(const uint8_t 
**tb_ptr)
  *   s = signed ldst offset
  */
 
+static void tci_args_rr(const uint8_t **tb_ptr,
+TCGReg *r0, TCGReg *r1)
+{
+*r0 = tci_read_r(tb_ptr);
+*r1 = tci_read_r(tb_ptr);
+}
+
 static void tci_args_rrs(const uint8_t **tb_ptr,
  TCGReg *r0, TCGReg *r1, int32_t *i2)
 {
@@ -439,9 +446,8 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 break;
 #endif
 CASE_32_64(mov)
-t0 = *tb_ptr++;
-t1 = tci_read_rval(regs, &tb_ptr);
-tci_write_reg(regs, t0, t1);
+tci_args_rr(&tb_ptr, &r0, &r1);
+regs[r0] = regs[r1];
 break;
 case INDEX_op_tci_movi_i32:
 t0 = *tb_ptr++;
@@ -652,58 +658,50 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 #endif /* TCG_TARGET_REG_BITS == 32 */
 #if TCG_TARGET_HAS_ext8s_i32 || TCG_TARGET_HAS_ext8s_i64
 CASE_32_64(ext8s)
-t0 = *tb_ptr++;
-t1 = tci_read_rval(regs, &tb_ptr);
-tci_write_reg(regs, t0, (int8_t)t1);
+tci_args_rr(&tb_ptr, &r0, &r1);
+regs[r0] = (int8_t)regs[r1];
 break;
 #endif
 #if TCG_TARGET_HAS_ext16s_i32 || TCG_TARGET_HAS_ext16s_i64
 CASE_32_64(ext16s)
-t0 = *tb_ptr++;
-t1 = tci_read_rval(regs, &tb_ptr);
-tci_write_reg(regs, t0, (int16_t)t1);
+tci_args_rr(&tb_ptr, &r0, &r1);
+regs[r0] = (int16_t)regs[r1];
 break;
 #endif
 #if TCG_TARGET_HAS_ext8u_i32 || TCG_TARGET_HAS_ext8u_i64
 CASE_32_64(ext8u)
-t0 = *tb_ptr++;
-t1 = tci_read_rval(regs, &tb_ptr);
-tci_write_reg(regs, t0, (uint8_t)t1);
+tci_args_rr(&tb_ptr, &r0, &r1);
+regs[r0] = (uint8_t)regs[r1];
 break;
 #endif
 #if TCG_TARGET_HAS_ext16u_i32 || TCG_TARGET_HAS_ext16u_i64
 CASE_32_64(ext16u)
-t0 = *tb_ptr++;
-t1 = tci_read_rval(regs, &tb_ptr);
-tci_write_reg(regs, t0, (uint16_t)t1);
+tci_args_rr(&tb_ptr, &r0, &r1);
+regs[r0] = (uint16_t)regs[r1];
 break;
 #endif
 #if TCG_TARGET_HAS_bswap16_i32 || TCG_TARGET_HAS_bswap16_i64
 CASE_32_64(bswap16)
-t0 = *tb_ptr++;
-t1 = tci_read_rval(regs, &tb_ptr);
-tci_write_reg(regs, t0, bswap16(t1));
+tci_args_rr(&tb_ptr, &r0, &r1);
+regs[r0] = bswap16(regs[r1]);
 break;
 #endif
 #if TCG_TARGET_HAS_bswap32_i32 || TCG_TARGET_HAS_bswap32_i64
 CASE_32_64(bswap32)
-t0 = *tb_ptr++;
-t1 = tci_read_rval(regs, &tb_ptr);
-tci_write_reg(regs, t0, bswap32(t1));
+tci_args_rr(&tb_ptr, &r0, &r1);
+regs[r0] = bswap32(regs[r1]);
 break;
 #endif
 #if TCG_TARGET_HAS_not_i32 || TCG_TARGET_HAS_not_i64
 CASE_32_64(not)
-t0 = *tb_ptr++;
-t1 = tci_read_rval(regs, &tb_ptr);
-tci_write_reg(regs, t0, ~t1);
+tci_args_rr(&tb_ptr, &r0, &r1);
+regs[r0] = ~regs[r1];
 break;
 #endif
 #if TCG_TARGET_HAS_neg_i32 || TCG_TARGET_HAS_neg_i64
 CASE_32_64(neg)
-t0 = *tb_ptr++;
-t1 = tci_read_rval(regs, &tb_ptr);
-tci_write_reg(regs, t0, -t1);
+tci_args_rr(&tb_ptr, &r0, &r1);
+regs[r0] = -regs[r1];
 break;
 #endif
 #if TCG_TARGET_REG_BITS == 64
@@ -816,21 +814,18 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 break;
 case INDEX_op_ext32s_i64:
 case INDEX_op_ext_i32_i64:
-t0 = *tb_ptr++;
-t1 = tci_read_rval(regs, &tb_ptr);
-tci_write_reg(regs, t0, (int32_t)t1);
+tci_args_rr(&tb_ptr, &r0, &r1);
+regs[r0] = (int32_t)regs[r1];
 break;
 case INDEX_op_ext32u_i64:
 case INDEX_op_extu_i32_i64:
-t0 = *tb_ptr++;
-t1 = tci_read_rval(regs, &tb_ptr);
-tci_write_reg(regs, t0, (uint32_t)t1);
+tci_args_rr(&tb_ptr, &r0, &r1);
+regs[r0] = (uint32_t)regs[r1];
 break;
 #if TCG_TARGET_HAS_bswap64_i64
 case INDEX_op_bswap64_i64:
-t0 = *tb_ptr++;
-t1 = tci_read_rval(regs, &tb_ptr);
-tci_write_reg(regs, t0, bswap64(t1));
+tci_args_rr(&tb_ptr, &r0, &r1);
+regs[r0] = bswap64(regs[r1]);
 break;
 #endif
 #endif /* TCG_TARGET_REG_BITS == 64 */
-- 
2.25.1
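
Note on the pattern in 19/70: each tci_args_* helper decodes a whole operand list in one call, and the opcode body then works on regs[] directly. A minimal standalone sketch of that idea follows. It is not the QEMU code: read_r, args_rr, exec_ext8s and NB_REGS are hypothetical names for this illustration, and the register file is fixed at 64 bits wide.

```c
#include <assert.h>
#include <stdint.h>

enum { NB_REGS = 16 };   /* hypothetical register-file size */

/* Read one register index from the bytecode and advance the cursor. */
static uint8_t read_r(const uint8_t **pc)
{
    uint8_t r = **pc;
    *pc += 1;
    assert(r < NB_REGS);
    return r;
}

/* Decode a two-register operand list, in the spirit of tci_args_rr. */
static void args_rr(const uint8_t **pc, uint8_t *r0, uint8_t *r1)
{
    *r0 = read_r(pc);
    *r1 = read_r(pc);
}

/* ext8s-style op: regs[r0] = sign-extended low byte of regs[r1]. */
static const uint8_t *exec_ext8s(uint64_t *regs, const uint8_t *pc)
{
    uint8_t r0, r1;

    args_rr(&pc, &r0, &r1);
    regs[r0] = (uint64_t)(int64_t)(int8_t)regs[r1];
    return pc;   /* cursor now points at the next instruction */
}
```

Decoding indices first and indexing regs[] afterwards keeps the read of the source and the write of the destination visible in one expression, which is what the converted cases in the patch look like.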




[PATCH v3 21/70] tcg/tci: Split out tci_args_rrrc

2021-02-07 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/tci.c | 23 +--
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 1736234bfd..86625061f1 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -207,6 +207,15 @@ static void tci_args_rrs(const uint8_t **tb_ptr,
 *i2 = tci_read_s32(tb_ptr);
 }
 
+static void tci_args_rrrc(const uint8_t **tb_ptr,
+  TCGReg *r0, TCGReg *r1, TCGReg *r2, TCGCond *c3)
+{
+*r0 = tci_read_r(tb_ptr);
+*r1 = tci_read_r(tb_ptr);
+*r2 = tci_read_r(tb_ptr);
+*c3 = tci_read_b(tb_ptr);
+}
+
 static bool tci_compare32(uint32_t u0, uint32_t u1, TCGCond condition)
 {
 bool result = false;
@@ -430,11 +439,8 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 tb_ptr = (uint8_t *)label;
 continue;
 case INDEX_op_setcond_i32:
-t0 = *tb_ptr++;
-t1 = tci_read_rval(regs, &tb_ptr);
-t2 = tci_read_rval(regs, &tb_ptr);
-condition = *tb_ptr++;
-tci_write_reg(regs, t0, tci_compare32(t1, t2, condition));
+tci_args_rrrc(&tb_ptr, &r0, &r1, &r2, &condition);
+regs[r0] = tci_compare32(regs[r1], regs[r2], condition);
 break;
 #if TCG_TARGET_REG_BITS == 32
 case INDEX_op_setcond2_i32:
@@ -446,11 +452,8 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 break;
 #elif TCG_TARGET_REG_BITS == 64
 case INDEX_op_setcond_i64:
-t0 = *tb_ptr++;
-t1 = tci_read_rval(regs, &tb_ptr);
-t2 = tci_read_rval(regs, &tb_ptr);
-condition = *tb_ptr++;
-tci_write_reg(regs, t0, tci_compare64(t1, t2, condition));
+tci_args_rrrc(&tb_ptr, &r0, &r1, &r2, &condition);
+regs[r0] = tci_compare64(regs[r1], regs[r2], condition);
 break;
 #endif
 CASE_32_64(mov)
-- 
2.25.1




[PATCH v3 22/70] tcg/tci: Split out tci_args_l

2021-02-07 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/tci.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 86625061f1..8bc9dd27b0 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -184,6 +184,11 @@ static tcg_target_ulong tci_read_label(const uint8_t **tb_ptr)
  *   s = signed ldst offset
  */
 
+static void tci_args_l(const uint8_t **tb_ptr, void **l0)
+{
+*l0 = (void *)tci_read_label(tb_ptr);
+}
+
 static void tci_args_rr(const uint8_t **tb_ptr,
 TCGReg *r0, TCGReg *r1)
 {
@@ -434,9 +439,9 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 #endif
 break;
 case INDEX_op_br:
-label = tci_read_label(&tb_ptr);
+tci_args_l(&tb_ptr, &ptr);
 tci_assert(tb_ptr == old_code_ptr + op_size);
-tb_ptr = (uint8_t *)label;
+tb_ptr = ptr;
 continue;
 case INDEX_op_setcond_i32:
 tci_args_rrrc(&tb_ptr, &r0, &r1, &r2, &condition);
-- 
2.25.1




[PATCH v3 29/70] tcg/tci: Split out tci_args_rrrrrr

2021-02-07 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/tci.c | 31 ---
 1 file changed, 20 insertions(+), 11 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 0301ee63a7..84d77855ee 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -258,6 +258,17 @@ static void tci_args_rrrrrc(const uint8_t **tb_ptr, TCGReg *r0, TCGReg *r1,
 *r4 = tci_read_r(tb_ptr);
 *c5 = tci_read_b(tb_ptr);
 }
+
+static void tci_args_rrrrrr(const uint8_t **tb_ptr, TCGReg *r0, TCGReg *r1,
+TCGReg *r2, TCGReg *r3, TCGReg *r4, TCGReg *r5)
+{
+*r0 = tci_read_r(tb_ptr);
+*r1 = tci_read_r(tb_ptr);
+*r2 = tci_read_r(tb_ptr);
+*r3 = tci_read_r(tb_ptr);
+*r4 = tci_read_r(tb_ptr);
+*r5 = tci_read_r(tb_ptr);
+}
 #endif
 
 static bool tci_compare32(uint32_t u0, uint32_t u1, TCGCond condition)
@@ -437,7 +448,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 uint32_t tmp32;
 uint64_t tmp64;
 #if TCG_TARGET_REG_BITS == 32
-TCGReg r3, r4;
+TCGReg r3, r4, r5;
 uint64_t T1, T2;
 #endif
 TCGMemOpIdx oi;
@@ -643,18 +654,16 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 break;
 #if TCG_TARGET_REG_BITS == 32
 case INDEX_op_add2_i32:
-t0 = *tb_ptr++;
-t1 = *tb_ptr++;
-tmp64 = tci_read_r64(regs, &tb_ptr);
-tmp64 += tci_read_r64(regs, &tb_ptr);
-tci_write_reg64(regs, t1, t0, tmp64);
+tci_args_rrrrrr(&tb_ptr, &r0, &r1, &r2, &r3, &r4, &r5);
+T1 = tci_uint64(regs[r3], regs[r2]);
+T2 = tci_uint64(regs[r5], regs[r4]);
+tci_write_reg64(regs, r1, r0, T1 + T2);
 break;
 case INDEX_op_sub2_i32:
-t0 = *tb_ptr++;
-t1 = *tb_ptr++;
-tmp64 = tci_read_r64(regs, &tb_ptr);
-tmp64 -= tci_read_r64(regs, &tb_ptr);
-tci_write_reg64(regs, t1, t0, tmp64);
+tci_args_rrrrrr(&tb_ptr, &r0, &r1, &r2, &r3, &r4, &r5);
+T1 = tci_uint64(regs[r3], regs[r2]);
+T2 = tci_uint64(regs[r5], regs[r4]);
+tci_write_reg64(regs, r1, r0, T1 - T2);
 break;
 case INDEX_op_brcond2_i32:
 tci_args_rrrrcl(&tb_ptr, &r0, &r1, &r2, &r3, &condition, &ptr);
-- 
2.25.1
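
Note on add2/sub2 in 29/70: on a 32-bit host each 64-bit operand lives in a low/high register pair, so the op rebuilds two 64-bit values, does the arithmetic once, and splits the result again. A minimal standalone sketch of that flow, assuming 32-bit registers; make_u64, write_pair and add2 are hypothetical names modelled on tci_uint64, tci_write_reg64 and the add2_i32 case.

```c
#include <assert.h>
#include <stdint.h>

/* Combine two 32-bit halves into one 64-bit value (cf. tci_uint64). */
static uint64_t make_u64(uint32_t high, uint32_t low)
{
    return ((uint64_t)high << 32) + low;
}

/* Store a 64-bit value into a high/low register pair. */
static void write_pair(uint32_t *regs, unsigned hi, unsigned lo, uint64_t v)
{
    regs[lo] = (uint32_t)v;
    regs[hi] = (uint32_t)(v >> 32);
}

/*
 * add2-style op: (r1:r0) = (r3:r2) + (r5:r4).
 * r[] holds the six decoded register indices in bytecode order,
 * low half first, matching the operand order used in the patch.
 */
static void add2(uint32_t *regs, const uint8_t r[6])
{
    uint64_t t1 = make_u64(regs[r[3]], regs[r[2]]);
    uint64_t t2 = make_u64(regs[r[5]], regs[r[4]]);

    write_pair(regs, r[1], r[0], t1 + t2);
}
```

Doing the addition in 64 bits lets the carry out of the low word propagate into the high word without any explicit carry handling.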




[PATCH v3 26/70] tcg/tci: Reuse tci_args_l for calls.

2021-02-07 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/tci.c | 38 +++---
 1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 5cc05fa554..92b13829c3 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -452,30 +452,30 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 
 switch (opc) {
 case INDEX_op_call:
-t0 = tci_read_i(&tb_ptr);
+tci_args_l(&tb_ptr, &ptr);
 tci_tb_ptr = (uintptr_t)tb_ptr;
 #if TCG_TARGET_REG_BITS == 32
-tmp64 = ((helper_function)t0)(tci_read_reg(regs, TCG_REG_R0),
-  tci_read_reg(regs, TCG_REG_R1),
-  tci_read_reg(regs, TCG_REG_R2),
-  tci_read_reg(regs, TCG_REG_R3),
-  tci_read_reg(regs, TCG_REG_R4),
-  tci_read_reg(regs, TCG_REG_R5),
-  tci_read_reg(regs, TCG_REG_R6),
-  tci_read_reg(regs, TCG_REG_R7),
-  tci_read_reg(regs, TCG_REG_R8),
-  tci_read_reg(regs, TCG_REG_R9),
-  tci_read_reg(regs, TCG_REG_R10),
-  tci_read_reg(regs, TCG_REG_R11));
+tmp64 = ((helper_function)ptr)(tci_read_reg(regs, TCG_REG_R0),
+   tci_read_reg(regs, TCG_REG_R1),
+   tci_read_reg(regs, TCG_REG_R2),
+   tci_read_reg(regs, TCG_REG_R3),
+   tci_read_reg(regs, TCG_REG_R4),
+   tci_read_reg(regs, TCG_REG_R5),
+   tci_read_reg(regs, TCG_REG_R6),
+   tci_read_reg(regs, TCG_REG_R7),
+   tci_read_reg(regs, TCG_REG_R8),
+   tci_read_reg(regs, TCG_REG_R9),
+   tci_read_reg(regs, TCG_REG_R10),
+   tci_read_reg(regs, TCG_REG_R11));
 tci_write_reg(regs, TCG_REG_R0, tmp64);
 tci_write_reg(regs, TCG_REG_R1, tmp64 >> 32);
 #else
-tmp64 = ((helper_function)t0)(tci_read_reg(regs, TCG_REG_R0),
-  tci_read_reg(regs, TCG_REG_R1),
-  tci_read_reg(regs, TCG_REG_R2),
-  tci_read_reg(regs, TCG_REG_R3),
-  tci_read_reg(regs, TCG_REG_R4),
-  tci_read_reg(regs, TCG_REG_R5));
+tmp64 = ((helper_function)ptr)(tci_read_reg(regs, TCG_REG_R0),
+   tci_read_reg(regs, TCG_REG_R1),
+   tci_read_reg(regs, TCG_REG_R2),
+   tci_read_reg(regs, TCG_REG_R3),
+   tci_read_reg(regs, TCG_REG_R4),
+   tci_read_reg(regs, TCG_REG_R5));
 tci_write_reg(regs, TCG_REG_R0, tmp64);
 #endif
 break;
-- 
2.25.1




[PATCH v3 23/70] tcg/tci: Split out tci_args_rrrrrc

2021-02-07 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/tci.c | 25 +++--
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 8bc9dd27b0..692b95b5c2 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -221,6 +221,19 @@ static void tci_args_rrrc(const uint8_t **tb_ptr,
 *c3 = tci_read_b(tb_ptr);
 }
 
+#if TCG_TARGET_REG_BITS == 32
+static void tci_args_rrrrrc(const uint8_t **tb_ptr, TCGReg *r0, TCGReg *r1,
+TCGReg *r2, TCGReg *r3, TCGReg *r4, TCGCond *c5)
+{
+*r0 = tci_read_r(tb_ptr);
+*r1 = tci_read_r(tb_ptr);
+*r2 = tci_read_r(tb_ptr);
+*r3 = tci_read_r(tb_ptr);
+*r4 = tci_read_r(tb_ptr);
+*c5 = tci_read_b(tb_ptr);
+}
+#endif
+
 static bool tci_compare32(uint32_t u0, uint32_t u1, TCGCond condition)
 {
 bool result = false;
@@ -400,7 +413,8 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 uint32_t tmp32;
 uint64_t tmp64;
 #if TCG_TARGET_REG_BITS == 32
-uint64_t v64;
+TCGReg r3, r4;
+uint64_t v64, T1, T2;
 #endif
 TCGMemOpIdx oi;
 int32_t ofs;
@@ -449,11 +463,10 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 break;
 #if TCG_TARGET_REG_BITS == 32
 case INDEX_op_setcond2_i32:
-t0 = *tb_ptr++;
-tmp64 = tci_read_r64(regs, &tb_ptr);
-v64 = tci_read_r64(regs, &tb_ptr);
-condition = *tb_ptr++;
-tci_write_reg(regs, t0, tci_compare64(tmp64, v64, condition));
+tci_args_rrrrrc(&tb_ptr, &r0, &r1, &r2, &r3, &r4, &condition);
+T1 = tci_uint64(regs[r2], regs[r1]);
+T2 = tci_uint64(regs[r4], regs[r3]);
+regs[r0] = tci_compare64(T1, T2, condition);
 break;
 #elif TCG_TARGET_REG_BITS == 64
 case INDEX_op_setcond_i64:
-- 
2.25.1




[PATCH v3 24/70] tcg/tci: Split out tci_args_rrcl and tci_args_rrrrcl

2021-02-07 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/tci.c | 52 
 1 file changed, 32 insertions(+), 20 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 692b95b5c2..1e2f78a9f9 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -212,6 +212,15 @@ static void tci_args_rrs(const uint8_t **tb_ptr,
 *i2 = tci_read_s32(tb_ptr);
 }
 
+static void tci_args_rrcl(const uint8_t **tb_ptr,
+  TCGReg *r0, TCGReg *r1, TCGCond *c2, void **l3)
+{
+*r0 = tci_read_r(tb_ptr);
+*r1 = tci_read_r(tb_ptr);
+*c2 = tci_read_b(tb_ptr);
+*l3 = (void *)tci_read_label(tb_ptr);
+}
+
 static void tci_args_rrrc(const uint8_t **tb_ptr,
   TCGReg *r0, TCGReg *r1, TCGReg *r2, TCGCond *c3)
 {
@@ -222,6 +231,17 @@ static void tci_args_rrrc(const uint8_t **tb_ptr,
 }
 
 #if TCG_TARGET_REG_BITS == 32
+static void tci_args_rrrrcl(const uint8_t **tb_ptr, TCGReg *r0, TCGReg *r1,
+TCGReg *r2, TCGReg *r3, TCGCond *c4, void **l5)
+{
+*r0 = tci_read_r(tb_ptr);
+*r1 = tci_read_r(tb_ptr);
+*r2 = tci_read_r(tb_ptr);
+*r3 = tci_read_r(tb_ptr);
+*c4 = tci_read_b(tb_ptr);
+*l5 = (void *)tci_read_label(tb_ptr);
+}
+
 static void tci_args_rrrrrc(const uint8_t **tb_ptr, TCGReg *r0, TCGReg *r1,
 TCGReg *r2, TCGReg *r3, TCGReg *r4, TCGCond *c5)
 {
@@ -405,7 +425,6 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 tcg_target_ulong t0;
 tcg_target_ulong t1;
 tcg_target_ulong t2;
-tcg_target_ulong label;
 TCGCond condition;
 target_ulong taddr;
 uint8_t tmp8;
@@ -414,7 +433,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 uint64_t tmp64;
 #if TCG_TARGET_REG_BITS == 32
 TCGReg r3, r4;
-uint64_t v64, T1, T2;
+uint64_t T1, T2;
 #endif
 TCGMemOpIdx oi;
 int32_t ofs;
@@ -611,13 +630,10 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 break;
 #endif
 case INDEX_op_brcond_i32:
-t0 = tci_read_rval(regs, &tb_ptr);
-t1 = tci_read_rval(regs, &tb_ptr);
-condition = *tb_ptr++;
-label = tci_read_label(&tb_ptr);
-if (tci_compare32(t0, t1, condition)) {
+tci_args_rrcl(&tb_ptr, &r0, &r1, &condition, &ptr);
+if (tci_compare32(regs[r0], regs[r1], condition)) {
 tci_assert(tb_ptr == old_code_ptr + op_size);
-tb_ptr = (uint8_t *)label;
+tb_ptr = ptr;
 continue;
 }
 break;
@@ -637,13 +653,12 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 tci_write_reg64(regs, t1, t0, tmp64);
 break;
 case INDEX_op_brcond2_i32:
-tmp64 = tci_read_r64(regs, &tb_ptr);
-v64 = tci_read_r64(regs, &tb_ptr);
-condition = *tb_ptr++;
-label = tci_read_label(&tb_ptr);
-if (tci_compare64(tmp64, v64, condition)) {
+tci_args_rrrrcl(&tb_ptr, &r0, &r1, &r2, &r3, &condition, &ptr);
+T1 = tci_uint64(regs[r1], regs[r0]);
+T2 = tci_uint64(regs[r3], regs[r2]);
+if (tci_compare64(T1, T2, condition)) {
 tci_assert(tb_ptr == old_code_ptr + op_size);
-tb_ptr = (uint8_t *)label;
+tb_ptr = ptr;
 continue;
 }
 break;
@@ -783,13 +798,10 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 break;
 #endif
 case INDEX_op_brcond_i64:
-t0 = tci_read_rval(regs, &tb_ptr);
-t1 = tci_read_rval(regs, &tb_ptr);
-condition = *tb_ptr++;
-label = tci_read_label(&tb_ptr);
-if (tci_compare64(t0, t1, condition)) {
+tci_args_rrcl(&tb_ptr, &r0, &r1, &condition, &ptr);
+if (tci_compare64(regs[r0], regs[r1], condition)) {
 tci_assert(tb_ptr == old_code_ptr + op_size);
-tb_ptr = (uint8_t *)label;
+tb_ptr = ptr;
 continue;
 }
 break;
-- 
2.25.1




[PATCH v3 32/70] tcg/tci: Reduce qemu_ld/st TCGMemOpIdx operand to 32-bits

2021-02-07 Thread Richard Henderson
We are currently using the "natural" size routine, which
uses 64 bits on a 64-bit host.  The TCGMemOpIdx operand
has 11 bits, so we can safely reduce it to 32 bits.

Signed-off-by: Richard Henderson 
---
 tcg/tci.c| 8 
 tcg/tci/tcg-target.c.inc | 4 ++--
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index e10ccfc344..ddc138359b 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -855,7 +855,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 case INDEX_op_qemu_ld_i32:
 t0 = *tb_ptr++;
 taddr = tci_read_ulong(regs, &tb_ptr);
-oi = tci_read_i(&tb_ptr);
+oi = tci_read_i32(&tb_ptr);
 switch (get_memop(oi) & (MO_BSWAP | MO_SSIZE)) {
 case MO_UB:
 tmp32 = qemu_ld_ub;
@@ -892,7 +892,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 t1 = *tb_ptr++;
 }
 taddr = tci_read_ulong(regs, &tb_ptr);
-oi = tci_read_i(&tb_ptr);
+oi = tci_read_i32(&tb_ptr);
 switch (get_memop(oi) & (MO_BSWAP | MO_SSIZE)) {
 case MO_UB:
 tmp64 = qemu_ld_ub;
@@ -941,7 +941,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 case INDEX_op_qemu_st_i32:
 t0 = tci_read_rval(regs, &tb_ptr);
 taddr = tci_read_ulong(regs, &tb_ptr);
-oi = tci_read_i(&tb_ptr);
+oi = tci_read_i32(&tb_ptr);
 switch (get_memop(oi) & (MO_BSWAP | MO_SIZE)) {
 case MO_UB:
 qemu_st_b(t0);
@@ -965,7 +965,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 case INDEX_op_qemu_st_i64:
 tmp64 = tci_read_r64(regs, &tb_ptr);
 taddr = tci_read_ulong(regs, &tb_ptr);
-oi = tci_read_i(&tb_ptr);
+oi = tci_read_i32(&tb_ptr);
 switch (get_memop(oi) & (MO_BSWAP | MO_SIZE)) {
 case MO_UB:
 qemu_st_b(tmp64);
diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index 640407b4a8..6c187a25cc 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -550,7 +550,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
 if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
 tcg_out_r(s, *args++);
 }
-tcg_out_i(s, *args++);
+tcg_out32(s, *args++);
 break;
 
 case INDEX_op_qemu_ld_i64:
@@ -563,7 +563,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
 if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
 tcg_out_r(s, *args++);
 }
-tcg_out_i(s, *args++);
+tcg_out32(s, *args++);
 break;
 
 case INDEX_op_mb:
-- 
2.25.1
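
Note on 32/70: the argument is that an 11-bit operand fits in a fixed 32-bit immediate, so the emitter and the interpreter can both use the 32-bit routines whatever the host word size. A minimal standalone sketch of such a round trip follows. The packing shown (7 bits of memory-op flags above a 4-bit MMU index) is an assumption made for this illustration, and pack_oi, out32 and read_i32 are hypothetical names rather than the QEMU functions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed packing: 7-bit memory-op flags above a 4-bit MMU index. */
static uint32_t pack_oi(uint32_t memop, uint32_t mmu_idx)
{
    assert(memop < (1u << 7) && mmu_idx < (1u << 4));
    return (memop << 4) | mmu_idx;
}

/* Emit a 32-bit immediate into the bytecode; returns the new cursor. */
static uint8_t *out32(uint8_t *p, uint32_t v)
{
    memcpy(p, &v, sizeof(v));   /* memcpy avoids unaligned stores */
    return p + sizeof(v);
}

/* Read a 32-bit immediate back and advance the cursor. */
static uint32_t read_i32(const uint8_t **pc)
{
    uint32_t v;

    memcpy(&v, *pc, sizeof(v));
    *pc += sizeof(v);
    return v;
}
```

The writer and the reader have to change together, which is why the patch touches both tcg/tci.c and tcg/tci/tcg-target.c.inc.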




[PATCH v3 30/70] tcg/tci: Split out tci_args_rrrr

2021-02-07 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/tci.c | 16 +++-
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 84d77855ee..cb24295cd9 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -237,6 +237,15 @@ static void tci_args_rrrc(const uint8_t **tb_ptr,
 }
 
 #if TCG_TARGET_REG_BITS == 32
+static void tci_args_rrrr(const uint8_t **tb_ptr,
+  TCGReg *r0, TCGReg *r1, TCGReg *r2, TCGReg *r3)
+{
+*r0 = tci_read_r(tb_ptr);
+*r1 = tci_read_r(tb_ptr);
+*r2 = tci_read_r(tb_ptr);
+*r3 = tci_read_r(tb_ptr);
+}
+
 static void tci_args_rrrrcl(const uint8_t **tb_ptr, TCGReg *r0, TCGReg *r1,
 TCGReg *r2, TCGReg *r3, TCGCond *c4, void **l5)
 {
@@ -676,11 +685,8 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 }
 break;
 case INDEX_op_mulu2_i32:
-t0 = *tb_ptr++;
-t1 = *tb_ptr++;
-t2 = tci_read_rval(regs, &tb_ptr);
-tmp64 = (uint32_t)tci_read_rval(regs, &tb_ptr);
-tci_write_reg64(regs, t1, t0, (uint32_t)t2 * tmp64);
+tci_args_rrrr(&tb_ptr, &r0, &r1, &r2, &r3);
+tci_write_reg64(regs, r1, r0, (uint64_t)regs[r2] * regs[r3]);
 break;
 #endif /* TCG_TARGET_REG_BITS == 32 */
 #if TCG_TARGET_HAS_ext8s_i32 || TCG_TARGET_HAS_ext8s_i64
-- 
2.25.1




[PATCH v3 27/70] tcg/tci: Reuse tci_args_l for exit_tb

2021-02-07 Thread Richard Henderson
Do not emit a uint64_t, but a tcg_target_ulong, aka uintptr_t.
This reduces the size of the constant on 32-bit hosts.
The assert for label != NULL has to be removed because that
is a valid value for exit_tb.

Signed-off-by: Richard Henderson 
---
 tcg/tci.c| 13 -
 tcg/tci/tcg-target.c.inc |  2 +-
 2 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 92b13829c3..57b6defe09 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -160,9 +160,7 @@ tci_read_ulong(const tcg_target_ulong *regs, const uint8_t **tb_ptr)
 
 static tcg_target_ulong tci_read_label(const uint8_t **tb_ptr)
 {
-tcg_target_ulong label = tci_read_i(tb_ptr);
-tci_assert(label != 0);
-return label;
+return tci_read_i(tb_ptr);
 }
 
 /*
@@ -417,7 +415,6 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 tcg_target_ulong regs[TCG_TARGET_NB_REGS];
 long tcg_temps[CPU_TEMP_BUF_NLONGS];
 uintptr_t sp_value = (uintptr_t)(tcg_temps + CPU_TEMP_BUF_NLONGS);
-uintptr_t ret = 0;
 
 regs[TCG_AREG0] = (tcg_target_ulong)env;
 regs[TCG_REG_CALL_STACK] = sp_value;
@@ -832,9 +829,9 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 /* QEMU specific operations. */
 
 case INDEX_op_exit_tb:
-ret = *(uint64_t *)tb_ptr;
-goto exit;
-break;
+tci_args_l(&tb_ptr, &ptr);
+return (uintptr_t)ptr;
+
 case INDEX_op_goto_tb:
 /* Jump address is aligned */
 tb_ptr = QEMU_ALIGN_PTR_UP(tb_ptr, 4);
@@ -992,6 +989,4 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 }
 tci_assert(tb_ptr == old_code_ptr + op_size);
 }
-exit:
-return ret;
 }
diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index c79f9c32d8..ff8040510f 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -401,7 +401,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
 
 switch (opc) {
 case INDEX_op_exit_tb:
-tcg_out64(s, args[0]);
+tcg_out_i(s, args[0]);
 break;
 
 case INDEX_op_goto_tb:
-- 
2.25.1




[PATCH v3 28/70] tcg/tci: Reuse tci_args_l for goto_tb

2021-02-07 Thread Richard Henderson
Convert to indirect jumps, as it's less complicated.
Then we just have a pointer to the tb address at which
the chain is stored, from which we read.

Signed-off-by: Richard Henderson 
---
 tcg/tci/tcg-target.h | 11 +++
 tcg/tci.c|  8 +++-
 tcg/tci/tcg-target.c.inc | 13 +++--
 3 files changed, 9 insertions(+), 23 deletions(-)

diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
index 9c0021a26f..9285c930a2 100644
--- a/tcg/tci/tcg-target.h
+++ b/tcg/tci/tcg-target.h
@@ -87,7 +87,7 @@
 #define TCG_TARGET_HAS_muluh_i32 0
 #define TCG_TARGET_HAS_mulsh_i32 0
 #define TCG_TARGET_HAS_goto_ptr 0
-#define TCG_TARGET_HAS_direct_jump  1
+#define TCG_TARGET_HAS_direct_jump  0
 #define TCG_TARGET_HAS_qemu_st8_i32 0
 
 #if TCG_TARGET_REG_BITS == 64
@@ -174,12 +174,7 @@ void tci_disas(uint8_t opc);
 
 #define TCG_TARGET_HAS_MEMORY_BSWAP 1
 
-static inline void tb_target_set_jmp_target(uintptr_t tc_ptr, uintptr_t jmp_rx,
-uintptr_t jmp_rw, uintptr_t addr)
-{
-/* patch the branch destination */
-qatomic_set((int32_t *)jmp_rw, addr - (jmp_rx + 4));
-/* no need to flush icache explicitly */
-}
+/* not defined -- call should be eliminated at compile time */
+void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t, uintptr_t);
 
 #endif /* TCG_TARGET_H */
diff --git a/tcg/tci.c b/tcg/tci.c
index 57b6defe09..0301ee63a7 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -833,13 +833,11 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 return (uintptr_t)ptr;
 
 case INDEX_op_goto_tb:
-/* Jump address is aligned */
-tb_ptr = QEMU_ALIGN_PTR_UP(tb_ptr, 4);
-t0 = qatomic_read((int32_t *)tb_ptr);
-tb_ptr += sizeof(int32_t);
+tci_args_l(&tb_ptr, &ptr);
 tci_assert(tb_ptr == old_code_ptr + op_size);
-tb_ptr += (int32_t)t0;
+tb_ptr = *(void **)ptr;
 continue;
+
 case INDEX_op_qemu_ld_i32:
 t0 = *tb_ptr++;
 taddr = tci_read_ulong(regs, &tb_ptr);
diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index ff8040510f..2c64b4f617 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -405,16 +405,9 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
 break;
 
 case INDEX_op_goto_tb:
-if (s->tb_jmp_insn_offset) {
-/* Direct jump method. */
-/* Align for atomic patching and thread safety */
-s->code_ptr = QEMU_ALIGN_PTR_UP(s->code_ptr, 4);
-s->tb_jmp_insn_offset[args[0]] = tcg_current_code_size(s);
-tcg_out32(s, 0);
-} else {
-/* Indirect jump method. */
-TODO();
-}
+tcg_debug_assert(s->tb_jmp_insn_offset == 0);
+/* indirect jump method. */
+tcg_out_i(s, (uintptr_t)(s->tb_jmp_target_addr + args[0]));
 set_jmp_reset_offset(s, args[0]);
 break;
 
-- 
2.25.1
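
Note on the indirect jump in 28/70: the bytecode no longer embeds a patchable displacement. It embeds the address of a slot, and the interpreter loads the current target from that slot each time it runs the goto. A minimal standalone sketch of that idea follows; emit_goto and exec_goto are hypothetical names, and the atomicity a real multi-threaded interpreter would need for slot updates is not modelled here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Emit a pointer-sized immediate holding the address of a jump slot. */
static uint8_t *emit_goto(uint8_t *p, const uint8_t **slot)
{
    uintptr_t v = (uintptr_t)slot;

    memcpy(p, &v, sizeof(v));
    return p + sizeof(v);
}

/* Run the goto: read the slot address, then load the target from it. */
static const uint8_t *exec_goto(const uint8_t *pc)
{
    uintptr_t v;
    const uint8_t **slot;

    memcpy(&v, pc, sizeof(v));
    slot = (const uint8_t **)v;
    return *slot;   /* the current chain target, wherever it points now */
}
```

Retargeting the jump is then a plain store to the slot. The emitted bytecode never changes, which is why the alignment and patching logic could be dropped.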




[PATCH v3 34/70] tcg/tci: Hoist op_size checking into tci_args_*

2021-02-07 Thread Richard Henderson
This performs the size check while reading the arguments,
which means that we don't have to arrange for it to be
done after the operation, and that tidies all of the branches.

Signed-off-by: Richard Henderson 
---
 tcg/tci.c | 87 ++-
 1 file changed, 73 insertions(+), 14 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index a1846825ea..3dc89ed829 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -24,7 +24,7 @@
 #if defined(CONFIG_DEBUG_TCG)
 # define tci_assert(cond) assert(cond)
 #else
-# define tci_assert(cond) ((void)0)
+# define tci_assert(cond) ((void)(cond))
 #endif
 
 #include "qemu-common.h"
@@ -135,146 +135,217 @@ static tcg_target_ulong tci_read_label(const uint8_t **tb_ptr)
  *   s = signed ldst offset
  */
 
+static void check_size(const uint8_t *start, const uint8_t **tb_ptr)
+{
+const uint8_t *old_code_ptr = start - 2;
+uint8_t op_size = old_code_ptr[1];
+tci_assert(*tb_ptr == old_code_ptr + op_size);
+}
+
 static void tci_args_l(const uint8_t **tb_ptr, void **l0)
 {
+const uint8_t *start = *tb_ptr;
+
 *l0 = (void *)tci_read_label(tb_ptr);
+
+check_size(start, tb_ptr);
 }
 
 static void tci_args_rr(const uint8_t **tb_ptr,
 TCGReg *r0, TCGReg *r1)
 {
+const uint8_t *start = *tb_ptr;
+
 *r0 = tci_read_r(tb_ptr);
 *r1 = tci_read_r(tb_ptr);
+
+check_size(start, tb_ptr);
 }
 
 static void tci_args_ri(const uint8_t **tb_ptr,
 TCGReg *r0, tcg_target_ulong *i1)
 {
+const uint8_t *start = *tb_ptr;
+
 *r0 = tci_read_r(tb_ptr);
 *i1 = tci_read_i32(tb_ptr);
+
+check_size(start, tb_ptr);
 }
 
 #if TCG_TARGET_REG_BITS == 64
 static void tci_args_rI(const uint8_t **tb_ptr,
 TCGReg *r0, tcg_target_ulong *i1)
 {
+const uint8_t *start = *tb_ptr;
+
 *r0 = tci_read_r(tb_ptr);
 *i1 = tci_read_i(tb_ptr);
+
+check_size(start, tb_ptr);
 }
 #endif
 
 static void tci_args_rrm(const uint8_t **tb_ptr,
  TCGReg *r0, TCGReg *r1, TCGMemOpIdx *m2)
 {
+const uint8_t *start = *tb_ptr;
+
 *r0 = tci_read_r(tb_ptr);
 *r1 = tci_read_r(tb_ptr);
 *m2 = tci_read_i32(tb_ptr);
+
+check_size(start, tb_ptr);
 }
 
 static void tci_args_rrr(const uint8_t **tb_ptr,
  TCGReg *r0, TCGReg *r1, TCGReg *r2)
 {
+const uint8_t *start = *tb_ptr;
+
 *r0 = tci_read_r(tb_ptr);
 *r1 = tci_read_r(tb_ptr);
 *r2 = tci_read_r(tb_ptr);
+
+check_size(start, tb_ptr);
 }
 
 static void tci_args_rrs(const uint8_t **tb_ptr,
  TCGReg *r0, TCGReg *r1, int32_t *i2)
 {
+const uint8_t *start = *tb_ptr;
+
 *r0 = tci_read_r(tb_ptr);
 *r1 = tci_read_r(tb_ptr);
 *i2 = tci_read_s32(tb_ptr);
+
+check_size(start, tb_ptr);
 }
 
 static void tci_args_rrcl(const uint8_t **tb_ptr,
   TCGReg *r0, TCGReg *r1, TCGCond *c2, void **l3)
 {
+const uint8_t *start = *tb_ptr;
+
 *r0 = tci_read_r(tb_ptr);
 *r1 = tci_read_r(tb_ptr);
 *c2 = tci_read_b(tb_ptr);
 *l3 = (void *)tci_read_label(tb_ptr);
+
+check_size(start, tb_ptr);
 }
 
 static void tci_args_rrrc(const uint8_t **tb_ptr,
   TCGReg *r0, TCGReg *r1, TCGReg *r2, TCGCond *c3)
 {
+const uint8_t *start = *tb_ptr;
+
 *r0 = tci_read_r(tb_ptr);
 *r1 = tci_read_r(tb_ptr);
 *r2 = tci_read_r(tb_ptr);
 *c3 = tci_read_b(tb_ptr);
+
+check_size(start, tb_ptr);
 }
 
 static void tci_args_rrrm(const uint8_t **tb_ptr,
   TCGReg *r0, TCGReg *r1, TCGReg *r2, TCGMemOpIdx *m3)
 {
+const uint8_t *start = *tb_ptr;
+
 *r0 = tci_read_r(tb_ptr);
 *r1 = tci_read_r(tb_ptr);
 *r2 = tci_read_r(tb_ptr);
 *m3 = tci_read_i32(tb_ptr);
+
+check_size(start, tb_ptr);
 }
 
 static void tci_args_rrrbb(const uint8_t **tb_ptr, TCGReg *r0, TCGReg *r1,
TCGReg *r2, uint8_t *i3, uint8_t *i4)
 {
+const uint8_t *start = *tb_ptr;
+
 *r0 = tci_read_r(tb_ptr);
 *r1 = tci_read_r(tb_ptr);
 *r2 = tci_read_r(tb_ptr);
 *i3 = tci_read_b(tb_ptr);
 *i4 = tci_read_b(tb_ptr);
+
+check_size(start, tb_ptr);
 }
 
 static void tci_args_rrrrm(const uint8_t **tb_ptr, TCGReg *r0, TCGReg *r1,
TCGReg *r2, TCGReg *r3, TCGMemOpIdx *m4)
 {
+const uint8_t *start = *tb_ptr;
+
 *r0 = tci_read_r(tb_ptr);
 *r1 = tci_read_r(tb_ptr);
 *r2 = tci_read_r(tb_ptr);
 *r3 = tci_read_r(tb_ptr);
 *m4 = tci_read_i32(tb_ptr);
+
+check_size(start, tb_ptr);
 }
 
 #if TCG_TARGET_REG_BITS == 32
 static void tci_args_rrrr(const uint8_t **tb_ptr,
   TCGReg *r0, TCGReg *r1, TCGReg *r2, TCGReg *r3)
 {
+const uint8_t *start = *tb_ptr;
+
 *r0 = tci_read_r(tb_ptr);
 *r1 = tci_read_r(tb_ptr);
 *r2 = tci_read_r(tb_ptr);
 *r3 = tci_read_r(tb_ptr);
+
+check_size(start, tb_ptr);
 }
 
 static void tci_args_rrrrcl(const ui
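
Note on check_size in 34/70: the helper relies on every instruction starting with an opcode byte followed by a total-size byte, so the operand decoder can step back two bytes from where it started and confirm it consumed exactly the encoded length. A minimal standalone sketch under that layout assumption follows; size_matches and decode_rr are hypothetical names, and the sketch returns a bool where the patch asserts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Return true if the operands ended exactly where the size byte says. */
static bool size_matches(const uint8_t *start, const uint8_t *end)
{
    const uint8_t *insn = start - 2;   /* opcode byte, then size byte */
    uint8_t op_size = insn[1];

    return end == insn + op_size;
}

/* Decode two one-byte register operands and validate the length. */
static bool decode_rr(const uint8_t **pc, uint8_t *r0, uint8_t *r1)
{
    const uint8_t *start = *pc;

    *r0 = *(*pc)++;
    *r1 = *(*pc)++;
    return size_matches(start, *pc);
}
```

Because the check runs inside the decoder, branch opcodes that overwrite the cursor no longer need their own assertion before jumping.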

[PATCH v3 40/70] tcg/tci: Move call-return regs to end of tcg_target_reg_alloc_order

2021-02-07 Thread Richard Henderson
As the only call-clobbered regs for TCI, these should
receive the least priority.

Signed-off-by: Richard Henderson 
---
 tcg/tci/tcg-target.c.inc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index 4dae09deda..53edc50a3b 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -170,8 +170,6 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 }
 
 static const int tcg_target_reg_alloc_order[] = {
-TCG_REG_R0,
-TCG_REG_R1,
 TCG_REG_R2,
 TCG_REG_R3,
 TCG_REG_R4,
@@ -186,6 +184,8 @@ static const int tcg_target_reg_alloc_order[] = {
 TCG_REG_R13,
 TCG_REG_R14,
 TCG_REG_R15,
+TCG_REG_R1,
+TCG_REG_R0,
 };
 
 #if MAX_OPC_PARAM_IARGS != 6
-- 
2.25.1




[PATCH v3 25/70] tcg/tci: Split out tci_args_ri and tci_args_rI

2021-02-07 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/tci.c | 38 ++
 1 file changed, 22 insertions(+), 16 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 1e2f78a9f9..5cc05fa554 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -121,16 +121,6 @@ static int32_t tci_read_s32(const uint8_t **tb_ptr)
 return value;
 }
 
-#if TCG_TARGET_REG_BITS == 64
-/* Read constant (64 bit) from bytecode. */
-static uint64_t tci_read_i64(const uint8_t **tb_ptr)
-{
-uint64_t value = *(const uint64_t *)(*tb_ptr);
-*tb_ptr += sizeof(value);
-return value;
-}
-#endif
-
 /* Read indexed register (native size) from bytecode. */
 static tcg_target_ulong
 tci_read_rval(const tcg_target_ulong *regs, const uint8_t **tb_ptr)
@@ -180,6 +170,8 @@ static tcg_target_ulong tci_read_label(const uint8_t **tb_ptr)
  *   tci_args_<arguments>
  * where arguments is a sequence of
  *
+ *   i = immediate (uint32_t)
+ *   I = immediate (tcg_target_ulong)
  *   r = register
  *   s = signed ldst offset
  */
@@ -196,6 +188,22 @@ static void tci_args_rr(const uint8_t **tb_ptr,
 *r1 = tci_read_r(tb_ptr);
 }
 
+static void tci_args_ri(const uint8_t **tb_ptr,
+TCGReg *r0, tcg_target_ulong *i1)
+{
+*r0 = tci_read_r(tb_ptr);
+*i1 = tci_read_i32(tb_ptr);
+}
+
+#if TCG_TARGET_REG_BITS == 64
+static void tci_args_rI(const uint8_t **tb_ptr,
+TCGReg *r0, tcg_target_ulong *i1)
+{
+*r0 = tci_read_r(tb_ptr);
+*i1 = tci_read_i(tb_ptr);
+}
+#endif
+
 static void tci_args_rrr(const uint8_t **tb_ptr,
  TCGReg *r0, TCGReg *r1, TCGReg *r2)
 {
@@ -498,9 +506,8 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 regs[r0] = regs[r1];
 break;
 case INDEX_op_tci_movi_i32:
-t0 = *tb_ptr++;
-t1 = tci_read_i32(&tb_ptr);
-tci_write_reg(regs, t0, t1);
+tci_args_ri(&tb_ptr, &r0, &t1);
+regs[r0] = t1;
 break;
 
 /* Load/store operations (32 bit). */
@@ -720,9 +727,8 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 #endif
 #if TCG_TARGET_REG_BITS == 64
 case INDEX_op_tci_movi_i64:
-t0 = *tb_ptr++;
-t1 = tci_read_i64(&tb_ptr);
-tci_write_reg(regs, t0, t1);
+tci_args_rI(&tb_ptr, &r0, &t1);
+regs[r0] = t1;
 break;
 
 /* Load/store operations (64 bit). */
-- 
2.25.1




[PATCH v3 33/70] tcg/tci: Split out tci_args_{rrm,rrrm,rrrrm}

2021-02-07 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/tci.c | 147 ++
 1 file changed, 81 insertions(+), 66 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index ddc138359b..a1846825ea 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -66,22 +66,18 @@ tci_write_reg(tcg_target_ulong *regs, TCGReg index, tcg_target_ulong value)
 regs[index] = value;
 }
 
-#if TCG_TARGET_REG_BITS == 32
 static void tci_write_reg64(tcg_target_ulong *regs, uint32_t high_index,
 uint32_t low_index, uint64_t value)
 {
 tci_write_reg(regs, low_index, value);
 tci_write_reg(regs, high_index, value >> 32);
 }
-#endif
 
-#if TCG_TARGET_REG_BITS == 32
 /* Create a 64 bit value from two 32 bit values. */
 static uint64_t tci_uint64(uint32_t high, uint32_t low)
 {
 return ((uint64_t)high << 32) + low;
 }
-#endif
 
 /* Read constant byte from bytecode. */
 static uint8_t tci_read_b(const uint8_t **tb_ptr)
@@ -121,43 +117,6 @@ static int32_t tci_read_s32(const uint8_t **tb_ptr)
 return value;
 }
 
-/* Read indexed register (native size) from bytecode. */
-static tcg_target_ulong
-tci_read_rval(const tcg_target_ulong *regs, const uint8_t **tb_ptr)
-{
-tcg_target_ulong value = tci_read_reg(regs, **tb_ptr);
-*tb_ptr += 1;
-return value;
-}
-
-#if TCG_TARGET_REG_BITS == 32
-/* Read two indexed registers (2 * 32 bit) from bytecode. */
-static uint64_t tci_read_r64(const tcg_target_ulong *regs,
- const uint8_t **tb_ptr)
-{
-uint32_t low = tci_read_rval(regs, tb_ptr);
-return tci_uint64(tci_read_rval(regs, tb_ptr), low);
-}
-#elif TCG_TARGET_REG_BITS == 64
-/* Read indexed register (64 bit) from bytecode. */
-static uint64_t tci_read_r64(const tcg_target_ulong *regs,
- const uint8_t **tb_ptr)
-{
-return tci_read_rval(regs, tb_ptr);
-}
-#endif
-
-/* Read indexed register(s) with target address from bytecode. */
-static target_ulong
-tci_read_ulong(const tcg_target_ulong *regs, const uint8_t **tb_ptr)
-{
-target_ulong taddr = tci_read_rval(regs, tb_ptr);
-#if TARGET_LONG_BITS > TCG_TARGET_REG_BITS
-taddr += (uint64_t)tci_read_rval(regs, tb_ptr) << 32;
-#endif
-return taddr;
-}
-
 static tcg_target_ulong tci_read_label(const uint8_t **tb_ptr)
 {
 return tci_read_i(tb_ptr);
@@ -171,6 +130,7 @@ static tcg_target_ulong tci_read_label(const uint8_t **tb_ptr)
  *   b = immediate (bit position)
  *   i = immediate (uint32_t)
  *   I = immediate (tcg_target_ulong)
+ *   m = immediate (TCGMemOpIdx)
  *   r = register
  *   s = signed ldst offset
  */
@@ -203,6 +163,14 @@ static void tci_args_rI(const uint8_t **tb_ptr,
 }
 #endif
 
+static void tci_args_rrm(const uint8_t **tb_ptr,
+ TCGReg *r0, TCGReg *r1, TCGMemOpIdx *m2)
+{
+*r0 = tci_read_r(tb_ptr);
+*r1 = tci_read_r(tb_ptr);
+*m2 = tci_read_i32(tb_ptr);
+}
+
 static void tci_args_rrr(const uint8_t **tb_ptr,
  TCGReg *r0, TCGReg *r1, TCGReg *r2)
 {
@@ -237,6 +205,15 @@ static void tci_args_rrrc(const uint8_t **tb_ptr,
 *c3 = tci_read_b(tb_ptr);
 }
 
+static void tci_args_rrrm(const uint8_t **tb_ptr,
+  TCGReg *r0, TCGReg *r1, TCGReg *r2, TCGMemOpIdx *m3)
+{
+*r0 = tci_read_r(tb_ptr);
+*r1 = tci_read_r(tb_ptr);
+*r2 = tci_read_r(tb_ptr);
+*m3 = tci_read_i32(tb_ptr);
+}
+
 static void tci_args_rrrbb(const uint8_t **tb_ptr, TCGReg *r0, TCGReg *r1,
TCGReg *r2, uint8_t *i3, uint8_t *i4)
 {
@@ -247,6 +224,16 @@ static void tci_args_rrrbb(const uint8_t **tb_ptr, TCGReg *r0, TCGReg *r1,
 *i4 = tci_read_b(tb_ptr);
 }
 
+static void tci_args_rrrrm(const uint8_t **tb_ptr, TCGReg *r0, TCGReg *r1,
+   TCGReg *r2, TCGReg *r3, TCGMemOpIdx *m4)
+{
+*r0 = tci_read_r(tb_ptr);
+*r1 = tci_read_r(tb_ptr);
+*r2 = tci_read_r(tb_ptr);
+*r3 = tci_read_r(tb_ptr);
+*m4 = tci_read_i32(tb_ptr);
+}
+
 #if TCG_TARGET_REG_BITS == 32
 static void tci_args_rrrr(const uint8_t **tb_ptr,
   TCGReg *r0, TCGReg *r1, TCGReg *r2, TCGReg *r3)
@@ -457,8 +444,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 uint8_t op_size = tb_ptr[1];
 const uint8_t *old_code_ptr = tb_ptr;
 #endif
-TCGReg r0, r1, r2;
-tcg_target_ulong t0;
+TCGReg r0, r1, r2, r3;
 tcg_target_ulong t1;
 TCGCond condition;
 target_ulong taddr;
@@ -466,7 +452,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 uint32_t tmp32;
 uint64_t tmp64;
 #if TCG_TARGET_REG_BITS == 32
-TCGReg r3, r4, r5;
+TCGReg r4, r5;
 uint64_t T1, T2;
 #endif
 TCGMemOpIdx oi;
@@ -853,9 +839,13 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 continue;
 
 case INDEX_op_qemu_ld_i32:
-t0 = *tb_ptr++;
-taddr = tci_read_ulong(r

[PATCH v3 35/70] tcg/tci: Remove tci_disas

2021-02-07 Thread Richard Henderson
This function is unused.  It's not even the disassembler,
which is print_insn_tci, located in disas/tci.c.

Signed-off-by: Richard Henderson 
---
 tcg/tci/tcg-target.h |  2 --
 tcg/tci/tcg-target.c.inc | 10 --
 2 files changed, 12 deletions(-)

diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
index 9285c930a2..52af6d8bc5 100644
--- a/tcg/tci/tcg-target.h
+++ b/tcg/tci/tcg-target.h
@@ -163,8 +163,6 @@ typedef enum {
 #define TCG_TARGET_CALL_STACK_OFFSET0
 #define TCG_TARGET_STACK_ALIGN  16
 
-void tci_disas(uint8_t opc);
-
 #define HAVE_TCG_QEMU_TB_EXEC
 
 /* We could notice __i386__ or __s390x__ and reduce the barriers depending
diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index 6c187a25cc..7fb3b04eaf 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -253,16 +253,6 @@ static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
 return true;
 }
 
-#if defined(CONFIG_DEBUG_TCG_INTERPRETER)
-/* Show current bytecode. Used by tcg interpreter. */
-void tci_disas(uint8_t opc)
-{
-const TCGOpDef *def = &tcg_op_defs[opc];
-fprintf(stderr, "TCG %s %u, %u, %u\n",
-def->name, def->nb_oargs, def->nb_iargs, def->nb_cargs);
-}
-#endif
-
 /* Write value (native size). */
 static void tcg_out_i(TCGContext *s, tcg_target_ulong v)
 {
-- 
2.25.1




[PATCH v3 42/70] tcg/tci: Split out tcg_out_op_rrs

2021-02-07 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/tci/tcg-target.c.inc | 84 +++-
 1 file changed, 39 insertions(+), 45 deletions(-)

diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index 050d514853..707f801099 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -283,32 +283,38 @@ static void stack_bounds_check(TCGReg base, target_long offset)
 }
 }
 
-static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg1,
-   intptr_t arg2)
+static void tcg_out_op_rrs(TCGContext *s, TCGOpcode op,
+   TCGReg r0, TCGReg r1, intptr_t i2)
 {
 uint8_t *old_code_ptr = s->code_ptr;
 
-stack_bounds_check(arg1, arg2);
-if (type == TCG_TYPE_I32) {
-tcg_out_op_t(s, INDEX_op_ld_i32);
-tcg_out_r(s, ret);
-tcg_out_r(s, arg1);
-tcg_out32(s, arg2);
-} else {
-tcg_debug_assert(type == TCG_TYPE_I64);
-#if TCG_TARGET_REG_BITS == 64
-tcg_out_op_t(s, INDEX_op_ld_i64);
-tcg_out_r(s, ret);
-tcg_out_r(s, arg1);
-tcg_debug_assert(arg2 == (int32_t)arg2);
-tcg_out32(s, arg2);
-#else
-TODO();
-#endif
-}
+tcg_out_op_t(s, op);
+tcg_out_r(s, r0);
+tcg_out_r(s, r1);
+tcg_debug_assert(i2 == (int32_t)i2);
+tcg_out32(s, i2);
+
 old_code_ptr[1] = s->code_ptr - old_code_ptr;
 }
 
+static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg val, TCGReg base,
+   intptr_t offset)
+{
+stack_bounds_check(base, offset);
+switch (type) {
+case TCG_TYPE_I32:
+tcg_out_op_rrs(s, INDEX_op_ld_i32, val, base, offset);
+break;
+#if TCG_TARGET_REG_BITS == 64
+case TCG_TYPE_I64:
+tcg_out_op_rrs(s, INDEX_op_ld_i64, val, base, offset);
+break;
+#endif
+default:
+g_assert_not_reached();
+}
+}
+
 static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
 {
 uint8_t *old_code_ptr = s->code_ptr;
@@ -444,12 +450,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
 CASE_64(st32)
 CASE_64(st)
 stack_bounds_check(args[1], args[2]);
-tcg_out_op_t(s, opc);
-tcg_out_r(s, args[0]);
-tcg_out_r(s, args[1]);
-tcg_debug_assert(args[2] == (int32_t)args[2]);
-tcg_out32(s, args[2]);
-old_code_ptr[1] = s->code_ptr - old_code_ptr;
+tcg_out_op_rrs(s, opc, args[0], args[1], args[2]);
 break;
 
 CASE_32_64(add)
@@ -597,29 +598,22 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
 }
 }
 
-static void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg, TCGReg arg1,
-   intptr_t arg2)
+static void tcg_out_st(TCGContext *s, TCGType type, TCGReg val, TCGReg base,
+   intptr_t offset)
 {
-uint8_t *old_code_ptr = s->code_ptr;
-
-stack_bounds_check(arg1, arg2);
-if (type == TCG_TYPE_I32) {
-tcg_out_op_t(s, INDEX_op_st_i32);
-tcg_out_r(s, arg);
-tcg_out_r(s, arg1);
-tcg_out32(s, arg2);
-} else {
-tcg_debug_assert(type == TCG_TYPE_I64);
+stack_bounds_check(base, offset);
+switch (type) {
+case TCG_TYPE_I32:
+tcg_out_op_rrs(s, INDEX_op_st_i32, val, base, offset);
+break;
 #if TCG_TARGET_REG_BITS == 64
-tcg_out_op_t(s, INDEX_op_st_i64);
-tcg_out_r(s, arg);
-tcg_out_r(s, arg1);
-tcg_out32(s, arg2);
-#else
-TODO();
+case TCG_TYPE_I64:
+tcg_out_op_rrs(s, INDEX_op_st_i64, val, base, offset);
+break;
 #endif
+default:
+g_assert_not_reached();
 }
-old_code_ptr[1] = s->code_ptr - old_code_ptr;
 }
 
 static inline bool tcg_out_sti(TCGContext *s, TCGType type, TCGArg val,
-- 
2.25.1
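
For readers following the encoding, here is a minimal sketch of the instruction layout that tcg_out_op_rrs appears to produce: opcode byte, length byte (patched once the operands are written), two register bytes, and a 32-bit signed offset. Everything prefixed sketch_ is hypothetical and only stands in for TCGContext, tcg_out8/tcg_out32, and the interpreter-side readers. One-byte register indexes and an opcode-then-length header are assumptions inferred from the surrounding diffs, not taken from tcg_out_op_t itself.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for TCGContext: just a write cursor. */
typedef struct {
    uint8_t *code_ptr;
} SketchContext;

static void sketch_out8(SketchContext *s, uint8_t v)
{
    *s->code_ptr++ = v;
}

static void sketch_out32(SketchContext *s, uint32_t v)
{
    memcpy(s->code_ptr, &v, sizeof(v));   /* host byte order */
    s->code_ptr += sizeof(v);
}

/* Emit: opcode, length placeholder, r0, r1, signed 32-bit offset. */
static void sketch_out_op_rrs(SketchContext *s, uint8_t op,
                              uint8_t r0, uint8_t r1, intptr_t i2)
{
    uint8_t *old_code_ptr = s->code_ptr;

    sketch_out8(s, op);
    sketch_out8(s, 0);                    /* length, patched below */
    sketch_out8(s, r0);
    sketch_out8(s, r1);
    assert(i2 == (int32_t)i2);            /* offset must fit in 32 bits */
    sketch_out32(s, (uint32_t)i2);

    /* Total instruction length goes into the second byte. */
    old_code_ptr[1] = (uint8_t)(s->code_ptr - old_code_ptr);
}

/* Decode the operands again, advancing *tb_ptr past them. */
static void sketch_args_rrs(const uint8_t **tb_ptr,
                            uint8_t *r0, uint8_t *r1, int32_t *i2)
{
    *r0 = *(*tb_ptr)++;
    *r1 = *(*tb_ptr)++;
    memcpy(i2, *tb_ptr, sizeof(*i2));
    *tb_ptr += sizeof(*i2);
}
```

Under these assumptions every rrs instruction is 8 bytes long, and the single range assertion inside the helper covers the ld, st and generic load/store paths that previously each carried their own copy (or, for tcg_out_st, none at all).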




[PATCH v3 31/70] tcg/tci: Clean up deposit operations

2021-02-07 Thread Richard Henderson
Use the correct set of asserts during code generation.
We do not require the first input to overlap the output;
the existing interpreter already supported that.

Split out tci_args_rrrbb in the translator.
Use the deposit32/64 functions rather than inline expansion.

Signed-off-by: Richard Henderson 
---
 tcg/tci/tcg-target-con-set.h |  1 -
 tcg/tci.c| 33 -
 tcg/tci/tcg-target.c.inc | 24 ++--
 3 files changed, 30 insertions(+), 28 deletions(-)

diff --git a/tcg/tci/tcg-target-con-set.h b/tcg/tci/tcg-target-con-set.h
index f51b7bcb13..316730f32c 100644
--- a/tcg/tci/tcg-target-con-set.h
+++ b/tcg/tci/tcg-target-con-set.h
@@ -13,7 +13,6 @@ C_O0_I2(r, r)
 C_O0_I3(r, r, r)
 C_O0_I4(r, r, r, r)
 C_O1_I1(r, r)
-C_O1_I2(r, 0, r)
 C_O1_I2(r, r, r)
 C_O1_I4(r, r, r, r, r)
 C_O2_I1(r, r, r)
diff --git a/tcg/tci.c b/tcg/tci.c
index cb24295cd9..e10ccfc344 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -168,6 +168,7 @@ static tcg_target_ulong tci_read_label(const uint8_t **tb_ptr)
  *   tci_args_<arguments>
  * where arguments is a sequence of
  *
+ *   b = immediate (bit position)
  *   i = immediate (uint32_t)
  *   I = immediate (tcg_target_ulong)
  *   r = register
@@ -236,6 +237,16 @@ static void tci_args_rrrc(const uint8_t **tb_ptr,
 *c3 = tci_read_b(tb_ptr);
 }
 
+static void tci_args_rrrbb(const uint8_t **tb_ptr, TCGReg *r0, TCGReg *r1,
+   TCGReg *r2, uint8_t *i3, uint8_t *i4)
+{
+*r0 = tci_read_r(tb_ptr);
+*r1 = tci_read_r(tb_ptr);
+*r2 = tci_read_r(tb_ptr);
+*i3 = tci_read_b(tb_ptr);
+*i4 = tci_read_b(tb_ptr);
+}
+
 #if TCG_TARGET_REG_BITS == 32
 static void tci_args_rrrr(const uint8_t **tb_ptr,
   TCGReg *r0, TCGReg *r1, TCGReg *r2, TCGReg *r3)
@@ -449,11 +460,9 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 TCGReg r0, r1, r2;
 tcg_target_ulong t0;
 tcg_target_ulong t1;
-tcg_target_ulong t2;
 TCGCond condition;
 target_ulong taddr;
-uint8_t tmp8;
-uint16_t tmp16;
+uint8_t pos, len;
 uint32_t tmp32;
 uint64_t tmp64;
 #if TCG_TARGET_REG_BITS == 32
@@ -644,13 +653,8 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 #endif
 #if TCG_TARGET_HAS_deposit_i32
 case INDEX_op_deposit_i32:
-t0 = *tb_ptr++;
-t1 = tci_read_rval(regs, &tb_ptr);
-t2 = tci_read_rval(regs, &tb_ptr);
-tmp16 = *tb_ptr++;
-tmp8 = *tb_ptr++;
-tmp32 = (((1 << tmp8) - 1) << tmp16);
-tci_write_reg(regs, t0, (t1 & ~tmp32) | ((t2 << tmp16) & tmp32));
+tci_args_rrrbb(&tb_ptr, &r0, &r1, &r2, &pos, &len);
+regs[r0] = deposit32(regs[r1], pos, len, regs[r2]);
 break;
 #endif
 case INDEX_op_brcond_i32:
@@ -806,13 +810,8 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 #endif
 #if TCG_TARGET_HAS_deposit_i64
 case INDEX_op_deposit_i64:
-t0 = *tb_ptr++;
-t1 = tci_read_rval(regs, &tb_ptr);
-t2 = tci_read_rval(regs, &tb_ptr);
-tmp16 = *tb_ptr++;
-tmp8 = *tb_ptr++;
-tmp64 = (((1ULL << tmp8) - 1) << tmp16);
-tci_write_reg(regs, t0, (t1 & ~tmp64) | ((t2 << tmp16) & tmp64));
+tci_args_rrrbb(&tb_ptr, &r0, &r1, &r2, &pos, &len);
+regs[r0] = deposit64(regs[r1], pos, len, regs[r2]);
 break;
 #endif
 case INDEX_op_brcond_i64:
diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index 2c64b4f617..640407b4a8 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -126,11 +126,9 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 case INDEX_op_rotr_i64:
 case INDEX_op_setcond_i32:
 case INDEX_op_setcond_i64:
-return C_O1_I2(r, r, r);
-
 case INDEX_op_deposit_i32:
 case INDEX_op_deposit_i64:
-return C_O1_I2(r, 0, r);
+return C_O1_I2(r, r, r);
 
 case INDEX_op_brcond_i32:
 case INDEX_op_brcond_i64:
@@ -480,13 +478,19 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
 break;
 
 CASE_32_64(deposit)  /* Optional (TCG_TARGET_HAS_deposit_*). */
-tcg_out_r(s, args[0]);
-tcg_out_r(s, args[1]);
-tcg_out_r(s, args[2]);
-tcg_debug_assert(args[3] <= UINT8_MAX);
-tcg_out8(s, args[3]);
-tcg_debug_assert(args[4] <= UINT8_MAX);
-tcg_out8(s, args[4]);
+{
+TCGArg pos = args[3], len = args[4];
+TCGArg max = opc == INDEX_op_deposit_i32 ? 32 : 64;
+
+tcg_debug_assert(pos < max);
+tcg_debug_assert(pos + len <= max);
+
+tcg_out_r(s, args[0]);
+tcg_out_r(s, args[1]);
+tcg_out_r(s, args[2]);
+tcg_out8(s, pos);
+tcg_out8(s, len);
+}
 break;
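
As a reading aid, the following sketch shows the deposit semantics the interpreter now delegates to deposit32: replace len bits of the first input, starting at bit pos, with the low bits of the second input. The sketch_ names are hypothetical. The mask formula is my understanding of how QEMU's bitops helper behaves, not a copy of it, and the second function restates the inline expansion removed above (using 1u rather than 1 to keep the shift unsigned).

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the deposit semantics; requires len >= 1 and pos + len <= 32. */
static uint32_t sketch_deposit32(uint32_t value, int pos, int len,
                                 uint32_t field)
{
    uint32_t mask;

    assert(pos >= 0 && len > 0 && len <= 32 - pos);
    mask = (UINT32_MAX >> (32 - len)) << pos;
    return (value & ~mask) | ((field << pos) & mask);
}

/* The inline expansion removed by the patch, for comparison (len < 32 only). */
static uint32_t sketch_old_deposit32(uint32_t t1, uint32_t t2,
                                     uint8_t pos, uint8_t len)
{
    uint32_t mask = ((1u << len) - 1) << pos;

    return (t1 & ~mask) | ((t2 << pos) & mask);
}
```

The two forms agree whenever len is below 32. For len == 32 the old mask computation shifts by the full type width, which C leaves undefined, while the shift-right form handles it. That is an observation about the sketch, not something the commit message claims.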
 
 

[PATCH v3 36/70] tcg/tci: Implement the disassembler properly

2021-02-07 Thread Richard Henderson
Actually print arguments as opposed to simply the opcodes
and, uselessly, the argument counts.  Reuse all of the helpers
developed as part of the interpreter.

Signed-off-by: Richard Henderson 
---
 meson.build   |   2 +-
 include/tcg/tcg-opc.h |   2 -
 disas/tci.c   |  61 -
 tcg/tci.c | 283 ++
 4 files changed, 284 insertions(+), 64 deletions(-)
 delete mode 100644 disas/tci.c

diff --git a/meson.build b/meson.build
index 2d8b433ff0..475d8a94ea 100644
--- a/meson.build
+++ b/meson.build
@@ -1901,7 +1901,7 @@ specific_ss.add(when: 'CONFIG_TCG', if_true: files(
   'tcg/tcg-op.c',
   'tcg/tcg.c',
 ))
-specific_ss.add(when: 'CONFIG_TCG_INTERPRETER', if_true: files('disas/tci.c', 'tcg/tci.c'))
+specific_ss.add(when: 'CONFIG_TCG_INTERPRETER', if_true: files('tcg/tci.c'))
 
 subdir('backends')
 subdir('disas')
diff --git a/include/tcg/tcg-opc.h b/include/tcg/tcg-opc.h
index 900984c005..bbb0884af8 100644
--- a/include/tcg/tcg-opc.h
+++ b/include/tcg/tcg-opc.h
@@ -278,10 +278,8 @@ DEF(last_generic, 0, 0, 0, TCG_OPF_NOT_PRESENT)
 #ifdef TCG_TARGET_INTERPRETER
 /* These opcodes are only for use between the tci generator and interpreter. */
 DEF(tci_movi_i32, 1, 0, 1, TCG_OPF_NOT_PRESENT)
-#if TCG_TARGET_REG_BITS == 64
 DEF(tci_movi_i64, 1, 0, 1, TCG_OPF_64BIT | TCG_OPF_NOT_PRESENT)
 #endif
-#endif
 
 #undef TLADDR_ARGS
 #undef DATA64_ARGS
diff --git a/disas/tci.c b/disas/tci.c
deleted file mode 100644
index f1d6c6b469..00
--- a/disas/tci.c
+++ /dev/null
@@ -1,61 +0,0 @@
-/*
- * Tiny Code Interpreter for QEMU - disassembler
- *
- * Copyright (c) 2011 Stefan Weil
- *
- * This program is free software: you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation, either version 2 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program.  If not, see <http://www.gnu.org/licenses/>.
- */
-
-#include "qemu/osdep.h"
-#include "qemu-common.h"
-#include "disas/dis-asm.h"
-#include "tcg/tcg.h"
-
-/* Disassemble TCI bytecode. */
-int print_insn_tci(bfd_vma addr, disassemble_info *info)
-{
-int length;
-uint8_t byte;
-int status;
-TCGOpcode op;
-
-status = info->read_memory_func(addr, &byte, 1, info);
-if (status != 0) {
-info->memory_error_func(status, addr, info);
-return -1;
-}
-op = byte;
-
-addr++;
-status = info->read_memory_func(addr, &byte, 1, info);
-if (status != 0) {
-info->memory_error_func(status, addr, info);
-return -1;
-}
-length = byte;
-
-if (op >= tcg_op_defs_max) {
-info->fprintf_func(info->stream, "illegal opcode %d", op);
-} else {
-const TCGOpDef *def = &tcg_op_defs[op];
-int nb_oargs = def->nb_oargs;
-int nb_iargs = def->nb_iargs;
-int nb_cargs = def->nb_cargs;
-/* TODO: Improve disassembler output. */
-info->fprintf_func(info->stream, "%s\to=%d i=%d c=%d",
-   def->name, nb_oargs, nb_iargs, nb_cargs);
-}
-
-return length;
-}
diff --git a/tcg/tci.c b/tcg/tci.c
index 3dc89ed829..6843e837ae 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -1076,3 +1076,286 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 }
 }
 }
+
+/*
+ * Disassembler that matches the interpreter
+ */
+
+static const char *str_r(TCGReg r)
+{
+static const char regs[TCG_TARGET_NB_REGS][4] = {
+"r0", "r1", "r2",  "r3",  "r4",  "r5",  "r6",  "r7",
+"r8", "r9", "r10", "r11", "r12", "r13", "env", "sp"
+};
+
+QEMU_BUILD_BUG_ON(TCG_AREG0 != TCG_REG_R14);
+QEMU_BUILD_BUG_ON(TCG_REG_CALL_STACK != TCG_REG_R15);
+
+assert((unsigned)r < TCG_TARGET_NB_REGS);
+return regs[r];
+}
+
+static const char *str_c(TCGCond c)
+{
+static const char cond[16][8] = {
+[TCG_COND_NEVER] = "never",
+[TCG_COND_ALWAYS] = "always",
+[TCG_COND_EQ] = "eq",
+[TCG_COND_NE] = "ne",
+[TCG_COND_LT] = "lt",
+[TCG_COND_GE] = "ge",
+[TCG_COND_LE] = "le",
+[TCG_COND_GT] = "gt",
+[TCG_COND_LTU] = "ltu",
+[TCG_COND_GEU] = "geu",
+[TCG_COND_LEU] = "leu",
+[TCG_COND_GTU] = "gtu",
+};
+
+assert((unsigned)c < ARRAY_SIZE(cond));
+assert(cond[c][0] != 0);
+return cond[c];
+}
+
+/* Disassemble TCI bytecode. */
+int print_insn_tci(bfd_vma addr, disassemble_info *info)
+{
+uint8_t buf[256];
+int length, status;
+const TCGOpDef *def;
+const char *op_name;
+TCGOpcode op;
+TCGReg r0, r1, r2, r3;
+#if TCG

[PATCH v3 39/70] tcg/tci: Improve tcg_target_call_clobber_regs

2021-02-07 Thread Richard Henderson
The current setting is much too pessimistic.  Indicating only
the one or two registers that are actually assigned after a
call should avoid unnecessary movement between the register
array and the stack array.

Signed-off-by: Richard Henderson 
---
 tcg/tci/tcg-target.c.inc | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index 8d75482546..4dae09deda 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -623,8 +623,14 @@ static void tcg_target_init(TCGContext *s)
 tcg_target_available_regs[TCG_TYPE_I32] = BIT(TCG_TARGET_NB_REGS) - 1;
 /* Registers available for 64 bit operations. */
 tcg_target_available_regs[TCG_TYPE_I64] = BIT(TCG_TARGET_NB_REGS) - 1;
-/* TODO: Which registers should be set here? */
-tcg_target_call_clobber_regs = BIT(TCG_TARGET_NB_REGS) - 1;
+/*
+ * The interpreter "registers" are in the local stack frame and
+ * cannot be clobbered by the called helper functions.  However,
+ * the interpreter assumes a 64-bit return value and assigns to
+ * the return value registers.
+ */
+tcg_target_call_clobber_regs =
+MAKE_64BIT_MASK(TCG_REG_R0, 64 / TCG_TARGET_REG_BITS);
 
 s->reserved_regs = 0;
 tcg_regset_set_reg(s->reserved_regs, TCG_REG_CALL_STACK);
-- 
2.25.1
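
To make the effect of the new mask concrete, here is a small sketch that evaluates it for 64-bit and 32-bit hosts and compares it with the old all-registers value. The macro is modelled on QEMU's MAKE_64BIT_MASK(shift, length). The sketch_ names, R0 being register index 0, and the count of 16 registers are assumptions for illustration (the latter matches the 16-entry name table in the disassembler patch of this series).

```c
#include <assert.h>
#include <stdint.h>

/* Same shape as MAKE_64BIT_MASK(shift, length); length must be 1..64. */
#define SKETCH_MAKE_64BIT_MASK(shift, length) \
    (((~0ULL) >> (64 - (length))) << (shift))

#define SKETCH_REG_R0   0    /* assumption: R0 is register index 0 */
#define SKETCH_NB_REGS  16   /* assumption: 16 interpreter registers */

/* Call-clobber set after the patch, for a host with reg_bits-wide registers. */
static uint64_t sketch_clobber_regs(int reg_bits)
{
    return SKETCH_MAKE_64BIT_MASK(SKETCH_REG_R0, 64 / reg_bits);
}

/* Call-clobber set before the patch: every register. */
static uint64_t sketch_old_clobber_regs(void)
{
    return (1ULL << SKETCH_NB_REGS) - 1;
}
```

With these assumptions a 64-bit host clobbers only R0 (mask 0x1) and a 32-bit host clobbers R0 and R1 (mask 0x3), against 0xffff before, so the register allocator no longer has to spill the other registers around each helper call.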



