On Fri, 2020-10-30 at 05:01 -0400, Michael S. Tsirkin wrote: > On Wed, Sep 30, 2020 at 10:32:42AM +0530, P J P wrote: > > > > [+Paolo, +Fam Zheng - for scsi] > > > > +-- On Mon, 28 Sep 2020, P J P wrote --+ > > > +-- On Wed, 16 Sep 2020, Peter Maydell wrote --+ > > > > On Wed, 16 Sep 2020 at 07:28, P J P <ppan...@redhat.com> wrote: > > > > > -> > > > > > https://ruhr-uni-bochum.sciebo.de/s/NNWP2GfwzYKeKwE?path=%2Flsi_nullptr1 > > > > > ==1183858==Hint: address points to the zero page. > > > > > #0 pci_change_irq_level hw/pci/pci.c:259 > > > > > #1 pci_irq_handler hw/pci/pci.c:1445 > > > > > #2 pci_set_irq hw/pci/pci.c:1463 > > > > > #3 lsi_set_irq hw/scsi/lsi53c895a.c:488 > > > > > #4 lsi_update_irq hw/scsi/lsi53c895a.c:523 > > > > > #5 lsi_script_scsi_interrupt hw/scsi/lsi53c895a.c:554 > > > > > #6 lsi_execute_script hw/scsi/lsi53c895a.c:1149 > > > > > #7 lsi_reg_writeb hw/scsi/lsi53c895a.c:1984 > > > > > #8 lsi_io_write hw/scsi/lsi53c895a.c:2146 > > > > > > ... > > > > Generally we don't bother to assert() that pointers that > > > > shouldn't be NULL > > > > really are NULL immediately before dereferencing them, because > > > > the > > > > dereference provides an equally easy-to-debug crash to the > > > > assert, and so > > > > the assert doesn't provide anything extra. assert()ing that a > > > > pointer is > > > > non-NULL is more useful if it is done in a place that > > > > identifies the problem > > > > at an earlier and easier-to-debug point in execution rather > > > > than at a later > > > > point which is distantly removed from the place where the bogus > > > > pointer was > > > > introduced. > > > > > > * The NULL dereference above occurs because the 'pci_dev->qdev- > > > >parent_bus' > > > address gets overwritten (with 0x0) during scsi 'Memory Move' > > > operation in > > > > > > ../hw/scsi/lsi53c895a.c > > > #define LSI_BUF_SIZE 4096 > > > > > > lsi_mmio_write > > > lsi_reg_writeb > > > lsi_execute_script > > > static void lsi_memcpy(LSIState *s, ... int count=12MB) > > > { > > > int n; > > > uint8_t buf[LSI_BUF_SIZE]; > > > > > > while (count) { > > > n = (count > LSI_BUF_SIZE) ? LSI_BUF_SIZE : count; > > > lsi_mem_read(s, src, buf, n); <== read from DMA > > > memory > > > lsi_mem_write(s, dest, buf, n); <== write to I/O > > > memory > > > src += n; > > > dest += n; > > > count -= n; > > > } > > > } > > > -> > > > https://www.manualslib.com/manual/1407578/Lsi-Lsi53c895a.html?page=254#manual > > > > > > * Above loop moves data between DMA memory to i/o address space. > > > > > > * Going through the manual above, it seems 'Memory Move' can move > > > upto 16MB of > > > data between memory spaces. > > > > > > * I tried to see a suitable fix, but couldn't get one. > > > > > > - Limiting 'count' value does not seem right, as allowed value > > > is upto 16MB. > > > > > > - Manual above talks about moving data via 'dma_buf'. But it > > > doesn't seem to > > > be used here. > > > > > > * During above loop, 'dest' address moves past its 'MemoryRegion > > > mr' and > > > overwrites the adjacent 'mr' memory area, overwritting > > > 'parent_bus' value.
I agree with Igor, I don't understand how pci_dma_rw writing into the next mr can cause the NULL pointer. parent_bus is in the QEMU heap, but mr is backed by different ram areas, protected with boundary check. Is there a backtrace at the corruption time? Fam