On 02/16/2014 03:29 PM, Hoyer, David wrote:
> We are using Qemu-1.7.0 with Xen-4.3.0 and Debian jessie. We are
> noticing that when we transfer large files from our network to the
> guestOS via the e1000 virtual network device that we experience memory
> corruption on the guestOS. We have debugged this problem and have
> determined where it appears that the corruption is happening and have
> created a patch file with a fix (at least the corruption is no longer
> happening on our guestOS anymore). Note that our test file is a
> large file consisting of the value 0x61 repeated over the entire file.
>
> To troubleshoot this issue, we enabled tracing in qemu and used the
> xen_map_cache and xen_map_cache_return trace events. We also added some
> of our own debug statements in e1000.c before and after the function
> call to DMA the network packet to the guestOS descriptor address. Below
> is a commented summary of the trace output:
>
> /*** Check if guestOS address 0xe00000 (which maps to 0x7f15c313f000) is
> corrupted
> xen_map_cache want 0xe00000
> xen_map_cache_return 0x7f15c313f000
> /*** It wasn't corrupted before the dma write
> /*** DMA a packet of length 0x5aa containing '0x61616161...' to guestOS at
> address 0x12ffac2 (which maps to 0x7f15c313eac2)
> dma write to 12ffac2 len 5aa
> xen_map_cache want 0x12ffac2
> xen_map_cache_return 0x7f15c313eac2
> /*** Check if guestOS address 0xe00000 (which maps to 0x7f15c313f000) is
> corrupted
> xen_map_cache want 0xe00000
> xen_map_cache_return 0x7f15c313f000
> /*** It is corrupted now.
> e1000: Corrupted 7: test_buf:5aa 5aa
>
> The DMA address 0x12ffac2 mapped to 0x7f15c313eac2. When you add the
> packet length, 0x5aa, the result is 0x7f15c313f06c. This result is 0x6c
> bytes into the mapping of guestOS address 0xe00000, which mapped to
> 0x7f15c313f000. If you dump 0xe00000 in the guestOS, 0x6c bytes are
> corrupted.
> We believe that the correct fix is to use qemu_ram_ptr_length instead of
> qemu_get_ram_ptr in the function address_space_rw to ensure (from what
> we can tell) that the mapped address is valid for the entire length
> specified. It looked like this might also be an issue in
> cpu_physical_memory_write_rom so we made the change there as well.
>

The corrupted DMA buffer is 0xe00000 -- 0x7f15c313f000.
The e1000 packet is at 0x12ffac2 -- 0x7f15c313eac2.

(0x7f15c313f000 - 0x7f15c313eac2) = 0x53e, which is less than 0x5aa, and
(0x5aa - 0x53e) = 0x6c bytes get corrupted.

I see a buffer overrun from e1000 here, and I suspect that your patch just
hides this problem. What did I miss?
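To make the arithmetic concrete, here is a minimal C sketch. It only redoes the subtraction on the host addresses taken from the trace. The function names bytes_before() and overrun_bytes() are made up for illustration and are not QEMU or Xen APIs, and the sketch says nothing about which layer is actually at fault.

```c
#include <stdint.h>

/* Hypothetical helper (not a QEMU API): number of bytes that fit between
 * the start of a write at host address `dst` and `next`, the host address
 * at which a different guest mapping begins. Assumes dst <= next. */
static uint64_t bytes_before(uint64_t dst, uint64_t next)
{
    return next - dst;
}

/* Hypothetical helper (not a QEMU API): number of bytes of a write of
 * `len` bytes starting at `dst` that land at or beyond `next`.
 * Returns 0 when the write ends at or before `next`. */
static uint64_t overrun_bytes(uint64_t dst, uint64_t len, uint64_t next)
{
    uint64_t room;

    if (dst >= next) {
        return len;              /* the whole write is past `next` */
    }
    room = bytes_before(dst, next);
    return len > room ? len - room : 0;
}

/* Values copied from the trace in the quoted message. */
static const uint64_t TRACE_DMA_HOST  = 0x7f15c313eac2ULL; /* guest 0x12ffac2 */
static const uint64_t TRACE_NEXT_HOST = 0x7f15c313f000ULL; /* guest 0xe00000  */
static const uint64_t TRACE_PKT_LEN   = 0x5aa;
```

Calling overrun_bytes(TRACE_DMA_HOST, TRACE_PKT_LEN, TRACE_NEXT_HOST) from a small main() reproduces the 0x6c figure, and bytes_before(TRACE_DMA_HOST, TRACE_NEXT_HOST) gives the 0x53e bytes that fit before the second mapping starts.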
Does e1000 still work with the patch applied? Are 100% of the packets
delivered fine?

> We are fairly new to the qemu source base so we are looking to the
> community to see if this problem has previously been identified and to
> see if this is the correct fix.
>
> Following is the patch
>
> --- orig/exec.c 2013-11-27 16:52:55.000000000 -0600
> +++ new/exec.c 2014-02-15 21:58:34.311518000 -0600
> @@ -1911,7 +1911,7 @@
>              } else {
>                  addr1 += memory_region_get_ram_addr(mr);
>                  /* RAM case */
> -                ptr = qemu_get_ram_ptr(addr1);
> +                ptr = qemu_ram_ptr_length(addr1, &l);
>                  memcpy(ptr, buf, l);
>                  invalidate_and_set_dirty(addr1, l);
>              }
> @@ -1945,7 +1945,7 @@
>                  }
>              } else {
>                  /* RAM case */
> -                ptr = qemu_get_ram_ptr(mr->ram_addr + addr1);
> +                ptr = qemu_ram_ptr_length(mr->ram_addr + addr1, &l);
>                  memcpy(buf, ptr, l);
>              }
>          }
> @@ -1995,7 +1995,7 @@
>          } else {
>              addr1 += memory_region_get_ram_addr(mr);
>              /* ROM/RAM case */
> -            ptr = qemu_get_ram_ptr(addr1);
> +            ptr = qemu_ram_ptr_length(addr1, &l);
>              memcpy(ptr, buf, l);
>              invalidate_and_set_dirty(addr1, l);
>          }
>
>
> David Hoyer
> Controller Firmware Development
> Array Products Group
>
> NetApp
> 3718 N. Rock Road
> Wichita, KS 67226
> 316-636-8047 phone
> 316-617-3677 mobile
> david.ho...@netapp.com<mailto:david.ho...@lsi.com>
> netapp.com<http://www.netapp.com/?ref_source=eSig>
>

Alexey