From: Mikulas Patocka
> Sent: 03 August 2018 13:05
...
> > Even on x86 using memcpy() on PCIe memory (maybe mmap()ed into userspace)
> > isn't a good idea.
> > In the kernel memcpy_to/fromio() ought to be a better choice but that
> > is just an alternate name for memcpy().
> >
> > The problem on x86 is that memcpy() is likely to be implemented as
> > 'rep movsb' on modern cpu - relying on the cpu hardware to perform
> > cache-line sized transfers (etc).
> > Unfortunately on uncached locations it has to revert to byte copies.
> > So PCIe transfers (especially reads) are very slow.
> >
> > The transfers need to use the largest size register available.
> >
> >	David
>
> On x86, the framebuffer is mapped as write-combining memory type, so "rep
> movsb" could merge the byte writes to larger chunks. I don't have a cpu
> with the ERMS feature - could anyone try it if rep movsb works worse or
> better than explicit writes to the framebuffer?

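For reference, a rough sketch of what "use the largest size register
available" could look like for reads. This is only an illustration:
copy_fromio_wide() is a made-up name, not a kernel function, and it
assumes a 64-bit load is the widest access worth issuing. It has not
been measured against 'rep movsb' on real PCIe hardware.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not a kernel API): copy 'len' bytes out of an
 * I/O-style region using 64-bit loads wherever the source is aligned.
 * The volatile accesses stop the compiler from turning the loops back
 * into a memcpy() call on the source side.
 */
static void copy_fromio_wide(void *dst, const volatile void *src, size_t len)
{
	unsigned char *d = dst;
	const volatile unsigned char *s = src;

	/* Leading bytes, until the source address is 8-byte aligned. */
	while (len && ((uintptr_t)s & 7)) {
		*d++ = *s++;
		len--;
	}

	/* Main loop: one 64-bit read from the source per iteration. */
	while (len >= 8) {
		uint64_t v = *(const volatile uint64_t *)s;

		/* dst is ordinary memory and may be unaligned. */
		memcpy(d, &v, sizeof(v));
		d += 8;
		s += 8;
		len -= 8;
	}

	/* Trailing bytes. */
	while (len--)
		*d++ = *s++;
}
```

Only the source side is treated as device memory here; a write-side
variant would mirror the loops with 64-bit stores.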
I don't think 'write combining' can help reads, and memcpy_to/fromio()
are likely to be used for normal memory mapped io areas.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)