On 01.10.2013 03:57, Roland Mainz wrote:
> On Tue, Oct 1, 2013 at 3:43 AM, Ian Romanick <i...@freedesktop.org> wrote:
>> On 09/30/2013 05:47 PM, Roland Mainz wrote:
>>> On Tue, Oct 1, 2013 at 2:27 AM, Ian Romanick <i...@freedesktop.org> wrote:
>>>> On 09/27/2013 04:46 PM, Kenneth Graunke wrote:
>>>>> This was only used for uploading batchbuffer data, and only on 32-bit
>>>>> systems. If this is actually useful, we might want to use it more
>>>>> widely. But more than likely, it isn't.
>>>>
>>>> This probably is still useful, alas. The glibc memcpy wants to do an
>>>> Atom-friendly backwards walk of the addresses.
>>>
>>> Erm... just curious: are you sure this is done for Atom? Originally
>>> such copy-from-highest-to-lowest-address copying was done to enable
>>> overlapping copies... but POSIX mandates that |memcpy()| is not
>>> required to support overlapping copies and that users should use
>>> |memmove()| instead in such cases (for example, Solaris follows the
>>> POSIX interpretation here... and AFAIK Apple OSX even hits you with
>>> an |abort()| if you attempt an overlapping copy with |memcpy()| (or
>>> |strcpy()|), and AFAIK "valgrind" will complain about such abuse of
>>> |memcpy()|/|strcpy()|/|stpcpy()|, too).
>>
>> I was pretty sure it was Atom... though looking at the glibc source, the
>> backward memcpy is only used on Core i3, i5, and i7 unless
>> USE_AS_MEMMOVE is defined. Hmm...
>
> Grrrr... wrong glibc used on wrong CPU?
>
> Well... on Solaris that issue was solved via the loopback filesystem:
> -- snip --
> $ mount | fgrep libc
> /lib/libc.so.1 on /usr/lib/libc/libc_hwcap1.so.1
> read/write/setuid/devices/dev=1690002 on Sat Sep 28 17:08:31 2013
> -- snip --
> (the system starts with a generic libc, then figures out which
> optimised libc to use, and then mounts the matching libc as a file (!!)
> over /lib/libc.so.1).
> The other solution in this area is to add special CPU-type handling to
> the linker and runtime linker... the advantage is that you only need
> one libc.so.1 binary; the disadvantage is that libc.so.1, the linker,
> and the runtime linker must cooperate... and debugging is far more
> difficult than using mount(1)+fgrep(1)... ;-/
>
>>>> For some kinds of mappings (uncached?), this breaks write combining
>>>> and ruins performance.
>>>
>>> That more or less breaks performance _everywhere_, because automatic
>>> prefetch obtains the next cache line, not the previous one.
>>
>> Except your out-of-order CPU is really smart, and, IIRC, that makes it
>> usually not break. I think.
>
> I'm not sure whether out-of-order CPUs can look that deep into the
> instruction queue, or whether their queue is really _that_ deep (not
> even Sun's "ROCK" processor tried that). AFAIK it's more likely that
> some "statistics" bits remember that the loop is going backwards...
> ... but that's all speculation about crazy CPU architecture.
Yes, as far as I know all (?) modern CPUs can recognize access patterns
in memory accesses, so they can correctly prefetch even non-adjacent
cache lines (I don't know, though, whether the old Atom would qualify,
since it is only partly a modern CPU). I'm not sure, though, why
backward vs. forward would make a difference (I would suspect the
difference is tiny). I don't think going backwards necessarily has to
break write combining either, so that might be specific to some CPUs
as well.

Roland

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev