On Tue, Oct 1, 2013 at 3:43 AM, Ian Romanick <[email protected]> wrote:
> On 09/30/2013 05:47 PM, Roland Mainz wrote:
>> On Tue, Oct 1, 2013 at 2:27 AM, Ian Romanick <[email protected]> wrote:
>>> On 09/27/2013 04:46 PM, Kenneth Graunke wrote:
>>>> This was only used for uploading batchbuffer data, and only on 32-bit
>>>> systems. If this is actually useful, we might want to use it more
>>>> widely. But more than likely, it isn't.
>>>
>>> This probably is still useful, alas. The glibc memcpy wants to do an
>>> Atom-friendly backwards walk of the addresses.
>>
>> Erm... just curious: are you sure this is done for Atom? Originally
>> such highest-to-lowest-address copying was done to enable overlapping
>> copies... but POSIX does not require |memcpy()| to support overlapping
>> copies; users should call |memmove()| instead in such cases. Solaris
>> follows the POSIX interpretation here, and AFAIK Apple OSX even hits
>> you with an |abort()| if you attempt an overlapping copy with
>> |memcpy()| (or |strcpy()|). AFAIK "valgrind" will complain about such
>> abuse of |memcpy()|/|strcpy()|/|stpcpy()|, too.
>
> I was pretty sure it was Atom... though looking at the glibc source, the
> backward memcpy is only used on Core i3, i5, and i7 unless
> USE_AS_MEMMOVE is defined. Hmm...
Grrrr... wrong glibc used on wrong CPU? Well... on Solaris that issue
was solved via the loopback filesystem:
-- snip --
$ mount | fgrep libc
/lib/libc.so.1 on /usr/lib/libc/libc_hwcap1.so.1
read/write/setuid/devices/dev=1690002 on Sat Sep 28 17:08:31 2013
-- snip --
(the system starts with a generic libc, figures out which optimised
libc to use, and then mounts the matching libc as a file (!!) over
/lib/libc.so.1). The other solution in this area is to add special CPU
type handling to the linker and runtime linker... the advantage is that
you only need one libc.so.1 binary; the disadvantage is that libc.so.1,
the linker, and the runtime linker must cooperate... and debugging is
far more difficult than using mount(1)+fgrep(1) ... ;-/

>>> For some kinds of mappings (uncached?), this breaks write combining
>>> and ruins performance.
>>
>> That more or less breaks performance _everywhere_ because automatic
>> prefetch obtains the next cache line and not the previous one.
>
> Except your out-of-order CPU is really smart, and, IIRC, that makes it
> usually not break.  I think.

I'm not sure whether out-of-order CPUs can look that deep into the
instruction queue, or whether their queues are really _that_ deep (not
even Sun's "ROCK" processor tried that). AFAIK it's more likely that
some "statistics" bits remember that the loop is going backwards...

... but that's all speculation about crazy CPU architecture. What about
a "configure" probe which checks whether |memcpy()| does or does not
support overlapping copies (see the warning about Apple OSX in my older
posting), and then assumes it's safe to use only if it does _not_
support overlapping copies?

----

Bye,
Roland

-- 
  __ .  . __
 (o.\ \/ /.o) [email protected]
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 3992797
 (;O/ \/ \O;)
_______________________________________________
mesa-dev mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
