On Wed, Jan 22, 2014 at 12:03 PM, Brian Paul <bri...@vmware.com> wrote: > On 01/21/2014 06:37 PM, Ilia Mirkin wrote: >> >> This was discovered as a result of the draw-elements-base-vertex-neg >> piglit test, which passes very negative offsets in, followed up by large >> indices. The nouveau code correctly adjusts the pointer, but the >> transfer code needs to do the proper inverse correction. Similarly fix >> up the SSE code to do a 64-bit multiply to compute the proper offset. >> >> Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu> >> --- >> >> With this change, nouveau passes for the draw-elements-base-vertex-neg >> piglit >> test with user_varrays, on a 64-bit setup both with and without >> GALLIUM_NOSSE=1. I'm pretty sure that the change should be minimal to a >> non-x86 setup since the rexw will be a no-op. I guess there will be an >> extra >> register use for the mov, but it shouldn't be too expensive, esp on >> anything >> remotely current. >> >> src/gallium/auxiliary/translate/translate_generic.c | 2 +- >> src/gallium/auxiliary/translate/translate_sse.c | 8 ++++++-- >> 2 files changed, 7 insertions(+), 3 deletions(-) >> >> diff --git a/src/gallium/auxiliary/translate/translate_generic.c >> b/src/gallium/auxiliary/translate/translate_generic.c >> index 5bf97db..5ffce32 100644 >> --- a/src/gallium/auxiliary/translate/translate_generic.c >> +++ b/src/gallium/auxiliary/translate/translate_generic.c >> @@ -638,7 +638,7 @@ static ALWAYS_INLINE void PIPE_CDECL generic_run_one( >> struct translate_generic * >> } >> >> src = tg->attrib[attr].input_ptr + >> - tg->attrib[attr].input_stride * index; >> + (ptrdiff_t)tg->attrib[attr].input_stride * index; >> >> copy_size = tg->attrib[attr].copy_size; >> if(likely(copy_size >= 0)) >> diff --git a/src/gallium/auxiliary/translate/translate_sse.c >> b/src/gallium/auxiliary/translate/translate_sse.c >> index a78ea91..a72454a 100644 >> --- a/src/gallium/auxiliary/translate/translate_sse.c >> +++ b/src/gallium/auxiliary/translate/translate_sse.c >> @@ -1121,7 +1121,9 @@ static boolean init_inputs( struct translate_sse *p, >> x86_cmovcc(p->func, tmp_EAX, buf_max_index, cc_AE); >> } >> >> - x86_imul(p->func, tmp_EAX, buf_stride); >> + x86_mov(p->func, p->tmp2_EDX, buf_stride); >> + x64_rexw(p->func); >> + x86_imul(p->func, tmp_EAX, p->tmp2_EDX); >> x64_rexw(p->func); >> x86_add(p->func, tmp_EAX, buf_base_ptr); >> >> @@ -1207,7 +1209,9 @@ static struct x86_reg get_buffer_ptr( struct >> translate_sse *p, >> x86_cmp(p->func, ptr, buf_max_index); >> x86_cmovcc(p->func, ptr, buf_max_index, cc_AE); >> >> - x86_imul(p->func, ptr, buf_stride); >> + x86_mov(p->func, p->tmp2_EDX, buf_stride); >> + x64_rexw(p->func); >> + x86_imul(p->func, ptr, p->tmp2_EDX); >> x64_rexw(p->func); >> x86_add(p->func, ptr, buf_base_ptr); >> return ptr; >> > > I'm no x86 expert, but this looks OK to me, and if it works for you...
I'm no expert either, but perhaps this will put your mind at ease (as it did mine): void *func(void *a, int b, int c) { return a + (ptrdiff_t)b * c; } gets compiled into, by gcc -O2 -S, movslq %edx, %rdx movslq %esi, %rsi imulq %rdx, %rsi leaq (%rdi,%rsi), %rax Which, to be honest, is pretty clever, using leaq that way. But it's basically an add (but if it were, e.g., int *, it would be leaq (%rdi,%rsi,2), %rax saving on a shl). (Note that this is at&t/gas syntax, so all backwards.) The new code generated with the modifications I made is: 0000000000000020 8b9770040000 mov edx, [rdi+0x470] 0000000000000026 480fafca imul rcx, rdx 000000000000002a 48038f68040000 add rcx, [rdi+0x468] So the difference is that I'm first moving the value into edx, rather than using the [rdi+bla] directly in the imul, and then making the imul use 64-bit registers (by using the REX.W prefix). By moving the 32-bit value into edx, the upper rdx bits are cleared. The only potential problem is that I'm using EDX where it wasn't used before, but looking at the call-site of get_buffer_ptr, that seems like it should be OK. > > Reviewed-by: Brian Paul <bri...@vmware.com> Thanks! Would you be able to commit this as well? I noticed that I messed up the patch subject and a word in the description... s/rtasm/translate, s/transfer/translate. You can see these fixed at https://github.com/imirkin/mesa/commit/3bc3081b2106ab0544adc483768ab2a1321565ff > > Should this go into the 10.0.x branch too? Ehh... I don't know if it qualifies. The only legit way that this can happen (at least for nouveau) is if an application allocates a >4GB VBO. Seems unlikely. (vs the non-0 chance that I messed something up in there.) Up to you though. -ilia _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev