Re: [Mesa-dev] [PATCH] rtasm: deal with size overflows by casting to ptrdiff_t

Ilia Mirkin Wed, 22 Jan 2014 08:54:46 -0800

On Wed, Jan 22, 2014 at 12:03 PM, Brian Paul <bri...@vmware.com> wrote:
> On 01/21/2014 06:37 PM, Ilia Mirkin wrote:
>>
>> This was discovered as a result of the draw-elements-base-vertex-neg
>> piglit test, which passes very negative offsets in, followed up by large
>> indices. The nouveau code correctly adjusts the pointer, but the
>> transfer code needs to do the proper inverse correction. Similarly fix
>> up the SSE code to do a 64-bit multiply to compute the proper offset.
>>
>> Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
>> ---
>>
>> With this change, nouveau passes for the draw-elements-base-vertex-neg
>> piglit
>> test with user_varrays, on a 64-bit setup both with and without
>> GALLIUM_NOSSE=1. I'm pretty sure that the change should be minimal to a
>> non-x86 setup since the rexw will be a no-op. I guess there will be an
>> extra
>> register use for the mov, but it shouldn't be too expensive, esp on
>> anything
>> remotely current.
>>
>>   src/gallium/auxiliary/translate/translate_generic.c | 2 +-
>>   src/gallium/auxiliary/translate/translate_sse.c     | 8 ++++++--
>>   2 files changed, 7 insertions(+), 3 deletions(-)
>>
>> diff --git a/src/gallium/auxiliary/translate/translate_generic.c
>> b/src/gallium/auxiliary/translate/translate_generic.c
>> index 5bf97db..5ffce32 100644
>> --- a/src/gallium/auxiliary/translate/translate_generic.c
>> +++ b/src/gallium/auxiliary/translate/translate_generic.c
>> @@ -638,7 +638,7 @@ static ALWAYS_INLINE void PIPE_CDECL generic_run_one(
>> struct translate_generic *
>>            }
>>
>>            src = tg->attrib[attr].input_ptr +
>> -               tg->attrib[attr].input_stride * index;
>> +               (ptrdiff_t)tg->attrib[attr].input_stride * index;
>>
>>            copy_size = tg->attrib[attr].copy_size;
>>            if(likely(copy_size >= 0))
>> diff --git a/src/gallium/auxiliary/translate/translate_sse.c
>> b/src/gallium/auxiliary/translate/translate_sse.c
>> index a78ea91..a72454a 100644
>> --- a/src/gallium/auxiliary/translate/translate_sse.c
>> +++ b/src/gallium/auxiliary/translate/translate_sse.c
>> @@ -1121,7 +1121,9 @@ static boolean init_inputs( struct translate_sse *p,
>>               x86_cmovcc(p->func, tmp_EAX, buf_max_index, cc_AE);
>>            }
>>
>> -         x86_imul(p->func, tmp_EAX, buf_stride);
>> +         x86_mov(p->func, p->tmp2_EDX, buf_stride);
>> +         x64_rexw(p->func);
>> +         x86_imul(p->func, tmp_EAX, p->tmp2_EDX);
>>            x64_rexw(p->func);
>>            x86_add(p->func, tmp_EAX, buf_base_ptr);
>>
>> @@ -1207,7 +1209,9 @@ static struct x86_reg get_buffer_ptr( struct
>> translate_sse *p,
>>         x86_cmp(p->func, ptr, buf_max_index);
>>         x86_cmovcc(p->func, ptr, buf_max_index, cc_AE);
>>
>> -      x86_imul(p->func, ptr, buf_stride);
>> +      x86_mov(p->func, p->tmp2_EDX, buf_stride);
>> +      x64_rexw(p->func);
>> +      x86_imul(p->func, ptr, p->tmp2_EDX);
>>         x64_rexw(p->func);
>>         x86_add(p->func, ptr, buf_base_ptr);
>>         return ptr;
>>
>
> I'm no x86 expert, but this looks OK to me, and if it works for you...


I'm no expert either, but perhaps this will put your mind at ease (as
it did mine):

void *func(void *a, int b, int c) {
  return a + (ptrdiff_t)b * c;
}

gets compiled into, by gcc -O2 -S,

        movslq  %edx, %rdx
        movslq  %esi, %rsi
        imulq   %rdx, %rsi
        leaq    (%rdi,%rsi), %rax

Which, to be honest, is pretty clever, using leaq that way. But it's
basically an add (but if it were, e.g., int *, it would be leaq
(%rdi,%rsi,2), %rax saving on a shl). (Note that this is at&t/gas
syntax, so all backwards.) The new code generated with the
modifications I made is:

0000000000000020 8b9770040000     mov edx, [rdi+0x470]
0000000000000026 480fafca         imul rcx, rdx
000000000000002a 48038f68040000   add rcx, [rdi+0x468]

So the difference is that I'm first moving the value into edx, rather
than using the [rdi+bla] directly in the imul, and then making the
imul use 64-bit registers (by using the REX.W prefix). By moving the
32-bit value into edx, the upper rdx bits are cleared. The only
potential problem is that I'm using EDX where it wasn't used before,
but looking at the call-site of get_buffer_ptr, that seems like it
should be OK.

>
> Reviewed-by: Brian Paul <bri...@vmware.com>

Thanks! Would you be able to commit this as well? I noticed that I
messed up the patch subject and a word in the description...
s/rtasm/translate, s/transfer/translate. You can see these fixed at
https://github.com/imirkin/mesa/commit/3bc3081b2106ab0544adc483768ab2a1321565ff

>
> Should this go into the 10.0.x branch too?

Ehh... I don't know if it qualifies. The only legit way that this can
happen (at least for nouveau) is if an application allocates a >4GB
VBO. Seems unlikely. (vs the non-0 chance that I messed something up
in there.) Up to you though.

  -ilia
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] rtasm: deal with size overflows by casting to ptrdiff_t

Reply via email to