[Mesa-dev] spec/!OpenGL 3.0/gl-3.0-texture-integer fails on r600g

2013-08-02 Thread Martin Andersson
Hi,

I started to look at why the spec/!OpenGL 3.0/gl-3.0-texture-integer
sometimes fails on my AMD 6950, using mesa master. It fails with
errors like this:

texture-integer: failure with format GL_RGB8I_EXT:
  texture color = 100, 9, 71, 0
  expected color = 0.25, 0.5, 0.75, 0
  result color = 0.25098, 0.501961, 0.74902, 1
PIGLIT: {'result': 'fail' }

When I ran the test a bunch of times I found that it only failed when
the last texture color was zero. So when I changed this code in the
pigilt test:

value[0] = rand() % max;
value[1] = rand() % max;
value[2] = rand() % max;
value[3] = rand() % max;

to this:

value[0] = rand() % max;
value[1] = rand() % max;
value[2] = rand() % max;
value[3] = 0;

The test always fails.

I found this bug https://bugs.freedesktop.org/show_bug.cgi?id=63664
But I can't reproduce this bug on my intel G45 (running mesa 9.1.5).

Anyone know what is causing this or how I could debug it further?

//Martin
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] spec/!OpenGL 3.0/gl-3.0-texture-integer fails on r600g

2013-08-02 Thread Martin Andersson
On Fri, Aug 2, 2013 at 2:52 PM, Marek Olšák  wrote:
> The format doesn't have alpha. See what the texture fetch writes to
> the alpha channel.

I looked at the code but I can't figure out where the texture fetch
happens, could you point me in the right direction?

>
> You may try setting "texture-integer.c:290" to "expected[3] = 1.0;"

The test passes if I do that.

//Martin

>
> Marek
>
> On Fri, Aug 2, 2013 at 2:15 PM, Martin Andersson  wrote:
>> Hi,
>>
>> I started to look at why the spec/!OpenGL 3.0/gl-3.0-texture-integer
>> sometimes fails on my AMD 6950, using mesa master. It fails with
>> errors like this:
>>
>> texture-integer: failure with format GL_RGB8I_EXT:
>>   texture color = 100, 9, 71, 0
>>   expected color = 0.25, 0.5, 0.75, 0
>>   result color = 0.25098, 0.501961, 0.74902, 1
>> PIGLIT: {'result': 'fail' }
>>
>> When I ran the test a bunch of times I found that it only failed when
>> the last texture color was zero. So when I changed this code in the
>> pigilt test:
>>
>> value[0] = rand() % max;
>> value[1] = rand() % max;
>> value[2] = rand() % max;
>> value[3] = rand() % max;
>>
>> to this:
>>
>> value[0] = rand() % max;
>> value[1] = rand() % max;
>> value[2] = rand() % max;
>> value[3] = 0;
>>
>> The test always fails.
>>
>> I found this bug https://bugs.freedesktop.org/show_bug.cgi?id=63664
>> But I can't reproduce this bug on my intel G45 (running mesa 9.1.5).
>>
>> Anyone know what is causing this or how I could debug it further?
>>
>> //Martin
>> ___
>> mesa-dev mailing list
>> mesa-dev@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] spec/!OpenGL 3.0/gl-3.0-texture-integer fails on r600g

2013-08-03 Thread Martin Andersson
Well, I should have been more clear. If I do this:

263: value[3] = 0;
290: expected[3] = 1.0;

The test always passes, but if I only do this:

290: expected[3] = 1.0;

The test fails with this error:

texture-integer: failure with format GL_RGB8I_EXT:
  texture color = 92, 126, 14, 104
  expected color = 0.25, 0.5, 0.75, 1
  result color = 0.25098, 0.501961, 0.74902, 0
PIGLIT: {'result': 'fail' }

//Martin

On Sat, Aug 3, 2013 at 5:23 AM, Marek Olšák  wrote:
> FragShaderText contains the shader code. Anyway, we have found the
> issue: expected[3] should really be set to 1.0, because RGB formats
> must return (r,g,b,1). It's a bug in the piglit test.
>
> Marek
>
> On Fri, Aug 2, 2013 at 11:44 PM, Martin Andersson  wrote:
>> On Fri, Aug 2, 2013 at 2:52 PM, Marek Olšák  wrote:
>>> The format doesn't have alpha. See what the texture fetch writes to
>>> the alpha channel.
>>
>> I looked at the code but I can't figure out where the texture fetch
>> happens, could you point me in the right direction?
>>
>>>
>>> You may try setting "texture-integer.c:290" to "expected[3] = 1.0;"
>>
>> The test passes if I do that.
>>
>> //Martin
>>
>>>
>>> Marek
>>>
>>> On Fri, Aug 2, 2013 at 2:15 PM, Martin Andersson  wrote:
>>>> Hi,
>>>>
>>>> I started to look at why the spec/!OpenGL 3.0/gl-3.0-texture-integer
>>>> sometimes fails on my AMD 6950, using mesa master. It fails with
>>>> errors like this:
>>>>
>>>> texture-integer: failure with format GL_RGB8I_EXT:
>>>>   texture color = 100, 9, 71, 0
>>>>   expected color = 0.25, 0.5, 0.75, 0
>>>>   result color = 0.25098, 0.501961, 0.74902, 1
>>>> PIGLIT: {'result': 'fail' }
>>>>
>>>> When I ran the test a bunch of times I found that it only failed when
>>>> the last texture color was zero. So when I changed this code in the
>>>> pigilt test:
>>>>
>>>> value[0] = rand() % max;
>>>> value[1] = rand() % max;
>>>> value[2] = rand() % max;
>>>> value[3] = rand() % max;
>>>>
>>>> to this:
>>>>
>>>> value[0] = rand() % max;
>>>> value[1] = rand() % max;
>>>> value[2] = rand() % max;
>>>> value[3] = 0;
>>>>
>>>> The test always fails.
>>>>
>>>> I found this bug https://bugs.freedesktop.org/show_bug.cgi?id=63664
>>>> But I can't reproduce this bug on my intel G45 (running mesa 9.1.5).
>>>>
>>>> Anyone know what is causing this or how I could debug it further?
>>>>
>>>> //Martin
>>>> ___
>>>> mesa-dev mailing list
>>>> mesa-dev@lists.freedesktop.org
>>>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] spec/!OpenGL 3.0/gl-3.0-texture-integer fails on r600g

2013-08-09 Thread Martin Andersson
I think I have found an issue in the piglit test.

Marek, could you take a look at the attached patch and see if it looks
correct. If so I will send it to the piglit list.

//Martin

On Tue, Aug 6, 2013 at 11:20 PM, Marek Olšák  wrote:
> Sorry, I have no idea. You can try to remove support for RGBX integer
> formats and see if it helps.
>
> In is_format_supported, return FALSE for all R?G?B?X_?INT formats.
>
> Marek
>
> On Sat, Aug 3, 2013 at 11:51 AM, Martin Andersson  wrote:
>> Well, I should have been more clear. If I do this:
>>
>> 263: value[3] = 0;
>> 290: expected[3] = 1.0;
>>
>> The test always passes, but if I only do this:
>>
>> 290: expected[3] = 1.0;
>>
>> The test fails with this error:
>>
>> texture-integer: failure with format GL_RGB8I_EXT:
>>   texture color = 92, 126, 14, 104
>>   expected color = 0.25, 0.5, 0.75, 1
>>   result color = 0.25098, 0.501961, 0.74902, 0
>> PIGLIT: {'result': 'fail' }
>>
>> //Martin
>>
>> On Sat, Aug 3, 2013 at 5:23 AM, Marek Olšák  wrote:
>>> FragShaderText contains the shader code. Anyway, we have found the
>>> issue: expected[3] should really be set to 1.0, because RGB formats
>>> must return (r,g,b,1). It's a bug in the piglit test.
>>>
>>> Marek
>>>
>>> On Fri, Aug 2, 2013 at 11:44 PM, Martin Andersson  
>>> wrote:
>>>> On Fri, Aug 2, 2013 at 2:52 PM, Marek Olšák  wrote:
>>>>> The format doesn't have alpha. See what the texture fetch writes to
>>>>> the alpha channel.
>>>>
>>>> I looked at the code but I can't figure out where the texture fetch
>>>> happens, could you point me in the right direction?
>>>>
>>>>>
>>>>> You may try setting "texture-integer.c:290" to "expected[3] = 1.0;"
>>>>
>>>> The test passes if I do that.
>>>>
>>>> //Martin
>>>>
>>>>>
>>>>> Marek
>>>>>
>>>>> On Fri, Aug 2, 2013 at 2:15 PM, Martin Andersson  
>>>>> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I started to look at why the spec/!OpenGL 3.0/gl-3.0-texture-integer
>>>>>> sometimes fails on my AMD 6950, using mesa master. It fails with
>>>>>> errors like this:
>>>>>>
>>>>>> texture-integer: failure with format GL_RGB8I_EXT:
>>>>>>   texture color = 100, 9, 71, 0
>>>>>>   expected color = 0.25, 0.5, 0.75, 0
>>>>>>   result color = 0.25098, 0.501961, 0.74902, 1
>>>>>> PIGLIT: {'result': 'fail' }
>>>>>>
>>>>>> When I ran the test a bunch of times I found that it only failed when
>>>>>> the last texture color was zero. So when I changed this code in the
>>>>>> pigilt test:
>>>>>>
>>>>>> value[0] = rand() % max;
>>>>>> value[1] = rand() % max;
>>>>>> value[2] = rand() % max;
>>>>>> value[3] = rand() % max;
>>>>>>
>>>>>> to this:
>>>>>>
>>>>>> value[0] = rand() % max;
>>>>>> value[1] = rand() % max;
>>>>>> value[2] = rand() % max;
>>>>>> value[3] = 0;
>>>>>>
>>>>>> The test always fails.
>>>>>>
>>>>>> I found this bug https://bugs.freedesktop.org/show_bug.cgi?id=63664
>>>>>> But I can't reproduce this bug on my intel G45 (running mesa 9.1.5).
>>>>>>
>>>>>> Anyone know what is causing this or how I could debug it further?
>>>>>>
>>>>>> //Martin
>>>>>> ___
>>>>>> mesa-dev mailing list
>>>>>> mesa-dev@lists.freedesktop.org
>>>>>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev


0001-texture-integer-fix-alpha-issue-with-GL_RGB_INTEGER_.patch
Description: Binary data
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] spec/!OpenGL 3.0/gl-3.0-texture-integer fails on r600g

2013-08-09 Thread Martin Andersson
On Fri, Aug 9, 2013 at 8:28 PM, Ian Romanick  wrote:
> On 08/09/2013 04:22 AM, Martin Andersson wrote:
>>
>> I think I have found an issue in the piglit test.
>>
>> Marek, could you take a look at the attached patch and see if it looks
>> correct. If so I will send it to the piglit list.
>
>
> Wow.  That test is confusing.  It would be a lot more obvious (and less
> error prone) if each case statement directly computed expected and bias.  I
> think you're right about the GL_RGB_INTEGER_EXT case, but it seems like the
> GL_ALPHA_INTEGER_EXT case has a similar problem.  Shouldn't value[0],
> value[1], and value[2] be set to 0?

I think you're right that GL_ALPHA_INTEGER_EXT also have a problem,
but it doesn't cause the test to fail. Take this example:

value = 72, 89, 0, 72
expected = 0, 0, 0, 0.25

// computed bias
bias = -72, -89, 0, -71.75

// ivec4 t = texture(tex, gl_TexCoord[0].xy);
t = 0, 0, 0, 72

t + bias
result = -72, -89, 0, 0.25

I guess the negative values are rounded up to zero, which are the
expected values for rgb so the test passes.

This example illustrates why GL_ALPHA_INTEGER_EXT failed when alpha was zero.

value = 121, 68, 32, 0
expected = 0.25, 0.5, 0.75, 0

// computed bias
bias = -120.75, -67.5, -31.25, 0

// ivec4 t = texture(tex, gl_TexCoord[0].xy);
t = 121, 68, 32, 1

t + bias
result = 0.25, 0.5, 0.75, 1

Which fails the test. If value instead is:
value = 121, 68, 32, 1

bias = -120.75, -67.5, -31.25, -1

The result becomes:
result = 0.25, 0.5, 0.75, 0

Which passes the test, for all alpha values greater than one the
result alpha becomes negative which I guess get rounded to zero, which
also passes the test.

This is my analysis of this issue, but my knowledge of these things
are pretty limited so I could be wrong.

//Martin

>
>> //Martin
>>
>> On Tue, Aug 6, 2013 at 11:20 PM, Marek Olšák  wrote:
>>>
>>> Sorry, I have no idea. You can try to remove support for RGBX integer
>>> formats and see if it helps.
>>>
>>> In is_format_supported, return FALSE for all R?G?B?X_?INT formats.
>>>
>>> Marek
>>>
>>> On Sat, Aug 3, 2013 at 11:51 AM, Martin Andersson 
>>> wrote:
>>>>
>>>> Well, I should have been more clear. If I do this:
>>>>
>>>> 263: value[3] = 0;
>>>> 290: expected[3] = 1.0;
>>>>
>>>> The test always passes, but if I only do this:
>>>>
>>>> 290: expected[3] = 1.0;
>>>>
>>>> The test fails with this error:
>>>>
>>>> texture-integer: failure with format GL_RGB8I_EXT:
>>>>texture color = 92, 126, 14, 104
>>>>expected color = 0.25, 0.5, 0.75, 1
>>>>result color = 0.25098, 0.501961, 0.74902, 0
>>>> PIGLIT: {'result': 'fail' }
>>>>
>>>> //Martin
>>>>
>>>> On Sat, Aug 3, 2013 at 5:23 AM, Marek Olšák  wrote:
>>>>>
>>>>> FragShaderText contains the shader code. Anyway, we have found the
>>>>> issue: expected[3] should really be set to 1.0, because RGB formats
>>>>> must return (r,g,b,1). It's a bug in the piglit test.
>>>>>
>>>>> Marek
>>>>>
>>>>> On Fri, Aug 2, 2013 at 11:44 PM, Martin Andersson 
>>>>> wrote:
>>>>>>
>>>>>> On Fri, Aug 2, 2013 at 2:52 PM, Marek Olšák  wrote:
>>>>>>>
>>>>>>> The format doesn't have alpha. See what the texture fetch writes to
>>>>>>> the alpha channel.
>>>>>>
>>>>>>
>>>>>> I looked at the code but I can't figure out where the texture fetch
>>>>>> happens, could you point me in the right direction?
>>>>>>
>>>>>>>
>>>>>>> You may try setting "texture-integer.c:290" to "expected[3] = 1.0;"
>>>>>>
>>>>>>
>>>>>> The test passes if I do that.
>>>>>>
>>>>>> //Martin
>>>>>>
>>>>>>>
>>>>>>> Marek
>>>>>>>
>>>>>>> On Fri, Aug 2, 2013 at 2:15 PM, Martin Andersson 
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I started to look at why the spec/!OpenGL 3.0/gl-3.0-texture-integer
>>>>>>>> sometimes fails on my AMD 6950, using mesa master. It fails with
>>>

[Mesa-dev] [PATCH] winsys/radeon: Only add bo to hash table when creating flink

2013-03-01 Thread Martin Andersson
The problem is that we mix bo handles and flinked names in the hash
table. Because kms type handles are not flinked they should not be
added to the hash table. If we do that we will sooner or later
get a situation where we will overwrite a correct entry because
the bo handle was the same as a flinked name.
---
 src/gallium/winsys/radeon/drm/radeon_drm_bo.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_bo.c 
b/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
index 2d41c26..f4ac526 100644
--- a/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
+++ b/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
@@ -957,16 +957,16 @@ static boolean radeon_winsys_bo_get_handle(struct 
pb_buffer *buffer,
 
 bo->flinked = TRUE;
 bo->flink = flink.name;
+
+pipe_mutex_lock(bo->mgr->bo_handles_mutex);
+util_hash_table_set(bo->mgr->bo_handles, 
(void*)(uintptr_t)bo->flink, bo);
+pipe_mutex_unlock(bo->mgr->bo_handles_mutex);
 }
 whandle->handle = bo->flink;
 } else if (whandle->type == DRM_API_HANDLE_TYPE_KMS) {
 whandle->handle = bo->handle;
 }
 
-pipe_mutex_lock(bo->mgr->bo_handles_mutex);
-util_hash_table_set(bo->mgr->bo_handles, 
(void*)(uintptr_t)whandle->handle, bo);
-pipe_mutex_unlock(bo->mgr->bo_handles_mutex);
-
 whandle->stride = stride;
 return TRUE;
 }
-- 
1.8.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 0/2] Speedup mesa glReadPixels

2013-03-10 Thread Martin Andersson
My motivation for trying to improve the speed of glReadPixels was the
screenshooter in weston. It took about 2.3 seconds to take a 1920x1200 
screenshot on my machine (Intel Core i7 with AMD 6950).

The first issue was that because the type was GL_UNSIGNED_BYTE it was
hitting the slow path. After looking in other parts of mesa it seemed to
me that GL_UNSIGNED_BYTE could be treated in the same way as
GL_UNSIGNED_INT_8_8_8_8_REV (I could very well be wrong though). That
is the first patch.

But even after hitting the fast path it was quite slow, it now took 0.8
seconds to take a screenshot. Then I found an earlier revision of that
fast memcpy code and that got the time down to 0.2 seconds. The second
patch is a slightly modifed version of that code.

I tried to run some piglit test to verify the code, but i could not
complete a run without lockups, without my patches, of course. I tried 
all.tests, quick.tests and r600.tests. The sanity.tests worked but it 
does not hit the fast path.

Martin Andersson (2):
  mesa: Add GL_UNSIGNED_BYTE fast-path to fast_read_rgba_pixels_memcpy
  mesa: Speedup the xrgb -> argb special case in
fast_read_rgba_pixels_memcpy

 src/mesa/main/readpix.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

-- 
1.8.1.5

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/2] mesa: Add GL_UNSIGNED_BYTE fast-path to fast_read_rgba_pixels_memcpy

2013-03-10 Thread Martin Andersson
---
 src/mesa/main/readpix.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/main/readpix.c b/src/mesa/main/readpix.c
index 2f130ae..349b0bc 100644
--- a/src/mesa/main/readpix.c
+++ b/src/mesa/main/readpix.c
@@ -238,7 +238,7 @@ fast_read_rgba_pixels_memcpy( struct gl_context *ctx,
}
else if (rb->Format == MESA_FORMAT_XRGB &&
format == GL_BGRA &&
-   type == GL_UNSIGNED_INT_8_8_8_8_REV &&
+   (type == GL_UNSIGNED_INT_8_8_8_8_REV || type == GL_UNSIGNED_BYTE) &&
!ctx->Pack.SwapBytes) {
   copy_xrgb = GL_TRUE;
}
-- 
1.8.1.5

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/2] mesa: Speedup the xrgb -> argb special case in fast_read_rgba_pixels_memcpy

2013-03-10 Thread Martin Andersson
---
 src/mesa/main/readpix.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/mesa/main/readpix.c b/src/mesa/main/readpix.c
index 349b0bc..0f5c84c 100644
--- a/src/mesa/main/readpix.c
+++ b/src/mesa/main/readpix.c
@@ -285,11 +285,12 @@ fast_read_rgba_pixels_memcpy( struct gl_context *ctx,
   }
} else if (copy_xrgb) {
   /* convert xrgb -> argb */
+  int alphaOffset = texelBytes - 1;
   for (j = 0; j < height; j++) {
- GLuint *dst4 = (GLuint *) dst, *map4 = (GLuint *) map;
+ memcpy(dst, map, width * texelBytes);
  int i;
  for (i = 0; i < width; i++) {
-dst4[i] = map4[i] | 0xff00;  /* set A=0xff */
+dst[i * texelBytes + alphaOffset] = 0xff;  /* set A=0xff */
  }
  dst += dstStride;
  map += stride;
-- 
1.8.1.5

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/2] mesa: Add GL_UNSIGNED_BYTE fast-path to fast_read_rgba_pixels_memcpy

2013-03-11 Thread Martin Andersson
On Mon, Mar 11, 2013 at 11:54 AM, Michel Dänzer  wrote:
> On Son, 2013-03-10 at 23:05 +0100, Martin Andersson wrote:
>> ---
>>  src/mesa/main/readpix.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/src/mesa/main/readpix.c b/src/mesa/main/readpix.c
>> index 2f130ae..349b0bc 100644
>> --- a/src/mesa/main/readpix.c
>> +++ b/src/mesa/main/readpix.c
>> @@ -238,7 +238,7 @@ fast_read_rgba_pixels_memcpy( struct gl_context *ctx,
>> }
>> else if (rb->Format == MESA_FORMAT_XRGB &&
>> format == GL_BGRA &&
>> -   type == GL_UNSIGNED_INT_8_8_8_8_REV &&
>> +   (type == GL_UNSIGNED_INT_8_8_8_8_REV || type == GL_UNSIGNED_BYTE) &&
>
> This cannot be equivalent on little endian and big endian hosts at the
> same time. As it works for you, it's apparently equivalent on little
> endian.

ok, I guess it is also undesirable to have lots of special cases there, with
checks for lots of different combos of types and endianness.

>
> I suspect ReadPixels could be made even faster with similar treatment as
> Marek has applied to TexSubImage etc.

I have looked at TexSubImage and how the radeon dri and intel dri drivers
implement glReadPixels (they use a blit call). But I did not understand
how I could use it for glReadPixels. I will look at it some more, thanks.

>
> --
> Earthling Michel Dänzer   |   http://www.amd.com
> Libre software enthusiast |  Debian, X and DRI developer
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2] mesa: Speedup the xrgb -> argb special case in fast_read_rgba_pixels_memcpy

2013-03-11 Thread Martin Andersson
On Mon, Mar 11, 2013 at 3:56 PM, Jose Fonseca  wrote:
> I'm surprised this is is faster.
>
> In particular, for big things we'll be touching memory twice.
>
> Did you measure the speed up?

I tested it with the previous patch, with GL_UNSIGNED_BYTE, and on
that case it was faster, but since that patch was incorrect (I did not
take endianness into account) this patch can also probably be
discarded.

>
> Jose
>
> - Original Message -
>> ---
>>  src/mesa/main/readpix.c | 5 +++--
>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/src/mesa/main/readpix.c b/src/mesa/main/readpix.c
>> index 349b0bc..0f5c84c 100644
>> --- a/src/mesa/main/readpix.c
>> +++ b/src/mesa/main/readpix.c
>> @@ -285,11 +285,12 @@ fast_read_rgba_pixels_memcpy( struct gl_context *ctx,
>>}
>> } else if (copy_xrgb) {
>>/* convert xrgb -> argb */
>> +  int alphaOffset = texelBytes - 1;
>>for (j = 0; j < height; j++) {
>> - GLuint *dst4 = (GLuint *) dst, *map4 = (GLuint *) map;
>> + memcpy(dst, map, width * texelBytes);
>>   int i;
>>   for (i = 0; i < width; i++) {
>> -dst4[i] = map4[i] | 0xff00;  /* set A=0xff */
>> +dst[i * texelBytes + alphaOffset] = 0xff;  /* set A=0xff */
>>   }
>>   dst += dstStride;
>>   map += stride;
>> --
>> 1.8.1.5
>>
>> ___
>> mesa-dev mailing list
>> mesa-dev@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] Fwd: [PATCH 5/6] st/mesa: implement blit-based ReadPixels

2013-03-14 Thread Martin Andersson
Thanks for implementing this, I just have one comment.

On Thu, Mar 14, 2013 at 7:45 PM, Marek Olšák  wrote:
> Initial version contributed by: Martin Andersson 
>
> This is only used if the memcpy path cannot be used and if no transfer ops
> are needed. It's pretty similar to our TexImage and GetTexImage
> implementations.
>
> The motivation behind this is to be able to use ReadPixels every frame and
> still have at least 20 fps (or 60 fps with a powerful GPU and CPU)
> instead of 0.5 fps.
> ---
>  src/mesa/state_tracker/st_cb_readpixels.c |  184 
> +++--
>  src/mesa/state_tracker/st_cb_texture.c|   12 +-
>  src/mesa/state_tracker/st_cb_texture.h|6 +
>  3 files changed, 189 insertions(+), 13 deletions(-)
>
> diff --git a/src/mesa/state_tracker/st_cb_readpixels.c 
> b/src/mesa/state_tracker/st_cb_readpixels.c
> index 6b824b1..b524738 100644
> --- a/src/mesa/state_tracker/st_cb_readpixels.c
> +++ b/src/mesa/state_tracker/st_cb_readpixels.c
> @@ -25,35 +25,205 @@
>   *
>   **/
>
> -
> +#include "main/image.h"
> +#include "main/pbo.h"
>  #include "main/imports.h"
>  #include "main/readpix.h"
> +#include "main/enums.h"
> +#include "main/framebuffer.h"
> +#include "util/u_inlines.h"
> +#include "util/u_format.h"
>
> +#include "st_cb_fbo.h"
>  #include "st_atom.h"
>  #include "st_context.h"
>  #include "st_cb_bitmap.h"
>  #include "st_cb_readpixels.h"
> +#include "state_tracker/st_cb_texture.h"
> +#include "state_tracker/st_format.h"
> +#include "state_tracker/st_texture.h"
>
>
>  /**
> - * The only special thing we need to do for the state tracker's
> - * glReadPixels is to validate state (to be sure we have up-to-date
> - * framebuffer surfaces) and flush the bitmap cache prior to reading.
> + * This uses a blit to copy the read buffer to a texture format which matches
> + * the format and type combo and then a fast read-back is done using memcpy.
> + * We can do arbitrary X/Y/Z/W/0/1 swizzling here as long as there is
> + * a format which matches the swizzling.
> + *
> + * If such a format isn't available, we fall back to _mesa_readpixels.
> + *
> + * NOTE: Some drivers use a blit to convert between tiled and linear
> + *   texture layouts during texture uploads/downloads, so the blit
> + *   we do here should be free in such cases.
>   */
>  static void
>  st_readpixels(struct gl_context *ctx, GLint x, GLint y,
>GLsizei width, GLsizei height,
>GLenum format, GLenum type,
>const struct gl_pixelstore_attrib *pack,
> -  GLvoid *dest)
> +  GLvoid *pixels)
>  {
> struct st_context *st = st_context(ctx);
> +   struct gl_renderbuffer *rb =
> + _mesa_get_read_renderbuffer_for_format(ctx, format);
> +   struct st_renderbuffer *strb = st_renderbuffer(rb);
> +   struct pipe_context *pipe = st->pipe;
> +   struct pipe_screen *screen = pipe->screen;
> +   struct pipe_resource *src;
> +   struct pipe_resource *dst = NULL;
> +   struct pipe_resource dst_templ;
> +   enum pipe_format dst_format, src_format;
> +   struct pipe_blit_info blit;
> +   unsigned bind = PIPE_BIND_TRANSFER_READ;
> +   struct pipe_transfer *tex_xfer;
> +   ubyte *map = NULL;
>
> +   /* Validate state (to be sure we have up-to-date framebuffer surfaces)
> +* and flush the bitmap cache prior to reading. */
> st_validate_state(st);
> st_flush_bitmap_cache(st);
> -   _mesa_readpixels(ctx, x, y, width, height, format, type, pack, dest);
> -}
>
> +   /* This must be done after state validation. */
> +   src = strb->texture;
> +
> +   /* XXX Fallback for depth-stencil formats due to an incomplete
> +* stencil blit implementation in some drivers. */
> +   if (format == GL_DEPTH_STENCIL) {
> +  goto fallback;
> +   }
> +
> +   /* We are creating a texture of the size of the region being read back.
> +* Need to check for NPOT texture support. */
> +   if (!screen->get_param(screen, PIPE_CAP_NPOT_TEXTURES) &&
> +   (!util_is_power_of_two(width) ||
> +!util_is_power_of_two(height))) {
> +  goto fallback;
> +   }
> +
> +   /* If the base internal format and the texture format don't match, we have
> +* to use the slow path. */
> +   if (rb->_BaseFormat !=
> +   _mesa_get_format_base_format(rb->Format)) {
> +  goto fallback;
> +   }
> +
> +   /* See if the texture format alread

[Mesa-dev] [PATCH] r600g: Use virtual address for PIPE_QUERY_SO* in r600_emit_query_end

2013-03-25 Thread Martin Andersson
Virtual address is used for PIPE_QUERY_SO* queries in
r600_emit_query_begin, but not in r600_emit_query_end.

This will trigger a GPU fault when one of those queries is
made and virtual address is enabled.
---
 src/gallium/drivers/r600/r600_query.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_query.c 
b/src/gallium/drivers/r600/r600_query.c
index 0335189..782ad26 100644
--- a/src/gallium/drivers/r600/r600_query.c
+++ b/src/gallium/drivers/r600/r600_query.c
@@ -186,10 +186,11 @@ static void r600_emit_query_end(struct r600_context *ctx, 
struct r600_query *que
case PIPE_QUERY_PRIMITIVES_GENERATED:
case PIPE_QUERY_SO_STATISTICS:
case PIPE_QUERY_SO_OVERFLOW_PREDICATE:
+   va += query->buffer.results_end + query->result_size/2;
cs->buf[cs->cdw++] = PKT3(PKT3_EVENT_WRITE, 2, 0);
cs->buf[cs->cdw++] = 
EVENT_TYPE(EVENT_TYPE_SAMPLE_STREAMOUTSTATS) | EVENT_INDEX(3);
-   cs->buf[cs->cdw++] = query->buffer.results_end + 
query->result_size/2;
-   cs->buf[cs->cdw++] = 0;
+   cs->buf[cs->cdw++] = va;
+   cs->buf[cs->cdw++] = (va >> 32UL) & 0xFF;
break;
case PIPE_QUERY_TIME_ELAPSED:
va += query->buffer.results_end + query->result_size/2;
-- 
1.8.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] r600g: Add a Cayman specific version of UMAD

2013-03-29 Thread Martin Andersson
I found an issue with the shader compiler for Cayman when I looked
into why the ext_transform_feedback/order test case caused a GPU stall.
It turned out the stall was an infinite loop that was the result of broken
calculation in the shader function. The issue is that Cayman uses the
tgsi_umad function for UMAD, but that does not work since it does not
populate the y, z and w slots for UMUL that cayman requires.

This patch implements a cayman_umad. There are some things I'm unsure of 
though. 

The UMUL for Cayman is compiled to, as far as I can tell, ALU_OP2_MULLO_INT and
not ALU_OP2_MULLO_UINT. So I do not know if I should use the int or the uint
version in cayman_umad. In the patch I used the uint one, because that seemed
the most logical.

The add part of UMAD I copied from tgsi_umad and that had a loop around the
variable lasti, but the variable lasti is usally not used in cayman specific 
code.

This is used in tgsi functions.
int lasti = tgsi_last_instruction(inst->Dst[0].Register.WriteMask);

But in cayman specific code this is used instead.
int last_slot = (inst->Dst[0].Register.WriteMask & 0x8) ? 4 : 3;

It does not work to switch lasti with last_slot, since that makes the loop run 
too
many times (in my test case lasti is 0 and last_slot is 3). So I just removed 
the
loop, was that correct or should i resolve that in some other way?

Martin Andersson (1):
  r600g: Add a Cayman specific version of UMAD

 src/gallium/drivers/r600/r600_shader.c | 47 +-
 1 file changed, 46 insertions(+), 1 deletion(-)

-- 
1.8.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] r600g: Add a Cayman specific version of UMAD

2013-03-29 Thread Martin Andersson
The tgsi_umad function does not work for Cayman since it does not
populate the y, z and w slots for UMUL that Cayman requires.
---
 src/gallium/drivers/r600/r600_shader.c | 47 +-
 1 file changed, 46 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c 
b/src/gallium/drivers/r600/r600_shader.c
index 29facf7..aa23907 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -2111,6 +2111,51 @@ static int cayman_mul_int_instr(struct r600_shader_ctx 
*ctx)
return 0;
 }
 
+static int cayman_umad(struct r600_shader_ctx *ctx)
+{
+   struct tgsi_full_instruction *inst = 
&ctx->parse.FullToken.FullInstruction;
+   struct r600_bytecode_alu alu;
+   int i, j, k, r;
+   int last_slot = (inst->Dst[0].Register.WriteMask & 0x8) ? 4 : 3;
+
+   for (k = 0; k < last_slot; k++) {
+   if (!(inst->Dst[0].Register.WriteMask & (1 << k)))
+   continue;
+
+   for (i = 0 ; i < 4; i++) {
+   memset(&alu, 0, sizeof(struct r600_bytecode_alu));
+   alu.op = ALU_OP2_MULLO_UINT;
+   for (j = 0; j < inst->Instruction.NumSrcRegs; j++) {
+   r600_bytecode_src(&alu.src[j], &ctx->src[j], k);
+   }
+   alu.dst.chan = i;
+   alu.dst.sel = ctx->temp_reg;
+   alu.dst.write = (i == k);
+   if (i == 3)
+   alu.last = 1;
+   r = r600_bytecode_add_alu(ctx->bc, &alu);
+   if (r)
+   return r;
+   }
+   }
+
+   memset(&alu, 0, sizeof(struct r600_bytecode_alu));
+   tgsi_dst(ctx, &inst->Dst[0], 0, &alu.dst);
+
+   alu.op = ALU_OP2_ADD_INT;
+
+   alu.src[0].sel = ctx->temp_reg;
+   alu.src[0].chan = 0;
+
+   r600_bytecode_src(&alu.src[1], &ctx->src[2], 0);
+   alu.last = 1;
+   r = r600_bytecode_add_alu(ctx->bc, &alu);
+   if (r)
+   return r;
+
+   return 0;
+}
+
 /*
  * r600 - trunc to -PI..PI range
  * r700 - normalize by dividing by 2PI
@@ -6356,7 +6401,7 @@ static struct r600_shader_tgsi_instruction 
cm_shader_tgsi_instruction[] = {
{TGSI_OPCODE_U2F,   0, ALU_OP1_UINT_TO_FLT, tgsi_op2},
{TGSI_OPCODE_UADD,  0, ALU_OP2_ADD_INT, tgsi_op2},
{TGSI_OPCODE_UDIV,  0, ALU_OP0_NOP, tgsi_udiv},
-   {TGSI_OPCODE_UMAD,  0, ALU_OP0_NOP, tgsi_umad},
+   {TGSI_OPCODE_UMAD,  0, ALU_OP0_NOP, cayman_umad},
{TGSI_OPCODE_UMAX,  0, ALU_OP2_MAX_UINT, tgsi_op2},
{TGSI_OPCODE_UMIN,  0, ALU_OP2_MIN_UINT, tgsi_op2},
{TGSI_OPCODE_UMOD,  0, ALU_OP0_NOP, tgsi_umod},
-- 
1.8.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] r600g: Add a Cayman specific version of UMAD

2013-03-31 Thread Martin Andersson
On Sun, Mar 31, 2013 at 1:08 AM, Vadim Girlin  wrote:
> On 03/30/2013 05:35 AM, Martin Andersson wrote:
>>
>> I found an issue with the shader compiler for Cayman when I looked
>> into why the ext_transform_feedback/order test case caused a GPU stall.
>> It turned out the stall was an infinite loop that was the result of broken
>> calculation in the shader function. The issue is that Cayman uses the
>> tgsi_umad function for UMAD, but that does not work since it does not
>> populate the y, z and w slots for UMUL that cayman requires.
>>
>> This patch implements a cayman_umad. There are some things I'm unsure of
>> though.
>>
>> The UMUL for Cayman is compiled to, as far as I can tell,
>> ALU_OP2_MULLO_INT and
>> not ALU_OP2_MULLO_UINT. So I do not know if I should use the int or the
>> uint
>> version in cayman_umad. In the patch I used the uint one, because that
>> seemed
>> the most logical.
>
>
> Probably the use of MULLO_INT for UMUL on cayman is just a typo, AFAIK
> MULLO_UINT should be used.

Ok, I will send a patch for that as well then.

>
>>
>> The add part of UMAD I copied from tgsi_umad and that had a loop around
>> the
>> variable lasti, but the variable lasti is usally not used in cayman
>> specific code.
>>
>
> The only difference with umad on cayman is in the mul part - each MULLO_UINT
> should be expanded to 4 slots on cayman. Add part doesn't need any changes.
>
>
>> This is used in tgsi functions.
>> int lasti = tgsi_last_instruction(inst->Dst[0].Register.WriteMask);
>
>
> This is used to determine last written vector component from the write mask,
> so that if tgsi instruction doesn't write e.g. W component, we don't have to
> emit R600 instruction(s) for that component.
>
>
>>
>> But in cayman specific code this is used instead.
>> int last_slot = (inst->Dst[0].Register.WriteMask & 0x8) ? 4 : 3;
>
>
> This is used for instructions like RECIP_xxx (see the comment at
> r600_shader.c:40) that should be expanded to 3 slots with optional 4th slot
> if the write to the W component is required, but MULLO_UINT is different -
> it should be expanded to 4 instruction slots always. By the way, it seems
> cayman_mul_int_instr is incorrect in this regard.
>
>
>>
>> It does not work to switch lasti with last_slot, since that makes the loop
>> run too
>> many times (in my test case lasti is 0 and last_slot is 3). So I just
>> removed the
>> loop, was that correct or should i resolve that in some other way?
>
>
> No, it's not correct, there should be a loop over the vector components for
> addition as well - it should be performed in the same way as on the
> pre-cayman chips. In your patch you are only performing the addition for one
> component.
>
> Basically, the only required change for UMAD on cayman is that you need to
> expand each one-slot MULLO_xx on pre-cayman into 4 instruction slots on
> cayman.

Should I keep the cayman_umad function or should I modify tgsi_umad and
add the cayman specific part there?

> Vadim
>
>
>>
>> Martin Andersson (1):
>>r600g: Add a Cayman specific version of UMAD
>>
>>   src/gallium/drivers/r600/r600_shader.c | 47
>> +-
>>   1 file changed, 46 insertions(+), 1 deletion(-)
>>
>

//Martin
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] Possible bug with r600g shader compiler

2013-03-31 Thread Martin Andersson
Hi,

I think have found a bug in the r600g shader compiler. I have a AMD 6950
and I'm running mesa from git.

The bug is exercised by the the vertex shader program in piglit
ext_transform_feedback/order.c

I have simplified the shader program so the compiled shader is easier to read:

#version 130
in uint starting_x;
flat out uint starting_x_copy;
flat out uint iteration_count;
flat out uint shift_reg_final;
uniform uint shift_count;

void main()
{
gl_Position = vec4(0.0);
uint x = starting_x;
uint count = 0u;
uint shift_reg = 1u;
starting_x_copy = starting_x;
uint k;
while (x != 0u) {
shift_reg = shift_count;
for (k = 0u; k < shift_count; ++k)
++count;
x = 0u;
}
iteration_count = count;
shift_reg_final = shift_reg;
}

It compiles to, http://pastebin.com/cQ8rbKCv.

input:
shift_count 64
starting_x 0

actual output:
iteration_count 1
shift_reg 1

expected output:
iteration_count 0
shift_reg 1

When the shader is run with starting_x set to 0 the iteration_count output is 1.
That should be impossible since the ++count is inside the while loop guarded
by x != 0. That the iteration_count is 1 and not 64 is also strange, it seems
to somehow have gotten past the while guard but only executed one iteration in
the for loop before exiting again. Another thing to note is that shift_reg
is not set to 64.

If I write 64 instead of shift_count in the for loop (k < 64u) (effectivily
optimizing it to 64 add statements instead of a loop) or switch the while
to an if, the program behaves as expected. That leads me to belive that
the issue is with the two nested loops.

The docs mentions something about nested flowcontrol for PRED_SETE_64.

"The instruction can also establish a predicate result (execute or skip) for
subsequent predicated instruction execution. This additional control allows a
compiler to support one-instruction issue for if-elseif operations, or an
integer result for nested flow-control, by using single-precision operations
to manipulate a predicate counter."

But the while and for loops are compiled to PRED_SETNE_INT which does not have
that comment. Anyway, I just wanted to include that comment in case it was
relevant.

Anyone knows whats wrong or have any ideas for how I could debug it further?

//Martin
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Possible bug with r600g shader compiler

2013-03-31 Thread Martin Andersson
On Sun, Mar 31, 2013 at 2:51 PM, Martin Andersson  wrote:
>
> It compiles to, http://pastebin.com/cQ8rbKCv.

There should not be a dot at the end, the correct address is:
http://pastebin.com/cQ8rbKCv

I also configure mesa with --disable-gallium-llvm

//Martin
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] r600g: Fix UMAD on Cayman

2013-04-02 Thread Martin Andersson
The multiplication part of tgsi_umad did not work on Cayman, because it did
not populate the correct vector slots.
---
 src/gallium/drivers/r600/r600_shader.c | 45 --
 1 file changed, 32 insertions(+), 13 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c 
b/src/gallium/drivers/r600/r600_shader.c
index 82885d1..6c4cc8f 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -5840,7 +5840,7 @@ static int tgsi_umad(struct r600_shader_ctx *ctx)
 {
struct tgsi_full_instruction *inst = 
&ctx->parse.FullToken.FullInstruction;
struct r600_bytecode_alu alu;
-   int i, j, r;
+   int i, j, k, r;
int lasti = tgsi_last_instruction(inst->Dst[0].Register.WriteMask);
 
/* src0 * src1 */
@@ -5848,21 +5848,40 @@ static int tgsi_umad(struct r600_shader_ctx *ctx)
if (!(inst->Dst[0].Register.WriteMask & (1 << i)))
continue;
 
-   memset(&alu, 0, sizeof(struct r600_bytecode_alu));
+   if (ctx->bc->chip_class == CAYMAN) {
+   for (j = 0 ; j < 4; j++) {
+   memset(&alu, 0, sizeof(struct 
r600_bytecode_alu));
 
-   alu.dst.chan = i;
-   alu.dst.sel = ctx->temp_reg;
-   alu.dst.write = 1;
+   alu.op = ALU_OP2_MULLO_UINT;
+   for (k = 0; k < inst->Instruction.NumSrcRegs; 
k++) {
+   r600_bytecode_src(&alu.src[k], 
&ctx->src[k], i);
+   }
+   tgsi_dst(ctx, &inst->Dst[0], j, &alu.dst);
+   alu.dst.sel = ctx->temp_reg;
+   alu.dst.write = (j == i);
+   if (j == 3)
+   alu.last = 1;
+   r = r600_bytecode_add_alu(ctx->bc, &alu);
+   if (r)
+   return r;
+   }
+   } else {
+   memset(&alu, 0, sizeof(struct r600_bytecode_alu));
 
-   alu.op = ALU_OP2_MULLO_UINT;
-   for (j = 0; j < 2; j++) {
-   r600_bytecode_src(&alu.src[j], &ctx->src[j], i);
-   }
+   alu.dst.chan = i;
+   alu.dst.sel = ctx->temp_reg;
+   alu.dst.write = 1;
 
-   alu.last = 1;
-   r = r600_bytecode_add_alu(ctx->bc, &alu);
-   if (r)
-   return r;
+   alu.op = ALU_OP2_MULLO_UINT;
+   for (j = 0; j < 2; j++) {
+   r600_bytecode_src(&alu.src[j], &ctx->src[j], i);
+   }
+
+   alu.last = 1;
+   r = r600_bytecode_add_alu(ctx->bc, &alu);
+   if (r)
+   return r;
+   }
}
 
 
-- 
1.8.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] r600g: Fix UMAD on Cayman

2013-04-07 Thread Martin Andersson
If there are no objections or comments on this, it would be nice if
someone could commit it.

//Martin

On Tue, Apr 2, 2013 at 10:43 PM, Martin Andersson  wrote:
> The multiplication part of tgsi_umad did not work on Cayman, because it did
> not populate the correct vector slots.
> ---
>  src/gallium/drivers/r600/r600_shader.c | 45 
> --
>  1 file changed, 32 insertions(+), 13 deletions(-)
>
> diff --git a/src/gallium/drivers/r600/r600_shader.c 
> b/src/gallium/drivers/r600/r600_shader.c
> index 82885d1..6c4cc8f 100644
> --- a/src/gallium/drivers/r600/r600_shader.c
> +++ b/src/gallium/drivers/r600/r600_shader.c
> @@ -5840,7 +5840,7 @@ static int tgsi_umad(struct r600_shader_ctx *ctx)
>  {
> struct tgsi_full_instruction *inst = 
> &ctx->parse.FullToken.FullInstruction;
> struct r600_bytecode_alu alu;
> -   int i, j, r;
> +   int i, j, k, r;
> int lasti = tgsi_last_instruction(inst->Dst[0].Register.WriteMask);
>
> /* src0 * src1 */
> @@ -5848,21 +5848,40 @@ static int tgsi_umad(struct r600_shader_ctx *ctx)
> if (!(inst->Dst[0].Register.WriteMask & (1 << i)))
> continue;
>
> -   memset(&alu, 0, sizeof(struct r600_bytecode_alu));
> +   if (ctx->bc->chip_class == CAYMAN) {
> +   for (j = 0 ; j < 4; j++) {
> +   memset(&alu, 0, sizeof(struct 
> r600_bytecode_alu));
>
> -   alu.dst.chan = i;
> -   alu.dst.sel = ctx->temp_reg;
> -   alu.dst.write = 1;
> +   alu.op = ALU_OP2_MULLO_UINT;
> +   for (k = 0; k < inst->Instruction.NumSrcRegs; 
> k++) {
> +   r600_bytecode_src(&alu.src[k], 
> &ctx->src[k], i);
> +   }
> +   tgsi_dst(ctx, &inst->Dst[0], j, &alu.dst);
> +   alu.dst.sel = ctx->temp_reg;
> +   alu.dst.write = (j == i);
> +   if (j == 3)
> +   alu.last = 1;
> +   r = r600_bytecode_add_alu(ctx->bc, &alu);
> +   if (r)
> +   return r;
> +   }
> +   } else {
> +   memset(&alu, 0, sizeof(struct r600_bytecode_alu));
>
> -   alu.op = ALU_OP2_MULLO_UINT;
> -   for (j = 0; j < 2; j++) {
> -   r600_bytecode_src(&alu.src[j], &ctx->src[j], i);
> -   }
> +   alu.dst.chan = i;
> +   alu.dst.sel = ctx->temp_reg;
> +   alu.dst.write = 1;
>
> -   alu.last = 1;
> -   r = r600_bytecode_add_alu(ctx->bc, &alu);
> -   if (r)
> -   return r;
> +   alu.op = ALU_OP2_MULLO_UINT;
> +   for (j = 0; j < 2; j++) {
> +   r600_bytecode_src(&alu.src[j], &ctx->src[j], 
> i);
> +   }
> +
> +   alu.last = 1;
> +   r = r600_bytecode_add_alu(ctx->bc, &alu);
> +   if (r)
> +   return r;
> +   }
> }
>
>
> --
> 1.8.2
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] r600g: Fix UMAD on Cayman

2013-04-08 Thread Martin Andersson
On Tue, Apr 9, 2013 at 3:18 AM, Marek Olšák  wrote:
> Pushed, thanks. The transform feedback test still doesn't pass, but at least
> the hardlocks are gone.

Thanks, I have looked into the other issue as well
http://lists.freedesktop.org/archives/mesa-dev/2013-March/036941.html

The problem arises when there are nested loops. If I rework the code
so there are
no nested loops the issue disappears. At least one pixel also needs to enter the
outer loop. The pixels that should enter the outer loop behaves
correctly. It is those
pixels that should not enter the outer loop that misbehaves. It does
not matter if they
also fails the test for the inner loop, they will still execute the
instruction inside. That
leads to the strange results for that test.

The strangeness is easier to see if the NUM_POINTS in the
ext_transform_feedback/
order.c are run with smaller values,like 3, 6 and 9. Disable the code
that fail the test
and print starting_x, shift_reg_final and iteration_count.

Marek, since you implemented transform feedback for r600, do you think the issue
is with the tranform feedback code or the shader compiler or some other thing?

//Martin
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] r600g: Fix UMAD on Cayman

2013-04-12 Thread Martin Andersson
e minor ALU grouping changes
>>>>> due
>>>>> to the VLIW4/VLIW5 difference) works fine for me on evergreen.
>>>>>
>>>>> According to the Martin's observations it looks like if the threads
>>>>> that
>>>>> shouldn't execute the loop body were incorrectly left in the active
>>>>> state.
>>>>> LOOP_BREAK should put them into the inactive-break state, but something
>>>>> goes wrong. Do the other piglit tests with nested loops (e.g.
>>>>> glsl-fs-loop-nested) work on cayman? Though possibly there are no other
>>>>> tests with the diverging loops as in this case.
>>>>>
>>>>> I'll try to write a simpler test with the diverging loops to see if the
>>>>> issue is really caused by the incorrect control flow handling, and to
>>>>> figure out the exact instruction that results in the incorrect active
>>>>> state.
>>>>>
>>>>> Also probably it worth checking if the stack size is correct for that
>>>>> shader (latest mesa should print nstack value in the shader disassemble
>>>>> header, I think it should be 3 for that shader) and maybe try adding
>>>>> some
>>>>> constant, e.g. 4 to the bc->nstack in the r600_bytecode_build just to
>>>>> be
>>>>> sure that we reserve enough of stack space, though I don't think stack
>>>>> size
>>>>> is the cause of this issue.
>>>>>
>>>>> Vadim
>>>>>
>>>>>
>>>>>
>>>>>   Marek
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Apr 9, 2013 at 11:30 PM, Vadim Girlin 
>>>>>> wrote:
>>>>>>
>>>>>>On 04/09/2013 10:58 AM, Martin Andersson wrote:
>>>>>>
>>>>>>>
>>>>>>>On Tue, Apr 9, 2013 at 3:18 AM, Marek Olšák 
>>>>>>> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>Pushed, thanks. The transform feedback test still doesn't pass,
>>>>>>>> but
>>>>>>>> at
>>>>>>>>
>>>>>>>>> least
>>>>>>>>> the hardlocks are gone.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>   Thanks, I have looked into the other issue as well
>>>>>>>>
>>>>>>>>
>>>>>>>> http://lists.freedesktop.org/**archives/mesa-dev/2013-**March/**<http://lists.freedesktop.org/archives/mesa-dev/2013-March/**>
>>>>>>>> **036941.html<http://lists.**freedesktop.org/**archives/**
>>>>>>>>
>>>>>>>> mesa-dev/2013-March/**036941.**html<http://lists.freedesktop.org/**archives/mesa-dev/2013-March/**036941.html>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> <http://lists.**freedesktop.**org/archives/mesa-**<http://freedesktop.org/archives/mesa-**>
>>>>>>>> dev/2013-March/036941.html>>>>>>>
>>>>>>>> archives/mesa-dev/2013-March/**036941.html<http://lists.freedesktop.org/archives/mesa-dev/2013-March/036941.html>
>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> The problem arises when there are nested loops. If I rework the code
>>>>>>>> so there are
>>>>>>>> no nested loops the issue disappears. At least one pixel also needs
>>>>>>>> to
>>>>>>>> enter the
>>>>>>>> outer loop. The pixels that should enter the outer loop behaves
>>>>>>>> correctly. It is those
>>>>>>>> pixels that should not enter the outer loop that misbehaves. It does
>>>>>>>> not matter if they
>>>>>>>> also fails the test for the inner loop, they will still execute the
>>>>>>>> instruction inside. That
>>>>>>>> leads to the strange results for that test.
>>>>>>>>
>>>>>>>>
>>>>>>>>   Please test the attached patch.
>>>>>>>
>>>>>>>
>>>>>>> Vadim
>>>>>>>
>>>>>>>
>>>>>>>The strangeness is easier to see if the NUM_POINTS in the
>>>>>>>
>>>>>>>> ext_transform_feedback/
>>>>>>>> order.c are run with smaller values,like 3, 6 and 9. Disable the
>>>>>>>> code
>>>>>>>> that fail the test
>>>>>>>> and print starting_x, shift_reg_final and iteration_count.
>>>>>>>>
>>>>>>>> Marek, since you implemented transform feedback for r600, do you
>>>>>>>> think
>>>>>>>> the issue
>>>>>>>> is with the tranform feedback code or the shader compiler or some
>>>>>>>> other
>>>>>>>> thing?
>>>>>>>>
>>>>>>>> //Martin
>>>>>>>> __**_
>>>>>>>> mesa-dev mailing list
>>>>>>>> mesa-dev@lists.freedesktop.org
>>>>>>>>
>>>>>>>> http://lists.freedesktop.org/**mailman/listinfo/mesa-dev<http://lists.freedesktop.org/mailman/listinfo/mesa-dev>
>>>>>>>>
>>>>>>>> http://lists.freedesktop.org/**mailman/listinfo/mesa-dev>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> http://lists.freedesktop.org/**mailman/listinfo/mesa-dev>
>>>>>>>>
>>>>>>>> <http://lists.freedesktop.**org/mailman/listinfo/mesa-dev<http://lists.freedesktop.org/mailman/listinfo/mesa-dev>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>   ___
>>>>>>>
>>>>>>> mesa-dev mailing list
>>>>>>> mesa-dev@lists.freedesktop.org
>>>>>>>
>>>>>>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev<http://lists.freedesktop.org/**mailman/listinfo/mesa-dev>
>>>>>>>
>>>>>>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev


0001-r600g-HACK-Fix-nested-loops-on-Cayman.patch
Description: Binary data
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] r600g: Fix UMAD on Cayman

2013-04-13 Thread Martin Andersson
On Sat, Apr 13, 2013 at 4:23 AM, Vadim Girlin  wrote:
> On 04/12/2013 11:36 PM, Martin Andersson wrote:
>>
>> I have made some progress with this issue.
>>
>> Vadim, I did as you suggested and tried to mimic the output from the
>> shader analyser
>> tool. I used your patch as a base and then tried various ways to see
>> what would work.
>> After many tries (and lockups) I did managed to get the
>> ext_transform_feedback/order
>> test to pass.
>>
>> It is a very ugly patch but it should illustrate what the problem (and
>> potential solution) is.
>>
>> Your test program fails however because explicit break statements do
>> not work. It
>> should be possible to use the same code for the explicit breaks as for
>> the implicit
>> loop break.The reason it does not is that I detect the implicit break
>> with a hack and
>> it does notwork for explicit breaks.
>>
>> The problem is that I need to detect the break statement when creating the
>> corresponding if statement. So that I can treat it differently than
>> other "regular" if
>> statements. Anyone knows how I could do that, or is this the wrong
>> approach?
>>
>
> It doesn't work with my test app because IF/ENDIF blocks with BRK may
> contain other code, so you can't simply throw away IF/ENDIF making that code
> execute unconditionally.

Yeah my hack is not an viable option.

> By the way, shader analyzer in some cases also produces the code with
> JUMP/POP around PRED_SET-BREAK, though I'm not sure if that code will really
> work as expected with catalyst. Possibly we're simply missing something in
> the hardware configuration.
>
> Also there is one thing that I didn't take into account in my initial patch
> - r600g converts ALU followed by POP to ALU_POP_AFTER and this might explain
> why my initial patch doesn't work. Possibly if we prevent that optimization
> for ALU containing PRED_SET-BREAK and leave separate POP, it might be enough
> to make it work. I'm attaching the additional patch that will force POP to
> be a separate instruction in this case, please test it (on top of the my
> first patch). This would be at least not very intrusive.

No, that patch did not help either.

> If this won't help, then I think we should understand what exactly we are
> trying to fix before implementing any big changes, possibly there is a
> better solution or at least a more clean workaround. In the worst case we
> can return to your approach and improve it to handle other cases.

I'm starting to think that there is nothing wrong with the shader
compiler. It seems to me that a push, pop inside a nested loop clears
the break status on a thread.

shift_reg = 1u;
count = 0u;
while (true) {
if (x == 1u)
break;
 while (true) {
 if (x != 1u)
  count = 10u;
 if (x == 1u)
  count = 20u;
 break;
 }
 shift_reg = 2u;
 break;
}

input: x == 0
actual ouput: shift_reg == 2, count == 10
expected output: shift_reg == 2, count == 10

input: x == 1
actual ouput: shift_reg == 2, count == 20
expected output: shift_reg == 1, count == 0

If I swap the if statements in the inner loop I get different results.

shift_reg = 1u;
count = 0u;
while (true) {
if (x == 1u)
break;
 while (true) {
 if (x == 1u)
  count = 20u;
 if (x != 1u)
  count = 10u;
 break;
 }
 shift_reg = 2u;
 break;
}

input: x == 0
actual ouput: shift_reg == 2, count == 10
expected output: shift_reg == 2, count == 10

input: x == 1
actual ouput: shift_reg == 2, count == 0
expected output: shift_reg == 1, count == 0

I tested both cases on mesa master and mesa master + Vadims two
patches with the same results.

//Martin

> Vadim
>
>
>> //Martin
>>
>> On Thu, Apr 11, 2013 at 5:31 PM, Vadim Girlin 
>> wrote:
>>>
>>> On 04/11/2013 02:08 AM, Marek Olšák wrote:
>>>>
>>>>
>>>> Here's the output:
>>>>
>>>> creating vs ...
>>>> shader compilation status: OK
>>>> creating fs ...
>>>> shader compilation status: OK
>>>> thread #0 (0;0) : ref = 16608
>>>> thread #1 (1;0) : ref = 27873
>>>> thread #2 (0;1) : ref = 16608
>>>> thread #3 (1;1) : ref = 27877
>>>> results:
>>>>thread 0 (0, 0): expected = 16608, observed = 27876, FAIL
>>>>thread 1 (1, 0): expected = 27873, observed = 27873, OK
>>>>thread 2 (0, 1): expected = 16608, observed = 27876, FAIL
>>>>thread 3 (1, 1): expected = 27877, observed = 27877, OK
>>>>
>>>
&g

[Mesa-dev] [PATCH] r600g: Workaround for a nested loop bug on Cayman

2013-04-14 Thread Martin Andersson
There is a bug where a BREAK/CONTINUE followed by LOOP_STARTxxx for nested
loops may put the branch stack into a state such that ALU_PUSH_BEFORE
doesn't work as expected. Workaround this by replacing the ALU_PUSH_BEFORE
with a PUSH + ALU for nested loops.

Fixes piglit tests:
spec/!OpenGL 1.1/read-front
spec/EXT_transform_feedback/order*
spec/glsl-1.40/uniform_buffer/fs-struct-pad

No piglit regressions.
---
 src/gallium/drivers/r600/r600_shader.c | 33 ++---
 1 file changed, 30 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c 
b/src/gallium/drivers/r600/r600_shader.c
index 6dbca50..aee011e 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -252,6 +252,7 @@ static int tgsi_endif(struct r600_shader_ctx *ctx);
 static int tgsi_bgnloop(struct r600_shader_ctx *ctx);
 static int tgsi_endloop(struct r600_shader_ctx *ctx);
 static int tgsi_loop_brk_cont(struct r600_shader_ctx *ctx);
+static bool need_cayman_loop_bug_workaround(struct r600_shader_ctx *ctx);
 
 /*
  * bytestream -> r600 shader
@@ -5490,7 +5491,7 @@ static int tgsi_opdst(struct r600_shader_ctx *ctx)
return 0;
 }
 
-static int emit_logic_pred(struct r600_shader_ctx *ctx, int opcode)
+static int emit_logic_pred(struct r600_shader_ctx *ctx, int opcode, int 
alu_type)
 {
struct r600_bytecode_alu alu;
int r;
@@ -5510,7 +5511,7 @@ static int emit_logic_pred(struct r600_shader_ctx *ctx, 
int opcode)
 
alu.last = 1;
 
-   r = r600_bytecode_add_alu_type(ctx->bc, &alu, CF_OP_ALU_PUSH_BEFORE);
+   r = r600_bytecode_add_alu_type(ctx->bc, &alu, alu_type);
if (r)
return r;
return 0;
@@ -5730,7 +5731,20 @@ static void break_loop_on_flag(struct r600_shader_ctx 
*ctx, unsigned fc_sp)
 
 static int tgsi_if(struct r600_shader_ctx *ctx)
 {
-   emit_logic_pred(ctx, ALU_OP2_PRED_SETNE_INT);
+   int alu_type = CF_OP_ALU_PUSH_BEFORE;
+
+   /* 
+  There is a bug where a BREAK/CONTINUE followed by LOOP_STARTxxx for 
nested
+  loops may put the branch stack into a state such that 
ALU_PUSH_BEFORE 
+  doesn't work as expected. Workaround this by replacing the 
ALU_PUSH_BEFORE
+  with a PUSH + ALU for nested loops.
+*/
+   if (ctx->bc->chip_class == CAYMAN && 
need_cayman_loop_bug_workaround(ctx)) {
+   r600_bytecode_add_cfinst(ctx->bc, CF_OP_PUSH);
+   alu_type = CF_OP_ALU;
+   }
+
+   emit_logic_pred(ctx, ALU_OP2_PRED_SETNE_INT, alu_type);
 
r600_bytecode_add_cfinst(ctx->bc, CF_OP_JUMP);
 
@@ -5834,6 +5848,19 @@ static int tgsi_loop_brk_cont(struct r600_shader_ctx 
*ctx)
return 0;
 }
 
+static bool need_cayman_loop_bug_workaround(struct r600_shader_ctx *ctx)
+{
+   unsigned int fscp;
+   int num_loops = 0;
+   for (fscp = ctx->bc->fc_sp; fscp > 0; fscp--)
+   {
+   if (FC_LOOP == ctx->bc->fc_stack[fscp].type)
+   ++num_loops;
+   }
+
+   return num_loops >= 2;
+}
+
 static int tgsi_umad(struct r600_shader_ctx *ctx)
 {
struct tgsi_full_instruction *inst = 
&ctx->parse.FullToken.FullInstruction;
-- 
1.8.2.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] r600g: Workaround for a nested loop bug on Cayman

2013-04-14 Thread Martin Andersson
On Mon, Apr 15, 2013 at 1:09 AM, Vadim Girlin  wrote:
> On 04/15/2013 01:05 AM, Martin Andersson wrote:
>>
>> There is a bug where a BREAK/CONTINUE followed by LOOP_STARTxxx for nested
>> loops may put the branch stack into a state such that ALU_PUSH_BEFORE
>> doesn't work as expected. Workaround this by replacing the ALU_PUSH_BEFORE
>> with a PUSH + ALU for nested loops.
>>
>> Fixes piglit tests:
>> spec/!OpenGL 1.1/read-front
>> spec/EXT_transform_feedback/order*
>> spec/glsl-1.40/uniform_buffer/fs-struct-pad
>>
>> No piglit regressions.
>> ---
>>   src/gallium/drivers/r600/r600_shader.c | 33
>> ++---
>>   1 file changed, 30 insertions(+), 3 deletions(-)
>>
>> diff --git a/src/gallium/drivers/r600/r600_shader.c
>> b/src/gallium/drivers/r600/r600_shader.c
>> index 6dbca50..aee011e 100644
>> --- a/src/gallium/drivers/r600/r600_shader.c
>> +++ b/src/gallium/drivers/r600/r600_shader.c
>> @@ -252,6 +252,7 @@ static int tgsi_endif(struct r600_shader_ctx *ctx);
>>   static int tgsi_bgnloop(struct r600_shader_ctx *ctx);
>>   static int tgsi_endloop(struct r600_shader_ctx *ctx);
>>   static int tgsi_loop_brk_cont(struct r600_shader_ctx *ctx);
>> +static bool need_cayman_loop_bug_workaround(struct r600_shader_ctx *ctx);
>>
>>   /*
>>* bytestream -> r600 shader
>> @@ -5490,7 +5491,7 @@ static int tgsi_opdst(struct r600_shader_ctx *ctx)
>> return 0;
>>   }
>>
>> -static int emit_logic_pred(struct r600_shader_ctx *ctx, int opcode)
>> +static int emit_logic_pred(struct r600_shader_ctx *ctx, int opcode, int
>> alu_type)
>>   {
>> struct r600_bytecode_alu alu;
>> int r;
>> @@ -5510,7 +5511,7 @@ static int emit_logic_pred(struct r600_shader_ctx
>> *ctx, int opcode)
>>
>> alu.last = 1;
>>
>> -   r = r600_bytecode_add_alu_type(ctx->bc, &alu,
>> CF_OP_ALU_PUSH_BEFORE);
>> +   r = r600_bytecode_add_alu_type(ctx->bc, &alu, alu_type);
>> if (r)
>> return r;
>> return 0;
>> @@ -5730,7 +5731,20 @@ static void break_loop_on_flag(struct
>> r600_shader_ctx *ctx, unsigned fc_sp)
>>
>>   static int tgsi_if(struct r600_shader_ctx *ctx)
>>   {
>> -   emit_logic_pred(ctx, ALU_OP2_PRED_SETNE_INT);
>> +   int alu_type = CF_OP_ALU_PUSH_BEFORE;
>> +
>> +   /*
>> +  There is a bug where a BREAK/CONTINUE followed by LOOP_STARTxxx
>> for nested
>> +  loops may put the branch stack into a state such that
>> ALU_PUSH_BEFORE
>> +  doesn't work as expected. Workaround this by replacing the
>> ALU_PUSH_BEFORE
>> +  with a PUSH + ALU for nested loops.
>> +*/
>> +   if (ctx->bc->chip_class == CAYMAN &&
>> need_cayman_loop_bug_workaround(ctx)) {
>
>
> We already have current loop level for the stack size computation, see
> r600_bytecode::stack, so I think need_cayman_loop_bug_workaround call may be
> replaced with "ctx->bc->stack.loop > 1", if I'm not missing something.

Ok, will try that tonight. Should I add a comment that it is a hardware bug?

> Vadim
>
>
>> +   r600_bytecode_add_cfinst(ctx->bc, CF_OP_PUSH);
>> +   alu_type = CF_OP_ALU;
>> +   }
>> +
>> +   emit_logic_pred(ctx, ALU_OP2_PRED_SETNE_INT, alu_type);
>>
>> r600_bytecode_add_cfinst(ctx->bc, CF_OP_JUMP);
>>
>> @@ -5834,6 +5848,19 @@ static int tgsi_loop_brk_cont(struct
>> r600_shader_ctx *ctx)
>> return 0;
>>   }
>>
>> +static bool need_cayman_loop_bug_workaround(struct r600_shader_ctx *ctx)
>> +{
>> +   unsigned int fscp;
>> +   int num_loops = 0;
>> +   for (fscp = ctx->bc->fc_sp; fscp > 0; fscp--)
>> +   {
>> +   if (FC_LOOP == ctx->bc->fc_stack[fscp].type)
>> +   ++num_loops;
>> +   }
>> +
>> +   return num_loops >= 2;
>> +}
>> +
>>   static int tgsi_umad(struct r600_shader_ctx *ctx)
>>   {
>> struct tgsi_full_instruction *inst =
>> &ctx->parse.FullToken.FullInstruction;
>>
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2] r600g: Workaround for a harware bug with nested loops on Cayman

2013-04-15 Thread Martin Andersson
There is a hardware bug on Cayman where a BREAK/CONTINUE followed by
LOOP_STARTxxx for nested loops may put the branch stack into a state
such that ALU_PUSH_BEFORE doesn't work as expected. Workaround this
by replacing the ALU_PUSH_BEFORE with a PUSH + ALU

Fixes piglit tests EXT_transform_feedback/order*

v2: Use existing loop count and improve comment
---
 src/gallium/drivers/r600/r600_shader.c | 17 ++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c 
b/src/gallium/drivers/r600/r600_shader.c
index 6dbca50..f4398fd 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -5490,7 +5490,7 @@ static int tgsi_opdst(struct r600_shader_ctx *ctx)
return 0;
 }
 
-static int emit_logic_pred(struct r600_shader_ctx *ctx, int opcode)
+static int emit_logic_pred(struct r600_shader_ctx *ctx, int opcode, int 
alu_type)
 {
struct r600_bytecode_alu alu;
int r;
@@ -5510,7 +5510,7 @@ static int emit_logic_pred(struct r600_shader_ctx *ctx, 
int opcode)
 
alu.last = 1;
 
-   r = r600_bytecode_add_alu_type(ctx->bc, &alu, CF_OP_ALU_PUSH_BEFORE);
+   r = r600_bytecode_add_alu_type(ctx->bc, &alu, alu_type);
if (r)
return r;
return 0;
@@ -5730,7 +5730,18 @@ static void break_loop_on_flag(struct r600_shader_ctx 
*ctx, unsigned fc_sp)
 
 static int tgsi_if(struct r600_shader_ctx *ctx)
 {
-   emit_logic_pred(ctx, ALU_OP2_PRED_SETNE_INT);
+   int alu_type = CF_OP_ALU_PUSH_BEFORE;
+
+   /* There is a hardware bug on Cayman where a BREAK/CONTINUE followed by
+* LOOP_STARTxxx for nested loops may put the branch stack into a state
+* such that ALU_PUSH_BEFORE doesn't work as expected. Workaround this
+* by replacing the ALU_PUSH_BEFORE with a PUSH + ALU */
+   if (ctx->bc->chip_class == CAYMAN && ctx->bc->stack.loop > 1) {
+   r600_bytecode_add_cfinst(ctx->bc, CF_OP_PUSH);
+   alu_type = CF_OP_ALU;
+   }
+
+   emit_logic_pred(ctx, ALU_OP2_PRED_SETNE_INT, alu_type);
 
r600_bytecode_add_cfinst(ctx->bc, CF_OP_JUMP);
 
-- 
1.8.2.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] st/egl: Flush resources before presentation

2013-12-26 Thread Martin Andersson
Fixes wayland regression on r600g due to fast clear introduced by commit
edbbfac6.
---
 src/gallium/state_trackers/egl/common/native_helper.c   | 15 +++
 src/gallium/state_trackers/egl/common/native_helper.h   |  5 +
 src/gallium/state_trackers/egl/wayland/native_wayland.c |  4 
 3 files changed, 24 insertions(+)

diff --git a/src/gallium/state_trackers/egl/common/native_helper.c 
b/src/gallium/state_trackers/egl/common/native_helper.c
index 4a77a50..856cbb6 100644
--- a/src/gallium/state_trackers/egl/common/native_helper.c
+++ b/src/gallium/state_trackers/egl/common/native_helper.c
@@ -341,6 +341,21 @@ resource_surface_throttle(struct resource_surface *rsurf)
 }
 
 boolean
+resource_surface_flush_resource(struct resource_surface *rsurf,
+struct native_display *ndpy,
+enum native_attachment which)
+{
+   struct pipe_context *pipe = ndpy_get_copy_context(ndpy);
+
+   if (!pipe)
+  return FALSE;
+
+   pipe->flush_resource(pipe, rsurf->resources[which]);
+
+   return TRUE;
+}
+
+boolean
 resource_surface_flush(struct resource_surface *rsurf,
   struct native_display *ndpy)
 {
diff --git a/src/gallium/state_trackers/egl/common/native_helper.h 
b/src/gallium/state_trackers/egl/common/native_helper.h
index 4c369a7..0b53b28 100644
--- a/src/gallium/state_trackers/egl/common/native_helper.h
+++ b/src/gallium/state_trackers/egl/common/native_helper.h
@@ -91,6 +91,11 @@ resource_surface_copy_swap(struct resource_surface *rsurf,
 boolean
 resource_surface_throttle(struct resource_surface *rsurf);
 
+boolean
+resource_surface_flush_resource(struct resource_surface *rsurf,
+struct native_display *ndpy,
+enum native_attachment which);
+
 /**
  * Flush pending rendering using the copy context. This function saves a
  * marker for upcoming throttles.
diff --git a/src/gallium/state_trackers/egl/wayland/native_wayland.c 
b/src/gallium/state_trackers/egl/wayland/native_wayland.c
index cfdf4f8..0ab4be6 100644
--- a/src/gallium/state_trackers/egl/wayland/native_wayland.c
+++ b/src/gallium/state_trackers/egl/wayland/native_wayland.c
@@ -259,6 +259,10 @@ wayland_surface_swap_buffers(struct native_surface *nsurf)
if (ret == -1)
   return EGL_FALSE;
 
+   (void) resource_surface_flush_resource(surface->rsurf, &display->base,
+  NATIVE_ATTACHMENT_BACK_LEFT);
+   (void) resource_surface_flush(surface->rsurf, &display->base);
+
surface->frame_callback = wl_surface_frame(surface->win->surface);
wl_callback_add_listener(surface->frame_callback, &frame_listener, surface);
wl_proxy_set_queue((struct wl_proxy *) surface->frame_callback,
-- 
1.8.5.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] st/egl: Flush resources before presentation

2014-01-07 Thread Martin Andersson
Hi Marek,

Since it seems no one else have any comments on this, maybe you could
commit it for me?

//Martin

On Thu, Dec 26, 2013 at 1:15 PM, Marek Olšák  wrote:
> Reviewed-by: Marek Olšák 
>
> Marek
>
> On Thu, Dec 26, 2013 at 10:33 AM, Martin Andersson  wrote:
>> Fixes wayland regression on r600g due to fast clear introduced by commit
>> edbbfac6.
>> ---
>>  src/gallium/state_trackers/egl/common/native_helper.c   | 15 +++
>>  src/gallium/state_trackers/egl/common/native_helper.h   |  5 +
>>  src/gallium/state_trackers/egl/wayland/native_wayland.c |  4 
>>  3 files changed, 24 insertions(+)
>>
>> diff --git a/src/gallium/state_trackers/egl/common/native_helper.c 
>> b/src/gallium/state_trackers/egl/common/native_helper.c
>> index 4a77a50..856cbb6 100644
>> --- a/src/gallium/state_trackers/egl/common/native_helper.c
>> +++ b/src/gallium/state_trackers/egl/common/native_helper.c
>> @@ -341,6 +341,21 @@ resource_surface_throttle(struct resource_surface 
>> *rsurf)
>>  }
>>
>>  boolean
>> +resource_surface_flush_resource(struct resource_surface *rsurf,
>> +struct native_display *ndpy,
>> +enum native_attachment which)
>> +{
>> +   struct pipe_context *pipe = ndpy_get_copy_context(ndpy);
>> +
>> +   if (!pipe)
>> +  return FALSE;
>> +
>> +   pipe->flush_resource(pipe, rsurf->resources[which]);
>> +
>> +   return TRUE;
>> +}
>> +
>> +boolean
>>  resource_surface_flush(struct resource_surface *rsurf,
>>struct native_display *ndpy)
>>  {
>> diff --git a/src/gallium/state_trackers/egl/common/native_helper.h 
>> b/src/gallium/state_trackers/egl/common/native_helper.h
>> index 4c369a7..0b53b28 100644
>> --- a/src/gallium/state_trackers/egl/common/native_helper.h
>> +++ b/src/gallium/state_trackers/egl/common/native_helper.h
>> @@ -91,6 +91,11 @@ resource_surface_copy_swap(struct resource_surface *rsurf,
>>  boolean
>>  resource_surface_throttle(struct resource_surface *rsurf);
>>
>> +boolean
>> +resource_surface_flush_resource(struct resource_surface *rsurf,
>> +struct native_display *ndpy,
>> +enum native_attachment which);
>> +
>>  /**
>>   * Flush pending rendering using the copy context. This function saves a
>>   * marker for upcoming throttles.
>> diff --git a/src/gallium/state_trackers/egl/wayland/native_wayland.c 
>> b/src/gallium/state_trackers/egl/wayland/native_wayland.c
>> index cfdf4f8..0ab4be6 100644
>> --- a/src/gallium/state_trackers/egl/wayland/native_wayland.c
>> +++ b/src/gallium/state_trackers/egl/wayland/native_wayland.c
>> @@ -259,6 +259,10 @@ wayland_surface_swap_buffers(struct native_surface 
>> *nsurf)
>> if (ret == -1)
>>return EGL_FALSE;
>>
>> +   (void) resource_surface_flush_resource(surface->rsurf, &display->base,
>> +  NATIVE_ATTACHMENT_BACK_LEFT);
>> +   (void) resource_surface_flush(surface->rsurf, &display->base);
>> +
>> surface->frame_callback = wl_surface_frame(surface->win->surface);
>> wl_callback_add_listener(surface->frame_callback, &frame_listener, 
>> surface);
>> wl_proxy_set_queue((struct wl_proxy *) surface->frame_callback,
>> --
>> 1.8.5.1
>>
>> ___
>> mesa-dev mailing list
>> mesa-dev@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] Random results in piglit spec/!OpenGL 1.1/read-front on r600g

2013-06-06 Thread Martin Andersson
I get random results when I run the spec/!OpenGL 1.1/read-front test.
Sometimes it passes and sometimes it failes, it mostly fails though.
When it fails the observed values are random. I have an AMD 6950,
running mesa git ce67fb4715e0c2fab01de33da475ef4705622020 and kernel
3.10-rc4.

If I insert a delay before the piglit_swap_buffers call in the test
(read-front.c) it always passes. It does not matter where I put the
delay in the piglit_dispay function, as long as it is before the
piglit_swap_buffers call. So it seems to be a race between the
swap_buffers_call and some earlier call (some piglit initialization
perhaps?)

Does anyone know what could be wrong or how I could debug it further?

//Martin
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Random results in piglit spec/!OpenGL 1.1/read-front on r600g

2013-06-06 Thread Martin Andersson
On Fri, Jun 7, 2013 at 12:37 AM, Marek Olšák  wrote:
> There are bugs in both piglit and DRI2. I haven't looked into the
> issue, but Paul Berry seems to be working on it.
>
> See:
> http://lists.freedesktop.org/archives/piglit/2013-May/005880.html
> http://lists.freedesktop.org/archives/mesa-dev/2013-May/039985.html
>
> Marek
>

ok, it is good to know that it is being worked on, thanks

//Martin

> On Fri, Jun 7, 2013 at 12:04 AM, Martin Andersson  wrote:
>> I get random results when I run the spec/!OpenGL 1.1/read-front test.
>> Sometimes it passes and sometimes it failes, it mostly fails though.
>> When it fails the observed values are random. I have an AMD 6950,
>> running mesa git ce67fb4715e0c2fab01de33da475ef4705622020 and kernel
>> 3.10-rc4.
>>
>> If I insert a delay before the piglit_swap_buffers call in the test
>> (read-front.c) it always passes. It does not matter where I put the
>> delay in the piglit_dispay function, as long as it is before the
>> piglit_swap_buffers call. So it seems to be a race between the
>> swap_buffers_call and some earlier call (some piglit initialization
>> perhaps?)
>>
>> Does anyone know what could be wrong or how I could debug it further?
>>
>> //Martin
>> ___
>> mesa-dev mailing list
>> mesa-dev@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Random results in piglit spec/!OpenGL 1.1/read-front on r600g

2013-06-07 Thread Martin Andersson
On Fri, Jun 7, 2013 at 8:58 AM, Martin Andersson  wrote:
> On Fri, Jun 7, 2013 at 12:37 AM, Marek Olšák  wrote:
>> There are bugs in both piglit and DRI2. I haven't looked into the
>> issue, but Paul Berry seems to be working on it.
>>
>> See:
>> http://lists.freedesktop.org/archives/piglit/2013-May/005880.html
>> http://lists.freedesktop.org/archives/mesa-dev/2013-May/039985.html
>>
>> Marek
>>
>
> ok, it is good to know that it is being worked on, thanks
>
> //Martin
>
>> On Fri, Jun 7, 2013 at 12:04 AM, Martin Andersson  wrote:
>>> I get random results when I run the spec/!OpenGL 1.1/read-front test.
>>> Sometimes it passes and sometimes it failes, it mostly fails though.
>>> When it fails the observed values are random. I have an AMD 6950,
>>> running mesa git ce67fb4715e0c2fab01de33da475ef4705622020 and kernel
>>> 3.10-rc4.
>>>
>>> If I insert a delay before the piglit_swap_buffers call in the test
>>> (read-front.c) it always passes. It does not matter where I put the
>>> delay in the piglit_dispay function, as long as it is before the
>>> piglit_swap_buffers call. So it seems to be a race between the
>>> swap_buffers_call and some earlier call (some piglit initialization
>>> perhaps?)
>>>
>>> Does anyone know what could be wrong or how I could debug it further?
>>>
>>> //Martin
>>> ___
>>> mesa-dev mailing list
>>> mesa-dev@lists.freedesktop.org
>>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev

This issue was fixed by
http://cgit.freedesktop.org/piglit/commit/?id=4c1d83cb4cf202dd7375593c830fd78db04ff14b

//Martin
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] winsys/radeon: add env var to disable VM on Cayman/Trinity

2013-06-10 Thread Martin Andersson
On Mon, Jun 10, 2013 at 11:55 PM, Alex Deucher  wrote:
> On Mon, Jun 10, 2013 at 5:36 PM, Marek Olšák  wrote:
>> Reviewed-by: Marek Olšák 
>>
>> FYI, Cayman works for me as I said in
>> https://bugs.freedesktop.org/show_bug.cgi?id=62959 . I don't remember
>> if the issue has been fixed upstream.
>
> Yes, the fixes should be upstream, however, Tom has had some strange
> behavior with compute using VM and there are a number of strange bugs
> on TN and cayman that may still be VM related.

I have a bunch of regressions on a 6950 with va enabled, no hardlocks though.

Some of test results are random, one example is
all-colors.shader_test. It consists of a bunch of tests like this and
it fails at different pixels each time I run it:

clear color 0.0 0.0 0.0 0.0
clear
uniform vec4 color 0.0 0.0 0.0 0.0
draw arrays GL_LINES 0 2
probe all rgba 0.0 0.0 0.0 0.0

If i remove the draw arrays line the test always passes, I have spent
some time investigating it, but haven't found anything yet.

//Martin

> Alex
>
>>
>> Marek
>>
>> On Mon, Jun 10, 2013 at 10:34 PM,   wrote:
>>> From: Alex Deucher 
>>>
>>> Set env var RADEON_VA=0 to disable VM on Cayman/Trinity.
>>> Useful for debugging.
>>>
>>> Note: this is a candidate for the 9.1 branch.
>>>
>>> Signed-off-by: Alex Deucher 
>>> ---
>>>  src/gallium/winsys/radeon/drm/radeon_drm_winsys.c |2 ++
>>>  1 files changed, 2 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_winsys.c 
>>> b/src/gallium/winsys/radeon/drm/radeon_drm_winsys.c
>>> index 15d5d31..ee4dfa1 100644
>>> --- a/src/gallium/winsys/radeon/drm/radeon_drm_winsys.c
>>> +++ b/src/gallium/winsys/radeon/drm/radeon_drm_winsys.c
>>> @@ -399,6 +399,8 @@ static boolean do_winsys_init(struct radeon_drm_winsys 
>>> *ws)
>>>&ws->info.r600_ib_vm_max_size))
>>>  ws->info.r600_virtual_address = FALSE;
>>>  }
>>> +   if (ws->gen == DRV_R600 && !debug_get_bool_option("RADEON_VA", 
>>> TRUE))
>>> +   ws->info.r600_virtual_address = FALSE;
>>>  }
>>>
>>>  /* Get max pipes, this is only needed for compute shaders.  All 
>>> evergreen+
>>> --
>>> 1.7.7.5
>>>
>>> ___
>>> mesa-dev mailing list
>>> mesa-dev@lists.freedesktop.org
>>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] r600g cache invalidate issues

2013-06-16 Thread Martin Andersson
I have been investigating why some piglit tests were failing when VA was 
enabled.
What I found was that the problems started with this commit:
http://cgit.freedesktop.org/mesa/mesa/commit/?id=4539f8e20af286d1f521eb016c89c6d9af0b801c

That commit removed the S_0085F0_SMX_ACTION_ENA(1) and S_0085F0_SH_ACTION_ENA(1)
, among other things, from the cache invalidation. So I did some experiments 
where
I enabled those and this is what I found.

It didn't see any difference if I put the SMX and SH under 
R600_CONTEXT_FLUSH_AND_INV
as they were before the above commit or together with the other caches under 
R600_CONTEXT_INVAL_READ_CACHES.

I also tried only SH, only SMX and both together and got fewer failures when 
they both
were enabled. At the very least SH needs to be enabled, I'm not equally sure 
about SMX,
but it does seem to help on some tests under spec/EXT_framebuffer_multisample.

Since the failures are random it is hard to know exactly what tests are 
affected,
there are also unrelated issues that also cause random failures, but these tests
I'm reasonbly sure are fixed:

fast_color_clear/all-colors
fast_color_clear/redundant-clear
spec/!OpenGL 1.1/draw-pixels samples={2,4,6,8}
spec/!OpenGL 1.1/drawbuffer-modes

Then there are a bunch of test under spec/EXT_framebuffer_multisample that if 
not
fixed at least have a much greater chance of passing. Those failures seem 
unrelated
to VA though.

If my patch is wrong then this information can hopefully help someone else find
the real solution to these problems.

This was tested with a 6950 on mesa master 
6d7abafdc86c6d8533bcb798465452c78c2694e8

//Martin

Martin Andersson (1):
  r600g: Include SH and SMX when invalidating read caches

 src/gallium/drivers/r600/r600_hw_context.c | 2 ++
 1 file changed, 2 insertions(+)

-- 
1.8.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] r600g: Include SH and SMX when invalidating read caches

2013-06-16 Thread Martin Andersson
Not including the SH and SMX caches when invalidating read caches causes
random failures on some piglit tests when VA is enabled.

Since the failures are random, and there other problems also causing random
failures, it's hard to know exactly what tests were effected, but these
tests now consistently pass:

fast_color_clear/all-colors
fast_color_clear/redundant-clear
spec/!OpenGL 1.1/draw-pixels samples={2,4,6,8}
spec/!OpenGL 1.1/drawbuffer-modes
---
 src/gallium/drivers/r600/r600_hw_context.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/gallium/drivers/r600/r600_hw_context.c 
b/src/gallium/drivers/r600/r600_hw_context.c
index 944b666..df20e56 100644
--- a/src/gallium/drivers/r600/r600_hw_context.c
+++ b/src/gallium/drivers/r600/r600_hw_context.c
@@ -231,6 +231,8 @@ void r600_flush_emit(struct r600_context *rctx)
if (rctx->flags & R600_CONTEXT_INVAL_READ_CACHES) {
cp_coher_cntl |= S_0085F0_VC_ACTION_ENA(1) |
S_0085F0_TC_ACTION_ENA(1) |
+   S_0085F0_SH_ACTION_ENA(1) |
+   S_0085F0_SMX_ACTION_ENA(1) |
S_0085F0_FULL_CACHE_ENA(1);
emit_flush = 1;
}
-- 
1.8.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] r600g: Include SH and SMX when invalidating read caches

2013-06-22 Thread Martin Andersson
On Sat, Jun 22, 2013 at 12:22 PM, Marek Olšák  wrote:
> Reviewed-by: Marek Olšák 
>
> BTW, SMX is a write cache, to maybe it shouldn't be part of this patch.

I made a little experiment where i ran
"ext_framebuffer_multisample-unaligned-blit 4 color downsample -auto"
1 times and found that without SMX the test failed 177 times and
with SMX it didn't fail at all. So I do think the SMX cache should be
invalidated somewhere.

Before 
http://cgit.freedesktop.org/mesa/mesa/commit/?id=4539f8e20af286d1f521eb016c89c6d9af0b801c
it was under R600_CONTEXT_FLUSH_AND_INV, is that a better place?

If that is the proper place for SMX should SH also be there, since it
was also there before the patch, or do you have any other suggestions?

> Marek
>
> On Sun, Jun 16, 2013 at 1:27 PM, Martin Andersson  wrote:
>> Not including the SH and SMX caches when invalidating read caches causes
>> random failures on some piglit tests when VA is enabled.
>>
>> Since the failures are random, and there other problems also causing random
>> failures, it's hard to know exactly what tests were effected, but these
>> tests now consistently pass:
>>
>> fast_color_clear/all-colors
>> fast_color_clear/redundant-clear
>> spec/!OpenGL 1.1/draw-pixels samples={2,4,6,8}
>> spec/!OpenGL 1.1/drawbuffer-modes
>> ---
>>  src/gallium/drivers/r600/r600_hw_context.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/src/gallium/drivers/r600/r600_hw_context.c 
>> b/src/gallium/drivers/r600/r600_hw_context.c
>> index 944b666..df20e56 100644
>> --- a/src/gallium/drivers/r600/r600_hw_context.c
>> +++ b/src/gallium/drivers/r600/r600_hw_context.c
>> @@ -231,6 +231,8 @@ void r600_flush_emit(struct r600_context *rctx)
>> if (rctx->flags & R600_CONTEXT_INVAL_READ_CACHES) {
>> cp_coher_cntl |= S_0085F0_VC_ACTION_ENA(1) |
>> S_0085F0_TC_ACTION_ENA(1) |
>> +   S_0085F0_SH_ACTION_ENA(1) |
>> +   S_0085F0_SMX_ACTION_ENA(1) |
>> S_0085F0_FULL_CACHE_ENA(1);
>> emit_flush = 1;
>> }
>> --
>> 1.8.3
>>
>> ___
>> mesa-dev mailing list
>> mesa-dev@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] r600g: Include SH and SMX when invalidating read caches

2013-06-23 Thread Martin Andersson
On Sun, Jun 23, 2013 at 7:41 PM, Alex Deucher  wrote:
> On Sat, Jun 22, 2013 at 11:53 AM, Martin Andersson  wrote:
>> On Sat, Jun 22, 2013 at 12:22 PM, Marek Olšák  wrote:
>>> Reviewed-by: Marek Olšák 
>>>
>>> BTW, SMX is a write cache, to maybe it shouldn't be part of this patch.
>>
>> I made a little experiment where i ran
>> "ext_framebuffer_multisample-unaligned-blit 4 color downsample -auto"
>> 1 times and found that without SMX the test failed 177 times and
>> with SMX it didn't fail at all. So I do think the SMX cache should be
>> invalidated somewhere.
>>
>> Before 
>> http://cgit.freedesktop.org/mesa/mesa/commit/?id=4539f8e20af286d1f521eb016c89c6d9af0b801c
>> it was under R600_CONTEXT_FLUSH_AND_INV, is that a better place?
>>
>> If that is the proper place for SMX should SH also be there, since it
>> was also there before the patch, or do you have any other suggestions?
>
> Does something like this help?  You might play with some variants of
> this patch.  This might break compute however as Tom had some problems
> with CB flushes on cayman class hw in the past.

I tested this patch as it is and I found that it helped alot. Here is
a comparison of three piglit runs. Take these numbers with a grain of
salt, since the failures are random and I only did one run on each
experiment.

master:   11055/11781
my patch:11216/11781
your patch: 11319/11781

The random failures are all but gone. I did three runs with you patch
and only found three tests that still fail randomly:

spec/!OpenGL 3.0/gl-3.0-texture-integer
spec/glsl-1.40/uniform_buffer/fs-struct-pad
spec/glsl-1.40/uniform_buffer/vs-struct-pad

//Martin

> Alex
>
>
>>
>>> Marek
>>>
>>> On Sun, Jun 16, 2013 at 1:27 PM, Martin Andersson  
>>> wrote:
>>>> Not including the SH and SMX caches when invalidating read caches causes
>>>> random failures on some piglit tests when VA is enabled.
>>>>
>>>> Since the failures are random, and there other problems also causing random
>>>> failures, it's hard to know exactly what tests were effected, but these
>>>> tests now consistently pass:
>>>>
>>>> fast_color_clear/all-colors
>>>> fast_color_clear/redundant-clear
>>>> spec/!OpenGL 1.1/draw-pixels samples={2,4,6,8}
>>>> spec/!OpenGL 1.1/drawbuffer-modes
>>>> ---
>>>>  src/gallium/drivers/r600/r600_hw_context.c | 2 ++
>>>>  1 file changed, 2 insertions(+)
>>>>
>>>> diff --git a/src/gallium/drivers/r600/r600_hw_context.c 
>>>> b/src/gallium/drivers/r600/r600_hw_context.c
>>>> index 944b666..df20e56 100644
>>>> --- a/src/gallium/drivers/r600/r600_hw_context.c
>>>> +++ b/src/gallium/drivers/r600/r600_hw_context.c
>>>> @@ -231,6 +231,8 @@ void r600_flush_emit(struct r600_context *rctx)
>>>> if (rctx->flags & R600_CONTEXT_INVAL_READ_CACHES) {
>>>> cp_coher_cntl |= S_0085F0_VC_ACTION_ENA(1) |
>>>> S_0085F0_TC_ACTION_ENA(1) |
>>>> +   S_0085F0_SH_ACTION_ENA(1) |
>>>> +   S_0085F0_SMX_ACTION_ENA(1) |
>>>> S_0085F0_FULL_CACHE_ENA(1);
>>>> emit_flush = 1;
>>>> }
>>>> --
>>>> 1.8.3
>>>>
>>>> ___
>>>> mesa-dev mailing list
>>>> mesa-dev@lists.freedesktop.org
>>>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>> ___
>> mesa-dev mailing list
>> mesa-dev@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] r600g: Include SH and SMX when invalidating read caches

2013-06-23 Thread Martin Andersson
On Sun, Jun 23, 2013 at 11:56 PM, Alex Deucher  wrote:
> On Sun, Jun 23, 2013 at 2:24 PM, Marek Olšák  wrote:
>> Hi Alex,
>>
>> rctx->framebuffer.state.nr_cbufs might not contain what you think it
>> does, because the framebuffer that needs flushing may have been
>> replaced by a new framebuffer and the cache flushing of the old
>> framebuffer usually takes place before the first draw to the new
>> framebuffer. To solve this, we can either set all CB bits, or move
>> setting CP_COHER_CNTL outside of r600_flush_emit.
>
> I think it should be ok to just set all the CB bits.  My only concern
> was whether there would be a problems with flushing unbound CBs.  I
> think it should be ok.  Martin, can you try the attached patch?

I ran two piglit tests and got the essentially the same results as the
first patch. I did find another test that fail randomly
(spec/EXT_framebuffer_blit/fbo-sys-blit) but it is broken with the
first patch as well.

//Martin

> Alex
>
>>
>> Marek
>>
>> On Sun, Jun 23, 2013 at 7:41 PM, Alex Deucher  wrote:
>>> On Sat, Jun 22, 2013 at 11:53 AM, Martin Andersson  
>>> wrote:
>>>> On Sat, Jun 22, 2013 at 12:22 PM, Marek Olšák  wrote:
>>>>> Reviewed-by: Marek Olšák 
>>>>>
>>>>> BTW, SMX is a write cache, to maybe it shouldn't be part of this patch.
>>>>
>>>> I made a little experiment where i ran
>>>> "ext_framebuffer_multisample-unaligned-blit 4 color downsample -auto"
>>>> 1 times and found that without SMX the test failed 177 times and
>>>> with SMX it didn't fail at all. So I do think the SMX cache should be
>>>> invalidated somewhere.
>>>>
>>>> Before 
>>>> http://cgit.freedesktop.org/mesa/mesa/commit/?id=4539f8e20af286d1f521eb016c89c6d9af0b801c
>>>> it was under R600_CONTEXT_FLUSH_AND_INV, is that a better place?
>>>>
>>>> If that is the proper place for SMX should SH also be there, since it
>>>> was also there before the patch, or do you have any other suggestions?
>>>
>>> Does something like this help?  You might play with some variants of
>>> this patch.  This might break compute however as Tom had some problems
>>> with CB flushes on cayman class hw in the past.
>>>
>>> Alex
>>>
>>>
>>>>
>>>>> Marek
>>>>>
>>>>> On Sun, Jun 16, 2013 at 1:27 PM, Martin Andersson  
>>>>> wrote:
>>>>>> Not including the SH and SMX caches when invalidating read caches causes
>>>>>> random failures on some piglit tests when VA is enabled.
>>>>>>
>>>>>> Since the failures are random, and there other problems also causing 
>>>>>> random
>>>>>> failures, it's hard to know exactly what tests were effected, but these
>>>>>> tests now consistently pass:
>>>>>>
>>>>>> fast_color_clear/all-colors
>>>>>> fast_color_clear/redundant-clear
>>>>>> spec/!OpenGL 1.1/draw-pixels samples={2,4,6,8}
>>>>>> spec/!OpenGL 1.1/drawbuffer-modes
>>>>>> ---
>>>>>>  src/gallium/drivers/r600/r600_hw_context.c | 2 ++
>>>>>>  1 file changed, 2 insertions(+)
>>>>>>
>>>>>> diff --git a/src/gallium/drivers/r600/r600_hw_context.c 
>>>>>> b/src/gallium/drivers/r600/r600_hw_context.c
>>>>>> index 944b666..df20e56 100644
>>>>>> --- a/src/gallium/drivers/r600/r600_hw_context.c
>>>>>> +++ b/src/gallium/drivers/r600/r600_hw_context.c
>>>>>> @@ -231,6 +231,8 @@ void r600_flush_emit(struct r600_context *rctx)
>>>>>> if (rctx->flags & R600_CONTEXT_INVAL_READ_CACHES) {
>>>>>> cp_coher_cntl |= S_0085F0_VC_ACTION_ENA(1) |
>>>>>> S_0085F0_TC_ACTION_ENA(1) |
>>>>>> +   S_0085F0_SH_ACTION_ENA(1) |
>>>>>> +   S_0085F0_SMX_ACTION_ENA(1) |
>>>>>> S_0085F0_FULL_CACHE_ENA(1);
>>>>>> emit_flush = 1;
>>>>>> }
>>>>>> --
>>>>>> 1.8.3
>>>>>>
>>>>>> ___
>>>>>> mesa-dev mailing list
>>>>>> mesa-dev@lists.freedesktop.org
>>>>>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>>> ___
>>>> mesa-dev mailing list
>>>> mesa-dev@lists.freedesktop.org
>>>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev