[Mesa-dev] [Bug 31255] Unigine Sanctuary v2.2: some surfaces have wrong colours

2010-11-16 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=31255

Pavel Ondračka  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Unigine Sanctuary v 2.2:    |Unigine Sanctuary v2.2:
                   |some surfaces have wrong    |some surfaces have wrong
                   |colours                     |colours
          Component|Mesa core                   |Drivers/Gallium/r300
         AssignedTo|mesa-...@lists.freedesktop. |dri-de...@lists.freedesktop
                   |org                         |.org

--- Comment #1 from Pavel Ondračka 2010-11-16 03:04:33 PST ---
This works fine with Tom Stellard's sched-perf-rebase branch, so it seems to
be an r300g bug after all.

-- 
Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] r600g: Lower the minimum stride from 512 to 256 bytes to fix bug #31578.

2010-11-16 Thread Keith Whitwell
On Mon, Nov 15, 2010 at 9:46 PM, Alex Deucher wrote:
> On Mon, Nov 15, 2010 at 4:41 PM, Tilman Sauerbeck wrote:
>> piglit/fbo-readpixels still passes for me.
>>
>> Signed-off-by: Tilman Sauerbeck 
>> ---
>>
>> Please review. And someone please tell me where those 512 and 256 bytes
>> are coming from :)
>
> The alignment depends on the type of tiling in use (linear, 1d, 2d).
> See this drm patch for more info:
> http://git.kernel.org/?p=linux/kernel/git/airlied/drm-2.6.git;a=commitdiff;h=fba4312e223f1187efc8c083daed70e57fa9c9d3
> The info needed can be queried via the tiling info ioctl.

I found the documentation on this pretty hard to follow, but the
kernel code seems to make sense.

Keith


Re: [Mesa-dev] [PATCH] r600g: Lower the minimum stride from 512 to 256 bytes to fix bug #31578.

2010-11-16 Thread Alex Deucher
On Tue, Nov 16, 2010 at 11:35 AM, Keith Whitwell wrote:
> On Mon, Nov 15, 2010 at 9:46 PM, Alex Deucher wrote:
>> On Mon, Nov 15, 2010 at 4:41 PM, Tilman Sauerbeck wrote:
>>> piglit/fbo-readpixels still passes for me.
>>>
>>> Signed-off-by: Tilman Sauerbeck 
>>> ---
>>>
>>> Please review. And someone please tell me where those 512 and 256 bytes
>>> are coming from :)
>>
>> The alignment depends on the type of tiling in use (linear, 1d, 2d).
>> See this drm patch for more info:
>> http://git.kernel.org/?p=linux/kernel/git/airlied/drm-2.6.git;a=commitdiff;h=fba4312e223f1187efc8c083daed70e57fa9c9d3
>> The info needed can be queried via the tiling info ioctl.
>
> I found the documentation on this pretty hard to follow, but the
> kernel code seems to make sense.
>

Also, mipmap pitch needs to be aligned to group size as well.
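
For illustration only, here is a minimal sketch of what "aligning the
pitch" amounts to. The helper names and the 256/512 values are just the
numbers from this thread used as inputs; they are not taken from the
kernel patch, and the real alignment has to come from the tiling info.

```c
#include <assert.h>

/* Round value up to the next multiple of alignment.
 * The alignment is assumed to be a nonzero power of two. */
static unsigned align_up(unsigned value, unsigned alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}

/* Hypothetical helper: bytes per row for `width` pixels, padded so
 * that every row starts on a `pitch_align`-byte boundary. */
static unsigned pitch_bytes(unsigned width, unsigned bytes_per_pixel,
                            unsigned pitch_align)
{
    return align_up(width * bytes_per_pixel, pitch_align);
}
```

With these helpers, a 32-pixel-wide RGBA8 row (128 bytes of data) pads
to 256 bytes with a 256-byte alignment and to 512 bytes with a 512-byte
alignment, which is the kind of difference the patch is about.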

Alex

> Keith
>


Re: [Mesa-dev] RFC: gallium: Remove redundant sw and debug target helpers

2010-11-16 Thread Keith Whitwell
On Wed, 2010-11-10 at 16:04 -0800, Jakob Bornecrantz wrote:
> Hi all
> 
> We have a bunch of redundant target helpers to wrap screens with debug 
> drivers and for creating the various software drivers. This series removes 
> all but the inline one, I picked it since it gives more flexibility for 
> targets and maybe more importantly is the one that is used in 20 places vs 3 
> for the other one.
> 
> Comments please.
> 
> Cheers Jakob.

This looks good to me, Jakob.

Keith



[Mesa-dev] Path to optimize (moving from create/bind/delete paradgim to set only ?)

2010-11-16 Thread Jerome Glisse
Hi,

So I looked a bit more at which paths we should try to optimize in the
mesa/gallium/pipe infrastructure. Here are some numbers gathered from
games:
drawcall /   ps constant   vs constant   ps sampler   vs sampler
doom3               1.45          1.39         9.24         9.86
nexuiz              6.27          5.98         6.84         7.30
openarena        2805.64          1.38         1.51         1.54

(a value of 1 means the function is called on every draw call, while a
value of 10 means the function is called on every tenth draw
call, average)
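
To make the reading of the table concrete, here is a small sketch of
how such a ratio could be computed from two counters. The function name
and the counter values used below are made up for illustration; they
are not the instrumentation that produced the numbers above.

```c
#include <assert.h>

/* Average number of draw calls per state-setting call.
 * 1.0 means the state is set on every draw, 10.0 means once per
 * ten draws. */
static double draws_per_state_call(unsigned long draw_calls,
                                   unsigned long state_calls)
{
    if (state_calls == 0)
        return 0.0;  /* state never set; ratio undefined, report 0 */
    return (double)draw_calls / (double)state_calls;
}
```

For example, 1000 draws with 1000 constant uploads gives 1.0, and 1000
draws with 100 sampler changes gives 10.0.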

Note that the openarena ps constant number is understandable, as it
uses the fixed-function GL pipeline, and the pixel shader constants
don't need to change much in that case.

So I think the clear trend is that there is a lot of constant uploading
and sampler changing (almost at every draw call for some games). Thus I
think we want to make sure that we have a really fast path for uploading
constants or changing samplers. I think those paths should be changed
and should avoid using some of the gallium infrastructure. For shader
constants I think the best solution is to provide the pointer to the
program constant buffer directly to the pipe driver and let the driver
choose how it wants to upload constants to the GPU (GPUs have different
capabilities: some can stream constant buffers inside their command
stream, others can just keep around a pool of buffers into which they
can memcpy, ...). As there is no common denominator, I don't think we
should go through the pipe buffer allocation and provide a new pipe
buffer each time.
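
As a rough sketch of the "pool of buffers the driver can memcpy into"
variant described above: every name here (const_pool, set_constants,
the pool sizes) is hypothetical and is not gallium API or the code in
the linked patch. It only shows the shape of a set-style entry point
that takes a plain pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define POOL_SLOTS 4      /* hypothetical pool depth */
#define SLOT_BYTES 4096   /* hypothetical per-slot capacity */

/* Hypothetical driver-side pool of preallocated constant buffers. */
struct const_pool {
    unsigned char slots[POOL_SLOTS][SLOT_BYTES];
    unsigned next;        /* slot to use for the next upload */
};

/* Set-style entry point: the caller hands over a pointer to its
 * constants and the driver decides how they reach the hardware.
 * Here they are copied into the next slot of the pool.
 * Returns the slot used, or NULL if the data does not fit.
 * A real driver would also have to make sure the GPU is done with
 * a slot before reusing it; that is left out here. */
static const void *set_constants(struct const_pool *pool,
                                 const void *data, size_t size)
{
    unsigned char *slot;

    if (size > SLOT_BYTES)
        return NULL;
    slot = pool->slots[pool->next];
    memcpy(slot, data, size);
    pool->next = (pool->next + 1) % POOL_SLOTS;
    return slot;
}
```

The point of the design is that no buffer object is created or
destroyed per upload; the only per-call work is the copy.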

Optimizing this for r600g allows a ~7% increase in games when draw is a
nop, ~5% when not submitting to the GPU, and ~3% when no part of the
driver is commented out. r600g has other bottlenecks that tend to
minimize the gain we can get from such an optimization. Patch at
http://people.freedesktop.org/~glisse/gallium_const_path/

For samplers I don't think we want to create persistent objects. We are
spending precious time building, hashing, and searching for similar
samplers each time in the gallium code; I think it would be best to
treat this state as use once and forget. That said, we can provide a
helper function for pipe drivers that want to cache samplers (but even
for virtual hw I don't think this makes sense). I haven't yet
implemented a fast path for samplers to see how much we can win from
that, but I will report back once I do.

So a more fundamental question here is whether we should move away from
persistent state and consider all states (except shaders and textures)
as being too volatile for caching any of them to make sense from a
performance point of view. That would mean changing a lot of
create/bind/delete interfaces to simple set interfaces for the pipe
driver. This could be seen as a simplification. Anyway, I think we
should really consider moving more toward set than create/bind/delete
(I liked the create/bind/delete paradigm a lot, but it doesn't seem to
be the one you want with GL, at least from the numbers I gathered with
some games).

Cheers,
Jerome Glisse


Re: [Mesa-dev] Path to optimize (moving from create/bind/delete paradgim to set only ?)

2010-11-16 Thread Roland Scheidegger
On 16.11.2010 20:21, Jerome Glisse wrote:
> Hi,
> 
> So I looked a bit more at which paths we should try to optimize in the
> mesa/gallium/pipe infrastructure. Here are some numbers gathered from
> games:
> drawcall /   ps constant   vs constant   ps sampler   vs sampler
> doom3               1.45          1.39         9.24         9.86
> nexuiz              6.27          5.98         6.84         7.30
> openarena        2805.64          1.38         1.51         1.54
>
> (a value of 1 means the function is called on every draw call, while a
> value of 10 means the function is called on every tenth draw
> call, average)
>
> Note that the openarena ps constant number is understandable, as it
> uses the fixed-function GL pipeline, and the pixel shader constants
> don't need to change much in that case.
>
> So I think the clear trend is that there is a lot of constant uploading
> and sampler changing (almost at every draw call for some games). Thus I
> think we want to make sure that we have a really fast path for uploading
> constants or changing samplers. I think those paths should be changed
> and should avoid using some of the gallium infrastructure. For shader
> constants I think the best solution is to provide the pointer to the
> program constant buffer directly to the pipe driver and let the driver
> choose how it wants to upload constants to the GPU (GPUs have different
> capabilities: some can stream constant buffers inside their command
> stream, others can just keep around a pool of buffers into which they
> can memcpy, ...). As there is no common denominator, I don't think we
> should go through the pipe buffer allocation and provide a new pipe
> buffer each time.
>
> Optimizing this for r600g allows a ~7% increase in games when draw is a
> nop, ~5% when not submitting to the GPU, and ~3% when no part of the
> driver is commented out. r600g has other bottlenecks that tend to
> minimize the gain we can get from such an optimization. Patch at
> http://people.freedesktop.org/~glisse/gallium_const_path/
>
> For samplers I don't think we want to create persistent objects. We are
> spending precious time building, hashing, and searching for similar
> samplers each time in the gallium code; I think it would be best to
> treat this state as use once and forget. That said, we can provide a
> helper function for pipe drivers that want to cache samplers (but even
> for virtual hw I don't think this makes sense). I haven't yet
> implemented a fast path for samplers to see how much we can win from
> that, but I will report back once I do.
>
> So a more fundamental question here is whether we should move away from
> persistent state and consider all states (except shaders and textures)
> as being too volatile for caching any of them to make sense from a
> performance point of view. That would mean changing a lot of
> create/bind/delete interfaces to simple set interfaces for the pipe
> driver. This could be seen as a simplification. Anyway, I think we
> should really consider moving more toward set than create/bind/delete
> (I liked the create/bind/delete paradigm a lot, but it doesn't seem to
> be the one you want with GL, at least from the numbers I gathered with
> some games).

Why do you think it's faster to create and use a new state rather than
search in the hash cache and reuse this? I was under the impression
(this being a dx10 paradigm) even hw is quite optimized for this (that
is, you just keep all the state objects on the hw somewhere and switch
between them). Also, what functions did you really see? If things work
as expected, it should be mostly bind, not create/delete.
Now it is certainly possible a driver doesn't make good use of this
(i.e. it really does all the time consuming stuff on bind), but this is
outside the scope of the infrastructure.
It is possible the hashing is insufficient (it could, for instance, cause
too many collisions, hence the need to recreate state objects), but the basic
mechanism looks quite sound to me.
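
For readers who have not looked at this kind of caching, here is a
minimal sketch of the search-and-reuse idea being discussed. It is not
the gallium cso code: the struct layout, the cache size, and the names
are all made up, and the "state object" is just a counter value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sampler description; a real one has many more fields. */
struct sampler_desc {
    uint32_t wrap_mode;
    uint32_t min_filter;
    uint32_t mag_filter;
};

#define CACHE_SIZE 64

struct cache_entry {
    int used;
    struct sampler_desc key;
    unsigned hw_state_id;   /* stands in for a driver-created object */
};

struct sampler_cache {
    struct cache_entry entries[CACHE_SIZE];
    unsigned creates;       /* number of objects built so far */
};

/* FNV-1a hash over the raw bytes of the description. */
static uint32_t hash_desc(const struct sampler_desc *d)
{
    const unsigned char *p = (const unsigned char *)d;
    uint32_t h = 2166136261u;
    size_t i;

    for (i = 0; i < sizeof *d; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Return the cached object for `d`, creating one only on a miss.
 * Collisions are resolved by linear probing. */
static unsigned cache_get(struct sampler_cache *c,
                          const struct sampler_desc *d)
{
    uint32_t start = hash_desc(d) % CACHE_SIZE;
    uint32_t n;

    for (n = 0; n < CACHE_SIZE; n++) {
        struct cache_entry *e = &c->entries[(start + n) % CACHE_SIZE];

        if (e->used && memcmp(&e->key, d, sizeof *d) == 0)
            return e->hw_state_id;          /* hit: only a bind is needed */
        if (!e->used) {
            e->used = 1;
            e->key = *d;
            e->hw_state_id = ++c->creates;  /* miss: "create" the object */
            return e->hw_state_id;
        }
    }
    return 0;   /* cache full; a real cache would evict something */
}
```

The question in this thread is essentially how often the hit branch is
taken in real GL applications compared with the cost of hashing and
comparing on every call.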

Roland


Re: [Mesa-dev] Mesa (master): glsl: fix assorted MSVC warnings

2010-11-16 Thread Ian Romanick
On 11/15/2010 05:48 PM, Brian Paul wrote:

>     case ir_unop_b2f:
>        assert(op[0]->type->base_type == GLSL_TYPE_BOOL);
>        for (unsigned c = 0; c < op[0]->type->components(); c++) {
> -         data.f[c] = op[0]->value.b[c] ? 1.0 : 0.0;
> +         data.f[c] = op[0]->value.b[c] ? 1.0F : 0.0F;

Please don't do this.  This particular MSVC warning should just be
disabled.  If this warning were generated for non-literals and for
literals that actually did lose precision being stored to a float, it
might have a chance at having some value.  Instead, it's just noise.

Individual warnings can be disabled with a pragma, and this one should
probably be disabled in mesa/compiler.h:

#pragma warning(disable: 4244)

There may be a way to do it from the command line, but I don't know what
it is.
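
A minimal sketch of how such a pragma is usually guarded so that other
compilers never see it. The _MSC_VER guard and the helper function are
assumptions added for illustration; only the pragma line itself comes
from this mail.

```c
#include <assert.h>

/* Only MSVC understands this pragma form, so keep it behind the
 * compiler's predefined macro. */
#ifdef _MSC_VER
#pragma warning(disable: 4244)
#endif

/* Hypothetical helper mirroring the b2f case: double literals are
 * stored into a float, which is exact for 1.0 and 0.0. */
static float bool_to_float(int b)
{
    float f = b ? 1.0 : 0.0;
    return f;
}
```

Both literals are exactly representable as float, so nothing is lost in
the conversion here.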

The F suffixes on constants are also worthless, and they make the code
ugly.  Expecting that they will be added everywhere when no other
compiler generates this warning is a losing battle.

>        }
>        break;
>     case ir_unop_f2b:
>        assert(op[0]->type->base_type == GLSL_TYPE_FLOAT);
>        for (unsigned c = 0; c < op[0]->type->components(); c++) {
> -         data.b[c] = bool(op[0]->value.f[c]);
> +         data.b[c] = op[0]->value.f[c] != 0.0F ? true : false;

This warning should also be disabled for the same reason as the above.
This one isn't even a correctness warning, it's a performance warning.
The code that is replacing the cast may have even worse performance than
the cast!  The other changes can stay, but this one needs to be reverted.

>        }
>        break;
>     case ir_unop_b2i:
> @@ -163,7 +163,7 @@ ir_expression::constant_expression_value()
>     case ir_unop_i2b:
>        assert(op[0]->type->is_integer());
>        for (unsigned c = 0; c < op[0]->type->components(); c++) {
> -         data.b[c] = bool(op[0]->value.u[c]);
> +         data.b[c] = op[0]->value.u[c] ? true : false;

What warning is this?  I was unable to reproduce a warning on Visual
Studio 2008 Express Edition.  I suspect this should be reverted too.


Re: [Mesa-dev] Path to optimize (moving from create/bind/delete paradgim to set only ?)

2010-11-16 Thread Jerome Glisse
On Tue, Nov 16, 2010 at 2:38 PM, Roland Scheidegger  wrote:
> On 16.11.2010 20:21, Jerome Glisse wrote:
>> Hi,
>>
>> So I looked a bit more at which paths we should try to optimize in the
>> mesa/gallium/pipe infrastructure. Here are some numbers gathered from
>> games:
>> drawcall /   ps constant   vs constant   ps sampler   vs sampler
>> doom3               1.45          1.39         9.24         9.86
>> nexuiz              6.27          5.98         6.84         7.30
>> openarena        2805.64          1.38         1.51         1.54
>>
>> (a value of 1 means the function is called on every draw call, while a
>> value of 10 means the function is called on every tenth draw
>> call, average)
>>
>> Note that the openarena ps constant number is understandable, as it
>> uses the fixed-function GL pipeline, and the pixel shader constants
>> don't need to change much in that case.
>>
>> So I think the clear trend is that there is a lot of constant uploading
>> and sampler changing (almost at every draw call for some games). Thus I
>> think we want to make sure that we have a really fast path for uploading
>> constants or changing samplers. I think those paths should be changed
>> and should avoid using some of the gallium infrastructure. For shader
>> constants I think the best solution is to provide the pointer to the
>> program constant buffer directly to the pipe driver and let the driver
>> choose how it wants to upload constants to the GPU (GPUs have different
>> capabilities: some can stream constant buffers inside their command
>> stream, others can just keep around a pool of buffers into which they
>> can memcpy, ...). As there is no common denominator, I don't think we
>> should go through the pipe buffer allocation and provide a new pipe
>> buffer each time.
>>
>> Optimizing this for r600g allows a ~7% increase in games when draw is a
>> nop, ~5% when not submitting to the GPU, and ~3% when no part of the
>> driver is commented out. r600g has other bottlenecks that tend to
>> minimize the gain we can get from such an optimization. Patch at
>> http://people.freedesktop.org/~glisse/gallium_const_path/
>>
>> For samplers I don't think we want to create persistent objects. We are
>> spending precious time building, hashing, and searching for similar
>> samplers each time in the gallium code; I think it would be best to
>> treat this state as use once and forget. That said, we can provide a
>> helper function for pipe drivers that want to cache samplers (but even
>> for virtual hw I don't think this makes sense). I haven't yet
>> implemented a fast path for samplers to see how much we can win from
>> that, but I will report back once I do.
>>
>> So a more fundamental question here is whether we should move away from
>> persistent state and consider all states (except shaders and textures)
>> as being too volatile for caching any of them to make sense from a
>> performance point of view. That would mean changing a lot of
>> create/bind/delete interfaces to simple set interfaces for the pipe
>> driver. This could be seen as a simplification. Anyway, I think we
>> should really consider moving more toward set than create/bind/delete
>> (I liked the create/bind/delete paradigm a lot, but it doesn't seem to
>> be the one you want with GL, at least from the numbers I gathered with
>> some games).
>
> Why do you think it's faster to create and use a new state rather than
> search in the hash cache and reuse this? I was under the impression
> (this being a dx10 paradigm) even hw is quite optimized for this (that
> is, you just keep all the state objects on the hw somewhere and switch
> between them). Also, what functions did you really see? If things work
> as expected, it should be mostly bind, not create/delete.
> Now it is certainly possible a driver doesn't make good use of this
> (i.e. it really does all the time consuming stuff on bind), but this is
> outside the scope of the infrastructure.
> It is possible the hashing is insufficient (it could, for instance, cause
> too many collisions, hence the need to recreate state objects), but the basic
> mechanism looks quite sound to me.
>
> Roland
>

The create/bind & reuse paradigm is likely good for a DirectX-like API,
where the API gives applications an incentive to create and efficiently
manage the states they want to use. But GL, which I believe is the
API we should focus on, is a completely different business. From what
I am seeing from games, we repeatedly see changes to shader constants
and we repeatedly see changes to samplers. We might be using too small
a hash or missing opportunities for reuse; I can totally believe that.
But nonetheless, from what I see it's counterproductive to try to hash
all those states and hope for reuse, simply because the cost of creating
state is too high and the reuse opportunity (even if we improve it)
looks too small. Here you have to think about hundreds of thousands of
calls per frame, and wasting time trying to find a GL state pattern in
the application looks doomed to failure to me. From what i have se

Re: [Mesa-dev] Path to optimize (moving from create/bind/delete paradgim to set only ?)

2010-11-16 Thread Bridgman, John
>>Why do you think it's faster to create and use a new state rather than search 
>>in the hash cache and reuse this? I was under the impression (this being a 
>>dx10 paradigm) even hw is quite optimized for this (that is, you just keep 
>>all the state objects on the hw somewhere and switch between them). 

I read Jerome's post as suggesting that there wasn't much actual re-use going 
on (so the savings from re-use might be outweighed by the overhead of creating 
re-useable state) but I don't think he explicitly said that. Jerome, can you
clarify that?





[Mesa-dev] [PATCH] egl_dri2: Add missing intel chip ids.

2010-11-16 Thread Robert Hooker
Signed-off-by: Robert Hooker 
---
 src/egl/drivers/dri2/egl_dri2.c |   12 ++++++++++++
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/src/egl/drivers/dri2/egl_dri2.c b/src/egl/drivers/dri2/egl_dri2.c
index 6db44c7..a83f32b 100644
--- a/src/egl/drivers/dri2/egl_dri2.c
+++ b/src/egl/drivers/dri2/egl_dri2.c
@@ -899,10 +899,20 @@ const int i915_chip_ids[] = {
    0x29b2, /* PCI_CHIP_Q35_G */
    0x29c2, /* PCI_CHIP_G33_G */
    0x29d2, /* PCI_CHIP_Q33_G */
+   0xa001, /* PCI_CHIP_IGD_G */
    0xa011, /* Pineview */
 };
 
 const int i965_chip_ids[] = {
+   0x0042, /* PCI_CHIP_ILD_G */
+   0x0046, /* PCI_CHIP_ILM_G */
+   0x0102, /* PCI_CHIP_SANDYBRIDGE_GT1 */
+   0x0106, /* PCI_CHIP_SANDYBRIDGE_M_GT1 */
+   0x010a, /* PCI_CHIP_SANDYBRIDGE_S */
+   0x0112, /* PCI_CHIP_SANDYBRIDGE_GT2 */
+   0x0116, /* PCI_CHIP_SANDYBRIDGE_M_GT2 */
+   0x0122, /* PCI_CHIP_SANDYBRIDGE_GT2_PLUS */
+   0x0126, /* PCI_CHIP_SANDYBRIDGE_M_GT2_PLUS */
    0x29a2, /* PCI_CHIP_I965_G */
    0x2992, /* PCI_CHIP_I965_Q */
    0x2982, /* PCI_CHIP_I965_G_1 */
@@ -914,6 +924,8 @@ const int i965_chip_ids[] = {
    0x2e12, /* PCI_CHIP_Q45_G */
    0x2e22, /* PCI_CHIP_G45_G */
    0x2e32, /* PCI_CHIP_G41_G */
+   0x2e42, /* PCI_CHIP_B43_G */
+   0x2e92, /* PCI_CHIP_B43_G1 */
 };
 
 const int r100_chip_ids[] = {
-- 
1.7.1



Re: [Mesa-dev] Path to optimize (moving from create/bind/delete paradgim to set only ?)

2010-11-16 Thread Roland Scheidegger
On 16.11.2010 20:59, Jerome Glisse wrote:
> On Tue, Nov 16, 2010 at 2:38 PM, Roland Scheidegger wrote:
>> On 16.11.2010 20:21, Jerome Glisse wrote:
>>> Hi,
>>>
>>> So I looked a bit more at which paths we should try to optimize in the
>>> mesa/gallium/pipe infrastructure. Here are some numbers gathered from
>>> games:
>>> drawcall /   ps constant   vs constant   ps sampler   vs sampler
>>> doom3               1.45          1.39         9.24         9.86
>>> nexuiz              6.27          5.98         6.84         7.30
>>> openarena        2805.64          1.38         1.51         1.54
>>>
>>> (a value of 1 means the function is called on every draw call, while a
>>> value of 10 means the function is called on every tenth draw
>>> call, average)
>>>
>>> Note that the openarena ps constant number is understandable, as it
>>> uses the fixed-function GL pipeline, and the pixel shader constants
>>> don't need to change much in that case.
>>>
>>> So I think the clear trend is that there is a lot of constant uploading
>>> and sampler changing (almost at every draw call for some games). Thus I
>>> think we want to make sure that we have a really fast path for uploading
>>> constants or changing samplers. I think those paths should be changed
>>> and should avoid using some of the gallium infrastructure. For shader
>>> constants I think the best solution is to provide the pointer to the
>>> program constant buffer directly to the pipe driver and let the driver
>>> choose how it wants to upload constants to the GPU (GPUs have different
>>> capabilities: some can stream constant buffers inside their command
>>> stream, others can just keep around a pool of buffers into which they
>>> can memcpy, ...). As there is no common denominator, I don't think we
>>> should go through the pipe buffer allocation and provide a new pipe
>>> buffer each time.
>>>
>>> Optimizing this for r600g allows a ~7% increase in games when draw is a
>>> nop, ~5% when not submitting to the GPU, and ~3% when no part of the
>>> driver is commented out. r600g has other bottlenecks that tend to
>>> minimize the gain we can get from such an optimization. Patch at
>>> http://people.freedesktop.org/~glisse/gallium_const_path/
>>>
>>> For samplers I don't think we want to create persistent objects. We are
>>> spending precious time building, hashing, and searching for similar
>>> samplers each time in the gallium code; I think it would be best to
>>> treat this state as use once and forget. That said, we can provide a
>>> helper function for pipe drivers that want to cache samplers (but even
>>> for virtual hw I don't think this makes sense). I haven't yet
>>> implemented a fast path for samplers to see how much we can win from
>>> that, but I will report back once I do.
>>>
>>> So a more fundamental question here is whether we should move away from
>>> persistent state and consider all states (except shaders and textures)
>>> as being too volatile for caching any of them to make sense from a
>>> performance point of view. That would mean changing a lot of
>>> create/bind/delete interfaces to simple set interfaces for the pipe
>>> driver. This could be seen as a simplification. Anyway, I think we
>>> should really consider moving more toward set than create/bind/delete
>>> (I liked the create/bind/delete paradigm a lot, but it doesn't seem to
>>> be the one you want with GL, at least from the numbers I gathered with
>>> some games).
>> Why do you think it's faster to create and use a new state rather than
>> search in the hash cache and reuse this? I was under the impression
>> (this being a dx10 paradigm) even hw is quite optimized for this (that
>> is, you just keep all the state objects on the hw somewhere and switch
>> between them). Also, what functions did you really see? If things work
>> as expected, it should be mostly bind, not create/delete.
>> Now it is certainly possible a driver doesn't make good use of this
>> (i.e. it really does all the time consuming stuff on bind), but this is
>> outside the scope of the infrastructure.
>> It is possible the hashing is insufficient (it could, for instance, cause
>> too many collisions, hence the need to recreate state objects), but the basic
>> mechanism looks quite sound to me.
>>
>> Roland
>>
> 
> The create/bind & reuse paradigm is likely good for a DirectX-like API,
> where the API gives applications an incentive to create and efficiently
> manage the states they want to use. But GL, which I believe is the
> API we should focus on, is a completely different business. From what
> I am seeing from games, we repeatedly see changes to shader constants
> and we repeatedly see changes to samplers. We might be using too small
> a hash or missing opportunities for reuse; I can totally believe that.
> But nonetheless, from what I see it's counterproductive to try to hash
> all those states and hope for reuse, simply because the cost of creating
> state is too high and the reuse opportunity (even if we improve it)
> looks too small. Here you have to think about hundre

Re: [Mesa-dev] Path to optimize (moving from create/bind/delete paradgim to set only ?)

2010-11-16 Thread Jakob Bornecrantz
On Tue, Nov 16, 2010 at 7:21 PM, Jerome Glisse  wrote:
> Hi,
>
> So I looked a bit more at which paths we should try to optimize in the
> mesa/gallium/pipe infrastructure. Here are some numbers gathered from
> games:
> drawcall /   ps constant   vs constant   ps sampler   vs sampler
> doom3               1.45          1.39         9.24         9.86
> nexuiz              6.27          5.98         6.84         7.30
> openarena        2805.64          1.38         1.51         1.54
>
> (a value of 1 means the function is called on every draw call, while a
> value of 10 means the function is called on every tenth draw
> call, average)
>
> Note that the openarena ps constant number is understandable, as it
> uses the fixed-function GL pipeline, and the pixel shader constants
> don't need to change much in that case.
>
> So I think the clear trend is that there is a lot of constant uploading
> and sampler changing (almost at every draw call for some games)

Can you look into what actually changes between the sampler states?
Also that vs sampler state change number for OpenArena looks a bit
fishy to me.

Cheers Jakob.


Re: [Mesa-dev] Mesa (master): glsl: fix assorted MSVC warnings

2010-11-16 Thread José Fonseca
On Tue, 2010-11-16 at 11:55 -0800, Ian Romanick wrote:
> On 11/15/2010 05:48 PM, Brian Paul wrote:
> 
> >     case ir_unop_b2f:
> >        assert(op[0]->type->base_type == GLSL_TYPE_BOOL);
> >        for (unsigned c = 0; c < op[0]->type->components(); c++) {
> > -         data.f[c] = op[0]->value.b[c] ? 1.0 : 0.0;
> > +         data.f[c] = op[0]->value.b[c] ? 1.0F : 0.0F;
> 
> Please don't do this.  This particular MSVC warning should just be
> disabled.  If this warning were generated for non-literals and for
> literals that actually did lose precision being stored to a float, it
> might have a chance at having some value.  Instead, it's just noise.
> 
> Individual warnings can be disabled with a pragma, and this one should
> probably be disabled in mesa/compiler.h:
> 
> #pragma warning(disable: 4244)
> 
> There may be a way to do it from the command line, but I don't know what
> it is.

It's -wd4244.

> The F suffixes on constants are also worthless, and they make the code
> ugly. 

I had the impression it was more than a warning, namely that the
compilers would use double precision intermediates instead of single
precision floats when constants don't have the 'f' suffix.

Gcc does it. Take for example:

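/* double literals: x is converted to double and the math is done in double */
float foo(float x)
{
    return 1.0 / x + 5.0;
}

/* float literals: the whole expression stays in single precision */
float foof(float x)
{
    return 1.0f / x + 5.0f;
}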

If you compile it on x64 with

gcc -g0 -O3 -S -o - test.c

you'll get 
   
        .file   "foo.c"
        .text
        .p2align 4,,15
.globl foo
        .type   foo, @function
foo:
.LFB0:
        .cfi_startproc
        unpcklps        %xmm0, %xmm0
        cvtps2pd        %xmm0, %xmm1
        movsd   .LC0(%rip), %xmm0
        divsd   %xmm1, %xmm0
        addsd   .LC1(%rip), %xmm0
        unpcklpd        %xmm0, %xmm0
        cvtpd2ps        %xmm0, %xmm0
        ret
        .cfi_endproc
.LFE0:
        .size   foo, .-foo
        .p2align 4,,15
.globl foof
        .type   foof, @function
foof:
.LFB1:
        .cfi_startproc
        movaps  %xmm0, %xmm1
        movss   .LC2(%rip), %xmm0
        divss   %xmm1, %xmm0
        addss   .LC3(%rip), %xmm0
        ret
        .cfi_endproc
.LFE1:
        .size   foof, .-foof
        .section        .rodata.cst8,"aM",@progbits,8
        .align 8
.LC0:
        .long   0
        .long   1072693248
        .align 8
.LC1:
        .long   0
        .long   1075052544
        .section        .rodata.cst4,"aM",@progbits,4
        .align 4
.LC2:
        .long   1065353216
        .align 4
.LC3:
        .long   1084227584
        .ident  "GCC: (Debian 4.4.5-6) 4.4.5"
        .section        .note.GNU-stack,"",@progbits

And as you can see, one function uses double precision, and the other
uses single precision.

Code quality is much better in the latter.

> Expecting that they will be added everywhere when no other
> compiler generates this warning is a losing battle.

I really think this is a battle everybody should fight. Perhaps the 

   condition ? 1.0 : 0.0

is something that a compiler should eliminate, but "single precision
expressions should use 'f' suffix on constants" seems to be a good rule
of thumb to follow.
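
The same point can be checked without reading assembly, since the type
of each expression follows from C's usual arithmetic conversions. The
two helper names below are made up for this sketch; the expressions are
the ones from foo() and foof() above.

```c
#include <assert.h>
#include <stddef.h>

/* Size of the result type when the literals are doubles: the float
 * operand is converted, so the expression has type double.
 * sizeof does not evaluate its operand. */
static size_t size_with_double_literals(float x)
{
    return sizeof(1.0 / x + 5.0);
}

/* Size of the result type when the literals carry the 'f' suffix:
 * every operand is float, so the expression has type float. */
static size_t size_with_float_literals(float x)
{
    return sizeof(1.0f / x + 5.0f);
}
```

The first helper returns sizeof(double) and the second sizeof(float),
whatever value is passed in.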

Jose



Re: [Mesa-dev] Path to optimize (moving from create/bind/delete paradgim to set only ?)

2010-11-16 Thread Bridgman, John
>>I'm really not sure of that. I think one reason dx10 did this is because
>>applications mostly did that anyway - at least for subsequent frames
>>pretty much all the state they are using is going to be the same one as
>>used on the previous frame.

That is probably an important point -- there may not be much re-use within a 
single frame but there probably is a lot of re-use from one frame to the next. 



Re: [Mesa-dev] Mesa (master): glsl: fix assorted MSVC warnings

2010-11-16 Thread Brian Paul

On 11/16/2010 12:55 PM, Ian Romanick wrote:

> On 11/15/2010 05:48 PM, Brian Paul wrote:
>
>>     case ir_unop_b2f:
>>        assert(op[0]->type->base_type == GLSL_TYPE_BOOL);
>>        for (unsigned c = 0; c < op[0]->type->components(); c++) {
>> -         data.f[c] = op[0]->value.b[c] ? 1.0 : 0.0;
>> +         data.f[c] = op[0]->value.b[c] ? 1.0F : 0.0F;
>
> Please don't do this.  This particular MSVC warning should just be
> disabled.  If this warning were generated for non-literals and for
> literals that actually did lose precision being stored to a float, it
> might have a chance at having some value.  Instead, it's just noise.
>
> Individual warnings can be disabled with a pragma, and this one should
> probably be disabled in mesa/compiler.h:
>
> #pragma warning(disable: 4244)
>
> There may be a way to do it from the command line, but I don't know what
> it is.
>
> The F suffixes on constants are also worthless, and they make the code
> ugly.  Expecting that they will be added everywhere when no other
> compiler generates this warning is a losing battle.


I've been in the habit of using F suffixes for many, many years.  Back 
in my IRIX days it told the compiler to use a float and not a double 
for the computation (which was faster).  And according to Jose's email 
which I just spotted, that's still the case with gcc.


I recall another old unix compiler that I used back then
(maybe AIX) issued warnings similar to MSVC.  Old habits die hard.
But I don't think it's a bad habit.
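
For readers following the suffix discussion, here is a minimal sketch of what the F suffix changes at the language level. It only demonstrates the types involved; it makes no claim about which form is faster on any particular compiler, and the helper names are made up for illustration.

```cpp
#include <cassert>
#include <type_traits>

// An unsuffixed floating literal has type double; the F suffix makes it float.
static_assert(std::is_same<decltype(1.0), double>::value, "1.0 is a double");
static_assert(std::is_same<decltype(1.0F), float>::value, "1.0F is a float");

// float * double is evaluated in double (hypothetical helper name).
inline bool product_is_double(float x)
{
   return std::is_same<decltype(x * 0.5), double>::value;
}

// float * float stays in float (hypothetical helper name).
inline bool product_is_float(float x)
{
   return std::is_same<decltype(x * 0.5F), float>::value;
}

// Small self-check tying the two helpers together.
inline void check_literal_types()
{
   assert(product_is_double(2.0F));
   assert(product_is_float(2.0F));
}
```

Storing 1.0 or 0.0 into a float loses nothing either way, which is the case the quoted warning fires on.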






}
break;
 case ir_unop_f2b:
assert(op[0]->type->base_type == GLSL_TYPE_FLOAT);
for (unsigned c = 0; c<  op[0]->type->components(); c++) {
-data.b[c] = bool(op[0]->value.f[c]);
+data.b[c] = op[0]->value.f[c] != 0.0F ? true : false;


This warning should also be disabled for the same reason as the above.
This one isn't even a correctness warning, it's a performance warning.
The code that is replacing the cast may have even worse performance than
the cast!  The other changes can stay, but this one needs to be reverted.


Perhaps data.b[c] = bool((int) op[0]->value.f[c]);  would do the trick.
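
A small sketch comparing the three spellings discussed here, with made-up function names. One thing worth checking before adopting the int-based variant: truncating to int first is not equivalent to the original cast for values strictly between -1 and 1.

```cpp
#include <cassert>

// Original code: direct conversion, true for any non-zero value.
inline bool f2b_cast(float f) { return bool(f); }

// Patch under review: explicit comparison against zero.
inline bool f2b_compare(float f) { return f != 0.0F; }

// Suggested alternative: truncate to int first, then convert.
inline bool f2b_via_int(float f) { return bool((int) f); }

// 0.5 is non-zero, but (int) 0.5 is 0, so the third form disagrees.
inline void check_f2b_examples()
{
   assert(f2b_cast(0.5F) == true);
   assert(f2b_compare(0.5F) == true);
   assert(f2b_via_int(0.5F) == false);
}
```

The first two forms agree for every finite input; only the truncating form changes the result.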



}
break;
 case ir_unop_b2i:
@@ -163,7 +163,7 @@ ir_expression::constant_expression_value()
 case ir_unop_i2b:
assert(op[0]->type->is_integer());
for (unsigned c = 0; c<  op[0]->type->components(); c++) {
-data.b[c] = bool(op[0]->value.u[c]);
+data.b[c] = op[0]->value.u[c] ? true : false;


What warning is this?  I was unable to reproduce a warning on Visual
Studio 2008 Express Edition.  I suspect this should be reverted too.


The warning was:

src\glsl\ir_constant_expression.cpp(166) : warning C4800: 'unsigned 
int' : forcing value to bool 'true' or 'false' (performance warning)


-Brian


Re: [Mesa-dev] Path to optimize (moving from create/bind/delete paradigm to set only ?)

2010-11-16 Thread Jerome Glisse
On Tue, Nov 16, 2010 at 3:27 PM, Roland Scheidegger  wrote:
> On 16.11.2010 20:59, Jerome Glisse wrote:
>> On Tue, Nov 16, 2010 at 2:38 PM, Roland Scheidegger  
>> wrote:
>>> On 16.11.2010 20:21, Jerome Glisse wrote:
 Hi,

 So i looked a bit more at what path we should try to optimize in the
 mesa/gallium/pipe infrastructure. Here are some number gathers from
 games :
 drawcall /   ps constant   vs constant   ps sampler   vs sampler
 doom3               1.45          1.39         9.24         9.86
 nexuiz              6.27          5.98         6.84         7.30
 openarena        2805.64          1.38         1.51         1.54

 (value of 1 mean there is a call of this function for every draw call,
 while value of 10 means there is a call to this function every 10 draw
 call, average)

 Note that openarena ps constant number is understable as it's fixed GL
 pipeline which is in use here and the pixel shader constant doesn't
 need much change in those case.

 So i think clear trend is that there is a lot of constant upload and
 sampler changing (allmost at each draw call for some games) Thus i
 think we want to make sure that we have real fast path for uploading
 constant or changing sampler. I think those path should be change and
 should avoid using some of the gallium infrastructure. For shader
 constant i think best solution is to provide the ptr to program
 constant buffer directly to the pipe driver and let the driver choose
 how it wants to upload constant to the GPU (GPU have different
 capabilities, some can stream constant buffer inside their command
 stream, other can just keep around a pool of buffer into which they
 can memcpy, ...) As there is no common denominator i don't think we
 should go through the pipe buffer allocation and providing a new pipe
 buffer each time.

 Optimizing this for r600g allow ~7% increase in games (when draw is
 nop) ~5% (when not submitting to gpu) ~3% when no part of the driver
 is commented. r600g have others bottleneck that tends to minimize the
 gain we can get from such optimization. Patch at
 http://people.freedesktop.org/~glisse/gallium_const_path/

 For sampler i don't think we want to create persistant object, we are
 spending precious time building, hashing, searching for similar
 sampler each time in the gallium code, i think best would be to think
 state as use once and forget. That said we can provide helper function
 to pipe driver that wants to be cache sampler (but even for virtual hw
 i don't think this makes sense). I haven't yet implemented a fast path
 for sampler to see how much we can win from that but i will report
 back once i do.

 So a more fundamental question here is should we move away from
 persistant state and consider all states (except shader and texture)
 as being too much volatile so that caching any of them doesn't make
 sense from performance point of view. That would mean change lot of
 create/bind/delete interface to simply set interface for the pipe
 driver. This could be seen as a simplification. Anyway i think we
 should really consider moving more toward set than create/bind/delete
 (i loved a lot the create/bind/delete paradigm but it doesn't seems to
 be the one you want with GL, at least from number i gather with some
 games).
>>> Why do you think it's faster to create and use a new state rather than
>>> search in the hash cache and reuse this? I was under the impression
>>> (this being a dx10 paradigm) even hw is quite optimized for this (that
>>> is, you just keep all the state objects on the hw somewhere and switch
>>> between them). Also, what functions did you really see? If things work
>>> as expected, it should be mostly bind, not create/delete.
>>> Now it is certainly possible a driver doesn't make good use of this
>>> (i.e. it really does all the time consuming stuff on bind), but this is
>>> outside the scope of the infrastructure.
>>> It is possible hashing is insufficient (could for instance cause too
>>> many collisions hence need to recreate state object) but the principle
>>> mechanism looks quite sound to me.
>>>
>>> Roland
>>>
>>
>> The create/bin & reuse paradgim is likely good for a directx like api
>> where api put incentive on application to create  and manage
>> efficiently the states it wants to use. But GL, which is i believe the
>> API we should focus on, is a completely different business. From what
>> i am seeing from games, we repeatly see change to shader constant and
>> we repeatly see change to sampler. We might be using a tool small hash
>> or missing opportunity of reuse, i can totaly believe in that. But
>> nonetheless from what i see it's counter productive to try to hash all
>> those states and hope for reuse simply be

Re: [Mesa-dev] Path to optimize (moving from create/bind/delete paradigm to set only ?)

2010-11-16 Thread Jerome Glisse
On Tue, Nov 16, 2010 at 3:51 PM, Jakob Bornecrantz  wrote:
> On Tue, Nov 16, 2010 at 7:21 PM, Jerome Glisse  wrote:
>> Hi,
>>
>> So i looked a bit more at what path we should try to optimize in the
>> mesa/gallium/pipe infrastructure. Here are some number gathers from
>> games :
>> drawcall /     ps constant   vs constant     ps sampler    vs sampler
>> doom3            1.45             1.39               9.24              9.86
>> nexuiz             6.27             5.98               6.84              7.30
>> openarena  2805.64             1.38               1.51              1.54
>>
>> (value of 1 mean there is a call of this function for every draw call,
>> while value of 10 means there is a call to this function every 10 draw
>> call, average)
>>
>> Note that openarena ps constant number is understable as it's fixed GL
>> pipeline which is in use here and the pixel shader constant doesn't
>> need much change in those case.
>>
>> So i think clear trend is that there is a lot of constant upload and
>> sampler changing (allmost at each draw call for some games)
>
> Can you look into what actually changes between the sampler states?
> Also that vs sampler state change number for OpenArena looks a bit
> fishy to me.
>
> Cheers Jakob.
>

I haven't looked at what changes yet; I assume something small. I think
a bugle trace of the engine is maybe easier to use than looking at the
quake3 source code. For the vs sampler I was surprised too, but it's
just the fact that q3 changes the vertex buffer a lot and this triggers
the vs sampler.

Cheers,
Jerome


Re: [Mesa-dev] Path to optimize (moving from create/bind/delete paradigm to set only ?)

2010-11-16 Thread Roland Scheidegger
On 16.11.2010 22:15, Jerome Glisse wrote:
> On Tue, Nov 16, 2010 at 3:27 PM, Roland Scheidegger  
> wrote:
>> On 16.11.2010 20:59, Jerome Glisse wrote:
>>> On Tue, Nov 16, 2010 at 2:38 PM, Roland Scheidegger  
>>> wrote:
 On 16.11.2010 20:21, Jerome Glisse wrote:
> Hi,
>
> So i looked a bit more at what path we should try to optimize in the
> mesa/gallium/pipe infrastructure. Here are some number gathers from
> games :
> drawcall /   ps constant   vs constant   ps sampler   vs sampler
> doom3               1.45          1.39         9.24         9.86
> nexuiz              6.27          5.98         6.84         7.30
> openarena        2805.64          1.38         1.51         1.54
>
> (value of 1 mean there is a call of this function for every draw call,
> while value of 10 means there is a call to this function every 10 draw
> call, average)
>
> Note that openarena ps constant number is understable as it's fixed GL
> pipeline which is in use here and the pixel shader constant doesn't
> need much change in those case.
>
> So i think clear trend is that there is a lot of constant upload and
> sampler changing (allmost at each draw call for some games) Thus i
> think we want to make sure that we have real fast path for uploading
> constant or changing sampler. I think those path should be change and
> should avoid using some of the gallium infrastructure. For shader
> constant i think best solution is to provide the ptr to program
> constant buffer directly to the pipe driver and let the driver choose
> how it wants to upload constant to the GPU (GPU have different
> capabilities, some can stream constant buffer inside their command
> stream, other can just keep around a pool of buffer into which they
> can memcpy, ...) As there is no common denominator i don't think we
> should go through the pipe buffer allocation and providing a new pipe
> buffer each time.
>
> Optimizing this for r600g allow ~7% increase in games (when draw is
> nop) ~5% (when not submitting to gpu) ~3% when no part of the driver
> is commented. r600g have others bottleneck that tends to minimize the
> gain we can get from such optimization. Patch at
> http://people.freedesktop.org/~glisse/gallium_const_path/
>
> For sampler i don't think we want to create persistant object, we are
> spending precious time building, hashing, searching for similar
> sampler each time in the gallium code, i think best would be to think
> state as use once and forget. That said we can provide helper function
> to pipe driver that wants to be cache sampler (but even for virtual hw
> i don't think this makes sense). I haven't yet implemented a fast path
> for sampler to see how much we can win from that but i will report
> back once i do.
>
> So a more fundamental question here is should we move away from
> persistant state and consider all states (except shader and texture)
> as being too much volatile so that caching any of them doesn't make
> sense from performance point of view. That would mean change lot of
> create/bind/delete interface to simply set interface for the pipe
> driver. This could be seen as a simplification. Anyway i think we
> should really consider moving more toward set than create/bind/delete
> (i loved a lot the create/bind/delete paradigm but it doesn't seems to
> be the one you want with GL, at least from number i gather with some
> games).
 Why do you think it's faster to create and use a new state rather than
 search in the hash cache and reuse this? I was under the impression
 (this being a dx10 paradigm) even hw is quite optimized for this (that
 is, you just keep all the state objects on the hw somewhere and switch
 between them). Also, what functions did you really see? If things work
 as expected, it should be mostly bind, not create/delete.
 Now it is certainly possible a driver doesn't make good use of this
 (i.e. it really does all the time consuming stuff on bind), but this is
 outside the scope of the infrastructure.
 It is possible hashing is insufficient (could for instance cause too
 many collisions hence need to recreate state object) but the principle
 mechanism looks quite sound to me.

 Roland

>>> The create/bin & reuse paradgim is likely good for a directx like api
>>> where api put incentive on application to create  and manage
>>> efficiently the states it wants to use. But GL, which is i believe the
>>> API we should focus on, is a completely different business. From what
>>> i am seeing from games, we repeatly see change to shader constant and
>>> we repeatly see change to sampler. We might be using a tool small hash
>>> or missing opportunity of reuse, i can totaly beli

Re: [Mesa-dev] Path to optimize (moving from create/bind/delete paradigm to set only ?)

2010-11-16 Thread Jakob Bornecrantz
On Tue, Nov 16, 2010 at 9:17 PM, Jerome Glisse  wrote:
> On Tue, Nov 16, 2010 at 3:51 PM, Jakob Bornecrantz  
> wrote:
>> On Tue, Nov 16, 2010 at 7:21 PM, Jerome Glisse  wrote:
>>> Hi,
>>>
>>> So i looked a bit more at what path we should try to optimize in the
>>> mesa/gallium/pipe infrastructure. Here are some number gathers from
>>> games :
>>> drawcall /   ps constant   vs constant   ps sampler   vs sampler
>>> doom3               1.45          1.39         9.24         9.86
>>> nexuiz              6.27          5.98         6.84         7.30
>>> openarena        2805.64          1.38         1.51         1.54
>>>
>>> (value of 1 mean there is a call of this function for every draw call,
>>> while value of 10 means there is a call to this function every 10 draw
>>> call, average)
>>>
>>> Note that openarena ps constant number is understable as it's fixed GL
>>> pipeline which is in use here and the pixel shader constant doesn't
>>> need much change in those case.
>>>
>>> So i think clear trend is that there is a lot of constant upload and
>>> sampler changing (allmost at each draw call for some games)
>>
>> Can you look into what actually changes between the sampler states?
>> Also that vs sampler state change number for OpenArena looks a bit
>> fishy to me.
>>
>> Cheers Jakob.
>>
>
> I haven't looked at what change yet, i assume something small, i think
> bugle trace of the engine is maybe easier to use than looking at
> quake3 source code. For the vs sampler i was surprised too but it's
> just the fact that q3 changes the vertex buffer a lot and this trigger
> the vs sampler.

I was thinking more along the lines of diffing the pipe_sampler_state
objects to see what changed. What I suspect is that it's only the
max_lod field that keeps changing. Games should usually stay within the
same number of textures and the same texture modes for most draw
calls.

When you say vs_sampler, do you mean bind_vertex_sampler_states or
bind_vertex_elements_state?
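
The kind of diffing suggested above could look roughly like the sketch below. SamplerStateSketch is a hypothetical, cut-down stand-in for pipe_sampler_state (the real struct has more fields and a different layout), and changed_fields is an invented helper, not something that exists in Gallium.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for pipe_sampler_state.
struct SamplerStateSketch {
   unsigned wrap_s;
   unsigned min_img_filter;
   unsigned mag_img_filter;
   float lod_bias;
   float min_lod;
   float max_lod;
};

// Return the names of the fields that differ between two sampler states.
inline std::vector<std::string>
changed_fields(const SamplerStateSketch &a, const SamplerStateSketch &b)
{
   std::vector<std::string> out;
   if (a.wrap_s != b.wrap_s) out.push_back("wrap_s");
   if (a.min_img_filter != b.min_img_filter) out.push_back("min_img_filter");
   if (a.mag_img_filter != b.mag_img_filter) out.push_back("mag_img_filter");
   if (a.lod_bias != b.lod_bias) out.push_back("lod_bias");
   if (a.min_lod != b.min_lod) out.push_back("min_lod");
   if (a.max_lod != b.max_lod) out.push_back("max_lod");
   return out;
}

// Example: two states that differ only in max_lod.
inline void check_only_max_lod_differs()
{
   SamplerStateSketch a = {0, 1, 1, 0.0F, 0.0F, 1000.0F};
   SamplerStateSketch b = a;
   b.max_lod = 4.0F;
   assert(changed_fields(a, b).size() == 1);
}
```

Logging the result of such a comparison at each bind would show whether a single field accounts for most of the cache misses.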

Cheers Jakob.


Re: [Mesa-dev] Path to optimize (moving from create/bind/delete paradigm to set only ?)

2010-11-16 Thread Corbin Simpson
> On Tue, Nov 16, 2010 at 9:17 PM, Jerome Glisse  wrote:
>> On Tue, Nov 16, 2010 at 3:51 PM, Jakob Bornecrantz  
>> wrote:
>>> On Tue, Nov 16, 2010 at 7:21 PM, Jerome Glisse  wrote:
 Hi,

 So i looked a bit more at what path we should try to optimize in the
 mesa/gallium/pipe infrastructure. Here are some number gathers from
 games :
 drawcall /   ps constant   vs constant   ps sampler   vs sampler
 doom3               1.45          1.39         9.24         9.86
 nexuiz              6.27          5.98         6.84         7.30
 openarena        2805.64          1.38         1.51         1.54

 (value of 1 mean there is a call of this function for every draw call,
 while value of 10 means there is a call to this function every 10 draw
 call, average)

 Note that openarena ps constant number is understable as it's fixed GL
 pipeline which is in use here and the pixel shader constant doesn't
 need much change in those case.

 So i think clear trend is that there is a lot of constant upload and
 sampler changing (allmost at each draw call for some games)
>>>
>>> Can you look into what actually changes between the sampler states?
>>> Also that vs sampler state change number for OpenArena looks a bit
>>> fishy to me.
>>>
>>> Cheers Jakob.
>>>
>>
>> I haven't looked at what change yet, i assume something small, i think
>> bugle trace of the engine is maybe easier to use than looking at
>> quake3 source code. For the vs sampler i was surprised too but it's
>> just the fact that q3 changes the vertex buffer a lot and this trigger
>> the vs sampler.

Could we get some problematic Bugle traces posted that we could all
examine, rather than guessing at this? It'd be very nice to know
whether or not the problems are in the GL state tracker layer before
we move on to optimizing Gallium's interface, mostly because Dx
appears to not suffer these same problems.

-- 
When the facts change, I change my mind. What do you do, sir? ~ Keynes

Corbin Simpson



[Mesa-dev] [Bug 31673] New: GL_FRAGMENT_PRECISION_HIGH preprocessor macro undefined in GLSL ES

2010-11-16 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=31673

   Summary: GL_FRAGMENT_PRECISION_HIGH preprocessor macro
undefined in GLSL ES
   Product: Mesa
   Version: git
  Platform: All
OS/Version: All
Status: NEW
  Severity: minor
  Priority: medium
 Component: Mesa core
AssignedTo: mesa-dev@lists.freedesktop.org
ReportedBy: kenn...@whitecape.org


According to the GLSL ES specification, section 4.5,

"The built-in macro GL_FRAGMENT_PRECISION_HIGH is defined to one on systems
supporting highp precision in the fragment language

   #define GL_FRAGMENT_PRECISION_HIGH 1

and is not defined on systems not supporting highp precision in the fragment
language.  When defined, this macro is available in both the vertex and
fragment languages.  The highp qualifier is an optional feature in the fragment
language and is not enabled by #extension."

glcpp currently does not define this macro for GLSL ES.  As far as I know, all
Mesa drivers currently support highp, so perhaps we should just define it
unconditionally.  However, I imagine this may not always be the case...
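
A minimal sketch of the rule quoted from the spec, written as a hypothetical helper that builds the list of predefined macros. This is not glcpp's actual code or API; the function name and the two boolean parameters are assumptions made for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper: predefined macros for a shader being preprocessed.
// is_es: the shader is GLSL ES; fragment_highp: the driver supports highp
// in the fragment language.
inline std::map<std::string, int>
predefined_macros(bool is_es, bool fragment_highp)
{
   std::map<std::string, int> macros;
   if (is_es) {
      macros["GL_ES"] = 1;
      // Defined to one only when highp is supported, undefined otherwise.
      if (fragment_highp)
         macros["GL_FRAGMENT_PRECISION_HIGH"] = 1;
   }
   return macros;
}

// Example: an ES driver without fragment highp must leave the macro undefined.
inline void check_highp_macro_rule()
{
   assert(predefined_macros(true, false).count("GL_FRAGMENT_PRECISION_HIGH") == 0);
   assert(predefined_macros(true, true).at("GL_FRAGMENT_PRECISION_HIGH") == 1);
}
```

Defining it unconditionally corresponds to always passing true for the second argument; keeping the parameter leaves room for a driver that lacks highp.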



Re: [Mesa-dev] Path to optimize (moving from create/bind/delete paradigm to set only ?)

2010-11-16 Thread Jerome Glisse
On Tue, Nov 16, 2010 at 6:06 PM, Corbin Simpson
 wrote:
>> On Tue, Nov 16, 2010 at 9:17 PM, Jerome Glisse  wrote:
>>> On Tue, Nov 16, 2010 at 3:51 PM, Jakob Bornecrantz  
>>> wrote:
 On Tue, Nov 16, 2010 at 7:21 PM, Jerome Glisse  wrote:
> Hi,
>
> So i looked a bit more at what path we should try to optimize in the
> mesa/gallium/pipe infrastructure. Here are some number gathers from
> games :
> drawcall /   ps constant   vs constant   ps sampler   vs sampler
> doom3               1.45          1.39         9.24         9.86
> nexuiz              6.27          5.98         6.84         7.30
> openarena        2805.64          1.38         1.51         1.54
>
> (value of 1 mean there is a call of this function for every draw call,
> while value of 10 means there is a call to this function every 10 draw
> call, average)
>
> Note that openarena ps constant number is understable as it's fixed GL
> pipeline which is in use here and the pixel shader constant doesn't
> need much change in those case.
>
> So i think clear trend is that there is a lot of constant upload and
> sampler changing (allmost at each draw call for some games)

 Can you look into what actually changes between the sampler states?
 Also that vs sampler state change number for OpenArena looks a bit
 fishy to me.

 Cheers Jakob.

>>>
>>> I haven't looked at what change yet, i assume something small, i think
>>> bugle trace of the engine is maybe easier to use than looking at
>>> quake3 source code. For the vs sampler i was surprised too but it's
>>> just the fact that q3 changes the vertex buffer a lot and this trigger
>>> the vs sampler.
>
> Could we get some problematic Bugle traces posted that we could all
> examine, rather than guessing at this? It'd be very nice to know
> whether or not the problems are in the GL state tracker layer before
> we move on to optimizing Gallium's interface, mostly because Dx
> appears to not suffer these same problems.
>

I haven't looked closely at the sampler issue, but the shader constant
problem is obvious on r600g: it's the pipe buffer allocation at each
constant update that kills us. Even with pb* somehow fixed, there is too
much overhead in the pb layer. It's only a few % of the whole cpu time, but
again things pile up, and however small a cut you make in cpu usage, it
directly shows up in the framerate. That's why my feeling is that we
should keep the cpu overhead for state changes as low as possible, and I
fear the fastest way is to drop the create/bind paradigm.

I pretty much use the dri benchmark wiki page for running games in
timedemo. Lately I have mostly used nexuiz because it's easy to install and
its rendering is somewhat more complex than quake3's, thus a little
closer to what I would like to target for the r600g driver.

Anyway, my point is that the gl state tracker is not to blame here;
it's only the fact that real applications lead to a lot of cso
activity, and I am not convinced that what we might possibly win with
cso is more important than what we lose when considering an API such as
GL.
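
To make the two constant upload paths concrete, here is a rough sketch under stated assumptions. Both structs are hypothetical models, not r600g or pb code: AllocPerUpdate stands in for allocating a new pipe buffer on every update, and RingUpload stands in for a driver that copies into a pool it allocated once.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Path 1 (hypothetical model): a fresh buffer is allocated per update.
struct AllocPerUpdate {
   std::size_t allocations = 0;
   std::vector<std::vector<float>> live;   // kept alive until the GPU is done
   const float *upload(const float *consts, std::size_t count) {
      live.emplace_back(consts, consts + count);
      ++allocations;
      return live.back().data();
   }
};

// Path 2 (hypothetical model): one ring allocated up front, memcpy per update.
struct RingUpload {
   std::size_t allocations = 1;            // the ring itself
   std::vector<float> ring;
   std::size_t head = 0;
   explicit RingUpload(std::size_t capacity) : ring(capacity) {}
   const float *upload(const float *consts, std::size_t count) {
      assert(count <= ring.size());
      if (head + count > ring.size())
         head = 0;   // wrap around; a real driver would have to sync with the GPU
      float *dst = ring.data() + head;
      std::memcpy(dst, consts, count * sizeof(float));
      head += count;
      return dst;
   }
};

// Run a number of constant updates and report how many allocations happened.
template <typename Uploader>
std::size_t allocations_after(Uploader &u, std::size_t updates)
{
   const float consts[4] = {1.0F, 2.0F, 3.0F, 4.0F};
   for (std::size_t i = 0; i < updates; ++i)
      u.upload(consts, 4);
   return u.allocations;
}
```

With these models, 100 updates cost 100 allocations on the first path and a single one on the second. The sketch only counts allocations; it says nothing about the actual overhead measured in the pb layer.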

Cheers,
Jerome Glisse


Re: [Mesa-dev] Path to optimize (moving from create/bind/delete paradigm to set only ?)

2010-11-16 Thread Zack Rusin
On Tuesday 16 November 2010 20:26:03 Jerome Glisse wrote: 
> Anyway my point is that here the gl state tracker is not to blame,
> it's only the fact that real application lead to a lot of cso
> activities and i am not convinced that what we might possibly win with
> cso is more important than what we loose when considering API such as
> GL.

And I disagree so we're at a stalemate and we'll never reach a conclusion.
What I'm saying is that this isn't how we can ever reach a technical decision.
There needs to be compelling evidence for doing something that is obviously
unintuitive.

And this is unintuitive because there's a limited number of blend, depth,
alpha, stencil or rasterizer states any application needs and quite frankly
it's very small so caching it makes a hell of a lot of sense. I think it's more
likely that we stuffed some value into one of the csos that should have a
separate set method or that there's a bug somewhere.

Anyway what I think is of no consequence, what matters is what you can prove. 
It'd be trivial to see:
1) what exactly changes that caching fails,
2) would a better hashing function and a better hash fix it,
3) whether it's a special case and requires special handling or whether it's 
globally the concept of csos,
4) whether the state tracker can be improved to handle it,
5) how much better things are when we don't cache (trivial to try by just 
changing the cso_set functions to just set stuff instead of using the 
create/bind semantics)
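
As a starting point for items 1 and 5, a toy cache with hit and miss counters is sketched below. It is an assumption-laden model, not the cso_cache code: BlendStateSketch and CsoCacheSketch are invented names, and the key is simply the raw bytes of a padding-free struct.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical state template; three unsigned fields, so no padding bytes.
struct BlendStateSketch {
   unsigned enable;
   unsigned src_factor;
   unsigned dst_factor;
};

// Toy create/bind cache: identical templates map to the same handle.
class CsoCacheSketch {
public:
   std::size_t hits = 0;
   std::size_t misses = 0;

   int get(const BlendStateSketch &s) {
      const std::string key(reinterpret_cast<const char *>(&s), sizeof s);
      std::map<std::string, int>::iterator it = objects_.find(key);
      if (it != objects_.end()) {
         ++hits;               // reuse: bind only
         return it->second;
      }
      ++misses;                // first use: the driver object would be created
      const int handle = next_handle_++;
      objects_[key] = handle;
      return handle;
   }

private:
   std::map<std::string, int> objects_;
   int next_handle_ = 0;
};

// Example: a, b, a gives two misses and one hit.
inline void check_cache_counts()
{
   CsoCacheSketch cache;
   const BlendStateSketch a = {1, 2, 3};
   const BlendStateSketch b = {0, 0, 0};
   const int first = cache.get(a);
   cache.get(b);
   assert(cache.get(a) == first);
   assert(cache.hits == 1 && cache.misses == 2);
}
```

Counters like these, added around the real lookups, would show per state type whether caching fails and how often.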

If you can prove your hypothesis, awesome! great find, let's change it.
Otherwise I think the bikeshed should be blue because I'm a boy and I like 
blue.

z


Re: [Mesa-dev] Path to optimize (moving from create/bind/delete paradigm to set only ?)

2010-11-16 Thread Jerome Glisse
On Tue, Nov 16, 2010 at 9:15 PM, Zack Rusin  wrote:
> On Tuesday 16 November 2010 20:26:03 Jerome Glisse wrote:
>> Anyway my point is that here the gl state tracker is not to blame,
>> it's only the fact that real application lead to a lot of cso
>> activities and i am not convinced that what we might possibly win with
>> cso is more important than what we loose when considering API such as
>> GL.
>
> And I disagree so we're at a stalemate and we'll never reach a conclusion.
> What I'm saying is that this isn't how we can ever reach a technical decision.
> There needs to be a compelling evidence for doing something that is obviously
> unintuitive.
>
> And this is unintuitive because there's a limited number of blend, depth,
> alpha, stencil or rasterizer states any application needs and quite frankly
> it's very small so caching it makes a hell lot of sense. I think it's more
> likely that we stuffed some value into one of  the cso's that should have a
> separate set method or that there's a bug somewhere.
>
> Anyway what I think is of no consequence, what matters is what you can prove.
> It'd be trivial to see:
> 1) what exactly changes that caching fails,
> 2) would a better hashing function and a better hash fix it,
> 3) whether it's a special case and requires special handling or whether it's
> globally the concept of csos,
> 4) whether the state tracker can be improved to handle it,
> 5) how much better things are when we don't cache (trivial to try by just
> changing the cso_set functions to just set stuff instead of using the
> create/bind semantics)
>
> If you can prove your hypothesis, awesome! great find, lets change it.
> Otherwise I think the bikeshed should be blue because I'm a boy and I like
> blue.
>
> z
>

Agreed, I am just trying to get someone to look into it before I do ;)
I am focusing more on fixing the shortcomings of the r600 pipe driver
first. But I will get back to this cso thing, and anyone is more than
welcome to take a look at it (openarena and nexuiz show a lot of
cso activity with the r600g or noop driver). I never meant to say jump
on this new wagon because it looks more promising; I am just trying to
stress that no one should take the promise of cso caching for
granted because, as far as I can tell, it's not delivering on any of it
as of today.

Also, the noop driver is only marginally faster than fglrx, and you will see
that cso accounts for around 5%-10% of cpu time, of the 25% spent in
mesa as a whole. noop is also special in that the copy/swap buffer path of
the current ddx is still called, so it also slows things down (though I use
a small resolution to minimize this).

Note that the shader constant upload part of my mail is separate from
cso. For that part I am convinced, and I did give numbers showing
that it's inappropriate to use the pipe buffer allocation path, and
that we should rather provide the program constant buffer ptr directly
to the pipe driver and let the pipe driver pick the best solution for
its hw.

Cheers,
Jerome


Re: [Mesa-dev] Path to optimize (moving from create/bind/delete paradigm to set only ?)

2010-11-16 Thread Jerome Glisse
On Tue, Nov 16, 2010 at 9:43 PM, Jerome Glisse  wrote:
> On Tue, Nov 16, 2010 at 9:15 PM, Zack Rusin  wrote:
>> On Tuesday 16 November 2010 20:26:03 Jerome Glisse wrote:
>>> Anyway my point is that here the gl state tracker is not to blame,
>>> it's only the fact that real application lead to a lot of cso
>>> activities and i am not convinced that what we might possibly win with
>>> cso is more important than what we loose when considering API such as
>>> GL.
>>
>> And I disagree so we're at a stalemate and we'll never reach a conclusion.
>> What I'm saying is that this isn't how we can ever reach a technical 
>> decision.
>> There needs to be a compelling evidence for doing something that is obviously
>> unintuitive.
>>
>> And this is unintuitive because there's a limited number of blend, depth,
>> alpha, stencil or rasterizer states any application needs and quite frankly
>> it's very small so caching it makes a hell lot of sense. I think it's more
>> likely that we stuffed some value into one of  the cso's that should have a
>> separate set method or that there's a bug somewhere.
>>
>> Anyway what I think is of no consequence, what matters is what you can prove.
>> It'd be trivial to see:
>> 1) what exactly changes that caching fails,
>> 2) would a better hashing function and a better hash fix it,
>> 3) whether it's a special case and requires special handling or whether it's
>> globally the concept of csos,
>> 4) whether the state tracker can be improved to handle it,
>> 5) how much better things are when we don't cache (trivial to try by just
>> changing the cso_set functions to just set stuff instead of using the
>> create/bind semantics)
>>
>> If you can prove your hypothesis, awesome! great find, lets change it.
>> Otherwise I think the bikeshed should be blue because I'm a boy and I like
>> blue.
>>
>> z
>>
>
> Agree, i am just trying to get someone to look into it before i do ;)
> I am more focusing on fixing the short coming of the r600 pipe driver
> first. But i will get back to this cso things, and anyone is more than
> welcome to take a look at it (openarena or nexuiz are showing lot of
> cso activities with r600g or noop driver). I never meant to say jump
> on this new wagon because it looks more promising, i am just trying to
> stress out that no one should take the promise of cso caching for
> granted because as far as i can tell it's not holding any of it as of
> today.
>
> Also noop driver is only marginaly faster than fglrx and you will see
> that cso account for around 5%-10% of cpu time of 25% for the whole
> mesa activities, also noop is special as the copy/swap buffer of the
> current ddx is call, so it also slow done thing (thought i use small
> resolution to minimize this).
>
> Note that the shader constant upload part of my mail is disjoint from
> cso and for that part i am convinced and i did give number showing
> that it's unappropriate to use the pipe buffer allocation path but
> that we should rather directly provide the program constant buffer ptr
> to pipe driver and let the pipe driver pickup the best solution for
> its hw.
>
> Cheers,
> Jerome
>

Before I forget, the fact that cso shows up that high on cpu is likely
the outcome of csos not living long enough, e.g. being deleted right
after being used, so we end up with nothing in the cso cache and we
keep rebuilding over and over. Then comes the problem of how to
determine the best lifetime for a cso. For DX it's easy, but
for GL the best we can do is a wild guess: some app might use some GL
state once every minute, and that GL state might consume memory for
no good reason between those two uses ... Anyway, I just wanted to point out
the obvious from my results.
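
The lifetime effect described above can be illustrated with a toy least-recently-used cache. This is a sketch under assumptions, not how the cso cache actually evicts: LruCsoSketch and count_rebuilds are invented names, and states are reduced to integer keys.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <unordered_map>

// Toy LRU cache over integer state keys (hypothetical model).
class LruCsoSketch {
public:
   explicit LruCsoSketch(std::size_t capacity) : capacity_(capacity) {
      assert(capacity > 0);
   }

   // Returns true on a hit, false when the state has to be rebuilt.
   bool use(int state_key) {
      auto it = pos_.find(state_key);
      if (it != pos_.end()) {
         // Move the entry to the front: most recently used.
         order_.splice(order_.begin(), order_, it->second);
         return true;
      }
      if (order_.size() == capacity_) {
         // Evict the least recently used entry.
         pos_.erase(order_.back());
         order_.pop_back();
      }
      order_.push_front(state_key);
      pos_[state_key] = order_.begin();
      return false;
   }

private:
   std::size_t capacity_;
   std::list<int> order_;
   std::unordered_map<int, std::list<int>::iterator> pos_;
};

// Count how many uses in a sequence miss the cache and force a rebuild.
inline std::size_t
count_rebuilds(LruCsoSketch &cache, const int *keys, std::size_t n)
{
   std::size_t rebuilds = 0;
   for (std::size_t i = 0; i < n; ++i)
      if (!cache.use(keys[i]))
         ++rebuilds;
   return rebuilds;
}
```

With two states alternating over six uses, a cache that holds one entry rebuilds six times and a cache that holds two rebuilds twice. The open question in the mail, how long a GL state should be kept, maps to choosing the capacity or an age limit.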

Cheers,
Jerome


[Mesa-dev] [patch] pass texFormat to _mesa_init_teximage_fields()

2010-11-16 Thread Brian Paul
This patch passes the texture image format to the 
_mesa_init_teximage_fields() function to make sure the texture image's 
format is always set (see fd.o bug 31544).


I'd appreciate it if someone could apply this patch and test on r200, 
r300 or r600.  I'll commit it later then.


-Brian
diff --git a/src/mesa/drivers/common/meta.c b/src/mesa/drivers/common/meta.c
index 4533d82..ba8be12 100644
--- a/src/mesa/drivers/common/meta.c
+++ b/src/mesa/drivers/common/meta.c
@@ -2593,7 +2593,6 @@ copy_tex_image(struct gl_context *ctx, GLuint dims, GLenum target, GLint level,
 {
struct gl_texture_object *texObj;
struct gl_texture_image *texImage;
-   GLsizei postConvWidth = width, postConvHeight = height;
GLenum format, type;
GLint bpp;
void *buf;
@@ -2601,6 +2600,7 @@ copy_tex_image(struct gl_context *ctx, GLuint dims, GLenum target, GLint level,
texObj = _mesa_get_current_tex_object(ctx, target);
texImage = _mesa_get_tex_image(ctx, texObj, target, level);
 
+   /* Choose format/type for temporary image buffer */
format = _mesa_base_tex_format(ctx, internalFormat);
type = get_temp_image_type(ctx, format);
bpp = _mesa_bytes_per_pixel(format, type);
@@ -2632,12 +2632,8 @@ copy_tex_image(struct gl_context *ctx, GLuint dims, GLenum target, GLint level,
   ctx->Driver.FreeTexImageData(ctx, texImage);
}
 
-   _mesa_init_teximage_fields(ctx, target, texImage,
-  postConvWidth, postConvHeight, 1,
-  border, internalFormat);
-
-   _mesa_choose_texture_format(ctx, texObj, texImage, target, level,
-   internalFormat, GL_NONE, GL_NONE);
+   /* The texture's format was already chosen in _mesa_CopyTexImage() */
+   ASSERT(texImage->TexFormat != MESA_FORMAT_NONE);
 
/*
 * Store texture data (with pixel transfer ops)
@@ -2690,7 +2686,8 @@ _mesa_meta_CopyTexImage2D(struct gl_context *ctx, GLenum target, GLint level,
  * Have to be careful with locking and meta state for pixel transfer.
  */
 static void
-copy_tex_sub_image(struct gl_context *ctx, GLuint dims, GLenum target, GLint level,
+copy_tex_sub_image(struct gl_context *ctx,
+   GLuint dims, GLenum target, GLint level,
GLint xoffset, GLint yoffset, GLint zoffset,
GLint x, GLint y,
GLsizei width, GLsizei height)
@@ -2704,6 +2701,7 @@ copy_tex_sub_image(struct gl_context *ctx, GLuint dims, GLenum target, GLint lev
texObj = _mesa_get_current_tex_object(ctx, target);
texImage = _mesa_select_tex_image(ctx, texObj, target, level);
 
+   /* Choose format/type for temporary image buffer */
format = _mesa_get_format_base_format(texImage->TexFormat);
type = get_temp_image_type(ctx, format);
bpp = _mesa_bytes_per_pixel(format, type);
diff --git a/src/mesa/drivers/dri/intel/intel_tex_image.c b/src/mesa/drivers/dri/intel/intel_tex_image.c
index 50fe9bd..0a50be9 100644
--- a/src/mesa/drivers/dri/intel/intel_tex_image.c
+++ b/src/mesa/drivers/dri/intel/intel_tex_image.c
@@ -682,6 +682,7 @@ intelSetTexBuffer2(__DRIcontext *pDRICtx, GLint target,
struct gl_texture_object *texObj;
struct gl_texture_image *texImage;
int level = 0, internalFormat;
+   gl_format texFormat;
 
texObj = _mesa_get_current_tex_object(ctx, target);
intelObj = intel_texture_object(texObj);
@@ -724,16 +725,18 @@ intelSetTexBuffer2(__DRIcontext *pDRICtx, GLint target,
   intel_miptree_release(intel, &intelObj->mt);
 
intelObj->mt = mt;
+
+   if (texture_format == __DRI_TEXTURE_FORMAT_RGB)
+  texFormat = MESA_FORMAT_XRGB;
+   else
+  texFormat = MESA_FORMAT_ARGB;
+
_mesa_init_teximage_fields(&intel->ctx, target, texImage,
  rb->region->width, rb->region->height, 1,
- 0, internalFormat);
+ 0, internalFormat, texFormat);
 
intelImage->face = target_to_face(target);
intelImage->level = level;
-   if (texture_format == __DRI_TEXTURE_FORMAT_RGB)
-  texImage->TexFormat = MESA_FORMAT_XRGB;
-   else
-  texImage->TexFormat = MESA_FORMAT_ARGB;
texImage->RowStride = rb->region->pitch;
intel_miptree_reference(&intelImage->mt, intelObj->mt);
 
@@ -789,11 +792,10 @@ intel_image_target_texture_2d(struct gl_context *ctx, GLenum target,
intelObj->mt = mt;
_mesa_init_teximage_fields(&intel->ctx, target, texImage,
  image->region->width, image->region->height, 1,
- 0, image->internal_format);
+ 0, image->internal_format, image->format);
 
intelImage->face = target_to_face(target);
intelImage->level = 0;
-   texImage->TexFormat = image->format;
texImage->RowStride = image->region->pitch;
intel_miptree_reference(&intelImage->mt, intelObj->mt);
 
diff --git a/src/mesa/drivers/dri/nouveau/nouveau_texture.c b/src/mesa/drivers/dri/nouveau/no

Re: [Mesa-dev] Path to optimize (moving from create/bind/delete paradgim to set only ?)

2010-11-16 Thread Mathias Fröhlich

Hi,

On Tuesday, November 16, 2010 20:21:26 Jerome Glisse wrote:
> So i looked a bit more at what path we should try to optimize in the
> mesa/gallium/pipe infrastructure. Here are some number gathers from
> games :
> drawcall /   ps constant   vs constant   ps sampler   vs sampler
> doom3               1.45          1.39         9.24         9.86
> nexuiz              6.27          5.98         6.84         7.30
> openarena        2805.64          1.38         1.51         1.54
[...]

Just another observation:
I was doing some profiling on OpenSceneGraph based applications: one is the
plain osgviewer with numerous models, the other is flightgear.
The drivers and hardware I have are a FireGL 73???,R520,r300g and a
HD4890,RV770,r600g. Testing is done on the same cpu board.

One comparison is the draw time in osgviewer. Those who know this
application might remember the profiling graph, where you can see how long and
when cull, draw and, if available, gpu rendering happen.
In this case I was looking at the draw time, which is just the time
from the first state change in a frame to the last draw in a frame,
*excluding* the buffer swap/sync/flush and whatever else serializes program
execution with the gpu.
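
To make the measurement window concrete, here is a minimal sketch of such
a timer. It is not osgviewer's actual instrumentation: struct draw_timer
and the timer_*() hooks are hypothetical names, and it assumes a POSIX
system with clock_gettime() and CLOCK_MONOTONIC.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime() */
#endif
#include <time.h>

struct draw_timer {
   struct timespec first_state_change;
   struct timespec last_draw;
   int started;
};

/* Call at the start of every frame. */
static void timer_begin_frame(struct draw_timer *t)
{
   t->started = 0;
}

/* Call on every state change; only the first one in a frame counts. */
static void timer_on_state_change(struct draw_timer *t)
{
   if (!t->started) {
      clock_gettime(CLOCK_MONOTONIC, &t->first_state_change);
      t->last_draw = t->first_state_change;
      t->started = 1;
   }
}

/* Call after every draw; the last call in a frame closes the window. */
static void timer_on_draw(struct draw_timer *t)
{
   timer_on_state_change(t);   /* covers a draw with no prior state change */
   clock_gettime(CLOCK_MONOTONIC, &t->last_draw);
}

/* Draw time in milliseconds; swap/sync/flush happen later and are excluded. */
static double timer_draw_ms(const struct draw_timer *t)
{
   if (!t->started)
      return 0.0;
   return (t->last_draw.tv_sec - t->first_state_change.tv_sec) * 1e3 +
          (t->last_draw.tv_nsec - t->first_state_change.tv_nsec) / 1e6;
}
```

Because the window closes at the last draw call, anything the application
does afterwards (buffer swap, sync, flush) does not contribute to the
reported number.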

Here are these osgviewer draw times compared with fglrx, using my favourite
test model (fixed function), which is fairly representative of usage in
flightgear:

R520
fglrx   ~0.7ms
r300g, git  ~1.6ms

The profiling picture in my head is that r300g still spends a significant
amount of cpu time in current state attribute handling, which too often loops
over all possible state attributes. BTW: that was much worse before Fancescos
last copy to current patches. r300g also spends much time in the draw path in
mesa, where every draw loops over all 32 state attributes.
Doing some proof of concept work on these code paths improved the draw times
to 1.2ms on r300g.
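
One common way to avoid looping over all 32 attributes on every draw is a
dirty bitmask, sketched below. This is not the proof of concept code
mentioned above and not Mesa's actual vbo code: struct attrib_state,
set_attrib() and flush_dirty() are hypothetical names for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_ATTRIBS 32

struct attrib_state {
   float    value[NUM_ATTRIBS][4];
   uint32_t dirty;   /* bit i set => attribute i changed since last flush */
};

/* Index of the lowest set bit; the mask must be non-zero. */
static unsigned lowest_bit_index(uint32_t mask)
{
   unsigned i = 0;

   while (!(mask & 1u)) {
      mask >>= 1;
      i++;
   }
   return i;
}

static void set_attrib(struct attrib_state *s, unsigned i, const float v[4])
{
   for (unsigned k = 0; k < 4; k++)
      s->value[i][k] = v[k];
   s->dirty |= UINT32_C(1) << i;
}

/*
 * Visits only the attributes that changed, instead of all NUM_ATTRIBS.
 * emit may be NULL. Returns the number of attributes visited.
 */
static unsigned flush_dirty(struct attrib_state *s,
                            void (*emit)(unsigned index, const float *value))
{
   uint32_t dirty = s->dirty;
   unsigned visited = 0;

   while (dirty) {
      unsigned i = lowest_bit_index(dirty);

      if (emit)
         emit(i, s->value[i]);
      dirty &= dirty - 1;   /* clear the lowest set bit */
      visited++;
   }
   s->dirty = 0;
   return visited;
}
```

The cost per draw then scales with the number of attributes that actually
changed, so a draw with nothing dirty does no per-attribute work at all.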
The next cpu hog for r300g is the kernel side of the command stream parser. I
would expect that something making use of pre-evaluated and validated
command stream snippets in the kernel, held for each of the driver's
state objects and just referenced in the executed command stream, would help
a lot here. Something along the lines of recording command stream
macros/substreams that are just jumped into when executing the user level
command stream. I believe that Jerome gave a talk about something very similar
at this year's FOSDEM.
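
The validate-once idea can be sketched in userspace terms as below. This
is purely illustrative and does not reflect any existing kernel interface:
struct snippet, struct stream, validate_snippet() and stream_emit() are
hypothetical, the "validation" is a stand-in length check, and the snippet
is copied into the stream here where the proposal above would jump into it.

```c
#include <stdint.h>
#include <string.h>

#define SNIPPET_MAX 16
#define STREAM_MAX  256

/* Commands belonging to one state object, validated at most once. */
struct snippet {
   uint32_t dw[SNIPPET_MAX];
   unsigned len;
   int      validated;
};

struct stream {
   uint32_t dw[STREAM_MAX];
   unsigned len;
};

/* Counts how often the (normally expensive) check actually runs. */
static unsigned validate_calls;

/* Stand-in for the parser/validator: only checks the length here. */
static int validate_snippet(struct snippet *s)
{
   validate_calls++;
   s->validated = (s->len > 0 && s->len <= SNIPPET_MAX);
   return s->validated;
}

/* Appends a snippet to the stream; returns 0 on success, -1 on failure. */
static int stream_emit(struct stream *cs, struct snippet *s)
{
   if (!s->validated && !validate_snippet(s))
      return -1;
   if (cs->len + s->len > STREAM_MAX)
      return -1;
   memcpy(&cs->dw[cs->len], s->dw, s->len * sizeof s->dw[0]);
   cs->len += s->len;
   return 0;
}
```

However often a state object is bound, its commands are checked only on
first use; every later emit is a bounds check plus a copy.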

Translating those performance numbers from an example application to a more
real world one like flightgear gives a framerate of ~85 fps for fglrx and
~60 with current mesa. With the proof of concept stuff I already saw 65-70 on
r300g.

Now the picture for r600g:

RV770
fglrx   ~0.8ms
r600g, git   5-7ms

As you can see, fglrx is still about the same, but r600g is far off.
Also, with r600g I can see the driver spending about as much time parsing
and validating in the kernel as it spends in the r600g backend code.

I do not remember the flightgear framerates for RV770,fglrx, but I believe they
were comparable to the R520 ones; with r600g I still see just about 20-30
fps. Fiddling with the proof of concept stuff does not show up on r600g in
a noticeable way, since this one is simply dominated by its own backend cpu
cycles.

So, I cannot contribute to the discussion of which state objects are
more heavily used, but looking at the above I see that r300g is already at a
stage where it makes a lot of sense to improve some hot paths in mesa's top
layer. The r300 userspace backend code is visible but not high in profiles.
But r600g, using the same mesa/gallium infrastructure above it, spends many
cpu cycles in its userspace as well as in the parser/validator code.
Which makes me wonder: what is the fundamental difference between these two
backends that accounts for this?

Just my 2 cents

Mathias


Re: [Mesa-dev] Path to optimize (moving from create/bind/delete paradgim to set only ?)

2010-11-16 Thread Tilman Sauerbeck
Zack Rusin [2010-11-16 21:15]:

> Anyway what I think is of no consequence, what matters is what you can prove. 
> It'd be trivial to see:
> 1) what exactly changes that caching fails,

Maybe I'm totally missing the point but:
 * In OpenArena (running a random demo), context.create_sampler_state is
   called 10 times, ie we only create 10 sampler states
 * context.bind_fragment_sampler_states is called ~64000 times.

Caching of pipe_sampler_states seems to work here.

Regards,
Tilman

-- 
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?

