[Mesa-dev] [Bug 31255] Unigine Sanctuary v2.2: some surfaces have wrong colours
https://bugs.freedesktop.org/show_bug.cgi?id=31255

Pavel Ondračka changed:

           What      |Removed                        |Added
  Summary            |Unigine Sanctuary v 2.2: some  |Unigine Sanctuary v2.2: some
                     |surfaces have wrong colours    |surfaces have wrong colours
  Component          |Mesa core                      |Drivers/Gallium/r300
  AssignedTo         |mesa-...@lists.freedesktop.org |dri-de...@lists.freedesktop.org

--- Comment #1 from Pavel Ondračka 2010-11-16 03:04:33 PST ---
This works fine with Tom Stellard's sched-perf-rebase branch, so it seems
like an r300g bug after all.

--
Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] r600g: Lower the minimum stride from 512 to 256 bytes to fix bug #31578.
On Mon, Nov 15, 2010 at 9:46 PM, Alex Deucher wrote:
> On Mon, Nov 15, 2010 at 4:41 PM, Tilman Sauerbeck wrote:
>> piglit/fbo-readpixels still passes for me.
>>
>> Signed-off-by: Tilman Sauerbeck
>> ---
>>
>> Please review. And someone please tell me where those 512 and 256 bytes
>> are coming from :)
>
> The alignment depends on the type of tiling in use (linear, 1d, 2d).
> See this drm patch for more info:
> http://git.kernel.org/?p=linux/kernel/git/airlied/drm-2.6.git;a=commitdiff;h=fba4312e223f1187efc8c083daed70e57fa9c9d3
> The info needed can be queried via the tiling info ioctl.

I found the documentation on this pretty hard to follow, but the kernel
code seems to make sense.

Keith
Re: [Mesa-dev] [PATCH] r600g: Lower the minimum stride from 512 to 256 bytes to fix bug #31578.
On Tue, Nov 16, 2010 at 11:35 AM, Keith Whitwell wrote:
> On Mon, Nov 15, 2010 at 9:46 PM, Alex Deucher wrote:
>> On Mon, Nov 15, 2010 at 4:41 PM, Tilman Sauerbeck wrote:
>>> piglit/fbo-readpixels still passes for me.
>>>
>>> Signed-off-by: Tilman Sauerbeck
>>> ---
>>>
>>> Please review. And someone please tell me where those 512 and 256 bytes
>>> are coming from :)
>>
>> The alignment depends on the type of tiling in use (linear, 1d, 2d).
>> See this drm patch for more info:
>> http://git.kernel.org/?p=linux/kernel/git/airlied/drm-2.6.git;a=commitdiff;h=fba4312e223f1187efc8c083daed70e57fa9c9d3
>> The info needed can be queried via the tiling info ioctl.
>
> I found the documentation on this pretty hard to follow, but the
> kernel code seems to make sense.

Also, mipmap pitch needs to be aligned to group size as well.

Alex
Re: [Mesa-dev] RFC: gallium: Remove redundant sw and debug target helpers
On Wed, 2010-11-10 at 16:04 -0800, Jakob Bornecrantz wrote:
> Hi all
>
> We have a bunch of redundant target helpers to wrap screens with debug
> drivers and for creating the various software drivers. This series removes
> all but the inline one. I picked it since it gives more flexibility for
> targets and, maybe more importantly, it is the one that is used in 20
> places vs 3 for the other one.
>
> Comments please.
>
> Cheers Jakob.

This looks good to me, Jakob.

Keith
[Mesa-dev] Path to optimize (moving from create/bind/delete paradigm to set only?)
Hi,

So I looked a bit more at what paths we should try to optimize in the
mesa/gallium/pipe infrastructure. Here are some numbers gathered from games:

              drawcall /
              ps constant   vs constant   ps sampler   vs sampler
  doom3          1.45          1.39          9.24         9.86
  nexuiz         6.27          5.98          6.84         7.30
  openarena   2805.64          1.38          1.51         1.54

(A value of 1 means there is a call to this function for every draw call,
while a value of 10 means there is a call to this function every 10 draw
calls, on average.)

Note that the openarena ps constant number is understandable, as it's the
fixed GL pipeline that is in use here and the pixel shader constants don't
need much change in that case.

So I think the clear trend is that there is a lot of constant uploading and
sampler changing (almost at each draw call for some games). Thus I think we
want to make sure that we have a real fast path for uploading constants and
changing samplers. I think those paths should be changed and should avoid
using some of the gallium infrastructure. For shader constants I think the
best solution is to provide the pointer to the program constant buffer
directly to the pipe driver and let the driver choose how it wants to
upload constants to the GPU (GPUs have different capabilities: some can
stream constant buffers inside their command stream, others can just keep
around a pool of buffers into which they can memcpy, ...). As there is no
common denominator, I don't think we should go through the pipe buffer
allocation and provide a new pipe buffer each time.

Optimizing this for r600g allows a ~7% increase in games (when draw is a
nop), ~5% (when not submitting to the GPU), and ~3% when no part of the
driver is commented out. r600g has other bottlenecks that tend to minimize
the gain we can get from such an optimization.
Patch at http://people.freedesktop.org/~glisse/gallium_const_path/

For samplers I don't think we want to create persistent objects; we are
spending precious time building, hashing, and searching for a similar
sampler each time in the gallium code. I think it would be best to treat
state as use-once and forget.
That said, we can provide helper functions for pipe drivers that want to
cache samplers (but even for virtual hw I don't think this makes sense). I
haven't yet implemented a fast path for samplers to see how much we can win
from that, but I will report back once I do.

So a more fundamental question here is: should we move away from persistent
state and consider all states (except shaders and textures) as too volatile,
so that caching any of them doesn't make sense from a performance point of
view? That would mean changing a lot of create/bind/delete interfaces into
simple set interfaces for the pipe driver. This could be seen as a
simplification. Anyway, I think we should really consider moving more
toward set than create/bind/delete (I loved the create/bind/delete paradigm
a lot, but it doesn't seem to be the one you want with GL, at least from
the numbers I gathered with some games).

Cheers,
Jerome Glisse
Re: [Mesa-dev] Path to optimize (moving from create/bind/delete paradigm to set only?)
On 16.11.2010 20:21, Jerome Glisse wrote:
> Hi,
>
> So i looked a bit more at what path we should try to optimize in the
> mesa/gallium/pipe infrastructure. Here are some number gathers from
> games :
>
>               drawcall /
>               ps constant   vs constant   ps sampler   vs sampler
>   doom3          1.45          1.39          9.24         9.86
>   nexuiz         6.27          5.98          6.84         7.30
>   openarena   2805.64          1.38          1.51         1.54
>
> (value of 1 mean there is a call of this function for every draw call,
> while value of 10 means there is a call to this function every 10 draw
> call, average)
>
> Note that openarena ps constant number is understandable as it's the
> fixed GL pipeline which is in use here and the pixel shader constant
> doesn't need much change in those case.
>
> So i think clear trend is that there is a lot of constant upload and
> sampler changing (almost at each draw call for some games) Thus i
> think we want to make sure that we have real fast path for uploading
> constant or changing sampler. I think those path should be change and
> should avoid using some of the gallium infrastructure. For shader
> constant i think best solution is to provide the ptr to program
> constant buffer directly to the pipe driver and let the driver choose
> how it wants to upload constant to the GPU (GPU have different
> capabilities, some can stream constant buffer inside their command
> stream, other can just keep around a pool of buffer into which they
> can memcpy, ...) As there is no common denominator i don't think we
> should go through the pipe buffer allocation and providing a new pipe
> buffer each time.
>
> Optimizing this for r600g allow ~7% increase in games (when draw is
> nop) ~5% (when not submitting to gpu) ~3% when no part of the driver
> is commented. r600g have others bottleneck that tends to minimize the
> gain we can get from such optimization.
> Patch at http://people.freedesktop.org/~glisse/gallium_const_path/
>
> For sampler i don't think we want to create persistent object, we are
> spending precious time building, hashing, searching for similar
> sampler each time in the gallium code, i think best would be to think
> state as use once and forget. That said we can provide helper function
> to pipe driver that wants to cache sampler (but even for virtual hw
> i don't think this makes sense). I haven't yet implemented a fast path
> for sampler to see how much we can win from that but i will report
> back once i do.
>
> So a more fundamental question here is should we move away from
> persistent state and consider all states (except shader and texture)
> as being too much volatile so that caching any of them doesn't make
> sense from performance point of view. That would mean change lot of
> create/bind/delete interface to simply set interface for the pipe
> driver. This could be seen as a simplification. Anyway i think we
> should really consider moving more toward set than create/bind/delete
> (i loved a lot the create/bind/delete paradigm but it doesn't seems to
> be the one you want with GL, at least from number i gather with some
> games).

Why do you think it's faster to create and use a new state rather than
search the hash cache and reuse it? I was under the impression (this being
a dx10 paradigm) that even hw is quite optimized for this (that is, you
just keep all the state objects on the hw somewhere and switch between
them). Also, which functions did you actually see? If things work as
expected, it should be mostly bind, not create/delete.

Now it is certainly possible a driver doesn't make good use of this (i.e.
it really does all the time-consuming stuff on bind), but this is outside
the scope of the infrastructure. It is possible the hashing is
insufficient (it could for instance cause too many collisions and hence
the need to recreate state objects), but the principal mechanism looks
quite sound to me.
Roland
Re: [Mesa-dev] Mesa (master): glsl: fix assorted MSVC warnings
On 11/15/2010 05:48 PM, Brian Paul wrote:

>      case ir_unop_b2f:
>        assert(op[0]->type->base_type == GLSL_TYPE_BOOL);
>        for (unsigned c = 0; c < op[0]->type->components(); c++) {
> -         data.f[c] = op[0]->value.b[c] ? 1.0 : 0.0;
> +         data.f[c] = op[0]->value.b[c] ? 1.0F : 0.0F;

Please don't do this. This particular MSVC warning should just be
disabled. If this warning were generated for non-literals and for
literals that actually did lose precision being stored to a float, it
might have a chance at having some value. Instead, it's just noise.

Individual warnings can be disabled with a pragma, and this one should
probably be disabled in mesa/compiler.h:

#pragma warning(disable: 4244)

There may be a way to do it from the command line, but I don't know what
it is.

The F suffixes on constants are also worthless, and they make the code
ugly. Expecting that they will be added everywhere when no other
compiler generates this warning is a losing battle.

>      }
>      break;
>      case ir_unop_f2b:
>        assert(op[0]->type->base_type == GLSL_TYPE_FLOAT);
>        for (unsigned c = 0; c < op[0]->type->components(); c++) {
> -         data.b[c] = bool(op[0]->value.f[c]);
> +         data.b[c] = op[0]->value.f[c] != 0.0F ? true : false;

This warning should also be disabled for the same reason as the above.
This one isn't even a correctness warning, it's a performance warning.
The code that is replacing the cast may have even worse performance than
the cast! The other changes can stay, but this one needs to be reverted.

>      }
>      break;
>      case ir_unop_b2i:
> @@ -163,7 +163,7 @@ ir_expression::constant_expression_value()
>      case ir_unop_i2b:
>        assert(op[0]->type->is_integer());
>        for (unsigned c = 0; c < op[0]->type->components(); c++) {
> -         data.b[c] = bool(op[0]->value.u[c]);
> +         data.b[c] = op[0]->value.u[c] ? true : false;

What warning is this? I was unable to reproduce a warning on Visual
Studio 2008 Express Edition. I suspect this should be reverted too.
Re: [Mesa-dev] Path to optimize (moving from create/bind/delete paradigm to set only?)
On Tue, Nov 16, 2010 at 2:38 PM, Roland Scheidegger wrote:
> On 16.11.2010 20:21, Jerome Glisse wrote:
>> Hi,
>>
>> So i looked a bit more at what path we should try to optimize in the
>> mesa/gallium/pipe infrastructure. Here are some number gathers from
>> games :
>>
>>               drawcall /
>>               ps constant   vs constant   ps sampler   vs sampler
>>   doom3          1.45          1.39          9.24         9.86
>>   nexuiz         6.27          5.98          6.84         7.30
>>   openarena   2805.64          1.38          1.51         1.54
>>
>> (value of 1 mean there is a call of this function for every draw call,
>> while value of 10 means there is a call to this function every 10 draw
>> call, average)
>>
>> [rest of the original mail snipped]
>
> Why do you think it's faster to create and use a new state rather than
> search in the hash cache and reuse this? I was under the impression
> (this being a dx10 paradigm) even hw is quite optimized for this (that
> is, you just keep all the state objects on the hw somewhere and switch
> between them). Also, what functions did you really see? If things work
> as expected, it should be mostly bind, not create/delete.
> Now it is certainly possible a driver doesn't make good use of this
> (i.e. it really does all the time consuming stuff on bind), but this is
> outside the scope of the infrastructure.
> It is possible hashing is insufficient (could for instance cause too
> many collisions hence need to recreate state object) but the principle
> mechanism looks quite sound to me.
>
> Roland

The create/bind & reuse paradigm is likely good for a directx-like api,
where the api puts the incentive on the application to create and manage
efficiently the states it wants to use. But GL, which is I believe the API
we should focus on, is a completely different business. From what I am
seeing in games, we repeatedly see changes to shader constants and we
repeatedly see changes to samplers. We might be using a too-small hash or
missing opportunities for reuse; I can totally believe that. But
nonetheless, from what I see, it's counterproductive to try to hash all
those states and hope for reuse, simply because the cost of creating state
is too high and the reuse opportunity (even if we improve it) looks too
small. Here you have to think about hundreds of thousands of calls per
frame, and wasting time trying to find a GL state pattern in the
application looks doomed to failure to me. From what i have se
Re: [Mesa-dev] Path to optimize (moving from create/bind/delete paradigm to set only?)
>> Why do you think it's faster to create and use a new state rather than
>> search in the hash cache and reuse this? I was under the impression
>> (this being a dx10 paradigm) even hw is quite optimized for this (that
>> is, you just keep all the state objects on the hw somewhere and switch
>> between them).

I read Jerome's post as suggesting that there wasn't much actual re-use
going on (so the savings from re-use might be outweighed by the overhead of
creating re-useable state) but I don't think he explicitly said that.

Jerome, can you clarify that?
[Mesa-dev] [PATCH] egl_dri2: Add missing intel chip ids.
Signed-off-by: Robert Hooker
---
 src/egl/drivers/dri2/egl_dri2.c |   12 ++++++++++++
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/src/egl/drivers/dri2/egl_dri2.c b/src/egl/drivers/dri2/egl_dri2.c
index 6db44c7..a83f32b 100644
--- a/src/egl/drivers/dri2/egl_dri2.c
+++ b/src/egl/drivers/dri2/egl_dri2.c
@@ -899,10 +899,20 @@ const int i915_chip_ids[] = {
    0x29b2, /* PCI_CHIP_Q35_G */
    0x29c2, /* PCI_CHIP_G33_G */
    0x29d2, /* PCI_CHIP_Q33_G */
+   0xa001, /* PCI_CHIP_IGD_G */
    0xa011, /* Pineview */
 };
 
 const int i965_chip_ids[] = {
+   0x0042, /* PCI_CHIP_ILD_G */
+   0x0046, /* PCI_CHIP_ILM_G */
+   0x0102, /* PCI_CHIP_SANDYBRIDGE_GT1 */
+   0x0106, /* PCI_CHIP_SANDYBRIDGE_M_GT1 */
+   0x010a, /* PCI_CHIP_SANDYBRIDGE_S */
+   0x0112, /* PCI_CHIP_SANDYBRIDGE_GT2 */
+   0x0116, /* PCI_CHIP_SANDYBRIDGE_M_GT2 */
+   0x0122, /* PCI_CHIP_SANDYBRIDGE_GT2_PLUS */
+   0x0126, /* PCI_CHIP_SANDYBRIDGE_M_GT2_PLUS */
    0x29a2, /* PCI_CHIP_I965_G */
    0x2992, /* PCI_CHIP_I965_Q */
    0x2982, /* PCI_CHIP_I965_G_1 */
@@ -914,6 +924,8 @@ const int i965_chip_ids[] = {
    0x2e12, /* PCI_CHIP_Q45_G */
    0x2e22, /* PCI_CHIP_G45_G */
    0x2e32, /* PCI_CHIP_G41_G */
+   0x2e42, /* PCI_CHIP_B43_G */
+   0x2e92, /* PCI_CHIP_B43_G1 */
 };
 
 const int r100_chip_ids[] = {
--
1.7.1
Re: [Mesa-dev] Path to optimize (moving from create/bind/delete paradigm to set only?)
On 16.11.2010 20:59, Jerome Glisse wrote:
> On Tue, Nov 16, 2010 at 2:38 PM, Roland Scheidegger wrote:
>> On 16.11.2010 20:21, Jerome Glisse wrote:
>>> [original mail snipped]
>>
>> Why do you think it's faster to create and use a new state rather than
>> search in the hash cache and reuse this? I was under the impression
>> (this being a dx10 paradigm) even hw is quite optimized for this (that
>> is, you just keep all the state objects on the hw somewhere and switch
>> between them). Also, what functions did you really see? If things work
>> as expected, it should be mostly bind, not create/delete.
>> Now it is certainly possible a driver doesn't make good use of this
>> (i.e. it really does all the time consuming stuff on bind), but this is
>> outside the scope of the infrastructure.
>> It is possible hashing is insufficient (could for instance cause too
>> many collisions hence need to recreate state object) but the principle
>> mechanism looks quite sound to me.
>>
>> Roland
>
> The create/bind & reuse paradigm is likely good for a directx-like api,
> where the api puts the incentive on the application to create and manage
> efficiently the states it wants to use. But GL, which is i believe the
> API we should focus on, is a completely different business. From what
> i am seeing from games, we repeatedly see changes to shader constants
> and we repeatedly see changes to samplers. We might be using a too-small
> hash or missing opportunities for reuse, i can totally believe that. But
> nonetheless from what i see it's counterproductive to try to hash all
> those states and hope for reuse simply because the cost of creating
> state is too high and the reuse opportunity (even if we improve it)
> looks too small. Here you have to think about hundre
Re: [Mesa-dev] Path to optimize (moving from create/bind/delete paradigm to set only?)
On Tue, Nov 16, 2010 at 7:21 PM, Jerome Glisse wrote:
> Hi,
>
> So i looked a bit more at what path we should try to optimize in the
> mesa/gallium/pipe infrastructure. Here are some number gathers from
> games :
>
>               drawcall /
>               ps constant   vs constant   ps sampler   vs sampler
>   doom3          1.45          1.39          9.24         9.86
>   nexuiz         6.27          5.98          6.84         7.30
>   openarena   2805.64          1.38          1.51         1.54
>
> (value of 1 mean there is a call of this function for every draw call,
> while value of 10 means there is a call to this function every 10 draw
> call, average)
>
> Note that openarena ps constant number is understandable as it's the
> fixed GL pipeline which is in use here and the pixel shader constant
> doesn't need much change in those case.
>
> So i think clear trend is that there is a lot of constant upload and
> sampler changing (almost at each draw call for some games)

Can you look into what actually changes between the sampler states? Also,
that vs sampler state change number for OpenArena looks a bit fishy to me.

Cheers Jakob.
Re: [Mesa-dev] Mesa (master): glsl: fix assorted MSVC warnings
On Tue, 2010-11-16 at 11:55 -0800, Ian Romanick wrote:
> On 11/15/2010 05:48 PM, Brian Paul wrote:
>
>>      case ir_unop_b2f:
>>        assert(op[0]->type->base_type == GLSL_TYPE_BOOL);
>>        for (unsigned c = 0; c < op[0]->type->components(); c++) {
>> -         data.f[c] = op[0]->value.b[c] ? 1.0 : 0.0;
>> +         data.f[c] = op[0]->value.b[c] ? 1.0F : 0.0F;
>
> Please don't do this. This particular MSVC warning should just be
> disabled. If this warning were generated for non-literals and for
> literals that actually did lose precision being stored to a float, it
> might have a chance at having some value. Instead, it's just noise.
>
> Individual warnings can be disabled with a pragma, and this one should
> probably be disabled in mesa/compiler.h:
>
> #pragma warning(disable: 4244)
>
> There may be a way to do it from the command line, but I don't know what
> it is.

It's -wd4244.

> The F suffixes on constants are also worthless, and they make the code
> ugly.

I had the impression it was more than a warning, namely that compilers
will use double-precision intermediates instead of single-precision
floats when constants don't have the 'f' suffix. Gcc does it.
Take for example:

float foo(float x)
{
   return 1.0 / x + 5.0;
}

float foof(float x)
{
   return 1.0f / x + 5.0f;
}

If you compile it on x64 with

  gcc -g0 -O3 -S -o - test.c

you'll get

        .file   "foo.c"
        .text
        .p2align 4,,15
.globl foo
        .type   foo, @function
foo:
.LFB0:
        .cfi_startproc
        unpcklps        %xmm0, %xmm0
        cvtps2pd        %xmm0, %xmm1
        movsd   .LC0(%rip), %xmm0
        divsd   %xmm1, %xmm0
        addsd   .LC1(%rip), %xmm0
        unpcklpd        %xmm0, %xmm0
        cvtpd2ps        %xmm0, %xmm0
        ret
        .cfi_endproc
.LFE0:
        .size   foo, .-foo
        .p2align 4,,15
.globl foof
        .type   foof, @function
foof:
.LFB1:
        .cfi_startproc
        movaps  %xmm0, %xmm1
        movss   .LC2(%rip), %xmm0
        divss   %xmm1, %xmm0
        addss   .LC3(%rip), %xmm0
        ret
        .cfi_endproc
.LFE1:
        .size   foof, .-foof
        .section        .rodata.cst8,"aM",@progbits,8
        .align 8
.LC0:
        .long   0
        .long   1072693248
        .align 8
.LC1:
        .long   0
        .long   1075052544
        .section        .rodata.cst4,"aM",@progbits,4
        .align 4
.LC2:
        .long   1065353216
        .align 4
.LC3:
        .long   1084227584
        .ident  "GCC: (Debian 4.4.5-6) 4.4.5"
        .section        .note.GNU-stack,"",@progbits

And as you can see, one function uses double precision and the other uses
single precision. Code quality is much better in the latter.

> Expecting that they will be added everywhere when no other
> compiler generates this warning is a losing battle.

I really think this is a battle everybody should fight. Perhaps the
condition ? 1.0 : 0.0 is something that a compiler should eliminate, but
"single precision expressions should use the 'f' suffix on constants"
seems to be a good rule of thumb to follow.

Jose
Re: [Mesa-dev] Path to optimize (moving from create/bind/delete paradigm to set only?)
> I'm really not sure of that. I think one reason dx10 did this is because
> applications mostly did that anyway - at least for subsequent frames
> pretty much all the state they are using is going to be the same one as
> used on the previous frame.

That is probably an important point -- there may not be much re-use within
a single frame but there probably is a lot of re-use from one frame to the
next.
Re: [Mesa-dev] Mesa (master): glsl: fix assorted MSVC warnings
On 11/16/2010 12:55 PM, Ian Romanick wrote:
> On 11/15/2010 05:48 PM, Brian Paul wrote:
>
>>      case ir_unop_b2f:
>>        assert(op[0]->type->base_type == GLSL_TYPE_BOOL);
>>        for (unsigned c = 0; c < op[0]->type->components(); c++) {
>> -         data.f[c] = op[0]->value.b[c] ? 1.0 : 0.0;
>> +         data.f[c] = op[0]->value.b[c] ? 1.0F : 0.0F;
>
> Please don't do this. This particular MSVC warning should just be
> disabled. If this warning were generated for non-literals and for
> literals that actually did lose precision being stored to a float, it
> might have a chance at having some value. Instead, it's just noise.
>
> Individual warnings can be disabled with a pragma, and this one should
> probably be disabled in mesa/compiler.h:
>
> #pragma warning(disable: 4244)
>
> There may be a way to do it from the command line, but I don't know what
> it is.
>
> The F suffixes on constants are also worthless, and they make the code
> ugly. Expecting that they will be added everywhere when no other
> compiler generates this warning is a losing battle.

I've been in the habit of using F suffixes for many, many years. Back in
my IRIX days it told the compiler to use a float and not a double for the
computation (which was faster). And according to Jose's email which I just
spotted, that's still the case with gcc. I recall another old unix
compiler that I used back then (maybe AIX) issued warnings similar to
MSVC. Old habits die hard. But I don't think it's a bad habit.

>>      }
>>      break;
>>      case ir_unop_f2b:
>>        assert(op[0]->type->base_type == GLSL_TYPE_FLOAT);
>>        for (unsigned c = 0; c < op[0]->type->components(); c++) {
>> -         data.b[c] = bool(op[0]->value.f[c]);
>> +         data.b[c] = op[0]->value.f[c] != 0.0F ? true : false;
>
> This warning should also be disabled for the same reason as the above.
> This one isn't even a correctness warning, it's a performance warning.
> The code that is replacing the cast may have even worse performance than
> the cast! The other changes can stay, but this one needs to be reverted.
Perhaps

   data.b[c] = bool((int) op[0]->value.f[c]);

would do the trick.

>>      }
>>      break;
>>      case ir_unop_b2i:
>> @@ -163,7 +163,7 @@ ir_expression::constant_expression_value()
>>      case ir_unop_i2b:
>>        assert(op[0]->type->is_integer());
>>        for (unsigned c = 0; c < op[0]->type->components(); c++) {
>> -         data.b[c] = bool(op[0]->value.u[c]);
>> +         data.b[c] = op[0]->value.u[c] ? true : false;
>
> What warning is this? I was unable to reproduce a warning on Visual
> Studio 2008 Express Edition. I suspect this should be reverted too.

The warning was:

src\glsl\ir_constant_expression.cpp(166) : warning C4800: 'unsigned int' :
forcing value to bool 'true' or 'false' (performance warning)

-Brian
Re: [Mesa-dev] Path to optimize (moving from create/bind/delete paradigm to set only ?)
On Tue, Nov 16, 2010 at 3:27 PM, Roland Scheidegger wrote: > On 16.11.2010 20:59, Jerome Glisse wrote: >> On Tue, Nov 16, 2010 at 2:38 PM, Roland Scheidegger >> wrote: >>> On 16.11.2010 20:21, Jerome Glisse wrote: Hi, So i looked a bit more at what path we should try to optimize in the mesa/gallium/pipe infrastructure. Here are some numbers gathered from games:

           drawcall / ps constant   vs constant   ps sampler   vs sampler
doom3             1.45                 1.39          9.24         9.86
nexuiz            6.27                 5.98          6.84         7.30
openarena      2805.64                 1.38          1.51         1.54

(a value of 1 means there is a call to this function for every draw call, while a value of 10 means there is a call to this function every 10 draw calls, on average)

Note that the openarena ps constant number is understandable, as it's the fixed GL pipeline which is in use here and the pixel shader constants don't need much change in that case.

So i think the clear trend is that there is a lot of constant uploading and sampler changing (almost at each draw call for some games). Thus i think we want to make sure that we have a real fast path for uploading constants or changing samplers. I think those paths should be changed and should avoid using some of the gallium infrastructure. For shader constants i think the best solution is to provide the ptr to the program constant buffer directly to the pipe driver and let the driver choose how it wants to upload constants to the GPU (GPUs have different capabilities, some can stream constant buffers inside their command stream, others can just keep around a pool of buffers into which they can memcpy, ...). As there is no common denominator i don't think we should go through the pipe buffer allocation, providing a new pipe buffer each time.

Optimizing this for r600g allows a ~7% increase in games (when draw is a nop), ~5% (when not submitting to the gpu), ~3% when no part of the driver is commented out. r600g has other bottlenecks that tend to minimize the gain we can get from such optimization.
Patch at http://people.freedesktop.org/~glisse/gallium_const_path/

For samplers i don't think we want to create persistent objects; we are spending precious time building, hashing, and searching for a similar sampler each time in the gallium code. i think the best would be to treat state as use-once and forget. That said, we can provide helper functions to pipe drivers that want to cache samplers (but even for virtual hw i don't think this makes sense). I haven't yet implemented a fast path for samplers to see how much we can win from that, but i will report back once i do.

So a more fundamental question here is: should we move away from persistent state and consider all states (except shader and texture) as being too volatile, so that caching any of them doesn't make sense from a performance point of view? That would mean changing a lot of create/bind/delete interfaces to simply set interfaces for the pipe driver. This could be seen as a simplification. Anyway i think we should really consider moving more toward set than create/bind/delete (i loved the create/bind/delete paradigm a lot, but it doesn't seem to be the one you want with GL, at least from the numbers i gathered with some games).

>>> Why do you think it's faster to create and use a new state rather than >>> search in the hash cache and reuse this? I was under the impression >>> (this being a dx10 paradigm) even hw is quite optimized for this (that >>> is, you just keep all the state objects on the hw somewhere and switch >>> between them). Also, what functions did you really see? If things work >>> as expected, it should be mostly bind, not create/delete. >>> Now it is certainly possible a driver doesn't make good use of this >>> (i.e. it really does all the time consuming stuff on bind), but this is >>> outside the scope of the infrastructure. >>> It is possible hashing is insufficient (could for instance cause too >>> many collisions hence need to recreate state object) but the principle >>> mechanism looks quite sound to me.
>>> >>> Roland >>> >> >> The create/bind & reuse paradigm is likely good for a directx-like api >> where the api puts the incentive on the application to create and manage >> efficiently the states it wants to use. But GL, which is i believe the >> API we should focus on, is a completely different business. From what >> i am seeing from games, we repeatedly see changes to shader constants and >> we repeatedly see changes to samplers. We might be using a too small hash >> or missing opportunities for reuse, i can totally believe that. But >> nonetheless from what i see it's counterproductive to try to hash all >> those states and hope for reuse simply be
Re: [Mesa-dev] Path to optimize (moving from create/bind/delete paradigm to set only ?)
On Tue, Nov 16, 2010 at 3:51 PM, Jakob Bornecrantz wrote: > On Tue, Nov 16, 2010 at 7:21 PM, Jerome Glisse wrote: >> Hi, >> >> So i looked a bit more at what path we should try to optimize in the >> mesa/gallium/pipe infrastructure. Here are some numbers gathered from >> games : >> drawcall / ps constant vs constant ps sampler vs sampler >> doom3 1.45 1.39 9.24 9.86 >> nexuiz 6.27 5.98 6.84 7.30 >> openarena 2805.64 1.38 1.51 1.54 >> >> (a value of 1 means there is a call to this function for every draw call, >> while a value of 10 means there is a call to this function every 10 draw >> calls, on average) >> >> Note that the openarena ps constant number is understandable as it's the fixed GL >> pipeline which is in use here and the pixel shader constants don't >> need much change in that case. >> >> So i think the clear trend is that there is a lot of constant uploading and >> sampler changing (almost at each draw call for some games) > > Can you look into what actually changes between the sampler states? > Also that vs sampler state change number for OpenArena looks a bit > fishy to me. > > Cheers Jakob. > I haven't looked at what changed yet; i assume something small. i think a bugle trace of the engine is maybe easier to use than looking at the quake3 source code. For the vs sampler i was surprised too, but it's just the fact that q3 changes the vertex buffer a lot and this triggers the vs sampler. Cheers, Jerome ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] Path to optimize (moving from create/bind/delete paradigm to set only ?)
On 16.11.2010 22:15, Jerome Glisse wrote: > On Tue, Nov 16, 2010 at 3:27 PM, Roland Scheidegger > wrote: >> On 16.11.2010 20:59, Jerome Glisse wrote: >>> On Tue, Nov 16, 2010 at 2:38 PM, Roland Scheidegger >>> wrote: On 16.11.2010 20:21, Jerome Glisse wrote: > Hi, > > So i looked a bit more at what path we should try to optimize in the > mesa/gallium/pipe infrastructure. Here are some number gathers from > games : > drawcall / ps constant vs constant ps samplervs sampler > doom31.45 1.39 9.24 > 9.86 > nexuiz 6.27 5.98 6.84 > 7.30 > openarena 2805.64 1.38 1.51 1.54 > > (value of 1 mean there is a call of this function for every draw call, > while value of 10 means there is a call to this function every 10 draw > call, average) > > Note that openarena ps constant number is understable as it's fixed GL > pipeline which is in use here and the pixel shader constant doesn't > need much change in those case. > > So i think clear trend is that there is a lot of constant upload and > sampler changing (allmost at each draw call for some games) Thus i > think we want to make sure that we have real fast path for uploading > constant or changing sampler. I think those path should be change and > should avoid using some of the gallium infrastructure. For shader > constant i think best solution is to provide the ptr to program > constant buffer directly to the pipe driver and let the driver choose > how it wants to upload constant to the GPU (GPU have different > capabilities, some can stream constant buffer inside their command > stream, other can just keep around a pool of buffer into which they > can memcpy, ...) As there is no common denominator i don't think we > should go through the pipe buffer allocation and providing a new pipe > buffer each time. > > Optimizing this for r600g allow ~7% increase in games (when draw is > nop) ~5% (when not submitting to gpu) ~3% when no part of the driver > is commented. 
r600g have others bottleneck that tends to minimize the > gain we can get from such optimization. Patch at > http://people.freedesktop.org/~glisse/gallium_const_path/ > > For sampler i don't think we want to create persistant object, we are > spending precious time building, hashing, searching for similar > sampler each time in the gallium code, i think best would be to think > state as use once and forget. That said we can provide helper function > to pipe driver that wants to be cache sampler (but even for virtual hw > i don't think this makes sense). I haven't yet implemented a fast path > for sampler to see how much we can win from that but i will report > back once i do. > > So a more fundamental question here is should we move away from > persistant state and consider all states (except shader and texture) > as being too much volatile so that caching any of them doesn't make > sense from performance point of view. That would mean change lot of > create/bind/delete interface to simply set interface for the pipe > driver. This could be seen as a simplification. Anyway i think we > should really consider moving more toward set than create/bind/delete > (i loved a lot the create/bind/delete paradigm but it doesn't seems to > be the one you want with GL, at least from number i gather with some > games). Why do you think it's faster to create and use a new state rather than search in the hash cache and reuse this? I was under the impression (this being a dx10 paradigm) even hw is quite optimized for this (that is, you just keep all the state objects on the hw somewhere and switch between them). Also, what functions did you really see? If things work as expected, it should be mostly bind, not create/delete. Now it is certainly possible a driver doesn't make good use of this (i.e. it really does all the time consuming stuff on bind), but this is outside the scope of the infrastructure. 
It is possible hashing is insufficient (could for instance cause too many collisions hence need to recreate state object) but the principle mechanism looks quite sound to me. Roland >>> The create/bin & reuse paradgim is likely good for a directx like api >>> where api put incentive on application to create and manage >>> efficiently the states it wants to use. But GL, which is i believe the >>> API we should focus on, is a completely different business. From what >>> i am seeing from games, we repeatly see change to shader constant and >>> we repeatly see change to sampler. We might be using a tool small hash >>> or missing opportunity of reuse, i can totaly beli
Re: [Mesa-dev] Path to optimize (moving from create/bind/delete paradigm to set only ?)
On Tue, Nov 16, 2010 at 9:17 PM, Jerome Glisse wrote: > On Tue, Nov 16, 2010 at 3:51 PM, Jakob Bornecrantz > wrote: >> On Tue, Nov 16, 2010 at 7:21 PM, Jerome Glisse wrote: >>> Hi, >>> >>> So i looked a bit more at what path we should try to optimize in the >>> mesa/gallium/pipe infrastructure. Here are some numbers gathered from >>> games : >>> drawcall / ps constant vs constant ps sampler vs sampler >>> doom3 1.45 1.39 9.24 9.86 >>> nexuiz 6.27 5.98 6.84 >>> 7.30 >>> openarena 2805.64 1.38 1.51 1.54 >>> >>> (a value of 1 means there is a call to this function for every draw call, >>> while a value of 10 means there is a call to this function every 10 draw >>> calls, on average) >>> >>> Note that the openarena ps constant number is understandable as it's the fixed GL >>> pipeline which is in use here and the pixel shader constants don't >>> need much change in that case. >>> >>> So i think the clear trend is that there is a lot of constant uploading and >>> sampler changing (almost at each draw call for some games) >> >> Can you look into what actually changes between the sampler states? >> Also that vs sampler state change number for OpenArena looks a bit >> fishy to me. >> >> Cheers Jakob. >> > > I haven't looked at what changed yet, i assume something small, i think > a bugle trace of the engine is maybe easier to use than looking at > the quake3 source code. For the vs sampler i was surprised too but it's > just the fact that q3 changes the vertex buffer a lot and this triggers > the vs sampler. I was thinking more along the lines of diffing the pipe_sampler_state object and seeing what changed; what I'm suspecting is that it's only the max_lod field that keeps changing. Games should usually stay within the same number of textures and type of texture modes for most draw calls. When you say vs_sampler, do you mean bind_vertex_sampler_states or bind_vertex_elements_state? Cheers Jakob. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] Path to optimize (moving from create/bind/delete paradigm to set only ?)
> On Tue, Nov 16, 2010 at 9:17 PM, Jerome Glisse wrote: >> On Tue, Nov 16, 2010 at 3:51 PM, Jakob Bornecrantz >> wrote: >>> On Tue, Nov 16, 2010 at 7:21 PM, Jerome Glisse wrote: Hi, So i looked a bit more at what path we should try to optimize in the mesa/gallium/pipe infrastructure. Here are some number gathers from games : drawcall / ps constant vs constant ps sampler vs sampler doom3 1.45 1.39 9.24 9.86 nexuiz 6.27 5.98 6.84 7.30 openarena 2805.64 1.38 1.51 1.54 (value of 1 mean there is a call of this function for every draw call, while value of 10 means there is a call to this function every 10 draw call, average) Note that openarena ps constant number is understable as it's fixed GL pipeline which is in use here and the pixel shader constant doesn't need much change in those case. So i think clear trend is that there is a lot of constant upload and sampler changing (allmost at each draw call for some games) >>> >>> Can you look into what actually changes between the sampler states? >>> Also that vs sampler state change number for OpenArena looks a bit >>> fishy to me. >>> >>> Cheers Jakob. >>> >> >> I haven't looked at what change yet, i assume something small, i think >> bugle trace of the engine is maybe easier to use than looking at >> quake3 source code. For the vs sampler i was surprised too but it's >> just the fact that q3 changes the vertex buffer a lot and this trigger >> the vs sampler. Could we get some problematic Bugle traces posted that we could all examine, rather than guessing at this? It'd be very nice to know whether or not the problems are in the GL state tracker layer before we move on to optimizing Gallium's interface, mostly because Dx appears to not suffer these same problems. -- When the facts change, I change my mind. What do you do, sir? ~ Keynes Corbin Simpson ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 31673] New: GL_FRAGMENT_PRECISION_HIGH preprocessor macro undefined in GLSL ES
https://bugs.freedesktop.org/show_bug.cgi?id=31673 Summary: GL_FRAGMENT_PRECISION_HIGH preprocessor macro undefined in GLSL ES Product: Mesa Version: git Platform: All OS/Version: All Status: NEW Severity: minor Priority: medium Component: Mesa core AssignedTo: mesa-dev@lists.freedesktop.org ReportedBy: kenn...@whitecape.org According to the GLSL ES specification, section 4.5, "The built-in macro GL_FRAGMENT_PRECISION_HIGH is defined to one on systems supporting highp precision in the fragment language #define GL_FRAGMENT_PRECISION_HIGH 1 and is not defined on systems not supporting highp precision in the fragment language. When defined, this macro is available in both the vertex and fragment languages. The highp qualifier is an optional feature in the fragment language and is not enabled by #extension." glcpp currently does not define this macro for GLSL ES. As far as I know, all Mesa drivers currently support highp, so perhaps we should just define it unconditionally. However, I imagine this may not always be the case... -- Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are the assignee for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] Path to optimize (moving from create/bind/delete paradigm to set only ?)
On Tue, Nov 16, 2010 at 6:06 PM, Corbin Simpson wrote: >> On Tue, Nov 16, 2010 at 9:17 PM, Jerome Glisse wrote: >>> On Tue, Nov 16, 2010 at 3:51 PM, Jakob Bornecrantz >>> wrote: On Tue, Nov 16, 2010 at 7:21 PM, Jerome Glisse wrote: > Hi, > > So i looked a bit more at what path we should try to optimize in the > mesa/gallium/pipe infrastructure. Here are some number gathers from > games : > drawcall / ps constant vs constant ps sampler vs sampler > doom3 1.45 1.39 9.24 > 9.86 > nexuiz 6.27 5.98 6.84 > 7.30 > openarena 2805.64 1.38 1.51 1.54 > > (value of 1 mean there is a call of this function for every draw call, > while value of 10 means there is a call to this function every 10 draw > call, average) > > Note that openarena ps constant number is understable as it's fixed GL > pipeline which is in use here and the pixel shader constant doesn't > need much change in those case. > > So i think clear trend is that there is a lot of constant upload and > sampler changing (allmost at each draw call for some games) Can you look into what actually changes between the sampler states? Also that vs sampler state change number for OpenArena looks a bit fishy to me. Cheers Jakob. >>> >>> I haven't looked at what change yet, i assume something small, i think >>> bugle trace of the engine is maybe easier to use than looking at >>> quake3 source code. For the vs sampler i was surprised too but it's >>> just the fact that q3 changes the vertex buffer a lot and this trigger >>> the vs sampler. > > Could we get some problematic Bugle traces posted that we could all > examine, rather than guessing at this? It'd be very nice to know > whether or not the problems are in the GL state tracker layer before > we move on to optimizing Gallium's interface, mostly because Dx > appears to not suffer these same problems. 
>

I haven't looked closely at the sampler issue, but the shader constant one is obvious on r600g: it's the pipe buffer allocation at each constant update that kills us; even with somehow fixing pb* there is too big an overhead in the pb layer. it's only a few % of the whole cpu time, but again things pile up, and no matter how small you cut the cpu usage it directly shows up in the framerate. That's why my feeling is that we should keep the cpu overhead for state changes as low as possible, and i fear the fastest way is to drop the create/bind paradigm.

I pretty much use the dri benchmark wiki page for running games in timedemo; lately i mostly used nexuiz because it's easy to install and its rendering is somewhat more complex than quake3, thus a little bit closer to what i would like to target for the r600g driver. Anyway my point is that here the gl state tracker is not to blame; it's only the fact that real applications lead to a lot of cso activity, and i am not convinced that what we might possibly win with cso is more important than what we lose when considering an API such as GL. Cheers, Jerome Glisse ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] Path to optimize (moving from create/bind/delete paradigm to set only ?)
On Tuesday 16 November 2010 20:26:03 Jerome Glisse wrote: > Anyway my point is that here the gl state tracker is not to blame, > it's only the fact that real application lead to a lot of cso > activities and i am not convinced that what we might possibly win with > cso is more important than what we loose when considering API such as > GL.

And I disagree so we're at a stalemate and we'll never reach a conclusion. What I'm saying is that this isn't how we can ever reach a technical decision. There needs to be compelling evidence for doing something that is obviously unintuitive.

And this is unintuitive because there's a limited number of blend, depth, alpha, stencil or rasterizer states any application needs, and quite frankly it's very small, so caching it makes a hell of a lot of sense. I think it's more likely that we stuffed some value into one of the cso's that should have a separate set method, or that there's a bug somewhere.

Anyway, what I think is of no consequence; what matters is what you can prove. It'd be trivial to see:
1) what exactly changes that caching fails,
2) would a better hashing function and a better hash fix it,
3) whether it's a special case and requires special handling or whether it's globally the concept of csos,
4) whether the state tracker can be improved to handle it,
5) how much better things are when we don't cache (trivial to try by just changing the cso_set functions to just set stuff instead of using the create/bind semantics)

If you can prove your hypothesis, awesome! Great find, let's change it. Otherwise I think the bikeshed should be blue because I'm a boy and I like blue.

z ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] Path to optimize (moving from create/bind/delete paradigm to set only ?)
On Tue, Nov 16, 2010 at 9:15 PM, Zack Rusin wrote: > On Tuesday 16 November 2010 20:26:03 Jerome Glisse wrote: >> Anyway my point is that here the gl state tracker is not to blame, >> it's only the fact that real application lead to a lot of cso >> activities and i am not convinced that what we might possibly win with >> cso is more important than what we loose when considering API such as >> GL. > > And I disagree so we're at a stalemate and we'll never reach a conclusion. > What I'm saying is that this isn't how we can ever reach a technical decision. > There needs to be a compelling evidence for doing something that is obviously > unintuitive. > > And this is unintuitive because there's a limited number of blend, depth, > alpha, stencil or rasterizer states any application needs and quite frankly > it's very small so caching it makes a hell lot of sense. I think it's more > likely that we stuffed some value into one of the cso's that should have a > separate set method or that there's a bug somewhere. > > Anyway what I think is of no consequence, what matters is what you can prove. > It'd be trivial to see: > 1) what exactly changes that caching fails, > 2) would a better hashing function and a better hash fix it, > 3) whether it's a special case and requires special handling or whether it's > globally the concept of csos, > 4) whether the state tracker can be improved to handle it, > 5) how much better things are when we don't cache (trivial to try by just > changing the cso_set functions to just set stuff instead of using the > create/bind semantics) > > If you can prove your hypothesis, awesome! great find, lets change it. > Otherwise I think the bikeshed should be blue because I'm a boy and I like > blue. > > z > Agree, i am just trying to get someone to look into it before i do ;) I am more focusing on fixing the short coming of the r600 pipe driver first. 
But i will get back to this cso thing, and anyone is more than welcome to take a look at it (openarena and nexuiz are showing a lot of cso activity with the r600g or noop driver). I never meant to say jump on this new wagon because it looks more promising; i am just trying to stress that no one should take the promise of cso caching for granted, because as far as i can tell it's not holding any of it as of today.

Also, the noop driver is only marginally faster than fglrx, and you will see that cso accounts for around 5%-10% of cpu time out of the 25% for the whole of mesa's activities. noop is also special in that the copy/swap buffer of the current ddx is called, so it also slows things down (though i use a small resolution to minimize this).

Note that the shader constant upload part of my mail is disjoint from cso, and for that part i am convinced: i did give numbers showing that it's inappropriate to use the pipe buffer allocation path, and that we should rather directly provide the program constant buffer ptr to the pipe driver and let the pipe driver pick the best solution for its hw.

Cheers, Jerome ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] Path to optimize (moving from create/bind/delete paradigm to set only ?)
On Tue, Nov 16, 2010 at 9:43 PM, Jerome Glisse wrote: > On Tue, Nov 16, 2010 at 9:15 PM, Zack Rusin wrote: >> On Tuesday 16 November 2010 20:26:03 Jerome Glisse wrote: >>> Anyway my point is that here the gl state tracker is not to blame, >>> it's only the fact that real application lead to a lot of cso >>> activities and i am not convinced that what we might possibly win with >>> cso is more important than what we loose when considering API such as >>> GL. >> >> And I disagree so we're at a stalemate and we'll never reach a conclusion. >> What I'm saying is that this isn't how we can ever reach a technical >> decision. >> There needs to be a compelling evidence for doing something that is obviously >> unintuitive. >> >> And this is unintuitive because there's a limited number of blend, depth, >> alpha, stencil or rasterizer states any application needs and quite frankly >> it's very small so caching it makes a hell lot of sense. I think it's more >> likely that we stuffed some value into one of the cso's that should have a >> separate set method or that there's a bug somewhere. >> >> Anyway what I think is of no consequence, what matters is what you can prove. >> It'd be trivial to see: >> 1) what exactly changes that caching fails, >> 2) would a better hashing function and a better hash fix it, >> 3) whether it's a special case and requires special handling or whether it's >> globally the concept of csos, >> 4) whether the state tracker can be improved to handle it, >> 5) how much better things are when we don't cache (trivial to try by just >> changing the cso_set functions to just set stuff instead of using the >> create/bind semantics) >> >> If you can prove your hypothesis, awesome! great find, lets change it. >> Otherwise I think the bikeshed should be blue because I'm a boy and I like >> blue. >> >> z >> > > Agree, i am just trying to get someone to look into it before i do ;) > I am more focusing on fixing the short coming of the r600 pipe driver > first. 
But i will get back to this cso things, and anyone is more than > welcome to take a look at it (openarena or nexuiz are showing lot of > cso activities with r600g or noop driver). I never meant to say jump > on this new wagon because it looks more promising, i am just trying to > stress out that no one should take the promise of cso caching for > granted because as far as i can tell it's not holding any of it as of > today. > > Also noop driver is only marginaly faster than fglrx and you will see > that cso account for around 5%-10% of cpu time of 25% for the whole > mesa activities, also noop is special as the copy/swap buffer of the > current ddx is call, so it also slow done thing (thought i use small > resolution to minimize this). > > Note that the shader constant upload part of my mail is disjoint from > cso and for that part i am convinced and i did give number showing > that it's unappropriate to use the pipe buffer allocation path but > that we should rather directly provide the program constant buffer ptr > to pipe driver and let the pipe driver pickup the best solution for > its hw. > > Cheers, > Jerome >

Before i forget, the fact that cso shows up that high on cpu is likely the outcome of csos not living long enough: being deleted right after being used, so we end up with nothing in the cso cache and we keep rebuilding over and over. Then comes the problem of how to determine the best lifetime of a cso. For DX it's easy, but for GL the best we can do is a wild guess; some app might use some GL state once every minute, and that GL state might consume memory for no good reason between those 2 usages ... Anyway, just wanted to point out the obvious from my results. Cheers, Jerome ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [patch] pass texFormat to _mesa_init_teximage_fields()
This patch passes the texture image format to the _mesa_init_teximage_fields() function to make sure the texture image's format is always set (see fd.o bug 31544).

I'd appreciate it if someone could apply this patch and test on r200, r300 or r600. I'll commit it later then.

-Brian

diff --git a/src/mesa/drivers/common/meta.c b/src/mesa/drivers/common/meta.c
index 4533d82..ba8be12 100644
--- a/src/mesa/drivers/common/meta.c
+++ b/src/mesa/drivers/common/meta.c
@@ -2593,7 +2593,6 @@ copy_tex_image(struct gl_context *ctx, GLuint dims, GLenum target, GLint level,
 {
    struct gl_texture_object *texObj;
    struct gl_texture_image *texImage;
-   GLsizei postConvWidth = width, postConvHeight = height;
    GLenum format, type;
    GLint bpp;
    void *buf;
@@ -2601,6 +2600,7 @@ copy_tex_image(struct gl_context *ctx, GLuint dims, GLenum target, GLint level,
    texObj = _mesa_get_current_tex_object(ctx, target);
    texImage = _mesa_get_tex_image(ctx, texObj, target, level);

+   /* Choose format/type for temporary image buffer */
    format = _mesa_base_tex_format(ctx, internalFormat);
    type = get_temp_image_type(ctx, format);
    bpp = _mesa_bytes_per_pixel(format, type);
@@ -2632,12 +2632,8 @@ copy_tex_image(struct gl_context *ctx, GLuint dims, GLenum target, GLint level,
       ctx->Driver.FreeTexImageData(ctx, texImage);
    }

-   _mesa_init_teximage_fields(ctx, target, texImage,
-                              postConvWidth, postConvHeight, 1,
-                              border, internalFormat);
-
-   _mesa_choose_texture_format(ctx, texObj, texImage, target, level,
-                               internalFormat, GL_NONE, GL_NONE);
+   /* The texture's format was already chosen in _mesa_CopyTexImage() */
+   ASSERT(texImage->TexFormat != MESA_FORMAT_NONE);

    /*
     * Store texture data (with pixel transfer ops)
@@ -2690,7 +2686,8 @@ _mesa_meta_CopyTexImage2D(struct gl_context *ctx, GLenum target, GLint level,
  * Have to be careful with locking and meta state for pixel transfer.
  */
 static void
-copy_tex_sub_image(struct gl_context *ctx, GLuint dims, GLenum target, GLint level,
+copy_tex_sub_image(struct gl_context *ctx,
+                   GLuint dims, GLenum target, GLint level,
                    GLint xoffset, GLint yoffset, GLint zoffset,
                    GLint x, GLint y,
                    GLsizei width, GLsizei height)
@@ -2704,6 +2701,7 @@ copy_tex_sub_image(struct gl_context *ctx, GLuint dims, GLenum target, GLint lev
    texObj = _mesa_get_current_tex_object(ctx, target);
    texImage = _mesa_select_tex_image(ctx, texObj, target, level);

+   /* Choose format/type for temporary image buffer */
    format = _mesa_get_format_base_format(texImage->TexFormat);
    type = get_temp_image_type(ctx, format);
    bpp = _mesa_bytes_per_pixel(format, type);
diff --git a/src/mesa/drivers/dri/intel/intel_tex_image.c b/src/mesa/drivers/dri/intel/intel_tex_image.c
index 50fe9bd..0a50be9 100644
--- a/src/mesa/drivers/dri/intel/intel_tex_image.c
+++ b/src/mesa/drivers/dri/intel/intel_tex_image.c
@@ -682,6 +682,7 @@ intelSetTexBuffer2(__DRIcontext *pDRICtx, GLint target,
    struct gl_texture_object *texObj;
    struct gl_texture_image *texImage;
    int level = 0, internalFormat;
+   gl_format texFormat;

    texObj = _mesa_get_current_tex_object(ctx, target);
    intelObj = intel_texture_object(texObj);
@@ -724,16 +725,18 @@ intelSetTexBuffer2(__DRIcontext *pDRICtx, GLint target,
    intel_miptree_release(intel, &intelObj->mt);
    intelObj->mt = mt;

+   if (texture_format == __DRI_TEXTURE_FORMAT_RGB)
+      texFormat = MESA_FORMAT_XRGB;
+   else
+      texFormat = MESA_FORMAT_ARGB;
+
    _mesa_init_teximage_fields(&intel->ctx, target, texImage,
                               rb->region->width, rb->region->height, 1,
-                              0, internalFormat);
+                              0, internalFormat, texFormat);

    intelImage->face = target_to_face(target);
    intelImage->level = level;
-   if (texture_format == __DRI_TEXTURE_FORMAT_RGB)
-      texImage->TexFormat = MESA_FORMAT_XRGB;
-   else
-      texImage->TexFormat = MESA_FORMAT_ARGB;
    texImage->RowStride = rb->region->pitch;
    intel_miptree_reference(&intelImage->mt, intelObj->mt);
@@ -789,11 +792,10 @@ intel_image_target_texture_2d(struct gl_context *ctx, GLenum target,
    intelObj->mt = mt;
    _mesa_init_teximage_fields(&intel->ctx, target, texImage,
                               image->region->width, image->region->height, 1,
-                              0, image->internal_format);
+                              0, image->internal_format, image->format);

    intelImage->face = target_to_face(target);
    intelImage->level = 0;
-   texImage->TexFormat = image->format;
    texImage->RowStride = image->region->pitch;
    intel_miptree_reference(&intelImage->mt, intelObj->mt);
diff --git a/src/mesa/drivers/dri/nouveau/nouveau_texture.c b/src/mesa/drivers/dri/nouveau/no
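The shape of this refactor can be sketched in isolation. The following is a toy, self-contained illustration (the types, enum values, and function names are simplified stand-ins invented for this sketch, not Mesa's real API) of why passing the chosen format into the init call, instead of patching `TexFormat` in afterwards, means an image can never be left with an unset format:

```c
#include <assert.h>

/* Simplified stand-ins for Mesa's texture types (illustrative only). */
typedef enum {
   FMT_NONE = 0,   /* analogous to MESA_FORMAT_NONE */
   FMT_XRGB,
   FMT_ARGB
} tex_format;

struct tex_image {
   int width, height, depth;
   int border;
   int internal_format;
   tex_format format;   /* always set by init, never left at FMT_NONE */
};

/* After the refactor: the chosen format is a parameter of the init call,
 * so every caller is forced to supply one. */
static void
init_teximage_fields(struct tex_image *img,
                     int width, int height, int depth,
                     int border, int internal_format,
                     tex_format format)
{
   img->width = width;
   img->height = height;
   img->depth = depth;
   img->border = border;
   img->internal_format = internal_format;
   img->format = format;   /* can no longer be forgotten by a caller */
}

/* Callers choose the format first, then hand it to init. */
static tex_format
choose_format(int has_alpha)
{
   return has_alpha ? FMT_ARGB : FMT_XRGB;
}
```

In the old scheme the two steps were separate calls, and a driver path that skipped the second one silently shipped an image with format `NONE`; folding the format into the init signature turns that bug class into a compile error.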
Re: [Mesa-dev] Path to optimize (moving from create/bind/delete paradigm to set only?)
Hi,

On Tuesday, November 16, 2010 20:21:26 Jerome Glisse wrote:
> So I looked a bit more at what path we should try to optimize in the
> mesa/gallium/pipe infrastructure. Here are some numbers gathered from
> games:
>
>              drawcall / ps constant   vs constant   ps sampler   vs sampler
> doom3        1.45                     1.39          9.24         9.86
> nexuiz       6.27                     5.98          6.84         7.30
> openarena    2805.64                  1.38          1.51         1.54
[...]

Just another observation: I was doing some profiling on OpenSceneGraph based applications, one of which is the plain osgviewer with numerous models, and one of which is FlightGear. The drivers and hardware I have are a FireGL 73??? (R520, r300g) and an HD4890 (RV770, r600g). Testing is done on the same CPU board.

One comparison is the draw time in osgviewer. Those who know this application might remember the profiling graph, where you can see how long and when cull, draw and, if available, GPU rendering happen. In this case I was looking at the draw times: the time from the first state change in a frame to the last draw in a frame, *excluding* the buffer swap/sync/flush and whatever else serializes program execution with the GPU.

Comparing these osgviewer draw times against fglrx with my favourite test model (fixed function), which is fairly representative of usage in FlightGear:

R520, fglrx   ~0.7ms
r300g, git    ~1.6ms

The profiling picture in my head is that r300g still spends a significant amount of CPU time in current state-attribute handling, which too often loops over all possible state attributes. BTW: that was much worse before Francesco's last copy-to-current patches. r300g also spends much time in the draw path in Mesa, where every draw loops over all 32 state attributes. Some proof-of-concept work on these code paths improved the draw times to 1.2ms on r300g. The next CPU hog for r300g is the kernel side of the command stream parser.
I would expect that something making use of pre-evaluated and validated command stream snippets in the kernel, held for each of the driver's state objects and simply referenced from the executed command stream, would help a lot here. Something along the lines of recording command stream macros/substreams that are just jumped into when executing the user-level command stream. I believe Jerome gave a talk about something very similar at this year's FOSDEM.

Translating those performance numbers from an example application to a more real-world one like FlightGear gives a framerate of ~85 frames for fglrx and ~60 with current Mesa. With the proof-of-concept stuff I already saw 65-70 on r300g.

Now the picture for r600g:

RV770, fglrx   ~0.8ms
r600g, git     5-7ms

As you can see, fglrx is still about the same, but r600g is far off. Also, with r600g I can see the driver spending about as much time in kernel-side parsing and validation as in the r600g backend code. I do not remember the FlightGear framerates for RV770/fglrx, but I believe they were comparable to the R520 ones; with r600g I still see just about 20-30 frames. Fiddling with the proof-of-concept stuff does not show up in r600g in a noticeable way, since that driver is dominated by its own backend CPU cycles.

So I cannot contribute to the discussion of which state objects are most heavily used, but looking at the above, r300g is already at a stage where it makes a lot of sense to improve some hot paths in Mesa's top layer; the r300g userspace backend code is visible but not high in profiles. r600g, however, using the same mesa/gallium infrastructure above it, spends many CPU cycles in its own userspace as well as in the parser/validator code. Which makes me wonder: what fundamental difference between these two backends accounts for this?
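The "loop over all 32 state attributes on every draw" hot path described above is a classic candidate for a dirty-bitmask walk: track which attributes actually changed and visit only those. A minimal self-contained sketch of the idea (all names invented for illustration, not the actual Mesa code):

```c
#include <assert.h>
#include <stdint.h>

#define NUM_ATTRIBS 32

struct state_ctx {
   uint32_t dirty;          /* one bit per state attribute */
   int validate_count;      /* how many attributes we actually touched */
   unsigned last_attrib;    /* most recently validated attribute */
};

static void
mark_dirty(struct state_ctx *ctx, unsigned attrib)
{
   ctx->dirty |= 1u << attrib;
}

/* Validate only the dirty attributes, clearing bits as we go.  With only
 * a few attributes dirty per draw, this replaces a fixed 32-iteration
 * loop with a handful of iterations. */
static void
validate_state(struct state_ctx *ctx)
{
   uint32_t dirty = ctx->dirty;
   while (dirty) {
      unsigned attrib = (unsigned)__builtin_ctz(dirty); /* lowest set bit */
      /* ... emit/validate attribute 'attrib' here ... */
      ctx->last_attrib = attrib;
      ctx->validate_count++;
      dirty &= dirty - 1;   /* clear the bit we just handled */
   }
   ctx->dirty = 0;
}
```

`__builtin_ctz` is a GCC/Clang builtin; a portable fallback would shift-and-test instead. The point is only that the per-draw cost scales with the number of *changed* attributes, not the total count.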
Just my 2 cents,
Mathias

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] Path to optimize (moving from create/bind/delete paradigm to set only?)
Zack Rusin [2010-11-16 21:15]:
> Anyway what I think is of no consequence, what matters is what you can prove.
> It'd be trivial to see:
> 1) what exactly changes that caching fails,

Maybe I'm totally missing the point, but:

* In OpenArena (running a random demo), context.create_sampler_state is called 10 times, i.e. we only create 10 sampler states.
* context.bind_fragment_sampler_states is called ~64000 times.

Caching of pipe_sampler_states seems to work here.

Regards,
Tilman

--
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?
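The reason create calls stay so rare relative to bind calls is state-object caching: identical state templates are deduplicated into a single immutable object, so only genuinely new templates trigger a create. A toy sketch of that deduplication (a linear scan standing in for the real hashing; every name here is invented for illustration, not Gallium's actual implementation):

```c
#include <assert.h>
#include <string.h>

/* A simplified sampler-state template; real ones have many more fields. */
struct sampler_template { int wrap; int min_filter; int mag_filter; };
struct sampler_state    { struct sampler_template tmpl; };

#define CACHE_SIZE 32

static struct sampler_state cache[CACHE_SIZE];
static int cache_len;
static int create_calls;   /* how often we really created a new object */

/* Return a cached object for this template, creating one only on a miss. */
static struct sampler_state *
get_sampler_state(const struct sampler_template *tmpl)
{
   for (int i = 0; i < cache_len; i++) {
      if (memcmp(&cache[i].tmpl, tmpl, sizeof *tmpl) == 0)
         return &cache[i];          /* cache hit: reuse, no new object */
   }
   /* cache miss: actually create (and count) a new state object */
   cache[cache_len].tmpl = *tmpl;
   create_calls++;
   return &cache[cache_len++];
}
```

With a workload like the OpenArena trace above, the create path runs a handful of times while the cheap bind path runs tens of thousands of times, which is exactly the cost split the create/bind/delete design is meant to produce.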