Re: [Mesa-dev] [PATCH] swrast: Add glBlitFramebuffer to commands affected by conditional rendering
Anuj Phogat writes: > + /* Page 679 of OpenGL 4.4 spec says: > +*"Added BlitFramebuffer to commands affected by conditional > rendering in > +* section 10.10 (Bug 9562)." > +*/ > + if (!_mesa_check_conditional_render(ctx)) > + return; /* Do not blit */ > + > if (!_mesa_clip_blit(ctx, &srcX0, &srcY0, &srcX1, &srcY1, > &dstX0, &dstY0, &dstX1, &dstY1)) { >return; After I picked this to the 10.1 branch and tested I found the following piglit regression: $ ~/src/piglit/bin/nv_conditional_render-blitframebuffer -auto Probe color at (0,16) Expected: 0.00 1.00 0.00 0.00 Observed: 1.00 0.00 0.00 0.00 PIGLIT: {'result': 'fail' } So I'm moving this patch off of the "applied" queue and onto a new "rejected" queue as can be seen here: http://cworth.org/~cworth/mesa-stable-queue/ Feel free to follow up with any additional information that would be useful, (such as an updated patch or a change to the relevant piglit test). -Carl -- carl.d.wo...@intel.com pgpns8sazsbCO.pgp Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] swrast: Add glBlitFramebuffer to commands affected by conditional rendering
Carl Worth writes: > After I picked this to the 10.1 branch and tested I found the following > piglit regression: ... > So I'm moving this patch off of the "applied" queue and onto a new > "rejected" queue as can be seen here: Actually, I replied to the wrong email here. The patch I bisected to was the i965 patch with the same description. But now I'm wondering if the swrast patch has the same problem (and whether my testing is adequate...). -Carl -- carl.d.wo...@intel.com pgp4kF1D6NVJr.pgp Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 0/5] util: Rework endian handling in python code
Ping Richard Sandiford writes: > Ping (with fixed subject) > > Richard Sandiford writes: >> This is a refresh of: >> >>http://lists.freedesktop.org/archives/mesa-dev/2013-June/040594.html >> >> At the moment the python code uses sys.byteorder to decide whether >> u_format_table.c should be for big or little endian. With this series >> it instead generates both forms, using blocks like: >> >> #ifdef PIPE_ARCH_BIG_ENDIAN >> ... >> #else >> ... >> #endif >> >> in cases where endianness matters. >> >> Doing it this way is more cross-compiler-friendly. It also means people >> working on LE systems can see what the differences would be for BE. >> >> Tested on x86_64 and z. I don't have commit access so please apply if OK. >> >> Thanks, >> Richard >> >> ___ >> mesa-dev mailing list >> mesa-dev@lists.freedesktop.org >> http://lists.freedesktop.org/mailman/listinfo/mesa-dev > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 3/6] mesa: Add core support for the GL_INTEL_performance_query extension.
Like AMD_performance_monitor, this extension provides an interface for applications (and OpenGL-based tools) to access GPU performance counters. Since the exact performance counters available vary between vendors and hardware generations, the extension provides an API the application can use to get the names, types, and minimum/maximum values of all available counters. Applications create performance queries based on available query types, and begin/end measurement collection. Multiple queries can be measuring simultaneously. v2: Whitespace changes v3: src/mapi/glapi/gen/gl_API.xml: Also expose the functions to GLES2. v4: Whitespace changes, static_dispatch="false" for all functions, fix dispatch_sanity test for GLES2 functions Signed-off-by: Petri Latvala Reviewed-by: Ian Romanick --- src/mapi/glapi/gen/INTEL_performance_query.xml | 93 +++ src/mapi/glapi/gen/Makefile.am | 1 + src/mapi/glapi/gen/gl_API.xml | 2 + src/mesa/main/config.h | 8 + src/mesa/main/extensions.c | 1 + src/mesa/main/get.c| 1 + src/mesa/main/get_hash_params.py | 6 + src/mesa/main/mtypes.h | 1 + src/mesa/main/performance_monitor.c| 219 + src/mesa/main/performance_monitor.h| 43 - src/mesa/main/tests/dispatch_sanity.cpp| 24 +++ 11 files changed, 398 insertions(+), 1 deletion(-) create mode 100644 src/mapi/glapi/gen/INTEL_performance_query.xml diff --git a/src/mapi/glapi/gen/INTEL_performance_query.xml b/src/mapi/glapi/gen/INTEL_performance_query.xml new file mode 100644 index 000..25cd181 --- /dev/null +++ b/src/mapi/glapi/gen/INTEL_performance_query.xml @@ -0,0 +1,93 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/src/mapi/glapi/gen/Makefile.am b/src/mapi/glapi/gen/Makefile.am index 6b932e7..409d356 100644 --- a/src/mapi/glapi/gen/Makefile.am +++ b/src/mapi/glapi/gen/Makefile.am @@ -171,6 +171,7 @@ API_XML = \ EXT_texture_array.xml \ EXT_texture_integer.xml \ EXT_transform_feedback.xml \ + INTEL_performance_query.xml \ KHR_debug.xml \ NV_conditional_render.xml \ NV_primitive_restart.xml \ diff --git a/src/mapi/glapi/gen/gl_API.xml b/src/mapi/glapi/gen/gl_API.xml index 9200cd6..71b39a8 100644 --- a/src/mapi/glapi/gen/gl_API.xml +++ b/src/mapi/glapi/gen/gl_API.xml @@ -12808,6 +12808,8 @@ +http://www.w3.org/2001/XInclude"/> + diff --git a/src/mesa/main/config.h b/src/mesa/main/config.h index 30da5d4..c96502a 100644 --- a/src/mesa/main/config.h +++ b/src/mesa/main/config.h @@ -281,6 +281,14 @@ #define MAX_VERTEX_STREAMS 4 /*@}*/ +/** For GL_INTEL_performance_query */ +/*@{*/ +#define MAX_PERFQUERY_QUERY_NAME_LENGTH 256 +#define MAX_PERFQUERY_COUNTER_NAME_LENGTH 256 +#define MAX_PERFQUERY_COUNTER_DESC_LENGTH 1024 +#define PERFQUERY_HAVE_GPA_EXTENDED_COUNTERS 0 +/*@}*/ + /* * Color channel component order * diff --git a/src/mesa/main/extensions.c b/src/mesa/main/extensions.c index a72284c..6c1c033 100644 --- a/src/mesa/main/extensions.c +++ b/src/mesa/main/extensions.c @@ -330,6 +330,7 @@ static const struct extension extension_table[] = { { "GL_IBM_rasterpos_clip", o(dummy_true), GLL,1996 }, { "GL_IBM_texture_mirrored_repeat", o(dummy_true), GLL,1998 }, { "GL_INGR_blend_func_separate",o(EXT_blend_func_separate), GLL,1999 }, + { "GL_INTEL_performance_query", o(INTEL_performance_query), GL | ES2, 2013 }, { "GL_MESA_pack_invert",o(MESA_pack_invert), GL, 2002 }, { "GL_MESA_texture_signed_rgba",o(EXT_texture_snorm), GL, 2009 }, { "GL_MESA_window_pos", o(dummy_true), GLL,2000 }, diff --git a/src/mesa/main/get.c b/src/mesa/main/get.c index 6d95790..1897e8d 100644 --- a/src/mesa/main/get.c +++ b/src/mesa/main/get.c @@ -395,6 +395,7 @@ EXTRA_EXT(ARB_viewport_array); EXTRA_EXT(ARB_compute_shader); EXTRA_EXT(ARB_gpu_shader5); EXTRA_EXT2(ARB_transform_feedback3, ARB_gpu_shader5); +EXTRA_EXT(INTEL_performance_query); static const int extra_ARB_color_buffer_float_or_glcore[] = { diff --git a/src/mesa/main/get_hash_params.py b/src/mesa/main/g
[Mesa-dev] [PATCH 5/6] i965: Enable INTEL_performance_query for Gen5+.
Signed-off-by: Petri Latvala --- src/mesa/drivers/dri/i965/intel_extensions.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c b/src/mesa/drivers/dri/i965/intel_extensions.c index 892a048..d6e1494 100644 --- a/src/mesa/drivers/dri/i965/intel_extensions.c +++ b/src/mesa/drivers/dri/i965/intel_extensions.c @@ -311,8 +311,10 @@ intelInitExtensions(struct gl_context *ctx) ctx->Extensions.ARB_stencil_texturing = true; } - if (brw->gen == 5 || can_write_oacontrol(brw)) + if (brw->gen == 5 || can_write_oacontrol(brw)) { ctx->Extensions.AMD_performance_monitor = true; + ctx->Extensions.INTEL_performance_query = true; + } if (ctx->API == API_OPENGL_CORE) ctx->Extensions.ARB_base_instance = true; -- 1.9.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/6] mesa: Add INTEL_performance_query enums to tests/enum_strings.cpp
Signed-off-by: Petri Latvala Reviewed-by: Ian Romanick --- src/mesa/main/tests/enum_strings.cpp | 18 ++ 1 file changed, 18 insertions(+) diff --git a/src/mesa/main/tests/enum_strings.cpp b/src/mesa/main/tests/enum_strings.cpp index 3795700..d16eb36 100644 --- a/src/mesa/main/tests/enum_strings.cpp +++ b/src/mesa/main/tests/enum_strings.cpp @@ -807,6 +807,9 @@ const struct enum_info everything[] = { { 0x83F1, "GL_COMPRESSED_RGBA_S3TC_DXT1_EXT" }, { 0x83F2, "GL_COMPRESSED_RGBA_S3TC_DXT3_ANGLE" }, { 0x83F3, "GL_COMPRESSED_RGBA_S3TC_DXT5_ANGLE" }, + { 0x83F9, "GL_PERFQUERY_DONOT_FLUSH_INTEL" }, + { 0x83FA, "GL_PERFQUERY_FLUSH_INTEL" }, + { 0x83FB, "GL_PERFQUERY_WAIT_INTEL" }, { 0x844D, "GL_NEAREST_CLIPMAP_NEAREST_SGIX" }, { 0x844E, "GL_NEAREST_CLIPMAP_LINEAR_SGIX" }, { 0x844F, "GL_LINEAR_CLIPMAP_NEAREST_SGIX" }, @@ -1843,6 +1846,21 @@ const struct enum_info everything[] = { { 0x9271, "GL_COMPRESSED_SIGNED_R11_EAC" }, { 0x9272, "GL_COMPRESSED_RG11_EAC" }, { 0x9273, "GL_COMPRESSED_SIGNED_RG11_EAC" }, + { 0x94F0, "GL_PERFQUERY_COUNTER_EVENT_INTEL" }, + { 0x94F1, "GL_PERFQUERY_COUNTER_DURATION_NORM_INTEL" }, + { 0x94F2, "GL_PERFQUERY_COUNTER_DURATION_RAW_INTEL" }, + { 0x94F3, "GL_PERFQUERY_COUNTER_THROUGHPUT_INTEL" }, + { 0x94F4, "GL_PERFQUERY_COUNTER_RAW_INTEL" }, + { 0x94F5, "GL_PERFQUERY_COUNTER_TIMESTAMP_INTEL" }, + { 0x94F8, "GL_PERFQUERY_COUNTER_DATA_UINT32_INTEL" }, + { 0x94F9, "GL_PERFQUERY_COUNTER_DATA_UINT64_INTEL" }, + { 0x94FA, "GL_PERFQUERY_COUNTER_DATA_FLOAT_INTEL" }, + { 0x94FB, "GL_PERFQUERY_COUNTER_DATA_DOUBLE_INTEL" }, + { 0x94FC, "GL_PERFQUERY_COUNTER_DATA_BOOL32_INTEL" }, + { 0x94FD, "GL_PERFQUERY_QUERY_NAME_LENGTH_MAX_INTEL" }, + { 0x94FE, "GL_PERFQUERY_COUNTER_NAME_LENGTH_MAX_INTEL" }, + { 0x94FF, "GL_PERFQUERY_COUNTER_DESC_LENGTH_MAX_INTEL" }, + { 0x9500, "GL_PERFQUERY_GPA_EXTENDED_COUNTERS_INTEL" }, { 0x19262, "GL_RASTER_POSITION_UNCLIPPED_IBM" }, { 0, NULL } }; -- 1.9.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 6/6] docs: update 10.2 release notes
Signed-off-by: Petri Latvala --- docs/relnotes/10.2.html | 1 + 1 file changed, 1 insertion(+) diff --git a/docs/relnotes/10.2.html b/docs/relnotes/10.2.html index d7d557b..473739c 100644 --- a/docs/relnotes/10.2.html +++ b/docs/relnotes/10.2.html @@ -47,6 +47,7 @@ Note: some of the new features are only available with certain drivers. GL_ARB_buffer_storage on i965, r300, r600, and radeonsi GL_ARB_stencil_texturing on i965/gen8+ GL_ARB_texture_view on i965/gen7 +GL_INTEL_performance_query on i965/gen5+ -- 1.9.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 0/6] v3: Implement INTEL_performance_query
Third revision of the patch series. Changes: - Rebased to current master - Changes based on Ian's review - Add the extension to 10.2 release notes I didn't change patch 5/6 "Enable INTEL_performance_query for Gen5+" along the review comments yet. It's true that currently drivers can support both by just implementing the current driver hooks, but I'm worried that the situation might change with the upcoming changes to the driver functions to support semantic types and normalized counters. I have no concrete examples that would make that happen, I might just be too paranoid and wary. Summarum, there's more changes for this extension incoming, and that sort of cleanup can be part of it if so desired. I will need someone to push this btw. Petri Latvala (6): Regenerate gl_mangle.h. mesa: Add INTEL_performance_query enums to tests/enum_strings.cpp mesa: Add core support for the GL_INTEL_performance_query extension. mesa: Implement INTEL_performance_query. i965: Enable INTEL_performance_query for Gen5+. docs: update 10.2 release notes docs/relnotes/10.2.html| 1 + include/GL/gl_mangle.h | 371 ++- src/mapi/glapi/gen/INTEL_performance_query.xml | 93 src/mapi/glapi/gen/Makefile.am | 1 + src/mapi/glapi/gen/gl_API.xml | 2 + src/mesa/drivers/dri/i965/intel_extensions.c | 4 +- src/mesa/main/config.h | 8 + src/mesa/main/extensions.c | 1 + src/mesa/main/get.c| 1 + src/mesa/main/get_hash_params.py | 6 + src/mesa/main/mtypes.h | 1 + src/mesa/main/performance_monitor.c| 614 + src/mesa/main/performance_monitor.h| 43 +- src/mesa/main/tests/dispatch_sanity.cpp| 24 + src/mesa/main/tests/enum_strings.cpp | 18 + 15 files changed, 1180 insertions(+), 8 deletions(-) create mode 100644 src/mapi/glapi/gen/INTEL_performance_query.xml -- 1.9.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 4/6] mesa: Implement INTEL_performance_query.
Using the existing driver hooks made for AMD_performance_monitor, implement INTEL_performance_query functions. v2: Whitespace changes. v3: Whitespace changes, add a _mesa_warning() Signed-off-by: Petri Latvala Reviewed-by: Ian Romanick --- src/mesa/main/performance_monitor.c | 487 1 file changed, 441 insertions(+), 46 deletions(-) diff --git a/src/mesa/main/performance_monitor.c b/src/mesa/main/performance_monitor.c index 597f633..21b9423 100644 --- a/src/mesa/main/performance_monitor.c +++ b/src/mesa/main/performance_monitor.c @@ -137,6 +137,46 @@ get_counter(const struct gl_perf_monitor_group *group_obj, GLuint id) return &group_obj->Counters[id]; } +/* For INTEL_performance_query, query id 0 is reserved to be invalid. We use + * index to Groups array + 1 as the query id. Same applies to counter id. + */ +static inline GLuint +queryid_to_index(GLuint queryid) +{ + return queryid - 1; +} + +static inline GLuint +index_to_queryid(GLuint index) +{ + return index + 1; +} + +static inline bool +queryid_valid(const struct gl_context *ctx, GLuint queryid) +{ + return get_group(ctx, queryid_to_index(queryid)) != NULL; +} + +static inline GLuint +counterid_to_index(GLuint counterid) +{ + return counterid - 1; +} + +static inline GLuint +index_to_counterid(GLuint index) +{ + return index + 1; +} + +static inline bool +counterid_valid(const struct gl_perf_monitor_group *group_obj, +GLuint counterid) +{ + return get_counter(group_obj, counterid_to_index(counterid)) != NULL; +} + /*/ void GLAPIENTRY @@ -644,6 +684,7 @@ extern void GLAPIENTRY _mesa_GetFirstPerfQueryIdINTEL(GLuint *queryId) { GET_CURRENT_CONTEXT(ctx); + unsigned numGroups; /* The GL_INTEL_performance_query spec says: * @@ -655,16 +696,22 @@ _mesa_GetFirstPerfQueryIdINTEL(GLuint *queryId) return; } + numGroups = ctx->PerfMonitor.NumGroups; + /* The GL_INTEL_performance_query spec says: * *"If the given hardware platform doesn't support any performance *queries, then the value of 0 is returned and INVALID_OPERATION error *is raised." */ + if (numGroups == 0) { + *queryId = 0; + _mesa_error(ctx, GL_INVALID_OPERATION, + "glGetFirstPerfQueryIdINTEL(no queries supported)"); + return; + } - *queryId = 0; - _mesa_error(ctx, GL_INVALID_OPERATION, - "glGetFirstPerfQueryIdINTEL(no queries supported)"); + *queryId = index_to_queryid(0); } extern void GLAPIENTRY @@ -674,40 +721,66 @@ _mesa_GetNextPerfQueryIdINTEL(GLuint queryId, GLuint *nextQueryId) /* The GL_INTEL_performance_query spec says: * -*"If nextQueryId pointer is equal to 0, an INVALID_VALUE error is -*generated." +*"The result is passed in location pointed by nextQueryId. If query +*identified by queryId is the last query available the value of 0 is +*returned. If the specified performance query identifier is invalid +*then INVALID_VALUE error is generated. If nextQueryId pointer is +*equal to 0, an INVALID_VALUE error is generated. Whenever error is +*generated, the value of 0 is returned." */ + if (!nextQueryId) { _mesa_error(ctx, GL_INVALID_VALUE, "glGetNextPerfQueryIdINTEL(nextQueryId == NULL)"); return; } - /* The GL_INTEL_performance_query spec says: -* -*"If the specified performance query identifier is invalid then -*INVALID_VALUE error is generated. Whenever error is generated, the -*value of 0 is returned." -* -* No queries are supported, so all queries are invalid. -*/ - *nextQueryId = 0; - _mesa_error(ctx, GL_INVALID_VALUE, - "glGetNextPerfQueryIdINTEL(invalid query)"); + if (!queryid_valid(ctx, queryId)) { + *nextQueryId = 0; + _mesa_error(ctx, GL_INVALID_VALUE, + "glGetNextPerfQueryIdINTEL(invalid query)"); + return; + } + + ++queryId; + + if (!queryid_valid(ctx, queryId)) { + *nextQueryId = 0; + } else { + *nextQueryId = queryId; + } } extern void GLAPIENTRY _mesa_GetPerfQueryIdByNameINTEL(char *queryName, GLuint *queryId) { GET_CURRENT_CONTEXT(ctx); + unsigned i; /* The GL_INTEL_performance_query spec says: * *"If queryName does not reference a valid query name, an INVALID_VALUE *error is generated." -* -* No queries are supported, so all query names are invalid. */ + if (!queryName) { + _mesa_error(ctx, GL_INVALID_VALUE, + "glGetPerfQueryIdByNameINTEL(queryName == NULL)"); + return; + } + + /* The specification does not state that this produces an error. */ + if (!queryId) { + _mesa_warning(ctx, "glGetPerfQueryIdByNameINTEL(queryId == NULL)"); + return; + } + +
[Mesa-dev] Mixing Pixel Shaders and Compute Shaders
In trying to implement Image support in Clover, I have discovered that the existing CL image related calls result in the generation of Pixel Shader sequences for copies of images to and from the GPU. I initially thought that this would be fine, and was able to implement image read tests that use clEnqueueWriteImage() to get an image into a kernel. The clEnqueueWriteImage(), through the routines in clover/api/transfer.cpp generates a Pixel shader which copies the image to the GPU. The Compute Shader then picks the image up from where the Pixel Shader left it. I had some issues initially with mixing the Pixel and Compute Shaders, until I added a PS_PARTIAL_FLUSH event along with the CS_PARTIAL_FLUSH event at the start of evergreen_init_atom_start_compute_cs(). I think this helped because it made the Pixel Shader Execute before the Compute Shader (not entirely sure?) When I try to call clEnqueueReadImage(), after a clEnqueueNDRangeKernel(); the clover/aop/transfer.cpp again generates a Pixel Shader, which gets integrated into the command stream after the Compute Shader entries (so I send up with: Command Sequence 1 [Initial Configuration] 2 [PixelShader] 3 [ComputeShader] 4 [PixelShader] 5 [Final Configuration/Cleanup/Wait] The problem is, now I am encountering GPU Lockup CP Stalls at the end of 'section 4' and the start of 'section 5' I am not sure I entirely understand why this is happening, but I know it has to do with the fact that the Pixel Shader is in the command stream after the Compute Shader commands. I'm assuming something in how the flushes are configured for the Pixel Shader are not waiting for the Compute Shader to complete before executing, but again, I'm not entirely sure. I figure there are two possible approaches to resolving this: 1. Figure out the right way to get the Compute Shader and Pixel Shader to interact properly 2. Do away with the need for the Pixel Shader by doing the image transfer entirely within the Compute Shader context. (Probably a lot of driver code to replace the existing routines that use the vbo and blitter draw routines?) >From reviewing the R600/Evergreen register documentation, I see that the >CB_COLOR#_INFO registers have a RAT bit (bit 26 in GPU registers >0x28c70-0x28ea4) I also found that if this flag is set, that the surface is treated as a RAT and can only be manipulated by Compute Shader operations. (Which I suppose is the cause of the conflict between the Pixel Shader and Compute Shader trying to manipulate the same Color buffer/Texture.) My biggest issue with this, is I have not really found any documentation that describes how you are supposed to transfer buffers/textures within a compute shader, so I feel like I am missing something that might be a very basic foundation for understanding these routines, which is resulting in my overcomplicating the concepts and confusing myself... If anyone is familiar with this area and is willing to provide some more insight, I would greatly appreciate it. While our team's goal is to implement OpenCL capability in an alternate operating system, my hope is that once I understand all of this and get it working in that environment, I will be able to contribute back Clover image support to the main Mesa baseline. Thanks, Al Dorrington Software Engineer Sr Lockheed Martin, Mission Systems and Training ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] Mixing Pixel Shaders and Compute Shaders
On Wed, Apr 23, 2014 at 01:27:11PM +, Dorrington, Albert wrote: > In trying to implement Image support in Clover, I have discovered that the > existing CL image related calls result in the generation of Pixel Shader > sequences for copies of images to and from the GPU. > > I initially thought that this would be fine, and was able to implement image > read tests that use clEnqueueWriteImage() to get an image into a kernel. > The clEnqueueWriteImage(), through the routines in clover/api/transfer.cpp > generates a Pixel shader which copies the image to the GPU. > The Compute Shader then picks the image up from where the Pixel Shader left > it. > > I had some issues initially with mixing the Pixel and Compute Shaders, until > I added a PS_PARTIAL_FLUSH event along with the CS_PARTIAL_FLUSH event at the > start of evergreen_init_atom_start_compute_cs(). I think this helped because > it made the Pixel Shader Execute before the Compute Shader (not entirely > sure?) > PS_PARTIAL_FLUSH guarantees all pixel shader have completed before the next packet is processed, so we should be using this any place where we may need the result of a pixel shader. > When I try to call clEnqueueReadImage(), after a clEnqueueNDRangeKernel(); > the clover/aop/transfer.cpp again generates a Pixel Shader, which gets > integrated into the command stream after the Compute Shader entries (so I > send up with: > > Command Sequence > 1 [Initial Configuration] > 2 [PixelShader] Need PS_PARTIAL_FLUSH here > 3 [ComputeShader] Need CS_PARTIAL_FLUSH here > 4 [PixelShader] Need PS_PARTIAL_FLUSH here > 5 [Final Configuration/Cleanup/Wait] > You also may need to flush the various caches after the pixel shader and compute shaders have completed. See r600_flush_emit in r600_hw_context.c > The problem is, now I am encountering GPU Lockup CP Stalls at the end of > 'section 4' and the start of 'section 5' > I am not sure I entirely understand why this is happening, but I know it has > to do with the fact that the Pixel Shader is in the command stream after the > Compute Shader commands. > I'm assuming something in how the flushes are configured for the Pixel Shader > are not waiting for the Compute Shader to complete before executing, but > again, I'm not entirely sure. > > I figure there are two possible approaches to resolving this: > > 1. Figure out the right way to get the Compute Shader and Pixel Shader > to interact properly > > 2. Do away with the need for the Pixel Shader by doing the image > transfer entirely within the Compute Shader context. (Probably a lot of > driver code to replace the existing routines that use the vbo and blitter > draw routines?) > > From reviewing the R600/Evergreen register documentation, I see that the > CB_COLOR#_INFO registers have a RAT bit (bit 26 in GPU registers > 0x28c70-0x28ea4) > I also found that if this flag is set, that the surface is treated as a RAT > and can only be manipulated by Compute Shader operations. (Which I suppose is > the cause of the conflict between the Pixel Shader and Compute Shader trying > to manipulate the same Color buffer/Texture.) > Does the documentation say that setting the RAT bit means it can only be used by compute shaders, or have you discovered this from your testing? If this is the case, you may have to use a different CB_COLOR# for the image when coying it with a pixel shader. > My biggest issue with this, is I have not really found any documentation that > describes how you are supposed to transfer buffers/textures within a compute > shader, so I feel like I am missing something that might be a very basic > foundation for understanding these routines, which is resulting in my > overcomplicating the concepts and confusing myself... > > If anyone is familiar with this area and is willing to provide some more > insight, I would greatly appreciate it. > > While our team's goal is to implement OpenCL capability in an alternate > operating system, my hope is that once I understand all of this and get it > working in that environment, I will be able to contribute back Clover image > support to the main Mesa baseline. > Looking forward to your contribution. As always the sooner you can post the code the better as that will make it easier to review and may help uncover some of your issues. -Tom > Thanks, > Al Dorrington > Software Engineer Sr > Lockheed Martin, Mission Systems and Training > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/2] util/u_debug: Pass correct size to strncat.
From: José Fonseca Courtesy of Clang static analyzer. --- src/gallium/auxiliary/util/u_debug.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/gallium/auxiliary/util/u_debug.c b/src/gallium/auxiliary/util/u_debug.c index fe51717..dc840e8 100644 --- a/src/gallium/auxiliary/util/u_debug.c +++ b/src/gallium/auxiliary/util/u_debug.c @@ -334,10 +334,10 @@ debug_dump_flags(const struct debug_named_value *names, while(names->name) { if((names->value & value) == names->value) { if (!first) - util_strncat(output, "|", sizeof(output)); + util_strncat(output, "|", sizeof(output) - strlen(output) - 1); else first = 0; -util_strncat(output, names->name, sizeof(output) - 1); +util_strncat(output, names->name, sizeof(output) - strlen(output) - 1); output[sizeof(output) - 1] = '\0'; value &= ~names->value; } @@ -346,12 +346,12 @@ debug_dump_flags(const struct debug_named_value *names, if (value) { if (!first) -util_strncat(output, "|", sizeof(output)); +util_strncat(output, "|", sizeof(output) - strlen(output) - 1); else first = 0; util_snprintf(rest, sizeof(rest), "0x%08lx", value); - util_strncat(output, rest, sizeof(output) - 1); + util_strncat(output, rest, sizeof(output) - strlen(output) - 1); output[sizeof(output) - 1] = '\0'; } -- 1.8.3.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/2] mesa/st: Fix pipe_framebuffer_state::height for PIPE_TEXTURE_1D_ARRAY.
From: José Fonseca This prevents buffer overflow w/ llvmpipe when running piglit bin/gl-3.2-layered-rendering-clear-color-all-types 1d_array single_level -fbo -auto v2: Compute the framebuffer size as the minimum size, as pointed out by Brian; compacted code; ran piglit quick test list (with no regressions.) --- src/mesa/state_tracker/st_atom_framebuffer.c | 33 ++-- 1 file changed, 31 insertions(+), 2 deletions(-) diff --git a/src/mesa/state_tracker/st_atom_framebuffer.c b/src/mesa/state_tracker/st_atom_framebuffer.c index 4c4f839..f395ec7 100644 --- a/src/mesa/state_tracker/st_atom_framebuffer.c +++ b/src/mesa/state_tracker/st_atom_framebuffer.c @@ -31,6 +31,8 @@ * Brian Paul */ +#include + #include "st_context.h" #include "st_atom.h" #include "st_cb_bitmap.h" @@ -44,6 +46,24 @@ /** + * Update framebuffer size. + * + * framebuffer->width should match fb->Weight, but for PIPE_TEXTURE_1D_ARRAY + * textures fb->Height has the number of layers, and not the surface height. + */ +static void +update_framebuffer_size(struct pipe_framebuffer_state *framebuffer, +struct pipe_surface *surface) +{ + assert(surface); + assert(surface->width < UINT_MAX); + assert(surface->height < UINT_MAX); + framebuffer->width = MIN2(framebuffer->width, surface->width); + framebuffer->height = MIN2(framebuffer->height, surface->height); +} + + +/** * Update framebuffer state (color, depth, stencil, etc. buffers) */ static void @@ -57,11 +77,12 @@ update_framebuffer_state( struct st_context *st ) st_flush_bitmap_cache(st); st->state.fb_orientation = st_fb_orientation(fb); - framebuffer->width = fb->Width; - framebuffer->height = fb->Height; /*printf("-- fb size %d x %d\n", fb->Width, fb->Height);*/ + framebuffer->width = UINT_MAX; + framebuffer->height = UINT_MAX; + /* Examine Mesa's ctx->DrawBuffer->_ColorDrawBuffers state * to determine which surfaces to draw to */ @@ -81,6 +102,7 @@ update_framebuffer_state( struct st_context *st ) if (strb->surface) { pipe_surface_reference(&framebuffer->cbufs[i], strb->surface); +update_framebuffer_size(framebuffer, strb->surface); } strb->defined = GL_TRUE; /* we'll be drawing something */ } @@ -100,12 +122,14 @@ update_framebuffer_state( struct st_context *st ) st_update_renderbuffer_surface(st, strb); } pipe_surface_reference(&framebuffer->zsbuf, strb->surface); + update_framebuffer_size(framebuffer, strb->surface); } else { strb = st_renderbuffer(fb->Attachment[BUFFER_STENCIL].Renderbuffer); if (strb) { assert(strb->surface); pipe_surface_reference(&framebuffer->zsbuf, strb->surface); + update_framebuffer_size(framebuffer, strb->surface); } else pipe_surface_reference(&framebuffer->zsbuf, NULL); @@ -122,6 +146,11 @@ update_framebuffer_state( struct st_context *st ) } #endif + /* _mesa_test_framebuffer_completeness refuses framebuffers with no +* attachments, so this should never happen. */ + assert(framebuffer->width != UINT_MAX); + assert(framebuffer->height != UINT_MAX); + cso_set_framebuffer(st->cso_context, framebuffer); } -- 1.8.3.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] swrast: Fix vertex color in _swsetup_Translate()
From: nick Straightforward fix to properly load dest->color with color data, as opposed to position data as previously implemented. [vsyrjala: I noticed the patch languishing in bugzilla. It looks correct to me so I refreshed it for master. Looks like it's a regression introduced in: commit a164d3aee063580503e5e9a77980059d52c486d7 Author: Brian Date: Fri Jul 6 16:51:19 2007 -0600 In _swsetup_Translate(), update dest->attrib[FRAG_ATTRIB_COL0].] Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=27499 Cc: Brian Paul Signed-off-by: Ville Syrjälä --- src/mesa/swrast_setup/ss_context.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/mesa/swrast_setup/ss_context.c b/src/mesa/swrast_setup/ss_context.c index 12a4735..237f74c 100644 --- a/src/mesa/swrast_setup/ss_context.c +++ b/src/mesa/swrast_setup/ss_context.c @@ -287,7 +287,8 @@ _swsetup_Translate( struct gl_context *ctx, const void *vertex, SWvertex *dest ) _tnl_get_attr( ctx, vertex, _TNL_ATTRIB_COLOR0, dest->attrib[VARYING_SLOT_COL0] ); - UNCLAMPED_FLOAT_TO_RGBA_CHAN( dest->color, tmp ); + UNCLAMPED_FLOAT_TO_RGBA_CHAN( dest->color, + dest->attrib[VARYING_SLOT_COL0] ); _tnl_get_attr( ctx, vertex, _TNL_ATTRIB_COLOR1, dest->attrib[VARYING_SLOT_COL1]); -- 1.8.3.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/2] util/u_debug: Pass correct size to strncat.
On 04/23/2014 07:55 AM, jfons...@vmware.com wrote: From: José Fonseca Courtesy of Clang static analyzer. --- src/gallium/auxiliary/util/u_debug.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/gallium/auxiliary/util/u_debug.c b/src/gallium/auxiliary/util/u_debug.c index fe51717..dc840e8 100644 --- a/src/gallium/auxiliary/util/u_debug.c +++ b/src/gallium/auxiliary/util/u_debug.c @@ -334,10 +334,10 @@ debug_dump_flags(const struct debug_named_value *names, while(names->name) { if((names->value & value) == names->value) { if (!first) - util_strncat(output, "|", sizeof(output)); + util_strncat(output, "|", sizeof(output) - strlen(output) - 1); else first = 0; -util_strncat(output, names->name, sizeof(output) - 1); +util_strncat(output, names->name, sizeof(output) - strlen(output) - 1); output[sizeof(output) - 1] = '\0'; value &= ~names->value; } @@ -346,12 +346,12 @@ debug_dump_flags(const struct debug_named_value *names, if (value) { if (!first) -util_strncat(output, "|", sizeof(output)); +util_strncat(output, "|", sizeof(output) - strlen(output) - 1); else first = 0; util_snprintf(rest, sizeof(rest), "0x%08lx", value); - util_strncat(output, rest, sizeof(output) - 1); + util_strncat(output, rest, sizeof(output) - strlen(output) - 1); output[sizeof(output) - 1] = '\0'; } Reviewed-by: Brian Paul ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/2] mesa/st: Fix pipe_framebuffer_state::height for PIPE_TEXTURE_1D_ARRAY.
On 04/23/2014 07:55 AM, jfons...@vmware.com wrote: From: José Fonseca This prevents buffer overflow w/ llvmpipe when running piglit bin/gl-3.2-layered-rendering-clear-color-all-types 1d_array single_level -fbo -auto v2: Compute the framebuffer size as the minimum size, as pointed out by Brian; compacted code; ran piglit quick test list (with no regressions.) --- src/mesa/state_tracker/st_atom_framebuffer.c | 33 ++-- 1 file changed, 31 insertions(+), 2 deletions(-) diff --git a/src/mesa/state_tracker/st_atom_framebuffer.c b/src/mesa/state_tracker/st_atom_framebuffer.c index 4c4f839..f395ec7 100644 --- a/src/mesa/state_tracker/st_atom_framebuffer.c +++ b/src/mesa/state_tracker/st_atom_framebuffer.c @@ -31,6 +31,8 @@ * Brian Paul */ +#include + #include "st_context.h" #include "st_atom.h" #include "st_cb_bitmap.h" @@ -44,6 +46,24 @@ /** + * Update framebuffer size. + * + * framebuffer->width should match fb->Weight, but for PIPE_TEXTURE_1D_ARRAY "fb->Width" + * textures fb->Height has the number of layers, and not the surface height. + */ The comment seems a bit disconnected from the code. update_framebuffer_size() is used to find the size which is the min of the attached surfaces. The comment about 1D array textures doesn't seem to matter in the code. That just seems a little confusing. +static void +update_framebuffer_size(struct pipe_framebuffer_state *framebuffer, +struct pipe_surface *surface) +{ + assert(surface); + assert(surface->width < UINT_MAX); + assert(surface->height < UINT_MAX); + framebuffer->width = MIN2(framebuffer->width, surface->width); + framebuffer->height = MIN2(framebuffer->height, surface->height); +} + + +/** * Update framebuffer state (color, depth, stencil, etc. buffers) */ static void @@ -57,11 +77,12 @@ update_framebuffer_state( struct st_context *st ) st_flush_bitmap_cache(st); st->state.fb_orientation = st_fb_orientation(fb); - framebuffer->width = fb->Width; - framebuffer->height = fb->Height; /*printf("-- fb size %d x %d\n", fb->Width, fb->Height);*/ + framebuffer->width = UINT_MAX; + framebuffer->height = UINT_MAX; + /* Examine Mesa's ctx->DrawBuffer->_ColorDrawBuffers state * to determine which surfaces to draw to */ @@ -81,6 +102,7 @@ update_framebuffer_state( struct st_context *st ) if (strb->surface) { pipe_surface_reference(&framebuffer->cbufs[i], strb->surface); +update_framebuffer_size(framebuffer, strb->surface); } strb->defined = GL_TRUE; /* we'll be drawing something */ } @@ -100,12 +122,14 @@ update_framebuffer_state( struct st_context *st ) st_update_renderbuffer_surface(st, strb); } pipe_surface_reference(&framebuffer->zsbuf, strb->surface); + update_framebuffer_size(framebuffer, strb->surface); } else { strb = st_renderbuffer(fb->Attachment[BUFFER_STENCIL].Renderbuffer); if (strb) { assert(strb->surface); pipe_surface_reference(&framebuffer->zsbuf, strb->surface); + update_framebuffer_size(framebuffer, strb->surface); } else pipe_surface_reference(&framebuffer->zsbuf, NULL); @@ -122,6 +146,11 @@ update_framebuffer_state( struct st_context *st ) } #endif + /* _mesa_test_framebuffer_completeness refuses framebuffers with no +* attachments, so this should never happen. */ Close */ on next line. + assert(framebuffer->width != UINT_MAX); + assert(framebuffer->height != UINT_MAX); + cso_set_framebuffer(st->cso_context, framebuffer); } Otherwise, Reviewed-by: Brian Paul ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] EXTERNAL: Re: Mixing Pixel Shaders and Compute Shaders
> -Original Message- > From: Tom Stellard> > On Wed, Apr 23, 2014 at 01:27:11PM +, Dorrington, Albert wrote: > > When I try to call clEnqueueReadImage(), after a > clEnqueueNDRangeKernel(); the clover/aop/transfer.cpp again generates a > Pixel Shader, which gets integrated into the command stream after the > Compute Shader entries (so I send up with: Command Sequence 1 [Initial Configuration] 2 [PixelShader] Need PS_PARTIAL_FLUSH here <-- this one I think I have with what I added to evergren_init_atom_start_compute_cs() 3 [ComputeShader] Need CS_PARTIAL_FLUSH here 4 [PixelShader] Need PS_PARTIAL_FLUSH here 5 [Final Configuration/Cleanup/Wait] The other two, I've been experimenting trying to add them in various places, but I haevn't seen a change in behavior yet (probably not putting them in the right place yet...) > > You also may need to flush the various caches after the pixel shader and > compute shaders have completed. See r600_flush_emit in > r600_hw_context.c That could be why I'm not seeing any changes yet... I have been looking at r600_flush_emit() this morning and experimenting with its use. > Does the documentation say that setting the RAT bit means it can only be > used by compute shaders, or have you discovered this from your testing? > If this is the case, you may have to use a different CB_COLOR# for the image > when coying it with a pixel shader. The documentation I'm referring to is "Radeon Evergreen/Northern Islands Acceleration" Rev 1.0 Dated May 24, 2011 Section 8 CB Programming Section 8.6 Compute Shader Compute shaders can perform atomic writes ("device reduction operations") to memory via the CB. The order of execution of the operations is not guaranteed, only that they are atomic. These writes can include simple operations (min, max, add, and, or, exchange, compare-exchange) and can optionally return a value (pre-op) back to the shader. The CF_export adds two new opcodes for RAT exports: EXPORT_RAT and EXPORT_RAT_CACHELESS. If CB_COLOR_INFO.RAT is programmed, the surface is treated as a Random Access Target and can only be drawn by Compute Shader operations. A set of MRTs can be configured for RATs and normal rendering. The only stipulation is that all RAT MRTs must be assigned to higher number MRTs than normal rendering MRTs. I take the statement in the 3rd paragraph to mean that if the RAT bit is set, that a CB setup within a Compute Shader will not work in a Pixel Shader. However, looking at the command stream executed, the 'r600_draw_vbo()' function that gets called appears to reconfigure all of the Color buffers, so I don't think this is the issue causing the conflict. I'm getting more convinced that the issue is that the Compute Shader needs to run completely, before the Pixel Shader runs. (presumably the need for the CS_PARTIAL_FLUSH and PS_PARTIAL_FLUSH directives) Frustrating thing is, sometimes DRM can recover from these stalls, other times my box locks up. > > > > Looking forward to your contribution. As always the sooner you can post the > code the better as that will make it easier to review and may help uncover > some of your issues. > I'd love to contribute some of what I've done back soon, unfortunately with our teams choice to use the AMD SDK OpenCL compiler rather than LLVM, I can't easily migrate my changes back to the Mesa baseline. We have quite a few work-arounds (hacks? haha) to setup the RATs and Vertex Buffers the way the AMD compiler needs them. Once the crazy hours at work slowdown, I'll be able to have some more free time at home where i can contribute. :) After I reboot the hung box (yet again) I'll be taking another look to verify if the partial flushes are in the command stream or not. Thanks! ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] swrast: Fix vertex color in _swsetup_Translate()
On 04/23/2014 08:18 AM, ville.syrj...@linux.intel.com wrote: From: nick Straightforward fix to properly load dest->color with color data, as opposed to position data as previously implemented. [vsyrjala: I noticed the patch languishing in bugzilla. It looks correct to me so I refreshed it for master. Looks like it's a regression introduced in: commit a164d3aee063580503e5e9a77980059d52c486d7 Author: Brian Date: Fri Jul 6 16:51:19 2007 -0600 In _swsetup_Translate(), update dest->attrib[FRAG_ATTRIB_COL0].] Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=27499 Cc: Brian Paul Signed-off-by: Ville Syrjälä --- src/mesa/swrast_setup/ss_context.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/mesa/swrast_setup/ss_context.c b/src/mesa/swrast_setup/ss_context.c index 12a4735..237f74c 100644 --- a/src/mesa/swrast_setup/ss_context.c +++ b/src/mesa/swrast_setup/ss_context.c @@ -287,7 +287,8 @@ _swsetup_Translate( struct gl_context *ctx, const void *vertex, SWvertex *dest ) _tnl_get_attr( ctx, vertex, _TNL_ATTRIB_COLOR0, dest->attrib[VARYING_SLOT_COL0] ); - UNCLAMPED_FLOAT_TO_RGBA_CHAN( dest->color, tmp ); + UNCLAMPED_FLOAT_TO_RGBA_CHAN( dest->color, + dest->attrib[VARYING_SLOT_COL0] ); _tnl_get_attr( ctx, vertex, _TNL_ATTRIB_COLOR1, dest->attrib[VARYING_SLOT_COL1]); Reviewed-by: Brian Paul Do you need someone to commit this for you? Tag for stable branches? ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 77789] Account request for Andreas Hartmetz
https://bugs.freedesktop.org/show_bug.cgi?id=77789 Brian Paul changed: What|Removed |Added Assignee|mesa-dev@lists.freedesktop. |sitewranglers@lists.freedes |org |ktop.org Product|Mesa|freedesktop.org Component|Other |New Accounts --- Comment #2 from Brian Paul --- Reassigning to fd.o admins. -- You are receiving this mail because: You are the assignee for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 77749] PRAGMA_EXPORT_SUPPORTED defined incorrectly on OS X
https://bugs.freedesktop.org/show_bug.cgi?id=77749 --- Comment #4 from Brian Paul --- The patch looks OK to me. I'm always happy to get rid of old system-specific crud. But I can't apply the patch here to test it: $ patch -p0 < ~/mesaport.diff patching file include/GL/gl.h patch: malformed patch at line 16: @@ -92,10 +87,6 @@ Do I need a special option for patch? -- You are receiving this mail because: You are the assignee for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] swrast: Fix vertex color in _swsetup_Translate()
On Wed, Apr 23, 2014 at 08:40:22AM -0600, Brian Paul wrote: > On 04/23/2014 08:18 AM, ville.syrj...@linux.intel.com wrote: > > From: nick > > > > Straightforward fix to properly load dest->color with color data, as > > opposed to position data as previously implemented. > > > > [vsyrjala: I noticed the patch languishing in bugzilla. > > It looks correct to me so I refreshed it for master. > > > > Looks like it's a regression introduced in: > >commit a164d3aee063580503e5e9a77980059d52c486d7 > >Author: Brian > >Date: Fri Jul 6 16:51:19 2007 -0600 > > > > In _swsetup_Translate(), update dest->attrib[FRAG_ATTRIB_COL0].] > > > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=27499 > > Cc: Brian Paul > > Signed-off-by: Ville Syrjälä > > --- > > src/mesa/swrast_setup/ss_context.c | 3 ++- > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > diff --git a/src/mesa/swrast_setup/ss_context.c > > b/src/mesa/swrast_setup/ss_context.c > > index 12a4735..237f74c 100644 > > --- a/src/mesa/swrast_setup/ss_context.c > > +++ b/src/mesa/swrast_setup/ss_context.c > > @@ -287,7 +287,8 @@ _swsetup_Translate( struct gl_context *ctx, const void > > *vertex, SWvertex *dest ) > > > > _tnl_get_attr( ctx, vertex, _TNL_ATTRIB_COLOR0, > > dest->attrib[VARYING_SLOT_COL0] ); > > - UNCLAMPED_FLOAT_TO_RGBA_CHAN( dest->color, tmp ); > > + UNCLAMPED_FLOAT_TO_RGBA_CHAN( dest->color, > > + dest->attrib[VARYING_SLOT_COL0] ); > > > > _tnl_get_attr( ctx, vertex, _TNL_ATTRIB_COLOR1, > > dest->attrib[VARYING_SLOT_COL1]); > > > > Reviewed-by: Brian Paul > > Do you need someone to commit this for you? Yes please. > > Tag for stable branches? Yeah that would seem appropriate. The offending commit has been there since 7.2, but I guess 10.1 is the only still active stable branch? -- Ville Syrjälä Intel OTC ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/2] mesa/st: Fix pipe_framebuffer_state::height for PIPE_TEXTURE_1D_ARRAY.
Thanks for the review. - Original Message - > On 04/23/2014 07:55 AM, jfons...@vmware.com wrote: > > From: José Fonseca > > > > This prevents buffer overflow w/ llvmpipe when running piglit > > > >bin/gl-3.2-layered-rendering-clear-color-all-types 1d_array single_level > >-fbo -auto > > > > v2: Compute the framebuffer size as the minimum size, as pointed out by > > Brian; compacted code; ran piglit quick test list (with no > > regressions.) > > --- > > src/mesa/state_tracker/st_atom_framebuffer.c | 33 > > ++-- > > 1 file changed, 31 insertions(+), 2 deletions(-) > > > > diff --git a/src/mesa/state_tracker/st_atom_framebuffer.c > > b/src/mesa/state_tracker/st_atom_framebuffer.c > > index 4c4f839..f395ec7 100644 > > --- a/src/mesa/state_tracker/st_atom_framebuffer.c > > +++ b/src/mesa/state_tracker/st_atom_framebuffer.c > > @@ -31,6 +31,8 @@ > > * Brian Paul > > */ > > > > +#include > > + > > #include "st_context.h" > > #include "st_atom.h" > > #include "st_cb_bitmap.h" > > @@ -44,6 +46,24 @@ > > > > > > /** > > + * Update framebuffer size. > > + * > > + * framebuffer->width should match fb->Weight, but for > > PIPE_TEXTURE_1D_ARRAY > > "fb->Width" > > > > + * textures fb->Height has the number of layers, and not the surface > > height. > > + */ > > The comment seems a bit disconnected from the code. > update_framebuffer_size() is used to find the size which is the min of > the attached surfaces. The comment about 1D array textures doesn't seem > to matter in the code. That just seems a little confusing. Yes, the update_framebuffer_size finds the min size, which I think is obvious. This comment here was supposed to explain why we do it when gl_framebuffer has similar info, ie., the less obvious bit. But I agree that the comment could be better phrased. What about this? "We need to derive pipe_framebuffer size from the bound pipe_surfaces here instead of copying gl_framebuffer size because for certain target types (like PIPE_TEXTURE_1D_ARRAY) gl_framebuffer::Height has the number of layers instead of 1." Jose > > +static void > > +update_framebuffer_size(struct pipe_framebuffer_state *framebuffer, > > +struct pipe_surface *surface) > > +{ > > + assert(surface); > > + assert(surface->width < UINT_MAX); > > + assert(surface->height < UINT_MAX); > > + framebuffer->width = MIN2(framebuffer->width, surface->width); > > + framebuffer->height = MIN2(framebuffer->height, surface->height); > > +} > > + > > + > > +/** > >* Update framebuffer state (color, depth, stencil, etc. buffers) > >*/ > > static void > > @@ -57,11 +77,12 @@ update_framebuffer_state( struct st_context *st ) > > st_flush_bitmap_cache(st); > > > > st->state.fb_orientation = st_fb_orientation(fb); > > - framebuffer->width = fb->Width; > > - framebuffer->height = fb->Height; > > > > /*printf("-- fb size %d x %d\n", fb->Width, fb->Height);*/ > > > > + framebuffer->width = UINT_MAX; > > + framebuffer->height = UINT_MAX; > > + > > /* Examine Mesa's ctx->DrawBuffer->_ColorDrawBuffers state > > * to determine which surfaces to draw to > > */ > > @@ -81,6 +102,7 @@ update_framebuffer_state( struct st_context *st ) > > > >if (strb->surface) { > > pipe_surface_reference(&framebuffer->cbufs[i], > > strb->surface); > > +update_framebuffer_size(framebuffer, strb->surface); > >} > >strb->defined = GL_TRUE; /* we'll be drawing something */ > > } > > @@ -100,12 +122,14 @@ update_framebuffer_state( struct st_context *st ) > >st_update_renderbuffer_surface(st, strb); > > } > > pipe_surface_reference(&framebuffer->zsbuf, strb->surface); > > + update_framebuffer_size(framebuffer, strb->surface); > > } > > else { > > strb = > > st_renderbuffer(fb->Attachment[BUFFER_STENCIL].Renderbuffer); > > if (strb) { > >assert(strb->surface); > >pipe_surface_reference(&framebuffer->zsbuf, strb->surface); > > + update_framebuffer_size(framebuffer, strb->surface); > > } > > else > >pipe_surface_reference(&framebuffer->zsbuf, NULL); > > @@ -122,6 +146,11 @@ update_framebuffer_state( struct st_context *st ) > > } > > #endif > > > > + /* _mesa_test_framebuffer_completeness refuses framebuffers with no > > +* attachments, so this should never happen. */ > > Close */ on next line. > > > > + assert(framebuffer->width != UINT_MAX); > > + assert(framebuffer->height != UINT_MAX); > > + > > cso_set_framebuffer(st->cso_context, framebuffer); > > } > > > > > > Otherwise, Reviewed-by: Brian Paul > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/2] mesa/st: Fix pipe_framebuffer_state::height for PIPE_TEXTURE_1D_ARRAY.
On 04/23/2014 09:17 AM, Jose Fonseca wrote: Thanks for the review. - Original Message - On 04/23/2014 07:55 AM, jfons...@vmware.com wrote: From: José Fonseca This prevents buffer overflow w/ llvmpipe when running piglit bin/gl-3.2-layered-rendering-clear-color-all-types 1d_array single_level -fbo -auto v2: Compute the framebuffer size as the minimum size, as pointed out by Brian; compacted code; ran piglit quick test list (with no regressions.) --- src/mesa/state_tracker/st_atom_framebuffer.c | 33 ++-- 1 file changed, 31 insertions(+), 2 deletions(-) diff --git a/src/mesa/state_tracker/st_atom_framebuffer.c b/src/mesa/state_tracker/st_atom_framebuffer.c index 4c4f839..f395ec7 100644 --- a/src/mesa/state_tracker/st_atom_framebuffer.c +++ b/src/mesa/state_tracker/st_atom_framebuffer.c @@ -31,6 +31,8 @@ * Brian Paul */ +#include + #include "st_context.h" #include "st_atom.h" #include "st_cb_bitmap.h" @@ -44,6 +46,24 @@ /** + * Update framebuffer size. + * + * framebuffer->width should match fb->Weight, but for PIPE_TEXTURE_1D_ARRAY "fb->Width" + * textures fb->Height has the number of layers, and not the surface height. + */ The comment seems a bit disconnected from the code. update_framebuffer_size() is used to find the size which is the min of the attached surfaces. The comment about 1D array textures doesn't seem to matter in the code. That just seems a little confusing. Yes, the update_framebuffer_size finds the min size, which I think is obvious. This comment here was supposed to explain why we do it when gl_framebuffer has similar info, ie., the less obvious bit. But I agree that the comment could be better phrased. What about this? "We need to derive pipe_framebuffer size from the bound pipe_surfaces here instead of copying gl_framebuffer size because for certain target types (like PIPE_TEXTURE_1D_ARRAY) gl_framebuffer::Height has the number of layers instead of 1." That sounds great. -Brian Jose +static void +update_framebuffer_size(struct pipe_framebuffer_state *framebuffer, +struct pipe_surface *surface) +{ + assert(surface); + assert(surface->width < UINT_MAX); + assert(surface->height < UINT_MAX); + framebuffer->width = MIN2(framebuffer->width, surface->width); + framebuffer->height = MIN2(framebuffer->height, surface->height); +} + + +/** * Update framebuffer state (color, depth, stencil, etc. buffers) */ static void @@ -57,11 +77,12 @@ update_framebuffer_state( struct st_context *st ) st_flush_bitmap_cache(st); st->state.fb_orientation = st_fb_orientation(fb); - framebuffer->width = fb->Width; - framebuffer->height = fb->Height; /*printf("-- fb size %d x %d\n", fb->Width, fb->Height);*/ + framebuffer->width = UINT_MAX; + framebuffer->height = UINT_MAX; + /* Examine Mesa's ctx->DrawBuffer->_ColorDrawBuffers state * to determine which surfaces to draw to */ @@ -81,6 +102,7 @@ update_framebuffer_state( struct st_context *st ) if (strb->surface) { pipe_surface_reference(&framebuffer->cbufs[i], strb->surface); +update_framebuffer_size(framebuffer, strb->surface); } strb->defined = GL_TRUE; /* we'll be drawing something */ } @@ -100,12 +122,14 @@ update_framebuffer_state( struct st_context *st ) st_update_renderbuffer_surface(st, strb); } pipe_surface_reference(&framebuffer->zsbuf, strb->surface); + update_framebuffer_size(framebuffer, strb->surface); } else { strb = st_renderbuffer(fb->Attachment[BUFFER_STENCIL].Renderbuffer); if (strb) { assert(strb->surface); pipe_surface_reference(&framebuffer->zsbuf, strb->surface); + update_framebuffer_size(framebuffer, strb->surface); } else pipe_surface_reference(&framebuffer->zsbuf, NULL); @@ -122,6 +146,11 @@ update_framebuffer_state( struct st_context *st ) } #endif + /* _mesa_test_framebuffer_completeness refuses framebuffers with no +* attachments, so this should never happen. */ Close */ on next line. + assert(framebuffer->width != UINT_MAX); + assert(framebuffer->height != UINT_MAX); + cso_set_framebuffer(st->cso_context, framebuffer); } Otherwise, Reviewed-by: Brian Paul ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [cfe-dev] 3 element vectors in opencl 1.1+
On Tue, 2014-04-22 at 17:50 -0700, Matt Arsenault wrote: > >> I think this is what v96:128 is for > > according to [0], it specifies only alignment, not size. I could not > > find an __attribute__ that would change size either. > > > > It should be possible to have ADMGPUDataLayout: public DataLayout class > > that would intercept the call and fix the reported value, but I think it > > would only move the hack to different place. > > > > I have added pocl-devel list as suggested. > > > > regards, > > Jan > > > > [0]http://llvm.org/docs/LangRef.html#data-layout > > > > Only the size in memory matters, which is what the required alignment > specifies. DataLayout::getTypeAllocSize accounts for the alignment, but > getTypeStoreSize does not. I actually thought this was half of what > getTypeStoreSize was for, but it turns out it isn't. hm, I always thought that alignment only puts restrictions on starting address and using padding was just a tool to do the job. anyway, thanks for the hint, using getTypeAllocSize works nicely. since we are allocating space in the argument vector I think getAllocSize is the right function to use. I'll post a patch. regards, Jan -- Jan Vesely signature.asc Description: This is a digitally signed message part ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/1] clover: use getTypeAllocSize() for kernel arguments
3 element vectors have the size of 4 element ones. See Ch 6.1.5 of OCL 1.1+ specs Thx Matt Arsenault for the hint Signed-off-by: Jan Vesely --- src/gallium/state_trackers/clover/llvm/invocation.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/gallium/state_trackers/clover/llvm/invocation.cpp b/src/gallium/state_trackers/clover/llvm/invocation.cpp index a81bdf8..a2022b4 100644 --- a/src/gallium/state_trackers/clover/llvm/invocation.cpp +++ b/src/gallium/state_trackers/clover/llvm/invocation.cpp @@ -308,7 +308,7 @@ namespace { #endif llvm::Type *arg_type = arg.getType(); -unsigned arg_size = TD.getTypeStoreSize(arg_type); +unsigned arg_size = TD.getTypeAllocSize(arg_type); llvm::Type *target_type = arg_type->isIntegerTy() ? TD.getSmallestLegalIntType(mod->getContext(), arg_size * 8) : -- 1.9.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 4/4] swrast: move texture_slices() calls out of loops
--- src/mesa/swrast/s_texture.c | 12 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/src/mesa/swrast/s_texture.c b/src/mesa/swrast/s_texture.c index 9273e94..5fd80ca 100644 --- a/src/mesa/swrast/s_texture.c +++ b/src/mesa/swrast/s_texture.c @@ -269,7 +269,7 @@ _swrast_map_texture(struct gl_context *ctx, struct gl_texture_object *texObj) for (level = texObj->BaseLevel; level < MAX_TEXTURE_LEVELS; level++) { struct gl_texture_image *texImage = texObj->Image[face][level]; struct swrast_texture_image *swImage = swrast_texture_image(texImage); - unsigned int i; + unsigned int i, slices; if (!texImage) continue; @@ -289,7 +289,9 @@ _swrast_map_texture(struct gl_context *ctx, struct gl_texture_object *texObj) continue; } - for (i = 0; i < texture_slices(texImage); i++) { + slices = texture_slices(texImage); + + for (i = 0; i < slices; i++) { GLubyte *map; GLint rowStride; @@ -327,7 +329,7 @@ _swrast_unmap_texture(struct gl_context *ctx, struct gl_texture_object *texObj) for (level = texObj->BaseLevel; level < MAX_TEXTURE_LEVELS; level++) { struct gl_texture_image *texImage = texObj->Image[face][level]; struct swrast_texture_image *swImage = swrast_texture_image(texImage); - unsigned int i; + unsigned int i, slices; if (!texImage) continue; @@ -338,7 +340,9 @@ _swrast_unmap_texture(struct gl_context *ctx, struct gl_texture_object *texObj) if (!swImage->ImageSlices) continue; - for (i = 0; i < texture_slices(texImage); i++) { + slices = texture_slices(texImage); + + for (i = 0; i < slices; i++) { if (swImage->ImageSlices[i]) { ctx->Driver.UnmapTextureImage(ctx, texImage, i); swImage->ImageSlices[i] = NULL; -- 1.7.10.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 3/4] swrast: move null pointer check earlier in _swrast_map_teximage()
There's no reason to compute texel size, stride, etc. if there's no image data to map. --- src/mesa/swrast/s_texture.c | 15 +-- 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/src/mesa/swrast/s_texture.c b/src/mesa/swrast/s_texture.c index cbfa26b..9273e94 100644 --- a/src/mesa/swrast/s_texture.c +++ b/src/mesa/swrast/s_texture.c @@ -218,6 +218,15 @@ _swrast_map_teximage(struct gl_context *ctx, check_map_teximage(texImage, slice, x, y, w, h); + if (!swImage->Buffer) { + /* Either glTexImage was called with a NULL argument or + * we ran out of memory when allocating texture memory, + */ + *mapOut = NULL; + *rowStrideOut = 0; + return; + } + texelSize = _mesa_get_format_bytes(texImage->TexFormat); stride = _mesa_format_row_stride(texImage->TexFormat, texImage->Width); _mesa_get_format_block_size(texImage->TexFormat, &bw, &bh); @@ -225,12 +234,6 @@ _swrast_map_teximage(struct gl_context *ctx, assert(x % bw == 0); assert(y % bh == 0); - if (!swImage->Buffer) { - /* probably ran out of memory when allocating tex mem */ - *mapOut = NULL; - return; - } - /* This function can only be used with a swrast-allocated buffer, in which * case ImageSlices is populated with pointers into Buffer. */ -- 1.7.10.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/4] swrast: allocate swrast_texture_image::ImageSlices array if needed
Fixes a segmentation fault in conform divzero.c test. This happens when glTexImage(level, width=0, height=0) is called. We don't allocate texture memory in that case so the ImageSlices array was never allocated. Cc: "10.1" --- src/mesa/swrast/s_texture.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/src/mesa/swrast/s_texture.c b/src/mesa/swrast/s_texture.c index c08a4e9..1d449a2 100644 --- a/src/mesa/swrast/s_texture.c +++ b/src/mesa/swrast/s_texture.c @@ -279,6 +279,13 @@ _swrast_map_texture(struct gl_context *ctx, struct gl_texture_object *texObj) continue; } + if (!swImage->ImageSlices) { +swImage->ImageSlices = + calloc(texture_slices(texImage), sizeof(void *)); +if (!swImage->ImageSlices) + continue; + } + for (i = 0; i < texture_slices(texImage); i++) { GLubyte *map; GLint rowStride; @@ -325,6 +332,9 @@ _swrast_unmap_texture(struct gl_context *ctx, struct gl_texture_object *texObj) if (swImage->Buffer) return; + if (!swImage->ImageSlices) +continue; + for (i = 0; i < texture_slices(texImage); i++) { if (swImage->ImageSlices[i]) { ctx->Driver.UnmapTextureImage(ctx, texImage, i); -- 1.7.10.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/4] swrast: remove _mesa_ prefix from static function
And add a const qualifier. --- src/mesa/swrast/s_texture.c |6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/src/mesa/swrast/s_texture.c b/src/mesa/swrast/s_texture.c index 1d449a2..cbfa26b 100644 --- a/src/mesa/swrast/s_texture.c +++ b/src/mesa/swrast/s_texture.c @@ -177,8 +177,8 @@ _swrast_free_texture_image_buffer(struct gl_context *ctx, * Error checking for debugging only. */ static void -_mesa_check_map_teximage(struct gl_texture_image *texImage, - GLuint slice, GLuint x, GLuint y, GLuint w, GLuint h) +check_map_teximage(const struct gl_texture_image *texImage, + GLuint slice, GLuint x, GLuint y, GLuint w, GLuint h) { if (texImage->TexObject->Target == GL_TEXTURE_1D) @@ -216,7 +216,7 @@ _swrast_map_teximage(struct gl_context *ctx, GLint stride, texelSize; GLuint bw, bh; - _mesa_check_map_teximage(texImage, slice, x, y, w, h); + check_map_teximage(texImage, slice, x, y, w, h); texelSize = _mesa_get_format_bytes(texImage->TexFormat); stride = _mesa_format_row_stride(texImage->TexFormat, texImage->Width); -- 1.7.10.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/2] mesa/st: Fix pipe_framebuffer_state::height for PIPE_TEXTURE_1D_ARRAY.
Am 23.04.2014 17:22, schrieb Brian Paul: > On 04/23/2014 09:17 AM, Jose Fonseca wrote: >> Thanks for the review. >> >> - Original Message - >>> On 04/23/2014 07:55 AM, jfons...@vmware.com wrote: From: José Fonseca This prevents buffer overflow w/ llvmpipe when running piglit bin/gl-3.2-layered-rendering-clear-color-all-types 1d_array single_level -fbo -auto v2: Compute the framebuffer size as the minimum size, as pointed out by Brian; compacted code; ran piglit quick test list (with no regressions.) --- src/mesa/state_tracker/st_atom_framebuffer.c | 33 ++-- 1 file changed, 31 insertions(+), 2 deletions(-) diff --git a/src/mesa/state_tracker/st_atom_framebuffer.c b/src/mesa/state_tracker/st_atom_framebuffer.c index 4c4f839..f395ec7 100644 --- a/src/mesa/state_tracker/st_atom_framebuffer.c +++ b/src/mesa/state_tracker/st_atom_framebuffer.c @@ -31,6 +31,8 @@ * Brian Paul */ +#include + #include "st_context.h" #include "st_atom.h" #include "st_cb_bitmap.h" @@ -44,6 +46,24 @@ /** + * Update framebuffer size. + * + * framebuffer->width should match fb->Weight, but for PIPE_TEXTURE_1D_ARRAY >>> >>> "fb->Width" >>> >>> + * textures fb->Height has the number of layers, and not the surface height. + */ >>> >>> The comment seems a bit disconnected from the code. >>> update_framebuffer_size() is used to find the size which is the min of >>> the attached surfaces. The comment about 1D array textures doesn't seem >>> to matter in the code. That just seems a little confusing. >> >> Yes, the update_framebuffer_size finds the min size, which I think is >> obvious. This comment here was supposed to explain why we do it when >> gl_framebuffer has similar info, ie., the less obvious bit. >> >> But I agree that the comment could be better phrased. What about this? >> >> "We need to derive pipe_framebuffer size from the bound >> pipe_surfaces here instead of copying gl_framebuffer size because for >> certain target types (like PIPE_TEXTURE_1D_ARRAY) >> gl_framebuffer::Height has the number of layers instead of 1." > > That sounds great. > > -Brian > > >> Jose >> >> >> +static void +update_framebuffer_size(struct pipe_framebuffer_state *framebuffer, +struct pipe_surface *surface) +{ + assert(surface); + assert(surface->width < UINT_MAX); + assert(surface->height < UINT_MAX); + framebuffer->width = MIN2(framebuffer->width, surface->width); + framebuffer->height = MIN2(framebuffer->height, surface->height); +} + + +/** * Update framebuffer state (color, depth, stencil, etc. buffers) */ static void @@ -57,11 +77,12 @@ update_framebuffer_state( struct st_context *st ) st_flush_bitmap_cache(st); st->state.fb_orientation = st_fb_orientation(fb); - framebuffer->width = fb->Width; - framebuffer->height = fb->Height; /*printf("-- fb size %d x %d\n", fb->Width, fb->Height);*/ + framebuffer->width = UINT_MAX; + framebuffer->height = UINT_MAX; + /* Examine Mesa's ctx->DrawBuffer->_ColorDrawBuffers state * to determine which surfaces to draw to */ @@ -81,6 +102,7 @@ update_framebuffer_state( struct st_context *st ) if (strb->surface) { pipe_surface_reference(&framebuffer->cbufs[i], strb->surface); +update_framebuffer_size(framebuffer, strb->surface); } strb->defined = GL_TRUE; /* we'll be drawing something */ } @@ -100,12 +122,14 @@ update_framebuffer_state( struct st_context *st ) st_update_renderbuffer_surface(st, strb); } pipe_surface_reference(&framebuffer->zsbuf, strb->surface); + update_framebuffer_size(framebuffer, strb->surface); } else { strb = st_renderbuffer(fb->Attachment[BUFFER_STENCIL].Renderbuffer); if (strb) { assert(strb->surface); pipe_surface_reference(&framebuffer->zsbuf, strb->surface); + update_framebuffer_size(framebuffer, strb->surface); } else pipe_surface_reference(&framebuffer->zsbuf, NULL); @@ -122,6 +146,11 @@ update_framebuffer_state( struct st_context *st ) } #endif + /* _mesa_test_framebuffer_completeness refuses framebuffers with no +* attachments, so this should never happen. */ >>> >>> Close */ on next line. >>> >>> + assert(framebuffer->width != UINT_MAX); + assert(fra
Re: [Mesa-dev] Mesa build instructions
On Fri, Apr 11, 2014 at 10:37 AM, Matt Turner wrote: > Someone asked about my Mesa build set up. Rather than sending it > privately I figured I'd post it for posterity on mesa-dev. > > I build with > > CFLAGS="-O2 -march=native -pipe" CXXFLAGS="$CFLAGS" ./autogen.sh > --with-dri-drivers=i965 --with-gallium-drivers= > --with-egl-platforms=x11,drm --enable-glx-tls --enable-gles1 > --enable-gles2 > > --enable-debug enables assertions and builds with -g, but also > silently adds -O0 to your CFLAGS. > > My build set up consists of > > ~/projects/mesa - contains mesa git checkout > ~/projects/mesa-release - release build directory > ~/projects/mesa-debug - debug build directory > > I do out-of-tree builds into the debug and release directories by > running ../mesa/autogen.sh ... from each of them, with the only > difference being the --enable-debug flag. > > Environment variables allow you to use your built mesa without > installing it: LIBGL_DRIVERS_PATH, LD_LIBRARY_PATH. Mesa's build > system links built files into $(builddir)/$(libdir), e.g., > ~/projects/mesa-release/lib/. I use some wrapper scripts to allow me > to easily run programs against my Mesa builds: > > ~/bin/mesa-release: > #!/bin/sh > LIBGL_DRIVERS_PATH=~/projects/mesa-release/lib > LD_LIBRARY_PATH=~/projects/mesa-release/lib:${LD_LIBRARY_PATH} $@ Note for anyone using this: gmail line wrapped this. LIBGL_DRIVERS_PATH and LD_LIBRARY_PATH need to be set on the line that executes $@. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/2] gallium/util: use ui[4] instead of ui in union util_color
From: Roland Scheidegger util_color often merely represents a collection of bytes, however it is inconvenient if those bytes can only be accessed as floats/doubles for int formats exceeding 32bits. (Note that since rgba8 formats use one uint, not 4 bytes, hence the byte and short member were left as is.) --- src/gallium/auxiliary/util/u_pack_color.h | 38 src/gallium/auxiliary/util/u_surface.c |2 +- src/gallium/drivers/freedreno/a2xx/fd2_draw.c |2 +- src/gallium/drivers/i915/i915_clear.c |6 ++-- src/gallium/drivers/i915/i915_surface.c|2 +- src/gallium/drivers/ilo/ilo_blitter_blt.c |4 +-- src/gallium/drivers/nouveau/nv30/nv30_clear.c |2 +- src/gallium/drivers/r300/r300_blit.c |4 +-- src/gallium/drivers/r300/r300_state.c |2 +- src/gallium/drivers/r300/r300_state_derived.c | 12 src/gallium/drivers/svga/svga_pipe_clear.c |2 +- src/mesa/state_tracker/st_atom_pixeltransfer.c |2 +- 12 files changed, 39 insertions(+), 39 deletions(-) diff --git a/src/gallium/auxiliary/util/u_pack_color.h b/src/gallium/auxiliary/util/u_pack_color.h index 166c68b..e0c9018 100644 --- a/src/gallium/auxiliary/util/u_pack_color.h +++ b/src/gallium/auxiliary/util/u_pack_color.h @@ -51,7 +51,7 @@ union util_color { ubyte ub; ushort us; - uint ui; + uint ui[4]; ushort h[4]; /* half float */ float f[4]; double d[4]; @@ -67,32 +67,32 @@ util_pack_color_ub(ubyte r, ubyte g, ubyte b, ubyte a, switch (format) { case PIPE_FORMAT_ABGR_UNORM: { - uc->ui = (r << 24) | (g << 16) | (b << 8) | a; + uc->ui[0] = (r << 24) | (g << 16) | (b << 8) | a; } return; case PIPE_FORMAT_XBGR_UNORM: { - uc->ui = (r << 24) | (g << 16) | (b << 8) | 0xff; + uc->ui[0] = (r << 24) | (g << 16) | (b << 8) | 0xff; } return; case PIPE_FORMAT_BGRA_UNORM: { - uc->ui = (a << 24) | (r << 16) | (g << 8) | b; + uc->ui[0] = (a << 24) | (r << 16) | (g << 8) | b; } return; case PIPE_FORMAT_BGRX_UNORM: { - uc->ui = (0xff << 24) | (r << 16) | (g << 8) | b; + uc->ui[0] = (0xff << 24) | (r << 16) | (g << 8) | b; } return; case PIPE_FORMAT_ARGB_UNORM: { - uc->ui = (b << 24) | (g << 16) | (r << 8) | a; + uc->ui[0] = (b << 24) | (g << 16) | (r << 8) | a; } return; case PIPE_FORMAT_XRGB_UNORM: { - uc->ui = (b << 24) | (g << 16) | (r << 8) | 0xff; + uc->ui[0] = (b << 24) | (g << 16) | (r << 8) | 0xff; } return; case PIPE_FORMAT_B5G6R5_UNORM: @@ -168,7 +168,7 @@ util_unpack_color_ub(enum pipe_format format, union util_color *uc, switch (format) { case PIPE_FORMAT_ABGR_UNORM: { - uint p = uc->ui; + uint p = uc->ui[0]; *r = (ubyte) ((p >> 24) & 0xff); *g = (ubyte) ((p >> 16) & 0xff); *b = (ubyte) ((p >> 8) & 0xff); @@ -177,7 +177,7 @@ util_unpack_color_ub(enum pipe_format format, union util_color *uc, return; case PIPE_FORMAT_XBGR_UNORM: { - uint p = uc->ui; + uint p = uc->ui[0]; *r = (ubyte) ((p >> 24) & 0xff); *g = (ubyte) ((p >> 16) & 0xff); *b = (ubyte) ((p >> 8) & 0xff); @@ -186,7 +186,7 @@ util_unpack_color_ub(enum pipe_format format, union util_color *uc, return; case PIPE_FORMAT_BGRA_UNORM: { - uint p = uc->ui; + uint p = uc->ui[0]; *r = (ubyte) ((p >> 16) & 0xff); *g = (ubyte) ((p >> 8) & 0xff); *b = (ubyte) ((p >> 0) & 0xff); @@ -195,7 +195,7 @@ util_unpack_color_ub(enum pipe_format format, union util_color *uc, return; case PIPE_FORMAT_BGRX_UNORM: { - uint p = uc->ui; + uint p = uc->ui[0]; *r = (ubyte) ((p >> 16) & 0xff); *g = (ubyte) ((p >> 8) & 0xff); *b = (ubyte) ((p >> 0) & 0xff); @@ -204,7 +204,7 @@ util_unpack_color_ub(enum pipe_format format, union util_color *uc, return; case PIPE_FORMAT_ARGB_UNORM: { - uint p = uc->ui; + uint p = uc->ui[0]; *r = (ubyte) ((p >> 8) & 0xff); *g = (ubyte) ((p >> 16) & 0xff); *b = (ubyte) ((p >> 24) & 0xff); @@ -213,7 +213,7 @@ util_unpack_color_ub(enum pipe_format format, union util_color *uc, return; case PIPE_FORMAT_XRGB_UNORM: { - uint p = uc->ui; + uint p = uc->ui[0]; *r = (ubyte) ((p >> 8) & 0xff); *g = (ubyte) ((p >> 16) & 0xff); *b = (ubyte) ((p >> 24) & 0xff); @@ -352,32 +352,32 @@ util_pack_color(const float rgba[4], enum pipe_format format, union util_color * switch (format) { case PIPE_FORMAT_ABGR_UNORM: { - uc->ui = (r << 24) | (g << 16) | (b <<
[Mesa-dev] [PATCH 2/2] llvmpipe: fix clearing of individual color buffers in a fb
From: Roland Scheidegger GL (3.0) allows you to clear individual color buffers in a fb. In fact for fbs containing both int and float/normalized color buffers this is required (because the clearing values are otherwise undefined if applied to all buffers). The gallium interface was changed a while ago, but llvmpipe ignored it (hence doing such individual clears always resulted in clearing all buffers, plus some assorted asserts due to the mixed fbs). So change the clear command to indicate the buffer to be cleared. Also, because indicating the buffer to be cleared would have made lp_rast_arg_cmd larger which is unacceptable (we're trying to shrink it some day) allocate the clear value in the scene and just pass a pointer. There's several advantages and disadvantages here: + clearing individual buffers works (we could also actually bin such clears now if they'd come through clear_render_target() if the surface is in the current fb, though we didn't do this before for the single rb case and still don't try). + since there's one clear per rb, we do the format conversion in setup rather than per bin. Aside from the (drop in the ocean...) performance advantage this means that clearing to very small values (that is, denormal when converted to the format) should work for small float (fp16 etc.) formats, as the util code couldn't handle it correctly before (because cpu denorms are disabled when executing the bin commands, screwing up the magic conversion and flushing the values to 0, though this was not verified). - there's some overhead for traditional old-style clear-all MRT cases, since there's one rast clear command per rb instead of one for all rbs. This fixes https://bugs.freedesktop.org/show_bug.cgi?id=76976. v2: get rid of the ugly manual memcpy stuff and just use union util_color. This is 32 bytes instead of 16 but as the allocation is per scene we can live with those additional 16 bytes (and the additional 128 bytes in the setup context), which makes the code much more obvious. Suggested by Brian. --- src/gallium/drivers/llvmpipe/lp_rast.c | 122 +++-- src/gallium/drivers/llvmpipe/lp_rast.h |9 +- src/gallium/drivers/llvmpipe/lp_scene.c |1 - src/gallium/drivers/llvmpipe/lp_scene.h |1 - src/gallium/drivers/llvmpipe/lp_setup.c | 217 --- src/gallium/drivers/llvmpipe/lp_setup_context.h |3 +- 6 files changed, 190 insertions(+), 163 deletions(-) diff --git a/src/gallium/drivers/llvmpipe/lp_rast.c b/src/gallium/drivers/llvmpipe/lp_rast.c index 0ae5976..d50ee48 100644 --- a/src/gallium/drivers/llvmpipe/lp_rast.c +++ b/src/gallium/drivers/llvmpipe/lp_rast.c @@ -110,25 +110,6 @@ lp_rast_tile_begin(struct lp_rasterizer_task *task, /** - * Examine a framebuffer object to determine if any of the colorbuffers - * use a pure integer format. - * XXX this could be a gallium utility function if useful elsewhere. - */ -static boolean -is_fb_pure_integer(const struct pipe_framebuffer_state *fb) -{ - unsigned i; - for (i = 0; i < fb->nr_cbufs; i++) { - if (fb->cbufs[i] && - util_format_is_pure_integer(fb->cbufs[i]->format)) { - return TRUE; - } - } - return FALSE; -} - - -/** * Clear the rasterizer's current color tile. * This is a bin command called during bin processing. * Clear commands always clear all bound layers. @@ -138,87 +119,42 @@ lp_rast_clear_color(struct lp_rasterizer_task *task, const union lp_rast_cmd_arg arg) { const struct lp_scene *scene = task->scene; + unsigned cbuf = arg.clear_rb->cbuf; + union util_color uc; + enum pipe_format format; - if (scene->fb.nr_cbufs) { - unsigned i; - union util_color uc; - - if (is_fb_pure_integer(&scene->fb)) { - /* - * We expect int/uint clear values here, though some APIs - * might disagree (but in any case util_pack_color() - * couldn't handle it)... - */ - LP_DBG(DEBUG_RAST, "%s pure int 0x%x,0x%x,0x%x,0x%x\n", __FUNCTION__, -arg.clear_color.ui[0], -arg.clear_color.ui[1], -arg.clear_color.ui[2], -arg.clear_color.ui[3]); - - for (i = 0; i < scene->fb.nr_cbufs; i++) { -enum pipe_format format = scene->fb.cbufs[i]->format; - -if (util_format_is_pure_sint(format)) { - util_format_write_4i(format, arg.clear_color.i, 0, &uc, 0, 0, 0, 1, 1); -} -else { - assert(util_format_is_pure_uint(format)); - util_format_write_4ui(format, arg.clear_color.ui, 0, &uc, 0, 0, 0, 1, 1); -} - -util_fill_box(scene->cbufs[i].map, - format, - scene->cbufs[i].stride, - scene->cbufs[i].layer_stride, - task->x, - task->y, -
Re: [Mesa-dev] [PATCH 0/5] util: Rework endian handling in python code
Richard, Michel, Apologies for the long silence. I flagged this thread as worth following, but I failed to noticed the RFC to me. I glanced over it and the series looks good to me AFAICT. I agree that it is a better to defer the endianess to C-preprocessing time. Jose - Original Message - > Ping > > Richard Sandiford writes: > > Ping (with fixed subject) > > > > Richard Sandiford writes: > >> This is a refresh of: > >> > >> > >> https://urldefense.proofpoint.com/v1/url?u=http://lists.freedesktop.org/archives/mesa-dev/2013-June/040594.html&k=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0A&r=NMr9uy2iTjWVixC0wOcYCWEIYhfo80qKwRgdodpoDzA%3D%0A&m=qNJxBmC8T0UJrxq1t6ovCuRj4qWejx24a49GkVAHnuk%3D%0A&s=791bcf25d17824bc574aec7183fe264a488fd5b311f6726973a81f52cf580a02 > >> > >> At the moment the python code uses sys.byteorder to decide whether > >> u_format_table.c should be for big or little endian. With this series > >> it instead generates both forms, using blocks like: > >> > >> #ifdef PIPE_ARCH_BIG_ENDIAN > >> ... > >> #else > >> ... > >> #endif > >> > >> in cases where endianness matters. > >> > >> Doing it this way is more cross-compiler-friendly. It also means people > >> working on LE systems can see what the differences would be for BE. > >> > >> Tested on x86_64 and z. I don't have commit access so please apply if OK. > >> > >> Thanks, > >> Richard > >> > >> ___ > >> mesa-dev mailing list > >> mesa-dev@lists.freedesktop.org > >> https://urldefense.proofpoint.com/v1/url?u=http://lists.freedesktop.org/mailman/listinfo/mesa-dev&k=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0A&r=NMr9uy2iTjWVixC0wOcYCWEIYhfo80qKwRgdodpoDzA%3D%0A&m=qNJxBmC8T0UJrxq1t6ovCuRj4qWejx24a49GkVAHnuk%3D%0A&s=d03e30542dd6de0bfb0bb613d160a4128997a7be4c52d89f34a970092856ef93 > > > > ___ > > mesa-dev mailing list > > mesa-dev@lists.freedesktop.org > > https://urldefense.proofpoint.com/v1/url?u=http://lists.freedesktop.org/mailman/listinfo/mesa-dev&k=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0A&r=NMr9uy2iTjWVixC0wOcYCWEIYhfo80qKwRgdodpoDzA%3D%0A&m=qNJxBmC8T0UJrxq1t6ovCuRj4qWejx24a49GkVAHnuk%3D%0A&s=d03e30542dd6de0bfb0bb613d160a4128997a7be4c52d89f34a970092856ef93 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://urldefense.proofpoint.com/v1/url?u=http://lists.freedesktop.org/mailman/listinfo/mesa-dev&k=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0A&r=NMr9uy2iTjWVixC0wOcYCWEIYhfo80qKwRgdodpoDzA%3D%0A&m=qNJxBmC8T0UJrxq1t6ovCuRj4qWejx24a49GkVAHnuk%3D%0A&s=d03e30542dd6de0bfb0bb613d160a4128997a7be4c52d89f34a970092856ef93 > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] swrast: Add glBlitFramebuffer to commands affected by conditional rendering
Carl Worth writes: > After I picked this to the 10.1 branch and tested I found the following > piglit regression: > > $ ~/src/piglit/bin/nv_conditional_render-blitframebuffer -auto > Probe color at (0,16) > Expected: 0.00 1.00 0.00 0.00 > Observed: 1.00 0.00 0.00 0.00 > PIGLIT: {'result': 'fail' } I went to file a bug on this issue and found one open already: [i965 Bisected]Piglit spec/NV_conditional_render_blitframebuffer fails https://bugs.freedesktop.org/show_bug.cgi?id=77702 The discussion there is leaning toward the test being incorrect. Once that bug is resolved I can pick the patch back to the stable branch. -Carl -- carl.d.wo...@intel.com pgp4pOMtk0D_n.pgp Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [cfe-dev] 3 element vectors in opencl 1.1+
Jan Vesely writes: > On Tue, 2014-04-22 at 17:50 -0700, Matt Arsenault wrote: > > >> >> I think this is what v96:128 is for >> > according to [0], it specifies only alignment, not size. I could not >> > find an __attribute__ that would change size either. >> > >> > It should be possible to have ADMGPUDataLayout: public DataLayout class >> > that would intercept the call and fix the reported value, but I think it >> > would only move the hack to different place. >> > >> > I have added pocl-devel list as suggested. >> > >> > regards, >> > Jan >> > >> > [0]http://llvm.org/docs/LangRef.html#data-layout >> > >> >> Only the size in memory matters, which is what the required alignment >> specifies. DataLayout::getTypeAllocSize accounts for the alignment, but >> getTypeStoreSize does not. I actually thought this was half of what >> getTypeStoreSize was for, but it turns out it isn't. > > hm, I always thought that alignment only puts restrictions on starting > address and using padding was just a tool to do the job. > > anyway, thanks for the hint, using getTypeAllocSize works nicely. > since we are allocating space in the argument vector I think > getAllocSize is the right function to use. > > I'll post a patch. > I don't think that using getTypeAllocSize() instead of getTypeStoreSize() to calculate clover::argument::size would be a satisfactory solution. By doing that you'd redefine the API argument size exposed to the host for *all* argument types to be the device-dependent aligned size, which is definitely not what you want. AFAIK 3-element vectors are an exception because they are the only types that are defined to have a different API size from their actual usable size, so they probably deserve special handling in invocation.cpp (as you did in your first patch). As the API size is target-independent I don't think that the fix belongs in Clang or LLVM, Clover is likely at fault. Thanks. > regards, > Jan > > > -- > Jan Vesely > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev pgpPhkfcwcCIy.pgp Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 77582] [r600g] ogl-samples GL3.2 and GL3.3 tests doesn't run without overriding GL/GLSL version
https://bugs.freedesktop.org/show_bug.cgi?id=77582 Benjamin Bellec changed: What|Removed |Added Resolution|NOTOURBUG |INVALID Assignee|mesa-dev@lists.freedesktop. |dri-devel@lists.freedesktop |org |.org Component|Mesa core |Drivers/Gallium/r600 --- Comment #7 from Benjamin Bellec --- Old libGL in /usr/local/lib64 -- You are receiving this mail because: You are the assignee for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] draw/llvm: reduce memory usage
Lets make draw_get_option_use_llvm function available unconditionally and use it to avoid useless allocations when LLVM paths are active. TGSI machine is never used when we're using LLVM. Signed-off-by: Zack Rusin --- src/gallium/auxiliary/draw/draw_context.c | 6 ++ src/gallium/auxiliary/draw/draw_context.h | 2 -- src/gallium/auxiliary/draw/draw_gs.c | 26 -- src/gallium/auxiliary/draw/draw_vs.c | 11 +++ src/gallium/auxiliary/draw/draw_vs_exec.c | 2 ++ 5 files changed, 27 insertions(+), 20 deletions(-) diff --git a/src/gallium/auxiliary/draw/draw_context.c b/src/gallium/auxiliary/draw/draw_context.c index 0a67879..ddc305b 100644 --- a/src/gallium/auxiliary/draw/draw_context.c +++ b/src/gallium/auxiliary/draw/draw_context.c @@ -68,6 +68,12 @@ draw_get_option_use_llvm(void) } return value; } +#else +boolean +draw_get_option_use_llvm(void) +{ + return FALSE; +} #endif diff --git a/src/gallium/auxiliary/draw/draw_context.h b/src/gallium/auxiliary/draw/draw_context.h index f114f50..48549fe 100644 --- a/src/gallium/auxiliary/draw/draw_context.h +++ b/src/gallium/auxiliary/draw/draw_context.h @@ -288,9 +288,7 @@ draw_get_shader_param(unsigned shader, enum pipe_shader_cap param); int draw_get_shader_param_no_llvm(unsigned shader, enum pipe_shader_cap param); -#ifdef HAVE_LLVM boolean draw_get_option_use_llvm(void); -#endif #endif /* DRAW_CONTEXT_H */ diff --git a/src/gallium/auxiliary/draw/draw_gs.c b/src/gallium/auxiliary/draw/draw_gs.c index 7de5e03..5e503ff 100644 --- a/src/gallium/auxiliary/draw/draw_gs.c +++ b/src/gallium/auxiliary/draw/draw_gs.c @@ -674,11 +674,7 @@ int draw_geometry_shader_run(struct draw_geometry_shader *shader, void draw_geometry_shader_prepare(struct draw_geometry_shader *shader, struct draw_context *draw) { -#ifdef HAVE_LLVM boolean use_llvm = draw_get_option_use_llvm(); -#else - boolean use_llvm = FALSE; -#endif if (!use_llvm && shader && shader->machine->Tokens != shader->state.tokens) { tgsi_exec_machine_bind_shader(shader->machine, shader->state.tokens, @@ -690,16 +686,18 @@ void draw_geometry_shader_prepare(struct draw_geometry_shader *shader, boolean draw_gs_init( struct draw_context *draw ) { - draw->gs.tgsi.machine = tgsi_exec_machine_create(); - if (!draw->gs.tgsi.machine) - return FALSE; - - draw->gs.tgsi.machine->Primitives = align_malloc( - MAX_PRIMITIVES * sizeof(struct tgsi_exec_vector), 16); - if (!draw->gs.tgsi.machine->Primitives) - return FALSE; - memset(draw->gs.tgsi.machine->Primitives, 0, - MAX_PRIMITIVES * sizeof(struct tgsi_exec_vector)); + if (!draw_get_option_use_llvm()) { + draw->gs.tgsi.machine = tgsi_exec_machine_create(); + if (!draw->gs.tgsi.machine) + return FALSE; + + draw->gs.tgsi.machine->Primitives = align_malloc( + MAX_PRIMITIVES * sizeof(struct tgsi_exec_vector), 16); + if (!draw->gs.tgsi.machine->Primitives) + return FALSE; + memset(draw->gs.tgsi.machine->Primitives, 0, + MAX_PRIMITIVES * sizeof(struct tgsi_exec_vector)); + } return TRUE; } diff --git a/src/gallium/auxiliary/draw/draw_vs.c b/src/gallium/auxiliary/draw/draw_vs.c index 55cbeb9..8bb9a7f 100644 --- a/src/gallium/auxiliary/draw/draw_vs.c +++ b/src/gallium/auxiliary/draw/draw_vs.c @@ -149,9 +149,11 @@ draw_vs_init( struct draw_context *draw ) { draw->dump_vs = debug_get_option_gallium_dump_vs(); - draw->vs.tgsi.machine = tgsi_exec_machine_create(); - if (!draw->vs.tgsi.machine) - return FALSE; + if (!draw_get_option_use_llvm()) { + draw->vs.tgsi.machine = tgsi_exec_machine_create(); + if (!draw->vs.tgsi.machine) + return FALSE; + } draw->vs.emit_cache = translate_cache_create(); if (!draw->vs.emit_cache) @@ -173,7 +175,8 @@ draw_vs_destroy( struct draw_context *draw ) if (draw->vs.emit_cache) translate_cache_destroy(draw->vs.emit_cache); - tgsi_exec_machine_destroy(draw->vs.tgsi.machine); + if (draw_get_option_use_llvm()) + tgsi_exec_machine_destroy(draw->vs.tgsi.machine); } diff --git a/src/gallium/auxiliary/draw/draw_vs_exec.c b/src/gallium/auxiliary/draw/draw_vs_exec.c index 133b116..6a18d8c 100644 --- a/src/gallium/auxiliary/draw/draw_vs_exec.c +++ b/src/gallium/auxiliary/draw/draw_vs_exec.c @@ -63,6 +63,7 @@ vs_exec_prepare( struct draw_vertex_shader *shader, { struct exec_vertex_shader *evs = exec_vertex_shader(shader); + debug_assert(!draw_get_option_use_llvm()); /* Specify the vertex program to interpret/execute. * Avoid rebinding when possible. */ @@ -96,6 +97,7 @@ vs_exec_run_linear( struct draw_vertex_shader *shader, unsigned slot; boolean clamp_vertex_color = shader->draw->rasterizer->clamp_vertex_color; + debug_assert(!draw_get_option_use_llvm()); tgsi_exec_set_constant_buffers
Re: [Mesa-dev] [PATCH] draw/llvm: reduce memory usage
> > - tgsi_exec_machine_destroy(draw->vs.tgsi.machine); > + if (draw_get_option_use_llvm()) > + tgsi_exec_machine_destroy(draw->vs.tgsi.machine); That part should have used !draw_get_option_use_llvm() ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] draw/llvm: reduce memory usage
Am 23.04.2014 23:10, schrieb Zack Rusin: > Lets make draw_get_option_use_llvm function available unconditionally > and use it to avoid useless allocations when LLVM paths are active. > TGSI machine is never used when we're using LLVM. > > Signed-off-by: Zack Rusin > --- > src/gallium/auxiliary/draw/draw_context.c | 6 ++ > src/gallium/auxiliary/draw/draw_context.h | 2 -- > src/gallium/auxiliary/draw/draw_gs.c | 26 -- > src/gallium/auxiliary/draw/draw_vs.c | 11 +++ > src/gallium/auxiliary/draw/draw_vs_exec.c | 2 ++ > 5 files changed, 27 insertions(+), 20 deletions(-) > > diff --git a/src/gallium/auxiliary/draw/draw_context.c > b/src/gallium/auxiliary/draw/draw_context.c > index 0a67879..ddc305b 100644 > --- a/src/gallium/auxiliary/draw/draw_context.c > +++ b/src/gallium/auxiliary/draw/draw_context.c > @@ -68,6 +68,12 @@ draw_get_option_use_llvm(void) > } > return value; > } > +#else > +boolean > +draw_get_option_use_llvm(void) > +{ > + return FALSE; > +} > #endif > > > diff --git a/src/gallium/auxiliary/draw/draw_context.h > b/src/gallium/auxiliary/draw/draw_context.h > index f114f50..48549fe 100644 > --- a/src/gallium/auxiliary/draw/draw_context.h > +++ b/src/gallium/auxiliary/draw/draw_context.h > @@ -288,9 +288,7 @@ draw_get_shader_param(unsigned shader, enum > pipe_shader_cap param); > int > draw_get_shader_param_no_llvm(unsigned shader, enum pipe_shader_cap param); > > -#ifdef HAVE_LLVM > boolean > draw_get_option_use_llvm(void); > -#endif > > #endif /* DRAW_CONTEXT_H */ > diff --git a/src/gallium/auxiliary/draw/draw_gs.c > b/src/gallium/auxiliary/draw/draw_gs.c > index 7de5e03..5e503ff 100644 > --- a/src/gallium/auxiliary/draw/draw_gs.c > +++ b/src/gallium/auxiliary/draw/draw_gs.c > @@ -674,11 +674,7 @@ int draw_geometry_shader_run(struct draw_geometry_shader > *shader, > void draw_geometry_shader_prepare(struct draw_geometry_shader *shader, >struct draw_context *draw) > { > -#ifdef HAVE_LLVM > boolean use_llvm = draw_get_option_use_llvm(); > -#else > - boolean use_llvm = FALSE; > -#endif > if (!use_llvm && shader && shader->machine->Tokens != > shader->state.tokens) { >tgsi_exec_machine_bind_shader(shader->machine, > shader->state.tokens, > @@ -690,16 +686,18 @@ void draw_geometry_shader_prepare(struct > draw_geometry_shader *shader, > boolean > draw_gs_init( struct draw_context *draw ) > { > - draw->gs.tgsi.machine = tgsi_exec_machine_create(); > - if (!draw->gs.tgsi.machine) > - return FALSE; > - > - draw->gs.tgsi.machine->Primitives = align_malloc( > - MAX_PRIMITIVES * sizeof(struct tgsi_exec_vector), 16); > - if (!draw->gs.tgsi.machine->Primitives) > - return FALSE; > - memset(draw->gs.tgsi.machine->Primitives, 0, > - MAX_PRIMITIVES * sizeof(struct tgsi_exec_vector)); > + if (!draw_get_option_use_llvm()) { > + draw->gs.tgsi.machine = tgsi_exec_machine_create(); > + if (!draw->gs.tgsi.machine) > + return FALSE; > + > + draw->gs.tgsi.machine->Primitives = align_malloc( > + MAX_PRIMITIVES * sizeof(struct tgsi_exec_vector), 16); > + if (!draw->gs.tgsi.machine->Primitives) > + return FALSE; > + memset(draw->gs.tgsi.machine->Primitives, 0, > + MAX_PRIMITIVES * sizeof(struct tgsi_exec_vector)); > + } > > return TRUE; > } > diff --git a/src/gallium/auxiliary/draw/draw_vs.c > b/src/gallium/auxiliary/draw/draw_vs.c > index 55cbeb9..8bb9a7f 100644 > --- a/src/gallium/auxiliary/draw/draw_vs.c > +++ b/src/gallium/auxiliary/draw/draw_vs.c > @@ -149,9 +149,11 @@ draw_vs_init( struct draw_context *draw ) > { > draw->dump_vs = debug_get_option_gallium_dump_vs(); > > - draw->vs.tgsi.machine = tgsi_exec_machine_create(); > - if (!draw->vs.tgsi.machine) > - return FALSE; > + if (!draw_get_option_use_llvm()) { > + draw->vs.tgsi.machine = tgsi_exec_machine_create(); > + if (!draw->vs.tgsi.machine) > + return FALSE; > + } > > draw->vs.emit_cache = translate_cache_create(); > if (!draw->vs.emit_cache) > @@ -173,7 +175,8 @@ draw_vs_destroy( struct draw_context *draw ) > if (draw->vs.emit_cache) >translate_cache_destroy(draw->vs.emit_cache); > > - tgsi_exec_machine_destroy(draw->vs.tgsi.machine); > + if (draw_get_option_use_llvm()) > + tgsi_exec_machine_destroy(draw->vs.tgsi.machine); > } > > > diff --git a/src/gallium/auxiliary/draw/draw_vs_exec.c > b/src/gallium/auxiliary/draw/draw_vs_exec.c > index 133b116..6a18d8c 100644 > --- a/src/gallium/auxiliary/draw/draw_vs_exec.c > +++ b/src/gallium/auxiliary/draw/draw_vs_exec.c > @@ -63,6 +63,7 @@ vs_exec_prepare( struct draw_vertex_shader *shader, > { > struct exec_vertex_shader *evs = exec_vertex_shader(shader); > > + debug_assert(!draw_get_option_use_llvm()); > /* Specify the vertex program to inte
[Mesa-dev] [Bug 64386] [865G] White screen using Stellarium
https://bugs.freedesktop.org/show_bug.cgi?id=64386 --- Comment #3 from Götz --- If this is a Mesa bug, what can we do with this hardware to help debug this? This error message doesn't appear anymore with the latest mesa version: "Mesa 9.1.1 implementation error: unexpected format GL_DEPTH_COMPONENT in _mesa_choose_tex_format()" So, the error is a bit different than before (with an empty ~/.stellarium): $ stellarium Using default graphics system specified at build time: raster User config directory does not exist: "/home/test/.stellarium" Creating directory "/home/test/.stellarium" --- [ This is Stellarium 0.12.4 - http://www.stellarium.org ] [ Copyright (C) 2000-2013 Fabien Chereau et al ] --- Writing log file to: "/home/test/.stellarium/log.txt" File search paths: 0 . "/home/test/.stellarium" 1 . "/usr/share/stellarium" Config file "/home/test/.stellarium/config.ini" does not exist. Copying the default file. Config file is: "/home/test/.stellarium/config.ini" Going to initialize the OpenGL 2 renderer OpenGL supported version: "1.3 Mesa 10.1.1" Qt GL paint engine is: "OpenGL" StelQGL2Renderer::init : Failed because Qt paint engine is not OpenGL2 If paint engine is OpenGL3 or higher, this code needs to be updated Failed to initialize the OpenGL 2 renderer, falling back to the OpenGL 1 renderer OpenGL supported version: "1.3 Mesa 10.1.1" Qt GL paint engine is: "OpenGL" GL vendor is "Intel Open Source Technology Center" GL renderer is "Mesa DRI Intel(R) 865G x86/MMX/SSE2" Cache directory is: "/home/test/.cache/stellarium/stellarium" Sky language is "en_US" Application language is "en_US" Loading Solar System data ... Loaded 75 / 75 planet orbits from "/usr/share/stellarium/data/ssystem.ini" Could not find the starsConfig.json file: will copy the default one. Creating directory "/home/test/.stellarium/stars/default" Creates file "/home/test/.stellarium/stars/default/starsConfig.json" Loading star data ... "Loading "/usr/share/stellarium/stars/default/stars_0_0v0_3.cat": 0_0v0_2; 4963" "Loading "/usr/share/stellarium/stars/default/stars_1_0v0_3.cat": 1_0v0_2; 21598" "Loading "/usr/share/stellarium/stars/default/stars_2_0v0_3.cat": 2_0v0_2; 150090" "Loading "/usr/share/stellarium/stars/default/stars_3_1v0_2.cat": 3_1v0_1; 423540" Finished loading star catalogue data, max_geodesic_level: 3 navigation/preset_sky_time is a double - treating as jday: 2.45151e+06 Loaded 10051 NGC records Loading NGC name data ... Loaded 412 / 412 NGC name records successfully Loading star names from "/usr/share/stellarium/skycultures/western/star_names.fab" Loaded 236 / 236 common star names Loading star names from "/usr/share/stellarium/stars/default/name.fab" Loaded 4359 / 4359 scientific star names Loading variable stars from "/usr/share/stellarium/stars/default/gcvs_hip_part.dat" Loaded 6886 / 6886 variable stars Loaded 88 / 88 constellation records successfully for culture "western" Loaded 85 / 85 constellation art records successfully for culture "western" Loaded 89 / 89 constellation names Loading constellation boundary data ... Loaded 782 constellation boundary segments Not using any viewport effect Creating GUI ... Loading shortcuts... shortcuts.json doesn't exist, copying default... Creating directory "/home/test/.stellarium/data/" Creating file "/home/test/.stellarium/data/shortcuts.json" Can't find plugin with id "RendererStatistics" Can't find plugin with id "RendererStatistics" libpng warning: iCCP: Not recognizing known sRGB profile that has been edited Loaded plugin "Oculars" . Ocular plugin - press Command-O to toggle eyepiece view mode. Press ALT-o for configuration. Creating directory "/home/test/.stellarium/modules/Oculars" Oculars::validateIniFile copied default_ocular.ini to "/home/test/.stellarium/modules/Oculars/ocular.ini" Loaded plugin "Satellites" . Creating directory "/home/test/.stellarium/modules/Satellites" Satellites::init satellites.json does not exist - copying default file to "/home/test/.stellarium/modules/Satellites/satellites.json" Satellites::init copied default satellites.json to "/home/test/.stellarium/modules/Satellites/satellites.json" Satellites: loading catalog file: "/home/test/.stellarium/modules/Satellites/satellites.json" Satellite has invalid orbit: "TIANGONG 1" "37820" Loaded plugin "SolarSystemEditor" . Trying to copy ssystem.ini to "/home/test/.stellarium/data/ssystem.ini" Loaded plugin "TimeZoneConfiguration" . QGLFramebufferObject: Framebuffer incomplete attachment. QGLFramebufferObject: Framebuffer incomplete attachment. (this is 10 times repeated) libpng warning: iCCP: Not recognizing known sRGB profile that has been edited libpng warning: iCCP: Not recognizing known sRGB profile that has been edited libpng warning: iCCP: Not recognizing known sRGB profi
Re: [Mesa-dev] [cfe-dev] 3 element vectors in opencl 1.1+
On Wed, 2014-04-23 at 19:49 +0200, Francisco Jerez wrote: > Jan Vesely writes: > > > On Tue, 2014-04-22 at 17:50 -0700, Matt Arsenault wrote: > > > > > >> >> I think this is what v96:128 is for > >> > according to [0], it specifies only alignment, not size. I could not > >> > find an __attribute__ that would change size either. > >> > > >> > It should be possible to have ADMGPUDataLayout: public DataLayout class > >> > that would intercept the call and fix the reported value, but I think it > >> > would only move the hack to different place. > >> > > >> > I have added pocl-devel list as suggested. > >> > > >> > regards, > >> > Jan > >> > > >> > [0]http://llvm.org/docs/LangRef.html#data-layout > >> > > >> > >> Only the size in memory matters, which is what the required alignment > >> specifies. DataLayout::getTypeAllocSize accounts for the alignment, but > >> getTypeStoreSize does not. I actually thought this was half of what > >> getTypeStoreSize was for, but it turns out it isn't. > > > > hm, I always thought that alignment only puts restrictions on starting > > address and using padding was just a tool to do the job. > > > > anyway, thanks for the hint, using getTypeAllocSize works nicely. > > since we are allocating space in the argument vector I think > > getAllocSize is the right function to use. > > > > I'll post a patch. > > > > I don't think that using getTypeAllocSize() instead of > getTypeStoreSize() to calculate clover::argument::size would be a > satisfactory solution. By doing that you'd redefine the API argument > size exposed to the host for *all* argument types to be the > device-dependent aligned size, which is definitely not what you want. > AFAIK 3-element vectors are an exception because they are the only types > that are defined to have a different API size from their actual usable > size, so they probably deserve special handling in invocation.cpp (as > you did in your first patch). As the API size is target-independent I > don't think that the fix belongs in Clang or LLVM, Clover is likely at > fault. The way I understood the ch 6.1.5 is that both API and OpenCL C 3 element vectors are required to be 4*sizeof(component). So a sizeof(float3) == sizeof(cl_float3) == 16, and should be both host and target independent. That's why clang (or more precisely libclc) looked like a correct place. I understand that target device can have stricter alignment rules, but I don't see how it can have different type sizes (my reading of the specs is that these are binding for the target as well). I can resend the original patch with debug output replaced by a comment. regards, Jan > > Thanks. > > > regards, > > Jan > > > > > > -- > > Jan Vesely > > ___ > > mesa-dev mailing list > > mesa-dev@lists.freedesktop.org > > http://lists.freedesktop.org/mailman/listinfo/mesa-dev -- Jan Vesely signature.asc Description: This is a digitally signed message part ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [cfe-dev] 3 element vectors in opencl 1.1+
On Apr 24, 2014, at 0:30 , Jan Vesely wrote: > On Wed, 2014-04-23 at 19:49 +0200, Francisco Jerez wrote: >> Jan Vesely writes: >> >>> On Tue, 2014-04-22 at 17:50 -0700, Matt Arsenault wrote: >>> >>> >> I think this is what v96:128 is for > according to [0], it specifies only alignment, not size. I could not > find an __attribute__ that would change size either. > > It should be possible to have ADMGPUDataLayout: public DataLayout class > that would intercept the call and fix the reported value, but I think it > would only move the hack to different place. > > I have added pocl-devel list as suggested. > > regards, > Jan > > [0]http://llvm.org/docs/LangRef.html#data-layout > Only the size in memory matters, which is what the required alignment specifies. DataLayout::getTypeAllocSize accounts for the alignment, but getTypeStoreSize does not. I actually thought this was half of what getTypeStoreSize was for, but it turns out it isn't. >>> >>> hm, I always thought that alignment only puts restrictions on starting >>> address and using padding was just a tool to do the job. >>> >>> anyway, thanks for the hint, using getTypeAllocSize works nicely. >>> since we are allocating space in the argument vector I think >>> getAllocSize is the right function to use. >>> >>> I'll post a patch. >>> >> >> I don't think that using getTypeAllocSize() instead of >> getTypeStoreSize() to calculate clover::argument::size would be a >> satisfactory solution. By doing that you'd redefine the API argument >> size exposed to the host for *all* argument types to be the >> device-dependent aligned size, which is definitely not what you want. > >> AFAIK 3-element vectors are an exception because they are the only types >> that are defined to have a different API size from their actual usable >> size, so they probably deserve special handling in invocation.cpp (as >> you did in your first patch). As the API size is target-independent I >> don't think that the fix belongs in Clang or LLVM, Clover is likely at >> fault. > > The way I understood the ch 6.1.5 is that both API and OpenCL C 3 > element vectors are required to be 4*sizeof(component). So a > sizeof(float3) == sizeof(cl_float3) == 16, and should be both host and > target independent. That's why clang (or more precisely libclc) looked > like a correct place. > > I understand that target device can have stricter alignment rules, but I > don't see how it can have different type sizes (my reading of the specs > is that these are binding for the target as well). OpenCL specifies the alignment of types; this is not up to the target. For the basic types, the alignment is their size. -erik -- Erik Schnetter http://www.perimeterinstitute.ca/personal/eschnetter/ My email is as private as my paper mail. I therefore support encrypting and signing email messages. Get my PGP key from http://pgp.mit.edu/. signature.asc Description: Message signed with OpenPGP using GPGMail ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 0/4] gallium, nv50, nvc0: add ARB_sample_shading
This is my latest iteration of the ARB_sample_shading implementation. The only known defect is that gl_SampleMask doesn't appear to work on nv50 nor nvc0. I'm fairly sure it's due to some bit of setup I'm missing, but it has thus far eluded me. I believe I've addressed the various earlier review comments either with replies or by adjusting my code. Let me know if I've missed anything. I reinstated the cso_cache thing, I think it's needed since blits/etc probably should use min_samples == 1. I've used it in all the places I could find saved the sample mask. Both nv50 and nvc0 need to stick the sample positions into a constbuf. Perhaps it would make sense to make that manipulation more generic, but if there's HW that will pull the sample position of the current sample, then maybe not. I've left all these things as system values because (a) they're system values in mesa, and (b) it seems like a lot of trouble to expose the raster object as an addressable item in TGSI for seemingly little benefit. A lowering pass in the driver can just have it do the right thing. Assuming that the generic bits get positive reviews, I'd like to push the nv50/nvc0 bits as well despite the gl_SampleMask failure. That'll get resolved in time, and I doubt there are too many users of that feature. (Admittedly, probably not too many users of ARB_sample_shading in general.) [ Another little short-coming for nv50/nvc0 is that there is no shader variant support, and the code is different for per-sample vs per-fragment due to the interpolation instructions. There are a few other things that need variant support to work, perhaps that'll be done at a later time. ] Ilia Mirkin (4): gallium: add basic support for ARB_sample_shading mesa/st: add support for ARB_sample_shading nv50: add support for PIPE_CAP_SAMPLE_SHADING nvc0: add support for PIPE_CAP_SAMPLE_SHADING src/gallium/auxiliary/cso_cache/cso_context.c | 19 src/gallium/auxiliary/cso_cache/cso_context.h | 4 +++ src/gallium/auxiliary/hud/hud_context.c| 3 ++ src/gallium/auxiliary/postprocess/pp_run.c | 3 ++ src/gallium/auxiliary/tgsi/tgsi_strings.c | 5 ++- src/gallium/auxiliary/util/u_blit.c| 3 ++ src/gallium/docs/source/context.rst| 3 ++ src/gallium/docs/source/screen.rst | 3 ++ src/gallium/docs/source/tgsi.rst | 20 src/gallium/drivers/freedreno/freedreno_screen.c | 1 + src/gallium/drivers/i915/i915_screen.c | 1 + src/gallium/drivers/ilo/ilo_screen.c | 1 + src/gallium/drivers/llvmpipe/lp_screen.c | 1 + src/gallium/drivers/nouveau/codegen/nv50_ir.h | 8 + .../drivers/nouveau/codegen/nv50_ir_driver.h | 3 +- .../drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp | 13 .../drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp | 14 + .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 10 +- .../nouveau/codegen/nv50_ir_lowering_nv50.cpp | 12 .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp | 21 + .../drivers/nouveau/codegen/nv50_ir_print.cpp | 2 ++ .../drivers/nouveau/codegen/nv50_ir_target.cpp | 6 ++-- .../nouveau/codegen/nv50_ir_target_nv50.cpp| 2 ++ .../nouveau/codegen/nv50_ir_target_nvc0.cpp| 2 ++ src/gallium/drivers/nouveau/nv30/nv30_screen.c | 1 + src/gallium/drivers/nouveau/nv50/nv50_context.h| 7 - src/gallium/drivers/nouveau/nv50/nv50_program.c| 6 +++- src/gallium/drivers/nouveau/nv50/nv50_program.h| 2 ++ src/gallium/drivers/nouveau/nv50/nv50_screen.c | 1 + .../drivers/nouveau/nv50/nv50_shader_state.c | 13 src/gallium/drivers/nouveau/nv50/nv50_state.c | 12 .../drivers/nouveau/nv50/nv50_state_validate.c | 36 -- src/gallium/drivers/nouveau/nv50/nv50_surface.c| 11 ++- src/gallium/drivers/nouveau/nvc0/nvc0_context.h| 2 ++ src/gallium/drivers/nouveau/nvc0/nvc0_program.c| 17 +++--- src/gallium/drivers/nouveau/nvc0/nvc0_program.h| 1 + src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 1 + .../drivers/nouveau/nvc0/nvc0_shader_state.c | 2 ++ src/gallium/drivers/nouveau/nvc0/nvc0_state.c | 12 .../drivers/nouveau/nvc0/nvc0_state_validate.c | 34 +--- src/gallium/drivers/nouveau/nvc0/nvc0_surface.c| 11 ++- src/gallium/drivers/r300/r300_screen.c | 1 + src/gallium/drivers/r600/r600_pipe.c | 1 + src/gallium/drivers/radeonsi/si_pipe.c | 1 + src/gallium/drivers/softpipe/sp_screen.c | 1 + src/gallium/drivers/svga/svga_screen.c | 1 + src/gallium/include/pipe/p_context.h | 3 ++ src/gallium/include/pipe/p_defines.h | 1 + src/gallium/include/pipe/p_shader_tokens.h | 5 ++- src/mesa/state_tracker/st_atom.c
[Mesa-dev] [PATCH 1/4] gallium: add basic support for ARB_sample_shading
--- src/gallium/auxiliary/tgsi/tgsi_strings.c| 5 - src/gallium/docs/source/context.rst | 1 + src/gallium/docs/source/screen.rst | 3 +++ src/gallium/docs/source/tgsi.rst | 20 src/gallium/drivers/freedreno/freedreno_screen.c | 1 + src/gallium/drivers/i915/i915_screen.c | 1 + src/gallium/drivers/ilo/ilo_screen.c | 1 + src/gallium/drivers/llvmpipe/lp_screen.c | 1 + src/gallium/drivers/nouveau/nv30/nv30_screen.c | 1 + src/gallium/drivers/nouveau/nv50/nv50_screen.c | 1 + src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 1 + src/gallium/drivers/r300/r300_screen.c | 1 + src/gallium/drivers/r600/r600_pipe.c | 1 + src/gallium/drivers/radeonsi/si_pipe.c | 1 + src/gallium/drivers/softpipe/sp_screen.c | 1 + src/gallium/drivers/svga/svga_screen.c | 1 + src/gallium/include/pipe/p_context.h | 3 +++ src/gallium/include/pipe/p_defines.h | 1 + src/gallium/include/pipe/p_shader_tokens.h | 5 - 19 files changed, 48 insertions(+), 2 deletions(-) diff --git a/src/gallium/auxiliary/tgsi/tgsi_strings.c b/src/gallium/auxiliary/tgsi/tgsi_strings.c index b0ba3ef..2be726c 100644 --- a/src/gallium/auxiliary/tgsi/tgsi_strings.c +++ b/src/gallium/auxiliary/tgsi/tgsi_strings.c @@ -81,7 +81,10 @@ const char *tgsi_semantic_names[TGSI_SEMANTIC_COUNT] = "PCOORD", "VIEWPORT_INDEX", "LAYER", - "CULLDIST" + "CULLDIST", + "SAMPLEID", + "SAMPLEPOS", + "SAMPLEMASK" }; const char *tgsi_texture_names[TGSI_TEXTURE_COUNT] = diff --git a/src/gallium/docs/source/context.rst b/src/gallium/docs/source/context.rst index 1fc8a3e..fc8dd16 100644 --- a/src/gallium/docs/source/context.rst +++ b/src/gallium/docs/source/context.rst @@ -67,6 +67,7 @@ objects. They all follow simple, one-method binding calls, e.g. which are used as comparison values in stencil test. * ``set_blend_color`` * ``set_sample_mask`` +* ``set_min_samples`` sets the minimum number of samples that must be run. * ``set_clip_state`` * ``set_polygon_stipple`` * ``set_scissor_states`` sets the bounds for the scissor test, which culls diff --git a/src/gallium/docs/source/screen.rst b/src/gallium/docs/source/screen.rst index 89cbdbf..f5acebb 100644 --- a/src/gallium/docs/source/screen.rst +++ b/src/gallium/docs/source/screen.rst @@ -197,6 +197,9 @@ The integer capabilities: in conjunction with a texture gather opcode. * ``PIPE_CAP_MAX_TEXTURE_GATHER_OFFSET``: The maximum offset that can be used in conjunction with a texture gather opcode. +* ``PIPE_CAP_SAMPLE_SHADING``: Whether there is support for per-sample + shading. The context->set_min_samples function will be expected to be + implemented. .. _pipe_capf: diff --git a/src/gallium/docs/source/tgsi.rst b/src/gallium/docs/source/tgsi.rst index d5325f4..b7d016a 100644 --- a/src/gallium/docs/source/tgsi.rst +++ b/src/gallium/docs/source/tgsi.rst @@ -2621,6 +2621,26 @@ distances and by the PIPE_MAX_CLIP_OR_CULL_DISTANCE_ELEMENT_COUNT which specifies the maximum number of registers which can be annotated with those semantics. +TGSI_SEMANTIC_SAMPLEID +"" + +For fragment shaders, this semantic label indicates that a system value +contains the current sample id (i.e. gl_SampleID). Only the X value is used. + +TGSI_SEMANTIC_SAMPLEPOS +""" + +For fragment shaders, this semantic label indicates that a system value +contains the current sample's position (i.e. gl_SamplePosition). Only the X +and Y values are used. + +TGSI_SEMANTIC_SAMPLEMASK + + +For fragment shaders, this semantic label indicates that an output contains +the sample mask used to disable further sample processing +(i.e. gl_SampleMask). Only the X value is used, up to 32x MS. + Declaration Interpolate ^^^ diff --git a/src/gallium/drivers/freedreno/freedreno_screen.c b/src/gallium/drivers/freedreno/freedreno_screen.c index 08556a4..d62d4b6 100644 --- a/src/gallium/drivers/freedreno/freedreno_screen.c +++ b/src/gallium/drivers/freedreno/freedreno_screen.c @@ -208,6 +208,7 @@ fd_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param) case PIPE_CAP_BUFFER_MAP_PERSISTENT_COHERENT: case PIPE_CAP_FAKE_SW_MSAA: case PIPE_CAP_TEXTURE_QUERY_LOD: +case PIPE_CAP_SAMPLE_SHADING: return 0; /* Stream output. */ diff --git a/src/gallium/drivers/i915/i915_screen.c b/src/gallium/drivers/i915/i915_screen.c index b484d36..8ee4330 100644 --- a/src/gallium/drivers/i915/i915_screen.c +++ b/src/gallium/drivers/i915/i915_screen.c @@ -220,6 +220,7 @@ i915_get_param(struct pipe_screen *screen, enum pipe_cap cap) case PIPE_CAP_TEXTURE_GATHER_SM5: case PIPE_CAP_FAKE_SW_MSAA: case PIPE_CAP_TEXTURE_QUERY_LOD: + case PIPE_CAP_SAMPLE_SHADING: return 0; c
[Mesa-dev] [PATCH 2/4] mesa/st: add support for ARB_sample_shading
--- src/gallium/auxiliary/cso_cache/cso_context.c | 19 +++ src/gallium/auxiliary/cso_cache/cso_context.h | 4 src/gallium/auxiliary/hud/hud_context.c | 3 +++ src/gallium/auxiliary/postprocess/pp_run.c| 3 +++ src/gallium/auxiliary/util/u_blit.c | 3 +++ src/mesa/state_tracker/st_atom.c | 1 + src/mesa/state_tracker/st_atom.h | 1 + src/mesa/state_tracker/st_atom_msaa.c | 23 +++ src/mesa/state_tracker/st_cb_clear.c | 3 +++ src/mesa/state_tracker/st_extensions.c| 3 ++- src/mesa/state_tracker/st_glsl_to_tgsi.cpp| 16 ++-- src/mesa/state_tracker/st_program.c | 21 + 12 files changed, 89 insertions(+), 11 deletions(-) diff --git a/src/gallium/auxiliary/cso_cache/cso_context.c b/src/gallium/auxiliary/cso_cache/cso_context.c index dd0e3df..47c40a5 100644 --- a/src/gallium/auxiliary/cso_cache/cso_context.c +++ b/src/gallium/auxiliary/cso_cache/cso_context.c @@ -120,6 +120,7 @@ struct cso_context { struct pipe_viewport_state vp, vp_saved; struct pipe_blend_color blend_color; unsigned sample_mask, sample_mask_saved; + unsigned min_samples, min_samples_saved; struct pipe_stencil_ref stencil_ref, stencil_ref_saved; }; @@ -716,6 +717,24 @@ void cso_restore_sample_mask(struct cso_context *ctx) cso_set_sample_mask(ctx, ctx->sample_mask_saved); } +void cso_set_min_samples(struct cso_context *ctx, unsigned min_samples) +{ + if (ctx->min_samples != min_samples && ctx->pipe->set_min_samples) { + ctx->min_samples = min_samples; + ctx->pipe->set_min_samples(ctx->pipe, min_samples); + } +} + +void cso_save_min_samples(struct cso_context *ctx) +{ + ctx->min_samples_saved = ctx->min_samples; +} + +void cso_restore_min_samples(struct cso_context *ctx) +{ + cso_set_min_samples(ctx, ctx->min_samples_saved); +} + void cso_set_stencil_ref(struct cso_context *ctx, const struct pipe_stencil_ref *sr) { diff --git a/src/gallium/auxiliary/cso_cache/cso_context.h b/src/gallium/auxiliary/cso_cache/cso_context.h index 1aa9998..f0a08bb 100644 --- a/src/gallium/auxiliary/cso_cache/cso_context.h +++ b/src/gallium/auxiliary/cso_cache/cso_context.h @@ -164,6 +164,10 @@ void cso_set_sample_mask(struct cso_context *cso, unsigned stencil_mask); void cso_save_sample_mask(struct cso_context *ctx); void cso_restore_sample_mask(struct cso_context *ctx); +void cso_set_min_samples(struct cso_context *cso, unsigned min_samples); +void cso_save_min_samples(struct cso_context *ctx); +void cso_restore_min_samples(struct cso_context *ctx); + void cso_set_stencil_ref(struct cso_context *cso, const struct pipe_stencil_ref *sr); void cso_save_stencil_ref(struct cso_context *cso); diff --git a/src/gallium/auxiliary/hud/hud_context.c b/src/gallium/auxiliary/hud/hud_context.c index ccf020b..b6e0184 100644 --- a/src/gallium/auxiliary/hud/hud_context.c +++ b/src/gallium/auxiliary/hud/hud_context.c @@ -412,6 +412,7 @@ hud_draw(struct hud_context *hud, struct pipe_resource *tex) cso_save_framebuffer(cso); cso_save_sample_mask(cso); + cso_save_min_samples(cso); cso_save_blend(cso); cso_save_depth_stencil_alpha(cso); cso_save_fragment_shader(cso); @@ -450,6 +451,7 @@ hud_draw(struct hud_context *hud, struct pipe_resource *tex) cso_set_framebuffer(cso, &fb); cso_set_sample_mask(cso, ~0); + cso_set_min_samples(cso, 1); cso_set_blend(cso, &hud->alpha_blend); cso_set_depth_stencil_alpha(cso, &hud->dsa); cso_set_rasterizer(cso, &hud->rasterizer); @@ -538,6 +540,7 @@ hud_draw(struct hud_context *hud, struct pipe_resource *tex) /* restore states */ cso_restore_framebuffer(cso); cso_restore_sample_mask(cso); + cso_restore_min_samples(cso); cso_restore_blend(cso); cso_restore_depth_stencil_alpha(cso); cso_restore_fragment_shader(cso); diff --git a/src/gallium/auxiliary/postprocess/pp_run.c b/src/gallium/auxiliary/postprocess/pp_run.c index 7d9330c..06281c8 100644 --- a/src/gallium/auxiliary/postprocess/pp_run.c +++ b/src/gallium/auxiliary/postprocess/pp_run.c @@ -122,6 +122,7 @@ pp_run(struct pp_queue_t *ppq, struct pipe_resource *in, cso_save_geometry_shader(cso); cso_save_rasterizer(cso); cso_save_sample_mask(cso); + cso_save_min_samples(cso); cso_save_samplers(cso, PIPE_SHADER_FRAGMENT); cso_save_sampler_views(cso, PIPE_SHADER_FRAGMENT); cso_save_stencil_ref(cso); @@ -136,6 +137,7 @@ pp_run(struct pp_queue_t *ppq, struct pipe_resource *in, /* set default state */ cso_set_sample_mask(cso, ~0); + cso_set_min_samples(cso, 1); cso_set_stream_outputs(cso, 0, NULL, NULL); cso_set_geometry_shader_handle(cso, NULL); cso_set_render_condition(cso, NULL, FALSE, 0); @@ -187,6 +189,7 @@ pp_run(struct pp_queue_t *ppq, struct pipe_resource *in, cso_restore_geometry_shader(cso); cso_res
[Mesa-dev] [PATCH 4/4] nvc0: add support for PIPE_CAP_SAMPLE_SHADING
Signed-off-by: Ilia Mirkin --- src/gallium/drivers/nouveau/codegen/nv50_ir.h | 7 + .../drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp | 13 + .../drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp | 14 + .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp | 21 + .../drivers/nouveau/codegen/nv50_ir_print.cpp | 1 + .../drivers/nouveau/codegen/nv50_ir_target.cpp | 6 ++-- .../nouveau/codegen/nv50_ir_target_nvc0.cpp| 2 ++ src/gallium/drivers/nouveau/nvc0/nvc0_context.h| 2 ++ src/gallium/drivers/nouveau/nvc0/nvc0_program.c| 17 +++ src/gallium/drivers/nouveau/nvc0/nvc0_program.h| 1 + src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 2 +- .../drivers/nouveau/nvc0/nvc0_shader_state.c | 2 ++ src/gallium/drivers/nouveau/nvc0/nvc0_state.c | 12 .../drivers/nouveau/nvc0/nvc0_state_validate.c | 34 +++--- src/gallium/drivers/nouveau/nvc0/nvc0_surface.c| 11 ++- 15 files changed, 131 insertions(+), 14 deletions(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir.h b/src/gallium/drivers/nouveau/codegen/nv50_ir.h index 96071be..56b0115 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir.h +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir.h @@ -136,6 +136,7 @@ enum operation OP_DFDY, OP_RDSV, // read system value OP_WRSV, // write system value + OP_PIXLD, // get info about raster object or surfaces OP_QUADOP, OP_QUADON, OP_QUADPOP, @@ -214,6 +215,12 @@ enum operation #define NV50_IR_SUBOP_SUCLAMP_SD(r, d) (( 0 + (r)) | ((d == 2) ? 0x10 : 0)) #define NV50_IR_SUBOP_SUCLAMP_PL(r, d) (( 5 + (r)) | ((d == 2) ? 0x10 : 0)) #define NV50_IR_SUBOP_SUCLAMP_BL(r, d) ((10 + (r)) | ((d == 2) ? 0x10 : 0)) +#define NV50_IR_SUBOP_PIXLD_COUNT 0 +#define NV50_IR_SUBOP_PIXLD_COVMASK 1 +#define NV50_IR_SUBOP_PIXLD_COVERED 2 +#define NV50_IR_SUBOP_PIXLD_OFFSET 3 +#define NV50_IR_SUBOP_PIXLD_CENT_OFFSET 4 +#define NV50_IR_SUBOP_PIXLD_SAMPLEID5 #define NV50_IR_SUBOP_MADSP_SD 0x // Yes, we could represent those with DataType. // Or put the type into operation and have a couple 1000 values in that enum. diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp index a4b50ee..c258b6b 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp @@ -113,6 +113,8 @@ private: void emitQUADOP(const Instruction *, uint8_t qOp, uint8_t laneMask); + void emitPIXLD(const Instruction *); + void emitFlow(const Instruction *); inline void defId(const ValueDef&, const int pos); @@ -1130,6 +1132,14 @@ CodeEmitterGK110::emitQUADOP(const Instruction *i, uint8_t qOp, uint8_t laneMask } void +CodeEmitterGK110::emitPIXLD(const Instruction *i) +{ + emitForm_L(i, 0x7f4, 2, Modifier(0)); + code[1] |= i->subOp << 2; + code[1] |= 0x0007; +} + +void CodeEmitterGK110::emitFlow(const Instruction *i) { const FlowInstruction *f = i->asFlow(); @@ -1684,6 +1694,9 @@ CodeEmitterGK110::emitInstruction(Instruction *insn) case OP_TEXBAR: emitTEXBAR(insn); break; + case OP_PIXLD: + emitPIXLD(insn); + break; case OP_BRA: case OP_CALL: case OP_PRERET: diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp index d486c8d..cef92cf 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp @@ -135,6 +135,8 @@ private: void emitVSHL(const Instruction *); void emitVectorSubOp(const Instruction *); + void emitPIXLD(const Instruction *); + inline void defId(const ValueDef&, const int pos); inline void defId(const Instruction *, int d, const int pos); inline void srcId(const ValueRef&, const int pos); @@ -2141,6 +2143,15 @@ CodeEmitterNVC0::emitVSHL(const Instruction *i) code[1] |= 1 << 16; } +void +CodeEmitterNVC0::emitPIXLD(const Instruction *i) +{ + assert(i->encSize == 8); + emitForm_A(i, HEX64(1000, 0006)); + code[0] |= i->subOp << 5; + code[1] |= 0x00e0; +} + bool CodeEmitterNVC0::emitInstruction(Instruction *insn) { @@ -2390,6 +2401,9 @@ CodeEmitterNVC0::emitInstruction(Instruction *insn) case OP_VSHL: emitVSHL(insn); break; + case OP_PIXLD: + emitPIXLD(insn); + break; case OP_PHI: case OP_UNION: case OP_CONSTRAINT: diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp index 44b5ecd..ebdeee4 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp @@ -1426,6 +1426,27 @@ NVC0LoweringPass::handleRDSV(Instruction *i) bld.mkL
[Mesa-dev] [PATCH 3/4] nv50: add support for PIPE_CAP_SAMPLE_SHADING
--- src/gallium/drivers/nouveau/codegen/nv50_ir.h | 1 + .../drivers/nouveau/codegen/nv50_ir_driver.h | 3 +- .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 10 +- .../nouveau/codegen/nv50_ir_lowering_nv50.cpp | 12 .../drivers/nouveau/codegen/nv50_ir_print.cpp | 1 + .../nouveau/codegen/nv50_ir_target_nv50.cpp| 2 ++ src/gallium/drivers/nouveau/nv50/nv50_context.h| 7 - src/gallium/drivers/nouveau/nv50/nv50_program.c| 6 +++- src/gallium/drivers/nouveau/nv50/nv50_program.h| 2 ++ src/gallium/drivers/nouveau/nv50/nv50_screen.c | 2 +- .../drivers/nouveau/nv50/nv50_shader_state.c | 13 src/gallium/drivers/nouveau/nv50/nv50_state.c | 12 .../drivers/nouveau/nv50/nv50_state_validate.c | 36 -- src/gallium/drivers/nouveau/nv50/nv50_surface.c| 11 ++- 14 files changed, 110 insertions(+), 8 deletions(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir.h b/src/gallium/drivers/nouveau/codegen/nv50_ir.h index 286ac83..96071be 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir.h +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir.h @@ -354,6 +354,7 @@ enum SVSemantic SV_POINT_COORD, SV_CLIP_DISTANCE, SV_SAMPLE_INDEX, + SV_SAMPLE_POS, SV_TESS_FACTOR, SV_TESS_COORD, SV_TID, diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h b/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h index f2f4ead..2fe5591 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h @@ -74,7 +74,6 @@ struct nv50_ir_varying #define NV50_SEMANTIC_INVOCATIONID (TGSI_SEMANTIC_COUNT + 6) #define NV50_SEMANTIC_TESSFACTOR(TGSI_SEMANTIC_COUNT + 7) #define NV50_SEMANTIC_TESSCOORD (TGSI_SEMANTIC_COUNT + 8) -#define NV50_SEMANTIC_SAMPLEMASK(TGSI_SEMANTIC_COUNT + 9) #define NV50_SEMANTIC_COUNT (TGSI_SEMANTIC_COUNT + 10) #define NV50_TESS_PART_FRACT_ODD 0 @@ -181,12 +180,14 @@ struct nv50_ir_prog_info uint8_t edgeFlagOut; uint8_t fragDepth; /* output index of FragDepth */ uint8_t sampleMask;/* output index of SampleMask */ + boolean sampleInterp; /* perform sample interp on all fp inputs */ uint8_t backFaceColor[2]; /* input/output indices of back face colour */ uint8_t globalAccess; /* 1 for read, 2 for wr, 3 for rw */ boolean nv50styleSurfaces; /* generate gX[] access for raw buffers */ uint8_t resInfoCBSlot; /* cX[] used for tex handles, surface info */ uint16_t texBindBase; /* base address for tex handles (nve4) */ uint16_t suInfoBase; /* base address for surface info (nve4) */ + uint16_t sampleInfoBase; /* base address for sample positions */ uint8_t msInfoCBSlot; /* cX[] used for multisample info */ uint16_t msInfoBase; /* base address for multisample info */ } io; diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp index 90820ea..1fc54cb 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp @@ -346,6 +346,8 @@ static nv50_ir::SVSemantic translateSysVal(uint sysval) case TGSI_SEMANTIC_BLOCK_ID: return nv50_ir::SV_CTAID; case TGSI_SEMANTIC_BLOCK_SIZE: return nv50_ir::SV_NTID; case TGSI_SEMANTIC_THREAD_ID: return nv50_ir::SV_TID; + case TGSI_SEMANTIC_SAMPLEID: return nv50_ir::SV_SAMPLE_INDEX; + case TGSI_SEMANTIC_SAMPLEPOS: return nv50_ir::SV_SAMPLE_POS; default: assert(0); return nv50_ir::SV_CLOCK; @@ -925,7 +927,7 @@ bool Source::scanDeclaration(const struct tgsi_full_declaration *decl) default: break; } - if (decl->Interp.Centroid) + if (decl->Interp.Centroid || info->io.sampleInterp) info->in[i].centroid = 1; } } @@ -956,6 +958,9 @@ bool Source::scanDeclaration(const struct tgsi_full_declaration *decl) decl->Declaration.UsageMask << (si * 4); info->io.genUserClip = -1; break; + case TGSI_SEMANTIC_SAMPLEMASK: +info->io.sampleMask = i; +break; default: break; } @@ -972,6 +977,9 @@ bool Source::scanDeclaration(const struct tgsi_full_declaration *decl) case TGSI_SEMANTIC_VERTEXID: info->io.vertexId = first; break; + case TGSI_SEMANTIC_SAMPLEMASK: + info->io.sampleMask = first; + break; default: break; } diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp index 29f85cf..69e88e6 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv
[Mesa-dev] [PATCH 5/7] nvc0/ir: add support for SAMPLEMASK sysval
Signed-off-by: Ilia Mirkin --- src/gallium/drivers/nouveau/codegen/nv50_ir.h | 1 + src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 1 + src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp | 4 src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp | 1 + src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp | 1 + 5 files changed, 8 insertions(+) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir.h b/src/gallium/drivers/nouveau/codegen/nv50_ir.h index 56b0115..c57729e 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir.h +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir.h @@ -362,6 +362,7 @@ enum SVSemantic SV_CLIP_DISTANCE, SV_SAMPLE_INDEX, SV_SAMPLE_POS, + SV_SAMPLE_MASK, SV_TESS_FACTOR, SV_TESS_COORD, SV_TID, diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp index 0ec0b9a..2c302a7 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp @@ -348,6 +348,7 @@ static nv50_ir::SVSemantic translateSysVal(uint sysval) case TGSI_SEMANTIC_THREAD_ID: return nv50_ir::SV_TID; case TGSI_SEMANTIC_SAMPLEID: return nv50_ir::SV_SAMPLE_INDEX; case TGSI_SEMANTIC_SAMPLEPOS: return nv50_ir::SV_SAMPLE_POS; + case TGSI_SEMANTIC_SAMPLEMASK: return nv50_ir::SV_SAMPLE_MASK; default: assert(0); return nv50_ir::SV_CLOCK; diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp index ebdeee4..c7e9063 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp @@ -1447,6 +1447,10 @@ NVC0LoweringPass::handleRDSV(Instruction *i) off); break; } + case SV_SAMPLE_MASK: + ld = bld.mkOp1(OP_PIXLD, TYPE_U32, i->getDef(0), bld.mkImm(0)); + ld->subOp = NV50_IR_SUBOP_PIXLD_COVMASK; + break; default: if (prog->getType() == Program::TYPE_TESSELLATION_EVAL) vtx = bld.mkOp1v(OP_PFETCH, TYPE_U32, bld.getSSA(), bld.mkImm(0)); diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp index e74b25f..42013e5 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp @@ -255,6 +255,7 @@ static const char *SemanticStr[SV_LAST + 1] = "CLIP_DISTANCE", "SAMPLE_INDEX", "SAMPLE_POS", + "SAMPLE_MASK", "TESS_FACTOR", "TESS_COORD", "TID", diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp index 95ed849..c747f3e 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp @@ -284,6 +284,7 @@ TargetNVC0::getSVAddress(DataFile shaderFile, const Symbol *sym) const case SV_GRIDID: return kepler ? 0x18 : ~0; case SV_SAMPLE_INDEX: return 0; case SV_SAMPLE_POS: return 0; + case SV_SAMPLE_MASK:return 0; default: return 0x; } -- 1.8.3.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/7] gallium: add GS_INVOCATIONS property
Signed-off-by: Ilia Mirkin --- src/gallium/auxiliary/tgsi/tgsi_ureg.c | 9 + src/gallium/include/pipe/p_shader_tokens.h | 3 ++- 2 files changed, 11 insertions(+), 1 deletion(-) diff --git a/src/gallium/auxiliary/tgsi/tgsi_ureg.c b/src/gallium/auxiliary/tgsi/tgsi_ureg.c index 38cce58..2bf93ee 100644 --- a/src/gallium/auxiliary/tgsi/tgsi_ureg.c +++ b/src/gallium/auxiliary/tgsi/tgsi_ureg.c @@ -1468,6 +1468,14 @@ static void emit_decls( struct ureg_program *ureg ) ureg->property_gs_max_vertices); } + if (ureg->property_gs_invocations != ~0) { + assert(ureg->processor == TGSI_PROCESSOR_GEOMETRY); + + emit_property(ureg, +TGSI_PROPERTY_GS_INVOCATIONS, +ureg->property_gs_invocations); + } + if (ureg->property_fs_coord_origin) { assert(ureg->processor == TGSI_PROCESSOR_FRAGMENT); @@ -1757,6 +1765,7 @@ struct ureg_program *ureg_create( unsigned processor ) ureg->property_gs_input_prim = ~0; ureg->property_gs_output_prim = ~0; ureg->property_gs_max_vertices = ~0; + ureg->property_gs_invocations = ~0; ureg->free_temps = util_bitmask_create(); if (ureg->free_temps == NULL) diff --git a/src/gallium/include/pipe/p_shader_tokens.h b/src/gallium/include/pipe/p_shader_tokens.h index 1903d53..b537166 100644 --- a/src/gallium/include/pipe/p_shader_tokens.h +++ b/src/gallium/include/pipe/p_shader_tokens.h @@ -236,7 +236,8 @@ union tgsi_immediate_data #define TGSI_PROPERTY_FS_COLOR0_WRITES_ALL_CBUFS 5 #define TGSI_PROPERTY_FS_DEPTH_LAYOUT6 #define TGSI_PROPERTY_VS_PROHIBIT_UCPS 7 -#define TGSI_PROPERTY_COUNT 8 +#define TGSI_PROPERTY_GS_INVOCATIONS 8 +#define TGSI_PROPERTY_COUNT 9 struct tgsi_property { unsigned Type : 4; /**< TGSI_TOKEN_TYPE_PROPERTY */ -- 1.8.3.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 3/7] mesa/st: translate gl_SampleMaskIn to SAMPLEMASK semantic
Signed-off-by: Ilia Mirkin --- src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 1 + 1 file changed, 1 insertion(+) diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp index 0d69c70..ee8c54a 100644 --- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp @@ -4166,6 +4166,7 @@ static unsigned mesa_sysval_to_semantic[SYSTEM_VALUE_MAX] = { TGSI_SEMANTIC_INSTANCEID, TGSI_SEMANTIC_SAMPLEID, TGSI_SEMANTIC_SAMPLEPOS, + TGSI_SEMANTIC_SAMPLEMASK, }; /** -- 1.8.3.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/7] gallium: add INVOCATIONID semantic
Signed-off-by: Ilia Mirkin --- src/gallium/auxiliary/tgsi/tgsi_strings.c | 3 ++- src/gallium/docs/source/tgsi.rst | 6 ++ src/gallium/include/pipe/p_shader_tokens.h | 3 ++- 3 files changed, 10 insertions(+), 2 deletions(-) diff --git a/src/gallium/auxiliary/tgsi/tgsi_strings.c b/src/gallium/auxiliary/tgsi/tgsi_strings.c index 2be726c..5b6e47f 100644 --- a/src/gallium/auxiliary/tgsi/tgsi_strings.c +++ b/src/gallium/auxiliary/tgsi/tgsi_strings.c @@ -84,7 +84,8 @@ const char *tgsi_semantic_names[TGSI_SEMANTIC_COUNT] = "CULLDIST", "SAMPLEID", "SAMPLEPOS", - "SAMPLEMASK" + "SAMPLEMASK", + "INVOCATIONID", }; const char *tgsi_texture_names[TGSI_TEXTURE_COUNT] = diff --git a/src/gallium/docs/source/tgsi.rst b/src/gallium/docs/source/tgsi.rst index b7d016a..170f83e 100644 --- a/src/gallium/docs/source/tgsi.rst +++ b/src/gallium/docs/source/tgsi.rst @@ -2641,6 +2641,12 @@ For fragment shaders, this semantic label indicates that an output contains the sample mask used to disable further sample processing (i.e. gl_SampleMask). Only the X value is used, up to 32x MS. +TGSI_SEMANTIC_INVOCATIONID +"" + +For geometry shaders, this semantic label indicates that a system value +contains the current invocation id (i.e. gl_InvocationID). Only the X value is +used. Declaration Interpolate ^^^ diff --git a/src/gallium/include/pipe/p_shader_tokens.h b/src/gallium/include/pipe/p_shader_tokens.h index 74e0475..1903d53 100644 --- a/src/gallium/include/pipe/p_shader_tokens.h +++ b/src/gallium/include/pipe/p_shader_tokens.h @@ -170,7 +170,8 @@ struct tgsi_declaration_interp #define TGSI_SEMANTIC_SAMPLEID 24 #define TGSI_SEMANTIC_SAMPLEPOS 25 #define TGSI_SEMANTIC_SAMPLEMASK 26 -#define TGSI_SEMANTIC_COUNT 27 /**< number of semantic values */ +#define TGSI_SEMANTIC_INVOCATIONID 27 +#define TGSI_SEMANTIC_COUNT 28 /**< number of semantic values */ struct tgsi_declaration_semantic { -- 1.8.3.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 0/7] gallium: catch up with mesa's system values
This adds support for gl_SampleMaskIn and gl_InvocationID system values to gallium, mesa/st, and nvc0. The relevant piglit tests pass, except 2 gl_InvocationID-related ones that fail due to the linking being delayed by mesa/st (I think). I wrote a lame one for gl_SampleMaskIn that just makes sure that it's == 1 in a non-ms scenario, as nothing better seemed to be available. None of this functionality is actually available yet since it's all behind ARB_gpu_shader5. However you can test it out by adding MESA_EXTENSION_OVERRIDE=GL_ARB_gpu_shader5 Note that this patchset applies on top of my ARB_sample_shading patch because that one adds the earlier system values. Ilia Mirkin (7): gallium: add INVOCATIONID semantic gallium: add GS_INVOCATIONS property mesa/st: translate gl_SampleMaskIn to SAMPLEMASK semantic mesa/st: translate gl_InvocationID to INVOCATIONID semantic nvc0/ir: add support for SAMPLEMASK sysval nvc0/ir: add support for INVOCATIONID system value nvc0/ir: set instance count based on the GS_INSTANCES property src/gallium/auxiliary/tgsi/tgsi_strings.c | 3 ++- src/gallium/auxiliary/tgsi/tgsi_ureg.c| 9 + src/gallium/docs/source/tgsi.rst | 6 ++ src/gallium/drivers/nouveau/codegen/nv50_ir.h | 1 + src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 6 +++--- src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp | 4 src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp | 1 + src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp | 1 + src/gallium/drivers/nouveau/nvc0/nvc0_program.c | 1 - src/gallium/include/pipe/p_shader_tokens.h| 6 -- src/mesa/state_tracker/st_glsl_to_tgsi.cpp| 2 ++ 11 files changed, 33 insertions(+), 7 deletions(-) -- 1.8.3.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 6/7] nvc0/ir: add support for INVOCATIONID system value
Signed-off-by: Ilia Mirkin --- src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 1 + src/gallium/drivers/nouveau/nvc0/nvc0_program.c | 1 - 2 files changed, 1 insertion(+), 1 deletion(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp index 2c302a7..e076e72 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp @@ -349,6 +349,7 @@ static nv50_ir::SVSemantic translateSysVal(uint sysval) case TGSI_SEMANTIC_SAMPLEID: return nv50_ir::SV_SAMPLE_INDEX; case TGSI_SEMANTIC_SAMPLEPOS: return nv50_ir::SV_SAMPLE_POS; case TGSI_SEMANTIC_SAMPLEMASK: return nv50_ir::SV_SAMPLE_MASK; + case TGSI_SEMANTIC_INVOCATIONID: return nv50_ir::SV_INVOCATION_ID; default: assert(0); return nv50_ir::SV_CLOCK; diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_program.c b/src/gallium/drivers/nouveau/nvc0/nvc0_program.c index 1df84f3..1d59fc4 100644 --- a/src/gallium/drivers/nouveau/nvc0/nvc0_program.c +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_program.c @@ -51,7 +51,6 @@ nvc0_shader_input_address(unsigned sn, unsigned si, unsigned ubase) case TGSI_SEMANTIC_VERTEXID: return 0x2fc; case TGSI_SEMANTIC_TEXCOORD: return 0x300 + si * 0x10; case TGSI_SEMANTIC_FACE: return 0x3fc; - case NV50_SEMANTIC_INVOCATIONID: return ~0; default: assert(!"invalid TGSI input semantic"); return ~0; -- 1.8.3.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 4/7] mesa/st: translate gl_InvocationID to INVOCATIONID semantic
Signed-off-by: Ilia Mirkin --- src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 1 + 1 file changed, 1 insertion(+) diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp index ee8c54a..e87e761 100644 --- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp @@ -4167,6 +4167,7 @@ static unsigned mesa_sysval_to_semantic[SYSTEM_VALUE_MAX] = { TGSI_SEMANTIC_SAMPLEID, TGSI_SEMANTIC_SAMPLEPOS, TGSI_SEMANTIC_SAMPLEMASK, + TGSI_SEMANTIC_INVOCATIONID, }; /** -- 1.8.3.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 7/7] nvc0/ir: set instance count based on the GS_INVOCATIONS property
Signed-off-by: Ilia Mirkin --- src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp index e076e72..ef00a2f 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp @@ -829,11 +829,9 @@ void Source::scanProperty(const struct tgsi_full_property *prop) case TGSI_PROPERTY_GS_MAX_OUTPUT_VERTICES: info->prop.gp.maxVertices = prop->u[0].Data; break; -#if 0 - case TGSI_PROPERTY_GS_INSTANCE_COUNT: + case TGSI_PROPERTY_GS_INVOCATIONS: info->prop.gp.instanceCount = prop->u[0].Data; break; -#endif case TGSI_PROPERTY_FS_COLOR0_WRITES_ALL_CBUFS: info->prop.fp.separateFragData = TRUE; break; -- 1.8.3.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 10/19] i965/fs: Lower LOAD_PAYLOAD and clean up.
On 04/18/2014 11:56 AM, Matt Turner wrote: > Clean up with with register_coalesce()/dead_code_eliminate(). > --- > src/mesa/drivers/dri/i965/brw_fs.cpp | 37 > > src/mesa/drivers/dri/i965/brw_fs.h | 1 + > 2 files changed, 38 insertions(+) > > diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp > b/src/mesa/drivers/dri/i965/brw_fs.cpp > index e963ee8..602fc4a 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs.cpp > +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp > @@ -2635,6 +2635,38 @@ fs_visitor::lower_uniform_pull_constant_loads() > } > } > > +bool > +fs_visitor::lower_load_payload() > +{ > + bool progress = false; > + > + foreach_list_safe(node, &instructions) { > + fs_inst *inst = (fs_inst *)node; > + > + if (inst->opcode == SHADER_OPCODE_LOAD_PAYLOAD) { > + fs_reg dst = inst->dst; > + It would be great to have a comment here such as: /* src[0] represents the (optional) message header. */ It might also be worth adding a comment above the opcode definition in the previous patch explaining that src[0] is reserved for an optional message header, and could be BAD_FILE, while the rest of the parameters follow in src[1..n]. Notably, this is the first opcode where you can have a BAD_FILE early, and real parameters later (AFAIK). > + if (inst->src[0].file != BAD_FILE) { > +inst->insert_before(MOV(dst, inst->src[0])); > + } > + dst.reg_offset++; > + > + for (int i = 1; i < inst->sources; i++) { > +inst->insert_before(MOV(dst, inst->src[i])); > +dst.reg_offset++; > + } > + > + inst->remove(); > + progress = true; > + } > + } > + > + if (progress) > + invalidate_live_intervals(); > + > + return progress; > +} > + > void > fs_visitor::dump_instructions() > { > @@ -3085,6 +3117,11 @@ fs_visitor::run() >progress = compute_to_mrf() || progress; >} while (progress); > > + if (lower_load_payload()) { > + register_coalesce(); > + dead_code_eliminate(); > + } > + >lower_uniform_pull_constant_loads(); > >assign_curb_setup(); > diff --git a/src/mesa/drivers/dri/i965/brw_fs.h > b/src/mesa/drivers/dri/i965/brw_fs.h > index 4f1bdc6..c1445b8 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs.h > +++ b/src/mesa/drivers/dri/i965/brw_fs.h > @@ -392,6 +392,7 @@ public: > void fail(const char *msg, ...); > void no16(const char *msg, ...); > void lower_uniform_pull_constant_loads(); > + bool lower_load_payload(); > > void push_force_uncompressed(); > void pop_force_uncompressed(); > signature.asc Description: OpenPGP digital signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 11/19] i965/fs: Use LOAD_PAYLOAD in emit_texture_gen7().
On 04/18/2014 11:56 AM, Matt Turner wrote: > --- > src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 135 > +++ > 1 file changed, 73 insertions(+), 62 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp > b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp > index 2aa3acd..91bbe0a 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp > +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp > @@ -1257,8 +1257,11 @@ fs_visitor::emit_texture_gen7(ir_texture *ir, fs_reg > dst, fs_reg coordinate, > int reg_width = dispatch_width / 8; > bool header_present = false; > > - fs_reg payload = fs_reg(this, glsl_type::float_type); > - fs_reg next = payload; > + fs_reg *sources = ralloc_array(mem_ctx, fs_reg, MAX_SAMPLER_MESSAGE_SIZE); > + for (int i = 0; i < MAX_SAMPLER_MESSAGE_SIZE; i++) { > + sources[i] = fs_reg(this, glsl_type::float_type); > + } > + int length = 0; > > if (ir->op == ir_tg4 || (ir->offset && ir->op != ir_txf) || sampler >= > 16) { >/* For general texture offsets (no txf workaround), we need a header to > @@ -1272,12 +1275,13 @@ fs_visitor::emit_texture_gen7(ir_texture *ir, fs_reg > dst, fs_reg coordinate, > * need to offset the Sampler State Pointer in the header. > */ >header_present = true; > - next.reg_offset++; > + sources[length] = reg_undef; > + length++; > } So...if you don't take the header_present = true path...then it looks like the next thing (say, shadow_c) will land in sources[0], rather than leaving sources[0] as BAD_FILE and putting the first part of the payload in sources[1]. Is sources[0] special and reserved for the header or isn't it? > > if (ir->shadow_comparitor) { > - emit(MOV(next, shadow_c)); > - next.reg_offset++; > + emit(MOV(sources[length], shadow_c)); I'm confused by the fact that these MOVs are still here. I would have expected: sources[length] = sample_c; When we visit expression trees, we generate results into various registers. The point of these MOVs is to put them into a large-VGRF we can SEND from, at the right ref_offset. Now, you have a new set of MAX_SAMPLER_MESSAGE_SIZE registers, and copy from the original registers to these new temporaries, then LOAD_PAYLOAD does a second copy from those into the new ones. It seems like we could just use the original registers and do a single copy. I'm probably missing something here, but I can't think of what. > + length++; > } > > bool has_nonconstant_offset = ir->offset && !ir->offset->as_constant(); > @@ -1289,12 +1293,12 @@ fs_visitor::emit_texture_gen7(ir_texture *ir, fs_reg > dst, fs_reg coordinate, > case ir_lod: >break; > case ir_txb: > - emit(MOV(next, lod)); > - next.reg_offset++; > + emit(MOV(sources[length], lod)); > + length++; >break; > case ir_txl: > - emit(MOV(next, lod)); > - next.reg_offset++; > + emit(MOV(sources[length], lod)); > + length++; >break; > case ir_txd: { >no16("Gen7 does not support sample_d/sample_d_c in SIMD16 mode."); > @@ -1303,21 +1307,21 @@ fs_visitor::emit_texture_gen7(ir_texture *ir, fs_reg > dst, fs_reg coordinate, > * [hdr], [ref], x, dPdx.x, dPdy.x, y, dPdx.y, dPdy.y, z, dPdx.z, > dPdy.z > */ >for (int i = 0; i < ir->coordinate->type->vector_elements; i++) { > - emit(MOV(next, coordinate)); > + emit(MOV(sources[length], coordinate)); >coordinate.reg_offset++; > - next.reg_offset++; > + length++; > > /* For cube map array, the coordinate is (u,v,r,ai) but there are >* only derivatives for (u, v, r). >*/ > if (i < ir->lod_info.grad.dPdx->type->vector_elements) { > -emit(MOV(next, lod)); > +emit(MOV(sources[length], lod)); > lod.reg_offset++; > -next.reg_offset++; > +length++; > > -emit(MOV(next, lod2)); > +emit(MOV(sources[length], lod2)); > lod2.reg_offset++; > -next.reg_offset++; > +length++; > } >} > > @@ -1325,45 +1329,45 @@ fs_visitor::emit_texture_gen7(ir_texture *ir, fs_reg > dst, fs_reg coordinate, >break; > } > case ir_txs: > - emit(MOV(retype(next, BRW_REGISTER_TYPE_UD), lod)); > - next.reg_offset++; > + emit(MOV(retype(sources[length], BRW_REGISTER_TYPE_UD), lod)); > + length++; >break; > case ir_query_levels: > - emit(MOV(retype(next, BRW_REGISTER_TYPE_UD), fs_reg(0u))); > - next.reg_offset++; > + emit(MOV(retype(sources[length], BRW_REGISTER_TYPE_UD), fs_reg(0u))); > + length++; >break; > case ir_txf: >/* Unfortunately, the parameters for LD are intermixed: u, lod, v, r. > */ > - emit(MOV(retype(next, BRW_REGISTER_TYPE_D), coordinate)); > + emit(MOV(retype(sources[lengt
Re: [Mesa-dev] [PATCH 13/19] i965/fs: Only consider real sources when comparing instructions.
On 04/18/2014 11:56 AM, Matt Turner wrote: > --- > src/mesa/drivers/dri/i965/brw_fs_cse.cpp | 19 +++ > 1 file changed, 15 insertions(+), 4 deletions(-) I originally thought this needed to go earlier in the patch series, since by this point you're emitting opcodes with more than 3 sources. However, CSE ignores SHADER_OPCODE_LOAD_PAYLOAD, so this code will never run for opcodes that could break. So, it's probably fine. > diff --git a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp > b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp > index 94f657d..e40567f 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp > +++ b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp > @@ -94,10 +94,20 @@ is_expression_commutative(enum opcode op) > } > > static bool > -operands_match(enum opcode op, fs_reg *xs, fs_reg *ys) > +operands_match(fs_inst *a, fs_inst *b) > { > - if (!is_expression_commutative(op)) { > - return xs[0].equals(ys[0]) && xs[1].equals(ys[1]) && > xs[2].equals(ys[2]); > + fs_reg *xs = a->src; > + fs_reg *ys = b->src; > + > + if (!is_expression_commutative(a->opcode)) { > + bool match = true; > + for (int i = 0; i < a->sources; i++) { > + if (!xs[i].equals(ys[i])) { > +match = false; > +break; > + } > + } > + return match; > } else { >return (xs[0].equals(ys[0]) && xs[1].equals(ys[1])) || > (xs[1].equals(ys[0]) && xs[0].equals(ys[1])); It strikes me as a bit asymmetric to have the first block handle an arbitrary number of sources, and the second only check 0 and 1. It makes sense, since our commutative opcodes are all binops, but... How about adding /* All commutative opcodes are binary operations. */ assert(a->sources == 2 && b->sources == 2); here in the commutative case? > @@ -113,7 +123,8 @@ instructions_match(fs_inst *a, fs_inst *b) >a->predicate_inverse == b->predicate_inverse && >a->conditional_mod == b->conditional_mod && >a->dst.type == b->dst.type && > - operands_match(a->opcode, a->src, b->src); > + a->sources == b->sources && > + operands_match(a, b); > } > > bool > signature.asc Description: OpenPGP digital signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 13/19] i965/fs: Only consider real sources when comparing instructions.
On Wed, Apr 23, 2014 at 11:25 PM, Kenneth Graunke wrote: > On 04/18/2014 11:56 AM, Matt Turner wrote: >> --- >> src/mesa/drivers/dri/i965/brw_fs_cse.cpp | 19 +++ >> 1 file changed, 15 insertions(+), 4 deletions(-) > > I originally thought this needed to go earlier in the patch series, > since by this point you're emitting opcodes with more than 3 sources. > However, CSE ignores SHADER_OPCODE_LOAD_PAYLOAD, so this code will never > run for opcodes that could break. So, it's probably fine. > >> diff --git a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp >> b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp >> index 94f657d..e40567f 100644 >> --- a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp >> +++ b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp >> @@ -94,10 +94,20 @@ is_expression_commutative(enum opcode op) >> } >> >> static bool >> -operands_match(enum opcode op, fs_reg *xs, fs_reg *ys) >> +operands_match(fs_inst *a, fs_inst *b) >> { >> - if (!is_expression_commutative(op)) { >> - return xs[0].equals(ys[0]) && xs[1].equals(ys[1]) && >> xs[2].equals(ys[2]); >> + fs_reg *xs = a->src; >> + fs_reg *ys = b->src; >> + >> + if (!is_expression_commutative(a->opcode)) { >> + bool match = true; >> + for (int i = 0; i < a->sources; i++) { >> + if (!xs[i].equals(ys[i])) { >> +match = false; >> +break; >> + } >> + } >> + return match; >> } else { >>return (xs[0].equals(ys[0]) && xs[1].equals(ys[1])) || >> (xs[1].equals(ys[0]) && xs[0].equals(ys[1])); > > It strikes me as a bit asymmetric to have the first block handle an > arbitrary number of sources, and the second only check 0 and 1. It > makes sense, since our commutative opcodes are all binops, but... > > How about adding > > /* All commutative opcodes are binary operations. */ > assert(a->sources == 2 && b->sources == 2); > > here in the commutative case? What is an example of a commutative non-binary operator? ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 03/19] i965/fs: Combine fs_inst constructors using default parameters.
On 04/18/2014 11:56 AM, Matt Turner wrote: > Wouldn't it be nice if case labels could be non-constant expressions. > --- > src/mesa/drivers/dri/i965/brw_fs.cpp | 77 > +--- > src/mesa/drivers/dri/i965/brw_fs.h | 10 ++--- > 2 files changed, 31 insertions(+), 56 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp > b/src/mesa/drivers/dri/i965/brw_fs.cpp > index b0d6e4e..bb2d290 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs.cpp > +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp > @@ -74,69 +74,46 @@ fs_inst::fs_inst() > this->opcode = BRW_OPCODE_NOP; > } > > -fs_inst::fs_inst(enum opcode opcode) > +fs_inst::fs_inst(enum opcode opcode, const fs_reg &dst, > + const fs_reg &src0, const fs_reg &src1, const fs_reg &src2) > { > - init(); > - this->opcode = opcode; > -} > + if (&dst == ®_undef) > + assert(&src0 == ®_undef); > + if (&src0 == ®_undef) > + assert(&src1 == ®_undef); > + if (&src1 == ®_undef) > + assert(&src2 == ®_undef); > > -fs_inst::fs_inst(enum opcode opcode, fs_reg dst) > -{ > init(); > - this->opcode = opcode; > - this->dst = dst; > > - if (dst.file == GRF) > - assert(dst.reg_offset >= 0); > -} > + if (&src2 != ®_undef) { > + goto src2; > + } else if (&src1 != ®_undef) { > + goto src1; > + } else if (&src0 != ®_undef) { > + goto src0; > + } else if (&dst != ®_undef) { > + goto dst; > + } > > -fs_inst::fs_inst(enum opcode opcode, fs_reg dst, fs_reg src0) > -{ > - init(); > - this->opcode = opcode; > - this->dst = dst; > +src2: > + this->src[2] = src2; > + if (src[2].file == GRF) > + assert(src[2].reg_offset >= 0); > +src1: > + this->src[1] = src1; > + if (src[1].file == GRF) > + assert(src[1].reg_offset >= 0); > +src0: > this->src[0] = src0; > - > - if (dst.file == GRF) > - assert(dst.reg_offset >= 0); > if (src[0].file == GRF) >assert(src[0].reg_offset >= 0); > -} > - > -fs_inst::fs_inst(enum opcode opcode, fs_reg dst, fs_reg src0, fs_reg src1) > -{ > - init(); > - this->opcode = opcode; > +dst: > this->dst = dst; > - this->src[0] = src0; > - this->src[1] = src1; > - > if (dst.file == GRF) >assert(dst.reg_offset >= 0); > - if (src[0].file == GRF) > - assert(src[0].reg_offset >= 0); > - if (src[1].file == GRF) > - assert(src[1].reg_offset >= 0); > -} > > -fs_inst::fs_inst(enum opcode opcode, fs_reg dst, > - fs_reg src0, fs_reg src1, fs_reg src2) > -{ > - init(); > this->opcode = opcode; > - this->dst = dst; > - this->src[0] = src0; > - this->src[1] = src1; > - this->src[2] = src2; > - > - if (dst.file == GRF) > - assert(dst.reg_offset >= 0); > - if (src[0].file == GRF) > - assert(src[0].reg_offset >= 0); > - if (src[1].file == GRF) > - assert(src[1].reg_offset >= 0); > - if (src[2].file == GRF) > - assert(src[2].reg_offset >= 0); > } > > fs_inst::fs_inst(const fs_inst &that) > diff --git a/src/mesa/drivers/dri/i965/brw_fs.h > b/src/mesa/drivers/dri/i965/brw_fs.h > index 8af4520..8e2af4f 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs.h > +++ b/src/mesa/drivers/dri/i965/brw_fs.h > @@ -192,12 +192,10 @@ public: > void init(); > > fs_inst(); > - fs_inst(enum opcode opcode); > - fs_inst(enum opcode opcode, fs_reg dst); > - fs_inst(enum opcode opcode, fs_reg dst, fs_reg src0); > - fs_inst(enum opcode opcode, fs_reg dst, fs_reg src0, fs_reg src1); > - fs_inst(enum opcode opcode, fs_reg dst, > - fs_reg src0, fs_reg src1,fs_reg src2); > + fs_inst(enum opcode opcode, const fs_reg &dst = reg_undef, > + const fs_reg &src0 = reg_undef, > + const fs_reg &src1 = reg_undef, > + const fs_reg &src2 = reg_undef); > fs_inst(const fs_inst &that); > > bool equals(fs_inst *inst) const; > I really don't like this code...lots if if, goto, and reliance on pointer comparisons with global variables. What about instead making init() look like your array constructor, and building everything on top of that? i.e. void fs_inst::init(enum opcode opcode, const fs_reg &dst, int sources = 1, fs_reg *src = NULL) { memset(this, 0, sizeof(*this)); this->sources = sources; if (src) { this->src = src; } else { this->src = ralloc_array(this, fs_reg, sources); for (int i = 0; i < sources; i++) this->src[i] = reg_undef; } this->conditional_mod = BRW_CONDITIONAL_NONE; this->dst = dst; /* This will be the case for almost all instructions. */ this->regs_written = 1; this->writes_accumulator = false; } fs_inst::fs_inst() { init(BRW_OPCODE_NOP, reg_undef); } fs_inst::fs_inst(enum opcode opcode, const fs_reg &dst) { init(opcode, dst); } fs_inst::fs_inst(enum opcode opcode, const fs_reg &dst, const fs_reg &src0) { fs_reg *src = ralloc_array(this, fs_reg, 1); src[0] = src0; init(opcode, d
Re: [Mesa-dev] [PATCH 13/19] i965/fs: Only consider real sources when comparing instructions.
On 04/23/2014 11:39 PM, Matt Turner wrote: > On Wed, Apr 23, 2014 at 11:25 PM, Kenneth Graunke > wrote: >> On 04/18/2014 11:56 AM, Matt Turner wrote: >>> --- >>> src/mesa/drivers/dri/i965/brw_fs_cse.cpp | 19 +++ >>> 1 file changed, 15 insertions(+), 4 deletions(-) >> >> I originally thought this needed to go earlier in the patch series, >> since by this point you're emitting opcodes with more than 3 sources. >> However, CSE ignores SHADER_OPCODE_LOAD_PAYLOAD, so this code will never >> run for opcodes that could break. So, it's probably fine. >> >>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp >>> b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp >>> index 94f657d..e40567f 100644 >>> --- a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp >>> +++ b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp >>> @@ -94,10 +94,20 @@ is_expression_commutative(enum opcode op) >>> } >>> >>> static bool >>> -operands_match(enum opcode op, fs_reg *xs, fs_reg *ys) >>> +operands_match(fs_inst *a, fs_inst *b) >>> { >>> - if (!is_expression_commutative(op)) { >>> - return xs[0].equals(ys[0]) && xs[1].equals(ys[1]) && >>> xs[2].equals(ys[2]); >>> + fs_reg *xs = a->src; >>> + fs_reg *ys = b->src; >>> + >>> + if (!is_expression_commutative(a->opcode)) { >>> + bool match = true; >>> + for (int i = 0; i < a->sources; i++) { >>> + if (!xs[i].equals(ys[i])) { >>> +match = false; >>> +break; >>> + } >>> + } >>> + return match; >>> } else { >>>return (xs[0].equals(ys[0]) && xs[1].equals(ys[1])) || >>> (xs[1].equals(ys[0]) && xs[0].equals(ys[1])); >> >> It strikes me as a bit asymmetric to have the first block handle an >> arbitrary number of sources, and the second only check 0 and 1. It >> makes sense, since our commutative opcodes are all binops, but... >> >> How about adding >> >> /* All commutative opcodes are binary operations. */ >> assert(a->sources == 2 && b->sources == 2); >> >> here in the commutative case? > > What is an example of a commutative non-binary operator? I don't think we have any in i965, but things like min3/max3/mid3 are three-source operations that are "commutative". At any rate, adding the assertion makes it obvious that this code is correct by design, and we didn't just forget to update it for arbitrary-length source lists...I don't think it's an unreasonable request... signature.asc Description: OpenPGP digital signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev