[Mesa-dev] [PATCH v2 0/2] nvc0: support for GK20A (Tegra K1)
The following 2 patches make it possible to run Mesa programs on GK20A (Tegra K1). GK20A is very similar to GK104, but uses a new (backward-compatible) 3D class as well as the same ISA as GK110 (SM35). Taking these differences into account is sufficient to successfully render simple off-screen buffers.

Changes since v1:
- Update TargetNVC0::getFileSize() to return the right number of GPR
- Remove definition for unneeded NVISA_GK110_CHIPSET
- Use consistent comparison scheme in nv50_ir_emit_nvc0.cpp

Alexandre Courbot (2):
  nvc0: add GK20A 3D class
  nvc0: use SM35 ISA with GK20A

 src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h      |  2 +-
 src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp |  2 +-
 .../drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp       | 15 ++++++++++-----
 src/gallium/drivers/nouveau/nv_object.xml.h               |  1 +
 src/gallium/drivers/nouveau/nvc0/nvc0_screen.c            |  9 ++++++++-
 5 files changed, 21 insertions(+), 8 deletions(-)

--
1.9.3

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2 2/2] nvc0: use SM35 ISA with GK20A
GK20A is mostly compatible with GK104, but uses the SM35 ISA. Use the GK110 path when this chip is detected.

Signed-off-by: Alexandre Courbot
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h      |  2 +-
 src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp |  2 +-
 .../drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp       | 15 ++++++++++-----
 3 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h b/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h
index bbb89d97932e..f829aac0bcc2 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h
@@ -91,7 +91,7 @@ struct nv50_ir_prog_symbol
 #define NVISA_GF100_CHIPSET_C0 0xc0
 #define NVISA_GF100_CHIPSET_D0 0xd0
 #define NVISA_GK104_CHIPSET    0xe0
-#define NVISA_GK110_CHIPSET    0xf0
+#define NVISA_GK20A_CHIPSET    0xea
 #define NVISA_GM107_CHIPSET    0x110

 struct nv50_ir_prog_info
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
index b1f76cf80432..f69e6a183e19 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
@@ -3027,7 +3027,7 @@ TargetNVC0::createCodeEmitterNVC0(Program::Type type)
 CodeEmitter *
 TargetNVC0::getCodeEmitter(Program::Type type)
 {
-   if (chipset >= NVISA_GK110_CHIPSET)
+   if (chipset >= NVISA_GK20A_CHIPSET)
       return createCodeEmitterGK110(type);
    return createCodeEmitterNVC0(type);
 }
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp
index 064e7a2c63f9..963b6e47ddfc 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp
@@ -49,9 +49,12 @@ TargetNVC0::getBuiltinCode(const uint32_t **code, uint32_t *size) const
 {
    switch (chipset & ~0xf) {
    case 0xe0:
-      *code = (const uint32_t *)&gk104_builtin_code[0];
-      *size = sizeof(gk104_builtin_code);
-      break;
+      if (chipset < NVISA_GK20A_CHIPSET) {
+         *code = (const uint32_t *)&gk104_builtin_code[0];
+         *size = sizeof(gk104_builtin_code);
+         break;
+      }
+      /* fall-through for GK20A */
    case 0xf0:
    case 0x100:
       *code = (const uint32_t *)&gk110_builtin_code[0];
@@ -71,7 +74,9 @@ TargetNVC0::getBuiltinOffset(int builtin) const

    switch (chipset & ~0xf) {
    case 0xe0:
-      return gk104_builtin_offsets[builtin];
+      if (chipset < NVISA_GK20A_CHIPSET)
+         return gk104_builtin_offsets[builtin];
+      /* fall-through for GK20A */
    case 0xf0:
    case 0x100:
       return gk110_builtin_offsets[builtin];
@@ -235,7 +240,7 @@ TargetNVC0::getFileSize(DataFile file) const
 {
    switch (file) {
    case FILE_NULL: return 0;
-   case FILE_GPR: return (chipset >= NVISA_GK110_CHIPSET) ? 255 : 63;
+   case FILE_GPR: return (chipset >= NVISA_GK20A_CHIPSET) ? 255 : 63;
    case FILE_PREDICATE: return 7;
    case FILE_FLAGS: return 1;
    case FILE_ADDRESS: return 0;
--
1.9.3
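
The effect of moving the threshold from 0xf0 down to 0xea can be checked in isolation. The following is only a sketch of the comparison scheme used in the patch above, not code from Mesa: the `Isa` enum and both helper functions are hypothetical names introduced for illustration, while the chipset values are the ones defined in nv50_ir_driver.h.

```cpp
#include <cassert>
#include <cstdint>

// Chipset IDs as defined in nv50_ir_driver.h after the patch.
constexpr uint32_t NVISA_GK104_CHIPSET = 0xe0;
constexpr uint32_t NVISA_GK20A_CHIPSET = 0xea;

// Hypothetical enum, for illustration only.
enum class Isa { NVC0, GK110 };

// Mirrors the comparison in TargetNVC0::getCodeEmitter(): everything from
// GK20A (0xea) upwards takes the GK110 (SM35) path.
inline Isa selectIsa(uint32_t chipset)
{
   return chipset >= NVISA_GK20A_CHIPSET ? Isa::GK110 : Isa::NVC0;
}

// Mirrors the FILE_GPR case of TargetNVC0::getFileSize().
inline unsigned gprCount(uint32_t chipset)
{
   return chipset >= NVISA_GK20A_CHIPSET ? 255 : 63;
}
```

With these helpers, a GK104 variant such as 0xe4 still selects the NVC0 emitter and 63 GPRs, while 0xea and 0xf0 both select the GK110 emitter and 255 GPRs. A single ordered comparison only works because no GK104-class chipset ID lies above 0xea.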
[Mesa-dev] [PATCH v2 1/2] nvc0: add GK20A 3D class
GK20A is mostly compatible with GK104, but features a new 3D class. Add it to the relevant header and use it when GK20A is detected.

Signed-off-by: Alexandre Courbot
---
 src/gallium/drivers/nouveau/nv_object.xml.h    | 1 +
 src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 9 ++++++++-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/nouveau/nv_object.xml.h b/src/gallium/drivers/nouveau/nv_object.xml.h
index 4c93e6564838..0a0e187dc028 100644
--- a/src/gallium/drivers/nouveau/nv_object.xml.h
+++ b/src/gallium/drivers/nouveau/nv_object.xml.h
@@ -190,6 +190,7 @@ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
 #define NVC8_3D_CLASS 0x9297
 #define NVE4_3D_CLASS 0xa097
 #define NVF0_3D_CLASS 0xa197
+#define NVEA_3D_CLASS 0xa297
 #define GM107_3D_CLASS 0xb097
 #define NV50_2D_CLASS 0x502d
 #define NVC0_2D_CLASS 0x902d
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
index cccfe2bba23d..95e5ef81cd79 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
@@ -702,7 +702,14 @@ nvc0_screen_create(struct nouveau_device *dev)
       obj_class = NVF0_3D_CLASS;
       break;
    case 0xe0:
-      obj_class = NVE4_3D_CLASS;
+      switch (dev->chipset) {
+      case 0xea:
+         obj_class = NVEA_3D_CLASS;
+         break;
+      default:
+         obj_class = NVE4_3D_CLASS;
+         break;
+      }
       break;
    case 0xd0:
       obj_class = NVC8_3D_CLASS;
--
1.9.3
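
As a rough illustration of the class selection in this patch, the nested switch can be written as a standalone function. This is a sketch under assumptions: `kepler3dClass` is a hypothetical name, the outer switch is assumed to mask the low nibble of the chipset (as the 0xe0 and 0xd0 case labels suggest), and only the Kepler cases visible in the diff are modelled.

```cpp
#include <cassert>
#include <cstdint>

// Class IDs as listed in nv_object.xml.h after the patch.
constexpr uint32_t NVE4_3D_CLASS = 0xa097;
constexpr uint32_t NVF0_3D_CLASS = 0xa197;
constexpr uint32_t NVEA_3D_CLASS = 0xa297;

// Hypothetical helper: returns the 3D class for the Kepler chipsets shown
// in the diff, or 0 for anything this sketch does not model.
inline uint32_t kepler3dClass(uint32_t chipset)
{
   switch (chipset & ~0xfu) {    // assumed: family is the chipset without its low nibble
   case 0xf0:
      return NVF0_3D_CLASS;
   case 0xe0:
      // GK20A shares the 0xe0 family with GK104 but needs its own class.
      return chipset == 0xea ? NVEA_3D_CLASS : NVE4_3D_CLASS;
   default:
      return 0;
   }
}
```

The inner test is needed precisely because GK20A (0xea) falls into the same masked family as GK104, so the family value alone cannot distinguish them.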
[Mesa-dev] [Bug 79230] After upgrade from 10.1.4 to 10.2-rc4 cross-compile fails
https://bugs.freedesktop.org/show_bug.cgi?id=79230

Andreas Boll changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |i...@freedesktop.org
           Keywords|                            |regression
             Blocks|                            |79039

--- Comment #1 from Andreas Boll ---
Can you bisect?

--
You are receiving this mail because:
You are the assignee for the bug.
[Mesa-dev] [Bug 79039] [TRACKER] Mesa 10.2 release tracker
https://bugs.freedesktop.org/show_bug.cgi?id=79039

Andreas Boll changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Depends on|                            |79230
[Mesa-dev] [Bug 78914] [llvmpipe] Front/Backfaces do not cover the same pixels when rasterized
https://bugs.freedesktop.org/show_bug.cgi?id=78914

--- Comment #16 from Florian Link ---
This is strange, since in my renderer I render all back faces to one FBO and all front faces to another FBO, so the front/back faces do not fight in the Z-buffer. I really do see missing pixels on faces that share edges: in one FBO the front triangle edge covers different pixels than the back triangle edge in the other FBO...

But maybe your code does something different when depth testing is on? I will try to adapt my example to show the problem without depth test.
[Mesa-dev] [Bug 78914] [llvmpipe] Front/Backfaces do not cover the same pixels when rasterized
https://bugs.freedesktop.org/show_bug.cgi?id=78914

--- Comment #17 from Florian Link ---
Ok, you are right, it only happens with depth test enabled. The strange thing is that it creates these artifacts in my ray caster, where I get exactly these holes even though both front/back faces have the same interpolated positions, so depth rejection should not create holes because another triangle should have rendered to that pixel first.

Anyway, I think the depth test should still not generate these rejected pixels, since it will create problems in e.g. depth peeling and other algorithms as well.
[Mesa-dev] [Bug 79294] New: Xlib-based build broken on non x86/x86-64 architectures
https://bugs.freedesktop.org/show_bug.cgi?id=79294

          Priority: medium
            Bug ID: 79294
          Assignee: mesa-dev@lists.freedesktop.org
           Summary: Xlib-based build broken on non x86/x86-64 architectures
          Severity: blocker
    Classification: Unclassified
                OS: All
          Reporter: andreas.boll@gmail.com
          Hardware: Other
            Status: NEW
           Version: git
         Component: Other
           Product: Mesa

Created attachment 99928
  --> https://bugs.freedesktop.org/attachment.cgi?id=99928&action=edit
Full build log of mesa-10.2.0rc4-powerpc

All non x86/x86-64 architectures are affected by the same issue,
e.g. armel, armhf, mips, mipsel, powerpc, s390x.

For more information see
https://buildd.debian.org/status/package.php?p=mesa&suite=experimental

libtool: compile: gcc -DPACKAGE_NAME=\"Mesa\" -DPACKAGE_TARNAME=\"mesa\" -DPACKAGE_VERSION=\"10.2.0-rc4\" "-DPACKAGE_STRING=\"Mesa 10.2.0-rc4\"" "-DPACKAGE_BUGREPORT=\"https://bugs.freedesktop.org/enter_bug.cgi?product=Mesa\"" -DPACKAGE_URL=\"\" -DPACKAGE=\"mesa\" -DVERSION=\"10.2.0-rc4\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DLT_OBJDIR=\".libs/\" -DYYTEXT_POINTER=1 -DHAVE___BUILTIN_BSWAP32=1 -DHAVE___BUILTIN_BSWAP64=1 -DHAVE_DLADDR=1 -DHAVE_CLOCK_GETTIME=1 -DHAVE_PTHREAD=1 -I. -I../../../../../src/mapi/glapi -D_GNU_SOURCE -DHAVE_PTHREAD -DHAVE_DLOPEN -DHAVE_POSIX_MEMALIGN -DHAVE_LIBDRM -DHAVE_LIBUDEV -DUSE_XSHM -DMESA_EGL_NO_X11_HEADERS -I../../../../../include -I../../../../../src/mapi -I../../../src/mapi -I../../../../../src/mesa -DMAPI_MODE_UTIL -D_FORTIFY_SOURCE=2 -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -Wall -Wall -std=c99 -Werror=implicit-function-declaration -Werror=missing-prototypes -fno-strict-aliasing -fno-builtin-memcmp -MT glapi_dispatch.lo -MD -MP -MF .deps/glapi_dispatch.Tpo -c ../../../../../src/mapi/glapi/glapi_dispatch.c -fPIC -DPIC -o .libs/glapi_dispatch.o
../../../../../src/mapi/glapi/glapi_dispatch.c:57:21: error: no previous prototype for 'glPointSizePointerOES' [-Werror=missing-prototypes]
 #define NAME(func)  gl##func
                     ^
../../../src/mapi/glapi/glapitemp.h:7689:24: note: in expansion of macro 'NAME'
 KEYWORD1 void KEYWORD2 NAME(PointSizePointerOES)(GLenum type, GLsizei stride, const GLvoid * pointer)
                        ^
../../../../../src/mapi/glapi/glapi_dispatch.c:57:21: error: no previous prototype for 'glAlphaFuncx' [-Werror=missing-prototypes]
 #define NAME(func)  gl##func
                     ^
../../../src/mapi/glapi/glapitemp.h:9359:24: note: in expansion of macro 'NAME'
 KEYWORD1 void KEYWORD2 NAME(AlphaFuncx)(GLenum func, GLclampx ref)
                        ^
../../../../../src/mapi/glapi/glapi_dispatch.c:57:21: error: no previous prototype for 'glClearColorx' [-Werror=missing-prototypes]
 #define NAME(func)  gl##func
                     ^
../../../src/mapi/glapi/glapitemp.h:9373:24: note: in expansion of macro 'NAME'
 KEYWORD1 void KEYWORD2 NAME(ClearColorx)(GLclampx red, GLclampx green, GLclampx blue, GLclampx alpha)
                        ^
../../../../../src/mapi/glapi/glapi_dispatch.c:57:21: error: no previous prototype for 'glClearDepthx' [-Werror=missing-prototypes]
 #define NAME(func)  gl##func
                     ^
../../../src/mapi/glapi/glapitemp.h:9387:24: note: in expansion of macro 'NAME'
 KEYWORD1 void KEYWORD2 NAME(ClearDepthx)(GLclampx depth)
                        ^
../../../../../src/mapi/glapi/glapi_dispatch.c:57:21: error: no previous prototype for 'glColor4x' [-Werror=missing-prototypes]
 #define NAME(func)  gl##func
                     ^
../../../src/mapi/glapi/glapitemp.h:9401:24: note: in expansion of macro 'NAME'
 KEYWORD1 void KEYWORD2 NAME(Color4x)(GLfixed red, GLfixed green, GLfixed blue, GLfixed alpha)
                        ^
../../../../../src/mapi/glapi/glapi_dispatch.c:57:21: error: no previous prototype for 'glDepthRangex' [-Werror=missing-prototypes]
 #define NAME(func)  gl##func
                     ^
../../../src/mapi/glapi/glapitemp.h:9415:24: note: in expansion of macro 'NAME'
 KEYWORD1 void KEYWORD2 NAME(DepthRangex)(GLclampx zNear, GLclampx zFar)
                        ^
../../../../../src/mapi/glapi/glapi_dispatch.c:57:21: error: no previous prototype for 'glFogx' [-Werror=missing-prototypes]
 #define NAME(func)  gl##func
                     ^
../../../src/mapi/glapi/glapitemp.h:9429:24: note: in expansion of macro 'NAME'
 KEYWORD1 void KEYWORD2 NAME(Fogx)(GLenum pname, GLfixed param)
                        ^
../../../../../src/mapi/glapi/glapi_dispatch.c:57:21: error: no previous prototype for 'glFogxv' [-Werror=missing-prototypes]
 #define NAME(func)  gl##func
                     ^
../../../src/mapi/glapi/glapitemp.h:9443:24: note: in exp
[Mesa-dev] [Bug 79039] [TRACKER] Mesa 10.2 release tracker
https://bugs.freedesktop.org/show_bug.cgi?id=79039

Andreas Boll changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Depends on|                            |79294
[Mesa-dev] [Bug 79294] Xlib-based build broken on non x86/x86-64 architectures
https://bugs.freedesktop.org/show_bug.cgi?id=79294

Andreas Boll changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |i...@freedesktop.org
           Keywords|                            |regression
             Blocks|                            |79039

--- Comment #1 from Andreas Boll ---
This issue seems to be related to Bug 79230.
Feel free to close this as a duplicate.
[Mesa-dev] [Bug 78914] [llvmpipe] Front/Backfaces do not cover the same pixels when rasterized
https://bugs.freedesktop.org/show_bug.cgi?id=78914

--- Comment #18 from Florian Link ---
Ok, I can confirm that it is a depth fighting problem and found a fix for my ray caster. Thank you for your effort!

Still, it would be good if llvmpipe did the same quality of depth test as softpipe and NVidia/ATI do.
[Mesa-dev] [Bug 79230] After upgrade from 10.1.4 to 10.2-rc4 cross-compile fails
https://bugs.freedesktop.org/show_bug.cgi?id=79230

Emilio Pozuelo Monfort changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |poch...@gmail.com
[Mesa-dev] [PATCH] i915: add a missing NULL pointer check
mesaVisual can be NULL with configless context since this commit:

    commit 551d459af421a2eb937e9e16301bb64da4624f89
    Author: Neil Roberts
    Date:   Fri Mar 7 18:05:47 2014 +

        Add the EGL_MESA_configless_context extension

        ...

        Previously the i965 and i915 drivers were explicitly creating a
        zeroed visual whenever 0 is passed for the EGLConfig.

We attempt to dereference the visual in i915, and now that we don't create a zeroed-out one it crashes, breaking at least weston on i915. There's no point in doing so as it would be zero anyway.

Signed-off-by: Lubomir Rintel
---
This fixes https://bugzilla.redhat.com/show_bug.cgi?id=1100967

 src/mesa/drivers/dri/i915/intel_context.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i915/intel_context.c b/src/mesa/drivers/dri/i915/intel_context.c
index a6057d3..09fe371 100644
--- a/src/mesa/drivers/dri/i915/intel_context.c
+++ b/src/mesa/drivers/dri/i915/intel_context.c
@@ -507,7 +507,7 @@ intelInitContext(struct intel_context *intel,

    _mesa_meta_init(ctx);

-   intel->hw_stencil = mesaVis->stencilBits && mesaVis->depthBits == 24;
+   intel->hw_stencil = mesaVis && mesaVis->stencilBits && mesaVis->depthBits == 24;
    intel->hw_stipple = 1;
    intel->RenderIndex = ~0;

--
1.9.3
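
The fix relies on the short-circuit evaluation of `&&`: the pointer is tested first, so the member accesses are never reached for a NULL visual. A minimal sketch of the same guard follows; `Visual` is a hypothetical stand-in for the real visual structure and `hwStencil` is an illustrative name, neither is taken from the driver.

```cpp
#include <cassert>

// Hypothetical, minimal stand-in for the visual passed to intelInitContext().
struct Visual {
   int stencilBits;
   int depthBits;
};

// Same shape as the patched expression: a null visual simply yields false,
// which matches what a zeroed visual would have produced before.
inline bool hwStencil(const Visual *vis)
{
   return vis && vis->stencilBits && vis->depthBits == 24;
}
```

For a configless context the function returns false without touching the pointer, and for a non-null visual the result is unchanged from the unpatched expression.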
Re: [Mesa-dev] [PATCH] targets/opencl: Fix (static) linking with LLVM
On 27/05/14 05:46, Kertesz Laszlo wrote:
> On 05/19/2014 06:06 PM, Kai Wasserbäch wrote:
>> Michel Dänzer wrote on 19.05.2014 04:12:
>>> On 18.05.2014 18:37, Kai Wasserbäch wrote:
>>>> And instead of just not starting, my X starts crashing, whenever libGL
>>>> fails to load a (32 bit) driver.
>>>
>>> FWIW, some potential alternatives for avoiding the X crashes:
>>>
>>> With current xserver Git master, you can pass the -iglx parameter to
>>> Xorg to prohibit GLX indirect rendering.
>>>
>>> Or just make sure the 32-bit swrast_dri.so works.
>>
>> Thanks a lot for those pointers. I think my swrast failed because it had
>> picked up some newer SO_VERSION as well. Which would bring me back to
>> static linking.
>>
>> Kind regards,
>> Kai Wasserbäch
>
> Hi,
> i too hit the X crashing issue. But i am unable to compile the latest
> git (10.2-branchpoint-318-g4c7bf8a according to 'git describe')
> Here is the errors i get:
>
>
> make[3]: Entering directory '/compile/mesa/src/gallium/targets/gbm'
>   CC       gbm.lo
>   CXXLD    gbm_gallium_drm.la
> /usr/local/lib/llvm32/lib/libLLVMSupport.a(Process.o): In function

Hmm, that file is not provided by us, so I'm afraid I cannot help you here.
Perhaps the LLVM folks will have a better idea.

> `llvm::sys::Process::FileDescriptorHasColors(int)':
> Process.cpp:(.text._ZN4llvm3sys7Process23FileDescriptorHasColorsEi+0x67):
> undefined reference to `setupterm'
> Process.cpp:(.text._ZN4llvm3sys7Process23FileDescriptorHasColorsEi+0x92):
> undefined reference to `tigetnum'
> Process.cpp:(.text._ZN4llvm3sys7Process23FileDescriptorHasColorsEi+0xa0):
> undefined reference to `set_curterm'
> Process.cpp:(.text._ZN4llvm3sys7Process23FileDescriptorHasColorsEi+0xa8):
> undefined reference to `del_curterm'
> /usr/local/lib/llvm32/lib/libLLVMSupport.a(Compression.o): In function
> `llvm::zlib::compress(llvm::StringRef, llvm::SmallVectorImpl<char>&,
> llvm::zlib::CompressionLevel)':
> Compression.cpp:(.text._ZN4llvm4zlib8compressENS_9StringRefERNS_15SmallVectorImplIcEENS0_16CompressionLevelE+0x26):
> undefined reference to `compressBound'
> Compression.cpp:(.text._ZN4llvm4zlib8compressENS_9StringRefERNS_15SmallVectorImplIcEENS0_16CompressionLevelE+0xa7):
> undefined reference to `compress2'
> collect2: error: ld returned 1 exit status
> Makefile:919: recipe for target 'gbm_gallium_drm.la' failed
> make[3]: *** [gbm_gallium_drm.la] Error 1
> make[3]: Leaving directory '/compile/mesa/src/gallium/targets/gbm'
> Makefile:543: recipe for target 'all-recursive' failed
> make[2]: *** [all-recursive] Error 1
> make[2]: Leaving directory '/compile/mesa/src/gallium/targets'
> Makefile:530: recipe for target 'all-recursive' failed
> make[1]: *** [all-recursive] Error 1
> make[1]: Leaving directory '/compile/mesa/src'
> Makefile:579: recipe for target 'all-recursive' failed
> make: *** [all-recursive] Error 1
>
>
> Build script (i use it since forever, it is working with enabled llvm
> shared libs, but as soon as i touch anything OpenGL related, X crashes.)
>

I would check what exactly is causing the crash and open a ticket at
bugzilla, provided it's not already reported.

> PKG_CONFIG_PATH=/usr/lib/i386-linux-gnu/pkgconfig:/usr/lib/pkgconfig:/usr/local/share/pkgconfig
> ./autogen.sh --sysconfdir=/etc --prefix=/usr \
> --libdir=/usr/lib/i386-linux-gnu --enable-debug \
> CPPFLAGS="-m32" \
> CXXFLAGS="-m32" \
> --with-llvm-prefix=/usr/local/lib/llvm32 \
> LDFLAGS="-L/usr/lib/i386-linux-gnu -L/usr/lib -L/usr/local/lib/llvm32/lib" \

Explicitly setting LDFLAGS is a recipe for disaster. Try to avoid that at
all cost.

Cheers
Emil

> --disable-64-bit --enable-32-bit \
> --enable-texture-float \
> --with-gallium-drivers=r600,swrast,radeonsi \
> --with-dri-drivers="" \
> --enable-vdpau \
> --enable-egl --enable-gles1 --enable-gles2 \
> --enable-glx-tls \
> --with-egl-platforms=x11,drm \
> --enable-gbm \
> --enable-gallium-egl \
> --enable-gallium-llvm \
> --disable-r600-llvm-compiler \
> --disable-dri3 \
> --enable-opencl \
> --enable-shared-glapi \
> --enable-gallium-osmesa \
> --disable-llvm-shared-libs
[Mesa-dev] [PATCH 0/3] i965: Add runtime checks for line antialiasing in Gen < 6
https://bugs.freedesktop.org/show_bug.cgi?id=78679

The c->runtime_check_aads_emit field has been unused since the removal of the old ARB_fragment_shader backend in commit 098acf6c84333edbb7b1228545e4bdb2572ee0cd. This field was relevant in Gen < 6 to do proper rendering of polygons in a scenario where line antialiasing is enabled and one of the polygon faces is rendered in GL_LINE mode while the other remains GL_FILL.

Currently, this scenario is broken on gm45 and ironlake (although ironlake was broken before that commit too). In particular, the GL_FILL face of the polygon renders incorrectly, with noise and wrong colors. Line color interpolation also seems to be incorrect in some cases, although I don't know if this is related to the removed code or is a completely different issue.

There is a test case attached to the bug report that showcases the problem in Gen4 and Gen5.

This patch series fixes the following issues in this scenario:
* In Gen5: Fixes incorrect rendering of the polygon's GL_FILL face.
* In Gen4: Removes noise and incorrect coloring of the polygon's GL_FILL face.

The following issues remain (would need further investigation):
* In Gen5 and Gen4: color interpolation in GL_LINE faces is not correct (color is not interpolated in most cases). This seems to be unrelated to antialiasing settings, although behavior is improved when AA is enabled with these patches (some lines do interpolate in some cases).
* In Gen4: the GL_FILL face's color is flat (does not interpolate). This is unrelated to antialiasing settings, and happens every time there is a face being rendered in GL_LINE and another in GL_FILL.

Patch 1: Fixes possible crashes when processing code streams that end in a block structure. This popped up while testing since the second patch creates this situation.
Patch 2: Checks runtime conditions for proper AA setup in Gen < 6 when doing framebuffer writes.
Patch 3: Saves an unnecessary MOV in some cases involving AA in Gen < 6.

Iago Toral Quiroga (3):
  i965: Always set a valid block end pointer
  i965: Add runtime checks for line antialiasing in Gen < 6.
  i965: Do not prepare antialiasing data if it is not required

 src/mesa/drivers/dri/i965/brw_cfg.cpp        |   5 +++++
 src/mesa/drivers/dri/i965/brw_fs.h           |   5 +++++
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 107 ++-
 3 files changed, 81 insertions(+), 36 deletions(-)

--
1.9.1
[Mesa-dev] [PATCH 2/3] i965: Add runtime checks for line antialiasing in Gen < 6.
In Gen < 6 the hardware generates a runtime bit that indicates whether AA data has to be sent as part of the framebuffer write SEND message. This affects the specific case where we have set up antialiased line rendering and we render polygons which have one face set up in GL_LINE mode (line antialiasing will be used) and the other one in GL_FILL mode (no line antialiasing needed).

Currently we are not doing this runtime test and instead we always send AA data, which produces incorrect rendering of the GL_FILL face of the polygon in the aforementioned scenario (verified on ironlake and gm45).

In Gen4 this is likely a regression introduced with commit 098acf6c843. In Gen5 this has never worked properly. Gen > 5 are not affected by this.

The patch fixes the problem by adding the appropriate runtime check and adjusting the framebuffer write message accordingly in the conflicting scenario (detected with fs_visitor::runtime_check_aads_emit == TRUE).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78679
---
 src/mesa/drivers/dri/i965/brw_fs.h           |  4 ++++
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 86 +---
 2 files changed, 58 insertions(+), 32 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h
index 60a4906..ab8912f 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -452,6 +452,10 @@ public:

    void emit_color_write(int target, int index, int first_color_mrf);
    void emit_alpha_test();
+   void do_emit_fb_write(int target, int base_mrf, int mlen, bool eot,
+                         bool header_present);
+   void emit_fb_write(int target, int base_mrf, int mlen, bool eot,
+                      bool header_present);
    void emit_fb_writes();

    void emit_shader_time_begin();
diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index 171f063..4c3897b 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
@@ -2731,6 +2731,54 @@ fs_visitor::emit_alpha_test()
 }

 void
+fs_visitor::do_emit_fb_write(int target, int base_mrf, int mlen, bool eot,
+                             bool header_present)
+{
+   fs_inst *inst = emit(FS_OPCODE_FB_WRITE);
+   inst->target = target;
+   inst->base_mrf = base_mrf;
+   inst->mlen = mlen;
+   inst->eot = eot;
+   inst->header_present = header_present;
+   if ((brw->gen >= 8 || brw->is_haswell) && fp->UsesKill) {
+      inst->predicate = BRW_PREDICATE_NORMAL;
+      inst->flag_subreg = 1;
+   }
+}
+
+void
+fs_visitor::emit_fb_write(int target, int base_mrf, int mlen, bool eot,
+                          bool header_present)
+{
+   if (!runtime_check_aads_emit) {
+      do_emit_fb_write(target, base_mrf, mlen, eot, header_present);
+   } else {
+      /* This can only happen in Gen < 6
+       */
+      fs_reg reg_tmp_ud = fs_reg(this, glsl_type::uint_type);
+      emit(AND(reg_tmp_ud,
+               fs_reg(get_element_ud(brw_vec8_grf(1,0), 6)),
+               fs_reg(brw_imm_ud(1<<26))));
+      emit(CMP(reg_null_ud,
+               reg_tmp_ud,
+               fs_reg(brw_imm_ud(0)),
+               BRW_CONDITIONAL_Z));
+      emit(IF(BRW_PREDICATE_NORMAL));
+      {
+         /* Shift message header one register since we are not sending
+          * AA data stored in base_mrf+2
+          */
+         do_emit_fb_write(target, base_mrf + 1, mlen - 1, eot, header_present);
+      }
+      emit(BRW_OPCODE_ELSE);
+      {
+         do_emit_fb_write(target, base_mrf, mlen, eot, header_present);
+      }
+      emit(BRW_OPCODE_ENDIF);
+   }
+}
+
+void
 fs_visitor::emit_fb_writes()
 {
    this->current_annotation = "FB write header";
@@ -2848,16 +2896,7 @@ fs_visitor::emit_fb_writes()
       if (INTEL_DEBUG & DEBUG_SHADER_TIME)
          emit_shader_time_end();

-      fs_inst *inst = emit(FS_OPCODE_FB_WRITE);
-      inst->target = 0;
-      inst->base_mrf = base_mrf;
-      inst->mlen = nr - base_mrf;
-      inst->eot = true;
-      inst->header_present = header_present;
-      if ((brw->gen >= 8 || brw->is_haswell) && fp->UsesKill) {
-         inst->predicate = BRW_PREDICATE_NORMAL;
-         inst->flag_subreg = 1;
-      }
+      emit_fb_write(0, base_mrf, nr - base_mrf, true, header_present);

       prog_data->dual_src_blend = true;
       this->current_annotation = NULL;
@@ -2894,19 +2933,10 @@ fs_visitor::emit_fb_writes()
             emit_shader_time_end();
       }

-      fs_inst *inst = emit(FS_OPCODE_FB_WRITE);
-      inst->target = target;
-      inst->base_mrf = base_mrf;
-      if (src0_alpha_to_render_target && target == 0)
-         inst->mlen = nr - base_mrf - reg_width;
-      else
-         inst->mlen = nr - base_mrf;
-      inst->eot = eot;
-      inst->header_present = header_present;
-      if ((brw->gen >= 8 || brw->is_haswell) && fp->UsesKill) {
-         inst->predicate = BRW_PREDICATE_NORMAL;
-         inst->flag
[Mesa-dev] [PATCH 1/3] i965: Always set a valid block end pointer
When an instruction stream ends in a block structure (like an IF/ELSE/ENDIF), the last block's end pointer will not be set, leading to a crash later on in fs_live_variables::setup_def_use().

If we have not assigned the end pointer of the last block, set it to the last instruction.
---
 src/mesa/drivers/dri/i965/brw_cfg.cpp | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_cfg.cpp b/src/mesa/drivers/dri/i965/brw_cfg.cpp
index 6bf99f1..d4647c4 100644
--- a/src/mesa/drivers/dri/i965/brw_cfg.cpp
+++ b/src/mesa/drivers/dri/i965/brw_cfg.cpp
@@ -257,6 +257,11 @@ cfg_t::cfg_t(exec_list *instructions)
       }
    }

+   /* If the instruction stream ended with a block structure we need to
+      set the block's end pointer to the last instruction here */
+   if (!cur->end)
+      cur->end = (backend_instruction *)instructions->get_tail();
+
    cur->end_ip = ip;

    make_block_array();
--
1.9.1
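
The fallback added by this patch can be pictured with a much smaller model. What follows is a hypothetical sketch, not the i965 CFG code: `Instruction`, `Block` and `finishLastBlock` are illustrative names, and a `std::vector` stands in for the driver's exec_list.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for backend_instruction and a CFG basic block.
struct Instruction {
   int opcode;
};

struct Block {
   const Instruction *start = nullptr;
   const Instruction *end = nullptr;   // stays null if the builder never closed the block
};

// If the block builder left the final block open, close it at the last
// instruction of the stream; an already closed block is left untouched.
inline void finishLastBlock(Block &cur, const std::vector<Instruction> &insts)
{
   if (!cur.end && !insts.empty())
      cur.end = &insts.back();
}
```

Any later pass that walks from `start` to `end` can then rely on `end` being valid, which is the property the crash in fs_live_variables::setup_def_use() depended on.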
[Mesa-dev] [PATCH 3/3] i965: Do not prepare antialiasing data if it is not required
In Gen < 6 AA data will or will not be sent as part of the framebuffer write SEND message depending on a runtime condition, so don't bother moving AA data to the corresponding MRF register until we know that we need to send it.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78679
---
 src/mesa/drivers/dri/i965/brw_fs.h           |  1 +
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 23 ++++++++++++++++++-----
 2 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h
index ab8912f..351d0b6 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -452,6 +452,7 @@ public:

    void emit_color_write(int target, int index, int first_color_mrf);
    void emit_alpha_test();
+   void emit_aa(int mrf_aa_reg);
    void do_emit_fb_write(int target, int base_mrf, int mlen, bool eot,
                          bool header_present);
    void emit_fb_write(int target, int base_mrf, int mlen, bool eot,
diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index 4c3897b..de1726f 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
@@ -2731,6 +2731,17 @@ fs_visitor::emit_alpha_test()
 }

 void
+fs_visitor::emit_aa(int mrf_aa_reg)
+{
+   if (payload.aa_dest_stencil_reg) {
+      push_force_uncompressed();
+      emit(MOV(fs_reg(MRF, mrf_aa_reg),
+               fs_reg(brw_vec8_grf(payload.aa_dest_stencil_reg, 0))));
+      pop_force_uncompressed();
+   }
+}
+
+void
 fs_visitor::do_emit_fb_write(int target, int base_mrf, int mlen, bool eot,
                              bool header_present)
 {
@@ -2751,6 +2762,7 @@ fs_visitor::emit_fb_write(int target, int base_mrf, int mlen, bool eot,
                           bool header_present)
 {
    if (!runtime_check_aads_emit) {
+      emit_aa(base_mrf + 2);
       do_emit_fb_write(target, base_mrf, mlen, eot, header_present);
    } else {
       /* This can only happen in Gen < 6
@@ -2766,12 +2778,13 @@ fs_visitor::emit_fb_write(int target, int base_mrf, int mlen, bool eot,
       emit(IF(BRW_PREDICATE_NORMAL));
       {
          /* Shift message header one register since we are not sending
-          * AA data stored in base_mrf+2
+          * AA data in base_mrf+2
           */
          do_emit_fb_write(target, base_mrf + 1, mlen - 1, eot, header_present);
       }
       emit(BRW_OPCODE_ELSE);
       {
+         emit_aa(base_mrf + 2);
          do_emit_fb_write(target, base_mrf, mlen, eot, header_present);
       }
       emit(BRW_OPCODE_ENDIF);
@@ -2819,11 +2832,11 @@ fs_visitor::emit_fb_writes()
       nr += 2;
    }

+   /* AA data. Depending on runtime conditions we might not need to send it
+    * but we reserve space for it for now.
+    */
    if (payload.aa_dest_stencil_reg) {
-      push_force_uncompressed();
-      emit(MOV(fs_reg(MRF, nr++),
-               fs_reg(brw_vec8_grf(payload.aa_dest_stencil_reg, 0))));
-      pop_force_uncompressed();
+      nr += 1;
    }

    prog_data->uses_omask =
--
1.9.1
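
The decision that patches 2/3 and 3/3 encode in generated shader instructions can be modelled on the host side to make the two outcomes easier to see. This is only an illustrative sketch: `FbWritePlan` and `planFbWrite` are hypothetical names, and the real code emits AND/CMP/IF instructions for the GPU rather than branching on the CPU. The bit position (bit 26 of dword 6 of g1) and the base_mrf/mlen adjustments are taken from the diffs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of the framebuffer write that ends up being sent.
struct FbWritePlan {
   int base_mrf;   // first message register of the SEND
   int mlen;       // message length in registers
   bool sends_aa;  // whether the AA data register is part of the message
};

// Models the runtime check: if bit 26 of g1.6 is clear, the message is
// shifted by one register and the AA data (and its MOV) is skipped.
inline FbWritePlan planFbWrite(uint32_t g1_dw6, int base_mrf, int mlen)
{
   const uint32_t aa_bit = 1u << 26;
   if ((g1_dw6 & aa_bit) == 0)
      return FbWritePlan{base_mrf + 1, mlen - 1, false};
   return FbWritePlan{base_mrf, mlen, true};
}
```

When the bit is clear the plan starts one register later and is one register shorter, so the MOV into the AA slot would have been wasted work, which is the saving patch 3/3 is after.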
[Mesa-dev] [Bug 78914] [llvmpipe] Front/Backfaces do not cover the same pixels when rasterized
https://bugs.freedesktop.org/show_bug.cgi?id=78914

--- Comment #19 from Roland Scheidegger ---
The depth test as such is as accurate as it could be. Doing interpolation with as much precision as possible is not all that easy due to the properties of floating point arithmetic. In particular, the order of the vertices matters for the math. Reordering would be possible, though it still does not guarantee the same results for fragments along a shared edge (unless the tri shares all vertices, that is, it's really the same tri with reordered edges). But I agree doing better would be nice, I'm just not entirely sure what clever tricks need to be done to achieve this.

There's also a slight bug in the implementation, I believe: the interpolation should be done with snapped (fixed point) coordinates, however we do the interpolation setup with float coordinates. I'm not sure this would help here, but at least in contrast to other interpolation issues this one wouldn't be all that difficult to fix.

Another issue is that if you have some attributes with large gradients on a somewhat small triangle, you can get huge errors the further the triangle is away from the framebuffer origin. So, interpolation is definitely not perfect.
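
The order dependence mentioned above comes down to floating point addition not being associative. The following sketch is not llvmpipe code; it only shows, with a hypothetical helper, how the same three values summed in a different order give different single-precision results, which is the kind of effect that makes two triangles with differently ordered vertices disagree along a shared edge.

```cpp
#include <cassert>

// Sums three floats left to right. The volatile temporaries force every
// intermediate result to be rounded to single precision.
inline float sumLeftToRight(float a, float b, float c)
{
   volatile float t = a + b;
   volatile float r = t + c;
   return r;
}
```

With a = 1e8f, b = -1e8f and c = 1.0f the result is 1.0f, because the large terms cancel exactly before the small one is added. Starting with -1e8f + 1.0f instead loses the 1.0f to rounding (floats near 1e8 are spaced 8 apart), so the final sum is 0.0f.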
[Mesa-dev] [PATCH] meta/blit: Use gl_FragColor also in the msaa blit shader
Fixes framebuffer_blit_functionality_multisampled_to_singlesampled_blit es3 cts test on bdw. Also fixes this on ivb when ivb is forced to use the meta path.

No piglit regressions on IVB.

Signed-off-by: Topi Pohjolainen
Cc: Eric Anholt
Cc: Matt Turner
Cc: Kenneth Graunke
Cc: Anuj Phogat
Cc: "10.2"
---
src/mesa/drivers/common/meta_blit.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/drivers/common/meta_blit.c b/src/mesa/drivers/common/meta_blit.c
index 84594d1..5929619 100644
--- a/src/mesa/drivers/common/meta_blit.c
+++ b/src/mesa/drivers/common/meta_blit.c
@@ -273,7 +273,7 @@ setup_glsl_msaa_blit_shader(struct gl_context *ctx,
 samples);
 } else {
 ralloc_asprintf_append(&sample_resolve,
- " out_color = sample_%d_0 / %f;\n",
+ " gl_FragColor = sample_%d_0 / %f;\n",
 samples, (float)samples);
 }
 }
--
1.8.3.1
[Mesa-dev] [PATCH 1/3] vl: add interface for non-referenced frames
Signed-off-by: Leo Liu
---
src/gallium/include/pipe/p_video_state.h | 2 ++
1 file changed, 2 insertions(+)

diff --git a/src/gallium/include/pipe/p_video_state.h b/src/gallium/include/pipe/p_video_state.h
index 0256a8f..6621dbd 100644
--- a/src/gallium/include/pipe/p_video_state.h
+++ b/src/gallium/include/pipe/p_video_state.h
@@ -371,6 +371,8 @@ struct pipe_h264_enc_picture_desc
 unsigned pic_order_cnt;
 unsigned ref_idx_l0;
 unsigned ref_idx_l1;
+
+ bool not_referenced;
 };

 #ifdef __cplusplus
--
1.9.1
[Mesa-dev] [PATCH 3/3] st/omx/enc: implement restricted b frames pattern
Signed-off-by: Leo Liu --- src/gallium/state_trackers/omx/vid_enc.c | 11 +-- src/gallium/state_trackers/omx/vid_enc.h | 1 + 2 files changed, 10 insertions(+), 2 deletions(-) diff --git a/src/gallium/state_trackers/omx/vid_enc.c b/src/gallium/state_trackers/omx/vid_enc.c index ee31452..e64928b 100644 --- a/src/gallium/state_trackers/omx/vid_enc.c +++ b/src/gallium/state_trackers/omx/vid_enc.c @@ -259,6 +259,7 @@ static OMX_ERRORTYPE vid_enc_Constructor(OMX_COMPONENTTYPE *comp, OMX_STRING nam priv->force_pic_type.IntraRefreshVOP = OMX_FALSE; priv->frame_num = 0; priv->pic_order_cnt = 0; + priv->restricted_b_frames = debug_get_bool_option("OMX_USE_RESTRICTED_B_FRAMES", FALSE); priv->scale.xWidth = OMX_VID_ENC_SCALING_WIDTH_DEFAULT; priv->scale.xHeight = OMX_VID_ENC_SCALING_WIDTH_DEFAULT; @@ -994,6 +995,8 @@ static void enc_HandleTask(omx_base_PortType *port, struct encode_task *task, picture.picture_type = picture_type; picture.pic_order_cnt = task->pic_order_cnt; + if (priv->restricted_b_frames && picture_type == PIPE_H264_ENC_PICTURE_TYPE_B) + picture.not_referenced = true; enc_ControlPicture(port, &picture); /* -- encode frame - */ @@ -1023,7 +1026,9 @@ static void enc_ClearBframes(omx_base_PortType *port, struct input_buf_private * /* handle B frames */ LIST_FOR_EACH_ENTRY(task, &priv->b_frames, list) { enc_HandleTask(port, task, PIPE_H264_ENC_PICTURE_TYPE_B); - priv->ref_idx_l0 = priv->frame_num++; + if (!priv->restricted_b_frames) + priv->ref_idx_l0 = priv->frame_num; + priv->frame_num++; } enc_MoveTasks(&priv->b_frames, &inp->tasks); @@ -1091,7 +1096,9 @@ static OMX_ERRORTYPE vid_enc_EncodeFrame(omx_base_PortType *port, OMX_BUFFERHEAD /* handle B frames */ LIST_FOR_EACH_ENTRY(task, &priv->b_frames, list) { enc_HandleTask(port, task, PIPE_H264_ENC_PICTURE_TYPE_B); - priv->ref_idx_l0 = priv->frame_num++; + if (!priv->restricted_b_frames) +priv->ref_idx_l0 = priv->frame_num; + priv->frame_num++; } enc_MoveTasks(&priv->b_frames, &inp->tasks); diff --git 
a/src/gallium/state_trackers/omx/vid_enc.h b/src/gallium/state_trackers/omx/vid_enc.h index 22f276f..d0350d6 100644 --- a/src/gallium/state_trackers/omx/vid_enc.h +++ b/src/gallium/state_trackers/omx/vid_enc.h @@ -77,6 +77,7 @@ DERIVEDCLASS(vid_enc_PrivateType, omx_base_filter_PrivateType) OMX_U32 frame_num; \ OMX_U32 pic_order_cnt; \ OMX_U32 ref_idx_l0, ref_idx_l1; \ + OMX_BOOL restricted_b_frames; \ OMX_VIDEO_PARAM_BITRATETYPE bitrate; \ OMX_VIDEO_PARAM_QUANTIZATIONTYPE quant; \ OMX_CONFIG_INTRAREFRESHVOPTYPE force_pic_type; \ -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
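The bookkeeping change in this patch can be modelled with a small Python sketch. It is only a model of the counters touched by the diff, not the OMX state tracker itself; EncState, not_referenced and handle_b_frames are hypothetical names, and the string "B" stands in for PIPE_H264_ENC_PICTURE_TYPE_B.

```python
from dataclasses import dataclass

PICTURE_TYPE_B = "B"  # stand-in for PIPE_H264_ENC_PICTURE_TYPE_B

@dataclass
class EncState:
    """Only the fields the patch touches."""
    restricted_b_frames: bool
    frame_num: int = 0
    ref_idx_l0: int = 0

def not_referenced(state, picture_type):
    """Mirrors the check added to enc_HandleTask(): restricted B frames are not referenced."""
    return state.restricted_b_frames and picture_type == PICTURE_TYPE_B

def handle_b_frames(state, count):
    """Mirrors the patched B-frame loops: frame_num always advances,
    ref_idx_l0 only follows it when B frames may be used as references."""
    for _ in range(count):
        if not state.restricted_b_frames:
            state.ref_idx_l0 = state.frame_num
        state.frame_num += 1

normal = EncState(restricted_b_frames=False, frame_num=5, ref_idx_l0=4)
restricted = EncState(restricted_b_frames=True, frame_num=5, ref_idx_l0=4)
handle_b_frames(normal, 2)
handle_b_frames(restricted, 2)
print(normal.ref_idx_l0, normal.frame_num)
print(restricted.ref_idx_l0, restricted.frame_num)
```

In the unrestricted case ref_idx_l0 ends up pointing at the last B frame (6), while in restricted mode it keeps pointing at the earlier reference (4); frame_num reaches 7 either way.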
[Mesa-dev] [PATCH 2/3] radeon/vce: implement non-referenced frames
Signed-off-by: Leo Liu --- src/gallium/drivers/radeon/radeon_vce.c| 6 -- src/gallium/drivers/radeon/radeon_vce_40_2_2.c | 2 +- 2 files changed, 5 insertions(+), 3 deletions(-) diff --git a/src/gallium/drivers/radeon/radeon_vce.c b/src/gallium/drivers/radeon/radeon_vce.c index 222f32e..81e62d3 100644 --- a/src/gallium/drivers/radeon/radeon_vce.c +++ b/src/gallium/drivers/radeon/radeon_vce.c @@ -231,11 +231,13 @@ static void rvce_end_frame(struct pipe_video_codec *encoder, flush(enc); /* update the CPB backtrack with the just encoded frame */ - LIST_DEL(&slot->list); slot->picture_type = enc->pic.picture_type; slot->frame_num = enc->pic.frame_num; slot->pic_order_cnt = enc->pic.pic_order_cnt; - LIST_ADD(&slot->list, &enc->cpb_slots); + if (!enc->pic.not_referenced) { + LIST_DEL(&slot->list); + LIST_ADD(&slot->list, &enc->cpb_slots); + } } static void rvce_get_feedback(struct pipe_video_codec *encoder, diff --git a/src/gallium/drivers/radeon/radeon_vce_40_2_2.c b/src/gallium/drivers/radeon/radeon_vce_40_2_2.c index 3b67b31..3010c5b 100644 --- a/src/gallium/drivers/radeon/radeon_vce_40_2_2.c +++ b/src/gallium/drivers/radeon/radeon_vce_40_2_2.c @@ -283,7 +283,7 @@ static void encode(struct rvce_encoder *enc) RVCE_CS(enc->pic.picture_type == PIPE_H264_ENC_PICTURE_TYPE_IDR); // encIdrFlag RVCE_CS(0x); // encIdrPicId RVCE_CS(0x); // encMGSKeyPic - RVCE_CS(0x0001); // encReferenceFlag + RVCE_CS(!enc->pic.not_referenced); // encReferenceFlag RVCE_CS(0x); // encTemporalLayerIndex RVCE_CS(0x); // num_ref_idx_active_override_flag RVCE_CS(0x); // num_ref_idx_l0_active_minus1 -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
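As a rough model of the rvce_end_frame() change, the following Python sketch treats the CPB slot list as a plain list whose front holds the most recently referenced frame. Slot, end_frame and enc_reference_flag are hypothetical names, and the assumption that LIST_ADD inserts at the head of the list is mine, not something stated in the patch.

```python
class Slot:
    """One CPB slot; compared by identity, like a list node."""
    def __init__(self, name):
        self.name = name
        self.picture_type = None
        self.frame_num = None
        self.pic_order_cnt = None

def end_frame(cpb_slots, slot, pic):
    """Record the just-encoded frame in the slot; move the slot to the
    front of the list only if the frame may be used as a reference."""
    slot.picture_type = pic["picture_type"]
    slot.frame_num = pic["frame_num"]
    slot.pic_order_cnt = pic["pic_order_cnt"]
    if not pic["not_referenced"]:
        cpb_slots.remove(slot)     # LIST_DEL
        cpb_slots.insert(0, slot)  # LIST_ADD (assumed to add at the head)

def enc_reference_flag(pic):
    """Value written for encReferenceFlag: 1 unless the frame is not referenced."""
    return int(not pic["not_referenced"])

slots = [Slot("a"), Slot("b"), Slot("c")]
end_frame(slots, slots[2], {"picture_type": "P", "frame_num": 1,
                            "pic_order_cnt": 2, "not_referenced": False})
print([s.name for s in slots])
end_frame(slots, slots[2], {"picture_type": "B", "frame_num": 2,
                            "pic_order_cnt": 4, "not_referenced": True})
print([s.name for s in slots])
```

The first call moves slot "c" to the front; the second updates slot "b" in place and leaves the order untouched, so a non-referenced frame never displaces a reference frame.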
[Mesa-dev] [PATCH 3/4] glsl/tests: call create_test_cases.py in optimization-test
This way, when someone modifies create_test_cases.py and forgets to commit their changes again, people will notice.

Signed-off-by: Connor Abbott
---
src/glsl/tests/optimization-test | 7 +++
1 file changed, 7 insertions(+)

diff --git a/src/glsl/tests/optimization-test b/src/glsl/tests/optimization-test
index 8ca7776..d58e335 100755
--- a/src/glsl/tests/optimization-test
+++ b/src/glsl/tests/optimization-test
@@ -9,6 +9,13 @@ fi
 total=0
 pass=0

+echo "== Generating tests =="
+for dir in */; do
+    if [ -e "$dir/create_test_cases.py" ]; then
+        cd $dir; python create_test_cases.py; cd ..
+    fi
+done
+
 echo "== Testing optimization passes =="
 for test in `find . -iname '*.opt_test'`; do
 echo -n "Testing $test..."
--
1.8.3.1
[Mesa-dev] [PATCH 1/4] glsl: be more consistent about printing constants
Make sure that we print the same number of digits when printing 0.0 as any other floating-point number. This will make generating expected output files for tests easier. To avoid breaking "make check," update the generated tests for lower_jumps before the next commit which will bring create_test_cases.py in line with them. Signed-off-by: Connor Abbott --- src/glsl/ir_print_visitor.cpp | 2 +- src/glsl/tests/lower_jumps/lower_breaks_1.opt_test | 3 +-- src/glsl/tests/lower_jumps/lower_breaks_1.opt_test.expected| 3 +-- src/glsl/tests/lower_jumps/lower_breaks_2.opt_test.expected| 2 +- src/glsl/tests/lower_jumps/lower_breaks_3.opt_test.expected| 2 +- src/glsl/tests/lower_jumps/lower_breaks_4.opt_test.expected| 2 +- src/glsl/tests/lower_jumps/lower_breaks_5.opt_test.expected| 2 +- src/glsl/tests/lower_jumps/lower_breaks_6.opt_test.expected| 10 +- .../lower_guarded_conditional_break.opt_test.expected | 6 +++--- .../tests/lower_jumps/lower_pulled_out_jump.opt_test.expected | 8 src/glsl/tests/lower_jumps/lower_returns_3.opt_test.expected | 4 ++-- src/glsl/tests/lower_jumps/lower_returns_4.opt_test.expected | 2 +- .../lower_jumps/lower_returns_main_false.opt_test.expected | 4 ++-- .../lower_jumps/lower_returns_main_true.opt_test.expected | 4 ++-- .../lower_jumps/lower_returns_sub_false.opt_test.expected | 4 ++-- .../tests/lower_jumps/lower_returns_sub_true.opt_test.expected | 4 ++-- .../tests/lower_jumps/lower_unified_returns.opt_test.expected | 8 .../tests/lower_jumps/remove_continue_at_end_of_loop.opt_test | 3 +-- .../remove_continue_at_end_of_loop.opt_test.expected | 3 +-- .../return_void_at_end_of_loop_lower_nothing.opt_test | 3 +-- .../return_void_at_end_of_loop_lower_nothing.opt_test.expected | 3 +-- .../return_void_at_end_of_loop_lower_return.opt_test | 3 +-- .../return_void_at_end_of_loop_lower_return_and_break.opt_test | 3 +-- 23 files changed, 40 insertions(+), 48 deletions(-) diff --git a/src/glsl/ir_print_visitor.cpp b/src/glsl/ir_print_visitor.cpp index 
0a7695a..a3d851e 100644 --- a/src/glsl/ir_print_visitor.cpp +++ b/src/glsl/ir_print_visitor.cpp @@ -430,7 +430,7 @@ void ir_print_visitor::visit(ir_constant *ir) case GLSL_TYPE_FLOAT: if (ir->value.f[i] == 0.0f) /* 0.0 == -0.0, so print with %f to get the proper sign. */ - fprintf(f, "%.1f", ir->value.f[i]); + fprintf(f, "%f", ir->value.f[i]); else if (fabs(ir->value.f[i]) < 0.01f) fprintf(f, "%a", ir->value.f[i]); else if (fabs(ir->value.f[i]) > 100.0f) diff --git a/src/glsl/tests/lower_jumps/lower_breaks_1.opt_test b/src/glsl/tests/lower_jumps/lower_breaks_1.opt_test index b412ba8..e2d4ed1 100755 --- a/src/glsl/tests/lower_jumps/lower_breaks_1.opt_test +++ b/src/glsl/tests/lower_jumps/lower_breaks_1.opt_test @@ -8,6 +8,5 @@ ((declare (out) float a) (function main (signature void (parameters) - ((loop - ((assign (x) (var_ref a) (constant float (1.00))) break)) + ((loop ((assign (x) (var_ref a) (constant float (1.00))) break)) EOF diff --git a/src/glsl/tests/lower_jumps/lower_breaks_1.opt_test.expected b/src/glsl/tests/lower_jumps/lower_breaks_1.opt_test.expected index 56ef3e4..270a43d 100644 --- a/src/glsl/tests/lower_jumps/lower_breaks_1.opt_test.expected +++ b/src/glsl/tests/lower_jumps/lower_breaks_1.opt_test.expected @@ -1,5 +1,4 @@ ((declare (out) float a) (function main (signature void (parameters) - ((loop - ((assign (x) (var_ref a) (constant float (1.00))) break)) + ((loop ((assign (x) (var_ref a) (constant float (1.00))) break)) diff --git a/src/glsl/tests/lower_jumps/lower_breaks_2.opt_test.expected b/src/glsl/tests/lower_jumps/lower_breaks_2.opt_test.expected index dc231f9..73a1d56 100644 --- a/src/glsl/tests/lower_jumps/lower_breaks_2.opt_test.expected +++ b/src/glsl/tests/lower_jumps/lower_breaks_2.opt_test.expected @@ -3,5 +3,5 @@ (signature void (parameters) ((loop ((assign (x) (var_ref a) (constant float (1.00))) - (if (expression bool > (var_ref b) (constant float (0.0))) (break) + (if (expression bool > (var_ref b) (constant float (0.00))) 
(break) ( diff --git a/src/glsl/tests/lower_jumps/lower_breaks_3.opt_test.expected b/src/glsl/tests/lower_jumps/lower_breaks_3.opt_test.expected index 8131b66..53d5392 100644 --- a/src/glsl/tests/lower_jumps/lower_breaks_3.opt_test.expected +++ b/src/glsl/tests/lower_jumps/lower_breaks_3.opt_test.expected @@ -3,6 +3,6 @@ (signature void (parameters) ((loop ((assign (x) (var_ref a) (constant float (1.00))) - (if (expression bool > (var_ref b) (constant float (0.0))) + (if (expression bool > (var_ref b) (constant float (0.00))) ((assign (x) (
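The effect of switching the zero case from "%.1f" to "%f" can be checked with a short Python sketch. Python's % formatting follows the C printf rules for these conversions; the two functions below are hypothetical and only model the zero branch and the plain "%f" case, not the "%a" branches of ir_print_visitor.

```python
def print_constant_old(value):
    """Old behaviour: zero is printed with one digit, everything else with %f."""
    if value == 0.0:
        return "%.1f" % value
    return "%f" % value

def print_constant_new(value):
    """New behaviour: zero uses the same %f conversion as other values."""
    return "%f" % value

for v in (0.0, -0.0, 1.0):
    print(repr(v), print_constant_old(v), print_constant_new(v))
```

With the new format, 0.0 and 1.0 both print six digits after the decimal point, and the sign of -0.0 is still preserved, which is what the comment in the C code asks for.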
[Mesa-dev] [PATCH 2/4] glsl/tests/lower_jumps: fix generated sexpr's for loops
In 088494aa (as well as other commits in the series) Paul Berry modified the tests for lower_jumps to account for the fact that the s-expression for the loop IR instruction changed from (loop () () () () (statements...)) to (loop (statements...)), but he forgot to update create_test_cases.py, which he used to create the tests. Fix that, so that create_test_cases.py is now in sync with the generated tests.

Signed-off-by: Connor Abbott
---
src/glsl/tests/lower_jumps/create_test_cases.py | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/glsl/tests/lower_jumps/create_test_cases.py b/src/glsl/tests/lower_jumps/create_test_cases.py
index 9974681..3be1079 100644
--- a/src/glsl/tests/lower_jumps/create_test_cases.py
+++ b/src/glsl/tests/lower_jumps/create_test_cases.py
@@ -126,7 +126,7 @@ def loop(statements):
     body.
     """
     check_sexp(statements)
-    return [['loop', [], [], [], [], statements]]
+    return [['loop', statements]]

 def declare_temp(var_type, var_name):
     """Create a declaration of the form
--
1.8.3.1
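To make the difference between the two loop forms concrete, here is a small self-contained Python sketch. loop_old and loop_new copy the two return statements from the diff; render is a hypothetical stand-in for whatever helper the real script uses to turn nested lists into text, so its name and exact output format are assumptions.

```python
def loop_old(statements):
    """Pre-fix form: four empty lists before the loop body."""
    return [['loop', [], [], [], [], statements]]

def loop_new(statements):
    """Post-fix form: the loop holds only its body."""
    return [['loop', statements]]

def render(sexp):
    """Hypothetical printer: nested lists become parenthesised s-expressions."""
    if isinstance(sexp, str):
        return sexp
    return '(' + ' '.join(render(item) for item in sexp) + ')'

body = [['break']]
print(render(loop_old(body)))
print(render(loop_new(body)))
```

The first line printed still carries the four empty "()" groups, the second matches the (loop (statements...)) shape described in the commit message.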
[Mesa-dev] [PATCH 4/4] glsl/tests: remove generated tests from the repo
They were made unnecessary by the previous commit. Signed-off-by: Connor Abbott --- src/glsl/tests/lower_jumps/.gitignore | 2 ++ src/glsl/tests/lower_jumps/lower_breaks_1.opt_test | 12 - .../lower_jumps/lower_breaks_1.opt_test.expected | 4 --- src/glsl/tests/lower_jumps/lower_breaks_2.opt_test | 15 --- .../lower_jumps/lower_breaks_2.opt_test.expected | 7 -- src/glsl/tests/lower_jumps/lower_breaks_3.opt_test | 17 - .../lower_jumps/lower_breaks_3.opt_test.expected | 8 -- src/glsl/tests/lower_jumps/lower_breaks_4.opt_test | 15 --- .../lower_jumps/lower_breaks_4.opt_test.expected | 7 -- src/glsl/tests/lower_jumps/lower_breaks_5.opt_test | 16 .../lower_jumps/lower_breaks_5.opt_test.expected | 7 -- src/glsl/tests/lower_jumps/lower_breaks_6.opt_test | 29 -- .../lower_jumps/lower_breaks_6.opt_test.expected | 29 -- .../lower_guarded_conditional_break.opt_test | 21 ...wer_guarded_conditional_break.opt_test.expected | 20 --- .../lower_jumps/lower_pulled_out_jump.opt_test | 28 - .../lower_pulled_out_jump.opt_test.expected| 25 --- .../tests/lower_jumps/lower_returns_1.opt_test | 12 - .../lower_jumps/lower_returns_1.opt_test.expected | 4 --- .../tests/lower_jumps/lower_returns_2.opt_test | 13 -- .../lower_jumps/lower_returns_2.opt_test.expected | 5 .../tests/lower_jumps/lower_returns_3.opt_test | 20 --- .../lower_jumps/lower_returns_3.opt_test.expected | 21 .../tests/lower_jumps/lower_returns_4.opt_test | 14 --- .../lower_jumps/lower_returns_4.opt_test.expected | 16 .../lower_jumps/lower_returns_main_false.opt_test | 17 - .../lower_returns_main_false.opt_test.expected | 8 -- .../lower_jumps/lower_returns_main_true.opt_test | 17 - .../lower_returns_main_true.opt_test.expected | 13 -- .../lower_jumps/lower_returns_sub_false.opt_test | 16 .../lower_returns_sub_false.opt_test.expected | 8 -- .../lower_jumps/lower_returns_sub_true.opt_test| 16 .../lower_returns_sub_true.opt_test.expected | 13 -- .../lower_jumps/lower_unified_returns.opt_test | 26 --- 
.../lower_unified_returns.opt_test.expected| 21 .../remove_continue_at_end_of_loop.opt_test| 12 - ...emove_continue_at_end_of_loop.opt_test.expected | 4 --- ..._non_void_at_end_of_loop_lower_nothing.opt_test | 16 ..._at_end_of_loop_lower_nothing.opt_test.expected | 8 -- ...n_non_void_at_end_of_loop_lower_return.opt_test | 16 ...d_at_end_of_loop_lower_return.opt_test.expected | 19 -- ..._at_end_of_loop_lower_return_and_break.opt_test | 16 ...f_loop_lower_return_and_break.opt_test.expected | 19 -- ...turn_void_at_end_of_loop_lower_nothing.opt_test | 13 -- ..._at_end_of_loop_lower_nothing.opt_test.expected | 5 ...eturn_void_at_end_of_loop_lower_return.opt_test | 13 -- ...d_at_end_of_loop_lower_return.opt_test.expected | 11 ..._at_end_of_loop_lower_return_and_break.opt_test | 13 -- ...f_loop_lower_return_and_break.opt_test.expected | 11 49 files changed, 2 insertions(+), 696 deletions(-) delete mode 100755 src/glsl/tests/lower_jumps/lower_breaks_1.opt_test delete mode 100644 src/glsl/tests/lower_jumps/lower_breaks_1.opt_test.expected delete mode 100755 src/glsl/tests/lower_jumps/lower_breaks_2.opt_test delete mode 100644 src/glsl/tests/lower_jumps/lower_breaks_2.opt_test.expected delete mode 100755 src/glsl/tests/lower_jumps/lower_breaks_3.opt_test delete mode 100644 src/glsl/tests/lower_jumps/lower_breaks_3.opt_test.expected delete mode 100755 src/glsl/tests/lower_jumps/lower_breaks_4.opt_test delete mode 100644 src/glsl/tests/lower_jumps/lower_breaks_4.opt_test.expected delete mode 100755 src/glsl/tests/lower_jumps/lower_breaks_5.opt_test delete mode 100644 src/glsl/tests/lower_jumps/lower_breaks_5.opt_test.expected delete mode 100755 src/glsl/tests/lower_jumps/lower_breaks_6.opt_test delete mode 100644 src/glsl/tests/lower_jumps/lower_breaks_6.opt_test.expected delete mode 100755 src/glsl/tests/lower_jumps/lower_guarded_conditional_break.opt_test delete mode 100644 src/glsl/tests/lower_jumps/lower_guarded_conditional_break.opt_test.expected delete mode 100755 
src/glsl/tests/lower_jumps/lower_pulled_out_jump.opt_test delete mode 100644 src/glsl/tests/lower_jumps/lower_pulled_out_jump.opt_test.expected delete mode 100755 src/glsl/tests/lower_jumps/lower_returns_1.opt_test delete mode 100644 src/glsl/tests/lower_jumps/lower_returns_1.opt_test.expected delete mode 100755 src/glsl/tests/lower_jumps/lower_returns_2.opt_test delete m
[Mesa-dev] [PATCH 0/4] glsl/tests: remove generated files
While trying to modify the lower_jumps unit tests to account for my SSA changes, I realized that the tests were not in sync with the file that generated them. There were two problems: -The *.expected files all had the same number of digits after the decimal place (6) whereas the *.out files had 1 digit in "0.0" and 6 digits in "1.00" when printing constants, which led to failures due to diffs like: - ((if (expression bool > (var_ref b) (constant float (0.00))) + ((if (expression bool > (var_ref b) (constant float (0.0))) -Loops were incorrect in the input files. I fixed both problems, and then I removed the generated tests so that stuff like this won't happen again. Connor Abbott (4): glsl: be more consistent about printing constants glsl/tests/lower_jumps: fix generated sexpr's for loops glsl/tests: call create_test_cases.py in optimization-test glsl/tests: remove generated tests from the repo src/glsl/ir_print_visitor.cpp | 2 +- src/glsl/tests/lower_jumps/.gitignore | 2 ++ src/glsl/tests/lower_jumps/create_test_cases.py| 2 +- src/glsl/tests/lower_jumps/lower_breaks_1.opt_test | 13 -- .../lower_jumps/lower_breaks_1.opt_test.expected | 5 src/glsl/tests/lower_jumps/lower_breaks_2.opt_test | 15 --- .../lower_jumps/lower_breaks_2.opt_test.expected | 7 -- src/glsl/tests/lower_jumps/lower_breaks_3.opt_test | 17 - .../lower_jumps/lower_breaks_3.opt_test.expected | 8 -- src/glsl/tests/lower_jumps/lower_breaks_4.opt_test | 15 --- .../lower_jumps/lower_breaks_4.opt_test.expected | 7 -- src/glsl/tests/lower_jumps/lower_breaks_5.opt_test | 16 .../lower_jumps/lower_breaks_5.opt_test.expected | 7 -- src/glsl/tests/lower_jumps/lower_breaks_6.opt_test | 29 -- .../lower_jumps/lower_breaks_6.opt_test.expected | 29 -- .../lower_guarded_conditional_break.opt_test | 21 ...wer_guarded_conditional_break.opt_test.expected | 20 --- .../lower_jumps/lower_pulled_out_jump.opt_test | 28 - .../lower_pulled_out_jump.opt_test.expected| 25 --- .../tests/lower_jumps/lower_returns_1.opt_test | 12 - 
.../lower_jumps/lower_returns_1.opt_test.expected | 4 --- .../tests/lower_jumps/lower_returns_2.opt_test | 13 -- .../lower_jumps/lower_returns_2.opt_test.expected | 5 .../tests/lower_jumps/lower_returns_3.opt_test | 20 --- .../lower_jumps/lower_returns_3.opt_test.expected | 21 .../tests/lower_jumps/lower_returns_4.opt_test | 14 --- .../lower_jumps/lower_returns_4.opt_test.expected | 16 .../lower_jumps/lower_returns_main_false.opt_test | 17 - .../lower_returns_main_false.opt_test.expected | 8 -- .../lower_jumps/lower_returns_main_true.opt_test | 17 - .../lower_returns_main_true.opt_test.expected | 13 -- .../lower_jumps/lower_returns_sub_false.opt_test | 16 .../lower_returns_sub_false.opt_test.expected | 8 -- .../lower_jumps/lower_returns_sub_true.opt_test| 16 .../lower_returns_sub_true.opt_test.expected | 13 -- .../lower_jumps/lower_unified_returns.opt_test | 26 --- .../lower_unified_returns.opt_test.expected| 21 .../remove_continue_at_end_of_loop.opt_test| 13 -- ...emove_continue_at_end_of_loop.opt_test.expected | 5 ..._non_void_at_end_of_loop_lower_nothing.opt_test | 16 ..._at_end_of_loop_lower_nothing.opt_test.expected | 8 -- ...n_non_void_at_end_of_loop_lower_return.opt_test | 16 ...d_at_end_of_loop_lower_return.opt_test.expected | 19 -- ..._at_end_of_loop_lower_return_and_break.opt_test | 16 ...f_loop_lower_return_and_break.opt_test.expected | 19 -- ...turn_void_at_end_of_loop_lower_nothing.opt_test | 14 --- ..._at_end_of_loop_lower_nothing.opt_test.expected | 6 - ...eturn_void_at_end_of_loop_lower_return.opt_test | 14 --- ...d_at_end_of_loop_lower_return.opt_test.expected | 11 ..._at_end_of_loop_lower_return_and_break.opt_test | 14 --- ...f_loop_lower_return_and_break.opt_test.expected | 11 src/glsl/tests/optimization-test | 7 ++ 52 files changed, 11 insertions(+), 706 deletions(-) delete mode 100755 src/glsl/tests/lower_jumps/lower_breaks_1.opt_test delete mode 100644 src/glsl/tests/lower_jumps/lower_breaks_1.opt_test.expected delete mode 100755 
src/glsl/tests/lower_jumps/lower_breaks_2.opt_test delete mode 100644 src/glsl/tests/lower_jumps/lower_breaks_2.opt_test.expected delete mode 100755 src/glsl/tests/lower_jumps/lower_breaks_3.opt_test delete mode 100644 src/glsl/tests/lower_jumps/lower_breaks_3.opt_
Re: [Mesa-dev] [PATCH 1/3] vl: add interface for non-referenced frames
On 27.05.2014 16:12, Leo Liu wrote:
> Signed-off-by: Leo Liu

Reviewed and pushed upstream.

Thanks,
Christian.

> ---
> src/gallium/include/pipe/p_video_state.h | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/src/gallium/include/pipe/p_video_state.h b/src/gallium/include/pipe/p_video_state.h
> index 0256a8f..6621dbd 100644
> --- a/src/gallium/include/pipe/p_video_state.h
> +++ b/src/gallium/include/pipe/p_video_state.h
> @@ -371,6 +371,8 @@ struct pipe_h264_enc_picture_desc
> unsigned pic_order_cnt;
> unsigned ref_idx_l0;
> unsigned ref_idx_l1;
> +
> + bool not_referenced;
> };
>
> #ifdef __cplusplus
Re: [Mesa-dev] [PATCH 1/2] nvc0: add GK20A 3D class
On Tue, May 27, 2014 at 2:35 AM, Alexandre Courbot wrote: > On 05/27/2014 02:29 PM, Ilia Mirkin wrote: >> >> On Tue, May 27, 2014 at 12:59 AM, Alexandre Courbot >> wrote: >>> >>> GK20A is mostly compatible with GK104, but features a new 3D >>> class. Add it to the relevant header and use it when GK20A is >>> detected. >>> >>> Signed-off-by: Alexandre Courbot >>> --- >>> src/gallium/drivers/nouveau/nv_object.xml.h| 1 + >>> src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 9 - >>> 2 files changed, 9 insertions(+), 1 deletion(-) >>> >>> diff --git a/src/gallium/drivers/nouveau/nv_object.xml.h >>> b/src/gallium/drivers/nouveau/nv_object.xml.h >>> index 4c93e6564838..0a0e187dc028 100644 >>> --- a/src/gallium/drivers/nouveau/nv_object.xml.h >>> +++ b/src/gallium/drivers/nouveau/nv_object.xml.h >>> @@ -190,6 +190,7 @@ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE >>> SOFTWARE. >>> #define NVC8_3D_CLASS >>> 0x9297 >>> #define NVE4_3D_CLASS >>> 0xa097 >>> #define NVF0_3D_CLASS >>> 0xa197 >>> +#define NVEA_3D_CLASS >>> 0xa297 >>> #define GM107_3D_CLASS >>> 0xb097 >>> #define NV50_2D_CLASS >>> 0x502d >>> #define NVC0_2D_CLASS >>> 0x902d >>> diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c >>> b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c >>> index cccfe2bba23d..95e5ef81cd79 100644 >>> --- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c >>> +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c >>> @@ -702,7 +702,14 @@ nvc0_screen_create(struct nouveau_device *dev) >>> obj_class = NVF0_3D_CLASS; >>> break; >>> case 0xe0: >>> - obj_class = NVE4_3D_CLASS; >>> + switch (dev->chipset) { >>> + case 0xea: >>> + obj_class = NVEA_3D_CLASS; >> >> >> Again, would be nice to be consistent with the way you set the ISA... >> perhaps change this to a >= as well? But I guess the two could be >> disconnected. Up to you, just thought I'd bring it up. 
> > > Right below we have the following being done: > > switch (dev->chipset) { > case 0xc8: > obj_class = NVC8_3D_CLASS; > break; > case 0xc1: > obj_class = NVC1_3D_CLASS; > break; > default: > obj_class = NVC0_3D_CLASS; > break; > } > > Shouldn't we try to be consistent with this more local example instead? Which is why I didn't insist. The situation with nvcx is a little different -- nvc8 (GF110) and nvc1 (GF108) are special, but e.g. nvce (GF114) and nvcf (GF116) want the nvc0 class. OTOH you're using >= 0xea as the metric for selecting SM35, so I was just pointing out the inconsistency. Of course there needn't be a 1:1 mapping between these things, and the likelihood of another 0xex chipset being released is fairly low. So: Reviewed-by: Ilia Mirkin ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
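The class selection discussed in this thread can be summarised with a small Python sketch. It returns the macro names as strings rather than their values, and it assumes the outer switch is on the chipset with the low nibble masked off and that NVE4_3D_CLASS remains the default inside the 0xe0 family; neither is fully visible in the quoted hunk, so treat both as assumptions. select_3d_class is a hypothetical name.

```python
def select_3d_class(chipset):
    """Pick the 3D class name for a chipset id, following the quoted switch statements."""
    family = chipset & ~0xf
    if family == 0xf0:
        return "NVF0_3D_CLASS"
    if family == 0xe0:
        # Exact match for GK20A; every other 0xeX chipset keeps the GK104 class (assumed default).
        return "NVEA_3D_CLASS" if chipset == 0xea else "NVE4_3D_CLASS"
    # nvcx: only 0xc8 and 0xc1 are special, the rest use the nvc0 class.
    if chipset == 0xc8:
        return "NVC8_3D_CLASS"
    if chipset == 0xc1:
        return "NVC1_3D_CLASS"
    return "NVC0_3D_CLASS"

for chip in (0xc1, 0xc8, 0xce, 0xcf, 0xe4, 0xea, 0xf0):
    print(hex(chip), select_3d_class(chip))
```

Because the class is chosen by exact match while the ISA is chosen with ">= 0xea", a hypothetical chipset 0xeb would get the GK104 class here but the SM35 ISA, which is the inconsistency pointed out in the review.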
Re: [Mesa-dev] [PATCH v2 2/2] nvc0: use SM35 ISA with GK20A
On Tue, May 27, 2014 at 3:03 AM, Alexandre Courbot wrote: > GK20A is mostly compatible with GK104, but uses the SM35 ISA. Use > the GK110 path when this chip is detected. > > Signed-off-by: Alexandre Courbot Reviewed-by: Ilia Mirkin > --- > src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h | 2 +- > src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp | 2 +- > .../drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp | 15 > ++- > 3 files changed, 12 insertions(+), 7 deletions(-) > > diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h > b/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h > index bbb89d97932e..f829aac0bcc2 100644 > --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h > +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h > @@ -91,7 +91,7 @@ struct nv50_ir_prog_symbol > #define NVISA_GF100_CHIPSET_C0 0xc0 > #define NVISA_GF100_CHIPSET_D0 0xd0 > #define NVISA_GK104_CHIPSET0xe0 > -#define NVISA_GK110_CHIPSET0xf0 > +#define NVISA_GK20A_CHIPSET0xea > #define NVISA_GM107_CHIPSET0x110 > > struct nv50_ir_prog_info > diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp > b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp > index b1f76cf80432..f69e6a183e19 100644 > --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp > +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp > @@ -3027,7 +3027,7 @@ TargetNVC0::createCodeEmitterNVC0(Program::Type type) > CodeEmitter * > TargetNVC0::getCodeEmitter(Program::Type type) > { > - if (chipset >= NVISA_GK110_CHIPSET) > + if (chipset >= NVISA_GK20A_CHIPSET) >return createCodeEmitterGK110(type); > return createCodeEmitterNVC0(type); > } > diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp > b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp > index 064e7a2c63f9..963b6e47ddfc 100644 > --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp > +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp 
> @@ -49,9 +49,12 @@ TargetNVC0::getBuiltinCode(const uint32_t **code, uint32_t > *size) const > { > switch (chipset & ~0xf) { > case 0xe0: > - *code = (const uint32_t *)&gk104_builtin_code[0]; > - *size = sizeof(gk104_builtin_code); > - break; > + if (chipset < NVISA_GK20A_CHIPSET) { > + *code = (const uint32_t *)&gk104_builtin_code[0]; > + *size = sizeof(gk104_builtin_code); > + break; > + } > + /* fall-through for GK20A */ > case 0xf0: > case 0x100: >*code = (const uint32_t *)&gk110_builtin_code[0]; > @@ -71,7 +74,9 @@ TargetNVC0::getBuiltinOffset(int builtin) const > > switch (chipset & ~0xf) { > case 0xe0: > - return gk104_builtin_offsets[builtin]; > + if (chipset < NVISA_GK20A_CHIPSET) > + return gk104_builtin_offsets[builtin]; > + /* fall-through for GK20A */ > case 0xf0: > case 0x100: >return gk110_builtin_offsets[builtin]; > @@ -235,7 +240,7 @@ TargetNVC0::getFileSize(DataFile file) const > { > switch (file) { > case FILE_NULL: return 0; > - case FILE_GPR: return (chipset >= NVISA_GK110_CHIPSET) ? 255 : > 63; > + case FILE_GPR: return (chipset >= NVISA_GK20A_CHIPSET) ? 255 : > 63; > case FILE_PREDICATE: return 7; > case FILE_FLAGS: return 1; > case FILE_ADDRESS: return 0; > -- > 1.9.3 > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
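The fall-through logic in this patch is easy to misread in diff form, so here is a Python sketch of the decisions it encodes. The function names are hypothetical, the table names are returned as strings, and the branch for chipsets outside the 0xe0/0xf0/0x100 families is not shown in the patch, so the sketch returns None there instead of guessing.

```python
NVISA_GK20A_CHIPSET = 0xea

def builtin_table(chipset):
    """Which builtin code table getBuiltinCode()/getBuiltinOffset() would pick."""
    family = chipset & ~0xf
    if family == 0xe0 and chipset < NVISA_GK20A_CHIPSET:
        return "gk104"
    if family in (0xe0, 0xf0, 0x100):
        # GK20A falls through from the 0xe0 case into the GK110 path.
        return "gk110"
    return None  # other families: not shown in the patch

def gpr_file_size(chipset):
    """FILE_GPR case of getFileSize(): 255 on SM35-class chips, 63 otherwise."""
    return 255 if chipset >= NVISA_GK20A_CHIPSET else 63

for chip in (0xe4, 0xea, 0xf0, 0x108):
    print(hex(chip), builtin_table(chip), gpr_file_size(chip))
```

GK104 (0xe4) keeps its own table and the value 63, while GK20A (0xea) shares both the GK110 table and the value 255 with the 0xf0 and 0x100 families.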
Re: [Mesa-dev] [PATCH] meta/blit: Use gl_FragColor also in the msaa blit shader
On Tue, May 27, 2014 at 6:21 AM, Topi Pohjolainen wrote: > Fixes framebuffer_blit_functionality_multisampled_to_singlesampled_blit > es3 cts test on bdw. Also fixes this on ivb when ivb is forced to use > the meta path. > > No piglit regressions on IVB. > > Signed-off-by: Topi Pohjolainen > Cc: Eric Anholt > Cc: Matt Turner > Cc: Kenneth Graunke > Cc: Anuj Phogat > Cc: "10.2" > --- > src/mesa/drivers/common/meta_blit.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/src/mesa/drivers/common/meta_blit.c > b/src/mesa/drivers/common/meta_blit.c > index 84594d1..5929619 100644 > --- a/src/mesa/drivers/common/meta_blit.c > +++ b/src/mesa/drivers/common/meta_blit.c > @@ -273,7 +273,7 @@ setup_glsl_msaa_blit_shader(struct gl_context *ctx, > samples); > } else { > ralloc_asprintf_append(&sample_resolve, > - " out_color = sample_%d_0 / %f;\n", > + " gl_FragColor = sample_%d_0 / %f;\n", > samples, (float)samples); > } >} > -- > 1.8.3.1 > This fixes msaa blits to multiple render targets for float buffers. Reviewed-by: Anuj Phogat ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 7/8] gallium: create TGSI_PROPERTY to disable viewport and clipping
From: Christoph Bumiller Marek v2: add a cap Signed-off-by: Marek Olšák --- src/gallium/auxiliary/tgsi/tgsi_strings.c| 1 + src/gallium/auxiliary/tgsi/tgsi_ureg.c | 16 src/gallium/auxiliary/tgsi/tgsi_ureg.h | 4 src/gallium/docs/source/screen.rst | 3 +++ src/gallium/docs/source/tgsi.rst | 9 + src/gallium/drivers/freedreno/freedreno_screen.c | 1 + src/gallium/drivers/i915/i915_screen.c | 1 + src/gallium/drivers/ilo/ilo_screen.c | 1 + src/gallium/drivers/llvmpipe/lp_screen.c | 1 + src/gallium/drivers/nouveau/nv30/nv30_screen.c | 1 + src/gallium/drivers/nouveau/nv50/nv50_screen.c | 1 + src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 2 ++ src/gallium/drivers/r300/r300_screen.c | 1 + src/gallium/drivers/r600/r600_pipe.c | 1 + src/gallium/drivers/radeonsi/si_pipe.c | 1 + src/gallium/drivers/softpipe/sp_screen.c | 1 + src/gallium/drivers/svga/svga_screen.c | 1 + src/gallium/include/pipe/p_defines.h | 1 + src/gallium/include/pipe/p_shader_tokens.h | 3 ++- 19 files changed, 49 insertions(+), 1 deletion(-) diff --git a/src/gallium/auxiliary/tgsi/tgsi_strings.c b/src/gallium/auxiliary/tgsi/tgsi_strings.c index 34dec4f..713631f 100644 --- a/src/gallium/auxiliary/tgsi/tgsi_strings.c +++ b/src/gallium/auxiliary/tgsi/tgsi_strings.c @@ -122,6 +122,7 @@ const char *tgsi_property_names[TGSI_PROPERTY_COUNT] = "FS_DEPTH_LAYOUT", "VS_PROHIBIT_UCPS", "GS_INVOCATIONS", + "VS_POSITION_WINDOW_SPACE" }; const char *tgsi_type_names[5] = diff --git a/src/gallium/auxiliary/tgsi/tgsi_ureg.c b/src/gallium/auxiliary/tgsi/tgsi_ureg.c index 2bf93ee..bd0a3f7 100644 --- a/src/gallium/auxiliary/tgsi/tgsi_ureg.c +++ b/src/gallium/auxiliary/tgsi/tgsi_ureg.c @@ -173,6 +173,7 @@ struct ureg_program unsigned char property_fs_coord_pixel_center; /* = TGSI_FS_COORD_PIXEL_CENTER_* */ unsigned char property_fs_color0_writes_all_cbufs; /* = TGSI_FS_COLOR0_WRITES_ALL_CBUFS * */ unsigned char property_fs_depth_layout; /* TGSI_FS_DEPTH_LAYOUT */ + boolean property_vs_window_space_position; /* 
TGSI_VS_WINDOW_SPACE_POSITION */ unsigned nr_addrs; unsigned nr_preds; @@ -331,6 +332,13 @@ ureg_property_fs_depth_layout(struct ureg_program *ureg, ureg->property_fs_depth_layout = fs_depth_layout; } +void +ureg_property_vs_window_space_position(struct ureg_program *ureg, + boolean vs_window_space_position) +{ + ureg->property_vs_window_space_position = vs_window_space_position; +} + struct ureg_src ureg_DECL_fs_input_cyl_centroid(struct ureg_program *ureg, unsigned semantic_name, @@ -1508,6 +1516,14 @@ static void emit_decls( struct ureg_program *ureg ) ureg->property_fs_depth_layout); } + if (ureg->property_vs_window_space_position) { + assert(ureg->processor == TGSI_PROCESSOR_VERTEX); + + emit_property(ureg, +TGSI_PROPERTY_VS_WINDOW_SPACE_POSITION, +ureg->property_vs_window_space_position); + } + if (ureg->processor == TGSI_PROCESSOR_VERTEX) { for (i = 0; i < UREG_MAX_INPUT; i++) { if (ureg->vs_inputs[i/32] & (1 << (i%32))) { diff --git a/src/gallium/auxiliary/tgsi/tgsi_ureg.h b/src/gallium/auxiliary/tgsi/tgsi_ureg.h index a0a50b7..28edea6 100644 --- a/src/gallium/auxiliary/tgsi/tgsi_ureg.h +++ b/src/gallium/auxiliary/tgsi/tgsi_ureg.h @@ -184,6 +184,10 @@ void ureg_property_fs_depth_layout(struct ureg_program *ureg, unsigned fs_depth_layout); +void +ureg_property_vs_window_space_position(struct ureg_program *ureg, + boolean vs_window_space_position); + /*** * Build shader declarations: diff --git a/src/gallium/docs/source/screen.rst b/src/gallium/docs/source/screen.rst index b292257..b8e356f 100644 --- a/src/gallium/docs/source/screen.rst +++ b/src/gallium/docs/source/screen.rst @@ -202,6 +202,9 @@ The integer capabilities: implemented. * ``PIPE_CAP_TEXTURE_GATHER_OFFSETS``: Whether the ``TG4`` instruction can accept 4 offsets. +* ``PIPE_CAP_TGSI_VS_WINDOW_SPACE_POSITION``: Whether + TGSI_PROPERTY_VS_WINDOW_SPACE_POSITION is supported, which disables clipping + and viewport transformation. .. 
_pipe_capf: diff --git a/src/gallium/docs/source/tgsi.rst b/src/gallium/docs/source/tgsi.rst index 9500b9d..2ca3c3b 100644 --- a/src/gallium/docs/source/tgsi.rst +++ b/src/gallium/docs/source/tgsi.rst @@ -2848,6 +2848,15 @@ input primitive. Each invocation will have a different TGSI_SEMANTIC_INVOCATIONID system value set. If not specified, assumed to be 1. +VS_WINDOW_SPACE_POSITION +"" +If this property is set on the vertex shader, the TGSI_S
Re: [Mesa-dev] [PATCH] i915: add a missing NULL pointer check
On 05/27/2014 03:31 AM, Lubomir Rintel wrote:
> mesaVisual can be NULL with configless context since this commit:
>
> commit 551d459af421a2eb937e9e16301bb64da4624f89
> Author: Neil Roberts
> Date: Fri Mar 7 18:05:47 2014 +
>
> Add the EGL_MESA_configless_context extension
> ...
> Previously the i965 and i915 drivers were explicitly creating a zeroed
> visual whenever 0 is passed for the EGLConfig.
>
> We attempt to dereference the visual in i915 and now we don't create a
> zeroed-out one one it crashes, breaking at least weston in an i915. There's
> point in doing so as it would be zero anyway.

I think you mean "There's no point". Yeah?

>
> Signed-off-by: Lubomir Rintel
> ---
> This fixes https://bugzilla.redhat.com/show_bug.cgi?id=1100967

This should go in the commit message as

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1100967

This commit should also be tagged for the 10.2 branch:

Cc: "10.2"

> src/mesa/drivers/dri/i915/intel_context.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/mesa/drivers/dri/i915/intel_context.c b/src/mesa/drivers/dri/i915/intel_context.c
> index a6057d3..09fe371 100644
> --- a/src/mesa/drivers/dri/i915/intel_context.c
> +++ b/src/mesa/drivers/dri/i915/intel_context.c
> @@ -507,7 +507,7 @@ intelInitContext(struct intel_context *intel,
>
>     _mesa_meta_init(ctx);
>
> -   intel->hw_stencil = mesaVis->stencilBits && mesaVis->depthBits == 24;
> +   intel->hw_stencil = mesaVis && mesaVis->stencilBits && mesaVis->depthBits == 24;

Other than the complaints about the commit message, the code change looks good.

Reviewed-by: Ian Romanick

>     intel->hw_stipple = 1;
>
>     intel->RenderIndex = ~0;
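The one-line fix in the quoted patch relies on `&&` short-circuiting: when the visual pointer is NULL, the field reads to its right are never evaluated. A minimal C sketch of that guard follows. `struct visual` and `wants_hw_stencil()` are hypothetical stand-ins for illustration, not Mesa's actual types or functions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the visual; only the two fields the check reads. */
struct visual {
   int stencilBits;
   int depthBits;
};

/* Same shape as the patched expression: a NULL visual yields false
 * without ever dereferencing the pointer. */
static bool
wants_hw_stencil(const struct visual *vis)
{
   return vis && vis->stencilBits && vis->depthBits == 24;
}
```

With a NULL visual the helper returns false, which matches the observation in the commit message that a zeroed visual would have produced zero anyway.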
Re: [Mesa-dev] [PATCH] meta/blit: Use gl_FragColor also in the msaa blit shader
On 05/27/2014 06:21 AM, Topi Pohjolainen wrote:
> Fixes framebuffer_blit_functionality_multisampled_to_singlesampled_blit
> es3 cts test on bdw. Also fixes this on ivb when ivb is forced to use
> the meta path.
>
> No piglit regressions on IVB.
>
> Signed-off-by: Topi Pohjolainen
> Cc: Eric Anholt
> Cc: Matt Turner
> Cc: Kenneth Graunke
> Cc: Anuj Phogat
> Cc: "10.2"
> ---
> src/mesa/drivers/common/meta_blit.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/mesa/drivers/common/meta_blit.c b/src/mesa/drivers/common/meta_blit.c
> index 84594d1..5929619 100644
> --- a/src/mesa/drivers/common/meta_blit.c
> +++ b/src/mesa/drivers/common/meta_blit.c
> @@ -273,7 +273,7 @@ setup_glsl_msaa_blit_shader(struct gl_context *ctx,
>                                        samples);
>              } else {
>                 ralloc_asprintf_append(&sample_resolve,
> -                                      " out_color = sample_%d_0 / %f;\n",
> +                                      " gl_FragColor = sample_%d_0 / %f;\n",
>                                        samples, (float)samples);
>              }
>           }
>

Seems reasonable in the short term, and this gets:

Reviewed-by: Kenneth Graunke

Unfortunately, this doesn't fix MRT for integer data. In the single-sampled case, since we're directly copying data, we read/copy/write the data as "float" values, which actually contained the integer bits. Here, we can't do that since we need to process the actual integer data.

I do wonder if we could use intBitsToFloat/uintBitsToFloat to stuff the integer bits in the float gl_FragColor output. Just a crazy idea.

In the long term (post 10.2), I think we should draft an extension that allows you to do "layout(location = all)" on user-defined fragment shader outputs. (Or some similar syntax.)

--Ken
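The intBitsToFloat idea in the reply above amounts to reinterpreting an integer's bit pattern as a float without any numeric conversion, so the integer can be recovered exactly on the other side. A rough C analogue is sketched below, assuming a 32-bit IEEE `float`. The helper names are made up for illustration, and the sketch says nothing about whether a GPU pipeline would pass such bit patterns through a float output unchanged.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(int32_t), "assumes a 32-bit float");

/* Rough analogue of GLSL intBitsToFloat(): copy the bits, do not convert. */
static float
int_bits_to_float(int32_t i)
{
   float f;
   memcpy(&f, &i, sizeof f);
   return f;
}

/* Rough analogue of GLSL floatBitsToInt(): recover the original integer. */
static int32_t
float_bits_to_int(float f)
{
   int32_t i;
   memcpy(&i, &f, sizeof i);
   return i;
}
```

Any arithmetic on the samples would still have to happen on the integer values; only the final store would go through the bit reinterpretation.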
[Mesa-dev] [PATCH] i965/fs: Set correct number of regs_written for MCS fetches.
regs_written is in units of virtual GRFs.
---
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index 171f063..b51ecc1 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
@@ -1578,9 +1578,9 @@ fs_visitor::emit_mcs_fetch(ir_texture *ir, fs_reg coordinate, int sampler)
    inst->base_mrf = -1;
    inst->mlen = next.reg_offset * reg_width;
    inst->header_present = false;
-   inst->regs_written = 4 * reg_width; /* we only care about one reg of response,
-                                        * but the sampler always writes 4/8
-                                        */
+   inst->regs_written = 4; /* we only care about one reg of response,
+                            * but the sampler always writes 4/8
+                            */
    inst->sampler = sampler;
 
    return dest;
-- 
1.8.3.2
Re: [Mesa-dev] [PATCH] i965/fs: Set correct number of regs_written for MCS fetches.
Reviewed-by: Chris Forbes

On Wed, May 28, 2014 at 10:27 AM, Matt Turner wrote:
> regs_written is in units of virtual GRFs.
> ---
> src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> index 171f063..b51ecc1 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> @@ -1578,9 +1578,9 @@ fs_visitor::emit_mcs_fetch(ir_texture *ir, fs_reg coordinate, int sampler)
>     inst->base_mrf = -1;
>     inst->mlen = next.reg_offset * reg_width;
>     inst->header_present = false;
> -   inst->regs_written = 4 * reg_width; /* we only care about one reg of response,
> -                                        * but the sampler always writes 4/8
> -                                        */
> +   inst->regs_written = 4; /* we only care about one reg of response,
> +                            * but the sampler always writes 4/8
> +                            */
>     inst->sampler = sampler;
>
>     return dest;
> --
> 1.8.3.2
[Mesa-dev] [PATCH 08/11] Gallium/dri2: implement blitImage
Signed-off-by: Axel Davy
---
 src/gallium/state_trackers/dri/drm/dri2.c | 43 ---
 1 file changed, 40 insertions(+), 3 deletions(-)

diff --git a/src/gallium/state_trackers/dri/drm/dri2.c b/src/gallium/state_trackers/dri/drm/dri2.c
index b5bc16b..f01257a 100644
--- a/src/gallium/state_trackers/dri/drm/dri2.c
+++ b/src/gallium/state_trackers/dri/drm/dri2.c
@@ -1251,6 +1251,42 @@ dri2_from_dma_bufs(__DRIscreen *screen,
 }
 
 static void
+dri2_blit_image(__DRIcontext *context, __DRIimage *dst, __DRIimage *src,
+                int dstx0, int dsty0, int dstwidth, int dstheight,
+                int srcx0, int srcy0, int srcwidth, int srcheight)
+{
+   struct dri_context *ctx = dri_context(context);
+   struct pipe_context *pipe = ctx->st->pipe;
+   struct pipe_blit_info blit;
+
+   if (!dst || !src)
+      return;
+
+   memset(&blit, 0, sizeof(blit));
+   blit.dst.resource = dst->texture;
+   blit.dst.box.x = dstx0;
+   blit.dst.box.y = dsty0;
+   blit.dst.box.width = dstwidth;
+   blit.dst.box.height = dstheight;
+   blit.dst.box.depth = 1;
+   blit.dst.format = dst->texture->format;
+   blit.src.resource = src->texture;
+   blit.src.box.x = srcx0;
+   blit.src.box.y = srcy0;
+   blit.src.box.width = srcwidth;
+   blit.src.box.height = srcheight;
+   blit.src.box.depth = 1;
+   blit.src.format = src->texture->format;
+   blit.mask = PIPE_MASK_RGBA;
+   blit.filter = PIPE_TEX_FILTER_NEAREST;
+
+   pipe->blit(pipe, &blit);
+
+   ctx->st->flush(ctx->st, 0, NULL);
+   pipe->flush_resource(pipe, dst->texture);
+}
+
+static void
 dri2_destroy_image(__DRIimage *img)
 {
    pipe_resource_reference(&img->texture, NULL);
@@ -1259,7 +1295,7 @@ dri2_destroy_image(__DRIimage *img)
 
 /* The extension is modified during runtime if DRI_PRIME is detected */
 static __DRIimageExtension dri2ImageExtension = {
-    .base = { __DRI_IMAGE, 6 },
+    .base = { __DRI_IMAGE, 9 },
 
     .createImageFromName = dri2_create_image_from_name,
     .createImageFromRenderbuffer = dri2_create_image_from_renderbuffer,
@@ -1271,6 +1307,9 @@ static __DRIimageExtension dri2ImageExtension = {
     .createImageFromNames = dri2_from_names,
     .fromPlanar = dri2_from_planar,
     .createImageFromTexture = dri2_create_from_texture,
+    .createImageFromFds = NULL,
+    .createImageFromDmaBufs = NULL,
+    .blitImage = dri2_blit_image,
 };
 
 /*
@@ -1325,8 +1364,6 @@ dri2_init_screen(__DRIscreen * sPriv)
 
    if (drmGetCap(sPriv->fd, DRM_CAP_PRIME, &cap) == 0 &&
        (cap & DRM_PRIME_CAP_IMPORT)) {
-
-      dri2ImageExtension.base.version = 8;
       dri2ImageExtension.createImageFromFds = dri2_from_fds;
       dri2ImageExtension.createImageFromDmaBufs = dri2_from_dma_bufs;
    }
-- 
1.9.1
[Mesa-dev] [PATCH 06/11] loader: Use drirc device_id parameter in complement to DRI_PRIME
Signed-off-by: Axel Davy --- src/Makefile.am | 4 +++- src/loader/Makefile.am | 21 --- src/loader/loader.c | 27 + src/mesa/drivers/dri/common/xmlconfig.h | 2 ++ src/mesa/drivers/dri/common/xmlpool/t_options.h | 14 + 5 files changed, 64 insertions(+), 4 deletions(-) diff --git a/src/Makefile.am b/src/Makefile.am index 9d1580f..d4a7090 100644 --- a/src/Makefile.am +++ b/src/Makefile.am @@ -19,12 +19,14 @@ # FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS # IN THE SOFTWARE. -SUBDIRS = gtest loader mapi +SUBDIRS = gtest mapi if NEED_OPENGL_COMMON SUBDIRS += glsl mesa endif +SUBDIRS += loader + if HAVE_DRI_GLX SUBDIRS += glx endif diff --git a/src/loader/Makefile.am b/src/loader/Makefile.am index bddf7ac..3503a51 100644 --- a/src/loader/Makefile.am +++ b/src/loader/Makefile.am @@ -29,6 +29,23 @@ libloader_la_CPPFLAGS = \ $(VISIBILITY_CFLAGS) \ $(LIBUDEV_CFLAGS) +libloader_la_SOURCES = $(LOADER_C_FILES) +libloader_la_LIBADD = $() + +if NEED_OPENGL_COMMON +libloader_la_CPPFLAGS += \ + -I$(top_srcdir)/src/mesa/drivers/dri/common/ \ + -I$(top_srcdir)/src/mesa/ \ + -I$(top_srcdir)/src/mapi/ \ + -DUSE_DRICONF + +libloader_la_SOURCES += \ + $(top_srcdir)/src/mesa/drivers/dri/common/xmlconfig.c + +libloader_la_LIBADD += \ + -lexpat +endif + if !HAVE_LIBDRM libloader_la_CPPFLAGS += \ -D__NOT_HAVE_DRM_H @@ -36,8 +53,6 @@ else libloader_la_CPPFLAGS += \ $(LIBDRM_CFLAGS) -libloader_la_LIBADD = \ +libloader_la_LIBADD += \ $(LIBDRM_LIBS) endif - -libloader_la_SOURCES = $(LOADER_C_FILES) diff --git a/src/loader/loader.c b/src/loader/loader.c index 3d504f7..e9a8c46 100644 --- a/src/loader/loader.c +++ b/src/loader/loader.c @@ -74,6 +74,10 @@ #include #include #include +#ifdef USE_DRICONF +#include "xmlconfig.h" +#include "xmlpool.h" +#endif #endif #include "loader.h" @@ -310,9 +314,22 @@ drm_open_device(const char *device_name) return fd; } +#ifdef USE_DRICONF +const char __driConfigOptionsLoader[] = +DRI_CONF_BEGIN +DRI_CONF_SECTION_INITIALIZATION 
+DRI_CONF_DEVICE_ID_PATH_TAG() +DRI_CONF_SECTION_END +DRI_CONF_END; +#endif + int loader_get_user_preferred_fd(int default_fd, int *different_device) { struct udev *udev; +#ifdef USE_DRICONF + driOptionCache defaultInitOptions; + driOptionCache userInitOptions; +#endif const char *dri_prime = getenv("DRI_PRIME"); char *prime = NULL; int is_different_device = 0, fd = default_fd; @@ -324,6 +341,16 @@ int loader_get_user_preferred_fd(int default_fd, int *different_device) if (dri_prime) prime = strdup(dri_prime); +#ifdef USE_DRICONF + else { + driParseOptionInfo(&defaultInitOptions, __driConfigOptionsLoader); + driParseConfigFiles(&userInitOptions, &defaultInitOptions, 0, "loader"); + if (driCheckOption(&userInitOptions, "device_id", DRI_STRING)) + prime = strdup(driQueryOptionstr(&userInitOptions, "device_id")); + driDestroyOptionCache(&userInitOptions); + driDestroyOptionInfo(&defaultInitOptions); + } +#endif if (prime == NULL) { *different_device = 0; diff --git a/src/mesa/drivers/dri/common/xmlconfig.h b/src/mesa/drivers/dri/common/xmlconfig.h index 786caae..a4daa6b 100644 --- a/src/mesa/drivers/dri/common/xmlconfig.h +++ b/src/mesa/drivers/dri/common/xmlconfig.h @@ -30,6 +30,8 @@ #ifndef __XMLCONFIG_H #define __XMLCONFIG_H +#include + #define STRING_CONF_MAXLEN 25 /** \brief Option data types */ diff --git a/src/mesa/drivers/dri/common/xmlpool/t_options.h b/src/mesa/drivers/dri/common/xmlpool/t_options.h index 3bf804a..fc9e104 100644 --- a/src/mesa/drivers/dri/common/xmlpool/t_options.h +++ b/src/mesa/drivers/dri/common/xmlpool/t_options.h @@ -321,3 +321,17 @@ DRI_CONF_SECTION_BEGIN \ DRI_CONF_OPT_BEGIN_B(always_have_depth_buffer, def) \ DRI_CONF_DESC(en,gettext("Create all visuals with a depth buffer")) \ DRI_CONF_OPT_END + + + +/** + * \brief Initialization configuration options + */ +#define DRI_CONF_SECTION_INITIALIZATION \ +DRI_CONF_SECTION_BEGIN \ +DRI_CONF_DESC(en,gettext("Initialization")) + +#define DRI_CONF_DEVICE_ID_PATH_TAG(def) \ 
+DRI_CONF_OPT_BEGIN(device_id, string, def) \ +DRI_CONF_DESC(en,gettext("Define the graphic device to use if possible")) \ +DRI_CONF_OPT_END -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 03/11] gallium: Use base.stamp for all drawable invalidation checks.
From: Keith Packard

Upper levels of the stack use base.stamp to tell when a drawable needs to be revalidated, but the dri state tracker was using dPriv->lastStamp. Those two, along with dri2.stamp, all get simultaneously incremented when a dri2 invalidate event is delivered, and so end up containing precisely the same value.

This patch doesn't change the fact that there are three variables; rather, it switches all of the tests to use only base.stamp, which is functionally equivalent to the previous code.

Then, it passes base.stamp to the image loader getBuffers function so that the one which is checked will get updated by the XCB special event queue used by DRI3.

Signed-off-by: Keith Packard
Reviewed-by: Marek Olšák
---
 src/gallium/state_trackers/dri/common/dri_drawable.c | 4 ++--
 src/gallium/state_trackers/dri/drm/dri2.c            | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/gallium/state_trackers/dri/common/dri_drawable.c b/src/gallium/state_trackers/dri/common/dri_drawable.c
index b7df053..b321415 100644
--- a/src/gallium/state_trackers/dri/common/dri_drawable.c
+++ b/src/gallium/state_trackers/dri/common/dri_drawable.c
@@ -73,7 +73,7 @@ dri_st_framebuffer_validate(struct st_context_iface *stctx,
     * checked.
     */
    do {
-      lastStamp = drawable->dPriv->lastStamp;
+      lastStamp = drawable->base.stamp;
       new_stamp = (drawable->texture_stamp != lastStamp);
 
       if (new_stamp || new_mask || screen->broken_invalidate) {
@@ -91,7 +91,7 @@ dri_st_framebuffer_validate(struct st_context_iface *stctx,
          drawable->texture_stamp = lastStamp;
          drawable->texture_mask = statt_mask;
       }
-   } while (lastStamp != drawable->dPriv->lastStamp);
+   } while (lastStamp != drawable->base.stamp);
 
    if (!out)
       return TRUE;
diff --git a/src/gallium/state_trackers/dri/drm/dri2.c b/src/gallium/state_trackers/dri/drm/dri2.c
index 2dc1d47..b5bc16b 100644
--- a/src/gallium/state_trackers/dri/drm/dri2.c
+++ b/src/gallium/state_trackers/dri/drm/dri2.c
@@ -590,7 +590,7 @@ dri_image_allocate_textures(struct dri_context *ctx,
 
    (*sPriv->image.loader->getBuffers) (dPriv,
                                        image_format,
-                                       &dPriv->dri2.stamp,
+                                       (uint32_t *) &drawable->base.stamp,
                                        dPriv->loaderPrivate,
                                        buffer_mask,
                                        &images);
-- 
1.9.1
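The validate loop touched by this patch follows a common pattern: snapshot the stamp, redo the work if it differs from the stamp the textures were built for, and repeat if the stamp moved while the work was in progress. Below is a minimal sketch of that pattern. All names are hypothetical, the `new_mask` and `broken_invalidate` conditions of the real function are left out, and `pending_invalidates` only simulates an invalidate event arriving during allocation.

```c
#include <stdint.h>

struct drawable {
   uint32_t stamp;               /* bumped on every invalidate event */
   uint32_t texture_stamp;       /* stamp the current textures were built for */
   unsigned allocations;         /* how often textures were (re)allocated */
   unsigned pending_invalidates; /* simulated events arriving mid-allocation */
};

/* Stand-in for the texture allocation; it may observe a new invalidate. */
static void
allocate_textures(struct drawable *d)
{
   d->allocations++;
   if (d->pending_invalidates > 0) {
      d->pending_invalidates--;
      d->stamp++;
   }
}

/* Re-run allocation until the stamp is stable across one whole pass. */
static void
validate(struct drawable *d)
{
   uint32_t last;

   do {
      last = d->stamp;
      if (d->texture_stamp != last) {
         allocate_textures(d);
         d->texture_stamp = last;
      }
   } while (last != d->stamp);
}
```

The loop only terminates correctly if the stamp it reads is the one the event path actually updates, which is the reason the patch gives for handing base.stamp to getBuffers.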
[Mesa-dev] [PATCH 10/11] Wayland/egl: Add Gpu offloading support
Signed-off-by: Axel Davy --- src/egl/drivers/dri2/egl_dri2.h | 5 +- src/egl/drivers/dri2/platform_wayland.c | 171 ++-- 2 files changed, 142 insertions(+), 34 deletions(-) diff --git a/src/egl/drivers/dri2/egl_dri2.h b/src/egl/drivers/dri2/egl_dri2.h index 0dd9d69..4b70c48 100644 --- a/src/egl/drivers/dri2/egl_dri2.h +++ b/src/egl/drivers/dri2/egl_dri2.h @@ -195,6 +195,8 @@ struct dri2_egl_display int authenticated; int formats; uint32_t capabilities; + int is_different_gpu; + int blit_front; #endif }; @@ -247,7 +249,8 @@ struct dri2_egl_surface struct { #ifdef HAVE_WAYLAND_PLATFORM struct wl_buffer *wl_buffer; - __DRIimage *dri_image; + __DRIimage *rendering_image; + __DRIimage *shared_image; #endif #ifdef HAVE_DRM_PLATFORM struct gbm_bo *bo; diff --git a/src/egl/drivers/dri2/platform_wayland.c b/src/egl/drivers/dri2/platform_wayland.c index 537d26e..8d0a90c 100644 --- a/src/egl/drivers/dri2/platform_wayland.c +++ b/src/egl/drivers/dri2/platform_wayland.c @@ -238,8 +238,10 @@ dri2_wl_destroy_surface(_EGLDriver *drv, _EGLDisplay *disp, _EGLSurface *surf) for (i = 0; i < ARRAY_SIZE(dri2_surf->color_buffers); i++) { if (dri2_surf->color_buffers[i].wl_buffer) wl_buffer_destroy(dri2_surf->color_buffers[i].wl_buffer); - if (dri2_surf->color_buffers[i].dri_image) - dri2_dpy->image->destroyImage(dri2_surf->color_buffers[i].dri_image); + if (dri2_surf->color_buffers[i].rendering_image) { + dri2_dpy->image->destroyImage(dri2_surf->color_buffers[i].rendering_image); + dri2_dpy->image->destroyImage(dri2_surf->color_buffers[i].shared_image); + } } for (i = 0; i < __DRI_BUFFER_COUNT; i++) @@ -272,11 +274,14 @@ dri2_wl_release_buffers(struct dri2_egl_surface *dri2_surf) if (dri2_surf->color_buffers[i].wl_buffer && !dri2_surf->color_buffers[i].locked) wl_buffer_destroy(dri2_surf->color_buffers[i].wl_buffer); - if (dri2_surf->color_buffers[i].dri_image) - dri2_dpy->image->destroyImage(dri2_surf->color_buffers[i].dri_image); + if (dri2_surf->color_buffers[i].rendering_image) { + 
dri2_dpy->image->destroyImage(dri2_surf->color_buffers[i].rendering_image); + dri2_dpy->image->destroyImage(dri2_surf->color_buffers[i].shared_image); + } dri2_surf->color_buffers[i].wl_buffer = NULL; - dri2_surf->color_buffers[i].dri_image = NULL; + dri2_surf->color_buffers[i].rendering_image = NULL; + dri2_surf->color_buffers[i].shared_image = NULL; dri2_surf->color_buffers[i].locked = 0; } @@ -292,6 +297,7 @@ get_back_bo(struct dri2_egl_surface *dri2_surf) { struct dri2_egl_display *dri2_dpy = dri2_egl_display(dri2_surf->base.Resource.Display); + unsigned int use_flags; int i; /* We always want to throttle to some event (either a frame callback or @@ -311,24 +317,45 @@ get_back_bo(struct dri2_egl_surface *dri2_surf) continue; if (dri2_surf->back == NULL) dri2_surf->back = &dri2_surf->color_buffers[i]; - else if (dri2_surf->back->dri_image == NULL) + else if (dri2_surf->back->rendering_image == NULL) dri2_surf->back = &dri2_surf->color_buffers[i]; } } if (dri2_surf->back == NULL) return -1; - if (dri2_surf->back->dri_image == NULL) { - dri2_surf->back->dri_image = + + if (dri2_surf->back->rendering_image == NULL) { + use_flags = __DRI_IMAGE_USE_SHARE; + + if (dri2_dpy->is_different_gpu) + use_flags |= __DRI_IMAGE_USE_LINEAR; + + dri2_surf->back->shared_image = dri2_dpy->image->createImage(dri2_dpy->dri_screen, dri2_surf->base.Width, dri2_surf->base.Height, __DRI_IMAGE_FORMAT_ARGB, - __DRI_IMAGE_USE_SHARE, + use_flags, NULL); + if (dri2_surf->back->shared_image == NULL) + return -1; + + if (dri2_dpy->blit_front) + dri2_surf->back->rendering_image = +dri2_dpy->image->createImage(dri2_dpy->dri_screen, + dri2_surf->base.Width, + dri2_surf->base.Height, + __DRI_IMAGE_FORMAT_ARGB, + 0, + NULL); + else + dri2_surf->back->rendering_image = +dri2_dpy->image->dupImage(dri2_surf->back->shared_image, NULL); + dri2_surf->back->age = 0; } - if (dri2_surf->back->dri_image == NULL) + if (dri2_surf->
[Mesa-dev] [PATCH 01/11] gallium: Add __DRIimageDriverExtension support to gallium
From: Keith Packard Provide the hook to pull textures out of __DRIimage structures and use them as renderbuffers. Signed-off-by: Keith Packard --- src/gallium/state_trackers/dri/drm/dri2.c | 238 +- 1 file changed, 230 insertions(+), 8 deletions(-) diff --git a/src/gallium/state_trackers/dri/drm/dri2.c b/src/gallium/state_trackers/dri/drm/dri2.c index 7dccc5e..cd9964c 100644 --- a/src/gallium/state_trackers/dri/drm/dri2.c +++ b/src/gallium/state_trackers/dri/drm/dri2.c @@ -498,6 +498,219 @@ dri2_release_buffer(__DRIscreen *sPriv, __DRIbuffer *bPriv) FREE(buffer); } +static void +dri_image_allocate_textures(struct dri_context *ctx, + struct dri_drawable *drawable, + const enum st_attachment_type *statts, + unsigned statts_count) +{ + __DRIdrawable *dPriv = drawable->dPriv; + __DRIscreen *sPriv = drawable->sPriv; + struct dri_screen *screen = dri_screen(sPriv); + unsigned int image_format = __DRI_IMAGE_FORMAT_NONE; + uint32_t buffer_mask = 0; + struct __DRIimageList images; + boolean alloc_depthstencil = FALSE; + int i, j; + struct pipe_resource templ; + + /* See if we need a depth-stencil buffer. */ + for (i = 0; i < statts_count; i++) { + if (statts[i] == ST_ATTACHMENT_DEPTH_STENCIL) { + alloc_depthstencil = TRUE; + break; + } + } + + /* Delete the resources we won't need. */ + for (i = 0; i < ST_ATTACHMENT_COUNT; i++) { + /* Don't delete the depth-stencil buffer, we can reuse it. */ + if (i == ST_ATTACHMENT_DEPTH_STENCIL && alloc_depthstencil) + continue; + + pipe_resource_reference(&drawable->textures[i], NULL); + } + + if (drawable->stvis.samples > 1) { + for (i = 0; i < ST_ATTACHMENT_COUNT; i++) { + boolean del = TRUE; + + /* Don't delete MSAA resources for the attachments which are enabled, + * we can reuse them. 
*/ + for (j = 0; j < statts_count; j++) { +if (i == statts[j]) { + del = FALSE; + break; +} + } + + if (del) { +pipe_resource_reference(&drawable->msaa_textures[i], NULL); + } + } + } + + for (i = 0; i < statts_count; i++) { + enum pipe_format pf; + unsigned bind; + + dri_drawable_get_format(drawable, statts[i], &pf, &bind); + if (pf == PIPE_FORMAT_NONE) + continue; + + switch (pf) { + case PIPE_FORMAT_B5G6R5_UNORM: + image_format = __DRI_IMAGE_FORMAT_RGB565; + break; + case PIPE_FORMAT_B8G8R8X8_UNORM: + image_format = __DRI_IMAGE_FORMAT_XRGB; + break; + case PIPE_FORMAT_B8G8R8A8_UNORM: + image_format = __DRI_IMAGE_FORMAT_ARGB; + break; + case PIPE_FORMAT_R8G8B8A8_UNORM: + image_format = __DRI_IMAGE_FORMAT_ABGR; + break; + default: + image_format = __DRI_IMAGE_FORMAT_NONE; + break; + } + + switch (statts[i]) { + case ST_ATTACHMENT_FRONT_LEFT: + buffer_mask |= __DRI_IMAGE_BUFFER_FRONT; + break; + case ST_ATTACHMENT_BACK_LEFT: + buffer_mask |= __DRI_IMAGE_BUFFER_BACK; + break; + default: + continue; + } + } + + (*sPriv->image.loader->getBuffers) (dPriv, + image_format, + &dPriv->dri2.stamp, + dPriv->loaderPrivate, + buffer_mask, + &images); + + if (images.image_mask & __DRI_IMAGE_BUFFER_FRONT) { + struct pipe_resource *texture = images.front->texture; + + dPriv->w = texture->width0; + dPriv->h = texture->height0; + + pipe_resource_reference(&drawable->textures[ST_ATTACHMENT_FRONT_LEFT], texture); + } + + if (images.image_mask & __DRI_IMAGE_BUFFER_BACK) { + struct pipe_resource *texture = images.back->texture; + + dPriv->w = images.back->texture->width0; + dPriv->h = images.back->texture->height0; + + pipe_resource_reference(&drawable->textures[ST_ATTACHMENT_BACK_LEFT], texture); + } + + memset(&templ, 0, sizeof(templ)); + templ.target = screen->target; + templ.last_level = 0; + templ.width0 = dPriv->w; + templ.height0 = dPriv->h; + templ.depth0 = 1; + templ.array_size = 1; + + /* Allocate private MSAA colorbuffers. 
*/ + if (drawable->stvis.samples > 1) { + for (i = 0; i < statts_count; i++) { + enum st_attachment_type att = statts[i]; + + if (att == ST_ATTACHMENT_DEPTH_STENCIL) +continue; + + if (drawable->textures[att]) { +templ.format = drawable->textures[att]->format; +templ.bind = drawable->textures[att]->bind; +templ.nr_samples = drawable->stvis.samples; + +/* Try to reuse the resource. + * (the oth
[Mesa-dev] [PATCH 02/11] gallium/dri: fix unsetting of format when encountering depth/stencil
From: Ben Skeggs

Signed-off-by: Ben Skeggs
Signed-off-by: Keith Packard
---
 src/gallium/state_trackers/dri/drm/dri2.c | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/src/gallium/state_trackers/dri/drm/dri2.c b/src/gallium/state_trackers/dri/drm/dri2.c
index cd9964c..2dc1d47 100644
--- a/src/gallium/state_trackers/dri/drm/dri2.c
+++ b/src/gallium/state_trackers/dri/drm/dri2.c
@@ -558,6 +558,17 @@ dri_image_allocate_textures(struct dri_context *ctx,
       if (pf == PIPE_FORMAT_NONE)
          continue;
 
+      switch (statts[i]) {
+      case ST_ATTACHMENT_FRONT_LEFT:
+         buffer_mask |= __DRI_IMAGE_BUFFER_FRONT;
+         break;
+      case ST_ATTACHMENT_BACK_LEFT:
+         buffer_mask |= __DRI_IMAGE_BUFFER_BACK;
+         break;
+      default:
+         continue;
+      }
+
       switch (pf) {
       case PIPE_FORMAT_B5G6R5_UNORM:
          image_format = __DRI_IMAGE_FORMAT_RGB565;
@@ -575,17 +586,6 @@ dri_image_allocate_textures(struct dri_context *ctx,
          image_format = __DRI_IMAGE_FORMAT_NONE;
          break;
       }
-
-      switch (statts[i]) {
-      case ST_ATTACHMENT_FRONT_LEFT:
-         buffer_mask |= __DRI_IMAGE_BUFFER_FRONT;
-         break;
-      case ST_ATTACHMENT_BACK_LEFT:
-         buffer_mask |= __DRI_IMAGE_BUFFER_BACK;
-         break;
-      default:
-         continue;
-      }
    }
 
    (*sPriv->image.loader->getBuffers) (dPriv,
-- 
1.9.1
[Mesa-dev] [PATCH 09/11] GLX/DRI3: Add Gpu offloading support.
Signed-off-by: Axel Davy --- src/glx/dri3_glx.c | 235 +++- src/glx/dri3_priv.h | 2 + 2 files changed, 200 insertions(+), 37 deletions(-) diff --git a/src/glx/dri3_glx.c b/src/glx/dri3_glx.c index 3d8a662..54030bb 100644 --- a/src/glx/dri3_glx.c +++ b/src/glx/dri3_glx.c @@ -596,6 +596,7 @@ dri3_copy_sub_buffer(__GLXDRIdrawable *pdraw, int x, int y, { struct dri3_drawable *priv = (struct dri3_drawable *) pdraw; struct dri3_screen *psc = (struct dri3_screen *) pdraw->psc; + struct dri3_context *pcp = (struct dri3_context *) __glXGetCurrentContext(); xcb_connection_t *c = XGetXCBConnection(priv->base.psc->dpy); struct dri3_buffer *back = dri3_back_buffer(priv); @@ -605,6 +606,30 @@ dri3_copy_sub_buffer(__GLXDRIdrawable *pdraw, int x, int y, if (!priv->have_back || priv->is_pixmap) return; + /* When on a different gpu than the server, we use blitImage +* for the copies. Do the needed copies before flushing. +*/ + if (psc->is_different_gpu && pcp && pcp->driContext) { + /* Update the linear buffer part of the back buffer + * for the dri3_copy_area operation + */ + psc->image->blitImage(pcp->driContext, +back->linear_buffer, +back->image, +0, 0, back->width, +back->height, +0, 0, back->width, +back->height); + /* We use blitImage to update our fake front, + */ + if (priv->have_fake_front) + psc->image->blitImage(pcp->driContext, + dri3_fake_front_buffer(priv)->image, + back->image, + x, y, width, height, + x, y, width, height); + } + flags = __DRI2_FLUSH_DRAWABLE; if (flush) flags |= __DRI2_FLUSH_CONTEXT; @@ -622,7 +647,7 @@ dri3_copy_sub_buffer(__GLXDRIdrawable *pdraw, int x, int y, /* Refresh the fake front (if present) after we just damaged the real * front. 
*/ - if (priv->have_fake_front) { + if (priv->have_fake_front && !psc->is_different_gpu) { dri3_fence_reset(c, dri3_fake_front_buffer(priv)); dri3_copy_area(c, dri3_back_buffer(priv)->pixmap, @@ -655,25 +680,62 @@ dri3_copy_drawable(struct dri3_drawable *priv, Drawable dest, Drawable src) static void dri3_wait_x(struct glx_context *gc) { + struct dri3_context *pcp = (struct dri3_context *) gc; struct dri3_drawable *priv = (struct dri3_drawable *) GetGLXDRIDrawable(gc->currentDpy, gc->currentDrawable); + struct dri3_screen *psc; + struct dri3_buffer *front; if (priv == NULL || !priv->have_fake_front) return; - dri3_copy_drawable(priv, dri3_fake_front_buffer(priv)->pixmap, priv->base.xDrawable); + psc = (struct dri3_screen *) priv->base.psc; + front = dri3_fake_front_buffer(priv); + + dri3_copy_drawable(priv, front->pixmap, priv->base.xDrawable); + + /* In the psc->is_different_gpu case, the linear buffer has been updated, +* but not yet the tiled buffer. +* Copy back to the tiled buffer we use for rendering. +* Note that we don't need flushing. +*/ + if (psc->is_different_gpu && pcp->driContext) + psc->image->blitImage(pcp->driContext, +front->image, +front->linear_buffer, +0, 0, front->width, +front->height, +0, 0, front->width, +front->height); } static void dri3_wait_gl(struct glx_context *gc) { + struct dri3_context *pcp = (struct dri3_context *) gc; struct dri3_drawable *priv = (struct dri3_drawable *) GetGLXDRIDrawable(gc->currentDpy, gc->currentDrawable); + struct dri3_screen *psc; + struct dri3_buffer *front; if (priv == NULL || !priv->have_fake_front) return; - dri3_copy_drawable(priv, priv->base.xDrawable, dri3_fake_front_buffer(priv)->pixmap); + psc = (struct dri3_screen *) priv->base.psc; + front = dri3_fake_front_buffer(priv); + + /* In the psc->is_different_gpu case, we update the linear_buffer +* before updating the real front. 
+*/ + if (psc->is_different_gpu && pcp->driContext) + psc->image->blitImage(pcp->driContext, +front->linear_buffer, +front->image, +0, 0, front->width, +front->height, +0, 0, front->width, +front->height); + dri3_copy_drawable(priv, priv->base.xDrawable, front->pixmap); } /** @@ -741,6 +803,7 @@ dri3_alloc_render_buffer(struct glx_screen *glx_screen, Drawable draw, struct dri3_screen *psc = (struct dri3_screen *) glx_screen; Display *dpy = glx_
[Mesa-dev] [PATCH 04/11] Loader: Add gpu selection code with DRI_PRIME.
Signed-off-by: Axel Davy --- src/loader/loader.c | 188 src/loader/loader.h | 7 ++ 2 files changed, 195 insertions(+) diff --git a/src/loader/loader.c b/src/loader/loader.c index 666d015..3d504f7 100644 --- a/src/loader/loader.c +++ b/src/loader/loader.c @@ -70,6 +70,10 @@ #ifdef HAVE_LIBUDEV #include #include +#include +#include +#include +#include #endif #include "loader.h" @@ -202,6 +206,180 @@ out: return (*chip_id >= 0); } +static char * +get_render_node_from_id_path_tag(struct udev *udev, + char *id_path_tag, + char another_tag) +{ + struct udev_device *device; + struct udev_enumerate *e; + struct udev_list_entry *entry; + const char *path, *id_path_tag_tmp; + char *path_res; + char found = 0; + UDEV_SYMBOL(struct udev_enumerate *, udev_enumerate_new, + (struct udev *)); + UDEV_SYMBOL(int, udev_enumerate_add_match_subsystem, + (struct udev_enumerate *, const char *)); + UDEV_SYMBOL(int, udev_enumerate_add_match_sysname, + (struct udev_enumerate *, const char *)); + UDEV_SYMBOL(int, udev_enumerate_scan_devices, + (struct udev_enumerate *)); + UDEV_SYMBOL(struct udev_list_entry *, udev_enumerate_get_list_entry, + (struct udev_enumerate *)); + UDEV_SYMBOL(struct udev_list_entry *, udev_list_entry_get_next, + (struct udev_list_entry *)); + UDEV_SYMBOL(const char *, udev_list_entry_get_name, + (struct udev_list_entry *)); + UDEV_SYMBOL(struct udev_device *, udev_device_new_from_syspath, + (struct udev *, const char *)); + UDEV_SYMBOL(const char *, udev_device_get_property_value, + (struct udev_device *, const char *)); + UDEV_SYMBOL(const char *, udev_device_get_devnode, + (struct udev_device *)); + UDEV_SYMBOL(struct udev_device *, udev_device_unref, + (struct udev_device *)); + + e = udev_enumerate_new(udev); + udev_enumerate_add_match_subsystem(e, "drm"); + udev_enumerate_add_match_sysname(e, "render*"); + + udev_enumerate_scan_devices(e); + udev_list_entry_foreach(entry, udev_enumerate_get_list_entry(e)) { + path = udev_list_entry_get_name(entry); + device = 
udev_device_new_from_syspath(udev, path); + if (!device) + continue; + id_path_tag_tmp = udev_device_get_property_value(device, "ID_PATH_TAG"); + if (id_path_tag_tmp) { + if ((!another_tag && !strcmp(id_path_tag, id_path_tag_tmp)) || + (another_tag && strcmp(id_path_tag, id_path_tag_tmp))) { +found = 1; +break; + } + } + udev_device_unref(device); + } + + if (found) { + path_res = strdup(udev_device_get_devnode(device)); + udev_device_unref(device); + return path_res; + } + return NULL; +} + +static char * +get_id_path_tag_from_fd(struct udev *udev, int fd) +{ + struct udev_device *device; + const char *id_path_tag_tmp; + char *id_path_tag; + UDEV_SYMBOL(const char *, udev_device_get_property_value, + (struct udev_device *, const char *)); + UDEV_SYMBOL(struct udev_device *, udev_device_unref, + (struct udev_device *)); + + device = udev_device_new_from_fd(udev, fd); + if (!device) + return NULL; + + id_path_tag_tmp = udev_device_get_property_value(device, "ID_PATH_TAG"); + if (!id_path_tag_tmp) + return NULL; + + id_path_tag = strdup(id_path_tag_tmp); + + udev_device_unref(device); + return id_path_tag; +} + +static int +drm_open_device(const char *device_name) +{ + int fd; +#ifdef O_CLOEXEC + fd = open(device_name, O_RDWR | O_CLOEXEC); + if (fd == -1 && errno == EINVAL) +#endif + { + fd = open(device_name, O_RDWR); + if (fd != -1) + fcntl(fd, F_SETFD, fcntl(fd, F_GETFD) | FD_CLOEXEC); + } + return fd; +} + +int loader_get_user_preferred_fd(int default_fd, int *different_device) +{ + struct udev *udev; + const char *dri_prime = getenv("DRI_PRIME"); + char *prime = NULL; + int is_different_device = 0, fd = default_fd; + char *default_device_id_path_tag; + char *device_name = NULL; + char another_tag = 0; + UDEV_SYMBOL(struct udev *, udev_new, (void)); + UDEV_SYMBOL(struct udev *, udev_unref, (struct udev *)); + + if (dri_prime) + prime = strdup(dri_prime); + + if (prime == NULL) { + *different_device = 0; + return default_fd; + } + + udev = udev_new(); + if (!udev) 
+ goto prime_clean; + + default_device_id_path_tag = get_id_path_tag_from_fd(udev, default_fd); + if (!default_device_id_path_tag) + goto udev_clean; + + is_different_device = 1; + /* two format are supported: +* "1": choose any other card than the card used by default. +* id_path_tag: (for example "pci-_02_00_0") choose the card +* with this id_path_tag. +*/ + if
[Mesa-dev] [PATCH 11/11] Radeonsi: Use dma_copy when possible for si_blit.
This significantly improves GLX DRI3 Gpu offloading, particularly on CPU-bound benchmarks. There is no performance impact for DRI2 Gpu offloading.

Signed-off-by: Axel Davy
---
 src/gallium/drivers/radeonsi/si_blit.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/src/gallium/drivers/radeonsi/si_blit.c b/src/gallium/drivers/radeonsi/si_blit.c
index 6bc89ab..0e327b5 100644
--- a/src/gallium/drivers/radeonsi/si_blit.c
+++ b/src/gallium/drivers/radeonsi/si_blit.c
@@ -712,6 +712,21 @@ static void si_blit(struct pipe_context *ctx,
         return;
     }

+    if (info->src.box.width == info->dst.box.width &&
+        info->src.format == info->dst.format &&
+        info->src.box.width >=0 &&
+        info->src.resource->nr_samples == 0 &&
+        info->dst.resource->nr_samples == 0 &&
+        info->src.box.depth == 1 &&
+        info->dst.box.depth == 1 &&
+        info->mask == PIPE_MASK_RGBA) {
+        sctx->b.dma_copy(ctx, info->dst.resource, info->dst.level,
+                         info->dst.box.x, info->dst.box.y,
+                         info->dst.box.z, info->src.resource,
+                         info->src.level, &(info->src.box));
+        return;
+    }
+
     assert(util_blitter_is_blit_supported(sctx->blitter, info));

     /* The driver doesn't decompress resources automatically while
--
1.9.1
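The eligibility test in this patch can be read as a single predicate. Below is a minimal sketch of that predicate in plain C, assuming simplified stand-in structures: `struct box`, `struct side`, `struct blit`, `MASK_RGBA` and `blit_is_plain_copy` are hypothetical names for illustration and are not the Gallium types. It mirrors only the conditions the patch lists and adds nothing to them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced stand-ins for the Gallium blit structures. */
struct box  { int width, depth; };
struct side { int format; unsigned nr_samples; struct box box; };
struct blit { struct side src, dst; unsigned mask; };

#define MASK_RGBA 0xfu /* assumed value, for illustration only */

/* True when the blit is an unscaled, single-sample, single-slice colour
 * copy between identical formats, i.e. the cases the patch hands to
 * dma_copy instead of the generic blitter. */
static bool blit_is_plain_copy(const struct blit *b)
{
    assert(b != NULL);
    return b->src.box.width == b->dst.box.width &&
           b->src.format == b->dst.format &&
           b->src.box.width >= 0 &&
           b->src.nr_samples == 0 &&
           b->dst.nr_samples == 0 &&
           b->src.box.depth == 1 &&
           b->dst.box.depth == 1 &&
           b->mask == MASK_RGBA;
}
```

Anything that fails the predicate would fall through to the existing blitter path, as in the patch.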
[Mesa-dev] [PATCH 00/11] GPU offloading for GLX DRI3 and EGL Wayland
Currently Gpu offloading is supported only with GLX DRI2. You need to set it up with xrandr, and you need a DDX loaded for the secondary device, even if it has no screen. You use the DRI_PRIME env var to choose which Gpu the application should use.

Unfortunately it has some issues: rendering to a pixmap is unsupported, and you need either to be fullscreen or to be in a composited environment to avoid getting black content.

These patches add Gpu offloading support to GLX DRI3 and EGL Wayland. Most of the limitations mentioned above are addressed.

The first three patches add the __DRIimageDriverExtension support to gallium. It is needed for GLX DRI3 and to use a render-node for EGL Wayland.

The next three patches add to the loader the support needed to change the device EGL Wayland or GLX DRI3 would use, if the user specifies another device via DRI_PRIME or via the drirc device_id option. For example if drirc contains: Then glmark2-wayland will use, if possible, the render-node of ID_PATH_TAG pci-_01_00_0. The ID_PATH_TAG of a device is filled in by udev, and you can get it with the command: "udevadm info /dev/cardX"

DRI_PRIME can be set either to "1" (meaning 'a device other than the server's') or to the ID_PATH_TAG. If render-nodes are not enabled, or if the indicated device doesn't exist, the server device is used. There is no need to have a DDX loaded for the device we want to use, nor do you need to configure anything with xrandr.

Two patches add the blitImage specification and implementation to gallium. blitImage blits from one __DRIimage to another.

The GLX DRI3 Gpu offloading implementation allows rendering to a pixmap, and keeps the back and front buffers in sync with copies. There is no need to be in a composited environment.

The last patch allows blitImage to use dma copy on radeonsi. It gives a performance boost, especially in CPU-limited benchmarks. On these cards this makes Gpu offloading faster for DRI3 than for DRI2. Nouveau already has an optimised blitImage path because it uses the 2D engine to copy.

Currently no official DDX release supports DRI3. I think that is mainly because the Present DDX-side API makes it difficult to implement. If the DDX doesn't support Present, the Xserver uses a fallback with copies. When rendering on the server card, performance is worse than with DRI2, but since the copy is on the server card, it won't affect the performance of Gpu offloading. One proposal would be to add basic DRI3 support to the DDXs (without Present support for now), with Mesa using it only when DRI2 fails or when we want to use DRI3 Gpu offloading.

Axel Davy (8):
  Loader: Add gpu selection code via DRI_PRIME.
  drirc: Add string support
  loader: Use drirc device_id parameter in complement to DRI_PRIME
  DRIimage: add blitImage to the specification
  Gallium/dri2: implement blitImage
  GLX/DRI3: Add Gpu offloading support.
  Wayland/egl: Add Gpu offloading support
  Radeonsi: Use dma_copy when possible for si_blit.

Ben Skeggs (1):
  gallium/dri: fix unsetting of format when encountering depth/stencil

Keith Packard (2):
  gallium: Add __DRIimageDriverExtension support to gallium
  gallium: Use base.stamp for all drawable invalidation checks.

 include/GL/internal/dri_interface.h | 11 +-
 src/Makefile.am | 4 +-
 src/egl/drivers/dri2/egl_dri2.h | 5 +-
 src/egl/drivers/dri2/platform_wayland.c | 171 ++---
 src/gallium/drivers/radeonsi/si_blit.c | 15 ++
 .../state_trackers/dri/common/dri_drawable.c | 4 +-
 src/gallium/state_trackers/dri/drm/dri2.c | 281 -
 src/glx/dri3_glx.c | 235 ++---
 src/glx/dri3_priv.h | 2 +
 src/loader/Makefile.am | 21 +-
 src/loader/loader.c | 215
 src/loader/loader.h | 7 +
 src/mesa/drivers/dri/common/xmlconfig.c | 29 +++
 src/mesa/drivers/dri/common/xmlconfig.h | 9 +-
 src/mesa/drivers/dri/common/xmlpool/t_options.h | 14 +
 15 files changed, 933 insertions(+), 90 deletions(-)
--
1.9.1
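The DRI_PRIME semantics described in this cover letter ("1" means any device other than the server's, anything else is taken as an ID_PATH_TAG, and the server device is the fallback) can be sketched without udev. This is a rough illustration only: `prime_wants_device` and `choose_device` are hypothetical helpers, not functions from the loader, and device tags are passed in as plain strings rather than enumerated from udev.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Does `candidate_tag` satisfy the user's request?
 * request == "1": any device whose tag differs from the default one.
 * otherwise:      only the device whose tag equals the request. */
static bool prime_wants_device(const char *request, const char *default_tag,
                               const char *candidate_tag)
{
    if (strcmp(request, "1") == 0)
        return strcmp(candidate_tag, default_tag) != 0;
    return strcmp(candidate_tag, request) == 0;
}

/* Return the tag of the first matching device, or the default (server)
 * device when there is no request or nothing matches. */
static const char *choose_device(const char *request, const char *default_tag,
                                 const char *const *tags, size_t count)
{
    if (request == NULL)
        return default_tag;
    for (size_t i = 0; i < count; i++) {
        if (prime_wants_device(request, default_tag, tags[i]))
            return tags[i];
    }
    return default_tag;
}
```

The two-branch predicate corresponds to the `another_tag` flag in the loader patch: the comparison is inverted when the request is "1".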
[Mesa-dev] [PATCH 07/11] DRIimage: add blitImage to the specification
This allows blitting from one __DRIimage to another.

Signed-off-by: Axel Davy
---
 include/GL/internal/dri_interface.h | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/include/GL/internal/dri_interface.h b/include/GL/internal/dri_interface.h
index 4d57d0b..2ee3164 100644
--- a/include/GL/internal/dri_interface.h
+++ b/include/GL/internal/dri_interface.h
@@ -1005,7 +1005,7 @@ struct __DRIdri2ExtensionRec {
  * extensions.
  */
 #define __DRI_IMAGE "DRI_IMAGE"
-#define __DRI_IMAGE_VERSION 8
+#define __DRI_IMAGE_VERSION 9

 /**
  * These formats correspond to the similarly named MESA_FORMAT_*
@@ -1239,6 +1239,15 @@ struct __DRIimageExtensionRec {
                                          enum __DRIChromaSiting vert_siting,
                                          unsigned *error,
                                          void *loaderPrivate);
+
+   /**
+    * Blit a part of a __DRIimage to another and flushes
+    *
+    * \since 9
+    */
+   void (*blitImage)(__DRIcontext *context, __DRIimage *dst, __DRIimage *src,
+                     int dstx0, int dsty0, int dstwidth, int dstheight,
+                     int srcx0, int srcy0, int srcwidth, int srcheight);
 };


--
1.9.1
[Mesa-dev] [PATCH 05/11] drirc: Add string support
Signed-off-by: Axel Davy --- src/mesa/drivers/dri/common/xmlconfig.c | 29 + src/mesa/drivers/dri/common/xmlconfig.h | 7 ++- 2 files changed, 35 insertions(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/common/xmlconfig.c b/src/mesa/drivers/dri/common/xmlconfig.c index b95e452..d41d2b2 100644 --- a/src/mesa/drivers/dri/common/xmlconfig.c +++ b/src/mesa/drivers/dri/common/xmlconfig.c @@ -311,6 +311,11 @@ static GLboolean parseValue (driOptionValue *v, driOptionType type, case DRI_FLOAT: v->_float = strToF (string, &tail); break; + case DRI_STRING: + if (v->_string) + free (v->_string); + v->_string = strndup(string, STRING_CONF_MAXLEN); + return GL_TRUE; } if (tail == string) @@ -404,6 +409,8 @@ static GLboolean checkValue (const driOptionValue *v, const driOptionInfo *info) v->_float <= info->ranges[i].end._float) return GL_TRUE; break; + case DRI_STRING: + break; default: assert (0); /* should never happen */ } @@ -567,6 +574,8 @@ static void parseOptInfoAttr (struct OptInfoData *data, const XML_Char **attr) { cache->info[opt].type = DRI_INT; else if (!strcmp (attrVal[OA_TYPE], "float")) cache->info[opt].type = DRI_FLOAT; +else if (!strcmp (attrVal[OA_TYPE], "string")) + cache->info[opt].type = DRI_STRING; else XML_FATAL ("illegal type in option: %s.", attrVal[OA_TYPE]); @@ -867,6 +876,7 @@ static void optConfEndElem (void *userData, const XML_Char *name) { /** \brief Initialize an option cache based on info */ static void initOptionCache (driOptionCache *cache, const driOptionCache *info) { +GLuint i, size = 1 << info->tableSize; cache->info = info->info; cache->tableSize = info->tableSize; cache->values = malloc((1values, info->values, (1 info[i].type == DRI_STRING) + XSTRDUP(cache->values[i]._string, info->values[i]._string); +} } /** \brief Parse the named configuration file */ @@ -981,6 +995,13 @@ void driDestroyOptionInfo (driOptionCache *info) { } void driDestroyOptionCache (driOptionCache *cache) { +if (cache->info) { + GLuint i, size = 1 << 
cache->tableSize; + for (i = 0; i < size; ++i) { + if (cache->info[i].type == DRI_STRING) + free(cache->values[i]._string); + } +} free(cache->values); } @@ -1013,3 +1034,11 @@ GLfloat driQueryOptionf (const driOptionCache *cache, const char *name) { assert (cache->info[i].type == DRI_FLOAT); return cache->values[i]._float; } + +char *driQueryOptionstr (const driOptionCache *cache, const char *name) { +GLuint i = findOption (cache, name); + /* make sure the option is defined and has the correct type */ +assert (cache->info[i].name != NULL); +assert (cache->info[i].type == DRI_STRING); +return cache->values[i]._string; +} diff --git a/src/mesa/drivers/dri/common/xmlconfig.h b/src/mesa/drivers/dri/common/xmlconfig.h index d0ad42c..786caae 100644 --- a/src/mesa/drivers/dri/common/xmlconfig.h +++ b/src/mesa/drivers/dri/common/xmlconfig.h @@ -30,9 +30,11 @@ #ifndef __XMLCONFIG_H #define __XMLCONFIG_H +#define STRING_CONF_MAXLEN 25 + /** \brief Option data types */ typedef enum driOptionType { -DRI_BOOL, DRI_ENUM, DRI_INT, DRI_FLOAT +DRI_BOOL, DRI_ENUM, DRI_INT, DRI_FLOAT, DRI_STRING } driOptionType; /** \brief Option value */ @@ -40,6 +42,7 @@ typedef union driOptionValue { GLboolean _bool; /**< \brief Boolean */ GLint _int; /**< \brief Integer or Enum */ GLfloat _float; /**< \brief Floating-point */ +char *_string; /**< \brief String */ } driOptionValue; /** \brief Single range of valid values @@ -118,5 +121,7 @@ GLboolean driQueryOptionb (const driOptionCache *cache, const char *name); GLint driQueryOptioni (const driOptionCache *cache, const char *name); /** \brief Query a floating-point option value */ GLfloat driQueryOptionf (const driOptionCache *cache, const char *name); +/** \brief Query a string option value */ +char *driQueryOptionstr (const driOptionCache *cache, const char *name); #endif -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
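The ownership rule this patch introduces for string options (each value owns a bounded heap copy, the old copy is freed on reassignment, and the cache frees it on destruction) can be sketched in isolation. The names `struct opt_value`, `opt_set_string` and `opt_destroy` are hypothetical; only `STRING_CONF_MAXLEN` and its value come from the patch. Unlike the patch, this sketch allocates the new copy before freeing the old one, which is a design choice of the sketch and not something the patch does.

```c
#define _POSIX_C_SOURCE 200809L /* strndup is POSIX.1-2008 */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define STRING_CONF_MAXLEN 25 /* same limit as in the patch */

/* Hypothetical reduced option value holding only the string member. */
struct opt_value { char *str; };

/* Store a copy of `text`, truncated to STRING_CONF_MAXLEN characters.
 * Returns 0 on success, -1 if the allocation fails (the old value is
 * then left untouched). */
static int opt_set_string(struct opt_value *v, const char *text)
{
    char *copy = strndup(text, STRING_CONF_MAXLEN);
    if (copy == NULL)
        return -1;
    free(v->str); /* free(NULL) is a no-op */
    v->str = copy;
    return 0;
}

/* Release the owned string, as driDestroyOptionCache does per entry. */
static void opt_destroy(struct opt_value *v)
{
    free(v->str);
    v->str = NULL;
}
```

Note that a 25-character limit silently truncates longer values, so a caller comparing the stored string against a full-length identifier would not get a match.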
[Mesa-dev] [PATCH] i965: Fix repeated usage of rectangle texture coordinate scaling.
Previously, we set up new entries in the params[] array on every access of a rectangle texture. Unfortunately, we only reserve space for (2 * MaxTextureImageUnits) extra entries, so programs which accessed rectangle textures more times than that would write off the end of the array and likely crash.

We don't really have a decent mapping between the index returned by _mesa_add_state_reference and our index into the params array, so we have to manually search for it.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78691
Signed-off-by: Kenneth Graunke
Cc: mesa-sta...@lists.freedesktop.org
---
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 27 ---
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index 171f063..be6b8ac 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
@@ -1486,15 +1486,28 @@ fs_visitor::rescale_texcoord(ir_texture *ir, fs_reg coordinate,
          return coordinate;
       }

-      scale_x = fs_reg(UNIFORM, uniforms);
-      scale_y = fs_reg(UNIFORM, uniforms + 1);
-
       GLuint index = _mesa_add_state_reference(params,
                                                (gl_state_index *)tokens);
-      stage_prog_data->param[uniforms++] =
-         &prog->Parameters->ParameterValues[index][0].f;
-      stage_prog_data->param[uniforms++] =
-         &prog->Parameters->ParameterValues[index][1].f;
+      /* Try to find existing copies of the texrect scale uniforms. */
+      for (unsigned i = 0; i < uniforms; i++) {
+         if (stage_prog_data->param[i] ==
+             &prog->Parameters->ParameterValues[index][0].f) {
+            scale_x = fs_reg(UNIFORM, i);
+            scale_y = fs_reg(UNIFORM, i + 1);
+            break;
+         }
+      }
+
+      /* If we didn't already set them up, do so now. */
+      if (scale_x.file == BAD_FILE) {
+         scale_x = fs_reg(UNIFORM, uniforms);
+         scale_y = fs_reg(UNIFORM, uniforms + 1);
+
+         stage_prog_data->param[uniforms++] =
+            &prog->Parameters->ParameterValues[index][0].f;
+         stage_prog_data->param[uniforms++] =
+            &prog->Parameters->ParameterValues[index][1].f;
+      }
    }

    /* The 965 requires the EU to do the normalization of GL rectangle
--
1.9.2
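The fix follows a find-or-append pattern over a fixed-size table of pointers: search for an existing entry first, and only consume new slots when none is found. A minimal sketch in plain C follows, assuming a hypothetical `struct param_table` with capacity `MAX_PARAMS`; neither name comes from the driver. Unlike the patch, the sketch compares both pointers of the pair and reports a full table instead of writing past the end.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PARAMS 8 /* hypothetical fixed capacity */

/* Hypothetical reduced version of the driver's params[] array. */
struct param_table {
    const float *param[MAX_PARAMS];
    unsigned count;
};

/* Return the index of the slot holding `x` immediately followed by `y`,
 * appending the pair if it is not present yet. Returns -1 when the pair
 * is absent and the table has no room for two more entries. */
static int find_or_add_pair(struct param_table *t,
                            const float *x, const float *y)
{
    assert(t != NULL);
    for (unsigned i = 0; i + 1 < t->count; i++) {
        if (t->param[i] == x && t->param[i + 1] == y)
            return (int)i;
    }
    if (t->count + 2 > MAX_PARAMS)
        return -1;
    t->param[t->count] = x;
    t->param[t->count + 1] = y;
    t->count += 2;
    return (int)(t->count - 2);
}
```

With this shape, repeated accesses to the same rectangle texture reuse one pair of slots, so the table grows with the number of distinct textures and not with the number of accesses.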
[Mesa-dev] [PATCH v2 2/4] glsl/tests/lower_jumps: fix generated sexpr's for loops
In 088494aa (as well as other commits in the series) Paul Berry modified the tests for lower_jumps to account for the fact that the s-expression for the loop IR instruction changed from (loop () () () () (statements...)) to (loop (statements...)), but he forgot to update create_test_cases.py which he used to create the tests. Fix that, so that now create_test_cases.py is synced with the generated tests.

Signed-off-by: Connor Abbott
---
 src/glsl/tests/lower_jumps/create_test_cases.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/glsl/tests/lower_jumps/create_test_cases.py b/src/glsl/tests/lower_jumps/create_test_cases.py
index 9974681..3be1079 100644
--- a/src/glsl/tests/lower_jumps/create_test_cases.py
+++ b/src/glsl/tests/lower_jumps/create_test_cases.py
@@ -126,7 +126,7 @@ def loop(statements):
     body.
     """
     check_sexp(statements)
-    return [['loop', [], [], [], [], statements]]
+    return [['loop', statements]]

 def declare_temp(var_type, var_name):
     """Create a declaration of the form
--
1.8.3.1
[Mesa-dev] [PATCH v2 0/4] glsl/tests: remove generated files
While trying to modify the lower_jumps unit tests to account for my SSA changes, I realized that the tests were not in sync with the file that generated them. There were two problems: -The *.expected files all had the same number of digits after the decimal place (6) whereas the *.out files had 1 digit in "0.0" and 6 digits in "1.00" when printing constants, which led to failures due to diffs like: - ((if (expression bool > (var_ref b) (constant float (0.00))) + ((if (expression bool > (var_ref b) (constant float (0.0))) -Loops were incorrect in the input files. This series fixes both problems, and then removes the generated tests so that stuff like this won't happen again. v2: actually generate the test files Connor Abbott (4): glsl: be more consistent about printing constants glsl/tests/lower_jumps: fix generated sexpr's for loops glsl/tests: call create_test_cases.py in optimization-test glsl/tests: remove generated tests from the repo src/glsl/ir_print_visitor.cpp | 2 +- src/glsl/tests/lower_jumps/.gitignore | 2 ++ src/glsl/tests/lower_jumps/create_test_cases.py| 2 +- src/glsl/tests/lower_jumps/lower_breaks_1.opt_test | 13 -- .../lower_jumps/lower_breaks_1.opt_test.expected | 5 src/glsl/tests/lower_jumps/lower_breaks_2.opt_test | 15 --- .../lower_jumps/lower_breaks_2.opt_test.expected | 7 -- src/glsl/tests/lower_jumps/lower_breaks_3.opt_test | 17 - .../lower_jumps/lower_breaks_3.opt_test.expected | 8 -- src/glsl/tests/lower_jumps/lower_breaks_4.opt_test | 15 --- .../lower_jumps/lower_breaks_4.opt_test.expected | 7 -- src/glsl/tests/lower_jumps/lower_breaks_5.opt_test | 16 .../lower_jumps/lower_breaks_5.opt_test.expected | 7 -- src/glsl/tests/lower_jumps/lower_breaks_6.opt_test | 29 -- .../lower_jumps/lower_breaks_6.opt_test.expected | 29 -- .../lower_guarded_conditional_break.opt_test | 21 ...wer_guarded_conditional_break.opt_test.expected | 20 --- .../lower_jumps/lower_pulled_out_jump.opt_test | 28 - .../lower_pulled_out_jump.opt_test.expected| 25 --- 
.../tests/lower_jumps/lower_returns_1.opt_test | 12 - .../lower_jumps/lower_returns_1.opt_test.expected | 4 --- .../tests/lower_jumps/lower_returns_2.opt_test | 13 -- .../lower_jumps/lower_returns_2.opt_test.expected | 5 .../tests/lower_jumps/lower_returns_3.opt_test | 20 --- .../lower_jumps/lower_returns_3.opt_test.expected | 21 .../tests/lower_jumps/lower_returns_4.opt_test | 14 --- .../lower_jumps/lower_returns_4.opt_test.expected | 16 .../lower_jumps/lower_returns_main_false.opt_test | 17 - .../lower_returns_main_false.opt_test.expected | 8 -- .../lower_jumps/lower_returns_main_true.opt_test | 17 - .../lower_returns_main_true.opt_test.expected | 13 -- .../lower_jumps/lower_returns_sub_false.opt_test | 16 .../lower_returns_sub_false.opt_test.expected | 8 -- .../lower_jumps/lower_returns_sub_true.opt_test| 16 .../lower_returns_sub_true.opt_test.expected | 13 -- .../lower_jumps/lower_unified_returns.opt_test | 26 --- .../lower_unified_returns.opt_test.expected| 21 .../remove_continue_at_end_of_loop.opt_test| 13 -- ...emove_continue_at_end_of_loop.opt_test.expected | 5 ..._non_void_at_end_of_loop_lower_nothing.opt_test | 16 ..._at_end_of_loop_lower_nothing.opt_test.expected | 8 -- ...n_non_void_at_end_of_loop_lower_return.opt_test | 16 ...d_at_end_of_loop_lower_return.opt_test.expected | 19 -- ..._at_end_of_loop_lower_return_and_break.opt_test | 16 ...f_loop_lower_return_and_break.opt_test.expected | 19 -- ...turn_void_at_end_of_loop_lower_nothing.opt_test | 14 --- ..._at_end_of_loop_lower_nothing.opt_test.expected | 6 - ...eturn_void_at_end_of_loop_lower_return.opt_test | 14 --- ...d_at_end_of_loop_lower_return.opt_test.expected | 11 ..._at_end_of_loop_lower_return_and_break.opt_test | 14 --- ...f_loop_lower_return_and_break.opt_test.expected | 11 src/glsl/tests/optimization-test | 8 ++ 52 files changed, 12 insertions(+), 706 deletions(-) delete mode 100755 src/glsl/tests/lower_jumps/lower_breaks_1.opt_test delete mode 100644 
src/glsl/tests/lower_jumps/lower_breaks_1.opt_test.expected delete mode 100755 src/glsl/tests/lower_jumps/lower_breaks_2.opt_test delete mode 100644 src/glsl/tests/lower_jumps/lower_breaks_2.opt_test.expected delete mode 100755 src/glsl/tests/lower_jumps/lower_breaks_3.opt_test delete mode 100644
[Mesa-dev] [PATCH v2 4/4] glsl/tests: remove generated tests from the repo
They were made unneccesary by the last commit. Signed-off-by: Connor Abbott --- src/glsl/tests/lower_jumps/.gitignore | 2 ++ src/glsl/tests/lower_jumps/lower_breaks_1.opt_test | 12 - .../lower_jumps/lower_breaks_1.opt_test.expected | 4 --- src/glsl/tests/lower_jumps/lower_breaks_2.opt_test | 15 --- .../lower_jumps/lower_breaks_2.opt_test.expected | 7 -- src/glsl/tests/lower_jumps/lower_breaks_3.opt_test | 17 - .../lower_jumps/lower_breaks_3.opt_test.expected | 8 -- src/glsl/tests/lower_jumps/lower_breaks_4.opt_test | 15 --- .../lower_jumps/lower_breaks_4.opt_test.expected | 7 -- src/glsl/tests/lower_jumps/lower_breaks_5.opt_test | 16 .../lower_jumps/lower_breaks_5.opt_test.expected | 7 -- src/glsl/tests/lower_jumps/lower_breaks_6.opt_test | 29 -- .../lower_jumps/lower_breaks_6.opt_test.expected | 29 -- .../lower_guarded_conditional_break.opt_test | 21 ...wer_guarded_conditional_break.opt_test.expected | 20 --- .../lower_jumps/lower_pulled_out_jump.opt_test | 28 - .../lower_pulled_out_jump.opt_test.expected| 25 --- .../tests/lower_jumps/lower_returns_1.opt_test | 12 - .../lower_jumps/lower_returns_1.opt_test.expected | 4 --- .../tests/lower_jumps/lower_returns_2.opt_test | 13 -- .../lower_jumps/lower_returns_2.opt_test.expected | 5 .../tests/lower_jumps/lower_returns_3.opt_test | 20 --- .../lower_jumps/lower_returns_3.opt_test.expected | 21 .../tests/lower_jumps/lower_returns_4.opt_test | 14 --- .../lower_jumps/lower_returns_4.opt_test.expected | 16 .../lower_jumps/lower_returns_main_false.opt_test | 17 - .../lower_returns_main_false.opt_test.expected | 8 -- .../lower_jumps/lower_returns_main_true.opt_test | 17 - .../lower_returns_main_true.opt_test.expected | 13 -- .../lower_jumps/lower_returns_sub_false.opt_test | 16 .../lower_returns_sub_false.opt_test.expected | 8 -- .../lower_jumps/lower_returns_sub_true.opt_test| 16 .../lower_returns_sub_true.opt_test.expected | 13 -- .../lower_jumps/lower_unified_returns.opt_test | 26 --- 
.../lower_unified_returns.opt_test.expected| 21 .../remove_continue_at_end_of_loop.opt_test| 12 - ...emove_continue_at_end_of_loop.opt_test.expected | 4 --- ..._non_void_at_end_of_loop_lower_nothing.opt_test | 16 ..._at_end_of_loop_lower_nothing.opt_test.expected | 8 -- ...n_non_void_at_end_of_loop_lower_return.opt_test | 16 ...d_at_end_of_loop_lower_return.opt_test.expected | 19 -- ..._at_end_of_loop_lower_return_and_break.opt_test | 16 ...f_loop_lower_return_and_break.opt_test.expected | 19 -- ...turn_void_at_end_of_loop_lower_nothing.opt_test | 13 -- ..._at_end_of_loop_lower_nothing.opt_test.expected | 5 ...eturn_void_at_end_of_loop_lower_return.opt_test | 13 -- ...d_at_end_of_loop_lower_return.opt_test.expected | 11 ..._at_end_of_loop_lower_return_and_break.opt_test | 13 -- ...f_loop_lower_return_and_break.opt_test.expected | 11 49 files changed, 2 insertions(+), 696 deletions(-) delete mode 100755 src/glsl/tests/lower_jumps/lower_breaks_1.opt_test delete mode 100644 src/glsl/tests/lower_jumps/lower_breaks_1.opt_test.expected delete mode 100755 src/glsl/tests/lower_jumps/lower_breaks_2.opt_test delete mode 100644 src/glsl/tests/lower_jumps/lower_breaks_2.opt_test.expected delete mode 100755 src/glsl/tests/lower_jumps/lower_breaks_3.opt_test delete mode 100644 src/glsl/tests/lower_jumps/lower_breaks_3.opt_test.expected delete mode 100755 src/glsl/tests/lower_jumps/lower_breaks_4.opt_test delete mode 100644 src/glsl/tests/lower_jumps/lower_breaks_4.opt_test.expected delete mode 100755 src/glsl/tests/lower_jumps/lower_breaks_5.opt_test delete mode 100644 src/glsl/tests/lower_jumps/lower_breaks_5.opt_test.expected delete mode 100755 src/glsl/tests/lower_jumps/lower_breaks_6.opt_test delete mode 100644 src/glsl/tests/lower_jumps/lower_breaks_6.opt_test.expected delete mode 100755 src/glsl/tests/lower_jumps/lower_guarded_conditional_break.opt_test delete mode 100644 src/glsl/tests/lower_jumps/lower_guarded_conditional_break.opt_test.expected delete mode 100755 
src/glsl/tests/lower_jumps/lower_pulled_out_jump.opt_test delete mode 100644 src/glsl/tests/lower_jumps/lower_pulled_out_jump.opt_test.expected delete mode 100755 src/glsl/tests/lower_jumps/lower_returns_1.opt_test delete mode 100644 src/glsl/tests/lower_jumps/lower_returns_1.opt_test.expected delete mode 100755 src/glsl/tests/lower_jumps/lower_returns_2.opt_test delete m
[Mesa-dev] [PATCH v2 3/4] glsl/tests: call create_test_cases.py in optimization-test
This way, when someone modifies create_test_cases.py and forgets to commit their changes again, people will notice.

v2: make sure we parse the right directories and check for existence the right way.

Signed-off-by: Connor Abbott
---
 src/glsl/tests/optimization-test | 8
 1 file changed, 8 insertions(+)

diff --git a/src/glsl/tests/optimization-test b/src/glsl/tests/optimization-test
index 8ca7776..bf15153 100755
--- a/src/glsl/tests/optimization-test
+++ b/src/glsl/tests/optimization-test
@@ -9,6 +9,14 @@ fi
 total=0
 pass=0

+echo "== Generating tests =="
+for dir in tests/*/; do
+    if [ -e "${dir}create_test_cases.py" ]; then
+        cd $dir; python create_test_cases.py; cd ..
+    fi
+    echo "$dir"
+done
+
 echo "== Testing optimization passes =="
 for test in `find . -iname '*.opt_test'`; do
     echo -n "Testing $test..."
--
1.8.3.1
[Mesa-dev] [PATCH v2 1/4] glsl: be more consistent about printing constants
Make sure that we print the same number of digits when printing 0.0 as any other floating-point number. This will make generating expected output files for tests easier. To avoid breaking "make check," update the generated tests for lower_jumps before the next commit which will bring create_test_cases.py in line with them. Signed-off-by: Connor Abbott --- src/glsl/ir_print_visitor.cpp | 2 +- src/glsl/tests/lower_jumps/lower_breaks_1.opt_test | 3 +-- src/glsl/tests/lower_jumps/lower_breaks_1.opt_test.expected| 3 +-- src/glsl/tests/lower_jumps/lower_breaks_2.opt_test.expected| 2 +- src/glsl/tests/lower_jumps/lower_breaks_3.opt_test.expected| 2 +- src/glsl/tests/lower_jumps/lower_breaks_4.opt_test.expected| 2 +- src/glsl/tests/lower_jumps/lower_breaks_5.opt_test.expected| 2 +- src/glsl/tests/lower_jumps/lower_breaks_6.opt_test.expected| 10 +- .../lower_guarded_conditional_break.opt_test.expected | 6 +++--- .../tests/lower_jumps/lower_pulled_out_jump.opt_test.expected | 8 src/glsl/tests/lower_jumps/lower_returns_3.opt_test.expected | 4 ++-- src/glsl/tests/lower_jumps/lower_returns_4.opt_test.expected | 2 +- .../lower_jumps/lower_returns_main_false.opt_test.expected | 4 ++-- .../lower_jumps/lower_returns_main_true.opt_test.expected | 4 ++-- .../lower_jumps/lower_returns_sub_false.opt_test.expected | 4 ++-- .../tests/lower_jumps/lower_returns_sub_true.opt_test.expected | 4 ++-- .../tests/lower_jumps/lower_unified_returns.opt_test.expected | 8 .../tests/lower_jumps/remove_continue_at_end_of_loop.opt_test | 3 +-- .../remove_continue_at_end_of_loop.opt_test.expected | 3 +-- .../return_void_at_end_of_loop_lower_nothing.opt_test | 3 +-- .../return_void_at_end_of_loop_lower_nothing.opt_test.expected | 3 +-- .../return_void_at_end_of_loop_lower_return.opt_test | 3 +-- .../return_void_at_end_of_loop_lower_return_and_break.opt_test | 3 +-- 23 files changed, 40 insertions(+), 48 deletions(-) diff --git a/src/glsl/ir_print_visitor.cpp b/src/glsl/ir_print_visitor.cpp index 
0a7695a..a3d851e 100644 --- a/src/glsl/ir_print_visitor.cpp +++ b/src/glsl/ir_print_visitor.cpp @@ -430,7 +430,7 @@ void ir_print_visitor::visit(ir_constant *ir) case GLSL_TYPE_FLOAT: if (ir->value.f[i] == 0.0f) /* 0.0 == -0.0, so print with %f to get the proper sign. */ - fprintf(f, "%.1f", ir->value.f[i]); + fprintf(f, "%f", ir->value.f[i]); else if (fabs(ir->value.f[i]) < 0.01f) fprintf(f, "%a", ir->value.f[i]); else if (fabs(ir->value.f[i]) > 100.0f) diff --git a/src/glsl/tests/lower_jumps/lower_breaks_1.opt_test b/src/glsl/tests/lower_jumps/lower_breaks_1.opt_test index b412ba8..e2d4ed1 100755 --- a/src/glsl/tests/lower_jumps/lower_breaks_1.opt_test +++ b/src/glsl/tests/lower_jumps/lower_breaks_1.opt_test @@ -8,6 +8,5 @@ ((declare (out) float a) (function main (signature void (parameters) - ((loop - ((assign (x) (var_ref a) (constant float (1.00))) break)) + ((loop ((assign (x) (var_ref a) (constant float (1.00))) break)) EOF diff --git a/src/glsl/tests/lower_jumps/lower_breaks_1.opt_test.expected b/src/glsl/tests/lower_jumps/lower_breaks_1.opt_test.expected index 56ef3e4..270a43d 100644 --- a/src/glsl/tests/lower_jumps/lower_breaks_1.opt_test.expected +++ b/src/glsl/tests/lower_jumps/lower_breaks_1.opt_test.expected @@ -1,5 +1,4 @@ ((declare (out) float a) (function main (signature void (parameters) - ((loop - ((assign (x) (var_ref a) (constant float (1.00))) break)) + ((loop ((assign (x) (var_ref a) (constant float (1.00))) break)) diff --git a/src/glsl/tests/lower_jumps/lower_breaks_2.opt_test.expected b/src/glsl/tests/lower_jumps/lower_breaks_2.opt_test.expected index dc231f9..73a1d56 100644 --- a/src/glsl/tests/lower_jumps/lower_breaks_2.opt_test.expected +++ b/src/glsl/tests/lower_jumps/lower_breaks_2.opt_test.expected @@ -3,5 +3,5 @@ (signature void (parameters) ((loop ((assign (x) (var_ref a) (constant float (1.00))) - (if (expression bool > (var_ref b) (constant float (0.0))) (break) + (if (expression bool > (var_ref b) (constant float (0.00))) 
(break) ( diff --git a/src/glsl/tests/lower_jumps/lower_breaks_3.opt_test.expected b/src/glsl/tests/lower_jumps/lower_breaks_3.opt_test.expected index 8131b66..53d5392 100644 --- a/src/glsl/tests/lower_jumps/lower_breaks_3.opt_test.expected +++ b/src/glsl/tests/lower_jumps/lower_breaks_3.opt_test.expected @@ -3,6 +3,6 @@ (signature void (parameters) ((loop ((assign (x) (var_ref a) (constant float (1.00))) - (if (expression bool > (var_ref b) (constant float (0.0))) + (if (expression bool > (var_ref b) (constant float (0.00))) ((assign (x) (
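The one-line change to ir_print_visitor.cpp is easiest to see by formatting zero both ways. The sketch below uses only standard `snprintf`; `format_zero_old` and `format_zero_new` are hypothetical names that reproduce the two format strings from the diff, not functions in Mesa.

```c
#include <assert.h>
#include <stdio.h>

/* Format strings taken from the diff: "%.1f" before, "%f" after. */
static int format_zero_old(char *buf, size_t len, float value)
{
    return snprintf(buf, len, "%.1f", value); /* one digit after the point */
}

static int format_zero_new(char *buf, size_t len, float value)
{
    return snprintf(buf, len, "%f", value); /* six digits, the C default */
}
```

For 0.0f the old form yields `0.0` and the new form yields `0.000000`, which has the same number of fractional digits as any other constant printed with `%f`. Negative zero keeps its sign in both forms.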
[Mesa-dev] [PATCH 01/19] i965/fs: Add and use an fs_inst copy constructor.
Will get more complicated when fs_reg src becomes a pointer.
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 5 +
 src/mesa/drivers/dri/i965/brw_fs.h | 1 +
 2 files changed, 6 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index bd77e0c..5b7d84f 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -139,6 +139,11 @@ fs_inst::fs_inst(enum opcode opcode, fs_reg dst,
       assert(src[2].reg_offset >= 0);
 }

+fs_inst::fs_inst(const fs_inst &that)
+{
+   memcpy(this, &that, sizeof(that));
+}
+
 #define ALU1(op)                                        \
    fs_inst *                                            \
    fs_visitor::op(fs_reg dst, fs_reg src0)              \
diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h
index 789f0b3..bda233c 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -197,6 +197,7 @@ public:
    fs_inst(enum opcode opcode, fs_reg dst, fs_reg src0, fs_reg src1);
    fs_inst(enum opcode opcode, fs_reg dst,
            fs_reg src0, fs_reg src1,fs_reg src2);
+   fs_inst(const fs_inst &that);

    bool equals(fs_inst *inst) const;
    bool overwrites_reg(const fs_reg &reg) const;
--
1.8.3.2
[Mesa-dev] [PATCH 00/19] i965/fs: load_payload on Gen7+.
Here's a respin of my load_payload series from mid-April with some feedback from Ken addressed and some bugs fixed.

This series is available in my tree (with a few unrelated patches before it):

  git://people.freedesktop.org/~mattst88/mesa tex-sources

This is a prep series for implementing SSA in the i965 fragment shader backend.

I haven't done any testing on Gen < 7, but I think everything should still work afterward on those platforms. While we won't generate load_payload instructions from the texture visitor, we may generate them from the cubemap fixup and from CSE. Both should be safe since their destinations are general purpose registers, not MRFs. It'd be nice to extend this to MRFs, since as I've planned it this work will be necessary for SSA in the fs backend.

total instructions in shared programs: 1686122 -> 1677922 (-0.49%)
instructions in affected programs: 635490 -> 627290 (-1.29%)
GAINED: 20
LOST: 0

A small handful of shaders gain an extra instruction or two.

I'd really like to get some R-b tags this time around, since this series is necessary for the SSA work I've been anxious to get back to for a while.

  i965/fs: Add and use an fs_inst copy constructor.
  i965/fs: Disable fs_inst assignment operator.
  i965/fs: ralloc fs_inst's fs_reg sources.
  i965/fs: Store the number of sources an fs_inst has.
  i965/fs: Loop from 0 to inst->sources, not 0 to 3.
  i965/fs: Clean up fs_inst constructors.
  i965/fs: Add a function to resize fs_inst's sources
  i965/fs: Add fs_inst constructor that takes a list of

Preparatory work and infrastructure for instructions with variable numbers of sources.

  i965/fs: Add SHADER_OPCODE_LOAD_PAYLOAD.
  i965/fs: Lower LOAD_PAYLOAD and clean up.
  i965/fs: Use LOAD_PAYLOAD in emit_texture_gen7().
  i965/fs: Apply cube map array fixup and restore the

The main implementation of load_payload on Gen7.

  i965/fs: Only consider real sources when comparing
  i965/fs: Emit load_payload instead of multiple MOVs for
  i965/fs: Support register coalescing on LOAD_PAYLOAD
  i965/fs: Perform CSE on load_payload instructions if
  i965/fs: Copy propagate from load_payload.

A series of patches to teach our optimization passes about the new virtual instruction.

  i965/fs: Perform CSE on texture operations.
  i965/fs: Optimize SEL with the same sources into a MOV.

A couple more optimizations enabled by this series.
[Mesa-dev] [PATCH 02/19] i965/fs: Disable fs_inst assignment operator.
The fs_reg src array is going to turn into a pointer and we'd rather not consider the implications of shallow copying fs_insts.

Reviewed-by: Topi Pohjolainen
---
 src/mesa/drivers/dri/i965/brw_fs.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h
index bda233c..e079842 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -185,6 +185,8 @@ public:
 };
 
 class fs_inst : public backend_instruction {
+   fs_inst &operator=(const fs_inst &);
+
 public:
    DECLARE_RALLOC_CXX_OPERATORS(fs_inst)
 
--
1.8.3.2
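For readers unfamiliar with the idiom, here is a minimal standalone sketch of what disabling assignment buys once the source array lives behind a pointer. The `Reg` and `Inst` types are hypothetical stand-ins for fs_reg and fs_inst, and plain new/delete replaces ralloc; none of this is code from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for fs_reg; only one field is modeled.
struct Reg {
   int reg;
};

// Hypothetical stand-in for fs_inst after src becomes a heap pointer.
class Inst {
   // Declared private and never defined, mirroring the patch: assignment
   // from outside the class fails to compile, so no shallow copy of the
   // src pointer can happen by accident.
   Inst &operator=(const Inst &);

public:
   explicit Inst(std::size_t n) : sources(n), src(new Reg[n]()) {}

   // Deep copy: every instruction owns its own source array.
   Inst(const Inst &that) : sources(that.sources), src(new Reg[that.sources])
   {
      for (std::size_t i = 0; i < sources; i++)
         src[i] = that.src[i];
   }

   ~Inst() { delete[] src; }

   std::size_t sources;
   Reg *src;
};

// Compile-time check that the assignment operator really is unavailable.
static_assert(!std::is_copy_assignable<Inst>::value,
              "Inst must not be assignable");
```

Copy construction still works and yields an independent array, which is the behavior the copy constructor patch earlier in the series appears to be after.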
[Mesa-dev] [PATCH 17/19] i965/fs: Copy propagate from load_payload.
But only into non-load_payload instructions. Otherwise we would prevent register coalescing from combining identical payloads. --- .../drivers/dri/i965/brw_fs_copy_propagation.cpp | 22 ++ 1 file changed, 22 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp index a1aff21..f00ccf2 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp @@ -42,6 +42,7 @@ namespace { /* avoid conflict with opt_copy_propagation_elements */ struct acp_entry : public exec_node { fs_reg dst; fs_reg src; + bool is_from_load_payload; }; struct block_data { @@ -278,6 +279,10 @@ fs_visitor::try_copy_propagate(fs_inst *inst, int arg, acp_entry *entry) if (entry->src.file == IMM) return false; + if (entry->is_from_load_payload && + inst->opcode == SHADER_OPCODE_LOAD_PAYLOAD) + return false; + /* Bail if inst is reading more than entry is writing. */ if ((inst->regs_read(this, arg) * inst->src[arg].stride * type_sz(inst->src[arg].type)) > type_sz(entry->dst.type)) @@ -545,7 +550,24 @@ fs_visitor::opt_copy_propagate_local(void *copy_prop_ctx, bblock_t *block, acp_entry *entry = ralloc(copy_prop_ctx, acp_entry); entry->dst = inst->dst; entry->src = inst->src[0]; + entry->is_from_load_payload = false; acp[entry->dst.reg % ACP_HASH_SIZE].push_tail(entry); + } else if (inst->opcode == SHADER_OPCODE_LOAD_PAYLOAD && + inst->dst.file == GRF) { + for (int i = 0; i < inst->sources; i++) { +if (inst->src[i].file == GRF) { + acp_entry *entry = ralloc(copy_prop_ctx, acp_entry); + entry->dst = inst->dst; + entry->dst.reg_offset = i; + entry->src = inst->src[i]; + entry->is_from_load_payload = true; + if (!entry->dst.equals(inst->src[i])) { + acp[entry->dst.reg % ACP_HASH_SIZE].push_tail(entry); + } else { + ralloc_free(entry); + } +} + } } } -- 1.8.3.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org 
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
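The bookkeeping in this patch can be sketched in isolation as follows. This is a simplified model under stated assumptions: `Reg`, `AcpEntry`, `acp_entries_for_payload` and `may_propagate` are hypothetical names, every source is assumed to be a GRF (the patch checks the register file explicitly), and a std::vector stands in for the hashed ACP lists.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for fs_reg: a virtual register plus an offset.
struct Reg {
   int reg;
   int reg_offset;

   bool equals(const Reg &o) const
   {
      return reg == o.reg && reg_offset == o.reg_offset;
   }
};

// Mirrors the acp_entry fields touched by the patch.
struct AcpEntry {
   Reg dst;
   Reg src;
   bool is_from_load_payload;
};

// Record one copy per payload slot: slot i of dst was copied from srcs[i].
// Slots that would copy a register onto itself are skipped.
std::vector<AcpEntry>
acp_entries_for_payload(const Reg &dst, const std::vector<Reg> &srcs)
{
   std::vector<AcpEntry> acp;
   for (std::size_t i = 0; i < srcs.size(); i++) {
      AcpEntry entry;
      entry.dst = dst;
      entry.dst.reg_offset = static_cast<int>(i);
      entry.src = srcs[i];
      entry.is_from_load_payload = true;
      if (!entry.dst.equals(srcs[i]))
         acp.push_back(entry);
   }
   return acp;
}

// Entries that came from a load_payload are never propagated into another
// load_payload, so register coalescing can still merge identical payloads.
bool
may_propagate(const AcpEntry &entry, bool reader_is_load_payload)
{
   return !(entry.is_from_load_payload && reader_is_load_payload);
}
```

For a payload written to vgrf4 from vgrf5, vgrf6 and vgrf4+2, the sketch records two entries, since the third slot is already in place.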
[Mesa-dev] [PATCH 14/19] i965/fs: Emit load_payload instead of multiple MOVs for large VGRFs.
--- src/mesa/drivers/dri/i965/brw_fs_cse.cpp | 33 1 file changed, 21 insertions(+), 12 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp index e40567f..5037579 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp @@ -177,15 +177,20 @@ fs_visitor::opt_cse_local(bblock_t *block, exec_list *aeb) entry->tmp = tmp; entry->generator->dst = tmp; - for (int i = 0; i < written; i++) { - fs_inst *copy = MOV(orig_dst, tmp); + fs_inst *copy; + if (written > 1) { + fs_reg *sources = ralloc_array(mem_ctx, fs_reg, written); + for (int i = 0; i < written; i++) { + sources[i] = tmp; + sources[i].reg_offset = i; + } + copy = LOAD_PAYLOAD(orig_dst, sources, written); + } else { + copy = MOV(orig_dst, tmp); copy->force_writemask_all = entry->generator->force_writemask_all; - entry->generator->insert_after(copy); - - orig_dst.reg_offset++; - tmp.reg_offset++; } + entry->generator->insert_after(copy); } /* dest <- temp */ @@ -195,15 +200,19 @@ fs_visitor::opt_cse_local(bblock_t *block, exec_list *aeb) assert(inst->dst.type == entry->tmp.type); fs_reg dst = inst->dst; fs_reg tmp = entry->tmp; - fs_inst *copy = NULL; - for (int i = 0; i < written; i++) { + fs_inst *copy; + if (written > 1) { + fs_reg *sources = ralloc_array(mem_ctx, fs_reg, written); + for (int i = 0; i < written; i++) { + sources[i] = tmp; + sources[i].reg_offset = i; + } + copy = LOAD_PAYLOAD(dst, sources, written); + } else { copy = MOV(dst, tmp); copy->force_writemask_all = inst->force_writemask_all; - inst->insert_before(copy); - - dst.reg_offset++; - tmp.reg_offset++; } + inst->insert_before(copy); } /* Set our iterator so that next time through the loop inst->next -- 1.8.3.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
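A rough standalone sketch of the choice this patch makes when CSE copies a result back: one LOAD_PAYLOAD when several registers were written, a plain MOV otherwise. `Reg`, `Inst`, `Opcode` and `make_copy` are hypothetical names for illustration and do not exist in Mesa.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for fs_reg and fs_inst.
struct Reg {
   int reg;
   int reg_offset;
};

enum Opcode { OP_MOV, OP_LOAD_PAYLOAD };

struct Inst {
   Opcode opcode;
   Reg dst;
   std::vector<Reg> src;
};

// Build the copy from the CSE temporary back to the original destination.
// With more than one register written, a single LOAD_PAYLOAD lists each
// register of tmp as its own source instead of emitting one MOV per register.
Inst
make_copy(const Reg &dst, const Reg &tmp, int written)
{
   Inst copy;
   copy.dst = dst;
   if (written > 1) {
      copy.opcode = OP_LOAD_PAYLOAD;
      for (int i = 0; i < written; i++) {
         Reg s = tmp;
         s.reg_offset = i;
         copy.src.push_back(s);
      }
   } else {
      copy.opcode = OP_MOV;
      copy.src.push_back(tmp);
   }
   return copy;
}
```

The point of the single instruction is that the large VGRF is written in one place, which is what the later coalescing and SSA work in this series seems to rely on.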
[Mesa-dev] [PATCH 05/19] i965/fs: Loop from 0 to inst->sources, not 0 to 3.
--- src/mesa/drivers/dri/i965/brw_fs.cpp | 24 +++--- .../drivers/dri/i965/brw_fs_copy_propagation.cpp | 4 ++-- src/mesa/drivers/dri/i965/brw_fs_cse.cpp | 2 +- .../dri/i965/brw_fs_dead_code_eliminate.cpp| 2 +- src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 2 +- .../drivers/dri/i965/brw_fs_live_variables.cpp | 2 +- src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp | 6 +++--- .../drivers/dri/i965/brw_fs_register_coalesce.cpp | 2 +- .../dri/i965/brw_fs_saturate_propagation.cpp | 2 +- .../drivers/dri/i965/brw_schedule_instructions.cpp | 10 - 10 files changed, 28 insertions(+), 28 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index a9a8ac1..8b13683 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -1472,7 +1472,7 @@ fs_visitor::assign_curb_setup() foreach_list(node, &this->instructions) { fs_inst *inst = (fs_inst *)node; - for (unsigned int i = 0; i < 3; i++) { + for (unsigned int i = 0; i < inst->sources; i++) { if (inst->src[i].file == UNIFORM) { int uniform_nr = inst->src[i].reg + inst->src[i].reg_offset; int constant_nr; @@ -1670,7 +1670,7 @@ fs_visitor::split_virtual_grfs() * the send is reading the whole thing. 
*/ if (inst->is_send_from_grf()) { - for (int i = 0; i < 3; i++) { + for (int i = 0; i < inst->sources; i++) { if (inst->src[i].file == GRF) { split_grf[inst->src[i].reg] = false; } @@ -1703,7 +1703,7 @@ fs_visitor::split_virtual_grfs() inst->dst.reg_offset - 1); inst->dst.reg_offset = 0; } - for (int i = 0; i < 3; i++) { + for (int i = 0; i < inst->sources; i++) { if (inst->src[i].file == GRF && split_grf[inst->src[i].reg] && inst->src[i].reg_offset != 0) { @@ -1741,7 +1741,7 @@ fs_visitor::compact_virtual_grfs() if (inst->dst.file == GRF) remap_table[inst->dst.reg] = 0; - for (int i = 0; i < 3; i++) { + for (int i = 0; i < inst->sources; i++) { if (inst->src[i].file == GRF) remap_table[inst->src[i].reg] = 0; } @@ -1767,7 +1767,7 @@ fs_visitor::compact_virtual_grfs() if (inst->dst.file == GRF) inst->dst.reg = remap_table[inst->dst.reg]; - for (int i = 0; i < 3; i++) { + for (int i = 0; i < inst->sources; i++) { if (inst->src[i].file == GRF) inst->src[i].reg = remap_table[inst->src[i].reg]; } @@ -1807,7 +1807,7 @@ fs_visitor::move_uniform_array_access_to_pull_constants() foreach_list_safe(node, &this->instructions) { fs_inst *inst = (fs_inst *)node; - for (int i = 0 ; i < 3; i++) { + for (int i = 0 ; i < inst->sources; i++) { if (inst->src[i].file != UNIFORM || !inst->src[i].reladdr) continue; @@ -1857,7 +1857,7 @@ fs_visitor::assign_constant_locations() foreach_list(node, &this->instructions) { fs_inst *inst = (fs_inst *) node; - for (int i = 0; i < 3; i++) { + for (int i = 0; i < inst->sources; i++) { if (inst->src[i].file != UNIFORM) continue; @@ -1928,7 +1928,7 @@ fs_visitor::demote_pull_constants() foreach_list(node, &this->instructions) { fs_inst *inst = (fs_inst *)node; - for (int i = 0; i < 3; i++) { + for (int i = 0; i < inst->sources; i++) { if (inst->src[i].file != UNIFORM) continue; @@ -2180,7 +2180,7 @@ fs_visitor::compute_to_mrf() * MRF's source GRF that we wanted to rewrite, that stops us. 
*/ bool interfered = false; -for (int i = 0; i < 3; i++) { +for (int i = 0; i < scan_inst->sources; i++) { if (scan_inst->src[i].file == GRF && scan_inst->src[i].reg == inst->src[0].reg && scan_inst->src[i].reg_offset == inst->src[0].reg_offset) { @@ -2319,7 +2319,7 @@ clear_deps_for_inst_src(fs_inst *inst, int dispatch_width, bool *deps, !inst->force_sechalf); /* Clear the flag for registers that actually got read (as expected). */ - for (int i = 0; i < 3; i++) { + for (int i = 0; i < inst->sources; i++) { int grf; if (inst->src[i].file == GRF) { grf = inst->src[i].reg; @@ -2697,7 +2697,7 @@ fs_visitor::dump_instruction(backend_instruction *be_inst, FILE *file) } fprintf(file, ":%s, ", brw_reg_type_letters(inst->dst.type)); - for (int i = 0; i < 3 && inst->src[i].file != BAD_FILE; i++) { + for (int i = 0; i < inst->sources && inst->src[i].file != BAD_FILE; i++) { if (inst->src[i].negate) fprintf(file, "-"); if (inst->src[i].abs) @@ -2786,7 +2786,7 @@ fs_visitor::dump_instruction(backend_instruction *be_inst, FILE *file) fprintf(file
[Mesa-dev] [PATCH 11/19] i965/fs: Use LOAD_PAYLOAD in emit_texture_gen7().
--- src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 135 +++ 1 file changed, 73 insertions(+), 62 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp index b51ecc1..10ec254 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp @@ -1268,8 +1268,11 @@ fs_visitor::emit_texture_gen7(ir_texture *ir, fs_reg dst, fs_reg coordinate, int reg_width = dispatch_width / 8; bool header_present = false; - fs_reg payload = fs_reg(this, glsl_type::float_type); - fs_reg next = payload; + fs_reg *sources = ralloc_array(mem_ctx, fs_reg, MAX_SAMPLER_MESSAGE_SIZE); + for (int i = 0; i < MAX_SAMPLER_MESSAGE_SIZE; i++) { + sources[i] = fs_reg(this, glsl_type::float_type); + } + int length = 0; if (ir->op == ir_tg4 || (ir->offset && ir->op != ir_txf) || sampler >= 16) { /* For general texture offsets (no txf workaround), we need a header to @@ -1283,12 +1286,13 @@ fs_visitor::emit_texture_gen7(ir_texture *ir, fs_reg dst, fs_reg coordinate, * need to offset the Sampler State Pointer in the header. 
*/ header_present = true; - next.reg_offset++; + sources[length] = reg_undef; + length++; } if (ir->shadow_comparitor) { - emit(MOV(next, shadow_c)); - next.reg_offset++; + emit(MOV(sources[length], shadow_c)); + length++; } bool has_nonconstant_offset = ir->offset && !ir->offset->as_constant(); @@ -1300,12 +1304,12 @@ fs_visitor::emit_texture_gen7(ir_texture *ir, fs_reg dst, fs_reg coordinate, case ir_lod: break; case ir_txb: - emit(MOV(next, lod)); - next.reg_offset++; + emit(MOV(sources[length], lod)); + length++; break; case ir_txl: - emit(MOV(next, lod)); - next.reg_offset++; + emit(MOV(sources[length], lod)); + length++; break; case ir_txd: { no16("Gen7 does not support sample_d/sample_d_c in SIMD16 mode."); @@ -1314,21 +1318,21 @@ fs_visitor::emit_texture_gen7(ir_texture *ir, fs_reg dst, fs_reg coordinate, * [hdr], [ref], x, dPdx.x, dPdy.x, y, dPdx.y, dPdy.y, z, dPdx.z, dPdy.z */ for (int i = 0; i < ir->coordinate->type->vector_elements; i++) { -emit(MOV(next, coordinate)); +emit(MOV(sources[length], coordinate)); coordinate.reg_offset++; -next.reg_offset++; +length++; /* For cube map array, the coordinate is (u,v,r,ai) but there are * only derivatives for (u, v, r). 
*/ if (i < ir->lod_info.grad.dPdx->type->vector_elements) { -emit(MOV(next, lod)); +emit(MOV(sources[length], lod)); lod.reg_offset++; -next.reg_offset++; +length++; -emit(MOV(next, lod2)); +emit(MOV(sources[length], lod2)); lod2.reg_offset++; -next.reg_offset++; +length++; } } @@ -1336,45 +1340,45 @@ fs_visitor::emit_texture_gen7(ir_texture *ir, fs_reg dst, fs_reg coordinate, break; } case ir_txs: - emit(MOV(retype(next, BRW_REGISTER_TYPE_UD), lod)); - next.reg_offset++; + emit(MOV(retype(sources[length], BRW_REGISTER_TYPE_UD), lod)); + length++; break; case ir_query_levels: - emit(MOV(retype(next, BRW_REGISTER_TYPE_UD), fs_reg(0u))); - next.reg_offset++; + emit(MOV(retype(sources[length], BRW_REGISTER_TYPE_UD), fs_reg(0u))); + length++; break; case ir_txf: /* Unfortunately, the parameters for LD are intermixed: u, lod, v, r. */ - emit(MOV(retype(next, BRW_REGISTER_TYPE_D), coordinate)); + emit(MOV(retype(sources[length], BRW_REGISTER_TYPE_D), coordinate)); coordinate.reg_offset++; - next.reg_offset++; + length++; - emit(MOV(retype(next, BRW_REGISTER_TYPE_D), lod)); - next.reg_offset++; + emit(MOV(retype(sources[length], BRW_REGISTER_TYPE_D), lod)); + length++; for (int i = 1; i < ir->coordinate->type->vector_elements; i++) { -emit(MOV(retype(next, BRW_REGISTER_TYPE_D), coordinate)); +emit(MOV(retype(sources[length], BRW_REGISTER_TYPE_D), coordinate)); coordinate.reg_offset++; -next.reg_offset++; +length++; } coordinate_done = true; break; case ir_txf_ms: - emit(MOV(retype(next, BRW_REGISTER_TYPE_UD), sample_index)); - next.reg_offset++; + emit(MOV(retype(sources[length], BRW_REGISTER_TYPE_UD), sample_index)); + length++; /* data from the multisample control surface */ - emit(MOV(retype(next, BRW_REGISTER_TYPE_UD), mcs)); - next.reg_offset++; + emit(MOV(retype(sources[length], BRW_REGISTER_TYPE_UD), mcs)); + length++; /* there is no offsetting for this message; just copy in the integer * texture coordina
[Mesa-dev] [PATCH 06/19] i965/fs: Clean up fs_inst constructors.
In a fashion suggested by Ken. --- Allocating fewer sources than 3 is not handled in this series. src/mesa/drivers/dri/i965/brw_fs.cpp | 90 ++-- src/mesa/drivers/dri/i965/brw_fs.h | 17 --- 2 files changed, 32 insertions(+), 75 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index 8b13683..f926d97 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -52,95 +52,53 @@ extern "C" { #include "glsl/glsl_types.h" void -fs_inst::init(int sources) +fs_inst::init(enum opcode opcode, const fs_reg &dst, fs_reg *src, int sources) { memset(this, 0, sizeof(*this)); + this->opcode = opcode; + this->dst = dst; + this->src = src; this->sources = sources; - this->src = ralloc_array(this, fs_reg, sources); this->conditional_mod = BRW_CONDITIONAL_NONE; - this->dst = reg_undef; - this->src[0] = reg_undef; - this->src[1] = reg_undef; - this->src[2] = reg_undef; - /* This will be the case for almost all instructions. 
*/ this->regs_written = 1; this->writes_accumulator = false; } -fs_inst::fs_inst() +fs_inst::fs_inst(enum opcode opcode, const fs_reg &dst) { - init(3); - this->opcode = BRW_OPCODE_NOP; + fs_reg *src = ralloc_array(this, fs_reg, 3); + init(opcode, dst, src, 0); } -fs_inst::fs_inst(enum opcode opcode) +fs_inst::fs_inst(enum opcode opcode, const fs_reg &dst, const fs_reg &src0) { - init(3); - this->opcode = opcode; + fs_reg *src = ralloc_array(this, fs_reg, 3); + src[0] = src0; + init(opcode, dst, src, 1); } -fs_inst::fs_inst(enum opcode opcode, fs_reg dst) +fs_inst::fs_inst(enum opcode opcode, const fs_reg &dst, const fs_reg &src0, + const fs_reg &src1) { - init(3); - this->opcode = opcode; - this->dst = dst; - - if (dst.file == GRF) - assert(dst.reg_offset >= 0); + fs_reg *src = ralloc_array(this, fs_reg, 3); + src[0] = src0; + src[1] = src1; + init(opcode, dst, src, 2); } -fs_inst::fs_inst(enum opcode opcode, fs_reg dst, fs_reg src0) +fs_inst::fs_inst(enum opcode opcode, const fs_reg &dst, const fs_reg &src0, + const fs_reg &src1, const fs_reg &src2) { - init(3); - this->opcode = opcode; - this->dst = dst; - this->src[0] = src0; - - if (dst.file == GRF) - assert(dst.reg_offset >= 0); - if (src[0].file == GRF) - assert(src[0].reg_offset >= 0); -} - -fs_inst::fs_inst(enum opcode opcode, fs_reg dst, fs_reg src0, fs_reg src1) -{ - init(3); - this->opcode = opcode; - this->dst = dst; - this->src[0] = src0; - this->src[1] = src1; - - if (dst.file == GRF) - assert(dst.reg_offset >= 0); - if (src[0].file == GRF) - assert(src[0].reg_offset >= 0); - if (src[1].file == GRF) - assert(src[1].reg_offset >= 0); -} - -fs_inst::fs_inst(enum opcode opcode, fs_reg dst, -fs_reg src0, fs_reg src1, fs_reg src2) -{ - init(3); - this->opcode = opcode; - this->dst = dst; - this->src[0] = src0; - this->src[1] = src1; - this->src[2] = src2; - - if (dst.file == GRF) - assert(dst.reg_offset >= 0); - if (src[0].file == GRF) - assert(src[0].reg_offset >= 0); - if (src[1].file == GRF) - 
assert(src[1].reg_offset >= 0); - if (src[2].file == GRF) - assert(src[2].reg_offset >= 0); + fs_reg *src = ralloc_array(this, fs_reg, 3); + src[0] = src0; + src[1] = src1; + src[2] = src2; + init(opcode, dst, src, 3); } fs_inst::fs_inst(const fs_inst &that) diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h index 4f8a2b2..fb68923 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.h +++ b/src/mesa/drivers/dri/i965/brw_fs.h @@ -190,15 +190,14 @@ class fs_inst : public backend_instruction { public: DECLARE_RALLOC_CXX_OPERATORS(fs_inst) - void init(int sources); - - fs_inst(); - fs_inst(enum opcode opcode); - fs_inst(enum opcode opcode, fs_reg dst); - fs_inst(enum opcode opcode, fs_reg dst, fs_reg src0); - fs_inst(enum opcode opcode, fs_reg dst, fs_reg src0, fs_reg src1); - fs_inst(enum opcode opcode, fs_reg dst, - fs_reg src0, fs_reg src1,fs_reg src2); + void init(enum opcode opcode, const fs_reg &dst, fs_reg *src, int sources); + + fs_inst(enum opcode opcode = BRW_OPCODE_NOP, const fs_reg &dst = reg_undef); + fs_inst(enum opcode opcode, const fs_reg &dst, const fs_reg &src0); + fs_inst(enum opcode opcode, const fs_reg &dst, const fs_reg &src0, + const fs_reg &src1); + fs_inst(enum opcode opcode, const fs_reg &dst, const fs_reg &src0, + const fs_reg &src1, const fs_reg &src2); fs_inst(const fs_inst &that); bool equals(fs_inst *inst) const; -- 1.8.3.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 08/19] i965/fs: Add fs_inst constructor that takes a list of sources.
Also add an emit() function that calls it. --- src/mesa/drivers/dri/i965/brw_fs.cpp | 12 src/mesa/drivers/dri/i965/brw_fs.h | 3 +++ 2 files changed, 15 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index 1f174d3..c86cb42 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -101,6 +101,11 @@ fs_inst::fs_inst(enum opcode opcode, const fs_reg &dst, const fs_reg &src0, init(opcode, dst, src, 3); } +fs_inst::fs_inst(enum opcode opcode, const fs_reg &dst, fs_reg src[], int sources) +{ + init(opcode, dst, src, sources); +} + fs_inst::fs_inst(const fs_inst &that) { memcpy(this, &that, sizeof(that)); @@ -740,6 +745,13 @@ fs_visitor::emit(enum opcode opcode, fs_reg dst, return emit(new(mem_ctx) fs_inst(opcode, dst, src0, src1, src2)); } +fs_inst * +fs_visitor::emit(enum opcode opcode, fs_reg dst, + fs_reg src[], int sources) +{ + return emit(new(mem_ctx) fs_inst(opcode, dst, src, sources)); +} + void fs_visitor::push_force_uncompressed() { diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h index b7cfb3c..527c3f3 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.h +++ b/src/mesa/drivers/dri/i965/brw_fs.h @@ -198,6 +198,7 @@ public: const fs_reg &src1); fs_inst(enum opcode opcode, const fs_reg &dst, const fs_reg &src0, const fs_reg &src1, const fs_reg &src2); + fs_inst(enum opcode opcode, const fs_reg &dst, fs_reg src[], int sources); fs_inst(const fs_inst &that); void resize_sources(uint8_t num_sources); @@ -295,6 +296,8 @@ public: fs_inst *emit(enum opcode opcode, fs_reg dst, fs_reg src0, fs_reg src1); fs_inst *emit(enum opcode opcode, fs_reg dst, fs_reg src0, fs_reg src1, fs_reg src2); + fs_inst *emit(enum opcode opcode, fs_reg dst, + fs_reg src[], int sources); fs_inst *MOV(fs_reg dst, fs_reg src); fs_inst *NOT(fs_reg dst, fs_reg src); -- 1.8.3.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org 
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 04/19] i965/fs: Store the number of sources an fs_inst has.
--- src/mesa/drivers/dri/i965/brw_fs.cpp | 21 +++-- src/mesa/drivers/dri/i965/brw_fs.h | 3 ++- 2 files changed, 13 insertions(+), 11 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index b06966a..a9a8ac1 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -52,11 +52,12 @@ extern "C" { #include "glsl/glsl_types.h" void -fs_inst::init() +fs_inst::init(int sources) { memset(this, 0, sizeof(*this)); - this->src = ralloc_array(this, fs_reg, 3); + this->sources = sources; + this->src = ralloc_array(this, fs_reg, sources); this->conditional_mod = BRW_CONDITIONAL_NONE; @@ -73,19 +74,19 @@ fs_inst::init() fs_inst::fs_inst() { - init(); + init(3); this->opcode = BRW_OPCODE_NOP; } fs_inst::fs_inst(enum opcode opcode) { - init(); + init(3); this->opcode = opcode; } fs_inst::fs_inst(enum opcode opcode, fs_reg dst) { - init(); + init(3); this->opcode = opcode; this->dst = dst; @@ -95,7 +96,7 @@ fs_inst::fs_inst(enum opcode opcode, fs_reg dst) fs_inst::fs_inst(enum opcode opcode, fs_reg dst, fs_reg src0) { - init(); + init(3); this->opcode = opcode; this->dst = dst; this->src[0] = src0; @@ -108,7 +109,7 @@ fs_inst::fs_inst(enum opcode opcode, fs_reg dst, fs_reg src0) fs_inst::fs_inst(enum opcode opcode, fs_reg dst, fs_reg src0, fs_reg src1) { - init(); + init(3); this->opcode = opcode; this->dst = dst; this->src[0] = src0; @@ -125,7 +126,7 @@ fs_inst::fs_inst(enum opcode opcode, fs_reg dst, fs_reg src0, fs_reg src1) fs_inst::fs_inst(enum opcode opcode, fs_reg dst, fs_reg src0, fs_reg src1, fs_reg src2) { - init(); + init(3); this->opcode = opcode; this->dst = dst; this->src[0] = src0; @@ -146,9 +147,9 @@ fs_inst::fs_inst(const fs_inst &that) { memcpy(this, &that, sizeof(that)); - this->src = ralloc_array(this, fs_reg, 3); + this->src = ralloc_array(this, fs_reg, that.sources); - for (int i = 0; i < 3; i++) + for (int i = 0; i < that.sources; i++) this->src[i] = that.src[i]; } diff 
--git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h index 11a5c7c..4f8a2b2 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.h +++ b/src/mesa/drivers/dri/i965/brw_fs.h @@ -190,7 +190,7 @@ class fs_inst : public backend_instruction { public: DECLARE_RALLOC_CXX_OPERATORS(fs_inst) - void init(); + void init(int sources); fs_inst(); fs_inst(enum opcode opcode); @@ -216,6 +216,7 @@ public: uint32_t texture_offset; /**< Texture offset bitfield */ uint32_t offset; /* spill/unspill offset */ + uint8_t sources; /**< Number of fs_reg sources. */ uint8_t conditional_mod; /**< BRW_CONDITIONAL_* */ /* Chooses which flag subregister (f0.0 or f0.1) is used for conditional -- 1.8.3.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
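To illustrate why the source count has to be stored, here is a small sketch of a pass-style loop bounded by the instruction's own count rather than a hard-coded 3. `File`, `Reg`, `Inst` and `reads_grf` are hypothetical and only loosely modeled on the i965 types.

```cpp
#include <cassert>

// Hypothetical, trimmed-down register file enum and register type.
enum File { BAD_FILE, GRF, UNIFORM };

struct Reg {
   File file;
   int reg;
};

// Hypothetical instruction: a source count plus a pointer to the sources,
// the same shape fs_inst takes after this patch.
struct Inst {
   int sources;
   const Reg *src;
};

// Walk exactly inst.sources operands. A loop bounded by a literal 3 would
// miss the fifth source of a wide instruction and read past the end of a
// one-source instruction.
bool
reads_grf(const Inst &inst, int reg)
{
   for (int i = 0; i < inst.sources; i++) {
      if (inst.src[i].file == GRF && inst.src[i].reg == reg)
         return true;
   }
   return false;
}
```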
[Mesa-dev] [PATCH 10/19] i965/fs: Lower LOAD_PAYLOAD and clean up.
Clean up with register_coalesce()/dead_code_eliminate().
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 42
 src/mesa/drivers/dri/i965/brw_fs.h   | 1 +
 2 files changed, 43 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 0856b6b..c0af6d0 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -2574,6 +2574,43 @@ fs_visitor::lower_uniform_pull_constant_loads()
    }
 }
 
+bool
+fs_visitor::lower_load_payload()
+{
+   bool progress = false;
+
+   foreach_list_safe(node, &instructions) {
+      fs_inst *inst = (fs_inst *)node;
+
+      if (inst->opcode == SHADER_OPCODE_LOAD_PAYLOAD) {
+         fs_reg dst = inst->dst;
+
+         /* The generator creates the message header if present, which is in
+          * the first register of the message payload.
+          */
+         if (!inst->header_present) {
+            inst->insert_before(MOV(dst, inst->src[0]));
+         } else {
+            assert(inst->src[0].file == BAD_FILE);
+         }
+         dst.reg_offset++;
+
+         for (int i = 1; i < inst->sources; i++) {
+            inst->insert_before(MOV(dst, inst->src[i]));
+            dst.reg_offset++;
+         }
+
+         inst->remove();
+         progress = true;
+      }
+   }
+
+   if (progress)
+      invalidate_live_intervals();
+
+   return progress;
+}
+
 void
 fs_visitor::dump_instructions()
 {
@@ -3071,6 +3108,11 @@ fs_visitor::run()
          progress = OPT(compute_to_mrf) || progress;
       } while (progress);
 
+      if (lower_load_payload()) {
+         register_coalesce();
+         dead_code_eliminate();
+      }
+
       lower_uniform_pull_constant_loads();
 
       assign_curb_setup();
diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h
index d0e459c..2b60945 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -391,6 +391,7 @@ public:
    void fail(const char *msg, ...);
    void no16(const char *msg, ...);
    void lower_uniform_pull_constant_loads();
+   bool lower_load_payload();
 
    void push_force_uncompressed();
    void pop_force_uncompressed();
--
1.8.3.2
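The lowering step can be sketched outside the compiler as a function from one payload to a list of MOVs. This is a simplified model: `Reg`, `Mov` and `lower_payload` are hypothetical names, and a returned vector takes the place of inserting instructions before the original and removing it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for fs_reg.
struct Reg {
   int reg;
   int reg_offset;
};

// A lowered copy: one MOV from src into a single register of the payload.
struct Mov {
   Reg dst;
   Reg src;
};

// Expand a payload into per-register MOVs. When a header is present, slot 0
// belongs to the generator, so no MOV is emitted for it, but the destination
// offset still advances past it.
std::vector<Mov>
lower_payload(Reg dst, const std::vector<Reg> &srcs, bool header_present)
{
   std::vector<Mov> movs;
   for (std::size_t i = 0; i < srcs.size(); i++) {
      if (!(i == 0 && header_present)) {
         Mov m = { dst, srcs[i] };
         movs.push_back(m);
      }
      dst.reg_offset++;
   }
   return movs;
}
```

In the patch the skipped header slot is additionally asserted to be BAD_FILE; the sketch simply ignores whatever is stored there.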
[Mesa-dev] [PATCH 19/19] i965/fs: Optimize SEL with the same sources into a MOV.
instructions in affected programs: 474 -> 462 (-2.53%)
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index c0af6d0..453683c 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -2021,7 +2021,12 @@ fs_visitor::opt_algebraic()
          }
          break;
       case BRW_OPCODE_SEL:
-         if (inst->saturate && inst->src[1].file == IMM) {
+         if (inst->src[0].equals(inst->src[1])) {
+            inst->opcode = BRW_OPCODE_MOV;
+            inst->src[1] = reg_undef;
+            inst->predicate = BRW_PREDICATE_NONE;
+            progress = true;
+         } else if (inst->saturate && inst->src[1].file == IMM) {
             switch (inst->conditional_mod) {
             case BRW_CONDITIONAL_LE:
             case BRW_CONDITIONAL_L:
--
1.8.3.2
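As a standalone illustration of the rewrite, the sketch below turns a SEL whose two sources are identical into an unpredicated MOV. The types, the `REG_UNDEF` constant and `opt_sel_same_sources` are hypothetical simplifications, not the i965 definitions.

```cpp
#include <cassert>

enum Opcode { OP_SEL, OP_MOV };
enum Predicate { PREDICATE_NONE, PREDICATE_NORMAL };

// Hypothetical stand-in for fs_reg.
struct Reg {
   int reg;
   int reg_offset;

   bool equals(const Reg &o) const
   {
      return reg == o.reg && reg_offset == o.reg_offset;
   }
};

// Hypothetical marker for "no register", playing the role of reg_undef.
const Reg REG_UNDEF = { -1, 0 };

struct Inst {
   Opcode opcode;
   Predicate predicate;
   Reg src[2];
};

// A SEL picks src[0] or src[1] per channel. If both are the same register,
// the choice cannot matter, so the instruction becomes a plain MOV and its
// predicate is dropped. Returns true when the instruction was rewritten.
bool
opt_sel_same_sources(Inst &inst)
{
   if (inst.opcode != OP_SEL || !inst.src[0].equals(inst.src[1]))
      return false;

   inst.opcode = OP_MOV;
   inst.src[1] = REG_UNDEF;
   inst.predicate = PREDICATE_NONE;
   return true;
}
```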
[Mesa-dev] [PATCH 09/19] i965/fs: Add SHADER_OPCODE_LOAD_PAYLOAD.
Will be used to simplify the handling of large virtual GRFs in SSA form. --- src/mesa/drivers/dri/i965/brw_defines.h | 2 ++ src/mesa/drivers/dri/i965/brw_fs.cpp | 10 ++ src/mesa/drivers/dri/i965/brw_fs.h | 2 ++ src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 3 +++ src/mesa/drivers/dri/i965/brw_shader.cpp | 3 +++ 5 files changed, 20 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h index c38e447..34467e9 100644 --- a/src/mesa/drivers/dri/i965/brw_defines.h +++ b/src/mesa/drivers/dri/i965/brw_defines.h @@ -797,6 +797,8 @@ enum opcode { SHADER_OPCODE_TG4, SHADER_OPCODE_TG4_OFFSET, + SHADER_OPCODE_LOAD_PAYLOAD, + SHADER_OPCODE_SHADER_TIME_ADD, SHADER_OPCODE_UNTYPED_ATOMIC, diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index c86cb42..0856b6b 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -241,6 +241,16 @@ fs_visitor::CMP(fs_reg dst, fs_reg src0, fs_reg src1, uint32_t condition) return inst; } +fs_inst * +fs_visitor::LOAD_PAYLOAD(const fs_reg &dst, fs_reg *src, int sources) +{ + fs_inst *inst = new(mem_ctx) fs_inst(SHADER_OPCODE_LOAD_PAYLOAD, dst, src, +sources); + inst->regs_written = sources; + + return inst; +} + exec_list fs_visitor::VARYING_PULL_CONSTANT_LOAD(const fs_reg &dst, const fs_reg &surf_index, diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h index 527c3f3..d0e459c 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.h +++ b/src/mesa/drivers/dri/i965/brw_fs.h @@ -338,6 +338,8 @@ public: fs_inst *end, const fs_reg &reg); + fs_inst *LOAD_PAYLOAD(const fs_reg &dst, fs_reg *src, int sources); + exec_list VARYING_PULL_CONSTANT_LOAD(const fs_reg &dst, const fs_reg &surf_index, const fs_reg &varying_offset, diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp index a5be0ec..26b963b 100644 ---
a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp @@ -1713,6 +1713,9 @@ fs_generator::generate_code(exec_list *instructions, generate_discard_jump(inst); break; + case SHADER_OPCODE_LOAD_PAYLOAD: + break; + case SHADER_OPCODE_SHADER_TIME_ADD: generate_shader_time_add(inst, src[0], src[1], src[2]); break; diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp b/src/mesa/drivers/dri/i965/brw_shader.cpp index 254afef..b35862c 100644 --- a/src/mesa/drivers/dri/i965/brw_shader.cpp +++ b/src/mesa/drivers/dri/i965/brw_shader.cpp @@ -452,6 +452,9 @@ brw_instruction_name(enum opcode op) case SHADER_OPCODE_TG4_OFFSET: return "tg4_offset"; + case SHADER_OPCODE_LOAD_PAYLOAD: + return "load_payload"; + case SHADER_OPCODE_GEN4_SCRATCH_READ: return "gen4_scratch_read"; case SHADER_OPCODE_GEN4_SCRATCH_WRITE: -- 1.8.3.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 16/19] i965/fs: Perform CSE on load_payload instructions if it's not a copy.
Since CSE creates instructions, if we let CSE generate things register coalescing can't remove, bad things will happen. Only let CSE combine non-copy load_payloads.

E.g., allow CSE to handle this

   load_payload vgrf4+0, vgrf5, vgrf6

but not this

   load_payload vgrf4+0, vgrf5+0, vgrf5+1
---
 src/mesa/drivers/dri/i965/brw_fs_cse.cpp | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp
index 5037579..75c6aab 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp
@@ -43,6 +43,22 @@ struct aeb_entry : public exec_node {
 }
 
 static bool
+is_copy_payload(const fs_inst *inst)
+{
+   const int reg = inst->src[0].reg;
+   if (inst->src[0].reg_offset != 0)
+      return false;
+
+   for (int i = 1; i < inst->sources; i++) {
+      if (inst->src[i].reg != reg ||
+          inst->src[i].reg_offset != i) {
+         return false;
+      }
+   }
+   return true;
+}
+
+static bool
 is_expression(const fs_inst *const inst)
 {
    switch (inst->opcode) {
@@ -73,6 +89,8 @@ is_expression(const fs_inst *const inst)
    case FS_OPCODE_CINTERP:
    case FS_OPCODE_LINTERP:
       return true;
+   case SHADER_OPCODE_LOAD_PAYLOAD:
+      return !is_copy_payload(inst);
    default:
       return false;
    }
--
1.8.3.2
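The copy test can be tried on the two payloads from the commit message with a small standalone version. `Reg` is a hypothetical stand-in for fs_reg and the sources are passed as a std::vector; the empty-list guard is an addition of this sketch and is not in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for fs_reg.
struct Reg {
   int reg;
   int reg_offset;
};

// A payload is a plain copy when its sources are consecutive registers of
// one VGRF starting at offset 0: vgrfN+0, vgrfN+1, vgrfN+2, ...
bool
is_copy_payload(const std::vector<Reg> &src)
{
   if (src.empty() || src[0].reg_offset != 0)
      return false;

   for (std::size_t i = 1; i < src.size(); i++) {
      if (src[i].reg != src[0].reg ||
          src[i].reg_offset != static_cast<int>(i))
         return false;
   }
   return true;
}
```

With this, "vgrf5, vgrf6" is not a copy, so CSE may treat it as an expression, while "vgrf5+0, vgrf5+1" is a copy and is left for register coalescing.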
[Mesa-dev] [PATCH 18/19] i965/fs: Perform CSE on texture operations.
Helps Unigine Tropics and some (old) gstreamer shaders in shader-db. instructions in affected programs: 792 -> 744 (-6.06%) --- src/mesa/drivers/dri/i965/brw_fs_cse.cpp | 12 +++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp index 75c6aab..6e36b8c 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp @@ -92,7 +92,7 @@ is_expression(const fs_inst *const inst) case SHADER_OPCODE_LOAD_PAYLOAD: return !is_copy_payload(inst); default: - return false; + return inst->is_tex(); } } @@ -142,6 +142,16 @@ instructions_match(fs_inst *a, fs_inst *b) a->conditional_mod == b->conditional_mod && a->dst.type == b->dst.type && a->sources == b->sources && + (a->is_tex() ? (a->texture_offset == b->texture_offset && + a->mlen == b->mlen && + a->regs_written == b->regs_written && + a->base_mrf == b->base_mrf && + a->sampler == b->sampler && + a->target == b->target && + a->eot == b->eot && + a->header_present == b->header_present && + a->shadow_compare == b->shadow_compare) + : true) && operands_match(a, b); } -- 1.8.3.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 12/19] i965/fs: Apply cube map array fixup and restore the payload.
So that we don't have partial writes to a large VGRF. Will be cleaned up by register coalescing. --- src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 15 ++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp index 10ec254..b94141a 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp @@ -1726,7 +1726,20 @@ fs_visitor::visit(ir_texture *ir) type->sampler_array) { fs_reg depth = dst; depth.reg_offset = 2; - emit_math(SHADER_OPCODE_INT_QUOTIENT, depth, depth, fs_reg(6)); + fs_reg fixed_depth = fs_reg(this, glsl_type::int_type); + emit_math(SHADER_OPCODE_INT_QUOTIENT, fixed_depth, depth, fs_reg(6)); + + fs_reg *fixed_payload = ralloc_array(mem_ctx, fs_reg, inst->regs_written); + fs_reg d = dst; + for (int i = 0; i < inst->regs_written; i++) { +if (i == 2) { + fixed_payload[i] = fixed_depth; +} else { + d.reg_offset = i; + fixed_payload[i] = d; +} + } + emit(LOAD_PAYLOAD(dst, fixed_payload, inst->regs_written)); } } -- 1.8.3.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 15/19] i965/fs: Support register coalescing on LOAD_PAYLOAD operands.
--- .../drivers/dri/i965/brw_fs_register_coalesce.cpp | 59 ++ 1 file changed, 49 insertions(+), 10 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_register_coalesce.cpp b/src/mesa/drivers/dri/i965/brw_fs_register_coalesce.cpp index a0aa169..0aa4b3e 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_register_coalesce.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_register_coalesce.cpp @@ -46,7 +46,14 @@ static bool is_nop_mov(const fs_inst *inst) { - if (inst->opcode == BRW_OPCODE_MOV) { + if (inst->opcode == SHADER_OPCODE_LOAD_PAYLOAD) { + for (int i = 0; i < inst->sources; i++) { + if (!inst->dst.equals(inst->src[i])) { +return false; + } + } + return true; + } else if (inst->opcode == BRW_OPCODE_MOV) { return inst->dst.equals(inst->src[0]); } @@ -54,9 +61,26 @@ is_nop_mov(const fs_inst *inst) } static bool +is_copy_payload(const fs_inst *inst) +{ + const int reg = inst->src[0].reg; + if (inst->src[0].reg_offset != 0) + return false; + + for (int i = 1; i < inst->sources; i++) { + if (inst->src[i].reg != reg || + inst->src[i].reg_offset != i) { + return false; + } + } + return true; +} + +static bool is_coalesce_candidate(const fs_inst *inst, const int *virtual_grf_sizes) { - if (inst->opcode != BRW_OPCODE_MOV || + if ((inst->opcode != BRW_OPCODE_MOV && +inst->opcode != SHADER_OPCODE_LOAD_PAYLOAD) || inst->is_partial_write() || inst->saturate || inst->src[0].file != GRF || @@ -72,6 +96,12 @@ is_coalesce_candidate(const fs_inst *inst, const int *virtual_grf_sizes) virtual_grf_sizes[inst->dst.reg]) return false; + if (inst->opcode == SHADER_OPCODE_LOAD_PAYLOAD) { + if (!is_copy_payload(inst)) { + return false; + } + } + return true; } @@ -161,10 +191,18 @@ fs_visitor::register_coalesce() if (reg_to != inst->dst.reg) continue; - const int offset = inst->src[0].reg_offset; - reg_to_offset[offset] = inst->dst.reg_offset; - mov[offset] = inst; - channels_remaining--; + if (inst->opcode == SHADER_OPCODE_LOAD_PAYLOAD) { + for (int i = 0; i < src_size; i++) { 
+reg_to_offset[i] = i; + } + mov[0] = inst; + channels_remaining -= inst->sources; + } else { + const int offset = inst->src[0].reg_offset; + reg_to_offset[offset] = inst->dst.reg_offset; + mov[offset] = inst; + channels_remaining--; + } if (channels_remaining) continue; @@ -186,15 +224,16 @@ fs_visitor::register_coalesce() continue; progress = true; + bool was_load_payload = inst->opcode == SHADER_OPCODE_LOAD_PAYLOAD; for (int i = 0; i < src_size; i++) { if (mov[i]) { mov[i]->opcode = BRW_OPCODE_NOP; mov[i]->conditional_mod = BRW_CONDITIONAL_NONE; mov[i]->dst = reg_undef; -mov[i]->src[0] = reg_undef; -mov[i]->src[1] = reg_undef; -mov[i]->src[2] = reg_undef; +for (int j = 0; j < mov[i]->sources; j++) { + mov[i]->src[j] = reg_undef; +} } } @@ -202,7 +241,7 @@ fs_visitor::register_coalesce() fs_inst *scan_inst = (fs_inst *)node; for (int i = 0; i < src_size; i++) { -if (mov[i]) { +if (mov[i] || was_load_payload) { if (scan_inst->dst.file == GRF && scan_inst->dst.reg == reg_from && scan_inst->dst.reg_offset == i) { -- 1.8.3.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
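The candidate test in the patch above only accepts a LOAD_PAYLOAD whose sources are offsets 0, 1, 2, ... of a single register, i.e. a plain copy of a whole VGRF. A minimal standalone sketch of that check follows. `src_reg` and the `std::vector` parameter are hypothetical stand-ins for `fs_reg` and `inst->src`, and the empty-vector guard is an addition of this sketch, not something the patch does.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for fs_reg: just a register number and an offset.
struct src_reg {
   int reg;
   int reg_offset;
};

// Returns true when the sources read offsets 0..n-1 of one register, in order.
bool is_copy_payload(const std::vector<src_reg> &sources)
{
   // Assumption of this sketch: an empty source list is not a copy.
   if (sources.empty() || sources[0].reg_offset != 0)
      return false;

   const int reg = sources[0].reg;
   for (std::size_t i = 1; i < sources.size(); i++) {
      if (sources[i].reg != reg ||
          sources[i].reg_offset != static_cast<int>(i))
         return false;
   }
   return true;
}
```

A payload built from `{7,0}, {7,1}, {7,2}` passes; swapping two offsets or mixing in a second register makes it fail, which is the case the patch refuses to coalesce.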
[Mesa-dev] [PATCH 03/19] i965/fs: ralloc fs_inst's fs_reg sources.
---
Allocating fewer sources than 3 is not handled in this series.

 src/mesa/drivers/dri/i965/brw_fs.cpp | 8
 src/mesa/drivers/dri/i965/brw_fs.h   | 2 +-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 5b7d84f..b06966a 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -55,6 +55,9 @@ void
 fs_inst::init()
 {
    memset(this, 0, sizeof(*this));
+
+   this->src = ralloc_array(this, fs_reg, 3);
+
    this->conditional_mod = BRW_CONDITIONAL_NONE;
 
    this->dst = reg_undef;
@@ -142,6 +145,11 @@ fs_inst::fs_inst(enum opcode opcode, fs_reg dst,
 fs_inst::fs_inst(const fs_inst &that)
 {
    memcpy(this, &that, sizeof(that));
+
+   this->src = ralloc_array(this, fs_reg, 3);
+
+   for (int i = 0; i < 3; i++)
+      this->src[i] = that.src[i];
 }
 
 #define ALU1(op)                                                        \
diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h
index e079842..11a5c7c 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -211,7 +211,7 @@ public:
    bool writes_flag() const;
 
    fs_reg dst;
-   fs_reg src[3];
+   fs_reg *src;
 
    uint32_t texture_offset; /**< Texture offset bitfield */
    uint32_t offset; /* spill/unspill offset */
-- 
1.8.3.2
[Mesa-dev] [PATCH 07/19] i965/fs: Add a function to resize fs_inst's sources array.
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 9 +
 src/mesa/drivers/dri/i965/brw_fs.h   | 2 ++
 2 files changed, 11 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index f926d97..1f174d3 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -111,6 +111,15 @@ fs_inst::fs_inst(const fs_inst &that)
       this->src[i] = that.src[i];
 }
 
+void
+fs_inst::resize_sources(uint8_t num_sources)
+{
+   if (this->sources != num_sources) {
+      this->src = reralloc(this, this->src, fs_reg, num_sources);
+      this->sources = num_sources;
+   }
+}
+
 #define ALU1(op)                                                        \
    fs_inst *                                                            \
    fs_visitor::op(fs_reg dst, fs_reg src0)                              \
diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h
index fb68923..b7cfb3c 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -200,6 +200,8 @@ public:
            const fs_reg &src1, const fs_reg &src2);
    fs_inst(const fs_inst &that);
 
+   void resize_sources(uint8_t num_sources);
+
    bool equals(fs_inst *inst) const;
    bool overwrites_reg(const fs_reg &reg) const;
    bool is_send_from_grf() const;
-- 
1.8.3.2
[Mesa-dev] [PATCH 13/19] i965/fs: Only consider real sources when comparing instructions.
---
 src/mesa/drivers/dri/i965/brw_fs_cse.cpp | 19 +++
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp
index 94f657d..e40567f 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp
@@ -94,10 +94,20 @@ is_expression_commutative(enum opcode op)
 }
 
 static bool
-operands_match(enum opcode op, fs_reg *xs, fs_reg *ys)
+operands_match(fs_inst *a, fs_inst *b)
 {
-   if (!is_expression_commutative(op)) {
-      return xs[0].equals(ys[0]) && xs[1].equals(ys[1]) && xs[2].equals(ys[2]);
+   fs_reg *xs = a->src;
+   fs_reg *ys = b->src;
+
+   if (!is_expression_commutative(a->opcode)) {
+      bool match = true;
+      for (int i = 0; i < a->sources; i++) {
+         if (!xs[i].equals(ys[i])) {
+            match = false;
+            break;
+         }
+      }
+      return match;
    } else {
       return (xs[0].equals(ys[0]) && xs[1].equals(ys[1])) ||
              (xs[1].equals(ys[0]) && xs[0].equals(ys[1]));
@@ -113,7 +123,8 @@ instructions_match(fs_inst *a, fs_inst *b)
           a->predicate_inverse == b->predicate_inverse &&
           a->conditional_mod == b->conditional_mod &&
           a->dst.type == b->dst.type &&
-          operands_match(a->opcode, a->src, b->src);
+          a->sources == b->sources &&
+          operands_match(a, b);
 }
 
 bool
-- 
1.8.3.2
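The comparison in the patch above can be illustrated outside the driver. The sketch below is only an approximation under stated assumptions: `reg` and `inst` are hypothetical stand-ins for `fs_reg` and `fs_inst`, the `commutative` flag replaces the `is_expression_commutative()` lookup, and the source-count check is folded into `operands_match` here, whereas the patch performs it in `instructions_match`.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for fs_reg and fs_inst.
struct reg {
   int file;
   int nr;
   bool equals(const reg &other) const
   {
      return file == other.file && nr == other.nr;
   }
};

struct inst {
   bool commutative;        // stands in for is_expression_commutative(opcode)
   std::vector<reg> src;    // only the sources the instruction really has
};

bool operands_match(const inst &a, const inst &b)
{
   // Instructions with different source counts can never match.
   if (a.commutative != b.commutative || a.src.size() != b.src.size())
      return false;

   // Two-source commutative operations match in either operand order.
   if (a.commutative && a.src.size() == 2) {
      return (a.src[0].equals(b.src[0]) && a.src[1].equals(b.src[1])) ||
             (a.src[1].equals(b.src[0]) && a.src[0].equals(b.src[1]));
   }

   // Everything else is compared source by source, over the real count.
   for (std::size_t i = 0; i < a.src.size(); i++) {
      if (!a.src[i].equals(b.src[i]))
         return false;
   }
   return true;
}
```

Looping over the actual source count matters once instructions such as LOAD_PAYLOAD can carry more than three sources, since a fixed three-way comparison would either miss sources or read past the array.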
[Mesa-dev] [PATCH 13/21] glsl: Store short variable names inside ir_variable
From: Ian Romanick Most of the overhead of the name allocation is the ralloc tracking, especially on 64-bit. The allocation of the variable name "i" is 2 bytes for the name and 40 bytes for the ralloc tracking. Reduces the peak ir_variable memory usage in a trimmed apitrace of dota2 by 225KiB on 64-bit. Before: IR MEM: variable usage / name / total: 5746368 1439077 7185445 After: IR MEM: variable usage / name / total: 5746368 1208630 6954998 Reduces the peak ir_variable memory usage in a trimmed apitrace of dota2 by 70KiB on 32-bit. Before: IR MEM: variable usage / name / total: 4327584 915817 5243401 After: IR MEM: variable usage / name / total: 4327584 844096 5171680 Signed-off-by: Ian Romanick --- src/glsl/ir.cpp | 11 ++- src/glsl/ir_memory_usage.cpp | 3 ++- src/glsl/ir_validate.cpp | 2 +- 3 files changed, 13 insertions(+), 3 deletions(-) diff --git a/src/glsl/ir.cpp b/src/glsl/ir.cpp index 4907b34..69a0345 100644 --- a/src/glsl/ir.cpp +++ b/src/glsl/ir.cpp @@ -1536,7 +1536,16 @@ ir_variable::ir_variable(const struct glsl_type *type, const char *name, : ir_instruction(ir_type_variable), max_ifc_array_access(NULL) { this->type = type; - this->name = ralloc_strdup(this, name); + + if (name == NULL) { + this->padding[0] = 0; + this->name = (char *) this->padding; + } else if (strlen(name) < sizeof(this->padding)) { + this->name = strcpy((char *) this->padding, name); + } else { + this->name = ralloc_strdup(this, name); + } + this->data.explicit_location = false; this->data.has_initializer = false; this->data.location = -1; diff --git a/src/glsl/ir_memory_usage.cpp b/src/glsl/ir_memory_usage.cpp index 68c0b5c..4918824 100644 --- a/src/glsl/ir_memory_usage.cpp +++ b/src/glsl/ir_memory_usage.cpp @@ -63,7 +63,8 @@ ir_memory_usage::visit(ir_variable *ir) this->s.variable_usage += (sizeof(ir_state_slot) * ir->num_state_slots) + ralloc_header_size; - this->s.variable_name_usage += strlen(ir->name) + 1 + ralloc_header_size; + if (ir->name != (char *) ir->padding) + 
this->s.variable_name_usage += strlen(ir->name) + 1 + ralloc_header_size; return visit_continue; } diff --git a/src/glsl/ir_validate.cpp b/src/glsl/ir_validate.cpp index 1cfd0d5..08dd250 100644 --- a/src/glsl/ir_validate.cpp +++ b/src/glsl/ir_validate.cpp @@ -634,7 +634,7 @@ ir_validate::visit(ir_variable *ir) * in the ir_dereference_variable handler to ensure that a variable is * declared before it is dereferenced. */ - if (ir->name) + if (ir->name && ir->name != (char *) ir->padding) assert(ralloc_parent(ir->name) == ir); hash_table_insert(ht, ir, ir); -- 1.8.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
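The idea in the patch above is a small-string optimization: a short name lives in spare bytes inside the object, so no separate allocation (and no allocator header) is needed. The following is a minimal sketch of that pattern, not the Mesa code. `named_variable` is a hypothetical type, the 16-byte buffer size is an assumption (the patch does not show how large `padding` is), and `malloc`/`free` stand in for `ralloc_strdup` and ralloc's parent-based cleanup.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for ir_variable's name handling.
struct named_variable {
   const char *name;
   char padding[16];   // assumed size, chosen only for illustration

   explicit named_variable(const char *n)
   {
      if (n == nullptr) {
         // No name: point at an empty inline string.
         padding[0] = '\0';
         name = padding;
      } else if (std::strlen(n) < sizeof(padding)) {
         // Short name: store it inline, terminator included.
         name = std::strcpy(padding, n);
      } else {
         // Long name: fall back to a heap copy.
         char *copy = static_cast<char *>(std::malloc(std::strlen(n) + 1));
         assert(copy != nullptr);
         std::strcpy(copy, n);
         name = copy;
      }
   }

   ~named_variable()
   {
      if (!name_is_inline())
         std::free(const_cast<char *>(name));
   }

   // The name pointer may refer into this object, so copying is disabled.
   named_variable(const named_variable &) = delete;
   named_variable &operator=(const named_variable &) = delete;

   bool name_is_inline() const { return name == padding; }
};
```

The `name_is_inline()` comparison plays the same role as the `ir->name != (char *) ir->padding` tests in the patch: any code that assumes the name was separately allocated has to check for the inline case first.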
[Mesa-dev] [PATCH 11/21] glsl: Store ir_variable::ir_type in 8 bits instead of 32
From: Ian Romanick

No change in the peak ir_variable memory usage in a trimmed apitrace of
dota2 on 64-bit.

No change in the peak ir_variable memory usage in a trimmed apitrace of
dota2 on 32-bit.

Signed-off-by: Ian Romanick
---
 src/glsl/ir.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/glsl/ir.h b/src/glsl/ir.h
index 7faee74..bc02f6e 100644
--- a/src/glsl/ir.h
+++ b/src/glsl/ir.h
@@ -92,12 +92,13 @@ enum ir_node_type {
  */
 class ir_instruction : public exec_node {
 private:
-   enum ir_node_type ir_type;
+   uint8_t ir_type;
 
 public:
    inline enum ir_node_type get_ir_type() const
    {
-      return this->ir_type;
+      STATIC_ASSERT(ir_type_max < 256);
+      return (enum ir_node_type) this->ir_type;
    }
 
    /**
-- 
1.8.1.4
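As a rough illustration of the technique in the patch above, the sketch below stores an enum in a single byte and guards the narrowing with a compile-time check. All names (`node_type`, `instruction`) are hypothetical, and standard `static_assert` is used in place of Mesa's `STATIC_ASSERT` macro.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical node kinds; the trailing *_max entry counts the others.
enum node_type {
   node_type_variable,
   node_type_assignment,
   node_type_expression,
   node_type_max
};

class instruction {
public:
   explicit instruction(node_type t) : type(static_cast<uint8_t>(t)) {}

   node_type get_type() const
   {
      // Fails to compile if the enum ever outgrows the 8-bit field.
      static_assert(node_type_max < 256, "node_type must fit in 8 bits");
      return static_cast<node_type>(type);
   }

private:
   uint8_t type;   // one byte instead of a full enum
};
```

On its own, shrinking the field may save nothing, because alignment padding can absorb the freed bytes. That is consistent with the "no change" measurements in the commit message; the benefit only appears once neighbouring fields are packed into the freed space.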
[Mesa-dev] [PATCH 18/21] glsl: Squish ir_variable::max_ifc_array_access and ::state_slots together
From: Ian Romanick At least one of these pointers must be NULL, and we can determine which will be NULL by looking at other fields. Use this information to store both pointers in the same location. If anyone can think of a better name for the union than "u", I'm all ears. Reduces the peak ir_variable memory usage in a trimmed apitrace of dota2 by 204KiB on 64-bit. Before: IR MEM: variable usage / name / total: 5746368 1121441 6867809 After: IR MEM: variable usage / name / total: 5537064 1121441 6658505 Reduces the peak ir_variable memory usage in a trimmed apitrace of dota2 by 102KiB on 32-bit. Before: IR MEM: variable usage / name / total: 4327584 787727 5115311 After: IR MEM: variable usage / name / total: 4222932 787727 5010659 Signed-off-by: Ian Romanick --- src/glsl/ir.cpp | 4 ++- src/glsl/ir.h | 78 ++- src/glsl/ir_clone.cpp | 4 +-- 3 files changed, 51 insertions(+), 35 deletions(-) diff --git a/src/glsl/ir.cpp b/src/glsl/ir.cpp index 69a0345..50660ac 100644 --- a/src/glsl/ir.cpp +++ b/src/glsl/ir.cpp @@ -1533,7 +1533,7 @@ ir_swizzle::variable_referenced() const ir_variable::ir_variable(const struct glsl_type *type, const char *name, ir_variable_mode mode) - : ir_instruction(ir_type_variable), max_ifc_array_access(NULL) + : ir_instruction(ir_type_variable) { this->type = type; @@ -1546,6 +1546,8 @@ ir_variable::ir_variable(const struct glsl_type *type, const char *name, this->name = ralloc_strdup(this, name); } + this->u.max_ifc_array_access = NULL; + this->data.explicit_location = false; this->data.has_initializer = false; this->data.location = -1; diff --git a/src/glsl/ir.h b/src/glsl/ir.h index ab9f27b..95182fb 100644 --- a/src/glsl/ir.h +++ b/src/glsl/ir.h @@ -476,7 +476,7 @@ public: assert(this->interface_type == NULL); this->interface_type = type; if (this->is_interface_instance()) { - this->max_ifc_array_access = + this->u.max_ifc_array_access = rzalloc_array(this, unsigned, type->length); } } @@ -488,7 +488,7 @@ public: */ void 
change_interface_type(const struct glsl_type *type) { - if (this->max_ifc_array_access != NULL) { + if (this->u.max_ifc_array_access != NULL) { /* max_ifc_array_access has already been allocated, so make sure the * new interface has the same number of fields as the old one. */ @@ -505,7 +505,7 @@ public: */ void reinit_interface_type(const struct glsl_type *type) { - if (this->max_ifc_array_access != NULL) { + if (this->u.max_ifc_array_access != NULL) { #ifndef NDEBUG /* Redeclaring gl_PerVertex is only allowed if none of the built-ins * it defines have been accessed yet; so it's safe to throw away the @@ -513,10 +513,10 @@ public: * zero. */ for (unsigned i = 0; i < this->interface_type->length; i++) -assert(this->max_ifc_array_access[i] == 0); +assert(this->u.max_ifc_array_access[i] == 0); #endif - ralloc_free(this->max_ifc_array_access); - this->max_ifc_array_access = NULL; + ralloc_free(this->u.max_ifc_array_access); + this->u.max_ifc_array_access = NULL; } this->interface_type = NULL; init_interface_type(type); @@ -535,33 +535,45 @@ public: */ inline unsigned *get_max_ifc_array_access() { - return this->max_ifc_array_access; + assert(this->data._num_state_slots == 0); + return this->u.max_ifc_array_access; } inline unsigned get_num_state_slots() const { + assert(!this->is_interface_instance() + || this->data._num_state_slots == 0); return this->data._num_state_slots; } inline void set_num_state_slots(unsigned n) { + assert(!this->is_interface_instance() + || n == 0); this->data._num_state_slots = n; } inline ir_state_slot *get_state_slots() { - return this->state_slots; + return this->is_interface_instance() ? NULL : this->u.state_slots; + } + + inline const ir_state_slot *get_state_slots() const + { + return this->is_interface_instance() ? 
NULL : this->u.state_slots; } inline ir_state_slot *allocate_state_slots(unsigned n) { - this->state_slots = ralloc_array(this, ir_state_slot, n); + assert(!this->is_interface_instance()); + + this->u.state_slots = ralloc_array(this, ir_state_slot, n); this->data._num_state_slots = 0; - if (this->state_slots != NULL) + if (this->u.state_slots != NULL) this->data._num_state_slots = n; - return this->state_slots; + return this->u.state_slots; } /** @@ -818,28 +830,30 @@ public: private: static const char *const warn_extension_table[]; - /** -* For variables which satisfy the is_interface_inst
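The pointer-sharing idea described in this patch can be sketched in isolation. This is an illustration under assumptions, not the Mesa implementation: `variable` and `state_slot` are hypothetical types, a plain `bool` replaces the `is_interface_instance()` predicate, and the setters take non-owning pointers instead of allocating with ralloc.

```cpp
#include <cassert>

// Hypothetical stand-in for ir_state_slot.
struct state_slot {
   int tokens[5];
   int swizzle;
};

class variable {
public:
   explicit variable(bool is_instance) : interface_instance(is_instance)
   {
      // Initialize whichever union member this kind of variable will use.
      if (interface_instance)
         u.max_ifc_array_access = nullptr;
      else
         u.state_slots = nullptr;
   }

   void set_max_ifc_array_access(unsigned *p)
   {
      assert(interface_instance);
      u.max_ifc_array_access = p;
   }

   unsigned *get_max_ifc_array_access()
   {
      return interface_instance ? u.max_ifc_array_access : nullptr;
   }

   void set_state_slots(state_slot *p)
   {
      assert(!interface_instance);
      u.state_slots = p;
   }

   state_slot *get_state_slots()
   {
      return interface_instance ? nullptr : u.state_slots;
   }

private:
   bool interface_instance;

   // Only one of these is ever meaningful, so they share storage.
   union {
      unsigned *max_ifc_array_access;
      state_slot *state_slots;
   } u;
};
```

Routing every access through an accessor that checks the discriminating field is what makes the shared storage safe; it is also why the earlier patches in the series first made these members private.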
[Mesa-dev] [PATCH 07/21] glsl: Replace ir_variable::warn_extension pointer with an 8-bit index
From: Ian Romanick Also move the new warn_extension_index into ir_variable::data. This enables slightly better packing. Reduces the peak ir_variable memory usage in a trimmed apitrace of dota2 by 204KiB on 64-bit. Before: IR MEM: variable usage / name / total: 5955672 1439077 7394749 After: IR MEM: variable usage / name / total: 5746368 1439077 7185445 Reduces the peak ir_variable memory usage in a trimmed apitrace of dota2 by 102KiB on 32-bit. Before: IR MEM: variable usage / name / total: 4536888 915817 5452705 After: IR MEM: variable usage / name / total: 4432236 915817 5348053 Signed-off-by: Ian Romanick --- src/glsl/ir.cpp | 21 ++--- src/glsl/ir.h | 18 +- src/glsl/ir_clone.cpp | 2 -- 3 files changed, 31 insertions(+), 10 deletions(-) diff --git a/src/glsl/ir.cpp b/src/glsl/ir.cpp index 3d6af56..65a755e 100644 --- a/src/glsl/ir.cpp +++ b/src/glsl/ir.cpp @@ -1547,7 +1547,7 @@ ir_variable::ir_variable(const struct glsl_type *type, const char *name, this->data.location = -1; this->data.location_frac = 0; this->data.binding = 0; - this->warn_extension = NULL; + this->data.warn_extension_index = 0; this->constant_value = NULL; this->constant_initializer = NULL; this->data.origin_upper_left = false; @@ -1610,16 +1610,31 @@ ir_variable::determine_interpolation_mode(bool flat_shade) return INTERP_QUALIFIER_SMOOTH; } +const char *const ir_variable::warn_extension_table[] = { + "", + "GL_ARB_shader_stencil_export", + "GL_AMD_shader_stencil_export", +}; + void ir_variable::enable_extension_warning(const char *extension) { - this->warn_extension = extension; + for (unsigned i = 0; i < Elements(warn_extension_table); i++) { + if (strcmp(warn_extension_table[i], extension) == 0) { + this->data.warn_extension_index = i; + return; + } + } + + assert(!"Should not get here."); + this->data.warn_extension_index = 0; } const char * ir_variable::get_extension_warning() const { - return this->warn_extension; + return this->data.warn_extension_index == 0 + ? 
NULL : warn_extension_table[this->data.warn_extension_index]; } ir_function_signature::ir_function_signature(const glsl_type *return_type, diff --git a/src/glsl/ir.h b/src/glsl/ir.h index 3298a50..4147bbc 100644 --- a/src/glsl/ir.h +++ b/src/glsl/ir.h @@ -685,6 +685,13 @@ public: uint16_t image_format; /** + * Emit a warning if this variable is accessed. + */ + private: + uint8_t warn_extension_index; + + public: + /** * \brief Layout qualifier for gl_FragDepth. * * This is not equal to \c ir_depth_layout_none if and only if this @@ -733,6 +740,10 @@ public: */ unsigned max_array_access; + /** + * Allow (only) ir_variable direct access private members. + */ + friend class ir_variable; } data; /** @@ -767,6 +778,8 @@ public: ir_constant *constant_initializer; private: + static const char *const warn_extension_table[]; + /** * For variables that are in an interface block or are an instance of an * interface block, this is the \c GLSL_TYPE_INTERFACE type for that block. @@ -774,11 +787,6 @@ private: * \sa ir_variable::location */ const glsl_type *interface_type; - - /** -* Emit a warning if this variable is accessed. -*/ - const char *warn_extension; }; /** diff --git a/src/glsl/ir_clone.cpp b/src/glsl/ir_clone.cpp index c00adc5..d594529 100644 --- a/src/glsl/ir_clone.cpp +++ b/src/glsl/ir_clone.cpp @@ -53,8 +53,6 @@ ir_variable::clone(void *mem_ctx, struct hash_table *ht) const memcpy(&var->data, &this->data, sizeof(var->data)); - var->warn_extension = this->warn_extension; - var->num_state_slots = this->num_state_slots; if (this->state_slots) { /* FINISHME: This really wants to use something like talloc_reference, but -- 1.8.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
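The pointer-to-index replacement above works because only a handful of distinct strings are ever stored. A minimal sketch of the same lookup-table pattern follows. `warned_variable` is a hypothetical type, and unlike the patch, which asserts on an unknown extension, this sketch reports failure through a `bool` return value so the behaviour is easy to check.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Index 0 is reserved to mean "no warning".
static const char *const warn_extension_table[] = {
   "",
   "GL_ARB_shader_stencil_export",
   "GL_AMD_shader_stencil_export",
};

struct warned_variable {
   uint8_t warn_extension_index = 0;   // one byte instead of a pointer

   // Returns false (and clears the warning) if the name is not in the table.
   bool enable_extension_warning(const char *extension)
   {
      assert(extension != nullptr);
      const unsigned count =
         sizeof(warn_extension_table) / sizeof(warn_extension_table[0]);
      for (unsigned i = 0; i < count; i++) {
         if (std::strcmp(warn_extension_table[i], extension) == 0) {
            warn_extension_index = static_cast<uint8_t>(i);
            return true;
         }
      }
      warn_extension_index = 0;
      return false;
   }

   const char *get_extension_warning() const
   {
      return warn_extension_index == 0
         ? nullptr : warn_extension_table[warn_extension_index];
   }
};
```

The trade-off is that every extension that can trigger a warning must be listed in the table up front; in exchange, a pointer-sized field shrinks to a single byte per variable.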
[Mesa-dev] [PATCH 08/21] glsl: Store ir_variable::depth_layout using 3 bits
From: Ian Romanick warn_extension_index was moved to improve packing. No change in the peak ir_variable memory usage in a trimmed apitrace of dota2 on 64-bit. Reduces the peak ir_variable memory usage in a trimmed apitrace of dota2 by 102KiB on 32-bit. Before: IR MEM: variable usage / name / total: 4432236 915817 5348053 After: IR MEM: variable usage / name / total: 4327584 915817 5243401 Signed-off-by: Ian Romanick --- src/glsl/ast_to_hir.cpp | 4 ++-- src/glsl/ir.h | 19 +-- 2 files changed, 11 insertions(+), 12 deletions(-) diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp index 0621ea7..ef1607d 100644 --- a/src/glsl/ast_to_hir.cpp +++ b/src/glsl/ast_to_hir.cpp @@ -2824,8 +2824,8 @@ get_variable_being_redeclared(ir_variable *var, YYLTYPE loc, "gl_FragDepth: depth layout is declared here " "as '%s, but it was previously declared as " "'%s'", - depth_layout_string(var->data.depth_layout), - depth_layout_string(earlier->data.depth_layout)); + depth_layout_string(ir_depth_layout(var->data.depth_layout)), + depth_layout_string(ir_depth_layout(earlier->data.depth_layout))); } earlier->data.depth_layout = var->data.depth_layout; diff --git a/src/glsl/ir.h b/src/glsl/ir.h index 4147bbc..8515124 100644 --- a/src/glsl/ir.h +++ b/src/glsl/ir.h @@ -671,6 +671,13 @@ public: */ unsigned index:1; + /** + * \brief Layout qualifier for gl_FragDepth. + * + * This is not equal to \c ir_depth_layout_none if and only if this + * variable is \c gl_FragDepth and a layout qualifier is specified. + */ + unsigned depth_layout:3; /** * ARB_shader_image_load_store qualifiers. @@ -681,9 +688,6 @@ public: unsigned image_volatile:1; unsigned image_restrict:1; - /** Image internal format if specified explicitly, otherwise GL_NONE. */ - uint16_t image_format; - /** * Emit a warning if this variable is accessed. */ @@ -691,13 +695,8 @@ public: uint8_t warn_extension_index; public: - /** - * \brief Layout qualifier for gl_FragDepth. 
- * - * This is not equal to \c ir_depth_layout_none if and only if this - * variable is \c gl_FragDepth and a layout qualifier is specified. - */ - ir_depth_layout depth_layout; + /** Image internal format if specified explicitly, otherwise GL_NONE. */ + uint16_t image_format; /** * Storage location of the base of this variable -- 1.8.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 05/21] glsl: Use bit-flags image attributes and uint16_t for the image format
From: Ian Romanick All of the GL image enums fit in 16 bits. Also move the fields from the anonymous "image" structure to the next higher structure. This will enable packing the bits with the other bitfield. Reduces the peak ir_variable memory usage in a trimmed apitrace of dota2 by 204KiB on 64-bit. Before: IR MEM: variable usage / name / total: 6164976 1439077 7604053 After: IR MEM: variable usage / name / total: 5955672 1439077 7394749 Reduces the peak ir_variable memory usage in a trimmed apitrace of dota2 by 204KiB on 32-bit. Before: IR MEM: variable usage / name / total: 4746192 915817 5662009 After: IR MEM: variable usage / name / total: 4536888 915817 5452705 Signed-off-by: Ian Romanick --- src/glsl/ast_function.cpp | 10 +- src/glsl/ast_to_hir.cpp| 14 +++--- src/glsl/builtin_functions.cpp | 10 +- src/glsl/ir.cpp| 20 ++-- src/glsl/ir.h | 27 +-- src/glsl/link_uniforms.cpp | 4 ++-- 6 files changed, 42 insertions(+), 43 deletions(-) diff --git a/src/glsl/ast_function.cpp b/src/glsl/ast_function.cpp index 4b84470..c70b519 100644 --- a/src/glsl/ast_function.cpp +++ b/src/glsl/ast_function.cpp @@ -106,35 +106,35 @@ verify_image_parameter(YYLTYPE *loc, _mesa_glsl_parse_state *state, * qualifiers. [...] It is legal to have additional qualifiers * on a formal parameter, but not to have fewer." 
*/ - if (actual->data.image.coherent && !formal->data.image.coherent) { + if (actual->data.image_coherent && !formal->data.image_coherent) { _mesa_glsl_error(loc, state, "function call parameter `%s' drops " "`coherent' qualifier", formal->name); return false; } - if (actual->data.image._volatile && !formal->data.image._volatile) { + if (actual->data.image_volatile && !formal->data.image_volatile) { _mesa_glsl_error(loc, state, "function call parameter `%s' drops " "`volatile' qualifier", formal->name); return false; } - if (actual->data.image.restrict_flag && !formal->data.image.restrict_flag) { + if (actual->data.image_restrict && !formal->data.image_restrict) { _mesa_glsl_error(loc, state, "function call parameter `%s' drops " "`restrict' qualifier", formal->name); return false; } - if (actual->data.image.read_only && !formal->data.image.read_only) { + if (actual->data.image_read_only && !formal->data.image_read_only) { _mesa_glsl_error(loc, state, "function call parameter `%s' drops " "`readonly' qualifier", formal->name); return false; } - if (actual->data.image.write_only && !formal->data.image.write_only) { + if (actual->data.image_write_only && !formal->data.image_write_only) { _mesa_glsl_error(loc, state, "function call parameter `%s' drops " "`writeonly' qualifier", formal->name); diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp index 0128b3f..0621ea7 100644 --- a/src/glsl/ast_to_hir.cpp +++ b/src/glsl/ast_to_hir.cpp @@ -2314,11 +2314,11 @@ apply_image_qualifier_to_variable(const struct ast_type_qualifier *qual, "global variables"); } - var->data.image.read_only |= qual->flags.q.read_only; - var->data.image.write_only |= qual->flags.q.write_only; - var->data.image.coherent |= qual->flags.q.coherent; - var->data.image._volatile |= qual->flags.q._volatile; - var->data.image.restrict_flag |= qual->flags.q.restrict_flag; + var->data.image_read_only |= qual->flags.q.read_only; + var->data.image_write_only |= qual->flags.q.write_only; + 
var->data.image_coherent |= qual->flags.q.coherent; + var->data.image_volatile |= qual->flags.q._volatile; + var->data.image_restrict |= qual->flags.q.restrict_flag; var->data.read_only = true; if (qual->flags.q.explicit_image_format) { @@ -2332,7 +2332,7 @@ apply_image_qualifier_to_variable(const struct ast_type_qualifier *qual, "base data type of the image"); } - var->data.image.format = qual->image_format; + var->data.image_format = qual->image_format; } else { if (var->data.mode == ir_var_uniform && !qual->flags.q.write_only) { _mesa_glsl_error(loc, state, "uniforms not qualified with " @@ -2340,7 +2340,7 @@ apply_image_qualifier_to_variable(const struct ast_type_qualifier *qual, "qualifier"); } - var->data.image.format = GL_NONE; + var->data.image_format = GL_NONE; } } } diff --git a/src/glsl/builtin_functions.cpp b/src/glsl/builtin_functions.cpp index f9f0686..4b538c7 100644 --- a/src/glsl/builtin_functions.cpp
[Mesa-dev] [PATCH 16/21] glsl: Make ir_variable::max_ifc_array_access private
From: Ian Romanick The payoff for this will come in a few more patches. Signed-off-by: Ian Romanick --- src/glsl/ast_array_index.cpp | 10 -- src/glsl/ir.h| 37 - src/glsl/ir_validate.cpp | 9 +++-- src/glsl/link_functions.cpp | 14 +++--- src/glsl/linker.cpp | 5 +++-- 5 files changed, 53 insertions(+), 22 deletions(-) diff --git a/src/glsl/ast_array_index.cpp b/src/glsl/ast_array_index.cpp index f3b060e..ecc4200 100644 --- a/src/glsl/ast_array_index.cpp +++ b/src/glsl/ast_array_index.cpp @@ -88,8 +88,14 @@ update_max_array_access(ir_rvalue *ir, unsigned idx, YYLTYPE *loc, unsigned field_index = deref_record->record->type->field_index(deref_record->field); assert(field_index < interface_type->length); -if (idx > deref_var->var->max_ifc_array_access[field_index]) { - deref_var->var->max_ifc_array_access[field_index] = idx; + +unsigned *const max_ifc_array_access = + deref_var->var->get_max_ifc_array_access(); + +assert(max_ifc_array_access != NULL); + +if (idx > max_ifc_array_access[field_index]) { + max_ifc_array_access[field_index] = idx; /* Check whether this access will, as a side effect, implicitly * cause the size of a built-in array to be too large. diff --git a/src/glsl/ir.h b/src/glsl/ir.h index fb10c32..bfd790e 100644 --- a/src/glsl/ir.h +++ b/src/glsl/ir.h @@ -528,6 +528,17 @@ public: } /** +* Get the max_ifc_array_access pointer +* +* A "set" function is not needed because the array is dynmically allocated +* as necessary. +*/ + inline unsigned *get_max_ifc_array_access() + { + return this->max_ifc_array_access; + } + + /** * Enable emitting extension warnings for this variable */ void enable_extension_warning(const char *extension); @@ -549,19 +560,6 @@ public: */ const char *name; - /** -* For variables which satisfy the is_interface_instance() predicate, this -* points to an array of integers such that if the ith member of the -* interface block is an array, max_ifc_array_access[i] is the maximum -* array element of that member that has been accessed. 
If the ith member -* of the interface block is not an array, max_ifc_array_access[i] is -* unused. -* -* For variables whose type is not an interface block, this pointer is -* NULL. -*/ - unsigned *max_ifc_array_access; - struct ir_variable_data { /** @@ -807,6 +805,19 @@ private: static const char *const warn_extension_table[]; /** +* For variables which satisfy the is_interface_instance() predicate, this +* points to an array of integers such that if the ith member of the +* interface block is an array, max_ifc_array_access[i] is the maximum +* array element of that member that has been accessed. If the ith member +* of the interface block is not an array, max_ifc_array_access[i] is +* unused. +* +* For variables whose type is not an interface block, this pointer is +* NULL. +*/ + unsigned *max_ifc_array_access; + + /** * For variables that are in an interface block or are an instance of an * interface block, this is the \c GLSL_TYPE_INTERFACE type for that block. * diff --git a/src/glsl/ir_validate.cpp b/src/glsl/ir_validate.cpp index 08dd250..2fca437 100644 --- a/src/glsl/ir_validate.cpp +++ b/src/glsl/ir_validate.cpp @@ -662,10 +662,15 @@ ir_validate::visit(ir_variable *ir) ir->get_interface_type()->fields.structure; for (unsigned i = 0; i < ir->get_interface_type()->length; i++) { if (fields[i].type->array_size() > 0) { -if (ir->max_ifc_array_access[i] >= fields[i].type->length) { +const unsigned *const max_ifc_array_access = + ir->get_max_ifc_array_access(); + +assert(max_ifc_array_access != NULL); + +if (max_ifc_array_access[i] >= fields[i].type->length) { printf("ir_variable has maximum access out of bounds for " "field %s (%d vs %d)\n", fields[i].name, - ir->max_ifc_array_access[i], fields[i].type->length); + max_ifc_array_access[i], fields[i].type->length); ir->print(); abort(); } diff --git a/src/glsl/link_functions.cpp b/src/glsl/link_functions.cpp index 56f3f20..60d0d13 100644 --- a/src/glsl/link_functions.cpp +++ b/src/glsl/link_functions.cpp @@ 
-246,11 +246,19 @@ public: /* Similarly, we need implicit sizes of arrays within interface * blocks to be sized by the maximal access in *any* shader. */ + unsigned *const linked_max_ifc_array_access = + var->get_max_ifc_array_access(); +
[Mesa-dev] [PATCH 09/21] glsl: Set ir_instruction::ir_type in the base class constructor
From: Ian Romanick This has the added perk that if you forget to set ir_type in the constructor of a new subclass (or a new constructor of an existing subclass) the compiler will tell you... instead of relying on ir_validate or similar run-time detection. Signed-off-by: Ian Romanick --- src/glsl/ir.cpp | 65 ++--- src/glsl/ir.h | 46 +++- 2 files changed, 57 insertions(+), 54 deletions(-) diff --git a/src/glsl/ir.cpp b/src/glsl/ir.cpp index 65a755e..4907b34 100644 --- a/src/glsl/ir.cpp +++ b/src/glsl/ir.cpp @@ -26,7 +26,8 @@ #include "ir_visitor.h" #include "glsl_types.h" -ir_rvalue::ir_rvalue() +ir_rvalue::ir_rvalue(enum ir_node_type t) + : ir_instruction(t) { this->type = glsl_type::error_type; } @@ -153,8 +154,8 @@ ir_assignment::whole_variable_written() ir_assignment::ir_assignment(ir_dereference *lhs, ir_rvalue *rhs, ir_rvalue *condition, unsigned write_mask) + : ir_instruction(ir_type_assignment) { - this->ir_type = ir_type_assignment; this->condition = condition; this->rhs = rhs; this->lhs = lhs; @@ -173,8 +174,8 @@ ir_assignment::ir_assignment(ir_dereference *lhs, ir_rvalue *rhs, ir_assignment::ir_assignment(ir_rvalue *lhs, ir_rvalue *rhs, ir_rvalue *condition) + : ir_instruction(ir_type_assignment) { - this->ir_type = ir_type_assignment; this->condition = condition; this->rhs = rhs; @@ -198,8 +199,8 @@ ir_assignment::ir_assignment(ir_rvalue *lhs, ir_rvalue *rhs, ir_expression::ir_expression(int op, const struct glsl_type *type, ir_rvalue *op0, ir_rvalue *op1, ir_rvalue *op2, ir_rvalue *op3) + : ir_rvalue(ir_type_expression) { - this->ir_type = ir_type_expression; this->type = type; this->operation = ir_expression_operation(op); this->operands[0] = op0; @@ -215,9 +216,8 @@ ir_expression::ir_expression(int op, const struct glsl_type *type, } ir_expression::ir_expression(int op, ir_rvalue *op0) + : ir_rvalue(ir_type_expression) { - this->ir_type = ir_type_expression; - this->operation = ir_expression_operation(op); this->operands[0] = op0; this->operands[1] = 
NULL; @@ -324,9 +324,8 @@ ir_expression::ir_expression(int op, ir_rvalue *op0) } ir_expression::ir_expression(int op, ir_rvalue *op0, ir_rvalue *op1) + : ir_rvalue(ir_type_expression) { - this->ir_type = ir_type_expression; - this->operation = ir_expression_operation(op); this->operands[0] = op0; this->operands[1] = op1; @@ -420,9 +419,8 @@ ir_expression::ir_expression(int op, ir_rvalue *op0, ir_rvalue *op1) ir_expression::ir_expression(int op, ir_rvalue *op0, ir_rvalue *op1, ir_rvalue *op2) + : ir_rvalue(ir_type_expression) { - this->ir_type = ir_type_expression; - this->operation = ir_expression_operation(op); this->operands[0] = op0; this->operands[1] = op1; @@ -610,25 +608,25 @@ ir_expression::get_operator(const char *str) } ir_constant::ir_constant() + : ir_rvalue(ir_type_constant) { - this->ir_type = ir_type_constant; } ir_constant::ir_constant(const struct glsl_type *type, const ir_constant_data *data) + : ir_rvalue(ir_type_constant) { assert((type->base_type >= GLSL_TYPE_UINT) && (type->base_type <= GLSL_TYPE_BOOL)); - this->ir_type = ir_type_constant; this->type = type; memcpy(& this->value, data, sizeof(this->value)); } ir_constant::ir_constant(float f, unsigned vector_elements) + : ir_rvalue(ir_type_constant) { assert(vector_elements <= 4); - this->ir_type = ir_type_constant; this->type = glsl_type::get_instance(GLSL_TYPE_FLOAT, vector_elements, 1); for (unsigned i = 0; i < vector_elements; i++) { this->value.f[i] = f; @@ -639,9 +637,9 @@ ir_constant::ir_constant(float f, unsigned vector_elements) } ir_constant::ir_constant(unsigned int u, unsigned vector_elements) + : ir_rvalue(ir_type_constant) { assert(vector_elements <= 4); - this->ir_type = ir_type_constant; this->type = glsl_type::get_instance(GLSL_TYPE_UINT, vector_elements, 1); for (unsigned i = 0; i < vector_elements; i++) { this->value.u[i] = u; @@ -652,9 +650,9 @@ ir_constant::ir_constant(unsigned int u, unsigned vector_elements) } ir_constant::ir_constant(int integer, unsigned 
vector_elements) + : ir_rvalue(ir_type_constant) { assert(vector_elements <= 4); - this->ir_type = ir_type_constant; this->type = glsl_type::get_instance(GLSL_TYPE_INT, vector_elements, 1); for (unsigned i = 0; i < vector_elements; i++) { this->value.i[i] = integer; @@ -665,9 +663,9 @@ ir_constant::ir_constant(int integer, unsigned vector_elements) } ir_constant::ir_constant(bool b, unsigned vector_elements) + : ir_rvalue(ir_type_constant) { assert(vector_elements <= 4);
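A minimal sketch of why routing the tag through the base-class constructor turns a forgotten ir_type into a compile error. The enum and class names below are simplified stand-ins invented for illustration, not the real Mesa declarations:

```cpp
// Hypothetical, pared-down stand-ins for ir_node_type and the IR classes.
enum node_type { type_assignment, type_constant, type_expression };

class instruction {
public:
   node_type ir_type;

protected:
   // There is no default constructor, so every subclass must pass a tag.
   explicit instruction(node_type t) : ir_type(t) {}
};

class rvalue : public instruction {
protected:
   explicit rvalue(node_type t) : instruction(t) {}
};

class constant : public rvalue {
public:
   // Writing "constant() {}" here would not compile, because rvalue has
   // no default constructor to fall back on.
   constant() : rvalue(type_constant) {}
};

class assignment : public instruction {
public:
   assignment() : instruction(type_assignment) {}
};
```

With this shape, a new subclass or a new constructor that omits the tag is rejected at build time rather than at validation time.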
[Mesa-dev] [PATCH 14/21] glsl: Use short names for flattening temporaries
From: Ian Romanick Reduces the peak ir_variable memory usage in a trimmed apitrace of dota2 by 66KiB on 64-bit. Before: IR MEM: variable usage / name / total: 5746368 1208630 6954998 After: IR MEM: variable usage / name / total: 5746368 1140817 6887185 Reduces the peak ir_variable memory usage in a trimmed apitrace of dota2 by 42KiB on 32-bit. Before: IR MEM: variable usage / name / total: 4327584 844096 5171680 After: IR MEM: variable usage / name / total: 4327584 800183 5127767 Signed-off-by: Ian Romanick --- src/glsl/ir_expression_flattening.cpp | 2 +- src/mesa/drivers/dri/i965/brw_fs_vector_splitting.cpp | 16 +--- 2 files changed, 14 insertions(+), 4 deletions(-) diff --git a/src/glsl/ir_expression_flattening.cpp b/src/glsl/ir_expression_flattening.cpp index c1cadb1..4fd4733 100644 --- a/src/glsl/ir_expression_flattening.cpp +++ b/src/glsl/ir_expression_flattening.cpp @@ -78,7 +78,7 @@ ir_expression_flattening_visitor::handle_rvalue(ir_rvalue **rvalue) void *ctx = ralloc_parent(ir); - var = new(ctx) ir_variable(ir->type, "flattening_tmp", ir_var_temporary); + var = new(ctx) ir_variable(ir->type, "$f", ir_var_temporary); base_ir->insert_before(var); assign = new(ctx) ir_assignment(new(ctx) ir_dereference_variable(var), diff --git a/src/mesa/drivers/dri/i965/brw_fs_vector_splitting.cpp b/src/mesa/drivers/dri/i965/brw_fs_vector_splitting.cpp index a9125ca..aac515b 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_vector_splitting.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_vector_splitting.cpp @@ -368,9 +368,19 @@ brw_do_vector_splitting(exec_list *instructions) entry->mem_ctx = ralloc_parent(entry->var); for (unsigned int i = 0; i < entry->var->type->vector_elements; i++) { -const char *name = ralloc_asprintf(mem_ctx, "%s_%c", - entry->var->name, - "xyzw"[i]); + const char *name; + char buf[3]; + + if (entry->var->name[0] == '$') { +buf[0] = '$'; +buf[1] = "xyzw"[i]; +buf[2] = '\0'; +name = buf; + } else { +name = ralloc_asprintf(mem_ctx, "%s_%c", + entry->var->name, 
+ "xyzw"[i]); + } entry->components[i] = new(entry->mem_ctx) ir_variable(type, name, ir_var_temporary); -- 1.8.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
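The naming rule in the brw_do_vector_splitting hunk can be sketched in isolation as follows. component_name is a hypothetical helper that uses std::string rather than ralloc, so it only illustrates the rule and is not the driver code:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: derive the name of the i-th split component.
// Short "$" temporaries collapse to "$x".."$w"; any other name keeps
// the longer "<name>_<channel>" form.
std::string component_name(const std::string &base, unsigned i)
{
   assert(i < 4);
   const char channel = "xyzw"[i];

   if (!base.empty() && base[0] == '$')
      return std::string("$") + channel;

   return base + "_" + channel;
}
```

For example, component_name("$f", 0) yields "$x", while component_name("color", 3) yields "color_w".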
[Mesa-dev] [PATCH 20/21] glsl: Use short names for function return value variables
From: Ian Romanick Reduces the peak ir_variable memory usage in a trimmed apitrace of dota2 by 181KiB on 64-bit. Before: IR MEM: variable usage / name / total: 5327760 1121441 6449201 After: IR MEM: variable usage / name / total: 5327760 935234 6262994 Reduces the peak ir_variable memory usage in a trimmed apitrace of dota2 by 114KiB on 32-bit. Before: IR MEM: variable usage / name / total: 4118280 787727 4906007 After: IR MEM: variable usage / name / total: 4118280 670980 4789260 Signed-off-by: Ian Romanick --- src/glsl/ast_function.cpp | 7 ++- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/src/glsl/ast_function.cpp b/src/glsl/ast_function.cpp index bad410b..24bbd90 100644 --- a/src/glsl/ast_function.cpp +++ b/src/glsl/ast_function.cpp @@ -394,12 +394,9 @@ generate_call(exec_list *instructions, ir_function_signature *sig, ir_dereference_variable *deref = NULL; if (!sig->return_type->is_void()) { /* Create a new temporary to hold the return value. */ - ir_variable *var; + ir_variable *var = + new(ctx) ir_variable(sig->return_type, "$r", ir_var_temporary); - var = new(ctx) ir_variable(sig->return_type, -ralloc_asprintf(ctx, "%s_retval", -sig->function_name()), -ir_var_temporary); instructions->push_tail(var); deref = new(ctx) ir_dereference_variable(var); -- 1.8.1.4
[Mesa-dev] [PATCH 04/21] glsl: Use a single bit for the dual-source blend index
From: Ian Romanick The only values allowed are 0 and 1, and the value is checked before assigning. With the previous changes, reduces the peak ir_variable memory usage in a trimmed apitrace of dota2 by 204KiB on 64-bit. Before: IR MEM: variable usage / name / total: 6374280 1439077 7813357 After: IR MEM: variable usage / name / total: 6164976 1439077 7604053 Reduces the peak ir_variable memory usage in a trimmed apitrace of dota2 by 102KiB on 32-bit. Before: IR MEM: variable usage / name / total: 4850844 915817 5766661 After: IR MEM: variable usage / name / total: 4746192 915817 5662009 Signed-off-by: Ian Romanick --- src/glsl/ir.h | 14 +- 1 file changed, 9 insertions(+), 5 deletions(-) diff --git a/src/glsl/ir.h b/src/glsl/ir.h index 93d5aef..3767f2a 100644 --- a/src/glsl/ir.h +++ b/src/glsl/ir.h @@ -651,6 +651,15 @@ public: unsigned from_named_ifc_block_array:1; /** + * Output index for dual source blending. + * + * \note + * The GLSL spec only allows the values 0 or 1 for the index in \b dual + * source blending. + */ + unsigned index:1; + + /** * \brief Layout qualifier for gl_FragDepth. * * This is not equal to \c ir_depth_layout_none if and only if this @@ -679,11 +688,6 @@ public: int location; /** - * output index for dual source blending. - */ - int index; - - /** * Initial binding point for a sampler, atomic, or UBO. * * For array types, this represents the binding point for the first element. -- 1.8.1.4
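As a sketch of the "checked before assigning" requirement: a one-bit field silently truncates anything other than 0 or 1, so the range check has to happen before the store. The struct and the set_index helper below are hypothetical and only mimic the two neighbouring bit-fields from the hunk:

```cpp
// Hypothetical miniature of the packed flags in ir_variable::data.
struct qualifiers {
   unsigned from_named_ifc_block_array:1;
   unsigned index:1;   // dual-source blend index, only 0 or 1 is legal
};

// Hypothetical helper: refuse out-of-range values instead of letting the
// bit-field truncate them.
bool set_index(qualifiers &q, int value)
{
   if (value != 0 && value != 1)
      return false;

   q.index = static_cast<unsigned>(value);
   return true;
}
```

Calling set_index(q, 2) leaves q untouched and reports failure, which is the behaviour the commit message relies on.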
[Mesa-dev] [PATCH 02/21] mesa: Log memory usage statistics for all known shaders
From: Ian Romanick Currently this is done at each call to glLinkProgram. This seems like as good a place as any. This is the main place where memory usage will change, and it enables tracking as applications progress (e.g., load new levels). Signed-off-by: Ian Romanick --- src/mesa/main/shaderapi.c | 79 +++ 1 file changed, 79 insertions(+) diff --git a/src/mesa/main/shaderapi.c b/src/mesa/main/shaderapi.c index 28739da..8e8170e 100644 --- a/src/mesa/main/shaderapi.c +++ b/src/mesa/main/shaderapi.c @@ -58,6 +58,7 @@ #include "../glsl/ir.h" #include "../glsl/ir_uniform.h" #include "../glsl/program.h" +#include "../glsl/ir_memory_usage.h" /** Define this to enable shader substitution (see below) */ #define SHADER_SUBST 0 @@ -887,6 +888,56 @@ compile_shader(struct gl_context *ctx, GLuint shaderObj) } +#ifdef DEBUG +static void +memory_stats_cb(GLuint key, void *data, void *userData) +{ + struct ir_memory_statistics *total = + (struct ir_memory_statistics *) userData; + struct ir_memory_statistics stats; + struct gl_shader_program *shProg = (struct gl_shader_program *) data; + + (void) key; + + if (shProg->Type == GL_SHADER_PROGRAM_MESA) { + unsigned i; + + for (i = 0; i < MESA_SHADER_STAGES; i++) { + struct gl_shader *sh = shProg->_LinkedShaders[i]; + + if (shProg->_LinkedShaders[i] != NULL) { +calculate_ir_tree_memory_usage(sh->ir, &stats); + +total->variable_usage += stats.variable_usage; +total->variable_name_usage += stats.variable_name_usage; +total->dereference_variable_usage += + stats.dereference_variable_usage; +total->dereference_array_usage += stats.dereference_array_usage; +total->dereference_record_usage += stats.dereference_record_usage; +total->dereference_record_field_usage += + stats.dereference_record_field_usage; + } + } + } else { + struct gl_shader *sh = (struct gl_shader *) data; + + assert(sh->Type == GL_FRAGMENT_SHADER + || sh->Type == GL_VERTEX_SHADER + || sh->Type == GL_GEOMETRY_SHADER_ARB); + + calculate_ir_tree_memory_usage(sh->ir, &stats); 
+ + total->variable_usage += stats.variable_usage; + total->variable_name_usage += stats.variable_name_usage; + total->dereference_variable_usage += stats.dereference_variable_usage; + total->dereference_array_usage += stats.dereference_array_usage; + total->dereference_record_usage += stats.dereference_record_usage; + total->dereference_record_field_usage += + stats.dereference_record_field_usage; + } +} +#endif /* DEBUG */ + /** * Link a program's shaders. */ @@ -920,6 +971,34 @@ link_program(struct gl_context *ctx, GLuint program) shProg->Name, shProg->InfoLog); } + /* At each call to glLinkProgram, dump the memory usage statistics for +* *ALL* known shaders. +*/ +#ifdef DEBUG + if (ctx->_Shader->Flags & GLSL_LOG) { + struct ir_memory_statistics stats; + + memset(&stats, 0, sizeof(stats)); + _mesa_HashWalk(ctx->Shared->ShaderObjects, + memory_stats_cb, + &stats); + + printf("IR MEM: variable usage / name / total: %u %u %u\n", + stats.variable_usage, + stats.variable_name_usage, + stats.variable_usage + stats.variable_name_usage); + printf("IR MEM: dereference variable usage: %u\n", + stats.dereference_variable_usage); + printf("IR MEM: dereference array usage: %u\n", + stats.dereference_array_usage); + printf("IR MEM: dereference record usage / field / total: %u %u %u\n", + stats.dereference_record_usage, + stats.dereference_record_field_usage, + stats.dereference_record_usage + + stats.dereference_record_field_usage); + } +#endif /* DEBUG */ + /* debug code */ if (0) { GLuint i; -- 1.8.1.4
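The two accumulation blocks in memory_stats_cb are field-for-field identical. Purely as a sketch, they could share a helper along these lines; the struct layout and the name add_stats are assumptions, and only the field names are taken from the patch:

```cpp
// Assumed layout; the field names follow the ones used in the patch.
struct ir_memory_statistics {
   unsigned variable_usage;
   unsigned variable_name_usage;
   unsigned dereference_variable_usage;
   unsigned dereference_array_usage;
   unsigned dereference_record_usage;
   unsigned dereference_record_field_usage;
};

// Hypothetical helper: fold one shader's statistics into a running total.
void add_stats(ir_memory_statistics &total, const ir_memory_statistics &s)
{
   total.variable_usage += s.variable_usage;
   total.variable_name_usage += s.variable_name_usage;
   total.dereference_variable_usage += s.dereference_variable_usage;
   total.dereference_array_usage += s.dereference_array_usage;
   total.dereference_record_usage += s.dereference_record_usage;
   total.dereference_record_field_usage += s.dereference_record_field_usage;
}
```

Both the linked-program branch and the single-shader branch would then reduce to one call after calculate_ir_tree_memory_usage.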
[Mesa-dev] [PATCH 15/21] glsl: Use short names for conditional temporaries
From: Ian Romanick Reduces the peak ir_variable memory usage in a trimmed apitrace of dota2 by 18KiB on 64-bit. Before: IR MEM: variable usage / name / total: 5746368 1140817 6887185 After: IR MEM: variable usage / name / total: 5746368 1121441 6867809 Reduces the peak ir_variable memory usage in a trimmed apitrace of dota2 by 12KiB on 32-bit. Before: IR MEM: variable usage / name / total: 4327584 800183 5127767 After: IR MEM: variable usage / name / total: 4327584 787727 5115311 Signed-off-by: Ian Romanick --- src/glsl/ast_to_hir.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp index 3fcec19..bff2e0a 100644 --- a/src/glsl/ast_to_hir.cpp +++ b/src/glsl/ast_to_hir.cpp @@ -1596,7 +1596,7 @@ ast_expression::do_hir(exec_list *instructions, result = (cond_val->value.b[0]) ? then_val : else_val; } else { ir_variable *const tmp = -new(ctx) ir_variable(type, "conditional_tmp", ir_var_temporary); +new(ctx) ir_variable(type, "$c", ir_var_temporary); instructions->push_tail(tmp); ir_if *const stmt = new(ctx) ir_if(op[0]); -- 1.8.1.4
[Mesa-dev] [PATCH 06/21] glsl: Use accessors for ir_variable::warn_extension
From: Ian Romanick The payoff for this will come in the next patch. No change in the peak ir_variable memory usage in a trimmed apitrace of dota2 on 64-bit. No change in the peak ir_variable memory usage in a trimmed apitrace of dota2 on 32-bit. Signed-off-by: Ian Romanick --- src/glsl/builtin_variables.cpp | 4 ++-- src/glsl/ir.cpp| 11 +++ src/glsl/ir.h | 22 +- 3 files changed, 30 insertions(+), 7 deletions(-) diff --git a/src/glsl/builtin_variables.cpp b/src/glsl/builtin_variables.cpp index 9b35850..1461953 100644 --- a/src/glsl/builtin_variables.cpp +++ b/src/glsl/builtin_variables.cpp @@ -913,14 +913,14 @@ builtin_variable_generator::generate_fs_special_vars() ir_variable *const var = add_output(FRAG_RESULT_STENCIL, int_t, "gl_FragStencilRefARB"); if (state->ARB_shader_stencil_export_warn) - var->warn_extension = "GL_ARB_shader_stencil_export"; + var->enable_extension_warning("GL_ARB_shader_stencil_export"); } if (state->AMD_shader_stencil_export_enable) { ir_variable *const var = add_output(FRAG_RESULT_STENCIL, int_t, "gl_FragStencilRefAMD"); if (state->AMD_shader_stencil_export_warn) - var->warn_extension = "GL_AMD_shader_stencil_export"; + var->enable_extension_warning("GL_AMD_shader_stencil_export"); } if (state->ARB_sample_shading_enable) { diff --git a/src/glsl/ir.cpp b/src/glsl/ir.cpp index c727d89..3d6af56 100644 --- a/src/glsl/ir.cpp +++ b/src/glsl/ir.cpp @@ -1610,6 +1610,17 @@ ir_variable::determine_interpolation_mode(bool flat_shade) return INTERP_QUALIFIER_SMOOTH; } +void +ir_variable::enable_extension_warning(const char *extension) +{ + this->warn_extension = extension; +} + +const char * +ir_variable::get_extension_warning() const +{ + return this->warn_extension; +} ir_function_signature::ir_function_signature(const glsl_type *return_type, builtin_available_predicate b) diff --git a/src/glsl/ir.h b/src/glsl/ir.h index fac24df..3298a50 100644 --- a/src/glsl/ir.h +++ b/src/glsl/ir.h @@ -501,6 +501,18 @@ public: } /** +* Enable emitting extension 
warnings for this variable +*/ + void enable_extension_warning(const char *extension); + + /** +* Get the extension warning string for this variable +* +* If warnings are not enabled, \c NULL is returned. +*/ + const char *get_extension_warning() const; + + /** * Declared type of the variable */ const struct glsl_type *type; @@ -740,11 +752,6 @@ public: /*@}*/ /** -* Emit a warning if this variable is accessed. -*/ - const char *warn_extension; - - /** * Value assigned in the initializer of a variable declared "const" */ ir_constant *constant_value; @@ -767,6 +774,11 @@ private: * \sa ir_variable::location */ const glsl_type *interface_type; + + /** +* Emit a warning if this variable is accessed. +*/ + const char *warn_extension; }; /** -- 1.8.1.4
[Mesa-dev] [PATCH 01/21] glsl: Add a facility to get some memory usage statistics for a shader
From: Ian Romanick v2: Also account for the ralloc header overhead. Signed-off-by: Ian Romanick --- src/glsl/Makefile.sources| 1 + src/glsl/ir_memory_usage.cpp | 104 +++ src/glsl/ir_memory_usage.h | 48 3 files changed, 153 insertions(+) create mode 100644 src/glsl/ir_memory_usage.cpp create mode 100644 src/glsl/ir_memory_usage.h diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources index 5945590..6e230f7 100644 --- a/src/glsl/Makefile.sources +++ b/src/glsl/Makefile.sources @@ -41,6 +41,7 @@ LIBGLSL_FILES = \ $(GLSL_SRCDIR)/ir_hierarchical_visitor.cpp \ $(GLSL_SRCDIR)/ir_hv_accept.cpp \ $(GLSL_SRCDIR)/ir_import_prototypes.cpp \ + $(GLSL_SRCDIR)/ir_memory_usage.cpp \ $(GLSL_SRCDIR)/ir_print_visitor.cpp \ $(GLSL_SRCDIR)/ir_reader.cpp \ $(GLSL_SRCDIR)/ir_rvalue_visitor.cpp \ diff --git a/src/glsl/ir_memory_usage.cpp b/src/glsl/ir_memory_usage.cpp new file mode 100644 index 000..68c0b5c --- /dev/null +++ b/src/glsl/ir_memory_usage.cpp @@ -0,0 +1,104 @@ +/* + * Copyright © 2014 Intel Corporation + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. + */ + +/** + * \file ir_memory_usage.cpp + * Determine the amount of memory used by different kinds of IR in a shader. + */ + +#include "ir.h" +#include "ir_hierarchical_visitor.h" +#include "ir_memory_usage.h" + +class ir_memory_usage : public ir_hierarchical_visitor { +public: + ir_memory_usage() + { + memset(&this->s, 0, sizeof(this->s)); + } + + ~ir_memory_usage() + { + /* empty */ + } + + virtual ir_visitor_status visit(ir_variable *v); + virtual ir_visitor_status visit(ir_dereference_variable *ir); + virtual ir_visitor_status visit_enter(class ir_dereference_array *); + virtual ir_visitor_status visit_enter(class ir_dereference_record *); + + ir_memory_statistics s; +}; + +/* In release builds, the ralloc header contains 5 pointers. 
+ */ +static const unsigned ralloc_header_size = 5 * sizeof(void *); + +ir_visitor_status +ir_memory_usage::visit(ir_variable *ir) +{ + this->s.variable_usage += sizeof(*ir); + + if (ir->state_slots != NULL) + this->s.variable_usage += (sizeof(ir_state_slot) * ir->num_state_slots) + + ralloc_header_size; + + this->s.variable_name_usage += strlen(ir->name) + 1 + ralloc_header_size; + + return visit_continue; +} + + +ir_visitor_status +ir_memory_usage::visit(ir_dereference_variable *ir) +{ + this->s.dereference_variable_usage += sizeof(*ir); + + return visit_continue; +} + +ir_visitor_status +ir_memory_usage::visit_enter(class ir_dereference_array *ir) +{ + this->s.dereference_array_usage += sizeof(*ir); + return visit_continue; +} + +ir_visitor_status +ir_memory_usage::visit_enter(class ir_dereference_record *ir) +{ + this->s.dereference_record_usage += sizeof(*ir); + this->s.dereference_record_field_usage += strlen(ir->field) + 1 + + ralloc_header_size; + return visit_continue; +} + +extern "C" void +calculate_ir_tree_memory_usage(exec_list *instructions, + struct ir_memory_statistics *stats) +{ + ir_memory_usage v; + + v.run(instructions); + *stats = v.s; +} diff --git a/src/glsl/ir_memory_usage.h b/src/glsl/ir_memory_usage.h new file mode 100644 index 000..3d137a3 --- /dev/null +++ b/src/glsl/ir_memory_usage.h @@ -0,0 +1,48 @@ +/* + * Copyright © 2014 Intel Corporation + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditi
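The per-name accounting used by ir_memory_usage::visit(ir_variable *) can be worked through on its own. This is a sketch only: name_cost and name_savings are hypothetical helpers, and the five-pointer header size is the release-build figure quoted in the patch, not something measured here:

```cpp
#include <cstddef>
#include <cstring>

// Release-build ralloc header size as stated in the patch: 5 pointers.
static const std::size_t ralloc_header_size = 5 * sizeof(void *);

// Hypothetical helper: bytes charged for one ralloc'd name string, i.e.
// the characters, the terminating NUL, and the allocation header.
std::size_t name_cost(const char *name)
{
   return std::strlen(name) + 1 + ralloc_header_size;
}

// Hypothetical helper: bytes saved per variable when a name is shortened.
// The header and the NUL cancel out, so only the length difference remains.
std::size_t name_savings(const char *old_name, const char *new_name)
{
   return name_cost(old_name) - name_cost(new_name);
}
```

Under these assumptions, renaming "flattening_tmp" to "$f" saves 12 bytes per temporary, independent of pointer width.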
[Mesa-dev] [PATCH 10/21] glsl: Make ir_instruction::ir_type private
From: Ian Romanick In the next patch, the type of ir_type is going to change from enum to uint8_t. Since the type won't be an enum, we won't get compiler warnings about, for example, switch statements that don't have cases for all the enum values. Using a getter that returns the enum type will enable us to continue getting those warnings. Also, ir_type should never be changed after an object is created. Having it set in the constructor and no setter effectively makes it write-once. Signed-off-by: Ian Romanick --- src/glsl/ast_function.cpp | 2 +- src/glsl/ast_to_hir.cpp| 2 +- src/glsl/ir.h | 8 +++- src/glsl/ir_constant_expression.cpp| 4 ++-- src/glsl/ir_print_visitor.cpp | 2 +- src/glsl/ir_validate.cpp | 6 +++--- src/glsl/loop_analysis.cpp | 2 +- src/glsl/loop_controls.cpp | 2 +- src/glsl/loop_unroll.cpp | 2 +- src/glsl/lower_clip_distance.cpp | 4 ++-- src/glsl/lower_if_to_cond_assign.cpp | 4 ++-- src/glsl/lower_jumps.cpp | 6 +++--- src/glsl/lower_offset_array.cpp| 2 +- src/glsl/lower_ubo_reference.cpp | 4 ++-- src/glsl/lower_vector.cpp | 4 ++-- src/glsl/lower_vector_insert.cpp | 2 +- src/glsl/opt_constant_folding.cpp | 2 +- src/glsl/opt_cse.cpp | 2 +- src/glsl/opt_redundant_jumps.cpp | 6 +++--- src/glsl/opt_structure_splitting.cpp | 2 +- src/glsl/opt_vectorize.cpp | 2 +- src/mesa/program/ir_to_mesa.cpp| 2 +- src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 2 +- 23 files changed, 40 insertions(+), 34 deletions(-) diff --git a/src/glsl/ast_function.cpp b/src/glsl/ast_function.cpp index c70b519..bad410b 100644 --- a/src/glsl/ast_function.cpp +++ b/src/glsl/ast_function.cpp @@ -175,7 +175,7 @@ verify_parameter_modes(_mesa_glsl_parse_state *state, /* Verify that 'const_in' parameters are ir_constants. 
*/ if (formal->data.mode == ir_var_const_in && - actual->ir_type != ir_type_constant) { + actual->get_ir_type() != ir_type_constant) { _mesa_glsl_error(&loc, state, "parameter `in %s' must be a constant expression", formal->name); diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp index ef1607d..3fcec19 100644 --- a/src/glsl/ast_to_hir.cpp +++ b/src/glsl/ast_to_hir.cpp @@ -756,7 +756,7 @@ do_assignment(exec_list *instructions, struct _mesa_glsl_parse_state *state, /* If the assignment LHS comes back as an ir_binop_vector_extract * expression, move it to the RHS as an ir_triop_vector_insert. */ - if (lhs->ir_type == ir_type_expression) { + if (lhs->get_ir_type() == ir_type_expression) { ir_expression *const lhs_expr = lhs->as_expression(); if (unlikely(lhs_expr->operation == ir_binop_vector_extract)) { diff --git a/src/glsl/ir.h b/src/glsl/ir.h index 5d45469..7faee74 100644 --- a/src/glsl/ir.h +++ b/src/glsl/ir.h @@ -91,9 +91,15 @@ enum ir_node_type { * Base class of all IR instructions */ class ir_instruction : public exec_node { -public: +private: enum ir_node_type ir_type; +public: + inline enum ir_node_type get_ir_type() const + { + return this->ir_type; + } + /** * GCC 4.7+ and clang warn when deleting an ir_instruction unless * there's a virtual destructor present. 
Because we almost diff --git a/src/glsl/ir_constant_expression.cpp b/src/glsl/ir_constant_expression.cpp index 8afe8f7..c07b951 100644 --- a/src/glsl/ir_constant_expression.cpp +++ b/src/glsl/ir_constant_expression.cpp @@ -403,7 +403,7 @@ constant_referenced(const ir_dereference *deref, if (variable_context == NULL) return false; - switch (deref->ir_type) { + switch (deref->get_ir_type()) { case ir_type_dereference_array: { const ir_dereference_array *const da = (const ir_dereference_array *) deref; @@ -1785,7 +1785,7 @@ bool ir_function_signature::constant_expression_evaluate_expression_list(const s { foreach_list(n, &body) { ir_instruction *inst = (ir_instruction *)n; - switch(inst->ir_type) { + switch(inst->get_ir_type()) { /* (declare () type symbol) */ case ir_type_variable: { diff --git a/src/glsl/ir_print_visitor.cpp b/src/glsl/ir_print_visitor.cpp index 0a7695a..e5ac50e 100644 --- a/src/glsl/ir_print_visitor.cpp +++ b/src/glsl/ir_print_visitor.cpp @@ -70,7 +70,7 @@ _mesa_print_ir(FILE *f, exec_list *instructions, foreach_list(n, instructions) { ir_instruction *ir = (ir_instruction *) n; ir->fprint(f); - if (ir->ir_type != ir_type_function) + if (ir->get_ir_type() != ir_type_function) fprintf(f, "\n"); } fprintf(f, "\n)"); diff --git a/src/glsl/ir_validate.cpp b/src/glsl/ir_validate.cpp index 71defc8..1cfd0
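A sketch of the rationale for the getter: the stored field can be narrowed to a byte while callers still switch over the enum type, so compilers that warn about unhandled enumerators in a switch (GCC and clang do with -Wswitch) keep doing so. The enum, class, and describe function are simplified stand-ins, and the uint8_t storage anticipates the follow-up patch mentioned in the commit message:

```cpp
#include <cstdint>
#include <string>

// Hypothetical, shortened stand-in for enum ir_node_type.
enum node_type { type_constant, type_expression, type_max };

class instruction {
public:
   explicit instruction(node_type t)
      : ir_type(static_cast<std::uint8_t>(t)) {}

   // Callers only ever see the enum type, never the raw byte.
   node_type get_ir_type() const
   {
      return static_cast<node_type>(ir_type);
   }

private:
   std::uint8_t ir_type;   // set once in the constructor, no setter
};

// Switching on the getter's result keeps the "enumeration value not
// handled in switch" diagnostic available if a case is dropped.
std::string describe(const instruction &ir)
{
   switch (ir.get_ir_type()) {
   case type_constant:   return "constant";
   case type_expression: return "expression";
   case type_max:        break;
   }
   return "unknown";
}
```

Switching directly on a uint8_t member would compile without any such diagnostic, which is the regression the getter avoids.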
[Mesa-dev] [PATCH 12/21] glsl: Make compiler-added padding ir_instruction usable
From: Ian Romanick No change in the peak ir_variable memory usage in a trimmed apitrace of dota2 on 64-bit. No change in the peak ir_variable memory usage in a trimmed apitrace of dota2 on 32-bit. Signed-off-by: Ian Romanick --- src/glsl/ir.h | 14 ++ 1 file changed, 14 insertions(+) diff --git a/src/glsl/ir.h b/src/glsl/ir.h index bc02f6e..fb10c32 100644 --- a/src/glsl/ir.h +++ b/src/glsl/ir.h @@ -95,6 +95,20 @@ private: uint8_t ir_type; public: + /** +* Alignment padding that would be added by the compiler +* +* Putting a field here makes what would otherwise be dead space usable. +* Subclasses of ir_instruction can store data here. Care must be taken +* for two reasons: +* +* 1. Direct descendants in the class hierarchy (e.g., \c ir_dereference +*and \c ir_dereference_array) must not try to use this space. +* +* 2. The size of the padding depends on the architecture. +*/ + uint8_t padding[sizeof(intptr_t) - 1]; + inline enum ir_node_type get_ir_type() const { STATIC_ASSERT(ir_type_max < 256); -- 1.8.1.4
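A sketch of why naming the padding costs nothing: after a one-byte tag that follows a pointer, the compiler would insert sizeof(intptr_t) - 1 bytes of alignment padding anyway. The two structs are hypothetical miniatures, and the equality is only expected on ABIs where pointers and intptr_t share size and alignment, which is an assumption rather than a guarantee:

```cpp
#include <cstdint>

// Hypothetical miniature: a link pointer followed by a one-byte tag.
struct implicit_padding {
   void *next;
   std::uint8_t tag;
   // The compiler pads here up to pointer alignment.
};

// The same layout with the dead space given a name so it can hold data.
struct explicit_padding {
   void *next;
   std::uint8_t tag;
   std::uint8_t padding[sizeof(std::intptr_t) - 1];
};

// True when the named padding did not grow the object.
constexpr bool padding_is_free()
{
   return sizeof(implicit_padding) == sizeof(explicit_padding);
}
```

On a typical 64-bit target both structs occupy 16 bytes, and on a typical 32-bit target both occupy 8, which matches the "no change" figures in the commit message.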
[Mesa-dev] [PATCH 03/21] glsl: Eliminate ir_variable::data.atomic.buffer_index
From: Ian Romanick Just use ir_variable::data.binding... because that's where the binding is stored for everything else that can use layout(binding=). No change in the peak ir_variable memory usage in a trimmed apitrace of dota2 on 64-bit. Reduces the peak ir_variable memory usage in a trimmed apitrace of dota2 by 102KiB on 32-bit. Before: IR MEM: variable usage / name / total: 4955496 915817 5871313 After: IR MEM: variable usage / name / total: 4850844 915817 5766661 Signed-off-by: Ian Romanick --- src/glsl/ir.cpp| 2 +- src/glsl/ir.h | 3 +-- src/glsl/link_atomics.cpp | 4 +++- src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 2 +- src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 2 +- 5 files changed, 7 insertions(+), 6 deletions(-) diff --git a/src/glsl/ir.cpp b/src/glsl/ir.cpp index ba8a839..65541c2 100644 --- a/src/glsl/ir.cpp +++ b/src/glsl/ir.cpp @@ -1546,6 +1546,7 @@ ir_variable::ir_variable(const struct glsl_type *type, const char *name, this->data.has_initializer = false; this->data.location = -1; this->data.location_frac = 0; + this->data.binding = 0; this->warn_extension = NULL; this->constant_value = NULL; this->constant_initializer = NULL; @@ -1561,7 +1562,6 @@ ir_variable::ir_variable(const struct glsl_type *type, const char *name, this->data.mode = mode; this->data.interpolation = INTERP_QUALIFIER_NONE; this->data.max_array_access = 0; - this->data.atomic.buffer_index = 0; this->data.atomic.offset = 0; this->data.image.read_only = false; this->data.image.write_only = false; diff --git a/src/glsl/ir.h b/src/glsl/ir.h index ef4a12d..93d5aef 100644 --- a/src/glsl/ir.h +++ b/src/glsl/ir.h @@ -684,7 +684,7 @@ public: int index; /** - * Initial binding point for a sampler or UBO. + * Initial binding point for a sampler, atomic, or UBO. * * For array types, this represents the binding point for the first element. */ @@ -694,7 +694,6 @@ public: * Location an atomic counter is stored at. 
*/ struct { - unsigned buffer_index; unsigned offset; } atomic; diff --git a/src/glsl/link_atomics.cpp b/src/glsl/link_atomics.cpp index d92cdb1..8655269 100644 --- a/src/glsl/link_atomics.cpp +++ b/src/glsl/link_atomics.cpp @@ -192,7 +192,9 @@ link_assign_atomic_counter_resources(struct gl_context *ctx, gl_uniform_storage *const storage = &prog->UniformStorage[id]; mab.Uniforms[j] = id; - var->data.atomic.buffer_index = i; + if (!var->data.explicit_binding) +var->data.binding = i; + storage->atomic_buffer_index = i; storage->offset = var->data.atomic.offset; storage->array_stride = (var->type->is_array() ? diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp index dcc8441..f2b34e2 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp @@ -2278,7 +2278,7 @@ fs_visitor::visit_atomic_counter_intrinsic(ir_call *ir) ir->actual_parameters.get_head()); ir_variable *location = deref->variable_referenced(); unsigned surf_index = (prog_data->base.binding_table.abo_start + - location->data.atomic.buffer_index); + location->data.binding); /* Calculate the surface offset */ fs_reg offset(this, glsl_type::uint_type); diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp index 7bad81c..d72c47c 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp @@ -2198,7 +2198,7 @@ vec4_visitor::visit_atomic_counter_intrinsic(ir_call *ir) ir->actual_parameters.get_head()); ir_variable *location = deref->variable_referenced(); unsigned surf_index = (prog_data->base.binding_table.abo_start + - location->data.atomic.buffer_index); + location->data.binding); /* Calculate the surface offset */ src_reg offset(this, glsl_type::uint_type); -- 1.8.1.4
[Mesa-dev] [PATCH 21/21] i965: Use short names for channel_expressions temporaries
From: Ian Romanick Reduces the peak ir_variable memory usage in a trimmed apitrace of dota2 by 39KiB on 64-bit. Before: IR MEM: variable usage / name / total: 5327760 935234 6262994 After: IR MEM: variable usage / name / total: 5327760 894914 6222674 Reduces the peak ir_variable memory usage in a trimmed apitrace of dota2 by 26KiB on 32-bit. Before: IR MEM: variable usage / name / total: 4118280 670980 4789260 After: IR MEM: variable usage / name / total: 4118280 644100 4762380 Signed-off-by: Ian Romanick --- src/mesa/drivers/dri/i965/brw_fs_channel_expressions.cpp | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_channel_expressions.cpp b/src/mesa/drivers/dri/i965/brw_fs_channel_expressions.cpp index ae5bc56..7d4e25b 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_channel_expressions.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_channel_expressions.cpp @@ -163,8 +163,7 @@ ir_channel_expressions_visitor::visit_leave(ir_assignment *ir) assert(!expr->operands[i]->type->is_matrix()); op_var[i] = new(mem_ctx) ir_variable(expr->operands[i]->type, - "channel_expressions", - ir_var_temporary); + "$c", ir_var_temporary); ir->insert_before(op_var[i]); deref = new(mem_ctx) ir_dereference_variable(op_var[i]); -- 1.8.1.4
[Mesa-dev] [PATCH 19/21] glsl: Store ir_variable_data::_num_state_slots and ::binding in 16-bits each
From: Ian Romanick Reduces the peak ir_variable memory usage in a trimmed apitrace of dota2 by 204KiB on 64-bit. Before: IR MEM: variable usage / name / total: 5537064 1121441 6658505 After: IR MEM: variable usage / name / total: 5327760 1121441 6449201 Reduces the peak ir_variable memory usage in a trimmed apitrace of dota2 by 102KiB on 32-bit. Before: IR MEM: variable usage / name / total: 4222932 787727 5010659 After: IR MEM: variable usage / name / total: 4118280 787727 4906007 Signed-off-by: Ian Romanick --- src/glsl/ir.h | 24 1 file changed, 16 insertions(+), 8 deletions(-) diff --git a/src/glsl/ir.h b/src/glsl/ir.h index 95182fb..fccbfdd 100644 --- a/src/glsl/ir.h +++ b/src/glsl/ir.h @@ -762,10 +762,25 @@ public: uint16_t image_format; private: - unsigned _num_state_slots;/**< Number of state slots used */ + /** + * Number of state slots used + * + * \note + * This could be stored in as few as 7-bits, if necessary. If it is made + * smaller, add an assertion to \c ir_variable::allocate_state_slots to + * be safe. + */ + uint16_t _num_state_slots; public: /** + * Initial binding point for a sampler, atomic, or UBO. + * + * For array types, this represents the binding point for the first element. + */ + int16_t binding; + + /** * Storage location of the base of this variable * * The precise meaning of this field depends on the nature of the variable. @@ -786,13 +801,6 @@ public: int location; /** - * Initial binding point for a sampler, atomic, or UBO. - * - * For array types, this represents the binding point for the first element. - */ - int binding; - - /** * Location an atomic counter is stored at. */ struct { -- 1.8.1.4
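The \note above suggests guarding the narrowed field if it ever shrinks further. As a sketch of what such a guard might look like for the 16-bit case, with a hypothetical struct and setter that are not part of the patch:

```cpp
#include <cstdint>
#include <limits>

// Hypothetical miniature of the two narrowed fields.
struct variable_data {
   std::uint16_t num_state_slots;
   std::int16_t binding;
};

// Hypothetical setter: reject counts that would not survive the
// conversion from unsigned to uint16_t.
bool set_num_state_slots(variable_data &d, unsigned n)
{
   if (n > std::numeric_limits<std::uint16_t>::max())
      return false;

   d.num_state_slots = static_cast<std::uint16_t>(n);
   return true;
}
```

An assertion with the same bound inside allocate_state_slots would serve the same purpose in debug builds.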
[Mesa-dev] [PATCH 17/21] glsl: Make ir_variable::num_state_slots and ir_variable::state_slots private
From: Ian Romanick

Also move num_state_slots inside ir_variable_data for better packing.

The payoff for this will come in a few more patches.

Signed-off-by: Ian Romanick
---
 src/glsl/builtin_variables.cpp                 |  5 +--
 src/glsl/ir.h                                  | 56 ++
 src/glsl/ir_clone.cpp                          | 13 ++
 src/glsl/ir_memory_usage.cpp                   |  5 ++-
 src/glsl/linker.cpp                            |  7 ++--
 src/mesa/drivers/dri/i965/brw_fs.cpp           |  6 +--
 src/mesa/drivers/dri/i965/brw_shader.cpp       |  6 +--
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp |  6 +--
 src/mesa/program/ir_to_mesa.cpp                | 14 +++
 src/mesa/state_tracker/st_glsl_to_tgsi.cpp     | 14 +++
 10 files changed, 75 insertions(+), 57 deletions(-)

diff --git a/src/glsl/builtin_variables.cpp b/src/glsl/builtin_variables.cpp
index 1461953..5878fbf 100644
--- a/src/glsl/builtin_variables.cpp
+++ b/src/glsl/builtin_variables.cpp
@@ -489,12 +489,9 @@ builtin_variable_generator::add_uniform(const glsl_type *type,
          &_mesa_builtin_uniform_desc[i];
 
       const unsigned array_count = type->is_array() ? type->length : 1;
-      uni->num_state_slots = array_count * statevar->num_elements;
 
       ir_state_slot *slots =
-         ralloc_array(uni, ir_state_slot, uni->num_state_slots);
-
-      uni->state_slots = slots;
+         uni->allocate_state_slots(array_count * statevar->num_elements);
 
       for (unsigned a = 0; a < array_count; a++) {
          for (unsigned j = 0; j < statevar->num_elements; j++) {
diff --git a/src/glsl/ir.h b/src/glsl/ir.h
index bfd790e..ab9f27b 100644
--- a/src/glsl/ir.h
+++ b/src/glsl/ir.h
@@ -538,6 +538,32 @@ public:
       return this->max_ifc_array_access;
    }
 
+   inline unsigned get_num_state_slots() const
+   {
+      return this->data._num_state_slots;
+   }
+
+   inline void set_num_state_slots(unsigned n)
+   {
+      this->data._num_state_slots = n;
+   }
+
+   inline ir_state_slot *get_state_slots()
+   {
+      return this->state_slots;
+   }
+
+   inline ir_state_slot *allocate_state_slots(unsigned n)
+   {
+      this->state_slots = ralloc_array(this, ir_state_slot, n);
+      this->data._num_state_slots = 0;
+
+      if (this->state_slots != NULL)
+         this->data._num_state_slots = n;
+
+      return this->state_slots;
+   }
+
    /**
     * Enable emitting extension warnings for this variable
     */
@@ -723,6 +749,10 @@ public:
       /** Image internal format if specified explicitly, otherwise GL_NONE. */
       uint16_t image_format;
 
+   private:
+      unsigned _num_state_slots;    /**< Number of state slots used */
+
+   public:
       /**
        * Storage location of the base of this variable
        *
@@ -771,22 +801,6 @@ public:
    } data;
 
    /**
-    * Built-in state that backs this uniform
-    *
-    * Once set at variable creation, \c state_slots must remain invariant.
-    * This is because, ideally, this array would be shared by all clones of
-    * this variable in the IR tree.  In other words, we'd really like for it
-    * to be a fly-weight.
-    *
-    * If the variable is not a uniform, \c num_state_slots will be zero and
-    * \c state_slots will be \c NULL.
-    */
-   /*@{*/
-   unsigned num_state_slots;    /**< Number of state slots used */
-   ir_state_slot *state_slots;  /**< State descriptors. */
-   /*@}*/
-
-   /**
     * Value assigned in the initializer of a variable declared "const"
     */
    ir_constant *constant_value;
@@ -818,6 +832,16 @@ private:
    unsigned *max_ifc_array_access;
 
    /**
+    * Built-in state that backs this uniform
+    *
+    * Once set at variable creation, \c state_slots must remain invariant.
+    *
+    * If the variable is not a uniform, \c _num_state_slots will be zero and
+    * \c state_slots will be \c NULL.
+    */
+   ir_state_slot *state_slots;
+
+   /**
     * For variables that are in an interface block or are an instance of an
     * interface block, this is the \c GLSL_TYPE_INTERFACE type for that block.
     *
diff --git a/src/glsl/ir_clone.cpp b/src/glsl/ir_clone.cpp
index d594529..0cd35f0 100644
--- a/src/glsl/ir_clone.cpp
+++ b/src/glsl/ir_clone.cpp
@@ -53,15 +53,10 @@ ir_variable::clone(void *mem_ctx, struct hash_table *ht) const
    memcpy(&var->data, &this->data, sizeof(var->data));
 
-   var->num_state_slots = this->num_state_slots;
-   if (this->state_slots) {
-      /* FINISHME: This really wants to use something like talloc_reference, but
-       * FINISHME: ralloc doesn't have any similar function.
-       */
-      var->state_slots = ralloc_array(var, ir_state_slot,
-                                      this->num_state_slots);
-      memcpy(var->state_slots, this->state_slots,
-             sizeof(this->state_slots[0]) * var->num_state_slots);
+   if (this->get_state_slots()) {
+      ir_state_slot *s = var->allocate_state_slots(this->get_num_state_slots());
+      memcpy(s, this->get_state_slots()
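The accessor pattern this patch introduces can be sketched outside of Mesa. The class below is a hypothetical stand-in: it uses `std::calloc` and `std::free` where the real code uses `ralloc_array`, and `state_slot_sketch` is an invented placeholder for `ir_state_slot`. What it keeps from the patch is the invariant that the slot count is only set once the allocation has succeeded, so the count and the pointer cannot disagree.

```cpp
#include <cassert>
#include <cstdlib>

// Invented placeholder for ir_state_slot.
struct state_slot_sketch {
   int tokens[5];
   int swizzle;
};

class variable_sketch {
public:
   variable_sketch() : num_state_slots(0), state_slots(nullptr) {}
   ~variable_sketch() { std::free(state_slots); }

   // Copying is disabled to keep ownership of the array trivial here.
   variable_sketch(const variable_sketch &) = delete;
   variable_sketch &operator=(const variable_sketch &) = delete;

   unsigned get_num_state_slots() const { return num_state_slots; }
   state_slot_sketch *get_state_slots() { return state_slots; }

   // Allocate n slots; the count stays zero if the allocation fails.
   state_slot_sketch *allocate_state_slots(unsigned n)
   {
      std::free(state_slots);
      state_slots = static_cast<state_slot_sketch *>(
         std::calloc(n, sizeof(state_slot_sketch)));
      num_state_slots = (state_slots != nullptr) ? n : 0;
      return state_slots;
   }

private:
   unsigned num_state_slots;
   state_slot_sketch *state_slots;
};
```

Keeping both members private and routing every write through one function is what makes it safe to later shrink the count field, since there is a single place to add a range assertion.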
[Mesa-dev] [PATCH 00/21] Reduce ir_variable memory usage
This series reduces the memory usage of ir_variable quite significantly.

The first couple patches add a mechanism to determine the amount of
memory used by any kind of IR object. This is used to collect the data
that is shown in the commit messages through the series.

Most of the rest of the patches rearrange data or store things in
smaller fields. The two interesting "subseries" are:

Patches 9 through 15 and 20 through 21: Store short variable names in
otherwise "dead" space in the base class. I didn't rebase these patches
to all be together because I didn't want to re-collect all the data. :)
A small amount more savings could be had here, but in the test case at
hand, it didn't appear worth the effort. Adding

diff --git a/src/glsl/ir_memory_usage.cpp b/src/glsl/ir_memory_usage.cpp
index f122635..2aead7c 100644
--- a/src/glsl/ir_memory_usage.cpp
+++ b/src/glsl/ir_memory_usage.cpp
@@ -67,6 +67,9 @@ ir_memory_usage::visit(ir_variable *ir)
    if (ir->name != (char *) ir->padding)
       this->s.variable_name_usage += strlen(ir->name) + 1 + ralloc_header_size;
 
+   if (ir->name != (char *) ir->padding && ir->data.mode == ir_var_temporary)
+      printf("IR MEM: %s\n", ir->name);
+
    return visit_continue;
 }

may show some other possibilities.

Patches 16 through 18: Store two fields that are never both used in the
same location.

Here's the punchline. In a trimmed trace from dota2 on 32-bit,
ir_variable accounts for ~5.5MB before this series. After this series,
it accounts for only ~4.5MB.

Before: IR MEM: variable usage / name / total: 4955496 915817 5871313
After:  IR MEM: variable usage / name / total: 4118280 644100 4762380

I would love to see before / after data for a full run of dota2 with
all the shaders compiled. :)

This is also available in the ir_variable-diet-v2 branch of my fdo tree.

The ir_variable-diet branch contains a false start. I tried to move a
bunch of fields that are only used for shader interface variables (e.g.,
uniforms or varyings) to a dynamically allocated structure. At least on
my test case, the added ralloc overhead made that a loss. It may be
possible to try a similar technique by subclassing ir_variable, but I
think that will be a lot of work. The biggest annoyance is that when
ast_to_hir allocates an ir_variable, it doesn't yet know what it will
be... and changes the ir_variable_data::mode after allocation.

As a side note, pahole is a really useful utility, but it lies a little
bit on C++ objects. It will say things like:

class ir_rvalue : public ir_instruction {
public:

        /* class ir_instruction ; */                     /*     0     0 */

        /* XXX 16 bytes hole, try to pack */

        const class glsl_type  *   type;                 /*    16     4 */

        /* size: 20, cachelines: 1, members: 2 */
        /* sum members: 4, holes: 1, sum holes: 16 */
        /* last cacheline: 20 bytes */
};

I've trimmed out all the methods and other noise. It says there's this
16-byte "hole." Notice sizeof(ir_instruction) is 16 bytes and the total
size of ir_rvalue is 20 bytes. This 16-byte hole is the storage for the
base class members! :) Calling it a hole is a bit misleading. This also
sent me down a false path, but, thankfully, not for too long.

Cc: Eero Tamminen
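The pahole observation can be checked with a small sketch. The two structs below are hypothetical stand-ins for `ir_instruction` and `ir_rvalue`, with invented members and no virtual functions; they only illustrate that the bytes before the derived class's first member are the base class's own storage rather than wasted padding.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the base class: two pointer-sized members.
struct base_sketch {
   void *next;
   void *prev;
};

// Hypothetical stand-in for the derived class: one extra pointer member.
struct derived_sketch : base_sketch {
   const void *type;
};

// Bytes of derived_sketch that are not its own member. A tool that only
// sums the members declared in the derived class would report these as
// a "hole", although they hold the base class members.
inline std::size_t base_bytes_in_derived()
{
   return sizeof(derived_sketch) - sizeof(const void *);
}
```

With only pointer-sized members there is no alignment padding in either struct on common ABIs, so the reported "hole" matches `sizeof(base_sketch)` exactly, mirroring the 16-of-20-bytes case in the pahole output above.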
Re: [Mesa-dev] [PATCH 11/21] glsl: Store ir_variable::ir_type in 8 bits instead of 32
On Tue, May 27, 2014 at 7:49 PM, Ian Romanick wrote:
> From: Ian Romanick
>
> No change in the peak ir_variable memory usage in a trimmed apitrace of
> dota2 on 64-bit.
>
> No change in the peak ir_variable memory usage in a trimmed apitrace of
> dota2 on 32-bit.
>
> Signed-off-by: Ian Romanick
> ---
>  src/glsl/ir.h | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/src/glsl/ir.h b/src/glsl/ir.h
> index 7faee74..bc02f6e 100644
> --- a/src/glsl/ir.h
> +++ b/src/glsl/ir.h
> @@ -92,12 +92,13 @@ enum ir_node_type {
>   */
>  class ir_instruction : public exec_node {
>  private:
> -   enum ir_node_type ir_type;
> +   uint8_t ir_type;
>
>  public:
>     inline enum ir_node_type get_ir_type() const
>     {
> -      return this->ir_type;
> +      STATIC_ASSERT(ir_type_max < 256);
> +      return (enum ir_node_type) this->ir_type;
>     }
>
>     /**
> --
> 1.8.1.4

Instead of doing this, you can mark the enum type with the PACKED
attribute. I did this in a similar change in i965 already. See

http://lists.freedesktop.org/archives/mesa-dev/2014-February/054643.html

This way we still get enum type checking and warnings out of switch
statements and such.
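The trade-off raised in this reply can be sketched as follows. Note the swap: this sketch does not use Mesa's PACKED macro but the standard C++11 fixed underlying type, which gives a comparable result (a one-byte enum that is still an enum). All names are hypothetical stand-ins for `ir_node_type` and `ir_instruction`.

```cpp
#include <cassert>
#include <cstdint>

// One-byte enum via a fixed underlying type (C++11). The member stays an
// enum, so no cast is needed in the getter and switch statements over it
// can still be checked by the compiler.
enum ir_node_type_sketch : uint8_t {
   ir_type_sketch_variable,
   ir_type_sketch_assignment,
   ir_type_sketch_max
};

struct instruction_sketch {
   ir_node_type_sketch ir_type;
};

// A switch without a default label: compilers that implement -Wswitch
// can warn here if a new enumerator is added but not handled.
inline int rank_of(ir_node_type_sketch t)
{
   switch (t) {
   case ir_type_sketch_variable:
      return 1;
   case ir_type_sketch_assignment:
      return 2;
   case ir_type_sketch_max:
      break;
   }
   return 0;
}
```

Storing the field as a plain `uint8_t`, as in the quoted patch, gives the same size but loses the enum type at the point of storage, which is the concern the reply raises.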
Re: [Mesa-dev] [PATCH 06/11] loader: Use drirc device_id parameter in complement to DRI_PRIME
This breaks the build for me, see below. That's an out-of-tree build FWIW.

make[2]: Entering directory '/home/daenzer/src/mesa-git/mesa/build-amd64/src/loader'
cd ../../.. && automake-1.14 --foreign src/loader/Makefile
src/loader/Makefile.am:42: warning: source file '$(top_srcdir)/src/mesa/drivers/dri/common/xmlconfig.c' is in a subdirectory,
src/loader/Makefile.am:42: but option 'subdir-objects' is disabled
automake-1.14: warning: possible forward-incompatibility.
automake-1.14: At least a source file is in a subdirectory, but the 'subdir-objects'
automake-1.14: automake option hasn't been enabled.  For now, the corresponding output
automake-1.14: object file(s) will be placed in the top-level directory.  However,
automake-1.14: this behaviour will change in future Automake versions: they will
automake-1.14: unconditionally cause object files to be placed in the same subdirectory
automake-1.14: of the corresponding sources.
automake-1.14: You are advised to start using 'subdir-objects' option throughout your
automake-1.14: project, to avoid future incompatibilities.
cd ../.. && /bin/bash ./config.status src/loader/Makefile depfiles
config.status: creating src/loader/Makefile
config.status: executing depfiles commands
  CC       libloader_la-loader.lo
  CC       libloader_la-xmlconfig.lo
In file included from ../../../src/loader/loader.c:79:0:
../../../src/mesa/drivers/dri/common/xmlpool.h:103:29: fatal error: xmlpool/options.h: No such file or directory
 #include "xmlpool/options.h"
                             ^
compilation terminated.
Makefile:570: recipe for target 'libloader_la-loader.lo' failed
make[2]: *** [libloader_la-loader.lo] Error 1
make[2]: Leaving directory '/home/daenzer/src/mesa-git/mesa/build-amd64/src/loader'

-- 
Earthling Michel Dänzer            |                  http://www.amd.com
Libre software enthusiast          |                Mesa and X developer
Re: [Mesa-dev] [PATCH 01/19] i965/fs: Add and use an fs_inst copy constructor.
Patches 1-8 inclusive are

Reviewed-by: Chris Forbes

On Wed, May 28, 2014 at 1:47 PM, Matt Turner wrote:
> Will get more complicated when fs_reg src becomes a pointer.
> ---
>  src/mesa/drivers/dri/i965/brw_fs.cpp | 5 +++++
>  src/mesa/drivers/dri/i965/brw_fs.h   | 1 +
>  2 files changed, 6 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp
> b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index bd77e0c..5b7d84f 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -139,6 +139,11 @@ fs_inst::fs_inst(enum opcode opcode, fs_reg dst,
>        assert(src[2].reg_offset >= 0);
>  }
>
> +fs_inst::fs_inst(const fs_inst &that)
> +{
> +   memcpy(this, &that, sizeof(that));
> +}
> +
>  #define ALU1(op)                                                        \
>     fs_inst *                                                            \
>     fs_visitor::op(fs_reg dst, fs_reg src0)                              \
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.h
> b/src/mesa/drivers/dri/i965/brw_fs.h
> index 789f0b3..bda233c 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.h
> +++ b/src/mesa/drivers/dri/i965/brw_fs.h
> @@ -197,6 +197,7 @@ public:
>     fs_inst(enum opcode opcode, fs_reg dst, fs_reg src0, fs_reg src1);
>     fs_inst(enum opcode opcode, fs_reg dst,
>             fs_reg src0, fs_reg src1,fs_reg src2);
> +   fs_inst(const fs_inst &that);
>
>     bool equals(fs_inst *inst) const;
>     bool overwrites_reg(const fs_reg &reg) const;
> --
> 1.8.3.2
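The quoted copy constructor can be sketched on a hypothetical stand-in type. `reg_sketch` and `inst_sketch` are invented here and are not the real `fs_reg` or `fs_inst`. A byte-wise copy like this is only reasonable while every member is plain data, which is presumably why the commit message expects things to get more complicated once `src` becomes a pointer. Strictly speaking, the C++ standard only blesses `memcpy` for trivially copyable types, and a user-provided copy constructor makes the class itself non-trivially-copyable, so this relies on how compilers behave in practice.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Invented plain-data register description.
struct reg_sketch {
   int file;
   int reg;
   int reg_offset;
};

// The byte-wise copy below assumes every member is plain data.
static_assert(std::is_trivially_copyable<reg_sketch>::value,
              "members must be safe to copy byte-wise");

struct inst_sketch {
   int opcode;
   reg_sketch dst;
   reg_sketch src[3];

   explicit inst_sketch(int op) : opcode(op), dst(), src() {}

   // Copy all members in one go, as in the quoted patch. The cast to
   // void* avoids compiler warnings about writing over a class object.
   inst_sketch(const inst_sketch &that)
   {
      std::memcpy(static_cast<void *>(this), &that, sizeof(that));
   }
};
```

If a member later owns heap storage, this constructor would copy the pointer rather than the pointee, and both objects would then refer to the same array.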